Leading AI companies, including OpenAI, Microsoft and Meta, are turning to a process known as “distillation” in the global race to create AI models that are cheaper for consumers and businesses to adopt.
The technique attracted widespread attention after China’s DeepSeek used it to build powerful and efficient AI models based on open-source systems released by competitors Meta and Alibaba. The breakthrough shook Silicon Valley’s confidence in its AI leadership, prompting Wall Street investors to wipe billions of dollars off the value of US tech stocks.
Through distillation, companies take a large language model, known as the “teacher” model, which generates the next likely word in a sentence. The teacher model produces data that is used to train a smaller “student” model, helping to quickly transfer the knowledge and predictions of the larger model to the smaller one.
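In outline, and purely as an illustration rather than a description of any company’s actual pipeline, the idea can be sketched in a few lines of Python with PyTorch: a small “student” network is trained to match the softened output probabilities of a larger “teacher”. The model sizes and names below are hypothetical.

```python
# Illustrative sketch of knowledge distillation; not any company's actual pipeline.
# A large "teacher" network's softened predictions supervise a smaller "student".
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, temperature = 1000, 2.0

# Hypothetical stand-ins: in practice the teacher is a large pretrained LLM.
teacher = nn.Sequential(nn.Linear(128, 2048), nn.ReLU(), nn.Linear(2048, vocab_size))
student = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, vocab_size))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(100):
    x = torch.randn(32, 128)          # dummy inputs standing in for real text features
    with torch.no_grad():
        teacher_logits = teacher(x)   # the teacher "generates" soft targets
    student_logits = student(x)

    # The student learns to match the teacher's softened probability distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The temperature term softens the teacher’s probabilities, so the student learns from the relative likelihoods the teacher assigns across many words rather than from its single top prediction alone.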
Distillation has been widely used for years, but recent advances have led industry experts to believe the process will increasingly be a boon for start-ups seeking cost-effective ways to build applications based on the technology.
“Distillation is very magical,” said Olivier Godement, product director for OpenAI’s platform. “It is essentially the process of taking a very large, smart frontier model and using that model to teach a smaller model . . . one that is very capable, very cheap and very fast to run.”
Large language models such as OpenAI’s GPT-4, Google’s Gemini and Meta’s Llama require enormous amounts of data and computing power to develop and maintain. Companies have not disclosed exact figures for how much it costs to train large models, but the sum is likely to run to hundreds of millions of dollars.
Thanks to distillation, developers and businesses can access these models’ capabilities at a small fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.
Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products such as ChatGPT. OpenAI’s biggest backer, Microsoft, used GPT-4 to distil its small Phi family of language models as part of its commercial partnership, after investing nearly $14 billion in the company.
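For developers working against hosted APIs, the same pattern can be sketched with the OpenAI Python SDK: prompt a large model, save its answers in the chat fine-tuning format, then fine-tune a smaller model on that file. The model identifiers, prompts and file name below are assumptions for illustration, not details reported in this article.

```python
# Illustrative sketch of API-based distillation: a large model generates training
# data and a smaller model is fine-tuned on it. Model names, prompts and file
# names are assumptions, not details from the article.
import json
from openai import OpenAI

client = OpenAI()
prompts = [
    "Summarise the benefits of model distillation.",
    "Explain teacher-student training in two sentences.",
]

# 1. Collect the large "teacher" model's answers.
rows = []
for p in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": p}],
    )
    rows.append({"messages": [
        {"role": "user", "content": p},
        {"role": "assistant", "content": resp.choices[0].message.content},
    ]})

with open("teacher_outputs.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# 2. Fine-tune a smaller "student" model on the teacher's outputs.
training_file = client.files.create(
    file=open("teacher_outputs.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-4o-mini-2024-07-18"
)
print(job.id)
```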
However, the San Francisco-based start-up says it believes DeepSeek distilled OpenAI’s models to train a competitor. DeepSeek has not commented on the claim.
While distillation can be used to create high-performing models, experts add that they are more limited.
“Distillation presents an interesting trade-off: if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research.
David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and that distilled models are powerful enough for purposes such as customer-service chatbots or for running on smaller devices such as phones.
“Whenever you can [make it cheaper] and it gives you the right performance you want, there is very little reason not to do it,” he added.
That challenges many of the business models of leading AI companies. Even when developers use distilled models from companies such as OpenAI, they cost far less to run, are less expensive to create and therefore generate less revenue. Model makers such as OpenAI often charge less for the use of distilled models because they carry a lower computational load.
However, OpenAI’s Godement argued that large language models will still be needed for “high-intelligence, high-stakes tasks” where “businesses are willing to pay more for a high level of accuracy and reliability”. He added that large models will also be needed to discover new capabilities that can then be distilled into smaller ones.
Still, the company aims to ensure its large models are not distilled to train a competitor. OpenAI has teams that monitor usage and can remove access from users it suspects of generating vast amounts of data to export and train a rival. However, much of this action is taken retrospectively.
“OpenAI has been trying to protect against distillation for a long time, but it is extremely difficult to avoid it altogether,” said Douwe Kiela, chief executive of Contextual AI, a start-up building information retrieval tools for businesses.
Distillation is also a victory for advocates of open models, whose technology is made freely available for developers to build on. DeepSeek has also made its recent models open to developers.
“We use [distillation] and put it into our products right away,” said Yann LeCun, chief AI scientist at Meta. “That’s the whole idea of open source: as long as those processes are open, you benefit from everyone’s and everyone else’s advances.”
Distillation means model makers can spend billions of dollars advancing the capabilities of their AI systems and still face competitors that, as DeepSeek’s recent releases show, can catch up quickly. That raises questions about the first-mover advantage in building LLMs, when their capabilities can be replicated within a few months.
“In a world where things are moving so fast . . . you could actually spend a lot of money, doing it the hard way, and then the rest of the field is right on your heels,” said IBM’s Cox. “So it’s an interesting and tricky business environment.”
Additional reporting by Michael Acton in San Francisco