How did DeepSeek build its AI with less money?


SAN FRANCISCO: Last month, US financial markets tumbled after a Chinese startup called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

AI companies typically train their chatbots using supercomputers packed with 16,000 specialised chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek engineers detailed in a research paper published just after Christmas, the startup used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about US$6mil (RM26.7mil) in raw computing power, roughly one-tenth of what Meta spent in building its latest AI technology.

What exactly did DeepSeek do? Here is a guide.

How are AI technologies built?

The leading AI technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analysing enormous amounts of data.

The most powerful systems spend months analysing just about all the English text on the Internet as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.

About 15 years ago, AI researchers realised that specialised computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.

As companies packed more GPUs into their computer data centers, their AI systems could analyse more data.

But the best GPUs cost around US$40,000 (RM177,491), and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.

How was DeepSeek able to reduce costs?

It did many things. Most notably, it embraced a method called “mixture of experts.”

Companies usually created a single neural network that learned all the patterns in all the data on the Internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist – which had a decent but not detailed understanding of each subject – could help coordinate interactions between the experts.

It is a bit like an editor overseeing a newsroom filled with specialist reporters.
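
For readers who want to see the shape of the idea, here is a minimal sketch in Python. Everything in it, from the layer sizes to the random “expert” matrices, the router and the always-on “generalist”, is an illustrative assumption rather than DeepSeek’s actual code.

import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2      # illustrative sizes, not DeepSeek's real ones

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # the "specialists"
generalist = rng.standard_normal((d, d))                           # the always-on "generalist"
router = rng.standard_normal((d, n_experts))                       # decides which specialists to use

def moe_layer(x):
    # Score every expert for this particular input...
    scores = x @ router
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # ...but only the top few actually do any work, which saves computation.
    chosen = np.argsort(weights)[-top_k:]
    out = sum(weights[i] * (x @ experts[i]) for i in chosen)
    # The generalist always runs, helping tie the specialists' answers together.
    return out + x @ generalist

print(moe_layer(rng.standard_normal(d)).shape)   # (16,)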

And that is more efficient?

Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers his or her elementary school math class can understand.

There is math involved in this?

Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 ...

You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimation of a circle’s circumference.
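
A quick worked example in Python, using an arbitrary radius of 10, shows how little that rounding costs:

import math

radius = 10                              # arbitrary example radius
exact = 2 * math.pi * radius             # 62.83185...
rounded = 2 * 3.14 * radius              # 62.8
print(exact, rounded)                    # the rounded answer is off by only about 0.05%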

DeepSeek did something similar – but on a much larger scale – in training its AI technology.

The math that allows a neural network to identify patterns in text is really just multiplication – lots and lots and lots of multiplication. We’re talking months of multiplication across thousands of computer chips.

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory – half the space. In essence, it lopped several decimals from each number.

This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.
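
The effect is easy to reproduce in miniature. The numpy library used below has no 8-bit floating-point type, so this sketch compares ordinary 32-bit numbers with 16-bit ones instead; the principle, fewer digits but nearly the same answer, is what matters.

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000).astype(np.float32)
b = rng.standard_normal(1000).astype(np.float32)

full = np.dot(a, b)                                         # every number kept at full width
short = np.dot(a.astype(np.float16), b.astype(np.float16))  # same numbers, half the memory each
print(full, float(short))                                   # the two answers are close, though not identical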

That’s it?

Well, they added another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem – making a key calculation that would help decide how the neural network would operate – it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
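
The same kind of toy example suggests why the wider running total helps. Again, 16-bit numbers stand in for 8-bit ones and the sizes are arbitrary; the contrast between a narrow and a wide accumulator is the point.

import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(10_000).astype(np.float16)   # small, low-precision inputs
b = rng.standard_normal(10_000).astype(np.float16)

narrow = np.float16(0.0)
for x, y in zip(a, b):
    narrow = narrow + x * y                          # the running total is also squeezed into 16 bits

wide = np.float32(0.0)
for x, y in zip(a, b):
    wide = wide + np.float32(x) * np.float32(y)      # the running total is kept in 32 bits

exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
print(float(narrow), float(wide), exact)             # the wide total typically lands much closer to the exact answer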

So any high school student could have done this?

Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.

Then why didn’t they do this already?

Some AI labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek’s work. Doing what the startup did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars – if not billions – in electrical power.

In other words, it requires enormous amounts of risk.

“You have to put a lot of money on the line to try new things – and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specialises in building efficient AI systems and previously worked as an AI researcher at Meta.

“That is why we don’t see much innovation: People are afraid to lose many millions just to try something that doesn’t work,” he added.

Many pundits pointed out that DeepSeek’s US$6mil covered only what the startup spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.

DeepSeek experimented, and it paid off. Now, because the Chinese startup has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI. – ©2025 The New York Times Company
