How did DeepSeek build its AI with less money?


SAN FRANCISCO: Last month, US financial markets tumbled after a Chinese startup called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

AI companies typically train their chatbots using supercomputers packed with 16,000 specialised chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek engineers detailed in a research paper published just after Christmas, the startup used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about US$6mil (RM26.7mil) in raw computing power, roughly one-tenth of what Meta spent in building its latest AI technology.

What exactly did DeepSeek do? Here is a guide.

How are AI technologies built?

The leading AI technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analysing enormous amounts of data.

The most powerful systems spend months analysing just about all the English text on the Internet as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.

About 15 years ago, AI researchers realised that specialised computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.

As companies packed more GPUs into their computer data centers, their AI systems could analyse more data.

But the best GPUs cost around US$40,000 (RM177,491), and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.

How was DeepSeek able to reduce costs?

It did many things. Most notably, it embraced a method called “mixture of experts.”

Companies usually created a single neural network that learned all the patterns in all the data on the Internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist – which had a decent but not detailed understanding of each subject – could help coordinate interactions between the experts.

It is a bit like an editor overseeing a newsroom filled with specialist reporters.
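
For readers who want to see the shape of the idea, here is a minimal sketch in Python. Everything in it, from the layer sizes to the random “expert” matrices, the router and the always-on “generalist”, is an illustrative assumption rather than DeepSeek’s actual code.

import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2      # illustrative sizes, not DeepSeek's real ones

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # the "specialists"
generalist = rng.standard_normal((d, d))                           # the always-on "generalist"
router = rng.standard_normal((d, n_experts))                       # decides which specialists to use

def moe_layer(x):
    # Score every expert for this particular input...
    scores = x @ router
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # ...but only the top few actually do any work, which saves computation.
    chosen = np.argsort(weights)[-top_k:]
    out = sum(weights[i] * (x @ experts[i]) for i in chosen)
    # The generalist always runs, helping tie the specialists' answers together.
    return out + x @ generalist

print(moe_layer(rng.standard_normal(d)).shape)   # (16,)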

And that is more efficient?

Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers his or her elementary school math class can understand.

There is math involved in this?

Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 ...

You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimation of a circle’s circumference.
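
A quick worked example in Python, using an arbitrary radius of 10, shows how little that rounding costs:

import math

radius = 10                              # arbitrary example radius
exact = 2 * math.pi * radius             # 62.83185...
rounded = 2 * 3.14 * radius              # 62.8
print(exact, rounded)                    # the rounded answer is off by only about 0.05%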

DeepSeek did something similar – but on a much larger scale – in training its AI technology.

The math that allows a neural network to identify patterns in text is really just multiplication – lots and lots and lots of multiplication. We’re talking months of multiplication across thousands of computer chips.

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory – half the space. In essence, it lopped several decimals from each number.

This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.
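
The effect is easy to reproduce in miniature. The numpy library used below has no 8-bit floating-point type, so this sketch compares ordinary 32-bit numbers with 16-bit ones instead; the principle, fewer digits but nearly the same answer, is what matters.

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000).astype(np.float32)
b = rng.standard_normal(1000).astype(np.float32)

full = np.dot(a, b)                                         # every number kept at full width
short = np.dot(a.astype(np.float16), b.astype(np.float16))  # same numbers, half the memory each
print(full, float(short))                                   # the two answers are close, though not identical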

That’s it?

Well, they added another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem – making a key calculation that would help decide how the neural network would operate – it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
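
The same kind of toy example suggests why the wider running total helps. Again, 16-bit numbers stand in for 8-bit ones and the sizes are arbitrary; the contrast between a narrow and a wide accumulator is the point.

import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(10_000).astype(np.float16)   # small, low-precision inputs
b = rng.standard_normal(10_000).astype(np.float16)

narrow = np.float16(0.0)
for x, y in zip(a, b):
    narrow = narrow + x * y                          # the running total is also squeezed into 16 bits

wide = np.float32(0.0)
for x, y in zip(a, b):
    wide = wide + np.float32(x) * np.float32(y)      # the running total is kept in 32 bits

exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
print(float(narrow), float(wide), exact)             # the wide total typically lands much closer to the exact answer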

So any high school student could have done this?

Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.

Then why didn’t they do this already?

Some AI labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek’s work. Doing what the startup did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars – if not billions – in electrical power.

In other words, it requires enormous amounts of risk.

“You have to put a lot of money on the line to try new things – and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specialises in building efficient AI systems and previously worked as an AI researcher at Meta.

“That is why we don’t see much innovation: People are afraid to lose many millions just to try something that doesn’t work,” he added.

Many pundits pointed out that DeepSeek’s US$6mil covered only what the startup spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.

DeepSeek experimented, and it paid off. Now, because the Chinese startup has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI. – ©2025 The New York Times Company
