DeepSeek touts new training method as China pushes AI efficiency


DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry’s effort to compete with the likes of OpenAI despite lacking unrestricted access to Nvidia Corp chips.

The document, co-authored by founder Liang Wenfeng, introduces a framework it calls Manifold-Constrained Hyper-Connections. It’s designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.

Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals’ systems. DeepSeek has since released several smaller models, but anticipation is mounting for its next flagship system, widely dubbed R2, expected around the Spring Festival in February.

Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures.

What Bloomberg Intelligence says

DeepSeek’s forthcoming R2 model – which could launch in the next few months – has the potential to upend the global AI sector again, despite Google’s recent gains. Google’s Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench’s ranking of global large language model (LLM) performance. China’s models, developed at a fraction of the cost of competitors, claimed two slots in the top-15.

– Robert Lea and Jasmine Lyu, analysts

DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and the open-source platform Hugging Face. The paper lists 19 authors, with Liang’s name appearing last.

The founder, who’s consistently steered DeepSeek’s research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built.

The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates “rigorous infrastructure optimisation to ensure efficiency.”

Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance Ltd’s 2024 research into hyper-connection architectures.
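
The hyper-connection idea the paper builds on can be sketched briefly: in ByteDance’s 2024 formulation, the transformer’s single residual stream is replaced by several parallel hidden streams that are read from, mixed, and written back to through learnable weights. The toy PyTorch sketch below illustrates only that baseline idea, with all names and the four-stream setup chosen for illustration; it is not DeepSeek’s manifold-constrained variant, whose details are in the paper itself.

```python
# Illustrative sketch of the "hyper-connections" idea (ByteDance, 2024):
# instead of one residual stream, keep n parallel hidden streams with
# learnable read/write/mix weights. A simplified toy, not DeepSeek's method.
import torch
import torch.nn as nn

class HyperConnection(nn.Module):
    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        # Weights that read the sublayer input out of the streams...
        self.read = nn.Parameter(torch.ones(n_streams) / n_streams)
        # ...and write the sublayer output back into each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # Stream-to-stream mixing; identity init leaves streams unchanged.
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor, layer: nn.Module) -> torch.Tensor:
        # streams: (n_streams, batch, seq, dim)
        x = torch.einsum("n,nbsd->bsd", self.read, streams)          # combine streams
        out = layer(x)                                               # run sublayer once
        streams = torch.einsum("nm,mbsd->nbsd", self.mix, streams)   # mix streams
        return streams + self.write.view(-1, 1, 1, 1) * out          # distribute output

# Toy usage: wrap one feed-forward block with four streams.
dim, n = 64, 4
hc = HyperConnection(dim, n)
ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
streams = torch.randn(1, 2, 8, dim).expand(n, -1, -1, -1).clone()   # replicate input
streams = hc(streams, ffn)
print(streams.shape)  # torch.Size([4, 2, 8, 64])
```

The appeal, as described in the hyper-connections line of work, is that the extra streams add only a handful of parameters per layer while giving the network more routing flexibility than a single residual path.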

The technique holds promise “for the evolution of foundational models,” the authors said. – Bloomberg
