
Former Alibaba scientist Yang Hongxia is at the forefront of an effort to evolve AI beyond DeepSeek's breakthroughs. — SCMP
A renowned professor at Hong Kong Polytechnic University (PolyU) and former artificial intelligence (AI) scientist at Chinese tech giants ByteDance and Alibaba Group Holding is seeking to work with experts across different fields to develop “affordable” domain-specific models.
Yang Hongxia, who joined PolyU’s Department of Computing last year after decades in the technology industry, is at the forefront of an effort to use the capabilities of large language models (LLMs) in specialised applications. Her efforts come as Chinese companies, spurred by the success of start-up DeepSeek, move to open-source their AI models, giving greater access to the tech.
“While current LLMs have made impressive strides in general intelligence, they still fall short in specific domains in fields such as manufacturing and biochemistry,” Yang said in an interview with the South China Morning Post. Alibaba owns the Post.
“This gap exists because much of the relevant data for these fields hasn’t been incorporated into AI model development, as they cannot be crawled from the general web,” she said. Yang added that general-purpose models require adjustments to fit specialised domains.

Yang is leading the establishment of an AI academy, which aims to drive fundamental scientific breakthroughs. The team, primarily composed of students from domestic universities including PolyU, Zhejiang University and Harbin Institute of Technology, is working on “democratising AI development”.
They aim to provide a platform where domain experts can train small AI models using entry-level graphics processing units (GPUs), available through high-performance computing centres in Hong Kong and the provinces of Zhejiang and Guangdong.
Earlier this month, the team published papers illustrating some of its progress, introducing training pipelines designed to minimise computing costs while enabling small models to perform competitive reasoning tasks within specialised fields. “It’s a domain-specific continual pre-training infrastructure,” as Yang put it, likening it to a cloud service that is both cost-effective and accessible.
This approach allows small models – ranging from 1 billion to 3 billion parameters, compared with the hundreds of billions in large models like the 671 billion-parameter DeepSeek-R1 – to complete training and achieve state-of-the-art reasoning capabilities in under 6,000 GPU hours.
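The continual pre-training idea can be pictured with a short, hypothetical sketch: take a small open base model and keep training it on domain text with the standard next-token objective. The model name, toy corpus and hyperparameters below are illustrative assumptions, not details taken from the team’s papers.

```python
# Illustrative sketch only: continual pre-training of a small (~1B-parameter)
# open model on domain-specific text. Model name, corpus and hyperparameters
# are assumptions, not details from Yang's papers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"          # assumed small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.train()

# Hypothetical domain corpus (e.g. manufacturing or biochemistry documents).
domain_texts = [
    "Process sheet: anneal the alloy at 450 C for two hours before rolling.",
    "The enzyme catalyses hydrolysis of the ester bond at physiological pH.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for text in domain_texts:                  # one pass over the toy corpus
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Standard causal-LM objective: the labels are the input ids themselves.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```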
“Specialised fields sometimes have been sidelined in AI development,” Yang said, citing challenges such as different data processing methods and limited access to extensive GPU resources. Her team is currently working on a cancer foundation model in collaboration with top hospitals in Zhejiang and Beijing.
MIT Technology Review’s 2025 list of breakthrough technologies highlights the increasing focus on small models in AI: “As the marginal gains for new high-end models trail off, researchers are figuring out how to do more with less. For certain tasks, smaller models that are trained on more focused data sets can now perform just as well as larger ones – if not better.”
“This approach also maximises the utility of less advanced, heterogeneous computing resources, allowing domestic chips to be more effectively used for small model training,” Yang said.
Yang’s team has pioneered a new machine learning paradigm called “model over models”, which integrates multiple domain-specific models into a single larger pivot model.

The team’s latest paper introduces InfiFusion, an efficient training pipeline focused on small models that outperforms leading models, including Alibaba’s Qwen-2.5-14B-Instruct and Microsoft’s Phi-4, on 11 widely used benchmarks spanning reasoning, coding, maths and instruction-following tasks. InfiFusion achieves these superior results with just 160 H800 GPU hours, a fraction of the millions typically required for traditional LLM training, according to the paper.
“When we have enough domain-specific models and resources, we expect to see emergent capabilities beyond data and test-time scaling,” Yang said. She likened the “model over models” paradigm to learning through “textbooks” (domain-specific models) rather than directly from data.
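One way to picture the “model over models” idea is a toy distillation loop in which several frozen domain-specific “teacher” models are fused into one pivot model by matching its output distribution to theirs. The sketch below is a hypothetical illustration that assumes a shared vocabulary and uses stand-in networks; it does not reproduce the actual InfiFusion procedure or loss.

```python
# Toy "model over models" illustration: distil several frozen domain-specific
# teachers into a single pivot model. Hypothetical sketch, not InfiFusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 1000, 64

def tiny_lm():
    # Stand-in for a language model: embeds tokens and predicts the next one.
    return nn.Sequential(nn.Embedding(VOCAB, HIDDEN), nn.Linear(HIDDEN, VOCAB))

# Frozen domain-specific source models (e.g. manufacturing, biochemistry).
teachers = [tiny_lm().eval() for _ in range(3)]
for t in teachers:
    t.requires_grad_(False)

pivot = tiny_lm()                            # the model being fused into
optimizer = torch.optim.AdamW(pivot.parameters(), lr=1e-3)

tokens = torch.randint(0, VOCAB, (8, 32))    # fake domain token batch

for _ in range(100):
    # Average the teachers' predictive distributions token by token.
    with torch.no_grad():
        teacher_probs = torch.stack(
            [F.softmax(t(tokens), dim=-1) for t in teachers]
        ).mean(dim=0)
    student_logp = F.log_softmax(pivot(tokens), dim=-1)
    # KL divergence pulls the pivot towards the combined teachers.
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```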
Regarding DeepSeek, Yang said the team behind the models that shook Wall Street last month had made significant breakthroughs in both the pre-training and post-training phases. These include 8-bit floating point (FP8) mixed-precision computing, which significantly improves computational efficiency and resource usage while maintaining model performance, as well as improved reinforcement learning techniques. AI models commonly use 32-bit or 16-bit precision.
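To make the precision point concrete, here is a minimal, hypothetical sketch assuming PyTorch 2.1 or later: a tensor of weights is scaled into the FP8 (E4M3) range, stored at one byte per value instead of four, and dequantized for use. It illustrates the storage and accuracy trade-off only, not DeepSeek’s actual mixed-precision training recipe.

```python
# Minimal FP8 sketch, assuming PyTorch 2.1+ float8 dtypes. Illustrates the
# precision trade-off only; not DeepSeek's mixed-precision training recipe.
import torch

def quantize_fp8(x: torch.Tensor):
    # Scale so the largest magnitude maps near the FP8 E4M3 max (~448).
    scale = x.abs().max() / 448.0
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)   # 1 byte per value
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) * scale

weights = torch.randn(1024, 1024)                  # pretend layer weights
w_fp8, scale = quantize_fp8(weights)
recovered = dequantize_fp8(w_fp8, scale)

print("bytes per value:", w_fp8.element_size())    # 1, vs 4 for float32
print("max abs error:", (weights - recovered).abs().max().item())
```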
Yang’s team plans to focus on low-bit pre-training in the future.
She also praised DeepSeek for offering greater transparency than many other model developers on the market, which gives industries across various sectors a clearer path to engaging with the AI ecosystem. DeepSeek announced last week that it would make five of its code repositories open source to accelerate development. – South China Morning Post