New Google research into DeepSeek and Alibaba Cloud’s artificial intelligence models has found that powerful reasoning models capable of “thinking” demonstrate internal cognition resembling the mechanisms underpinning human collective intelligence.
The findings, published on Thursday, suggested that perspective diversity, not just computational scale, was responsible for the increasing “intelligence” of AI models, while also underscoring the growing importance of Chinese open models for cutting-edge interdisciplinary research in the US.
Through experimentation with DeepSeek’s R1 and Alibaba Cloud’s QwQ-32B models, the researchers found that these reasoning models generated internal multi-agent debates, which they termed “societies of thought”, in which the interplay of distinct personality traits and domain expertise gave rise to greater capabilities.
“We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem-solving when systematically structured,” the researchers said in their paper published on open-access online repository arXiv.
Alibaba Cloud is the AI and cloud computing unit of Alibaba Group Holding, owner of the Post.

The study, which has not been peer-reviewed, was conducted by four researchers from Google’s “Paradigms of Intelligence” research team, which explores the nature of intelligence through interdisciplinary methods.
Junsol Kim, a PhD candidate in sociology at the University of Chicago, led the study, while Blaise Agüera y Arcas, a Google vice-president, was listed as the final author.
Reasoning models that “think” through tasks have become the dominant type of foundational AI system since ChatGPT developer OpenAI introduced its o-series of models in September 2024.
Such models, designed to “think” by using more computational resources during deployment, have contributed to a significant increase in AI capabilities while lowering the cost of intelligence, according to benchmarking firm Artificial Analysis.
The Google researchers based their findings on analysis of the Chinese models’ “reasoning traces” – the intermediate step-by-step outputs generated by reasoning models before their final response, which were first exposed to users when Hangzhou start-up DeepSeek released its first reasoning model R1 a year ago.
The models’ reasoning traces mimicked “simulated social interactions”, including questioning, perspective-taking and reconciliation, the researchers said. When the models were steered towards being more conversational with themselves, their reasoning accuracy improved.
These findings could shift how AI models are conceptualised, from “solitary problem-solving entities towards collective reasoning architectures, where intelligence arises not merely from scale but the structured interplay of distinct voices”, the researchers said.
Google is considered one of the world’s leading AI companies. Its latest flagship reasoning model, Gemini 3 Pro, developed by its AI research unit DeepMind, is one of the most powerful in the world, according to Artificial Analysis.
The study’s use of Chinese models as experimental subjects reflects the growing reliance on Chinese open-weight models in US academia, including at top institutions such as Stanford University. According to Chai Wenhao, a PhD candidate in computer science at Princeton University, classes at his university almost exclusively use Chinese models, as there are few open US models of comparable performance. – South China Morning Post
