No longer the domain of sci-fi or tech pranksters, translation glasses and earbuds are reality as Chinese firms fuse AI and AR hardware. — SCMP
How close are we to rebuilding the legendary Tower of Babel? China is rolling out a suite of AI tools that may soon make cross-language communication seamless.
In the Bible’s Book of Genesis, humanity came together to build a tower that would reach the heavens. To disrupt their ambitious project, God introduced one simple change: people would speak different languages.
Chaos engulfed the construction site, and the unfinished tower came to be known as Babel – a word derived from the Hebrew for “confusion”.
The metaphor is clear: language barriers hinder human collaboration.
Science fiction author Douglas Adams took the idea further in The Hitchhiker’s Guide to the Galaxy. He introduced the Babel Fish – a small, yellow fish that, when placed in the ear, could instantly translate any alien language, complete with accents and dialects. With it, interstellar diplomacy and even cross-species romance became possible.
Today, with the rapid advance of AI translation, the Babel Fish no longer seems far-fetched. Translation apps support most of the world’s languages, while large language models can generate real-time bilingual subtitles for videos.
“What we have now are cloud-based Babel Fish that rely on Internet connectivity. But translation models have matured enough to become deployable directly on smartphones, enabling real-time translation during calls,” said an artificial intelligence expert from Honor AI, who asked not to be named.
His team’s latest research was presented at Interspeech 2025, a top international conference on audio technology held in Rotterdam in August, and has been integrated into commercially available smartphones.
To fit the computational and power constraints of mobile devices, these on-device translation models have been miniaturised, but their performance remains uncompromised, according to the expert.
He said the system, while occupying just 800 megabytes of memory, recognises seven languages in real time – even in low-connectivity environments such as lifts and underground parking garages – and can operate offline for enhanced privacy, making it ideal for call translation.
“The speech model uses an autoregressive reasoning process, predicting the next word based on context. This allows it to perform streaming recognition with almost no delay – unlike traditional translators that require a full sentence before processing.”
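The contrast the expert draws can be illustrated with a toy sketch. The model and chunking below are hypothetical stand-ins, not Honor’s system; the point is only that a streaming recogniser emits each word as the corresponding audio arrives, while a sentence-level pipeline must buffer the whole utterance before producing anything.

```python
# Toy contrast between streaming and sentence-level recognition.
# `fake_step` is a hypothetical stand-in for one autoregressive step:
# it "predicts" the next word from the context plus the newest chunk.

def fake_step(context, chunk):
    return chunk.upper()  # placeholder for a real model's prediction

def streaming_decode(chunks):
    """Emit output chunk by chunk, as a streaming recogniser would."""
    context, out = [], []
    for chunk in chunks:          # each word is available immediately,
        word = fake_step(context, chunk)
        context.append(word)      # before later audio has even arrived
        out.append(word)
    return out

def sentence_decode(chunks):
    """Traditional pipeline: buffer the full sentence, then process."""
    full = list(chunks)           # nothing is emitted until the end
    return [fake_step(full[:i], c) for i, c in enumerate(full)]

words = ["hello", "how", "are", "you"]
assert streaming_decode(words) == sentence_decode(words)
```

Both paths produce the same transcript in this toy; the difference is when each word becomes available, which is what keeps the streaming system’s delay near zero.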
Using the analogy of building a block tower, he said that the team modified the model with a multi-head reasoning algorithm. If the original process was akin to building one-handed, the new method used both hands, stacking two blocks at a time.
This accelerated inference boosted response speed by 38% while increasing accuracy by 16%.
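The team’s exact algorithm is not public, but the “both hands, two blocks at a time” analogy matches the general idea of multi-token decoding: if each forward pass yields two tokens instead of one, the number of sequential passes is roughly halved. The sketch below is a minimal, hypothetical illustration of that step count only, not of the model itself.

```python
# Minimal sketch of one-token-per-step vs. two-tokens-per-step decoding.
# `model` is a trivial stand-in whose next "token" is just the position.

def generate_single(model, n):
    """One token per forward pass ('building one-handed')."""
    out, steps = [], 0
    while len(out) < n:
        out.append(model(out))
        steps += 1
    return out, steps

def generate_two_heads(model, n):
    """Two tokens per forward pass ('stacking two blocks at a time')."""
    out, steps = [], 0
    while len(out) < n:
        out.append(model(out))
        if len(out) < n:          # second head's token from the same pass
            out.append(model(out))
        steps += 1
    return out, steps

model = lambda ctx: len(ctx)
seq_a, steps_a = generate_single(model, 8)
seq_b, steps_b = generate_two_heads(model, 8)
assert seq_a == seq_b             # identical output,
assert steps_b == steps_a // 2    # half as many sequential passes
```

In practice the speedup is smaller than the ideal 2x – the article cites 38 per cent – because real multi-head predictions are not always accepted and each pass does more work.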
“While most current translation models are designed for call translation – filtering out filler words and casual repetitions – they can also be fine-tuned to learn specialised terminology, idioms, slang and even perform translations in legal or medical fields,” he said.
According to the expert, the rise of wearable tech was accelerating the trend. Beyond smartphones and portable translators, simultaneous interpretation earbuds and smart glasses are already entering the market.
“Such IoT [Internet of Things] devices can capture subtle visual cues – gestures, expressions and other multimodal signals – that are crucial to communication. When combined with emotional reasoning in large models, they enable seamless, natural exchanges without the need to glance at a screen,” he said.
Viral videos show the human impact of AI-enhanced translation devices. In one social media post by Chinese-American firm iTourTranslator, a Chinese tourist walks into a US store wearing smart glasses that instantaneously project bilingual subtitles of the clerk’s speech.
But devices that once astonished both parties are now being mass-produced.
At the Beijing Culture Forum on September 23, attendees – including a former French prime minister – tried out a pair of domestically developed augmented reality (AR) translation glasses.
Weighing just 49 grams (1.7 ounces) and produced by Beijing-based LLVISION, the glasses purportedly translate more than 100 languages with a delay of under 500 milliseconds, displaying subtitles directly in the wearer’s field of vision.
Earlier, at the International Association of Science Parks and Innovation Areas (IASP) 2025 world conference on September 17, the same glasses provided real-time translation for 800 participants from 97 countries and regions.
As a fusion of Chinese AI and AR hardware, the product represents a successful example of a Chinese home-grown tech firm exporting innovation abroad, according to the Guangming Daily.
Once AI can accurately convey not only meaning but also emotion and cultural nuance, humanity may, in a sense, speak one language again. – South China Morning Post
