AI chatbots are still struggling to reliably develop software



NEW YORK: It's long been assumed that low-level software developers are set to be in the first wave of professions to be made redundant by artificial intelligence.

But three years since the advent of AI in the workplace, it appears that AI coding tools are still finding even basic software development to be a struggle.

Researchers at Canada's University of Waterloo are questioning "how reliably AI systems can assist developers" following recent research into the matter.

The latest advances have seen big players such as Anthropic, Google and OpenAI introduce what some call "structured outputs": fine-tuning that compels chatbots to follow established formats such as JSON and XML.

But the coding skill of AI "is not as reliable as many developers had hoped," according to the Waterloo team, which found that even the most advanced models managed an accuracy of 75% at best.

In other words, when AI is asked to help with coding, it gets something wrong around one in every four times in a best-case scenario.

The team looked at how 11 chatbots or large language models (LLMs) fared when assigned 44 types of task across 18 structured output formats.
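
To give a rough sense of what checking a structured output can involve, the short Python sketch below tests whether a chatbot's reply is well-formed JSON or XML and works out the share of replies that pass. It is an illustration only, not the Waterloo team's method: the function names and the sample replies are invented for this example, and the resulting figure is not data from the study.

```python
import json
import xml.etree.ElementTree as ET


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def is_valid_xml(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def pass_rate(replies, check) -> float:
    """Share of replies that satisfy the given check (0.0 for an empty list)."""
    if not replies:
        return 0.0
    return sum(1 for reply in replies if check(reply)) / len(replies)


# Invented sample replies (an assumption for illustration, not study data).
# The second reply is missing its closing brace, so it fails to parse.
sample_replies = [
    '{"name": "Ada", "age": 36}',
    '{"name": "Ada", "age": 36',
    '[1, 2, 3]',
    '{"ok": true}',
]

rate = pass_rate(sample_replies, is_valid_json)
print(f"Well-formed JSON replies: {rate:.0%}")
```

A check like this only confirms that a reply can be parsed. Whether the content is actually correct is a separate and harder question.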

"We found that while they do okay with text-related tasks, they really struggle on tasks involving image, video or website generation," said Waterloo's Dongfu Jiang.

The findings lend weight to the view that AI still has some way to go before it can code reliably or begin to make software developers redundant.

The chatbots "are not yet reliable enough to operate without human oversight," according to the Waterloo researchers, whose work was published in the journal Transactions on Machine Learning Research.

According to Morgan Stanley, AI-powered coding will create "new opportunities for developers and software companies alike," potentially meaning more jobs in the sector.

Published in November 2025, the investment bank’s survey of hirers in the industry came after the California-based Model Evaluation and Threat Research found coders to be around 20% slower at getting through tasks when they use AI. – dpa
