AI chatbots are still struggling to reliably develop software



NEW YORK: It's long been assumed that low-level software developers are set to be in the first wave of professions to be made redundant by artificial intelligence.

But three years after the arrival of AI chatbots in the workplace, it appears that AI coding tools still struggle with even basic software development.

Researchers at Canada's University of Waterloo are questioning "how reliably AI systems can assist developers" following recent research into the matter.

The latest advances have seen big players such as Anthropic, Google and OpenAI introduce what some call "structured outputs" – fine-tuned features that compel chatbots to follow established data formats such as JSON and XML.
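To give a sense of what "following a format" means in practice, the short Python sketch below checks whether a piece of text parses as JSON or as XML. It is only an illustration using Python's standard library: the function names and sample replies are hypothetical, and it is not the method the Waterloo researchers used.

```python
import json
import xml.etree.ElementTree as ET


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def is_valid_xml(text: str) -> bool:
    """Return True if the text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Made-up chatbot replies, for illustration only.
sample_replies = [
    '{"name": "Ada", "age": 36}',   # well-formed JSON
    "{'name': 'Ada', 'age': 36,}",  # single quotes and trailing comma: not JSON
    "<user><name>Ada</name></user>",  # well-formed XML
    "<user><name>Ada</user>",         # mismatched tags: not XML
]

for reply in sample_replies:
    print(reply, "| JSON:", is_valid_json(reply), "| XML:", is_valid_xml(reply))
```

A check like this only confirms that the output is syntactically well formed. It says nothing about whether the content is correct, which is a separate and harder question.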

But the coding skill of AI "is not as reliable as many developers had hoped," according to the Waterloo team, which found that even the most advanced models managed no better than about 75% accuracy.

In other words, when AI is asked to help with coding, it gets something wrong around one in every four times in a best-case scenario.

The team looked at how 11 chatbots, or large language models (LLMs), fared when assigned 44 tasks across 18 structured output formats.

"We found that while they do okay with text-related tasks, they really struggle on tasks involving image, video or website generation," said Waterloo's Dongfu Jiang.

The findings lend weight to the view that AI still has some way to go before it can code reliably or begin to make software developers redundant.

The chatbots "are not yet reliable enough to operate without human oversight," according to the Waterloo researchers, whose work was published in the journal Transactions on Machine Learning Research.

According to Morgan Stanley, AI-powered coding will create "new opportunities for developers and software companies alike," potentially meaning more jobs in the sector.

Published in November 2025, the investment bank’s survey of hirers in the industry came after the California-based Model Evaluation and Threat Research found coders to be around 20% slower at getting through tasks when they use AI. – dpa
