AI chatbots are still struggling to reliably develop software


NEW YORK: It's long been assumed that low-level software developers are set to be in the first wave of professions to be made redundant by artificial intelligence.

But three years since the advent of AI in the workplace, it appears that AI coding tools are still finding even basic software development to be a struggle.

Researchers at Canada's University of Waterloo are questioning "how reliably AI systems can assist developers" following recent research into the matter.

The latest advances have seen big players such as Anthropic, Google and OpenAI introduce what some call "structured outputs" – refinements that compel chatbots to produce output in established data formats such as JSON and XML.
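To make the idea concrete, here is a minimal Python sketch of the most basic check a developer might run on a chatbot's reply: does it even parse as JSON or XML? The sample replies and the helper names `is_valid_json` and `is_valid_xml` are invented for illustration and are not taken from the Waterloo study, which may well have used different and stricter checks.

```python
import json
import xml.etree.ElementTree as ET


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def is_valid_xml(text: str) -> bool:
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical chatbot replies, made up for this example.
good_json = '{"name": "widget", "price": 9.99}'
bad_json = '{"name": "widget", "price": 9.99'  # closing brace is missing
good_xml = "<item><name>widget</name></item>"
bad_xml = "<item><name>widget</item>"  # <name> is never closed

for label, ok in [
    ("good_json", is_valid_json(good_json)),
    ("bad_json", is_valid_json(bad_json)),
    ("good_xml", is_valid_xml(good_xml)),
    ("bad_xml", is_valid_xml(bad_xml)),
]:
    print(label, "parses" if ok else "does not parse")
```

A reply that passes this kind of check is only well-formed; it can still contain the wrong fields or the wrong values, which is one reason human review remains necessary.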

But the coding skill of AI "is not as reliable as many developers had hoped," according to the Waterloo team, which found that accuracy of around 75% was the best the most advanced models could manage.

In other words, when AI is asked to help with coding, it gets something wrong around one in every four times in a best-case scenario.

The team looked at how 11 chatbots, or large language models (LLMs), fared when assigned 44 types of task across 18 structured output formats.

"We found that while they do okay with text-related tasks, they really struggle on tasks involving image, video or website generation," said Waterloo's Dongfu Jiang.

The findings lend weight to the view that AI still has some way to go before it can code reliably or begin to make software developers redundant.

The chatbots "are not yet reliable enough to operate without human oversight," according to the Waterloo researchers, whose work was published in the journal Transactions on Machine Learning Research.

According to Morgan Stanley, AI-powered coding will create "new opportunities for developers and software companies alike," potentially meaning more jobs in the sector.

The investment bank's survey of hirers in the industry, published in November 2025, came after the California-based research group Model Evaluation and Threat Research (METR) found coders to be around 20% slower at getting through tasks when they used AI. – dpa
