What are AI agents actually doing?


SAN FRANCISCO: When OpenAI unveiled ChatGPT at the end of 2022, it kicked off the chatbot boom. Then last year, new systems from OpenAI and Anthropic set off a new technological push with so-called AI agents, which can act like personal digital helpers and perform tasks on a user's behalf.

Now, a San Francisco startup called Arena, which tracks hundreds of thousands of artificial intelligence users, is trying to take some of the mystery out of what exactly those digital tasks are.

The company’s service, Agent Mode, showed that over the past few weeks, people used agents for code-writing tasks about 17% of the time. Roughly 10% of the time, the company said, people used agents to do research.

Research agents were closely followed by agents that create images, generate documents like graphs and spreadsheets, or brainstorm ideas. About 5% of the time, users applied agents to creative writing or to tutoring and education. Other areas included code debugging, which is closely tied to building software, and chatting.

Systems from OpenAI, Anthropic and other companies can generate, test and edit computer code, letting experienced programmers automate many tasks they once performed on their own. Agents can also spend minutes or even days researching specific topics via the wider internet, including finance, healthcare, the law and practically anything else.

Some of these tasks overlap with what a chatbot can do. But the main difference with an agent is that it can use other software apps on behalf of users, including spreadsheets, calendars and email programs.

“An agent can access the internet, search the web, create files and even access other AI models to complete its work,” said Arena’s CEO, Anastasios Angelopoulos, a co-founder of the startup.

In Silicon Valley, some people treat these bots almost as employees they can delegate work to at any time of day. Many AI researchers, tech executives and pundits believe that agents could soon replace white-collar office workers.

In February, Block, the financial technology company that owns Square, Cash App and Tidal, said it was cutting 40% of its workforce as it anticipated the rise of this kind of technology. This was perhaps the most striking example of a company’s eliminating employees because of what AI may soon do.

The rub is that this digital employee can handle only some tasks – and sometimes, it is less than reliable. Like chatbots, AI agents can make mistakes and exhibit completely unexpected behaviour.

These mistakes can get particularly dicey when people use agents to send emails, texts and other instant messages. For that reason, Arena does not allow the people it tracks to connect their agents to email programs and messaging apps. (The company is selling its data and analysis of that data.)

The company also requires people to run agents inside a digital "sandbox," which keeps agents from doing serious harm on people's computers. Outside a sandbox, agents can accidentally delete files and software apps.

But the company’s service gives an indication of how often agents get things wrong. About 8% of the time, agents said they had completed a task when they hadn’t, Arena said. Because many tasks build on one another, the company added, this kind of agent “bluffing” or “blustering” can compound and create greater errors.

“The models will just say, ‘Yeah, I did this.’ But they lied, and they didn’t do it,” Angelopoulos said. “They might say they created a file, and then it’s not there.”

Arena also compares the technologies offered by OpenAI, Anthropic and other companies. The most effective agents are driven by OpenAI’s GPT-5.5 High technology, according to Arena’s data.

The next most effective technology was Anthropic’s Claude Opus 4.7 Thinking. These technologies, Arena said, were significantly more effective than those from Google, the leading Chinese companies and Elon Musk’s xAI. – ©2026 The New York Times Company

This article originally appeared in The New York Times.