The role of AI agents in the workplace: Expectations vs reality


Artificial intelligence agents have generated much buzz in the tech industry, but do they live up to the hype? — Image by DC Studio on Magnific

The year is 2026, and a fleet of virtual helpers touting the ability to manage the entirety of a person’s digital life is starting to sound less like the far-fetched musings of a distant future and more like something that could be just around the corner.

A report by Wired in April highlighted developers working on artificial intelligence (AI) agents meant to socialise with other AI agents on a user’s behalf. The digital avatars were being trained using a person’s publicly available online data, alongside any additional information the person chooses to provide.

The end goal of these agents is to become capable of identifying compatible people before handing things off to human counterparts for introductions, matching users with potential friends, professional contacts, or even romantic partners.

Meanwhile, consulting firms such as McKinsey & Company and Accenture are anticipating the technology’s transformative potential in the finance and banking sectors in the near future.

All of this makes the widespread use and utility of AI agents seem like an inexorable part of the future. However, some of the more advanced users in the AI enthusiast space have already found themselves knee-deep in the technology, getting a taste of what it is capable of and where it still falls short for the time being.

According to Daren Tan, founder of the DeveloperKaki IT community and CEO of IT firm Alphv Technologies, AI agents have taken up a sort of complementary role in the workplace, putting humans in charge of checking the AI agents’ output rather than personally writing things up from scratch.

“The biggest change is that a lot of work that used to be a blank page is now an editing job. On the coding side, I’m describing what I want and reviewing a draft rather than typing every line, which has genuinely compressed how long it takes to ship a feature,” he says.

Daren notes that his team spends less time producing first drafts and more time on judgement. — DAREN TAN

Where the technology has been deployed for coding, Tan says tasks that previously took one or two days can now be wrapped up in a matter of hours.

Shee Tze Jin, an AI and machine learning specialist at Taylor’s University, shares a similar experience with AI agents, saying that they have played a major role in his work, not just saving time but also enabling him to monitor and process far more information than he would normally be able to handle alone.

He has deployed a mix of cloud-based (OpenAI Codex and Claude Code) and locally hosted (OpenClaw with IlmuAI) models based on his needs.

“Rather than relying on a single AI system, I use different AI agents for different purposes. Cloud-based SOTA (state-of-the-art) models excel at research, information gathering, and large-scale automation, while locally hosted ones provide greater privacy, control, and trust when working with sensitive data,” he says.

Tan believes that AI agents have even more to offer in roles involving the processing of high-volume, repetitive documents. Documents such as a loan offer summary or claims report may take upwards of a week to be prepared by a human, but can be drafted by an AI agent in just a little over ten minutes, he says.

“That doesn’t mean the whole task is done in a minute. A human still checks it, but the heavy lifting of reading, extracting, and structuring is done.

“The workflow shift is that my team spends less time producing first drafts and more time on judgement, such as reviewing, catching edge cases, and handling the unusual files the agent isn’t confident about,” he says.

Brand-new bugbears

While there are significant efficiency gains from deploying AI agents, Tan points out that the work itself does not vanish. Instead, it shifts to setting up the AI agent and ensuring that its output is accurate.

“Up front, there’s prompt and pipeline engineering – getting the agent to behave consistently across thousands of slightly different documents takes real effort, and you only do that well by iterating.

“On the back end, there’s review and correction, which in our domains is non-negotiable. A wrong figure on a loan offer summary or a misread genetic result isn’t a typo; it’s a serious problem,” he says.

Tan also stresses that the technology itself isn’t foolproof, saying that “the classic case is the confident-but-wrong output”. Such incidents involve an AI agent producing an output that appears convincing and accurate but is wrong.

“We’ve had cases where an agent misinterpreted an unusual document layout or ‘filled in’ a detail that wasn’t in the source, and tracing that back and fixing it took longer than if we’d just done that one document by hand.

“On the coding side, I’ve had agents send me down the wrong path on a tricky bug, plausible fixes that didn’t address the real cause, and I lost time before going back to first principles.

“The lesson we’ve taken from this is to design the system so the agent flags low-confidence cases for a human rather than guessing, and to never use agents as a single point of failure on anything that matters,” he says.
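The flagging approach Tan describes can be pictured as a simple triage step. The following is a minimal sketch, not his team's actual system: it assumes each draft arrives with a confidence score between 0 and 1, and the names `Draft` and `triage` and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    doc_id: str
    summary: str
    confidence: float  # assumed 0-1 score; how it is produced is out of scope here


def triage(drafts: list[Draft], threshold: float = 0.85) -> tuple[list[Draft], list[Draft]]:
    """Split drafts into (ready_for_review, flagged_for_human)."""
    ready, flagged = [], []
    for draft in drafts:
        if draft.confidence >= threshold:
            ready.append(draft)
        else:
            # Below the threshold the draft is flagged rather than trusted.
            flagged.append(draft)
    return ready, flagged


drafts = [
    Draft("loan-001", "Standard offer summary", 0.97),
    Draft("loan-002", "Unusual layout, partial read", 0.42),
]
ready, flagged = triage(drafts)
print([d.doc_id for d in flagged])  # prints ['loan-002']
```

In this sketch both lists still end up in front of a person. The split only decides which documents get the closest attention first.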

While there is a need for oversight, Tan believes that in cases where there is a high volume of structured tasks, such as preparing documents, deploying an AI agent to handle them would still be faster than producing them manually from scratch.

Shee says that earlier generations of AI models, such as GPT-4o and early Gemini models, had a much higher risk of hallucinations. — SHEE TZE JIN

On the other hand, Shee believes that these concerns are becoming less relevant as AI models continue to improve.

“With earlier generations of AI models, such as GPT-4o and early Gemini models, there was a much higher risk of hallucinations. The AI could confidently provide an answer that sounded completely believable but was actually wrong.

“Because of this, developers often had to build additional verification layers into their workflows. Today, these workflows are commonly referred to as ‘harnesses’ – systems that validate, cross-check, and filter AI-generated outputs before presenting a final answer to the user.

“In the early days, I spent a significant amount of time designing these harnesses. The AI might need to check multiple sources, review its own work, or even use another model to verify the result before I could trust the output,” he says, adding that understanding what each AI system is capable of is key.
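To give a rough idea of what such a harness might look like in code, here is a minimal sketch under stated assumptions. It is not Shee's implementation. It assumes the agent returns extracted fields as a dictionary, and it runs two hypothetical checks, `no_empty_values` and `grounded_in_source`, before the output is accepted. The second check looks for each value verbatim in the source text, which is one simple way to catch a detail that was 'filled in'.

```python
from typing import Callable

# A check takes the extracted fields and the source text and returns a list of problems.
Check = Callable[[dict[str, str], str], list[str]]


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def no_empty_values(fields: dict[str, str], source_text: str) -> list[str]:
    return [f"{name}: empty value" for name, value in fields.items() if not value.strip()]


def grounded_in_source(fields: dict[str, str], source_text: str) -> list[str]:
    source = _normalise(source_text)
    return [
        f"{name}: value not found in source"
        for name, value in fields.items()
        if value.strip() and _normalise(value) not in source
    ]


def run_harness(fields: dict[str, str], source_text: str, checks: list[Check]) -> list[str]:
    """Run every check and collect the problems; an empty list means all checks passed."""
    problems: list[str] = []
    for check in checks:
        problems.extend(check(fields, source_text))
    return problems


source = "Loan offer to Acme Sdn Bhd for RM 250,000."
extracted = {"borrower": "Acme Sdn Bhd", "amount": "RM 250,000", "tenure": "10 years"}
print(run_harness(extracted, source, [no_empty_values, grounded_in_source]))
# prints ['tenure: value not found in source']
```

A verbatim match is deliberately strict and will flag legitimate paraphrases too. Checking against several sources or using a second model, as Shee mentions, would slot in as further checks in the same list.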

The way Lynnette Tee sees things, it all comes down to how complex the tasks given to an AI agent are in the first place. Tee is a product lead at an AI startup and founder of the Coderpuffs IT community group, who deploys a mix of AI models under Claude Code and Cursor in her workflow.

“The more complex a task, the more time and effort you must put into making your prompt as airtight as possible to minimise the faultiness of the output. I would argue that with or without AI, it has always been imperative to define the scope of a task clearly before executing it.

“Let’s take the example of building a new website. Before AI, you’d have to make every single decision – the layout, the messaging, the colours and fonts – by hand. With AI, if you don’t clearly define them upfront, it will fill in the gaps for you, which you’d have to then tweak later anyway,” she says.

Tee also highlights that some AI agents, like Claude Code or Codex, even have planning modes built in, where they will obtain clarification on the job at hand to output a more refined product.

In her workflow, she uses her AI agents to perform early groundwork, such as designing prototypes and researching autonomously, though areas that affect the product’s direction are reviewed manually.

Lynnette believes that the more complex a task, the more time and effort one must put into making their prompt as airtight as possible to minimise the faultiness of the output. — AZLINA ABDULLAH/The Star

This has largely freed her from the more hands-on aspect of coding, placing her more in a strategic role that looks at planning out the big picture, aside from some minor edits and tweaks.

Shee further says that the latest generation of state-of-the-art AI agents has become significantly more sophisticated, with the ability to reason for longer, break down complex tasks into smaller steps, perform self-verification, and identify mistakes before presenting a result.

“That said, I do not believe AI should operate completely without oversight. My view is that humans should be on the loop rather than in the loop. The goal is not for a human to review every single step the AI takes, but rather to define the objectives, monitor the outcomes, and intervene when necessary.

“As AI systems become more capable, humans spend less time supervising the process and more time evaluating the final results and making decisions.

“In my experience, the amount of manual oversight required has decreased significantly over the last six months. The role of humans is gradually shifting from checking every action to providing strategic direction and accountability,” he says.

Tan shares similar thoughts, saying that with these limitations in mind, he is comfortable with trusting AI agents to handle repetitive tasks that follow a standard structure, where errors are easy to spot and fix.

These tasks should also be low-stakes at each step, such as extracting specific information from a document, drafting a standardised section of a document or code, summarising, and updating formatting.

“Where there is a person’s health, money, or legal position involved, a human reviews and signs off before anything goes out every time. A DNA health report, an insurance claim outcome, a loan offer summary that a law firm will rely on: the agent drafts, but a qualified person checks and approves.

“We treat the agent as a very fast junior who produces the draft, never as the final authority. In regulated areas like these, I’d argue that human accountability isn’t just good practice; it’s essential that a client needs to know a person stands behind the document,” he says.

Counting costs

When it comes to the actual costs involved with deploying AI agents, Tan breaks it down into a few main categories. “There’s the usage cost of API (application programming interface) fees for the cloud models, or the hardware and hosting for the locally run ones, which is a larger fixed cost but keeps sensitive data in-house.

“Then there’s the engineering cost, which people underestimate, including building the pipeline, testing it against real documents, and maintaining it as client requirements change, and there’s the ongoing human review cost, which is a deliberate part of our model,” he says.

Tan further adds that the hardware costs incurred in setting things up are usually one-off and are paid off over time as the system continues to be used.

“Where the maths gets worse is on one-off or highly unusual tasks. If something only happens a handful of times, the time spent prompting, checking, and correcting can eat up the whole savings. The efficiency comes from volume and repetition, not from any single use,” he says.
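Tan's point about volume can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a one-off setup cost plus a per-document review cost, both measured in minutes. The function names and every figure are illustrative and do not come from Tan.

```python
import math
from typing import Optional


def net_minutes_saved(setup: float, manual_per_doc: float,
                      review_per_doc: float, documents: int) -> float:
    """Time saved versus doing every document by hand, after paying the setup cost."""
    return documents * (manual_per_doc - review_per_doc) - setup


def break_even_documents(setup: float, manual_per_doc: float,
                         review_per_doc: float) -> Optional[int]:
    """Smallest document count at which the setup effort has been recovered."""
    saving = manual_per_doc - review_per_doc
    if saving <= 0:
        return None  # reviewing costs as much as doing it by hand: never pays off
    return math.ceil(setup / saving)


# Illustrative numbers only: 600 minutes of prompting and testing,
# 60 minutes per document by hand, 15 minutes to review an agent's draft.
print(net_minutes_saved(600, 60, 15, documents=5))    # prints -375
print(net_minutes_saved(600, 60, 15, documents=200))  # prints 8400
print(break_even_documents(600, 60, 15))              # prints 14
```

With these made-up figures, a task done five times costs more time than it saves, while the same setup spread over 200 documents pays for itself many times over.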

Cloud models typically require tokens – units of data that an AI model processes – in order to perform tasks. While many cloud AI providers give users a set number of free tokens to use, additional usage typically has to be purchased, with costs dependent on how many tokens are consumed.

According to Shee, usage can be easy to lose track of, especially when it comes to AI agents deployed through APIs, compared with a typical chatbot or large language model.

“AI agent workflows can consume significantly more tokens than a typical chatbot interaction because the agent may perform multiple rounds of reasoning, web searches, tool usage, and self-verification before producing a result.

“Some advanced agents can easily consume millions of tokens during a single task if they are not configured properly. As a result, users can accumulate unexpectedly large bills very quickly without realising it,” he says, adding that those just starting out can try subscription-based services, as costs are usually more predictable and easier to manage.

When it comes to deploying a setup involving an API-based AI agent, both Shee and Tan advise setting hard spending caps and budget alerts at the account level to prevent surprises.

They also advise tracking usage closely instead of waiting for the monthly bill, and monitoring how many tokens are spent on specific documents or tasks to assess whether a workflow is cost-effective or needs further optimisation.
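As a rough illustration of per-task tracking, the sketch below keeps a running token count for each task and compares the estimated spend against an alert level and a hard cap. It is a local bookkeeping aid under stated assumptions, not a substitute for the account-level caps that Shee and Tan recommend. The class name `UsageTracker`, the flat price per million tokens and all figures are hypothetical, and real providers typically charge different rates for input and output tokens.

```python
from collections import defaultdict


class BudgetExceeded(RuntimeError):
    """Raised when the estimated spend reaches the hard cap."""


class UsageTracker:
    def __init__(self, price_per_million: float, hard_cap: float, alert_at: float = 0.8):
        self.price_per_million = price_per_million  # assumed flat rate, for illustration
        self.hard_cap = hard_cap                    # stop once spend reaches this amount
        self.alert_at = alert_at                    # warn at this fraction of the cap
        self.tokens_by_task: dict[str, int] = defaultdict(int)

    def cost(self, tokens: int) -> float:
        return tokens / 1_000_000 * self.price_per_million

    @property
    def total_cost(self) -> float:
        return self.cost(sum(self.tokens_by_task.values()))

    def record(self, task: str, tokens: int) -> str:
        """Log tokens for a task; return 'ok' or 'alert', or raise at the cap."""
        self.tokens_by_task[task] += tokens
        if self.total_cost >= self.hard_cap:
            raise BudgetExceeded(f"spend {self.total_cost:.2f} reached cap {self.hard_cap:.2f}")
        if self.total_cost >= self.alert_at * self.hard_cap:
            return "alert"
        return "ok"


tracker = UsageTracker(price_per_million=2.0, hard_cap=10.0)
print(tracker.record("loan_summary", 1_000_000))   # prints ok
print(tracker.record("claims_report", 3_500_000))  # prints alert
print(dict(tracker.tokens_by_task))
```

Because the tokens are logged against a task name, the same record shows which workflows are eating the budget and may need further optimisation.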

