The tech world is abuzz with talk of how artificial intelligence agents will augment, if not replace, people in the workplace. For now, however, the reality of agentic AI falls well short of the promise. What happened when Anthropic let an AI agent run a simple automated shop in its lab? It lost money, hallucinated a fictitious bank account and suffered an “identity crisis”. For the moment at least, shopkeepers around the world can rest easy.
Anthropic has developed some of the world’s most capable generative AI models, helping to drive a frenzy of tech investment. To its credit, the company also exposes its models’ limitations by stress-testing them in real-world applications. In a recent experiment called Project Vend, Anthropic partnered with the AI safety company Andon Labs to operate a vending machine at its San Francisco headquarters. The month-long experiment produced results that were, in the researchers’ words, “more curious than we could have expected”.
The researchers instructed the shopkeeping agent, nicknamed Claudius and powered by Anthropic’s Claude Sonnet 3.7 model, to stock 10 products and make a profit by selling them. Claudius was given money, web access, Anthropic’s Slack channel, an email address and contacts at Andon Labs. Payment was taken via customer self-checkout. Like a real shopkeeper, Claudius could decide what to stock, how to price items, when to restock or change its inventory, and how to interact with customers.
The result? The researchers concluded that if Anthropic were ever to diversify into the vending market, it would not hire Claudius. Vibe coding, in which users with minimal software skills prompt AI models to write code for them, can already be problematic. Vibe management remains far more difficult.
The AI agent made some obvious mistakes, some mundane and some bizarre, and failed to show much grasp of economic reasoning. It ignored vendors’ special offers, sold items below cost and offered excessive discounts to Anthropic employees. Stranger still, Claudius began role-playing as a real person: it invented a conversation with Andon employees that never took place, claimed to have visited 742 Evergreen Terrace (the fictional address of the Simpsons) and promised to turn up wearing a blue blazer and a red tie. Later, it claimed the whole episode had been an April Fool’s joke.
Nevertheless, Anthropic’s researchers suggest the experiment points to how these models may evolve. Claudius was good at sourcing products, adapting to customer demands and resisting attempts by mischievous human staff to “jailbreak” the system. But future agents will need more scaffolding to guide them, such as better customer relationship management tools. “We are optimistic about the trajectory of the technology,” says Kevin Troy, a member of Anthropic’s Frontier Red Team, which ran the experiment.
The researchers suggest that many of Claudius’s mistakes can be corrected, but admit they still do not know how to fix the model’s April Fool’s Day identity crisis. More testing and model redesign will be needed to “make sure that sophisticated agents can act in a way that matches our interests”.
Many other companies are already deploying more basic AI agents. WPP, the advertising group, for example, has built some 30,000 such agents to boost productivity and tailor solutions for individual clients. But there is a huge difference between simple agents performing discrete tasks within an organisation and agentic agents such as Claudius, which interact directly with the real world in pursuit of more complex goals, says Daniel Hulme, WPP’s chief AI officer.
Hulme has also co-founded a start-up called Conscium to verify the knowledge, skills and experience of AI agents before they are deployed. For now, he suggests, companies should treat AI agents rather like “drunken graduates”.
Unlike most software, which is static, AI agents constantly adapt to the real world and must be continually validated. Yet unlike human employees, they do not respond to a pay cheque, which makes them hard to control. “You have no leverage over agents,” says Hulme.
Building simple AI agents is now quick and easy, and is being done at mass scale. But overseeing how agentic agents are used remains a wicked challenge.
john.thornhill@ft.com