The Dawning Age of Agents: LLM-based agents that act autonomously are making rapid progress. Here's what we have to look forward to.

Dear friends,

Progress on LLM-based agents that can autonomously plan out and execute sequences of actions has been rapid, and I continue to see month-over-month improvements. Many projects take a task like “write a report on topic X” and autonomously carry out actions such as browsing the web to gather information and then synthesizing a report.

AI agents can be designed to take many different types of actions. Research agents (like many projects built on AutoGPT, GPT Researcher, or STORM) search the web and fetch web pages. A sales representative agent might dispatch a product to a user. An industrial automation agent might control a robot.

So far, I see agents that browse the web progressing much faster because the cost of experimentation is low, and this is key to rapid technical progress. It’s cheap to fetch a webpage, and if your agent chooses poorly and reads the wrong page, there’s little harm done. In comparison, sending a product or moving a physical robot are costly actions, which makes it hard to experiment rapidly. Similarly, agents that generate code (that you can run in a sandbox environment) are relatively cheap to run, leading to rapid experimentation and progress. 

Although today’s research agents, whose tasks are mainly to gather and synthesize information, are still in an early phase of development, I expect to see rapid improvements. ChatGPT, Bing Chat, and Gemini can already browse the web, but their online research tends to be limited; this helps them get back to users quickly. But I look forward to the next generation of agents that can spend minutes or perhaps hours doing deep research before getting back to you with an output. Such algorithms will be able to generate much better answers than models that fetch only one or two pages before returning an answer.
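
To make the idea concrete, here is a minimal sketch (mine, not taken from any of the projects above) of a research-agent loop in Python. It assumes a hypothetical `call_llm(prompt)` helper that wraps whatever LLM API you use; the only real dependency is the `requests` library for fetching pages.

```python
import requests

def call_llm(prompt: str) -> str:
    """Hypothetical helper: wrap your preferred LLM API here."""
    raise NotImplementedError

def research_agent(task: str, candidate_urls: list[str], max_pages: int = 5) -> str:
    """Repeatedly let the LLM pick a page to read, then synthesize a report."""
    notes = []
    urls = list(candidate_urls)
    for _ in range(max_pages):
        # Fetching a web page is a cheap, low-risk action, so the agent can afford to explore.
        choice = call_llm(
            f"Task: {task}\nNotes so far:\n" + "\n".join(notes) +
            f"\nPick ONE URL to read next from this list: {urls}\nReply with the URL only."
        ).strip()
        if choice not in urls:
            break
        page_text = requests.get(choice, timeout=10).text[:8000]  # truncate to fit the prompt
        notes.append(call_llm(f"Summarize what is relevant to '{task}' in this page:\n{page_text}"))
        urls.remove(choice)
    return call_llm(f"Write a report on '{task}' using these notes:\n" + "\n".join(notes))
```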

Even when experimentation is quick, evaluation remains a bottleneck in development. If you can try out 10 algorithm variations quickly, how do you actually pick among them? Using an LLM to evaluate another LLM's output is common practice, but prompting an LLM to give very accurate and consistent evaluations of text output is a challenge. Any breakthroughs here will accelerate progress!
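
As a rough illustration of LLM-as-judge evaluation, here is a small sketch that reuses the hypothetical `call_llm` helper from the earlier snippet. In practice you would also swap the order of the candidates and average over several runs, since judges are noisy and position-biased.

```python
import json

def judge(task: str, output_a: str, output_b: str) -> str:
    """Ask an LLM to compare two candidate outputs and return 'A' or 'B'."""
    prompt = (
        f"Task: {task}\n\nOutput A:\n{output_a}\n\nOutput B:\n{output_b}\n\n"
        "Which output better completes the task? "
        'Reply with JSON only, e.g. {"winner": "A", "reason": "..."}'
    )
    verdict = json.loads(call_llm(prompt))
    return verdict["winner"]
```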

An exciting trend has been a move toward multi-agent systems. What if, instead of having only a single agent, we have one agent to do research and gather information, a second agent to analyze the research, and a third to write the final report? Each of these agents can be built on the same LLM using a different prompt that causes it to play a particular, assigned role. Another common design pattern is to have one agent write and a second agent work as a critic to give constructive feedback to the first agent to help it improve. This can result in much higher-quality output. Open-source frameworks like Microsoft’s AutoGen, Crew AI, and LangGraph are making it easier for developers to program multiple agents that collaborate to get a task done.
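
Here is a bare-bones sketch of the writer-critic pattern using plain prompts and the same hypothetical `call_llm` helper; the frameworks above package this kind of loop (plus message passing, tools, and state) much more robustly.

```python
def writer_critic(task: str, rounds: int = 2) -> str:
    """One LLM plays two roles via different prompts: a writer and a critic."""
    draft = call_llm(f"You are a writer. Draft a report for this task:\n{task}")
    for _ in range(rounds):
        critique = call_llm(
            "You are a critic. Give specific, constructive feedback on this draft "
            f"for the task '{task}':\n{draft}"
        )
        draft = call_llm(
            "You are a writer. Revise the draft below to address the feedback.\n"
            f"Draft:\n{draft}\n\nFeedback:\n{critique}"
        )
    return draft
```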

I’ve been playing with many agent systems myself, and I think they are a promising approach to architecting intelligent systems. A lot of progress has been made by scaling up LLMs, and this progress no doubt will continue. But big ideas are sometimes made up of many, many little ideas. (For example, you might arrive at an important mathematical theorem via lots of little derivation steps.) Today’s LLMs can reason and have lots of “little ideas” in the sense that they take in information and make basic inferences. Chain-of-thought prompting shows that guiding an LLM to think step-by-step — that is, to string together many basic inferences — helps it to answer questions more accurately than asking it to leap to a conclusion without intermediate steps.
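
For example (again using the hypothetical `call_llm` helper), the difference between the two prompting styles can be as small as one added instruction:

```python
question = "A store sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: asks the model to leap straight to a conclusion.
direct_answer = call_llm(f"{question}\nReply with the final amount only.")

# Chain-of-thought prompt: asks for intermediate steps (12 pens = 4 groups of 3,
# so 4 x $2 = $8), which tends to yield more accurate answers.
cot_answer = call_llm(f"{question}\nThink step by step, then state the final amount.")
```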

Agent programming models are a promising way to extend this principle significantly and guide LLMs to have lots of little ideas that collectively constitute bigger and more useful ideas. 

Keep learning! 
Andrew

P.S. New short course: “Open Source Models with Hugging Face,” taught by Maria Khalusova, Marc Sun, and Younes Belkada! Hugging Face has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models. You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You’ll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, in the cloud, or via an API using Gradio and Hugging Face Spaces. Please sign up here!
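
As a taste of the workflow the course covers (this snippet is my own sketch, not course material), the Transformers `pipeline` API plus Gradio gets you from a pretrained open source model to a shareable demo in a few lines; the `openai/whisper-tiny` checkpoint here is just one example choice.

```python
import gradio as gr
from transformers import pipeline

# Load a small open source speech recognition model from the Hugging Face Hub.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

def transcribe(audio_path):
    # The ASR pipeline returns a dict containing the transcribed text.
    return asr(audio_path)["text"]

# Wrap the function in a simple web UI; launch() serves it locally,
# and Hugging Face Spaces can host the same app in the cloud.
demo = gr.Interface(fn=transcribe, inputs=gr.Audio(type="filepath"), outputs="text")
demo.launch()
```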
