The shifting sweet spot for AI agents and multi-agent approaches

Tuesday, October 29, 2024

AI agents and multi-agent frameworks have been making headlines lately. Should you use one in your next project? And in the rapidly shifting landscape of generative AI, will these approaches be as useful in six months or a year as they are now?

First of all, let's define our terms. In the discussion that follows, an "agent" is not (necessarily) a "personality" with a backstory and all that - starting your LLM prompt with "you are an experienced customer support agent" or "you are a genius-level theoretical physicist" doesn't make the application agentic. For the purposes of this article, let us define an agent as a system where the LLM can request to execute some action against an external source (for example, query a database, execute code, or ask a human) and get back a result. That is, in an agent the LLM's output determines the next step: which of the tools available to it should be called, or whether we should consider the task done and terminate; and there is a feedback loop from the results of that step back to the LLM.
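To make this concrete, here is a minimal sketch of such a loop in Python. Everything in it is hypothetical scaffolding, not any particular framework's API: `call_llm` stands in for your model wrapper, and the tools are placeholder functions.

```python
import json

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical wrapper around a model API. Assume it returns a decision
    like {"action": "query_db", "input": "..."} or {"action": "finish", "answer": "..."}."""
    raise NotImplementedError

# Placeholder tools the LLM can choose between.
tools = {
    "query_db": lambda query: f"rows matching {query!r}",
    "run_code": lambda src: f"stdout of {src!r}",
    "ask_human": lambda question: input(question),
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)       # the LLM's output picks the next step...
        if decision["action"] == "finish":  # ...including whether to terminate
            return decision["answer"]
        result = tools[decision["action"]](decision["input"])
        # Feedback loop: the tool's result is appended and fed back to the LLM.
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    raise RuntimeError("agent did not finish within max_steps")
```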

Often, a single such agent is not enough to handle complex tasks (see here on when that can happen). Then, we need to coordinate multiple agents, either by supplying agents as tools to other agents, or by using them as nodes in a state machine or in an event-driven computation graph.
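As a minimal illustration of the state-machine flavor (the node names and logic here are invented for the example): each node is an agent, and its return value selects the next node.

```python
# A toy state machine: each node is an agent (here just a function that
# reads and updates shared state), and its return value names the next node.
def planner(state: dict) -> str:
    state["plan"] = f"plan for {state['task']}"
    return "coder"

def coder(state: dict) -> str:
    state["code"] = f"code implementing {state['plan']}"
    return "done"

nodes = {"planner": planner, "coder": coder}

def run_graph(task: str) -> dict:
    state, current = {"task": task}, "planner"
    while current != "done":
        current = nodes[current](state)
    return state
```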

Why do we believe that the space of natural use cases for agents, and multi-agent approaches, is shifting from what it was only, say, half a year ago?

First of all, many traditional use cases are still valid but are becoming less important, for the following reasons. One key use case for the agent pattern is the validation-retry loop: an agent tries, for example, to write code, gets back the results of executing that code, and tries again based on those results. That can still be useful, but the space of use cases here is shrinking, as the common types of validation (such as producing valid JSON or valid code) are increasingly handled at training time, reducing the need for validation at inference time (see JSON mode in OpenAI models, and now Reinforcement Learning with Execution Feedback).
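A validation-retry loop can be sketched like this, again with a hypothetical `call_llm` stub; the same shape applies whether the validator is `json.loads`, a compiler, or a test suite.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a model API."""
    raise NotImplementedError

def generate_valid_json(task: str, max_retries: int = 3) -> dict:
    prompt = task
    for _ in range(max_retries):
        output = call_llm(prompt)
        try:
            return json.loads(output)  # the validation step
        except json.JSONDecodeError as err:
            # Feed the failure back to the model and retry.
            prompt = f"{task}\nYour previous output was invalid JSON ({err}). Try again."
    raise ValueError(f"no valid JSON after {max_retries} attempts")
```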

Another key application domain for agents is when it's not clear which tool to call. This can be useful, especially when interacting with a human or when planning. But most production applications are likely to focus on doing a small, well-defined set of tasks well rather than providing a "universal agent" - and in that case, if the task at hand allows it, you'll generally get a more reliable system by making the choices deterministic, that is, by writing explicit rules for the sequence of LLM calls and external API calls you want to make.
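Concretely, instead of letting the model pick tools at run time, a fixed pipeline spells the sequence out. A sketch, with invented function names standing in for real LLM and API wrappers:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model wrapper."""
    raise NotImplementedError

def lookup_customer_records(ticket: str) -> str:
    """Hypothetical CRM lookup."""
    raise NotImplementedError

def handle_support_ticket(ticket: str) -> str:
    # Deterministic control flow: the sequence of LLM and API calls is
    # written out explicitly rather than chosen by the model at run time.
    category = call_llm(f"Classify this ticket as billing/tech/other: {ticket}")
    records = lookup_customer_records(ticket)
    return call_llm(f"Draft a {category} reply using {records}. Ticket: {ticket}")
```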

An important reason to go agentic is to split a prompt with a large set of instructions, only some of which may be relevant for a particular instance of the task, into many sub-prompts, and to give most of those sub-prompts to LLM-driven tools used by the main agent (some of which tools may themselves be agents). This remains a valid use case, but also a shrinking one, as the LLMs themselves become better at processing complex instructions. A case in point is OpenAI's o1, which does more inference-time reasoning without the need for an agentic wrapper.
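One way to picture this splitting: each tool carries its own focused sub-prompt, so the main agent's prompt stays short. The sub-prompts and tool names below are invented, and `call_llm` is again a hypothetical stub.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model wrapper."""
    raise NotImplementedError

def make_llm_tool(sub_prompt: str):
    """Wrap one focused sub-prompt as a tool the main agent can call."""
    def tool(user_input: str) -> str:
        return call_llm(f"{sub_prompt}\n\n{user_input}")
    return tool

# The main agent only needs to know the tool names; each tool carries
# its own slice of the instructions.
tools = {
    "refunds":  make_llm_tool("You handle refund questions. Policy: ..."),
    "shipping": make_llm_tool("You handle shipping questions. Policy: ..."),
}
```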

What about multi-agent orchestration itself? Here we would argue that this is not really a separate domain, but rather a subdomain of orchestrating computation in general. State machines, event-driven computation systems, and the like have been around for a long time, and mature frameworks exist to support them. The biggest structural difference at the moment is that many earlier use cases were bottlenecked by local compute availability, whereas an LLM application spends most of its time waiting for the LLM to reply - not a game-changing difference as far as architecture design is concerned. So it seems likely to us that people will increasingly rediscover existing ways to orchestrate and scale computation, such as Ray, and will extend those with some LLM-specific functionality on the margin, instead of further investing in completely new frameworks for doing the same things, but with LLMs.
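For instance, fanning LLM calls out with Ray's existing task API takes only a few lines; the `call_llm` body below is a placeholder for a real model call.

```python
import ray

ray.init()

@ray.remote
def call_llm(prompt: str) -> str:
    # Placeholder; in a real application this would hit an LLM API,
    # so the worker spends most of its time waiting on I/O.
    return f"response to: {prompt}"

# Ray schedules the tasks concurrently; ray.get blocks until all complete.
prompts = ["summarize doc 1", "summarize doc 2", "summarize doc 3"]
results = ray.get([call_llm.remote(p) for p in prompts])
```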

Does that mean that agent and multi-agent frameworks are doomed? Of course not.

So where do we see them thriving in the future? The obvious use case for the agent pattern, one that is only going to become more widespread, is human-facing assistants designed to handle a large array of tasks, choosing which API to call based on user requests. However, we wouldn't really describe this as a use case for multi-agent architectures: those APIs may themselves have an agent behind them, or they may not - it makes no difference to the design of the human-facing agent. The other domain where we expect multi-agent architectures to become more prominent is local setups using SLMs. While in the past the "Small" language models were often not capable enough to use in an agent, they are improving fast, and specialized tool-using ones are being built. Thus, just as the flagship LLMs become sophisticated enough that the agent pattern is less needed in many cases, we can expect the same set of tricks and patterns to become more relevant in SLM-driven and locally hosted applications.

AI agents are a fascinating new pattern, whose usefulness shifts as the capabilities of the underlying LLMs keep evolving. Keeping track of these changes will let you choose the right patterns for your next LLM-based project.
