Memory and state in AI agents

Friday, September 20, 2024

When you call an LLM, for example as part of running an AI agent, the only information it gets is what is contained in the prompt (plus, of course, the data it was trained on, implicit in the model's weights).

Very often, it's useful to include in that prompt either information about the agent's past behavior ("memory") or some other corpus of data beyond the prompt template. In the agentic case, you'll usually also want to allow the agent to modify that corpus of data, for example through tool usage. That sort of external data, readable and modifiable by agents, is often referred to in developer lingo as "external state" or simply "state" - this includes memory but goes beyond it.

This piece will discuss the different ways to manage agent state: some popular, others less often encountered (at least in our experience). Most of these can be applied both at the single-agent level and for cross-agent communication and orchestration.

The simplest kind of state is conversation history, that is, including in the next prompt the full history of past messages, both those given to and those emitted by the LLM. On a single-agent level, this is the "memory" most agents have by default; an example of its usage in the multi-agent case is Autogen's multi-agent chat.
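
To make this concrete, here is a minimal sketch of an agent whose only state is its message list; `call_llm` is a placeholder for whatever chat-completion client you happen to use:

```python
# A minimal sketch of conversation-history memory: the entire message list is
# the agent's state, and all of it is replayed into every new request.
def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your LLM client here")

class ConversationAgent:
    def __init__(self, system_prompt: str):
        self.messages: list[dict] = [{"role": "system", "content": system_prompt}]

    def step(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        reply = call_llm(self.messages)  # the whole history goes into every call
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```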

This kind of memory quickly reaches its limits, however. Even though LLMs these days can accept huge prompts, their quality of instruction following degrades as the prompt size grows; not to mention that LLM APIs charge by the token, and replaying the whole conversation so far into each new request leads to token consumption growing quadratically with conversation length, which can get expensive quickly.
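
To see why the growth is quadratic, assume each turn adds roughly the same number of tokens; a back-of-the-envelope calculation:

```python
# If turn i adds about k tokens, the prompt for turn i carries about k * i
# tokens, so n turns cost roughly k * (1 + 2 + ... + n) = k * n * (n + 1) / 2
# tokens in total.
def total_prompt_tokens(n_turns: int, tokens_per_turn: int) -> int:
    return tokens_per_turn * n_turns * (n_turns + 1) // 2

print(total_prompt_tokens(10, 500))   # 27,500 tokens
print(total_prompt_tokens(100, 500))  # 2,525,000 tokens: 10x the turns, ~92x the total cost
```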

The logical next step might seem to be summarizing past conversation history. While this can be useful sometimes, its challenge is that one can't know in advance which information in the conversation history might be needed for the next step, so in many cases, this is merely a band-aid.

The next step is to store the whole conversation history somewhere apart from the agent and query the relevant pieces of it based on the user's query. This can either be done in a "push" fashion by automatically retrieving and adding some context into each LLM call by the agent (referred to as Retrieval Augmented Generation, or RAG), or in a "pull" fashion by offering a tool to the agent to query the context it chooses.
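
The two styles might look something like the sketch below, assuming some vector store that exposes a `search(query, k)` method and reusing the `call_llm` placeholder from above:

```python
def rag_answer(vector_store, user_query: str) -> str:
    """'Push' style: retrieval happens automatically on every call (classic RAG)."""
    snippets = vector_store.search(user_query, k=5)
    prompt = "Context:\n" + "\n".join(snippets) + f"\n\nQuestion: {user_query}"
    return call_llm([{"role": "user", "content": prompt}])

def make_search_tool(vector_store):
    """'Pull' style: the same retrieval, exposed as a tool the agent may choose to call."""
    def search_history(query: str) -> list[str]:
        return vector_store.search(query, k=5)
    return {
        "name": "search_history",
        "description": "Search past conversation history for relevant snippets.",
        "fn": search_history,
    }
```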

The latter is basically how MemGPT works: it uses one tool to store chunks of knowledge from the conversation history in an external store and another tool to retrieve the pieces that appear relevant at a given moment.

This is where agent "memory" meets the more general kind of external state, as the retrieval mechanism here is exactly the same as used in basic RAG: semantic similarity by comparing embeddings. So, one could describe MemGPT as an agent with a RAG tool that can also write into the RAG store. 
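
The shape of that idea, in a deliberately simplified form (this is not MemGPT's actual implementation, and `embed` is a placeholder for your embedding model):

```python
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in your embedding model here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class MemoryStore:
    """Both methods are exposed to the agent as tools: one writes, one reads."""
    def __init__(self):
        self.entries: list[tuple[list[float], str]] = []

    def save(self, text: str) -> str:
        self.entries.append((embed(text), text))
        return "saved"

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```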

The retrieval from past conversation history can also be based on old-school deterministic methods; for example, in MetaGPT the agents share a global chat, but within it they can subscribe to messages of certain types emitted by certain other agents - in other words, straightforward filtering based on discrete tags.
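
The subscription pattern in miniature (a toy illustration of the idea, not MetaGPT's actual code):

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str    # e.g. "Architect"
    msg_type: str  # e.g. "design_doc"
    content: str

class SharedChat:
    def __init__(self):
        self.log: list[Message] = []

    def publish(self, msg: Message) -> None:
        self.log.append(msg)

    def subscribe(self, senders: set[str], msg_types: set[str]) -> list[Message]:
        # Plain tag filtering - no embeddings involved.
        return [m for m in self.log if m.sender in senders and m.msg_type in msg_types]
```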

So far, the external stores we've considered have been collections of text snippets - either completely unstructured or, in MetaGPT's case, optionally structured (for example, containing flow diagrams represented in Mermaid syntax).

Another popular option is to have the state store be either a relational database, such as a good old Postgres instance, or a graph database. In the simplest implementation, the agent can submit arbitrary queries through the tool and get back the results. However, this can be quite fragile and error-prone, as there is no way to guarantee that the agent will formulate the queries with the right meaning or even well-formed ones. 

A much more robust way is to isolate the specific ways we want the agent to interact with the state store, and provide a custom structured tool for each of those ways, with as simple a signature as possible. 

Suppose we want the agent to write into such a database. In that case, it's even more important to give it specific tools for the particular writes it needs to do, both to constrain the accidental damage and to make it easier for the LLM to generate valid write requests. 
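
Here is a sketch of what that looks like with sqlite3; the `tasks` table is just an illustrative example, and each permitted write gets its own small, typed tool so the agent supplies values rather than SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (title TEXT NOT NULL, due DATE, done INTEGER DEFAULT 0)")

def add_task(title: str, due: str | None = None) -> str:
    """Tool: add a task. The agent never writes SQL, only fills in values."""
    conn.execute("INSERT INTO tasks (title, due) VALUES (?, ?)", (title, due))
    conn.commit()
    return f"added task: {title}"

def complete_task(title: str) -> str:
    """Tool: mark a task as done."""
    cur = conn.execute("UPDATE tasks SET done = 1 WHERE title = ?", (title,))
    conn.commit()
    return "done" if cur.rowcount else "no such task"
```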

For example, in motleycrew's research agent, we have a tool that adds new questions to the knowledge graph as children of the current question, another tool that gathers all the questions that haven't been processed yet, and so on.
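
A toy version of the same idea (this is not motleycrew's actual API, just the shape of it): the question graph lives outside the agent, and the agent only gets a few narrow tools to touch it.

```python
class QuestionGraph:
    def __init__(self, root_question: str):
        self.children: dict[str, list[str]] = {root_question: []}
        self.processed: set[str] = set()

    def add_subquestions(self, parent: str, questions: list[str]) -> str:
        """Tool: attach new questions as children of the current question."""
        for q in questions:
            self.children.setdefault(parent, []).append(q)
            self.children.setdefault(q, [])
        return f"added {len(questions)} questions under {parent!r}"

    def unprocessed_questions(self) -> list[str]:
        """Tool: list all questions that haven't been processed yet."""
        return [q for q in self.children if q not in self.processed]

    def mark_processed(self, question: str) -> None:
        """Called by the agent's loop once a question has been answered."""
        self.processed.add(question)
```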

So far, we have considered two basic kinds of state: conversation history and an external database (vector, relational, or graph) with some retrieval and write mechanisms. There is another important kind of state that is less popular, but that we have found quite useful, namely stateful tools. This can mean either a specific tool instance used by a specific agent having state that persists between calls, or a collection of tools used by the same agent sharing a state that they update and read from as they are called.

That shared state should usually be stored at the agent level (to avoid the usual problems that global variables cause), but it can sometimes also be global and thus used for inter-agent communication. 

An example of a single stateful tool could be a complex database query that is iteratively refined via a conversation between an agent and a user. Rather than depending on the agent to correctly remember the state of the query from the previous turn and asking it to generate an elaborated version, we could have a tool that stores the current query state and exposes a fairly rigid API for the agent to refine it.
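
A sketch of such a tool, with an illustrative and deliberately rigid filter-based API; the query format is made up for the example and not tied to any particular database:

```python
class QueryBuilderTool:
    def __init__(self, table: str):
        self.table = table
        self.filters: dict[str, str] = {}  # persists between tool calls

    def add_filter(self, column: str, condition: str) -> str:
        """Tool: add or replace a single filter, e.g. ('price', '< 100')."""
        self.filters[column] = condition
        return self.current_query()

    def drop_filter(self, column: str) -> str:
        """Tool: remove one filter without touching the rest of the query."""
        self.filters.pop(column, None)
        return self.current_query()

    def current_query(self) -> str:
        where = " AND ".join(f"{col} {cond}" for col, cond in self.filters.items())
        return f"SELECT * FROM {self.table}" + (f" WHERE {where}" if where else "")
```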

An example of a collection of tools sharing state could be one tool requesting an authentication token and storing it in the shared state, and the other tools retrieving that when called and using it to access whatever authenticated APIs they need. This way, we can guarantee that the LLM will not garble the token, and as the token has no intrinsic meaning, the LLM would not have gained anything by seeing it as part of its prompt anyway. Note that in this case the shared state doesn't directly enter the prompt at all!
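
A sketch of that pattern, with made-up `example.com` endpoints standing in for a real auth server and API: one tool logs in and stashes the token in the shared dict, the other reads it back, and only short status strings or results ever reach the LLM.

```python
import os
import requests

class AuthenticatedToolkit:
    def __init__(self):
        self.shared: dict = {}  # agent-level shared state, never shown to the LLM

    def login(self) -> str:
        """Tool: authenticate with credentials from the environment and stash the token."""
        resp = requests.post(
            "https://auth.example.com/token",
            data={"client_id": os.environ["CLIENT_ID"],
                  "client_secret": os.environ["CLIENT_SECRET"]},
        )
        resp.raise_for_status()
        self.shared["token"] = resp.json()["access_token"]
        return "logged in"

    def fetch_report(self, report_id: str) -> dict:
        """Tool: call an authenticated API, reading the token from shared state."""
        headers = {"Authorization": f"Bearer {self.shared['token']}"}
        resp = requests.get(f"https://api.example.com/reports/{report_id}", headers=headers)
        resp.raise_for_status()
        return resp.json()
```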

The state that a LangGraph agent carries around as it walks its state machine graph is a very similar concept.

A final, special kind of shared state is event-driven orchestration, such as that seen in LlamaIndex workflows or Faust. Here, the events emitted by some agents and consumed by others are the shared state that conveys information between them. You might think that these are also special in that they are transient, but that needn't be the case. If the messaging is underpinned by Kafka, for example, as is the case in Faust, the whole message history is preserved in the Kafka log.
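
In miniature, the pattern looks something like this (a generic event-bus sketch, not the LlamaIndex or Faust API): agents communicate only through published events, and the bus keeps the full event log.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    topic: str
    payload: dict

class EventBus:
    def __init__(self):
        self.log: list[Event] = []  # nothing is transient: the durable "Kafka log" analogue
        self.handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        self.log.append(event)
        for handler in self.handlers[event.topic]:
            handler(event)
```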

Thus, we could return to the beginning of the article, think of the content of all the message logs in an event-driven system as an extended kind of conversation history, and do retrieval on it in similar ways.

Most real-world applications of AI agents require them to read from some external state and, in many cases, also write to it. This piece has reviewed the most common patterns to do so and hopefully has equipped you with a mental map to make considered choices when designing your next agentic application. 

Did we miss any patterns currently in use in popular libraries? Are there any patterns that you would like to use, but haven't seen implemented yet? Please let us know in the comments!
