Reliable AI at your fingertips: how we built universal ReAct agents that just work

Understanding ReAct

ReAct, short for Reasoning and Acting, is a prompting technique that enhances the capabilities of AI agents by combining chain-of-thought reasoning with action execution. This approach allows the agent to iteratively reason about a problem and execute actions, refining its responses through a dynamic and interactive process.

In practice, a ReAct agent usually follows this algorithm:

Initial prompt: The agent receives an initial prompt or question.
Loop:

Reasoning: The agent reasons through the task by figuring out what the next step should be, given the information so far.
Action Execution: Based on its reasoning, the agent executes an action (usually calls a tool).
Observation: The agent observes the results of the executed action.

3. Final Output: When the agent reasons that the gathered information is enough, it provides a final, refined response or solution.

This method leads to more reliable and sophisticated responses, as the agent can adjust its approach based on real-time feedback and new information, mimicking human problem-solving strategies. All this makes ReAct agents a great go-to solution for a wide range of applications.

The problem with existing implementations

You'll think that newer chat models with function calling capabilities, like GPT-4o or Claude 3.5, are ideal for applications centered around action execution, like ReAct. And that's true! So there should be a ready-to-use implementation, right?Popular agent frameworks have built-in ReAct agents: you can check out the ones from Langchain (https://python.langchain.com/v0.1/docs/modules/agents/agent_types/react/) and LlamaIndex (https://docs.llamaindex.ai/en/stable/examples/agent/react_agent/). If you look at these implementations, you'll see they don't utilize multi-message input or function calling APIs at all, hoping that the model would generate a valid JSON string with tool input, or even just a single input string (hey Langchain, what about tools with multiple inputs?).These agents are too simplistic, and while they technically utilize the ReAct logic, they aren't robust and reliable enough for any real application. That's why we decided to create a good general-purpose ReAct agent for MotleyCrew, which we thought would be as straightforward as combining an existing prompt and the use of a function calling API.

How we built our agent

We started with some standard ReAct prompt, which described the way the agent should reason before acting and listed the available tools. We immediately found that the LLM didn't comply to the rules and tended to just call tools without describing its reasoning.

After a few rounds of refining the prompt and telling the model to NEVER CALL TOOLS WITHOUT THINKING FIRST, it became obvious that this approach is not sufficient. The LLM skipped the reasoning step more often than not.

Another approach we tested was to make two separate LLM calls on each iteration: a reasoning one and an acting one. On the reasoning step, the model didn't have access to any tools (but it had their text descriptions), and had to output a thought. It was followed by the acting step, where the prompt instructed the LLM to either call a tool or output the final result, based on the preceding thought. This somewhat complex agent worked OK, and we even used it for some time, before finding that it wasn't really reliable, with the model often becoming confused by the reasoning instructions.

After all this, we decided to treat the model like a human, which means to show it the right way besides telling. We included example messages that illustrated what we expect the LLM output to look like. We then included a message indicating that the AI accepted these rules:

To our surprise, this approach worked! We managed to get the model to follow the rules in most cases.

However, sometimes, especially when solving advanced tasks, the LLM still skipped the reasoning step. It could be because the initial prompt and the examples became diluted by the message history in the context. So we thought, why don't we append a reminder to the context every time, which will tell the model what it should do right now?

This worked like a charm. Given a clear instruction what to do at the moment, the LLM consistently returned a valid thought followed by the relevant action.

Finally, we tried to use the agent with Claude models from Anthropic. It worked, but we noticed that it wasn't as reliable as with OpenAI models. The LLM sometimes skipped the reasoning step, and sometimes omitted the "Thought:" prefix and instead enclosed the message in XML tags, like `<thinking></thinking>`. Then we remembered that Anthropic models are trained to understand and use such tags in prompts and outputs (https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags). So we took advantage of this feature, modifying the prompts specifically for Claude.

The result

After several days of researching, coding and prompt-engineering, we believe we finally succeeded in creating universal ReAct agents for OpenAI and Anthropic models that just work. They are both reliable and user-friendly, capable of delivering high-quality, consistent results, making them an invaluable tool for AI engineers.https://motleycrew.readthedocs.io/en/latest/agents.html#react-tool-calling-agentGive them a try, whether you're just experimenting with agents or building an AI application. We're sure they will save you a good bit of time and effort!

Reliable AI at your fingertips: how we built universal ReAct agents that just work

Understanding ReAct

The problem with existing implementations

How we built our agent

The result

How to hack your dependencies

Why I avoid Python's asyncio (by Egor)

Why too much Pydantic can be a bad thing