July 17, 2025

Navigating Modern LLM Agent Architectures

A guide to understanding commonly used LLM Agent architectures and their distinct advantages and disadvantages: Multi‑Agents, Plan-and-Execute, ReWOO, Tree-of-Thoughts, and ReAct

Published by:
Jack Spolski
Santiago Pourteau

Building an AI agent to tackle complex tasks requires more than just prompting a single large language model (LLM) for an answer. Recent research and frameworks (like LangChain’s LangGraph) have introduced various architectures to extend an LLM’s capabilities. These architectures help break down problems, integrate external knowledge, and coordinate reasoning steps. In this post, we’ll explore several prominent agent patterns: ReAct (Reason + Act), Multi-Agent systems, Plan-and-Execute, ReWOO (Reasoning Without Observation), and Tree-of-Thoughts. We’ll also discuss when to use each approach.

Imagine you’re building an AI assistant to answer a complex research question: “Who won the 2024 Australian Open, and what is their hometown? Also, did that person collaborate with any researchers from University X?” Answering this involves multiple steps: finding the tennis winner, finding their hometown, then checking for academic collaborations – a mix of factual lookup and multi-step reasoning. Let’s see how each architecture could approach such a task, and what unique strengths they bring.

ReAct (Reason + Act): Think Step-by-Step, Act When Needed


ReAct is one of the earliest and most influential agent patterns for LLMs. ReAct stands for “Reasoning and Acting”, and it marries the idea of chain-of-thought reasoning with tool use. A ReAct agent operates in a loop that goes: Thought → Action → Observation, and then repeats this cycle until the task is complete. In simpler terms, the agent thinks out loud, decides on an action (like calling a tool or giving a final answer), observes the result of that action, and then uses that new information to inform the next thought.

To illustrate, let’s apply ReAct to our example question (winner and hometown and collaboration):

  • Thought 1: “The question asks for the 2024 Australian Open winner and their hometown, then about collaborations with University X. I should first find out who won the 2024 Australian Open men’s title.”
  • Action 1: Call the search tool with query “2024 Australian Open men’s winner”.
  • Observation 1: The search returns: “The men’s singles champion in 2024 was Jannik Sinner.”
  • Thought 2: “The winner is Jannik Sinner. Now I need his hometown. Let me find that.”
  • Action 2: Search for “Jannik Sinner hometown”.
  • Observation 2: The results say: “Jannik Sinner’s hometown is Sexten, in northern Italy.”
  • Thought 3: “So Sinner is from Sexten, Italy. Next, did he collaborate with anyone at University X? He’s a tennis player, likely not an academic, but maybe University X did a study with him or he gave a talk there. I should search for Sinner and University X together.”
  • Action 3: Search for “Jannik Sinner University X collaboration”.
  • Observation 3: No obvious results about collaborations (likely none exist if it’s a random university).
  • Thought 4: “It appears he has no direct collaborations with University X. I should answer with the info I have: he won the 2024 Australian Open, hometown Sexten, and presumably no notable collaboration with that university.”
  • Action 4: Finish with the compiled answer.

This sequence shows how ReAct allows the agent to adapt at each step based on observations. If the first search had already included hometown info, the agent might have skipped a step. If the collaboration search turned up something unexpected (say he did participate in an event at University X), the agent would incorporate that into the answer. This dynamic looping is what gives ReAct its robustness: it closely mimics how a human problem-solver alternates between thinking and doing, adjusting plans on the fly.

Key characteristics of ReAct:

  • It uses an implicit short-term memory: after each action, the agent essentially asks itself “What happened? Have I achieved my goal, or what should I do next?” The observation is fed back into the next reasoning step, so the agent incrementally builds understanding.
  • The format is often implemented with a special prompt template that guides the LLM to produce output in the form “Thought: …; Action: …; Observation: …” repeatedly. This structured prompting is how the LLM knows to alternate between reasoning text and tool-using instructions (a template sketch follows this list).
  • ReAct was shown to reduce hallucinations compared to chain-of-thought alone, by grounding reasoning with actions (like looking things up). It also improved performance on interactive tasks (like text-based games) by letting the model fetch information and observe environment changes in between thoughts.
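
To make the format concrete, here is an illustrative version of such a prompt template. The exact wording varies by framework; this sketch follows the classic ReAct style, and the {tool_names} and {question} placeholders are assumptions for illustration, not any library’s canonical prompt.

```python
# Illustrative ReAct-style prompt template (assumed wording, not a specific
# framework's canonical prompt). {tool_names} and {question} are placeholders
# to fill in before calling the model.
REACT_TEMPLATE = """Answer the question using the following format:

Thought: reason about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to pass to the tool
Observation: the tool's result
... (Thought/Action/Observation can repeat) ...
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {question}"""
```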

When to use ReAct:

This pattern is a great default for many scenarios where an LLM needs to use tools or perform multi-step reasoning but you don’t have a long or well-defined plan from the outset. It’s particularly useful when each next step depends heavily on what the previous step revealed. If the task can be solved in a few steps and doesn’t call for an elaborate plan or parallel execution, ReAct provides a simple and effective control flow.

Most frameworks (LangChain, LangGraph, etc.) support ReAct-style agents readily, and it’s often the starting point for tool-using agent development. In LangGraph’s context, for example, create_react_agent is a utility that sets up an agent with given tools and the ReAct prompt structure. Under the hood, the agent will loop through Thought/Action/Observation until it emits a final answer. This design pattern has influenced many derivatives and enhancements (like ReWOO, Reflexion, etc.), but it remains a foundational technique for “interactive” LLM agents.
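
For reference, here is a minimal sketch of that setup, assuming the langgraph and langchain-openai packages are installed; the search tool below is a stub standing in for a real search integration (e.g., a web search API).

```python
# A minimal ReAct agent sketch with LangGraph's prebuilt helper.
# Assumes `pip install langgraph langchain-openai` and an OpenAI API key.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search(query: str) -> str:
    """Look up information on the web."""
    # Stubbed result for illustration; a real agent would call a search API.
    return "The men's singles champion in 2024 was Jannik Sinner."

agent = create_react_agent(ChatOpenAI(model="gpt-4o"), tools=[search])

result = agent.invoke(
    {"messages": [("user", "Who won the 2024 Australian Open, and what is their hometown?")]}
)
print(result["messages"][-1].content)
```

Under the hood, the prebuilt graph loops model call → tool call → model call until the model responds without requesting another tool.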

Plan-and-Execute: First Plan the Steps, Then Get to Work



If a task is complex and requires long-term planning, a single LLM agent might struggle to keep the goal in sight by reasoning one step at a time. Plan-and-Execute architectures address this by explicitly planning out a multi-step solution first, then executing each step sequentially. This paradigm is inspired by the “Plan-and-Solve” approach in recent research. It usually involves two phases:

  • Planning phase: The agent generates a list of steps or sub-tasks needed to achieve the goal. This plan is formulated before taking any action. For example, given our question, a planning agent might produce: (1) Identify the 2024 Australian Open men’s winner; (2) Find that winner’s hometown; (3) Check if this person has collaborations with University X; (4) Compile the findings. This plan gives a high-level roadmap.
  • Execution phase: Another component (an executor agent) then tackles each step one by one. After each step, the system can optionally re-plan or adjust remaining steps based on new information. In our example, if step (3) yielded no obvious collaborations, the planner could refine the query or add a step (e.g., broaden the search criteria).


Advantages:
A plan-and-execute agent has two big strengths: (1) it enforces explicit long-term planning (which even advanced LLMs can struggle with if left to improvise step-by-step), and (2) it lets you use different models for different stages. For instance, you could use a powerful (but slower) LLM to devise the plan, and a smaller (cheaper) model to carry out each execution step. This way, the heavy lifting of strategy is done by the best reasoning model, but repetitive sub-tasks can be handled more efficiently.

Compared to a ReAct agent that “thinks” one step, acts, then re-thinks in a loop, plan-and-execute commits to a full strategy up front. This can make it more efficient in terms of API calls – the agent doesn’t have to repeat the entire conversation history to the LLM at every single step, since the plan is decided in one go. However, the trade-off is that if the initial plan is flawed or the task environment changes, the agent must detect that and replan. Plan-and-execute works best for tasks where a reasonable plan can be formulated initially and the problem is complex enough to warrant that planning (e.g. coding a multi-module program, performing a research project, or our multi-part question). For simpler tasks or ones that require immediate adaptation to new information, a reactive approach might suffice.

In practice, implementing plan-and-execute often involves a Planner prompt and an Executor loop. The planner might output a numbered to-do list, and the executor will go through them, possibly using tools (like web search, calculators, etc.) to complete each item. LangGraph’s example of this architecture shows exactly this pattern: the agent first produces a plan and then yields a trace of solving each step (e.g., find winner -> result, then find hometown -> result, and so on). The result is a coherent solution built step-by-step, but guided by an upfront game plan.
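
To make the two phases concrete, here is a minimal sketch of such a planner/executor loop under stated assumptions: llm is a hypothetical callable wrapping a chat model, and run_step stands in for an executor that may itself be a tool-using (even ReAct-style) agent.

```python
from typing import Callable, List

def make_plan(llm: Callable[[str], str], goal: str) -> List[str]:
    # Planning phase: one call produces the full roadmap before any action.
    response = llm(f"Break this goal into a short numbered list of steps:\n{goal}")
    steps = []
    for line in response.splitlines():
        line = line.strip()
        # Parse "1. ...", "2. ..." lines into plain step strings.
        if line and line[0].isdigit() and "." in line:
            steps.append(line.split(".", 1)[1].strip())
    return steps

def plan_and_execute(llm, run_step, goal: str) -> str:
    steps = make_plan(llm, goal)
    results: List[str] = []
    for step in steps:
        # Execution phase: the executor works through steps one by one,
        # seeing earlier results; a re-planning call could be inserted here.
        results.append(run_step(step, results))
    # Final synthesis over all intermediate results.
    return llm(f"Goal: {goal}\nFindings: {results}\nWrite the final answer.")
```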

Compared to ReAct, Plan-and-Execute can be more efficient and less token-intensive for long tasks because it commits to a clearly defined strategy from the start, reducing the repeated prompting overhead of step-by-step reasoning. However, this initial commitment means Plan-and-Execute may require explicit replanning if unexpected results appear mid-task—for instance, if the search reveals the Australian Open winner is unknown or controversial. In such scenarios, while a ReAct agent naturally adapts by revising each step on the fly, a Plan-and-Execute agent would pause, reassess, and explicitly adjust the plan before continuing. This makes Plan-and-Execute particularly well-suited to structured tasks with predictable sub-steps, though potentially less agile in highly dynamic situations.

ReWOO (Reasoning Without Observation): Plan the Tool Usage in One Go


ReWOO (short for Reasoning WithOut Observation) is a relatively new agent design that can be seen as an optimization of the ReAct paradigm. In ReAct, the LLM is called every cycle to decide the next action, consuming tokens to reconsider the entire history each time. ReWOO streamlines this by having the LLM plan out the entire sequence of tool calls in one pass, before executing any of them.

In other words, ReWOO asks the LLM to produce something like a script: a chain of step-by-step instructions with the specified tools and inputs, possibly using placeholders for results of earlier steps. Only after the plan is written does the agent execute the tools in order, and finally the LLM may be called one more time to synthesize the final answer from all the gathered observations. It’s “without observation” in the sense that the initial reasoning doesn’t stop to incorporate intermediate observations – it assumes tools will work and uses variable placeholders (#E1, #E2, etc.) to reference their future outputs.

Why do this? The ReWOO paper by Xu et al. highlighted a couple of benefits:

  • Efficiency: By generating the full plan (the chain of tool uses) at once, you avoid the repetitive prompt overhead of calling the LLM multiple times. A ReAct agent has to send the entire dialogue history plus the new observation into the model at each step, which adds a lot of redundant tokens. ReWOO’s planner uses one prompt to outline all steps, saving time and tokens. For our example, a ReWOO planner might output:
      • Plan: “Use Google to find the 2024 Australian Open winner” → #E1 = Google["2024 Australian Open men’s winner"]
      • Plan: “Extract the winner’s name from #E1” → #E2 = LLM["Who won the 2024 Australian Open? Given #E1"]
      • Plan: “Use Google to find #E2’s hometown” → #E3 = Google["<Name> hometown"]
      • Plan: “Check for collaborations at University X” → #E4 = Google["<Name> University X collaboration"]
    (Here #E1, #E2, and so on are placeholders that will be filled with actual results when executed.) The LLM produced all of these steps in one go, instead of the four separate reasoning calls ReAct would require.
  • Simplified training/fine-tuning: Because the planning doesn’t depend on seeing actual tool outputs (it just predicts what tools should do), you could train the planner model on plan examples without needing a live environment for each step. In theory, one could fine-tune an LLM to produce good plans from questions alone, since the plan format uses placeholders (so you don’t need real-time search during training). This modularizes the problem: the “planner” can be improved independently, and the “executor” just follows instructions.

After planning, execution is typically straightforward: each Tool[...] step is run in sequence by a Worker module. Finally, a Solver module (often another LLM call) takes all the results #E1, #E2... and composes the final answer. In our case, the solver might take the winner’s name, hometown, and any collaboration info found, and then formulate a user-friendly answer.
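
Here is a minimal sketch of those worker and solver phases, assuming the planner has already emitted a list of (variable, tool, input) steps; the step format and names are illustrative, not the paper’s exact implementation.

```python
import re
from typing import Callable, Dict, List, Tuple

def execute_plan(steps: List[Tuple[str, str, str]],
                 tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Worker: run pre-planned steps like ("#E3", "Google", "#E2 hometown")."""
    evidence: Dict[str, str] = {}
    for var, tool_name, raw_input in steps:
        # Substitute earlier results for placeholders (#E1, #E2, ...).
        filled = re.sub(r"#E\d+",
                        lambda m: evidence.get(m.group(), m.group()),
                        raw_input)
        evidence[var] = tools[tool_name](filled)
    return evidence

def solve(llm: Callable[[str], str], question: str,
          evidence: Dict[str, str]) -> str:
    # Solver: one final LLM call composes the answer from all evidence.
    return llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```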

Comparison with Plan-and-Execute: ReWOO and plan-and-execute share the idea of an upfront plan, but they differ in granularity. Plan-and-execute might create high-level steps (“Find winner”, “Find hometown”, “Check collaborations”) and then rely on an executor agent (which could itself be a ReAct agent) to figure out each step. ReWOO, on the other hand, plans the tool specifics for each step (it might even decide which exact search queries to run, as above), using precise placeholders. Thus, ReWOO is more tightly coupled: the plan knows the exact sequence of tool usage. It’s almost like an LLM-generated program that the agent then runs.

Whereas plan-and-execute might replan after each step if needed, ReWOO by design tries to avoid re-planning during execution (it’s without intermediate observation). If a tool result is unexpected or the plan was wrong, a naive ReWOO agent could fail unless you add extra logic to catch errors. In practice, one might combine ReWOO planning with a check – for example, if a step fails or returns nothing, maybe call the planner again or fall back to a ReAct loop. So, ReWOO trades some adaptability for speed and efficiency in scenarios where you can reasonably predict the sequence of actions. For tasks like iterative web search or multi-hop questions, ReWOO can be a big win in latency. LangGraph’s tutorial shows that the LLM context for later steps can be kept minimal (thanks to the #E variable substitution) – the LLM doesn’t need to see the entire history each time, just the placeholder reference.

In summary, ReWOO is like a sibling of ReAct. Both handle multi-step tool use, but ReWOO plans the whole tool itinerary upfront. In our tennis query, a ReAct agent might have stopped after finding the winner to decide the next move, whereas ReWOO already had the “find hometown, then check collaborations” in mind from the start. This makes ReWOO more goal-directed and potentially faster, at the cost of being less interactive with intermediate results.

Tree-of-Thoughts: Searching Through Many Reasoning Paths

Sometimes a single line of reasoning isn’t enough – the agent might need to consider multiple possible approaches or solutions and then converge on the best one. Tree-of-Thoughts (ToT) is an algorithmic prompting technique that extends the idea of Chain-of-Thought by introducing a search over the space of thoughts. Instead of the agent following one train of thought deterministically, ToT allows it to branch out, try different partial solutions, evaluate them, and refine further, much like exploring a search tree.

Here’s how Tree-of-Thoughts works conceptually:

  • Expand: At a given state of reasoning, generate multiple candidate thoughts or solutions (instead of one). These could be different possible next steps or hypotheses. For instance, if our agent is trying to figure out if the tennis champion collaborated with University X, it might expand by considering different angles: (a) search the champion’s publication history, (b) search news articles for both names, (c) search the university’s press releases. Each is a candidate path to explore.
  • Score: Evaluate the generated candidates for promise or quality. This could be done by a heuristic, another learned model, or even by the LLM itself via a prompt (e.g., “rate which approach seems most likely to find the info”). The scoring mechanism gives each partial solution a “reward” or confidence.
  • Prune: Keep only the top K candidates and discard the rest. This focuses the search on the most promising avenues and controls combinatorial explosion.
  • Repeat: The process returns to Expand from each remaining candidate state, building out the “tree” further, until a satisfactory solution is found or a depth limit is reached.


In essence, Tree-of-Thoughts transforms the problem-solving into a tree search (often implemented as breadth-first search, but depth-first or other strategies can be used too). It incorporates an element of reflection or evaluation at each level – the agent doesn’t blindly commit to the first thought; it systematically explores alternatives.
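
The loop below is a minimal sketch of that expand/score/prune cycle as a beam-style breadth-first search; propose, score, and is_solution are hypothetical LLM-backed helpers you would supply.

```python
from typing import Callable, List, Optional

def tree_of_thoughts(problem: str,
                     propose: Callable[[str, List[str]], List[str]],
                     score: Callable[[str, List[str]], float],
                     is_solution: Callable[[str, List[str]], bool],
                     beam_width: int = 3,
                     max_depth: int = 4) -> Optional[List[str]]:
    frontier: List[List[str]] = [[]]  # each state is a chain of thoughts
    for _ in range(max_depth):
        # Expand: generate several candidate next thoughts per surviving state.
        candidates = [state + [thought]
                      for state in frontier
                      for thought in propose(problem, state)]
        if not candidates:
            break
        # Score and Prune: keep only the top-K most promising chains.
        candidates.sort(key=lambda s: score(problem, s), reverse=True)
        frontier = candidates[:beam_width]
        for state in frontier:
            if is_solution(problem, state):
                return state
    return frontier[0] if frontier else None
```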


For our scenario, ToT might not be as necessary if the task is straightforward (the winner and hometown question is more factual). But imagine a tougher case: perhaps a riddle or a puzzle, or a planning problem with many possible plans. ToT shines in deliberative problem solving where multiple reasoning chains should be compared. For example, if the user’s query was a tricky puzzle (“Using these numbers, get 24” – which is literally the example in the LangGraph ToT tutorial), a ToT agent would generate several equations (candidates) that evaluate to 24, score them (maybe by checking if they indeed equal 24 and use all numbers), prune the wrong ones, and iterate. This methodical search yields a correct solution more reliably than a single-pass attempt that might get stuck in a dead end.

Even for planning tasks, ToT can explore different plan outlines. For instance, to answer a complicated open-ended question, one branch might represent one way of structuring the answer (e.g., by chronological narrative), another branch a different structure (e.g., by categories), and the agent can evaluate partial attempts and decide which route produces the best answer.

Relationship to other architectures: Tree-of-Thoughts is less an “agent with tools” framework and more a meta-strategy for prompting. It can be combined with tool use or purely mental simulation. In practice, ToT could wrap around a ReAct or ReWOO style approach – at each reasoning step, instead of one thought, generate several. In LangGraph, an implementation called Language Agent Tree Search (LATS) combines aspects of ToT with ReAct and planning. The core idea is giving the model the ability to backtrack and try alternatives in a structured way, rather than committing to a single chain-of-thought from the start.

To use ToT effectively, you need a way to score partial solutions. For factual tasks, an automatic check might be possible (like verifying a mathematical result). For subjective tasks, you might rely on the LLM to self-evaluate (“Does this draft answer the question fully? Score 1-10.”). Another option is to use a second model or heuristic rules as a judge.

In summary, Tree-of-Thoughts is like giving the agent a breadth of imagination. It’s powerful for complex reasoning or creative tasks where many different thoughts could lead to a solution. The cost is more computation (generating and evaluating multiple options), so it’s beneficial when a straight-line approach might fail or when you can’t easily break the task into deterministic sub-tasks. Our example query likely wouldn’t need a full ToT, but if we twisted it into a puzzle (“find a link between the tennis champion and the university”), a ToT agent might explore various hypothesis chains and ensure it finds a valid connection (or correctly concludes none exists) with higher confidence.

Multi‑Agent Systems: Divide and Conquer with Specialized Agents

Real-world projects often require an AI to perform diverse sub-tasks or handle large context, which can overwhelm a single monolithic agent. Multi-agent architectures tackle this by splitting the problem among multiple smaller, specialized agents that work together. Each agent can be designed for a specific role or expertise – for example, one agent focused on web research, another on data analysis, and another on summarization.

Why multi-agent? As an AI application grows complex, you may encounter problems such as: too many tools or APIs for one agent to manage, a context that’s too large for one prompt, or the need for different skill sets (e.g. a “planner” vs. a “math expert”). By breaking the application into independent agents, you gain:

  • Modularity: each agent can be developed and tested in isolation, simplifying maintenance.
  • Specialization: an agent can be optimized (or fine-tuned) for a domain or task, improving performance on that aspect.
  • Explicit Control: you decide how agents communicate or hand off tasks, rather than relying on a single complex prompt.

In our tennis question example, a multi-agent solution might deploy one agent to handle sports queries (“Who won the 2024 Australian Open?”) and another to handle academic queries (“Did that person collaborate with University X?”). A supervising agent could orchestrate the workflow: first activating the sports agent to get the winner and hometown, then passing that info to the academic agent. This modular approach keeps each agent’s context focused, reducing confusion and errors.

Multi-agent systems can be organized in different ways. Common patterns include a network (any agent can call any other next), a supervisor or hierarchy (one master agent decides which specialized agent to invoke next), or treating sub-agents as tools that a main agent can call. The choice depends on whether your problem has a clear sequence/hierarchy of tasks or is more free-form. Regardless of structure, the key benefit is clear: by delegating sub-tasks to dedicated agents, the overall system becomes more scalable and easier to manage.
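
As one illustration, here is a minimal sketch of the supervisor pattern, where a single routing call decides which specialist acts next; the llm callable and the agent registry are hypothetical stand-ins, not any framework’s API.

```python
from typing import Callable, Dict, List

def supervisor(llm: Callable[[str], str],
               agents: Dict[str, Callable[[List[str]], str]],
               task: str,
               max_turns: int = 5) -> str:
    transcript: List[str] = [f"Task: {task}"]
    for _ in range(max_turns):
        # Routing call: pick the next specialist agent, or FINISH.
        choice = llm("Available agents: " + ", ".join(agents) + "\n"
                     + "\n".join(transcript)
                     + "\nReply with the agent to act next, or FINISH.").strip()
        if choice not in agents:  # treat FINISH (or anything unknown) as done
            break
        transcript.append(f"{choice}: {agents[choice](transcript)}")
    return llm("\n".join(transcript) + "\nCompose the final answer for the user.")
```

In our example, a sports agent and an academic agent would each be entries in the agents registry, keeping their prompts and tools isolated from one another.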

Choosing the Right Architecture for Your Use Case

Now that we’ve surveyed these architectures, how do they stack up for your project? The truth is, there’s no one-size-fits-all – each has a niche where it shines.

  • ReAct: Great for moderate complexity tasks where adaptivity is important. It’s simple and mimics human step-by-step problem solving. Use this if you aren’t sure how many steps a task will take or what tools you’ll need until you start exploring. It’s like having the AI figure it out as it goes, which is very powerful for open-ended queries.
  • ReWOO: Best when you can predict the sequence of tool calls needed and want efficiency. It reduces overhead by planning the whole tool usage upfront. Use ReWOO for tasks like multi-hop questions or tool-using workflows that are routine and can be templated (e.g., first search X, then ask LLM to analyze Y, then search Z). It will save costs and time by cutting out redundant prompt steps, though it’s a bit less forgiving if something unexpected occurs mid-way.
  • Plan-and-Execute: Ideal for long-horizon tasks that naturally decompose into sub-tasks. If your AI needs to, say, execute a multi-step job (coding a program, conducting a multi-part analysis, planning an event, etc.), this gives structure. The clear separation of planning and doing helps maintain focus on the big picture. Also consider this when you want to leverage different models for different parts of the task (e.g., a powerful planner with a fast worker). It’s a bit heavier to implement (since you manage two agents in effect) but pays off for complex projects.
  • Tree-of-Thoughts: Use this when solving the problem might require creative thinking or search, like puzzles, optimization, or strategic planning. If a straightforward approach might fail or get stuck in a local optimum, ToT can explore many possibilities and find a better solution. It’s most relevant in scenarios where you can evaluate partial solutions or where multiple lines of reasoning should be compared. Keep in mind it can be resource-intensive; you wouldn’t use ToT for every query, but for those particularly tricky ones where breadth of search is needed, it’s invaluable.
  • Multi-Agent Systems: When your application naturally breaks into roles or requires multitasking, a multi-agent setup can be more scalable and maintainable. For instance, a complex assistant might have a “Researcher” agent for info gathering, a “Planner” agent to organize solutions, and a “Responder” agent to interact with the user. Multi-agent architectures introduce overhead in orchestrating agents, but they offer a way to manage complexity through clear separation of concerns. Use multi-agents if a single agent is getting overwhelmed (many tools, large contexts, diverse tasks) or if you want to incorporate domain experts (maybe even fine-tuned models) for different subtasks.

To circle back to our initial use case: a well-designed system might actually combine several of these patterns. For example, you might use a planner agent to outline the high-level approach with Plan-and-Execute, then spawn a research agent using ReAct to dynamically gather and evaluate information, perhaps even spinning up a verification agent to double-check findings, and finally a writer agent to compile the answer. This would be a multi-agent orchestration that internally leverages architectures like ReAct or ReWOO for each agent’s reasoning. While this may be overly complex for the tennis question scenario, it demonstrates how these architectures can complement one another effectively.

In practice, start with the simplest architecture that fits your needs and iterate. If your single agent is faltering due to complexity, consider breaking it into multi-agents. If it’s hallucinating facts, add retrieval. If it’s too slow or costly with many steps, try a planning approach or ReWOO to cut down model calls. And if it’s getting stuck on hard problems, give it a Tree-of-Thoughts boost to widen its search.

Each of these approaches was born out of specific shortcomings of a naïve LLM solution, and understanding those motivations will help you choose the right tool for the job. By leveraging the right architecture (or combination thereof), you can build AI systems that are more reliable, efficient, and capable of tackling the complex tasks you throw at them – whether it’s answering intricate questions, solving puzzles, or executing multi-step workflows.

References

  • Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. Presented at the International Conference on Learning Representations (ICLR).
  • Xu, B., Peng, Z., Lei, B., Mukherjee, S., Liu, Y., & Xu, D. (2023). ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models.
  • Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023). Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.
  • Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models.
  • Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A., White, R. W., Burger, D., & Wang, C. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.
