Anthropic Shares Detailed Best Practices for Building Effective AI Agents
Image Source: ChatGPT-4o
Anthropic has published comprehensive best practices for building large language model (LLM) agents, emphasizing simplicity and composable patterns over complex frameworks. Drawing from extensive collaborations with customers across industries, the company provides valuable guidance for developers seeking to create efficient and reliable agentic systems.
Defining AI Agents and Workflows
Anthropic categorizes agentic systems into two main types:
Workflows: Systems where LLMs follow predefined code paths to orchestrate tools and tasks.
Agents: Systems where LLMs dynamically direct their own processes, allowing for flexible decision-making and tool usage.
These distinctions help developers decide whether to implement workflows for predictable tasks or agents for scenarios requiring adaptability and model-driven reasoning.
When to Use Agents
When building applications with LLMs, Anthropic recommends starting with the simplest solutions and escalating complexity only when necessary. Agentic systems, while powerful, often trade speed and cost-efficiency for better performance on complex tasks. Key considerations include:
Workflows: Best for predictable, well-defined tasks requiring consistency and efficiency.
Agents: Ideal for open-ended problems requiring model-driven decision-making and adaptability at scale.
Tools and Frameworks
Various frameworks simplify agentic system development, such as:
LangGraph by LangChain
Amazon Bedrock’s AI Agent Framework
Rivet (GUI workflow builder)
Vellum (complex workflow builder)
While these tools offer convenience, Anthropic advises developers to first build with direct LLM API calls to maintain transparency and control.
These frameworks simplify the process of building agentic systems by handling foundational tasks such as calling LLMs, defining and parsing tools, and chaining prompts together. However, they often introduce additional layers of abstraction that can obscure the prompts and responses, making debugging more challenging. They may also encourage unnecessary complexity when a straightforward setup would be more effective.
To avoid these pitfalls, Anthropic recommends that developers begin with direct LLM API usage, as many patterns can be implemented with just a few lines of code. If you choose to use a framework, take the time to thoroughly understand its underlying mechanics. Misunderstandings about how the framework operates are a frequent cause of implementation errors.
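To illustrate how little code a direct call requires, here is a minimal sketch using Anthropic's Python SDK. The model name is illustrative, and the `call_llm` helper is our own naming rather than anything the SDK provides; later sketches in this article reuse it.

```python
# A minimal direct-API helper, no agent framework involved.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
import anthropic

client = anthropic.Anthropic()

def call_llm(prompt: str) -> str:
    """Send one prompt to the model and return its text reply."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```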
You can view Anthropic's cookbook for some sample implementations.
Building Blocks of Agentic Systems
At the core of agentic systems lies the augmented LLM, enhanced with retrieval capabilities, memory, and tools. Anthropic's models can leverage these capabilities autonomously—formulating their own search queries, choosing the most suitable tools, and deciding which information to retain.
Anthropic recommends tailoring these augmentations to specific use cases and ensuring they have clear, well-documented interfaces. There are various ways to implement these augmentations, but one effective approach is Anthropic's recently introduced Model Context Protocol, which enables developers to seamlessly integrate with a growing ecosystem of third-party tools using a straightforward client implementation.
Throughout the rest of this post, we’ll assume that each LLM call incorporates these augmented capabilities.
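To make "augmented" concrete, here is one way a tool can be attached to a call with the same SDK: the tool is just a JSON-schema description passed alongside the prompt, and the model decides whether to invoke it. The `get_order_status` tool below is invented purely for illustration.

```python
# Sketch of a tool-augmented call; the tool itself is hypothetical.
import anthropic

client = anthropic.Anthropic()

order_status_tool = {
    "name": "get_order_status",  # hypothetical tool
    "description": "Look up the shipping status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "The customer's order ID."}
        },
        "required": ["order_id"],
    },
}

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=1024,
    tools=[order_status_tool],
    messages=[{"role": "user", "content": "Where is order A-1027?"}],
)

# If the model chose to use the tool, the request arrives as a tool_use block.
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(tool_call.name, tool_call.input)  # e.g. get_order_status {'order_id': 'A-1027'}
```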
Common Agentic Workflows
In this section, we’ll dive deeper into the common workflows used in agentic systems, explaining what each entails, when it’s best applied, and real-world examples to help clarify their purpose and usage.
Prompt Chaining
What it is:
Prompt chaining involves breaking down a larger task into smaller, sequential steps. Each step is handled by an individual LLM call that processes the output of the previous step. This approach ensures that each LLM call is focused on a specific subtask, leading to more manageable and accurate results.
When to use this workflow:
Prompt chaining is ideal for tasks that can be easily decomposed into a series of well-defined steps. If each step in a task can be outlined clearly and performed sequentially, prompt chaining is an efficient way to break down complexity. It’s especially useful when the goal is to optimize for accuracy, even at the expense of some added latency.
Examples where prompt chaining is useful:
Marketing Copy: Generating initial marketing copy, then reviewing and refining it, followed by translating it into a different language.
Document Creation: Creating a document outline, checking the outline against certain criteria, and then drafting the document based on the outline.
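A minimal prompt-chaining sketch, following the marketing-copy example and reusing the `call_llm` helper sketched earlier (our own naming, not an SDK function); the prompts are simplified stand-ins.

```python
# Each step consumes the previous step's output: draft -> refine -> translate.
def marketing_copy_chain(product_brief: str) -> str:
    draft = call_llm(f"Write short marketing copy for this product:\n{product_brief}")
    refined = call_llm(f"Improve the tone and tighten this copy:\n{draft}")
    translated = call_llm(f"Translate this copy into French, preserving the tone:\n{refined}")
    return translated
```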
Routing
What it is:
Routing involves classifying an input and directing it to the appropriate follow-up task. This workflow allows an agent to differentiate between various types of tasks and assign each one to the most suitable process, tool, or model for handling it.
When to use this workflow:
Routing works best for tasks that can be clearly categorized and handled separately. It is useful when you need to optimize performance for different types of inputs, and where the classification step can be accurately done by an LLM or a traditional model.
Examples where routing is useful:
Customer Service: Directing customer queries into different categories, such as refund requests, technical support, or general inquiries, and then handling them with specialized models or processes.
Model Optimization: Routing common questions to smaller models for quicker responses and directing more complex queries to larger, more powerful models like Claude 3.5 Sonnet for improved accuracy.
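A routing sketch built on the same `call_llm` helper; the customer-service categories and the specialist prompts are illustrative, not prescribed by Anthropic.

```python
# Classify first, then hand the query to a category-specific prompt.
SPECIALIST_PROMPTS = {
    "refund": "You handle refund requests. Reply to this customer message:\n",
    "technical": "You are technical support. Reply to this customer message:\n",
    "general": "You answer general questions. Reply to this customer message:\n",
}

def route_query(query: str) -> str:
    category = call_llm(
        "Classify this customer message as exactly one of: refund, technical, general.\n"
        f"Message: {query}\nAnswer with the single word only."
    ).strip().lower()
    prompt = SPECIALIST_PROMPTS.get(category, SPECIALIST_PROMPTS["general"])
    return call_llm(prompt + query)
```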
Parallelization
What it is:
Parallelization allows multiple tasks to be performed simultaneously, either by sectioning a task into independent subtasks that run in parallel, or by voting, where the same task is run multiple times to get diverse outputs. This approach helps improve speed or produce more reliable results by leveraging multiple perspectives.
When to use this workflow:
Parallelization is effective when a task can be divided into subtasks that can be processed simultaneously, or when multiple attempts are needed to improve confidence in the results. It’s especially useful for tasks with complex, multi-faceted considerations or when you want diverse outputs for better accuracy.
Examples where parallelization is useful:
Sectioning: Implementing guardrails by using one LLM instance to process user queries while another screens the input for inappropriate content. Splitting the two jobs lets each call focus on a single task, which tends to improve performance.
Voting: Reviewing a piece of code for vulnerabilities by running several LLM calls with different prompts, each flagging potential problems, and then aggregating the results. Varying the prompts brings diverse perspectives, and tuning the vote threshold lets you balance false positives against false negatives.
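A sectioning-style parallelization sketch using Python's standard thread pool; the guardrail scenario mirrors the example above and still relies on the earlier `call_llm` helper.

```python
# Run the answer and the content screen at the same time (sectioning).
from concurrent.futures import ThreadPoolExecutor

def answer_with_guardrail(query: str) -> str:
    with ThreadPoolExecutor() as pool:
        answer_future = pool.submit(call_llm, f"Answer the user's question:\n{query}")
        screen_future = pool.submit(
            call_llm,
            f"Does this message contain inappropriate content? Answer YES or NO.\n{query}",
        )
        if "YES" in screen_future.result().upper():
            return "Sorry, I can't help with that request."
        return answer_future.result()
```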
Orchestrator-Workers
What it is:
The orchestrator-workers workflow involves a central LLM (the orchestrator) that breaks down tasks dynamically and delegates them to worker LLMs. After the workers complete their subtasks, the orchestrator synthesizes the results and ensures that the task is completed correctly.
When to use this workflow:
This workflow is ideal for tasks where you cannot predict the exact subtasks needed in advance, or where the number of steps required may vary depending on the specific inputs. Unlike parallelization, the orchestrator-workers workflow is more flexible, allowing for dynamic delegation based on real-time inputs.
Examples where orchestrator-workers is useful:
Software Development: A coding task that requires modifications to several files. The orchestrator can delegate changes to specific files based on the input task description and then consolidate the results.
Search Tasks: An orchestrator collects data from multiple sources, analyzes it, and identifies the most relevant information, adapting to the task’s needs as it progresses.
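A compact orchestrator-workers sketch, again using `call_llm`: the orchestrator plans subtasks at run time, workers handle each one, and the orchestrator synthesizes the results. The JSON-array planning format is our own assumption for illustration.

```python
import json

def orchestrate(task: str) -> str:
    # Orchestrator decides the subtasks dynamically based on the input task.
    plan = call_llm(
        "Break this task into a short list of independent subtasks.\n"
        f"Task: {task}\nReturn a JSON array of strings only."
    )
    subtasks = json.loads(plan)  # assumes the model followed the requested format

    # Workers each handle one subtask.
    results = [call_llm(f"Complete this subtask:\n{sub}") for sub in subtasks]

    # Orchestrator synthesizes the workers' output into one answer.
    return call_llm(
        "Combine these partial results into one coherent answer:\n" + "\n---\n".join(results)
    )
```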
Evaluator-Optimizer
What it is:
In the evaluator-optimizer workflow, one LLM generates an initial response, and another LLM evaluates and provides feedback. The evaluation helps refine the output through iterative rounds, where the evaluator suggests improvements based on specific criteria.
When to use this workflow:
This workflow is particularly useful when you have clear evaluation criteria and where iterative refinement offers significant value. It's effective in situations where the quality of the output can be measurably improved through feedback and adjustments.
Examples where evaluator-optimizer is useful:
Literary Translation: Translating text where the initial output may not fully capture all the nuances of the source material. An evaluator LLM reviews and refines the translation to ensure accuracy and context.
Complex Search Tasks: Running multiple rounds of searches and evaluations to gather comprehensive information, where the evaluator determines if further searches are necessary or if the task can be completed.
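An evaluator-optimizer sketch for the translation example, using `call_llm` with a fixed iteration cap; the "APPROVED" convention and the round limit are our own simplifications.

```python
def translate_with_review(text: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Translate this passage into English, preserving nuance:\n{text}")
    for _ in range(max_rounds):
        feedback = call_llm(
            "Critique this translation for accuracy and tone. "
            f"If it needs no changes, reply APPROVED.\n\nSource:\n{text}\n\nTranslation:\n{draft}"
        )
        if feedback.strip().startswith("APPROVED"):
            break  # evaluator is satisfied; stop iterating
        draft = call_llm(
            f"Revise the translation using this feedback:\n{feedback}\n\nTranslation:\n{draft}"
        )
    return draft
```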
Each of these workflows serves a different need depending on the complexity and requirements of the task. Understanding when and how to implement these workflows allows developers to build more efficient and adaptable agentic systems that can handle a wide range of tasks effectively. As with all AI system design, the key is to start simple, iteratively optimize, and add complexity only when necessary to meet your goals.
Agent Implementation: Best Practices
Agents operate autonomously, planning tasks and iterating based on real-time feedback, but the most sophisticated system isn't necessarily the best choice; the goal is the system that actually fits your needs (a bare-bones agent loop is sketched after the list below). Anthropic highlights these principles for building effective agents:
Simplicity: Avoid unnecessary complexity in agent design.
Transparency: Clearly display the agent’s decision-making steps.
Well-Designed Tools: Invest in thorough documentation and carefully designed interfaces for the tools the agent uses.
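To show what "iterating based on real-time feedback" looks like in practice, here is a bare-bones agent loop using the SDK's tool-use protocol; the `run_tool` dispatcher and the tool schemas are placeholders the caller would supply, and the model name is illustrative.

```python
import anthropic

client = anthropic.Anthropic()

def run_agent(task: str, tools: list, run_tool) -> str:
    """Loop: let the model act with tools until it stops asking for them.
    `tools` is a list of tool schemas; `run_tool(name, args)` executes one
    tool and returns a string result. Both are supplied by the caller."""
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative model name
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content[0].text  # model is done; return its answer
        # Execute each requested tool and feed the results back as feedback.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
```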
Practical Applications
Anthropic identifies two key areas where agents add significant value, but there are many more use cases across various industries and applications:
Customer Support: Combining conversational AI with enhanced tool integration for actions like accessing user data, order history, or processing refunds.
Coding Agents: Autonomous problem-solving for software development tasks, such as addressing GitHub issues or iterating on solutions using automated testing feedback.
Tool Design Tips
Anthropic highlights several best practices for designing effective agent-computer interfaces (ACI). These tips ensure tools are intuitive for models to use, reducing errors and improving overall performance:
Simplify Formats: Choose tool formats that are easy for the model to use, avoiding unnecessary complexity like precise line counts or excessive string-escaping. Ensure there’s no formatting "overhead" that could hinder the model's efficiency.
Provide Examples: Include clear usage examples, edge cases, and expected input/output formats in tool documentation to guide the model’s behavior effectively.
Use Intuitive Parameters: Name parameters descriptively and clearly, as if explaining them to a junior developer, to ensure ease of use.
Test Thoroughly: Run extensive tests with diverse inputs to identify and address potential tool usage errors. Iterate on designs based on test results.
Poka-Yoke Design: Design tools to prevent errors (e.g., requiring absolute file paths instead of relative ones to avoid directory missteps).
Think Like the Model: Consider whether the tool’s functionality and instructions are immediately clear from the perspective of the LLM. If not, simplify or clarify further.
In addition, Anthropic recommends these strategies when deciding on tool formats:
Allow the model enough tokens to "think" before it begins writing to avoid forcing it into a corner.
Keep formats familiar to the model, resembling structures it has encountered naturally in internet text.
Minimize formatting complexity, ensuring tasks like counting large numbers of lines or string-escaping code aren’t unnecessarily burdensome.
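Pulling several of these tips together, here is what a tool definition shaped along those lines might look like. The tool itself is invented for illustration, but note the descriptive parameter names, the documented usage example, and the absolute-path requirement acting as a poka-yoke guard.

```python
# A hypothetical tool spec illustrating the ACI tips above.
write_file_tool = {
    "name": "write_file",
    "description": (
        "Overwrite a file with new contents. "
        "Example: write_file(absolute_path='/home/user/project/app.py', "
        "contents='print(\"hi\")'). "
        "The path must be absolute; relative paths are rejected."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "absolute_path": {
                "type": "string",
                "description": "Full path starting with '/', never relative.",
            },
            "contents": {
                "type": "string",
                "description": "The complete new file contents, no escaping required.",
            },
        },
        "required": ["absolute_path", "contents"],
    },
}
```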
By focusing on clarity, simplicity, and robust testing, developers can optimize tools to enhance agentic systems' reliability and performance.
Building the Right System
Anthropic underscores that success in the LLM space is about building the right system, not the most complex one. Developers should start with simple prompts, optimize through evaluation, and add agentic systems only when necessary. By focusing on simplicity, transparency, and robust tooling, developers can create reliable and maintainable AI agents that effectively meet user needs.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.