Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker (deploy, sandbox, browser-use, shell, file ops …)
Table of Contents
- Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker (deploy, sandbox, browser-use, shell, file ops ...)
- Preface
- Part 1: Foundations of AI Agents
- Part 2: Core AI Agent Technologies
Preface
1.1 The Rise of AI Agents
In the era of large language models (LLMs), AI Agents have emerged as the next-generation intelligent systems that bridge the gap between static model outputs and dynamic real-world interactions. Unlike traditional AI applications that follow fixed workflows, AI Agents possess autonomy, adaptability, and interactivity—they can perceive environmental inputs, make decisions, execute actions, and learn from feedback to accomplish complex tasks independently. From automated code generation and intelligent document processing to multi-step data analysis and cross-system orchestration, AI Agents are reshaping software development, business operations, and daily productivity tools.
This book focuses on practical AI Agent development, integrating cutting-edge tools and frameworks to help readers turn theoretical concepts into actionable systems. We will leverage LangGraph for workflow orchestration, FastAPI for backend service construction, Vue for frontend interaction, and Docker for environment encapsulation and deployment—covering the entire lifecycle of AI Agents, including architecture design, algorithm optimization, source code implementation, sandbox isolation, browser-based access, shell interaction, and file operation integration.
1.2 Who This Book Is For
This book is intended for developers, engineers, and tech enthusiasts who want to build production-grade AI Agents. Readers should have the following foundational knowledge:
- Basic Python programming skills (familiar with functions, classes, and asyncio).
- Elementary understanding of LLMs (e.g., GPT, Claude, open-source models like Llama 3) and their application scenarios.
- Basic web development concepts (HTTP protocols, frontend-backend separation).
- Optional: Familiarity with Docker, databases, and command-line operations (will be supplemented in relevant chapters).
Whether you are a backend developer looking to integrate AI capabilities, a frontend engineer exploring intelligent interaction, or a researcher aiming to prototype AI Agent systems, this book provides a systematic guide from entry to advanced practice.
1.3 Technology Stack Overview
The core technology stack adopted in this book is carefully selected to balance ease of use, scalability, and production readiness. Here’s a breakdown of key tools:
- LangGraph: A workflow orchestration framework for AI Agents, built on LangChain. It enables stateful, cyclic, and branching workflows—critical for modeling Agent decision-making processes (e.g., "think-execute-reflect" loops).
- FastAPI: A high-performance, easy-to-use Python backend framework. It powers the Agent's API layer, enabling efficient communication between frontend, backend, and external tools (LLMs, databases, shell, etc.).
- Vue: A progressive JavaScript framework for building user interfaces. We use Vue to create intuitive browser-based dashboards for Agent monitoring, control, and interaction.
- Docker: A containerization platform that standardizes the Agent's runtime environment. It simplifies deployment, ensures consistency across development/production, and enables sandbox isolation for safe tool execution.
Supplementary tools include: LangChain (LLM integration), Pydantic (data validation), SQLite/PostgreSQL (state persistence), Redis (caching), and OpenAI/Cohere/Anthropic APIs (LLM backends). We will also cover shell command execution, file I/O operations, and sandboxing techniques to enhance Agent functionality and security.
1.4 How to Use This Book
This book follows a “theory-practice-source code” structure, with each chapter building on the previous one:
- Part 1: Foundations (Chapters 2-3): Covers AI Agent core concepts, architectures, and prerequisite technology setup (development environment, tool installation).
- Part 2: Core Technologies (Chapters 4-7): Dives into LangGraph workflow design, LLM integration, tool calling algorithms, and state management.
- Part 3: Full-Stack Integration (Chapters 8-10): Builds the backend with FastAPI, frontend with Vue, and integrates shell/file operations.
- Part 4: Deployment & Security (Chapters 11-13): Explores Docker containerization, sandboxing, cloud deployment, and performance optimization.
- Part 5: Advanced Practice (Chapters 14-16): Covers multi-Agent collaboration, memory optimization, and real-world case studies (e.g., code assistant, data analyst Agent).
- Appendices: Provides source code repositories, troubleshooting guides, and extended resources.
Each chapter includes hands-on examples with complete source code. We recommend readers run the code step-by-step, modify parameters, and experiment with extensions to deepen understanding. The book’s GitHub repository contains all code, configuration files, and Docker images for quick setup.
1.5 Acknowledgments
We would like to thank the open-source communities behind LangGraph, LangChain, FastAPI, Vue, and Docker—their tools have laid the foundation for this book. Thanks to AI researchers and engineers worldwide for advancing the field of intelligent Agents. Finally, gratitude to readers for choosing this book—we hope it empowers you to build innovative AI Agent systems.
Part 1: Foundations of AI Agents
Chapter 2: What Are AI Agents?
2.1 Definition and Core Characteristics
An AI Agent is an intelligent system that perceives its environment, makes decisions based on goals and context, executes actions, and adapts to feedback. Unlike static LLM applications that generate one-time outputs, AI Agents exhibit four core characteristics:
- Autonomy: Ability to initiate and execute tasks without continuous human intervention (e.g., automatically searching for data to answer a query).
- Proactivity: Takes initiative to pursue goals (e.g., identifying missing information and fetching it proactively).
- Adaptability: Adjusts behavior based on new inputs or feedback (e.g., correcting errors after failed tool execution).
- Interactivity: Communicates with the environment (LLMs, tools, users) and other Agents (e.g., collaborating with a code Agent to debug scripts).
AI Agents extend LLMs by adding “action capabilities”—they are not just “thinkers” but “doers” that connect LLMs to the real world via tools.
2.2 AI Agent Architectures
While AI Agent architectures vary by use case, most follow a modular design with four core components. We will focus on the Think-Plan-Act-Reflect architecture, which balances simplicity and expressiveness for production systems.
2.2.1 Core Components
- Perception Module: Collects inputs from the environment—including user queries, tool outputs, file contents, shell feedback, and browser events. It normalizes diverse inputs into a unified format for the Agent to process.
- Planning & Decision Module: The "brain" of the Agent, powered by LLMs and LangGraph. It interprets perception data, decomposes complex tasks into sub-tasks, prioritizes actions, and decides which tools to use (e.g., "Should I read a file first or run a shell command?").
- Action Execution Module: Executes decisions via tools—including LLM calls, shell commands, file operations, API requests, and database queries. It handles tool invocation, error handling, and result formatting.
- Memory Module: Stores stateful information, including task history, intermediate results, user preferences, and tool metadata. Memory is critical for context-aware decisions (e.g., recalling a previous file path to avoid re-querying).
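To make the four modules concrete before we introduce LangGraph, here is a minimal, hypothetical Python skeleton; the class and method names (Memory, SimpleAgent.perceive, SimpleAgent.run) are illustrative only and are not part of any library used later in the book:

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Memory:
    """Memory Module: stores task history for context-aware decisions."""
    history: List[Dict[str, Any]] = field(default_factory=list)

    def remember(self, record: Dict[str, Any]) -> None:
        self.history.append(record)

class SimpleAgent:
    """Illustrative wiring of the four modules; real implementations follow in Parts 2-3."""
    def __init__(self, planner: Callable[[str, Memory], List[str]],
                 tools: Dict[str, Callable[[str], str]]):
        self.planner = planner   # Planning & Decision Module (LLM-backed in practice)
        self.tools = tools       # Action Execution Module (file, shell, API tools)
        self.memory = Memory()   # Memory Module

    def perceive(self, raw_input: str) -> str:
        # Perception Module: normalize raw input into a unified format
        return raw_input.strip()

    def run(self, user_query: str) -> List[str]:
        query = self.perceive(user_query)
        plan = self.planner(query, self.memory)          # e.g. ["file_read:./data/sales.csv"]
        results: List[str] = []
        for step in plan:
            tool_name, _, arg = step.partition(":")
            tool = self.tools.get(tool_name)
            output = tool(arg) if tool else f"Unknown tool: {tool_name}"
            self.memory.remember({"step": step, "output": output})
            results.append(output)
        return results

Chapter 4 replaces this hand-rolled loop with a LangGraph workflow, and Chapter 5 replaces the bare callables with structured LangChain tools.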
2.2.2 Workflow Example: A File-Analyzing Agent
Let’s walk through a simple workflow to illustrate how components collaborate:
1. User inputs a query: "Analyze the sales data in ./data/sales.csv and generate a summary."
2. Perception Module: Reads the user query and validates the existence of the CSV file.
3. Planning Module: Decides to (1) read the CSV file, (2) extract key metrics (revenue, top products), (3) generate a summary via LLM.
4. Action Execution Module:
   - Executes a file read operation to load sales.csv.
   - Uses Pandas (via a tool) to compute metrics.
   - Calls GPT-4 to generate a summary from the metrics.
5. Memory Module: Stores the file path, computed metrics, and LLM summary for future reference.
6. Perception Module: Returns the summary to the user via the browser interface.
2.3 Types of AI Agents
AI Agents can be categorized by their capabilities and use cases:
- Single-Tool Agents: Specialized in one task (e.g., a file reader Agent that only parses text files). Simple to build but limited in scope.
- Multi-Tool Agents: Integrate multiple tools (LLMs, shell, files, APIs) to handle complex tasks (e.g., a data analyst Agent that reads files, runs SQL queries, and generates visualizations).
- Conversational Agents: Focus on natural language interaction with users (e.g., chatbots with action capabilities like booking flights).
- Autonomous Agents: Operate without human input once initialized (e.g., a monitoring Agent that scans logs, detects anomalies, and sends alerts).
- Multi-Agent Systems: Multiple Agents collaborate to accomplish a goal (e.g., a code Agent + test Agent + deployment Agent working together on software delivery).
This book focuses on building multi-tool, conversational, and deployable Agents—with extensions to multi-Agent systems in advanced chapters.
Chapter 3: Environment Setup and Prerequisites
3.1 Development Environment Requirements
Before starting, ensure your system meets the following requirements:
- Operating System: Windows 10+/macOS 12+/Linux (Ubuntu 20.04+ recommended; Docker works best on Linux/macOS).
- Python: 3.10+ (required for LangGraph and FastAPI compatibility).
- Node.js: 18+ (for Vue frontend development).
- Docker & Docker Compose: 20.10+ (for containerization).
- Hardware: 8GB+ RAM (16GB+ recommended for running LLMs locally), 20GB+ free disk space.
- API Keys: OpenAI/Anthropic/Cohere API key (for cloud LLMs) or local LLM setup (e.g., Llama 3 via Ollama).
3.2 Installing Core Tools
3.2.1 Python and Virtual Environment
First, install Python 3.10+ from python.org. Then create a virtual environment to isolate dependencies:
# Create virtual environment
python -m venv agent-env
# Activate environment (Linux/macOS)
source agent-env/bin/activate
# Activate environment (Windows)
agent-env\Scripts\activate
# Install core Python packages
pip install langgraph langchain langchain-openai fastapi uvicorn pydantic python-multipart python-dotenv pandas
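The Docker smoke test in Section 3.4 copies a requirements.txt into the image, so it is convenient to mirror the packages above in such a file at the project root (a minimal sketch; versions are left unpinned here, pin the ones you verify locally):

# requirements.txt
langgraph
langchain
langchain-openai
fastapi
uvicorn
pydantic
python-multipart
python-dotenv
pandas

Installing with pip install -r requirements.txt keeps the virtual environment and the container image in sync.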
3.2.2 Node.js and Vue CLI
Install Node.js from nodejs.org (includes npm). Then install Vue CLI to build the frontend:
# Install Vue CLI globally
npm install -g @vue/cli
# Verify installation
vue --version # Should output 5.x+
3.2.3 Docker and Docker Compose
Install Docker following official guides:
Docker Installation | Docker Compose Installation
Verify installation:
docker --version
docker compose --version
3.2.4 API Keys and Configuration
Create a .env file in your project root to store sensitive information (never commit this file to version control):
# .env file
OPENAI_API_KEY=your-openai-api-key
# Optional:
ANTHROPIC_API_KEY=your-anthropic-api-key
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
VUE_APP_API_URL=http://localhost:8000/api
For local LLM setup (e.g., Llama 3), install Ollama from ollama.com and pull the model:
ollama pull llama3:8b
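If you take the local-model route, the ChatOpenAI calls used in later chapters can generally be swapped for a local chat model. A minimal smoke test, assuming the langchain-community package is installed and the Ollama daemon is running with llama3:8b pulled:

# test_ollama.py — quick local-LLM smoke test (assumes a running Ollama daemon)
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3:8b", temperature=0.2)
response = llm.invoke("In one sentence, what is an AI Agent?")
print(response.content)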
3.3 Project Structure
We will use a modular project structure to separate frontend, backend, Agent logic, and deployment configs. Create the following directories:
ai-agent-project/
├── agent/ # AI Agent core logic (LangGraph workflows, tools)
│ ├── workflows/ # LangGraph graph definitions
│ ├── tools/ # Tool implementations (file, shell, LLM)
│ ├── memory/ # Memory modules (in-memory, database)
│ └── __init__.py
├── backend/ # FastAPI backend
│ ├── api/ # API routes
│ ├── models/ # Pydantic models
│ ├── services/ # Business logic
│ ├── main.py # FastAPI entry point
│ └── __init__.py
├── frontend/ # Vue frontend
│ ├── public/
│ ├── src/ # Vue components, routes, API calls
│ ├── package.json
│ └── vue.config.js
├── docker/ # Docker configs
│ ├── agent.Dockerfile
│ ├── backend.Dockerfile
│ ├── frontend.Dockerfile
│ └── docker-compose.yml
├── data/ # Sample data, file storage
├── .env # Environment variables
├── .gitignore
└── README.md
This structure ensures scalability—you can easily add new tools, API routes, or frontend components without disrupting existing code.
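Because the .env file must never be committed (Section 3.2.4), it helps to add a .gitignore alongside it; a minimal sketch for this layout (adjust to your needs):

# .gitignore
.env
agent-env/
__pycache__/
*.pyc
frontend/node_modules/
frontend/dist/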
3.4 Verifying the Setup
Let’s run a quick test to confirm all tools are working:
- Test Python Environment: Run a simple LangGraph script to create a minimal workflow:
# test_langgraph.py
from langgraph.graph import Graph
from langchain_openai import ChatOpenAI
import os
from dotenv import load_dotenv

load_dotenv()
llm = ChatOpenAI(model="gpt-3.5-turbo", api_key=os.getenv("OPENAI_API_KEY"))

# Define a simple graph: prompt → LLM → output
graph = Graph()

def prompt_node(state):
    return {"prompt": f"Summarize: {state['input']}"}

def llm_node(state):
    response = llm.invoke(state["prompt"])
    return {"output": response.content}

graph.add_node("prompt", prompt_node)
graph.add_node("llm", llm_node)
graph.add_edge("prompt", "llm")
graph.set_entry_point("prompt")
graph.set_finish_point("llm")

# Run the graph
app = graph.compile()
result = app.invoke({"input": "AI Agents are intelligent systems that act autonomously."})
print("LLM Output:", result["output"])

Run the script: python test_langgraph.py. You should see a summary from GPT-3.5-turbo.
- Test FastAPI: Create a minimal backend endpoint:

# backend/main.py
import os

from fastapi import FastAPI
from dotenv import load_dotenv

load_dotenv()
app = FastAPI(title="AI Agent Backend")

@app.get("/api/health")
async def health_check():
    return {"status": "healthy", "message": "AI Agent backend is running."}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=os.getenv("FASTAPI_HOST"), port=int(os.getenv("FASTAPI_PORT")))

Run the backend: python backend/main.py. Visit http://localhost:8000/api/health in your browser—you should see the health check response.
- Test Vue: Create a new Vue project in the frontend directory:

cd frontend
vue create .   # Select "Default ([Vue 3] babel, eslint)"
npm run serve

Visit http://localhost:8080—you should see the default Vue welcome page.

- Test Docker: Create a simple Dockerfile in the project root:

# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "test_langgraph.py"]

Build and run the container: docker build -t test-agent . && docker run test-agent. You should see the LLM output again.
If all tests pass, your environment is ready for AI Agent development!
Part 2: Core AI Agent Technologies
Chapter 4: LangGraph for Agent Workflow Orchestration
4.1 Introduction to LangGraph
LangGraph is a graph-based workflow orchestration framework designed explicitly for AI Agents. Built on LangChain, it addresses a key limitation of linear LLM pipelines: the ability to model cyclic, stateful, and branching workflows. Unlike traditional DAG (Directed Acyclic Graph) tools, LangGraph supports loops (critical for “reflect” steps) and dynamic state management—making it ideal for Agent decision-making.
Key advantages of LangGraph for AI Agents:
- State Persistence: Maintains a shared state across workflow nodes (e.g., task history, tool outputs) for context-aware decisions.
- Cyclic Workflows: Enables loops for "think-execute-reflect" cycles (e.g., re-trying a failed tool call or refining a plan based on feedback).
- Modular Nodes: Each node is a reusable function (e.g., tool caller, planner, reflector) that can be swapped or extended.
- Error Handling: Built-in support for node-level error handling and fallback actions (e.g., switching to a different LLM if one fails).
- LangChain Compatibility: Seamlessly integrates with LangChain's tools, LLMs, and memory modules.
4.2 LangGraph Core Concepts
4.2.1 Graph
A Graph is the top-level object that defines the Agent’s workflow. It consists of nodes, edges, entry points, and finish points. LangGraph supports two types of graphs:
- StateGraph: For workflows with shared state (most Agent use cases).
- Graph: For simple linear workflows without state persistence (rarely used for Agents).
We will focus on StateGraph for all examples in this book.
4.2.2 Nodes
Nodes are the building blocks of a LangGraph workflow—each node is a function that processes input state and returns updated state. There are three types of nodes:
- Functional Nodes: Simple Python functions that take state as input and return state (e.g., prompt formatting, tool execution).
- LLM Nodes: Wrappers for LLMs that generate text based on state (e.g., planner nodes that generate task lists).
- Conditional Nodes: Nodes that determine the next edge to take based on state (e.g., "if tool execution failed, go to reflect node; else, go to finish node").
Nodes are added to the graph with graph.add_node(node_id, node_function).
4.2.3 State
State is a shared dictionary (or Pydantic model) that flows through the graph. It stores all context needed for the Agent to make decisions, including:
- User input and goals.
- Intermediate tool outputs and errors.
- Plans and sub-tasks.
- Memory snapshots.
Using Pydantic models for state is recommended for type safety and data validation. For example:
from pydantic import BaseModel
from typing import List, Optional
class AgentState(BaseModel):
user_query: str
plan: Optional[List[str]] = None
tool_outputs: Optional[dict] = None
error: Optional[str] = None
final_response: Optional[str] = None
4.2.4 Edges
Edges define the flow of state between nodes. There are two types of edges:
- Direct Edges: Fixed transitions from one node to another (e.g., graph.add_edge("planner", "tool_caller")).
- Conditional Edges: Dynamic transitions based on state (e.g., "if state.error is not None, go to 'reflect'; else, go to 'finish'").
4.2.5 Entry and Finish Points
The entry point is the node where the workflow starts (graph.set_entry_point("entry_node")). The finish point is the node where the workflow ends (graph.set_finish_point("finish_node")); multiple finish points can be defined for complex workflows.
4.3 Building a Simple Agent with LangGraph
Let’s build a minimal Agent that follows the “Plan-Tool-Execute” workflow. This Agent will: (1) Generate a plan from the user query, (2) Execute a tool (file read) based on the plan, (3) Generate a final response.
4.3.1 Step 1: Define State Model
# agent/workflows/plan_tool_execute.py
from pydantic import BaseModel
from typing import Optional, List
class PlanToolExecuteState(BaseModel):
user_query: str
plan: Optional[List[str]] = None
file_content: Optional[str] = None
final_response: Optional[str] = None
error: Optional[str] = None
4.3.2 Step 2: Implement Nodes
First, create a planner node that generates a plan using an LLM:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.2)
# Planner node: Generate a plan to answer the user query
planner_prompt = ChatPromptTemplate.from_messages([
("system", "You are a planner. Given a user query, generate a 1-2 step plan to answer it. "
"If the query requires reading a file, include 'Read the specified file' in the plan. "
"Return only the plan as a list of strings."),
("human", "User Query: {user_query}")
])
planner_chain = planner_prompt | llm
def planner_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    plan_str = planner_chain.invoke({"user_query": state.user_query}).content
    # Parse plan string to list (simplified; add error handling in production)
    plan = [step.strip() for step in plan_str.strip().split("\n") if step.strip()]
    # Return an updated copy of the state (avoids passing the same field twice to the constructor)
    return state.copy(update={"plan": plan})
Next, create a tool caller node that reads a file (we’ll implement the file tool in Chapter 5):
import os
# Tool caller node: Execute file read tool
def file_read_tool(file_path: str) -> str:
    """Simple file read tool."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File {file_path} not found.")
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

def tool_caller_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    if not state.plan:
        return state.copy(update={"error": "No plan generated."})
    # Extract file path from user query (simplified; harden the regex in production)
    file_path = None
    if "file" in state.user_query.lower():
        # Assume user query includes a file path (e.g., "Read ./data/sales.csv")
        import re
        match = re.search(r"(\./[\w/\.]+)", state.user_query)
        if match:
            file_path = match.group(1)
    if not file_path:
        return state.copy(update={"error": "No file path specified in query."})
    try:
        content = file_read_tool(file_path)
        return state.copy(update={"file_content": content})
    except Exception as e:
        return state.copy(update={"error": str(e)})
Finally, create a response generator node that creates a final answer from the file content:
# Response generator node: Generate final response
response_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant. Use the file content to answer the user query. "
"If there's an error, inform the user. Keep the response concise."),
("human", "User Query: {user_query}\nFile Content: {file_content}\nError: {error}")
])
response_chain = response_prompt | llm
def response_generator_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    response = response_chain.invoke({
        "user_query": state.user_query,
        "file_content": state.file_content or "No file content available.",
        "error": state.error or "No errors."
    }).content
    return state.copy(update={"final_response": response})
4.3.3 Step 3: Assemble the Graph
from langgraph.graph import StateGraph
# Create a stateful graph over the Pydantic state model
graph = StateGraph(PlanToolExecuteState)
# Add nodes
graph.add_node("planner", planner_node)
graph.add_node("tool_caller", tool_caller_node)
graph.add_node("response_generator", response_generator_node)
# Add edges
graph.add_edge("planner", "tool_caller")
graph.add_edge("tool_caller", "response_generator")
# Set entry and finish points
graph.set_entry_point("planner")
graph.set_finish_point("response_generator")
# Compile the graph into an executable app
app = graph.compile()
4.3.4 Step 4: Run the Agent
# Test the Agent
if __name__ == "__main__":
    # Create a sample file
    sample_file = "./data/sample.txt"
    os.makedirs("./data", exist_ok=True)
    with open(sample_file, "w", encoding="utf-8") as f:
        f.write("LangGraph is a powerful framework for AI Agent workflow orchestration. "
                "It supports stateful workflows and cyclic execution.")
    # Run the Agent with a user query
    user_query = f"Read the file {sample_file} and summarize its content."
    result = app.invoke({"user_query": user_query})
    print("Final Response:", result["final_response"])
Expected output: A summary of the sample file content, generated by the LLM.
4.4 Advanced LangGraph Features
4.4.1 Conditional Edges
Add a conditional edge to handle errors (e.g., skip tool execution if there’s a plan error). Modify the graph assembly:
def conditional_edge(state: PlanToolExecuteState) -> str:
"""Determine next node based on state."""
if state.error:
return "response_generator" # Skip tool execution if error exists
return "tool_caller"
# Wire a conditional edge out of "planner" instead of the direct edge added earlier
graph.add_conditional_edges(
    source="planner",
    path=conditional_edge,
    # Map the condition's return values to node IDs
    path_map={
        "tool_caller": "tool_caller",
        "response_generator": "response_generator"
    }
)
4.4.2 Cyclic Workflows (Reflect Loop)
Add a reflect node to retry failed tool calls. First, implement the reflect node:
# Reflect node: Decide whether to retry or exit on error
reflect_prompt = ChatPromptTemplate.from_messages([
("system", "You are a reflector. Given an error, decide if we should retry the tool call "
"or exit. Return 'retry' or 'exit'."),
("human", "Error: {error}\nPlan: {plan}")
])
reflect_chain = reflect_prompt | llm
def reflect_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    decision = reflect_chain.invoke({
        "error": state.error,
        "plan": state.plan
    }).content.lower()
    # Prepend the reflect decision to the existing error message
    return state.copy(update={"error": f"Decision: {decision}; {state.error}"})
# Add reflect node to graph
graph.add_node("reflect", reflect_node)
# Update conditional edges for tool caller errors
def tool_caller_conditional(state: PlanToolExecuteState) -> str:
if state.error and "not found" in state.error.lower():
return "reflect"
return "response_generator"
# Update edges
graph.add_conditional_edges(
    source="tool_caller",
    path=tool_caller_conditional,
    path_map={
        "reflect": "reflect",
        "response_generator": "response_generator"
    }
)
# Add edge from reflect back to the tool caller (retry loop)
graph.add_conditional_edges(
    source="reflect",
    # Return a consistent boolean: retry only when the reflect decision asked for it
    path=lambda state: bool(state.error and "retry" in state.error.lower()),
    path_map={
        True: "tool_caller",
        False: "response_generator"
    }
)
This creates a loop: if the file is not found, the Agent reflects, decides whether to retry, and either re-runs the tool caller or exits.
4.4.3 Parallel Execution
LangGraph supports parallel node execution for tasks like running multiple tools simultaneously. For example, read two files in parallel:
# Define two file read nodes (each appends its content to the shared state)
def file_read_tool_1(state: PlanToolExecuteState) -> PlanToolExecuteState:
    content = file_read_tool("./data/file1.txt")
    return state.copy(update={"file_content": (state.file_content or "") + "\n" + content})

def file_read_tool_2(state: PlanToolExecuteState) -> PlanToolExecuteState:
    content = file_read_tool("./data/file2.txt")
    return state.copy(update={"file_content": (state.file_content or "") + "\n" + content})

# Add the parallel nodes
graph.add_node("read_file_1", file_read_tool_1)
graph.add_node("read_file_2", file_read_tool_2)

# Fan out from the planner to both readers, then fan back in to the response generator.
# Nodes that share the same upstream node run in the same step, i.e., in parallel.
# Note: if both branches write the same state key concurrently, define a reducer on that key.
graph.add_edge("planner", "read_file_1")
graph.add_edge("planner", "read_file_2")
graph.add_edge("read_file_1", "response_generator")
graph.add_edge("read_file_2", "response_generator")
4.5 Best Practices for LangGraph Workflows
- Keep Nodes Small and Reusable: Each node should handle one task (e.g., planning, tool execution) to simplify testing and maintenance.
- Use Pydantic for State: Type safety prevents bugs and makes state manipulation clearer.
- Limit Cycle Depth: Add a max retry count to cyclic workflows to avoid infinite loops (a sketch follows this list).
- Log State Changes: Add logging to nodes to track state transitions and debug workflows.
- Test Nodes in Isolation: Test individual nodes before assembling the graph to catch errors early.
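For the "Limit Cycle Depth" practice, one option is to carry a retry counter in the state and route to the response generator once the budget is exhausted. The sketch below assumes two extra fields, retry_count and max_retries, are added to PlanToolExecuteState, and that it replaces the reflect-to-tool_caller wiring from Section 4.4.2:

# Assumed additions to the state model:
#   retry_count: int = 0
#   max_retries: int = 2
def reflect_node_with_limit(state: PlanToolExecuteState) -> PlanToolExecuteState:
    # Count this reflection before deciding whether another retry is allowed
    return state.copy(update={"retry_count": state.retry_count + 1})

def retry_or_exit(state: PlanToolExecuteState) -> bool:
    # Retry only while budget remains and the reflector asked for a retry
    wants_retry = bool(state.error and "retry" in state.error.lower())
    return wants_retry and state.retry_count < state.max_retries

graph.add_conditional_edges(
    source="reflect",
    path=retry_or_exit,
    path_map={True: "tool_caller", False: "response_generator"}
)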
Chapter 5: Tool Integration for AI Agents
5.1 Tool Design Principles for AI Agents
Tools are the “hands and feet” of AI Agents—they enable Agents to interact with the external world (files, shell, APIs, databases). Effective tool design is critical for Agent usability and reliability. Follow these principles:
- Single Responsibility: Each tool should perform one specific action (e.g., file_read, shell_execute, api_call)—avoid multi-purpose tools that complicate decision-making.
- Clear Input/Output Schemas: Define explicit input parameters (e.g., file_path for file read) and output formats (e.g., string content, JSON) to help the Agent parse results.
- Error Handling: Return structured errors (e.g., FileNotFoundError, PermissionDeniedError) instead of crashing—enabling the Agent to handle failures gracefully.
- Idempotency: Ensure tools can be re-run safely (e.g., reading a file multiple times doesn't change the file) to support retry loops.
- Security: Sandbox high-risk tools (e.g., shell execution) to prevent malicious actions (covered in Chapter 12).
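One way to apply the schema and error-handling principles above is to have every tool return the same small result structure instead of raw strings or raised exceptions. The ToolResult model below is a hypothetical convention for illustration, not a LangChain class:

from typing import Optional
from pydantic import BaseModel

class ToolResult(BaseModel):
    ok: bool                       # True when the tool succeeded
    output: Optional[str] = None   # Tool output when ok is True
    error: Optional[str] = None    # Structured error message when ok is False

def safe_file_read(file_path: str) -> ToolResult:
    """Example wrapper: never raises, always returns a ToolResult the Agent can parse."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return ToolResult(ok=True, output=f.read())
    except FileNotFoundError:
        return ToolResult(ok=False, error=f"FileNotFoundError: {file_path} does not exist.")
    except PermissionError:
        return ToolResult(ok=False, error=f"PermissionError: cannot read {file_path}.")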
5.2 Tool Abstraction with LangChain
LangChain provides a BaseTool abstract class that standardizes tool implementation. We use this class to wrap custom tools, enabling seamless integration with LangGraph and LLMs. The core methods of BaseTool are:
- _run(): Synchronous tool execution (for simple tools).
- _arun(): Asynchronous tool execution (for I/O-bound tasks like API calls).
- description: A string describing the tool's purpose, inputs, and outputs—critical for the Agent to decide when to use the tool.
5.3 Implementing Core Tools
5.3.1 File Operation Tools
Implement tools for reading, writing, and listing files—essential for document processing Agents. We’ll use LangChain’s BaseTool and Pydantic for input validation.
# agent/tools/file_tools.py
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Optional, Type
import os
# Input schema for file read tool
class FileReadInput(BaseModel):
file_path: str = Field(description="Path to the file to read (e.g., ./data/sales.csv)")
class FileReadTool(BaseTool):
name: str = "file_read"
description: str = "Reads the content of a text file. Use this when you need to access data from a file."
args_schema: Type[BaseModel] = FileReadInput
def _run(self, file_path: str) -> str:
"""Synchronous file read."""
try:
if not os.path.exists(file_path):
return f"Error: File '{file_path}' not found."
if not os.path.isfile(file_path):
return f"Error: '{file_path}' is not a file."
with open(file_path, "r", encoding="utf-8") as f:
content = f.read()
return f"Successfully read file '{file_path}'. Content:\n{content[:1000]}..." # Truncate long content
except PermissionError:
return f"Error: Permission denied when reading '{file_path}'."
except Exception as e:
return f"Error reading file: {str(e)}"
async def _arun(self, file_path: str) -> str:
"""Asynchronous file read (for async workflows)."""
return self._run(file_path) # File I/O is sync; use aiofiles for true async
# Input schema for file write tool
class FileWriteInput(BaseModel):
file_path: str = Field(description="Path to the file to write (e.g., ./output/result.txt)")
content: str = Field(description="Content to write to the file")
overwrite: bool = Field(default=False, description="Whether to overwrite the file if it exists")
class FileWriteTool(BaseTool):
name: str = "file_write"
description: str = "Writes content to a text file. Use this when you need to save results to a file."
args_schema: Type[BaseModel] = FileWriteInput
def _run(self, file_path: str, content: str, overwrite: bool = False) -> str:
try:
# Create directory if it doesn't exist
dir_path = os.path.dirname(file_path)
if dir_path and not os.path.exists(dir_path):
os.makedirs(dir_path)
# Check if file exists
if os.path.exists(file_path) and not overwrite:
return f"Error: File '{file_path}' already exists. Set overwrite=True to replace it."
with open(file_path, "w" if overwrite else "x", encoding="utf-8") as f:
f.write(content)
return f"Successfully wrote content to '{file_path}'."
except PermissionError:
return f"Error: Permission denied when writing to '{file_path}'."
except FileExistsError:
return f"Error: File '{file_path}' already exists. Set overwrite=True to replace it."
except Exception as e:
return f"Error writing file: {str(e)}"
async def _arun(self, file_path: str, content: str, overwrite: bool = False) -> str:
return self._run(file_path, content, overwrite)
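As the _arun comments above note, the file I/O can be made truly asynchronous. A minimal sketch using the third-party aiofiles package (an extra dependency, not installed by the pip command in Chapter 3), which could serve as the body of FileReadTool._arun:

# Truly asynchronous file read (pip install aiofiles)
import aiofiles

async def async_file_read(file_path: str) -> str:
    try:
        async with aiofiles.open(file_path, "r", encoding="utf-8") as f:
            content = await f.read()
        return f"Successfully read file '{file_path}'. Content:\n{content[:1000]}..."
    except FileNotFoundError:
        return f"Error: File '{file_path}' not found."
    except PermissionError:
        return f"Error: Permission denied when reading '{file_path}'."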
5.3.2 Shell Execution Tool
Shell tools enable Agents to run command-line commands—useful for system administration, data processing (e.g., csvcut), and automation. Note: Shell tools are high-risk—always sandbox them (Chapter 12).
# agent/tools/shell_tools.py
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
import subprocess
class ShellExecuteInput(BaseModel):
command: str = Field(description="Shell command to execute (e.g., 'ls ./data', 'python script.py')")
timeout: int = Field(default=10, description="Timeout for the command in seconds (prevents hanging)")
class ShellExecuteTool(BaseTool):
name: str = "shell_execute"
description: str = "Executes a shell command. Use this for system operations, file system queries, or running scripts. "
"Avoid dangerous commands (rm -rf, sudo) unless explicitly authorized."
args_schema: Type[BaseModel] = ShellExecuteInput
def _run(self, command: str, timeout: int = 10) -> str:
try:
# Run command and capture output
result = subprocess.run(
command,
shell=True,
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
timeout=timeout
)
return f"Command succeeded. Stdout:\n{result.stdout}\nStderr:\n{result.stderr}"
except subprocess.CalledProcessError as e:
return f"Command failed with exit code {e.returncode}. Stdout:\n{e.stdout}\nStderr:\n{e.stderr}"
except subprocess.TimeoutExpired:
return f"Error: Command timed out after {timeout} seconds."
except Exception as e:
return f"Error executing command: {str(e)}"
async def _arun(self, command: str, timeout: int = 10) -> str:
# Run sync command in async context (use asyncio.subprocess for true async)
return self._run(command, timeout)
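Before wiring these tools into a LangGraph workflow, it is worth exercising them directly; LangChain's BaseTool exposes a run() method that validates input against args_schema. A quick smoke test (assuming the file tools above live in agent/tools/file_tools.py as shown):

# Quick smoke test for the tools defined above
if __name__ == "__main__":
    from agent.tools.file_tools import FileReadTool, FileWriteTool

    write_tool = FileWriteTool()
    read_tool = FileReadTool()
    shell_tool = ShellExecuteTool()

    print(write_tool.run({"file_path": "./data/hello.txt", "content": "hello agent", "overwrite": True}))
    print(read_tool.run({"file_path": "./data/hello.txt"}))
    print(shell_tool.run({"command": "echo tools are wired up", "timeout": 5}))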