Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker (deploy, sandbox, browser-use, shell, file ops …)
Table of Contents
- Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker (deploy, sandbox, browser-use, shell, file ops ...)
- Preface
- Part 1: Foundations of AI Agents
- Part 2: Core AI Agent Technologies
Preface
1.1 The Rise of AI Agents
In the era of large language models (LLMs), AI Agents have emerged as the next-generation intelligent systems that bridge the gap between static model outputs and dynamic real-world interactions. Unlike traditional AI applications that follow fixed workflows, AI Agents possess autonomy, adaptability, and interactivity—they can perceive environmental inputs, make decisions, execute actions, and learn from feedback to accomplish complex tasks independently. From automated code generation and intelligent document processing to multi-step data analysis and cross-system orchestration, AI Agents are reshaping software development, business operations, and daily productivity tools.
This book focuses on practical AI Agent development, integrating cutting-edge tools and frameworks to help readers turn theoretical concepts into actionable systems. We will leverage LangGraph for workflow orchestration, FastAPI for backend service construction, Vue for frontend interaction, and Docker for environment encapsulation and deployment—covering the entire lifecycle of AI Agents, including architecture design, algorithm optimization, source code implementation, sandbox isolation, browser-based access, shell interaction, and file operation integration.
1.2 Who This Book Is For
This book is intended for developers, engineers, and tech enthusiasts who want to build production-grade AI Agents. Readers should have the following foundational knowledge:
- Basic Python programming skills (familiar with functions, classes, and asyncio).
- Elementary understanding of LLMs (e.g., GPT, Claude, open-source models like Llama 3) and their application scenarios.
- Basic web development concepts (HTTP protocols, frontend-backend separation).
- Optional: Familiarity with Docker, databases, and command-line operations (will be supplemented in relevant chapters).
Whether you are a backend developer looking to integrate AI capabilities, a frontend engineer exploring intelligent interaction, or a researcher aiming to prototype AI Agent systems, this book provides a systematic guide from entry to advanced practice.
1.3 Technology Stack Overview
The core technology stack adopted in this book is carefully selected to balance ease of use, scalability, and production readiness. Here’s a breakdown of key tools:
- LangGraph: A workflow orchestration framework for AI Agents, built on LangChain. It enables stateful, cyclic, and branching workflows—critical for modeling Agent decision-making processes (e.g., "think-execute-reflect" loops).
- FastAPI: A high-performance, easy-to-use Python backend framework. It powers the Agent's API layer, enabling efficient communication between frontend, backend, and external tools (LLMs, databases, shell, etc.).
- Vue: A progressive JavaScript framework for building user interfaces. We use Vue to create intuitive browser-based dashboards for Agent monitoring, control, and interaction.
- Docker: A containerization platform that standardizes the Agent's runtime environment. It simplifies deployment, ensures consistency across development/production, and enables sandbox isolation for safe tool execution.
Supplementary tools include: LangChain (LLM integration), Pydantic (data validation), SQLite/PostgreSQL (state persistence), Redis (caching), and OpenAI/Cohere/Anthropic APIs (LLM backends). We will also cover shell command execution, file I/O operations, and sandboxing techniques to enhance Agent functionality and security.
1.4 How to Use This Book
This book follows a “theory-practice-source code” structure, with each chapter building on the previous one:
- Part 1: Foundations (Chapters 2-3): Covers AI Agent core concepts, architectures, and prerequisite technology setup (development environment, tool installation).
- Part 2: Core Technologies (Chapters 4-7): Dives into LangGraph workflow design, LLM integration, tool calling algorithms, and state management.
- Part 3: Full-Stack Integration (Chapters 8-10): Builds the backend with FastAPI, frontend with Vue, and integrates shell/file operations.
- Part 4: Deployment & Security (Chapters 11-13): Explores Docker containerization, sandboxing, cloud deployment, and performance optimization.
- Part 5: Advanced Practice (Chapters 14-16): Covers multi-Agent collaboration, memory optimization, and real-world case studies (e.g., code assistant, data analyst Agent).
- Appendices: Provides source code repositories, troubleshooting guides, and extended resources.
Each chapter includes hands-on examples with complete source code. We recommend readers run the code step-by-step, modify parameters, and experiment with extensions to deepen understanding. The book’s GitHub repository contains all code, configuration files, and Docker images for quick setup.
1.5 Acknowledgments
We would like to thank the open-source communities behind LangGraph, LangChain, FastAPI, Vue, and Docker—their tools have laid the foundation for this book. Thanks to AI researchers and engineers worldwide for advancing the field of intelligent Agents. Finally, gratitude to readers for choosing this book—we hope it empowers you to build innovative AI Agent systems.
Part 1: Foundations of AI Agents
Chapter 2: What Are AI Agents?
2.1 Definition and Core Characteristics
An AI Agent is an intelligent system that perceives its environment, makes decisions based on goals and context, executes actions, and adapts to feedback. Unlike static LLM applications that generate one-time outputs, AI Agents exhibit four core characteristics:
- Autonomy: Ability to initiate and execute tasks without continuous human intervention (e.g., automatically searching for data to answer a query).
- Proactivity: Takes initiative to pursue goals (e.g., identifying missing information and fetching it proactively).
- Adaptability: Adjusts behavior based on new inputs or feedback (e.g., correcting errors after failed tool execution).
- Interactivity: Communicates with the environment (LLMs, tools, users) and other Agents (e.g., collaborating with a code Agent to debug scripts).
AI Agents extend LLMs by adding “action capabilities”—they are not just “thinkers” but “doers” that connect LLMs to the real world via tools.
2.2 AI Agent Architectures
While AI Agent architectures vary by use case, most follow a modular design with four core components. We will focus on the Think-Plan-Act-Reflect architecture, which balances simplicity and expressiveness for production systems.
2.2.1 Core Components
- Perception Module: Collects inputs from the environment—including user queries, tool outputs, file contents, shell feedback, and browser events. It normalizes diverse inputs into a unified format for the Agent to process.
- Planning & Decision Module: The "brain" of the Agent, powered by LLMs and LangGraph. It interprets perception data, decomposes complex tasks into sub-tasks, prioritizes actions, and decides which tools to use (e.g., "Should I read a file first or run a shell command?").
- Action Execution Module: Executes decisions via tools—including LLM calls, shell commands, file operations, API requests, and database queries. It handles tool invocation, error handling, and result formatting.
- Memory Module: Stores stateful information, including task history, intermediate results, user preferences, and tool metadata. Memory is critical for context-aware decisions (e.g., recalling a previous file path to avoid re-querying).
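To make the four modules concrete before we introduce LangGraph, here is a minimal, hypothetical Python skeleton; the class and method names (Memory, SimpleAgent.perceive, SimpleAgent.run) are illustrative only and are not part of any library used later in the book:

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Memory:
    """Memory Module: stores task history for context-aware decisions."""
    history: List[Dict[str, Any]] = field(default_factory=list)

    def remember(self, record: Dict[str, Any]) -> None:
        self.history.append(record)

class SimpleAgent:
    """Illustrative wiring of the four modules; real implementations follow in Parts 2-3."""
    def __init__(self, planner: Callable[[str, Memory], List[str]],
                 tools: Dict[str, Callable[[str], str]]):
        self.planner = planner   # Planning & Decision Module (LLM-backed in practice)
        self.tools = tools       # Action Execution Module (file, shell, API tools)
        self.memory = Memory()   # Memory Module

    def perceive(self, raw_input: str) -> str:
        # Perception Module: normalize raw input into a unified format
        return raw_input.strip()

    def run(self, user_query: str) -> List[str]:
        query = self.perceive(user_query)
        plan = self.planner(query, self.memory)          # e.g. ["file_read:./data/sales.csv"]
        results: List[str] = []
        for step in plan:
            tool_name, _, arg = step.partition(":")
            tool = self.tools.get(tool_name)
            output = tool(arg) if tool else f"Unknown tool: {tool_name}"
            self.memory.remember({"step": step, "output": output})
            results.append(output)
        return results

Chapter 4 replaces this hand-rolled loop with a LangGraph workflow, and Chapter 5 replaces the bare callables with structured LangChain tools.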
2.2.2 Workflow Example: A File-Analyzing Agent
Let’s walk through a simple workflow to illustrate how components collaborate:
1. User inputs a query: "Analyze the sales data in ./data/sales.csv and generate a summary."
2. Perception Module: Reads the user query and validates the existence of the CSV file.
3. Planning Module: Decides to (1) read the CSV file, (2) extract key metrics (revenue, top products), (3) generate a summary via LLM.
4. Action Execution Module:
   - Executes a file read operation to load sales.csv.
   - Uses Pandas (via a tool) to compute metrics.
   - Calls GPT-4 to generate a summary from the metrics.
5. Memory Module: Stores the file path, computed metrics, and LLM summary for future reference.
6. Perception Module: Returns the summary to the user via the browser interface.
2.3 Types of AI Agents
AI Agents can be categorized by their capabilities and use cases:
- Single-Tool Agents: Specialized in one task (e.g., a file reader Agent that only parses text files). Simple to build but limited in scope.
- Multi-Tool Agents: Integrate multiple tools (LLMs, shell, files, APIs) to handle complex tasks (e.g., a data analyst Agent that reads files, runs SQL queries, and generates visualizations).
- Conversational Agents: Focus on natural language interaction with users (e.g., chatbots with action capabilities like booking flights).
- Autonomous Agents: Operate without human input once initialized (e.g., a monitoring Agent that scans logs, detects anomalies, and sends alerts).
- Multi-Agent Systems: Multiple Agents collaborate to accomplish a goal (e.g., a code Agent + test Agent + deployment Agent working together on software delivery).
This book focuses on building multi-tool, conversational, and deployable Agents—with extensions to multi-Agent systems in advanced chapters.
Chapter 3: Environment Setup and Prerequisites
3.1 Development Environment Requirements
Before starting, ensure your system meets the following requirements:
- Operating System: Windows 10+/macOS 12+/Linux (Ubuntu 20.04+ recommended; Docker works best on Linux/macOS).
- Python: 3.10+ (required for LangGraph and FastAPI compatibility).
- Node.js: 18+ (for Vue frontend development).
- Docker & Docker Compose: 20.10+ (for containerization).
- Hardware: 8GB+ RAM (16GB+ recommended for running LLMs locally), 20GB+ free disk space.
- API Keys: OpenAI/Anthropic/Cohere API key (for cloud LLMs) or local LLM setup (e.g., Llama 3 via Ollama).
3.2 Installing Core Tools
3.2.1 Python and Virtual Environment
First, install Python 3.10+ from python.org. Then create a virtual environment to isolate dependencies:
# Create virtual environment
python -m venv agent-env
# Activate environment (Linux/macOS)
source agent-env/bin/activate
# Activate environment (Windows)
agent-env\Scripts\activate
# Install core Python packages
pip install langgraph langchain langchain-openai fastapi uvicorn pydantic python-multipart python-dotenv pandas
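The Docker smoke test in Section 3.4 copies a requirements.txt into the image, so it is convenient to mirror the packages above in such a file at the project root (a minimal sketch; versions are left unpinned here, pin the ones you verify locally):

# requirements.txt
langgraph
langchain
langchain-openai
fastapi
uvicorn
pydantic
python-multipart
python-dotenv
pandas

Installing with pip install -r requirements.txt keeps the virtual environment and the container image in sync.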
3.2.2 Node.js and Vue CLI
Install Node.js from nodejs.org (includes npm). Then install Vue CLI to build the frontend:
# Install Vue CLI globally
npm install -g @vue/cli
# Verify installation
vue --version # Should output 5.x+
3.2.3 Docker and Docker Compose
Install Docker following official guides:
Docker Installation | Docker Compose Installation
Verify installation:
docker --version
docker compose --version
3.2.4 API Keys and Configuration
Create a .env file in your project root to store sensitive information (never commit this file to version control):
# .env file
OPENAI_API_KEY=your-openai-api-key
# Optional:
ANTHROPIC_API_KEY=your-anthropic-api-key
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
VUE_APP_API_URL=http://localhost:8000/api
For local LLM setup (e.g., Llama 3), install Ollama from ollama.com and pull the model:
ollama pull llama3:8b
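If you take the local-model route, the ChatOpenAI calls used in later chapters can generally be swapped for a local chat model. A minimal smoke test, assuming the langchain-community package is installed and the Ollama daemon is running with llama3:8b pulled:

# test_ollama.py — quick local-LLM smoke test (assumes a running Ollama daemon)
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3:8b", temperature=0.2)
response = llm.invoke("In one sentence, what is an AI Agent?")
print(response.content)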
3.3 Project Structure
We will use a modular project structure to separate frontend, backend, Agent logic, and deployment configs. Create the following directories:
ai-agent-project/
├── agent/ # AI Agent core logic (LangGraph workflows, tools)
│ ├── workflows/ # LangGraph graph definitions
│ ├── tools/ # Tool implementations (file, shell, LLM)
│ ├── memory/ # Memory modules (in-memory, database)
│ └── __init__.py
├── backend/ # FastAPI backend
│ ├── api/ # API routes
│ ├── models/ # Pydantic models
│ ├── services/ # Business logic
│ ├── main.py # FastAPI entry point
│ └── __init__.py
├── frontend/ # Vue frontend
│ ├── public/
│ ├── src/ # Vue components, routes, API calls
│ ├── package.json
│ └── vue.config.js
├── docker/ # Docker configs
│ ├── agent.Dockerfile
│ ├── backend.Dockerfile
│ ├── frontend.Dockerfile
│ └── docker-compose.yml
├── data/ # Sample data, file storage
├── .env # Environment variables
├── .gitignore
└── README.md
This structure ensures scalability—you can easily add new tools, API routes, or frontend components without disrupting existing code.
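Because the .env file must never be committed (Section 3.2.4), it helps to add a .gitignore alongside it; a minimal sketch for this layout (adjust to your needs):

# .gitignore
.env
agent-env/
__pycache__/
*.pyc
frontend/node_modules/
frontend/dist/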
3.4 Verifying the Setup
Let’s run a quick test to confirm all tools are working:
- Test Python Environment: Run a simple LangGraph script to create a minimal workflow:
# test_langgraph.py
from langgraph.graph import Graph
from langchain_openai import ChatOpenAI
import os
from dotenv import load_dotenv

load_dotenv()
llm = ChatOpenAI(model="gpt-3.5-turbo", api_key=os.getenv("OPENAI_API_KEY"))

# Define a simple graph: prompt → LLM → output
graph = Graph()

def prompt_node(state):
    return {"prompt": f"Summarize: {state['input']}"}

def llm_node(state):
    response = llm.invoke(state["prompt"])
    return {"output": response.content}

graph.add_node("prompt", prompt_node)
graph.add_node("llm", llm_node)
graph.add_edge("prompt", "llm")
graph.set_entry_point("prompt")
graph.set_finish_point("llm")

# Run the graph
app = graph.compile()
result = app.invoke({"input": "AI Agents are intelligent systems that act autonomously."})
print("LLM Output:", result["output"])

Run the script: python test_langgraph.py. You should see a summary from GPT-3.5-turbo.
- Test FastAPI: Create a minimal backend endpoint:

# backend/main.py
import os

from fastapi import FastAPI
from dotenv import load_dotenv

load_dotenv()
app = FastAPI(title="AI Agent Backend")

@app.get("/api/health")
async def health_check():
    return {"status": "healthy", "message": "AI Agent backend is running."}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=os.getenv("FASTAPI_HOST"), port=int(os.getenv("FASTAPI_PORT")))

Run the backend: python backend/main.py. Visit http://localhost:8000/api/health in your browser—you should see the health check response.
- Test Vue: Create a new Vue project in the frontend directory:

cd frontend
vue create .   # Select "Default ([Vue 3] babel, eslint)"
npm run serve

Visit http://localhost:8080—you should see the default Vue welcome page.

- Test Docker: Create a simple Dockerfile in the project root:

# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "test_langgraph.py"]

Build and run the container: docker build -t test-agent . && docker run test-agent. You should see the LLM output again.
If all tests pass, your environment is ready for AI Agent development!
Part 2: Core AI Agent Technologies
Chapter 4: LangGraph for Agent Workflow Orchestration
4.1 Introduction to LangGraph
LangGraph is a graph-based workflow orchestration framework designed explicitly for AI Agents. Built on LangChain, it addresses a key limitation of linear LLM pipelines: the ability to model cyclic, stateful, and branching workflows. Unlike traditional DAG (Directed Acyclic Graph) tools, LangGraph supports loops (critical for “reflect” steps) and dynamic state management—making it ideal for Agent decision-making.
Key advantages of LangGraph for AI Agents:
- State Persistence: Maintains a shared state across workflow nodes (e.g., task history, tool outputs) for context-aware decisions.
- Cyclic Workflows: Enables loops for "think-execute-reflect" cycles (e.g., re-trying a failed tool call or refining a plan based on feedback).
- Modular Nodes: Each node is a reusable function (e.g., tool caller, planner, reflector) that can be swapped or extended.
- Error Handling: Built-in support for node-level error handling and fallback actions (e.g., switching to a different LLM if one fails).
- LangChain Compatibility: Seamlessly integrates with LangChain's tools, LLMs, and memory modules.
4.2 LangGraph Core Concepts
4.2.1 Graph
A Graph is the top-level object that defines the Agent’s workflow. It consists of nodes, edges, entry points, and finish points. LangGraph supports two types of graphs:
- StateGraph: For workflows with shared state (most Agent use cases).
- Graph: For simple linear workflows without state persistence (rarely used for Agents).
We will focus on StateGraph for all examples in this book.
4.2.2 Nodes
Nodes are the building blocks of a LangGraph workflow—each node is a function that processes input state and returns updated state. There are three types of nodes:
- Functional Nodes: Simple Python functions that take state as input and return state (e.g., prompt formatting, tool execution).
- LLM Nodes: Wrappers for LLMs that generate text based on state (e.g., planner nodes that generate task lists).
- Conditional Nodes: Nodes that determine the next edge to take based on state (e.g., "if tool execution failed, go to reflect node; else, go to finish node").
Nodes are added to the graph with graph.add_node(node_id, node_function).
4.2.3 State
State is a shared dictionary (or Pydantic model) that flows through the graph. It stores all context needed for the Agent to make decisions, including:
- User input and goals.
- Intermediate tool outputs and errors.
- Plans and sub-tasks.
- Memory snapshots.
Using Pydantic models for state is recommended for type safety and data validation. For example:
from pydantic import BaseModel
from typing import List, Optional
class AgentState(BaseModel):
user_query: str
plan: Optional[List[str]] = None
tool_outputs: Optional[dict] = None
error: Optional[str] = None
final_response: Optional[str] = None
4.2.4 Edges
Edges define the flow of state between nodes. There are two types of edges:
- Direct Edges: Fixed transitions from one node to another (e.g., graph.add_edge("planner", "tool_caller")).
- Conditional Edges: Dynamic transitions based on state (e.g., "if state.error is not None, go to 'reflect'; else, go to 'finish'").
4.2.5 Entry and Finish Points
The entry point is the node where the workflow starts (graph.set_entry_point("entry_node")). The finish point is the node where the workflow ends (graph.set_finish_point("finish_node")); multiple finish points can be defined for complex workflows.
4.3 Building a Simple Agent with LangGraph
Let’s build a minimal Agent that follows the “Plan-Tool-Execute” workflow. This Agent will: (1) Generate a plan from the user query, (2) Execute a tool (file read) based on the plan, (3) Generate a final response.
4.3.1 Step 1: Define State Model
# agent/workflows/plan_tool_execute.py
from pydantic import BaseModel
from typing import Optional, List
class PlanToolExecuteState(BaseModel):
user_query: str
plan: Optional[List[str]] = None
file_content: Optional[str] = None
final_response: Optional[str] = None
error: Optional[str] = None
4.3.2 Step 2: Implement Nodes
First, create a planner node that generates a plan using an LLM:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.2)
# Planner node: Generate a plan to answer the user query
planner_prompt = ChatPromptTemplate.from_messages([
("system", "You are a planner. Given a user query, generate a 1-2 step plan to answer it. "
"If the query requires reading a file, include 'Read the specified file' in the plan. "
"Return only the plan as a list of strings."),
("human", "User Query: {user_query}")
])
planner_chain = planner_prompt | llm
def planner_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    plan_str = planner_chain.invoke({"user_query": state.user_query}).content
    # Parse plan string to list (simplified; add error handling in production)
    plan = [step.strip() for step in plan_str.strip().split("\n") if step.strip()]
    # Return an updated copy of the state (avoids passing the same field twice to the constructor)
    return state.copy(update={"plan": plan})
Next, create a tool caller node that reads a file (we’ll implement the file tool in Chapter 5):
import os
# Tool caller node: Execute file read tool
def file_read_tool(file_path: str) -> str:
    """Simple file read tool."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File {file_path} not found.")
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

def tool_caller_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    if not state.plan:
        return state.copy(update={"error": "No plan generated."})
    # Extract file path from user query (simplified; harden the regex in production)
    file_path = None
    if "file" in state.user_query.lower():
        # Assume user query includes a file path (e.g., "Read ./data/sales.csv")
        import re
        match = re.search(r"(\./[\w/\.]+)", state.user_query)
        if match:
            file_path = match.group(1)
    if not file_path:
        return state.copy(update={"error": "No file path specified in query."})
    try:
        content = file_read_tool(file_path)
        return state.copy(update={"file_content": content})
    except Exception as e:
        return state.copy(update={"error": str(e)})
Finally, create a response generator node that creates a final answer from the file content:
# Response generator node: Generate final response
response_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant. Use the file content to answer the user query. "
"If there's an error, inform the user. Keep the response concise."),
("human", "User Query: {user_query}\nFile Content: {file_content}\nError: {error}")
])
response_chain = response_prompt | llm
def response_generator_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    response = response_chain.invoke({
        "user_query": state.user_query,
        "file_content": state.file_content or "No file content available.",
        "error": state.error or "No errors."
    }).content
    return state.copy(update={"final_response": response})
4.3.3 Step 3: Assemble the Graph
from langgraph.graph import StateGraph
# Create a stateful graph over the Pydantic state model
graph = StateGraph(PlanToolExecuteState)
# Add nodes
graph.add_node("planner", planner_node)
graph.add_node("tool_caller", tool_caller_node)
graph.add_node("response_generator", response_generator_node)
# Add edges
graph.add_edge("planner", "tool_caller")
graph.add_edge("tool_caller", "response_generator")
# Set entry and finish points
graph.set_entry_point("planner")
graph.set_finish_point("response_generator")
# Compile the graph into an executable app
app = graph.compile()
4.3.4 Step 4: Run the Agent
# Test the Agent
if __name__ == "__main__":
    # Create a sample file
    sample_file = "./data/sample.txt"
    os.makedirs("./data", exist_ok=True)
    with open(sample_file, "w", encoding="utf-8") as f:
        f.write("LangGraph is a powerful framework for AI Agent workflow orchestration. "
                "It supports stateful workflows and cyclic execution.")
    # Run the Agent with a user query
    user_query = f"Read the file {sample_file} and summarize its content."
    result = app.invoke({"user_query": user_query})
    print("Final Response:", result["final_response"])
Expected output: A summary of the sample file content, generated by the LLM.
4.4 Advanced LangGraph Features
4.4.1 Conditional Edges
Add a conditional edge to handle errors (e.g., skip tool execution if there’s a plan error). Modify the graph assembly:
def conditional_edge(state: PlanToolExecuteState) -> str:
"""Determine next node based on state."""
if state.error:
return "response_generator" # Skip tool execution if error exists
return "tool_caller"
# Wire a conditional edge out of "planner" instead of the direct edge added earlier
graph.add_conditional_edges(
    source="planner",
    path=conditional_edge,
    # Map the condition's return values to node IDs
    path_map={
        "tool_caller": "tool_caller",
        "response_generator": "response_generator"
    }
)
4.4.2 Cyclic Workflows (Reflect Loop)
Add a reflect node to retry failed tool calls. First, implement the reflect node:
# Reflect node: Decide whether to retry or exit on error
reflect_prompt = ChatPromptTemplate.from_messages([
("system", "You are a reflector. Given an error, decide if we should retry the tool call "
"or exit. Return 'retry' or 'exit'."),
("human", "Error: {error}\nPlan: {plan}")
])
reflect_chain = reflect_prompt | llm
def reflect_node(state: PlanToolExecuteState) -> PlanToolExecuteState:
    decision = reflect_chain.invoke({
        "error": state.error,
        "plan": state.plan
    }).content.lower()
    # Prepend the reflect decision to the existing error message
    return state.copy(update={"error": f"Decision: {decision}; {state.error}"})
# Add reflect node to graph
graph.add_node("reflect", reflect_node)
# Update conditional edges for tool caller errors
def tool_caller_conditional(state: PlanToolExecuteState) -> str:
if state.error and "not found" in state.error.lower():
return "reflect"
return "response_generator"
# Update edges
graph.add_conditional_edges(
    source="tool_caller",
    path=tool_caller_conditional,
    path_map={
        "reflect": "reflect",
        "response_generator": "response_generator"
    }
)
# Add edge from reflect back to the tool caller (retry loop)
graph.add_conditional_edges(
    source="reflect",
    # Return a consistent boolean: retry only when the reflect decision asked for it
    path=lambda state: bool(state.error and "retry" in state.error.lower()),
    path_map={
        True: "tool_caller",
        False: "response_generator"
    }
)
This creates a loop: if the file is not found, the Agent reflects, decides whether to retry, and either re-runs the tool caller or exits.
4.4.3 Parallel Execution
LangGraph supports parallel node execution for tasks like running multiple tools simultaneously. For example, read two files in parallel:
# Define two file read nodes (each appends its content to the shared state)
def file_read_tool_1(state: PlanToolExecuteState) -> PlanToolExecuteState:
    content = file_read_tool("./data/file1.txt")
    return state.copy(update={"file_content": (state.file_content or "") + "\n" + content})

def file_read_tool_2(state: PlanToolExecuteState) -> PlanToolExecuteState:
    content = file_read_tool("./data/file2.txt")
    return state.copy(update={"file_content": (state.file_content or "") + "\n" + content})

# Add the parallel nodes
graph.add_node("read_file_1", file_read_tool_1)
graph.add_node("read_file_2", file_read_tool_2)

# Fan out from the planner to both readers, then fan back in to the response generator.
# Nodes that share the same upstream node run in the same step, i.e., in parallel.
# Note: if both branches write the same state key concurrently, define a reducer on that key.
graph.add_edge("planner", "read_file_1")
graph.add_edge("planner", "read_file_2")
graph.add_edge("read_file_1", "response_generator")
graph.add_edge("read_file_2", "response_generator")
4.5 Best Practices for LangGraph Workflows
- Keep Nodes Small and Reusable: Each node should handle one task (e.g., planning, tool execution) to simplify testing and maintenance.
- Use Pydantic for State: Type safety prevents bugs and makes state manipulation clearer.
- Limit Cycle Depth: Add a max retry count to cyclic workflows to avoid infinite loops (a sketch follows this list).
- Log State Changes: Add logging to nodes to track state transitions and debug workflows.
- Test Nodes in Isolation: Test individual nodes before assembling the graph to catch errors early.
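For the "Limit Cycle Depth" practice, one option is to carry a retry counter in the state and route to the response generator once the budget is exhausted. The sketch below assumes two extra fields, retry_count and max_retries, are added to PlanToolExecuteState, and that it replaces the reflect-to-tool_caller wiring from Section 4.4.2:

# Assumed additions to the state model:
#   retry_count: int = 0
#   max_retries: int = 2
def reflect_node_with_limit(state: PlanToolExecuteState) -> PlanToolExecuteState:
    # Count this reflection before deciding whether another retry is allowed
    return state.copy(update={"retry_count": state.retry_count + 1})

def retry_or_exit(state: PlanToolExecuteState) -> bool:
    # Retry only while budget remains and the reflector asked for a retry
    wants_retry = bool(state.error and "retry" in state.error.lower())
    return wants_retry and state.retry_count < state.max_retries

graph.add_conditional_edges(
    source="reflect",
    path=retry_or_exit,
    path_map={True: "tool_caller", False: "response_generator"}
)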
Chapter 5: Tool Integration for AI Agents
5.1 Tool Design Principles for AI Agents
Tools are the “hands and feet” of AI Agents—they enable Agents to interact with the external world (files, shell, APIs, databases). Effective tool design is critical for Agent usability and reliability. Follow these principles:
- Single Responsibility: Each tool should perform one specific action (e.g., file_read, shell_execute, api_call)—avoid multi-purpose tools that complicate decision-making.
- Clear Input/Output Schemas: Define explicit input parameters (e.g., file_path for file read) and output formats (e.g., string content, JSON) to help the Agent parse results.
- Error Handling: Return structured errors (e.g., FileNotFoundError, PermissionDeniedError) instead of crashing—enabling the Agent to handle failures gracefully.
- Idempotency: Ensure tools can be re-run safely (e.g., reading a file multiple times doesn't change the file) to support retry loops.
- Security: Sandbox high-risk tools (e.g., shell execution) to prevent malicious actions (covered in Chapter 12).
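One way to apply the schema and error-handling principles above is to have every tool return the same small result structure instead of raw strings or raised exceptions. The ToolResult model below is a hypothetical convention for illustration, not a LangChain class:

from typing import Optional
from pydantic import BaseModel

class ToolResult(BaseModel):
    ok: bool                       # True when the tool succeeded
    output: Optional[str] = None   # Tool output when ok is True
    error: Optional[str] = None    # Structured error message when ok is False

def safe_file_read(file_path: str) -> ToolResult:
    """Example wrapper: never raises, always returns a ToolResult the Agent can parse."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return ToolResult(ok=True, output=f.read())
    except FileNotFoundError:
        return ToolResult(ok=False, error=f"FileNotFoundError: {file_path} does not exist.")
    except PermissionError:
        return ToolResult(ok=False, error=f"PermissionError: cannot read {file_path}.")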
5.2 Tool Abstraction with LangChain
LangChain provides a BaseTool abstract class that standardizes tool implementation. We use this class to wrap custom tools, enabling seamless integration with LangGraph and LLMs. The core methods of BaseTool are:
- _run(): Synchronous tool execution (for simple tools).
- _arun(): Asynchronous tool execution (for I/O-bound tasks like API calls).
- description: A string describing the tool's purpose, inputs, and outputs—critical for the Agent to decide when to use the tool.
5.3 Implementing Core Tools
5.3.1 File Operation Tools
Implement tools for reading, writing, and listing files—essential for document processing Agents. We’ll use LangChain’s BaseTool and Pydantic for input validation.
# agent/tools/file_tools.py
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Optional, Type
import os
# Input schema for file read tool
class FileReadInput(BaseModel):
file_path: str = Field(description="Path to the file to read (e.g., ./data/sales.csv)")
class FileReadTool(BaseTool):
name: str = "file_read"
description: str = "Reads the content of a text file. Use this when you need to access data from a file."
args_schema: Type[BaseModel] = FileReadInput
def _run(self, file_path: str) -> str:
"""Synchronous file read."""
try:
if not os.path.exists(file_path):
return f"Error: File '{file_path}' not found."
if not os.path.isfile(file_path):
return f"Error: '{file_path}' is not a file."
with open(file_path, "r", encoding="utf-8") as f:
content = f.read()
return f"Successfully read file '{file_path}'. Content:\n{content[:1000]}..." # Truncate long content
except PermissionError:
return f"Error: Permission denied when reading '{file_path}'."
except Exception as e:
return f"Error reading file: {str(e)}"
async def _arun(self, file_path: str) -> str:
"""Asynchronous file read (for async workflows)."""
return self._run(file_path) # File I/O is sync; use aiofiles for true async
# Input schema for file write tool
class FileWriteInput(BaseModel):
file_path: str = Field(description="Path to the file to write (e.g., ./output/result.txt)")
content: str = Field(description="Content to write to the file")
overwrite: bool = Field(default=False, description="Whether to overwrite the file if it exists")
class FileWriteTool(BaseTool):
name: str = "file_write"
description: str = "Writes content to a text file. Use this when you need to save results to a file."
args_schema: Type[BaseModel] = FileWriteInput
def _run(self, file_path: str, content: str, overwrite: bool = False) -> str:
try:
# Create directory if it doesn't exist
dir_path = os.path.dirname(file_path)
if dir_path and not os.path.exists(dir_path):
os.makedirs(dir_path)
# Check if file exists
if os.path.exists(file_path) and not overwrite:
return f"Error: File '{file_path}' already exists. Set overwrite=True to replace it."
with open(file_path, "w" if overwrite else "x", encoding="utf-8") as f:
f.write(content)
return f"Successfully wrote content to '{file_path}'."
except PermissionError:
return f"Error: Permission denied when writing to '{file_path}'."
except FileExistsError:
return f"Error: File '{file_path}' already exists. Set overwrite=True to replace it."
except Exception as e:
return f"Error writing file: {str(e)}"
async def _arun(self, file_path: str, content: str, overwrite: bool = False) -> str:
return self._run(file_path, content, overwrite)
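As the _arun comments above note, the file I/O can be made truly asynchronous. A minimal sketch using the third-party aiofiles package (an extra dependency, not installed by the pip command in Chapter 3), which could serve as the body of FileReadTool._arun:

# Truly asynchronous file read (pip install aiofiles)
import aiofiles

async def async_file_read(file_path: str) -> str:
    try:
        async with aiofiles.open(file_path, "r", encoding="utf-8") as f:
            content = await f.read()
        return f"Successfully read file '{file_path}'. Content:\n{content[:1000]}..."
    except FileNotFoundError:
        return f"Error: File '{file_path}' not found."
    except PermissionError:
        return f"Error: Permission denied when reading '{file_path}'."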
5.3.2 Shell Execution Tool
Shell tools enable Agents to run command-line commands—useful for system administration, data processing (e.g., csvcut), and automation. Note: Shell tools are high-risk—always sandbox them (Chapter 12).
# agent/tools/shell_tools.py
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
import subprocess
class ShellExecuteInput(BaseModel):
command: str = Field(description="Shell command to execute (e.g., 'ls ./data', 'python script.py')")
timeout: int = Field(default=10, description="Timeout for the command in seconds (prevents hanging)")
class ShellExecuteTool(BaseTool):
name: str = "shell_execute"
description: str = "Executes a shell command. Use this for system operations, file system queries, or running scripts. "
"Avoid dangerous commands (rm -rf, sudo) unless explicitly authorized."
args_schema: Type[BaseModel] = ShellExecuteInput
def _run(self, command: str, timeout: int = 10) -> str:
try:
# Run command and capture output
result = subprocess.run(
command,
shell=True,
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
timeout=timeout
)
return f"Command succeeded. Stdout:\n{result.stdout}\nStderr:\n{result.stderr}"
except subprocess.CalledProcessError as e:
return f"Command failed with exit code {e.returncode}. Stdout:\n{e.stdout}\nStderr:\n{e.stderr}"
except subprocess.TimeoutExpired:
return f"Error: Command timed out after {timeout} seconds."
except Exception as e:
return f"Error executing command: {str(e)}"
async def _arun(self, command: str, timeout: int = 10) -> str:
# Run sync command in async context (use asyncio.subprocess for true async)
return self._run(command, timeout)
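Before wiring these tools into a LangGraph workflow, it is worth exercising them directly; LangChain's BaseTool exposes a run() method that validates input against args_schema. A quick smoke test (assuming the file tools above live in agent/tools/file_tools.py as shown):

# Quick smoke test for the tools defined above
if __name__ == "__main__":
    from agent.tools.file_tools import FileReadTool, FileWriteTool

    write_tool = FileWriteTool()
    read_tool = FileReadTool()
    shell_tool = ShellExecuteTool()

    print(write_tool.run({"file_path": "./data/hello.txt", "content": "hello agent", "overwrite": True}))
    print(read_tool.run({"file_path": "./data/hello.txt"}))
    print(shell_tool.run({"command": "echo tools are wired up", "timeout": 5}))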