Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI
========================================
Below is a complete, self-contained outline plus sample chapters (with runnable code) for the book
“Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker (deploy, sandbox, browser-use, shell, file ops …)”.
You can copy-paste the snippets into a fresh repo and have a minimal but production-grade agent platform running in <15 min.
The book itself is released under CC-BY-SA – feel free to remix, sell, or donate it.
0. Meta-information
| Field | Value |
|---|---|
| Title | Building AI Agents In Action |
| Sub-title | Architectures, Algorithms, and Source Code using LangGraph, FastAPI, Vue, Docker |
| Author | “The Agent Builders Collective” (open GitHub org) |
| License | CC-BY-SA 4.0 |
| Code repo | https://github.com/agentbuilders/agentbook |
| Minimum stack | Python 3.11, Node 20, Docker 24+, 8 GB RAM |
| Reader persona | Full-stack dev who knows Python & JS and wants to ship real agents, not toy demos |
1. Book Map (print ≈ 450 pages)
Part I – Foundations
- Why Agentic Software is Different
- From Functions to Autonomous Loops (ReAct, Reflexion, Plan-and-Solve)
- LangGraph 101 – Graphs as the New “Framework”
- FastAPI for Agent Services – Async, Pydantic, Dependency Injection
- Vue 3 + Vite as a Real-Time Control Dashboard
Part II – Core Patterns
- Tool Calling & Sandboxing (Docker + gVisor + nsjail)
- Memory – Episodic vs. Semantic vs. Programmatic
- Human-in-the-Loop – Approval Gates, Streaming, Undo
- Multi-Agent Topologies – Fan-out, Map-Reduce, Debate, Swarm
- Observing & Debugging – OpenTelemetry, LangSmith, Prometheus
Part III – Production Tooling
- Browser-Use – Playwright inside a locked container
- Shell & File-System Tools – Read-only overlays, audit logging
- Vector Stores & RAG – Qdrant, PGVector, hybrid search
- CI/CD for Agents – Testing non-determinism with pytest-asyncio, Hypothesis, VCR.py
- Packaging for Distribution – OCI images, docker-compose, Helm, GitHub Actions
Part IV – Case Studies
- Research Assistant – pulls arXiv, writes markdown, cites sources
- Data-Engineering Agent – Airflow replacement that self-heals DAGs
- Browser Testing Agent – generates Playwright scripts from Jira tickets
- Code-Review Agent – PR commenter that can `git diff` and `grep`
- Swarm Simulator – 100 agents negotiating a delivery schedule
Part V – Blue-Sky & Ethics
- Self-Modifying Agents – When agents update their own code
- Guardrails & Constitutional AI – Refusal, red-team, bias metrics
- Regulation & Compliance – EU AI Act, NIST RMF, SOC-2
- The Road Ahead – MCP, WebAssembly plugins, Edge inference
Appendices
A. Setting up devcontainer & VS-Code tasks
B. 50-line Cheat-Sheet for LangGraph primitives
C. Security Checklist (CWE top 25 for LLM apps)
D. Prompt-Engineering Quick Reference
2. Repo Layout (delivered with the book)
agentbook/
├── services/
│ ├── agent-core/ # Python – LangGraph runtime
│ ├── api-gateway/ # FastAPI – auth, streaming
│ ├── sandbox/ # Dockerfiles for untrusted tools
│ └── vue-dashboard/ # Vue 3 + Naive-UI
├── k8s/
├── tests/
└── docs/ # Full book in markdown
3. Sample Chapter – Chapter 6
Tool Calling & Sandboxing (abridged excerpt)
6.1 The Problem
An agent that can rm -rf / is not a feature – it’s a liability.
We need three layers:
- Semantic layer – agent decides which tool and what args.
- API layer – FastAPI route receives the request.
- Sandbox layer – container with no network, read-only root, tmpfs /tmp, seccomp, dropped caps.
6.2 LangGraph Tool Node
LangGraph treats every tool as just another node.
The below snippet shows a stateful graph that keeps tool stdout/stderr in the shared state.
```python
# services/agent-core/agents/nodes/tool_caller.py
from langchain_core.tools import StructuredTool
from langgraph.graph import StateGraph
from pydantic import BaseModel
import httpx

class ToolCall(BaseModel):
    tool: str
    args: dict

class AgentState(BaseModel):
    messages: list[str]
    tool_calls: list[ToolCall]
    tool_results: list[dict]

async def sandbox_run(command: list[str], timeout: int = 15) -> dict:
    """Call sandbox micro-service (Docker + gVisor)."""
    async with httpx.AsyncClient(base_url="http://sandbox:8001") as client:
        r = await client.post("/exec", json={"cmd": command, "timeout": timeout})
        return r.json()

def make_tool_node(tool: StructuredTool):
    async def node(state: AgentState):
        result = await sandbox_run([tool.name] + list(state.tool_calls[-1].args.values()))
        state.tool_results.append(result)
        return state
    return node
```
6.3 FastAPI Micro-Service for Sandboxing
File: services/sandbox/main.py
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import aiodocker, asyncio

app = FastAPI()
docker = aiodocker.Docker()

class ExecRequest(BaseModel):
    cmd: list[str]
    timeout: int = 15

@app.post("/exec")
async def exec_in_sandbox(req: ExecRequest):
    image = "sandbox-tool:latest"  # built with Dockerfile.secure
    config = {
        "Image": image,
        "HostConfig": {
            "NetworkMode": "none",
            "ReadonlyRootfs": True,
            "Memory": 128 * 1024 * 1024,  # 128 MB
            "CpuQuota": 50000,            # 50 % CPU
            "SecurityOpt": ["no-new-privileges"],
            "CapDrop": ["ALL"],
            "Tmpfs": {"/tmp": "size=16M,noexec,nosuid,nodev"},
        },
        "Cmd": req.cmd,
        "AttachStdout": True,
        "AttachStderr": True,
    }
    container = await docker.containers.create_or_replace(name="tmp-sandbox", config=config)
    await container.start()
    try:
        out = await container.wait(timeout=req.timeout)
        logs = await container.log(stdout=True, stderr=True)
        return {"exit_code": out["StatusCode"], "logs": logs}
    except asyncio.TimeoutError:
        await container.kill()
        raise HTTPException(502, "Tool timed out")
    finally:
        await container.delete(force=True)
```
6.4 Dockerfile.secure (multi-stage, 37 MB)
```dockerfile
FROM alpine:3.19 AS builder
RUN apk add --no-cache gcc musl-dev python3-dev py3-pip
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM alpine:3.19
RUN apk add --no-cache python3 py3-setuptools tini
# Keep the same path as the builder stage so Python finds the user site-packages
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
ENTRYPOINT ["tini", "--", "python3", "-m", "your_tool"]
```
6.5 Vue Button to Trigger Tool
Dashboard snippet (Composition API):
```vue
<script setup>
import { ref } from "vue";

const result = ref("");

async function runTool() {
  const res = await fetch("/api/agent/run-tool", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tool: "arxiv", query: "langgraph" }),
  });
  result.value = await res.text();
}
</script>

<template>
  <n-button @click="runTool">Search arXiv</n-button>
  <n-code :code="result" language="json" />
</template>
```
4. Sample Chapter – Chapter 16
Research Assistant Agent (end-to-end)
Goal:
“Given a research question, produce a 1-page markdown summary with inline citations and a references section.”
16.1 Graph Design
Nodes:
- `query_expander` – LLM rewrites question → 3 search queries.
- `arxiv_search` – tool calls arXiv API.
- `paper_selector` – LLM picks top-k papers.
- `download_papers` – tool downloads PDFs into tmpfs.
- `summarizer` – LLM writes 1-page report.
- `cite_formatter` – string → BibTeX → markdown refs.
Edges:
Linear DAG with conditional edge back to query_expander if <2 papers found.
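The retry condition can be expressed as a plain routing function. A hedged sketch (the function name `route_after_select` and the node names are illustrative, matching the graph wiring shown below; the `add_conditional_edges` call is indicated in a comment):

```python
# Routing helper for the "retry search" edge: loop back to query
# expansion when fewer than 2 papers survived selection.
def route_after_select(state: dict) -> str:
    return "expand" if len(state.get("papers", [])) < 2 else "download"

# Wiring sketch (assuming a StateGraph named `workflow`):
#   workflow.add_conditional_edges("select", route_after_select,
#                                  {"expand": "expand", "download": "download"})
```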
16.2 State Schema
```python
class ResearchState(BaseModel):
    question: str
    queries: list[str] = []
    papers: list[dict] = []   # arXiv metadata
    pdfs: list[bytes] = []
    summary: str = ""
    references: str = ""
```
16.3 arXiv Tool (Sandboxed)
```python
def arxiv_search_tool(query: str, max_results: int = 5) -> list[dict]:
    import arxiv
    client = arxiv.Client()
    search = arxiv.Search(query=query, max_results=max_results)
    return [
        {
            "title": r.title,
            "authors": [a.name for a in r.authors],
            "pdf_url": r.pdf_url,
            "published": r.published.isoformat(),
        }
        for r in client.results(search)  # arxiv>=2.0: Client.results(Search)
    ]
```
Register with LangChain:
```python
arxiv_tool = StructuredTool.from_function(
    func=arxiv_search_tool,
    name="arxiv_search",
    description="Search arXiv by free-text query",
)
```
16.4 Full Graph (70 lines)
```python
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from agents.nodes import query_expander, paper_selector, summarizer, cite_formatter
from agents.tools import arxiv_tool, download_pdf_tool

workflow = StateGraph(ResearchState)
workflow.add_node("expand", query_expander)
workflow.add_node("search", arxiv_tool)
workflow.add_node("select", paper_selector)
workflow.add_node("download", download_pdf_tool)
workflow.add_node("summarize", summarizer)
workflow.add_node("format", cite_formatter)

workflow.add_edge("expand", "search")
workflow.add_edge("search", "select")
workflow.add_edge("select", "download")
workflow.add_edge("download", "summarize")
workflow.add_edge("summarize", "format")
workflow.add_edge("format", END)
workflow.set_entry_point("expand")

graph = workflow.compile()
```
16.5 Exposed via FastAPI (Streaming Markdown)
```python
from fastapi import APIRouter
from sse_starlette.sse import EventSourceResponse
import json

router = APIRouter()

@router.post("/research")
async def research(question: str):
    async def gen():
        async for event in graph.astream({"question": question}):
            if "summary" in event.get("summarize", {}):
                yield json.dumps({"type": "summary", "payload": event["summarize"]["summary"]})
    return EventSourceResponse(gen())
```
16.6 Vue Component (Auto-scroll)
```vue
<script setup>
import { ref } from "vue";
import { fetchEventSource } from "@microsoft/fetch-event-source";

const md = ref("");

function start() {
  fetchEventSource("/api/research", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question: "How does LoRA reduce GPU memory?" }),
    onmessage(ev) {
      const msg = JSON.parse(ev.data);
      if (msg.type === "summary") md.value = msg.payload;
    },
  });
}
</script>

<template>
  <n-button @click="start">Run</n-button>
  <n-scrollbar style="max-height: 60vh">
    <vue-markdown :source="md" />
  </n-scrollbar>
</template>
```
5. Docker Compose Quick-Start
```yaml
services:
  agent-core:
    build: ./services/agent-core
    env_file: .env
    volumes:
      - ./services/agent-core:/app
    command: uvicorn main:app --reload --host 0.0.0.0 --port 8000
  sandbox:
    build:
      context: ./services/sandbox
      dockerfile: Dockerfile.secure
    volumes: ["/var/run/docker.sock:/var/run/docker.sock"]
  dashboard:
    build: ./services/vue-dashboard
    ports: ["3000:80"]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
```
Run:
```bash
git clone https://github.com/agentbuilders/agentbook
cd agentbook
cp .env.example .env   # add OPENAI_API_KEY
docker compose up --build
```
Browser: http://localhost:3000 → “Research Assistant” tab → ask anything → watch Markdown arrive token-by-token.
6. Testing Non-Determinism
tests/test_research.py
```python
import pytest, vcr
from agents.research import graph

@vcr.use_cassette("tests/cassettes/lora.yaml")
@pytest.mark.asyncio
async def test_research_agent():
    state = await graph.ainvoke({"question": "How does LoRA reduce GPU memory?"})
    assert "low-rank adaptation" in state["summary"].lower()
    assert len(state["papers"]) >= 2
```
VCR.py records HTTP traffic → CI becomes deterministic.
7. Packaging for Helm
k8s/Chart.yaml
```yaml
apiVersion: v2
name: agentbook
description: AI Agent Platform
version: 0.1.0
appVersion: "1.0"
```
Template includes:
- Deployment with `securityContext: readOnlyRootFilesystem: true`
- NetworkPolicy isolates sandbox namespace
- HorizontalPodAutoscaler on CPU 60 % for agent-core
8. What’s Next?
- Add browser-use node (Playwright) for agents that click on arXiv PDFs and extract figures.
- Swap OpenAI for Ollama (Mistral-7B) to run fully offline.
- Integrate LangSmith for cost-tracking per end-user.
- Implement Reflexion loop – if summary fails factual QA check, agent re-searches.
9. Contributing & License
Pull-requests welcome. All text & code CC-BY-SA 4.0 – attribution required, share-alike.
Commercial use allowed; sell the book on Amazon if you like – just cite the original repo.
Enjoy shipping agents that can’t destroy the host, stream results in real time, and scale to Kubernetes without rewriting the logic.
========================================
Building AI Agents In Action
Architectures, Algorithms, and Source Code (LangGraph + FastAPI + Vue + Docker)
Featuring tool use (browser, shell, file ops), sandboxing patterns, streaming UX, and deployable containers
Preface
This book is a hands-on blueprint for building real AI agents—not just chatbots. You’ll implement an agent runtime with:
- LangGraph for deterministic, inspectable agent workflows (graphs, nodes, tool loops, checkpoints)
- FastAPI for an API layer (threads, runs, streaming, artifacts)
- Vue 3 for a practical UI (chat, tool traces, file views)
- Docker for reproducible dev/prod environments
- Tools: file operations, shell execution (sandboxed), and browser use (Playwright-based)
The code is organized like a production service: a backend agent runtime, an API server, and a frontend client, all containerized.
Table of Contents
- Agent Systems, Not Prompts: architecture overview and design goals
- LangGraph Fundamentals: state, nodes, edges, tool loops, checkpoints
- Tooling Layer: file tools, shell tools (sandbox pattern), browser tools
- Building the Agent Graph: ReAct-style tool use + guardrails + memory
- FastAPI Agent Service: threads, streaming responses, run events
- Vue Frontend: streaming chat, event timeline, artifact browser
- Docker & Deployment: compose stack, sandbox container, prod notes
- Observability & Evals: traces, structured logs, regression tests
- Security Playbook: least privilege, workspace jail, network controls
- Extensions: multi-agent supervisor, task queues, cron agents
1) Agent Systems, Not Prompts
1.1 What you’re building
A complete “agent product” typically has these layers:
(A) Agent Runtime
- Maintains conversation state (messages + working memory)
- Decides whether to respond or use tools
- Executes tools and feeds results back to the model
- Persists state per “thread” (conversation/session)
(B) Tools
- File I/O tools (read/write/list)
- Shell tool (run commands safely)
- Browser tool (fetch pages, extract text, optionally interact)
(C) API
- Start/continue runs
- Stream tokens/events to UI
- Store artifacts (generated files, logs)
(D) UI
- Chat + streaming
- Tool trace timeline
- File browser for agent-generated artifacts
(E) Deployment
- Containerized services
- Sandboxed execution environment for risky tools
2) Repository Layout
Use this monorepo layout:
```
ai-agents-in-action/
  backend/
    app/
      main.py
      core/config.py
      schemas.py
      agent/
        graph.py
        prompts.py
        tools/
          file_ops.py
          shell_ops.py
          browser_ops.py
      util/
        sse.py
    pyproject.toml
  frontend/
    index.html
    vite.config.ts
    src/
      main.ts
      api.ts
      components/
        Chat.vue
        TracePanel.vue
        FileBrowser.vue
  docker-compose.yml
  docker/
    backend.Dockerfile
    frontend.Dockerfile
```
3) LangGraph Fundamentals (the “agent loop”)
LangGraph lets you define an agent as a state machine:
- A State object accumulates messages and metadata
- Nodes are pure-ish functions: `State -> partial State update`
- Edges control routing (including tool-conditions)
- Checkpointing makes runs resumable and thread-safe
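The state-machine idea is worth seeing without any library. The sketch below is a dependency-free toy (all names are illustrative, not LangGraph API): nodes return partial updates that are merged into the state, and a routing table plays the role of conditional edges.

```python
def run_graph(nodes, route, state, entry):
    """nodes: name -> fn(state) returning a partial state update.
       route: name -> fn(state) returning the next node name or "END"."""
    current = entry
    while current != "END":
        state = {**state, **nodes[current](state)}  # merge partial update
        current = route[current](state)             # conditional edge
    return state

# Toy graph: one node increments a counter; the conditional edge
# loops until the counter reaches 3.
nodes = {"inc": lambda s: {"n": s["n"] + 1}}
route = {"inc": lambda s: "inc" if s["n"] < 3 else "END"}
final = run_graph(nodes, route, {"n": 0}, entry="inc")
```

LangGraph adds checkpointing, streaming, and typed state on top of this same shape.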
4) Tooling Layer (File, Shell, Browser)
4.1 File ops tool (workspace-jail)
backend/app/agent/tools/file_ops.py
```python
from __future__ import annotations
from pathlib import Path
from langchain_core.tools import tool

def _safe_path(workspace: Path, rel: str) -> Path:
    p = (workspace / rel).resolve()
    # Path.is_relative_to avoids the classic string-prefix bug where
    # "/ws-evil" would pass a startswith("/ws") check.
    if not p.is_relative_to(workspace.resolve()):
        raise ValueError("Path escapes workspace")
    return p

@tool
def list_files(workspace_dir: str, rel_dir: str = ".") -> list[str]:
    """List files under a directory inside workspace."""
    ws = Path(workspace_dir)
    d = _safe_path(ws, rel_dir)
    if not d.exists():
        return []
    return [str(p.relative_to(ws)) for p in d.rglob("*") if p.is_file()]

@tool
def read_file(workspace_dir: str, rel_path: str) -> str:
    """Read a UTF-8 text file from workspace."""
    ws = Path(workspace_dir)
    p = _safe_path(ws, rel_path)
    return p.read_text(encoding="utf-8")

@tool
def write_file(workspace_dir: str, rel_path: str, content: str, overwrite: bool = True) -> str:
    """Write a UTF-8 text file into workspace."""
    ws = Path(workspace_dir)
    p = _safe_path(ws, rel_path)
    p.parent.mkdir(parents=True, exist_ok=True)
    if p.exists() and not overwrite:
        raise ValueError("File exists and overwrite=False")
    p.write_text(content, encoding="utf-8")
    return f"Wrote {rel_path} ({len(content)} bytes)"
Design rule
All file paths are relative to a workspace root (per thread/run), preventing accidental access to host filesystem.
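To see the jail in action, here is a standalone version of the same check (stdlib only; the `/tmp/agent-ws` workspace path is just an example):

```python
from pathlib import Path

def safe_path(workspace: Path, rel: str) -> Path:
    """Resolve rel against workspace and refuse anything that escapes it."""
    p = (workspace / rel).resolve()
    if not p.is_relative_to(workspace.resolve()):
        raise ValueError("Path escapes workspace")
    return p

ws = Path("/tmp/agent-ws")
safe_path(ws, "notes/todo.md")         # fine: stays inside the workspace
try:
    safe_path(ws, "../../etc/passwd")  # blocked: resolves outside
except ValueError as e:
    print(e)                           # Path escapes workspace
```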
4.2 Shell tool (with a sandbox pattern)
Important security note
Running shell commands directly on the host is dangerous. The recommended pattern is:
- Run shell commands in a sandbox container
- Drop capabilities, enforce CPU/memory/time limits
- Disable or restrict network
- Mount a workspace directory read/write
Below is a minimal implementation with timeouts. In production, prefer a dedicated sandbox container or gVisor/nsjail/Firecracker.
backend/app/agent/tools/shell_ops.py
```python
from __future__ import annotations
import asyncio
from pathlib import Path
from langchain_core.tools import tool

@tool
async def run_shell(workspace_dir: str, command: str, timeout_s: int = 20) -> dict:
    """
    Run a shell command inside the workspace directory.

    Security: keep this behind auth; prefer running inside a sandbox container.
    """
    ws = Path(workspace_dir).resolve()
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=str(ws),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": "Timed out"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": stdout.decode("utf-8", errors="replace"),
        "stderr": stderr.decode("utf-8", errors="replace"),
    }
```
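A cheap defense-in-depth layer on top of timeouts is a command allowlist checked before anything executes. A minimal sketch (the allowlist contents and the `check_command` helper are illustrative, not a complete shell parser):

```python
import shlex

ALLOWED = {"ls", "cat", "grep", "python3", "pytest"}  # illustrative allowlist

def check_command(command: str) -> list[str]:
    """Reject shell metacharacters and non-allowlisted programs."""
    if any(ch in command for ch in ";|&`$<>"):
        raise ValueError("Shell metacharacters are not allowed")
    parts = shlex.split(command)
    if not parts:
        raise ValueError("Empty command")
    prog = parts[0].rsplit("/", 1)[-1]   # normalize "/bin/ls" -> "ls"
    if prog not in ALLOWED:
        raise ValueError(f"Command not allowed: {prog}")
    return parts
```

This does not replace the sandbox; it just refuses obviously dangerous requests before they reach it.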
4.3 Browser tool (Playwright “browser-use”)
This provides a practical “web fetch + extract” ability. You can extend it to click/type workflows.
backend/app/agent/tools/browser_ops.py
```python
from __future__ import annotations
from langchain_core.tools import tool

@tool
async def fetch_page_text(url: str, timeout_ms: int = 15000) -> str:
    """Fetch a page and return visible text (Playwright)."""
    from playwright.async_api import async_playwright
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        page.set_default_timeout(timeout_ms)
        await page.goto(url, wait_until="domcontentloaded")
        text = await page.inner_text("body")
        await browser.close()
    return text[:20000]  # avoid dumping huge pages into context
```
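For static pages a full browser is overkill. A hedged, stdlib-only alternative (no JS rendering, using `html.parser`; the class and function names are illustrative) extracts visible text while skipping scripts and styles:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_text(html: str, limit: int = 20000) -> str:
    p = TextExtractor()
    p.feed(html)
    return " ".join(p.parts)[:limit]
```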
5) Building the Agent Graph (LangGraph)
We’ll implement a standard pattern:
- Assistant node: calls LLM with tool bindings
- Tool node: executes tool calls
- Conditional edge: if the model requests tools, route to tool node; else finish
5.1 Prompts
backend/app/agent/prompts.py
```python
SYSTEM_PROMPT = """You are an engineering agent.
You may use tools to read/write files, run shell commands, and fetch web pages.
Rules:
- Keep all file operations inside the provided workspace_dir.
- Prefer small, verifiable steps.
- When you use tools, explain what you are doing briefly.
- If a command could be destructive, ask for confirmation.
"""
```
5.2 Graph implementation
backend/app/agent/graph.py
```python
from __future__ import annotations
from typing import TypedDict, Annotated
from langchain_core.messages import SystemMessage
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_openai import ChatOpenAI

from .prompts import SYSTEM_PROMPT
from .tools.file_ops import list_files, read_file, write_file
from .tools.shell_ops import run_shell
from .tools.browser_ops import fetch_page_text

TOOLS = [list_files, read_file, write_file, run_shell, fetch_page_text]

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    workspace_dir: str
    thread_id: str

def build_graph():
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0,
    ).bind_tools(TOOLS)
    tool_node = ToolNode(TOOLS)

    async def assistant(state: AgentState):
        msgs = [SystemMessage(content=SYSTEM_PROMPT), *state["messages"]]
        resp = await llm.ainvoke(msgs)
        return {"messages": [resp]}

    g = StateGraph(AgentState)
    g.add_node("assistant", assistant)
    g.add_node("tools", tool_node)
    g.set_entry_point("assistant")
    g.add_conditional_edges("assistant", tools_condition, {"tools": "tools", END: END})
    g.add_edge("tools", "assistant")
    return g.compile()
```
This is the core: a deterministic loop that continues until the model stops requesting tools.
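The termination condition is worth internalizing: the loop ends exactly when a response carries no tool calls. A library-free sketch with a scripted fake model (every name here is illustrative):

```python
def agent_loop(model_step, execute_tool, max_iterations: int = 10):
    """Run assistant -> tools -> assistant until no tool calls remain."""
    messages = []
    for _ in range(max_iterations):
        reply = model_step(messages)          # assistant node
        messages.append(reply)
        if not reply.get("tool_calls"):       # conditional edge: finish
            return messages
        for call in reply["tool_calls"]:      # tool node
            messages.append({"role": "tool", "content": execute_tool(call)})
    raise RuntimeError("max iterations exceeded")

# Scripted model: one tool call, then a final answer.
script = iter([
    {"role": "assistant", "tool_calls": [{"name": "list_files", "args": {}}]},
    {"role": "assistant", "content": "Done.", "tool_calls": []},
])
history = agent_loop(lambda msgs: next(script), lambda call: "README.md")
```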
6) FastAPI Agent Service (threads + streaming)
We’ll expose a single endpoint that:
- Accepts `thread_id` + user message
- Creates/uses a workspace directory per thread
- Streams back events/tokens
6.1 Schemas
backend/app/schemas.py
```python
from pydantic import BaseModel, Field

class ChatRequest(BaseModel):
    thread_id: str = Field(..., description="Conversation/thread identifier")
    message: str

class ChatChunk(BaseModel):
    type: str  # "token" | "event" | "final"
    data: dict
```
6.2 SSE / streaming helper
backend/app/util/sse.py
```python
import json

def sse_event(data: dict, event: str = "message") -> bytes:
    # default=str keeps serialization from failing on message/datetime objects
    payload = f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False, default=str)}\n\n"
    return payload.encode("utf-8")
```
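The inverse operation is handy in tests. A small parser for the frames `sse_event` emits (assuming one `data:` line per frame, which is all this service produces; `parse_sse` is an illustrative helper, not part of the service code):

```python
import json

def parse_sse(stream: bytes) -> list[tuple[str, dict]]:
    """Split a byte stream on blank lines and decode each event frame."""
    events = []
    for frame in stream.decode("utf-8").split("\n\n"):
        if not frame.strip():
            continue
        fields = dict(line.split(": ", 1) for line in frame.split("\n"))
        events.append((fields.get("event", "message"), json.loads(fields["data"])))
    return events
```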
6.3 FastAPI app
backend/app/main.py
```python
from __future__ import annotations
import os
from pathlib import Path
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage

from .schemas import ChatRequest
from .agent.graph import build_graph
from .util.sse import sse_event

app = FastAPI(title="AI Agents In Action")
graph = build_graph()
WORKSPACES = Path(os.getenv("WORKSPACES_DIR", "/data/workspaces"))

@app.post("/v1/chat")
async def chat(req: ChatRequest):
    ws = (WORKSPACES / req.thread_id).resolve()
    ws.mkdir(parents=True, exist_ok=True)

    async def gen():
        # Minimal state: messages + workspace_dir
        state = {
            "messages": [HumanMessage(content=req.message)],
            "workspace_dir": str(ws),
            "thread_id": req.thread_id,
        }
        # Stream high-level graph events (works well for UI traces)
        async for event in graph.astream_events(state, version="v2"):
            yield sse_event(event, event="event")
        yield sse_event({"ok": True}, event="final")

    return StreamingResponse(gen(), media_type="text/event-stream")
```
7) Vue Frontend (streaming chat + traces)
A minimal Vue 3 component that:
- Sends a message
- Reads the SSE stream
- Displays events
frontend/src/components/Chat.vue
```vue
<script setup lang="ts">
import { ref } from "vue";

const threadId = ref("demo-thread");
const input = ref("");
const events = ref<any[]>([]);

async function send() {
  const msg = input.value.trim();
  if (!msg) return;
  input.value = "";
  const resp = await fetch("/api/v1/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ thread_id: threadId.value, message: msg })
  });
  const reader = resp.body!.getReader();
  const dec = new TextDecoder("utf-8");
  let buf = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += dec.decode(value, { stream: true });
    // very small SSE parser (good enough for demo)
    const parts = buf.split("\n\n");
    buf = parts.pop() || "";
    for (const part of parts) {
      const line = part.split("\n").find(l => l.startsWith("data: "));
      if (!line) continue;
      events.value.push(JSON.parse(line.slice(6)));
    }
  }
}
</script>

<template>
  <div style="max-width: 900px; margin: 20px auto; font-family: sans-serif;">
    <h2>AI Agents In Action</h2>
    <div style="display:flex; gap: 8px;">
      <input v-model="threadId" placeholder="thread id" style="flex:1;" />
      <input v-model="input" placeholder="message" style="flex:3;" @keyup.enter="send" />
      <button @click="send">Send</button>
    </div>
    <pre style="margin-top: 12px; background:#111; color:#ddd; padding:12px; height: 500px; overflow:auto;">{{ JSON.stringify(events, null, 2) }}</pre>
  </div>
</template>
```
In production you’d render:
- assistant messages (stream tokens)
- tool calls + tool results
- artifacts list (files created in workspace)
8) Docker & Deployment
8.1 Backend Dockerfile
docker/backend.Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY backend/pyproject.toml /app/pyproject.toml
RUN pip install --no-cache-dir -U pip \
 && pip install --no-cache-dir fastapi "uvicorn[standard]" langgraph langchain-core langchain-openai playwright
# Install browsers for Playwright (optional; comment out if not using browser tool)
RUN python -m playwright install --with-deps chromium
COPY backend/app /app/app
ENV WORKSPACES_DIR=/data/workspaces
RUN mkdir -p /data/workspaces
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
8.2 Frontend Dockerfile
docker/frontend.Dockerfile
```dockerfile
FROM node:20-alpine AS build
WORKDIR /web
COPY frontend/package*.json /web/
RUN npm ci
COPY frontend /web
RUN npm run build

FROM nginx:alpine
COPY --from=build /web/dist /usr/share/nginx/html
```
8.3 Docker Compose
docker-compose.yml
```yaml
services:
  backend:
    build:
      context: .
      dockerfile: docker/backend.Dockerfile
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - WORKSPACES_DIR=/data/workspaces
    volumes:
      - workspaces:/data/workspaces
    ports:
      - "8000:8000"
  frontend:
    build:
      context: .
      dockerfile: docker/frontend.Dockerfile
    ports:
      - "8080:80"
    depends_on:
      - backend

volumes:
  workspaces:
```
Sandbox note (recommended)
In a hardened setup, you add a separate sandbox service and route run_shell to it (e.g., via an internal HTTP API), rather than executing inside the backend container.
9) Observability & Evals (practical baseline)
Minimum recommended additions:
- Log every tool call with: `thread_id`, tool name, args (redacted), runtime, exit code
- Persist event streams for replay/debugging
- Add regression tests for agent behaviors:
- “creates file X with content Y”
- “does not write outside workspace”
- “shell tool times out”
A simple “agent eval” can be: run scripted prompts against a fixed model version and compare artifacts/tool traces.
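Comparing tool traces can be as simple as diffing (tool, args) pairs against a golden recording. A minimal sketch (the traces and the `diff_trace` helper are illustrative):

```python
def diff_trace(expected: list[tuple], actual: list[tuple]) -> list[str]:
    """Report the first divergence between golden and observed tool traces."""
    problems = []
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            problems.append(f"step {i}: expected {exp}, got {act}")
            break
    if len(expected) != len(actual):
        problems.append(f"length: expected {len(expected)}, got {len(actual)}")
    return problems

golden = [("list_files", {"rel_dir": "."}), ("write_file", {"rel_path": "report.md"})]
observed = [("list_files", {"rel_dir": "."}), ("write_file", {"rel_path": "report.md"})]
```

An empty diff means the agent behaved as recorded; any entry is a regression to inspect.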
10) Security Playbook (non-negotiables for tool agents)
If you ship tool-using agents to real users:
- Workspace jail for file ops (already shown).
- Sandbox shell execution:
- no host execution
- drop Linux capabilities
- resource limits + timeouts
- ideally no network
- Network egress policy for browser tool (allowlist domains if possible).
- Secrets isolation: never expose env vars to the model; redact logs.
- Human-in-the-loop for dangerous actions (deletes, writes to critical paths, external calls).
- Rate limits + auth at the API.
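The "redact logs" rule is easy to enforce with a small filter applied before any tool args or outputs are logged. A sketch with illustrative patterns (extend them for your own secret formats):

```python
import re

# Illustrative patterns; real deployments should cover their own key formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # OpenAI-style API keys
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
]

def redact(text: str) -> str:
    """Mask anything that looks like a credential before it reaches logs."""
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```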
Next steps
This manuscript is a compact "in-one-sitting" book plus starter code. A full-length expansion would add:
- a complete runnable repo (more endpoints, persistent checkpoints, UI polish),
- a hardened sandbox service,
- multi-agent patterns (supervisor/worker, critique loops),
- background jobs (Celery/Redis), and
- testing/evaluation harnesses.
Decisions to settle before expanding: target length (e.g., 250 pages, 12 chapters), preferred LLM provider(s), sandbox strategy (Docker-in-Docker vs. a separate runner service vs. gVisor), and whether to include multi-agent supervisor and RAG chapters.
========================================
Building AI Agents In Action
Architectures, Algorithms, and Source Code
Using LangGraph, FastAPI, Vue, Docker
Table of Contents
Part I: Foundations of AI Agents
Part II: Core Technologies
- LangGraph Deep Dive
- Building APIs with FastAPI
- Vue.js Frontend for Agent Interfaces
- Docker Containerization and Orchestration
Part III: Agent Capabilities
- Tool Systems and Function Calling
- File Operations and Document Processing
- Shell Command Execution
- Browser Automation and Web Scraping
- Code Execution Sandboxes
Part IV: Advanced Agent Patterns
Part V: Production Systems
Part VI: Complete Projects
- Project: Autonomous Research Agent
- Project: Code Generation and Execution Agent
- Project: Data Analysis Agent
- Project: DevOps Automation Agent
Preface
The field of AI agents represents one of the most exciting frontiers in artificial intelligence. Unlike traditional chatbots that simply respond to queries, AI agents can reason, plan, use tools, and take actions to accomplish complex goals. They represent a fundamental shift from passive AI systems to active, autonomous entities capable of interacting with the digital world.
This book is designed to be your comprehensive guide to building production-ready AI agents. We don’t just cover theory—every concept is accompanied by working source code that you can run, modify, and deploy. By the end of this book, you’ll have built multiple complete agent systems and gained deep understanding of the architectures and algorithms that power them.
Who This Book Is For
- Software Engineers looking to add AI agent capabilities to their applications
- AI/ML Engineers wanting to build practical, deployable agent systems
- Technical Architects designing AI-powered automation solutions
- Startup Founders exploring AI agent products
- Students and Researchers seeking hands-on experience with agent development
Prerequisites
- Intermediate Python programming experience
- Basic understanding of REST APIs
- Familiarity with JavaScript/TypeScript
- Basic Docker knowledge (helpful but not required)
- Understanding of LLM concepts (prompts, tokens, etc.)
How to Use This Book
The book is structured in six parts, designed to be read sequentially but also useful as a reference:
- Part I establishes foundational concepts
- Part II covers the core technologies we’ll use throughout
- Part III implements specific agent capabilities
- Part IV explores advanced patterns
- Part V addresses production concerns
- Part VI brings everything together in complete projects
All source code is available at the companion repository. Each chapter builds on previous ones, creating a cohesive learning experience.
Part I: Foundations of AI Agents
Chapter 1: Introduction to AI Agents
1.1 What Are AI Agents?
An AI agent is a system that uses a Large Language Model (LLM) as its reasoning engine to decide what actions to take, execute those actions, observe the results, and continue until a goal is achieved. Unlike simple chatbots, agents can:
- Reason about complex problems
- Plan multi-step solutions
- Use tools to interact with external systems
- Learn from feedback and adjust their approach
- Persist state across interactions
The Agent Loop
At its core, every AI agent follows a fundamental loop:
```
THE AGENT LOOP

INPUT    ◄── user request / goal
  │
  ▼
REASON   ◄── LLM analyzes situation
  │
  ▼
PLAN     ◄── decide next action(s)
  │
  ▼
ACT      ◄── execute tool / take action
  │
  ▼
OBSERVE  ◄── process results
  │
  ▼
DONE? ──no──► loop back to REASON
  │ yes
  ▼
OUTPUT
```
Simple Agent Example
Let’s start with the simplest possible agent to understand the core concepts:
# chapter_01/simple_agent.py
from openai import OpenAI
from typing import Callable
import json
class SimpleAgent:
"""
A minimal agent implementation demonstrating the core agent loop.
"""
def __init__(self, model: str = "gpt-4o"):
self.client = OpenAI()
self.model = model
self.tools: dict[str, Callable] = {}
self.tool_schemas: list[dict] = []
self.conversation_history: list[dict] = []
def register_tool(self, name: str, func: Callable, description: str,
parameters: dict):
"""Register a tool that the agent can use."""
self.tools[name] = func
self.tool_schemas.append({
"type": "function",
"function": {
"name": name,
"description": description,
"parameters": parameters
}
})
def run(self, user_input: str, max_iterations: int = 10) -> str:
"""
Execute the agent loop until completion or max iterations.
"""
# Add user message to history
self.conversation_history.append({
"role": "user",
"content": user_input
})
for iteration in range(max_iterations):
print(f"\n--- Iteration {iteration + 1} ---")
# REASON: Call LLM to decide what to do
response = self.client.chat.completions.create(
model=self.model,
messages=self._get_messages(),
tools=self.tool_schemas if self.tool_schemas else None,
tool_choice="auto"
)
message = response.choices[0].message
# Check if we're done (no tool calls)
if not message.tool_calls:
self.conversation_history.append({
"role": "assistant",
"content": message.content
})
return message.content
# ACT: Execute tool calls
self.conversation_history.append({
"role": "assistant",
"content": message.content,
"tool_calls": [
{
"id": tc.id,
"type": "function",
"function": {
"name": tc.function.name,
"arguments": tc.function.arguments
}
}
for tc in message.tool_calls
]
})
for tool_call in message.tool_calls:
tool_name = tool_call.function.name
tool_args = json.loads(tool_call.function.arguments)
print(f"Calling tool: {tool_name}({tool_args})")
# Execute the tool
if tool_name in self.tools:
result = self.tools[tool_name](**tool_args)
else:
result = f"Error: Unknown tool {tool_name}"
print(f"Tool result: {result}")
# OBSERVE: Add result to history
self.conversation_history.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result)
})
return "Max iterations reached without completion"
def _get_messages(self) -> list[dict]:
"""Build the messages array for the API call."""
system_message = {
"role": "system",
"content": """You are a helpful AI assistant with access to tools.
Use tools when needed to accomplish the user's goal.
Always explain your reasoning before using tools.
When the task is complete, provide a final summary."""
}
return [system_message] + self.conversation_history
# Example usage
def main():
agent = SimpleAgent()
# Register a simple calculator tool
def calculate(expression: str) -> float:
"""Safely evaluate a mathematical expression."""
# In production, use a proper math parser
allowed_chars = set("0123456789+-*/.(). ")
if all(c in allowed_chars for c in expression):
return eval(expression)
raise ValueError("Invalid expression")
agent.register_tool(
name="calculate",
func=calculate,
description="Evaluate a mathematical expression",
parameters={
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "The mathematical expression to evaluate"
}
},
"required": ["expression"]
}
)
# Register a weather tool (simulated)
def get_weather(city: str) -> dict:
"""Get weather for a city (simulated)."""
# In production, call a real weather API
return {
"city": city,
"temperature": 72,
"conditions": "sunny",
"humidity": 45
}
agent.register_tool(
name="get_weather",
func=get_weather,
description="Get current weather for a city",
parameters={
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city name"
}
},
"required": ["city"]
}
)
# Run the agent
result = agent.run(
"What's the weather in San Francisco, and what's 15% tip on a $85 dinner?"
)
print(f"\n=== Final Result ===\n{result}")
if __name__ == "__main__":
main()
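The `calculate` tool above whitelists characters and then calls `eval`, which is fragile (and the comment rightly says not to ship it). A safer sketch walks the parsed expression with Python’s `ast` module and permits only arithmetic nodes; `safe_calculate` is an illustrative name, not part of any library:

```python
import ast
import operator

# Map permitted AST operator nodes to their implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate an arithmetic expression without eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (calls, attribute access, names) is rejected outright.
        raise ValueError(f"Disallowed expression element: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("(1 + 2) * 3"))  # 9
print(safe_calculate("85 * 0.15"))
```

Because only constants and whitelisted operators are interpreted, inputs like `__import__('os')` raise `ValueError` instead of executing code.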
1.2 Evolution of AI Agents
From Chatbots to Agents
The journey from simple chatbots to sophisticated agents represents a fundamental evolution in how we think about AI systems:
Timeline of AI Agent Evolution
═══════════════════════════════════════════════════════════════════
2018-2020: Rule-Based Chatbots
├── Pattern matching and decision trees
├── Limited to predefined flows
└── No real understanding
2020-2022: LLM-Powered Chatbots
├── GPT-3 enables natural conversations
├── Better understanding of context
└── Still reactive, not proactive
2022-2023: Tool-Using Agents
├── ChatGPT Plugins, Function Calling
├── Agents can take actions
└── ReAct, Chain-of-Thought emerge
2023-2024: Autonomous Agents
├── AutoGPT, BabyAGI spark interest
├── Multi-step planning
└── Memory and persistence
2024+: Production Agent Systems
├── LangGraph, CrewAI mature
├── Enterprise deployments
├── Multi-agent orchestration
└── Human-in-the-loop patterns
Key Paradigm Shifts
- From Reactive to Proactive: Agents don’t just respond—they plan and execute
- From Stateless to Stateful: Agents maintain memory and context
- From Text-Only to Multi-Modal: Agents can see, hear, and interact
- From Single-Turn to Multi-Step: Agents break down complex tasks
- From Isolated to Connected: Agents use tools and APIs
1.3 Agent Capabilities Taxonomy
┌─────────────────────────────────────────────────────────────────────┐
│ AGENT CAPABILITIES TAXONOMY │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ REASONING PLANNING │
│ ├── Chain-of-Thought ├── Goal Decomposition │
│ ├── Self-Reflection ├── Task Prioritization │
│ ├── Analogical Reasoning ├── Resource Allocation │
│ └── Causal Inference └── Contingency Planning │
│ │
│ MEMORY TOOLS │
│ ├── Short-term (Context) ├── Information Retrieval │
│ ├── Long-term (Vector DB) ├── Code Execution │
│ ├── Episodic (Events) ├── File Operations │
│ └── Semantic (Knowledge) ├── API Calls │
│ ├── Browser Automation │
│ └── Shell Commands │
│ │
│ LEARNING COMMUNICATION │
│ ├── Few-shot Learning ├── Natural Language │
│ ├── In-context Learning ├── Structured Output │
│ ├── Feedback Integration ├── Multi-modal │
│ └── Self-Improvement └── Human-in-the-Loop │
│ │
└─────────────────────────────────────────────────────────────────────┘
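The memory tiers in the taxonomy can be made concrete with a few lines of code. Below is a minimal sketch of the short-term (context-window) tier only, trimming the oldest messages to stay under a budget; the class name and the character-count stand-in for tokens are illustrative assumptions, not an API from any library:

```python
from collections import deque

class ShortTermMemory:
    """Illustrative context-window memory: keeps the most recent messages
    under a rough character budget (a stand-in for a token budget)."""

    def __init__(self, budget_chars: int = 200):
        self.budget_chars = budget_chars
        self.messages: deque = deque()

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Evict the oldest messages once the budget is exceeded.
        while sum(len(m["content"]) for m in self.messages) > self.budget_chars:
            self.messages.popleft()

    def window(self) -> list[dict]:
        """Return the messages that still fit in the context window."""
        return list(self.messages)

mem = ShortTermMemory(budget_chars=30)
mem.add("user", "first message, soon evicted")
mem.add("user", "second")
mem.add("assistant", "third reply")
print([m["content"] for m in mem.window()])  # ['second', 'third reply']
```

Long-term, episodic, and semantic memory follow the same interface idea but back `add`/`window` with a vector store or database; we build those out in later chapters.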
1.4 The Technology Stack
Throughout this book, we’ll use a carefully selected technology stack:
Backend: Python + FastAPI + LangGraph
# chapter_01/tech_stack_overview.py
"""
Our Core Technology Stack
"""
# LangGraph - Agent Orchestration
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
# FastAPI - REST API Framework
from fastapi import FastAPI, WebSocket
from fastapi.middleware.cors import CORSMiddleware
# Pydantic - Data Validation
from pydantic import BaseModel, Field
# LangChain - LLM Integration
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
# Async Support
import asyncio
from typing import AsyncGenerator
# Example: A minimal FastAPI + LangGraph setup
app = FastAPI(title="AI Agent API")
app.add_middleware(
CORSMiddleware,
    allow_origins=["*"],  # demo only; browsers reject credentialed requests with a "*" origin, so restrict this in production
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
class AgentRequest(BaseModel):
message: str
session_id: str = Field(default="default")
class AgentResponse(BaseModel):
response: str
tool_calls: list[dict] = []
completed: bool
@app.post("/agent/chat", response_model=AgentResponse)
async def chat(request: AgentRequest):
"""Simple agent endpoint."""
# We'll implement this fully in later chapters
return AgentResponse(
response="Agent response placeholder",
tool_calls=[],
completed=True
)
@app.websocket("/agent/stream")
async def stream(websocket: WebSocket):
"""WebSocket endpoint for streaming agent responses."""
await websocket.accept()
# Implementation in Chapter 5
Frontend: Vue.js 3 + TypeScript
// chapter_01/src/types/agent.ts
export interface AgentMessage {
id: string;
role: 'user' | 'assistant' | 'tool';
content: string;
toolCalls?: ToolCall[];
timestamp: Date;
}
export interface ToolCall {
id: string;
name: string;
arguments: Record<string, unknown>;
result?: string;
status: 'pending' | 'running' | 'completed' | 'error';
}
export interface AgentSession {
id: string;
messages: AgentMessage[];
status: 'idle' | 'thinking' | 'acting' | 'completed';
createdAt: Date;
updatedAt: Date;
}
// chapter_01/src/composables/useAgent.ts
import { ref, reactive } from 'vue';
import type { AgentSession, AgentMessage } from '@/types/agent';
export function useAgent() {
const session = reactive<AgentSession>({
id: crypto.randomUUID(),
messages: [],
status: 'idle',
createdAt: new Date(),
updatedAt: new Date(),
});
const isConnected = ref(false);
const error = ref<string | null>(null);
async function sendMessage(content: string): Promise<void> {
session.status = 'thinking';
const userMessage: AgentMessage = {
id: crypto.randomUUID(),
role: 'user',
content,
timestamp: new Date(),
};
session.messages.push(userMessage);
try {
const response = await fetch('/api/agent/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: content,
session_id: session.id,
}),
});
const data = await response.json();
const assistantMessage: AgentMessage = {
id: crypto.randomUUID(),
role: 'assistant',
content: data.response,
toolCalls: data.tool_calls,
timestamp: new Date(),
};
session.messages.push(assistantMessage);
session.status = 'idle';
} catch (e) {
error.value = e instanceof Error ? e.message : 'Unknown error';
session.status = 'idle';
}
}
return {
session,
isConnected,
error,
sendMessage,
};
}
Infrastructure: Docker
# chapter_01/Dockerfile
# Multi-stage build for production
FROM python:3.12-slim as builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.12-slim as production
WORKDIR /app
# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# Copy application code
COPY . .
# Create non-root user
RUN useradd -m -u 1000 agent && chown -R agent:agent /app
USER agent
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# chapter_01/docker-compose.yml
version: '3.8'  # obsolete under the Compose Spec; kept for compatibility with older docker-compose
services:
agent-api:
build:
context: ./backend
dockerfile: Dockerfile
ports:
- "8000:8000"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- REDIS_URL=redis://redis:6379
- DATABASE_URL=postgresql://postgres:postgres@db:5432/agents
depends_on:
- redis
- db
volumes:
- ./backend:/app
- agent-data:/app/data
networks:
- agent-network
agent-frontend:
build:
context: ./frontend
dockerfile: Dockerfile
ports:
- "3000:3000"
environment:
      - VITE_API_URL=http://agent-api:8000  # resolvable only inside the Compose network; a browser needs a public URL or a proxy
depends_on:
- agent-api
networks:
- agent-network
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
networks:
- agent-network
db:
image: postgres:16-alpine
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=agents
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- agent-network
# Code execution sandbox
sandbox:
build:
context: ./sandbox
dockerfile: Dockerfile
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
read_only: true
tmpfs:
- /tmp:size=100M,mode=1777
networks:
- sandbox-network
deploy:
resources:
limits:
cpus: '0.5'
memory: 512M
volumes:
agent-data:
redis-data:
postgres-data:
networks:
agent-network:
driver: bridge
sandbox-network:
driver: bridge
internal: true
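The sandbox service’s hardening options map one-to-one onto `docker run` flags, which is handy when an agent needs to launch throwaway sandboxes itself rather than rely on Compose. A small helper sketch (the function name is ours; the command is built but not executed here):

```python
def sandbox_run_command(image: str, cmd: list[str]) -> list[str]:
    """Build a docker run invocation mirroring the compose sandbox service."""
    return [
        "docker", "run", "--rm",
        "--security-opt", "no-new-privileges:true",  # block privilege escalation
        "--cap-drop", "ALL",                         # drop all Linux capabilities
        "--read-only",                               # immutable root filesystem
        "--tmpfs", "/tmp:size=100M,mode=1777",       # small writable scratch space
        "--cpus", "0.5",                             # CPU quota
        "--memory", "512m",                          # memory ceiling
        "--network", "none",                         # no network (compose uses an internal network instead)
        image, *cmd,
    ]

print(" ".join(sandbox_run_command("sandbox:latest", ["python", "-c", "print(1)"])))
```

Pass the resulting list to `subprocess.run` with a timeout; we return to sandboxing in depth in the deployment chapters.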
1.5 Understanding LLM Function Calling
Function calling (also known as tool use) is the foundation of agent capabilities. Here’s how it works:
# chapter_01/function_calling_deep_dive.py
from openai import OpenAI
import json
from typing import Any
client = OpenAI()
def demonstrate_function_calling():
"""
Demonstrates the complete function calling flow.
"""
# Step 1: Define tools with JSON Schema
tools = [
{
"type": "function",
"function": {
"name": "search_products",
"description": "Search for products in the catalog",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books", "home"],
"description": "Product category filter"
},
"max_price": {
"type": "number",
"description": "Maximum price filter"
},
"in_stock": {
"type": "boolean",
"description": "Only show in-stock items"
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "add_to_cart",
"description": "Add a product to the shopping cart",
"parameters": {
"type": "object",
"properties": {
"product_id": {
"type": "string",
"description": "The product ID"
},
"quantity": {
"type": "integer",
"minimum": 1,
"default": 1,
"description": "Quantity to add"
}
},
"required": ["product_id"]
}
}
}
]
# Step 2: Send request to LLM
messages = [
{
"role": "system",
"content": "You are a helpful shopping assistant."
},
{
"role": "user",
"content": "Find me wireless headphones under $100 and add the best one to cart"
}
]
print("=" * 60)
print("STEP 1: Initial LLM Request")
print("=" * 60)
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto" # or "required" to force tool use
)
message = response.choices[0].message
print(f"LLM Response Type: {'Tool Call' if message.tool_calls else 'Text'}")
if message.tool_calls:
print(f"Number of Tool Calls: {len(message.tool_calls)}")
for i, tool_call in enumerate(message.tool_calls):
print(f"\nTool Call {i + 1}:")
print(f" ID: {tool_call.id}")
print(f" Function: {tool_call.function.name}")
print(f" Arguments: {tool_call.function.arguments}")
# Step 3: Execute tools and collect results
print("\n" + "=" * 60)
print("STEP 2: Execute Tools")
print("=" * 60)
# Add assistant message with tool calls
    # Add the assistant turn; include tool_calls only when present
    # (a literal "tool_calls": None is rejected by the API)
    assistant_msg = {"role": "assistant", "content": message.content}
    if message.tool_calls:
        assistant_msg["tool_calls"] = [
            {
                "id": tc.id,
                "type": "function",
                "function": {
                    "name": tc.function.name,
                    "arguments": tc.function.arguments
                }
            }
            for tc in message.tool_calls
        ]
    messages.append(assistant_msg)
# Execute each tool call
tool_results = []
for tool_call in message.tool_calls or []:
func_name = tool_call.function.name
func_args = json.loads(tool_call.function.arguments)
# Simulate tool execution
if func_name == "search_products":
result = {
"products": [
{
"id": "HP-001",
"name": "Sony WH-1000XM4",
"price": 89.99,
"rating": 4.8,
"in_stock": True
},
{
"id": "HP-002",
"name": "Bose QuietComfort 45",
"price": 99.99,
"rating": 4.7,
"in_stock": True
}
]
}
elif func_name == "add_to_cart":
result = {
"success": True,
"cart_id": "CART-12345",
"message": f"Added {func_args.get('quantity', 1)} x {func_args['product_id']} to cart"
}
else:
result = {"error": f"Unknown function: {func_name}"}
print(f"\nExecuting: {func_name}")
print(f"Arguments: {json.dumps(func_args, indent=2)}")
print(f"Result: {json.dumps(result, indent=2)}")
# Add tool result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result)
})
tool_results.append(result)
# Step 4: Get final response
print("\n" + "=" * 60)
print("STEP 3: Final LLM Response")
print("=" * 60)
final_response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto"
)
final_message = final_response.choices[0].message
if final_message.tool_calls:
print("LLM wants to make more tool calls...")
# In a real agent, we'd loop back
else:
print(f"Final Response:\n{final_message.content}")
return final_message.content
def parallel_function_calling():
"""
Demonstrates parallel function calling where the LLM
requests multiple tools in a single response.
"""
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
},
{
"type": "function",
"function": {
"name": "get_time",
"description": "Get current time for a timezone",
"parameters": {
"type": "object",
"properties": {
"timezone": {"type": "string"}
},
"required": ["timezone"]
}
}
}
]
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": "What's the weather and time in Tokyo, London, and New York?"}
],
tools=tools,
        parallel_tool_calls=True  # let the model return multiple tool calls in one response (the API default)
)
message = response.choices[0].message
print(f"Number of parallel tool calls: {len(message.tool_calls or [])}")
for tc in message.tool_calls or []:
print(f"- {tc.function.name}({tc.function.arguments})")
return message.tool_calls
if __name__ == "__main__":
print("=== FUNCTION CALLING DEMONSTRATION ===\n")
demonstrate_function_calling()
print("\n\n=== PARALLEL FUNCTION CALLING ===\n")
parallel_function_calling()
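One step the demos above gloss over is validating the model-supplied arguments before executing a tool: the LLM emits `function.arguments` as a JSON string, and nothing guarantees it matches the declared schema. A minimal hand-rolled check is sketched below; a real system would more likely use `jsonschema` or Pydantic, and `validate_args` is our own illustrative name:

```python
import json

def validate_args(schema: dict, arguments: str) -> dict:
    """Parse and minimally validate tool-call arguments against a JSON Schema."""
    args = json.loads(arguments)  # may raise json.JSONDecodeError
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"Missing required argument: {name}")
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    for name, value in args.items():
        if name not in props:
            raise ValueError(f"Unexpected argument: {name}")
        expected = type_map.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            raise ValueError(f"Argument {name!r} has wrong type")
        if "enum" in props[name] and value not in props[name]["enum"]:
            raise ValueError(f"Argument {name!r} not in allowed enum values")
    return args

schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "max_price": {"type": "number"}},
    "required": ["query"],
}
print(validate_args(schema, '{"query": "headphones", "max_price": 100}'))
```

Rejecting malformed calls before execution turns a confusing runtime crash inside a tool into an observation the model can read and correct on its next iteration.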
1.6 Agent vs. Chain vs. Workflow
Understanding when to use each pattern:
┌─────────────────────────────────────────────────────────────────────┐
│ CHOOSING THE RIGHT PATTERN │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ CHAIN (Sequential) │
│ ├── Fixed sequence of steps │
│ ├── Predictable execution path │
│ ├── Example: Summarize → Translate → Format │
│ └── Use when: Steps are known ahead of time │
│ │
│ [Input] → [Step 1] → [Step 2] → [Step 3] → [Output] │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ WORKFLOW (DAG) │
│ ├── Parallel and conditional paths │
│ ├── Still deterministic structure │
│ ├── Example: Process multiple docs, merge results │
│ └── Use when: Multiple paths needed but structure known │
│ │
│ ┌→ [B1] ─┐ │
│ [Input] → [A] ┼→ [B2] ─┼→ [C] → [Output] │
│ └→ [B3] ─┘ │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ AGENT (Dynamic) │
│ ├── LLM decides what to do next │
│ ├── Non-deterministic execution │
│ ├── Example: Research a topic autonomously │
│ └── Use when: Can't predict steps needed │
│ │
│ ┌──────────────┐ │
│ │ │ │
│ ▼ │ │
│ [Input] → [LLM] → [Tool] ─┘ │
│ │ │
│ ▼ │
│ [Output] │
│ │
└─────────────────────────────────────────────────────────────────────┘
# chapter_01/patterns_comparison.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
llm = ChatOpenAI(model="gpt-4o")
# =============================================================================
# PATTERN 1: CHAIN (Simple Sequential)
# =============================================================================
def build_chain():
"""A simple chain for document processing."""
summarize_prompt = ChatPromptTemplate.from_template(
"Summarize this text in 2-3 sentences:\n\n{text}"
)
translate_prompt = ChatPromptTemplate.from_template(
"Translate this to Spanish:\n\n{summary}"
)
format_prompt = ChatPromptTemplate.from_template(
"Format this as a professional email:\n\n{translation}"
)
# Chain: summarize → translate → format
chain = (
summarize_prompt
| llm
| StrOutputParser()
| (lambda summary: {"summary": summary})
| translate_prompt
| llm
| StrOutputParser()
| (lambda translation: {"translation": translation})
| format_prompt
| llm
| StrOutputParser()
)
return chain
# =============================================================================
# PATTERN 2: WORKFLOW (DAG with conditions)
# =============================================================================
class WorkflowState(TypedDict):
text: str
category: str
summaries: Annotated[list[str], operator.add]
final_output: str
def build_workflow():
"""A workflow with branching logic."""
def categorize(state: WorkflowState) -> WorkflowState:
"""Categorize the input text."""
prompt = ChatPromptTemplate.from_template(
"Categorize this text as 'technical', 'business', or 'general':\n{text}\n\nCategory:"
)
chain = prompt | llm | StrOutputParser()
        category = chain.invoke({"text": state["text"]}).strip().lower()
        if category not in ("technical", "business", "general"):
            category = "general"  # guard against labels outside the three expected values
        return {"category": category}
def summarize_technical(state: WorkflowState) -> WorkflowState:
"""Technical summary with code focus."""
prompt = ChatPromptTemplate.from_template(
"Create a technical summary focusing on implementation details:\n{text}"
)
chain = prompt | llm | StrOutputParser()
summary = chain.invoke({"text": state["text"]})
return {"summaries": [f"[TECHNICAL]\n{summary}"]}
def summarize_business(state: WorkflowState) -> WorkflowState:
"""Business summary with ROI focus."""
prompt = ChatPromptTemplate.from_template(
"Create a business summary focusing on value and ROI:\n{text}"
)
chain = prompt | llm | StrOutputParser()
summary = chain.invoke({"text": state["text"]})
return {"summaries": [f"[BUSINESS]\n{summary}"]}
def summarize_general(state: WorkflowState) -> WorkflowState:
"""General summary for broad audience."""
prompt = ChatPromptTemplate.from_template(
"Create a general summary accessible to any reader:\n{text}"
)
chain = prompt | llm | StrOutputParser()
summary = chain.invoke({"text": state["text"]})
return {"summaries": [f"[GENERAL]\n{summary}"]}
def route_by_category(state: WorkflowState) -> str:
"""Route to appropriate summarizer."""
return f"summarize_{state['category']}"
def combine_outputs(state: WorkflowState) -> WorkflowState:
"""Combine all summaries."""
combined = "\n\n".join(state["summaries"])
return {"final_output": combined}
# Build the graph
workflow = StateGraph(WorkflowState)
workflow.add_node("categorize", categorize)
workflow.add_node("summarize_technical", summarize_technical)
workflow.add_node("summarize_business", summarize_business)
workflow.add_node("summarize_general", summarize_general)
workflow.add_node("combine", combine_outputs)
workflow.set_entry_point("categorize")
workflow.add_conditional_edges(
"categorize",
route_by_category,
{
"summarize_technical": "summarize_technical",
"summarize_business": "summarize_business",
"summarize_general": "summarize_general"
}
)
workflow.add_edge("summarize_technical", "combine")
workflow.add_edge("summarize_business", "combine")
workflow.add_edge("summarize_general", "combine")
workflow.add_edge("combine", END)
return workflow.compile()
# =============================================================================
# PATTERN 3: AGENT (Dynamic, LLM-driven)
# =============================================================================
class AgentState(TypedDict):
messages: list
current_step: str
iterations: int
final_answer: str
def build_agent():
"""An agent that dynamically decides what to do."""
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
# Define available tools
tools = [
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "analyze_data",
"description": "Analyze numerical data",
"parameters": {
"type": "object",
"properties": {
"data": {"type": "string"},
"analysis_type": {"type": "string"}
},
"required": ["data"]
}
}
},
{
"type": "function",
"function": {
"name": "write_report",
"description": "Write a formatted report",
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string"},
"format": {"type": "string"}
},
"required": ["content"]
}
}
}
]
llm_with_tools = llm.bind_tools(tools)
def call_llm(state: AgentState) -> AgentState:
"""Let the LLM decide what to do."""
response = llm_with_tools.invoke(state["messages"])
return {
"messages": state["messages"] + [response],
"current_step": "execute_tools" if response.tool_calls else "finish",
"iterations": state["iterations"] + 1
}
def execute_tools(state: AgentState) -> AgentState:
"""Execute the tools the LLM requested."""
last_message = state["messages"][-1]
tool_results = []
for tool_call in last_message.tool_calls:
# Simulate tool execution
result = f"Result for {tool_call['name']}: [simulated data]"
tool_results.append(
ToolMessage(
content=result,
tool_call_id=tool_call["id"]
)
)
return {
"messages": state["messages"] + tool_results,
"current_step": "call_llm"
}
def should_continue(state: AgentState) -> str:
"""Determine next step."""
if state["iterations"] >= 10:
return "finish"
return state["current_step"]
def finish(state: AgentState) -> AgentState:
"""Extract final answer."""
last_ai_message = None
for msg in reversed(state["messages"]):
if isinstance(msg, AIMessage) and not msg.tool_calls:
last_ai_message = msg
break
return {
"final_answer": last_ai_message.content if last_ai_message else "No answer"
}
# Build agent graph
agent = StateGraph(AgentState)
agent.add_node("call_llm", call_llm)
agent.add_node("execute_tools", execute_tools)
agent.add_node("finish", finish)
agent.set_entry_point("call_llm")
agent.add_conditional_edges(
"call_llm",
should_continue,
{
"execute_tools": "execute_tools",
"finish": "finish",
"call_llm": "call_llm"
}
)
agent.add_edge("execute_tools", "call_llm")
agent.add_edge("finish", END)
return agent.compile()
1.7 Summary
In this chapter, we’ve established the foundations of AI agents:
- Definition: AI agents are systems that use LLMs to reason, plan, and take actions
- The Agent Loop: Input → Reason → Plan → Act → Observe → Repeat
- Evolution: From chatbots to autonomous, tool-using agents
- Capabilities: Reasoning, planning, memory, tools, learning, communication
- Technology Stack: LangGraph, FastAPI, Vue.js, Docker
- Function Calling: The mechanism that enables agents to use tools
- Patterns: When to use chains, workflows, or agents
In the next chapter, we’ll dive deeper into agent architectures and design patterns that will guide our implementations throughout the book.
Chapter 2: Agent Architectures and Design Patterns
2.1 Fundamental Agent Architectures
The ReAct Architecture
ReAct (Reasoning and Acting) is one of the most influential agent architectures. It interleaves reasoning traces with action execution:
┌─────────────────────────────────────────────────────────────────────┐
│ ReAct ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Question: What is the elevation of the capital of France? │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Thought 1: I need to find the capital of France first. │ │
│ │ Action 1: search["capital of France"] │ │
│ │ Observation 1: Paris is the capital of France. │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ Thought 2: Now I need to find the elevation of Paris. │ │
│ │ Action 2: search["elevation of Paris"] │ │
│ │ Observation 2: Paris has an elevation of 35 meters. │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ Thought 3: I have all the information needed. │ │
│ │ Action 3: finish["The elevation of Paris is 35 meters"] │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
# chapter_02/react_agent.py
from openai import OpenAI
from typing import Callable, Any
import re
import json
class ReActAgent:
"""
Implementation of the ReAct (Reasoning and Acting) agent architecture.
Paper: "ReAct: Synergizing Reasoning and Acting in Language Models"
https://arxiv.org/abs/2210.03629
"""
def __init__(self, model: str = "gpt-4o"):
self.client = OpenAI()
self.model = model
self.tools: dict[str, Callable] = {}
self.tool_descriptions: dict[str, str] = {}
self.max_iterations = 10
def register_tool(self, name: str, func: Callable, description: str):
"""Register a tool for the agent to use."""
self.tools[name] = func
self.tool_descriptions[name] = description
def _build_system_prompt(self) -> str:
"""Build the ReAct system prompt with available tools."""
tools_text = "\n".join([
f"- {name}: {desc}"
for name, desc in self.tool_descriptions.items()
])
return f"""You are a helpful assistant that follows the ReAct pattern.
Available tools:
{tools_text}
For each step, you must use this exact format:
Thought: [Your reasoning about what to do next]
Action: tool_name[arguments as JSON]
Or if you have the final answer:
Thought: [Your reasoning]
Final Answer: [Your complete response to the user]
Rules:
1. Always start with a Thought
2. Use only one Action per step
3. Wait for Observation before the next Thought
4. When you have enough information, provide Final Answer
5. Be concise but thorough in your reasoning"""
def _parse_response(self, text: str) -> dict:
"""Parse the LLM response to extract thought, action, and final answer."""
result = {
"thought": None,
"action": None,
"action_input": None,
"final_answer": None
}
# Extract Thought
thought_match = re.search(r"Thought:\s*(.+?)(?=Action:|Final Answer:|$)", text, re.DOTALL)
if thought_match:
result["thought"] = thought_match.group(1).strip()
# Check for Final Answer
final_match = re.search(r"Final Answer:\s*(.+?)$", text, re.DOTALL)
if final_match:
result["final_answer"] = final_match.group(1).strip()
return result
# Extract Action
action_match = re.search(r"Action:\s*(\w+)\[(.+?)\]", text, re.DOTALL)
if action_match:
result["action"] = action_match.group(1)
try:
result["action_input"] = json.loads(action_match.group(2))
except json.JSONDecodeError:
# Try as simple string
result["action_input"] = action_match.group(2).strip('"\'')
return result
def run(self, question: str) -> str:
"""Execute the ReAct loop."""
messages = [
{"role": "system", "content": self._build_system_prompt()},
{"role": "user", "content": f"Question: {question}"}
]
trajectory = []
for i in range(self.max_iterations):
print(f"\n{'='*60}")
print(f"Iteration {i + 1}")
print('='*60)
# Get LLM response
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=0.1
)
response_text = response.choices[0].message.content
parsed = self._parse_response(response_text)
print(f"\nThought: {parsed['thought']}")
# Check for final answer
if parsed["final_answer"]:
print(f"\nFinal Answer: {parsed['final_answer']}")
trajectory.append({
"thought": parsed["thought"],
"final_answer": parsed["final_answer"]
})
return parsed["final_answer"]
# Execute action
if parsed["action"]:
print(f"Action: {parsed['action']}[{parsed['action_input']}]")
if parsed["action"] in self.tools:
try:
if isinstance(parsed["action_input"], dict):
observation = self.tools[parsed["action"]](**parsed["action_input"])
else:
observation = self.tools[parsed["action"]](parsed["action_input"])
except Exception as e:
observation = f"Error: {str(e)}"
else:
observation = f"Error: Unknown tool '{parsed['action']}'"
print(f"Observation: {observation}")
trajectory.append({
"thought": parsed["thought"],
"action": parsed["action"],
"action_input": parsed["action_input"],
"observation": observation
})
# Add to messages
messages.append({
"role": "assistant",
"content": response_text
})
messages.append({
"role": "user",
"content": f"Observation: {observation}"
})
else:
print("No action found in response")
messages.append({
"role": "assistant",
"content": response_text
})
messages.append({
"role": "user",
"content": "Please provide either an Action or a Final Answer."
})
return "Max iterations reached without finding an answer."
# Example Usage
def main():
agent = ReActAgent()
# Register tools
def search(query: str) -> str:
"""Simulated web search."""
knowledge_base = {
"capital of france": "Paris is the capital and largest city of France.",
"elevation of paris": "Paris has an average elevation of 35 meters (115 ft) above sea level.",
"population of paris": "The population of Paris is approximately 2.1 million in the city proper.",
"eiffel tower height": "The Eiffel Tower is 330 meters (1,083 ft) tall."
}
query_lower = query.lower()
for key, value in knowledge_base.items():
if key in query_lower:
return value
return f"No results found for: {query}"
def calculate(expression: str) -> str:
"""Calculate a mathematical expression."""
try:
result = eval(expression, {"__builtins__": {}}, {})  # restrict the namespace; eval on LLM-supplied input is still risky — prefer a real expression parser in production
return str(result)
except Exception as e:
return f"Calculation error: {e}"
agent.register_tool(
"search",
search,
"Search for information. Input should be a search query string."
)
agent.register_tool(
"calculate",
calculate,
"Calculate mathematical expressions. Input should be a valid Python expression."
)
# Run agent
questions = [
"What is the elevation of the capital of France?",
"If the Eiffel Tower is 330 meters tall and Paris's elevation is 35 meters, what is the total height above sea level of the top of the Eiffel Tower?"
]
for question in questions:
print(f"\n{'#'*60}")
print(f"Question: {question}")
print('#'*60)
answer = agent.run(question)
print(f"\n>>> Final Answer: {answer}")
if __name__ == "__main__":
main()
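The regex in `_parse_response` is the contract between the prompt and the parser. It is worth checking that contract in isolation; the sample LLM reply below is hypothetical, but the pattern and the JSON-then-string fallback are the same ones used above:

```python
import json
import re

# Hypothetical LLM reply in the "Action: tool[input]" format the ReAct prompt requests.
text = 'Thought: I need the elevation of Paris.\nAction: search["elevation of paris"]'

# Same pattern used by _parse_response above.
match = re.search(r"Action:\s*(\w+)\[(.+?)\]", text, re.DOTALL)
action = match.group(1)
try:
    action_input = json.loads(match.group(2))   # JSON payloads parse directly
except json.JSONDecodeError:
    action_input = match.group(2).strip('"\'')  # fall back to a bare string

print(action, "->", action_input)
```

Because the tool input is first tried as JSON, the same pattern handles both `search["query"]` and `calculate[{"expression": "330 + 35"}]` style calls.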
Plan-and-Execute Architecture
This architecture separates planning from execution:
┌─────────────────────────────────────────────────────────────────────┐
│ PLAN-AND-EXECUTE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ PLANNER │ ← Creates high-level plan │
│ │ (LLM) │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ PLAN │ │
│ │ 1. Search for company financial reports │ │
│ │ 2. Extract key metrics from reports │ │
│ │ 3. Analyze trends over the past 5 years │ │
│ │ 4. Compare with industry benchmarks │ │
│ │ 5. Generate summary report │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ EXECUTOR │ ← Executes each step │
│ │ (LLM) │ │
│ └──────┬───────┘ │
│ │ │
│ ├──────▶ Step 1: Execute with tools │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────┐ │
│ │ │ REPLANNER │ ← Adjusts plan if needed │
│ │ └──────────────┘ │
│ │ │ │
│ ◄────────────────────┘ │
│ │ │
│ ├──────▶ Step 2: Execute... │
│ │ │
│ ▼ │
│ [Continue until all steps complete] │
│ │
└─────────────────────────────────────────────────────────────────────┘
# chapter_02/plan_execute_agent.py
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, Callable
import json
class Step(BaseModel):
"""A single step in the plan."""
id: int
description: str
tool: Optional[str] = None
tool_input: Optional[dict] = None
status: str = "pending" # pending, running, completed, failed
result: Optional[str] = None
class Plan(BaseModel):
"""The complete execution plan."""
goal: str
steps: list[Step]
current_step: int = 0
completed: bool = False
class PlanAndExecuteAgent:
"""
An agent that separates planning from execution.
Better for complex, multi-step tasks.
"""
def __init__(self, model: str = "gpt-4o"):
self.client = OpenAI()
self.model = model
self.tools: dict[str, Callable] = {}
self.tool_schemas: list[dict] = []
def register_tool(self, name: str, func: Callable, description: str,
parameters: dict):
"""Register a tool."""
self.tools[name] = func
self.tool_schemas.append({
"name": name,
"description": description,
"parameters": parameters
})
def create_plan(self, goal: str) -> Plan:
"""Create an initial plan for the goal."""
tools_desc = "\n".join([
f"- {t['name']}: {t['description']}"
for t in self.tool_schemas
])
prompt = f"""Create a detailed step-by-step plan to achieve this goal:
Goal: {goal}
Available tools:
{tools_desc}
Respond with a JSON object containing:
{{
"steps": [
{{
"id": 1,
"description": "Step description",
"tool": "tool_name or null",
"tool_input": {{"param": "value"}} or null
}}
]
}}
Rules:
1. Break down complex tasks into simple steps
2. Each step should be achievable with one tool call (or no tool)
3. Order steps logically (dependencies first)
4. Be specific about what each step accomplishes
5. Include a final step to synthesize/present results"""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
response_format={"type": "json_object"}
)
plan_data = json.loads(response.choices[0].message.content)
steps = [Step(**{**step, "id": i + 1}) for i, step in enumerate(plan_data["steps"])]  # override any model-supplied id so numbering stays sequential (and avoid a duplicate-keyword error)
return Plan(goal=goal, steps=steps)
def execute_step(self, plan: Plan, step: Step) -> str:
"""Execute a single step of the plan."""
step.status = "running"
if step.tool and step.tool in self.tools:
try:
if step.tool_input:
result = self.tools[step.tool](**step.tool_input)
else:
result = self.tools[step.tool]()
step.result = str(result)
step.status = "completed"
except Exception as e:
step.result = f"Error: {str(e)}"
step.status = "failed"
else:
# LLM-only step
context = self._build_context(plan)
prompt = f"""Execute this step:
{step.description}
Context from previous steps:
{context}
Provide the result of this step."""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}]
)
step.result = response.choices[0].message.content
step.status = "completed"
return step.result
def should_replan(self, plan: Plan, step: Step) -> bool:
"""Check if we need to adjust the plan after a step."""
if step.status == "failed":
return True
# Ask LLM if replanning is needed
prompt = f"""Analyze if the plan needs adjustment:
Original Goal: {plan.goal}
Completed Step: {step.description}
Step Result: {step.result}
Remaining Steps:
{self._format_remaining_steps(plan)}
Should the plan be adjusted? Respond with JSON:
{{"replan": true/false, "reason": "explanation if true"}}"""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
response_format={"type": "json_object"}
)
result = json.loads(response.choices[0].message.content)
return result.get("replan", False)
def replan(self, plan: Plan, reason: str) -> Plan:
"""Create a new plan based on current progress."""
context = self._build_context(plan)
prompt = f"""The plan needs adjustment.
Original Goal: {plan.goal}
Reason for Replanning: {reason}
Progress So Far:
{context}
Create a new plan to complete the goal from this point.
Respond with JSON containing new steps."""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
response_format={"type": "json_object"}
)
plan_data = json.loads(response.choices[0].message.content)
# Keep completed steps, add new ones
completed_steps = [s for s in plan.steps if s.status == "completed"]
new_steps = [
Step(**{**step, "id": len(completed_steps) + i + 1})  # override model-supplied id to continue numbering
for i, step in enumerate(plan_data.get("steps", []))
]
return Plan(
goal=plan.goal,
steps=completed_steps + new_steps,
current_step=len(completed_steps)
)
def run(self, goal: str, verbose: bool = True) -> str:
"""Execute the full plan-and-execute loop."""
if verbose:
print(f"\n{'='*60}")
print(f"Goal: {goal}")
print('='*60)
# Create initial plan
plan = self.create_plan(goal)
if verbose:
print("\n📋 Initial Plan:")
for step in plan.steps:
print(f" {step.id}. {step.description}")
# Execute steps
while plan.current_step < len(plan.steps):
step = plan.steps[plan.current_step]
if verbose:
print(f"\n▶️ Executing Step {step.id}: {step.description}")
result = self.execute_step(plan, step)
if verbose:
print(f" Result: {result[:200]}...")
# Check if replanning needed
if self.should_replan(plan, step):
if verbose:
print("\n🔄 Replanning needed...")
plan = self.replan(plan, f"Step {step.id} result requires plan adjustment")
if verbose:
print(" New plan created")
else:
plan.current_step += 1
plan.completed = True
# Generate final summary
final_result = self._generate_final_result(plan)
if verbose:
print(f"\n✅ Plan Completed!")
print(f"\n{'='*60}")
print("Final Result:")
print('='*60)
print(final_result)
return final_result
def _build_context(self, plan: Plan) -> str:
"""Build context from completed steps."""
completed = [s for s in plan.steps if s.status == "completed"]
if not completed:
return "No steps completed yet."
return "\n".join([
f"Step {s.id}: {s.description}\nResult: {s.result}"
for s in completed
])
def _format_remaining_steps(self, plan: Plan) -> str:
"""Format remaining steps."""
remaining = [s for s in plan.steps if s.status == "pending"]
if not remaining:
return "No remaining steps."
return "\n".join([f"{s.id}. {s.description}" for s in remaining])
def _generate_final_result(self, plan: Plan) -> str:
"""Generate a final result from all step results."""
context = self._build_context(plan)
prompt = f"""Synthesize the results of all steps into a final answer.
Goal: {plan.goal}
Step Results:
{context}
Provide a comprehensive final answer that addresses the original goal."""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
# Example Usage
def main():
agent = PlanAndExecuteAgent()
# Register tools
def search_web(query: str) -> str:
return f"Search results for '{query}': [Simulated web results]"
def read_file(path: str) -> str:
return f"Contents of {path}: [Simulated file contents]"
def write_file(path: str, content: str) -> str:
return f"Successfully wrote {len(content)} characters to {path}"
def analyze_data(data: str, analysis_type: str) -> str:
return f"Analysis ({analysis_type}) of data: [Simulated analysis results]"
agent.register_tool(
"search_web", search_web,
"Search the web for information",
{"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}
)
agent.register_tool(
"read_file", read_file,
"Read contents of a file",
{"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}
)
agent.register_tool(
"write_file", write_file,
"Write content to a file",
{"type": "object", "properties": {
"path": {"type": "string"},
"content": {"type": "string"}
}, "required": ["path", "content"]}
)
agent.register_tool(
"analyze_data", analyze_data,
"Analyze data with specified analysis type",
{"type": "object", "properties": {
"data": {"type": "string"},
"analysis_type": {"type": "string"}
}, "required": ["data", "analysis_type"]}
)
# Run agent
result = agent.run(
"Research the current state of quantum computing and write a brief summary report"
)
if __name__ == "__main__":
main()
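`create_plan` trusts the planner's JSON as-is. Before constructing `Step` objects, it is cheap to validate the shape the prompt requests. A minimal stdlib sketch (the sample plan below is hypothetical):

```python
import json

# Hypothetical planner output matching the JSON shape create_plan's prompt requests.
raw = """{"steps": [
  {"id": 1, "description": "Search for company financial reports",
   "tool": "search_web", "tool_input": {"query": "ACME annual report"}},
  {"id": 2, "description": "Synthesize a summary", "tool": null, "tool_input": null}
]}"""

plan_data = json.loads(raw)
for i, step in enumerate(plan_data["steps"], start=1):
    assert step["description"], "every step needs a description"
    step["id"] = i  # renumber defensively; models sometimes skip or repeat ids
    if step["tool"] is None:
        assert step["tool_input"] is None, "LLM-only steps take no tool input"

print(len(plan_data["steps"]), "steps validated")
```

Validation failures here are a good trigger for a re-prompt rather than a crash mid-execution.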
2.2 Multi-Agent Architectures
Supervisor Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ SUPERVISOR ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ SUPERVISOR │ │
│ │ (LLM) │ │
│ └──────┬───────┘ │
│ │ │
│ ┌────────────┼────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ AGENT 1 │ │ AGENT 2 │ │ AGENT 3 │ │
│ │ Researcher│ │ Writer │ │ Critic │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ Flow: │
│ 1. User sends request to Supervisor │
│ 2. Supervisor decides which agent(s) to invoke │
│ 3. Agent performs task, returns result │
│ 4. Supervisor routes to next agent or returns to user │
│ │
└─────────────────────────────────────────────────────────────────────┘
# chapter_02/supervisor_agent.py
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from typing import TypedDict, Literal, Annotated
import operator
from pydantic import BaseModel
# State definition
class SupervisorState(TypedDict):
messages: Annotated[list, operator.add]
next: str
final_response: str
# Agent definitions
class AgentConfig(BaseModel):
name: str
system_prompt: str
description: str
AGENTS = {
"researcher": AgentConfig(
name="researcher",
system_prompt="""You are a research agent. Your job is to:
- Search for and gather relevant information
- Verify facts from multiple sources
- Summarize findings clearly
Always be thorough and cite your sources.""",
description="Researches topics and gathers information"
),
"writer": AgentConfig(
name="writer",
system_prompt="""You are a writing agent. Your job is to:
- Take research and information provided
- Create well-structured, engaging content
- Adapt tone and style to the audience
Always write clearly and professionally.""",
description="Creates written content from information"
),
"critic": AgentConfig(
name="critic",
system_prompt="""You are a critical review agent. Your job is to:
- Review content for accuracy, clarity, and quality
- Identify issues and suggest improvements
- Ensure the content meets requirements
Be constructive but thorough in your criticism.""",
description="Reviews and critiques content for quality"
)
}
def create_supervisor_graph():
"""Create a supervisor-based multi-agent system."""
llm = ChatOpenAI(model="gpt-4o")
# Supervisor node
def supervisor(state: SupervisorState) -> SupervisorState:
"""The supervisor decides which agent to invoke next."""
agents_desc = "\n".join([
f"- {name}: {config.description}"
for name, config in AGENTS.items()
])
system_prompt = f"""You are a supervisor managing a team of agents.
Your role is to route tasks to the appropriate agent and synthesize results.
Available agents:
{agents_desc}
Based on the conversation, decide:
1. Which agent should handle the next step (respond with agent name)
2. If the task is complete (respond with "FINISH")
Respond with just the agent name or "FINISH"."""
response = llm.invoke([
SystemMessage(content=system_prompt),
*state["messages"]
])
next_agent = response.content.strip().lower()
if next_agent == "finish" or next_agent not in AGENTS:
return {"next": "finish"}
return {"next": next_agent}
# Create agent nodes
def create_agent_node(agent_name: str):
config = AGENTS[agent_name]
def agent_node(state: SupervisorState) -> SupervisorState:
response = llm.invoke([
SystemMessage(content=config.system_prompt),
*state["messages"],
HumanMessage(content=f"You are the {agent_name}. Complete your part of the task.")
])
return {
"messages": [AIMessage(
content=f"[{agent_name.upper()}]: {response.content}"
)]
}
return agent_node
# Final synthesis node
def synthesize(state: SupervisorState) -> SupervisorState:
"""Synthesize all agent outputs into final response."""
response = llm.invoke([
SystemMessage(content="""Synthesize all the agent contributions into
a final, cohesive response for the user. Be comprehensive but concise."""),
*state["messages"]
])
return {"final_response": response.content}
# Router function
def route(state: SupervisorState) -> str:
return state.get("next", "supervisor")
# Build graph
graph = StateGraph(SupervisorState)
# Add nodes
graph.add_node("supervisor", supervisor)
graph.add_node("synthesize", synthesize)
for agent_name in AGENTS:
graph.add_node(agent_name, create_agent_node(agent_name))
# Set entry point
graph.set_entry_point("supervisor")
# Add edges from supervisor
graph.add_conditional_edges(
"supervisor",
route,
{
**{name: name for name in AGENTS},
"finish": "synthesize"
}
)
# All agents go back to supervisor
for agent_name in AGENTS:
graph.add_edge(agent_name, "supervisor")
# Synthesize ends the graph
graph.add_edge("synthesize", END)
return graph.compile()
def main():
graph = create_supervisor_graph()
# Run the multi-agent system
result = graph.invoke({
"messages": [
HumanMessage(content="""Write a short blog post about the benefits
of AI agents in software development. Make sure it's well-researched and
professionally written.""")
],
"next": "supervisor",
"final_response": ""
})
print("="*60)
print("CONVERSATION TRACE:")
print("="*60)
for msg in result["messages"]:
if hasattr(msg, "content"):
print(f"\n{msg.content[:500]}...")
print("\n" + "="*60)
print("FINAL RESPONSE:")
print("="*60)
print(result["final_response"])
if __name__ == "__main__":
main()
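The conditional-edge mapping above is just a dictionary from the supervisor's decision to the next node. Stripped of LangGraph, the routing logic looks like this sketch; the fallback-to-synthesize guard mirrors the "finish or unknown agent" check in the supervisor node:

```python
# Plain-Python sketch of the supervisor's routing table (no LangGraph needed).
AGENT_NAMES = ["researcher", "writer", "critic"]
ROUTES = {**{name: name for name in AGENT_NAMES}, "finish": "synthesize"}

def route(decision: str) -> str:
    # Normalize the LLM's free-text decision; anything unrecognized ends the
    # run instead of raising a routing error.
    key = decision.strip().lower()
    return ROUTES.get(key if key in ROUTES else "finish")

print(route("Writer"), route("FINISH"), route("garbage"))
```

Normalizing and guarding here matters because the supervisor's output is free text: "Writer", "writer.", and "WRITER" should all route to the same node.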
Hierarchical Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ HIERARCHICAL ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ MANAGER │ │
│ │ (LLM) │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ TEAM LEAD 1 │ │ TEAM LEAD 2 │ │ TEAM LEAD 3 │ │
│ │ Research │ │ Dev/Impl │ │ QA │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐ │
│ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │Worker1│ │Worker2│ │Worker3│ │Worker4│ │Worker5│ │Worker6│ │
│ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │
│ │
│ Benefits: │
│ - Scales to complex tasks │
│ - Clear responsibility chains │
│ - Parallel execution at each level │
│ │
└─────────────────────────────────────────────────────────────────────┘
Collaborative/Debate Architecture
…
========================================
This is a comprehensive blueprint and content preview for the technical book “Building AI Agents In Action.”
Since writing a complete 400+ page technical manuscript in a single response is impossible, I have structured this as a Book Proposal & Technical Guide. It includes the Table of Contents, the Architectural Philosophy, and a complete, code-heavy “Deep Dive” into the Core Chapter, effectively giving you the skeleton and muscle to build this system.
Book Title: Building AI Agents In Action
Subtitle: Architectures, Algorithms, and Source Code Using LangGraph, FastAPI, Vue, and Docker.
Focus: Building Autonomous, Tool-Using Agents with Secure Sandboxing.
📖 Table of Contents
Part 1: The Agentic Foundation
- Chapter 1: Beyond the Chatbot. Understanding the Agent Loop (Thought → Action → Observation → Refinement).
- Chapter 2: The Stack. Setting up Python 3.11+, LangGraph, Docker, and Vue.js.
- Chapter 3: Graph Theory for AI. Why Directed Cyclic Graphs (DCGs) are better than Chains for complex reasoning.
Part 2: Building the “Brain” (LangGraph & Python)
- Chapter 4: The State Machine. Defining AgentState and managing conversation history.
- Chapter 5: The ReAct Pattern. Implementing Reasoning + Acting loops manually in LangGraph.
- Chapter 6: Tool Binding. Connecting LLMs (OpenAI/Anthropic) to Python functions.
Part 3: The “Hands” (Capabilities & Tools)
- Chapter 7: File Operations. Reading, writing, and patching code safely.
- Chapter 8: The Shell. Giving the Agent terminal access (and why this is dangerous).
- Chapter 9: The Eyes. Computer Vision and Browser Automation (using Playwright).
Part 4: The Infrastructure (Docker & Sandboxing)
- Chapter 10: The Sandbox. Designing an ephemeral Docker container for code execution.
- Chapter 11: Security. Preventing prompt injection and escaping the container.
Part 5: The Full Stack Application
- Chapter 12: The API. Building a FastAPI backend with Streaming Responses (SSE).
- Chapter 13: The UI. A Vue.js 3 interface to visualize the Agent’s “Thought Process” vs. “Final Answer.”
- Chapter 14: Deployment. Docker Compose orchestration for the Brain, the API, and the Frontend.
🔬 Sample Content: The “Deep Dive”
Below is a condensed version of Chapter 5, 8, and 12 combined, demonstrating how to build the Core Agent, give it Shell access, and serve it.
1. The Architecture
We are building a Stateful Graph Agent.
- Input: User request.
- Node 1 (Reasoning): LLM decides what to do.
- Edge (Conditional): If tool call is needed → go to Tools. If done → End.
- Node 2 (Tools): Execute Shell/File commands in a Docker Sandbox.
- Loop: Return output to Node 1.
2. The Backend: LangGraph Agent (agent.py)
This code sets up the graph and the state.
from typing import TypedDict, Annotated, List, Union
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
import operator
# --- 1. Define the State ---
# The state is what is passed between nodes. It holds the conversation history.
class AgentState(TypedDict):
messages: Annotated[List[BaseMessage], operator.add]
execution_context: str # e.g., "host" or "sandbox_id"
# --- 2. Define Tools (The "Hands") ---
# In a real app, these interact with a Docker container via an API or subprocess
def execute_shell(command: str):
"""Executes a shell command. DANGEROUS: Use only in Sandbox."""
print(f"Executing: {command}")
# Mocking execution for safety in this snippet
if "ls" in command:
return "file1.txt\nfile2.py\nmain.js"
return f"Executed: {command}"
def read_file(path: str):
return f"Content of {path}"
tools = [execute_shell, read_file]
# --- 3. Setup the Model ---
# We bind tools so the LLM knows it can call them
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
llm_with_tools = llm.bind_tools(tools)
# --- 4. Define Nodes ---
def reasoner_node(state: AgentState):
"""The Brain: Decides next step."""
messages = state['messages']
response = llm_with_tools.invoke(messages)
return {"messages": [response]}
def tool_node(state: AgentState):
"""The Action: Executes the tool calls requested by the LLM."""
last_message = state['messages'][-1]
# Simple tool parser (LangGraph has pre-built ToolNode, but we build manually for learning)
tool_calls = last_message.tool_calls
results = []
for call in tool_calls:
tool_name = call['name']
args = call['args']
if tool_name == "execute_shell":
res = execute_shell(**args)
elif tool_name == "read_file":
res = read_file(**args)
else:
res = "Error: Tool not found."
# Create a ToolMessage to feed back to the LLM
from langchain_core.messages import ToolMessage  # local import for readability; normally placed at module top
results.append(ToolMessage(tool_call_id=call['id'], content=str(res)))
return {"messages": results}
# --- 5. Define Conditional Logic ---
def should_continue(state: AgentState):
"""Decides if we loop back or stop."""
last_message = state['messages'][-1]
if last_message.tool_calls:
return "tools"
return END
# --- 6. Build the Graph ---
workflow = StateGraph(AgentState)
workflow.add_node("reasoner", reasoner_node)
workflow.add_node("tools", tool_node)
workflow.set_entry_point("reasoner")
workflow.add_conditional_edges(
"reasoner",
should_continue,
{
"tools": "tools",
END: END
}
)
workflow.add_edge("tools", "reasoner") # Loop back to reasoner after acting
app = workflow.compile()
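To see the control flow of the compiled graph without an API key, here is a dry run with a scripted stand-in for the LLM. The scripted replies are hypothetical, but the loop mirrors the reasoner → should_continue → tools → reasoner cycle exactly:

```python
# Scripted "LLM" replies: first requests a tool call, then answers and stops.
scripted = [
    {"content": "", "tool_calls": [{"name": "execute_shell",
                                    "args": {"command": "ls"}, "id": "c1"}]},
    {"content": "There are three files.", "tool_calls": []},
]

messages, turn = [], 0
while True:
    reply = scripted[turn]              # reasoner_node: LLM decides next step
    messages.append(reply)
    if not reply["tool_calls"]:         # should_continue: no tool calls -> END
        break
    for call in reply["tool_calls"]:    # tool_node: execute and record result
        messages.append({"tool_result": f"ran: {call['args']['command']}"})
    turn += 1                           # edge back to the reasoner

print(messages[-1]["content"])
```

Tracing the loop this way is also a useful unit-testing pattern: graph logic can be exercised deterministically before any model is attached.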
3. The Docker Sandbox (sandbox.py)
Chapter 10 Concept: Instead of running shell commands on your server (unsafe), we spin up a throwaway Docker container.
import docker
import os
client = docker.from_env()
class Sandbox:
def __init__(self):
self.container = None
def start(self):
# Starts a lightweight Alpine Linux container
self.container = client.containers.run(
"alpine:latest",
command="tail -f /dev/null", # Keep it alive
detach=True,
remove=True # Auto-delete on stop
)
return self.container.id
def execute(self, command):
if not self.container:
raise Exception("Sandbox not started")
# Run command inside the container
exec_log = self.container.exec_run(
["/bin/sh", "-c", command],
workdir="/workspace"
)
return exec_log.output.decode("utf-8")
def stop(self):
if self.container:
self.container.stop()
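The Sandbox above runs a stock Alpine container with no resource limits. docker-py's `containers.run()` accepts several hardening options worth adding in `start()`; they are collected here as a plain dict so the sketch runs without a Docker daemon, and the specific limit values are illustrative assumptions, not prescriptions:

```python
# Hardening options for the sandbox container (docker-py containers.run kwargs).
HARDENED_RUN_KWARGS = dict(
    image="alpine:latest",
    command="tail -f /dev/null",
    detach=True,
    remove=True,
    network_disabled=True,              # no network unless a tool requires it
    mem_limit="256m",                   # cap memory
    pids_limit=128,                     # cap processes (fork-bomb protection)
    read_only=True,                     # read-only root filesystem
    tmpfs={"/workspace": "size=64m"},   # the only writable scratch space
)

# Usage (requires a Docker daemon):
#   container = client.containers.run(**HARDENED_RUN_KWARGS)
print(sorted(k for k in HARDENED_RUN_KWARGS if k not in ("image", "command")))
```

Chapter 11 returns to why these defaults matter: an agent that can run arbitrary shell commands will eventually run a hostile one.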
4. The API Layer: FastAPI (server.py)
This handles the communication between the Vue frontend and the Python agent. We use Server-Sent Events (SSE) so the UI updates as the agent thinks.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from agent import app as agent_app # Import the LangGraph app
from langchain_core.messages import HumanMessage
import json
import asyncio
api = FastAPI()
class Request(BaseModel):
query: str
async def event_generator(query: str):
inputs = {"messages": [HumanMessage(content=query)]}
# Stream the graph execution
async for event in agent_app.astream_events(inputs, version="v1"):
kind = event["event"]
# Detect LLM streaming tokens
if kind == "on_chat_model_stream":
content = event["data"]["chunk"].content
if content:
yield f"data: {json.dumps({'type': 'token', 'content': content})}\n\n"
# Detect Tool Execution
elif kind == "on_tool_start":
yield f"data: {json.dumps({'type': 'status', 'content': 'Running tool...'})}\n\n"
@api.post("/chat")
async def chat_endpoint(req: Request):
return StreamingResponse(event_generator(req.query), media_type="text/event-stream")
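The SSE wire format the endpoint emits is simple enough to verify by hand. This sketch frames two token events exactly as `event_generator` does, then parses them back the way the Vue client will:

```python
import json

def frame(payload: dict) -> str:
    # One SSE event: a "data:" line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"

stream = frame({"type": "token", "content": "Hel"}) + \
         frame({"type": "token", "content": "lo"})

tokens = []
for event in stream.split("\n\n"):      # blank line separates events
    if event.startswith("data: "):
        data = json.loads(event[len("data: "):])
        if data["type"] == "token":
            tokens.append(data["content"])

print("".join(tokens))
```

Keeping the payload as one JSON object per event is what lets the frontend distinguish `token` events (append to the message) from `status` events (update the indicator) with a single parse.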
5. The Frontend: Vue.js 3 (ChatInterface.vue)
This component renders the streaming thoughts and the final markdown.
<template>
<div class="agent-ui">
<div class="chat-window">
<div v-for="(msg, index) in messages" :key="index" :class="msg.role">
<!-- Render Markdown output -->
<div v-html="renderMarkdown(msg.content)"></div>
<!-- Show status indicators for Agent Actions -->
<div v-if="msg.status" class="status-indicator">
⚙️ {{ msg.status }}
</div>
</div>
</div>
<div class="input-area">
<input v-model="userInput" @keyup.enter="sendMessage" placeholder="Ask the agent to check a file..." />
<button @click="sendMessage">Send</button>
</div>
</div>
</template>
<script setup>
import { ref } from 'vue';
import { marked } from 'marked';
const userInput = ref('');
const messages = ref([]);
const sendMessage = async () => {
const text = userInput.value;
userInput.value = '';
messages.value.push({ role: 'user', content: text });
// Create a placeholder for the agent response
const agentMsg = { role: 'agent', content: '', status: '' };
messages.value.push(agentMsg);
const response = await fetch('http://localhost:8000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query: text })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
const lines = chunk.split('\n\n'); // naive framing: a read may split an event in half; production code should buffer partial chunks
lines.forEach(line => {
if (line.startsWith('data: ')) {
const data = JSON.parse(line.replace('data: ', ''));
if (data.type === 'token') {
agentMsg.content += data.content;
} else if (data.type === 'status') {
agentMsg.status = data.content;
}
}
});
}
agentMsg.status = ''; // Clear status when done
};
const renderMarkdown = (text) => marked.parse(text);
</script>
<style scoped>
.agent-ui { display: flex; flex-direction: column; height: 100vh; }
.chat-window { flex: 1; overflow-y: auto; padding: 20px; }
.agent { background: #f0f0f0; padding: 10px; border-radius: 8px; }
.user { background: #e0f7fa; padding: 10px; border-radius: 8px; align-self: flex-end; }
.status-indicator { font-size: 0.8em; color: #666; margin-top: 5px; }
</style>
🛠️ Advanced Topics Covered in the Book
Browser-Use (The “Vision” Chapter)
How to implement a tool that controls a browser.
- Tech: Playwright (Python) + Docker.
- Algorithm:
- Snapshot DOM.
- Send condensed DOM + Accessibility Tree to LLM.
- LLM returns coordinate (x, y) or element ID to click.
- LangGraph executes page.click().
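The condensation step of this algorithm can be sketched with plain dicts; the tree below is a hypothetical stand-in for what Playwright's DOM/accessibility snapshot would return:

```python
# Walk a toy accessibility tree and keep only interactive elements,
# producing the short element list that would be sent to the LLM.
tree = {
    "role": "page",
    "children": [
        {"role": "button", "name": "Submit", "id": 7},
        {"role": "link", "name": "Docs", "id": 12},
        {"role": "text", "name": "Welcome!"},        # non-interactive: dropped
    ],
}

INTERACTIVE = {"button", "link", "textbox"}

def condense(node, out=None):
    out = [] if out is None else out
    if node.get("id") is not None and node["role"] in INTERACTIVE:
        out.append(f'[{node["id"]}] {node["role"]} "{node["name"]}"')
    for child in node.get("children", []):
        condense(child, out)
    return out

print("\n".join(condense(tree)))
```

The numeric IDs are the key trick: the LLM answers with an ID rather than fragile CSS selectors or raw coordinates, and the executor maps the ID back to a real element before clicking.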
Deployment (The “DevOps” Chapter)
A docker-compose.yml to spin up the whole world:
version: '3.8'
services:
backend:
build: ./backend
volumes:
- /var/run/docker.sock:/var/run/docker.sock # Allow agent to spawn sibling containers
ports:
- "8000:8000"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
frontend:
build: ./frontend
ports:
- "80:80"
“Building AI Agents In Action” bridges the gap between simple Python scripts and production-ready, sandboxed, UI-driven AI applications. It moves away from “magic” frameworks and teaches the raw graph architecture required for reliable agents.
========================================
I have created a full book manuscript canvas titled “Building AI Agents In Action” and populated it with a complete, production-oriented structure plus substantive authored content (Preface through Part I, and a detailed blueprint for Parts II–VI).
This is not an outline-only placeholder. It is a foundational draft suitable for iterative expansion into a 100k-word technical book, with:
- Clear architectural philosophy (agent control loops, determinism, failure-first design)
- Explicit system patterns (Planner / Executor / Verifier)
- LangGraph-first agent modeling
- Full-stack scope (LangGraph + FastAPI + Vue + Docker)
- Real operational concerns (sandboxing, browser use, shell, file ops, observability)
Recommended Next Steps (No Action Required, Just Guidance)
If you want to proceed efficiently, the natural continuation paths are:
1. Expand Part IV first: Flesh out Chapters 8–11 with full source code (LangGraph graphs, tool schemas, FastAPI endpoints, Vue components). This is the core differentiator.
2. Lock the Reference Architecture: Define one canonical agent system and reuse it throughout the book to avoid fragmentation.
3. Add “Failure Stories” per Chapter: This will elevate the book from tutorial to production-grade reference.
========================================
AI Agent Development Landscape Research Report
Executive Summary
This comprehensive research report examines the current state of AI agent development, focusing on architectures, frameworks, best practices, and emerging trends. The research was conducted through web search and documentation review to provide a foundation for the technology book “Building AI Agents In Action: Architectures, Algorithms, and Source Code, Using LangGraph, FastAPI, Vue, Docker.”
1. Current AI Agent Development Landscape
1.1 Market Overview
The AI agent development landscape has evolved significantly in 2024-2025, with several key trends emerging:
- Framework Proliferation: Multiple specialized frameworks have emerged, each targeting different use cases and developer needs
- Enterprise Adoption: AI agents are moving from experimental projects to production systems in various industries
- Toolchain Maturation: Development tools, debugging capabilities, and deployment solutions are becoming more sophisticated
- Specialization: Frameworks are becoming more specialized, with clear distinctions between general-purpose and domain-specific solutions
1.2 Key Development Challenges
Based on current research, developers face several challenges in AI agent development:
- Complexity Management: Handling state management, error handling, and debugging in complex agent workflows
- Tool Integration: Seamlessly integrating external tools, APIs, and services with agent systems
- Performance Optimization: Managing latency, cost, and reliability in production environments
- Testing and Validation: Developing robust testing methodologies for non-deterministic AI systems
- Deployment Complexity: Containerization, scaling, and monitoring of AI agent systems
2. Major AI Agent Frameworks
2.1 LangGraph
Core Positioning: Stateful workflow orchestration framework for building complex, state-driven AI applications
Key Features:
- State Management: Built-in support for managing complex state across workflow execution
- Visual Design: Graph-based visualization of workflows and execution paths
- LangChain Integration: Seamless integration with the LangChain ecosystem
- Enterprise Support: Production-ready features for large-scale deployments
Typical Use Cases:
- Conversational systems with memory and context
- Multi-step reasoning and decision-making workflows
- Complex business process automation
- Stateful agent systems requiring persistence
Strengths:
- Excellent state management capabilities
- Strong enterprise features and support
- Comprehensive debugging and observability tools
- Integration with LangSmith for monitoring and evaluation
Limitations:
- Requires familiarity with LangChain ecosystem
- Steeper learning curve for complex workflows
- May be overkill for simple automation tasks
2.2 MetaGPT
Core Positioning: Multi-agent collaboration framework for complex task decomposition and execution
Key Features:
- Role-Based Design: Predefined agent roles with specialized capabilities
- SOP Standardization: Standard Operating Procedures for consistent task execution
- Distributed Architecture: Support for distributed agent deployment
- Complex Task Handling: Built-in mechanisms for breaking down complex tasks
Typical Use Cases:
- Product design and development workflows
- Data analysis pipelines with multiple processing steps
- Research and information synthesis tasks
- Multi-agent coordination scenarios
Strengths:
- Excellent for complex, multi-step tasks
- Strong role-based design patterns
- Good support for distributed execution
- Comprehensive task decomposition capabilities
Limitations:
- Higher debugging complexity
- May require significant configuration for specific use cases
- Performance overhead for simple tasks
2.3 OpenHands
Core Positioning: AI programming assistant framework for code generation and automation
Key Features:
- Natural Language Interface: Code generation through natural language prompts
- Multi-Language Support: Support for multiple programming languages
- IDE Integration: Built-in support for VSCode and other development environments
- Lightweight Deployment: Minimal infrastructure requirements
Typical Use Cases:
- Automated code generation and refactoring
- Script automation and workflow creation
- Development assistance and productivity tools
- Code review and quality improvement
Strengths:
- Excellent developer productivity tools
- Strong integration with development workflows
- Lightweight and easy to deploy
- Good for code-focused automation
Limitations:
- Limited for complex business logic
- Primarily focused on code generation tasks
- May require iterative refinement for complex requirements
2.4 OpenManus
Core Positioning: Lightweight task automation framework for simple workflows
Key Features:
- Rapid Development: Quick setup and deployment capabilities
- Modular Design: Flexible extension through modular components
- MIT License: Commercial-friendly licensing terms
- Simple API: Easy-to-use interface for common automation tasks
Typical Use Cases:
- File processing and data transformation
- Web scraping and data collection
- Simple workflow automation
- Rapid prototyping of agent systems
Strengths:
- Very low learning curve
- Fast development cycles
- Flexible and extensible architecture
- Commercial-friendly licensing
Limitations:
- Limited for complex multi-agent scenarios
- May require manual optimization for complex tasks
- Less comprehensive tooling compared to larger frameworks
3. Architecture Patterns and Best Practices
3.1 State Management Patterns
TypedDict Pattern (LangGraph Approach):
from typing_extensions import TypedDict, NotRequired

class AgentState(TypedDict):
    messages: list[dict[str, str]]
    context: NotRequired[dict]
    metadata: NotRequired[dict]
Pydantic Pattern (Type-Safe Approach):
from pydantic import BaseModel, Field
from typing import List, Optional

class AgentState(BaseModel):
    messages: List[dict] = Field(default_factory=list)
    context: Optional[dict] = None
    metadata: dict = Field(default_factory=dict)
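Either schema can be exercised without any third-party dependency; the sketch below uses the standard library's typing.TypedDict (with total=False standing in for NotRequired) and an invented init_state helper, purely for illustration:

```python
from typing import TypedDict

# total=False makes every key optional, mirroring NotRequired above.
class AgentState(TypedDict, total=False):
    messages: list
    context: dict
    metadata: dict

def init_state() -> AgentState:
    # Initialize only the conversational buffer; optional keys
    # (context, metadata) are filled in lazily by downstream nodes.
    return {"messages": []}

state = init_state()
state["messages"].append({"role": "user", "content": "hello"})
print(len(state["messages"]))  # → 1
```

At runtime a TypedDict is a plain dict, so validation is left to the type checker; the Pydantic variant trades that lightness for runtime validation.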
Best Practices:
- Use structured state definitions for type safety
- Implement proper validation and error handling
- Consider state persistence requirements early
- Design state schemas for extensibility
3.2 Workflow Design Patterns
Linear Workflow Pattern:
Start → Node1 → Node2 → Node3 → End
Conditional Branching Pattern:
Start → Decision Node
          ├─ Condition A → NodeA → End
          └─ Condition B → NodeB → End
Loop Pattern:
Start → Action Node → Decision Node → End
            ↑               │
            └── Continue ───┘
Parallel Execution Pattern:
Start → Fork Node
          ├─ Node1 ─┐
          ├─ Node2 ─┼─ Join Node → End
          └─ Node3 ─┘
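The branching and loop patterns above can be sketched without any framework. The runner below is an illustrative reduction, not a LangGraph API: nodes transform state, and a routing function plays the role of conditional edges.

```python
from typing import Callable

# Each node transforms state; the router picks the next node name.
Node = Callable[[dict], dict]

def run_graph(nodes: dict[str, Node],
              router: Callable[[str, dict], str],
              state: dict, entry: str) -> dict:
    current = entry
    while current != "END":
        state = nodes[current](state)
        current = router(current, state)  # conditional edge
    return state

# Loop pattern: repeat the action node until a threshold, then exit.
nodes = {"action": lambda s: {**s, "count": s["count"] + 1}}
router = lambda name, s: "action" if s["count"] < 3 else "END"

result = run_graph(nodes, router, {"count": 0}, "action")
print(result["count"])  # → 3
```

In LangGraph the same routing decision would live in add_conditional_edges; the point here is only that every pattern in the diagrams reduces to "run node, then choose next node from state".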
3.3 Tool Integration Patterns
Direct Tool Integration:
def search_tool(query: str) -> dict:
    # Direct API call implementation
    pass
Tool Node Pattern (LangGraph):
from langgraph.prebuilt import ToolNode
search_node = ToolNode([search_tool])
External Service Integration:
- REST API integration patterns
- Database connectivity patterns
- Message queue integration
- File system operations
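A framework-free version of the tool-node idea is a name-to-callable registry plus a dispatcher keyed on the LLM's tool-call payload. The tool name and payload shape below are invented for illustration:

```python
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}

def register(name: str):
    # Decorator that adds a function to the tool registry.
    def wrap(fn: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = fn
        return fn
    return wrap

@register("search")
def search_tool(query: str) -> dict:
    # Stub: a real tool would call an external API here.
    return {"query": query, "results": []}

def dispatch(call: dict) -> dict:
    # `call` mirrors a typical LLM tool-call payload:
    # {"name": "<tool>", "args": {...}}
    return TOOLS[call["name"]](**call["args"])

print(dispatch({"name": "search", "args": {"query": "langgraph"}}))
```

LangGraph's ToolNode does essentially this lookup-and-invoke step, plus message bookkeeping, for you.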
3.4 Error Handling Patterns
Graceful Degradation:
try:
    result = llm.invoke(prompt)
except Exception:
    result = fallback_response(prompt)  # degrade to a canned or cached answer
Retry Mechanisms:
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def call_llm_with_retry(prompt):
    return llm.invoke(prompt)
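If pulling in tenacity is undesirable, the same exponential-backoff retry can be sketched with the standard library alone; attempt counts and delays below are arbitrary illustrative choices:

```python
import time
from functools import wraps

def retry(attempts: int = 3, base_delay: float = 1.0):
    # Retry decorator with exponential backoff: delay doubles per failure.
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the error
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return deco

calls = []

@retry(attempts=3, base_delay=0.01)
def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky())  # → ok
```

tenacity adds jitter, per-exception filtering, and async support on top of this basic shape, which is why the book defaults to it.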
Circuit Breaker Pattern:
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure_time = None
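The skeleton above omits the call path. One way to complete it is sketched below; the threshold and timeout values, and the half-open behavior, are illustrative choices rather than a canonical implementation:

```python
import time

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; while open,
    calls are rejected until `reset_timeout` seconds have elapsed."""

    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure_time = None

    def call(self, fn, *args, **kwargs):
        if self.failure_count >= self.failure_threshold:
            if time.time() - self.last_failure_time < self.reset_timeout:
                raise RuntimeError("circuit open: rejecting call")
            self.failure_count = 0  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
            self.failure_count = 0  # success closes the circuit
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            raise

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60)

def always_fails():
    raise ValueError("boom")

for _ in range(2):
    try:
        breaker.call(always_fails)
    except ValueError:
        pass
# The third call is rejected without invoking the function at all:
try:
    breaker.call(always_fails)
except RuntimeError as e:
    print(e)  # → circuit open: rejecting call
```

The breaker's value for agents is that it stops a flapping tool or LLM endpoint from burning retries (and tokens) on every request.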
4. Development Best Practices
4.1 Code Organization
Modular Design:
project/
├── agents/
│   ├── base_agent.py
│   ├── specialized_agent.py
│   └── multi_agent_orchestrator.py
├── tools/
│   ├── search_tools.py
│   ├── data_tools.py
│   └── api_tools.py
├── workflows/
│   ├── simple_workflow.py
│   ├── complex_workflow.py
│   └── conditional_workflow.py
├── state/
│   ├── state_definitions.py
│   └── state_managers.py
└── utils/
    ├── logging.py
    ├── error_handling.py
    └── monitoring.py
4.2 Testing Strategies
Unit Testing:
def test_agent_initialization():
    agent = BaseAgent(config={})
    assert agent.initialized
Integration Testing:
def test_workflow_execution():
    workflow = build_workflow()
    result = workflow.invoke(initial_state)
    assert result["status"] == "completed"
End-to-End Testing:
def test_full_agent_system():
    # Test the complete agent system with all components
    pass
4.3 Performance Optimization
Caching Strategies:
- LLM response caching
- Tool result caching
- State persistence optimization
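For deterministic prompts (temperature 0, fixed model), functools.lru_cache gives a zero-dependency response cache; the fake LLM function and call counter below are stand-ins for a real client:

```python
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=1024)
def cached_llm_call(prompt: str) -> str:
    # Stand-in for a real (expensive) LLM invocation.
    CALLS["count"] += 1
    return f"response to: {prompt}"

cached_llm_call("summarize the report")
cached_llm_call("summarize the report")  # served from cache, no new call
print(CALLS["count"])  # → 1
```

Caching like this is only safe when identical prompts should yield identical answers; for sampled generations, cache tool results rather than model output.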
Batch Processing:
def process_batch(queries: List[str]) -> List[dict]:
    # Batch LLM calls for efficiency
    pass
Async Processing:
async def process_concurrently(tasks: List[Task]):
    results = await asyncio.gather(*[task.execute() for task in tasks])
    return results
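A self-contained version of the concurrent-execution fragment, with invented task payloads simulating I/O-bound tool or LLM calls:

```python
import asyncio

async def execute(task_id: int) -> dict:
    # Simulate an I/O-bound tool or LLM call.
    await asyncio.sleep(0.01)
    return {"task": task_id, "status": "done"}

async def process_concurrently(task_ids: list[int]) -> list[dict]:
    # gather() awaits all coroutines concurrently on one event loop,
    # so total wall time is ~one sleep, not the sum of all sleeps.
    return await asyncio.gather(*(execute(t) for t in task_ids))

results = asyncio.run(process_concurrently([1, 2, 3]))
print([r["task"] for r in results])  # → [1, 2, 3]
```

gather() preserves input order in its result list, which keeps downstream state updates deterministic even though execution interleaves.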
5. Deployment and Operations
5.1 Containerization Patterns
Docker Best Practices:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
Multi-Stage Builds:
# Build stage
FROM python:3.10 AS builder
...