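AI Agent Production Deployment in Practice
Deploy your AI Agent to production so it can actually serve users. This article covers Docker containerization, a FastAPI service, Nginx load balancing, Prometheus monitoring, security configuration, and high availability, and it includes a complete docker-compose configuration and deployment scripts. A hands-on tutorial of 8,000+ characters for operations and backend engineers.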
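The previous two articles covered building and optimizing an AI Agent. This one walks through deploying the Agent to production so it can actually serve users.
🎯 What You Will Learn
- How to choose a suitable deployment architecture
- Containerized deployment with Docker
- Building an API service with FastAPI
- Load balancing and high availability
- Monitoring, logging, and alerting
- Security and access control
- Case study: deploying an intelligent customer service system
Tech stack: Docker, FastAPI, Nginx, Prometheus, Grafana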
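1. Choosing a Deployment Architecture
1.1 Architecture Comparison
| Architecture | Best suited for | Pros | Cons |
|---|---|---|---|
| Single machine | Personal projects, MVPs | Simple and quick to set up | Cannot scale out |
| Containerized | Small and medium projects | Easy to migrate and scale | Requires container knowledge |
| Serverless | Infrequent calls | Pay per use, no server operations | Slow cold starts |
| Kubernetes | Large projects | High availability, autoscaling | High complexity |
1.2 Recommended Architecture
For small and medium projects (recommended):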
┌─────────────────────────────────────┐
│        Nginx (load balancer)        │
└──────────────────┬──────────────────┘
                   │
           ┌───────┴───────┐
           │               │
    ┌──────▼─────┐   ┌─────▼──────┐
    │  Agent 1   │   │  Agent 2   │
    │  (Docker)  │   │  (Docker)  │
    └──────┬─────┘   └─────┬──────┘
           │               │
           └───────┬───────┘
                   │
┌──────────────────▼──────────────────┐
│        Redis (cache + queue)        │
└──────────────────┬──────────────────┘
                   │
┌──────────────────▼──────────────────┐
│        PostgreSQL (database)        │
└─────────────────────────────────────┘
2. Containerizing with Docker
2.1 Dockerfile
# Use the official Python image
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Install system dependencies (curl is needed by the health check below)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list first so this layer is cached between code changes
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY . .
# Expose the service port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
2.2 requirements.txt
fastapi==0.109.0
uvicorn[standard]==0.27.0
langchain==0.1.0
langchain-openai==0.0.2
chromadb==0.4.22
redis==5.0.1
psycopg2-binary==2.9.9
python-dotenv==1.0.0
prometheus-client==0.19.0
pydantic==2.5.3
httpx==0.26.0
2.3 docker-compose.yml
version: '3.8'

services:
  # AI Agent service
  agent:
    build: .
    container_name: ai-agent
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
      # Example credentials; replace them with real secrets in production
      - DATABASE_URL=postgresql://user:password@postgres:5432/agentdb
    depends_on:
      - redis
      - postgres
    volumes:
      - ./logs:/app/logs
      - ./data:/app/data
    restart: unless-stopped
    networks:
      - agent-network

  # Redis cache
  redis:
    image: redis:7-alpine
    container_name: ai-agent-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - agent-network

  # PostgreSQL database
  postgres:
    image: postgres:16-alpine
    container_name: ai-agent-postgres
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=agentdb
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped
    networks:
      - agent-network

  # Nginx load balancer
  nginx:
    image: nginx:alpine
    container_name: ai-agent-nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - agent
    restart: unless-stopped
    networks:
      - agent-network

volumes:
  redis-data:
  postgres-data:

networks:
  agent-network:
    driver: bridge
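The compose file passes configuration to the container through environment variables. One way the application could read them is sketched below, failing fast at startup when a variable is missing instead of failing on the first request. The `Settings` class and `load_settings` function are hypothetical names for this example; only the three variable names come from the compose file.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Runtime configuration read from environment variables."""
    openai_api_key: str
    redis_url: str
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, raising one error that lists every missing variable."""
    required = ("OPENAI_API_KEY", "REDIS_URL", "DATABASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        redis_url=env["REDIS_URL"],
        database_url=env["DATABASE_URL"],
    )
```

Calling `load_settings()` once at import time in main.py would make a misconfigured container exit immediately, which is easier to spot in `docker-compose logs` than a runtime error deep inside a request.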
3. The FastAPI Service
3.1 Main Application (main.py)
from fastapi import FastAPI, HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import logging
import os
from prometheus_client import Counter, Histogram, generate_latest
from fastapi.responses import Response
import time

# Initialize FastAPI
app = FastAPI(
    title="AI Agent API",
    description="Intelligent assistant API service",
    version="1.0.0"
)

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # In production, restrict this to specific domains
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Prometheus metrics
REQUEST_COUNT = Counter(
    'agent_requests_total',
    'Total number of requests',
    ['method', 'endpoint', 'status']
)
REQUEST_DURATION = Histogram(
    'agent_request_duration_seconds',
    'Request duration in seconds',
    ['method', 'endpoint']
)

# Logging configuration
os.makedirs('logs', exist_ok=True)  # FileHandler fails if the directory is missing
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('logs/agent.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Request and response models
class ChatRequest(BaseModel):
    message: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str
    timestamp: float

# Agent instance (global singleton)
from agent import ProjectManagementAgent
agent = ProjectManagementAgent()

# Middleware: request timing
@app.middleware("http")
async def add_process_time_header(request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    # Record metrics
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()
    REQUEST_DURATION.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(process_time)
    response.headers["X-Process-Time"] = str(process_time)
    return response

# Health check
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "timestamp": time.time()
    }

# Readiness check
@app.get("/ready")
async def readiness_check():
    """Readiness check endpoint"""
    try:
        # Check that the Agent is ready
        # Check the database connection
        # Check the Redis connection
        return {"status": "ready"}
    except Exception as e:
        raise HTTPException(status_code=503, detail=str(e))

# Prometheus metrics endpoint
@app.get("/metrics")
async def metrics():
    """Prometheus metrics"""
    return Response(
        content=generate_latest(),
        media_type="text/plain"
    )

# Chat endpoint
@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """
    Handle a user message
    Args:
        request: the chat request
    Returns:
        The Agent's response
    """
    try:
        logger.info(f"Received message: {request.message} (user: {request.user_id})")
        # Call the Agent (a synchronous call; it blocks this worker until it returns)
        response = agent.run(request.message)
        logger.info(f"Agent response: {response[:100]}...")
        return ChatResponse(
            response=response,
            session_id=request.session_id or "default",
            timestamp=time.time()
        )
    except Exception as e:
        logger.error(f"Failed to process message: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=str(e))

# Session management
@app.post("/session/create")
async def create_session(user_id: str):
    """Create a new session"""
    session_id = f"session_{user_id}_{int(time.time())}"
    return {"session_id": session_id}

@app.delete("/session/{session_id}")
async def delete_session(session_id: str):
    """Delete a session"""
    # Clean up session data
    return {"status": "deleted"}

# Startup event
@app.on_event("startup")
async def startup_event():
    """Runs when the application starts"""
    logger.info("AI Agent API starting")
    # Initialize database connections
    # Warm up the model

@app.on_event("shutdown")
async def shutdown_event():
    """Runs when the application shuts down"""
    logger.info("AI Agent API shutting down")
    # Release resources

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
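The `/ready` endpoint above leaves its dependency checks as comments. As a rough sketch of what could go there, the helpers below run a set of named checks and report which dependency is down. `tcp_check`, `run_checks`, and `DEPENDENCY_CHECKS` are hypothetical names for this example; the hostnames and ports are the docker-compose service names used earlier.

```python
import socket
from typing import Callable, Dict, Tuple


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Callable[[], None]:
    """Return a check that raises OSError if host:port does not accept a TCP connection."""
    def check() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return check


def run_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[bool, Dict[str, str]]:
    """Run every check and return (all_passed, per-dependency status)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # any failure marks the dependency as down
            results[name] = f"error: {exc}"
    return all(value == "ok" for value in results.values()), results


# Assumption: the service names and ports from docker-compose.yml.
DEPENDENCY_CHECKS = {
    "redis": tcp_check("redis", 6379),
    "postgres": tcp_check("postgres", 5432),
}
```

Inside `/ready`, `ok, details = run_checks(DEPENDENCY_CHECKS)` followed by a 503 `HTTPException` carrying `details` when `ok` is false would make the endpoint meaningful. A TCP connect only shows that the port is open; it does not prove that credentials work, so a real query such as `SELECT 1` is a stronger check.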
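3.2 Starting the Service
# Development
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Production (multiple worker processes)
# gunicorn is not listed in the requirements.txt above and must be installed separately
gunicorn main:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --timeout 120 \
    --access-logfile logs/access.log \
    --error-logfile logs/error.log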
4. Load Balancing
4.1 Nginx Configuration
nginx.conf
# This file is mounted as /etc/nginx/nginx.conf, so it needs the top-level events and http blocks
events {}

http {
    # Rate-limit zone (limit_req_zone is only valid at the http level)
    limit_req_zone $binary_remote_addr zone=agent_limit:10m rate=10r/s;

    upstream agent_backend {
        least_conn;  # Least-connections load balancing
        # Hostnames must match the service names of the Agent instances (see section 8.1)
        server agent1:8000 max_fails=3 fail_timeout=30s;
        server agent2:8000 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;
        server_name your-domain.com;
        # Redirect to HTTPS
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name your-domain.com;
        # SSL certificate
        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;
        # Logs
        access_log /var/log/nginx/agent_access.log;
        error_log /var/log/nginx/agent_error.log;
        # Rate limiting
        limit_req zone=agent_limit burst=20 nodelay;
        # Proxy configuration
        location / {
            proxy_pass http://agent_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            # Timeouts
            proxy_connect_timeout 60s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
            # WebSocket support
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
        # Health check
        location /health {
            proxy_pass http://agent_backend/health;
            access_log off;
        }
        # Prometheus metrics (internal network only)
        location /metrics {
            allow 10.0.0.0/8;
            deny all;
            proxy_pass http://agent_backend/metrics;
        }
    }
}
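To make the `least_conn`, `max_fails`, and `fail_timeout` settings more concrete, here is a toy model of the selection logic in Python. It is a simplification written for this article, not Nginx's implementation: it counts consecutive failures rather than failures within a time window, and `Backend` and `LeastConnBalancer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    active: int = 0          # requests currently in flight
    fails: int = 0           # consecutive failures (simplified)
    down_until: float = 0.0  # backend is skipped until this timestamp


class LeastConnBalancer:
    """Toy model of least_conn with max_fails / fail_timeout semantics."""

    def __init__(self, backends: List[Backend], max_fails: int = 3, fail_timeout: float = 30.0):
        self.backends = backends
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout

    def pick(self, now: float) -> Optional[Backend]:
        """Return the available backend with the fewest in-flight requests."""
        available = [b for b in self.backends if b.down_until <= now]
        if not available:
            return None
        chosen = min(available, key=lambda b: b.active)
        chosen.active += 1
        return chosen

    def finish(self, backend: Backend, ok: bool, now: float) -> None:
        """Record the end of a request and update failure tracking."""
        backend.active -= 1
        if ok:
            backend.fails = 0
            return
        backend.fails += 1
        if backend.fails >= self.max_fails:
            # Take the backend out of rotation for fail_timeout seconds.
            backend.down_until = now + self.fail_timeout
            backend.fails = 0
```

The point of least-connections for an Agent service is that request durations vary widely: a backend stuck on a long LLM call keeps a high `active` count and receives fewer new requests than it would under round-robin.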
5. Monitoring and Logging
5.1 Prometheus Configuration
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'ai-agent'
    static_configs:
      - targets: ['agent:8000']
    metrics_path: '/metrics'
```
5.2 Grafana Dashboard
{
  "dashboard": {
    "title": "AI Agent Monitoring",
    "panels": [
      {
        "title": "Request rate",
        "targets": [
          {
            "expr": "rate(agent_requests_total[5m])"
          }
        ]
      },
      {
        "title": "Response time (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(agent_request_duration_seconds_bucket[5m])) by (le))"
          }
        ]
      },
      {
        "title": "Error rate (5xx per second)",
        "targets": [
          {
            "expr": "rate(agent_requests_total{status=~\"5..\"}[5m])"
          }
        ]
      }
    ]
  }
}
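`histogram_quantile` estimates a percentile from the cumulative bucket counters that a Prometheus Histogram exports, which is why the estimate depends on the bucket boundaries. The sketch below reproduces the basic idea (find the bucket containing the target rank, then interpolate linearly inside it) under simplifying assumptions; it is an illustration, not Prometheus's code, and the bucket counts in the usage note are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = cumulative - count_below
            # Linear interpolation inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    return math.nan
```

For example, with buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]` the 95th of 100 observations lands halfway through the 0.5 to 1.0 second bucket, so the estimate is about 0.75 seconds. If most Agent responses take several seconds, the default bucket boundaries may be too coarse and custom buckets are worth considering.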
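5.3 Log Aggregation
# Structured logging with structlog (not in the requirements.txt above; install it separately)
import structlog

logger = structlog.get_logger()

# user_id, message, and response_time come from the request being handled
logger.info(
    "agent_request",
    user_id=user_id,
    message=message,
    response_time=response_time,
    status="success"
)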
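If adding a dependency is not an option, a similar one-JSON-object-per-line output can be approximated with the standard `logging` module alone. This is a minimal sketch rather than a replacement for structlog; `JsonFormatter`, the `agent.json` logger name, and the `fields` convention are assumptions of this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the output.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
json_logger = logging.getLogger("agent.json")
json_logger.addHandler(handler)
json_logger.setLevel(logging.INFO)
json_logger.propagate = False  # avoid duplicate lines from the root logger

json_logger.info("agent_request", extra={"fields": {"user_id": "u-1", "status": "success"}})
```

JSON lines are easy for log collectors to parse and index by field, which is what makes filtering by `user_id` or `status` practical once logs from several Agent containers end up in one place.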
6. Security
6.1 API Authentication
import os

from fastapi import Depends, HTTPException, Security
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt  # PyJWT (install separately)

# Signing secret; the JWT_SECRET_KEY variable name is an assumption of this example
SECRET_KEY = os.environ["JWT_SECRET_KEY"]
security = HTTPBearer()

def verify_token(credentials: HTTPAuthorizationCredentials = Security(security)):
    """Verify the JWT token"""
    try:
        token = credentials.credentials
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token has expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/chat")
async def chat(
    request: ChatRequest,
    user=Depends(verify_token)
):
    """Chat endpoint that requires authentication"""
    # Handle the request
    pass
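To show what a call like `jwt.decode(..., algorithms=["HS256"])` has to verify, here is a standard-library sketch of HS256 checking: recompute the HMAC-SHA256 signature over `header.payload`, compare it in constant time, then check `exp`. It is for understanding only and assumes nothing about PyJWT's internals; in a real service, keep using the library. `sign_hs256` and `verify_hs256` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    """Decode base64url data whose padding has been stripped."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Create an HS256-signed token (used here only to produce sample input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    signature = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    try:
        header_b64, body_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(body_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the header claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret.encode(), f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Two details carry most of the security here: the expected algorithm is fixed by the server rather than read from the token, and the signature comparison uses `hmac.compare_digest` so that timing does not leak how many bytes matched.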
6.2 Rate Limiting
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("10/minute")
async def chat(request: Request, chat_request: ChatRequest):
    """Limit each client to 10 requests per minute"""
    pass
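A rule such as "10/minute" per client can be pictured as a sliding window of recent request timestamps for each key. The sketch below implements that idea in memory with an injectable clock; it is one possible algorithm and not necessarily the one slowapi applies. `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return True and record the call if the key is still under its limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

An in-memory limiter like this only sees the traffic of its own process. With several gunicorn workers or Agent containers, each would enforce the limit separately, so a shared store such as Redis is needed for a global limit.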
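6.3 Input Validation
from pydantic import BaseModel, field_validator

class ChatRequest(BaseModel):
    message: str

    # pydantic 2.x (pinned in requirements.txt) uses field_validator; validator is the deprecated v1 API
    @field_validator('message')
    @classmethod
    def validate_message(cls, v: str) -> str:
        if len(v) > 1000:
            raise ValueError('Message must not exceed 1000 characters')
        if not v.strip():
            raise ValueError('Message must not be empty')
        return v.strip()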
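7. Deployment Workflow
7.1 Building the Image
# Build the Docker image
docker build -t ai-agent:latest .
# Push to the image registry
docker tag ai-agent:latest your-registry/ai-agent:latest
docker push your-registry/ai-agent:latest
7.2 Starting the Services
# Start everything with docker-compose
docker-compose up -d
# Follow the logs
docker-compose logs -f agent
# Check status
docker-compose ps
7.3 Rolling Updates
# Pull the new image
docker-compose pull agent
# Rebuild and recreate only the agent service, leaving its dependencies untouched.
# With a single instance the container is briefly unavailable; zero downtime
# requires several instances behind Nginx, updated one at a time.
docker-compose up -d --no-deps --build agent
# Roll back: docker-compose up takes service names, not image tags, so re-tag the
# previous image first. This assumes the service uses image: ai-agent:latest
# (as in section 8.1) rather than build: .
docker tag ai-agent:previous-version ai-agent:latest
docker-compose up -d --no-deps agent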
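When several Agent instances run behind Nginx, a rolling update can be scripted as a loop: update one instance, wait until it is healthy, and only then move on. The sketch below captures that control flow with injected callables so it stays independent of any particular tooling. `rolling_update` and its parameters are hypothetical; in practice `update` might shell out to `docker-compose up -d --no-deps <service>` and `is_healthy` might poll `/health`.

```python
from typing import Callable, List, Sequence


def rolling_update(
    instances: Sequence[str],
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Update instances one at a time; stop and roll back at the first unhealthy one.

    Returns the names of the instances that were updated successfully.
    """
    updated: List[str] = []
    for name in instances:
        update(name)
        if not is_healthy(name):
            # Restore the failing instance and leave the remaining ones on the old version.
            rollback(name)
            break
        updated.append(name)
    return updated
```

Stopping at the first failure keeps at least the untouched instances serving traffic on the old version, which is the property that makes the update safe to attempt.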
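8. High Availability
8.1 Multi-Instance Deployment
# docker-compose.yml
services:
  agent1:
    image: ai-agent:latest
    container_name: agent-1
    # ... configuration
  agent2:
    image: ai-agent:latest
    container_name: agent-2
    # ... configuration
  agent3:
    image: ai-agent:latest
    container_name: agent-3
    # ... configuration
8.2 Health Checks and Automatic Restart
services:
  agent:
    # ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    # Restarts the container when its process exits; under plain Docker Compose
    # an "unhealthy" status alone does not trigger a restart
    restart: unless-stopped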
8.3 Data Backup
#!/bin/bash
# backup.sh
set -euo pipefail
# Back up PostgreSQL
docker exec ai-agent-postgres pg_dump -U user agentdb > backup_$(date +%Y%m%d).sql
# Back up Redis
docker exec ai-agent-redis redis-cli SAVE
docker cp ai-agent-redis:/data/dump.rdb backup_redis_$(date +%Y%m%d).rdb
# Upload to S3
aws s3 cp backup_$(date +%Y%m%d).sql s3://your-bucket/backups/
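Daily backups accumulate, so a retention step usually accompanies a script like this. The following sketch selects the files that are past their retention period based on the date stamp in the file names produced by backup.sh. The 7-day default and the `expired_backups` name are assumptions of this example, not something the script above defines.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, List

# Matches backup_YYYYMMDD.sql and backup_redis_YYYYMMDD.rdb
_BACKUP_NAME = re.compile(r"^backup_(?:redis_)?(\d{8})\.(?:sql|rdb)$")


def expired_backups(filenames: Iterable[str], today: date, keep_days: int = 7) -> List[str]:
    """Return backup files whose date stamp is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        match = _BACKUP_NAME.match(name)
        if not match:
            continue  # ignore files that do not follow the backup naming scheme
        stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        if stamp < cutoff:
            expired.append(name)
    return expired
```

Returning the list instead of deleting inside the function keeps the decision testable; the caller can log the names first and then remove them.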
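9. Performance Optimization
9.1 Connection Pooling
import os

# SQLAlchemy is not in the requirements.txt above; install it separately
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# The same DATABASE_URL variable that docker-compose.yml passes to the container
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,       # connections kept open in the pool
    max_overflow=10,    # extra connections allowed under load
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=3600   # recycle connections older than one hour
)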
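Pool settings multiply across processes: every gunicorn worker imports the application and therefore owns its own pool. A quick calculation, sketched below, shows how the numbers in this article add up against PostgreSQL's default `max_connections` of 100. The function names are hypothetical, and the figures (two Agent instances, four workers each) are taken from the architecture diagram and the gunicorn command earlier.

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL's default max_connections


def worst_case_connections(instances: int, workers: int, pool_size: int, max_overflow: int) -> int:
    """Each worker process owns its own pool, so the per-pool limit multiplies."""
    return instances * workers * (pool_size + max_overflow)


def largest_safe_pool(instances: int, workers: int, server_limit: int, reserved: int = 3) -> int:
    """Largest pool_size + max_overflow per worker that stays under the server limit.

    `reserved` mirrors PostgreSQL's default superuser_reserved_connections of 3.
    """
    return (server_limit - reserved) // (instances * workers)


total = worst_case_connections(instances=2, workers=4, pool_size=20, max_overflow=10)
budget = largest_safe_pool(instances=2, workers=4, server_limit=POSTGRES_DEFAULT_MAX_CONNECTIONS)
print(f"worst case: {total} connections; safe per-worker budget: {budget}")
```

With these inputs the worst case is 240 connections against a server limit of 100, so either the pool has to shrink to roughly 12 connections per worker, `max_connections` has to be raised, or a pooler such as PgBouncer has to sit in between.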
9.2 Caching Strategy
import redis
import json
import hashlib

redis_client = redis.Redis(host='redis', port=6379, decode_responses=True)

def cached_agent_response(message: str, ttl: int = 3600):
    """Cache Agent responses"""
    cache_key = f"agent:{hashlib.md5(message.encode()).hexdigest()}"
    # Try the cache first
    cached = redis_client.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    # Call the Agent
    response = agent.run(message)
    # Store the result in the cache
    redis_client.setex(cache_key, ttl, json.dumps(response))
    return response
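The same cache-aside flow can be exercised without a Redis server by swapping in a small in-memory stand-in, which is handy for unit tests. The sketch below also normalizes the message before hashing so that trivially different inputs share a cache entry. `TTLCache`, `cache_key`, and `cached_call` are hypothetical names, and the normalization (lowercasing, collapsing whitespace) is an assumption that only fits use cases where case does not change the answer.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal in-memory stand-in for Redis GET / SETEX."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            self._items.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def setex(self, key: str, ttl: float, value: str) -> None:
        self._items[key] = (self._clock() + ttl, value)


def cache_key(message: str) -> str:
    """Normalize case and whitespace so near-identical messages share one entry."""
    normalized = " ".join(message.lower().split())
    return "agent:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_call(cache: TTLCache, message: str, compute: Callable[[str], str], ttl: float = 3600) -> str:
    """Cache-aside: return the cached value, or compute, store, and return it."""
    key = cache_key(message)
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = compute(message)
    cache.setex(key, ttl, value)
    return value
```

Caching by message text alone is only safe when the answer does not depend on the user or the conversation history; otherwise the key should also include the user or session identifier.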
9.3 Asynchronous Processing
from celery import Celery

celery_app = Celery('agent', broker='redis://redis:6379/0')

@celery_app.task
def process_message_async(message: str, user_id: str):
    """Process a message in the background"""
    response = agent.run(message)
    # Save the result to the database
    # Send a notification to the user
    return response

@app.post("/chat/async")
async def chat_async(request: ChatRequest):
    """Asynchronous chat endpoint"""
    task = process_message_async.delay(request.message, request.user_id)
    return {"task_id": task.id, "status": "processing"}
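The endpoint above hands back a `task_id`, which implies that the client will later ask for the task's status. The submit-then-poll pattern can be sketched with the standard library alone, using a thread pool in place of Celery. This is an in-process illustration with hypothetical names (`TaskStore`, `submit`, `status`), not a substitute for a real task queue, since its tasks are lost when the process restarts.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class TaskStore:
    """In-process stand-in for a task queue: submit work, then poll by task id."""

    def __init__(self, max_workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks: Dict[str, Future] = {}

    def submit(self, fn: Callable[..., str], *args) -> str:
        """Schedule fn(*args) in the background and return a task id."""
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = self._pool.submit(fn, *args)
        return task_id

    def status(self, task_id: str) -> dict:
        """Report the task state in the same shape as the /chat/async response."""
        future = self._tasks.get(task_id)
        if future is None:
            return {"task_id": task_id, "status": "unknown"}
        if not future.done():
            return {"task_id": task_id, "status": "processing"}
        if future.exception() is not None:
            return {"task_id": task_id, "status": "failed", "error": str(future.exception())}
        return {"task_id": task_id, "status": "done", "result": future.result()}
```

With Celery, the equivalent of `status` would look the task up through a configured result backend; the snippet above configures only a broker, so a result backend would have to be added before results can be fetched by id.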
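10. Troubleshooting
10.1 Common Problems
Problem 1: The container fails to start
# View the logs
docker-compose logs agent
# Validate the configuration
docker-compose config
# Open a shell in the container for debugging (exec needs a running container;
# if it exits immediately, use: docker-compose run --rm agent /bin/bash)
docker-compose exec agent /bin/bash
Problem 2: The API responds slowly
# Inspect the Prometheus metrics
curl http://localhost:8000/metrics
# Inspect database connections
docker-compose exec postgres psql -U user -d agentdb -c "SELECT * FROM pg_stat_activity;"
# Check Redis status
docker-compose exec redis redis-cli INFO
Problem 3: Memory leak
# Monitor memory usage
docker stats
# Restart the service
docker-compose restart agent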
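11. Summary
11.1 Key Takeaways
- Containerization is the standard approach: Docker + docker-compose
- Expose the Agent as a service: FastAPI provides the RESTful API
- Load balancing: Nginx distributes requests
- Monitoring and alerting: Prometheus + Grafana
- Security first: authentication, rate limiting, input validation
11.2 Production Checklist
- Docker image builds successfully
- Environment variables are configured correctly
- Database connection works
- Redis cache is available
- Health check endpoint responds
- Logs are being written
- Prometheus metrics are reachable
- SSL certificate is configured
- Backup strategy is in place
- Monitoring alerts are configured
11.3 Next Steps
- 📊 Part 4: Performance Optimization and Cost Control, on how to lower costs while improving performance
Follow me so you don't miss the next installments! 🚀
If you found this helpful, feel free to like, bookmark, and share! ❤️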