Agent Framework from Scratch, Chapter 8: Deploying and Monitoring Agents

许泽宇的技术分享 · Published 2025-11-29 23:25:23

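Deployment Options

Overview

Once your AI agent is developed and tested, the next step is deploying it to production so that real users can reach it. It is like opening a restaurant: after the menu is finalized you still have to pick a location and a way of running the place. Deploying an agent likewise means choosing a suitable hosting model.

This chapter covers several common deployment options to help you pick the one that fits your needs.

Why Deploy?

During development we usually run the agent locally for testing. That has several drawbacks:

Limited accessibility: only you can reach it
Unreliable: the agent stops as soon as your computer shuts down
Limited performance: bounded by your local machine
Hard to scale: cannot handle many concurrent users

Deploying to the cloud addresses these problems, so that your agent:

Runs 24/7
Is reachable by anyone over the network
Scales automatically with traffic
Benefits from professional monitoring and maintenance

Comparing the Main Deployment Options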

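1. Azure Functions (Serverless Functions)

What Is Azure Functions?

Azure Functions is a "serverless" compute service. You write the code, and Azure takes care of server management, scaling, and similar concerns. It is like ordering takeout: you place the order without worrying about the kitchen or the delivery.

Characteristics:

✅ Pay per use: on the Consumption plan you are billed only while code runs, not while it is idle
✅ Automatic scaling: absorbs traffic peaks automatically
✅ Fast deployment: a deployment takes only a few minutes
✅ Easy to maintain: no servers to manage
⚠️ Cold start: the first call after a long idle period can be slow
⚠️ Execution time limit: on the Consumption plan a single execution is limited to 5 minutes by default and 10 minutes at most

Good fit for:

Agents used intermittently (for example, a few requests per day)
Simple tasks that need a quick response
Projects on a tight budget
Applications with highly variable traffic

Cost example:

The first 1 million executions each month are free
After that, roughly $0.20 per million executions (execution time is billed separately, in GB-seconds)
A good fit for small projects and early-stage applications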
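
The execution-count figures above translate into a simple estimate. The sketch below uses only the two numbers quoted in this section; the class and method names are hypothetical, and it deliberately ignores the separate execution-time (GB-second) charge, so treat the result as a lower bound rather than a bill.

```csharp
using System;

public static class FunctionsCostEstimate
{
    // Figures quoted in this section; actual prices vary by region and over time.
    private const long FreeExecutionsPerMonth = 1_000_000;
    private const decimal PricePerMillionExecutions = 0.20m;

    // Estimated monthly charge for execution count only, in US dollars.
    public static decimal MonthlyExecutionCharge(long executionsPerMonth)
    {
        long billable = Math.Max(0, executionsPerMonth - FreeExecutionsPerMonth);
        return billable / 1_000_000m * PricePerMillionExecutions;
    }
}
```

For example, 3 million executions in a month leave 2 million billable executions, or about $0.40 under these assumptions.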

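2. Azure Web Apps (App Service)

What Is Azure Web Apps?

Azure Web Apps is a fully managed platform for hosting web applications. It is like renting a dedicated storefront: you get a fixed amount of space and resources.

Characteristics:

✅ Always running: no cold-start problem
✅ More control: more runtime parameters can be configured
✅ Supports long-running work: no per-execution time limit
✅ Built-in load balancing: traffic is distributed automatically
⚠️ Fixed cost: you pay even when there is no traffic
⚠️ Sizing required: you have to estimate resource needs up front

Good fit for:

Agents that must run continuously
Applications with steady traffic
Tasks that take a long time to process
Applications that need advanced features such as WebSocket

Cost example:

Basic tier: about $13/month
Standard tier: about $100/month
Premium tier: from about $200/month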

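3. Azure Container Instances (ACI)

What Are Container Instances?

A container is a way of packaging an application together with everything it needs to run. Azure Container Instances lets you run containers quickly without managing virtual machines.

Characteristics:

✅ Fast startup: containers start within seconds
✅ Flexible configuration: resources can be controlled precisely
✅ Per-second billing: you pay only for the time actually used
✅ Easy to migrate: containers move between environments
⚠️ Container knowledge required: you need to know Docker or a similar container technology
⚠️ Manual scaling: you manage scaling yourself

Good fit for:

Applications that need a specific runtime environment
Batch jobs
Cases that need full control over the runtime environment
Projects that are already containerized

Cost example:

Billed by CPU and memory
1 vCPU + 1 GB memory: about $32/month when running continuously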

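4. Azure Kubernetes Service (AKS)

What Is Kubernetes?

Kubernetes is a container orchestration platform that manages the deployment, scaling, and operation of large numbers of containers. It is like the management system of a large shopping mall, coordinating the operation of many stores.

Characteristics:

✅ Powerful orchestration: automated deployment, scaling, and recovery
✅ High availability: automatic failover
✅ Built for scale: can manage hundreds or thousands of containers
✅ Rich ecosystem: many tools and plugins
⚠️ High complexity: steep learning curve
⚠️ Higher cost: the cluster has to run continuously

Good fit for:

Large enterprise applications
Business-critical workloads that need high availability
Microservice architectures
Multi-agent systems that need complex orchestration

Cost example:

Cluster management is free on the free tier
Node (virtual machine) cost: from about $70/month
Suited to large-scale deployments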

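5. On-Premises Deployment

What Is On-Premises Deployment?

The agent runs on your own servers or in your own data center, and you manage everything yourself.

Characteristics:

✅ Full control: you control every aspect
✅ Data security: data never leaves your environment
✅ No cloud cost: no cloud service fees
⚠️ Heavy maintenance burden: you manage hardware, networking, security, and more
⚠️ High upfront investment: hardware and software must be purchased
⚠️ Hard to scale: hardware has to be added manually

Good fit for:

Strict data-compliance requirements
Existing infrastructure that is already in place
Teams that do not want to depend on cloud services
Special network-isolation requirements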

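Deployment Options at a Glance

Feature	Azure Functions	Azure Web Apps	Container Instances	Kubernetes	On-Premises
Deployment difficulty	⭐ Easy	⭐⭐ Moderate	⭐⭐⭐ Fairly hard	⭐⭐⭐⭐⭐ Complex	⭐⭐⭐⭐ Hard
Cost	💰 Very low	💰💰 Moderate	💰💰 Moderate	💰💰💰 Fairly high	💰💰💰💰 High
Scalability	⭐⭐⭐⭐⭐ Automatic	⭐⭐⭐⭐ Automatic	⭐⭐ Manual	⭐⭐⭐⭐⭐ Automatic	⭐ Manual
Startup speed	⚠️ Cold starts	✅ Fast	✅ Fast	✅ Fast	✅ Fast
Execution time	⚠️ Limited	✅ Unlimited	✅ Unlimited	✅ Unlimited	✅ Unlimited
Maintenance effort	⭐ Least	⭐⭐ Low	⭐⭐⭐ Moderate	⭐⭐⭐⭐ High	⭐⭐⭐⭐⭐ Most
Suitable scale	Small to medium	Small to large	Small to medium	Medium to very large	Any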

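How to Choose a Deployment Option

Decision Flowchart

Start
  ↓
Is this a learning or test project?
  ├─ Yes → Azure Functions (the free grant is plenty)
  └─ No ↓
      ↓
Is the expected traffic steady?
  ├─ No (intermittent) → Azure Functions
  └─ Yes ↓
      ↓
Does it need long-running tasks?
  ├─ Yes → Azure Web Apps or Container Instances
  └─ No ↓
      ↓
Are you already using containers?
  ├─ Yes → Container Instances or AKS
  └─ No ↓
      ↓
Is it a large-scale enterprise application?
  ├─ Yes → AKS
  └─ No → Azure Web Apps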
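
The flowchart can also be written down as a small function, which makes the order of the questions explicit. This is only a sketch: the type and member names are hypothetical, and where the chart offers two options ("Azure Web Apps or Container Instances", "Container Instances or AKS") the tie-break chosen here is an assumption, not something the chart prescribes.

```csharp
public enum Hosting { AzureFunctions, AppService, ContainerInstances, Aks }

// Answers to the five questions of the flowchart, in order.
public sealed record Workload(
    bool IsLearningProject,
    bool HasSteadyTraffic,
    bool NeedsLongRunningTasks,
    bool AlreadyContainerized,
    bool IsLargeEnterprise);

public static class HostingAdvisor
{
    public static Hosting Recommend(Workload w)
    {
        if (w.IsLearningProject) return Hosting.AzureFunctions;
        if (!w.HasSteadyTraffic) return Hosting.AzureFunctions;

        // Assumed tie-break: prefer containers only if the project already uses them.
        if (w.NeedsLongRunningTasks)
            return w.AlreadyContainerized ? Hosting.ContainerInstances : Hosting.AppService;

        // Assumed tie-break: AKS only pays off at enterprise scale.
        if (w.AlreadyContainerized)
            return w.IsLargeEnterprise ? Hosting.Aks : Hosting.ContainerInstances;

        return w.IsLargeEnterprise ? Hosting.Aks : Hosting.AppService;
    }
}
```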

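Preparing for Deployment

Whichever option you choose, prepare the following first:

1. Configuration Management

// Never hard-code secrets in source code
// ❌ Wrong
var apiKey = "sk-1234567890abcdef";

// ✅ Right: read from an environment variable or a configuration service
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");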
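
Environment.GetEnvironmentVariable returns null when a variable is not set, so a missing setting often surfaces only later as an unrelated-looking failure. A minimal sketch of a fail-fast check at startup follows; the RequiredSettings helper is hypothetical and not part of any framework.

```csharp
using System;
using System.Collections.Generic;

public static class RequiredSettings
{
    // Returns the values of all requested variables, or throws one exception naming every missing one.
    public static IReadOnlyDictionary<string, string> Load(params string[] names)
    {
        var values = new Dictionary<string, string>();
        var missing = new List<string>();

        foreach (var name in names)
        {
            var value = Environment.GetEnvironmentVariable(name);
            if (string.IsNullOrWhiteSpace(value)) missing.Add(name);
            else values[name] = value;
        }

        if (missing.Count > 0)
            throw new InvalidOperationException("Missing required settings: " + string.Join(", ", missing));

        return values;
    }
}
```

Calling RequiredSettings.Load("OPENAI_API_KEY") once at startup turns a forgotten application setting into an immediate, readable error.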

2. Dependency Check

Make sure every NuGet package is referenced correctly:

<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.AI" Version="9.0.0" />
  <PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />
  <!-- Other dependencies... -->
</ItemGroup>

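3. Error Handling

Add thorough error handling and logging:

try
{
    var response = await agent.RunAsync(userMessage);
    return response;
}
catch (Exception ex)
{
    // Log the error
    logger.LogError(ex, "代理执行失败");
    // Return a friendly error message
    return "抱歉，处理您的请求时出现了问题。";
}

4. Performance Optimization

Use connection pooling
Implement a caching strategy
Optimize database queries
Avoid unnecessary API calls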
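
As one illustration of "implement a caching strategy", the sketch below keeps replies in memory for a fixed time so that an identical prompt does not trigger a second model call. The ReplyCache class is hypothetical, and it makes two simplifying assumptions: identical prompts may share a reply, and concurrent callers with the same prompt may each invoke the producer once before the entry is stored.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class ReplyCache
{
    private readonly ConcurrentDictionary<string, (string Reply, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;

    public ReplyCache(TimeSpan ttl) => _ttl = ttl;

    // Returns the cached reply for `prompt` if it has not expired; otherwise calls `produce` and stores the result.
    public async Task<string> GetOrAddAsync(string prompt, Func<string, Task<string>> produce)
    {
        if (_entries.TryGetValue(prompt, out var hit) && hit.ExpiresAt > DateTimeOffset.UtcNow)
            return hit.Reply;

        var reply = await produce(prompt);
        _entries[prompt] = (reply, DateTimeOffset.UtcNow + _ttl);
        return reply;
    }
}
```

Whether caching is appropriate depends on the agent: it suits FAQ-style prompts far better than conversations whose answers depend on earlier turns.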

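Next Steps

The following sections cover in detail:

The concrete steps for deploying to Azure Functions
How to configure monitoring and logging
How to analyze and optimize performance
How to resolve common problems

Summary

Choosing the right deployment option is key to running an AI agent successfully. For most beginners and small projects, Azure Functions is the best starting point. As the application grows, you can migrate to another option when needed.

Remember:

Start simple and optimize step by step
Choose according to actual needs and avoid over-engineering
Take monitoring and logging seriously; they are the key to troubleshooting
Security always comes first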

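Azure Functions Deployment Guide

Overview

Azure Functions is one of the simplest and most economical ways to deploy an AI agent. This guide walks you step by step through deploying an agent to Azure Functions so that it can be reached over HTTP.

It is like moving your agent from a local "workshop" into a cloud "office" where users anywhere can reach it.

Prerequisites

Before you start, make sure that you have:

✅ The .NET 8.0 SDK or later installed
✅ Azure Functions Core Tools installed
✅ An Azure account (a free trial works)
✅ The Azure CLI installed (optional but recommended)
✅ An agent project that already runs correctly

Installing Azure Functions Core Tools

Windows (using npm):

npm install -g azure-functions-core-tools@4 --unsafe-perm true

macOS (using Homebrew):

brew tap azure/functions
brew install azure-functions-core-tools@4

Verify the installation:

func --version

You should see a version number of the form 4.x.x.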

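Step 1: Create the Azure Functions Project

1.1 Create the Project Structure

Open a terminal and create a new Functions project:

# Create the project folder
mkdir MyAgentFunction
cd MyAgentFunction

# Initialize the Functions project
func init --worker-runtime dotnet-isolated --target-framework net8.0

This creates the following files:

host.json - Functions runtime configuration
local.settings.json - local development settings
.gitignore - Git ignore file
MyAgentFunction.csproj - project file

1.2 Add the Required NuGet Packages

Edit MyAgentFunction.csproj and add the packages you need: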

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <AzureFunctionsVersion>v4</AzureFunctionsVersion>
    <OutputType>Exe</OutputType>
  </PropertyGroup>
  
  <ItemGroup>
    <!-- Azure Functions core packages -->
    <PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.21.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.17.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.1.0" />
    
    <!-- AI agent packages -->
    <!-- The AsChatClient extension used later ships in Microsoft.Extensions.AI.OpenAI; reference a version that matches Microsoft.Extensions.AI -->
    <PackageReference Include="Microsoft.Extensions.AI" Version="9.0.0" />
    <PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />
    
    <!-- Logging and configuration -->
    <PackageReference Include="Microsoft.Extensions.Logging" Version="8.0.0" />
    <PackageReference Include="Microsoft.Extensions.Configuration" Version="8.0.0" />
  </ItemGroup>
</Project>

Restore the packages:

dotnet restore

Step 2: Write the Function Code

2.1 Create a Simple Agent Function

Create the file AgentFunction.cs:

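// The System.* usings are required when ImplicitUsings is not enabled in the project file
using System;
using System.IO;
using System.Net;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.AI.OpenAI;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;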

namespace MyAgentFunction;

public class AgentFunction
{
    private readonly ILogger<AgentFunction> _logger;
    private readonly IChatClient _chatClient;

    public AgentFunction(ILogger<AgentFunction> logger)
    {
        _logger = logger;
        
        // Read configuration from environment variables
        var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
        var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY");
        var deploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT");
        
        // Create the Azure OpenAI client
        var client = new AzureOpenAIClient(
            new Uri(endpoint!),
            new System.ClientModel.ApiKeyCredential(apiKey!)
        );
        
        // AsChatClient and CompleteAsync are the names used by preview releases of Microsoft.Extensions.AI;
        // later releases renamed them (AsIChatClient, GetResponseAsync)
        _chatClient = client.AsChatClient(deploymentName!);
    }

    [Function("Chat")]
    public async Task<HttpResponseData> RunAsync(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
    {
        _logger.LogInformation("收到聊天请求");

        try
        {
            // Read the request body
            var requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            // System.Text.Json matches property names case-sensitively by default;
            // this option lets {"message": "..."} bind to ChatRequest.Message
            var request = JsonSerializer.Deserialize<ChatRequest>(
                requestBody,
                new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

            if (request == null || string.IsNullOrEmpty(request.Message))
            {
                return await CreateErrorResponse(req, "请求消息不能为空", HttpStatusCode.BadRequest);
            }

            _logger.LogInformation("用户消息: {Message}", request.Message);

            // Call the agent
            var response = await _chatClient.CompleteAsync(request.Message);
            var agentReply = response.Message.Text;

            _logger.LogInformation("代理回复: {Reply}", agentReply);

            // Return the success response
            var httpResponse = req.CreateResponse(HttpStatusCode.OK);
            await httpResponse.WriteAsJsonAsync(new ChatResponse
            {
                Reply = agentReply,
                Success = true
            });

            return httpResponse;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "处理请求时发生错误");
            return await CreateErrorResponse(req, $"处理失败: {ex.Message}", HttpStatusCode.InternalServerError);
        }
    }

    private async Task<HttpResponseData> CreateErrorResponse(
        HttpRequestData req, 
        string message, 
        HttpStatusCode statusCode)
    {
        var response = req.CreateResponse(statusCode);
        await response.WriteAsJsonAsync(new ChatResponse
        {
            Reply = message,
            Success = false
        });
        return response;
    }
}

// Request and response models
public class ChatRequest
{
    public string Message { get; set; } = string.Empty;
}

public class ChatResponse
{
    public string Reply { get; set; } = string.Empty;
    public bool Success { get; set; }
}
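
One detail of the request handling is easy to miss: the test requests in this guide send a lowercase "message" key, while the model property is named Message, and System.Text.Json matches property names case-sensitively unless told otherwise. The sketch below isolates that parsing step; the ChatRequestDto and ChatRequestParser names are hypothetical stand-ins so that the example is self-contained.

```csharp
using System.Text.Json;

public sealed class ChatRequestDto
{
    public string Message { get; set; } = string.Empty;
}

public static class ChatRequestParser
{
    // Case-insensitive matching lets {"message": "..."} bind to the Message property.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Returns null when the body is not valid JSON or carries no message.
    public static ChatRequestDto? Parse(string body)
    {
        try
        {
            var request = JsonSerializer.Deserialize<ChatRequestDto>(body, Options);
            return string.IsNullOrEmpty(request?.Message) ? null : request;
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

Treating malformed JSON as a validation failure also lets the function answer with 400 Bad Request instead of falling into the generic 500 handler.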

2.2 Configure Program.cs

Create or edit Program.cs:

using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Azure.Functions.Worker;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
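        // Add Application Insights (optional; requires the Microsoft.ApplicationInsights.WorkerService
        // and Microsoft.Azure.Functions.Worker.ApplicationInsights packages)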
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

await host.RunAsync();

Step 3: Test Locally

3.1 Configure Local Settings

Edit local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated",
    "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com/",
    "AZURE_OPENAI_API_KEY": "your-api-key-here",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-4"
  }
}

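⚠️ Important:

local.settings.json is for local development only
Do not commit this file to Git (it is already listed in .gitignore)
In production, use the Function App's application settings in Azure

3.2 Run Locally

Start the Functions host:

func start

You should see output similar to: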

Azure Functions Core Tools
Core Tools Version:       4.0.5455
Function Runtime Version: 4.27.5.21554

Functions:
        Chat: [POST] http://localhost:7071/api/Chat

3.3 Test the Function

Test with curl or Postman:

curl -X POST http://localhost:7071/api/Chat \
  -H "Content-Type: application/json" \
  -d '{"message": "你好，请介绍一下自己"}'

Expected response:

{
  "reply": "你好！我是一个 AI 助手...",
  "success": true
}

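Step 4: Deploy to Azure

4.1 Create the Azure Resources

Option 1: Azure Portal (graphical interface)

Sign in to the Azure Portal
Click "Create a resource"
Search for "Function App"
Click "Create"
Fill in the following:
- Subscription: select your subscription
- Resource group: create a new one or select an existing one
- Function App name: for example my-agent-func (must be globally unique)
- Runtime stack: .NET
- Version: 8 (LTS), isolated worker model
- Region: the region closest to you
- Operating system: Windows or Linux
- Plan type: Consumption (serverless)
Click "Review + create"
Click "Create"

Wait a few minutes for the resources to be created.

Option 2: Azure CLI (command line)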

# Sign in to Azure
az login

# Create the resource group
az group create --name MyAgentResourceGroup --location eastus

# Create the storage account (required by Functions; the name must be globally unique)
az storage account create \
  --name myagentstorage123 \
  --resource-group MyAgentResourceGroup \
  --location eastus \
  --sku Standard_LRS

# Create the Function App
az functionapp create \
  --resource-group MyAgentResourceGroup \
  --consumption-plan-location eastus \
  --runtime dotnet-isolated \
  --runtime-version 8 \
  --functions-version 4 \
  --name my-agent-func \
  --storage-account myagentstorage123

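4.2 Configure Application Settings

In the Azure Portal:

Open your Function App
Select "Configuration" in the left-hand menu
Click "New application setting"
Add the following settings:

Name	Value
`AZURE_OPENAI_ENDPOINT`	`https://your-resource.openai.azure.com/`
`AZURE_OPENAI_API_KEY`	`your-api-key`
`AZURE_OPENAI_DEPLOYMENT`	`gpt-4`

Click "Save"

Or use the CLI: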

az functionapp config appsettings set \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup \
  --settings \
    AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" \
    AZURE_OPENAI_API_KEY="your-api-key" \
    AZURE_OPENAI_DEPLOYMENT="gpt-4"

4.3 Deploy the Code

Option 1: Visual Studio Code

Install the Azure Functions extension
Open the project in VS Code
Click the Azure icon in the sidebar
In the Functions section, click "Deploy to Function App"
Select your subscription and Function App
Confirm the deployment

Option 2: Azure Functions Core Tools

# Publish to Azure
func azure functionapp publish my-agent-func

Option 3: Azure CLI

# Build the project first
dotnet publish --configuration Release

# Create the deployment package
cd bin/Release/net8.0/publish
zip -r ../deploy.zip .

# Deploy
az functionapp deployment source config-zip \
  --resource-group MyAgentResourceGroup \
  --name my-agent-func \
  --src ../deploy.zip

4.4 Get the Function URL

After the deployment finishes, retrieve the Function's URL:

az functionapp function show \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup \
  --function-name Chat \
  --query "invokeUrlTemplate" \
  --output tsv

Or in the Azure Portal:

Open the Function App
Select "Functions"
Click "Chat"
Click "Get function URL"

The URL looks like this:

https://my-agent-func.azurewebsites.net/api/Chat?code=xxxxx

Step 5: Test the Deployed Function

5.1 Test with curl

curl -X POST "https://my-agent-func.azurewebsites.net/api/Chat?code=your-function-key" \
  -H "Content-Type: application/json" \
  -d '{"message": "你好"}'

5.2 Test with Postman

Create a new POST request
URL: https://my-agent-func.azurewebsites.net/api/Chat?code=your-function-key
Headers: Content-Type: application/json
Body (raw JSON):

{
  "message": "你好，请介绍一下自己"
}

Click "Send"

5.3 Create a Simple Test Page

Create test.html:

<!DOCTYPE html>
<html>
<head>
    <title>AI Agent Test</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 600px;
            margin: 50px auto;
            padding: 20px;
        }
        #chat-box {
            border: 1px solid #ccc;
            height: 300px;
            overflow-y: auto;
            padding: 10px;
            margin-bottom: 10px;
        }
        .message {
            margin: 10px 0;
            padding: 8px;
            border-radius: 5px;
        }
        .user {
            background-color: #e3f2fd;
            text-align: right;
        }
        .agent {
            background-color: #f5f5f5;
        }
        #input-box {
            display: flex;
            gap: 10px;
        }
        #message-input {
            flex: 1;
            padding: 10px;
        }
        button {
            padding: 10px 20px;
            background-color: #2196F3;
            color: white;
            border: none;
            cursor: pointer;
        }
        button:hover {
            background-color: #0b7dda;
        }
    </style>
</head>
<body>
    <h1>AI Agent Test</h1>
    <div id="chat-box"></div>
    <div id="input-box">
        <input type="text" id="message-input" placeholder="Type a message..." />
        <button onclick="sendMessage()">Send</button>
    </div>

    <script>
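        // Replace with your Function URL.
        // For local testing only: a function key embedded in a web page is visible to every visitor.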
        const FUNCTION_URL = 'https://my-agent-func.azurewebsites.net/api/Chat?code=your-function-key';

        async function sendMessage() {
            const input = document.getElementById('message-input');
            const message = input.value.trim();
            
            if (!message) return;

            // Show the user's message
            addMessage(message, 'user');
            input.value = '';

            try {
                // Call the Function
                const response = await fetch(FUNCTION_URL, {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json'
                    },
                    body: JSON.stringify({ message: message })
                });

                const data = await response.json();
                
                // Show the agent's reply
                if (data.success) {
                    addMessage(data.reply, 'agent');
                } else {
                    addMessage('Error: ' + data.reply, 'agent');
                }
            } catch (error) {
                addMessage('Request failed: ' + error.message, 'agent');
            }
        }

        function addMessage(text, type) {
            const chatBox = document.getElementById('chat-box');
            const messageDiv = document.createElement('div');
            messageDiv.className = `message ${type}`;
            messageDiv.textContent = text;
            chatBox.appendChild(messageDiv);
            chatBox.scrollTop = chatBox.scrollHeight;
        }

        // Send on Enter
        document.getElementById('message-input').addEventListener('keypress', function(e) {
            if (e.key === 'Enter') {
                sendMessage();
            }
        });
    </script>
</body>
</html>

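Step 6: Monitor and Debug

6.1 View Logs

In the Azure Portal:

Open the Function App
Select "Monitoring" > "Log stream"
Watch the log output in real time

Or use the CLI:

az webapp log tail --name my-agent-func --resource-group MyAgentResourceGroup

6.2 View Execution History

Open the Function App
Select "Functions" > "Chat"
Click "Monitor"
Review the invocation history, success rate, execution time, and so on

6.3 Troubleshooting Common Problems

Problem 1: The Function returns a 500 error

Check that the application settings are configured correctly
Look at the detailed error message in the logs
Confirm that the API key is valid

Problem 2: Long cold-start time

This is expected: the first call after an idle period has to start a new instance
Consider upgrading to the Premium plan to get pre-warmed instances
Or use Azure Web Apps

Problem 3: Timeout errors

The default timeout on the Consumption plan is 5 minutes
HTTP-triggered functions must additionally respond within about 230 seconds, regardless of this setting
Raise the timeout in host.json (up to 10 minutes on the Consumption plan):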

{
  "version": "2.0",
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxOutstandingRequests": 200,
      "maxConcurrentRequests": 100,
      "dynamicThrottlesEnabled": true
    }
  },
  "functionTimeout": "00:10:00"
}

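Advanced Configuration

Enable CORS (Cross-Origin Resource Sharing)

If the Function needs to be called from a web page:

az functionapp cors add \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup \
  --allowed-origins "*"

⚠️ In production, list the specific origins instead of *

Add a Custom Domain

Select "Custom domains" in the Function App
Add your domain name
Configure the DNS records
Add an SSL certificate

Configure Automatic Scaling

The Consumption plan scales automatically, but you can cap the number of instances:

az resource update \
  --resource-type Microsoft.Web/sites \
  --resource-group MyAgentResourceGroup \
  --name my-agent-func/config/web \
  --set properties.functionAppScaleLimit=10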
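Cost Optimization Tips

Use the Consumption plan: pay only for actual usage
Optimize the code: reduce execution time
Add caching: avoid repeated API calls
Set budget alerts: configure cost alerts in the Azure Portal
Monitor usage: review execution counts and cost regularly

Summary

Congratulations! You have deployed your AI agent to Azure Functions. Your agent now:

✅ Is reachable over HTTP from anywhere
✅ Scales automatically with traffic
✅ Costs money only while it is used
✅ Has monitoring and logging in place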

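Monitoring Configuration Guide

Overview

Once the agent is deployed, monitoring its health is essential. Just as driving requires a dashboard, running an AI agent requires watching a set of indicators so that problems are found and fixed promptly.

This chapter shows how to monitor your AI agent with OpenTelemetry and Azure Application Insights.

Why Monitor?

Imagine your agent runs into one of these situations:

🐌 Responses suddenly slow down
❌ Errors are returned frequently
💰 API costs rise abnormally
🔥 One feature is suddenly used heavily

Without monitoring you may never learn that these problems exist. Monitoring helps you:

Detect problems early: notice anomalies before users complain
Analyze performance: learn which operations are slowest and need optimizing
Trace errors: locate the root cause of an error quickly
Optimize cost: understand resource usage and reduce spending
Improve the experience: make improvement decisions based on data

The Three Pillars of Monitoring

1. Logs

Records of what happened, for example:

"User sent a message: hello"
"OpenAI API call succeeded"
"Error: connection timed out"

2. Metrics

Numeric performance data, for example:

Requests per minute: 150
Average response time: 2.3 seconds
Error rate: 0.5%

3. Traces

The full path of a request, for example:

User request → call the agent → call OpenAI → return the result
The time spent in each step

Introduction to OpenTelemetry

What Is OpenTelemetry?

OpenTelemetry (OTel for short) is an open-source observability framework. Think of it as a "monitoring toolbox" that collects an application's logs, metrics, and traces.

Advantages:

🌍 An industry standard with broad support
🔌 Compatible with many monitoring platforms
📊 Provides rich telemetry data
🆓 Completely free and open source

How OpenTelemetry Works

Your code
   ↓
OpenTelemetry SDK (collects data)
   ↓
OpenTelemetry Exporter (exports data)
   ↓
Monitoring platform (Azure Application Insights, Prometheus, and so on)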

Configuring OpenTelemetry

Step 1: Install the Required Packages

Add the NuGet packages to your Function project:

dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Exporter.Console
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package Azure.Monitor.OpenTelemetry.Exporter

Or add them to the .csproj file:

<ItemGroup>
  <!-- OpenTelemetry core packages -->
  <PackageReference Include="OpenTelemetry" Version="1.7.0" />
  <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.7.0" />
  
  <!-- Automatic instrumentation -->
  <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.7.0" />
  
  <!-- Exporters -->
  <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.7.0" />
  <PackageReference Include="Azure.Monitor.OpenTelemetry.Exporter" Version="1.2.0" />
</ItemGroup>

Step 2: Configure OpenTelemetry

Edit Program.cs:

using System;
using Azure.Monitor.OpenTelemetry.Exporter;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using OpenTelemetry;
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
        // Configure OpenTelemetry
        services.AddOpenTelemetry()
            .ConfigureResource(resource => resource
                .AddService(
                    serviceName: "MyAgentFunction",
                    serviceVersion: "1.0.0"))
            .WithTracing(tracing => tracing
                // Register the custom trace source
                .AddSource("MyAgentFunction")
                // Trace outgoing HTTP calls automatically
                .AddHttpClientInstrumentation()
                // Export to the console (development)
                .AddConsoleExporter()
                // Export to Azure Monitor (production)
                .AddAzureMonitorTraceExporter(options =>
                {
                    options.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
                }))
            .WithMetrics(metrics => metrics
                // Register the custom meter
                .AddMeter("MyAgentFunction")
                // Collect HTTP client metrics automatically
                .AddHttpClientInstrumentation()
                // Export to the console
                .AddConsoleExporter()
                // Export to Azure Monitor
                .AddAzureMonitorMetricExporter(options =>
                {
                    options.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
                }));

        // Configure logging
        services.AddLogging(logging =>
        {
            logging.AddOpenTelemetry(otel =>
            {
                otel.AddConsoleExporter();
                otel.AddAzureMonitorLogExporter(exporter =>
                {
                    exporter.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
                });
            });
        });
    })
    .Build();

await host.RunAsync();

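Step 3: Use Telemetry in Code

Edit AgentFunction.cs to add telemetry:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.IO;
using System.Net;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;
using OpenTelemetry.Trace; // provides the RecordException extension method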

namespace MyAgentFunction;

public class AgentFunction
{
    private readonly ILogger<AgentFunction> _logger;
    private readonly IChatClient _chatClient;
    
    // Trace source; the name must match AddSource(...) in Program.cs
    private static readonly ActivitySource ActivitySource = new("MyAgentFunction");
    
    // Metrics; the meter name must match AddMeter(...) in Program.cs
    private static readonly Meter Meter = new("MyAgentFunction");
    private static readonly Counter<long> RequestCounter = Meter.CreateCounter<long>("agent.requests");
    private static readonly Histogram<double> RequestDuration = Meter.CreateHistogram<double>("agent.request.duration");
    private static readonly Counter<long> ErrorCounter = Meter.CreateCounter<long>("agent.errors");

    // IChatClient is injected here, so it has to be registered with the service collection in Program.cs
    public AgentFunction(ILogger<AgentFunction> logger, IChatClient chatClient)
    {
        _logger = logger;
        _chatClient = chatClient;
    }

    [Function("Chat")]
    public async Task<HttpResponseData> RunAsync(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
    {
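        // Start the trace
        using var activity = ActivitySource.StartActivity("ProcessChatRequest");
        var stopwatch = Stopwatch.StartNew();

        try
        {
            // Count the request
            RequestCounter.Add(1);
            _logger.LogInformation("收到聊天请求");

            // Read the request; names are matched case-insensitively so {"message": "..."} binds to Message
            var requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            var request = JsonSerializer.Deserialize<ChatRequest>(
                requestBody,
                new JsonSerializerOptions { PropertyNameCaseInsensitive = true });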

            if (request == null || string.IsNullOrEmpty(request.Message))
            {
                activity?.SetStatus(ActivityStatusCode.Error, "无效的请求");
                ErrorCounter.Add(1, new KeyValuePair<string, object?>("error.type", "validation"));
                return await CreateErrorResponse(req, "请求消息不能为空", HttpStatusCode.BadRequest);
            }

            // Add trace tags
            activity?.SetTag("user.message.length", request.Message.Length);
            // Caution: this stores the raw user message in telemetry; drop it if messages may contain personal data
            activity?.SetTag("user.message", request.Message);

            _logger.LogInformation("用户消息: {Message}", request.Message);

            // Call the agent (as a child activity)
            using var agentActivity = ActivitySource.StartActivity("CallAgent");
            var response = await _chatClient.CompleteAsync(request.Message);
            var agentReply = response.Message.Text;

            // Record the agent's reply
            agentActivity?.SetTag("agent.reply.length", agentReply?.Length ?? 0);
            _logger.LogInformation("代理回复: {Reply}", agentReply);

            // Mark success
            activity?.SetStatus(ActivityStatusCode.Ok);
            
            // Record the request duration (milliseconds)
            stopwatch.Stop();
            RequestDuration.Record(stopwatch.Elapsed.TotalMilliseconds);

            // Return the response
            var httpResponse = req.CreateResponse(HttpStatusCode.OK);
            await httpResponse.WriteAsJsonAsync(new ChatResponse
            {
                Reply = agentReply ?? string.Empty,
                Success = true
            });

            return httpResponse;
        }
        catch (Exception ex)
        {
            // Record the error
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.RecordException(ex);
            ErrorCounter.Add(1, new KeyValuePair<string, object?>("error.type", ex.GetType().Name));
            
            _logger.LogError(ex, "处理请求时发生错误");
            
            stopwatch.Stop();
            RequestDuration.Record(stopwatch.Elapsed.TotalMilliseconds);
            
            return await CreateErrorResponse(req, $"处理失败: {ex.Message}", HttpStatusCode.InternalServerError);
        }
    }

    private async Task<HttpResponseData> CreateErrorResponse(
        HttpRequestData req, 
        string message, 
        HttpStatusCode statusCode)
    {
        var response = req.CreateResponse(statusCode);
        await response.WriteAsJsonAsync(new ChatResponse
        {
            Reply = message,
            Success = false
        });
        return response;
    }
}
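
Before any exporter is configured, it can be useful to check locally that the ActivitySource and Meter actually emit something. The sketch below does this with the listener types from System.Diagnostics alone, without the OpenTelemetry SDK; the TelemetryProbe helper is hypothetical and is meant for quick experiments or unit tests, not for production use.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class TelemetryProbe
{
    // Runs `work` and returns the names of the activities that stopped plus the sum of all
    // long-valued measurements, for sources and meters whose name equals `name`.
    public static (List<string> Activities, long CounterTotal) Capture(string name, Action work)
    {
        var activities = new List<string>();
        long total = 0;

        using var activityListener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == name,
            // Without a listener that samples, StartActivity returns null.
            Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
            ActivityStopped = activity => activities.Add(activity.OperationName)
        };
        ActivitySource.AddActivityListener(activityListener);

        using var meterListener = new MeterListener();
        meterListener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == name) listener.EnableMeasurementEvents(instrument);
        };
        meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => total += value);
        meterListener.Start();

        work();
        return (activities, total);
    }
}
```

This also explains why the code above uses activity?.SetTag(...): when nothing is listening to the source, StartActivity returns null and the tags are simply skipped.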

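Step 4: Configure Azure Application Insights

4.1 Create the Application Insights Resource

Using the Azure Portal:

Sign in to the Azure Portal
Click "Create a resource"
Search for "Application Insights"
Fill in the details:
- Resource group: the same resource group as the Function App
- Name: for example my-agent-insights
- Region: the same as the Function App
Click "Review + create"
Once it is created, copy the "Connection String"

Using the Azure CLI:

# Create Application Insights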
az monitor app-insights component create \
  --app my-agent-insights \
  --location eastus \
  --resource-group MyAgentResourceGroup

# Get the connection string
az monitor app-insights component show \
  --app my-agent-insights \
  --resource-group MyAgentResourceGroup \
  --query connectionString \
  --output tsv

4.2 Configure the Connection String

Add it to the Function App's application settings:

az functionapp config appsettings set \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup \
  --settings APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=xxx;IngestionEndpoint=https://xxx"

Or in the Azure Portal:

Open the Function App
Select "Configuration"
Add a new setting:
- Name: APPLICATIONINSIGHTS_CONNECTION_STRING
- Value: the connection string copied from Application Insights

4.3 Local Development Settings

Add the following to local.settings.json:

{
  "Values": {
    "APPLICATIONINSIGHTS_CONNECTION_STRING": "InstrumentationKey=xxx;IngestionEndpoint=https://xxx",
    // ... other settings
  }
}

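Viewing Telemetry Data

In the Azure Portal

1. Live Metrics

Watch the application's health in real time:

Open the Application Insights resource
Select "Live metrics"
Review:
- Requests per second
- Average response time
- Failed requests
- Server resource usage

2. Performance View

Analyze performance bottlenecks:

Select "Performance"
Review:
- The list of operations and their average duration
- The slowest operations
- Dependency call times

3. Failures View

Analyze errors and exceptions:

Select "Failures"
Review:
- The failure-rate trend
- The distribution of exception types
- Details of failed operations

4. Application Map

Visualize the application architecture:

Select "Application map"
Review:
- Dependencies between components
- The health of each component
- Call frequency and failure rate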

Using the Kusto Query Language (KQL)

Application Insights uses KQL for advanced queries.

Example Queries

Recent requests:

requests
| where timestamp > ago(1h)
| project timestamp, name, duration, resultCode
| order by timestamp desc
| take 100

Analyze errors:

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc

Average response time:

requests
| where timestamp > ago(1h)
| summarize avg(duration) by bin(timestamp, 5m)
| render timechart

Custom metrics:

customMetrics
| where name == "agent.request.duration"
| summarize avg(value), max(value), min(value) by bin(timestamp, 5m)
| render timechart

Find specific log entries (the literal matches the log text written by the sample code):

traces
| where message contains "用户消息"
| project timestamp, message, severityLevel
| order by timestamp desc

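Creating Alerts

Performance Alert

Send a notification when responses take too long:

Select "Alerts" in Application Insights
Click "New alert rule"
Configure the condition:
- Signal: request duration
- Threshold: greater than 5000 milliseconds
- Evaluation frequency: every 5 minutes
Configure an action group (email, SMS, and so on)
Save the rule

Error-Rate Alert

Notify when the error rate exceeds a threshold:

requests
| where timestamp > ago(5m)
| summarize 
    total = count(),
    failed = countif(success == false)
| extend errorRate = (failed * 100.0) / total
| where errorRate > 5  // error rate above 5%

Cost Alert

Monitor the number of API calls:

customMetrics
| where name == "agent.requests"
| where timestamp > ago(1h)
| summarize sum(value)
| where sum_value > 1000  // more than 1000 requests per hour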

Best Practices

1. Structured Logging

Use structured logging instead of string concatenation:

// ❌ Poor practice
_logger.LogInformation("用户 " + userId + " 发送了消息");

// ✅ Good practice
_logger.LogInformation("用户 {UserId} 发送了消息", userId);

2. Use Log Levels

Choose the log level according to importance:

_logger.LogTrace("详细的调试信息");      // development only
_logger.LogDebug("调试信息");           // development only
_logger.LogInformation("正常操作");     // production
_logger.LogWarning("警告信息");         // needs attention
_logger.LogError(ex, "错误信息");       // needs handling
_logger.LogCritical(ex, "严重错误");    // emergency

3. Add Context

Attach useful tags to traces:

activity?.SetTag("user.id", userId);
activity?.SetTag("request.type", "chat");
activity?.SetTag("model.name", "gpt-4");
activity?.SetTag("message.length", message.Length);

4. Sampling Strategy

For high-traffic applications, use sampling to reduce cost:

.WithTracing(tracing => tracing
    .SetSampler(new TraceIdRatioBasedSampler(0.1))  // sample 10%
    // ... other configuration
)
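
The idea behind a trace-ID ratio sampler is that the keep-or-drop decision is derived from the trace ID itself, so every span belonging to one trace gets the same answer. The sketch below illustrates that principle with BCL types only; it is a simplified illustration under that assumption, not the algorithm used inside TraceIdRatioBasedSampler, and the RatioSampling name is hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

public static class RatioSampling
{
    // Deterministic decision: the same trace ID and ratio always give the same result.
    public static bool ShouldSample(ActivityTraceId traceId, double ratio)
    {
        if (ratio <= 0) return false;
        if (ratio >= 1) return true;

        // Interpret the first 8 bytes of the 16-byte trace ID as an unsigned integer
        // and keep the trace when it falls below ratio * 2^64.
        Span<byte> bytes = stackalloc byte[16];
        traceId.CopyTo(bytes);
        ulong prefix = BinaryPrimitives.ReadUInt64BigEndian(bytes);
        return prefix < (ulong)(ratio * ulong.MaxValue);
    }
}
```

Because trace IDs are random, a ratio of 0.1 keeps roughly one trace in ten while never splitting a trace into sampled and unsampled parts.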

5. Protect Sensitive Information

Do not log sensitive information:

// ❌ Never log passwords, API keys, and the like
_logger.LogInformation("API Key: {ApiKey}", apiKey);

// ✅ Log only what is necessary
_logger.LogInformation("API 调用成功");
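
When a log line does need to identify which key was used (for example, to tell two rotated keys apart), a masked form is usually enough. The sketch below is one possible helper; SecretMasker is a hypothetical name, and showing the last four characters is an assumption that should be checked against your own security policy.

```csharp
public static class SecretMasker
{
    // Replaces all but the last `visible` characters with '*'.
    // Short secrets are masked completely so that no meaningful share of them is revealed.
    public static string Mask(string? secret, int visible = 4)
    {
        if (string.IsNullOrEmpty(secret)) return "(empty)";
        if (secret.Length <= visible * 2) return new string('*', secret.Length);
        return new string('*', secret.Length - visible) + secret[^visible..];
    }
}
```

A call such as _logger.LogInformation("API key in use: {ApiKey}", SecretMasker.Mask(apiKey)) then records only the tail of the key.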

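Performance Monitoring Dashboard

Create a Custom Dashboard

Select "Dashboard" in the Azure Portal
Click "New dashboard"
Add the following charts:
- Request count trend
- Average response time
- Error rate
- Dependency call times
- Custom metrics

Export the Dashboard

A dashboard can be exported as JSON and shared with the team:

Click "Download"
Save the JSON file
Others can import this file

Summary

Monitoring is key to keeping an AI agent running reliably. With OpenTelemetry and Azure Application Insights you can:

✅ See the application's health in real time
✅ Locate and fix problems quickly
✅ Optimize performance and cost
✅ Make improvement decisions based on data

Remember:

Set up monitoring from the start instead of adding it after something breaks
Review monitoring data regularly and look for problems proactively
Configure sensible alerts and respond to anomalies promptly
Protect user privacy and do not log sensitive information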

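Log Analysis and Debugging Guide

Overview

Logs are the application's "black box": they record every important event. When the agent misbehaves, logs are the first-hand source for finding out why.

This chapter shows how to view, analyze, and use logs effectively to debug problems, and how to analyze and optimize performance.

Why Logs Matter

Consider these scenarios:

🤔 A user reports "the agent did not reply", but you do not know what happened
🐛 The agent occasionally returns an error that you cannot reproduce
🐌 Some requests are especially slow, but you do not know where the bottleneck is
💸 API cost suddenly rises for no obvious reason

All of these can be resolved by analyzing logs.

Log Levels in Detail

What the Log Levels Mean

Level	Purpose	Example	In production
Trace	The most detailed information	"Entering method X with parameter Y"	❌ Not recommended
Debug	Debugging information	"Variable value: {value}"	❌ Not recommended
Information	Normal operation	"User sent a message"	✅ Recommended
Warning	A warning that does not affect functionality	"API response is slow"	✅ Recommended
Error	An error, but the application keeps running	"Call failed, retrying"	✅ Required
Critical	A severe error; the application may crash	"Database connection failed"	✅ Required

How to Choose a Log Level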

public async Task<string> ProcessMessage(string message)
{
    // Trace: very detailed flow information
    _logger.LogTrace("开始处理消息，长度: {Length}", message.Length);
    
    // Debug: information that is useful while debugging
    _logger.LogDebug("消息内容: {Message}", message);
    
    // Information: significant business events
    _logger.LogInformation("收到用户消息");
    
    try
    {
        var response = await _chatClient.CompleteAsync(message);
        
        // Information: a successful operation
        _logger.LogInformation("代理响应成功");
        
        return response.Message.Text ?? string.Empty;
    }
    catch (HttpRequestException ex) when (ex.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
    {
        // Warning: worth attention, but not yet an error.
        // Depending on the client library, throttling may surface as a different exception type.
        _logger.LogWarning("API 限流，将重试");
        await Task.Delay(1000);
        // Retry logic...
        throw; // keeps every code path returning or throwing until the retry logic is filled in
    }
    catch (Exception ex)
    {
        // Error: something failed
        _logger.LogError(ex, "处理消息失败");
        throw;
    }
}

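Viewing Logs in Azure

Method 1: Live Log Stream

The quickest way to look at logs, suited to real-time debugging:

In the Azure Portal:

Open your Function App
Select "Monitoring" → "Log stream"
Choose "Filesystem logs" or "Application logs"
Watch the log output in real time

Using the Azure CLI:

# Stream logs in real time
az webapp log tail \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup

# Download the log files
az webapp log download \
  --name my-agent-func \
  --resource-group MyAgentResourceGroup \
  --log-file logs.zip

Method 2: Application Insights Logs

More powerful querying and analysis:

Open the Application Insights resource
Select "Logs"
Query the logs with KQL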

KQL in Practice

Basic Queries

Recent logs

traces
| where timestamp > ago(1h)
| order by timestamp desc
| take 100

Filter by level

traces
| where timestamp > ago(24h)
| where severityLevel >= 2  // 2=Warning, 3=Error, 4=Critical
| order by timestamp desc
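
The numeric severityLevel column in Application Insights does not use the same numbering as the .NET log levels, which is an easy source of off-by-one filters. The sketch below spells out the correspondence; SeverityLevels is a hypothetical helper that takes the .NET level name as a string so the example stays free of package references.

```csharp
using System;

public static class SeverityLevels
{
    // Application Insights severityLevel values: 0 Verbose, 1 Information, 2 Warning, 3 Error, 4 Critical.
    // The .NET levels Trace and Debug both map to Verbose.
    public static int FromLogLevelName(string logLevel) => logLevel switch
    {
        "Trace" or "Debug" => 0,
        "Information" => 1,
        "Warning" => 2,
        "Error" => 3,
        "Critical" => 4,
        _ => throw new ArgumentOutOfRangeException(nameof(logLevel), logLevel, "Unknown log level")
    };
}
```

A filter for "warnings and above" is therefore severityLevel >= 2, and errors alone are severityLevel == 3.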

搜索特定内容

traces
| where message contains "错误" or message contains "失败"
| where timestamp > ago(1h)
| project timestamp, message, severityLevel

高级查询

统计错误类型

traces
| where severityLevel == 4  // Error
| where timestamp > ago(24h)
| summarize count() by tostring(customDimensions.ErrorType)
| order by count_ desc

分析响应时间趋势

requests
| where timestamp > ago(24h)
| summarize 
    avg(duration),
    percentile(duration, 50),
    percentile(duration, 95),
    percentile(duration, 99)
    by bin(timestamp, 1h)
| render timechart

查找慢请求

requests
| where duration > 5000  // 超过 5 秒
| where timestamp > ago(24h)
| project timestamp, name, duration, resultCode
| order by duration desc

关联请求和日志

requests
| where timestamp > ago(1h)
| join kind=inner (
    traces
    | where severityLevel >= 2  // 2=Warning, 3=Error, 4=Critical
) on operation_Id
| project 
    timestamp,
    request_name = name,
    request_duration = duration,
    log_message = message,
    log_level = severityLevel

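Analyze user behavior

traces
| where message contains "user message"
| where timestamp > ago(24h)
| extend userMessage = extract("user message: (.*)", 1, message)
| summarize count() by userMessage
| order by count_ desc
| take 10

Debugging common problems

Problem 1: The agent does not respond

Symptoms:

A request hangs for a long time without returning
Or it comes back with a timeout error

Debugging steps:

Check whether any requests are arriving: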

requests
| where name == "Chat"
| where timestamp > ago(1h)
| order by timestamp desc

Check request duration:

requests
| where name == "Chat"
| where timestamp > ago(1h)
| project timestamp, duration, resultCode
| order by duration desc

Look at the related logs:

traces
| where operation_Name == "Chat"
| where timestamp > ago(1h)
| order by timestamp desc

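Possible causes:

The API call timed out
Network problems
Blocking operations in the code
Insufficient resources (memory, CPU)

Solution:

// Add timeout control; "using" disposes the timer behind the token source
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
try
{
    var response = await _chatClient.CompleteAsync(message, cancellationToken: cts.Token);
    return response.Message.Text;
}
catch (OperationCanceledException)
{
    _logger.LogWarning("Request timed out");
    return "Sorry, this is taking too long. Please try again later.";
}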

Problem 2: Frequent errors

Symptoms:

The error rate rises suddenly
A particular type of error keeps recurring

Debugging steps:

Count errors by type:

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc

Inspect the error details:

exceptions
| where timestamp > ago(1h)
| project 
    timestamp,
    type,
    outerMessage,
    innermostMessage,
    details
| order by timestamp desc

Analyze the error trend:

exceptions
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h), type
| render timechart

Common errors and fixes:

Error: 401 Unauthorized

dependencies
| where resultCode == "401"
| where timestamp > ago(1h)

Fix: check that the API key is configured correctly

Error: 429 Too Many Requests

dependencies
| where resultCode == "429"
| where timestamp > ago(1h)

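Fix: implement retries and rate limiting

// Retry with exponential backoff
public async Task<string> CallWithRetry(string message, int maxRetries = 3)
{
    for (int i = 0; i < maxRetries; i++)
    {
        try
        {
            // Return the text of the reply, not the response object itself
            var response = await _chatClient.CompleteAsync(message);
            return response.Message.Text;
        }
        catch (HttpRequestException ex) when (ex.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
        {
            // Out of attempts: let the caller see the original exception
            if (i == maxRetries - 1) throw;
            
            // Wait 1s, 2s, 4s, ... before the next attempt
            var delay = TimeSpan.FromSeconds(Math.Pow(2, i));
            _logger.LogWarning("API rate limited, retrying in {Delay} seconds", delay.TotalSeconds);
            await Task.Delay(delay);
        }
    }
    throw new InvalidOperationException("Retries exhausted");
}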
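
To make the wait times of the retry loop concrete, here is a minimal sketch that only computes the delay schedule. The class name BackoffSchedule, the 30-second cap and the 20% jitter are assumptions added for illustration; the loop above uses neither a cap nor jitter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists the waits an exponential backoff loop would use.
public static class BackoffSchedule
{
    // Delay before retry number `attempt` (0-based): 2^attempt seconds, capped at maxSeconds.
    public static TimeSpan DelayFor(int attempt, double maxSeconds = 30)
    {
        var seconds = Math.Min(Math.Pow(2, attempt), maxSeconds);
        return TimeSpan.FromSeconds(seconds);
    }

    // Stretches a delay by up to 20% so that many clients do not retry in lockstep.
    public static TimeSpan WithJitter(TimeSpan delay, Random random)
    {
        var factor = 1.0 + random.NextDouble() * 0.2;
        return TimeSpan.FromSeconds(delay.TotalSeconds * factor);
    }

    // The last attempt rethrows instead of waiting, so there are maxRetries - 1 waits.
    public static List<TimeSpan> Build(int maxRetries, double maxSeconds = 30)
    {
        var delays = new List<TimeSpan>();
        for (var attempt = 0; attempt < maxRetries - 1; attempt++)
        {
            delays.Add(DelayFor(attempt, maxSeconds));
        }
        return delays;
    }
}
```

With maxRetries = 3, BackoffSchedule.Build(3) yields two waits of 1 and 2 seconds, which matches the loop above: the third failure is rethrown rather than followed by another wait.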

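Problem 3: Performance degradation

Symptoms:

Response times get longer
Users report that the agent feels slow

Debugging steps:

Analyze the response time distribution: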

requests
| where timestamp > ago(24h)
| summarize 
    count(),
    avg(duration),
    percentiles(duration, 50, 90, 95, 99)
| render table
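
To see why the query reports percentiles next to the average, the following sketch computes a nearest-rank percentile over a handful of request durations. The class name LatencyStats and the sample values are made up for illustration, and KQL's percentiles() uses an approximate estimator, so its results may differ slightly from this exact calculation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for reasoning about latency numbers outside of KQL.
public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    public static double Percentile(IEnumerable<double> samples, double p)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        if (sorted.Length == 0)
        {
            throw new ArgumentException("samples must not be empty", nameof(samples));
        }
        if (p <= 0 || p > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(p), "p must be in (0, 100]");
        }

        // Rank is 1-based, so subtract one when indexing.
        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

For the durations 120, 150, 180, 200, 250, 300, 400, 800, 1500 and 6000 ms, the average is 990 ms, yet the median is only 250 ms and the 95th percentile is 6000 ms. A single slow request drags the average up, which is why the tail percentiles are the better signal for "some users are having a bad time".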

Find the slowest operations:

requests
| where timestamp > ago(24h)
| top 20 by duration desc
| project timestamp, name, duration, resultCode

Analyze dependency performance:

dependencies
| where timestamp > ago(24h)
| summarize avg(duration), max(duration) by name
| order by avg_duration desc

Check resource usage:

performanceCounters
| where timestamp > ago(1h)
| where name == "% Processor Time" or name == "Available Bytes"
| summarize avg(value) by name, bin(timestamp, 5m)
| render timechart

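Optimization suggestions:

// 1. Use a cache to avoid repeated calls
private readonly IMemoryCache _cache;

public async Task<string> GetResponseWithCache(string message)
{
    // Note: string.GetHashCode() can collide; the exercise answer later in this chapter uses SHA-256
    var cacheKey = $"response:{message.GetHashCode()}";
    
    if (_cache.TryGetValue(cacheKey, out string? cachedResponse) && cachedResponse is not null)
    {
        _logger.LogInformation("Returning response from cache");
        return cachedResponse;
    }
    
    var response = await _chatClient.CompleteAsync(message);
    var text = response.Message.Text;
    
    // Cache the text rather than the response object, so the string lookup above can find it
    _cache.Set(cacheKey, text, TimeSpan.FromMinutes(10));
    return text;
}

// 2. Process multiple requests in parallel
public async Task<List<string>> ProcessMultipleMessages(List<string> messages)
{
    var tasks = messages.Select(msg => _chatClient.CompleteAsync(msg));
    var responses = await Task.WhenAll(tasks);
    return responses.Select(r => r.Message.Text).ToList();
}

// 3. Use streaming responses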
public async IAsyncEnumerable<string> StreamResponse(string message)
{
    await foreach (var chunk in _chatClient.CompleteStreamingAsync(message))
    {
        yield return chunk.Text;
    }
}

Problem 4: Unexpectedly high cost

Symptoms:

The Azure bill jumps suddenly
The number of API calls looks abnormal

Debugging steps:

Count API calls:

dependencies
| where type == "Http"
| where target contains "openai"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h)
| render timechart

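Analyze where the calls come from (Application Insights masks client_IP as 0.0.0.0 by default, so this query is only useful once IP collection has been enabled):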

requests
| where timestamp > ago(24h)
| summarize count() by client_IP
| order by count_ desc

Check for abnormal traffic:

requests
| where timestamp > ago(1h)
| summarize count() by bin(timestamp, 1m)
| where count_ > 100  // more than 100 requests per minute
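
The bin(timestamp, 1m) step groups requests into one-minute buckets before counting them. As a rough illustration of that logic outside KQL, the sketch below does the same over a list of timestamps. The class name TrafficSpikes and its method are hypothetical; the threshold of 100 mirrors the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that mirrors "summarize count() by bin(timestamp, 1m) | where count_ > N".
public static class TrafficSpikes
{
    public static List<(DateTime Minute, int Count)> FindBusyMinutes(
        IEnumerable<DateTime> timestamps, int threshold = 100)
    {
        return timestamps
            // Truncate each timestamp to the start of its minute (the "bin").
            .GroupBy(t => new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0, t.Kind))
            .Select(g => (Minute: g.Key, Count: g.Count()))
            // Keep only the minutes whose request count exceeds the threshold.
            .Where(x => x.Count > threshold)
            .OrderBy(x => x.Minute)
            .ToList();
    }
}
```

A minute with exactly 100 requests is not reported, because both the query and the sketch use a strict "greater than" comparison.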

Cost optimization measures:

// 1. Implement rate limiting
public class RateLimiter
{
    private readonly SemaphoreSlim _semaphore;
    private readonly Queue<DateTime> _requestTimes;
    private readonly int _maxRequests;
    private readonly TimeSpan _timeWindow;

    public RateLimiter(int maxRequests, TimeSpan timeWindow)
    {
        _maxRequests = maxRequests;
        _timeWindow = timeWindow;
        _semaphore = new SemaphoreSlim(1, 1);
        _requestTimes = new Queue<DateTime>();
    }

    public async Task<bool> TryAcquire()
    {
        await _semaphore.WaitAsync();
        try
        {
            var now = DateTime.UtcNow;
            
            // Drop request records that have fallen out of the time window
            while (_requestTimes.Count > 0 && now - _requestTimes.Peek() > _timeWindow)
            {
                _requestTimes.Dequeue();
            }
            
            if (_requestTimes.Count < _maxRequests)
            {
                _requestTimes.Enqueue(now);
                return true;
            }
            
            return false;
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

// Using the rate limiter
private readonly RateLimiter _rateLimiter = new(100, TimeSpan.FromMinutes(1));

public async Task<HttpResponseData> RunAsync(HttpRequestData req)
{
    if (!await _rateLimiter.TryAcquire())
    {
        _logger.LogWarning("Request was rate limited");
        return await CreateErrorResponse(req, "Too many requests, please try again later", HttpStatusCode.TooManyRequests);
    }
    
    // Handle the request...
}

// 2. Keep the prompt short
public string OptimizePrompt(string userMessage)
{
    // Collapse redundant whitespace
    userMessage = Regex.Replace(userMessage, @"\s+", " ").Trim();
    
    // Limit the length
    if (userMessage.Length > 1000)
    {
        userMessage = userMessage.Substring(0, 1000);
        _logger.LogWarning("User message was too long and has been truncated");
    }
    
    return userMessage;
}
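
One detail to keep in mind when truncating with Substring: C# strings are sequences of UTF-16 code units, so a cut at a fixed index can split an emoji or other character stored as a surrogate pair. The sketch below shows one way to avoid that; the class name PromptTrimmer is hypothetical and not part of the original code.

```csharp
using System;

// Hypothetical helper: truncates without splitting a UTF-16 surrogate pair.
public static class PromptTrimmer
{
    public static string Truncate(string text, int maxChars)
    {
        if (maxChars < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxChars));
        }
        if (text.Length <= maxChars)
        {
            return text;
        }

        var end = maxChars;
        // If the cut would land right after a high surrogate, back up one unit
        // so that the pair is dropped as a whole instead of being torn apart.
        if (end > 0 && char.IsHighSurrogate(text[end - 1]))
        {
            end--;
        }
        return text.Substring(0, end);
    }
}
```

The result can therefore be one character shorter than maxChars, which is usually an acceptable trade for never sending a broken character to the model.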

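Creating custom queries and alerts

Saving frequently used queries

In Application Insights:

Write the query
Click "Save"
Give the query a name
It is then quick to reach next time

Creating query-based alerts

Example: error rate alert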

requests
| where timestamp > ago(5m)
| summarize 
    total = count(),
    failed = countif(success == false)
| extend errorRate = (failed * 100.0) / total
| where errorRate > 5

Configure the alert:

Click "New alert rule"
Set the threshold and evaluation frequency
Configure the notification channels (email, SMS, webhook)
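
The arithmetic behind the alert query is simple enough to check by hand. The following sketch reproduces it in C#; the class name ErrorRateCheck is hypothetical, and treating "no requests" as a 0% error rate is an assumption, since the query itself would divide by zero in that case.

```csharp
// Hypothetical helper mirroring the error rate query: failed * 100.0 / total > threshold.
public static class ErrorRateCheck
{
    public static double ErrorRatePercent(long total, long failed)
    {
        // Assumption: with no traffic at all there is nothing to alert on.
        return total == 0 ? 0.0 : failed * 100.0 / total;
    }

    public static bool ShouldAlert(long total, long failed, double thresholdPercent = 5.0)
    {
        return ErrorRatePercent(total, failed) > thresholdPercent;
    }
}
```

For example, 12 failures out of 200 requests give a 6% error rate and trigger the alert, while 10 out of 200 give exactly 5% and do not, because the comparison is strict.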

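Log analysis best practices

1. Use structured logging

// ❌ Bad: string interpolation bakes the values into the text
_logger.LogInformation($"User {userId} sent a message: {message}");

// ✅ Good: the placeholder becomes a queryable property
_logger.LogInformation("User {UserId} sent a message", userId);

This makes the logs much easier to query: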

traces
| where customDimensions.UserId == "user123"

2. Add key context

using (_logger.BeginScope(new Dictionary<string, object>
{
    ["UserId"] = userId,
    ["SessionId"] = sessionId,
    ["RequestId"] = requestId
}))
{
    _logger.LogInformation("Handling user request");
    // ... every log entry written inside this scope carries the context above
}

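3. Review logs regularly

Build a log review habit:

Check the error logs every day
Analyze performance trends every week
Review cost and usage every month

4. Define a log retention policy

# Set the log retention period (in days)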
az monitor app-insights component update \
  --app my-agent-insights \
  --resource-group MyAgentResourceGroup \
  --retention-time 90

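5. Export important logs

To meet compliance requirements, logs can be exported to storage. Continuous export is a feature of classic Application Insights resources; workspace-based resources use diagnostic settings instead.

# Configure continuous export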
az monitor app-insights component continues-export create \
  --app my-agent-insights \
  --resource-group MyAgentResourceGroup \
  --record-types Requests Exceptions Traces \
  --dest-account mylogstorage \
  --dest-container logs

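Performance analysis tools

1. Application Insights Profiler

Captures performance traces automatically:

Enable Profiler in Application Insights
Go to "Performance" → "Profiler traces"
Analyze where the code spends its time

2. Snapshot Debugger

Captures the full state at the moment an exception is thrown:

Enable Snapshot Debugger
A snapshot is captured automatically when an exception occurs
Inspect variable values and the call stack

Summary

Being able to analyze logs effectively is a key skill for any developer. After working through this chapter you should be able to:

✅ Understand what the different log levels are for
✅ Query and analyze logs with KQL
✅ Locate and fix common problems quickly
✅ Analyze and optimize performance
✅ Control and optimize cost

Remember:

Logs are your friend, so do not be stingy with them
But take care not to log sensitive information
Review logs regularly to find problems before users do
Set up alerts so you can respond to anomalies promptly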
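
As an illustration of the point about sensitive information, the sketch below masks email addresses and key-like strings before a message is handed to the logger. The class name LogRedactor and both patterns are assumptions chosen for the example (in particular the "sk-" prefix for keys); a real project would need patterns that match its own data.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: masks a few kinds of sensitive values before logging.
public static class LogRedactor
{
    // A deliberately simple email pattern, good enough for masking purposes.
    private static readonly Regex Email =
        new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", RegexOptions.Compiled);

    // Assumed key shape: "sk-" followed by at least 16 letters, digits, '-' or '_'.
    private static readonly Regex ApiKey =
        new(@"sk-[A-Za-z0-9_-]{16,}", RegexOptions.Compiled);

    public static string Redact(string text)
    {
        var result = Email.Replace(text, "[email]");
        return ApiKey.Replace(result, "[api-key]");
    }
}
```

A call such as _logger.LogInformation("Processing message: {Message}", LogRedactor.Redact(userMessage)) then keeps the log useful for debugging without storing the raw values.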

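Chapter 08 exercises

About the exercises

The exercises in this chapter are meant to consolidate what you have learned about deployment and monitoring. Each one comes with hints and a reference answer. Try to work through it on your own first and only look at the hints when you get stuck.

Exercise 1: Choosing a deployment option

Task

You are choosing a deployment option for the three projects below. Recommend the most suitable option for each project and explain why.

Project A: Personal learning project

Expected traffic: 10-20 requests per day
Budget: as low as possible
Complexity: a simple question-answering agent

Project B: Internal enterprise tool

Expected traffic: continuous use during working hours, 100-200 requests per hour
Budget: medium
Complexity: needs to connect to an internal database, and processing may take a while

Project C: Public SaaS service

Expected traffic: uncertain, with possible sudden peaks
Budget: ample
Complexity: several cooperating agents, high availability required

Hints

Consider cost, scalability, startup time and maintenance effort
See the comparison in Section 1, "Deployment options"
Think about the special requirements of each project

Reference answer

Project A: Azure Functions (Consumption plan) is recommended

Reasons:

✅ Lowest cost: the first 1 million executions are free, which covers the need completely
✅ Simple deployment: well suited to personal learning
✅ No maintenance: the infrastructure is managed for you
⚠️ Cold starts are acceptable: traffic is low, so an occasional cold start does not hurt the experience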
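
A quick back-of-the-envelope check supports the cost argument for Project A. The sketch below uses the figures quoted earlier in this chapter (the first 1 million executions free, then roughly $0.20 per million). It only covers the per-execution charge and ignores the separate charge for execution time and memory, so treat it as an estimate under those assumptions; the class name ConsumptionCost is hypothetical.

```csharp
using System;

// Hypothetical estimator for the per-execution part of a Consumption plan bill.
public static class ConsumptionCost
{
    private const long FreeExecutions = 1_000_000;   // monthly free grant quoted in this chapter
    private const decimal PricePerMillion = 0.20m;   // approximate price in USD quoted in this chapter

    public static decimal MonthlyExecutionCost(long executionsPerMonth)
    {
        // Only executions beyond the free grant are billed.
        var billable = Math.Max(0, executionsPerMonth - FreeExecutions);
        return billable / 1_000_000m * PricePerMillion;
    }
}
```

At 20 requests per day, Project A runs about 600 executions per month, far below the free grant, so MonthlyExecutionCost(600) is 0. Even 3 million executions per month would come to only about $0.40 in execution charges.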

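Project B: Azure Web App (App Service) is recommended

Reasons:

✅ Always running: no cold start problem
✅ Supports long-running work: suits the heavier database operations
✅ Stable performance: with continuous use during working hours, a fixed cost makes more sense
✅ Easy to configure: database connections are straightforward to set up

Project C: Azure Kubernetes Service (AKS) is recommended

Reasons:

✅ High availability: automatic failover and load balancing
✅ Strong scaling: handles traffic peaks automatically
✅ Fits microservices: each agent can be deployed and scaled independently
✅ Production grade: meets the reliability a public service needs
⚠️ High complexity: requires a skilled team to operate

Exercise 2: Deploying to Azure Functions

Task

Follow the steps below to deploy a simple agent to Azure Functions:

Create a new Azure Functions project
Implement a simple HTTP trigger that receives a user message and returns the agent's reply
Test it locally
Deploy it to Azure
Test the deployed Function

Requirements:

Use .NET 8.0
Use Azure OpenAI or OpenAI
Implement basic error handling
Add logging

Hints

Follow the detailed steps in Section 2, "Deploying to Azure Functions"
Make sure the environment variables are configured correctly
Use func start for local testing
Use func azure functionapp publish to deploy

Reference answer

Step 1: Create the project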

mkdir MyFirstAgentFunction
cd MyFirstAgentFunction
func init --worker-runtime dotnet-isolated --target-framework net8.0

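Step 2: Add dependencies

Edit the .csproj file and add the references below. The AsChatClient extension used in the next step ships in the Microsoft.Extensions.AI.OpenAI package, so reference that package as well, in a version that matches Microsoft.Extensions.AI:

<PackageReference Include="Microsoft.Extensions.AI" Version="9.0.0" />
<PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />

Step 3: Create the Function

Create SimpleAgent.cs: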

using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;
using System.Net;
using System.Text.Json;

namespace MyFirstAgentFunction;

public class SimpleAgent
{
    private readonly ILogger<SimpleAgent> _logger;
    private readonly IChatClient _chatClient;

    public SimpleAgent(ILogger<SimpleAgent> logger)
    {
        _logger = logger;
        
        var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
        var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY");
        var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT");
        
        var client = new AzureOpenAIClient(
            new Uri(endpoint!),
            new System.ClientModel.ApiKeyCredential(apiKey!)
        );
        
        _chatClient = client.AsChatClient(deployment!);
    }

    [Function("Chat")]
    public async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
    {
        _logger.LogInformation("Request received");

        try
        {
            using var reader = new StreamReader(req.Body);
            var body = await reader.ReadToEndAsync();
            // Clients send {"message": ...} in lower case, so match property names case-insensitively;
            // with the default options, Message would stay empty and every request would get a 400
            var request = JsonSerializer.Deserialize<ChatRequest>(
                body, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

            if (string.IsNullOrEmpty(request?.Message))
            {
                var errorResponse = req.CreateResponse(HttpStatusCode.BadRequest);
                await errorResponse.WriteStringAsync("Message must not be empty");
                return errorResponse;
            }

            _logger.LogInformation("Processing message: {Message}", request.Message);

            var response = await _chatClient.CompleteAsync(request.Message);
            
            var httpResponse = req.CreateResponse(HttpStatusCode.OK);
            await httpResponse.WriteAsJsonAsync(new { reply = response.Message.Text });
            
            return httpResponse;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Processing failed");
            var errorResponse = req.CreateResponse(HttpStatusCode.InternalServerError);
            await errorResponse.WriteStringAsync("Processing failed");
            return errorResponse;
        }
    }
}

public class ChatRequest
{
    public string Message { get; set; } = string.Empty;
}

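Step 4: Configure local settings

Edit local.settings.json and add the following entries under Values, keeping the entries that are already there (such as FUNCTIONS_WORKER_RUNTIME):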

{
  "Values": {
    "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com/",
    "AZURE_OPENAI_API_KEY": "your-key",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-4"
  }
}

Step 5: Test locally

func start

Test it:

curl -X POST http://localhost:7071/api/Chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

Step 6: Deploy

# Create the Function App (if you do not have one yet)
az functionapp create \
  --resource-group MyResourceGroup \
  --consumption-plan-location eastus \
  --runtime dotnet-isolated \
  --functions-version 4 \
  --name my-first-agent-func \
  --storage-account mystorageaccount

# Configure the app settings
az functionapp config appsettings set \
  --name my-first-agent-func \
  --resource-group MyResourceGroup \
  --settings \
    AZURE_OPENAI_ENDPOINT="your-endpoint" \
    AZURE_OPENAI_API_KEY="your-key" \
    AZURE_OPENAI_DEPLOYMENT="gpt-4"

# Deploy
func azure functionapp publish my-first-agent-func

Step 7: Test the deployment

curl -X POST "https://my-first-agent-func.azurewebsites.net/api/Chat?code=your-key" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

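Exercise 3: Configuring monitoring and alerts

Task

Set up complete monitoring for the Function you deployed in Exercise 2:

Add OpenTelemetry support
Configure Application Insights
Add custom metrics (request count, response time)
Create an alert rule that sends a notification when the error rate exceeds 5%

Hints

See Section 3, "Monitoring configuration"
Use ActivitySource and Meter to create custom telemetry
Configure the alert in the Azure Portal

Reference answer

Step 1: Add the OpenTelemetry packages

dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package Azure.Monitor.OpenTelemetry.Exporter

Step 2: Configure Program.cs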

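using Azure.Monitor.OpenTelemetry.Exporter;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;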

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
        services.AddOpenTelemetry()
            .WithTracing(tracing => tracing
                .AddSource("MyFirstAgentFunction")
                .AddHttpClientInstrumentation()
                .AddAzureMonitorTraceExporter(options =>
                {
                    options.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
                }))
            .WithMetrics(metrics => metrics
                .AddMeter("MyFirstAgentFunction")
                .AddHttpClientInstrumentation()
                .AddAzureMonitorMetricExporter(options =>
                {
                    options.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
                }));
    })
    .Build();

await host.RunAsync();

Step 3: Add custom telemetry

Modify SimpleAgent.cs:

using System.Diagnostics;
using System.Diagnostics.Metrics;

public class SimpleAgent
{
    private static readonly ActivitySource ActivitySource = new("MyFirstAgentFunction");
    private static readonly Meter Meter = new("MyFirstAgentFunction");
    private static readonly Counter<long> RequestCounter = Meter.CreateCounter<long>("requests.count");
    private static readonly Histogram<double> ResponseTime = Meter.CreateHistogram<double>("requests.duration");

    [Function("Chat")]
    public async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
    {
        using var activity = ActivitySource.StartActivity("ProcessRequest");
        var stopwatch = Stopwatch.StartNew();

        try
        {
            RequestCounter.Add(1);
            
            // ... request handling logic from Exercise 2 (produces httpResponse) ...
            
            activity?.SetStatus(ActivityStatusCode.Ok);
            return httpResponse;
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
        finally
        {
            // Record the duration for failed requests as well as successful ones
            ResponseTime.Record(stopwatch.Elapsed.TotalMilliseconds);
        }
    }
}

Step 4: Create the Application Insights resource

az monitor app-insights component create \
  --app my-agent-insights \
  --location eastus \
  --resource-group MyResourceGroup

# Get the connection string
az monitor app-insights component show \
  --app my-agent-insights \
  --resource-group MyResourceGroup \
  --query connectionString

Step 5: Configure the connection string

az functionapp config appsettings set \
  --name my-first-agent-func \
  --resource-group MyResourceGroup \
  --settings APPLICATIONINSIGHTS_CONNECTION_STRING="your-connection-string"

Step 6: Create the alert rule

In the Azure Portal:

Open Application Insights
Select "Alerts" → "New alert rule"

Configure the condition:

Signal: Custom log search
Query:

requests
| where timestamp > ago(5m)
| summarize total = count(), failed = countif(success == false)
| extend errorRate = (failed * 100.0) / total
| where errorRate > 5

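Configure an action group (email notification)
Save the rule

Exercise 4: Log analysis in practice

Task

Suppose your agent runs into the problems below. Use KQL queries to analyze them.

Scenario 1: Users report slow responses. Write queries to find:

The 10 slowest requests in the last hour
The trend of the average response time
Which dependency is the slowest

Scenario 2: The error rate rises suddenly. Write queries to find:

The most common error types
How the errors are distributed over time
The number of affected users

Scenario 3: Unexpected cost. Write queries to find:

The trend in the number of API calls
Which users call most often
Whether there are abnormal traffic patterns

Hints

See the KQL examples in Section 4, "Log analysis"
Use operators such as summarize, where and order by
Use render timechart to visualize trends

Reference answer

Scenario 1: Slow response analysis

Query 1: The 10 slowest requests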

requests
| where timestamp > ago(1h)
| top 10 by duration desc
| project timestamp, name, duration, resultCode, operation_Id

Query 2: Response time trend

requests
| where timestamp > ago(24h)
| summarize 
    avg(duration),
    percentile(duration, 50),
    percentile(duration, 95)
    by bin(timestamp, 1h)
| render timechart

Query 3: Dependency performance

dependencies
| where timestamp > ago(1h)
| summarize 
    count(),
    avg(duration),
    max(duration)
    by name, type
| order by avg_duration desc

Scenario 2: Error analysis

Query 1: Errors by type

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc

Query 2: Errors over time

exceptions
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h)
| render timechart

Query 3: Affected users (approximated here by distinct client IP addresses)

requests
| where success == false
| where timestamp > ago(24h)
| summarize count() by client_IP
| summarize affectedUsers = count()

Scenario 3: Cost analysis

Query 1: API call trend

dependencies
| where type == "Http"
| where target contains "openai"
| where timestamp > ago(7d)
| summarize count() by bin(timestamp, 1h)
| render timechart

Query 2: Most frequent callers

requests
| where timestamp > ago(24h)
| summarize requestCount = count() by client_IP
| order by requestCount desc
| take 10

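Query 3: Abnormal traffic detection

requests
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1m), client_IP
| where count_ > 100  // more than 100 requests per minute
| order by timestamp desc

Exercise 5: Performance optimization

Task

Your agent's average response time is 3 seconds and you want to bring it under 1 second. Please:

List the possible optimization directions
Implement a caching mechanism to reduce repeated API calls
Implement request timeout control
Add performance monitoring metrics

Hints

Think about caching, parallel processing and timeout control
Use IMemoryCache to implement the cache
Use CancellationToken to control the timeout

Reference answer

Optimization directions:

Cache answers to common questions: fewer API calls
Optimize the prompt: fewer tokens
Parallel processing: run independent operations at the same time
Timeout control: avoid long waits
Use a faster model: for example GPT-3.5 instead of GPT-4
Streaming responses: let the user see results sooner

Implementing the cache: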

using Microsoft.Extensions.Caching.Memory;

public class CachedAgent
{
    private readonly IChatClient _chatClient;
    private readonly IMemoryCache _cache;
    private readonly ILogger<CachedAgent> _logger;

    // ILogger<CachedAgent> is what the default DI container can resolve; plain ILogger is not registered
    public CachedAgent(IChatClient chatClient, IMemoryCache cache, ILogger<CachedAgent> logger)
    {
        _chatClient = chatClient;
        _cache = cache;
        _logger = logger;
    }

    public async Task<string> GetResponseAsync(string message)
    {
        // Build the cache key
        var cacheKey = $"response:{ComputeHash(message)}";

        // Try the cache first
        if (_cache.TryGetValue(cacheKey, out string? cachedResponse) && cachedResponse is not null)
        {
            _logger.LogInformation("Cache hit");
            return cachedResponse;
        }

        _logger.LogInformation("Cache miss, calling the API");

        // Call the API
        var response = await _chatClient.CompleteAsync(message);
        var result = response.Message.Text;

        // Store in the cache (expires after 10 minutes)
        _cache.Set(cacheKey, result, TimeSpan.FromMinutes(10));

        return result;
    }

    // SHA-256 gives a stable key with a negligible chance of collisions
    private string ComputeHash(string input)
    {
        using var sha256 = System.Security.Cryptography.SHA256.Create();
        var bytes = System.Text.Encoding.UTF8.GetBytes(input);
        var hash = sha256.ComputeHash(bytes);
        return Convert.ToBase64String(hash);
    }
}

Timeout control:

public async Task<string> GetResponseWithTimeoutAsync(string message, int timeoutSeconds = 30)
{
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));
    
    try
    {
        var response = await _chatClient.CompleteAsync(message, cancellationToken: cts.Token);
        return response.Message.Text;
    }
    catch (OperationCanceledException)
    {
        _logger.LogWarning("Request timed out");
        return "Sorry, this is taking too long. Please try again later.";
    }
}

Performance monitoring:

// Reuse the meter name registered with AddMeter in Program.cs
private static readonly Meter Meter = new("MyFirstAgentFunction");
// The hit rate is derived at query time as hits / (hits + misses)
private static readonly Counter<long> CacheHits = Meter.CreateCounter<long>("cache.hits");
private static readonly Counter<long> CacheMisses = Meter.CreateCounter<long>("cache.misses");

public async Task<string> GetResponseAsync(string message)
{
    var cacheKey = $"response:{ComputeHash(message)}";

    if (_cache.TryGetValue(cacheKey, out string? cachedResponse))
    {
        CacheHits.Add(1);
        return cachedResponse;
    }

    CacheMisses.Add(1);
    
    var response = await _chatClient.CompleteAsync(message);
    var result = response.Message.Text;
    
    _cache.Set(cacheKey, result, TimeSpan.FromMinutes(10));
    
    return result;
}