Original authors: Alessandro Fael Garcia and Michael Pleshakov, F5 Software Engineers

Original article: Using NGINX as an AI Proxy

Republished from: NGINX Chinese Community



Over the past few years, artificial intelligence (AI) has rapidly swept the globe. The explosive growth of AI models and services has created a complex ecosystem in which organizations combine multiple large language model (LLM) providers and manage multiple model endpoints with differing API specifications to build AI-driven applications. We are witnessing the rise of AI gateways and LLM routers: dedicated infrastructure components that sit between applications and AI models to orchestrate and secure the flow of AI requests.

What Is an AI Proxy?

An AI proxy is a simplified implementation of an AI gateway, focused on AI traffic control, model transformation, authentication and authorization, model failover, and AI model usage logging. A full-featured AI gateway typically integrates all of these capabilities seamlessly and adds native AI security protections against LLM-specific threats such as prompt injection or data exfiltration attacks.

At the core of an AI proxy is traffic control, which forms the foundation of any AI traffic management tool and implements authentication and authorization mechanisms as well as rate limiting. This helps prevent model abuse and ensures fair allocation of resources.
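The rate-limiting part can be expressed with NGINX's stock limit_req machinery. A minimal sketch is shown below; the zone name, size, and rate are illustrative values, and the limit_req_zone directive belongs in the http context:

```nginx
# Sketch only: per-user rate limiting keyed on the X-User header
# (zone name, size, and rate are illustrative values)
limit_req_zone $http_x_user zone=ai_users:10m rate=10r/m;

server {
    listen 4242;

    location /v1/chat/completions {
        # Allow short bursts, then reject excess requests with 429
        limit_req zone=ai_users burst=5 nodelay;
        limit_req_status 429;

        js_content aiproxy.route;
    }
}
```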

AI model API transformation is an equally valuable capability. Model translation exposes a unified API entry point to different AI models, enabling seamless integration across AI models and providers while abstracting that complexity away from client applications. Another important capability is high availability through a failover system, which keeps prompt requests reliable by gracefully handling models that are unavailable or rate limited.

Traffic observability is just as critical: it provides comprehensive monitoring and logging to maintain operational visibility into AI flows. Auditing request and response payloads goes hand in hand with system observability. In an era of widespread AI adoption, such audits are increasingly important for compliance verification, troubleshooting, and model performance analysis.

Configuring NGINX as an AI Proxy for OpenAI and Anthropic

All of the use case examples below build on one another, and the code snippets provided are for reference only; they are not guaranteed to run as-is in your environment. If you would like to test a functional proof-of-concept (PoC) NGINX AI proxy deployment, we recommend first deploying the PoC described in the "Try the NGINX AI Proxy Yourself" section below or in the NGINX demos AI Proxy GitHub repository, then returning to this section. The examples below are based on the code in that PoC.

Use Case 1: AI Model/LLM Routing and Model Transformation

Because there is no unified standard API for interacting with different LLMs, routing to different models requires transforming requests and responses as they pass through NGINX. This ensures that incoming requests are compatible with the specific requirements of each backend LLM. This example assumes that API requests arriving at NGINX follow the OpenAI chat completion API specification, since it is the most widely adopted.

To transform incoming OpenAI-compatible requests, we will start by implementing an NJS transformation script that converts these requests into the format required by the Anthropic messages API so they can be routed to an Anthropic backend, and converts Anthropic responses back into the OpenAI format. Requests routed to an OpenAI endpoint remain unchanged.

First, create an NJS file containing the scripts that handle model transformation. These functions convert requests and responses between Anthropic's format and the OpenAI-compatible format, and determine, based on the queried endpoint, whether a request needs to be transformed:

aiproxy.js

// Convert an OpenAI compatible request to Anthropic's request format
function transformAnthropicRequest(requestBody) {
    // Anthropic requires max_tokens, but our API may not always specify it -> fallback to defaults if not provided
    let maxTokens = requestBody.max_completion_tokens || requestBody.max_tokens || 512;
 
    const anthropicRequest = {
        model: requestBody.model,
        max_tokens: maxTokens,
        stream: requestBody.stream || false,
        temperature: requestBody.temperature || 1.0,
        top_p: requestBody.top_p
    };
 
    // Scale Anthropic temperature based on its acceptable range (0-1) vs OpenAI (0-2)
    if (anthropicRequest.temperature > 1.0) {
        anthropicRequest.temperature = requestBody.temperature / 2.0;
    }
 
    // Convert stop sequences to Anthropic's format
    if (requestBody.stop) {
        anthropicRequest.stop_sequences = Array.isArray(requestBody.stop) ? requestBody.stop : [requestBody.stop];
    }
 
    // Separate system messages from user/assistant messages
    const systemMessages = [];
    const messages = [];
 
    for (let i = 0; i < requestBody.messages.length; i++) {
        const msg = requestBody.messages[i];
        if (msg.role === "system") {
            systemMessages.push({text: msg.content, type: "text"});
        } else {
            messages.push({role: msg.role, content: msg.content});
        }
    }
 
    // Attach system messages if present
    if (systemMessages.length > 0) {
        anthropicRequest.system = systemMessages;
    }
    anthropicRequest.messages = messages;
 
    return anthropicRequest;
}
 
// Convert an Anthropic response to an OpenAI response format
function transformAnthropicResponse(anthropicResponse) {
    const response = JSON.parse(anthropicResponse);
 
    // Handle error responses from Anthropic
    if (response.error) {
        return {
            error: {
                type: response.error.type,
                message: response.error.message,
                code: response.error.code
            }
        };
    }
 
    // Map Anthropic's successful response to OpenAI's expected structure
    const openaiResponse = {
        id: response.id,
        object: "chat.completion", // Standardize object type
        model: response.model,
        choices: [],
        usage: {
            prompt_tokens: response.usage.input_tokens,
            completion_tokens: response.usage.output_tokens,
            total_tokens: response.usage.input_tokens + response.usage.output_tokens
        }
    };
 
    // Convert content to choices format
    for (let i = 0; i < response.content.length; i++) {
        const content = response.content[i];
        openaiResponse.choices.push({
            index: i,
            finish_reason: response.stop_reason,
            message: {
                role: response.role,
                content: content.text
            }
        });
    }
 
    return openaiResponse;
}
 
// Attempts to call the specified model provider (Anthropic or OpenAI)
// Transforms the request as needed and issues a subrequest to the provider's location
async function tryModel(r, modelConfig, requestBody) {
    const location = modelConfig.location;
    let subrequestBody;
 
    // Transform request body for Anthropic, or pass through for OpenAI
    if (modelConfig.provider === "anthropic") {
        const transformedRequest = transformAnthropicRequest(requestBody);
        subrequestBody = JSON.stringify(transformedRequest);
    } else if (modelConfig.provider === "openai") {
        // For OpenAI, pass the request as-is (no transformation needed)
        subrequestBody = JSON.stringify(requestBody);
    } else {
        throw new Error(`Provider '${modelConfig.provider}' not supported`);
    }
 
    // Issue subrequest to the model provider
    return await r.subrequest(location, {
        method: 'POST',
        body: subrequestBody
    });
}
 
// Returns the response body in the correct format for the client
// Transforms Anthropic responses to OpenAI format, passes OpenAI through
function getResponseBody(modelConfig, serviceReply) {
    if (modelConfig.provider === "anthropic") {
        const transformedResponse = transformAnthropicResponse(serviceReply.responseText);
        return JSON.stringify(transformedResponse);
    } else {
        return serviceReply.responseText; // Pass through as-is for OpenAI
    }
}
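To see the request mapping in isolation, the same transform can be run as a standalone script. This is a sketch in plain JavaScript rather than NJS, and the sample request values are made up:

```javascript
// Standalone sketch of the OpenAI -> Anthropic request mapping above;
// the function body mirrors transformAnthropicRequest from aiproxy.js.
function transformAnthropicRequest(requestBody) {
    const anthropicRequest = {
        model: requestBody.model,
        max_tokens: requestBody.max_completion_tokens || requestBody.max_tokens || 512,
        stream: requestBody.stream || false,
        temperature: requestBody.temperature || 1.0,
        top_p: requestBody.top_p
    };
    // OpenAI temperatures range 0-2, Anthropic 0-1: halve values above 1
    if (anthropicRequest.temperature > 1.0) {
        anthropicRequest.temperature = requestBody.temperature / 2.0;
    }
    if (requestBody.stop) {
        anthropicRequest.stop_sequences = Array.isArray(requestBody.stop) ? requestBody.stop : [requestBody.stop];
    }
    // Anthropic takes system prompts as a top-level field, not as messages
    const systemMessages = [];
    const messages = [];
    for (const msg of requestBody.messages) {
        if (msg.role === "system") {
            systemMessages.push({text: msg.content, type: "text"});
        } else {
            messages.push({role: msg.role, content: msg.content});
        }
    }
    if (systemMessages.length > 0) {
        anthropicRequest.system = systemMessages;
    }
    anthropicRequest.messages = messages;
    return anthropicRequest;
}

// Example OpenAI-style request with a system prompt and an out-of-range temperature
const out = transformAnthropicRequest({
    model: "claude-sonnet-4-20250514",
    temperature: 1.5,
    stop: "END",
    messages: [
        {role: "system", content: "You are terse."},
        {role: "user", content: "Hello"}
    ]
});
console.log(JSON.stringify(out, null, 2));
// max_tokens falls back to 512, temperature is halved to 0.75,
// and the system message moves to the top-level "system" field
```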

Next, define the AI model endpoints and import the NJS script into your NGINX configuration. A sample configuration is provided below; note that you will need to supply your own OpenAI and Anthropic model API keys.

aiproxy.conf

# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # These locations are not accessible to clients
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

Use Case 2: Access Control

Access control is critical for most applications, and all the more so when managing access to different AI models. Some models may be approved for use with internal data, while others, although more useful for everyday tasks, should never interact with internal data. Likewise, some models are expensive, so they may require stricter access control and explicit approval before use.

To enable access control for backend LLM models through NGINX, start by creating a JSON file containing your access control configuration/user identities along with the models each user may access:

rbac.json

{
    "users": {
        "user-a": {
            "models": [
                {
                    "name": "gpt-5"
                },
                {
                    "name": "claude-sonnet-4-20250514"
                }
            ]
        },
        "user-b": {
            "models": [
                {
                    "name": "gpt-5"
                }
            ]
        }
    },
    "models": {
        "gpt-5": {
            "provider": "openai",
            "location": "/openai"
        },
        "claude-sonnet-4-20250514": {
            "provider": "anthropic",
            "location": "/anthropic"
        }
    }
}
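The lookup the proxy performs against this file can be sketched standalone. Here, resolveModel is an illustrative helper, not part of the NJS module, and the config object inlines the rbac.json contents above:

```javascript
// Sketch of the user/model authorization lookup against the rbac.json structure
const rbac = {
    users: {
        "user-a": {models: [{name: "gpt-5"}, {name: "claude-sonnet-4-20250514"}]},
        "user-b": {models: [{name: "gpt-5"}]}
    },
    models: {
        "gpt-5": {provider: "openai", location: "/openai"},
        "claude-sonnet-4-20250514": {provider: "anthropic", location: "/anthropic"}
    }
};

// Returns the provider location if the user may use the model, otherwise null
function resolveModel(config, user, modelName) {
    const userEntry = config.users[user];
    if (!userEntry) return null;                                   // unknown user
    const allowed = userEntry.models.find(m => m.name === modelName);
    if (!allowed) return null;                                     // model not granted
    const modelConfig = config.models[modelName];
    return modelConfig ? modelConfig.location : null;              // missing model config
}

console.log(resolveModel(rbac, "user-a", "claude-sonnet-4-20250514")); // "/anthropic"
console.log(resolveModel(rbac, "user-b", "claude-sonnet-4-20250514")); // null (denied)
```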

Next, create an NJS function in your NJS script that loads the JSON data into a variable:

aiproxy.js

...
// Requires "import fs from 'fs';" at the top of the NJS module
// Loads RBAC configuration from a JSON file and sets it to an NGINX variable
function load_rbac() {
    try {
        // Adjust the path as needed
        let config = fs.readFileSync('/etc/nginx/rbac.json', 'utf8');
        return config;
    } catch (e) {
        return JSON.stringify({
            error: "Failed to load RBAC: " + e.message
        });
    }
}
...

Finally, import the NJS function into your NGINX configuration, as shown below:

aiproxy.conf

# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

# Declare variable to hold RBAC configuration
js_var $ai_proxy_config "";

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # These locations are not accessible to clients
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

Based on our JSON file, only User A can access both OpenAI and Anthropic, while User B is limited to OpenAI. To test this, try querying as different users. Queries as User A should reach both models, while User B's query to the Anthropic model should fail:

curl commands

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'

// Failure

Note

You could also define the access control configuration/user identities as variables with NGINX's set directive. We chose to load a JSON file because it is closer to how this data is obtained in real-world scenarios.

Use Case 3: Model Failover and Backup

To improve the availability of AI applications, when a model is unavailable for whatever reason (for example, all available tokens have been consumed, an API key is being rotated, or the model is down), incoming requests for that model should be redirected to a backup model that is still available. This can be implemented in several ways, but in this blog we assume the failover mechanism is tied to access control. This is a common pattern because it associates the failover mechanism with the models each user is allowed to access.

We will start by extending the JSON file created in the previous use case to include failover data:

rbac.json

{
    "users": {
        "user-a": {
            "models": [
                {
                    "name": "gpt-5",
                    "failover": "claude-sonnet-4-20250514"
                },
                {
                    "name": "claude-sonnet-4-20250514"
                }
            ]
        },
        "user-b": {
            "models": [
                {
                    "name": "gpt-5"
                }
            ]
        }
    },
    "models": {
        "gpt-5": {
            "provider": "openai",
            "location": "/openai"
        },
        "claude-sonnet-4-20250514": {
            "provider": "anthropic",
            "location": "/anthropic"
        }
    }
}

Next, we need an NJS script that checks whether the requested model is available and, if not, redirects the user's request to the backup model:

aiproxy.js

...
// Main routing function for the AI proxy
// Handles user authentication, model selection, failover, and response transformation
async function route(r) {
    try {
        // Parse the AI proxy configuration from NGINX variable
        const configStr = r.variables.ai_proxy_config;
        if (!configStr) {
            r.return(500, JSON.stringify({
                error: {
                    message: "AI proxy configuration was not found"
                }
            }));
            return;
        }
 
        // Parse the configuration JSON
        let config;
        try {
            config = JSON.parse(configStr);
        } catch (e) {
            r.return(500, JSON.stringify({
                error: {
                    message: "Invalid AI proxy configuration JSON"
                }
            }));
            return;
        }
 
        // Extract the user from NGINX variable (set by header)
        const user = r.variables.aiproxy_user;
        if (!user) {
            r.return(401, JSON.stringify({
                error: {
                    message: "User not specified"
                }
            }));
            return;
        }
 
        // Check if user exists in configuration
        if (!config.users || !config.users[user]) {
            r.return(403, JSON.stringify({
                error: {
                    message: "User not authorized"
                }
            }));
            return;
        }
 
        // Check the JSON validity of the AI proxy request body
        let requestBody;
        try {
            requestBody = JSON.parse(r.requestText);
        } catch (e) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Invalid JSON in request body"
                }
            }));
            return;
        }
 
        // Extract the model from the request
        const requestedModel = requestBody.model;
        if (!requestedModel) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Model not specified in request"
                }
            }));
            return;
        }
 
        // Check if the requested model is available to the user
        const userModels = config.users[user].models;
        const userModel = userModels.find(m => m.name === requestedModel);
 
        if (!userModel) {
            r.return(404, JSON.stringify({
                error: {
                    message: `The model '${requestedModel}' was not found or is not accessible to this user`
                }
            }));
            return;
        }
 
        // Get the model configuration from the global config
        const modelConfig = config.models[requestedModel];
        if (!modelConfig) {
            r.return(500, JSON.stringify({
                error: {
                    message: `Model '${requestedModel}' configuration not found`
                }
            }));
            return;
        }
 
        // Try primary model first
        let serviceReply = await tryModel(r, modelConfig, requestBody);
        let usedModelConfig = modelConfig;
 
        // If primary model failed (status code is not 200) and failover is configured, try failover
        if (serviceReply.status !== 200 && userModel.failover) {
            r.log(`Primary model '${requestedModel}' failed with status ${serviceReply.status}, trying failover model '${userModel.failover}'`);
 
            // Get failover model configuration
            const failoverModelConfig = config.models[userModel.failover];
            if (!failoverModelConfig) {
                r.error(`Failover model '${userModel.failover}' configuration not found`);
                // Return the original error since failover is misconfigured
                let responseBody = getResponseBody(modelConfig, serviceReply);
                r.return(serviceReply.status, responseBody);
                return;
            }
 
            // Update the request body to use the failover model
            const failoverRequestBody = Object.assign({}, requestBody, {model: userModel.failover});
 
            // Try the failover model
            serviceReply = await tryModel(r, failoverModelConfig, failoverRequestBody);
            usedModelConfig = failoverModelConfig;
        }
 
        // Transform and return response body based on provider that was actually used
        let responseBody = getResponseBody(usedModelConfig, serviceReply);
        r.return(serviceReply.status, responseBody);
 
    } catch (e) {
        r.log(`Error: ${e.toString()}`);
        r.return(500, JSON.stringify({
            error: {
                message: "Internal server error",
            }
        }));
    }
}
...

With the configuration above, if User A tries to use OpenAI and OpenAI is unavailable, NGINX redirects the request to Anthropic. User B only has access to OpenAI, so if OpenAI is unavailable, their requests will simply fail.

To test this, modify the OpenAI API key in your NGINX configuration so that it is no longer valid. Once done, try running:

curl commands

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Response comes from Anthropic

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Failure

Use Case 4: Token Usage Logging

The final use case in this post is token usage logging. Most current LLMs operate on a token system, and different requests consume different numbers of tokens. Billing is typically tied to token usage, so it is important to track current token usage and make sure it does not spiral out of control.

To log token usage in NGINX, we will add a few lines to our NJS code that extract token usage information from the LLM response body and save it to NJS variables. To do so, add the following code to the route JS function defined in the previous step:

aiproxy.js

...
// Main routing function for the AI proxy
// Handles user authentication, model selection, failover, and response transformation
async function route(r) {
    try {
        // Parse the AI proxy configuration from NGINX variable
        const configStr = r.variables.ai_proxy_config;
        if (!configStr) {
            r.return(500, JSON.stringify({
                error: {
                    message: "AI proxy configuration was not found"
                }
            }));
            return;
        }
 
        // Parse the configuration JSON
        let config;
        try {
            config = JSON.parse(configStr);
        } catch (e) {
            r.return(500, JSON.stringify({
                error: {
                    message: "Invalid AI proxy configuration JSON"
                }
            }));
            return;
        }
 
        // Extract the user from NGINX variable (set by header)
        const user = r.variables.aiproxy_user;
        if (!user) {
            r.return(401, JSON.stringify({
                error: {
                    message: "User not specified"
                }
            }));
            return;
        }
 
        // Check if user exists in configuration
        if (!config.users || !config.users[user]) {
            r.return(403, JSON.stringify({
                error: {
                    message: "User not authorized"
                }
            }));
            return;
        }
 
        // Check the JSON validity of the AI proxy request body
        let requestBody;
        try {
            requestBody = JSON.parse(r.requestText);
        } catch (e) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Invalid JSON in request body"
                }
            }));
            return;
        }
 
        // Extract the model from the request
        const requestedModel = requestBody.model;
        if (!requestedModel) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Model not specified in request"
                }
            }));
            return;
        }
 
        // Check if the requested model is available to the user
        const userModels = config.users[user].models;
        const userModel = userModels.find(m => m.name === requestedModel);
 
        if (!userModel) {
            r.return(404, JSON.stringify({
                error: {
                    message: `The model '${requestedModel}' was not found or is not accessible to this user`
                }
            }));
            return;
        }
 
        // Get the model configuration from the global config
        const modelConfig = config.models[requestedModel];
        if (!modelConfig) {
            r.return(500, JSON.stringify({
                error: {
                    message: `Model '${requestedModel}' configuration not found`
                }
            }));
            return;
        }
 
        // Try primary model first
        let serviceReply = await tryModel(r, modelConfig, requestBody);
        let usedModelConfig = modelConfig;
 
        // If primary model failed (status code is not 200) and failover is configured, try failover
        if (serviceReply.status !== 200 && userModel.failover) {
            r.log(`Primary model '${requestedModel}' failed with status ${serviceReply.status}, trying failover model '${userModel.failover}'`);
 
            // Get failover model configuration
            const failoverModelConfig = config.models[userModel.failover];
            if (!failoverModelConfig) {
                r.error(`Failover model '${userModel.failover}' configuration not found`);
                // Return the original error since failover is misconfigured
                let responseBody = getResponseBody(modelConfig, serviceReply);
                r.return(serviceReply.status, responseBody);
                return;
            }
 
            // Update the request body to use the failover model
            const failoverRequestBody = Object.assign({}, requestBody, {model: userModel.failover});
 
            // Try the failover model
            serviceReply = await tryModel(r, failoverModelConfig, failoverRequestBody);
            usedModelConfig = failoverModelConfig;
        }
 
        // Transform and return response body based on provider that was actually used
        let responseBody = getResponseBody(usedModelConfig, serviceReply);
 
        // Extract token usage information from response and set NGINX variables for logging
        if (serviceReply.status === 200) {
            try {
                const parsedResponse = JSON.parse(responseBody);
                if (parsedResponse.usage) {
                    r.variables.ai_proxy_response_prompt_tokens = parsedResponse.usage.prompt_tokens || "";
                    r.variables.ai_proxy_response_completion_tokens = parsedResponse.usage.completion_tokens || "";
                    r.variables.ai_proxy_response_total_tokens = parsedResponse.usage.total_tokens || "";
                }
            } catch (e) {
                r.log(`Warning: Failed to parse response body for token extraction: ${e.toString()}`);
            }
        }
 
        r.return(serviceReply.status, responseBody);
 
    } catch (e) {
        r.log(`Error: ${e.toString()}`);
        r.return(500, JSON.stringify({
            error: {
                message: "Internal server error",
            }
        }));
    }
}
...

Next, we will load these variables in the NGINX configuration:

aiproxy.conf

# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

# Declare variable to hold RBAC configuration
js_var $ai_proxy_config "";
# Declare variables for token tracking
js_var $ai_proxy_response_prompt_tokens "";
js_var $ai_proxy_response_completion_tokens "";
js_var $ai_proxy_response_total_tokens "";

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # These locations are not accessible to clients
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

Finally, modify NGINX's access log configuration so that these variables are written out for every request NGINX handles:

nginx.conf

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log info;
pid /var/run/nginx.pid;

load_module /usr/lib/nginx/modules/ngx_http_js_module.so;

events {
    worker_connections 1024;
}

http {
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'prompt_tokens=$ai_proxy_response_prompt_tokens '
                    'completion_tokens=$ai_proxy_response_completion_tokens '
                    'total_tokens=$ai_proxy_response_total_tokens';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    keepalive_timeout 65;

    include /etc/nginx/aiproxy.conf;
}

To test, run any of the previous curl commands and check the NGINX access log:

access.log

... 401 ... prompt_tokens= completion_tokens= total_tokens= // Failed request
... 200 ... prompt_tokens=13 completion_tokens=39 total_tokens=52 // Successful request
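Once these counters are in the access log, downstream tooling can aggregate them per user. A minimal sketch in plain JavaScript (illustrative only, not part of the NJS module):

```javascript
// Aggregate per-user token usage from OpenAI-style "usage" objects,
// e.g. as parsed from the access log lines above (hypothetical helper)
function addUsage(totals, user, usage) {
    const entry = totals[user] || {prompt_tokens: 0, completion_tokens: 0, total_tokens: 0};
    entry.prompt_tokens += usage.prompt_tokens || 0;
    entry.completion_tokens += usage.completion_tokens || 0;
    entry.total_tokens += usage.total_tokens || 0;
    totals[user] = entry;
    return totals;
}

const totals = {};
addUsage(totals, "user-a", {prompt_tokens: 13, completion_tokens: 39, total_tokens: 52});
addUsage(totals, "user-a", {prompt_tokens: 7, completion_tokens: 21, total_tokens: 28});
console.log(totals["user-a"].total_tokens); // 80
```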

Note

Extracting current token usage is also useful in other scenarios, such as rate limiting access to any given model based on current demand.
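Such token-based limiting can be sketched as a simple per-user budget. This is illustrative only; a real deployment would need shared, expiring state (for example, NGINX Plus keyval or an external store) rather than an in-memory object:

```javascript
// Hypothetical token-budget limiter keyed by user: once a user's token
// spend reaches the budget, further requests are refused.
function makeTokenBudget(limit) {
    const spent = {};
    return {
        // Record tokens consumed by a completed request
        record(user, tokens) {
            spent[user] = (spent[user] || 0) + tokens;
        },
        // Check whether the user may issue another request
        allowed(user) {
            return (spent[user] || 0) < limit;
        }
    };
}

const budget = makeTokenBudget(100);
budget.record("user-a", 52);
console.log(budget.allowed("user-a")); // true, 52 < 100
budget.record("user-a", 60);
console.log(budget.allowed("user-a")); // false, 112 >= 100
```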

Try the NGINX AI Proxy Yourself

A complete demo covering all of the use cases above is available in the NGINX demos AI Proxy GitHub repository. To run the demo, you will need the following:

  • An OpenAI API key

  • An Anthropic API key

  • Docker

Once all prerequisites are in place, clone the repository by running:

git clone https://github.com/nginx/nginx-demos

Open a terminal session in the cloned repository and change into the nginx-demos/nginx/ai-proxy directory:

cd nginx-demos/nginx/ai-proxy

For this environment to work, run all of the following commands from that directory:

1. Make sure you have pulled the latest NGINX OSS Docker image:

docker pull nginx:1.29.1

2. Export your OpenAI and Anthropic API keys into variables (note: to test the failover scenario, either skip exporting the OpenAI API key in this step or export an invalid one):

export OPENAI_API_KEY=<API_KEY>
export ANTHROPIC_API_KEY=<API_KEY>

3. Create a persistent Docker volume for the generated key snippets:

docker volume create nginx-keys

4. Start a new NGINX Docker container using the following command:

docker run -it --rm -p 4242:4242 \
  -v $(pwd)/config:/etc/nginx \
  -v $(pwd)/njs:/etc/njs \
  -v $(pwd)/templates:/etc/nginx-ai-proxy/templates \
  -v nginx-keys:/etc/nginx-ai-proxy/keys \
  -e NGINX_ENVSUBST_TEMPLATE_DIR=/etc/nginx-ai-proxy/templates \
  -e NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx-ai-proxy/keys \
  -e OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY \
  --name nginx-ai-proxy \
  nginx:1.29.1

Finally, to test NGINX as an AI proxy, send a request to the OpenAI model as User A with the following command:

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

If you chose not to export the OpenAI API key and want to test the failover mechanism, use this command instead:

docker run -it --rm -p 4242:4242 \
  -v $(pwd)/config:/etc/nginx \
  -v $(pwd)/njs:/etc/njs \
  -v $(pwd)/templates:/etc/nginx-ai-proxy/templates \
  -v nginx-keys:/etc/nginx-ai-proxy/keys \
  -e NGINX_ENVSUBST_TEMPLATE_DIR=/etc/nginx-ai-proxy/templates \
  -e NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx-ai-proxy/keys \
  -e OPENAI_API_KEY=bad \
  -e ANTHROPIC_API_KEY \
  --name nginx-ai-proxy \
  nginx:1.29.1

You can then send another request to the OpenAI model as User A to test the failover mechanism. This time, the response should come from Anthropic.

Note

More complete example requests and their expected responses are provided in the README file of the NGINX demos AI Proxy GitHub repository.

Final Thoughts

This post covers only some of the use cases for implementing an AI proxy with NGINX and NJS; many more are possible. NJS opens up numerous possibilities for extending NGINX for AI use cases. While this demo uses a Docker container, you could deploy a similar architecture in a Kubernetes cluster. One major limitation of using NGINX as an AI proxy is the lack of dedicated AI security safeguards, which typically require specialized AI security solutions to be most effective and cannot be implemented with NJS alone.

Now we want to hear from you! Are you already using NGINX as an AI proxy? Is NGINX involved in your AI pipeline in any way? Share your experience with us; we might feature your implementation in a future blog post!
