Building a Multi-Model LLM Application on the Lanyun MaaS Platform: From Architecture Design to Full-Stack Implementation

1. Introduction: MaaS Platforms and a New Paradigm for LLM Application Development

1.1 Why MaaS Platforms Matter

Model as a Service (MaaS) platforms are fundamentally changing how AI applications are built. Gartner has predicted that by 2025, 70% of new AI applications will be developed and deployed through MaaS platforms rather than by training models from scratch. This shift dramatically lowers the technical barrier to AI development while improving development efficiency and application quality.

As one of China's leading MaaS providers, the Lanyun MaaS platform exposes API access to multiple large language models, including DeepSeek and Qwen (Tongyi Qianwen), letting developers build high-quality LLM applications quickly. This article walks through building a complete multi-model LLM application on the platform, covering architecture design, frontend implementation, backend development, and multi-model scheduling.

1.2 Design Goals and Architecture Overview

The LLM application system we will build has the following core features:

  • Multi-model support: DeepSeek-R1, DeepSeek-V3, Qwen, and other large language models behind one interface
  • Unified API gateway: a standardized interface that hides upstream differences from the frontend
  • Intelligent routing: automatic selection of the best model based on query content and model strengths
  • Streaming responses: real-time token-by-token output for a better user experience
  • Usage monitoring: real-time tracking of token consumption and API calls
  • Extensible architecture: easy to add new models and feature modules later
The architecture is organized into four layers:

  • Frontend layer: Web UI, mobile adaptation, real-time interaction
  • API gateway layer: authentication, unified routing, load balancing, rate limiting
  • Model service layer: an API adapter fronting DeepSeek-R1, DeepSeek-V3, the Qwen series, and other models
  • Supporting services layer: logging, usage statistics, configuration center, cache management

2. Technology Stack and Environment Setup

2.1 Frontend Stack

The frontend is built with the Vue 3 Composition API. Key dependencies:

  • Vue 3: reactive frontend framework
  • Element Plus: UI component library
  • Axios: HTTP client
  • Vite: build tool
  • Socket.io-client: WebSocket communication
  • Markdown-it: Markdown rendering

Key dependencies in package.json:

{
  "name": "llm-application-frontend",
  "version": "1.0.0",
  "dependencies": {
    "vue": "^3.3.0",
    "element-plus": "^2.3.0",
    "axios": "^1.4.0",
    "socket.io-client": "^4.6.0",
    "highlight.js": "^11.8.0",
    "markdown-it": "^13.0.0"
  },
  "devDependencies": {
    "vite": "^4.3.0",
    "@vitejs/plugin-vue": "^4.1.0",
    "sass": "^1.62.0"
  }
}

2.2 Backend Stack

The backend uses the Spring Boot framework with these main components:

  • Spring Boot 3.x: application framework
  • Spring WebFlux: reactive web support
  • WebClient: non-blocking HTTP client
  • Redis: caching and session management
  • MySQL: data persistence
  • JWT: authentication and authorization

Key dependencies in pom.xml:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <dependency>
        <groupId>io.jsonwebtoken</groupId>
        <artifactId>jjwt-api</artifactId>
        <version>0.11.5</version>
    </dependency>
    <!-- jjwt-impl and jjwt-jackson are required at runtime -->
    <dependency>
        <groupId>io.jsonwebtoken</groupId>
        <artifactId>jjwt-impl</artifactId>
        <version>0.11.5</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>io.jsonwebtoken</groupId>
        <artifactId>jjwt-jackson</artifactId>
        <version>0.11.5</version>
        <scope>runtime</scope>
    </dependency>
</dependencies>

2.3 Development Environment Setup

2.3.1 Frontend environment
# Create the Vue project
npm create vite@latest llm-chat-frontend -- --template vue

# Install dependencies
cd llm-chat-frontend
npm install

# Install additional dependencies
npm install element-plus axios socket.io-client highlight.js markdown-it

# Start the dev server
npm run dev
2.3.2 Backend environment

Key application.yml configuration:

server:
  port: 8080
  compression:
    enabled: true
    mime-types: text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json
    min-response-size: 1024

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/llm_app?useUnicode=true&characterEncoding=utf8&serverTimezone=Asia/Shanghai
    username: root
    password: your_password
    driver-class-name: com.mysql.cj.jdbc.Driver
    
  data:
    redis:
      host: localhost
      port: 6379
      password:
      database: 0

maas:
  api:
    base-url: https://maas-api.lanyun.net
    deepseek-r1: /maas/deepseek-ai/DeepSeek-R1-0528
    deepseek-v3: /maas/deepseek-ai/DeepSeek-V3-0324
    qwen-32b: /maas/qwen/QwQ-32B
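Before wiring the full stack, it is worth smoke-testing the endpoint and request shape from plain Java. The sketch below is an illustration, not official client code: it builds an OpenAI-style chat/completions body for the DeepSeek-R1 path configured above. The `/chat/completions` route, field names, and the `MAAS_API_KEY` environment variable are assumptions to verify against the Lanyun API documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Standalone smoke test for the MaaS endpoint configured above.
public class MaasSmokeTest {

    static final String BASE_URL = "https://maas-api.lanyun.net";
    static final String MODEL = "/maas/deepseek-ai/DeepSeek-R1-0528";

    // Build an OpenAI-style chat/completions body for a single-turn request
    static String buildRequestBody(String userMessage) {
        return """
            {"model": "%s",
             "messages": [{"role": "user", "content": "%s"}],
             "stream": false}""".formatted(MODEL, userMessage);
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequestBody("Hello");
        System.out.println(body);

        // Uncomment to call the live API (requires the MAAS_API_KEY env var):
        // HttpRequest request = HttpRequest.newBuilder()
        //     .uri(URI.create(BASE_URL + "/chat/completions"))
        //     .header("Authorization", "Bearer " + System.getenv("MAAS_API_KEY"))
        //     .header("Content-Type", "application/json")
        //     .POST(HttpRequest.BodyPublishers.ofString(body))
        //     .build();
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //     .send(request, HttpResponse.BodyHandlers.ofString());
        // System.out.println(response.statusCode() + " " + response.body());
    }
}
```

This mirrors the request the backend's WebClient will send later, so the body builder can be reused as a reference when debugging gateway payloads.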

3. Frontend UI Design and Implementation

3.1 Responsive Layout

The frontend uses a hybrid Flex + Grid layout to provide a good experience across devices.

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Multi-Model LLM Chat Platform</title>
    <link rel="stylesheet" href="https://unpkg.com/element-plus/dist/index.css">
    <style>
        :root {
            --primary-color: #409EFF;
            --bg-color: #f5f7fa;
            --text-color: #303133;
            --border-color: #dcdfe6;
        }
        
        .app-container {
            display: grid;
            grid-template-rows: 60px 1fr 80px;
            height: 100vh;
            background-color: var(--bg-color);
        }
        
        .header {
            display: flex;
            align-items: center;
            padding: 0 20px;
            border-bottom: 1px solid var(--border-color);
            background-color: white;
        }
        
        .main-content {
            display: grid;
            grid-template-columns: 250px 1fr;
            gap: 0;
            overflow: hidden;
        }
        
        .sidebar {
            border-right: 1px solid var(--border-color);
            background-color: white;
            overflow-y: auto;
        }
        
        .chat-container {
            display: flex;
            flex-direction: column;
            background-color: white;
        }
        
        .messages-area {
            flex: 1;
            overflow-y: auto;
            padding: 20px;
        }
        
        .input-area {
            border-top: 1px solid var(--border-color);
            padding: 15px;
            background-color: white;
        }
        
        @media (max-width: 768px) {
            .main-content {
                grid-template-columns: 1fr;
            }
            .sidebar {
                display: none;
            }
        }
    </style>
</head>
<body>
    <div id="app"></div>
    <script type="module" src="/src/main.js"></script>
</body>
</html>

3.2 Vue 3 Component Implementation

3.2.1 Main application component
<template>
  <div class="app-container">
    <AppHeader />
    <div class="main-content">
      <ConversationSidebar />
      <ChatMain />
    </div>
    <AppFooter />
  </div>
</template>

<script setup>
import { provide, ref, reactive } from 'vue'
import AppHeader from './components/AppHeader.vue'
import AppFooter from './components/AppFooter.vue'
import ConversationSidebar from './components/ConversationSidebar.vue'
import ChatMain from './components/ChatMain.vue'

// Provide global state to child components
const currentConversation = ref(null)
const conversations = ref([])
const apiSettings = reactive({
  apiKey: localStorage.getItem('maas_api_key') || '',
  baseURL: 'https://maas-api.lanyun.net',
  selectedModel: '/maas/deepseek-ai/DeepSeek-R1-0528'
})

provide('currentConversation', currentConversation)
provide('conversations', conversations)
provide('apiSettings', apiSettings)
</script>
3.2.2 Chat main view component
<template>
  <div class="chat-container">
    <div class="messages-area" ref="messagesRef">
      <div v-for="(message, index) in messages" :key="index" class="message-item">
        <div :class="['message-bubble', message.role]">
          <div class="message-avatar">
            <el-avatar :size="36">
              <span v-if="message.role === 'user'">User</span>
              <span v-else>AI</span>
            </el-avatar>
          </div>
          <div class="message-content">
            <div v-if="message.role === 'assistant'" class="model-tag">
              {{ getModelName(message.model) }}
            </div>
            <div v-html="highlightCode(renderMarkdown(message.content))"></div>
          </div>
        </div>
      </div>
      
      <div v-if="isGenerating" class="message-item">
        <div class="message-bubble assistant">
          <div class="message-avatar">
            <el-avatar :size="36">AI</el-avatar>
          </div>
          <div class="message-content">
            <div class="model-tag">{{ getModelName(apiSettings.selectedModel) }}</div>
            <div class="typing-indicator">
              <span></span><span></span><span></span>
            </div>
          </div>
        </div>
      </div>
    </div>
    
    <div class="input-area">
      <MessageInput @send-message="handleSendMessage" />
    </div>
  </div>
</template>

<script setup>
import { ref, watch, nextTick, computed, inject } from 'vue'
import { ElMessage } from 'element-plus'
import hljs from 'highlight.js'
import 'highlight.js/styles/github.css'
import MarkdownIt from 'markdown-it'
import MessageInput from './MessageInput.vue'

const md = new MarkdownIt({
  html: true,
  linkify: true,
  typographer: true
})

const props = defineProps({
  conversation: Object
})

// Injected from the root component (provided in App.vue)
const apiSettings = inject('apiSettings')

const messagesRef = ref(null)
const isGenerating = ref(false)

const messages = computed(() => {
  return props.conversation ? props.conversation.messages : []
})

// Auto-scroll to the bottom on new messages
watch(messages, () => {
  nextTick(() => {
    if (messagesRef.value) {
      messagesRef.value.scrollTop = messagesRef.value.scrollHeight
    }
  })
}, { deep: true })

// Markdown rendering
const renderMarkdown = (content) => {
  return md.render(content || '')
}

// Code highlighting
const highlightCode = (html) => {
  const div = document.createElement('div')
  div.innerHTML = html
  div.querySelectorAll('pre code').forEach((block) => {
    hljs.highlightElement(block)
  })
  return div.innerHTML
}

// Map a model path to its display name
const getModelName = (modelPath) => {
  const modelMap = {
    '/maas/deepseek-ai/DeepSeek-R1-0528': 'DeepSeek-R1',
    '/maas/deepseek-ai/DeepSeek-V3-0324': 'DeepSeek-V3',
    '/maas/qwen/QwQ-32B': 'Qwen-32B',
    '/maas/qwen/Qwen2.5-72B-Instruct': 'Qwen2.5-72B'
  }
  return modelMap[modelPath] || modelPath
}

// Handle sending a message
const handleSendMessage = async (content) => {
  if (!content.trim()) return
  
  // Append the user message to the conversation
  props.conversation.messages.push({
    role: 'user',
    content: content,
    timestamp: new Date()
  })
  
  isGenerating.value = true
  
  try {
    // Call the backend API
    const response = await sendMessageToAPI(props.conversation.messages)
    
    props.conversation.messages.push({
      role: 'assistant',
      content: response.content,
      model: apiSettings.selectedModel,
      timestamp: new Date()
    })
  } catch (error) {
    ElMessage.error('Failed to send message: ' + error.message)
  } finally {
    isGenerating.value = false
  }
}
</script>

<style scoped>
.typing-indicator {
  display: inline-flex;
  align-items: center;
  height: 20px;
}

.typing-indicator span {
  height: 8px;
  width: 8px;
  background-color: #909399;
  border-radius: 50%;
  display: inline-block;
  margin: 0 2px;
  animation: bounce 1.3s infinite ease-in-out;
}

.typing-indicator span:nth-child(2) {
  animation-delay: 0.15s;
}

.typing-indicator span:nth-child(3) {
  animation-delay: 0.3s;
}

@keyframes bounce {
  0%, 80%, 100% {
    transform: translateY(0);
  }
  40% {
    transform: translateY(-10px);
  }
}
</style>

4. Backend API Gateway Design and Implementation

4.1 Unified API Gateway Architecture

The backend implements a reactive API gateway with Spring WebFlux, handling model routing, authentication, and rate limiting.

// Main API gateway controller
@RestController
@RequestMapping("/api/v1")
public class ApiGatewayController {
    
    private final ModelService modelService;
    private final RateLimiterService rateLimiterService;
    private final AuthenticationService authService;
    
    public ApiGatewayController(ModelService modelService, 
                              RateLimiterService rateLimiterService,
                              AuthenticationService authService) {
        this.modelService = modelService;
        this.rateLimiterService = rateLimiterService;
        this.authService = authService;
    }
    
    @PostMapping("/chat/completions")
    public Mono<ResponseEntity<Object>> chatCompletions(
            @RequestBody ChatRequest request,
            @RequestHeader(value = "Authorization", required = false) String authHeader,
            ServerWebExchange exchange) {
        
        return authService.authenticate(authHeader)
            .flatMap(user -> rateLimiterService.checkRateLimit(user.getId()))
            .flatMap(allow -> {
                if (!allow) {
                    return Mono.just(ResponseEntity.status(429)
                        .body(Map.of("error", "Rate limit exceeded")));
                }
                
                return modelService.invokeModel(request, exchange)
                    .map(response -> ResponseEntity.ok().body(response))
                    .onErrorResume(error -> handleError(error, exchange));
            });
    }
    
    private Mono<ResponseEntity<Object>> handleError(Throwable error, ServerWebExchange exchange) {
        // Error handling logic
        if (error instanceof ModelTimeoutException) {
            return Mono.just(ResponseEntity.status(504)
                .body(Map.of("error", "Model request timeout")));
        }
        
        return Mono.just(ResponseEntity.status(500)
            .body(Map.of("error", "Internal server error")));
    }
}

4.2 Model Service Abstraction Layer

A unified model service interface supports plugging in multiple LLM backends.

// Model service interface
public interface ModelService {
    Mono<ChatResponse> invokeModel(ChatRequest request, ServerWebExchange exchange);
    boolean supportsModel(String modelPath);
    ModelInfo getModelInfo();
}

// DeepSeek model service implementation
@Service
@Primary
public class DeepSeekModelService implements ModelService {
    
    private final WebClient webClient;
    private final String apiBaseUrl;
    private final String modelPath;
    private final ObjectMapper objectMapper;
    
    public DeepSeekModelService(
            @Value("${maas.api.base-url}") String apiBaseUrl,
            @Value("${maas.api.deepseek-r1}") String modelPath,
            WebClient.Builder webClientBuilder,
            ObjectMapper objectMapper) {
        
        this.apiBaseUrl = apiBaseUrl;
        this.modelPath = modelPath;
        this.webClient = webClientBuilder.baseUrl(apiBaseUrl).build();
        this.objectMapper = objectMapper;
    }
    
    @Override
    public Mono<ChatResponse> invokeModel(ChatRequest request, ServerWebExchange exchange) {
        // Build the upstream API request
        Map<String, Object> apiRequest = createApiRequest(request);
        
        return webClient.post()
            .uri("/chat/completions")
            .header("Authorization", "Bearer " + getApiKey())
            .header("Content-Type", "application/json")
            .bodyValue(apiRequest)
            .retrieve()
            .bodyToMono(String.class)
            .timeout(Duration.ofSeconds(30))
            .flatMap(responseBody -> parseResponse(responseBody, exchange));
    }
    
    private Map<String, Object> createApiRequest(ChatRequest request) {
        Map<String, Object> apiRequest = new HashMap<>();
        apiRequest.put("model", this.modelPath);
        apiRequest.put("messages", convertMessages(request.getMessages()));
        apiRequest.put("stream", request.isStream());
        
        if (request.getMaxTokens() != null) {
            apiRequest.put("max_tokens", request.getMaxTokens());
        }
        if (request.getTemperature() != null) {
            apiRequest.put("temperature", request.getTemperature());
        }
        
        return apiRequest;
    }
    
    private List<Map<String, String>> convertMessages(List<ChatMessage> messages) {
        return messages.stream()
            .map(msg -> {
                Map<String, String> converted = new HashMap<>();
                converted.put("role", msg.getRole());
                converted.put("content", msg.getContent());
                return converted;
            })
            .collect(Collectors.toList());
    }
    
    private Mono<ChatResponse> parseResponse(String responseBody, ServerWebExchange exchange) {
        try {
            JsonNode rootNode = objectMapper.readTree(responseBody);
            
            ChatResponse response = new ChatResponse();
            response.setId(rootNode.path("id").asText());
            response.setModel(rootNode.path("model").asText());
            response.setCreated(rootNode.path("created").asLong());
            
            JsonNode choicesNode = rootNode.path("choices");
            if (choicesNode.isArray() && choicesNode.size() > 0) {
                JsonNode firstChoice = choicesNode.get(0);
                JsonNode messageNode = firstChoice.path("message");
                
                ChatMessage message = new ChatMessage();
                message.setRole(messageNode.path("role").asText());
                message.setContent(messageNode.path("content").asText());
                response.setMessage(message);
                
                response.setFinishReason(firstChoice.path("finish_reason").asText());
            }
            
            JsonNode usageNode = rootNode.path("usage");
            if (!usageNode.isMissingNode()) {
                UsageInfo usage = new UsageInfo();
                usage.setPromptTokens(usageNode.path("prompt_tokens").asInt());
                usage.setCompletionTokens(usageNode.path("completion_tokens").asInt());
                usage.setTotalTokens(usageNode.path("total_tokens").asInt());
                response.setUsage(usage);
            }
            
            return Mono.just(response);
        } catch (Exception e) {
            return Mono.error(new ModelParseException("Failed to parse model response", e));
        }
    }
    
    @Override
    public boolean supportsModel(String modelPath) {
        return this.modelPath.equals(modelPath);
    }
    
    @Override
    public ModelInfo getModelInfo() {
        ModelInfo info = new ModelInfo();
        info.setModelPath(this.modelPath);
        info.setModelName("DeepSeek-R1");
        info.setMaxTokens(4096);
        info.setSupportsStreaming(true);
        return info;
    }
    
    private String getApiKey() {
        // Read the API key from configuration or the database
        return System.getenv("MAAS_API_KEY");
    }
}

4.3 Streaming Responses

Streaming requests are served over Server-Sent Events (SSE) for real-time delivery.

// Streaming response controller
@RestController
@RequestMapping("/api/v1")
public class StreamController {
    
    private final ModelStreamingService streamingService;
    private final AuthenticationService authService;
    
    public StreamController(ModelStreamingService streamingService,
                            AuthenticationService authService) {
        this.streamingService = streamingService;
        this.authService = authService;
    }
    
    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<Object>> streamChat(
            @RequestParam String conversationId,
            @RequestParam String message,
            @RequestHeader(value = "Authorization") String authHeader) {
        
        return authService.authenticate(authHeader)
            .flatMapMany(user -> streamingService.streamResponse(conversationId, message, user));
    }
}

// Streaming service implementation
@Service
public class ModelStreamingService {
    
    private final WebClient webClient;
    private final ObjectMapper objectMapper;
    
    public ModelStreamingService(@Value("${maas.api.base-url}") String apiBaseUrl,
                                 WebClient.Builder webClientBuilder,
                                 ObjectMapper objectMapper) {
        this.webClient = webClientBuilder.baseUrl(apiBaseUrl).build();
        this.objectMapper = objectMapper;
    }
    
    public Flux<ServerSentEvent<Object>> streamResponse(String conversationId, String message, User user) {
        return Flux.create(emitter -> {
            try {
                // Build the streaming request
                Map<String, Object> request = new HashMap<>();
                request.put("model", "/maas/deepseek-ai/DeepSeek-R1-0528");
                request.put("messages", List.of(
                    Map.of("role", "user", "content", message)
                ));
                request.put("stream", true);
                
                webClient.post()
                    .uri("/chat/completions")
                    .header("Authorization", "Bearer " + getApiKey())
                    .header("Content-Type", "application/json")
                    .bodyValue(request)
                    .retrieve()
                    .bodyToFlux(String.class)
                    .timeout(Duration.ofMinutes(5))
                    .subscribe(
                        chunk -> processChunk(chunk, emitter),
                        error -> emitter.error(error),
                        () -> emitter.complete()
                    );
                    
            } catch (Exception e) {
                emitter.error(e);
            }
        });
    }
    
    private void processChunk(String chunk, FluxSink<ServerSentEvent<Object>> emitter) {
        try {
            if (chunk.startsWith("data: ")) {
                String jsonData = chunk.substring(6);
                if ("[DONE]".equals(jsonData.trim())) {
                    emitter.next(ServerSentEvent.builder()
                        .event("end")
                        .data(Map.of("status", "complete"))
                        .build());
                    return;
                }
                
                JsonNode dataNode = objectMapper.readTree(jsonData);
                JsonNode choicesNode = dataNode.path("choices");
                if (choicesNode.isArray() && choicesNode.size() > 0) {
                    JsonNode deltaNode = choicesNode.get(0).path("delta");
                    if (!deltaNode.isMissingNode()) {
                        String content = deltaNode.path("content").asText();
                        if (StringUtils.hasText(content)) {
                            Map<String, Object> response = new HashMap<>();
                            response.put("type", "content");
                            response.put("content", content);
                            
                            emitter.next(ServerSentEvent.builder()
                                .data(response)
                                .build());
                        }
                    }
                }
            }
        } catch (Exception e) {
            // Handle parse errors
            emitter.next(ServerSentEvent.builder()
                .event("error")
                .data(Map.of("error", "Failed to parse chunk"))
                .build());
        }
    }
    
    private String getApiKey() {
        // Read the API key from the environment, as in DeepSeekModelService
        return System.getenv("MAAS_API_KEY");
    }
}

5. Multi-Model Scheduling and Routing

5.1 Intelligent Model Routing

The router selects the most suitable model based on query content, model performance, and cost.

// Model routing service
@Service
public class ModelRouterService {
    
    private final List<ModelService> modelServices;
    private final ModelPerformanceTracker performanceTracker;
    private final ModelCostCalculator costCalculator;
    
    public ModelRouterService(List<ModelService> modelServices,
                            ModelPerformanceTracker performanceTracker,
                            ModelCostCalculator costCalculator) {
        this.modelServices = modelServices;
        this.performanceTracker = performanceTracker;
        this.costCalculator = costCalculator;
    }
    
    public ModelService selectBestModel(ChatRequest request, User user) {
        // Collect all candidate models
        List<ModelCandidate> candidates = modelServices.stream()
            .filter(service -> service.supportsModel(request.getModel()) || 
                              ("auto".equals(request.getModel()) && isModelSuitable(service, request)))
            .map(service -> createModelCandidate(service, request, user))
            .collect(Collectors.toList());
        
        if (candidates.isEmpty()) {
            throw new NoSuitableModelException("No suitable model found for request");
        }
        
        // Pick the highest-scoring model
        return candidates.stream()
            .max(Comparator.comparingDouble(ModelCandidate::getScore))
            .orElse(candidates.get(0))
            .getService();
    }
    
    private ModelCandidate createModelCandidate(ModelService service, ChatRequest request, User user) {
        ModelInfo info = service.getModelInfo();
        double performanceScore = calculatePerformanceScore(service, request);
        double costScore = calculateCostScore(service, request, user);
        double suitabilityScore = calculateSuitabilityScore(service, request);
        
        double totalScore = performanceScore * 0.4 + costScore * 0.3 + suitabilityScore * 0.3;
        
        return new ModelCandidate(service, totalScore, performanceScore, costScore, suitabilityScore);
    }
    
    private double calculatePerformanceScore(ModelService service, ChatRequest request) {
        ModelInfo info = service.getModelInfo();
        String modelPath = info.getModelPath();
        
        // Pull historical performance data
        ModelPerformanceStats stats = performanceTracker.getStats(modelPath);
        double latencyScore = 1.0 - Math.min(stats.getAverageLatency() / 5000.0, 1.0);
        double successRateScore = stats.getSuccessRate();
        
        // Adjust the score by query complexity
        int messageLength = request.getMessages().stream()
            .mapToInt(msg -> msg.getContent().length())
            .sum();
        double complexityFactor = Math.min(messageLength / 1000.0, 1.0);
        
        return (latencyScore * 0.6 + successRateScore * 0.4) * (1.0 - complexityFactor * 0.2);
    }
    
    private double calculateCostScore(ModelService service, ChatRequest request, User user) {
        ModelInfo info = service.getModelInfo();
        double estimatedCost = costCalculator.estimateCost(info.getModelPath(), request);
        double userBalance = user.getBalance();
        
        // Lower cost scores higher; factor in the user's balance
        double costFactor = 1.0 - Math.min(estimatedCost / 10.0, 1.0);
        double balanceFactor = Math.min(userBalance / 100.0, 1.0);
        
        return costFactor * 0.7 + balanceFactor * 0.3;
    }
    
    private double calculateSuitabilityScore(ModelService service, ChatRequest request) {
        ModelInfo info = service.getModelInfo();
        String content = request.getMessages().stream()
            .map(ChatMessage::getContent)
            .collect(Collectors.joining("\n"));
        
        // Simple content-type detection
        boolean isCodeRelated = containsCode(content);
        boolean isCreative = isCreativeContent(content);
        boolean isTechnical = isTechnicalContent(content);
        
        // Match the content type against each model's strengths
        double score = 0.5; // base score
        
        if (info.getModelPath().contains("deepseek") && isCodeRelated) {
            score += 0.3; // DeepSeek is strong at code
        }
        
        if (info.getModelPath().contains("qwen") && isCreative) {
            score += 0.3; // Qwen is strong at creative writing
        }
        
        if (info.getModelPath().contains("v3") && isTechnical) {
            score += 0.2; // V3 is strong at technical content
        }
        
        return Math.min(score, 1.0);
    }
    
    private boolean containsCode(String content) {
        return content.contains("```") || 
               content.matches(".*(function|class|import|package|def|var|let|const).*");
    }
    
    private boolean isCreativeContent(String content) {
        // Matches Chinese keywords: story, poetry, fiction, creativity, imagination
        return content.matches(".*(故事|诗歌|小说|创意|想象).*");
    }
    
    private boolean isTechnicalContent(String content) {
        // Matches Chinese keywords: technology, algorithm, programming, code, math, physics, engineering
        return content.matches(".*(技术|算法|编程|代码|数学|物理|工程).*");
    }
    
    // Internal candidate-model holder
    private static class ModelCandidate {
        private final ModelService service;
        private final double score;
        private final double performanceScore;
        private final double costScore;
        private final double suitabilityScore;
        
        public ModelCandidate(ModelService service, double score, 
                             double performanceScore, double costScore, 
                             double suitabilityScore) {
            this.service = service;
            this.score = score;
            this.performanceScore = performanceScore;
            this.costScore = costScore;
            this.suitabilityScore = suitabilityScore;
        }
        
        // getters omitted
    }
}

5.2 Model Performance Monitoring and Fallback

The system monitors model performance in real time and automatically switches to a fallback model when performance degrades.

// Model performance monitoring service
@Service
public class ModelPerformanceTracker {
    
    private final Map<String, ModelPerformanceStats> statsMap = new ConcurrentHashMap<>();
    private final MeterRegistry meterRegistry;
    
    public ModelPerformanceTracker(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }
    
    public void recordSuccess(String modelPath, long latency) {
        ModelPerformanceStats stats = statsMap.computeIfAbsent(modelPath, k -> new ModelPerformanceStats());
        stats.recordSuccess(latency);
        
        // Record metrics
        meterRegistry.timer("model.invoke", "model", modelPath, "status", "success")
            .record(latency, TimeUnit.MILLISECONDS);
    }
    
    public void recordFailure(String modelPath, String errorType) {
        ModelPerformanceStats stats = statsMap.computeIfAbsent(modelPath, k -> new ModelPerformanceStats());
        stats.recordFailure();
        
        // Record metrics
        meterRegistry.counter("model.errors", "model", modelPath, "errorType", errorType)
            .increment();
    }
    
    public ModelPerformanceStats getStats(String modelPath) {
        return statsMap.getOrDefault(modelPath, new ModelPerformanceStats());
    }
    
    public boolean isModelDegraded(String modelPath) {
        ModelPerformanceStats stats = getStats(modelPath);
        if (stats.getTotalRequests() < 10) {
            return false; // Too few samples to call it degraded
        }
        
        // Failure rate above threshold?
        if (stats.getSuccessRate() < 0.8) {
            return true;
        }
        
        // Average latency above threshold?
        if (stats.getAverageLatency() > 10000) { // 10 seconds
            return true;
        }
        
        return false;
    }
    
    public List<String> getAlternativeModels(String primaryModel) {
        // Return fallback models based on the primary model's family
        if (primaryModel.contains("deepseek")) {
            return Arrays.asList(
                "/maas/deepseek-ai/DeepSeek-V3-0324",
                "/maas/qwen/QwQ-32B",
                "/maas/qwen/Qwen2.5-72B-Instruct"
            );
        } else if (primaryModel.contains("qwen")) {
            return Arrays.asList(
                "/maas/deepseek-ai/DeepSeek-R1-0528",
                "/maas/deepseek-ai/DeepSeek-V3-0324"
            );
        }
        
        return Arrays.asList("/maas/deepseek-ai/DeepSeek-R1-0528");
    }
}

// Model performance statistics
public class ModelPerformanceStats {
    private final AtomicLong successCount = new AtomicLong(0);
    private final AtomicLong failureCount = new AtomicLong(0);
    private final AtomicLong totalLatency = new AtomicLong(0);
    private final LongAdder currentLatencySum = new LongAdder();
    private final AtomicLong currentCount = new AtomicLong(0);
    
    // Sliding-window stats (last 100 requests)
    private final CircularFifoQueue<Long> recentLatencies = new CircularFifoQueue<>(100);
    
    public void recordSuccess(long latency) {
        successCount.incrementAndGet();
        totalLatency.addAndGet(latency);
        recentLatencies.add(latency);
        currentLatencySum.add(latency);
        currentCount.incrementAndGet();
    }
    
    public void recordFailure() {
        failureCount.incrementAndGet();
    }
    
    public double getSuccessRate() {
        long total = successCount.get() + failureCount.get();
        if (total == 0) return 1.0;
        return (double) successCount.get() / total;
    }
    
    public long getAverageLatency() {
        long count = currentCount.get();
        if (count == 0) return 0;
        return currentLatencySum.longValue() / count;
    }
    
    public long getP95Latency() {
        if (recentLatencies.isEmpty()) return 0;
        
        List<Long> sorted = new ArrayList<>(recentLatencies);
        Collections.sort(sorted);
        
        int index = (int) Math.ceil(0.95 * sorted.size()) - 1;
        return index >= 0 ? sorted.get(index) : 0;
    }
    
    public long getTotalRequests() {
        return successCount.get() + failureCount.get();
    }
}
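The tracker above exposes `isModelDegraded()` and `getAlternativeModels()`, but the article does not show them wired together. Below is a minimal standalone sketch of that fallback flow under the same thresholds (80% success rate, 10-second average latency); the `Stats` record is a simplified stand-in for `ModelPerformanceStats`, not the production class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch: route to the first healthy fallback when the primary model is degraded.
public class FallbackRouter {

    // Minimal stand-in for ModelPerformanceStats
    record Stats(long requests, double successRate, long avgLatencyMs) {}

    // Same thresholds as isModelDegraded(): <80% success or >10 s average latency
    static boolean isDegraded(Stats s) {
        if (s.requests() < 10) return false; // too few samples
        return s.successRate() < 0.8 || s.avgLatencyMs() > 10_000;
    }

    // Keep the primary model if healthy; otherwise take the first healthy fallback
    static String pickModel(String primary, Stats primaryStats,
                            List<String> alternatives, Function<String, Stats> statsOf) {
        if (!isDegraded(primaryStats)) return primary;
        for (String alt : alternatives) {
            if (!isDegraded(statsOf.apply(alt))) return alt;
        }
        return primary; // no healthy fallback available
    }

    public static void main(String[] args) {
        Stats degraded = new Stats(100, 0.6, 3_000);
        Stats healthy = new Stats(100, 0.95, 1_200);

        String chosen = pickModel(
            "/maas/deepseek-ai/DeepSeek-R1-0528", degraded,
            Arrays.asList("/maas/deepseek-ai/DeepSeek-V3-0324", "/maas/qwen/QwQ-32B"),
            model -> healthy);
        System.out.println(chosen); // prints /maas/deepseek-ai/DeepSeek-V3-0324
    }
}
```

In the real service this check would run inside `ModelRouterService.selectBestModel()`, with `statsOf` backed by `ModelPerformanceTracker.getStats()`.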

6. Advanced Features

6.1 Conversation History Management

Conversation history is persisted and managed to support multi-turn context.

// Conversation service
@Service
public class ConversationService {
    
    private final ConversationRepository conversationRepository;
    private final MessageRepository messageRepository;
    private final RedisTemplate<String, Object> redisTemplate;
    
    public ConversationService(ConversationRepository conversationRepository,
                             MessageRepository messageRepository,
                             RedisTemplate<String, Object> redisTemplate) {
        this.conversationRepository = conversationRepository;
        this.messageRepository = messageRepository;
        this.redisTemplate = redisTemplate;
    }
    
    @Transactional
    public Conversation createConversation(String title, User user) {
        Conversation conversation = new Conversation();
        conversation.setTitle(title);
        conversation.setUserId(user.getId());
        conversation.setCreatedAt(LocalDateTime.now());
        conversation.setUpdatedAt(LocalDateTime.now());
        
        return conversationRepository.save(conversation);
    }
    
    @Transactional
    public Message addMessageToConversation(Long conversationId, String role, 
                                          String content, String model) {
        Conversation conversation = conversationRepository.findById(conversationId)
            .orElseThrow(() -> new ConversationNotFoundException(conversationId));
        
        Message message = new Message();
        message.setConversationId(conversationId);
        message.setRole(role);
        message.setContent(content);
        message.setModel(model);
        message.setTimestamp(LocalDateTime.now());
        message.setTokenCount(estimateTokenCount(content));
        
        Message savedMessage = messageRepository.save(message);
        
        // Update the conversation's last-modified time
        conversation.setUpdatedAt(LocalDateTime.now());
        conversationRepository.save(conversation);
        
        // Cache the latest messages
        cacheRecentMessages(conversationId);
        
        return savedMessage;
    }
    
    public List<Message> getConversationMessages(Long conversationId, int limit) {
        // Try the cache first
        String cacheKey = "conversation:" + conversationId + ":messages";
        List<Message> cachedMessages = (List<Message>) redisTemplate.opsForValue().get(cacheKey);
        
        if (cachedMessages != null && cachedMessages.size() >= limit) {
            return cachedMessages.stream().limit(limit).collect(Collectors.toList());
        }
        
        // Cache miss: load from the database
        List<Message> messages = messageRepository.findByConversationIdOrderByTimestampDesc(
            conversationId, PageRequest.of(0, limit));
        
        // Refresh the cache
        redisTemplate.opsForValue().set(cacheKey, messages, 1, TimeUnit.HOURS);
        
        return messages;
    }
    
    public List<Conversation> getUserConversations(Long userId, int page, int size) {
        return conversationRepository.findByUserIdOrderByUpdatedAtDesc(
            userId, PageRequest.of(page, size));
    }
    
    @Transactional
    public void deleteConversation(Long conversationId) {
        // Delete the messages
        messageRepository.deleteByConversationId(conversationId);
        
        // Delete the conversation
        conversationRepository.deleteById(conversationId);
        
        // Clear the cache
        String cacheKey = "conversation:" + conversationId + ":messages";
        redisTemplate.delete(cacheKey);
    }
    
    private void cacheRecentMessages(Long conversationId) {
        // Fetch and cache the 50 most recent messages
        List<Message> recentMessages = messageRepository
            .findByConversationIdOrderByTimestampDesc(conversationId, PageRequest.of(0, 50));
        
        String cacheKey = "conversation:" + conversationId + ":messages";
        redisTemplate.opsForValue().set(cacheKey, recentMessages, 1, TimeUnit.HOURS);
    }
    
    private int estimateTokenCount(String text) {
        // Rough heuristic: ~1.3 tokens per Chinese character, ~0.25 tokens per English letter (about 4 letters per token)
        if (text == null || text.isEmpty()) return 0;
        
        int chineseCount = 0;
        int englishCount = 0;
        
        for (char c : text.toCharArray()) {
            if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                chineseCount++;
            } else if (Character.isLetter(c)) {
                englishCount++;
            }
        }
        
        // Rough estimate
        return (int) (chineseCount * 1.3 + englishCount * 0.25);
    }
}
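The token-estimation heuristic above is plain Java and can be exercised outside of Spring. A minimal standalone sketch with the same coefficients (the class name `TokenEstimator` is hypothetical):

```java
import java.lang.Character.UnicodeBlock;

public class TokenEstimator {
    // Rough heuristic: ~1.3 tokens per CJK character, ~0.25 tokens per Latin letter
    public static int estimate(String text) {
        if (text == null || text.isEmpty()) return 0;
        int chinese = 0, english = 0;
        for (char c : text.toCharArray()) {
            if (UnicodeBlock.of(c) == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                chinese++;
            } else if (Character.isLetter(c)) {
                english++;
            }
        }
        return (int) (chinese * 1.3 + english * 0.25);
    }

    public static void main(String[] args) {
        System.out.println(estimate("hello world")); // 10 letters -> 2
        System.out.println(estimate("你好世界"));     // 4 CJK chars -> 5
    }
}
```

For production accuracy you would use the model's actual tokenizer; this estimate only needs to be good enough for quota checks before a call is made.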

6.2 Token Usage Statistics and Quota Management

Implement token usage tracking and per-user quota management.

// Usage statistics service
@Service
public class UsageStatsService {
    
    private final UsageRecordRepository usageRecordRepository;
    private final UserRepository userRepository;
    private final RedisTemplate<String, Object> redisTemplate;
    
    public UsageStatsService(UsageRecordRepository usageRecordRepository,
                             UserRepository userRepository,
                             RedisTemplate<String, Object> redisTemplate) {
        this.usageRecordRepository = usageRecordRepository;
        this.userRepository = userRepository;
        this.redisTemplate = redisTemplate;
    }
    
    @Scheduled(cron = "0 0 0 * * ?") // Runs daily at midnight
    public void resetDailyCounters() {
        // Reset the per-day usage counters
        userRepository.resetDailyUsage();
    }
    
    @Transactional
    public void recordUsage(Long userId, String model, int promptTokens, 
                          int completionTokens, double cost) {
        // Record this usage
        UsageRecord record = new UsageRecord();
        record.setUserId(userId);
        record.setModel(model);
        record.setPromptTokens(promptTokens);
        record.setCompletionTokens(completionTokens);
        record.setTotalTokens(promptTokens + completionTokens);
        record.setCost(cost);
        record.setRecordTime(LocalDateTime.now());
        
        usageRecordRepository.save(record);
        
        // Update the user's aggregate stats
        updateUserUsageStats(userId, promptTokens, completionTokens, cost);
    }
    
    private void updateUserUsageStats(Long userId, int promptTokens, 
                                    int completionTokens, double cost) {
        User user = userRepository.findById(userId)
            .orElseThrow(() -> new UserNotFoundException(userId));
        
        user.setTotalTokensUsed(user.getTotalTokensUsed() + promptTokens + completionTokens);
        user.setDailyTokensUsed(user.getDailyTokensUsed() + promptTokens + completionTokens);
        user.setTotalCost(user.getTotalCost() + cost);
        user.setDailyCost(user.getDailyCost() + cost);
        
        userRepository.save(user);
        
        // Invalidate the cached stats
        updateUsageCache(userId, user);
    }
    
    public UsageStats getUsageStats(Long userId) {
        String cacheKey = "user:" + userId + ":usage";
        UsageStats cachedStats = (UsageStats) redisTemplate.opsForValue().get(cacheKey);
        
        if (cachedStats != null) {
            return cachedStats;
        }
        
        User user = userRepository.findById(userId)
            .orElseThrow(() -> new UserNotFoundException(userId));
        
        // Build more detailed statistics
        UsageStats stats = new UsageStats();
        stats.setTotalTokens(user.getTotalTokensUsed());
        stats.setDailyTokens(user.getDailyTokensUsed());
        stats.setTotalCost(user.getTotalCost());
        stats.setDailyCost(user.getDailyCost());
        
        // Per-model usage breakdown
        List<ModelUsage> modelUsage = usageRecordRepository.getModelUsageStats(userId);
        stats.setModelUsage(modelUsage);
        
        // Usage trend over the last 7 days
        LocalDate now = LocalDate.now();
        List<DailyUsage> dailyUsage = new ArrayList<>();
        for (int i = 6; i >= 0; i--) {
            LocalDate date = now.minusDays(i);
            DailyUsage usage = usageRecordRepository.getDailyUsage(userId, date);
            dailyUsage.add(usage);
        }
        stats.setWeeklyUsage(dailyUsage);
        
        // Cache the result
        redisTemplate.opsForValue().set(cacheKey, stats, 30, TimeUnit.MINUTES);
        
        return stats;
    }
    
    public boolean checkQuota(Long userId, int estimatedTokens) {
        User user = userRepository.findById(userId)
            .orElseThrow(() -> new UserNotFoundException(userId));
        
        // Check the total quota
        if (user.getTotalTokensUsed() + estimatedTokens > user.getTokenQuota()) {
            return false;
        }
        
        // Check the daily quota
        if (user.getDailyTokensUsed() + estimatedTokens > user.getDailyTokenQuota()) {
            return false;
        }
        
        return true;
    }
    
    private void updateUsageCache(Long userId, User user) {
        String cacheKey = "user:" + userId + ":usage";
        redisTemplate.delete(cacheKey);
    }
}

// Usage statistics data structures
public class UsageStats {
    private long totalTokens;
    private long dailyTokens;
    private double totalCost;
    private double dailyCost;
    private List<ModelUsage> modelUsage;
    private List<DailyUsage> weeklyUsage;
    
    // getters and setters
}

public class ModelUsage {
    private String model;
    private long tokenCount;
    private double cost;
    
    // getters and setters
}

public class DailyUsage {
    private LocalDate date;
    private long tokenCount;
    private double cost;
    
    // getters and setters
}
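Stripped of the repository layer, the quota decision in `checkQuota` is just two threshold comparisons against the user's counters. A self-contained sketch (the `Quota` record and class name are hypothetical stand-ins for the `User` entity fields):

```java
public class QuotaCheck {
    // Mirrors the fields checkQuota reads from the User entity
    record Quota(long totalUsed, long totalQuota, long dailyUsed, long dailyQuota) {}

    static boolean allowed(Quota q, int estimatedTokens) {
        if (q.totalUsed() + estimatedTokens > q.totalQuota()) return false; // total cap
        if (q.dailyUsed() + estimatedTokens > q.dailyQuota()) return false; // daily cap
        return true;
    }

    public static void main(String[] args) {
        Quota q = new Quota(900_000, 1_000_000, 45_000, 50_000);
        System.out.println(allowed(q, 4_000)); // true: both caps respected
        System.out.println(allowed(q, 6_000)); // false: daily cap exceeded
    }
}
```

Because the check runs against an *estimated* token count before the model call, the recorded usage can slightly overshoot the quota; the design accepts that in exchange for not blocking on a post-hoc check.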

7. Security and Performance Optimization

7.1 API Security

Implement comprehensive API protection: authentication, authorization, rate limiting, and injection defense.

// JWT authentication filter
@Component
public class JwtAuthenticationFilter implements WebFilter {
    
    private final JwtTokenProvider tokenProvider;
    private final UserDetailsService userDetailsService;
    
    public JwtAuthenticationFilter(JwtTokenProvider tokenProvider,
                                 UserDetailsService userDetailsService) {
        this.tokenProvider = tokenProvider;
        this.userDetailsService = userDetailsService;
    }
    
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        
        // Skip login and public endpoints
        if (path.startsWith("/api/auth/login") || 
            path.startsWith("/api/public/") ||
            path.equals("/") || 
            path.startsWith("/static/")) {
            return chain.filter(exchange);
        }
        
        String token = resolveToken(exchange.getRequest());
        
        if (StringUtils.hasText(token) && tokenProvider.validateToken(token)) {
            Authentication authentication = getAuthentication(token);
            return chain.filter(exchange)
                .contextWrite(ReactiveSecurityContextHolder.withAuthentication(authentication));
        }
        
        return Mono.error(new AuthenticationException("Invalid or missing token"));
    }
    
    private String resolveToken(ServerHttpRequest request) {
        String bearerToken = request.getHeaders().getFirst("Authorization");
        if (StringUtils.hasText(bearerToken) && bearerToken.startsWith("Bearer ")) {
            return bearerToken.substring(7);
        }
        return null;
    }
    
    private Authentication getAuthentication(String token) {
        String username = tokenProvider.getUsernameFromToken(token);
        UserDetails userDetails = userDetailsService.loadUserByUsername(username);
        return new UsernamePasswordAuthenticationToken(userDetails, "", userDetails.getAuthorities());
    }
}

// Rate limiting service
@Service
public class RateLimiterService {
    
    // The reactive template is required here: the blocking RedisTemplate does not
    // return Mono, so the flatMap chains below would not compile against it.
    private final ReactiveRedisTemplate<String, Object> redisTemplate;
    private final Map<String, RateLimitConfig> limitConfigs;
    
    public RateLimiterService(ReactiveRedisTemplate<String, Object> redisTemplate) {
        this.redisTemplate = redisTemplate;
        this.limitConfigs = loadRateLimitConfigs();
    }
    
    public Mono<Boolean> checkRateLimit(String userId, String endpoint) {
        String key = "rate_limit:" + userId + ":" + endpoint;
        RateLimitConfig config = limitConfigs.getOrDefault(endpoint, 
            new RateLimitConfig(100, 60)); // default limit
        
        return checkRateLimitWithKey(key, config);
    }
    
    public Mono<Boolean> checkModelRateLimit(String userId, String model) {
        String key = "model_limit:" + userId + ":" + model;
        // Model-specific limit: 50 calls per minute
        RateLimitConfig config = new RateLimitConfig(50, 60);
        
        return checkRateLimitWithKey(key, config);
    }
    
    private Mono<Boolean> checkRateLimitWithKey(String key, RateLimitConfig config) {
        return redisTemplate.opsForValue().increment(key, 1)
            .flatMap(count -> {
                if (count == 1) {
                    // First request in this window: start the expiry clock
                    // (the reactive API takes a Duration, not a TimeUnit)
                    return redisTemplate.expire(key, Duration.ofSeconds(config.getTimeWindow()))
                        .thenReturn(count <= config.getMaxRequests());
                }
                return Mono.just(count <= config.getMaxRequests());
            })
            .defaultIfEmpty(true);
    }
    
    private Map<String, RateLimitConfig> loadRateLimitConfigs() {
        Map<String, RateLimitConfig> configs = new HashMap<>();
        configs.put("/api/v1/chat/completions", new RateLimitConfig(30, 60)); // 30 requests/minute
        configs.put("/api/v1/models", new RateLimitConfig(10, 60)); // 10 requests/minute
        configs.put("/api/v1/usage", new RateLimitConfig(5, 60)); // 5 requests/minute
        return configs;
    }
}
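The Redis INCR-plus-EXPIRE pattern above is a fixed-window counter. Its core logic can be sketched in memory without Redis (the class name is hypothetical; note the real service gets per-key atomicity from Redis's INCR, while this sketch uses a synchronized map):

```java
import java.util.HashMap;
import java.util.Map;

public class FixedWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, long[]> windows = new HashMap<>(); // key -> {windowStart, count}

    public FixedWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is allowed in the current window
    public synchronized boolean tryAcquire(String key, long nowMillis) {
        long[] w = windows.computeIfAbsent(key, k -> new long[]{nowMillis, 0});
        if (nowMillis - w[0] >= windowMillis) { // window expired, like the Redis key's TTL lapsing
            w[0] = nowMillis;
            w[1] = 0;
        }
        w[1]++; // equivalent of Redis INCR
        return w[1] <= maxRequests;
    }

    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter(3, 60_000);
        for (int i = 0; i < 5; i++) {
            System.out.println(limiter.tryAcquire("user1", 0)); // true, true, true, false, false
        }
        System.out.println(limiter.tryAcquire("user1", 60_000)); // new window -> true
    }
}
```

Fixed windows allow brief bursts of up to 2x the limit around a window boundary; a sliding-window or token-bucket variant smooths that out at the cost of more Redis state.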

// Input validation and injection protection
@Component
public class InputValidationFilter implements WebFilter {
    
    private final List<Pattern> maliciousPatterns;
    
    public InputValidationFilter() {
        this.maliciousPatterns = Arrays.asList(
            Pattern.compile("<script.*?>", Pattern.CASE_INSENSITIVE),
            Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
            Pattern.compile("onload\\s*=", Pattern.CASE_INSENSITIVE),
            Pattern.compile("union.*select", Pattern.CASE_INSENSITIVE),
            Pattern.compile("drop.*table", Pattern.CASE_INSENSITIVE)
        );
    }
    
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        // Check query parameters (values() yields List<String> entries, so flatten first)
        boolean hasMaliciousParams = exchange.getRequest().getQueryParams().values().stream()
            .flatMap(List::stream)
            .anyMatch(this::containsMaliciousContent);
        
        if (hasMaliciousParams) {
            return Mono.error(new SecurityException("Malicious input detected"));
        }
        
        // For POST requests, inspect the body. Note: reading the body here consumes it;
        // a production filter must re-wrap the request (e.g. with a ServerHttpRequestDecorator
        // over a cached body) so downstream handlers can still read it.
        if (exchange.getRequest().getMethod() == HttpMethod.POST) {
            return DataBufferUtils.join(exchange.getRequest().getBody())
                .flatMap(dataBuffer -> {
                    byte[] bytes = new byte[dataBuffer.readableByteCount()];
                    dataBuffer.read(bytes);
                    DataBufferUtils.release(dataBuffer);
                    String body = new String(bytes, StandardCharsets.UTF_8);
                    if (containsMaliciousContent(body)) {
                        return Mono.error(new SecurityException("Malicious input detected"));
                    }
                    return chain.filter(exchange);
                });
        }
        
        return chain.filter(exchange);
    }
    
    private boolean containsMaliciousContent(String input) {
        if (input == null) return false;
        
        return maliciousPatterns.stream()
            .anyMatch(pattern -> pattern.matcher(input).find());
    }
}
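The regex screen can be verified in isolation (hypothetical class name; patterns copied from the filter above):

```java
import java.util.List;
import java.util.regex.Pattern;

public class MaliciousInputCheck {
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("<script.*?>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload\\s*=", Pattern.CASE_INSENSITIVE),
        Pattern.compile("union.*select", Pattern.CASE_INSENSITIVE),
        Pattern.compile("drop.*table", Pattern.CASE_INSENSITIVE)
    );

    public static boolean isMalicious(String input) {
        if (input == null) return false;
        return PATTERNS.stream().anyMatch(p -> p.matcher(input).find());
    }

    public static void main(String[] args) {
        System.out.println(isMalicious("<SCRIPT src=x>"));          // true (case-insensitive)
        System.out.println(isMalicious("1 UNION SELECT * FROM t")); // true
        System.out.println(isMalicious("a normal chat message"));   // false
    }
}
```

Such blocklists are easy to bypass and should be treated as defense in depth only; parameterized queries and output encoding remain the primary defenses against SQL injection and XSS.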

7.2 Performance Optimization

Apply several performance optimizations: response caching, connection pool tuning, and response compression.

// Response caching service
@Service
public class ResponseCacheService {
    
    // Reactive template, matching the Mono-based method signatures below
    private final ReactiveRedisTemplate<String, Object> redisTemplate;
    private final ObjectMapper objectMapper;
    
    public ResponseCacheService(ReactiveRedisTemplate<String, Object> redisTemplate,
                              ObjectMapper objectMapper) {
        this.redisTemplate = redisTemplate;
        this.objectMapper = objectMapper;
    }
    
    public Mono<Object> getCachedResponse(String cacheKey) {
        return redisTemplate.opsForValue().get(cacheKey)
            .flatMap(cached -> {
                try {
                    return Mono.just(objectMapper.readValue(cached.toString(), Object.class));
                } catch (Exception e) {
                    // Treat an unparsable entry as a cache miss
                    // (returning null from map would throw in Reactor)
                    return Mono.empty();
                }
            });
    }
    
    public Mono<Boolean> cacheResponse(String cacheKey, Object response, Duration ttl) {
        try {
            String jsonResponse = objectMapper.writeValueAsString(response);
            return redisTemplate.opsForValue().set(cacheKey, jsonResponse, ttl);
        } catch (Exception e) {
            return Mono.just(false);
        }
    }
    
    public String generateCacheKey(String endpoint, Map<String, Object> params) {
        String paramString = params.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(entry -> entry.getKey() + "=" + entry.getValue())
            .collect(Collectors.joining("&"));
        
        return endpoint + ":" + DigestUtils.md5DigestAsHex(paramString.getBytes());
    }
}
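`generateCacheKey` sorts the parameters so that equivalent requests hash to the same key regardless of parameter order. The same idea without Spring's `DigestUtils`, using `java.security.MessageDigest` (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CacheKeys {
    public static String generate(String endpoint, Map<String, ?> params) {
        // TreeMap iterates keys in sorted order, so insertion order never changes the key
        String paramString = new TreeMap<String, Object>(params).entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("&"));
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(paramString.getBytes(StandardCharsets.UTF_8));
            return endpoint + ":" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available on the JVM
        }
    }

    public static void main(String[] args) {
        // Same parameters, different insertion order -> same cache key
        String a = generate("/chat", Map.of("model", "deepseek", "max_tokens", 1024));
        String b = generate("/chat", Map.of("max_tokens", 1024, "model", "deepseek"));
        System.out.println(a.equals(b)); // true
    }
}
```

MD5 is fine here because the digest is a cache key, not a security boundary; collisions are astronomically unlikely at this scale and would only cause a stale cache hit.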

// Database connection pool configuration
@Configuration
public class DatabaseConfig {
    
    @Value("${spring.datasource.url}")
    private String url;
    
    @Value("${spring.datasource.username}")
    private String username;
    
    @Value("${spring.datasource.password}")
    private String password;
    
    @Bean
    public HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(url);
        config.setUsername(username);
        config.setPassword(password);
        config.setMaximumPoolSize(20);
        config.setMinimumIdle(5);
        config.setConnectionTimeout(30000);
        config.setIdleTimeout(600000);
        config.setMaxLifetime(1800000);
        config.setAutoCommit(true);
        config.setPoolName("LLMAppPool");
        
        // Tuning options
        config.addDataSourceProperty("cachePrepStmts", "true");
        config.addDataSourceProperty("prepStmtCacheSize", "250");
        config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
        config.addDataSourceProperty("useServerPrepStmts", "true");
        config.addDataSourceProperty("useLocalSessionState", "true");
        config.addDataSourceProperty("rewriteBatchedStatements", "true");
        config.addDataSourceProperty("cacheResultSetMetadata", "true");
        config.addDataSourceProperty("cacheServerConfiguration", "true");
        config.addDataSourceProperty("elideSetAutoCommits", "true");
        config.addDataSourceProperty("maintainTimeStats", "false");
        
        return new HikariDataSource(config);
    }
}

// WebClient tuning
@Configuration
public class WebClientConfig {
    
    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder
            .clientConnector(new ReactorClientHttpConnector(
                HttpClient.create()
                    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
                    .doOnConnected(conn -> 
                        conn.addHandlerLast(new ReadTimeoutHandler(10, TimeUnit.SECONDS))
                            .addHandlerLast(new WriteTimeoutHandler(10, TimeUnit.SECONDS))
                    )
                    .responseTimeout(Duration.ofSeconds(10))
                    .compress(true)
            ))
            .baseUrl("https://maas-api.lanyun.net")
            .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
            .build();
    }
    
    @Bean
    public WebClient.Builder webClientBuilder() {
        return WebClient.builder()
            .filter(ExchangeFilterers.rateLimiting())
            .filter(logRequest())
            .filter(logResponse());
    }
    
    private ExchangeFilterFunction logRequest() {
        return (request, next) -> {
            logger.info("Request: {} {}", request.method(), request.url());
            request.headers().forEach((name, values) -> 
                values.forEach(value -> logger.info("{}: {}", name, value)));
            return next.exchange(request);
        };
    }
    
    private ExchangeFilterFunction logResponse() {
        return ExchangeFilterFunction.ofResponseProcessor(response -> {
            logger.info("Response: {}", response.statusCode());
            return Mono.just(response);
        });
    }
}

8. Deployment and Monitoring

8.1 Containerized Deployment with Docker

A complete Docker deployment setup, including the Dockerfiles and a docker-compose configuration.

# Frontend Dockerfile
FROM node:18-alpine as frontend-build

WORKDIR /app
COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=frontend-build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
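The frontend image copies an `nginx.conf` that is not listed in the article. A minimal sketch of what it might contain, assuming the backend hostname and port from the docker-compose file below:

```nginx
# nginx.conf (sketch; adjust to your routing and TLS setup)
events {}
http {
    include /etc/nginx/mime.types;
    server {
        listen 80;
        root /usr/share/nginx/html;

        # Serve the Vue SPA; fall back to index.html for client-side routes
        location / {
            try_files $uri $uri/ /index.html;
        }

        # Proxy API calls to the backend service defined in docker-compose
        location /api/ {
            proxy_pass http://backend:8080;
            proxy_set_header Host $host;
        }
    }
}
```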
# Backend Dockerfile
# (eclipse-temurin is used because the openjdk Docker Hub image is deprecated
# and has no maintained 17-alpine tag)
FROM eclipse-temurin:17-jdk-alpine AS backend-build

WORKDIR /app
COPY mvnw pom.xml ./
COPY .mvn .mvn
RUN ./mvnw dependency:go-offline -B

COPY src src
RUN ./mvnw package -DskipTests

# Production stage (JRE-only image keeps the runtime layer small)
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app

COPY --from=backend-build /app/target/*.jar app.jar

# Create a non-root user
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
# docker-compose.yml
version: '3.8'

services:
  frontend:
    build: 
      context: ./frontend
      dockerfile: Dockerfile
    ports:
      - "80:80"
    networks:
      - llm-network
    depends_on:
      - backend
    environment:
      - API_BASE_URL=http://backend:8080

  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    networks:
      - llm-network
    depends_on:
      - redis
      - mysql
    environment:
      - SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/llm_app
      - SPRING_REDIS_HOST=redis
      - MAAS_API_KEY=${MAAS_API_KEY}

  mysql:
    image: mysql:8.0
    environment:
      - MYSQL_ROOT_PASSWORD=rootpassword
      - MYSQL_DATABASE=llm_app
      - MYSQL_USER=llm_user
      - MYSQL_PASSWORD=llm_password
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - llm-network
    ports:
      - "3306:3306"

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - llm-network
    ports:
      - "6379:6379"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - llm-network

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana-dashboards:/var/lib/grafana/dashboards
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - "3000:3000"
    networks:
      - llm-network
    depends_on:
      - prometheus

volumes:
  mysql_data:
  redis_data:
  prometheus_data:
  grafana_data:

networks:
  llm-network:
    driver: bridge

8.2 Monitoring and Alerting

Set up comprehensive application monitoring and alerting.

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'llm-backend'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['backend:8080']
        labels:
          application: 'llm-backend'
          
  - job_name: 'llm-frontend'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['frontend:80']
        labels:
          application: 'llm-frontend'
          
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql:9104']
    metrics_path: '/metrics'
    
  - job_name: 'redis'
    static_configs:
      - targets: ['redis:9121']
    metrics_path: '/metrics'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 'alertmanager:9093'

rule_files:
  - 'alerts/*.yml'
# alert-rules.yml
groups:
- name: llm-app
  rules:
  - alert: HighErrorRate
    expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) / rate(http_server_requests_seconds_count[5m]) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate"
      description: "Application error rate above 5%; current value: {{ $value }}"
  
  - alert: HighResponseTime
    expr: histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m])) > 2
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High response time"
      description: "95th-percentile response time above 2s; current value: {{ $value }}s"
  
  - alert: ModelTimeout
    expr: rate(model_timeout_total[5m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Model call timeouts"
      description: "Model invocation timeouts detected; current timeout rate: {{ $value }}"
  
  - alert: RateLimitExceeded
    expr: rate(rate_limit_exceeded_total[5m]) > 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Rate limit triggered frequently"
      description: "Rate limits are being hit frequently, which may indicate anomalous traffic"

9. Testing and Quality Assurance

9.1 Automated Testing Strategy

Build comprehensive automated test coverage, including unit, integration, and end-to-end tests.

// Model service unit tests
@ExtendWith(MockitoExtension.class)
class DeepSeekModelServiceTest {
    
    @Mock
    private WebClient webClient;
    
    @Mock
    private WebClient.RequestBodyUriSpec requestBodyUriSpec;
    
    @Mock
    private WebClient.RequestBodySpec requestBodySpec;
    
    @Mock
    private WebClient.RequestHeadersSpec requestHeadersSpec;
    
    @Mock
    private WebClient.ResponseSpec responseSpec;
    
    @InjectMocks
    private DeepSeekModelService modelService;
    
    @Test
    void testInvokeModelSuccess() {
        // Prepare test data
        ChatRequest request = new ChatRequest();
        request.setModel("/maas/deepseek-ai/DeepSeek-R1-0528");
        request.setMessages(List.of(
            new ChatMessage("user", "你好")
        ));
        request.setStream(false);
        
        // Mock the WebClient call chain. Note: post() returns a RequestBodyUriSpec,
        // and bodyValue() only exists on RequestBodySpec, so the mocks must use
        // those types rather than RequestHeadersUriSpec.
        when(webClient.post()).thenReturn(requestBodyUriSpec);
        when(requestBodyUriSpec.uri(anyString())).thenReturn(requestBodySpec);
        when(requestBodySpec.header(anyString(), anyString())).thenReturn(requestBodySpec);
        when(requestBodySpec.bodyValue(any())).thenReturn(requestHeadersSpec);
        when(requestHeadersSpec.retrieve()).thenReturn(responseSpec);
        when(responseSpec.bodyToMono(String.class)).thenReturn(Mono.just(createSuccessResponse()));
        
        // Run the test
        StepVerifier.create(modelService.invokeModel(request, null))
            .expectNextMatches(response -> 
                response.getMessage().getContent().contains("你好") &&
                response.getId() != null)
            .verifyComplete();
        
        // Verify the interactions
        verify(webClient).post();
        verify(requestBodyUriSpec).uri("/chat/completions");
    }
    
    @Test
    void testInvokeModelTimeout() {
        ChatRequest request = new ChatRequest();
        request.setModel("/maas/deepseek-ai/DeepSeek-R1-0528");
        request.setMessages(List.of(new ChatMessage("user", "test")));
        
        when(webClient.post()).thenReturn(requestBodyUriSpec);
        when(requestBodyUriSpec.uri(anyString())).thenReturn(requestBodySpec);
        when(requestBodySpec.header(anyString(), anyString())).thenReturn(requestBodySpec);
        when(requestBodySpec.bodyValue(any())).thenReturn(requestHeadersSpec);
        when(requestHeadersSpec.retrieve()).thenReturn(responseSpec);
        // Delay past the service's 30s timeout so the call is expected to fail
        // (StepVerifier.withVirtualTime would avoid the real 35s wait)
        when(responseSpec.bodyToMono(String.class))
            .thenReturn(Mono.delay(Duration.ofSeconds(35)).then(Mono.just("response")));
        
        StepVerifier.create(modelService.invokeModel(request, null))
            .expectError(ModelTimeoutException.class)
            .verify();
    }
    
    private String createSuccessResponse() {
        return """
            {
                "id": "chatcmpl-123",
                "object": "chat.completion",
                "created": 1677652288,
                "model": "/maas/deepseek-ai/DeepSeek-R1-0528",
                "choices": [{
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": "你好!我是DeepSeek AI助手,有什么可以帮你的吗?"
                    },
                    "finish_reason": "stop"
                }],
                "usage": {
                    "prompt_tokens": 10,
                    "completion_tokens": 20,
                    "total_tokens": 30
                }
            }
            """;
    }
}

// API integration tests
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class ApiIntegrationTest {
    
    @Container
    static MySQLContainer<?> mysql = new MySQLContainer<>("mysql:8.0");
    
    @Container
    static GenericContainer<?> redis = new GenericContainer<>("redis:7-alpine")
        .withExposedPorts(6379);
    
    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", mysql::getJdbcUrl);
        registry.add("spring.datasource.username", mysql::getUsername);
        registry.add("spring.datasource.password", mysql::getPassword);
        // Spring Boot 3.x moved these keys under spring.data.redis.*
        registry.add("spring.data.redis.host", redis::getHost);
        registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
    }
    
    @Test
    void testChatEndpoint(@Autowired WebTestClient webTestClient) {
        // Obtain an auth token
        String token = obtainAuthToken(webTestClient);
        
        // Exercise the chat endpoint
        webTestClient.post()
            .uri("/api/v1/chat/completions")
            .header("Authorization", "Bearer " + token)
            .contentType(MediaType.APPLICATION_JSON)
            .bodyValue("""
                {
                    "model": "/maas/deepseek-ai/DeepSeek-R1-0528",
                    "messages": [{"role": "user", "content": "你好"}],
                    "stream": false
                }
                """)
            .exchange()
            .expectStatus().isOk()
            .expectBody()
            .jsonPath("$.choices[0].message.content").exists()
            .jsonPath("$.id").isNotEmpty();
    }
    
    private String obtainAuthToken(WebTestClient webTestClient) {
        // Token retrieval stubbed for brevity
        return "mock-token";
    }
}

Learning Resources

With continued learning and hands-on practice, you can take full advantage of the MaaS platform's capabilities to build high-quality AI applications.
