Building a Multi-Model LLM Application on the Lanyun MaaS Platform: From Architecture Design to Full-Stack Implementation
This article presents a complete approach to building a multi-model LLM application on the Lanyun MaaS platform. Starting from the platform's technical strengths, it proposes a system architecture with multi-model support, a unified API gateway, intelligent routing, and streaming responses. On the implementation side, the frontend uses a Vue 3 + Element Plus + WebSocket stack, the backend is built on Spring Boot WebFlux, and multiple large models, including DeepSeek and Qwen (通义千问), are integrated. Core features include a responsive layout, real-time interaction, and usage monitoring.
1. Introduction: MaaS Platforms and a New Paradigm for LLM Application Development
1.1 The Significance of MaaS Platforms
Model as a Service (MaaS) platforms are fundamentally changing how AI applications are built. According to a recent Gartner forecast, by 2025 some 70% of new AI applications will be developed and deployed through MaaS platforms rather than by training models from scratch. This shift sharply lowers the technical barrier to AI applications while improving development efficiency and application performance.
As one of the leading MaaS providers in China, the Lanyun (蓝耘) platform offers API access to a range of large language models, including DeepSeek and Qwen (通义千问), so developers can build high-quality LLM applications quickly. This article walks through building a fully featured multi-model LLM application on the platform, covering architecture design, frontend implementation, backend development, and multi-model scheduling.
1.2 System Goals and Architecture Overview
The LLM application we will build has the following core features:
- Multi-model support: integrates DeepSeek-R1, DeepSeek-V3, Qwen, and other large language models
- Unified API gateway: exposes a standardized interface that simplifies frontend calls
- Intelligent routing: automatically selects the best model based on query content and model characteristics
- Streaming responses: renders output in real time for a better user experience
- Usage monitoring: tracks token consumption and API calls in real time
- Extensible architecture: makes it easy to add models and feature modules later
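To make the multi-model point concrete, the model endpoint paths used throughout this article can be collected in one place. The catalog below is our own illustrative sketch (the enum name and structure are not from the project); the path strings are the Lanyun endpoints configured later in application.yml:

```java
// Illustrative catalog of the model endpoints used in this article.
// The enum itself is a sketch; the path strings come from the
// application.yml configuration shown in section 2.3.2.
public enum ModelCatalog {
    DEEPSEEK_R1("/maas/deepseek-ai/DeepSeek-R1-0528"),
    DEEPSEEK_V3("/maas/deepseek-ai/DeepSeek-V3-0324"),
    QWEN_32B("/maas/qwen/QwQ-32B");

    private final String path;

    ModelCatalog(String path) {
        this.path = path;
    }

    public String path() {
        return path;
    }

    // Resolve an endpoint path back to a catalog entry, if known.
    public static ModelCatalog fromPath(String path) {
        for (ModelCatalog m : values()) {
            if (m.path.equals(path)) {
                return m;
            }
        }
        return null;
    }
}
```

Centralizing the paths this way avoids scattering string literals across the router, the streaming service, and the frontend model map.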
2. Technology Stack and Environment Setup
2.1 Frontend Stack
The frontend is built with the Vue 3 Composition API. Key dependencies:
- Vue 3: reactive frontend framework
- Element Plus: UI component library
- Axios: HTTP client
- Vite: build tool
- Socket.io-client: WebSocket communication
Key dependencies in package.json:
{
"name": "llm-application-frontend",
"version": "1.0.0",
"dependencies": {
"vue": "^3.3.0",
"element-plus": "^2.3.0",
"axios": "^1.4.0",
"socket.io-client": "^4.6.0",
"highlight.js": "^11.8.0",
"markdown-it": "^13.0.0"
},
"devDependencies": {
"vite": "^4.3.0",
"@vitejs/plugin-vue": "^4.1.0",
"sass": "^1.62.0"
}
}
2.2 Backend Stack
The backend uses Spring Boot with the following components:
- Spring Boot 3.x: application framework
- Spring WebFlux: reactive web support
- WebClient: non-blocking HTTP client
- Redis: caching and session management
- MySQL: data persistence
- JWT: authentication and authorization
Key dependencies in pom.xml:
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>io.jsonwebtoken</groupId>
<artifactId>jjwt-api</artifactId>
<version>0.11.5</version>
</dependency>
</dependencies>
2.3 Development Environment
2.3.1 Frontend Setup
# Create the Vue project
npm create vite@latest llm-chat-frontend -- --template vue
# Install dependencies
cd llm-chat-frontend
npm install
# Install additional dependencies
npm install element-plus axios socket.io-client markdown-it highlight.js
# Start the dev server
npm run dev
2.3.2 Backend Configuration
Key settings in application.yml:
server:
port: 8080
compression:
enabled: true
mime-types: text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json
min-response-size: 1024
spring:
datasource:
url: jdbc:mysql://localhost:3306/llm_app?useUnicode=true&characterEncoding=utf8&serverTimezone=Asia/Shanghai
username: root
password: your_password
driver-class-name: com.mysql.cj.jdbc.Driver
redis:
host: localhost
port: 6379
password:
database: 0
maas:
api:
base-url: https://maas-api.lanyun.net
deepseek-r1: /maas/deepseek-ai/DeepSeek-R1-0528
deepseek-v3: /maas/deepseek-ai/DeepSeek-V3-0324
qwen-32b: /maas/qwen/QwQ-32B
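The base-url and per-model paths above are combined into a full endpoint at call time. A minimal joining helper (our own sketch; the real application simply passes the base URL to WebClient's baseUrl) shows the intended result and avoids the classic double-slash mistake:

```java
// Joins a base URL and an endpoint path without duplicating slashes.
// Sketch only: in the application itself, WebClient is configured with
// the base URL and the per-model path is used as the request URI.
public final class EndpointJoiner {
    private EndpointJoiner() {}

    public static String join(String baseUrl, String path) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String suffix = path.startsWith("/") ? path : "/" + path;
        return base + suffix;
    }
}
```

For example, joining the base URL with the qwen-32b path yields https://maas-api.lanyun.net/maas/qwen/QwQ-32B.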
3. Frontend UI Design and Implementation
3.1 Responsive Layout
The frontend uses a hybrid Flex + Grid layout to provide a good experience across devices.
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Multi-Model LLM Chat Platform</title>
<link rel="stylesheet" href="https://unpkg.com/element-plus/dist/index.css">
<style>
:root {
--primary-color: #409EFF;
--bg-color: #f5f7fa;
--text-color: #303133;
--border-color: #dcdfe6;
}
.app-container {
display: grid;
grid-template-rows: 60px 1fr 80px;
height: 100vh;
background-color: var(--bg-color);
}
.header {
display: flex;
align-items: center;
padding: 0 20px;
border-bottom: 1px solid var(--border-color);
background-color: white;
}
.main-content {
display: grid;
grid-template-columns: 250px 1fr;
gap: 0;
overflow: hidden;
}
.sidebar {
border-right: 1px solid var(--border-color);
background-color: white;
overflow-y: auto;
}
.chat-container {
display: flex;
flex-direction: column;
background-color: white;
}
.messages-area {
flex: 1;
overflow-y: auto;
padding: 20px;
}
.input-area {
border-top: 1px solid var(--border-color);
padding: 15px;
background-color: white;
}
@media (max-width: 768px) {
.main-content {
grid-template-columns: 1fr;
}
.sidebar {
display: none;
}
}
</style>
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>
3.2 Vue 3 Component Implementation
3.2.1 Root Application Component
<template>
<div class="app-container">
<AppHeader />
<div class="main-content">
<ConversationSidebar />
<ChatMain />
</div>
<AppFooter />
</div>
</template>
<script setup>
import { provide, ref, reactive } from 'vue'
import AppHeader from './components/AppHeader.vue'
import AppFooter from './components/AppFooter.vue'
import ConversationSidebar from './components/ConversationSidebar.vue'
import ChatMain from './components/ChatMain.vue'
// Provide global state to child components
const currentConversation = ref(null)
const conversations = ref([])
const apiSettings = reactive({
apiKey: localStorage.getItem('maas_api_key') || '',
baseURL: 'https://maas-api.lanyun.net',
selectedModel: '/maas/deepseek-ai/DeepSeek-R1-0528'
})
provide('currentConversation', currentConversation)
provide('conversations', conversations)
provide('apiSettings', apiSettings)
</script>
3.2.2 Main Chat Component
<template>
<div class="chat-container">
<div class="messages-area" ref="messagesRef">
<div v-for="(message, index) in messages" :key="index" class="message-item">
<div :class="['message-bubble', message.role]">
<div class="message-avatar">
<el-avatar :size="36">
<span v-if="message.role === 'user'">User</span>
<span v-else>AI</span>
</el-avatar>
</div>
<div class="message-content">
<div v-if="message.role === 'assistant'" class="model-tag">
{{ getModelName(message.model) }}
</div>
<div v-html="highlightCode(renderMarkdown(message.content))"></div>
</div>
</div>
</div>
<div v-if="isGenerating" class="message-item">
<div class="message-bubble assistant">
<div class="message-avatar">
<el-avatar :size="36">AI</el-avatar>
</div>
<div class="message-content">
<div class="model-tag">{{ getModelName(apiSettings.selectedModel) }}</div>
<div class="typing-indicator">
<span></span><span></span><span></span>
</div>
</div>
</div>
</div>
</div>
<div class="input-area">
<MessageInput @send-message="handleSendMessage" />
</div>
</div>
</template>
<script setup>
import { ref, watch, nextTick, computed, inject } from 'vue'
import { ElMessage } from 'element-plus'
import hljs from 'highlight.js'
import 'highlight.js/styles/github.css'
import MarkdownIt from 'markdown-it'
import MessageInput from './MessageInput.vue'
// Shared API settings provided by the root component
const apiSettings = inject('apiSettings')
const md = new MarkdownIt({
html: true,
linkify: true,
typographer: true
})
const props = defineProps({
conversation: Object
})
const messagesRef = ref(null)
const isGenerating = ref(false)
const messages = computed(() => {
return props.conversation ? props.conversation.messages : []
})
// Auto-scroll to the bottom on new messages
watch(messages, () => {
nextTick(() => {
if (messagesRef.value) {
messagesRef.value.scrollTop = messagesRef.value.scrollHeight
}
})
}, { deep: true })
// Markdown rendering
const renderMarkdown = (content) => {
return md.render(content || '')
}
// Code highlighting
const highlightCode = (html) => {
const div = document.createElement('div')
div.innerHTML = html
div.querySelectorAll('pre code').forEach((block) => {
hljs.highlightElement(block)
})
return div.innerHTML
}
// Resolve a model path to a display name
const getModelName = (modelPath) => {
const modelMap = {
'/maas/deepseek-ai/DeepSeek-R1-0528': 'DeepSeek-R1',
'/maas/deepseek-ai/DeepSeek-V3-0324': 'DeepSeek-V3',
'/maas/qwen/QwQ-32B': 'Qwen-32B',
'/maas/qwen/Qwen2.5-72B-Instruct': 'Qwen2.5-72B'
}
return modelMap[modelPath] || modelPath
}
// Handle sending a message
const handleSendMessage = async (content) => {
if (!content.trim()) return
// Append the user message to the conversation
props.conversation.messages.push({
role: 'user',
content: content,
timestamp: new Date()
})
isGenerating.value = true
try {
// Call the backend API (sendMessageToAPI is the request helper
// implemented in the project's API module)
const response = await sendMessageToAPI(props.conversation.messages)
props.conversation.messages.push({
role: 'assistant',
content: response.content,
model: apiSettings.selectedModel,
timestamp: new Date()
})
} catch (error) {
ElMessage.error('Failed to send message: ' + error.message)
} finally {
isGenerating.value = false
}
}
</script>
<style scoped>
.typing-indicator {
display: inline-flex;
align-items: center;
height: 20px;
}
.typing-indicator span {
height: 8px;
width: 8px;
background-color: #909399;
border-radius: 50%;
display: inline-block;
margin: 0 2px;
animation: bounce 1.3s infinite ease-in-out;
}
.typing-indicator span:nth-child(2) {
animation-delay: 0.15s;
}
.typing-indicator span:nth-child(3) {
animation-delay: 0.3s;
}
@keyframes bounce {
0%, 80%, 100% {
transform: translateY(0);
}
40% {
transform: translateY(-10px);
}
}
</style>
4. Backend API Gateway Design and Implementation
4.1 Unified API Gateway Architecture
The backend implements a reactive API gateway with Spring WebFlux, handling model routing, authentication, and rate limiting.
// API gateway controller
@RestController
@RequestMapping("/api/v1")
public class ApiGatewayController {
private final ModelService modelService;
private final RateLimiterService rateLimiterService;
private final AuthenticationService authService;
public ApiGatewayController(ModelService modelService,
RateLimiterService rateLimiterService,
AuthenticationService authService) {
this.modelService = modelService;
this.rateLimiterService = rateLimiterService;
this.authService = authService;
}
@PostMapping("/chat/completions")
public Mono<ResponseEntity<Object>> chatCompletions(
@RequestBody ChatRequest request,
@RequestHeader(value = "Authorization", required = false) String authHeader,
ServerWebExchange exchange) {
return authService.authenticate(authHeader)
.flatMap(user -> rateLimiterService.checkRateLimit(String.valueOf(user.getId()), "chat/completions"))
.flatMap(allow -> {
if (!allow) {
return Mono.just(ResponseEntity.status(429)
.body(Map.of("error", "Rate limit exceeded")));
}
return modelService.invokeModel(request, exchange)
.map(response -> ResponseEntity.ok().body(response))
.onErrorResume(error -> handleError(error, exchange));
});
}
private Mono<ResponseEntity<Object>> handleError(Throwable error, ServerWebExchange exchange) {
// Error-handling logic
if (error instanceof ModelTimeoutException) {
return Mono.just(ResponseEntity.status(504)
.body(Map.of("error", "Model request timeout")));
}
return Mono.just(ResponseEntity.status(500)
.body(Map.of("error", "Internal server error")));
}
}
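The gateway controller above references ChatRequest and ChatMessage without showing them. A minimal version consistent with the accessors used throughout this article (getModel, getMessages, isStream, getMaxTokens, getTemperature) might look like the following; the field names are inferred from usage, not copied from the project:

```java
import java.util.List;

// Minimal DTOs matching the accessors used by the gateway and model
// services in this article. An inferred sketch, not the project's code.
public class ChatRequest {
    private String model;
    private List<ChatMessage> messages;
    private boolean stream;
    private Integer maxTokens;     // nullable: omitted when unset
    private Double temperature;    // nullable: omitted when unset

    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }
    public List<ChatMessage> getMessages() { return messages; }
    public void setMessages(List<ChatMessage> messages) { this.messages = messages; }
    public boolean isStream() { return stream; }
    public void setStream(boolean stream) { this.stream = stream; }
    public Integer getMaxTokens() { return maxTokens; }
    public void setMaxTokens(Integer maxTokens) { this.maxTokens = maxTokens; }
    public Double getTemperature() { return temperature; }
    public void setTemperature(Double temperature) { this.temperature = temperature; }
}

class ChatMessage {
    private String role;
    private String content;

    public String getRole() { return role; }
    public void setRole(String role) { this.role = role; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}
```

Keeping maxTokens and temperature as boxed types lets createApiRequest in the model service skip them when the client did not set them.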
4.2 Model Service Abstraction
A unified model-service interface supports plugging in multiple LLMs.
// Model service interface
public interface ModelService {
Mono<ChatResponse> invokeModel(ChatRequest request, ServerWebExchange exchange);
boolean supportsModel(String modelPath);
ModelInfo getModelInfo();
}
// DeepSeek model service implementation
@Service
@Primary
public class DeepSeekModelService implements ModelService {
private final WebClient webClient;
private final String apiBaseUrl;
private final String modelPath;
private final ObjectMapper objectMapper;
public DeepSeekModelService(
@Value("${maas.api.base-url}") String apiBaseUrl,
@Value("${maas.api.deepseek-r1}") String modelPath,
WebClient.Builder webClientBuilder,
ObjectMapper objectMapper) {
this.apiBaseUrl = apiBaseUrl;
this.modelPath = modelPath;
this.webClient = webClientBuilder.baseUrl(apiBaseUrl).build();
this.objectMapper = objectMapper;
}
@Override
public Mono<ChatResponse> invokeModel(ChatRequest request, ServerWebExchange exchange) {
// Build the upstream API request
Map<String, Object> apiRequest = createApiRequest(request);
return webClient.post()
.uri("/chat/completions")
.header("Authorization", "Bearer " + getApiKey())
.header("Content-Type", "application/json")
.bodyValue(apiRequest)
.retrieve()
.bodyToMono(String.class)
.timeout(Duration.ofSeconds(30))
.flatMap(responseBody -> parseResponse(responseBody, exchange));
}
private Map<String, Object> createApiRequest(ChatRequest request) {
Map<String, Object> apiRequest = new HashMap<>();
apiRequest.put("model", this.modelPath);
apiRequest.put("messages", convertMessages(request.getMessages()));
apiRequest.put("stream", request.isStream());
if (request.getMaxTokens() != null) {
apiRequest.put("max_tokens", request.getMaxTokens());
}
if (request.getTemperature() != null) {
apiRequest.put("temperature", request.getTemperature());
}
return apiRequest;
}
private List<Map<String, String>> convertMessages(List<ChatMessage> messages) {
return messages.stream()
.map(msg -> {
Map<String, String> converted = new HashMap<>();
converted.put("role", msg.getRole());
converted.put("content", msg.getContent());
return converted;
})
.collect(Collectors.toList());
}
private Mono<ChatResponse> parseResponse(String responseBody, ServerWebExchange exchange) {
try {
JsonNode rootNode = objectMapper.readTree(responseBody);
ChatResponse response = new ChatResponse();
response.setId(rootNode.path("id").asText());
response.setModel(rootNode.path("model").asText());
response.setCreated(rootNode.path("created").asLong());
JsonNode choicesNode = rootNode.path("choices");
if (choicesNode.isArray() && choicesNode.size() > 0) {
JsonNode firstChoice = choicesNode.get(0);
JsonNode messageNode = firstChoice.path("message");
ChatMessage message = new ChatMessage();
message.setRole(messageNode.path("role").asText());
message.setContent(messageNode.path("content").asText());
response.setMessage(message);
response.setFinishReason(firstChoice.path("finish_reason").asText());
}
JsonNode usageNode = rootNode.path("usage");
if (!usageNode.isMissingNode()) {
UsageInfo usage = new UsageInfo();
usage.setPromptTokens(usageNode.path("prompt_tokens").asInt());
usage.setCompletionTokens(usageNode.path("completion_tokens").asInt());
usage.setTotalTokens(usageNode.path("total_tokens").asInt());
response.setUsage(usage);
}
return Mono.just(response);
} catch (Exception e) {
return Mono.error(new ModelParseException("Failed to parse model response", e));
}
}
@Override
public boolean supportsModel(String modelPath) {
return this.modelPath.equals(modelPath);
}
@Override
public ModelInfo getModelInfo() {
ModelInfo info = new ModelInfo();
info.setModelPath(this.modelPath);
info.setModelName("DeepSeek-R1");
info.setMaxTokens(4096);
info.setSupportsStreaming(true);
return info;
}
private String getApiKey() {
// Read the API key from configuration or the database
return System.getenv("MAAS_API_KEY");
}
}
4.3 Streaming Responses
Streaming requests are served over Server-Sent Events (SSE) for real-time push.
// Streaming controller
@RestController
@RequestMapping("/api/v1")
public class StreamController {
private final ModelStreamingService streamingService;
private final AuthenticationService authService;
public StreamController(ModelStreamingService streamingService,
AuthenticationService authService) {
this.streamingService = streamingService;
this.authService = authService;
}
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<Object>> streamChat(
@RequestParam String conversationId,
@RequestParam String message,
@RequestHeader(value = "Authorization") String authHeader) {
return authService.authenticate(authHeader)
.flatMapMany(user -> streamingService.streamResponse(conversationId, message, user));
}
}
// Streaming service implementation
@Service
public class ModelStreamingService {
private final WebClient webClient;
private final ObjectMapper objectMapper;
public ModelStreamingService(@Value("${maas.api.base-url}") String baseUrl,
WebClient.Builder webClientBuilder,
ObjectMapper objectMapper) {
this.webClient = webClientBuilder.baseUrl(baseUrl).build();
this.objectMapper = objectMapper;
}
public Flux<ServerSentEvent<Object>> streamResponse(String conversationId, String message, User user) {
return Flux.create(emitter -> {
try {
// Build the streaming request
Map<String, Object> request = new HashMap<>();
request.put("model", "/maas/deepseek-ai/DeepSeek-R1-0528");
request.put("messages", List.of(
Map.of("role", "user", "content", message)
));
request.put("stream", true);
webClient.post()
.uri("/chat/completions")
.header("Authorization", "Bearer " + getApiKey())
.header("Content-Type", "application/json")
.bodyValue(request)
.retrieve()
.bodyToFlux(String.class)
.timeout(Duration.ofMinutes(5))
.subscribe(
chunk -> processChunk(chunk, emitter),
error -> emitter.error(error),
() -> emitter.complete()
);
} catch (Exception e) {
emitter.error(e);
}
});
}
private void processChunk(String chunk, FluxSink<ServerSentEvent<Object>> emitter) {
try {
if (chunk.startsWith("data: ")) {
String jsonData = chunk.substring(6);
if ("[DONE]".equals(jsonData.trim())) {
emitter.next(ServerSentEvent.builder()
.event("end")
.data(Map.of("status", "complete"))
.build());
return;
}
JsonNode dataNode = objectMapper.readTree(jsonData);
JsonNode choicesNode = dataNode.path("choices");
if (choicesNode.isArray() && choicesNode.size() > 0) {
JsonNode deltaNode = choicesNode.get(0).path("delta");
if (!deltaNode.isMissingNode()) {
String content = deltaNode.path("content").asText();
if (StringUtils.hasText(content)) {
Map<String, Object> response = new HashMap<>();
response.put("type", "content");
response.put("content", content);
emitter.next(ServerSentEvent.builder()
.data(response)
.build());
}
}
}
}
} catch (Exception e) {
// Surface parse errors as an SSE error event
emitter.next(ServerSentEvent.builder()
.event("error")
.data(Map.of("error", "Failed to parse chunk"))
.build());
}
}
private String getApiKey() {
// Same environment-based lookup as in DeepSeekModelService
return System.getenv("MAAS_API_KEY");
}
}
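processChunk above mixes wire-format handling with JSON parsing. The `data:` framing itself is easy to isolate; the helper below (our own sketch, not project code) strips the SSE prefix and detects the `[DONE]` sentinel that OpenAI-style streaming APIs emit:

```java
// Minimal handling of OpenAI-style SSE framing: each event line is
// "data: <json>" and the stream ends with "data: [DONE]". Sketch only.
public final class SseFraming {
    private SseFraming() {}

    // Returns the JSON payload of a data line, or null for non-data lines
    // (comments, keep-alives, blank lines).
    public static String payload(String line) {
        if (line == null || !line.startsWith("data: ")) {
            return null;
        }
        return line.substring(6).trim();
    }

    // True when the line is the end-of-stream sentinel.
    public static boolean isDone(String line) {
        return "[DONE]".equals(payload(line));
    }
}
```

Separating framing from JSON parsing also makes it easy to ignore keep-alive comment lines without touching the Jackson code path.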
5. Multi-Model Scheduling and Routing
5.1 Intelligent Model Routing
The router selects the most suitable model based on query content, model performance, and cost.
// Model routing service
@Service
public class ModelRouterService {
private final List<ModelService> modelServices;
private final ModelPerformanceTracker performanceTracker;
private final ModelCostCalculator costCalculator;
public ModelRouterService(List<ModelService> modelServices,
ModelPerformanceTracker performanceTracker,
ModelCostCalculator costCalculator) {
this.modelServices = modelServices;
this.performanceTracker = performanceTracker;
this.costCalculator = costCalculator;
}
public ModelService selectBestModel(ChatRequest request, User user) {
// Collect all candidate models
List<ModelCandidate> candidates = modelServices.stream()
.filter(service -> service.supportsModel(request.getModel()) ||
("auto".equals(request.getModel()) && isModelSuitable(service, request)))
.map(service -> createModelCandidate(service, request, user))
.collect(Collectors.toList());
if (candidates.isEmpty()) {
throw new NoSuitableModelException("No suitable model found for request");
}
// Pick the highest-scoring candidate
return candidates.stream()
.max(Comparator.comparingDouble(ModelCandidate::getScore))
.orElse(candidates.get(0))
.getService();
}
private ModelCandidate createModelCandidate(ModelService service, ChatRequest request, User user) {
ModelInfo info = service.getModelInfo();
double performanceScore = calculatePerformanceScore(service, request);
double costScore = calculateCostScore(service, request, user);
double suitabilityScore = calculateSuitabilityScore(service, request);
double totalScore = performanceScore * 0.4 + costScore * 0.3 + suitabilityScore * 0.3;
return new ModelCandidate(service, totalScore, performanceScore, costScore, suitabilityScore);
}
private double calculatePerformanceScore(ModelService service, ChatRequest request) {
ModelInfo info = service.getModelInfo();
String modelPath = info.getModelPath();
// Pull historical performance data
ModelPerformanceStats stats = performanceTracker.getStats(modelPath);
double latencyScore = 1.0 - Math.min(stats.getAverageLatency() / 5000.0, 1.0);
double successRateScore = stats.getSuccessRate();
// Adjust the score by query complexity
int messageLength = request.getMessages().stream()
.mapToInt(msg -> msg.getContent().length())
.sum();
double complexityFactor = Math.min(messageLength / 1000.0, 1.0);
return (latencyScore * 0.6 + successRateScore * 0.4) * (1.0 - complexityFactor * 0.2);
}
private double calculateCostScore(ModelService service, ChatRequest request, User user) {
ModelInfo info = service.getModelInfo();
double estimatedCost = costCalculator.estimateCost(info.getModelPath(), request);
double userBalance = user.getBalance();
// Lower cost scores higher; factor in the user's balance
double costFactor = 1.0 - Math.min(estimatedCost / 10.0, 1.0);
double balanceFactor = Math.min(userBalance / 100.0, 1.0);
return costFactor * 0.7 + balanceFactor * 0.3;
}
private double calculateSuitabilityScore(ModelService service, ChatRequest request) {
ModelInfo info = service.getModelInfo();
String content = request.getMessages().stream()
.map(ChatMessage::getContent)
.collect(Collectors.joining("\n"));
// Simple content-type detection
boolean isCodeRelated = containsCode(content);
boolean isCreative = isCreativeContent(content);
boolean isTechnical = isTechnicalContent(content);
// Match content type to model strengths
double score = 0.5; // base score
if (info.getModelPath().contains("deepseek") && isCodeRelated) {
score += 0.3; // DeepSeek models are strong at code
}
if (info.getModelPath().contains("qwen") && isCreative) {
score += 0.3; // Qwen models are strong at creative writing
}
if (info.getModelPath().contains("v3") && isTechnical) {
score += 0.2; // V3 is strong at technical content
}
return Math.min(score, 1.0);
}
private boolean containsCode(String content) {
return content.contains("```") ||
content.matches(".*(function|class|import|package|def|var|let|const).*");
}
private boolean isCreativeContent(String content) {
return content.matches(".*(故事|诗歌|小说|创意|想象).*");
}
private boolean isTechnicalContent(String content) {
return content.matches(".*(技术|算法|编程|代码|数学|物理|工程).*");
}
// Candidate model holder
private static class ModelCandidate {
private final ModelService service;
private final double score;
private final double performanceScore;
private final double costScore;
private final double suitabilityScore;
public ModelCandidate(ModelService service, double score,
double performanceScore, double costScore,
double suitabilityScore) {
this.service = service;
this.score = score;
this.performanceScore = performanceScore;
this.costScore = costScore;
this.suitabilityScore = suitabilityScore;
}
// getters omitted
}
}
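The weighting in createModelCandidate (40% performance, 30% cost, 30% suitability) is easy to check in isolation. A pure version of the formula, extracted for testing (the class and method names are ours):

```java
// The candidate score used by ModelRouterService, extracted as a pure
// function so the weighting can be unit-tested. Weights follow the
// article: 40% performance, 30% cost, 30% suitability.
public final class CandidateScore {
    private CandidateScore() {}

    public static double total(double performance, double cost, double suitability) {
        return performance * 0.4 + cost * 0.3 + suitability * 0.3;
    }
}
```

Because the three inputs are each normalized to [0, 1], the total is also bounded by [0, 1], which keeps candidates comparable across models.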
5.2 Model Performance Monitoring and Degradation
Monitor model performance in real time and fail over to backup models when a model degrades.
// Model performance tracking service
@Service
public class ModelPerformanceTracker {
private final Map<String, ModelPerformanceStats> statsMap = new ConcurrentHashMap<>();
private final MeterRegistry meterRegistry;
public ModelPerformanceTracker(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
public void recordSuccess(String modelPath, long latency) {
ModelPerformanceStats stats = statsMap.computeIfAbsent(modelPath, k -> new ModelPerformanceStats());
stats.recordSuccess(latency);
// Record metrics
meterRegistry.timer("model.invoke", "model", modelPath, "status", "success")
.record(latency, TimeUnit.MILLISECONDS);
}
public void recordFailure(String modelPath, String errorType) {
ModelPerformanceStats stats = statsMap.computeIfAbsent(modelPath, k -> new ModelPerformanceStats());
stats.recordFailure();
// Record metrics
meterRegistry.counter("model.errors", "model", modelPath, "errorType", errorType)
.increment();
}
public ModelPerformanceStats getStats(String modelPath) {
return statsMap.getOrDefault(modelPath, new ModelPerformanceStats());
}
public boolean isModelDegraded(String modelPath) {
ModelPerformanceStats stats = getStats(modelPath);
if (stats.getTotalRequests() < 10) {
return false; // too few samples to call it degraded
}
// Is the failure rate above threshold?
if (stats.getSuccessRate() < 0.8) {
return true;
}
// Is latency above threshold?
if (stats.getAverageLatency() > 10000) { // 10 seconds
return true;
}
return false;
}
public List<String> getAlternativeModels(String primaryModel) {
// Return fallback models by model family
if (primaryModel.contains("deepseek")) {
return Arrays.asList(
"/maas/deepseek-ai/DeepSeek-V3-0324",
"/maas/qwen/QwQ-32B",
"/maas/qwen/Qwen2.5-72B-Instruct"
);
} else if (primaryModel.contains("qwen")) {
return Arrays.asList(
"/maas/deepseek-ai/DeepSeek-R1-0528",
"/maas/deepseek-ai/DeepSeek-V3-0324"
);
}
return Arrays.asList("/maas/deepseek-ai/DeepSeek-R1-0528");
}
}
// Performance statistics holder
public class ModelPerformanceStats {
private final AtomicLong successCount = new AtomicLong(0);
private final AtomicLong failureCount = new AtomicLong(0);
private final AtomicLong totalLatency = new AtomicLong(0);
private final LongAdder currentLatencySum = new LongAdder();
private final AtomicLong currentCount = new AtomicLong(0);
// Sliding window over the last 100 requests (CircularFifoQueue comes from
// Apache Commons Collections and is not thread-safe; synchronize access if needed)
private final CircularFifoQueue<Long> recentLatencies = new CircularFifoQueue<>(100);
public void recordSuccess(long latency) {
successCount.incrementAndGet();
totalLatency.addAndGet(latency);
recentLatencies.add(latency);
currentLatencySum.add(latency);
currentCount.incrementAndGet();
}
public void recordFailure() {
failureCount.incrementAndGet();
}
public double getSuccessRate() {
long total = successCount.get() + failureCount.get();
if (total == 0) return 1.0;
return (double) successCount.get() / total;
}
public long getAverageLatency() {
long count = currentCount.get();
if (count == 0) return 0;
return currentLatencySum.longValue() / count;
}
public long getP95Latency() {
if (recentLatencies.isEmpty()) return 0;
List<Long> sorted = new ArrayList<>(recentLatencies);
Collections.sort(sorted);
int index = (int) Math.ceil(0.95 * sorted.size()) - 1;
return index >= 0 ? sorted.get(index) : 0;
}
public long getTotalRequests() {
return successCount.get() + failureCount.get();
}
}
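The p95 computation above indexes the sorted latencies at ceil(0.95·n)−1. A stripped-down, dependency-free variant of the same statistics (an ArrayDeque standing in for Commons Collections' CircularFifoQueue; this is an illustration, not the class above) makes that behavior easy to verify:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Dependency-free illustration of ModelPerformanceStats: a bounded deque
// replaces CircularFifoQueue, with the same p95 index formula,
// ceil(0.95 * n) - 1, over the sorted recent latencies.
public class SimpleLatencyStats {
    private final int window;
    private final Deque<Long> recent = new ArrayDeque<>();
    private long successes;
    private long failures;

    public SimpleLatencyStats(int window) {
        this.window = window;
    }

    public void recordSuccess(long latencyMs) {
        successes++;
        if (recent.size() == window) {
            recent.removeFirst(); // evict the oldest sample
        }
        recent.addLast(latencyMs);
    }

    public void recordFailure() {
        failures++;
    }

    public double successRate() {
        long total = successes + failures;
        return total == 0 ? 1.0 : (double) successes / total;
    }

    public long p95Latency() {
        if (recent.isEmpty()) return 0;
        List<Long> sorted = new ArrayList<>(recent);
        Collections.sort(sorted);
        int index = (int) Math.ceil(0.95 * sorted.size()) - 1;
        return sorted.get(Math.max(index, 0));
    }
}
```

With a full window of latencies 1..100 ms, the formula picks the 95th element, which matches the usual "95% of requests are at or below this value" reading.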
6. Advanced Features
6.1 Conversation History Management
Persist and manage conversation history, maintaining context across multi-turn dialogues.
// Conversation service
@Service
public class ConversationService {
private final ConversationRepository conversationRepository;
private final MessageRepository messageRepository;
private final RedisTemplate<String, Object> redisTemplate;
public ConversationService(ConversationRepository conversationRepository,
MessageRepository messageRepository,
RedisTemplate<String, Object> redisTemplate) {
this.conversationRepository = conversationRepository;
this.messageRepository = messageRepository;
this.redisTemplate = redisTemplate;
}
@Transactional
public Conversation createConversation(String title, User user) {
Conversation conversation = new Conversation();
conversation.setTitle(title);
conversation.setUserId(user.getId());
conversation.setCreatedAt(LocalDateTime.now());
conversation.setUpdatedAt(LocalDateTime.now());
return conversationRepository.save(conversation);
}
@Transactional
public Message addMessageToConversation(Long conversationId, String role,
String content, String model) {
Conversation conversation = conversationRepository.findById(conversationId)
.orElseThrow(() -> new ConversationNotFoundException(conversationId));
Message message = new Message();
message.setConversationId(conversationId);
message.setRole(role);
message.setContent(content);
message.setModel(model);
message.setTimestamp(LocalDateTime.now());
message.setTokenCount(estimateTokenCount(content));
Message savedMessage = messageRepository.save(message);
// Touch the conversation's updated-at timestamp
conversation.setUpdatedAt(LocalDateTime.now());
conversationRepository.save(conversation);
// Cache the latest messages
cacheRecentMessages(conversationId);
return savedMessage;
}
public List<Message> getConversationMessages(Long conversationId, int limit) {
// Try the cache first
String cacheKey = "conversation:" + conversationId + ":messages";
List<Message> cachedMessages = (List<Message>) redisTemplate.opsForValue().get(cacheKey);
if (cachedMessages != null && cachedMessages.size() >= limit) {
return cachedMessages.stream().limit(limit).collect(Collectors.toList());
}
// Cache miss: load from the database
List<Message> messages = messageRepository.findByConversationIdOrderByTimestampDesc(
conversationId, PageRequest.of(0, limit));
// Refresh the cache
redisTemplate.opsForValue().set(cacheKey, messages, 1, TimeUnit.HOURS);
return messages;
}
public List<Conversation> getUserConversations(Long userId, int page, int size) {
return conversationRepository.findByUserIdOrderByUpdatedAtDesc(
userId, PageRequest.of(page, size));
}
@Transactional
public void deleteConversation(Long conversationId) {
// Delete the messages
messageRepository.deleteByConversationId(conversationId);
// Delete the conversation
conversationRepository.deleteById(conversationId);
// Evict the cache
String cacheKey = "conversation:" + conversationId + ":messages";
redisTemplate.delete(cacheKey);
}
private void cacheRecentMessages(Long conversationId) {
// Fetch and cache the 50 most recent messages
List<Message> recentMessages = messageRepository
.findByConversationIdOrderByTimestampDesc(conversationId, PageRequest.of(0, 50));
String cacheKey = "conversation:" + conversationId + ":messages";
redisTemplate.opsForValue().set(cacheKey, recentMessages, 1, TimeUnit.HOURS);
}
private int estimateTokenCount(String text) {
// Rough heuristic: one CJK character ≈ 1.3 tokens; English letters ≈ 0.25
// tokens each (about four characters per token)
if (text == null || text.isEmpty()) return 0;
int chineseCount = 0;
int englishCount = 0;
for (char c : text.toCharArray()) {
if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
chineseCount++;
} else if (Character.isLetter(c)) {
englishCount++;
}
}
return (int) (chineseCount * 1.3 + englishCount * 0.25);
}
}
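The estimation heuristic in ConversationService weighs a CJK character at about 1.3 tokens and an English letter at about 0.25 tokens. The same logic as a standalone function, so the rounding behavior can be sanity-checked (a rough approximation, not a real tokenizer):

```java
// Standalone version of the token estimation heuristic used by
// ConversationService: ~1.3 tokens per CJK character, ~0.25 tokens per
// English letter. A rough approximation, not a real tokenizer.
public final class TokenEstimator {
    private TokenEstimator() {}

    public static int estimate(String text) {
        if (text == null || text.isEmpty()) return 0;
        int cjk = 0;
        int letters = 0;
        for (char c : text.toCharArray()) {
            // CJK check first: CJK ideographs are also "letters" to isLetter
            if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                cjk++;
            } else if (Character.isLetter(c)) {
                letters++;
            }
        }
        return (int) (cjk * 1.3 + letters * 0.25);
    }
}
```

Note the branch order matters: Character.isLetter also returns true for CJK ideographs, so the CJK test must come first.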
6.2 Token Usage Statistics and Quota Management
Track token usage and enforce per-user quotas.
// Usage statistics service
@Service
public class UsageStatsService {
private final UsageRecordRepository usageRecordRepository;
private final UserRepository userRepository;
private final RedisTemplate<String, Object> redisTemplate;
@Scheduled(cron = "0 0 0 * * ?") // runs daily at midnight
public void resetDailyCounters() {
// Reset the daily usage counters
userRepository.resetDailyUsage();
}
@Transactional
public void recordUsage(Long userId, String model, int promptTokens,
int completionTokens, double cost) {
// Persist the usage record
UsageRecord record = new UsageRecord();
record.setUserId(userId);
record.setModel(model);
record.setPromptTokens(promptTokens);
record.setCompletionTokens(completionTokens);
record.setTotalTokens(promptTokens + completionTokens);
record.setCost(cost);
record.setRecordTime(LocalDateTime.now());
usageRecordRepository.save(record);
// Update aggregate user stats
updateUserUsageStats(userId, promptTokens, completionTokens, cost);
}
private void updateUserUsageStats(Long userId, int promptTokens,
int completionTokens, double cost) {
User user = userRepository.findById(userId)
.orElseThrow(() -> new UserNotFoundException(userId));
user.setTotalTokensUsed(user.getTotalTokensUsed() + promptTokens + completionTokens);
user.setDailyTokensUsed(user.getDailyTokensUsed() + promptTokens + completionTokens);
user.setTotalCost(user.getTotalCost() + cost);
user.setDailyCost(user.getDailyCost() + cost);
userRepository.save(user);
// Invalidate the cached stats
updateUsageCache(userId, user);
}
public UsageStats getUsageStats(Long userId) {
String cacheKey = "user:" + userId + ":usage";
UsageStats cachedStats = (UsageStats) redisTemplate.opsForValue().get(cacheKey);
if (cachedStats != null) {
return cachedStats;
}
User user = userRepository.findById(userId)
.orElseThrow(() -> new UserNotFoundException(userId));
// Build more detailed statistics
UsageStats stats = new UsageStats();
stats.setTotalTokens(user.getTotalTokensUsed());
stats.setDailyTokens(user.getDailyTokensUsed());
stats.setTotalCost(user.getTotalCost());
stats.setDailyCost(user.getDailyCost());
// Per-model usage
List<ModelUsage> modelUsage = usageRecordRepository.getModelUsageStats(userId);
stats.setModelUsage(modelUsage);
// Usage trend over the last seven days
LocalDate now = LocalDate.now();
List<DailyUsage> dailyUsage = new ArrayList<>();
for (int i = 6; i >= 0; i--) {
LocalDate date = now.minusDays(i);
DailyUsage usage = usageRecordRepository.getDailyUsage(userId, date);
dailyUsage.add(usage);
}
stats.setWeeklyUsage(dailyUsage);
// Cache the result
redisTemplate.opsForValue().set(cacheKey, stats, 30, TimeUnit.MINUTES);
return stats;
}
public boolean checkQuota(Long userId, int estimatedTokens) {
User user = userRepository.findById(userId)
.orElseThrow(() -> new UserNotFoundException(userId));
// Check the total quota
if (user.getTotalTokensUsed() + estimatedTokens > user.getTokenQuota()) {
return false;
}
// Check the daily quota
if (user.getDailyTokensUsed() + estimatedTokens > user.getDailyTokenQuota()) {
return false;
}
return true;
}
private void updateUsageCache(Long userId, User user) {
String cacheKey = "user:" + userId + ":usage";
redisTemplate.delete(cacheKey);
}
}
// Usage statistics DTOs
public class UsageStats {
private long totalTokens;
private long dailyTokens;
private double totalCost;
private double dailyCost;
private List<ModelUsage> modelUsage;
private List<DailyUsage> weeklyUsage;
// getters and setters
}
public class ModelUsage {
private String model;
private long tokenCount;
private double cost;
// getters and setters
}
public class DailyUsage {
private LocalDate date;
private long tokenCount;
private double cost;
// getters and setters
}
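The quota test in checkQuota reduces to two comparisons: the request must fit within both the total and the daily token quota. Extracted as a pure function (parameter names inferred from the User accessors used above):

```java
// Pure form of UsageStatsService.checkQuota: a request passes only if it
// fits within both the total and the daily token quota. Sketch with
// parameter names inferred from the User accessors.
public final class QuotaCheck {
    private QuotaCheck() {}

    public static boolean allowed(long totalUsed, long totalQuota,
                                  long dailyUsed, long dailyQuota,
                                  int estimatedTokens) {
        return totalUsed + estimatedTokens <= totalQuota
                && dailyUsed + estimatedTokens <= dailyQuota;
    }
}
```

Keeping this as a pure function makes boundary cases (a request that exactly exhausts a quota is still allowed) trivial to pin down in tests.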
7. Security and Performance Optimization
7.1 API Security
Comprehensive API protection covering authentication, authorization, rate limiting, and injection defenses.
// JWT authentication filter
@Component
public class JwtAuthenticationFilter implements WebFilter {
private final JwtTokenProvider tokenProvider;
private final UserDetailsService userDetailsService;
public JwtAuthenticationFilter(JwtTokenProvider tokenProvider,
UserDetailsService userDetailsService) {
this.tokenProvider = tokenProvider;
this.userDetailsService = userDetailsService;
}
@Override
public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
String path = exchange.getRequest().getPath().value();
// Skip login and public endpoints
if (path.startsWith("/api/auth/login") ||
path.startsWith("/api/public/") ||
path.equals("/") ||
path.startsWith("/static/")) {
return chain.filter(exchange);
}
String token = resolveToken(exchange.getRequest());
if (StringUtils.hasText(token) && tokenProvider.validateToken(token)) {
Authentication authentication = getAuthentication(token);
return chain.filter(exchange)
.contextWrite(ReactiveSecurityContextHolder.withAuthentication(authentication));
}
return Mono.error(new AuthenticationException("Invalid or missing token"));
}
private String resolveToken(ServerHttpRequest request) {
String bearerToken = request.getHeaders().getFirst("Authorization");
if (StringUtils.hasText(bearerToken) && bearerToken.startsWith("Bearer ")) {
return bearerToken.substring(7);
}
return null;
}
private Authentication getAuthentication(String token) {
String username = tokenProvider.getUsernameFromToken(token);
// 注意:UserDetailsService 是阻塞接口,WebFlux 下建议改用 ReactiveUserDetailsService,
// 或将阻塞调用调度到 boundedElastic 线程池,避免阻塞事件循环
UserDetails userDetails = userDetailsService.loadUserByUsername(username);
return new UsernamePasswordAuthenticationToken(userDetails, "", userDetails.getAuthorities());
}
}
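resolveToken 对 "Bearer " 前缀的解析可以抽成纯函数单独验证(示意代码,类名为本文假设):

```java
// 与 JwtAuthenticationFilter.resolveToken 等价的纯函数示意,
// 便于脱离 ServerHttpRequest 独立验证前缀解析逻辑
public class BearerTokens {
    private static final String PREFIX = "Bearer ";

    public static String resolve(String authorizationHeader) {
        if (authorizationHeader != null && authorizationHeader.startsWith(PREFIX)) {
            return authorizationHeader.substring(PREFIX.length());
        }
        return null; // 无 Authorization 头或格式不符时返回 null
    }
}
```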
// 速率限制服务(注意:WebFlux 下应使用 ReactiveRedisTemplate,
// 阻塞式 RedisTemplate 的 increment 返回 Long 而非 Mono,无法参与响应式链)
@Service
public class RateLimiterService {
private final ReactiveRedisTemplate<String, Object> redisTemplate;
private final Map<String, RateLimitConfig> limitConfigs;
public RateLimiterService(ReactiveRedisTemplate<String, Object> redisTemplate) {
this.redisTemplate = redisTemplate;
this.limitConfigs = loadRateLimitConfigs();
}
public Mono<Boolean> checkRateLimit(String userId, String endpoint) {
String key = "rate_limit:" + userId + ":" + endpoint;
RateLimitConfig config = limitConfigs.getOrDefault(endpoint,
new RateLimitConfig(100, 60)); // 默认每60秒100次
return checkRateLimitWithKey(key, config);
}
public Mono<Boolean> checkModelRateLimit(String userId, String model) {
String key = "model_limit:" + userId + ":" + model;
// 模型特定限制:每分钟50次
RateLimitConfig config = new RateLimitConfig(50, 60);
return checkRateLimitWithKey(key, config);
}
private Mono<Boolean> checkRateLimitWithKey(String key, RateLimitConfig config) {
return redisTemplate.opsForValue().increment(key)
.flatMap(count -> {
if (count == 1) {
// 第一次请求,设置窗口过期时间(响应式 expire 接收 Duration 参数)
return redisTemplate.expire(key, Duration.ofSeconds(config.getTimeWindow()))
.thenReturn(count <= config.getMaxRequests());
}
return Mono.just(count <= config.getMaxRequests());
});
}
private Map<String, RateLimitConfig> loadRateLimitConfigs() {
Map<String, RateLimitConfig> configs = new HashMap<>();
configs.put("/api/v1/chat/completions", new RateLimitConfig(30, 60)); // 每分钟30次
configs.put("/api/v1/models", new RateLimitConfig(10, 60)); // 每分钟10次
configs.put("/api/v1/usage", new RateLimitConfig(5, 60)); // 每分钟5次
return configs;
}
}
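需要注意的是,上面的 INCR 与 EXPIRE 是两条独立命令,并非原子操作:若首次请求 INCR 之后、EXPIRE 之前进程异常退出,该 key 将永不过期;生产环境可用 Lua 脚本将两步合并执行。固定窗口计数本身的逻辑可以用如下纯 Java 内存版示意(单线程演示,类名为本文假设):

```java
import java.util.HashMap;
import java.util.Map;

// 固定窗口限流的内存版示意:计数逻辑与 Redis INCR+EXPIRE 方案一致,
// 便于脱离 Redis 单独验证窗口行为(未做并发防护,仅作演示)
public class FixedWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, long[]> windows = new HashMap<>(); // key -> {窗口起点, 计数}

    public FixedWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public boolean tryAcquire(String key, long nowMillis) {
        long[] w = windows.get(key);
        if (w == null || nowMillis - w[0] >= windowMillis) {
            // 新窗口:相当于 Redis 中 key 过期后重新 INCR
            windows.put(key, new long[]{nowMillis, 1});
            return 1 <= maxRequests;
        }
        w[1]++;
        return w[1] <= maxRequests;
    }
}
```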
// 输入验证与防护
@Component
public class InputValidationFilter implements WebFilter {
private final List<Pattern> maliciousPatterns;
public InputValidationFilter() {
this.maliciousPatterns = Arrays.asList(
Pattern.compile("<script.*?>", Pattern.CASE_INSENSITIVE),
Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
Pattern.compile("onload\\s*=", Pattern.CASE_INSENSITIVE),
Pattern.compile("union.*select", Pattern.CASE_INSENSITIVE),
Pattern.compile("drop.*table", Pattern.CASE_INSENSITIVE)
);
}
@Override
public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
// 检查URL参数(每个参数对应一个多值列表,需先用 flatMap 展开)
boolean hasMaliciousParams = exchange.getRequest().getQueryParams().values().stream()
.flatMap(List::stream)
.anyMatch(this::containsMaliciousContent);
if (hasMaliciousParams) {
return Mono.error(new SecurityException("Malicious input detected"));
}
// 对于POST请求,检查body
// 注意:请求体只能被消费一次,生产环境应通过 ServerHttpRequestDecorator
// 缓存请求体后再向下游传递,此处为简化示意
if (exchange.getRequest().getMethod() == HttpMethod.POST) {
return DataBufferUtils.join(exchange.getRequest().getBody())
.flatMap(dataBuffer -> {
String body = dataBuffer.toString(StandardCharsets.UTF_8);
DataBufferUtils.release(dataBuffer);
if (containsMaliciousContent(body)) {
return Mono.error(new SecurityException("Malicious input detected"));
}
return chain.filter(exchange);
});
}
return chain.filter(exchange);
}
private boolean containsMaliciousContent(String input) {
if (input == null) return false;
return maliciousPatterns.stream()
.anyMatch(pattern -> pattern.matcher(input).find());
}
}
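上述黑名单正则的拦截范围可以用下面的独立片段验证(正则与原文一致,类名为示意)。需要强调,这类黑名单只是纵深防御中的一层,不能替代参数化查询与输出转义:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// 与 InputValidationFilter 相同的黑名单正则,独立封装便于测试
public class MaliciousInputChecker {
    private static final List<Pattern> PATTERNS = Arrays.asList(
        Pattern.compile("<script.*?>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload\\s*=", Pattern.CASE_INSENSITIVE),
        Pattern.compile("union.*select", Pattern.CASE_INSENSITIVE),
        Pattern.compile("drop.*table", Pattern.CASE_INSENSITIVE)
    );

    public static boolean isMalicious(String input) {
        if (input == null) return false;
        return PATTERNS.stream().anyMatch(p -> p.matcher(input).find());
    }
}
```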
7.2 性能优化策略
实现多种性能优化策略,包括缓存、连接池优化和响应压缩。
// 响应缓存服务(同样使用 ReactiveRedisTemplate,避免在事件循环中发起阻塞调用)
@Service
public class ResponseCacheService {
private final ReactiveRedisTemplate<String, String> redisTemplate;
private final ObjectMapper objectMapper;
public ResponseCacheService(ReactiveRedisTemplate<String, String> redisTemplate,
ObjectMapper objectMapper) {
this.redisTemplate = redisTemplate;
this.objectMapper = objectMapper;
}
public Mono<Object> getCachedResponse(String cacheKey) {
return redisTemplate.opsForValue().get(cacheKey)
.flatMap(cached -> {
try {
return Mono.just(objectMapper.readValue(cached, Object.class));
} catch (Exception e) {
// 反序列化失败视为缓存未命中(map 中返回 null 会抛出 NPE,应改用 flatMap + empty)
return Mono.empty();
}
});
}
public Mono<Boolean> cacheResponse(String cacheKey, Object response, Duration ttl) {
try {
String jsonResponse = objectMapper.writeValueAsString(response);
return redisTemplate.opsForValue().set(cacheKey, jsonResponse, ttl);
} catch (Exception e) {
return Mono.just(false);
}
}
public String generateCacheKey(String endpoint, Map<String, Object> params) {
String paramString = params.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(entry -> entry.getKey() + "=" + entry.getValue())
.collect(Collectors.joining("&"));
return endpoint + ":" + DigestUtils.md5DigestAsHex(paramString.getBytes(StandardCharsets.UTF_8));
}
}
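generateCacheKey 的关键在于参数先按 key 排序再拼接,从而保证同一组参数无论传入顺序如何都命中同一缓存键。下面是不依赖 Spring DigestUtils 的纯 JDK 等价实现(示意,类名为本文假设):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// 与 ResponseCacheService.generateCacheKey 等价的纯 JDK 实现:
// 参数先按 key 排序再拼接,保证 Map 遍历顺序不影响最终缓存键
public class CacheKeys {
    public static String generate(String endpoint, Map<String, Object> params) {
        String paramString = new TreeMap<>(params).entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("&"));
        return endpoint + ":" + md5Hex(paramString);
    }

    private static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```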
// 数据库连接池配置
@Configuration
public class DatabaseConfig {
@Value("${spring.datasource.url}")
private String url;
@Value("${spring.datasource.username}")
private String username;
@Value("${spring.datasource.password}")
private String password;
@Bean
public HikariDataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setJdbcUrl(url);
config.setUsername(username);
config.setPassword(password);
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
config.setAutoCommit(true);
config.setPoolName("LLMAppPool");
// 优化配置
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
config.addDataSourceProperty("useServerPrepStmts", "true");
config.addDataSourceProperty("useLocalSessionState", "true");
config.addDataSourceProperty("rewriteBatchedStatements", "true");
config.addDataSourceProperty("cacheResultSetMetadata", "true");
config.addDataSourceProperty("cacheServerConfiguration", "true");
config.addDataSourceProperty("elideSetAutoCommits", "true");
config.addDataSourceProperty("maintainTimeStats", "false");
return new HikariDataSource(config);
}
}
// WebClient配置优化
@Configuration
public class WebClientConfig {
private static final Logger logger = LoggerFactory.getLogger(WebClientConfig.class);
@Bean
public WebClient webClient(WebClient.Builder builder) {
return builder
.clientConnector(new ReactorClientHttpConnector(
HttpClient.create()
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
.doOnConnected(conn ->
conn.addHandlerLast(new ReadTimeoutHandler(10, TimeUnit.SECONDS))
.addHandlerLast(new WriteTimeoutHandler(10, TimeUnit.SECONDS))
)
.responseTimeout(Duration.ofSeconds(10))
.compress(true)
))
.baseUrl("https://maas-api.lanyun.net")
.defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
.build();
}
@Bean
public WebClient.Builder webClientBuilder() {
return WebClient.builder()
.filter(logRequest())
.filter(logResponse());
}
private ExchangeFilterFunction logRequest() {
return (request, next) -> {
// 注意:不要逐个打印请求头,避免 Authorization 密钥泄露到日志
logger.info("Request: {} {}", request.method(), request.url());
return next.exchange(request);
};
}
private ExchangeFilterFunction logResponse() {
return ExchangeFilterFunction.ofResponseProcessor(response -> {
logger.info("Response: {}", response.statusCode());
return Mono.just(response);
});
}
}
八、部署与监控
8.1 Docker容器化部署
提供完整的Docker部署方案,包括Dockerfile和docker-compose配置。
# 前端Dockerfile
FROM node:18-alpine as frontend-build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# 生产阶段
FROM nginx:alpine
COPY --from=frontend-build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
# 后端Dockerfile
# openjdk 官方镜像已停止维护,推荐使用 eclipse-temurin
FROM eclipse-temurin:17-jdk-alpine as backend-build
WORKDIR /app
COPY mvnw pom.xml ./
COPY .mvn .mvn
RUN ./mvnw dependency:go-offline -B
COPY src src
RUN ./mvnw package -DskipTests
# 生产阶段
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=backend-build /app/target/*.jar app.jar
# 创建非root用户
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
# docker-compose.yml
version: '3.8'
services:
frontend:
build:
context: ./frontend
dockerfile: Dockerfile
ports:
- "80:80"
networks:
- llm-network
depends_on:
- backend
environment:
- API_BASE_URL=http://backend:8080
backend:
build:
context: ./backend
dockerfile: Dockerfile
ports:
- "8080:8080"
networks:
- llm-network
depends_on:
- redis
- mysql
environment:
- SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/llm_app
- SPRING_DATASOURCE_USERNAME=llm_user
- SPRING_DATASOURCE_PASSWORD=llm_password
# Spring Boot 3.x 中 Redis 配置前缀为 spring.data.redis
- SPRING_DATA_REDIS_HOST=redis
- MAAS_API_KEY=${MAAS_API_KEY}
mysql:
image: mysql:8.0
environment:
- MYSQL_ROOT_PASSWORD=rootpassword
- MYSQL_DATABASE=llm_app
- MYSQL_USER=llm_user
- MYSQL_PASSWORD=llm_password
volumes:
- mysql_data:/var/lib/mysql
networks:
- llm-network
ports:
- "3306:3306"
redis:
image: redis:7-alpine
volumes:
- redis_data:/data
networks:
- llm-network
ports:
- "6379:6379"
prometheus:
image: prom/prometheus:latest
volumes:
- ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
ports:
- "9090:9090"
networks:
- llm-network
grafana:
image: grafana/grafana:latest
volumes:
- grafana_data:/var/lib/grafana
- ./monitoring/grafana-dashboards:/var/lib/grafana/dashboards
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
ports:
- "3000:3000"
networks:
- llm-network
depends_on:
- prometheus
volumes:
mysql_data:
redis_data:
prometheus_data:
grafana_data:
networks:
llm-network:
driver: bridge
8.2 监控与告警配置
实现全面的应用监控和告警系统。
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'llm-backend'
metrics_path: '/actuator/prometheus'
static_configs:
- targets: ['backend:8080']
labels:
application: 'llm-backend'
- job_name: 'llm-frontend'
metrics_path: '/metrics'
static_configs:
- targets: ['frontend:80']
labels:
application: 'llm-frontend'
- job_name: 'mysql'
# 需配合 mysqld_exporter 使用(默认端口9104,上文 docker-compose 未包含,需另行部署)
static_configs:
- targets: ['mysql:9104']
metrics_path: '/metrics'
- job_name: 'redis'
# 需配合 redis_exporter 使用(默认端口9121,上文 docker-compose 未包含,需另行部署)
static_configs:
- targets: ['redis:9121']
metrics_path: '/metrics'
alerting:
alertmanagers:
- static_configs:
- targets:
- 'alertmanager:9093'
rule_files:
- 'alerts/*.yml'
# alert-rules.yml
groups:
- name: llm-app
rules:
- alert: HighErrorRate
expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) / rate(http_server_requests_seconds_count[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "高错误率检测"
description: "应用错误率超过5%,当前值: {{ $value }}"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m])) > 2
for: 10m
labels:
severity: warning
annotations:
summary: "高响应时间检测"
description: "95%分位响应时间超过2秒,当前值: {{ $value }}s"
- alert: ModelTimeout
expr: rate(model_timeout_total[5m]) > 0
for: 2m
labels:
severity: warning
annotations:
summary: "模型超时检测"
description: "检测到模型调用超时,当前超时次数: {{ $value }}"
- alert: RateLimitExceeded
expr: rate(rate_limit_exceeded_total[5m]) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "速率限制频繁触发"
description: "速率限制频繁触发,可能表明异常流量模式"
九、测试与质量保障
9.1 自动化测试策略
实现全面的自动化测试覆盖,包括单元测试、集成测试和端到端测试。
// 模型服务单元测试
@ExtendWith(MockitoExtension.class)
class DeepSeekModelServiceTest {
@Mock
private WebClient webClient;
@Mock
private WebClient.RequestBodyUriSpec requestBodyUriSpec; // post() 返回的是 RequestBodyUriSpec
@Mock
private WebClient.RequestHeadersSpec requestHeadersSpec;
@Mock
private WebClient.ResponseSpec responseSpec;
@InjectMocks
private DeepSeekModelService modelService;
@Test
void testInvokeModelSuccess() {
// 准备测试数据
ChatRequest request = new ChatRequest();
request.setModel("/maas/deepseek-ai/DeepSeek-R1-0528");
request.setMessages(List.of(
new ChatMessage("user", "你好")
));
request.setStream(false);
// 模拟WebClient调用链(uri/header 返回自身以支持链式调用)
when(webClient.post()).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.uri(anyString())).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.header(anyString(), anyString())).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.bodyValue(any())).thenReturn(requestHeadersSpec);
when(requestHeadersSpec.retrieve()).thenReturn(responseSpec);
when(responseSpec.bodyToMono(String.class)).thenReturn(Mono.just(createSuccessResponse()));
// 执行测试
StepVerifier.create(modelService.invokeModel(request, null))
.expectNextMatches(response ->
response.getMessage().getContent().contains("你好") &&
response.getId() != null)
.verifyComplete();
// 验证调用
verify(webClient).post();
verify(requestBodyUriSpec).uri("/chat/completions");
}
@Test
void testInvokeModelTimeout() {
ChatRequest request = new ChatRequest();
request.setModel("/maas/deepseek-ai/DeepSeek-R1-0528");
request.setMessages(List.of(new ChatMessage("user", "test")));
when(webClient.post()).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.uri(anyString())).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.header(anyString(), anyString())).thenReturn(requestBodyUriSpec);
when(requestBodyUriSpec.bodyValue(any())).thenReturn(requestHeadersSpec);
when(requestHeadersSpec.retrieve()).thenReturn(responseSpec);
// 模拟响应迟迟不返回,触发服务内部的超时逻辑
when(responseSpec.bodyToMono(String.class))
.thenReturn(Mono.delay(Duration.ofSeconds(35)).then(Mono.just("response")));
StepVerifier.create(modelService.invokeModel(request, null))
.expectError(ModelTimeoutException.class)
.verify();
}
private String createSuccessResponse() {
return """
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "/maas/deepseek-ai/DeepSeek-R1-0528",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "你好!我是DeepSeek AI助手,有什么可以帮你的吗?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}
""";
}
}
// API集成测试
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class ApiIntegrationTest {
@Container
static MySQLContainer<?> mysql = new MySQLContainer<>("mysql:8.0");
@Container
static GenericContainer<?> redis = new GenericContainer<>("redis:7-alpine")
.withExposedPorts(6379);
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("spring.datasource.url", mysql::getJdbcUrl);
registry.add("spring.datasource.username", mysql::getUsername);
registry.add("spring.datasource.password", mysql::getPassword);
// Spring Boot 3.x 中 Redis 配置前缀为 spring.data.redis
registry.add("spring.data.redis.host", redis::getHost);
registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
}
@Test
void testChatEndpoint(@Autowired WebTestClient webTestClient) {
// 获取认证token
String token = obtainAuthToken(webTestClient);
// 测试聊天接口
webTestClient.post()
.uri("/api/v1/chat/completions")
.header("Authorization", "Bearer " + token)
.contentType(MediaType.APPLICATION_JSON)
.bodyValue("""
{
"model": "/maas/deepseek-ai/DeepSeek-R1-0528",
"messages": [{"role": "user", "content": "你好"}],
"stream": false
}
""")
.exchange()
.expectStatus().isOk()
.expectBody()
.jsonPath("$.choices[0].message.content").exists()
.jsonPath("$.id").isNotEmpty();
}
private String obtainAuthToken(WebTestClient webTestClient) {
// 实现获取token的逻辑
return "mock-token";
}
}
学习资源
- 注册与认证(https://console.lanyun.net/#/register)
- 蓝耘MaaS官方文档
- Spring WebFlux 官方文档
- Project Reactor 参考指南
- Vue 3 官方文档
通过持续学习和实践,您可以充分利用MaaS平台的强大能力,构建出高质量的AI应用。