YOLO + Java + Spring Boot: Building a Highly Available, Scalable AI Detection Microservice from 0 to 1 (Full Source Code Included)
In real production scenarios, merely getting a YOLO model to run in Java is not enough — the business needs a deployable, scalable, highly available microservice: HTTP endpoints, batch detection, asynchronous processing, resource isolation, and even horizontal scaling.
This article uses the Spring Boot framework to integrate a YOLO model (YOLOv8 as the example) and walks through architecture design, core code, and performance tuning to build a production-grade AI detection microservice — solving the pain point of "the model runs, but you can't actually use it".
I. Overall Architecture Design (the Key to Extensibility)
Start from the microservice's core requirements — low coupling, high availability, easy extension. The architecture is split into 4 layers to avoid a monolithic pile of code:
- API gateway layer: exposes RESTful endpoints and handles request validation, response wrapping, and unified exception handling;
- Business service layer: centers on async processing + task dispatch, supporting sync/async detection and batch tasks without blocking requests;
- Model inference layer: encapsulates the YOLO inference logic, decoupled from business code so model versions can be swapped easily;
- Resource management layer: manages the ONNX Runtime core objects as singletons and throttles inference concurrency with a thread pool to prevent resource leaks.
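As a minimal sketch, the four-layer split can be expressed as plain Java interfaces. All names here are illustrative and not taken from the project code that follows:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative layer boundaries; interface and type names are hypothetical.
public class LayeringSketch {
    // detection-box result shared by the layers
    record Detection(float x1, float y1, float x2, float y2, float conf, int classId) {}

    // model inference layer: knows nothing about HTTP or business rules
    interface InferenceEngine {
        List<Detection> infer(byte[] imageBytes);
    }

    // business service layer: adds sync/async dispatch on top of the engine
    interface DetectionService {
        List<Detection> detectSync(byte[] imageBytes);
        CompletableFuture<List<Detection>> detectAsync(byte[] imageBytes);
    }

    public static void main(String[] args) {
        // a trivial engine stub, just to show the layering compiles and composes
        InferenceEngine engine = bytes -> List.of(new Detection(0, 0, 10, 10, 0.9f, 0));
        System.out.println(engine.infer(new byte[0]).size()); // 1
    }
}
```

The point of the split is that the API layer depends only on `DetectionService`, and `DetectionService` only on `InferenceEngine` — swapping the model never touches the web layer.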
II. Environment Setup (Pitfalls First)
1. Core Dependencies (pom.xml)
Spring Boot + ONNX Runtime + OpenCV; matching the versions precisely is the key:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.15</version> <!-- stable release; avoids compatibility issues with the latest versions -->
<relativePath/>
</parent>
<groupId>com.ai</groupId>
<artifactId>yolo-spring-boot-demo</artifactId>
<version>1.0.0</version>
<dependencies>
<!-- Spring Boot core -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- note: there is no spring-boot-starter-async artifact; @EnableAsync support
     ships with spring-context, which spring-boot-starter-web already pulls in -->
<!-- ONNX Runtime for Java (core: model inference) -->
<dependency>
<groupId>com.microsoft.onnxruntime</groupId>
<artifactId>onnxruntime</artifactId>
<version>1.14.1</version> <!-- the CPU jar bundles natives for Windows/Linux/macOS x86_64; no classifier needed -->
</dependency>
<!-- OpenCV (image processing) -->
<dependency>
<groupId>org.openpnp</groupId>
<artifactId>opencv</artifactId>
<version>4.7.0-0</version> <!-- works alongside ONNX Runtime -->
</dependency>
<!-- utilities -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson2</artifactId>
<version>2.0.32</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>
2. Model Preparation
Export the YOLOv8 model to ONNX format beforehand (recap of the pitfalls):
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(
    format="onnx",
    imgsz=640,
    batch=1,
    opset=12,
    simplify=True,
    device="cpu",
    nms=False,  # keep NMS out of the graph; we run it in Java
)
Put the exported yolov8n.onnx under the project's resources/models directory.
III. Core Code Implementation (Layered Design)
1. Resource Management Layer: Model Singleton + Thread Pool Configuration
Key pitfall: OrtEnvironment and OrtSession must not be created per request — keep them as singletons; inference tasks go through a thread pool so the CPU isn't saturated.
(1) Model singleton manager
package com.ai.yolo.manager;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtLoggingLevel;
import ai.onnxruntime.OrtSession;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import java.io.InputStream;
/**
 * YOLO model manager: singleton owner of the ONNX Runtime core objects
 */
@Slf4j
@Component
public class YoloModelManager {
// model path on the classpath (read from configuration)
@Value("${yolo.model.path:models/yolov8n.onnx}")
private String modelPath;
// ONNX Runtime core objects (global singletons)
private OrtEnvironment env;
private OrtSession session;
/**
 * Initialization: load the model at application startup
 */
@PostConstruct
public void initModel() {
try {
// 1. create the environment
env = OrtEnvironment.getEnvironment();
// 2. build session options (performance tuning + quieter logs)
OrtSession.SessionOptions options = new OrtSession.SessionOptions();
options.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR);
options.setIntraOpNumThreads(Runtime.getRuntime().availableProcessors() / 2); // threads = CPU cores / 2, so inference doesn't starve the rest of the app
// 3. load the model bytes from the classpath (getResource().getPath() breaks inside a fat jar)
try (InputStream in = getClass().getClassLoader().getResourceAsStream(modelPath)) {
if (in == null) {
throw new IllegalStateException("model not found on classpath: " + modelPath);
}
session = env.createSession(in.readAllBytes(), options);
}
log.info("YOLO model loaded, input nodes: {}", session.getInputNames());
} catch (Exception e) {
log.error("failed to load YOLO model", e);
throw new RuntimeException("model initialization failed", e);
}
}
/**
 * Destruction: release resources at application shutdown
 */
@PreDestroy
public void destroyModel() {
if (session != null) {
try {
session.close();
log.info("OrtSession closed");
} catch (Exception e) {
log.warn("error closing OrtSession", e);
}
}
if (env != null) {
env.close();
log.info("OrtEnvironment closed");
}
}
// expose the session
public OrtSession getSession() {
return session;
}
// expose the environment
public OrtEnvironment getEnv() {
return env;
}
}
(2) Thread pool configuration (async inference)
package com.ai.yolo.config;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
/**
 * Async thread-pool configuration: throttles inference-task concurrency
 */
@Configuration
@EnableAsync
public class AsyncConfig {
@Bean("yoloExecutor")
public Executor yoloExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
// core threads = CPU cores (tune for your server)
executor.setCorePoolSize(Runtime.getRuntime().availableProcessors());
// max threads
executor.setMaxPoolSize(Runtime.getRuntime().availableProcessors() * 2);
// queue capacity
executor.setQueueCapacity(100);
// thread-name prefix
executor.setThreadNamePrefix("yolo-infer-");
// rejection policy: throw when the queue is full (or plug in custom degradation)
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
// initialize
executor.initialize();
return executor;
}
}
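The effect of AbortPolicy can be demonstrated with a plain ThreadPoolExecutor (a standalone sketch, not the Spring bean above): with one worker and a queue of one, a third task must be rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectDemo {
    static boolean demo() throws InterruptedException {
        // 1 core/max thread + queue of 1: the third task cannot be accepted
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch gate = new CountDownLatch(1);
        pool.execute(() -> { try { gate.await(); } catch (InterruptedException ignored) {} }); // occupies the worker
        pool.execute(() -> {}); // fills the queue
        boolean rejected = false;
        try {
            pool.execute(() -> {}); // queue full, max threads reached -> AbortPolicy throws
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        gate.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("third task rejected: " + demo()); // prints "third task rejected: true"
    }
}
```

In the service this exception surfaces to the caller as a fast failure, which is usually preferable to letting inference requests pile up unbounded.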
2. Model Inference Layer: Core YOLO Inference Logic
Encapsulates inference, tensor conversion, and NMS, decoupled from business code:
package com.ai.yolo.service;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtSession;
import com.ai.yolo.manager.YoloModelManager;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
/**
 * Core YOLO inference service: encapsulates the model-inference logic
 */
@Slf4j
@Service
public class YoloInferService {
@Autowired
private YoloModelManager modelManager;
// YOLOv8 parameters
private static final int INPUT_SIZE = 640;
private static final float CONF_THRESH = 0.25f;
private static final float IOU_THRESH = 0.45f;
private static final int NUM_CLASSES = 80; // 80 classes (COCO dataset)
/**
 * Core single-image inference
 * @param originalMat original image Mat
 * @return detection results
 */
public List<YoloBox> infer(Mat originalMat) {
try (OnnxTensor inputTensor = imageToTensor(originalMat)) {
// run the model; try-with-resources releases the native output buffers
try (OrtSession.Result result = modelManager.getSession().run(Map.of("images", inputTensor))) {
// YOLOv8 (exported with nms=False) outputs [1, 84, 8400]: attribute-major, not box-major
float[][] output = ((float[][][]) result.get(0).getValue())[0];
// decode detection boxes
List<YoloBox> allBoxes = parseOutput(output, originalMat.width(), originalMat.height());
// NMS filtering
return yoloNMS(allBoxes, CONF_THRESH, IOU_THRESH);
}
} catch (Exception e) {
log.error("YOLO inference failed", e);
throw new RuntimeException("inference failed", e);
}
}
/**
 * Convert an image to the YOLO input tensor (NCHW + 0-1 normalization + BGR→RGB)
 */
private OnnxTensor imageToTensor(Mat originalMat) throws Exception {
Mat resizedMat = new Mat();
org.opencv.imgproc.Imgproc.resize(originalMat, resizedMat, new org.opencv.core.Size(INPUT_SIZE, INPUT_SIZE));
// BGR→RGB
Mat rgbMat = new Mat();
org.opencv.imgproc.Imgproc.cvtColor(resizedMat, rgbMat, org.opencv.imgproc.Imgproc.COLOR_BGR2RGB);
// normalize to 0-1, convert to float32
rgbMat.convertTo(rgbMat, org.opencv.core.CvType.CV_32FC3, 1.0 / 255.0);
// NHWC→NCHW
float[] tensorData = new float[1 * 3 * INPUT_SIZE * INPUT_SIZE];
int index = 0;
for (int c = 0; c < 3; c++) {
for (int h = 0; h < INPUT_SIZE; h++) {
for (int w = 0; w < INPUT_SIZE; w++) {
tensorData[index++] = (float) rgbMat.get(h, w)[c];
}
}
}
// create the tensor
long[] shape = new long[]{1, 3, INPUT_SIZE, INPUT_SIZE};
FloatBuffer buffer = FloatBuffer.wrap(tensorData);
return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
}
/**
 * Decode the YOLO output into detection boxes, mapped back to original-image coordinates
 * @param output [84][8400]: rows 0-3 hold cx,cy,w,h in 640-pixel model space, rows 4-83 hold class scores
 */
private List<YoloBox> parseOutput(float[][] output, int originalW, int originalH) {
List<YoloBox> boxes = new ArrayList<>();
int numBoxes = output[0].length; // 8400 candidate boxes
for (int i = 0; i < numBoxes; i++) {
// center-format coordinates in model space (pixels in the 640x640 input, not normalized)
float cx = output[0][i];
float cy = output[1][i];
float w = output[2][i];
float h = output[3][i];
// convert to top-left / bottom-right corners, scaled to the original image
float x1 = (cx - w / 2) / INPUT_SIZE * originalW;
float y1 = (cy - h / 2) / INPUT_SIZE * originalH;
float x2 = (cx + w / 2) / INPUT_SIZE * originalW;
float y2 = (cy + h / 2) / INPUT_SIZE * originalH;
// pick the class with the highest score
float maxConf = 0;
int classId = -1;
for (int c = 0; c < NUM_CLASSES; c++) {
float conf = output[4 + c][i];
if (conf > maxConf) {
maxConf = conf;
classId = c;
}
}
if (maxConf > CONF_THRESH) {
boxes.add(new YoloBox(x1, y1, x2, y2, maxConf, classId));
}
}
return boxes;
}
/**
 * NMS non-maximum suppression (per class)
 */
private List<YoloBox> yoloNMS(List<YoloBox> allBoxes, float confThresh, float iouThresh) {
// drop low-confidence boxes into a mutable list (Stream.toList() would be unmodifiable and break the sort below)
List<YoloBox> validBoxes = new ArrayList<>();
for (YoloBox box : allBoxes) {
if (box.confidence > confThresh) {
validBoxes.add(box);
}
}
// sort by confidence, descending
validBoxes.sort((a, b) -> Float.compare(b.confidence, a.confidence));
// NMS core loop
List<YoloBox> result = new ArrayList<>();
while (!validBoxes.isEmpty()) {
YoloBox bestBox = validBoxes.get(0);
result.add(bestBox);
List<YoloBox> remaining = new ArrayList<>();
for (int i = 1; i < validBoxes.size(); i++) {
YoloBox currBox = validBoxes.get(i);
if (currBox.classId != bestBox.classId || calculateIOU(bestBox, currBox) < iouThresh) {
remaining.add(currBox);
}
}
validBoxes = remaining;
}
return result;
}
/**
 * Compute IOU
 */
private float calculateIOU(YoloBox a, YoloBox b) {
float interX1 = Math.max(a.x1, b.x1);
float interY1 = Math.max(a.y1, b.y1);
float interX2 = Math.min(a.x2, b.x2);
float interY2 = Math.min(a.y2, b.y2);
float interArea = Math.max(0, interX2 - interX1) * Math.max(0, interY2 - interY1);
if (interArea == 0) return 0;
float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
return interArea / (areaA + areaB - interArea);
}
// detection-box entity
@lombok.Data
public static class YoloBox {
private float x1;
private float y1;
private float x2;
private float y2;
private float confidence;
private int classId;
public YoloBox(float x1, float y1, float x2, float y2, float confidence, int classId) {
this.x1 = x1;
this.y1 = y1;
this.x2 = x2;
this.y2 = y2;
this.confidence = confidence;
this.classId = classId;
}
}
}
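The coordinate decoding and IOU arithmetic can be checked in isolation. This standalone sketch mirrors the same formulas on hand-picked boxes:

```java
public class BoxMathDemo {
    // IOU of two corner-format boxes {x1,y1,x2,y2}, same formula as the service's calculateIOU
    static float iou(float[] a, float[] b) {
        float ix1 = Math.max(a[0], b[0]), iy1 = Math.max(a[1], b[1]);
        float ix2 = Math.min(a[2], b[2]), iy2 = Math.min(a[3], b[3]);
        float inter = Math.max(0, ix2 - ix1) * Math.max(0, iy2 - iy1);
        if (inter == 0) return 0;
        float areaA = (a[2] - a[0]) * (a[3] - a[1]);
        float areaB = (b[2] - b[0]) * (b[3] - b[1]);
        return inter / (areaA + areaB - inter);
    }

    // center-format (cx,cy,w,h) in 640-pixel model space -> corners in original-image pixels
    static float[] decode(float cx, float cy, float w, float h, int origW, int origH) {
        return new float[]{
                (cx - w / 2) / 640f * origW, (cy - h / 2) / 640f * origH,
                (cx + w / 2) / 640f * origW, (cy + h / 2) / 640f * origH};
    }

    public static void main(String[] args) {
        // two 10x10 boxes overlapping in a 5x5 patch: IOU = 25 / (100 + 100 - 25) ≈ 0.1429
        System.out.println(iou(new float[]{0, 0, 10, 10}, new float[]{5, 5, 15, 15}));
        // a centered 320x320 model-space box mapped onto a 1280x960 image
        float[] c = decode(320, 320, 320, 320, 1280, 960);
        System.out.printf("%.0f %.0f %.0f %.0f%n", c[0], c[1], c[2], c[3]); // 320 240 960 720
    }
}
```

Sanity checks like these catch layout and scaling mistakes far faster than eyeballing detection boxes drawn on a test image.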
3. Business Service Layer: Async Processing + Endpoint Wrapping
Both synchronous and asynchronous endpoints are provided to fit different business scenarios:
package com.ai.yolo.controller;
import com.ai.yolo.service.YoloInferService;
import com.alibaba.fastjson2.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.imgcodecs.Imgcodecs;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import java.util.List;
import java.util.concurrent.CompletableFuture;
/**
 * YOLO detection API layer
*/
@Slf4j
@RestController
public class YoloDetectionController {
@Autowired
private YoloInferService yoloInferService;
/**
 * Synchronous detection endpoint: for small images and low-concurrency scenarios
*/
@PostMapping("/api/yolo/detect/sync")
public JSONObject detectSync(@RequestParam("file") MultipartFile file) {
JSONObject result = new JSONObject();
try {
// 1. decode the image
byte[] bytes = file.getBytes();
MatOfByte matOfByte = new MatOfByte(bytes);
Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
if (imageMat.empty()) {
result.put("code", 500);
result.put("msg", "failed to decode image");
return result;
}
// 2. run inference
List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
// 3. wrap the response
result.put("code", 200);
result.put("msg", "success");
result.put("data", boxes);
return result;
} catch (Exception e) {
log.error("sync detection failed", e);
result.put("code", 500);
result.put("msg", "detection failed: " + e.getMessage());
return result;
}
}
/**
 * Asynchronous detection endpoint: for large images and high-concurrency scenarios
*/
@Async("yoloExecutor")
@PostMapping("/api/yolo/detect/async")
public CompletableFuture<JSONObject> detectAsync(@RequestParam("file") MultipartFile file) {
// Spring MVC holds the request open until the returned future completes,
// while the work itself runs on the yoloExecutor pool
JSONObject result = new JSONObject();
try {
byte[] bytes = file.getBytes();
MatOfByte matOfByte = new MatOfByte(bytes);
Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
if (imageMat.empty()) {
result.put("code", 500);
result.put("msg", "failed to decode image");
return CompletableFuture.completedFuture(result);
}
List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
result.put("code", 200);
result.put("msg", "success");
result.put("data", boxes);
return CompletableFuture.completedFuture(result);
} catch (Exception e) {
log.error("async detection failed", e);
result.put("code", 500);
result.put("msg", "detection failed: " + e.getMessage());
return CompletableFuture.completedFuture(result);
}
}
}
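On the caller side, several asynchronous detections can be fanned out and joined with CompletableFuture. In this standalone sketch, detect() is a stand-in for the real HTTP call and just returns a pretend box count:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanOutDemo {
    // stand-in for one async detection request (returns a pretend box count)
    static CompletableFuture<Integer> detect(String imageName) {
        return CompletableFuture.supplyAsync(imageName::length);
    }

    static int run() {
        List<CompletableFuture<Integer>> futures = List.of("a.jpg", "bb.jpg", "ccc.jpg")
                .stream().map(FanOutDemo::detect).collect(Collectors.toList());
        // wait for all requests, then sum the per-image counts
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(v -> futures.stream().mapToInt(CompletableFuture::join).sum())
                .join();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 5 + 6 + 7 = 18
    }
}
```

The same allOf/join pattern works unchanged when the futures come from an async HTTP client hitting the /detect/async endpoint.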
4. Application Entry Point + Configuration
package com.ai.yolo;
import org.opencv.core.Core;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import javax.annotation.PostConstruct;
@SpringBootApplication
public class YoloSpringBootApplication {
/**
 * Load the OpenCV native library (required)
 */
@PostConstruct
public void initOpenCV() {
// the org.openpnp:opencv artifact ships its own loader; plain System.loadLibrary
// would require java.library.path to be configured manually
nu.pattern.OpenCV.loadLocally();
System.out.println("OpenCV loaded, version: " + Core.VERSION);
}
public static void main(String[] args) {
SpringApplication.run(YoloSpringBootApplication.class, args);
}
}
IV. Extensibility Optimizations (Production Must-Haves)
1. Batch Detection Support
Adapt the inference logic to accept batches of images (requires exporting an ONNX model with batch > 1 in advance):
// batch tensor conversion
public OnnxTensor batchImagesToTensor(List<Mat> imageList) throws Exception {
int batchSize = imageList.size();
float[] tensorData = new float[batchSize * 3 * INPUT_SIZE * INPUT_SIZE];
int batchIndex = 0;
for (Mat mat : imageList) {
// per-image conversion (same as above), written into this image's slice of the batch
// ...
batchIndex++;
}
long[] shape = new long[]{batchSize, 3, INPUT_SIZE, INPUT_SIZE};
FloatBuffer buffer = FloatBuffer.wrap(tensorData);
return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
}
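The elided per-image fill comes down to NCHW offset arithmetic: for image b, channel c, row h, column w in a flat buffer, the index is ((b*C + c)*H + h)*W + w. A standalone check:

```java
public class BatchIndexDemo {
    // flat offset into a float[batch * C * H * W] buffer laid out as NCHW
    static int flatIndex(int b, int c, int h, int w, int channels, int size) {
        return ((b * channels + c) * size + h) * size + w;
    }

    public static void main(String[] args) {
        int channels = 3, size = 640;
        // the second image's data starts right after the first image's 3*640*640 floats
        System.out.println(flatIndex(1, 0, 0, 0, channels, size)); // 1228800
        // the last float of the first image sits just before that boundary
        System.out.println(flatIndex(0, 2, 639, 639, channels, size)); // 1228799
    }
}
```

Each image therefore occupies a contiguous 3*640*640 slice, so the per-image loop can reuse the single-image conversion and simply offset by batchIndex * 3 * INPUT_SIZE * INPUT_SIZE.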
2. Cache Optimization
Cache results for frequently re-detected images (e.g. fixed-scene surveillance frames) to avoid redundant inference:
// Spring Cache backed by Redis (requires spring-boot-starter-data-redis)
@EnableCaching
@Configuration
public class CacheConfig {
@Bean
public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
return RedisCacheManager.builder(connectionFactory)
.cacheDefaults(RedisCacheConfiguration.defaultCacheConfig().entryTtl(Duration.ofMinutes(10)))
.build();
}
}
// cache on the inference method, keyed by the image's MD5 (must be invoked through the Spring proxy)
@Cacheable(value = "yolo_detect", key = "#imageMd5")
public List<YoloBox> inferWithCache(String imageMd5, Mat originalMat) {
return infer(originalMat);
}
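The imageMd5 cache key can be derived from the uploaded bytes with the JDK's MessageDigest (a minimal sketch):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Key {
    // hex-encoded MD5 of the raw upload, suitable as a cache key
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello".getBytes())); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

In the controller, compute md5Hex(file.getBytes()) once and pass it into inferWithCache so identical uploads hit the cache instead of the model.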
3. Horizontal Scaling
Deploy the microservice as a cluster with Spring Cloud/Netflix Eureka and distribute requests via load balancing:
- each node loads its own copy of the model, avoiding the thread-safety issues of sharing one model across nodes;
- Nginx in front handles load balancing of detection requests.
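A minimal Nginx upstream for two identical nodes might look like this (hostnames and ports are illustrative):

```nginx
upstream yolo_detect {
    least_conn;                    # route to the node with the fewest active requests
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
server {
    listen 80;
    location /api/yolo/ {
        proxy_pass http://yolo_detect;
        client_max_body_size 20m;  # allow large image uploads
    }
}
```

least_conn suits inference workloads better than round-robin, since per-request latency varies with image size.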
4. Monitoring and Alerting
Integrate Spring Boot Actuator to monitor inference latency, thread-pool state, and memory usage:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Enable the monitoring endpoints in the configuration file:
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,threaddump
  metrics:
    tags:
      application: yolo-detect-service
V. Testing and Verification
- Start the Spring Boot application and confirm the model loads successfully;
- Call the endpoints with Postman:
- sync: POST http://localhost:8080/api/yolo/detect/sync with an image file;
- async: POST http://localhost:8080/api/yolo/detect/async with an image file;
- Sample response:
{
"code": 200,
"msg": "success",
"data": [
{
"x1": 100.5,
"y1": 80.2,
"x2": 200.8,
"y2": 180.5,
"confidence": 0.95,
"classId": 0
}
]
}
Summary
The core of building a YOLO AI detection microservice on Spring Boot is "layered decoupling + resource control + extensible design":
- Resource layer: ONNX Runtime core objects managed as singletons, concurrency throttled by a thread pool, no resource leaks;
- Inference layer: YOLO core logic encapsulated and decoupled from business code, making model upgrades easy;
- API layer: sync and async endpoints for different concurrency scenarios;
- Extension layer: batch detection, caching, and cluster deployment for production-grade needs.
This architecture keeps model inference stable while remaining extensible, and can be dropped into real projects directly.