In real production scenarios, simply getting a YOLO model to run in Java is not enough. The business needs a deployable, scalable, highly available microservice: one that supports HTTP API calls, batch detection, asynchronous processing, resource isolation, and even horizontal scaling.

This article builds on Spring Boot and integrates a YOLO model (YOLOv8 as the example), walking from architecture design through core code to performance optimization, so you can assemble a production-grade AI detection microservice and fix the classic pain point of "the model runs, but nobody can actually use it".

I. Overall Architecture Design (the Core of Extensibility)

Start by pinning down the core requirements of the microservice: low coupling, high availability, and easy extension. The overall architecture is split into four layers, avoiding a pile-everything-into-one-class code dump; a suggested package layout follows the layer descriptions below:

Client

API gateway layer (Spring MVC)

Business service layer (async processing / task dispatch)

Model inference layer (YOLO + ONNX Runtime)

Resource management layer (singletons / thread pools / memory control)

  • API gateway layer: exposes RESTful endpoints and handles request validation, response wrapping, and unified exception handling;
  • Business service layer: centers on async processing and task dispatch, supporting synchronous/asynchronous detection and batch jobs so requests do not block;
  • Model inference layer: encapsulates the YOLO inference logic, decoupled from business code so model versions can be swapped easily;
  • Resource management layer: manages the ONNX Runtime core objects as singletons and throttles inference concurrency with a thread pool to avoid resource leaks.
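
One possible package layout that maps onto these layers (the package names mirror the classes used later in this article):

com.ai.yolo
├── controller   // API gateway layer: YoloDetectionController
├── config       // thread pool / cache configuration: AsyncConfig, CacheConfig
├── service      // business + inference layer: YoloInferService
└── manager      // resource management layer: YoloModelManager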

II. Environment Setup (Pitfalls to Avoid Up Front)

1. Core dependencies (pom.xml)

We combine Spring Boot, ONNX Runtime, and OpenCV; matching the versions precisely is the key:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.15</version> <!-- stable release; avoids compatibility issues with the newest versions -->
        <relativePath/>
    </parent>

    <groupId>com.ai</groupId>
    <artifactId>yolo-spring-boot-demo</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <!-- Spring Boot core -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Note: async support (@EnableAsync) already ships with spring-context via spring-boot-starter-web;
             there is no separate spring-boot-starter-async artifact -->

        <!-- ONNX Runtime Java (core: model inference) -->
        <dependency>
            <groupId>com.microsoft.onnxruntime</groupId>
            <artifactId>onnxruntime</artifactId>
            <version>1.14.1</version>
            <!-- the onnxruntime jar bundles native libraries for Windows/Linux/macOS, so no OS classifier is needed;
                 for CUDA use the onnxruntime_gpu artifact instead -->
        </dependency>

        <!-- OpenCV(图片处理) -->
        <dependency>
            <groupId>org.openpnp</groupId>
            <artifactId>opencv</artifactId>
            <version>4.7.0-0</version> <!-- compatible with the ONNX Runtime version above -->
        </dependency>

        <!-- Utility libraries -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson2</artifactId>
            <version>2.0.32</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
2. Model preparation

Export the YOLOv8 model to ONNX format ahead of time (a quick recap of the gotchas):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(
    format="onnx",
    imgsz=640,
    batch=1,
    opset=12,
    simplify=True,
    device="cpu",
    nms=False
)

Put the exported yolov8n.onnx under the project's resources/models directory.
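
If you want the model path to be configurable (and uploads of larger images to be accepted), a minimal application.yml could look like the sketch below. The yolo.model.path key matches the @Value used in YoloModelManager later in this article; the multipart limits are an assumption, not something the original setup requires:

yolo:
  model:
    path: models/yolov8n.onnx   # classpath-relative path to the exported model

spring:
  servlet:
    multipart:
      max-file-size: 20MB       # assumed upload limit for detection images
      max-request-size: 20MB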

III. Core Code (Layered Design)

1. Resource management layer: model singleton + thread pool configuration

Key pitfall: OrtEnvironment and OrtSession must not be created per request; keep them as singletons. Inference tasks should run through a thread pool that caps concurrency, so the CPU does not get saturated.

(1) Model singleton management
package com.ai.yolo.manager;

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtLoggingLevel;
import ai.onnxruntime.OrtSession;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import java.io.InputStream;

/**
 * YOLO model manager: manages the ONNX Runtime core objects as singletons
 */
@Slf4j
@Component
public class YoloModelManager {
    // Model path on the classpath (overridable via yolo.model.path)
    @Value("${yolo.model.path:models/yolov8n.onnx}")
    private String modelPath;

    // ONNX Runtime core objects (global singletons)
    private OrtEnvironment env;
    private OrtSession session;

    /**
     * Initialization: load the model when the application starts
     */
    @PostConstruct
    public void initModel() {
        try {
            // 1. Create the environment
            env = OrtEnvironment.getEnvironment();
            // 2. Build the session options (performance tuning + quieter logging)
            OrtSession.SessionOptions options = new OrtSession.SessionOptions();
            options.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR);
            // Use half of the CPU cores for intra-op parallelism to avoid starving the rest of the service
            options.setIntraOpNumThreads(Math.max(1, Runtime.getRuntime().availableProcessors() / 2));
            // 3. Load the model from the classpath (works both in the IDE and inside the packaged jar)
            try (InputStream modelStream = this.getClass().getClassLoader().getResourceAsStream(modelPath)) {
                if (modelStream == null) {
                    throw new IllegalStateException("Model not found on classpath: " + modelPath);
                }
                session = env.createSession(modelStream.readAllBytes(), options);
            }
            log.info("YOLO model loaded, input nodes: {}", session.getInputNames());
        } catch (Exception e) {
            log.error("Failed to load the YOLO model", e);
            throw new RuntimeException("Model initialization failed", e);
        }
    }

    /**
     * Destruction: release resources when the application shuts down
     */
    @PreDestroy
    public void destroyModel() {
        if (session != null) {
            session.close();
            log.info("OrtSession已关闭");
        }
        if (env != null) {
            env.close();
            log.info("OrtEnvironment已关闭");
        }
    }

    // Expose the session
    public OrtSession getSession() {
        return session;
    }

    // Expose the environment
    public OrtEnvironment getEnv() {
        return env;
    }
}
(2) Thread pool configuration (async inference)
package com.ai.yolo.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Async thread pool configuration: throttles the concurrency of inference tasks
 */
@Configuration
@EnableAsync
public class AsyncConfig {

    @Bean("yoloExecutor")
    public Executor yoloExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Core pool size = number of CPU cores (tune to your server)
        executor.setCorePoolSize(Runtime.getRuntime().availableProcessors());
        // Maximum pool size
        executor.setMaxPoolSize(Runtime.getRuntime().availableProcessors() * 2);
        // Queue capacity
        executor.setQueueCapacity(100);
        // Thread name prefix
        executor.setThreadNamePrefix("yolo-infer-");
        // Rejection policy: throw when the queue is full (or plug in a custom fallback)
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        // Initialize
        executor.initialize();
        return executor;
    }
}
2. Model inference layer: core YOLO inference logic

Encapsulates inference, tensor conversion, and NMS, decoupled from the business code:

package com.ai.yolo.service;

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtSession;
import com.ai.yolo.manager.YoloModelManager;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Core YOLO inference service: encapsulates the model inference logic
 */
@Slf4j
@Service
public class YoloInferService {
    @Autowired
    private YoloModelManager modelManager;

    // YOLOv8 parameters
    private static final int INPUT_SIZE = 640;
    private static final float CONF_THRESH = 0.25f;
    private static final float IOU_THRESH = 0.45f;
    private static final int NUM_CLASSES = 80; // 80 COCO classes

    /**
     * Core inference method for a single image
     * @param originalMat original image as an OpenCV Mat
     * @return detection results
     */
    public List<YoloBox> infer(Mat originalMat) {
        // 1. Convert the image to the input tensor and run the model; try-with-resources frees the native memory
        try (OnnxTensor inputTensor = imageToTensor(originalMat);
             OrtSession.Result result = modelManager.getSession().run(Map.of("images", inputTensor))) {
            // 2. Parse the output tensor: the raw YOLOv8 output has shape [1, 84, 8400]
            float[][] output = ((float[][][]) result.get(0).getValue())[0]; // [84][8400]
            // 3. Decode the detection boxes (coordinates mapped back to the original image)
            List<YoloBox> allBoxes = parseOutput(output, originalMat.width(), originalMat.height());
            // 4. Filter with NMS
            return yoloNMS(allBoxes, CONF_THRESH, IOU_THRESH);
        } catch (Exception e) {
            log.error("YOLO inference failed", e);
            throw new RuntimeException("Inference failed", e);
        }
    }

    /**
     * Convert an image into the YOLO input tensor (NCHW + normalization + BGR-to-RGB)
     */
    private OnnxTensor imageToTensor(Mat originalMat) throws Exception {
        Mat resizedMat = new Mat();
        org.opencv.imgproc.Imgproc.resize(originalMat, resizedMat, new org.opencv.core.Size(INPUT_SIZE, INPUT_SIZE));
        // BGR→RGB
        Mat rgbMat = new Mat();
        org.opencv.imgproc.Imgproc.cvtColor(resizedMat, rgbMat, org.opencv.imgproc.Imgproc.COLOR_BGR2RGB);
        // Normalize to 0-1 and convert to float32
        rgbMat.convertTo(rgbMat, org.opencv.core.CvType.CV_32FC3, 1.0 / 255.0);
        // NHWC→NCHW
        float[] tensorData = new float[1 * 3 * INPUT_SIZE * INPUT_SIZE];
        int index = 0;
        for (int c = 0; c < 3; c++) {
            for (int h = 0; h < INPUT_SIZE; h++) {
                for (int w = 0; w < INPUT_SIZE; w++) {
                    tensorData[index++] = (float) rgbMat.get(h, w)[c];
                }
            }
        }
        // Create the tensor (note: the per-pixel Mat.get loop above is simple but not the fastest option)
        long[] shape = new long[]{1, 3, INPUT_SIZE, INPUT_SIZE};
        FloatBuffer buffer = FloatBuffer.wrap(tensorData);
        return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
    }

    /**
     * Decode the YOLO output into detection boxes (coordinates restored to the original image)
     */
    private List<YoloBox> parseOutput(float[][] output, int originalW, int originalH) {
        List<YoloBox> boxes = new ArrayList<>();
        // output layout: [4 + NUM_CLASSES][numBoxes], i.e. [84][8400] for YOLOv8 at 640x640
        int numBoxes = output[0].length;

        for (int i = 0; i < numBoxes; i++) {
            // Box center and size in input-image pixels (cx, cy, w, h)
            float cx = output[0][i];
            float cy = output[1][i];
            float w = output[2][i];
            float h = output[3][i];
            // Convert to top-left / bottom-right corners and scale back to the original image size
            float x1 = (cx - w / 2) / INPUT_SIZE * originalW;
            float y1 = (cy - h / 2) / INPUT_SIZE * originalH;
            float x2 = (cx + w / 2) / INPUT_SIZE * originalW;
            float y2 = (cy + h / 2) / INPUT_SIZE * originalH;

            // Find the class with the highest confidence
            float maxConf = 0;
            int classId = -1;
            for (int c = 0; c < NUM_CLASSES; c++) {
                float conf = output[4 + c][i];
                if (conf > maxConf) {
                    maxConf = conf;
                    classId = c;
                }
            }

            if (maxConf > CONF_THRESH) {
                boxes.add(new YoloBox(x1, y1, x2, y2, maxConf, classId));
            }
        }
        return boxes;
    }

    /**
     * NMS (non-maximum suppression), applied per class
     */
    private List<YoloBox> yoloNMS(List<YoloBox> allBoxes, float confThresh, float iouThresh) {
        // Drop low-confidence boxes; collect into a mutable list so it can be sorted
        List<YoloBox> validBoxes = allBoxes.stream()
                .filter(b -> b.confidence > confThresh)
                .collect(Collectors.toCollection(ArrayList::new));
        // Sort by confidence, descending
        validBoxes.sort((a, b) -> Float.compare(b.confidence, a.confidence));
        // Core NMS loop
        List<YoloBox> result = new ArrayList<>();
        while (!validBoxes.isEmpty()) {
            YoloBox bestBox = validBoxes.get(0);
            result.add(bestBox);
            List<YoloBox> remaining = new ArrayList<>();
            for (int i = 1; i < validBoxes.size(); i++) {
                YoloBox currBox = validBoxes.get(i);
                if (currBox.classId != bestBox.classId || calculateIOU(bestBox, currBox) < iouThresh) {
                    remaining.add(currBox);
                }
            }
            validBoxes = remaining;
        }
        return result;
    }

    /**
     * Compute the IoU of two boxes
     */
    private float calculateIOU(YoloBox a, YoloBox b) {
        float interX1 = Math.max(a.x1, b.x1);
        float interY1 = Math.max(a.y1, b.y1);
        float interX2 = Math.min(a.x2, b.x2);
        float interY2 = Math.min(a.y2, b.y2);
        float interArea = Math.max(0, interX2 - interX1) * Math.max(0, interY2 - interY1);
        if (interArea == 0) return 0;
        float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
        float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
        return interArea / (areaA + areaB - interArea);
    }

    // Detection box entity class
    @lombok.Data
    public static class YoloBox {
        private float x1;
        private float y1;
        private float x2;
        private float y2;
        private float confidence;
        private int classId;

        public YoloBox(float x1, float y1, float x2, float y2, float confidence, int classId) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.confidence = confidence;
            this.classId = classId;
        }
    }
}
3. Business service layer: async processing + API endpoints

Expose both a synchronous and an asynchronous endpoint to cover different business scenarios:

package com.ai.yolo.controller;

import com.ai.yolo.service.YoloInferService;
import com.alibaba.fastjson2.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.imgcodecs.Imgcodecs;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * YOLO detection API layer
 */
@Slf4j
@RestController
public class YoloDetectionController {
    @Autowired
    private YoloInferService yoloInferService;

    /**
     * Synchronous detection endpoint: suited to small images and low-concurrency scenarios
     */
    @PostMapping("/api/yolo/detect/sync")
    public JSONObject detectSync(@RequestParam("file") MultipartFile file) {
        JSONObject result = new JSONObject();
        try {
            // 1. Decode the image
            byte[] bytes = file.getBytes();
            MatOfByte matOfByte = new MatOfByte(bytes);
            Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
            if (imageMat.empty()) {
                result.put("code", 500);
                result.put("msg", "图片解析失败");
                return result;
            }
            // 2. Run inference
            List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
            // 3. Build the response
            result.put("code", 200);
            result.put("msg", "success");
            result.put("data", boxes);
            return result;
        } catch (Exception e) {
            log.error("同步检测失败", e);
            result.put("code", 500);
            result.put("msg", "检测失败:" + e.getMessage());
            return result;
        }
    }

    /**
     * Asynchronous detection endpoint: suited to large images and high-concurrency scenarios.
     * The method body runs on the yoloExecutor pool; Spring MVC completes the HTTP response
     * once the returned CompletableFuture resolves.
     */
    @Async("yoloExecutor")
    @PostMapping("/api/yolo/detect/async")
    public CompletableFuture<JSONObject> detectAsync(@RequestParam("file") MultipartFile file) {
        JSONObject result = new JSONObject();
        try {
            byte[] bytes = file.getBytes();
            MatOfByte matOfByte = new MatOfByte(bytes);
            Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
            if (imageMat.empty()) {
                result.put("code", 500);
                result.put("msg", "图片解析失败");
                return CompletableFuture.completedFuture(result);
            }
            List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
            result.put("code", 200);
            result.put("msg", "success");
            result.put("data", boxes);
            return CompletableFuture.completedFuture(result);
        } catch (Exception e) {
            log.error("异步检测失败", e);
            result.put("code", 500);
            result.put("msg", "检测失败:" + e.getMessage());
            return CompletableFuture.completedFuture(result);
        }
    }
}
4. Application entry point + configuration
package com.ai.yolo;

import org.opencv.core.Core;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import javax.annotation.PostConstruct;

@SpringBootApplication
public class YoloSpringBootApplication {
    /**
     * Load the OpenCV native library (required before any OpenCV call)
     */
    @PostConstruct
    public void initOpenCV() {
        // org.openpnp:opencv bundles the native libraries inside the jar; load them through its helper
        // instead of System.loadLibrary, which would require java.library.path to be set manually
        nu.pattern.OpenCV.loadLocally();
        System.out.println("OpenCV loaded, version: " + Core.VERSION);
    }

    public static void main(String[] args) {
        SpringApplication.run(YoloSpringBootApplication.class, args);
    }
}

IV. Extensibility Optimizations (Must-Haves for Production)

1. Batch detection

Adapt the inference logic to accept a batch of images (this requires exporting an ONNX model with batch > 1 ahead of time):

// Batch tensor conversion
public OnnxTensor batchImagesToTensor(List<Mat> imageList) throws Exception {
    int batchSize = imageList.size();
    float[] tensorData = new float[batchSize * 3 * INPUT_SIZE * INPUT_SIZE];
    int batchIndex = 0;
    for (Mat mat : imageList) {
        // Per-image conversion (same as above), written into this image's batch offset
        // ...
        batchIndex++;
    }
    long[] shape = new long[]{batchSize, 3, INPUT_SIZE, INPUT_SIZE};
    FloatBuffer buffer = FloatBuffer.wrap(tensorData);
    return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
}
2. Result caching

Cache results for frequently re-detected images (e.g. surveillance frames of a fixed scene) to cut down on redundant inference:

// Enable Spring Cache backed by Redis (requires spring-boot-starter-data-redis and spring-boot-starter-cache)
@EnableCaching
@Configuration
public class CacheConfig {
    @Bean
    public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(RedisCacheConfiguration.defaultCacheConfig().entryTtl(Duration.ofMinutes(10)))
                .build();
    }
}

// Add caching to a wrapper around the inference method
// (note: @Cacheable only kicks in when the method is called through the Spring proxy, i.e. from another bean)
@Cacheable(value = "yolo_detect", key = "#imageMd5")
public List<YoloBox> inferWithCache(String imageMd5, Mat originalMat) {
    return infer(originalMat);
}
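
The imageMd5 cache key is left to the caller above. A minimal sketch for deriving it from the uploaded image bytes (the ImageHashUtil class and md5Hex method are illustrative names, not part of the code above):

import java.security.MessageDigest;

public final class ImageHashUtil {
    private ImageHashUtil() {}

    /** Hex-encoded MD5 of the raw image bytes, usable as the cache key. */
    public static String md5Hex(byte[] imageBytes) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

In the controller this would be called as inferWithCache(ImageHashUtil.md5Hex(bytes), imageMat), so identical uploads hit the cache instead of the model.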
3. Horizontal scaling

Combine with Spring Cloud / Netflix Eureka to deploy the microservice as a cluster and distribute requests through a load balancer:

  • Each node loads its own copy of the model, avoiding the thread-safety issues of sharing one model across nodes;
  • Put Nginx in front as the load balancer to distribute detection requests (see the sketch below).
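
A minimal Nginx sketch for that load-balancing setup, assuming two detection nodes on ports 8080 and 8081 (the addresses, ports, and upload limit are illustrative):

upstream yolo_detect {
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

server {
    listen 80;
    location /api/yolo/ {
        proxy_pass http://yolo_detect;
        client_max_body_size 20m;   # allow larger image uploads through the proxy
    }
}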
4. Monitoring and alerting

Integrate Spring Boot Actuator to monitor inference latency, thread pool state, and memory usage (a latency-timer sketch follows the configuration below):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Enable the monitoring endpoints in the configuration file:

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,threaddump
  metrics:
    tags:
      application: yolo-detect-service
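
Actuator pulls in Micrometer, so the inference-latency metric mentioned above can be recorded with a Timer. A minimal sketch (the YoloInferMetrics class and the metric name yolo.infer.time are choices made here, not part of the original code):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.function.Supplier;

@Component
public class YoloInferMetrics {
    private final Timer inferTimer;

    public YoloInferMetrics(MeterRegistry registry) {
        // Exposed at /actuator/metrics/yolo.infer.time
        this.inferTimer = registry.timer("yolo.infer.time");
    }

    /** Times a single inference call and records it to the registry. */
    public <T> T recordInfer(Supplier<T> inferCall) {
        return inferTimer.record(inferCall);
    }
}

In the controller, the call would then look something like yoloInferMetrics.recordInfer(() -> yoloInferService.infer(imageMat)).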

V. Testing and Verification

  1. Start the Spring Boot application and confirm that the model loads successfully;
  2. Call the endpoints with Postman:
    • Synchronous: POST http://localhost:8080/api/yolo/detect/sync with an image file attached;
    • Asynchronous: POST http://localhost:8080/api/yolo/detect/async with an image file attached;
  3. Sample response:
{
  "code": 200,
  "msg": "success",
  "data": [
    {
      "x1": 100.5,
      "y1": 80.2,
      "x2": 200.8,
      "y2": 180.5,
      "confidence": 0.95,
      "classId": 0
    }
  ]
}

Summary

The core of building a YOLO AI detection microservice on Spring Boot is layered decoupling + resource control + extensible design:

  1. Resource layer: manage the ONNX Runtime core objects as singletons and throttle concurrency with a thread pool to avoid resource leaks;
  2. Inference layer: encapsulate the core YOLO logic, decoupled from business code, so model versions are easy to swap;
  3. API layer: provide synchronous and asynchronous endpoints for different concurrency scenarios;
  4. Extension layer: support batch detection, caching, and cluster deployment to meet production-grade requirements.

This architecture keeps model inference stable while staying easy to extend, and can be dropped into real projects directly.
