In real production scenarios, simply getting a YOLO model to run in Java is not enough. The business needs a deployable, scalable, highly available microservice: one that supports HTTP API calls, batch detection, asynchronous processing, resource isolation, and even horizontal scaling.

This article builds on Spring Boot and integrates a YOLO model (YOLOv8 as the example), walking from architecture design through core code to performance optimization, so you can assemble a production-grade AI detection microservice and fix the classic pain point of "the model runs, but nobody can actually use it".

I. Overall Architecture Design (the Core of Extensibility)

Start by pinning down the core requirements of the microservice: low coupling, high availability, and easy extension. The overall architecture is split into four layers, avoiding a pile-everything-into-one-class code dump; a suggested package layout follows the layer descriptions below:

Client

API gateway layer (Spring MVC)

Business service layer (async processing / task dispatch)

Model inference layer (YOLO + ONNX Runtime)

Resource management layer (singletons / thread pools / memory control)

  • API gateway layer: exposes RESTful endpoints and handles request validation, response wrapping, and unified exception handling;
  • Business service layer: centers on async processing and task dispatch, supporting synchronous/asynchronous detection and batch jobs so requests do not block;
  • Model inference layer: encapsulates the YOLO inference logic, decoupled from business code so model versions can be swapped easily;
  • Resource management layer: manages the ONNX Runtime core objects as singletons and throttles inference concurrency with a thread pool to avoid resource leaks.
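
One possible package layout that maps onto these layers (the package names mirror the classes used later in this article):

com.ai.yolo
├── controller   // API gateway layer: YoloDetectionController
├── config       // thread pool / cache configuration: AsyncConfig, CacheConfig
├── service      // business + inference layer: YoloInferService
└── manager      // resource management layer: YoloModelManager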

II. Environment Setup (Pitfalls to Avoid Up Front)

1. Core dependencies (pom.xml)

We combine Spring Boot, ONNX Runtime, and OpenCV; matching the versions precisely is the key:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.15</version> <!-- stable release; avoids compatibility issues with the newest versions -->
        <relativePath/>
    </parent>

    <groupId>com.ai</groupId>
    <artifactId>yolo-spring-boot-demo</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <!-- Spring Boot core -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Note: async support (@EnableAsync) already ships with spring-context via spring-boot-starter-web;
             there is no separate spring-boot-starter-async artifact -->

        <!-- ONNX Runtime Java (core: model inference) -->
        <dependency>
            <groupId>com.microsoft.onnxruntime</groupId>
            <artifactId>onnxruntime</artifactId>
            <version>1.14.1</version>
            <!-- the onnxruntime jar bundles native libraries for Windows/Linux/macOS, so no OS classifier is needed;
                 for CUDA use the onnxruntime_gpu artifact instead -->
        </dependency>

        <!-- OpenCV(图片处理) -->
        <dependency>
            <groupId>org.openpnp</groupId>
            <artifactId>opencv</artifactId>
            <version>4.7.0-0</version> <!-- compatible with the ONNX Runtime version above -->
        </dependency>

        <!-- Utility libraries -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson2</artifactId>
            <version>2.0.32</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
2. Model preparation

Export the YOLOv8 model to ONNX format ahead of time (a quick recap of the gotchas):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(
    format="onnx",
    imgsz=640,
    batch=1,
    opset=12,
    simplify=True,
    device="cpu",
    nms=False
)

Put the exported yolov8n.onnx under the project's resources/models directory.
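
If you want the model path to be configurable (and uploads of larger images to be accepted), a minimal application.yml could look like the sketch below. The yolo.model.path key matches the @Value used in YoloModelManager later in this article; the multipart limits are an assumption, not something the original setup requires:

yolo:
  model:
    path: models/yolov8n.onnx   # classpath-relative path to the exported model

spring:
  servlet:
    multipart:
      max-file-size: 20MB       # assumed upload limit for detection images
      max-request-size: 20MB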

III. Core Code (Layered Design)

1. Resource management layer: model singleton + thread pool configuration

Key pitfall: OrtEnvironment and OrtSession must not be created per request; keep them as singletons. Inference tasks should run through a thread pool that caps concurrency, so the CPU does not get saturated.

(1) Model singleton management
package com.ai.yolo.manager;

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtLoggingLevel;
import ai.onnxruntime.OrtSession;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import java.io.InputStream;

/**
 * YOLO model manager: manages the ONNX Runtime core objects as singletons
 */
@Slf4j
@Component
public class YoloModelManager {
    // Model path on the classpath (overridable via yolo.model.path)
    @Value("${yolo.model.path:models/yolov8n.onnx}")
    private String modelPath;

    // ONNX Runtime core objects (global singletons)
    private OrtEnvironment env;
    private OrtSession session;

    /**
     * Initialization: load the model when the application starts
     */
    @PostConstruct
    public void initModel() {
        try {
            // 1. Create the environment
            env = OrtEnvironment.getEnvironment();
            // 2. Build the session options (performance tuning + quieter logging)
            OrtSession.SessionOptions options = new OrtSession.SessionOptions();
            options.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR);
            // Use half of the CPU cores for intra-op parallelism to avoid starving the rest of the service
            options.setIntraOpNumThreads(Math.max(1, Runtime.getRuntime().availableProcessors() / 2));
            // 3. Load the model from the classpath (works both in the IDE and inside the packaged jar)
            try (InputStream modelStream = this.getClass().getClassLoader().getResourceAsStream(modelPath)) {
                if (modelStream == null) {
                    throw new IllegalStateException("Model not found on classpath: " + modelPath);
                }
                session = env.createSession(modelStream.readAllBytes(), options);
            }
            log.info("YOLO model loaded, input nodes: {}", session.getInputNames());
        } catch (Exception e) {
            log.error("Failed to load the YOLO model", e);
            throw new RuntimeException("Model initialization failed", e);
        }
    }

    /**
     * Destruction: release resources when the application shuts down
     */
    @PreDestroy
    public void destroyModel() {
        if (session != null) {
            session.close();
            log.info("OrtSession已关闭");
        }
        if (env != null) {
            env.close();
            log.info("OrtEnvironment已关闭");
        }
    }

    // Expose the session
    public OrtSession getSession() {
        return session;
    }

    // Expose the environment
    public OrtEnvironment getEnv() {
        return env;
    }
}
(2) Thread pool configuration (async inference)
package com.ai.yolo.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Async thread pool configuration: throttles the concurrency of inference tasks
 */
@Configuration
@EnableAsync
public class AsyncConfig {

    @Bean("yoloExecutor")
    public Executor yoloExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Core pool size = number of CPU cores (tune to your server)
        executor.setCorePoolSize(Runtime.getRuntime().availableProcessors());
        // Maximum pool size
        executor.setMaxPoolSize(Runtime.getRuntime().availableProcessors() * 2);
        // Queue capacity
        executor.setQueueCapacity(100);
        // Thread name prefix
        executor.setThreadNamePrefix("yolo-infer-");
        // Rejection policy: throw when the queue is full (or plug in a custom fallback)
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        // Initialize
        executor.initialize();
        return executor;
    }
}
2. Model inference layer: core YOLO inference logic

Encapsulates inference, tensor conversion, and NMS, decoupled from the business code:

package com.ai.yolo.service;

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtSession;
import com.ai.yolo.manager.YoloModelManager;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Core YOLO inference service: encapsulates the model inference logic
 */
@Slf4j
@Service
public class YoloInferService {
    @Autowired
    private YoloModelManager modelManager;

    // YOLOv8 parameters
    private static final int INPUT_SIZE = 640;
    private static final float CONF_THRESH = 0.25f;
    private static final float IOU_THRESH = 0.45f;
    private static final int NUM_CLASSES = 80; // 80 COCO classes

    /**
     * Core inference method for a single image
     * @param originalMat original image as an OpenCV Mat
     * @return detection results
     */
    public List<YoloBox> infer(Mat originalMat) {
        // 1. Convert the image to the input tensor and run the model; try-with-resources frees the native memory
        try (OnnxTensor inputTensor = imageToTensor(originalMat);
             OrtSession.Result result = modelManager.getSession().run(Map.of("images", inputTensor))) {
            // 2. Parse the output tensor: the raw YOLOv8 output has shape [1, 84, 8400]
            float[][] output = ((float[][][]) result.get(0).getValue())[0]; // [84][8400]
            // 3. Decode the detection boxes (coordinates mapped back to the original image)
            List<YoloBox> allBoxes = parseOutput(output, originalMat.width(), originalMat.height());
            // 4. Filter with NMS
            return yoloNMS(allBoxes, CONF_THRESH, IOU_THRESH);
        } catch (Exception e) {
            log.error("YOLO inference failed", e);
            throw new RuntimeException("Inference failed", e);
        }
    }

    /**
     * Convert an image into the YOLO input tensor (NCHW + normalization + BGR-to-RGB)
     */
    private OnnxTensor imageToTensor(Mat originalMat) throws Exception {
        Mat resizedMat = new Mat();
        org.opencv.imgproc.Imgproc.resize(originalMat, resizedMat, new org.opencv.core.Size(INPUT_SIZE, INPUT_SIZE));
        // BGR→RGB
        Mat rgbMat = new Mat();
        org.opencv.imgproc.Imgproc.cvtColor(resizedMat, rgbMat, org.opencv.imgproc.Imgproc.COLOR_BGR2RGB);
        // Normalize to 0-1 and convert to float32
        rgbMat.convertTo(rgbMat, org.opencv.core.CvType.CV_32FC3, 1.0 / 255.0);
        // NHWC→NCHW
        float[] tensorData = new float[1 * 3 * INPUT_SIZE * INPUT_SIZE];
        int index = 0;
        for (int c = 0; c < 3; c++) {
            for (int h = 0; h < INPUT_SIZE; h++) {
                for (int w = 0; w < INPUT_SIZE; w++) {
                    tensorData[index++] = (float) rgbMat.get(h, w)[c];
                }
            }
        }
        // Create the tensor (note: the per-pixel Mat.get loop above is simple but not the fastest option)
        long[] shape = new long[]{1, 3, INPUT_SIZE, INPUT_SIZE};
        FloatBuffer buffer = FloatBuffer.wrap(tensorData);
        return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
    }

    /**
     * Decode the YOLO output into detection boxes (coordinates restored to the original image)
     */
    private List<YoloBox> parseOutput(float[][] output, int originalW, int originalH) {
        List<YoloBox> boxes = new ArrayList<>();
        // output layout: [4 + NUM_CLASSES][numBoxes], i.e. [84][8400] for YOLOv8 at 640x640
        int numBoxes = output[0].length;

        for (int i = 0; i < numBoxes; i++) {
            // Box center and size in input-image pixels (cx, cy, w, h)
            float cx = output[0][i];
            float cy = output[1][i];
            float w = output[2][i];
            float h = output[3][i];
            // Convert to top-left / bottom-right corners and scale back to the original image size
            float x1 = (cx - w / 2) / INPUT_SIZE * originalW;
            float y1 = (cy - h / 2) / INPUT_SIZE * originalH;
            float x2 = (cx + w / 2) / INPUT_SIZE * originalW;
            float y2 = (cy + h / 2) / INPUT_SIZE * originalH;

            // Find the class with the highest confidence
            float maxConf = 0;
            int classId = -1;
            for (int c = 0; c < NUM_CLASSES; c++) {
                float conf = output[4 + c][i];
                if (conf > maxConf) {
                    maxConf = conf;
                    classId = c;
                }
            }

            if (maxConf > CONF_THRESH) {
                boxes.add(new YoloBox(x1, y1, x2, y2, maxConf, classId));
            }
        }
        return boxes;
    }

    /**
     * NMS (non-maximum suppression), applied per class
     */
    private List<YoloBox> yoloNMS(List<YoloBox> allBoxes, float confThresh, float iouThresh) {
        // Drop low-confidence boxes; collect into a mutable list so it can be sorted
        List<YoloBox> validBoxes = allBoxes.stream()
                .filter(b -> b.confidence > confThresh)
                .collect(Collectors.toCollection(ArrayList::new));
        // Sort by confidence, descending
        validBoxes.sort((a, b) -> Float.compare(b.confidence, a.confidence));
        // Core NMS loop
        List<YoloBox> result = new ArrayList<>();
        while (!validBoxes.isEmpty()) {
            YoloBox bestBox = validBoxes.get(0);
            result.add(bestBox);
            List<YoloBox> remaining = new ArrayList<>();
            for (int i = 1; i < validBoxes.size(); i++) {
                YoloBox currBox = validBoxes.get(i);
                if (currBox.classId != bestBox.classId || calculateIOU(bestBox, currBox) < iouThresh) {
                    remaining.add(currBox);
                }
            }
            validBoxes = remaining;
        }
        return result;
    }

    /**
     * Compute the IoU of two boxes
     */
    private float calculateIOU(YoloBox a, YoloBox b) {
        float interX1 = Math.max(a.x1, b.x1);
        float interY1 = Math.max(a.y1, b.y1);
        float interX2 = Math.min(a.x2, b.x2);
        float interY2 = Math.min(a.y2, b.y2);
        float interArea = Math.max(0, interX2 - interX1) * Math.max(0, interY2 - interY1);
        if (interArea == 0) return 0;
        float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
        float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
        return interArea / (areaA + areaB - interArea);
    }

    // Detection box entity class
    @lombok.Data
    public static class YoloBox {
        private float x1;
        private float y1;
        private float x2;
        private float y2;
        private float confidence;
        private int classId;

        public YoloBox(float x1, float y1, float x2, float y2, float confidence, int classId) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.confidence = confidence;
            this.classId = classId;
        }
    }
}
3. Business service layer: async processing + API endpoints

Expose both a synchronous and an asynchronous endpoint to cover different business scenarios:

package com.ai.yolo.controller;

import com.ai.yolo.service.YoloInferService;
import com.alibaba.fastjson2.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.imgcodecs.Imgcodecs;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * YOLO detection API layer
 */
@Slf4j
@RestController
public class YoloDetectionController {
    @Autowired
    private YoloInferService yoloInferService;

    /**
     * Synchronous detection endpoint: suited to small images and low-concurrency scenarios
     */
    @PostMapping("/api/yolo/detect/sync")
    public JSONObject detectSync(@RequestParam("file") MultipartFile file) {
        JSONObject result = new JSONObject();
        try {
            // 1. Decode the image
            byte[] bytes = file.getBytes();
            MatOfByte matOfByte = new MatOfByte(bytes);
            Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
            if (imageMat.empty()) {
                result.put("code", 500);
                result.put("msg", "图片解析失败");
                return result;
            }
            // 2. Run inference
            List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
            // 3. Build the response
            result.put("code", 200);
            result.put("msg", "success");
            result.put("data", boxes);
            return result;
        } catch (Exception e) {
            log.error("同步检测失败", e);
            result.put("code", 500);
            result.put("msg", "检测失败:" + e.getMessage());
            return result;
        }
    }

    /**
     * Asynchronous detection endpoint: suited to large images and high-concurrency scenarios.
     * The method body runs on the yoloExecutor pool; Spring MVC completes the HTTP response
     * once the returned CompletableFuture resolves.
     */
    @Async("yoloExecutor")
    @PostMapping("/api/yolo/detect/async")
    public CompletableFuture<JSONObject> detectAsync(@RequestParam("file") MultipartFile file) {
        JSONObject result = new JSONObject();
        try {
            byte[] bytes = file.getBytes();
            MatOfByte matOfByte = new MatOfByte(bytes);
            Mat imageMat = Imgcodecs.imdecode(matOfByte, Imgcodecs.IMREAD_COLOR);
            if (imageMat.empty()) {
                result.put("code", 500);
                result.put("msg", "图片解析失败");
                return CompletableFuture.completedFuture(result);
            }
            List<YoloInferService.YoloBox> boxes = yoloInferService.infer(imageMat);
            result.put("code", 200);
            result.put("msg", "success");
            result.put("data", boxes);
            return CompletableFuture.completedFuture(result);
        } catch (Exception e) {
            log.error("异步检测失败", e);
            result.put("code", 500);
            result.put("msg", "检测失败:" + e.getMessage());
            return CompletableFuture.completedFuture(result);
        }
    }
}
4. Application entry point + configuration
package com.ai.yolo;

import org.opencv.core.Core;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import javax.annotation.PostConstruct;

@SpringBootApplication
public class YoloSpringBootApplication {
    /**
     * Load the OpenCV native library (required before any OpenCV call)
     */
    @PostConstruct
    public void initOpenCV() {
        // org.openpnp:opencv bundles the native libraries inside the jar; load them through its helper
        // instead of System.loadLibrary, which would require java.library.path to be set manually
        nu.pattern.OpenCV.loadLocally();
        System.out.println("OpenCV loaded, version: " + Core.VERSION);
    }

    public static void main(String[] args) {
        SpringApplication.run(YoloSpringBootApplication.class, args);
    }
}

IV. Extensibility Optimizations (Must-Haves for Production)

1. Batch detection

Adapt the inference logic to accept a batch of images (this requires exporting an ONNX model with batch > 1 ahead of time):

// Batch tensor conversion
public OnnxTensor batchImagesToTensor(List<Mat> imageList) throws Exception {
    int batchSize = imageList.size();
    float[] tensorData = new float[batchSize * 3 * INPUT_SIZE * INPUT_SIZE];
    int batchIndex = 0;
    for (Mat mat : imageList) {
        // Per-image conversion (same as above), written into this image's batch offset
        // ...
        batchIndex++;
    }
    long[] shape = new long[]{batchSize, 3, INPUT_SIZE, INPUT_SIZE};
    FloatBuffer buffer = FloatBuffer.wrap(tensorData);
    return OnnxTensor.createTensor(modelManager.getEnv(), buffer, shape);
}
2. Result caching

Cache results for frequently re-detected images (e.g. surveillance frames of a fixed scene) to cut down on redundant inference:

// Enable Spring Cache backed by Redis (requires spring-boot-starter-data-redis and spring-boot-starter-cache)
@EnableCaching
@Configuration
public class CacheConfig {
    @Bean
    public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(RedisCacheConfiguration.defaultCacheConfig().entryTtl(Duration.ofMinutes(10)))
                .build();
    }
}

// Add caching to a wrapper around the inference method
// (note: @Cacheable only kicks in when the method is called through the Spring proxy, i.e. from another bean)
@Cacheable(value = "yolo_detect", key = "#imageMd5")
public List<YoloBox> inferWithCache(String imageMd5, Mat originalMat) {
    return infer(originalMat);
}
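
The imageMd5 cache key is left to the caller above. A minimal sketch for deriving it from the uploaded image bytes (the ImageHashUtil class and md5Hex method are illustrative names, not part of the code above):

import java.security.MessageDigest;

public final class ImageHashUtil {
    private ImageHashUtil() {}

    /** Hex-encoded MD5 of the raw image bytes, usable as the cache key. */
    public static String md5Hex(byte[] imageBytes) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

In the controller this would be called as inferWithCache(ImageHashUtil.md5Hex(bytes), imageMat), so identical uploads hit the cache instead of the model.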
3. Horizontal scaling

Combine with Spring Cloud / Netflix Eureka to deploy the microservice as a cluster and distribute requests through a load balancer:

  • Each node loads its own copy of the model, avoiding the thread-safety issues of sharing one model across nodes;
  • Put Nginx in front as the load balancer to distribute detection requests (see the sketch below).
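
A minimal Nginx sketch for that load-balancing setup, assuming two detection nodes on ports 8080 and 8081 (the addresses, ports, and upload limit are illustrative):

upstream yolo_detect {
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

server {
    listen 80;
    location /api/yolo/ {
        proxy_pass http://yolo_detect;
        client_max_body_size 20m;   # allow larger image uploads through the proxy
    }
}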
4. Monitoring and alerting

Integrate Spring Boot Actuator to monitor inference latency, thread pool state, and memory usage (a latency-timer sketch follows the configuration below):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Enable the monitoring endpoints in the configuration file:

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,threaddump
  metrics:
    tags:
      application: yolo-detect-service
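
Actuator pulls in Micrometer, so the inference-latency metric mentioned above can be recorded with a Timer. A minimal sketch (the YoloInferMetrics class and the metric name yolo.infer.time are choices made here, not part of the original code):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.function.Supplier;

@Component
public class YoloInferMetrics {
    private final Timer inferTimer;

    public YoloInferMetrics(MeterRegistry registry) {
        // Exposed at /actuator/metrics/yolo.infer.time
        this.inferTimer = registry.timer("yolo.infer.time");
    }

    /** Times a single inference call and records it to the registry. */
    public <T> T recordInfer(Supplier<T> inferCall) {
        return inferTimer.record(inferCall);
    }
}

In the controller, the call would then look something like yoloInferMetrics.recordInfer(() -> yoloInferService.infer(imageMat)).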

V. Testing and Verification

  1. Start the Spring Boot application and confirm that the model loads successfully;
  2. Call the endpoints with Postman:
    • Synchronous: POST http://localhost:8080/api/yolo/detect/sync with an image file attached;
    • Asynchronous: POST http://localhost:8080/api/yolo/detect/async with an image file attached;
  3. Sample response:
{
  "code": 200,
  "msg": "success",
  "data": [
    {
      "x1": 100.5,
      "y1": 80.2,
      "x2": 200.8,
      "y2": 180.5,
      "confidence": 0.95,
      "classId": 0
    }
  ]
}

Summary

The core of building a YOLO AI detection microservice on Spring Boot is layered decoupling + resource control + extensible design:

  1. Resource layer: manage the ONNX Runtime core objects as singletons and throttle concurrency with a thread pool to avoid resource leaks;
  2. Inference layer: encapsulate the core YOLO logic, decoupled from business code, so model versions are easy to swap;
  3. API layer: provide synchronous and asynchronous endpoints for different concurrency scenarios;
  4. Extension layer: support batch detection, caching, and cluster deployment to meet production-grade requirements.

This architecture keeps model inference stable while staying easy to extend, and can be dropped into real projects directly.
