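An Intelligent Article Summarization Tool Built with Spring Boot and the DeepSeek V3.2 API

1. Project Overview and Background

We encounter a large volume of online articles every day. Extracting the core content of an article quickly and storing it in a structured form is a real engineering problem. This article shows how to use Spring Boot and the DeepSeek V3.2 API to build a tool that fetches a web article, generates a summary, and returns the result as standard JSON.

DeepSeek V3.2 is an experimental model that DeepSeek released in September 2025. It introduces DeepSeek Sparse Attention (DSA), which markedly improves long-text processing efficiency. API prices were also cut by more than 50%, so developers can build capable AI applications at lower cost.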

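1.1 Core Features

  • Web crawler: fetches the page at a given URL and extracts the main text
  • Summarization: uses DeepSeek V3.2 to generate an article summary
  • JSON normalization: returns data in a uniform, clearly structured format
  • Format validation and retry: regenerates the output automatically when it does not match the required format
  • Local deployment: the application itself runs entirely in a local environment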

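2. Technical Architecture

2.1 Architecture Overview

The tool uses a layered architecture with a presentation layer, a business-logic layer, and a data-access layer. The presentation layer exposes RESTful endpoints; the business-logic layer handles crawling, AI calls, and data processing; the data-access layer handles communication with the DeepSeek API and local data storage.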

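2.2 Core Technology Choices

  • Spring Boot 3.x (3.2.0 in the pom.xml of section 3.2): the base framework, providing dependency injection and auto-configuration
  • Jsoup: HTML parsing and content extraction
  • Spring WebClient: HTTP communication with the DeepSeek API
  • Jackson: JSON serialization and deserialization
  • Validation API (Jakarta Bean Validation): request parameter validation
  • DeepSeek V3.2 API: generates the article summary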

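2.3 Data Flow

The data flows through the tool as follows: user submits a URL → crawler fetches the page → main text is extracted and cleaned → DeepSeek API is called → summary is generated → JSON format is validated → result is returned (with a retry if the format is invalid).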
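The validate-and-retry step at the end of this flow can be sketched without any framework. This is a minimal illustration, not the service shown later in section 4.4: the class name `SummaryPipelineSketch` and the `summarizer` and `isValidJson` parameters are hypothetical stand-ins for the API call and the format check.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical, framework-free sketch of the summarize -> validate -> retry loop. */
final class SummaryPipelineSketch {

    /**
     * Calls the summarizer until its output passes validation or attempts run out.
     * maxRetries counts the extra attempts made after the first one.
     */
    static String summarizeWithRetry(String articleText,
                                     Function<String, String> summarizer,
                                     Predicate<String> isValidJson,
                                     int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String candidate = summarizer.apply(articleText);
            if (isValidJson.test(candidate)) {
                return candidate; // first output that passes the format check wins
            }
        }
        throw new IllegalStateException(
                "No valid output after " + (maxRetries + 1) + " attempts");
    }
}
```

Passing the summarizer and validator in as functions keeps the loop testable without a network call.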

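3. Environment Setup and Dependencies

3.1 Development Environment

Before starting, prepare the following:

  • Java 17 or later
  • Maven 3.6.3+ or Gradle 7.5+
  • IntelliJ IDEA or Eclipse
  • A DeepSeek API account (to obtain an API key)

3.2 Maven Dependencies

Add the following to pom.xml: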

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    
    <groupId>com.example</groupId>
    <artifactId>deepseek-summarizer</artifactId>
    <version>1.0.0</version>
    <packaging>jar</packaging>
    
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0</version>
        <relativePath/>
    </parent>
    
    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <jsoup.version>1.17.1</jsoup.version>
    </properties>
    
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        
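        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>
        
        <!-- Provides WebClient and Reactor (Mono), which the DeepSeek client uses -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-webflux</artifactId>
        </dependency>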
        
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>${jsoup.version}</version>
        </dependency>
        
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

3.3 Configuration File

Configure the application parameters and the DeepSeek API settings in application.yml:

server:
  port: 8080
  servlet:
    context-path: /api

deepseek:
  api:
    key: ${DEEPSEEK_API_KEY:your_default_api_key_here}
    url: https://api.deepseek.com/chat/completions
    max-retries: 3
    timeout: 30000
    
app:
  crawler:
    timeout: 10000
    user-agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
  summary:
    max-length: 500
    temperature: 0.3
    
logging:
  level:
    com.example.deepseek: DEBUG
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} - %msg%n"

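3.4 DeepSeek API Configuration

To use the DeepSeek V3.2 API, first apply for an API key on the official DeepSeek platform. DeepSeek V3.2 offers a 128K context length and supports two inference modes, thinking and non-thinking; the deepseek-chat model name used in this article selects the non-thinking mode.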

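The table below lists the request parameters relevant to this tool, with the values this article uses as defaults:

Table: DeepSeek V3.2 API configuration parameters

Parameter     Type      Default         Description
model         String    deepseek-chat   Model to use
temperature   Float     0.3             Sampling randomness; lower values give more deterministic output
max_tokens    Integer   1000            Maximum number of tokens generated for the summary
top_p         Float     0.9             Nucleus-sampling threshold; controls output diversity
stream        Boolean   false           Whether to stream the output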
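To make the parameters concrete, here is a minimal sketch that assembles the same request with only the JDK's `java.net.http` API. It builds the request without sending it. The class and method names are hypothetical, the hand-rolled `jsonEscape` covers only the common cases, and the project itself uses WebClient and Jackson (section 4.3) rather than this approach.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical sketch: a chat-completions request built with the parameters from the table. */
final class DeepSeekRequestSketch {

    /** Escapes backslashes, quotes and common control characters for use inside a JSON string. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    /** Serializes the request body by hand; real code would let Jackson do this. */
    static String buildBody(String userPrompt) {
        return "{"
                + "\"model\":\"deepseek-chat\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(userPrompt) + "\"}],"
                + "\"temperature\":0.3,"
                + "\"max_tokens\":1000,"
                + "\"top_p\":0.9,"
                + "\"stream\":false"
                + "}";
    }

    /** Builds (but does not send) the POST request with the bearer-token header. */
    static HttpRequest buildRequest(String apiKey, String userPrompt) {
        return HttpRequest.newBuilder(URI.create("https://api.deepseek.com/chat/completions"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(userPrompt)))
                .build();
    }
}
```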

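4. Core Implementation

4.1 Web Crawler Module

The crawler module fetches the page at a given URL and extracts its main text. It is built on Jsoup, which parses HTML efficiently and makes it easy to extract the content we need.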

package com.example.deepseek.crawler;

import com.example.deepseek.model.CrawledContent;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

@Component
public class WebCrawlerService {
    
    private static final Logger logger = LoggerFactory.getLogger(WebCrawlerService.class);
    
    @Value("${app.crawler.timeout:10000}")
    private int timeout;
    
    @Value("${app.crawler.user-agent}")
    private String userAgent;
    
    public CrawledContent crawlWebPage(String url) {
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException("URL cannot be null or empty");
        }
        
        try {
            logger.info("Crawling URL: {}", url);
            
            // Connect with Jsoup and fetch the HTML document
            Document doc = Jsoup.connect(url)
                    .userAgent(userAgent)
                    .timeout(timeout)
                    .get();
            
            // Extract the page title
            String title = extractTitle(doc);
            
            // Extract the main text
            String content = extractMainContent(doc);
            
            // Extract metadata
            String description = extractMetaDescription(doc);
            String author = extractAuthor(doc);
            
            logger.debug("Successfully crawled page: {}, title: {}, content length: {}", 
                        url, title, content.length());
            
            return new CrawledContent(title, content, description, author, url);
            
        } catch (IOException e) {
            logger.error("Error crawling URL: {}", url, e);
            throw new RuntimeException("Failed to crawl URL: " + url, e);
        }
    }
    
    private String extractTitle(Document doc) {
        String title = doc.title();
        return title != null ? title.trim() : "";
    }
    
    private String extractMainContent(Document doc) {
        // Strategy 1: look for an <article> element
        Elements articleElements = doc.select("article");
        if (!articleElements.isEmpty()) {
            return articleElements.first().text();
        }
        
        // Strategy 2: look for a <main> element
        Elements mainElements = doc.select("main");
        if (!mainElements.isEmpty()) {
            return mainElements.first().text();
        }
        
        // Strategy 3: look for divs that contain a lot of text
        List<Element> contentCandidates = new ArrayList<>();
        Elements divElements = doc.select("div");
        
        for (Element div : divElements) {
            int textLength = div.text().length();
            int pCount = div.select("p").size();
            
            // A div with several paragraphs or long text counts as a main-content candidate
            if (pCount >= 3 || textLength > 500) {
                contentCandidates.add(div);
            }
        }
        
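        // Pick the candidate with the longest text. This is a coarse heuristic:
        // an outer wrapper div always wins on raw length over the divs nested inside it.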
        if (!contentCandidates.isEmpty()) {
            Element bestCandidate = contentCandidates.stream()
                    .max((e1, e2) -> Integer.compare(e1.text().length(), e2.text().length()))
                    .orElse(null);
            
            if (bestCandidate != null) {
                return bestCandidate.text();
            }
        }
        
        // Strategy 4: fall back to the text of the whole body
        return doc.body().text();
    }
    
    private String extractMetaDescription(Document doc) {
        Element meta = doc.selectFirst("meta[name=description]");
        return meta != null ? meta.attr("content") : "";
    }
    
    private String extractAuthor(Document doc) {
        // Try several likely author selectors
        String[] authorSelectors = {
            "meta[name=author]",
            ".author", 
            ".byline",
            "[class*=author]",
            "[class*=byline]"
        };
        
        for (String selector : authorSelectors) {
            Element element = doc.selectFirst(selector);
            if (element != null) {
                String author = element.tagName().equals("meta") ? 
                    element.attr("content") : element.text();
                if (!author.trim().isEmpty()) {
                    return author.trim();
                }
            }
        }
        
        return "";
    }
}
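The data flow in section 2.3 mentions cleaning the extracted text, but the crawler above returns it as extracted. A minimal cleaning step might look like the sketch below. The class and method names are hypothetical; the `truncate` helper is relevant because the prompt builder in section 4.3 caps the content at 10,000 characters with a plain `substring`, which can split a surrogate pair.

```java
/** Hypothetical helper for the "clean the main text" step of the data flow. */
final class TextCleanerSketch {

    /** Collapses whitespace runs (including non-breaking and ideographic spaces), then trims. */
    static String normalizeWhitespace(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replace('\u00A0', ' ')
                  .replace('\u3000', ' ')
                  .replaceAll("\\s+", " ")
                  .trim();
    }

    /** Truncates to at most maxChars UTF-16 units without splitting a surrogate pair. */
    static String truncate(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--; // do not cut between the two halves of a surrogate pair
        }
        return text.substring(0, end);
    }
}
```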

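4.2 Data Models

Define the data models that hold the crawled content and the summary result. The classes are listed together for brevity; because each one is public, each belongs in its own .java file in the same package.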

package com.example.deepseek.model;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

import java.util.List;

@JsonInclude(JsonInclude.Include.NON_NULL)
public class CrawledContent {
    private final String title;
    private final String content;
    private final String description;
    private final String author;
    private final String url;
    
    public CrawledContent(String title, String content, String description, String author, String url) {
        this.title = title;
        this.content = content;
        this.description = description;
        this.author = author;
        this.url = url;
    }
    
    // Getters
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public String getDescription() { return description; }
    public String getAuthor() { return author; }
    public String getUrl() { return url; }
}

@JsonInclude(JsonInclude.Include.NON_NULL)
public class ArticleSummary {
    @JsonProperty("title")
    private String title;
    
    @JsonProperty("author")
    private String author;
    
    @JsonProperty("publish_date")
    private String publishDate;
    
    @JsonProperty("summary")
    private String summary;
    
    @JsonProperty("key_points")
    private List<String> keyPoints;
    
    @JsonProperty("tags")
    private List<String> tags;
    
    @JsonProperty("sentiment")
    private String sentiment;
    
    @JsonProperty("read_time")
    private Integer readTime;
    
    // Constructors, getters and setters
    public ArticleSummary() {}
    
    public ArticleSummary(String title, String author, String summary, 
                         List<String> keyPoints, List<String> tags) {
        this.title = title;
        this.author = author;
        this.summary = summary;
        this.keyPoints = keyPoints;
        this.tags = tags;
    }
    
    // Getters and setters omitted
}

@JsonInclude(JsonInclude.Include.NON_NULL)
public class DeepSeekApiRequest {
    private String model;
    private List<Message> messages;
    private double temperature;
    private int max_tokens;
    private boolean stream;
    
    public DeepSeekApiRequest() {}
    
    public DeepSeekApiRequest(String model, List<Message> messages, 
                             double temperature, int max_tokens) {
        this.model = model;
        this.messages = messages;
        this.temperature = temperature;
        this.max_tokens = max_tokens;
        this.stream = false;
    }
    
    // Static inner class for messages
    public static class Message {
        private String role;
        private String content;
        
        public Message() {}
        
        public Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
        
        // Getters and setters
        public String getRole() { return role; }
        public void setRole(String role) { this.role = role; }
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }
    
    // Getters and setters
    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }
    public List<Message> getMessages() { return messages; }
    public void setMessages(List<Message> messages) { this.messages = messages; }
    public double getTemperature() { return temperature; }
    public void setTemperature(double temperature) { this.temperature = temperature; }
    public int getMax_tokens() { return max_tokens; }
    public void setMax_tokens(int max_tokens) { this.max_tokens = max_tokens; }
    public boolean isStream() { return stream; }
    public void setStream(boolean stream) { this.stream = stream; }
}

@JsonInclude(JsonInclude.Include.NON_NULL)
public class DeepSeekApiResponse {
    private String id;
    private String object;
    private long created;
    private String model;
    private List<Choice> choices;
    private Usage usage;
    
    public DeepSeekApiResponse() {}
    
    // Static inner classes
    public static class Choice {
        private int index;
        private Message message;
        private String finish_reason;
        
        // Getters and setters
        public int getIndex() { return index; }
        public void setIndex(int index) { this.index = index; }
        public Message getMessage() { return message; }
        public void setMessage(Message message) { this.message = message; }
        public String getFinish_reason() { return finish_reason; }
        public void setFinish_reason(String finish_reason) { this.finish_reason = finish_reason; }
    }
    
    public static class Message {
        private String role;
        private String content;
        
        // Getters and setters
        public String getRole() { return role; }
        public void setRole(String role) { this.role = role; }
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }
    
    public static class Usage {
        private int prompt_tokens;
        private int completion_tokens;
        private int total_tokens;
        
        // Getters and setters
        public int getPrompt_tokens() { return prompt_tokens; }
        public void setPrompt_tokens(int prompt_tokens) { this.prompt_tokens = prompt_tokens; }
        public int getCompletion_tokens() { return completion_tokens; }
        public void setCompletion_tokens(int completion_tokens) { this.completion_tokens = completion_tokens; }
        public int getTotal_tokens() { return total_tokens; }
        public void setTotal_tokens(int total_tokens) { this.total_tokens = total_tokens; }
    }
    
    // Getters and setters
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getObject() { return object; }
    public void setObject(String object) { this.object = object; }
    public long getCreated() { return created; }
    public void setCreated(long created) { this.created = created; }
    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }
    public List<Choice> getChoices() { return choices; }
    public void setChoices(List<Choice> choices) { this.choices = choices; }
    public Usage getUsage() { return usage; }
    public void setUsage(Usage usage) { this.usage = usage; }
}

4.3 DeepSeek API Client

Implement the client that talks to the DeepSeek V3.2 API:

package com.example.deepseek.client;

import com.example.deepseek.model.DeepSeekApiRequest;
import com.example.deepseek.model.DeepSeekApiResponse;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;

import java.util.Collections;

@Component
public class DeepSeekApiClient {
    
    private static final Logger logger = LoggerFactory.getLogger(DeepSeekApiClient.class);
    
    private final WebClient webClient;
    private final ObjectMapper objectMapper;
    private final String apiUrl;
    private final int timeout;
    
    // Configuration values are injected as constructor parameters: @Value fields are only
    // populated after the constructor returns, so they would still be null at this point.
    public DeepSeekApiClient(WebClient.Builder webClientBuilder,
                             ObjectMapper objectMapper,
                             @Value("${deepseek.api.key}") String apiKey,
                             @Value("${deepseek.api.url}") String apiUrl,
                             @Value("${deepseek.api.timeout:30000}") int timeout) {
        this.apiUrl = apiUrl;
        this.timeout = timeout;
        this.webClient = webClientBuilder
                .baseUrl(apiUrl)
                .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + apiKey)
                .build();
        this.objectMapper = objectMapper;
    }
    
    public Mono<DeepSeekApiResponse> generateSummary(DeepSeekApiRequest request) {
        logger.debug("Sending request to DeepSeek API with model: {}", request.getModel());
        
        return webClient.post()
                .uri(apiUrl)
                .bodyValue(request)
                .retrieve()
                .bodyToMono(DeepSeekApiResponse.class)
                .timeout(java.time.Duration.ofMillis(timeout))
                .doOnSuccess(response -> {
                    logger.debug("Successfully received response from DeepSeek API");
                    if (response.getUsage() != null) {
                        logger.info("Token usage - Prompt: {}, Completion: {}, Total: {}", 
                                response.getUsage().getPrompt_tokens(),
                                response.getUsage().getCompletion_tokens(),
                                response.getUsage().getTotal_tokens());
                    }
                })
                .doOnError(error -> {
                    logger.error("Error calling DeepSeek API", error);
                    if (error instanceof WebClientResponseException) {
                        WebClientResponseException ex = (WebClientResponseException) error;
                        logger.error("API Response status: {}, body: {}", 
                                ex.getStatusCode(), ex.getResponseBodyAsString());
                    }
                })
                // Retry up to 3 times with exponential backoff (500 ms initial, 2 s cap, 50% jitter)
                .retryWhen(reactor.util.retry.Retry
                        .backoff(3, java.time.Duration.ofMillis(500))
                        .maxBackoff(java.time.Duration.ofSeconds(2))
                        .jitter(0.5));
    }
    
    public String createSummaryPrompt(String title, String content, int maxLength) {
        return String.format(
            "请为以下文章生成一个结构化的摘要。\n\n" +
            "文章标题: %s\n\n" +
            "文章内容: %s\n\n" +
            "要求:\n" +
            "1. 生成一个简洁的摘要,长度不超过%d个字符\n" +
            "2. 提取3-5个关键要点\n" +
            "3. 识别文章的情感倾向(积极、消极或中性)\n" +
            "4. 估算阅读时间(以分钟计)\n" +
            "5. 提取3-5个相关标签\n\n" +
            "请以JSON格式返回结果,包含以下字段:\n" +
            "- title: 文章标题\n" +
            "- author: 作者(如果文中提及)\n" + 
            "- publish_date: 发布日期(如果文中提及)\n" +
            "- summary: 摘要内容\n" +
            "- key_points: 关键要点数组\n" +
            "- tags: 标签数组\n" +
            "- sentiment: 情感倾向\n" +
            "- read_time: 阅读时间(分钟)\n\n" +
            "请确保返回纯JSON格式,不要包含其他文本。",
            title, content.length() > 10000 ? content.substring(0, 10000) + "..." : content, 
            maxLength
        );
    }
}

4.4 Summary Service and Format Validation

Implement the core summary service, including format validation and the retry mechanism:

package com.example.deepseek.service;

import com.example.deepseek.client.DeepSeekApiClient;
import com.example.deepseek.model.*;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;

import java.util.Arrays;
import java.util.List;

@Service
public class ArticleSummaryService {
    
    private static final Logger logger = LoggerFactory.getLogger(ArticleSummaryService.class);
    
    private final DeepSeekApiClient apiClient;
    private final ObjectMapper objectMapper;
    
    @Value("${app.summary.max-length:500}")
    private int maxSummaryLength;
    
    @Value("${app.summary.temperature:0.3}")
    private double temperature;
    
    @Value("${deepseek.api.max-retries:3}")
    private int maxRetries;
    
    public ArticleSummaryService(DeepSeekApiClient apiClient, ObjectMapper objectMapper) {
        this.apiClient = apiClient;
        this.objectMapper = objectMapper;
    }
    
    public Mono<ArticleSummary> generateSummary(CrawledContent content) {
        return generateSummaryWithRetry(content, maxRetries);
    }
    
    private Mono<ArticleSummary> generateSummaryWithRetry(CrawledContent content, int retriesLeft) {
        String prompt = apiClient.createSummaryPrompt(
            content.getTitle(), 
            content.getContent(), 
            maxSummaryLength
        );
        
        DeepSeekApiRequest request = new DeepSeekApiRequest();
        request.setModel("deepseek-chat");
        request.setMessages(Arrays.asList(
            new DeepSeekApiRequest.Message("system", "你是一个专业的文章摘要生成器,总是返回有效的JSON格式。"),
            new DeepSeekApiRequest.Message("user", prompt)
        ));
        request.setTemperature(temperature);
        request.setMax_tokens(1000);
        
        return apiClient.generateSummary(request)
                .flatMap(response -> {
                    if (response.getChoices() != null && !response.getChoices().isEmpty()) {
                        String summaryText = response.getChoices().get(0).getMessage().getContent();
                        return parseAndValidateSummary(summaryText, content);
                    } else {
                        return Mono.error(new RuntimeException("No choices in API response"));
                    }
                })
                .onErrorResume(error -> {
                    logger.warn("Error generating summary (retries left: {}): {}", 
                               retriesLeft, error.getMessage());
                    
                    if (retriesLeft > 0) {
                        return generateSummaryWithRetry(content, retriesLeft - 1);
                    } else {
                        return Mono.error(new RuntimeException(
                            "Failed to generate valid summary after " + maxRetries + " attempts", error));
                    }
                });
    }
    
    private Mono<ArticleSummary> parseAndValidateSummary(String summaryText, CrawledContent content) {
        try {
            // First try to parse the text directly as JSON
            JsonNode jsonNode = objectMapper.readTree(summaryText);
            
            // Check the required fields
            if (!jsonNode.has("summary") || !jsonNode.has("key_points")) {
                throw new IllegalArgumentException("Missing required fields in summary JSON");
            }
            
            // Convert to an ArticleSummary object
            ArticleSummary summary = objectMapper.treeToValue(jsonNode, ArticleSummary.class);
            
            // Fall back to the crawled title and author when the model left them out
            if (summary.getTitle() == null || summary.getTitle().isEmpty()) {
                summary.setTitle(content.getTitle());
            }
            
            if ((summary.getAuthor() == null || summary.getAuthor().isEmpty()) && 
                !content.getAuthor().isEmpty()) {
                summary.setAuthor(content.getAuthor());
            }
            
            // Enforce the maximum summary length
            if (summary.getSummary().length() > maxSummaryLength) {
                summary.setSummary(summary.getSummary().substring(0, maxSummaryLength) + "...");
            }
            
            logger.info("Successfully generated and validated summary for: {}", content.getTitle());
            return Mono.just(summary);
            
        } catch (JsonProcessingException e) {
            logger.warn("Failed to parse summary as JSON, attempting to extract JSON from text");
            
            // Try to extract a JSON object from the surrounding text
            String jsonStr = extractJsonFromText(summaryText);
            if (jsonStr != null) {
                return parseAndValidateSummary(jsonStr, content);
            }
            
            return Mono.error(new IllegalArgumentException(
                "Generated summary is not valid JSON and no JSON could be extracted"));
        } catch (Exception e) {
            return Mono.error(new IllegalArgumentException(
                "Failed to validate summary: " + e.getMessage()));
        }
    }
    
    private String extractJsonFromText(String text) {
        // Locate the first '{' and the last '}' in the text
        int start = text.indexOf('{');
        int end = text.lastIndexOf('}');
        
        if (start >= 0 && end > start) {
            String potentialJson = text.substring(start, end + 1);
            try {
                objectMapper.readTree(potentialJson);
                return potentialJson;
            } catch (JsonProcessingException e) {
                // Not valid JSON; no other position is tried, so fall through and return null
                logger.debug("Extracted text is not valid JSON, trying other approaches");
            }
        }
        
        return null;
    }
    
    public boolean validateSummaryFormat(ArticleSummary summary) {
        if (summary == null) {
            return false;
        }
        
        if (summary.getSummary() == null || summary.getSummary().trim().isEmpty()) {
            return false;
        }
        
        if (summary.getKeyPoints() == null || summary.getKeyPoints().isEmpty()) {
            return false;
        }
        
        if (summary.getTags() == null || summary.getTags().isEmpty()) {
            return false;
        }
        
        // Validate the sentiment label
        if (summary.getSentiment() != null) {
            List<String> validSentiments = Arrays.asList("积极", "消极", "中性");
            if (!validSentiments.contains(summary.getSentiment())) {
                return false;
            }
        }
        
        // Validate the reading time
        if (summary.getReadTime() != null && summary.getReadTime() <= 0) {
            return false;
        }
        
        return true;
    }
}
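Chat models often wrap JSON in a Markdown code fence even when asked for plain JSON. The brace-scan in `extractJsonFromText` already copes with that; the sketch below makes the fence handling explicit and falls back to the same brace-scan. It uses only `java.util.regex`, and the class and method names are hypothetical. The pattern is written with a backtick quantifier so the listing contains no literal fence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls a JSON candidate out of a model reply. */
final class JsonReplySketch {

    // Three backticks, an optional "json" tag, the payload (captured), three closing backticks.
    private static final Pattern FENCE = Pattern.compile(
            "`{3}(?:json)?\\s*(.*?)\\s*`{3}",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the fenced payload if present, else the outermost {...} span, else null. */
    static String extractJsonCandidate(String reply) {
        if (reply == null) {
            return null;
        }
        Matcher m = FENCE.matcher(reply);
        if (m.find()) {
            return m.group(1);
        }
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        return (start >= 0 && end > start) ? reply.substring(start, end + 1) : null;
    }
}
```

The returned string is only a candidate; it still needs to go through `objectMapper.readTree` and the required-field checks before it is trusted.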

4.5 REST Controller

Expose the API endpoints for external callers:

package com.example.deepseek.controller;

import com.example.deepseek.model.ArticleSummary;
import com.example.deepseek.model.CrawledContent;
import com.example.deepseek.service.ArticleSummaryService;
import com.example.deepseek.crawler.WebCrawlerService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Mono;

import jakarta.validation.Valid;
import jakarta.validation.constraints.NotBlank;

@RestController
@RequestMapping("/v1/summary")
public class SummaryController {
    
    private static final Logger logger = LoggerFactory.getLogger(SummaryController.class);
    
    private final WebCrawlerService crawlerService;
    private final ArticleSummaryService summaryService;
    
    public SummaryController(WebCrawlerService crawlerService, 
                           ArticleSummaryService summaryService) {
        this.crawlerService = crawlerService;
        this.summaryService = summaryService;
    }
    
    @PostMapping("/from-url")
    public Mono<ResponseEntity<ArticleSummary>> generateSummaryFromUrl(
            @Valid @RequestBody SummaryRequest request) {
        
        logger.info("Received summary request for URL: {}", request.getUrl());
        
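        // Jsoup fetches synchronously, so run the crawl on a scheduler meant for blocking work
        return Mono.fromCallable(() -> crawlerService.crawlWebPage(request.getUrl()))
                .subscribeOn(reactor.core.scheduler.Schedulers.boundedElastic())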
                .flatMap(crawledContent -> summaryService.generateSummary(crawledContent))
                .map(summary -> ResponseEntity.ok(summary))
                .onErrorResume(error -> {
                    logger.error("Error processing summary request for URL: {}", 
                                request.getUrl(), error);
                    return Mono.just(ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                            .body(new ArticleSummary()));
                });
    }
    
    @GetMapping("/health")
    public ResponseEntity<HealthResponse> healthCheck() {
        return ResponseEntity.ok(new HealthResponse("Service is healthy", System.currentTimeMillis()));
    }
    
    // Request and response DTOs
    public static class SummaryRequest {
        @NotBlank(message = "URL is required")
        private String url;
        
        public String getUrl() { return url; }
        public void setUrl(String url) { this.url = url; }
    }
    
    public static class HealthResponse {
        private final String status;
        private final long timestamp;
        
        public HealthResponse(String status, long timestamp) {
            this.status = status;
            this.timestamp = timestamp;
        }
        
        public String getStatus() { return status; }
        public long getTimestamp() { return timestamp; }
    }
}
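With the configuration from section 3.3 (port 8080, context path /api), the summary endpoint resolves to /api/v1/summary/from-url. The sketch below builds a matching request with the JDK HTTP client; the class name and the `baseUrl` parameter are hypothetical, and the URL escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch for calling the summary endpoint. */
final class SummaryEndpointRequestSketch {

    /** Builds (but does not send) a POST request whose body is {"url": "<articleUrl>"}. */
    static HttpRequest build(String baseUrl, String articleUrl) {
        String escaped = articleUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"url\":\"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/summary/from-url"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

To send it against a running instance, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.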

5. Advanced Features and Optimization

5.1 Caching

To improve performance and reduce API costs, add a cache:

package com.example.deepseek.cache;

import com.example.deepseek.model.ArticleSummary;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

@Component
public class SummaryCache {
    
    private final ConcurrentMap<String, CacheEntry> cache;
    
    @Value("${app.cache.ttl:3600000}") // default TTL: 1 hour
    private long defaultTtl;
    
    @Value("${app.cache.max-size:1000}")
    private int maxSize;
    
    public SummaryCache() {
        this.cache = new ConcurrentHashMap<>();
    }
    
    public void put(String url, ArticleSummary summary) {
        put(url, summary, defaultTtl);
    }
    
    public void put(String url, ArticleSummary summary, long ttl) {
        if (cache.size() >= maxSize) {
            // Not true LRU: drop expired entries first, then evict one more entry if still full
            evictExpiredEntries();
            if (cache.size() >= maxSize) {
                removeOldestEntry();
            }
        }
        
        long expiryTime = System.currentTimeMillis() + ttl;
        cache.put(url, new CacheEntry(summary, expiryTime));
    }
    
    public ArticleSummary get(String url) {
        CacheEntry entry = cache.get(url);
        if (entry != null) {
            if (entry.isExpired()) {
                cache.remove(url);
                return null;
            }
            return entry.getSummary();
        }
        return null;
    }
    
    private void evictExpiredEntries() {
        long now = System.currentTimeMillis();
        cache.entrySet().removeIf(entry -> entry.getValue().isExpired(now));
    }
    
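    private void removeOldestEntry() {
        // Simplified: removes an arbitrary entry (map iteration order is unspecified),
        // not necessarily the oldest one. ConcurrentMap has no keys() method, so use keySet().
        java.util.Iterator<String> it = cache.keySet().iterator();
        if (it.hasNext()) {
            it.next();
            it.remove();
        }
    }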
    
    private static class CacheEntry {
        private final ArticleSummary summary;
        private final long expiryTime;
        
        public CacheEntry(ArticleSummary summary, long expiryTime) {
            this.summary = summary;
            this.expiryTime = expiryTime;
        }
        
        public ArticleSummary getSummary() { return summary; }
        
        public boolean isExpired() {
            return isExpired(System.currentTimeMillis());
        }
        
        public boolean isExpired(long currentTime) {
            return currentTime > expiryTime;
        }
    }
}
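The eviction in `SummaryCache` removes an arbitrary entry when the cache is full. If least-recently-used eviction is wanted, the JDK's `LinkedHashMap` supports it directly through its access-order mode. The sketch below is an illustration under that assumption, with a hypothetical class name; it has no TTL handling and is not thread-safe on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU map built on LinkedHashMap's access-order mode. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    LruCacheSketch(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry once over capacity
    }
}
```

For concurrent use, wrap an instance with `Collections.synchronizedMap`, since in access-order mode even `get` modifies the map.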

5.2 Error Handling and Monitoring

Implement global exception handling and request monitoring:

package com.example.deepseek.config;

import com.example.deepseek.model.ErrorResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import org.springframework.web.bind.support.WebExchangeBindException;

@RestControllerAdvice
public class GlobalExceptionHandler {
    
    private static final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
    
    @ExceptionHandler(WebExchangeBindException.class)
    public ResponseEntity<ErrorResponse> handleValidationException(WebExchangeBindException ex) {
        String errorMessage = ex.getFieldErrors().stream()
                .map(error -> error.getField() + ": " + error.getDefaultMessage())
                .findFirst()
                .orElse("Validation error");
        
        ErrorResponse errorResponse = new ErrorResponse(
            "VALIDATION_ERROR", 
            errorMessage,
            System.currentTimeMillis()
        );
        
        return ResponseEntity.badRequest().body(errorResponse);
    }
    
    @ExceptionHandler(RuntimeException.class)
    public ResponseEntity<ErrorResponse> handleRuntimeException(RuntimeException ex) {
        logger.error("Runtime exception occurred", ex);
        
        ErrorResponse errorResponse = new ErrorResponse(
            "INTERNAL_ERROR",
            "An internal server error occurred",
            System.currentTimeMillis()
        );
        
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(errorResponse);
    }
    
    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleGenericException(Exception ex) {
        logger.error("Unexpected exception occurred", ex);
        
        ErrorResponse errorResponse = new ErrorResponse(
            "UNKNOWN_ERROR",
            "An unexpected error occurred",
            System.currentTimeMillis()
        );
        
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(errorResponse);
    }
}

6. Deployment and Usage

6.1 Local Deployment

Create the Spring Boot application main class:

package com.example.deepseek;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cache.annotation.EnableCaching;

@SpringBootApplication
@EnableCaching
public class DeepSeekSummaryApplication {
    
    public static void main(String[] args) {
        SpringApplication.run(DeepSeekSummaryApplication.class, args);
    }
}

6.2 Application Configuration

Create the application configuration (JSON serialization and CORS):

package com.example.deepseek.config;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder;
import org.springframework.web.cors.CorsConfiguration;
import org.springframework.web.cors.reactive.CorsWebFilter;
import org.springframework.web.cors.reactive.UrlBasedCorsConfigurationSource;

import java.util.Arrays;

@Configuration
public class AppConfig {
    
    @Bean
    public ObjectMapper objectMapper() {
        return Jackson2ObjectMapperBuilder.json()
                .featuresToDisable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
                .build();
    }
    
    @Bean
    public CorsWebFilter corsFilter() {
        CorsConfiguration config = new CorsConfiguration();
        config.setAllowCredentials(true);
        config.setAllowedOriginPatterns(Arrays.asList("*"));
        config.setAllowedHeaders(Arrays.asList("*"));
        config.setAllowedMethods(Arrays.asList("GET", "POST", "PUT", "DELETE", "OPTIONS"));
        
        UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
        source.registerCorsConfiguration("/**", config);
        
        return new CorsWebFilter(source);
    }
}

6.3 Usage Examples

Use the API service as follows:

# Start the application
mvn spring-boot:run

# Check the health endpoint
curl http://localhost:8080/api/v1/summary/health

# Generate an article summary
curl -X POST http://localhost:8080/api/v1/summary/from-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'
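
The same POST request can also be issued from Java. The sketch below builds it with the JDK's `java.net.http` API; `SummaryClientSketch`, `buildFromUrlRequest`, and `escapeJson` are hypothetical helper names, the 60-second timeout is an assumed value, and only the endpoint path and JSON body shape come from the curl example above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical client helper mirroring the curl call above. */
final class SummaryClientSketch {

    /** Builds the POST request; baseUrl is for example http://localhost:8080. */
    static HttpRequest buildFromUrlRequest(String baseUrl, String articleUrl) {
        String body = "{\"url\": \"" + escapeJson(articleUrl) + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/summary/from-url"))
                .timeout(Duration.ofSeconds(60)) // assumed value; summarization can be slow
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal escaping for backslashes and double quotes inside a JSON string. */
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and read the JSON from the response body.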

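7. Performance Optimization and Best Practices

7.1 Performance Optimization Strategies

Given the characteristics of DeepSeek V3.2, we apply the following performance optimizations:

  1. Request batching: process multiple URLs in a batch to reduce the number of API calls
  2. Content truncation: truncate overly long articles so they stay within the model's context limit
  3. Cache optimization: use a multi-level caching strategy to avoid repeated computation
  4. Connection pool management: tune the HTTP connection pool to improve concurrent throughput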
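
As an illustration of the content truncation item, the sketch below cuts text to a character budget and prefers to end on a sentence boundary. It is one possible approach rather than the article's implementation: `ContentTruncator` is a hypothetical name, and the budget is counted in characters as a rough stand-in for tokens, which the model actually counts.

```java
/** Hypothetical helper: truncate article text at a sentence boundary. */
final class ContentTruncator {
    // Chinese and Latin sentence terminators (full-width ones written as Unicode escapes)
    private static final String SENTENCE_ENDS = "\u3002\uFF01\uFF1F.!?";

    /**
     * Returns text cut to at most maxChars characters. If a sentence end exists in the
     * second half of the budget, the cut is made just after it; otherwise it is a hard cut.
     */
    static String truncate(String text, int maxChars) {
        if (text == null || text.length() <= maxChars) {
            return text;
        }
        String head = text.substring(0, maxChars);
        for (int i = head.length() - 1; i >= maxChars / 2; i--) {
            if (SENTENCE_ENDS.indexOf(head.charAt(i)) >= 0) {
                return head.substring(0, i + 1);
            }
        }
        return head; // no sentence boundary found in range
    }
}
```

Searching only the second half of the budget keeps a stray early period from discarding most of the allowed content.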

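7.2 Cost Control

DeepSeek V3.2 API pricing is substantially lower, but cost still needs attention:

Table: DeepSeek V3.2 API price comparison (USD per million tokens)

Billing type                V3.1 price   V3.2 price   Change
Input tokens (cache miss)   $0.56        $0.28        50% lower
Output tokens               $1.68        $0.42        75% lower
Input tokens (cache hit)    $0.07        $0.028       60% lower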
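
To make the pricing concrete, the following sketch estimates the cost of a request from its token counts using the V3.2 prices in the table. `CostEstimator` is a hypothetical name, and the token counts in the example afterwards are assumed for illustration, not measured.

```java
/** Hypothetical cost estimator based on the V3.2 per-million-token prices in the table. */
final class CostEstimator {
    static final double INPUT_PER_M = 0.28;         // USD, uncached input tokens
    static final double CACHED_INPUT_PER_M = 0.028; // USD, cached input tokens
    static final double OUTPUT_PER_M = 0.42;        // USD, output tokens

    /** Returns the estimated cost in USD for the given token counts. */
    static double estimateUsd(long inputTokens, long cachedInputTokens, long outputTokens) {
        return (inputTokens * INPUT_PER_M
                + cachedInputTokens * CACHED_INPUT_PER_M
                + outputTokens * OUTPUT_PER_M) / 1_000_000.0;
    }
}
```

For an assumed article of 4,000 input tokens and a 500-token summary, this gives (4,000 × 0.28 + 500 × 0.42) / 1,000,000 ≈ $0.00133 per request, or roughly $1.33 per 1,000 articles.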

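7.3 Monitoring and Logging

Implement monitoring and logging; the collector below records success, error, and retry counts plus generation latency via Micrometer: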

package com.example.deepseek.monitoring;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class MetricsCollector {
    
    private final Counter successCounter;
    private final Counter errorCounter;
    private final Counter retryCounter;
    private final Timer summaryTimer;
    
    public MetricsCollector(MeterRegistry registry) {
        this.successCounter = registry.counter("summary.generate.success");
        this.errorCounter = registry.counter("summary.generate.errors");
        this.retryCounter = registry.counter("summary.generate.retries");
        this.summaryTimer = registry.timer("summary.generate.duration");
    }
    
    public void recordSuccess() {
        successCounter.increment();
    }
    
    public void recordError() {
        errorCounter.increment();
    }
    
    public void recordRetry() {
        retryCounter.increment();
    }
    
    public Timer.Sample startTimer() {
        return Timer.start();
    }
    
    public void stopTimer(Timer.Sample sample) {
        sample.stop(summaryTimer);
    }
}

8. Testing Strategy

8.1 Unit Tests

Write unit tests for the key components:

package com.example.deepseek.service;

import com.example.deepseek.model.ArticleSummary;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.Arrays;

import static org.junit.jupiter.api.Assertions.*;

@SpringBootTest
class ArticleSummaryServiceTest {
    
    @Autowired
    private ArticleSummaryService summaryService;
    
    @Test
    void testValidateSummaryFormat_ValidSummary() {
        ArticleSummary summary = new ArticleSummary();
        summary.setTitle("Test Title");
        summary.setSummary("This is a test summary");
        summary.setKeyPoints(Arrays.asList("Point 1", "Point 2"));
        summary.setTags(Arrays.asList("tag1", "tag2"));
        summary.setSentiment("积极");
        summary.setReadTime(5);
        
        assertTrue(summaryService.validateSummaryFormat(summary));
    }
    
    @Test
    void testValidateSummaryFormat_InvalidSummary() {
        ArticleSummary summary = new ArticleSummary();
        summary.setTitle("Test Title");
        // The required summary field is missing
        
        assertFalse(summaryService.validateSummaryFormat(summary));
    }
}

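9. Conclusion and Extensions

This article has shown how to build a complete intelligent article-summarization tool with Spring Boot and the DeepSeek V3.2 API. By combining web scraping, AI summary generation, and a validate-and-retry mechanism for the output format, we get a capable and reliable system.

9.1 Project Highlights

  1. Modern technology: uses DeepSeek V3.2's sparse attention mechanism (DSA) to process long texts more efficiently
  2. Robust architecture: reactive programming combined with thorough error handling
  3. Cost effectiveness: takes advantage of the API price cut while using caching and retries to optimize resource use
  4. Extensibility: the modular design makes later extension and maintenance straightforward

9.2 Future Directions

  1. Multi-language support: extend summary generation to English, Japanese, and other languages
  2. Multimodal processing: handle mixed text-and-image content with DeepSeek's future multimodal capabilities
  3. Real-time processing: integrate a message queue to support stream processing and real-time summarization
  4. Personalized summaries: generate summaries of different styles and depth based on user preferences

9.3 Resources

With this solution, developers can quickly build their own intelligent summarization tool, making full use of DeepSeek V3.2's capabilities while keeping the system stable and extensible.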
