A Comprehensive Guide to DeepSeek-OCR

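Abstract: DeepSeek-OCR is a multimodal model from DeepSeek AI that uses "contextual optical compression" for efficient text processing, reaching 97% precision at a 10× compression ratio. It supports five resolution modes, processes 200,000 pages per day, and recognizes complex content such as documents, tables, and formulas. Installation requires a GPU with at least 24GB of VRAM and a Python 3.12.9 environment, and both local deployment and cloud service options are available. Application scenarios include enterprise document digitization, academic research, finance, and law, …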


xyzroundo · Published 2025-11-03 23:31:58

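A Comprehensive Guide to DeepSeek-OCR: From Installation and Deployment to Practical Use

1. Core Value and Application Prospects of DeepSeek-OCR

DeepSeek-OCR is a multimodal model released by DeepSeek AI in October 2025. Its core contribution is a technique called "contextual optical compression", which converts text into a visual representation in order to compress it efficiently. At a 10× compression ratio the decoding precision reaches 97%, and at 20× it still holds at roughly 60%, which opens a new approach to long-text processing.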
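To make the compression figures concrete, the ratio can be read as the number of text tokens a page would otherwise need divided by the number of vision tokens it is encoded into. The sketch below is illustrative arithmetic only: the function name and the sample token counts are assumptions, not part of DeepSeek-OCR.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens rendered into 100 vision tokens.
ratio = compression_ratio(1000, 100)
print(f"compression ratio: {ratio:.0f}x")
```

In this reading, the hypothetical page sits at the 10× point where the article reports 97% precision; squeezing 2,000 text tokens into the same 100 vision tokens would put it at 20×, where precision drops to roughly 60%.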

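1.1 Technical Highlights

Efficient compression: with only 100 vision tokens it surpasses GOT-OCR2.0 (256 tokens per page)
Multi-scenario adaptation: five resolution modes, Tiny, Small, Base, Large, and Gundam
High throughput: more than 200,000 pages per day on a single A100 GPU
Multi-format support: handles documents, tables, formulas, charts, and other complex content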
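The throughput figure is easier to plan around when converted to a per-second rate. The snippet below is plain arithmetic on the 200,000 pages/day number quoted above; the helper names and the archive size in the usage line are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_second(pages_per_day: int) -> float:
    """Convert a daily page count into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


def days_needed(total_pages: int, pages_per_day: int) -> int:
    """Whole days required to process total_pages at the given daily rate."""
    return math.ceil(total_pages / pages_per_day)


print(f"{pages_per_second(200_000):.2f} pages/s")
# Hypothetical archive of 1.5 million pages on one GPU.
print(days_needed(1_500_000, 200_000), "days")
```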

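2. System Requirements and Preparation

2.1 Recommended Hardware

GPU: A100-40G or a comparable card (such as RTX 4090 or RTX 3090)
VRAM: ≥24GB (≥40GB recommended for PDF processing)
RAM: ≥16GB
Storage: ≥10GB (the model files take about 5-8GB)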

2.2 Software Requirements

Operating system: Linux (recommended), Windows, or macOS
Python: 3.12.9
CUDA: 11.8+
PyTorch: 2.6.0

Table: DeepSeek-OCR environment requirements

Component	Minimum	Recommended
Operating system	Windows 10/Linux	Ubuntu 20.04+
Python	3.10+	3.12.9
CUDA	11.0+	11.8+
VRAM	8GB	24GB+
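The table can be turned into a quick pre-flight check. The following is a minimal sketch that assumes the thresholds above; `check_environment` is a hypothetical helper, and the VRAM figure is passed in by hand (for example, read from `nvidia-smi`) so the script runs before PyTorch is installed.

```python
import sys

# Thresholds copied from the table above.
MIN_PYTHON = (3, 10)
MIN_VRAM_GB = 8
RECOMMENDED_VRAM_GB = 24


def check_environment(python_version, vram_gb):
    """Return a list of notes; an empty list means the recommended setup is met."""
    notes = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        notes.append(
            f"Python {python_version[0]}.{python_version[1]} is below the 3.10 minimum"
        )
    if vram_gb < MIN_VRAM_GB:
        notes.append(f"{vram_gb}GB VRAM is below the {MIN_VRAM_GB}GB minimum")
    elif vram_gb < RECOMMENDED_VRAM_GB:
        notes.append(
            f"{vram_gb}GB VRAM meets the minimum but not the "
            f"{RECOMMENDED_VRAM_GB}GB recommendation"
        )
    return notes


# vram_gb is supplied manually here; adjust it to the card in use.
for note in check_environment(sys.version_info[:3], vram_gb=24):
    print(note)
```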

3. Installation and Deployment Steps

3.1 Setting Up the Base Environment

# 1. Clone the project
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
cd DeepSeek-OCR

# 2. Create a Conda environment (recommended)
conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr

# 3. Install PyTorch (CUDA 11.8 build)
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118

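3.2 Installing Project Dependencies

# Install the project dependencies
pip install -r requirements.txt

# Install flash-attn (make sure the version matches)
pip install flash-attn==2.7.3 --no-build-isolation

Note: installing flash-attn is a common stumbling block. If the online install fails, download the matching version from GitHub and install it offline.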
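After the installs finish, it can help to confirm that the pinned versions actually landed in the active environment. The sketch below uses only the standard library's `importlib.metadata`; the `EXPECTED` mapping mirrors the pins in the commands above, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions pinned in the install commands above.
EXPECTED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "flash-attn": "2.7.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(expected):
    """Map each package name to (found_version, matches_expected)."""
    result = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Wheels from the PyTorch index carry a local tag such as "+cu118";
        # compare only the part before the "+".
        ok = found is not None and found.split("+")[0] == wanted
        result[name] = (found, ok)
    return result


for name, (found, ok) in report(EXPECTED).items():
    print(f"{name}: {found or 'not installed'} [{'OK' if ok else 'CHECK'}]")
```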

3.3 Downloading the Model Files

Download the model files from either of the following platforms:

Hugging Face Hub: https://huggingface.co/deepseek-ai/DeepSeek-OCR
ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-OCR
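For the Hugging Face route, one scripted option is `snapshot_download` from the `huggingface_hub` package. This is a sketch rather than a step the original guide prescribes: the `download_model` wrapper and the `./models/DeepSeek-OCR` target directory are assumptions, and `huggingface_hub` must be installed separately.

```python
from pathlib import Path

REPO_ID = "deepseek-ai/DeepSeek-OCR"


def download_model(local_dir="./models/DeepSeek-OCR"):
    """Download the full model repository into local_dir and return its path."""
    # Imported lazily so this file can be loaded before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)  # hypothetical location; any writable directory works
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


# Uncomment to start the download (several GB):
# model_dir = download_model()
```

The returned directory can then be passed to `from_pretrained` in place of the repository name. Calling `from_pretrained` with the repository name directly also works and downloads the files into the Hugging Face cache on first use.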

4. Usage Tutorial and Practical Examples

4.1 Basic Image Recognition

from transformers import AutoModel, AutoTokenizer
import torch

# Load the model (trust_remote_code is required because the repo ships custom model code)
model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model = model.eval().cuda().to(torch.bfloat16)

# Run OCR inference on a single image
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

res = model.infer(tokenizer, prompt=prompt, image_file=image_file,
                  output_path=output_path, base_size=1024, image_size=640,
                  crop_mode=True, save_results=True)
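The single-image call extends naturally to a folder of scans. The sketch below assumes the `model` and `tokenizer` objects loaded above and the same `infer` arguments; `list_images`, `ocr_folder`, and the one-subdirectory-per-image layout are hypothetical choices, not part of the DeepSeek-OCR API.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
PROMPT = "<image>\n<|grounding|>Convert the document to markdown. "


def list_images(folder):
    """Return the image files in folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(model, tokenizer, folder, output_root):
    """Call model.infer on every image, writing each result to its own subdirectory."""
    processed = []
    for image_path in list_images(folder):
        out_dir = Path(output_root) / image_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        model.infer(tokenizer, prompt=PROMPT, image_file=str(image_path),
                    output_path=str(out_dir), base_size=1024, image_size=640,
                    crop_mode=True, save_results=True)
        processed.append(image_path.name)
    return processed
```

With the objects from the example above, `ocr_folder(model, tokenizer, "scans", "output")` would process every JPEG and PNG in a `scans` directory.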

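4.2 Choosing Among the Five Resolution Modes

DeepSeek-OCR offers several compression levels to suit different scenarios. The calls below reuse prompt, image_file, and output_path from section 4.1; infer needs a non-empty prompt, so it is passed explicitly:

# Tiny mode (lightweight)
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=512, image_size=512, crop_mode=False)

# Small mode (balanced)
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=640, image_size=640, crop_mode=False)

# Base mode (general use)
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=1024, crop_mode=False)

# Large mode (high precision)
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1280, image_size=1280, crop_mode=False)

# Gundam mode (complex documents)
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True)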
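Keeping the five parameter sets in one lookup table avoids retyping them. In the sketch below, `MODES` mirrors the values listed above and `mode_kwargs` is a hypothetical helper. `vision_tokens` is a rough estimate that assumes 16×16 image patches followed by a 16× token compressor, which is how the DeepSeek-OCR paper describes the encoder; treat it as an approximation for the non-cropping modes, not as an API guarantee.

```python
MODES = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def mode_kwargs(name):
    """Return a copy of the infer() keyword arguments for the named mode."""
    try:
        return dict(MODES[name.lower()])
    except KeyError:
        raise ValueError(
            f"unknown mode {name!r}; choose from {sorted(MODES)}"
        ) from None


def vision_tokens(image_size, patch=16, downsample=16):
    """Assumed estimate: (image_size / patch)^2 patches, reduced by the compressor."""
    side = image_size // patch
    return (side * side) // downsample


for mode in ("tiny", "small", "base", "large"):
    size = MODES[mode]["image_size"]
    print(f"{mode}: about {vision_tokens(size)} vision tokens")
```

A call then becomes `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, **mode_kwargs("small"))`. Under the stated assumption, Small mode works out to 100 vision tokens, which matches the figure quoted in section 1.1.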

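4.3 PDF Processing (Linux)

# Accelerate inference with vLLM (Linux only)
pip install vllm
# The official repository provides a PDF script in its vLLM folder.
# Set INPUT_PATH and OUTPUT_PATH in config.py in that folder before running it.
cd DeepSeek-OCR-master/DeepSeek-OCR-vllm
python run_dpsk_ocr_pdf.py

Note: direct PDF processing relies on vLLM, which is currently available only on Linux. Windows users can convert the PDF to images first and then process those.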
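For the Windows workaround, one way to turn a PDF into page images is PyMuPDF (`pip install pymupdf`). This is a sketch under that assumption; the original guide does not name a conversion tool, and `page_image_name`, `pdf_to_images`, and the 144 DPI default are illustrative choices.

```python
from pathlib import Path


def page_image_name(pdf_path, page_index):
    """Build a zero-padded file name such as 'report_page_0001.png'."""
    return f"{Path(pdf_path).stem}_page_{page_index + 1:04d}.png"


def pdf_to_images(pdf_path, out_dir, dpi=144):
    """Render every page of the PDF to a PNG file and return the written paths."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the given DPI
            target = out / page_image_name(pdf_path, index)
            pix.save(str(target))
            written.append(target)
    return written
```

Each resulting PNG can then be passed as `image_file` to `model.infer` exactly as in section 4.1.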