Source-Level Breakdown: Core Differences Between OpenAI and Alibaba Cloud DashScope Streaming Output

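Abstract: This article compares how OpenAI and Alibaba Cloud DashScope design streaming output. OpenAI follows "convention over configuration": setting stream=True yields incremental output automatically, each chunk carries only the newly generated text fragment (delta.content), and the resulting code is concise and portable across the industry. DashScope, which has to accommodate multiple models and scenarios, requires the extra parameter incremental_output=True for incremental output and by default returns the full text generated so far; this is more flexible but…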


Astron_ma · Published 2026-04-18 19:45:53

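Introduction

In LLM application development, streaming output is a core technique for improving user experience: it lets the model "speak while it thinks" instead of leaving the user staring at a blank screen until the full reply arrives.

But have you noticed the following?

With the OpenAI SDK, stream=True alone gives you token-by-token incremental output;
With Alibaba Cloud's native DashScope SDK, you must also pass incremental_output=True, otherwise every event returns the full text generated so far.

This article examines both designs at the source level, covering how each is implemented and what its trade-offs are, and then offers best practices for junior LLM application engineers.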

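1. Streaming Basics: Incremental vs. Non-Incremental

Before looking at the source, let's pin down two core concepts:

Mode	Description	Example (output: "I like Python")
Non-incremental streaming	Each event returns the full text generated so far	Event 1: `I` Event 2: `I like` Event 3: `I like Python`
Incremental streaming	Each event returns only the newly generated fragment	Event 1: `I` Event 2: ` like` Event 3: ` Python`

Incremental streaming is clearly more efficient: less data crosses the wire, and the client can append each fragment instead of re-rendering the whole text. It is the mode we are aiming for.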
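To make the difference concrete, here is a minimal sketch that simulates both modes for the same fragments and counts how many characters each one sends. The function names are hypothetical and only illustrate the table above; no SDK is involved.

```python
from itertools import accumulate
from typing import Iterable, List


def non_incremental(pieces: Iterable[str]) -> List[str]:
    """Each event carries the full text generated so far."""
    return list(accumulate(pieces))


def incremental(pieces: Iterable[str]) -> List[str]:
    """Each event carries only the newly generated fragment."""
    return list(pieces)


pieces = ["I", " like", " Python"]
snapshots = non_incremental(pieces)
deltas = incremental(pieces)

print(snapshots)
print(deltas)
print(sum(map(len, snapshots)), "characters sent vs", sum(map(len, deltas)))
```

With many fragments of similar size, the non-incremental total grows roughly with the square of the fragment count, so the gap widens on long replies.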

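2. OpenAI SDK: `stream=True` = Automatic Incremental Output

OpenAI's design philosophy is "convention over configuration," and the Python SDK source shows it clearly.

2.1 Key Source Code

Taking v1.x of the OpenAI Python SDK as an example, here is how chat.completions.create handles stream=True: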

python

# openai/resources/chat/completions.py (heavily simplified; most parameters elided)
def create(
    self,
    *,
    messages: List[ChatCompletionMessageParam],
    model: str,
    stream: Optional[bool] = None,
    # ... other parameters
) -> Union[ChatCompletion, Stream[ChatCompletionChunk]]:

    # A single request path serves both modes: the stream flag is sent in the
    # request body, and the same flag tells the client how to parse the response.
    return self._post(
        "/chat/completions",
        body=maybe_transform(
            {
                "messages": messages,
                "model": model,
                "stream": stream,
            },
            # ...
        ),
        # ...
        cast_to=ChatCompletion,                  # type of a non-streaming response
        stream=stream or False,
        stream_cls=Stream[ChatCompletionChunk],  # type returned when streaming
    )

2.2 The Core: How `ChatCompletionChunk` Is Designed

Every chunk in an OpenAI stream is a ChatCompletionChunk object whose structure is incremental by definition:

python

# openai/types/chat/chat_completion_chunk.py (simplified; several fields elided)
class ChoiceDelta(BaseModel):
    content: Optional[str] = None  # 👈 only the newly generated text fragment
    role: Optional[str] = None

class Choice(BaseModel):
    delta: ChoiceDelta  # 👈 the key point: holds only the increment
    finish_reason: Optional[str] = None
    index: int

class ChatCompletionChunk(BaseModel):
    id: str
    choices: List[Choice]
    # ...
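A minimal sketch of how a client rebuilds the full message from such chunks. The dataclasses below are local stand-ins that mirror only the fields shown above (they are not imported from the `openai` package), and `accumulate_chunks` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Delta:
    content: Optional[str] = None
    role: Optional[str] = None


@dataclass
class Choice:
    delta: Delta
    index: int = 0
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    id: str
    choices: List[Choice]


def accumulate_chunks(chunks: Iterable[Chunk]) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (role, full_text, finish_reason) for the first choice."""
    role, parts, finish_reason = None, [], None
    for chunk in chunks:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        choice = chunk.choices[0]
        role = choice.delta.role or role
        if choice.delta.content:
            parts.append(choice.delta.content)
        finish_reason = choice.finish_reason or finish_reason
    return role, "".join(parts), finish_reason


stream = [
    Chunk("c1", [Choice(Delta(role="assistant", content=""))]),
    Chunk("c1", [Choice(Delta(content="I"))]),
    Chunk("c1", [Choice(Delta(content=" like Python"))]),
    Chunk("c1", [Choice(Delta(), finish_reason="stop")]),
]
print(accumulate_chunks(stream))
```

Because each delta is only a fragment, the client owns the job of joining them; the role typically arrives once, and the last chunk carries the finish reason with an empty delta.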

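2.3 Why Doesn't OpenAI Need an "Incremental Switch"?

Because the OpenAI API protocol itself defines streaming output as incremental.

Each SSE (Server-Sent Events) event sent by the server contains only a delta;
The SDK needs no extra logic and simply passes delta.content through.

That is why OpenAI streaming code needs only: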

python

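for chunk in stream_completion:
    # Some chunks carry no choices or no text (role-only first chunk, final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)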
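For readers curious about what the SDK is iterating over, the sketch below parses a hand-written sample of the SSE body that an OpenAI-style endpoint sends: one `data:` line per event, a JSON payload whose `choices[0].delta` holds the new fragment, and a final `data: [DONE]` sentinel. The sample payload is abbreviated, and `iter_delta_text` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Iterable, Iterator

SAMPLE_SSE = """\
data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}

data: {"choices": [{"index": 0, "delta": {"content": "I"}}]}

data: {"choices": [{"index": 0, "delta": {"content": " like Python"}}]}

data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

data: [DONE]
"""


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragment carried by each SSE data line."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


fragments = list(iter_delta_text(SAMPLE_SSE.splitlines()))
print(fragments)
print("".join(fragments))
```

Nothing in the loop compares the current event with the previous one, which is the point: the protocol already delivers fragments.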

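3. Alibaba Cloud DashScope: `stream=True` ≠ Incremental

As Alibaba Cloud's LLM platform, DashScope has to accommodate a wider range of models (Tongyi Qianwen (Qwen), DeepSeek, and others), so its design leans toward "flexible configuration."

3.1 Key Source Code

Here is how Generation.call in the DashScope Python SDK handles the request: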

python

# dashscope/aigc/generation.py (illustrative sketch, not the literal source;
# helper names such as _handle_stream are simplifications)
@classmethod
def call(
    cls,
    model: str,
    messages: Optional[List[Message]] = None,
    api_key: Optional[str] = None,
    stream: bool = False,
    incremental_output: bool = False,
    # ... other parameters
    **kwargs,
) -> Union[GenerationResponse, Iterator[GenerationResponse]]:

    # Build the request payload
    request_data = {
        "model": model,
        "input": {"messages": messages} if messages else {},
        "parameters": {
            "stream": stream,
            "incremental_output": incremental_output,  # 👈 forwarded to the server
            # ...
        },
    }

    # Streaming: return an iterator of responses
    if stream:
        return cls._handle_stream(
            request_data=request_data,
            api_key=api_key,
            **kwargs
        )

    # Non-streaming: return a single response
    return cls._handle_request(
        request_data=request_data,
        api_key=api_key,
        **kwargs
    )

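3.2 The Core: `incremental_output` Is a Server-Side Parameter

As with OpenAI, the DashScope SDK does no diffing of its own; the server decides what each event contains. The difference is that DashScope exposes that decision as a request parameter:

When incremental_output=False (the default), each event carries the full text generated so far;
When incremental_output=True, each event carries only the newly generated fragment.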
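If the flag is left at its default, a client can still recover the new fragment by diffing consecutive snapshots. A minimal sketch, assuming each snapshot normally extends the previous one; `snapshots_to_deltas` is a hypothetical helper, not a DashScope API.

```python
from typing import Iterable, Iterator


def snapshots_to_deltas(snapshots: Iterable[str]) -> Iterator[str]:
    """Turn cumulative texts into the fragments that were added each time."""
    previous = ""
    for current in snapshots:
        if current.startswith(previous):
            delta = current[len(previous):]
        else:
            # The text was rewritten rather than extended; re-emit it whole.
            delta = current
        previous = current
        if delta:
            yield delta


print(list(snapshots_to_deltas(["I", "I like", "I like", "I like Python"])))
```

Passing incremental_output=True makes this step unnecessary and avoids resending the growing prefix on every event.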

The difference is visible in the structure of the returned GenerationResponse:

python

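# dashscope/api_entities/dashscope_response.py (illustrative sketch, not the
# literal source; base classes and helper names are simplifications)
class GenerationResponse(BaseApiResponse):
    @property
    def output(self) -> GenerationOutput:
        return GenerationOutput(self._get_output())

class GenerationOutput(BaseApiOutput):
    @property
    def text(self) -> Optional[str]:
        # 👈 Filled when result_format="text": the full text so far by default,
        #    or only the new fragment when incremental_output=True
        return self._get_field("text")

    @property
    def choices(self) -> Optional[List[GenerationChoice]]:
        # 👈 Filled when result_format="message": message.content and
        #    message.reasoning_content (the thinking process) live here
        choices = self._get_field("choices")
        if choices:
            return [GenerationChoice(choice) for choice in choices]
        return None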
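As far as the documented response format goes, which of the two fields is populated is governed by the `result_format` request parameter ("text" or "message"), while `incremental_output` governs whether the value is cumulative or a fragment. The sketch below reads the text from either shape using plain dictionaries modeled on the JSON body; the sample payloads are hand-written and `extract_text` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def extract_text(body: Dict[str, Any]) -> Optional[str]:
    """Return the generated text from either response shape, or None."""
    output = body.get("output") or {}
    choices = output.get("choices")
    if choices:  # shape produced by result_format="message"
        return choices[0].get("message", {}).get("content")
    return output.get("text")  # shape produced by result_format="text"


text_style = {"output": {"text": "I like Python", "finish_reason": "stop"}}
message_style = {
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {"role": "assistant", "content": "I like Python"},
            }
        ]
    }
}

print(extract_text(text_style))
print(extract_text(message_style))
```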

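3.3 Why Does DashScope Need an Extra Parameter?

Because DashScope has to accommodate:

Legacy behavior: early versions defaulted to non-incremental output, and the switch was kept for backward compatibility;
Multiple model types: some models (such as DeepSeek-R1) return reasoning_content (the thinking process) and need a more flexible output structure;
Different business scenarios: in some cases developers may want the full text on every event, for example to validate it.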

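4. Comparison: Pros and Cons of the Two Designs

Dimension	OpenAI SDK	Alibaba Cloud DashScope
Ease of use	✅ Minimal: only `stream=True`	⚠️ Slightly more involved: also needs `incremental_output=True`
Flexibility	❌ Fixed: incremental output only	✅ Flexible: incremental or non-incremental
Protocol uniformity	✅ Strong: the OpenAI protocol has become a de facto industry standard	⚠️ Weak: the native protocol has custom extensions
Feature extensibility	❌ Weaker: limited to the standard chat schema	✅ Strong: supports the thinking process, multimodal input, and more
Learning curve	✅ Low: learn once, use on every compatible platform	⚠️ Medium: native SDK parameters must be memorized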

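5. Best Practices for Junior Engineers

5.1 For Simplicity: Call DashScope Through Its OpenAI-Compatible Mode

DashScope offers an OpenAI-compatible mode, so you can call Alibaba Cloud models with exactly the same OpenAI SDK code: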

python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    # Read the key from the environment rather than hard-coding it
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)

stream_completion = client.chat.completions.create(
    model="qwen-turbo",
    messages=[{"role": "user", "content": "Give a brief introduction to Python"}],
    stream=True,
)

for chunk in stream_completion:
    # Some chunks carry no choices or no text (role-only first chunk, final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

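Advantage: minimal code that, once learned, works on every platform that supports the OpenAI protocol.

5.2 For Deeper Features: Use the Native DashScope SDK

If you need model-specific features such as DeepSeek-R1's thinking process, use the native SDK: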

python

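import os

import dashscope
from dashscope import Generation

# Read the key from the environment rather than hard-coding it
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

responses = Generation.call(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Give a brief introduction to Python"}],
    result_format="message",  # populate output.choices rather than output.text
    stream=True,
    incremental_output=True,  # 👈 required for incremental output
)

print("<<<AI thinking: >>>", end="")
answer_started = False

for chunk in responses:
    # output can be empty on an error response, so check it first
    if chunk.output and chunk.output.choices:
        choice = chunk.output.choices[0]
        # Print the thinking process
        if choice.message.reasoning_content:
            print(choice.message.reasoning_content, end="", flush=True)
        # Print the final answer
        elif choice.message.content:
            if not answer_started:
                print("\n<<<AI answer: >>>", end="")
                answer_started = True
            print(choice.message.content, end="", flush=True)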
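The loop above mixes parsing and printing. A minimal sketch that separates the two, so the logic can be checked without network access: `split_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic only the attributes the loop reads (`output.choices[0].message.reasoning_content` and `.content`). It assumes incremental chunks, since it joins fragments.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def split_stream(chunks: Iterable) -> Tuple[str, str]:
    """Return (reasoning, answer) accumulated from incremental chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        choices = chunk.output.choices
        if not choices:
            continue
        message = choices[0].message
        if getattr(message, "reasoning_content", None):
            reasoning.append(message.reasoning_content)
        if getattr(message, "content", None):
            answer.append(message.content)
    return "".join(reasoning), "".join(answer)


def fake_chunk(reasoning: str = "", content: str = "") -> SimpleNamespace:
    """Build a stand-in chunk with the same attribute path as the real one."""
    message = SimpleNamespace(reasoning_content=reasoning, content=content)
    choice = SimpleNamespace(message=message)
    return SimpleNamespace(output=SimpleNamespace(choices=[choice]))


chunks = [
    fake_chunk(reasoning="The user wants "),
    fake_chunk(reasoning="a summary."),
    fake_chunk(content="Python is "),
    fake_chunk(content="a language."),
]
print(split_stream(chunks))
```

Keeping the accumulation in a pure function also makes it easy to reuse in a web handler, where printing to the console is not an option.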

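6. Summary

OpenAI SDK: the design philosophy is "convention over configuration"; stream=True yields incremental output automatically, which suits scenarios that value simplicity and portability;
Alibaba Cloud DashScope: the design philosophy is "flexibility first"; incremental output requires incremental_output=True, which suits scenarios that need deeper features such as the thinking process.

As a junior LLM application engineer, you are advised to: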