Starting the Ollama API Service and Making It OpenAI-Compatible


万山y · Published 2025-08-02 00:42:58

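Example models: qwen3:4b, llama3.1
Ollama service address: http://localhost:11434
LiteLLM proxy address: http://localhost:4000
Pull the example models: run the following commands in a terminal to download the models used in this tutorial.
ollama pull qwen3:4b
ollama pull llama3.1  # also pull a model with reliable tool-calling support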

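Part 1: [Before] Calling the Native Ollama API Directly

Without any additional tooling, code has to follow Ollama's own API format, which differs from OpenAI's.

Step 1: Start the native Ollama service

Run ollama serve in a terminal. The service listens on http://localhost:11434 by default.

Step 2: Call the native API

The native endpoint used here is /api/generate. It takes a single prompt field rather than an OpenAI-style messages list. (Ollama also has a native /api/chat endpoint that accepts messages, but its request and response format is still Ollama's own, not OpenAI's.)

Calling with curl: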

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:4b",
  "prompt": "Hello, please introduce yourself.",
  "stream": false
}'

Calling with the Python requests library:

import requests

# URL and request body for the native Ollama API
ollama_url = "http://localhost:11434/api/generate"
payload = {
    "model": "qwen3:4b",
    "prompt": "Hello, please introduce yourself.",
    "stream": False
}

# Send the request
try:
    # json= serializes the payload and sets the Content-Type header
    response = requests.post(ollama_url, json=payload, timeout=120)
    response.raise_for_status()  # raise an exception on HTTP error status

    # Parse and print the result
    data = response.json()
    print("Reply from the native Ollama API:")
    print(data.get("response"))

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Summary: this approach works, but if your application was originally written against the OpenAI API, every API call has to be rewritten, which is tedious.
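
To make that rewriting burden concrete, here is a minimal sketch of the kind of adapter an OpenAI-style application would need before it could call /api/generate. The helper names `messages_to_prompt` and `build_generate_payload`, and the "role: content" layout, are assumptions for illustration only; they are not part of Ollama, and real chat templates are model-specific.

```python
import json


def messages_to_prompt(messages):
    """Flatten an OpenAI-style message list into one prompt string.

    Hypothetical helper: the "role: content" layout is an illustrative
    assumption, not a format defined by Ollama.
    """
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to answer next
    return "\n".join(lines)


def build_generate_payload(model, messages):
    """Build a request body for /api/generate from chat messages."""
    return {
        "model": model,
        "prompt": messages_to_prompt(messages),
        "stream": False,
    }


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Hello, please introduce yourself."},
    ]
    print(json.dumps(build_generate_payload("qwen3:4b", history), indent=2))
```

Every place that builds a request or reads a reply would need a similar shim, which is the cost the next part avoids.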

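Part 2: [After] Calling the OpenAI-Compatible API Through LiteLLM

Now we introduce LiteLLM as an intermediate proxy and see how much simpler things become.

Step 1: Install and configure LiteLLM

Install LiteLLM (the proxy extra provides the dependencies of the litellm command-line server):
```
pip install 'litellm[proxy]'
```

Create the configuration file config.yaml:
This file tells LiteLLM which Ollama model to call when an application requests a given model name.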

# config.yaml
model_list:
  - model_name: qwen-local  # alias that clients will use for this model
    litellm_params:
      model: ollama/qwen3:4b
      api_base: http://localhost:11434

  - model_name: llama3.1-tool-use  # second alias, for the model that supports tool calling
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434

litellm_settings:
  set_verbose: true
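
The entries above follow a regular pattern, so when several local models need aliases they can be generated. The sketch below builds the same structure as a Python dictionary. The helper names `ollama_entry` and `build_config` are hypothetical, and writing the result out as YAML (for example with PyYAML) is left out to keep the example stdlib-only.

```python
import json

OLLAMA_API_BASE = "http://localhost:11434"


def ollama_entry(alias, ollama_model, api_base=OLLAMA_API_BASE):
    """Return one model_list entry mapping an alias to an Ollama model.

    Hypothetical helper; it mirrors the fields used in config.yaml.
    """
    return {
        "model_name": alias,
        "litellm_params": {
            # The "ollama/" prefix routes the request to the Ollama provider.
            "model": f"ollama/{ollama_model}",
            "api_base": api_base,
        },
    }


def build_config(aliases):
    """Build the full config dictionary from an {alias: ollama_model} mapping."""
    return {
        "model_list": [ollama_entry(a, m) for a, m in aliases.items()],
        "litellm_settings": {"set_verbose": True},
    }


if __name__ == "__main__":
    config = build_config(
        {"qwen-local": "qwen3:4b", "llama3.1-tool-use": "llama3.1"}
    )
    # Printed as JSON for inspection; convert to YAML before saving as config.yaml.
    print(json.dumps(config, indent=2))
```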

Step 2: Start the LiteLLM proxy service

In the directory containing config.yaml, run the following command. The --port flag sets the port the proxy listens on.

litellm --config config.yaml --port 4000

The proxy listens on a separate port, http://localhost:4000, which now serves as your OpenAI-compatible API entry point.
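
Before sending requests, it can help to confirm that both services are actually accepting connections. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the API behind it is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, or False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 11434 is the Ollama port and 4000 the LiteLLM port used in this tutorial.
    for name, port in [("Ollama", 11434), ("LiteLLM", 4000)]:
        ok = wait_for_port("localhost", port, timeout=5)
        print(f"{name} on port {port}: {'reachable' if ok else 'not reachable'}")
```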

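Step 3: Call it the OpenAI way

All of your code can now follow the OpenAI conventions; you only need to point the API address at LiteLLM.

Calling with curl (note how the request structure changes):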

curl -X POST http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-local",
    "messages": [
        {
            "role": "user",
            "content": "Hello, please introduce yourself."
        }
    ]
}'

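Calling with the official Python openai library (much simpler code):

from openai import OpenAI

# Only these two settings change: point the client at the LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="not-needed"  # placeholder; the SDK requires a non-empty value
)

# The rest is identical to calling the official OpenAI API
try:
    response = client.chat.completions.create(
        model="llama3.1-tool-use",  # the llama3.1 alias defined in config.yaml
        messages=[
            {"role": "user", "content": "Hello, please introduce yourself."}
        ]
    )
    print("\nReply via LiteLLM + OpenAI SDK:")
    print(response.choices[0].message.content)

except Exception as e:
    print(f"Request failed: {e}")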
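
Because the request carries a messages list, a multi-turn conversation is just a matter of appending each exchange to that list. The sketch below shows one way to do it; `chat_turn` is a hypothetical helper, and it assumes the LiteLLM proxy and the qwen-local alias from this tutorial.

```python
def chat_turn(client, model, history, user_text):
    """Send one user turn and record both sides of the exchange in history.

    `client` is expected to be an openai.OpenAI instance (or any object
    exposing the same chat.completions.create interface).
    """
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000/v1", api_key="not-needed")
    history = []
    print(chat_turn(client, "qwen-local", history, "My name is Lin."))
    # The second question can only be answered from the accumulated history.
    print(chat_turn(client, "qwen-local", history, "What is my name?"))
```

The model itself is stateless between requests, so the full history has to be sent on every call.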

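Part 3: Advanced Usage - Tool Calling

Tool calling is how a large model interacts with the outside world. The model can decide to call a function you have defined in order to obtain the information it needs to answer a question.

Step 1: Define a tool and send the request

We ask the model a question and give it a tool it may use (for example, get_current_weather).

Requesting a tool call with curl: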

curl -X POST http://localhost:4000/v1/chat/completions \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.1-tool-use",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Shanghai today?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, for example: Beijing"
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ]
}'

If the model decides to use the tool, the response message contains a tool_calls list asking your application to execute the function.
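
For readers working with the raw JSON from curl, the sketch below shows how such a response can be unpacked. The sample is hand-written to follow the OpenAI chat-completions response shape; it is not captured model output, the id value is made up, and `extract_tool_calls` is a hypothetical helper.

```python
import json

# Hand-written sample in the OpenAI chat-completions shape (not real output).
SAMPLE_RESPONSE = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_example_1",  # made-up identifier
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            # Arguments arrive as a JSON-encoded string.
                            "arguments": "{\"location\": \"Shanghai\"}",
                        },
                    }
                ],
            },
        }
    ]
}


def extract_tool_calls(response):
    """Return (id, function name, parsed arguments) for each requested call."""
    message = response["choices"][0]["message"]
    # A missing or null tool_calls field means the model answered in plain text.
    calls = message.get("tool_calls") or []
    return [
        (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]


if __name__ == "__main__":
    for call_id, name, args in extract_tool_calls(SAMPLE_RESPONSE):
        print(call_id, name, args)
```

The id matters later: each tool result sent back to the model must carry the tool_call_id of the call it answers.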

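Step 2: Execute the tool and return the result

Your application needs to parse the model's request, execute the corresponding function, and then submit the function's return value back to the model.

Completing the full tool-calling loop with the Python openai library: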

import json
from openai import OpenAI

# Point the client at the LiteLLM proxy running on port 4000
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="not-needed"
)

# 1. Define a local function that the model can call
def get_current_weather(location):
    """A mock weather lookup function."""
    print(f"--- Running tool: fetching weather for {location} ---")
    # Accept either spelling, since the model chooses how to write the city name
    if location and ("上海" in location or "shanghai" in location.lower()):
        return json.dumps({"temperature": "25°C", "condition": "cloudy"})
    return json.dumps({"error": "unknown location"})

# 2. Prepare the first request: the question plus the tool definitions
messages = [{"role": "user", "content": "What is the weather like in Shanghai today?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

# 3. Send the first request
response = client.chat.completions.create(
    model="llama3.1-tool-use",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls

# 4. Check whether the model asked to call a tool
if tool_calls:
    print("--- The model requested a tool call ---")
    messages.append(response_message)  # add the model's reply to the conversation history

    # 5. Execute every requested tool
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name == "get_current_weather":
            function_response = get_current_weather(
                location=function_args.get("location")
            )
            # 6. Add the tool's result to the conversation history
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

    # 7. Send the second request, which now includes the tool results
    print("--- Returning the tool results to the model ---")
    second_response = client.chat.completions.create(
        model="llama3.1-tool-use",
        messages=messages,
    )

    # 8. Print the final natural-language reply
    print("\nFinal reply from the model:")
    print(second_response.choices[0].message.content)
else:
    # The model answered directly without requesting a tool
    print(response_message.content)
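
The loop above hard-codes a single function name. With more than one tool, a lookup table keeps the dispatch step manageable and gives unknown tools or malformed arguments a defined outcome. The following is a minimal sketch of that idea; `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the weather function is the same kind of mock used in the tutorial.

```python
import json


def get_current_weather(location):
    """Mock weather lookup, standing in for a real API call."""
    if location and ("上海" in location or "shanghai" in location.lower()):
        return json.dumps({"temperature": "25°C", "condition": "cloudy"})
    return json.dumps({"error": "unknown location"})


# Registry of callable tools; register additional functions here.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Execute one requested tool and always return a JSON string."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": f"bad arguments: {exc}"})


if __name__ == "__main__":
    print(run_tool_call("get_current_weather", '{"location": "Shanghai"}'))
    print(run_tool_call("get_stock_price", '{"symbol": "X"}'))
```

Inside the loop, the returned string would be used as the content of the role "tool" message, so the model sees an error description instead of the script crashing.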

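Side-by-Side Summary

Aspect	[Before] Native Ollama call	[After] OpenAI-compatible call through LiteLLM	Advantage
API endpoint	`/api/generate`	`/v1/chat/completions`	Matches the OpenAI standard
Request structure	`{"prompt": "..."}`	`{"messages": [...], "tools": [...]}`	Supports multi-turn conversations and tool calling
Python library	`requests` (requests built by hand)	`openai` (official SDK)	Mature ecosystem, concise code
Code compatibility	Incompatible; code must be rewritten for Ollama	Compatible; typically only `base_url` (and the model name) changes	Existing applications migrate with minimal changes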
