[langchain]How to pass multimodal data to models

但是我遇到了问题，就是我手头只有deepkeek这一张牌能用，其他的模型我在本地调用不了，而deepseek还不支持多模态的格式。问题在于目前deepseek还不支持“type”: “image”,我在尝试把多模态数据传递给大模型。

叶常落

688人浏览 · 2025-08-19 13:01:52

叶常落 · 2025-08-19 13:01:52 发布

https://python.langchain.com/docs/how_to/multimodal_inputs/#images-from-a-url

使用多模态数据，调用大模型

我在尝试把多模态数据传递给大模型
但是我遇到了问题，就是我手头只有deepkeek这一张牌能用，其他的模型我在本地调用不了，而deepseek还不支持多模态的格式。

入参
message = {
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “Describe the weather in this image:”,
},
{
“type”: “image”,
“url”: “https://cdn.deepseek.com/official_account.jpg”
}
],
}
I just get the error like behind:

422
Failed to deserialize the JSON body into the target type: messages[0]: data did not match any variant of untagged enum ChatCompletionRequestContent at line 1 column 232

问题在于目前deepseek还不支持 “type”: “image”,
这种用法

切换到z.ai,使用glm-4.5v，可以调通图片的

主要是content这个参数里面的格式

image

llm = ChatOpenAI(
temperature=0.6,
model=“glm-4.5v”,
openai_api_key=“xxxxx”,
openai_api_base=“https://open.bigmodel.cn/api/paas/v4/”
)

message = {
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “Describe what in this image:”,
},
{
“type”: “image”,
“source_type”: “url”,
“url”: image_url,
},
],
}
response = llm.invoke([message])
print(response.text())