Pipeline registration info in Hugging Face transformers
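Summary: The pipeline registry in the Hugging Face Transformers library catalogs the tasks the library can handle across text, image, audio, video, and multimodal inputs. The table lists 29 task types, including audio classification, speech recognition, text classification, and image segmentation, along with PyTorch/TensorFlow support, default models, and task category. Of these, 15 tasks support both frameworks and 14 are PyTorch-only. Multimodal tasks are the most numerous (12), followed by text (11) and image (4).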
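This is the registration information for the pipelines in Hugging Face transformers.
Each entry corresponds to one task type (task) and records:
- `impl`: the pipeline class that implements the task
- `tf`: whether a TensorFlow implementation exists (an empty list means there is none)
- `pt`: whether a PyTorch implementation exists (and which AutoModel classes can be used)
- `default.model`: the default model (and its hash ID) used when you call `pipeline(task_name)` directly without specifying a model
- `type`: the task category (text / image / audio / multimodal / video)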
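To make the fields above concrete, here is a minimal sketch of what two entries might look like and how a default model could be looked up. It is an illustration, not the library's code: `EXAMPLE_REGISTRY` and `default_model` are hypothetical names, class names are written as strings so the snippet runs without transformers installed, and `"<revision>"` is a placeholder rather than a real hash ID.

```python
from typing import Optional

# Hypothetical stand-in for two registry entries. Class names are strings so
# the sketch runs without transformers; "<revision>" is a placeholder hash.
EXAMPLE_REGISTRY = {
    "text-classification": {
        "impl": "TextClassificationPipeline",
        "tf": ("TFAutoModelForSequenceClassification",),
        "pt": ("AutoModelForSequenceClassification",),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "<revision>"),
                "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "<revision>"),
            }
        },
        "type": "text",
    },
    "audio-classification": {
        "impl": "AudioClassificationPipeline",
        "tf": (),  # empty: no TensorFlow implementation
        "pt": ("AutoModelForAudioClassification",),
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "<revision>")}},
        "type": "audio",
    },
}


def default_model(task: str, framework: str = "pt") -> Optional[str]:
    """Return the default model name for a task, or None if the framework is unsupported."""
    entry = EXAMPLE_REGISTRY[task]
    if not entry[framework]:
        # No model classes registered for this framework.
        return None
    name, _revision = entry["default"]["model"][framework]
    return name


print(default_model("text-classification"))
print(default_model("audio-classification", "tf"))
```

With the library installed, the real registry should be importable as `SUPPORTED_TASKS` from `transformers.pipelines` in the 4.x releases, though the exact layout may differ between versions.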
| Task | Default model (PyTorch) | Default model (TensorFlow) | Type | TF support | PT support | Description |
|---|---|---|---|---|---|---|
| audio-classification | superb/wav2vec2-base-superb-ks | n/a | audio | ❌ | ✅ | Audio classification: assigns audio to predefined classes |
| automatic-speech-recognition | facebook/wav2vec2-base-960h | n/a | multimodal | ❌ | ✅ | Speech to text (ASR) |
| text-to-audio | suno/bark-small | n/a | text | ❌ | ✅ | Audio generation from text |
| feature-extraction | distilbert/distilbert-base-cased | distilbert/distilbert-base-cased | multimodal | ✅ | ✅ | Extraction of text or multimodal feature vectors |
| text-classification | distilbert/distilbert-base-uncased-finetuned-sst-2-english | same as PyTorch | text | ✅ | ✅ | Text classification (e.g. sentiment analysis) |
| token-classification | dbmdz/bert-large-cased-finetuned-conll03-english | same as PyTorch | text | ✅ | ✅ | Sequence labeling (e.g. named entity recognition) |
| question-answering | distilbert/distilbert-base-cased-distilled-squad | same as PyTorch | text | ✅ | ✅ | Reading-comprehension question answering |
| table-question-answering | google/tapas-base-finetuned-wtq | same as PyTorch | text | ✅ | ✅ | Question answering over tables |
| visual-question-answering | dandelin/vilt-b32-finetuned-vqa | n/a | multimodal | ❌ | ✅ | Question answering over an image and text |
| document-question-answering | impira/layoutlm-document-qa | n/a | multimodal | ❌ | ✅ | Question answering over documents |
| fill-mask | distilbert/distilroberta-base | same as PyTorch | text | ✅ | ✅ | Masked language model token filling |
| summarization | sshleifer/distilbart-cnn-12-6 | google-t5/t5-small | text | ✅ | ✅ | Text summarization |
| translation | google-t5/t5-base | same as PyTorch | text | ✅ | ✅ | Machine translation |
| text2text-generation | google-t5/t5-base | same as PyTorch | text | ✅ | ✅ | Text-to-text generation |
| text-generation | openai-community/gpt2 | same as PyTorch | text | ✅ | ✅ | Natural language generation |
| zero-shot-classification | facebook/bart-large-mnli | FacebookAI/roberta-large-mnli | text | ✅ | ✅ | Zero-shot text classification |
| zero-shot-image-classification | openai/clip-vit-base-patch32 | same as PyTorch | multimodal | ✅ | ✅ | Zero-shot image classification |
| zero-shot-audio-classification | laion/clap-htsat-fused | n/a | multimodal | ❌ | ✅ | Zero-shot audio classification |
| image-classification | google/vit-base-patch16-224 | same as PyTorch | image | ✅ | ✅ | Image classification |
| image-feature-extraction | google/vit-base-patch16-224 | same as PyTorch | image | ✅ | ✅ | Image feature extraction |
| image-segmentation | facebook/detr-resnet-50-panoptic | n/a | multimodal | ❌ | ✅ | Image segmentation |
| image-to-text | ydshieh/vit-gpt2-coco-en | same as PyTorch | multimodal | ✅ | ✅ | Image captioning |
| image-text-to-text | llava-hf/llava-onevision-qwen2-0.5b-ov-hf | n/a | multimodal | ❌ | ✅ | Text generation from mixed image and text input |
| object-detection | facebook/detr-resnet-50 | n/a | multimodal | ❌ | ✅ | Object detection |
| zero-shot-object-detection | google/owlvit-base-patch32 | n/a | multimodal | ❌ | ✅ | Zero-shot object detection |
| depth-estimation | Intel/dpt-large | n/a | image | ❌ | ✅ | Depth estimation |
| video-classification | MCG-NJU/videomae-base-finetuned-kinetics | n/a | video | ❌ | ✅ | Video classification |
| mask-generation | facebook/sam-vit-huge | n/a | multimodal | ❌ | ✅ | Image mask generation |
| image-to-image | caidas/swin2SR-classical-sr-x2-64 | n/a | image | ❌ | ✅ | Image-to-image transformation (super-resolution, etc.) |
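The totals implied by the table can be checked with a short tally. The sketch below copies each row's task name, type, and TensorFlow flag by hand into a list (`ROWS` is a hypothetical name, and the data comes from the table above, not from the library at runtime), then counts tasks by framework support and by type.

```python
from collections import Counter

# (task, type, has_tensorflow) copied by hand from the table above.
ROWS = [
    ("audio-classification", "audio", False),
    ("automatic-speech-recognition", "multimodal", False),
    ("text-to-audio", "text", False),
    ("feature-extraction", "multimodal", True),
    ("text-classification", "text", True),
    ("token-classification", "text", True),
    ("question-answering", "text", True),
    ("table-question-answering", "text", True),
    ("visual-question-answering", "multimodal", False),
    ("document-question-answering", "multimodal", False),
    ("fill-mask", "text", True),
    ("summarization", "text", True),
    ("translation", "text", True),
    ("text2text-generation", "text", True),
    ("text-generation", "text", True),
    ("zero-shot-classification", "text", True),
    ("zero-shot-image-classification", "multimodal", True),
    ("zero-shot-audio-classification", "multimodal", False),
    ("image-classification", "image", True),
    ("image-feature-extraction", "image", True),
    ("image-segmentation", "multimodal", False),
    ("image-to-text", "multimodal", True),
    ("image-text-to-text", "multimodal", False),
    ("object-detection", "multimodal", False),
    ("zero-shot-object-detection", "multimodal", False),
    ("depth-estimation", "image", False),
    ("video-classification", "video", False),
    ("mask-generation", "multimodal", False),
    ("image-to-image", "image", False),
]

total = len(ROWS)
both = sum(1 for _, _, has_tf in ROWS if has_tf)  # every row supports PyTorch
pt_only = total - both
by_type = Counter(kind for _, kind, _ in ROWS)

print(f"tasks: {total}, both frameworks: {both}, PyTorch-only: {pt_only}")
for kind, count in by_type.most_common():
    print(f"{kind}: {count}")
```

This gives 29 tasks in total, 15 with both frameworks and 14 PyTorch-only, split into 12 multimodal, 11 text, 4 image, 1 audio, and 1 video.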