
[Artificial Intelligence] Transformers Pipeline (Overview): Minimal-Code Use of 300k+ Models


Contents

1. Introduction

2. The pipeline library

2.1 Overview

2.2 Instantiating a pipeline by task

2.2.1 Task-based instantiation: automatic speech recognition

2.2.2 Task list

2.2.3 Default models per task

2.3 Instantiating a pipeline by model

2.3.1 Model-based instantiation: automatic speech recognition

2.3.2 Mapping between models and tasks

3. Summary


1. Introduction

pipeline is an abstraction in the Hugging Face transformers library for running large-model inference with minimal code. It groups all models into 4 broad categories, Audio, Computer Vision, NLP, and Multimodal, subdivided into 28 task types (tasks), covering more than 320,000 models in total.

This article introduces pipeline as a whole; subsequent posts in this column will take one task per article and walk through how to use each of them.

2. The pipeline library

2.1 Overview

Pipelines are a simple and convenient way to run inference with a model. They are objects that abstract most of the library's complex code away, exposing a simple API dedicated to a variety of tasks, including named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering. In practice, there are two main ways to use them (a minimal example follows the list):

  • Instantiate a pipeline by task
  • Instantiate a pipeline by model
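
As a quick taste before the detailed walkthroughs, here is a minimal sketch of the task-based style (the task and input sentence are illustrative; the first call downloads the task's default model):

from transformers import pipeline

# Task-based instantiation: pipeline picks the default model for the task
classifier = pipeline(task="text-classification")
print(classifier("Pipelines make inference easy."))
# Expected shape of the output: [{'label': 'POSITIVE', 'score': 0.99...}]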

2.2 Instantiating a pipeline by task

2.2.1 Task-based instantiation: automatic speech recognition

The task name for automatic speech recognition is automatic-speech-recognition:

import os

# Route downloads through a mirror when huggingface.co is unreachable;
# must be set before transformers is imported
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Expose only GPU 2 to this process
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# Task-only instantiation: the task's default model is downloaded and used
pipe = pipeline(task="automatic-speech-recognition")
result = pipe(speech_file)
print(result)
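
For speech recognition the pipeline returns a dictionary whose "text" field holds the transcription, e.g. {'text': '...'}.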

2.2.2 Task list

There are 28 task types in total. Sorted alphabetically, they are listed below; to switch tasks, simply replace the task argument in the code from 2.2.1:
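
  • audio-classification
  • automatic-speech-recognition
  • depth-estimation
  • document-question-answering
  • feature-extraction
  • fill-mask
  • image-classification
  • image-feature-extraction
  • image-segmentation
  • image-to-image
  • image-to-text
  • mask-generation
  • object-detection
  • question-answering
  • summarization
  • table-question-answering
  • text-classification
  • text-generation
  • text-to-audio
  • text2text-generation
  • token-classification
  • translation
  • video-classification
  • visual-question-answering
  • zero-shot-audio-classification
  • zero-shot-classification
  • zero-shot-image-classification
  • zero-shot-object-detection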

2.2.3 Default models per task

For each task, pipeline is preconfigured with a default model; you can see them in the pipeline source code:

SUPPORTED_TASKS = {
    "audio-classification": {
        "impl": AudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForAudioClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
        "type": "audio",
    },
    "automatic-speech-recognition": {
        "impl": AutomaticSpeechRecognitionPipeline,
        "tf": (),
        "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
    "text-to-audio": {
        "impl": TextToAudioPipeline,
        "tf": (),
        "pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (),
        "default": {"model": {"pt": ("suno/bark-small", "645cfba")}},
        "type": "text",
    },
    "feature-extraction": {
        "impl": FeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased", "935ac13"),
                "tf": ("distilbert/distilbert-base-cased", "935ac13"),
            }
        },
        "type": "multimodal",
    },
    "text-classification": {
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
                "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
            },
        },
        "type": "text",
    },
    "token-classification": {
        "impl": TokenClassificationPipeline,
        "tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (),
        "pt": (AutoModelForTokenClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
                "tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
            },
        },
        "type": "text",
    },
    "question-answering": {
        "impl": QuestionAnsweringPipeline,
        "tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (),
        "pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
                "tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
            },
        },
        "type": "text",
    },
    "table-question-answering": {
        "impl": TableQuestionAnsweringPipeline,
        "pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (),
        "tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
        "default": {
            "model": {
                "pt": ("google/tapas-base-finetuned-wtq", "69ceee2"),
                "tf": ("google/tapas-base-finetuned-wtq", "69ceee2"),
            },
        },
        "type": "text",
    },
    "visual-question-answering": {
        "impl": VisualQuestionAnsweringPipeline,
        "pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")},
        },
        "type": "multimodal",
    },
    "document-question-answering": {
        "impl": DocumentQuestionAnsweringPipeline,
        "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")},
        },
        "type": "multimodal",
    },
    "fill-mask": {
        "impl": FillMaskPipeline,
        "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),
        "pt": (AutoModelForMaskedLM,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilroberta-base", "ec58a5b"),
                "tf": ("distilbert/distilroberta-base", "ec58a5b"),
            }
        },
        "type": "text",
    },
    "summarization": {
        "impl": SummarizationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
        },
        "type": "text",
    },
    # This task is a special case as it's parametrized by SRC, TGT languages.
    "translation": {
        "impl": TranslationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
        "type": "text",
    },
    "text2text-generation": {
        "impl": Text2TextGenerationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        "type": "text",
    },
    "text-generation": {
        "impl": TextGenerationPipeline,
        "tf": (TFAutoModelForCausalLM,) if is_tf_available() else (),
        "pt": (AutoModelForCausalLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}},
        "type": "text",
    },
    "zero-shot-classification": {
        "impl": ZeroShotClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
            "config": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
        },
        "type": "text",
    },
    "zero-shot-image-classification": {
        "impl": ZeroShotImageClassificationPipeline,
        "tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("openai/clip-vit-base-patch32", "f4881ba"),
                "tf": ("openai/clip-vit-base-patch32", "f4881ba"),
            }
        },
        "type": "multimodal",
    },
    "zero-shot-audio-classification": {
        "impl": ZeroShotAudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("laion/clap-htsat-fused", "973b6e5"),
            }
        },
        "type": "multimodal",
    },
    "image-classification": {
        "impl": ImageClassificationPipeline,
        "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "5dca96d"),
                "tf": ("google/vit-base-patch16-224", "5dca96d"),
            }
        },
        "type": "image",
    },
    "image-feature-extraction": {
        "impl": ImageFeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "3f49326"),
                "tf": ("google/vit-base-patch16-224", "3f49326"),
            }
        },
        "type": "image",
    },
    "image-segmentation": {
        "impl": ImageSegmentationPipeline,
        "tf": (),
        "pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}},
        "type": "multimodal",
    },
    "image-to-text": {
        "impl": ImageToTextPipeline,
        "tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (),
        "pt": (AutoModelForVision2Seq,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("ydshieh/vit-gpt2-coco-en", "65636df"),
                "tf": ("ydshieh/vit-gpt2-coco-en", "65636df"),
            }
        },
        "type": "multimodal",
    },
    "object-detection": {
        "impl": ObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}},
        "type": "multimodal",
    },
    "zero-shot-object-detection": {
        "impl": ZeroShotObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}},
        "type": "multimodal",
    },
    "depth-estimation": {
        "impl": DepthEstimationPipeline,
        "tf": (),
        "pt": (AutoModelForDepthEstimation,) if is_torch_available() else (),
        "default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}},
        "type": "image",
    },
    "video-classification": {
        "impl": VideoClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForVideoClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}},
        "type": "video",
    },
    "mask-generation": {
        "impl": MaskGenerationPipeline,
        "tf": (),
        "pt": (AutoModelForMaskGeneration,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}},
        "type": "multimodal",
    },
    "image-to-image": {
        "impl": ImageToImagePipeline,
        "tf": (),
        "pt": (AutoModelForImageToImage,) if is_torch_available() else (),
        "default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}},
        "type": "image",
    },
}
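
If you would rather query these defaults programmatically than read the source, a sketch like the following should work; note that SUPPORTED_TASKS and get_supported_tasks live in transformers.pipelines, an internal layout that may shift between versions:

from transformers.pipelines import SUPPORTED_TASKS, get_supported_tasks

# All registered task names, alphabetically sorted
print(get_supported_tasks())

# The default PyTorch checkpoint (repo id, revision) for one task
asr_default = SUPPORTED_TASKS["automatic-speech-recognition"]["default"]
print(asr_default["model"]["pt"])  # ('facebook/wav2vec2-base-960h', '55bb623')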

2.3 Instantiating a pipeline by model

2.3.1 Model-based instantiation: automatic speech recognition

If you do not want the default model for a task, you can specify any model from the Hugging Face Hub instead:

import os

# Route downloads through a mirror when huggingface.co is unreachable;
# must be set before transformers is imported
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Expose only GPU 2 to this process
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# Equivalent, with the task spelled out explicitly:
# transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
# Model-only instantiation: the task is inferred from the model's metadata
pipe = pipeline(model="openai/whisper-medium")
result = pipe(speech_file)
print(result)
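
If you prefer to pick the device in the pipeline call rather than through CUDA_VISIBLE_DEVICES, here is a variant sketch (device= takes a CUDA device index; omit it to run on CPU):

from transformers import pipeline

# Pin both the task and the model, and place the model on GPU 0
transcriber = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-medium",
    device=0,  # CUDA device index; omit for CPU
)
print(transcriber("./output_video_enhanced.mp3"))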

2.3.2 Mapping between models and tasks

You can browse https://huggingface.co/tasks to see which models serve which task.
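
The same mapping can also be queried from code through the Hub API; a sketch, assuming a recent huggingface_hub (filter= matches models by tag, and pipeline task names are applied to models as tags):

from huggingface_hub import HfApi

api = HfApi()
# The five most-downloaded models tagged with a given pipeline task
for m in api.list_models(filter="automatic-speech-recognition",
                         sort="downloads", direction=-1, limit=5):
    print(m.id)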

3. Summary

This article is post 0 of the transformers pipeline column. Each later post will cover a single task, for 28+ tasks in total. Working through the pipeline usage of those 28 tasks will let you put the 300,000+ open-source models on Hugging Face, spanning speech, computer vision, NLP, multimodal, and even reinforcement learning, to practical use, and set you on the way to becoming an expert in the large-model field!

If you found this helpful, a like, favorite, and follow are much appreciated. If you have more time, feel free to read my other articles:

《AI Engineering》

AI Agent Development - Engineering (1): Docker for more efficient AI agent development

AI Agent Development - Engineering (2): One-click deployment of the Dify agent development platform

AI Agent Development - Engineering (3): One-click deployment of the Ollama model inference serving framework

AI Agent Development - Engineering (4): One-click deployment of the Xinference model inference serving framework

AI Agent Development - Engineering (5): One-click deployment of the LocalAI model inference serving framework

《AI Models》

AI Agent Development - Models (1): Installing, deploying, and using the LLaMA-Factory training framework on a Chinese network

AI Agent Development - Models (2): DeepSeek-V2-Chat training and inference in practice

AI Agent Development - Models (3): The open-source vs. closed-source debate for Chinese LLMs

AI Agent Development - Models (4): Getting started with PyTorch development in one article

AI Agent Development - Models (5): PyTorch vs TensorFlow, a source-level comparison of DNN network structures

AI Agent Development - Models (6): [Machine Learning] Build your first DNN with TensorFlow

AI Agent Development - Models (7): [Machine Learning] Build your first vision AI model with YOLOv10

AI Agent Development - Models (8): [Machine Learning] Qwen1.5-14B-Chat training and inference in practice

AI Agent Development - Models (9): [Machine Learning] GLM4-9B-Chat LLM and GLM-4V-9B multimodal LLM: overview, principles, and inference in practice

《AI Transformers Applications》

[AI LLMs] The Transformers library (1): Tokenizer

[AI LLMs] The Transformers library (2): AutoModelForCausalLM

[AI LLMs] The Transformers library (3): Special tokens

[AI LLMs] The Transformers library (4): AutoTokenizer

[AI LLMs] The Transformers library (5): AutoModel, model heads, and inspecting model structure
