Hugging Face overview


hellopbc · Published 2022-09-23 19:52:37

huggingface

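token: a unit produced by tokenization, i.e. each word (or subword piece, such as ##son) that a sentence is split into.

token_type_ids: the segment id of each token. For a sentence pair, every token that belongs to the first sentence gets id 0, and every token that belongs to the second sentence gets id 1.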

demo:

BERT input must be marked with [CLS] and [SEP]: [CLS] goes at the very beginning, and [SEP] goes at the end of each sentence.

Two sentences:

tokens: [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]

token_type_ids: 0 0 0 0 0 0 0 0 1 1 1 1 1 1

The first [SEP] belongs to the first sentence, so its token_type_id is 0.

Single sentence:

tokens: [CLS] the dog is hairy . [SEP]

token_type_ids: 0 0 0 0 0 0 0

See ref. bert相关-2
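The rule in the demo can be reproduced in a few lines of plain Python. This is only a minimal sketch: it assumes the sentences have already been split into WordPiece tokens, it does not call the transformers library, and the function name build_bert_input is a hypothetical helper made up for illustration.

```python
from typing import List, Optional, Tuple


def build_bert_input(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    """Wrap pre-tokenized sentences with [CLS]/[SEP] and build token_type_ids.

    Hypothetical helper for illustration only; not part of any library.
    """
    # First segment: [CLS] + sentence A + [SEP], all with type id 0.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    token_type_ids = [0] * len(tokens)

    if tokens_b is not None:
        # Second segment: sentence B + its closing [SEP], all with type id 1.
        segment_b = tokens_b + ["[SEP]"]
        tokens = tokens + segment_b
        token_type_ids = token_type_ids + [1] * len(segment_b)

    return tokens, token_type_ids


# Sentence pair from the demo.
pair_tokens, pair_type_ids = build_bert_input(
    ["is", "this", "jack", "##son", "##ville", "?"],
    ["no", "it", "is", "not", "."],
)

# Single sentence from the demo.
single_tokens, single_type_ids = build_bert_input(
    ["the", "dog", "is", "hairy", "."]
)

print(" ".join(pair_tokens))
print(" ".join(str(i) for i in pair_type_ids))
print(" ".join(single_tokens))
print(" ".join(str(i) for i in single_type_ids))
```

In practice a transformers tokenizer builds these fields itself: calling a BERT tokenizer on a pair of texts returns token_type_ids alongside input_ids, so a helper like this is useful mainly for checking your understanding of the layout.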

conda install -c huggingface transformers==4.11.3 tokenizers==0.10.3
