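The path you pass in must be a folder path. If you do not specify a language, English is the default, and you cannot specify a language whose language pack has not been downloaded. Note that CoreNLP cannot handle two different languages within the same text. For example, the sentence "他听到录音里说:‘Please open your textbook.’。" (He heard the recording say: 'Please open your textbook.') cannot be processed correctly by either nlp, which is set to English, or nlp_ch, which is set to Chinese.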
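One possible workaround for mixed-language text is to split it into single-script runs first and hand each run to the matching pipeline. The following is a minimal sketch of that splitting step only, under a simplifying assumption: a character counts as Chinese if it falls in the basic CJK ideograph range, as English if it is an ASCII letter, and everything else (digits, spaces, punctuation) stays with the run it follows. The names `char_script` and `split_by_script` are hypothetical helpers, not part of CoreNLP or its Python wrapper.

```python
def char_script(ch):
    """Classify one character as 'zh', 'en', or None (neutral)."""
    if '\u4e00' <= ch <= '\u9fff':      # basic CJK Unified Ideographs block
        return 'zh'
    if ch.isascii() and ch.isalpha():   # plain ASCII letters
        return 'en'
    return None                         # digits, spaces, punctuation, etc.


def split_by_script(text):
    """Split text into (language, segment) runs.

    Neutral characters are appended to the run that precedes them,
    so joining all segments reproduces the input exactly.
    """
    runs = []
    current_lang, buf = None, []
    for ch in text:
        lang = char_script(ch)
        # Start a new run only when the script actually changes.
        if lang is not None and current_lang is not None and lang != current_lang:
            runs.append((current_lang, ''.join(buf)))
            buf = []
        if lang is not None:
            current_lang = lang
        buf.append(ch)
    if buf:
        runs.append((current_lang, ''.join(buf)))
    return runs


if __name__ == "__main__":
    for lang, segment in split_by_script("他说Please open。"):
        print(lang, repr(segment))
```

Each run could then be passed to nlp or nlp_ch depending on its label. Because punctuation stays with the preceding run, sentence boundaries may end up in the "wrong" language segment, so treat this as a starting point rather than a complete solution.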
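# Only applies to strings that hold a list of tokens
def stringToList(x):
    # Strip the leading "['" and trailing "']", then split on the "', '" separators
    s = x[2:len(x)-2]
    l = s.split("', '")
    return l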
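The slicing approach above assumes every token is wrapped in single quotes and that no token contains the sequence `', '`. As a more defensive alternative, the string can be parsed as a Python literal with the standard library's `ast.literal_eval`. This is a sketch, not the original author's method, and `string_to_list` is a hypothetical name.

```python
import ast


def string_to_list(text):
    """Parse the string form of a token list back into a list of strings."""
    value = ast.literal_eval(text)  # safely evaluates literals only, never arbitrary code
    if not isinstance(value, list):
        raise ValueError("expected the string form of a list")
    return [str(item) for item in value]


if __name__ == "__main__":
    print(string_to_list("['This', 'is', 'an', 'example']"))
```

Unlike the split-based version, this also handles tokens that Python printed with double quotes, such as a token containing an apostrophe.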
3. Part-of-speech Tagging
nlp.pos_tag('This is an example of tokenziation.')  # Result: [('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'), ('example', 'NN'), ('of', 'IN'), ('tokenziation', 'NN'), ('.', '.')]
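Since the result is a plain list of (token, tag) pairs, it can be processed with ordinary Python. The sketch below copies the result shown above into a variable and counts how often each tag occurs; `tag_counts` and `words_with_tag` are hypothetical helper names.

```python
from collections import Counter

# The (token, tag) pairs copied from the result above.
tagged = [('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'), ('example', 'NN'),
          ('of', 'IN'), ('tokenziation', 'NN'), ('.', '.')]


def tag_counts(pairs):
    """Count how many tokens received each tag."""
    return Counter(tag for _, tag in pairs)


def words_with_tag(pairs, tag):
    """Return the tokens that were assigned exactly this tag."""
    return [word for word, t in pairs if t == tag]


if __name__ == "__main__":
    print(tag_counts(tagged))
    print(words_with_tag(tagged, 'NN'))
```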
Clause Level
S Simple declarative clause, i.e. one that is not introduced by a (possibly empty) subordinating conjunction or a wh-word and that does not exhibit subject-verb inversion.
SBAR Clause introduced by a (possibly empty) subordinating conjunction.
SBARQ Direct question introduced by a wh-word or a wh-phrase. Indirect questions and relative clauses should be bracketed as SBAR, not SBARQ.
SINV Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal.
SQ Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ.
Phrase Level
ADJP Adjective Phrase.
ADVP Adverb Phrase.
CONJP Conjunction Phrase.
FRAG Fragment.
INTJ Interjection. Corresponds approximately to the part-of-speech tag UH.
LST List marker. Includes surrounding punctuation.
NAC Not a Constituent; used to show the scope of certain prenominal modifiers within an NP.
NP Noun Phrase.
NX Used within certain complex NPs to mark the head of the NP. Corresponds very roughly to N-bar level but used quite differently.
PP Prepositional Phrase.
PRN Parenthetical.
PRT Particle. Category for words that should be tagged RP.
QP Quantifier Phrase (i.e. complex measure/amount phrase); used within NP.
RRC Reduced Relative Clause.
UCP Unlike Coordinated Phrase.
VP Verb Phrase.
WHADJP Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in how hot.
WHADVP Wh-adverb Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing a wh-adverb such as how or why.
WHNP Wh-noun Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. who, which book, whose daughter, none of which, or how many leopards.
WHPP Wh-prepositional Phrase. Prepositional phrase containing a wh-noun phrase (such as of which or by whose authority) that either introduces a PP gap or is contained by a WHNP.
X Unknown, uncertain, or unbracketable. X is often used for bracketing typos and in bracketing the…the-constructions.
Word Level
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun (prolog version PRP-S)
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun (prolog version WP-S)
WRB Wh-adverb
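Many of the word-level tags share a prefix (NN, NNS, NNP and NNPS are all nouns, for instance), so a prefix check is often enough to collapse them into coarse categories. The following sketch shows one way to do that; the four categories and the name `coarse_pos` are illustrative choices, not part of the Penn Treebank specification.

```python
# (prefix, coarse label) pairs, checked in order against each tag.
COARSE_PREFIXES = (
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("VB", "verb"),       # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS (WRB starts with W, so it is not matched)
)


def coarse_pos(tag):
    """Map a Penn Treebank word-level tag to a coarse category."""
    for prefix, label in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return label
    return "other"


if __name__ == "__main__":
    for tag in ("NNPS", "VBZ", "JJR", "RBS", "DT"):
        print(tag, coarse_pos(tag))
```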
The tags listed above are the Penn Treebank tag set [1]. The Chinese models, however, actually use the Penn Chinese Treebank Tagset, which differs from it; see the link below: