

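textstat is a Python package for computing text readability. At the document, paragraph, or sentence level, it can compute:

  • Syllable count: syllable_count

  • Word count: lexicon_count

  • Sentence count: sentence_count

  • A range of readability formulas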

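Currently supported languages are English (en), German (de), Spanish (es), French (fr), Italian (it), Dutch (nl), Polish (pl), and Russian (ru). Chinese is not supported yet.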

The available readability formulas are:

  • The Flesch Reading Ease formula

  • Flesch-Kincaid Grade Level

  • The Fog Scale (Gunning FOG Formula)

  • The SMOG Index

  • Automated Readability Index

  • The Coleman-Liau Index

  • Linsear Write Formula

  • Dale-Chall Readability Score
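As a rough illustration of what these formulas compute, the first two can be written directly from raw counts. This is a minimal sketch using the standard published coefficients; the function names and the example counts are made up for illustration, and textstat's own results may differ because of how it counts syllables and sentences and how it rounds.

```python
def reading_ease_from_counts(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (higher means easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def kincaid_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts, chosen only for illustration:
# 100 words, 5 sentences, 150 syllables.
ease = reading_ease_from_counts(100, 5, 150)
grade = kincaid_grade_from_counts(100, 5, 150)
print(ease, grade)
```

Both formulas depend only on average sentence length (words per sentence) and average word length (syllables per word), which is why the syllable, word, and sentence counts are the basic building blocks.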

Installation

!pip3 install textstat

Syllable count

textstat.syllable_count(text)

import textstat	
test = 'Playing games'	
textstat.syllable_count(test)

Run

    3

Word count

textstat.lexicon_count(text, removepunct=True)

test2 = "Playing games has always!"	
textstat.lexicon_count(test2, removepunct=True)

Run

    4
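To see what the removepunct flag changes, here is a minimal pure-Python sketch of the idea: optionally strip punctuation, then count whitespace-separated tokens. The helper name count_words and its remove_punct parameter are hypothetical, and this is not textstat's actual implementation.

```python
import string


def count_words(text, remove_punct=True):
    """Count whitespace-separated tokens, optionally stripping punctuation first."""
    if remove_punct:
        # Delete every ASCII punctuation character before splitting.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return len(text.split())


print(count_words("Playing games has always!"))          # 4
print(count_words("Wait - what?", remove_punct=True))    # 2
print(count_words("Wait - what?", remove_punct=False))   # 3
```

Punctuation attached to a word ("always!") does not change the count either way; the flag only matters when a punctuation mark stands alone as its own token.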

Readability

Each function below takes a text string and returns a readability value.

  • textstat.flesch_reading_ease(text)

  • textstat.smog_index(text)

  • textstat.flesch_kincaid_grade(text)

  • textstat.coleman_liau_index(text)

  • textstat.automated_readability_index(text)

  • textstat.dale_chall_readability_score(text)

  • textstat.difficult_words(text)

  • textstat.linsear_write_formula(text)

  • textstat.gunning_fog(text)

  • textstat.text_standard(text)

For how each formula is computed and how to interpret its score, see the project's GitHub page:

https://github.com/shivam5992/textstat

# Adjacent string literals inside parentheses are joined into one string
test_data = (
    "Playing games has always been thought to be important to the development of well-balanced "
    "and creative children; however, what part, if any, they should play in the lives of "
    "adults has never been researched that deeply. I believe that playing games is every bit "
    "as important for adults as for children. Not only is taking time out to play games with our "
    "children and other adults valuable to building interpersonal relationships but is also a wonderful way "
    "to release built up tension."
)

Run

print(textstat.flesch_reading_ease(test_data))	
print(textstat.smog_index(test_data))	
print(textstat.flesch_kincaid_grade(test_data))	
print(textstat.coleman_liau_index(test_data))	
print(textstat.automated_readability_index(test_data))	
print(textstat.dale_chall_readability_score(test_data))	
print(textstat.difficult_words(test_data))	
print(textstat.linsear_write_formula(test_data))	
print(textstat.gunning_fog(test_data))	
print(textstat.text_standard(test_data))

Run

    52.23
    12.5
    12.8
    11.03
    15.5
    6.72
    9
    16.333333333333332
    12.38
    12th and 13th grade
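The first value in the output above (52.23) is the Flesch Reading Ease score. A small helper can map such a score to the commonly cited difficulty bands. This is a sketch only: the function name describe_reading_ease is hypothetical, and the band boundaries follow the usual Flesch table, so check the project page for the authoritative interpretation.

```python
def describe_reading_ease(score):
    """Map a Flesch Reading Ease score to a commonly cited difficulty label."""
    # (lower bound, label) pairs, checked from easiest to hardest.
    bands = [
        (90, "Very Easy"),
        (80, "Easy"),
        (70, "Fairly Easy"),
        (60, "Standard"),
        (50, "Fairly Difficult"),
        (30, "Difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Very Confusing"


label = describe_reading_ease(52.23)
print(label)  # Fairly Difficult
```

Under these bands the sample passage reads as fairly difficult, which is in line with the grade-level estimates (roughly 11 to 13) reported by the other formulas.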
