Introduction to TextBlob



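TextBlob is an open-source text-processing library written in Python. It can be used for many natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation.
GitHub repository: https://github.com/sloria/TextBlob
Official documentation: https://textblob.readthedocs.io/en/dev/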

Installation:
pip install textblob

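It appears to be designed for processing English text.

Before TextBlob can run successfully, the following NLTK resources need to be downloaded: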

import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('brown')
nltk.download('wordnet')

Download the related corpora:
python -m textblob.download_corpora
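Before running the examples, it can help to confirm that the packages themselves are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration, and it checks only the Python packages, not whether the NLTK corpora have been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are installed.
print(missing_packages(['textblob', 'nltk']))
```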

Code examples:

# coding:utf-8
from textblob import TextBlob

text = 'I love natural language processing! i do not like you'
blob = TextBlob(text)

# Part-of-speech tagging

print('Part-of-speech tagging')
print(blob.tags)

'''
Output
[('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('i', 'NNS'), ('do', 'VBP'), ('not', 'RB'), ('like', 'IN'), ('you', 'PRP')]
'''
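The tag list is an ordinary list of (word, tag) tuples, so it can be filtered with plain Python. The sketch below works on a copy of the output shown above rather than calling TextBlob, and the helper name `words_with_tag_prefix` is an assumption made for this illustration. Note that the lowercase 'i' was tagged NNS in that output, so it ends up among the nouns.

```python
# Tag list copied from the output shown above.
tags = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
        ('processing', 'NN'), ('i', 'NNS'), ('do', 'VBP'), ('not', 'RB'),
        ('like', 'IN'), ('you', 'PRP')]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, 'NN')  # NN, NNS, ...
verbs = words_with_tag_prefix(tags, 'VB')  # VB, VBP, ...
print(nouns)
print(verbs)
```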

# Noun phrase extraction

phrases = blob.noun_phrases
print('Noun phrase extraction')
for phrase in phrases:
    print(phrase)

'''
Output
natural language processing
'''
# Compute the sentiment polarity of each sentence

print('Sentence sentiment polarity')
for sentence in blob.sentences:
    print(str(sentence) + '------>' + str(sentence.sentiment.polarity))

'''
Output
I love natural language processing!------>0.3125
i do not like you------>0.0
'''

# Split the text into words or sentences (Tokenization)

print('Tokenization into words and sentences')
token = blob.words
for w in token:
    print(w)

sentence = blob.sentences
for s in sentence:
    print(s)

'''
Output
I
love
natural
language
processing
i
do
not
like
you
I love natural language processing!
i do not like you
'''
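To make the idea of tokenization concrete, here is a rough regex-based sketch that reproduces the output above for this particular input. It is not how TextBlob does it: TextBlob relies on NLTK tokenizers (which is why `punkt` is downloaded earlier), and the function names `split_sentences` and `split_words` are made up for this illustration. The naive rules below would mishandle abbreviations such as "Dr." and many other cases.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [part for part in parts if part]


def split_words(text):
    """Naive word tokenizer: keep runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


text = 'I love natural language processing! i do not like you'
print(split_sentences(text))
print(split_words(text))
```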

# Word inflection

print('Word inflection')
token = blob.words
for w in token:
    # Plural form
    print(w.pluralize())
    # Singular form
    print(w.singularize())

'''
Output
we
I
love
love
naturals
natural
languages
language
processings
processing
is
i
does
do
nots
not
likes
like
you
you
'''

# Lemmatization

from textblob import Word

print('Lemmatization')
w = Word('went')
print(w.lemmatize('v'))
w = Word('octopi')
print(w.lemmatize())

'''
Output
go
octopus
'''

# WordNet integration

from textblob.wordnet import VERB

print('WordNet integration')
word = Word('octopus')
syn_word = word.synsets
for syn in syn_word:
    print(syn)

'''
Output
Synset('octopus.n.01')
Synset('octopus.n.02')
'''

# Restrict the returned synsets to verbs

print('Verb synsets only')
syn_word1 = Word("hack").get_synsets(pos=VERB)
for syn in syn_word1:
    print(syn)
Word("beautiful").definitions  # definitions of each synset (wrap in print() to display them in a script)

'''
Output
Synset('chop.v.05')
Synset('hack.v.02')
Synset('hack.v.03')
Synset('hack.v.04')
Synset('hack.v.05')
Synset('hack.v.06')
Synset('hack.v.07')
Synset('hack.v.08')
'''

# Spelling correction

print('Spelling correction')
sen = 'I lvoe naturl language processing!'
sen = TextBlob(sen)
print(sen.correct())

# Word.spellcheck() returns spelling suggestions together with their confidence scores

w1 = Word('good')
w2 = Word('god')
w3 = Word('gd')
print(w1.spellcheck())
print(w2.spellcheck())
print(w3.spellcheck())

'''
Output
I love nature language processing!
[('good', 1.0)]
[('god', 1.0)]
[('go', 0.586139896373057), ('god', 0.23510362694300518), ('d', 0.11658031088082901), ('g', 0.03626943005181347), ('ed', 0.009067357512953367), ('rd', 0.006476683937823834), ('nd', 0.0038860103626943004), ('gr', 0.0025906735751295338), ('sd', 0.0006476683937823834), ('md', 0.0006476683937823834), ('id', 0.0006476683937823834), ('gdp', 0.0006476683937823834), ('ga', 0.0006476683937823834), ('ad', 0.0006476683937823834)]
'''
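The TextBlob documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea with a single edit step and a toy frequency table. It is not TextBlob's code: the word counts are made up, the function names are assumptions, candidates two edits away are not considered, and returning a confidence of 0.0 for unknown words is a choice made for this sketch only.

```python
# Toy word counts. These numbers are made up for illustration.
WORD_COUNTS = {'love': 50, 'live': 30, 'natural': 20, 'nature': 25,
               'language': 15, 'good': 40, 'god': 10, 'go': 60}
LETTERS = 'abcdefghijklmnopqrstuvwxyz'


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggestions(word):
    """Return (candidate, confidence) pairs, highest confidence first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]  # sketch-only convention for unknown words
    total = sum(WORD_COUNTS[w] for w in candidates)
    scored = [(w, WORD_COUNTS[w] / total) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(suggestions('lvoe'))
print(suggestions('naturl'))
print(suggestions('gd'))
```

With these toy counts, "nature" outranks "natural" as a candidate for "naturl" simply because it is more frequent in the table. A frequency-driven ranking of this kind picks the most common nearby word, which is not always the one the writer intended.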

# Parsing

print('Parsing')
text = TextBlob('I lvoe naturl language processing!')
print(text.parse())

'''
Output
I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O language/NN/I-NP/O processing/NN/I-NP/O !/./O/O
'''
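The parse result is a plain string, so it can be split into structured tokens for further processing. The sketch below assumes that the four slash-separated fields of each token are the word, the part-of-speech tag, the chunk tag, and the prepositional-phrase tag. The `Token` tuple and the `read_parse` function are names made up for this illustration, and the input is a copy of the output shown above.

```python
from collections import namedtuple

# Field names are an assumption about the four slash-separated parts.
Token = namedtuple('Token', ['word', 'pos', 'chunk', 'pnp'])


def read_parse(parsed):
    """Split a parse string into Token tuples."""
    tokens = []
    for item in parsed.split():
        # Split from the right so the word part is kept whole.
        word, pos, chunk, pnp = item.rsplit('/', 3)
        tokens.append(Token(word, pos, chunk, pnp))
    return tokens


parsed = ('I/PRP/B-NP/O lvoe/NN/I-NP/O naturl/NN/I-NP/O '
          'language/NN/I-NP/O processing/NN/I-NP/O !/./O/O')
tokens = read_parse(parsed)

# Words whose chunk tag marks them as part of a noun phrase (B-NP or I-NP).
noun_phrase_words = [t.word for t in tokens if t.chunk.endswith('NP')]
print(noun_phrase_words)
```

Because the misspelled words were not corrected before parsing, "lvoe" and "naturl" are both tagged NN and the whole sentence is grouped into a single noun phrase.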
 
