1. NLTK: Natural Language Toolkit
pip install nltk
2. Usage
import nltk
nltk.download()  # download the data (with no arguments this opens the interactive downloader)
import nltk

text = 'Hello, Tom! How are you recently?'

sens = nltk.sent_tokenize(text)  # split the text into sentences
sens
# ['Hello, Tom!', 'How are you recently?']

words = []
for sen in sens:
    words.append(nltk.word_tokenize(sen))  # split each sentence into tokens
words
# [['Hello', ',', 'Tom', '!'], ['How', 'are', 'you', 'recently', '?']]

tags = []
for tokens in words:
    tags.append(nltk.pos_tag(tokens))  # part-of-speech tag each sentence
tags
# [[('Hello', 'NNP'), (',', ','), ('Tom', 'NNP'), ('!', '.')],
#  [('How', 'WRB'), ('are', 'VBP'), ('you', 'PRP'), ('recently', 'RB'), ('?', '.')]]
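The tags in that output are Penn Treebank part-of-speech tags. As a reading aid, here is a minimal sketch that maps the tags seen in this example to plain descriptions. The names TAG_MEANINGS and describe are made up for illustration and are not part of NLTK; the tagged data is copied from the output shown in the example, so the snippet runs without NLTK installed.

```python
# Descriptions of the Penn Treebank tags that appear in the example output.
# TAG_MEANINGS and describe() are illustrative helpers, not NLTK APIs.
TAG_MEANINGS = {
    "NNP": "proper noun, singular",
    "WRB": "wh-adverb",
    "VBP": "verb, non-3rd person singular present",
    "PRP": "personal pronoun",
    "RB": "adverb",
    ",": "comma",
    ".": "sentence-final punctuation",
}

# Tagged sentences copied from the example output.
tags = [
    [("Hello", "NNP"), (",", ","), ("Tom", "NNP"), ("!", ".")],
    [("How", "WRB"), ("are", "VBP"), ("you", "PRP"), ("recently", "RB"), ("?", ".")],
]


def describe(tagged_sentences):
    """Replace each tag with a readable description; unknown tags are kept as-is."""
    return [
        [(word, TAG_MEANINGS.get(tag, tag)) for word, tag in sentence]
        for sentence in tagged_sentences
    ]


for sentence in describe(tags):
    for word, meaning in sentence:
        print(f"{word!r}: {meaning}")
```

Only the seven tags that occur in this example are covered; any other tag is passed through unchanged.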
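3. Installation succeeds, but the import fails
nltk is installed successfully, but import nltk raises: No module named '_sqlite3'
Background: the Linux system ships with Python 2, and nltk installed and imported fine there. I then built Python 3 myself, and import nltk fails under that Python 3.
Solution: run sudo apt-get install sqlite*, then rebuild and reinstall Python 3. The _sqlite3 extension module is only compiled if the SQLite development headers are present when Python is built, so installing the packages alone is not enough; Python 3 has to be built again afterwards.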
# step 1: install the SQLite packages
sudo apt-get install sqlite*
# step 2: rebuild Python 3 (run from the Python 3 source directory)
./configure --prefix=/python3_path
make && make install
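After rebuilding, it can help to confirm that the new interpreter actually has the SQLite extension before trying import nltk again. The following is a minimal sketch using only the standard library; has_module is a hypothetical helper name, and the script should be run with the rebuilt Python 3 binary.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # _sqlite3 is the compiled extension; sqlite3 is the wrapper package that imports it.
    for name in ("_sqlite3", "sqlite3"):
        print(name, "available" if has_module(name) else "MISSING")
```

If _sqlite3 is still reported as MISSING, the build most likely did not find the SQLite headers, and the output of the configure and make steps is the place to look.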