Python NLTK Study Notes (2)

Example	Description
`fileids()`	the files of the corpus
`fileids([categories])`	the files of the corpus corresponding to these categories
`categories()`	the categories of the corpus
`categories([fileids])`	the categories of the corpus corresponding to these files
`raw()`	the raw content of the corpus
`raw(fileids=[f1,f2,f3])`	the raw content of the specified files
`raw(categories=[c1,c2])`	the raw content of the specified categories
`words()`	the words of the whole corpus
`words(fileids=[f1,f2,f3])`	the words of the specified fileids
`words(categories=[c1,c2])`	the words of the specified categories
`sents()`	the sentences of the whole corpus
`sents(fileids=[f1,f2,f3])`	the sentences of the specified fileids
`sents(categories=[c1,c2])`	the sentences of the specified categories
`abspath(fileid)`	the location of the given file on disk
`encoding(fileid)`	the encoding of the file (if known)
`open(fileid)`	open a stream for reading the given corpus file
`root()`	the path to the root of the locally installed corpus
`readme()`	the contents of the README file of the corpus

Load your own corpus
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = '/usr/share/dict' 
>>> wordlists = PlaintextCorpusReader(corpus_root, '.*') 
>>> wordlists.fileids()
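
The reader methods in the table above can be tried on a throwaway corpus. The following is a minimal sketch, assuming NLTK is installed; the helper `load_demo_corpus`, the file names and the sample sentences are made up for illustration. It sticks to `fileids()`, `raw()` and `words()`, since `sents()` additionally needs the Punkt tokenizer data to be downloaded.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader


def load_demo_corpus(root):
    """Write two small text files into root and return a reader over them.

    Hypothetical helper: file names and contents are illustration only.
    """
    samples = {
        "a.txt": "The quick brown fox.",
        "b.txt": "Jumps over the lazy dog.",
    }
    for name, text in samples.items():
        with open(os.path.join(root, name), "w", encoding="utf8") as f:
            f.write(text)
    # The second argument is a regular expression matched against file names.
    return PlaintextCorpusReader(root, r".*\.txt")


with tempfile.TemporaryDirectory() as corpus_root:
    reader = load_demo_corpus(corpus_root)
    demo_fileids = reader.fileids()            # files that matched the pattern
    demo_raw = reader.raw("a.txt")             # untokenized content of one file
    demo_words = list(reader.words("a.txt"))   # tokens from the default word tokenizer

print(demo_fileids)
print(demo_words)
```

The default word tokenizer splits punctuation from words, so the final period comes back as its own token.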

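import nltk

def unusual_words(text):
    # Lowercased alphabetic tokens of the text
    text_vocab = set(w.lower() for w in text if w.isalpha())
    # Reference vocabulary from the NLTK Words Corpus (requires nltk.download('words'))
    english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    unusual = text_vocab.difference(english_vocab)
    return sorted(unusual)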
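
To try the same idea without downloading the Words Corpus, the reference vocabulary can be passed in as an argument. This is only a sketch: the function name `unusual_words_against`, the parameter `reference_vocab` and the tiny word list are assumptions for illustration, not part of NLTK.

```python
def unusual_words_against(text, reference_vocab):
    """Return the sorted words in text that are missing from reference_vocab.

    Hypothetical variant of unusual_words: the reference list is a parameter
    instead of nltk.corpus.words.words().
    """
    # Keep alphabetic tokens only and compare case-insensitively.
    text_vocab = {w.lower() for w in text if w.isalpha()}
    known = {w.lower() for w in reference_vocab}
    return sorted(text_vocab - known)


tokens = ["The", "cat", "sat", "on", "the", "zyxt", "mat", ".", "42"]
reference = ["the", "cat", "sat", "on", "mat"]  # stand-in for a real word list

unusual = unusual_words_against(tokens, reference)
print(unusual)  # ['zyxt']
```

Punctuation and numbers never reach the comparison because `isalpha()` filters them out first.
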
Set:

Operation	Equivalent	Result
`len(s)`		cardinality of set s
`x in s`		test x for membership in s
`x not in s`		test x for non-membership in s
`s.issubset(t)`	`s <= t`	test whether every element in s is in t
`s.issuperset(t)`	`s >= t`	test whether every element in t is in s
`s.union(t)`	`s | t`	new set with elements from both s and t
`s.intersection(t)`	`s & t`	new set with elements common to s and t
`s.difference(t)`	`s - t`	new set with elements in s but not in t
`s.symmetric_difference(t)`	`s ^ t`	new set with elements in either s or t but not both
`s.copy()`		new set with a shallow copy of s
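
A short illustration of the operations in the table, using two small sets whose values are chosen only for the example:

```python
s = {"a", "b", "c"}
t = {"b", "c", "d"}

union = s | t                 # same as s.union(t)
common = s & t                # same as s.intersection(t)
only_in_s = s - t             # same as s.difference(t)
either_not_both = s ^ t       # same as s.symmetric_difference(t)
is_subset = {"b", "c"} <= s   # same as {"b", "c"}.issubset(s)

# A shallow copy is a new set object; changing it leaves s untouched.
s_copy = s.copy()
s_copy.add("z")

print(sorted(union))            # ['a', 'b', 'c', 'd']
print(sorted(common))           # ['b', 'c']
print(sorted(only_in_s))        # ['a']
print(sorted(either_not_both))  # ['a', 'd']
```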

>>> from nltk.corpus import stopwords

>>> stopwords.words('english')
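
A typical use of a stopword list is to filter tokens before counting. The sketch below assumes a tiny hand-written list, `demo_stopwords`, as a stand-in for `stopwords.words('english')` so that it runs without the corpus data; the helper name `remove_stopwords` is also made up for illustration.

```python
def remove_stopwords(tokens, stop_list):
    """Return the tokens that are not in stop_list, compared case-insensitively."""
    stop_set = {w.lower() for w in stop_list}  # set lookup is fast
    return [w for w in tokens if w.lower() not in stop_set]


# Stand-in list; with the corpus installed this could be stopwords.words('english').
demo_stopwords = ["the", "is", "on", "a"]
tokens = ["The", "cat", "is", "on", "the", "mat"]

content_words = remove_stopwords(tokens, demo_stopwords)
content_fraction = len(content_words) / len(tokens)

print(content_words)  # ['cat', 'mat']
```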

WordNet:

>>> from nltk.corpus import wordnet as wn

>>> wn.synsets('motorcar')

Original article: https://www.cnblogs.com/wintor12/p/3622283.html