  • Python NLTK study notes (2)

    Common corpus reader methods:
    Example                      Description
    fileids()                    the files of the corpus
    fileids([categories])        the files of the corpus corresponding to these categories
    categories()                 the categories of the corpus
    categories([fileids])        the categories of the corpus corresponding to these files
    raw()                        the raw content of the corpus
    raw(fileids=[f1,f2,f3])      the raw content of the specified files
    raw(categories=[c1,c2])      the raw content of the specified categories
    words()                      the words of the whole corpus
    words(fileids=[f1,f2,f3])    the words of the specified fileids
    words(categories=[c1,c2])    the words of the specified categories
    sents()                      the sentences of the whole corpus
    sents(fileids=[f1,f2,f3])    the sentences of the specified fileids
    sents(categories=[c1,c2])    the sentences of the specified categories
    abspath(fileid)              the location of the given file on disk
    encoding(fileid)             the encoding of the file (if known)
    open(fileid)                 open a stream for reading the given corpus file
    root                         the path to the root of the locally installed corpus (an attribute, not a method)
    readme()                     the contents of the README file of the corpus
    The categories-based calls only exist on categorized corpora (e.g. Brown, Reuters).
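The category-related rows can be tried without downloading any NLTK data. The sketch below assumes only that the nltk package is installed; it writes three tiny files into a temporary directory (the layout, file names and texts are made up for illustration) and opens them with CategorizedPlaintextCorpusReader, using cat_pattern so that the first directory component of each fileid becomes its category.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical toy corpus laid out as <category>/<file>.txt under a temp root.
docs = {
    "news/n1.txt": "Stocks rose today.",
    "news/n2.txt": "Rain is expected tomorrow.",
    "fiction/f1.txt": "Once upon a time.",
}
root = tempfile.mkdtemp()
for fileid, text in docs.items():
    path = os.path.join(root, *fileid.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# fileids: a regex over paths relative to root (NLTK joins them with '/').
# cat_pattern: group(1) of this regex, matched against each fileid, is its category.
reader = CategorizedPlaintextCorpusReader(root, r".*\.txt", cat_pattern=r"(\w+)/")

print(reader.categories())                            # ['fiction', 'news']
print(reader.fileids(categories=["news"]))            # ['news/n1.txt', 'news/n2.txt']
print(reader.categories(fileids=["fiction/f1.txt"]))  # ['fiction']
print(reader.raw(fileids=["news/n1.txt"]))            # Stocks rose today.
print(list(reader.words(categories=["fiction"])))     # ['Once', 'upon', 'a', 'time', '.']
```

A plain PlaintextCorpusReader has no categories() method, which is why the table's category arguments only work on readers like this one.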
    Load your own corpus:
    >>> from nltk.corpus import PlaintextCorpusReader
    >>> corpus_root = '/usr/share/dict'
    >>> wordlists = PlaintextCorpusReader(corpus_root, '.*')
    >>> wordlists.fileids()
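/usr/share/dict only exists on some Unix systems, so the sketch below builds its own throwaway directory instead (the file names and contents are invented). It shows that the second argument of PlaintextCorpusReader is a regular expression over file paths: with r'.*\.txt' the Markdown file is left out of the corpus. It assumes the nltk package is installed; no corpus download is needed because only fileids(), raw() and words() are used, and words() relies on this reader's default WordPunctTokenizer, which splits punctuation into separate tokens.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Hypothetical corpus directory: two .txt files plus one file to be ignored.
corpus_root = tempfile.mkdtemp()
for name, text in {
    "a.txt": "The cat sat on the mat.",
    "b.txt": "Don't panic!",
    "notes.md": "Not matched by the fileids pattern.",
}.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as f:
        f.write(text)

# The regex must match the whole relative path for a file to become a fileid.
wordlists = PlaintextCorpusReader(corpus_root, r".*\.txt")

print(wordlists.fileids())             # ['a.txt', 'b.txt']
print(wordlists.raw("a.txt"))          # The cat sat on the mat.
print(list(wordlists.words("b.txt")))  # ['Don', "'", 't', 'panic', '!']
```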

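    Finding unusual words (tokens missing from the Words Corpus):
    import nltk

    def unusual_words(text):
        # Needs the Words Corpus; download it once with nltk.download('words').
        text_vocab = set(w.lower() for w in text if w.isalpha())  # alphabetic tokens only
        english_vocab = set(w.lower() for w in nltk.corpus.words.words())
        unusual = text_vocab.difference(english_vocab)  # same as text_vocab - english_vocab
        return sorted(unusual)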
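At its core unusual_words is a single set difference: the lowercased alphabetic tokens of the text minus the lowercased English vocabulary. The sketch below keeps that logic but takes the vocabulary as a parameter, so it runs without downloading the Words Corpus; the function name unusual_words_against and the toy vocabulary are hypothetical stand-ins for nltk.corpus.words.words().

```python
def unusual_words_against(text, vocabulary):
    """Hypothetical variant of unusual_words with the vocabulary passed in."""
    text_vocab = set(w.lower() for w in text if w.isalpha())  # drops "!", "42", ...
    english_vocab = set(w.lower() for w in vocabulary)
    return sorted(text_vocab - english_vocab)  # set difference, then sort


# Toy vocabulary standing in for the (much larger) Words Corpus.
toy_vocab = ["the", "cat", "sat", "on", "mat"]
tokens = ["The", "cat", "sta", "on", "the", "mat", "!", "Zorblax", "42"]
print(unusual_words_against(tokens, toy_vocab))  # ['sta', 'zorblax']
```

Lowercasing both sides means "The" counts as known, and isalpha() keeps punctuation and numbers out of the result.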
    Set:
    Operation                    Equivalent   Result
    len(s)                                    cardinality of set s
    x in s                                    test x for membership in s
    x not in s                                test x for non-membership in s
    s.issubset(t)                s <= t       test whether every element in s is in t
    s.issuperset(t)              s >= t       test whether every element in t is in s
    s.union(t)                   s | t        new set with elements from both s and t
    s.intersection(t)            s & t        new set with elements common to s and t
    s.difference(t)              s - t        new set with elements in s but not in t
    s.symmetric_difference(t)    s ^ t        new set with elements in either s or t but not both
    s.copy()                                  new set with a shallow copy of s
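Each method in the table has an operator equivalent that gives the same result. One difference the table does not show: the method forms accept any iterable as their argument, while the operators generally require set operands. The sketch checks both points on two small example sets (the values are arbitrary).

```python
s = {1, 2, 3}
t = {3, 4}

# Each row of the table as (method form, operator form).
pairs = {
    "union": (s.union(t), s | t),
    "intersection": (s.intersection(t), s & t),
    "difference": (s.difference(t), s - t),
    "symmetric_difference": (s.symmetric_difference(t), s ^ t),
    "issubset": ({1, 2}.issubset(s), {1, 2} <= s),
    "issuperset": (s.issuperset({1, 2}), s >= {1, 2}),
}
for name, (method_result, operator_result) in pairs.items():
    assert method_result == operator_result  # both forms agree
    print(f"{name:22} {method_result}")

# Methods take any iterable; the | operator rejects a list operand.
from_list = s.union([5, 6])
try:
    s | [5, 6]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# copy() returns a new set object, so changing it leaves s untouched.
c = s.copy()
c.add(99)
```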

    Stopwords:
    >>> from nltk.corpus import stopwords
    >>> stopwords.words('english')
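stopwords.words('english') returns a plain list of high-frequency function words, typically used to filter them out of a text. The sketch below measures what fraction of tokens survive that filter; the function name content_fraction and the four-word stopword list are hypothetical stand-ins so the example runs without the stopwords download. The list is converted to a set first so each lookup is a hash probe rather than a scan of the whole list.

```python
def content_fraction(tokens, stop_words):
    """Return the share of tokens that are not stopwords (case-insensitive)."""
    if not tokens:
        return 0.0
    stop = {w.lower() for w in stop_words}  # set lookup instead of list scan
    content = [w for w in tokens if w.lower() not in stop]
    return len(content) / len(tokens)


# Toy stand-in for stopwords.words('english'); the real list is much longer.
toy_stopwords = ["the", "on", "a", "is"]
tokens = ["The", "cat", "sat", "on", "the", "mat"]
print(content_fraction(tokens, toy_stopwords))  # 0.5
```

With the real list, run nltk.download('stopwords') once and pass stopwords.words('english') as the second argument.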

     

    WordNet:
    >>> from nltk.corpus import wordnet as wn
    >>> wn.synsets('motorcar')

  • Original post: https://www.cnblogs.com/wintor12/p/3622283.html