  • NLTK: Commonly Used Functions

    1. Functions Defined for NLTK's Frequency Distributions

    Example                        Description
    fdist = FreqDist(samples)      create a frequency distribution containing the given samples
    fdist[sample] += 1             increment the count for this sample
    fdist['monstrous']             count of the number of times a given sample occurred
    fdist.freq('monstrous')        frequency of a given sample
    fdist.N()                      total number of samples
    fdist.most_common(n)           the n most common samples and their frequencies
    for sample in fdist:           iterate over the samples
    fdist.max()                    sample with the greatest count
    fdist.tabulate()               tabulate the frequency distribution
    fdist.plot()                   graphical plot of the frequency distribution
    fdist.plot(cumulative=True)    cumulative plot of the frequency distribution
    fdist1 |= fdist2               update fdist1 with counts from fdist2 (keeps the larger count per sample)
    fdist1 < fdist2                test if samples in fdist1 occur less frequently than in fdist2
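
As a rough illustration of the frequency distribution operations listed above, the sketch below runs a few of them on a made-up token list. It assumes NLTK 3 is installed; the sample sentence and the variable names are hypothetical and not part of the original notes. The plotting calls are left out because they need matplotlib.

```python
from nltk import FreqDist

# Hypothetical sample text; any iterable of tokens works.
tokens = "the cat sat on the mat and the dog sat too".split()

fdist = FreqDist(tokens)          # count every token
fdist["the"] += 1                 # increment one sample by hand

count_the = fdist["the"]          # 3 occurrences + 1 manual increment = 4
total = fdist.N()                 # sum of all counts = 12
freq_the = fdist.freq("the")      # count_the / total
top_two = fdist.most_common(2)    # [('the', 4), ('sat', 2)]
most_frequent = fdist.max()       # 'the'

# In NLTK 3, FreqDist subclasses collections.Counter, so |= is a union
# that keeps the larger of the two counts for each sample.
fdist1 = FreqDist("aab")          # a: 2, b: 1
fdist2 = FreqDist("abbb")         # a: 1, b: 3
fdist1 |= fdist2                  # a: 2, b: 3

print(count_the, total, round(freq_the, 3), top_two, most_frequent)
print(dict(fdist1))
```

If the goal is to add the counts of two distributions rather than take the per-sample maximum, `fdist1.update(fdist2)` is the Counter method that does so.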

    2. Some Word Comparison Operators

    Function           Meaning
    s.startswith(t)    test if s starts with t
    s.endswith(t)      test if s ends with t
    t in s             test if t is a substring of s
    s.islower()        test if s contains cased characters and all are lowercase
    s.isupper()        test if s contains cased characters and all are uppercase
    s.isalpha()        test if s is non-empty and all characters in s are alphabetic
    s.isalnum()        test if s is non-empty and all characters in s are alphanumeric
    s.isdigit()        test if s is non-empty and all characters in s are digits
    s.istitle()        test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals)
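
These are ordinary Python string methods, so they can be tried without NLTK. The sketch below filters a small word list with each test; the word list is invented for illustration.

```python
# Hypothetical word list chosen to exercise each test.
words = ["Moby", "Dick", "whale", "HARPOON", "sailing",
         "1851", "ship42", "Call Me Ishmael"]

starts_wh = [w for w in words if w.startswith("wh")]
ends_ing = [w for w in words if w.endswith("ing")]
contains_ai = [w for w in words if "ai" in w]
lowercase = [w for w in words if w.islower()]   # digits are ignored, so 'ship42' qualifies
uppercase = [w for w in words if w.isupper()]
alphabetic = [w for w in words if w.isalpha()]  # spaces and digits disqualify a string
alphanumeric = [w for w in words if w.isalnum()]
digits = [w for w in words if w.isdigit()]
titlecase = [w for w in words if w.istitle()]

print(lowercase)   # ['whale', 'sailing', 'ship42']
print(titlecase)   # ['Moby', 'Dick', 'Call Me Ishmael']
```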

    3. Basic Corpus Functionality Defined in NLTK

    Example                      Description
    fileids()                    the files of the corpus
    fileids([categories])        the files of the corpus corresponding to these categories
    categories()                 the categories of the corpus
    categories([fileids])        the categories of the corpus corresponding to these files
    raw()                        the raw content of the corpus
    raw(fileids=[f1,f2,f3])      the raw content of the specified files
    raw(categories=[c1,c2])      the raw content of the specified categories
    words()                      the words of the whole corpus
    words(fileids=[f1,f2,f3])    the words of the specified fileids
    words(categories=[c1,c2])    the words of the specified categories
    sents()                      the sentences of the whole corpus
    sents(fileids=[f1,f2,f3])    the sentences of the specified fileids
    sents(categories=[c1,c2])    the sentences of the specified categories
    abspath(fileid)              the location of the given file on disk
    encoding(fileid)             the encoding of the file (if known)
    open(fileid)                 open a stream for reading the given corpus file
    root                         the path to the root of the locally installed corpus
    readme()                     the contents of the README file of the corpus
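
The corpus reader methods above can be tried without downloading any NLTK data by pointing a reader at a small temporary directory. The sketch below is one possible setup, assuming NLTK 3 and its `CategorizedPlaintextCorpusReader`; the two files, their contents, and the directory-as-category convention are all hypothetical. `sents()` is omitted because it relies on the Punkt sentence tokenizer data, which has to be downloaded separately.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical two-file corpus; the top-level directory name acts as the category.
documents = {
    "news/a.txt": "Stocks rose today.",
    "fiction/b.txt": "The whale surfaced at dawn.",
}

with tempfile.TemporaryDirectory() as root_dir:
    # Write the sample files to disk.
    for fileid, text in documents.items():
        path = os.path.join(root_dir, *fileid.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)

    # cat_pattern extracts the category from each fileid (here: the directory name).
    reader = CategorizedPlaintextCorpusReader(
        root_dir, r".*\.txt", cat_pattern=r"(\w+)/.*"
    )

    all_files = reader.fileids()
    news_files = reader.fileids(categories=["news"])
    all_categories = reader.categories()
    news_raw = reader.raw(fileids=["news/a.txt"])
    news_words = list(reader.words(categories=["news"]))
    corpus_root = str(reader.root)
    location = str(reader.abspath("news/a.txt"))

print(all_files)        # ['fiction/b.txt', 'news/a.txt']
print(all_categories)   # ['fiction', 'news']
print(news_words)       # ['Stocks', 'rose', 'today', '.']
```

The built-in corpora such as `nltk.corpus.brown` expose the same methods once their data has been fetched with `nltk.download`.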

    4. NLTK's Conditional Frequency Distributions

    Example                                 Description
    cfdist = ConditionalFreqDist(pairs)     create a conditional frequency distribution from a list of pairs
    cfdist.conditions()                     the conditions
    cfdist[condition]                       the frequency distribution for this condition
    cfdist[condition][sample]               frequency (count) for the given sample for this condition
    cfdist.tabulate()                       tabulate the conditional frequency distribution
    cfdist.tabulate(samples, conditions)    tabulation limited to the specified samples and conditions
    cfdist.plot()                           graphical plot of the conditional frequency distribution
    cfdist.plot(samples, conditions)        graphical plot limited to the specified samples and conditions
    cfdist1 < cfdist2                       test if samples in cfdist1 occur less frequently than in cfdist2
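
A conditional frequency distribution is built from (condition, sample) pairs and holds one FreqDist per condition. The sketch below shows this on a handful of invented pairs, assuming NLTK 3; the genre labels and words are hypothetical. Note that in NLTK 3 the `samples` and `conditions` restrictions are passed to `tabulate()` as keyword arguments.

```python
from nltk import ConditionalFreqDist

# Hypothetical (condition, sample) pairs, e.g. (genre, word).
pairs = [
    ("news", "the"), ("news", "market"), ("news", "the"),
    ("romance", "love"), ("romance", "the"), ("romance", "love"),
]

cfdist = ConditionalFreqDist(pairs)

conditions = sorted(cfdist.conditions())      # ['news', 'romance']
news_the = cfdist["news"]["the"]              # 2
romance_top = cfdist["romance"].max()         # 'love'
unseen = cfdist["news"]["love"]               # 0: missing samples count as zero

# Print a table restricted to the chosen conditions and samples.
cfdist.tabulate(conditions=["news", "romance"], samples=["the", "love"])
```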
  • Original article: https://www.cnblogs.com/LCharles/p/10774738.html