zoukankan      html  css  js  c++  java
  • Python自然语言处理学习笔记(16):2.8 Exercises 练习

     博主懒人...练习未做完,下回补全...

     2.8 Exercises 练习

    1. Create a variable phrase containing a list of words. Experiment with the operations described in this chapter, including addition, multiplication, indexing, slicing, and sorting.

    List_practice=['Hello','World!']
    List_practice
    +['Pythoner']
    List_practice
    *3
    List_practice.index(‘Hello’)
    List_practice[:]
    List_practice.sort()

    2. Use the corpus module to explore austen-persuasion.txt. How many word tokens does this book have? How many word types?

    from nltk.corpus import gutenberg
    austen 
    =gutenberg.words('austen-persuasion.txt')
    len(austen)
    =98171
     
    不太明白这里的单词种类是指单词有多少种还是单词类型有多少?
    len(
    set((s.lower() for s in austen if s.isalpha())))
    5739

    3. Use the Brown Corpus reader nltk.corpus.brown.words() or the Web Text Corpus reader nltk.corpus.webtext.words() to access some sample text in two different genres.

    from nltk.corpus import brown
    romance_text
    =brown.words(categories='romance')
    print 
    ' '.join(romance_text[:50])
     
    from nltk.corpus import webtext
    firefox_text
    =webtext.words('firefox.txt')
    print 
    ' '.join(firefox_text[:50])

    4. Read in the texts of the State of the Union addresses, using the state_union corpus reader. Count occurrences of men, women, and people in each document. What has happened to the usage of these words over time?

    import nltk
    from nltk.corpus import inaugural
    cfd 
    = nltk.ConditionalFreqDist(
        (target, fileid[:
    4])
        
    for fileid in inaugural.fileids()
        
    for w in inaugural.words(fileid)
        
    for target in ['men''women''people']
        
    if w.lower().startswith(target))
    cfd.plot()

     


    men:内战和二次大战的时候出现的比较多,参军保卫国家呗,冷战之后频率下降

    women:1909年之前为0!之后开始出现,1913年最多,从1949年到1977年又为0,估计那会都去搞苏联,冷战,建立北约什么去了。1981年之后又开始出现,看来米国的性别歧视相当严重。

    people:只要是打仗的时候出现的非常频率...人民啊人民,仅挂在嘴上的这群政客!跑题了...

    7/16更新 发现审题审错了...应该是State of the Union addresses(国情咨文),不是就职演讲...重做!

    import nltk
    from nltk.corpus import state_union
    cfd 
    = nltk.ConditionalFreqDist(
            (target,fileid[:
    4])
            
    for fileid in state_union.fileids()
            
    for w in state_union.words(fileid)
            
    for target in ['men''women''people']
            
    if w.lower().startswith(target))
    cfd.plot()


    5. Investigate the holonym-meronym relations for some nouns. Remember that there are three kinds of holonym-meronym(整体部分) relation, so you need to use member_meronyms(),part_meronyms(),substance_meronyms(), member_holonyms(),

    part_holonyms(), and substance_holonyms().

    from nltk.corpus import wordnet as wn
    N
    =wn.synset('human.n.01')
    >>> N
    Synset(
    'homo.n.02')
    >>> N. member_meronyms()
    []
    >>> N.part_meronyms()
    [Synset(
    'human_head.n.01'), Synset('foot.n.01'), Synset('mane.n.02'), Synset('loin.n.02'), Synset('face.n.01'), Synset('arm.n.01'), Synset('hand.n.01'), Synset('human_body.n.01'), Synset('body_hair.n.01')]
    >>> N.substance_meronyms()
    []
    >>> N.member_holonyms()
    [Synset(
    'genus_homo.n.01')]
    >>> N.part_holonyms()
    []
    >>> N.substance_holonyms()
    []
     
     
    >>> F=wn.synset('fish.n.01')
    >>> F
    Synset(
    'fish.n.01')
    >>> F.member_meronyms()
    []
    >>> F.part_meronyms()
    [Synset(
    'tail_fin.n.03'), Synset('milt.n.02'), Synset('fin.n.06'), Synset('fishbone.n.01'), Synset('roe.n.02'), Synset('fish_scale.n.01'), Synset('lateral_line.n.01')]
    >>> F.substance_meronyms()
    []
    >>> F.member_holonyms()
    [Synset(
    'school.n.07'), Synset('pisces.n.04')]
    >>> F.substance_holonyms()
    []
    >>> F.part_holonyms()
    []

    6.  In the discussion of comparative wordlists, we created an object called translate, which you could look up using words in both German and Italian in order to get corresponding words in English. What problem might arise with this approach? Can you suggest a way to avoid this problem?

    如果输入错误、不存在的词语或者其他没有通过translate.update(dict(XX2en))加入字典的语言词语,则会引发KeyError。添加一个错误处理或者用if判断

    7. According to Strunk and Whites Elements of Style, the word however, used at the start of a sentence, means “in whatever way” or “to whatever extent,” and not nevertheless.” They give this example of correct usage: However you advise him, he will probably do as he thinks best. (http://www.bartleby.com/141/strunk3.html) Use the concordance tool to study actual usage of this word in the various texts we have been considering. See also the LanguageLog posting “Fossilized prejudices about ‘however’” at http://itre.cis.upenn.edu/~myl/languagelog/archives/001913.html.

    emma = nltk.Text(nltk.corpus.gutenberg.words('austen-emma.txt'))
    emma.concordance(
    "However")

    None
  • 相关阅读:
    oracle中varchar2和nvarchar2的区别
    Hbuilder与夜神连接
    BUILD 2015: Visual Studio对GitHub的支持
    Visual Studio Developer Assistant 3月新功能展示
    Visual Studio 发布新版API智能提示
    微软发布手机版 Sample Browser。7000多示例代码一手掌握
    微软 PowerShell Script Explorer 满血复活,正式发布
    IBM的“认知计算时代”
    最新 Windows 10 应用项目模板发布
    Windows 10四大版本区别详解:家庭版, 专业版, 企业版和教育版
  • 原文地址:https://www.cnblogs.com/yuxc/p/2129043.html
Copyright © 2011-2022 走看看