Natural Language 22_WordNet with NLTK

    WordNet with NLTK

    Functions for English synonyms and antonyms
    # -*- coding: utf-8 -*-
    """
    Spyder Editor
    
    Functions for English synonyms and antonyms
    """
    
    import nltk
    from nltk.corpus import wordnet
    syns=wordnet.synsets('program')
    '''
    syns
    Out[11]: 
    [Synset('plan.n.01'),
     Synset('program.n.02'),
     Synset('broadcast.n.02'),
     Synset('platform.n.02'),
     Synset('program.n.05'),
     Synset('course_of_study.n.01'),
     Synset('program.n.07'),
     Synset('program.n.08'),
     Synset('program.v.01'),
     Synset('program.v.02')]
    
    '''
    
    print(syns[0].name())
    
    '''
    plan.n.01
    '''    
        
    # just the word: show only the name of the first lemma
    print(syns[0].lemmas()[0].name())    
    '''
    plan
    '''    
    # example sentences that use this sense of the word
    print(syns[0].examples())
    '''
    ['they drew up a six-step plan', 'they discussed plans for a new bond issue']
    '''    
    
    '''
    synonyms=[]
    antonyms=[]
    
    list_good=wordnet.synsets("good")
    for syn in list_good:
        for l in syn.lemmas():
            #print('l.name()',l.name())
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    
    print(set(synonyms))
    print(set(antonyms))
    '''
    
    word="good"
    # return the sets of synonyms and antonyms of a word
    def Word_synonyms_and_antonyms(word):
        synonyms=[]
        antonyms=[]
        for syn in wordnet.synsets(word):
            for l in syn.lemmas():
                # every lemma of every sense counts as a synonym
                synonyms.append(l.name())
                # keep the first antonym recorded for this lemma, if any
                if l.antonyms():
                    antonyms.append(l.antonyms()[0].name())
        return (set(synonyms),set(antonyms))
    
    # return the set of synonyms of a word
    def Word_synonyms(word):
        list_synonyms_and_antonyms=Word_synonyms_and_antonyms(word)
        return list_synonyms_and_antonyms[0]
        
        
    # return the set of antonyms of a word
    def Word_antonyms(word):
        list_synonyms_and_antonyms=Word_synonyms_and_antonyms(word)
        return list_synonyms_and_antonyms[1]    
    
    
    '''
    Word_synonyms("evil")
    Out[43]: 
    {'evil',
     'evilness',
     'immorality',
     'iniquity',
     'malefic',
     'malevolent',
     'malign',
     'vicious',
     'wickedness'}
    
    Word_antonyms('evil')
    Out[44]: {'good', 'goodness'}
    '''
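Word_synonyms_and_antonyms above pools every part of speech and keeps only the first antonym of each lemma. Below is a minimal sketch of a variant, assuming the same NLTK objects; the function name `synonyms_and_antonyms` and its `pos` parameter are hypothetical, not part of NLTK. It takes the synset list as an argument (so the logic can be exercised without the WordNet data), filters senses with `Synset.pos()`, and keeps every antonym. Adjective satellites, which WordNet tags `'s'`, are folded into `'a'`.

```python
def synonyms_and_antonyms(synsets, pos=None):
    """Return (synonyms, antonyms) as sets of lemma names.

    synsets: iterable of objects with .pos() and .lemmas(), e.g. the list
             returned by nltk.corpus.wordnet.synsets(word).
    pos:     None for all senses, or a WordNet POS code: 'n', 'v', 'a', 'r'.
             Adjective satellites ('s') are treated as adjectives ('a').
    """
    synonyms, antonyms = set(), set()
    for syn in synsets:
        syn_pos = 'a' if syn.pos() == 's' else syn.pos()
        if pos is not None and syn_pos != pos:
            continue  # skip senses with a different part of speech
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # keep every antonym of the lemma, not only the first one
            for ant in lemma.antonyms():
                antonyms.add(ant.name())
    return synonyms, antonyms
```

With NLTK and its WordNet data installed, a call such as `synonyms_and_antonyms(wordnet.synsets("good"), pos="a")` should restrict the result to adjective senses.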
    

     




    WordNet is a lexical database for the English language, created at Princeton University, and is available as part of the NLTK corpora.

    You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.

    First, you're going to need to import wordnet:

    from nltk.corpus import wordnet

    Then, we're going to use the term "program" to find synsets (sets of synonyms), like so:

    syns = wordnet.synsets("program")

    An example of a synset:

    print(syns[0].name())

    plan.n.01
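A synset name such as `plan.n.01` packs three pieces of information: a representative word for the synset (which need not be the word you searched for: here `plan`, not `program`), a part-of-speech code, and a two-digit sense number. A minimal sketch of pulling it apart in plain Python; `parse_synset_name` and `SynsetName` are hypothetical names, and NLTK's own `Synset.pos()` already returns the part of speech directly.

```python
from typing import NamedTuple

# WordNet part-of-speech codes as they appear in synset names
POS_NAMES = {
    'n': 'noun',
    'v': 'verb',
    'a': 'adjective',
    's': 'adjective satellite',
    'r': 'adverb',
}


class SynsetName(NamedTuple):
    lemma: str   # representative word, e.g. 'plan'
    pos: str     # one of the POS_NAMES keys
    sense: int   # sense number, e.g. 1 for '01'


def parse_synset_name(name: str) -> SynsetName:
    """Split a synset name such as 'plan.n.01' into its three parts.

    rsplit with maxsplit=2 keeps any dots inside the word part intact.
    """
    lemma, pos, sense = name.rsplit('.', 2)
    if pos not in POS_NAMES:
        raise ValueError(f'unknown part of speech {pos!r} in {name!r}')
    return SynsetName(lemma, pos, int(sense))


if __name__ == '__main__':
    parts = parse_synset_name('plan.n.01')
    print(parts.lemma, POS_NAMES[parts.pos], parts.sense)  # plan noun 1
```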

    Just the word:

    print(syns[0].lemmas()[0].name())

    plan

    Definition of that first synset:

    print(syns[0].definition())

    a series of steps to be carried out or goals to be accomplished

    Examples of the word in use:

    print(syns[0].examples())

    ['they drew up a six-step plan', 'they discussed plans for a new bond issue']
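The name, lemmas, definition and examples can be combined into a short dictionary-style listing of every sense of a word. A minimal sketch, assuming only the `Synset` methods used above plus `lemma_names()`; `describe_senses` is a hypothetical helper name, and it accepts any objects offering those methods.

```python
def describe_senses(synsets):
    """Return one line per sense: synset name, lemma names, definition, first example.

    `synsets` is any iterable of objects offering name(), lemma_names(),
    definition() and examples(), such as the list from wordnet.synsets(word).
    """
    lines = []
    for syn in synsets:
        synonyms = ', '.join(syn.lemma_names())
        line = f'{syn.name()} [{synonyms}]: {syn.definition()}'
        examples = syn.examples()
        if examples:  # many synsets have no example sentence
            line += f' (e.g. "{examples[0]}")'
        lines.append(line)
    return lines
```

With NLTK installed, `for line in describe_senses(wordnet.synsets("program")): print(line)` would print one line per sense.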

    Next, how might we find the synonyms and antonyms of a word? The lemmas of its synsets are the synonyms, and you can call .antonyms() on each lemma to find its antonyms. As such, we can populate some lists like:

    synonyms = []
    antonyms = []
    
    for syn in wordnet.synsets("good"):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    
    print(set(synonyms))
    print(set(antonyms))
    {'beneficial', 'just', 'upright', 'thoroughly', 'in_force', 'well', 'skilful', 'skillful', 'sound', 'unspoiled', 'expert', 'proficient', 'in_effect', 'honorable', 'adept', 'secure', 'commodity', 'estimable', 'soundly', 'right', 'respectable', 'good', 'serious', 'ripe', 'salutary', 'dear', 'practiced', 'goodness', 'safe', 'effective', 'unspoilt', 'dependable', 'undecomposed', 'honest', 'full', 'near', 'trade_good'}
    {'evil', 'evilness', 'bad', 'badness', 'ill'}
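Several of the synonyms above are multi-word lemmas whose words WordNet joins with underscores (`in_force`, `trade_good`), and a Python set prints in no particular order. A small sketch for tidying such a list before showing it to a reader; `readable_lemmas` is a hypothetical helper, not an NLTK function.

```python
def readable_lemmas(names):
    """De-duplicate WordNet lemma names and make multi-word ones readable.

    Underscores ('in_force') become spaces ('in force'); the result is
    sorted alphabetically, ignoring case.
    """
    return sorted({name.replace('_', ' ') for name in names}, key=str.lower)
```

For example, `readable_lemmas(synonyms)` on the list built above would give a sorted list in which `in_force` appears as `in force`.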

    As you can see, we got many more synonyms than antonyms: only a few lemmas have an antonym recorded, and for each of those we kept just the first one. You could easily balance this by doing the exact same process for the term "bad."
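One way to follow that suggestion, sketched under assumptions: treat the synonyms of each direct antonym as additional antonyms. `expanded_antonyms` is a hypothetical helper; `lookup` is any function returning a `(synonyms, antonyms)` pair, such as `Word_synonyms_and_antonyms` from the script at the top. Because WordNet pools every sense of a word, this also pulls in synonyms of unrelated senses of the antonym, so expect some noise.

```python
def expanded_antonyms(word, lookup):
    """Direct antonyms of `word` plus the synonyms of each direct antonym.

    lookup: function mapping a word to a (synonyms, antonyms) pair of sets.
    """
    _, direct = lookup(word)
    expanded = set(direct)
    for antonym in direct:
        synonyms, _ = lookup(antonym)
        expanded |= synonyms
    expanded.discard(word)  # never report the word as its own antonym
    return expanded
```

With the script loaded, `expanded_antonyms("good", Word_synonyms_and_antonyms)` would combine the antonym set printed above with the synonyms of "bad", "evil" and the rest.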

    Comparing word similarity

    Next, we can also easily use WordNet to compare the similarity of two word senses (synsets), using the Wu-Palmer measure of semantic relatedness.
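For reference, the Wu-Palmer score relates how deep two synsets sit in the hypernym hierarchy to how deep their least common subsumer (their deepest shared ancestor) sits:

```latex
\mathrm{wup}(s_1, s_2) = \frac{2\,\mathrm{depth}\big(\mathrm{lcs}(s_1, s_2)\big)}{\mathrm{depth}(s_1) + \mathrm{depth}(s_2)}
```

The score is 1 for identical synsets and falls toward 0 as the shared ancestor moves closer to the root. NLTK's `wup_similarity` follows this idea, but its depth-counting conventions (how the root is counted, which path is used when a synset has several hypernyms) may differ from a hand count, so treat the formula as the general shape rather than a recipe for reproducing NLTK's numbers digit for digit.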

    Let's compare the noun senses of "ship" and "boat":

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('boat.n.01')
    print(w1.wup_similarity(w2))

    0.9090909090909091

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('car.n.01')
    print(w1.wup_similarity(w2))

    0.6956521739130435

    w1 = wordnet.synset('ship.n.01')
    w2 = wordnet.synset('cat.n.01')
    print(w1.wup_similarity(w2))

    0.38095238095238093
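To see where numbers like these come from, here is a minimal sketch of the Wu-Palmer formula on a tiny hand-made taxonomy. The taxonomy `PARENT` and all helper names are hypothetical; real WordNet is far larger and lets a synset have several hypernyms, so the values differ from NLTK's, but the ordering ship-boat > ship-car > ship-cat comes out the same way.

```python
# Hypothetical toy taxonomy (child -> parent). Unlike real WordNet, every
# node here has exactly one parent, so there is a single path to the root.
PARENT = {
    'entity': None,
    'object': 'entity',
    'artifact': 'object',
    'vehicle': 'artifact',
    'vessel': 'vehicle',
    'ship': 'vessel',
    'boat': 'vessel',
    'car': 'vehicle',
    'organism': 'object',
    'animal': 'organism',
    'cat': 'animal',
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth with the root counted as 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError(f'{a!r} and {b!r} share no ancestor')


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def rank_by_similarity(target, candidates):
    """Sort candidates from most to least similar to target."""
    return sorted(candidates, key=lambda c: wup(target, c), reverse=True)


if __name__ == '__main__':
    for other in ('boat', 'car', 'cat'):
        print(other, round(wup('ship', other), 3))
```

Run as a script, this prints `boat 0.833`, `car 0.727` and `cat 0.364`: the closer the shared ancestor is to the two words, the higher the score.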

    Next, we're going to pick things up a bit and begin to cover the topic of Text Classification.

Original post: https://www.cnblogs.com/webRobot/p/6080208.html