  • Comprehensive exercise: word frequency counting

    1. English word frequency count

    The code is as follows:

    from collections import Counter

    # Read the lyrics; the with-block closes the file automatically
    with open('lyric.txt', 'r') as f:
        lyric = f.read()

    punctuation = ''',.?/:;'"'''
    # Stop words to leave out of the count
    stop_words = {'in','on','with','by','for','at','about','under','of','i','a','is','its','so','and','dont','it','to','ill','the'}

    # Strip punctuation, lowercase, trim, and split on whitespace
    for ch in punctuation:
        lyric = lyric.replace(ch, '')
    result = lyric.lower().strip()
    tempwords = result.split()
    print(tempwords)

    # Count whole tokens. result.count(word) would also match substrings,
    # e.g. 'he' inside 'the', and inflate the totals.
    count = Counter(w for w in tempwords if w not in stop_words)
    for word, n in count.items():
        print('Word ' + word + ' occurs ' + str(n) + ' times')

    # Sort by frequency, highest first
    countList = count.most_common()
    print(countList)

    # Write the top 20 entries (fewer if the text has fewer distinct words).
    # Mode 'a' appends, so rerunning the script adds another copy of the results.
    with open('lyricCount.txt', 'a') as f:
        for word, n in countList[:20]:
            f.write(word + ':' + str(n) + '\n')
    

      [Figure: screenshot of the program output]
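
The difference between counting whole words and counting substrings is easy to miss, so here is a minimal sketch of it. The sample sentence, the small stop-word set, and the helper name `count_words` are assumptions made up for illustration; they are not part of the original exercise.

```python
import re
from collections import Counter

# Hypothetical stop-word set, kept small for the example
STOP_WORDS = {"the", "a", "is", "in", "on", "and", "to", "of"}

def count_words(text, stop_words=STOP_WORDS):
    """Count whole-word occurrences, ignoring case and stop words."""
    # Drop apostrophes so "don't" becomes "dont", then pull out runs of letters
    tokens = re.findall(r"[a-z]+", text.lower().replace("'", ""))
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample text
sample = "The cat is in the hat. The hat is on the cat, and the cat naps."
counts = count_words(sample)

print(counts.most_common(2))      # [('cat', 3), ('hat', 2)]
print(counts["at"])               # 0: "at" never appears as a word
print(sample.lower().count("at")) # 5: str.count matches inside "cat" and "hat"
```

`Counter.most_common(n)` returns at most `n` pairs, so it also works on texts with fewer than `n` distinct words.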

    2. Chinese word frequency count

    The code is as follows:

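    import jieba
    from collections import Counter

    # Read the novel as UTF-8; the with-block closes the file automatically
    with open('sanguoyanyi.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    # Register names so that jieba keeps each one as a single token
    jieba.add_word('曹操')
    jieba.add_word('诸葛亮')
    jieba.add_word('孔明')
    punctuation = ''',。‘’“”:;()!?、 '''
    # Stop words: whitespace tokens and common single characters to leave out
    stop_words = {'的','\n','\u3000','曰','之','不','人','军','操','一','将',
         '大','马','来','德','有','于','下','兵','此',
         '玄','公','见','为','何','中','而','可','吾',
         '出','也','以','与','上','后','今','其','去',
         '日','明','言'}
    for ch in punctuation:
        text = text.replace(ch, '')

    # Segment the text into words
    tempwords = list(jieba.cut(text))
    print(tempwords)

    # Count segmented tokens. text.count(word) would also match the word
    # inside longer words and inflate the totals.
    count = Counter(w for w in tempwords if w not in stop_words)

    # Sort by frequency, highest first
    countList = count.most_common()
    print(countList)

    # Write the top 20 entries (fewer if there are fewer distinct words).
    # Mode 'a' appends, so rerunning the script adds another copy of the results.
    with open('zzzCount.txt', 'a', encoding='utf-8') as f:
        for word, n in countList[:20]:
            f.write(word + ':' + str(n) + '\n')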
    

     [Figure: screenshot of the program output]
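
The counting and reporting steps can be tried without the novel or jieba installed. The sketch below takes a hand-written token list that stands in for jieba's output; that list, the two stop words, and the helper names `top_tokens` and `format_report` are assumptions for illustration only.

```python
from collections import Counter

def top_tokens(tokens, stop_words, n=20):
    """Return up to n (token, count) pairs, most frequent first."""
    counts = Counter(
        t for t in tokens
        if t.strip() and t not in stop_words  # skip whitespace-only tokens
    )
    return counts.most_common(n)

def format_report(pairs):
    """Render pairs as 'word:count' lines, the layout used in the output file."""
    return "".join(f"{word}:{count}\n" for word, count in pairs)

# Hand-written tokens standing in for a segmenter's output (assumption)
tokens = ["曹操", "曰", "孔明", "笑", "曰", "曹操", "大", "怒", "\n", "孔明", "曹操"]
stop_words = {"曰", "大"}

pairs = top_tokens(tokens, stop_words, n=20)
print(format_report(pairs), end="")
```

Only four distinct tokens survive the filter here, and asking for 20 still works because `most_common` returns whatever is available. Indexing the sorted list with `range(20)` would raise `IndexError` in the same situation.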

  • Original article: https://www.cnblogs.com/zzrf/p/8658484.html