  • Comprehensive exercise: word frequency count

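    Download the lyrics of an English song or an English article

    Replace all separators such as , . ? ! ’ : with spaces

    Convert all uppercase letters to lowercase

    Generate the word list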

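    # Read the whole text file
    with open('news.txt','r') as f:
        news=f.read()
    # Replace each separator character with a space
    sep=''',.'!"?:'''
    for c in sep:
        news=news.replace(c,' ')
    # Lowercase and split once, after all separators are replaced
    wordList=news.lower().split()

    for w in wordList:
        print(w)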
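
    The separator-replacement loop above can also be done in a single pass with str.translate. The following is a minimal sketch, assuming the same separator characters as the code above; the helper name tokenize and the constant SEP are hypothetical and do not appear in the original post.

```python
# Hypothetical helper (not in the original post); same separators as above.
SEP = ''',.'!"?:'''

def tokenize(text, sep=SEP):
    """Replace every separator character with a space, lowercase, and split."""
    # str.maketrans with two equal-length strings maps each separator to a space.
    table = str.maketrans(sep, ' ' * len(sep))
    return text.translate(table).lower().split()

if __name__ == '__main__':
    print(tokenize('Hello, World! "Is it?"'))  # ['hello', 'world', 'is', 'it']
```

    Because the apostrophe is in the separator set, a contraction such as don't is split into two tokens, exactly as in the loop version.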

    with open('news.txt','r') as f:
        news=f.read()
    sep=''',.'!"?:'''
    for c in sep:
        news=news.replace(c,' ')
    wordList=news.lower().split()
    # Count how many times each distinct word occurs
    wordDict={}
    wordSet=set(wordList)
    for w in wordSet:
        wordDict[w]=wordList.count(w)
    for w in wordDict:
        print(w,wordDict[w])

    with open('news.txt','r') as f:
        news=f.read()
    sep=''',.'!"?:'''
    # Words to leave out of the count
    exclude={'be','i','so','over','hearing'}
    for c in sep:
        news=news.replace(c,' ')
    wordList=news.lower().split()
    wordDict={}
    # Set difference drops the excluded words before counting
    wordSet=set(wordList)-exclude
    for w in wordSet:
        wordDict[w]=wordList.count(w)
    for w in wordDict:
        print(w,wordDict[w])

    with open('news.txt','r') as f:
        news=f.read()
    sep=''',.'!"?:'''
    exclude={'be','i','so','over','hearing'}
    for c in sep:
        news=news.replace(c,' ')
    wordList=news.lower().split()
    wordDict={}
    wordSet=set(wordList)-exclude
    for w in wordSet:
        wordDict[w]=wordList.count(w)

    # Sort (word, count) pairs by count, highest first
    dic=sorted(wordDict.items(),key=lambda d:d[1],reverse=True)
    print(dic)
    # Print the 20 most frequent words; slicing avoids an IndexError on short texts
    for item in dic[:20]:
        print(item)
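
    The counting, excluding, and sorting steps above can be condensed with collections.Counter from the standard library. This is only a sketch of an alternative, not the original author's method; the function name top_words and its parameters are assumptions made for illustration.

```python
from collections import Counter

def top_words(words, exclude=frozenset(), n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Count in one pass instead of calling list.count() once per distinct word.
    counts = Counter(w for w in words if w not in exclude)
    # most_common(n) simply returns fewer pairs when there are fewer distinct words.
    return counts.most_common(n)

if __name__ == '__main__':
    # Hypothetical sample input standing in for the word list built above.
    sample = 'so be it so it goes it is'.split()
    for word, count in top_words(sample, exclude={'be', 'so'}, n=2):
        print(word, count)
```

    Calling list.count() for every distinct word rescans the whole list each time, so a single counting pass may be noticeably faster on long texts.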

    # Print the raw text for reference
    with open('news.txt','r') as f:
        text=f.read()
    print(text)

  • Original article: https://www.cnblogs.com/dean666/p/8653994.html