zoukankan      html  css  js  c++  java
  • 综合练习:英文词频统计

    1. 词频统计预处理
    2. 下载一首英文的歌词或文章
    3. 将所有,.?!’:等分隔符全部替换为空格
    4. 将所有大写转换为小写
    5. 生成单词列表
    6. 生成词频统计
    7. 排序
    8. 排除语法型词汇,代词、冠词、连词
    9. 输出词频最大TOP10
      str='''Earthquake early warning detection is more effective for minor quakes than major ones.
      This is according to a new study from the United States Geological Survey.
      Seismologists modelled ground shaking along California's San Andreas Fault, where an earthquake of magnitude 6.5 or more is expected within 30 years.
      They found that warning time could be increased for residents if they were willing to tolerate a number of "false alarms" for smaller events.
      This would mean issuing alerts early in an earthquake's lifespan, before its full magnitude is determined. Those living far from the epicentre would occasionally receive warnings for ground shaking they could not feel.
      "We can get [greater] warning times for weak ground motion levels, but we can't get long warning times for strong shaking," Sarah Minson, lead author of the study, told BBC News.
      "Alternatively, we could warn you every time there was an earthquake that might produce weak ground shaking at your location... A lot of baby earthquakes don't grow up to become big earthquakes," she added.
      The physics of earthquakes is one of the reasons why a single, universal warning system hasn't been rolled out across all quake prone countries.
      California and Japan have populations living directly alongside fault lines, and cannot waste precious seconds before warning their citizens.
      In both countries, the p-waves and some very rapid algorithms determine the potential magnitude and dispatch an alert.
      But in Mexico, the capital city is about 300km from the nearest tectonic plate boundary.
      This allows geologists to use a system that can take some more time to issue a warning. They wait to detect the s-waves.
      Sirens blare in the streets of Mexico City whenever ground shaking above M5 is detected.'''
      
      # 将分隔符替换为空格
      symbol=[".",",","'",'"']
      for i in range(len(symbol)):
          str=str.replace(symbol[i]," ")
      
      # 将所有大写转换为小写
      str=str.lower()
      
      # 生成单词列表
      str=str.split()
      
      d=dict(zip())
      #生成词频统计
      for key in str:
          d[key]=str.count(key)
      
      #排除语法型词汇,代词、冠词、连词
      str1=['a','an','more','for','is','of','to','from','or','that','if','the','were','in','s','not','can','get','could','might','up','and','this','t']
      for i in str1:
          del d[i]
      
      # 排序
      d=sorted(d.items(),key=lambda e:e[1],reverse=True)
      
      # 输出词频最大TOP10
      for i in range(10):
          print(d[i])
  • 相关阅读:
    工具.MySQL
    SqlServer.日期时间格式化输出(资料)
    SqlServer2012.安装
    SQL.【转】获取存储过程返回值的几种方式
    SQL.【转】SqlServer如何获取存储过程的返回值
    SQL.@,@@、#,##
    Oracle10g.CentOS6安装_遇到的问题(02)
    jQuery FileUpload 插件[转]
    EF6+Oracle12c+DBFirst+VS2015:EF6.0添加实体模型闪退问题解决
    IIS
  • 原文地址:https://www.cnblogs.com/ffde/p/8652553.html
Copyright © 2011-2022 走看看