  • Composite Data Types and English Word Frequency Counting

    This assignment comes from https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2753

    1. How to add, delete, modify, look up, and traverse items in lists, tuples, dictionaries, and sets.

    List operations:

    list1 = ['speakingSirqin', 'softqin', 1999, 2000]
    list2 = [1, 2, 3, 4, 5 ]
    list3 = ["a", "b", "c", "d"]
    # Output
    print(list1)
    print(list2)
    print(list3)
    print(list2[0:2])  # slice from index 0 up to, but not including, index 2
    # Add
    list1.insert(2,'lili')
    list1.insert(5,'qin')  # index 5 equals the list length here, so this appends
    print(list1)
    # Delete
    list2.pop(0)
    print(list2)
    # Modify
    list3[1]='A'
    print(list3)
    # Look up
    index=list3.index('c')
    print("Index of 'c' in list3:",index)
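
The list example above covers insert, pop, item assignment, and index, but not traversal or the other common add and delete methods. The following is a minimal sketch of those; the `fruits` list and its values are hypothetical sample data, not part of the original assignment.

```python
# Hypothetical sample list, not from the original post
fruits = ["apple", "banana", "cherry"]

# Add: append one item to the end, or extend with another iterable
fruits.append("date")
fruits.extend(["elderberry", "fig"])

# Delete: remove() deletes by value, del deletes by index
fruits.remove("banana")
del fruits[0]

# Look up: a membership test avoids the ValueError that index() raises
# when the value is missing
has_fig = "fig" in fruits

# Traverse: plain iteration, then enumerate() for index and value together
for fruit in fruits:
    print(fruit)
for position, fruit in enumerate(fruits):
    print(position, fruit)
```

Using `in` before calling `index()` is one way to guard a lookup; catching `ValueError` would work as well.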

    2. Tuples

    tup1 = ('Google', 'Runoob', 1997, 2000)
    tup2 = (1, 2, 3, 4, 5, 8)
    tup3 = "a", "b", "c", "d"
    # Output
    print(tup1)
    print(tup2)
    print(tup3)
    print(tup1[0])
    print(tup2[1:3])
    # Concatenate tuples
    tup4=tup1+tup2+tup3
    print(tup4)
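
The assignment question also asks about modifying tuples and about dictionaries and sets, which the examples above do not show. Below is a rough sketch of those operations; all names and values (`point`, `scores`, `tags`) are made up for illustration.

```python
# Tuples are immutable: item assignment raises TypeError,
# so "modifying" a tuple means building a new one.
point = (1, 2, 3)
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)
for value in point:            # traversal works like a list
    print(value)

# Dictionary: add, modify, delete, look up, traverse
scores = {"alice": 90, "bob": 75}
scores["carol"] = 82           # add a new key
scores["bob"] = 80             # modify an existing key
del scores["alice"]            # delete a key
bob_score = scores.get("bob")  # look up; get() returns None if the key is missing
for name, score in scores.items():
    print(name, score)

# Set: add, delete, look up, traverse (sets have no item assignment)
tags = {"python", "list"}
tags.add("tuple")
tags.discard("list")           # discard() does not raise if the item is absent
has_python = "python" in tags
for tag in sorted(tags):       # sets are unordered, so sort for stable output
    print(tag)
```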

    3. Word frequency counting

    import pandas as pd

    # Read the article and normalize it to lowercase
    with open('artical.txt', encoding='utf-8') as file:
        text = file.read().lower()

    # Strip punctuation, then split into words
    for ch in '?!",.':
        text = text.replace(ch, '')
    words = text.split()

    # Count each word, skipping common stop words
    exclude = ['a', 'the', 'and', 'if', 'you', 'in', 'but', 'not', 'it', 'i']
    counts = {}
    for w in words:
        if w not in exclude and w not in counts:
            counts[w] = words.count(w)
    print(counts)

    # Sort the words by count, descending
    word = list(counts.items())
    word.sort(key=lambda x: x[1], reverse=True)
    print(word)

    # Print the top 20 words (fewer if the text has fewer distinct words)
    for item in word[:20]:
        print(item)

    pd.DataFrame(data=word).to_csv('b.csv', encoding='utf-8')
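
The same counting steps can also be written with `collections.Counter` from the standard library, which avoids rescanning the word list for every distinct word. This is an alternative sketch, not the method used in the post; the helper name `top_words` and the sample sentence are hypothetical.

```python
from collections import Counter


def top_words(text, exclude=(), n=20):
    """Return the n most frequent words in text, skipping words in exclude."""
    cleaned = text.lower()
    # Strip the same punctuation characters as the script above
    for ch in '?!",.':
        cleaned = cleaned.replace(ch, "")
    words = [w for w in cleaned.split() if w not in exclude]
    # most_common() returns (word, count) pairs sorted by count, descending
    return Counter(words).most_common(n)


# Made-up sample text, used here instead of reading artical.txt
sample = "No way, no way! There's no way to get away."
result = top_words(sample, exclude={"to"}, n=3)
print(result)
```

The return value has the same shape as the sorted `word` list, a list of `(word, count)` tuples, so it could be passed to `pd.DataFrame` in the same way.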

    Top 20 output:

    ('no', 44)
    ("there's", 12)
    ('get', 11)
    ('let', 9)
    ('away', 9)
    ('way', 9)
    ('for', 9)
    ('broken', 9)
    ('to', 8)
    ('be', 7)
    ('that', 7)
    ("don't", 7)
    ("i'm", 7)
    ('hope', 7)
    ('girl', 6)
    ('wanna', 6)
    ('cause', 5)
    ('one', 5)
    ("can't", 5)
    ('gotta', 5)

    Visualization: word cloud

    Save the sorted word list, word, to a CSV file:

    import pandas as pd
    pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
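
With these arguments, `to_csv` also writes the DataFrame's integer index as a first column and a `0,1` header row. If a plain two-column file is preferred for import into a word cloud tool, one option is the standard library's `csv` module. The sketch below is an assumption about what such a file might look like: the `word`/`count` column names are chosen for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io

# First three rows of the top 20 output shown earlier
word = [("no", 44), ("there's", 12), ("get", 11)]

# In-memory buffer for demonstration; for a real file, something like
# open('big.csv', 'w', newline='', encoding='utf-8') would be used instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["word", "count"])  # hypothetical header names
writer.writerows(word)              # one row per (word, count) tuple

csv_text = buffer.getvalue()
print(csv_text)
```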

    Generate the word cloud with an online tool:
    https://wordart.com/create
  • Original article: https://www.cnblogs.com/gzcchyf/p/10552067.html