zoukankan      html  css  js  c++  java
  • 20180918博客作业

    作业要求参见 https://edu.cnblogs.com/campus/nenu/2018fall/homework/2126

    本次作业代码地址https://git.coding.net/zsy1996/text.git代码为wf.py文件

    1.根据ASCII码表来去掉特殊字符,比如,。!“”等

       定义词频字典

       循环判定每个单词的频率

    根据字典的value每个单词词频排序

    def getFrequency(testtext):
        testtext = re.sub('[^a-zA-Z0-9n]', ' ', testtext) #根据ASCII码表去掉特殊字符,比如,。!“”等
        frequency = {} #定义词频字典
        for word in testtext.split(): #循环判定每个单词频率
            if word in frequency:
                frequency[word] += 1
            else:
                frequency[word] = 1
        frequency = sorted(frequency.items(),key = lambda x:x[1],reverse = True)

     2.全文共有___个不重复的单词,以及每个单词出现的次数

    根据文本的词汇量总数来判定显示多少个数据

     print('全文共有',len(frequency),'个不重复的单词')
    
        if (len(frequency) > 100):
            for x in range(0,10):
                a = frequency[x][0]
                b = frequency[x][1]       
                print('单词',a,'出现的次数为',b)
        else:
            for x in range(0,len(frequency)):
                a = frequency[x][0]
                b = frequency[x][1]       
                print('单词',a,'出现的次数为',b)

    3.输出单一文本

    def inputfc(inputtxt):      
        with open(inputtxt,encoding = 'UTF-8') as wf: 
            getFrequency(wf.read())     

     4.输入文件夹统计文件夹内文本函数

    def inputfilefc(self):
        name_delete = '([sS]*?).txt'
        txtlist = []
        txtlist = os.listdir(inputfile)
        for i in range(0,len(txtlist)):
            a = re.findall(name_delete, txtlist[i])
            print(a)
            inputfc(txtlist[i])

    功能实现

    1. 测试文本,无重难点

     

    代码

    for word in testtext.split(): #循环判定每个单词频率
            if word in frequency:
                frequency[word] += 1
            else:
                frequency[word] = 1
        frequency = sorted(frequency.items(),key = lambda x:x[1],reverse = True) #根据字典的value(每个单词词频)排序
        print('') #为了好看
        print('全文共有',len(frequency),'个不重复的单词')

     2.重点是文件名后面一定要输入.txt

       def(定义函数)inputfc输入文本到python

     

    输入文本地址

    inputtxt = input() #输入文本地址
    inputfc(inputtxt)

    def(定义函数)inputfc输入文本到python

    def inputfc(inputtxt):      
        with open(inputtxt,encoding = 'UTF-8') as wf: 
            getFrequency(wf.read())     

     

    3.输入文件夹,对文件夹内的所有文本进行词频统计

     

     

     

    重点:导入文本

    def inputfilefc(self)

    txtlist = os.listdir(inputfile)导入文件夹

    def inputfilefc(self):
        name_delete = '([sS]*?).txt'
        txtlist = []
        txtlist = os.listdir(inputfile)
        for i in range(0,len(txtlist)):
            a = re.findall(name_delete, txtlist[i])
            print(a)
            inputfc(txtlist[i])

    统计结果

    4.

    重点:command = input()键盘输入指令

    将数据导入CSV文档中

    command = 1
    while command:
        print("输入1导入文档,输入其他退出程序,回车键确认!")
        command = input()
        if command == '1':
            print("请输入导入的文档地址,回车键确认!")
            inputcommand = input()
            with open(inputcommand,encoding = 'UTF-8') as wf: 
                outtxt = wf.read()
                with open("1.csv","w", newline='') as csvfile: 
                    writer = csv.writer(csvfile)
                    writer.writerow(["单词","词频"])
                    outtxt = re.sub('[^a-zA-Z0-9n]', ' ', outtxt) #根据ASCII码表去掉特殊字符,比如,。!“”等
                    frequency = {} #定义词频字典   

     

    PSP阶段

     

  • 相关阅读:
    JFreeChart生成图片
    itext生成Word
    itext生成PDF
    物理模型name与comment互相转化
    表单序列化为Json(只限input)
    c#多线程同步之EventWaitHandle的应用
    C#多线程之异步编程
    Java环境变量设置
    sharepoint 2013实践
    WPF研究之道——数据驱动UI
  • 原文地址:https://www.cnblogs.com/zhushenying123/p/9695690.html
Copyright © 2011-2022 走看看