  • First Individual Programming Assignment


    My GitHub

    Design and implementation of the computation module interface

    The detailed algorithm flowchart is as follows

    Module overview

    Basic idea: the cosine similarity algorithm (see the reference blog post)
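
    The source names cosine similarity but does not show the computation itself. The following is a minimal sketch, assuming both documents have already been encoded as equal-length count vectors; the function name cosine_similarity is illustrative and not taken from the original code.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two equal-length vectors.

    Assumption: neither vector is all zeros (otherwise the norm is 0).
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)


# Two vectors sharing one of their two non-zero components.
print(round(cosine_similarity([1, 1, 0], [1, 0, 1]), 2))
```

    A value near 1 means the two vectors point in almost the same direction (similar keyword usage); a value near 0 means they share almost no keywords.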
    one_hot performs the one-hot encoding
        def one_hot(word_dict, keywords):
            # Start from a zero vector with one slot per word in the vocabulary.
            cut_code = [0] * len(word_dict)
            for word in keywords:
                # Count each keyword at its vocabulary index.
                cut_code[word_dict[word]] += 1
            return cut_code
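
    The source does not show how word_dict is built. One plausible approach, sketched here under that assumption, is to number every distinct keyword from both documents; build_word_dict is a hypothetical helper, and the sample keywords are made up for illustration.

```python
def build_word_dict(keywords_a, keywords_b):
    """Hypothetical helper: map every distinct keyword to a vector index."""
    word_dict = {}
    for word in list(keywords_a) + list(keywords_b):
        if word not in word_dict:
            word_dict[word] = len(word_dict)
    return word_dict


def one_hot(word_dict, keywords):
    # Same counting logic as one_hot above, repeated so the sketch runs on its own.
    cut_code = [0] * len(word_dict)
    for word in keywords:
        cut_code[word_dict[word]] += 1
    return cut_code


keywords_a = ["today", "weather", "sunny"]
keywords_b = ["today", "weather", "rainy"]
word_dict = build_word_dict(keywords_a, keywords_b)
print(one_hot(word_dict, keywords_a))  # [1, 1, 1, 0]
print(one_hot(word_dict, keywords_b))  # [1, 1, 0, 1]
```

    Because both vectors are built over the same vocabulary, they have the same length and can be compared position by position.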
    
    extract_keyword extracts the keywords
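        import html
        import re
        import jieba
        import jieba.analyse

        def extract_keyword(content):
            # Strip <style> blocks and any remaining HTML tags.
            re_exp = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)
            content = re_exp.sub(' ', content)
            content = html.unescape(content)
            # Full-mode segmentation, discarding empty tokens.
            seg = [i for i in jieba.cut(content, cut_all=True) if i != '']
            # Extract keywords
            keywords = jieba.analyse.extract_tags("|".join(seg), topK=200, withWeight=False)
            return keywords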
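
    To illustrate only the cleaning step that extract_keyword performs before segmentation, here is a small sketch using the same regular expression and the standard library alone. strip_markup is a hypothetical helper name and the sample string is invented.

```python
import html
import re

# Same pattern as in extract_keyword: drop <style> blocks, then any other tag.
TAG_RE = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)


def strip_markup(content):
    """Hypothetical helper covering only the cleaning step of extract_keyword."""
    content = TAG_RE.sub(' ', content)
    # Turn entities such as &amp; back into plain characters.
    return html.unescape(content)


sample = '<style>p { color: red; }</style><p>A &amp; B</p>'
cleaned = strip_markup(sample)
print(cleaned.split())  # ['A', '&', 'B']
```

    Tags are replaced with a space rather than removed outright, so words on either side of a tag do not get glued together.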
    

    Performance improvements to the computation module interface

    The most expensive part


    As the figure shows, main.py accounts for the largest share of the cost

    Profiling chart
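
    The source does not say which profiler produced the chart. As one possible way to gather comparable numbers, the sketch below uses Python's built-in cProfile; compare_documents is only a placeholder workload standing in for the real similarity computation.

```python
import cProfile
import io
import pstats


def compare_documents():
    """Placeholder workload standing in for the real similarity computation."""
    return sum(i * i for i in range(100_000))


def profile_call(func):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    # Collect the report in memory instead of printing straight to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(compare_documents)
print(report)
```

    Sorting by cumulative time puts the functions that dominate the total run time at the top, which is usually the quickest way to find where optimization would pay off.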

    Unit tests for the computation module

    Test results:


    Most results are around 0.8, fluctuating slightly above and below, which is fairly consistent with expectations.

    Part of the test code:
    if __name__ == '__main__':
        base_dir = 'F:/qq/sim_0.8/'
        # Modified copies of orig.txt, each compared against the original.
        variants = ['orig_0.8_add.txt', 'orig_0.8_del.txt', 'orig_0.8_dis_1.txt',
                    'orig_0.8_dis_3.txt', 'orig_0.8_dis_7.txt', 'orig_0.8_dis_10.txt',
                    'orig_0.8_dis_15.txt', 'orig_0.8_mix.txt', 'orig_0.8_rep.txt']
        for name in variants:
            with open(base_dir + 'orig.txt', 'r', encoding="UTF-8") as x, \
                    open(base_dir + name, 'r', encoding="UTF-8") as y:
                content_x = x.read()
                content_y = y.read()
            similarity = CosineSimilarity(content_x, content_y).main()
            print('Similarity: %.2f%%\n' % (similarity * 100))
    

    Exception handling in the computation module

    Two edge cases were tested: comparing against a blank document, and comparing two identical documents.

      Result for the blank document:
    


    No exception.

    Result for identical documents:


    No exception.
    Given the limited time, no exceptions have been found in the module so far.
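
    The source reports no exception for either edge case but does not show how CosineSimilarity handles them. A blank document would normally produce an all-zero vector, which makes the denominator of the cosine formula zero. The sketch below shows one way to guard against that; safe_cosine and the choice of returning 0.0 are assumptions, not the author's implementation.

```python
import math


def safe_cosine(vec_a, vec_b):
    """Cosine similarity that returns 0.0 instead of dividing by zero."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    if norm == 0:
        # Assumption: a blank document is treated as similar to nothing.
        return 0.0
    return dot / norm


# Blank document against a non-blank one.
print(safe_cosine([0, 0, 0], [1, 2, 0]))
# Two identical documents; the result is 1 up to floating-point rounding.
print(round(safe_cosine([1, 2, 3], [1, 2, 3]), 6))
```

    Raising a custom exception for blank input would be an equally reasonable design; returning 0.0 simply keeps the caller's output format unchanged.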

    The PSP table is as follows

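    PSP2.1 | Personal Software Process Stages | Estimated time (min) | Actual time (min)
    Planning | Planning | 30 | 40
    Estimate | Estimate how long the task will take | 20 | 20
    Development | Development | 480 | 300
    Analysis | Requirements analysis (including learning new technologies) | 300 | 200
    Design Spec | Write the design document | 60 | 30
    Design Review | Design review | 30 | 20
    Coding Standard | Coding standard (define suitable conventions for the current development) | 30 | 30
    Design | Detailed design | 60 | 60
    Coding | Implementation | 300 | 200
    Code Review | Code review | 30 | 30
    Test | Testing (self-testing, fixing code, committing changes) | 120 | 90
    Reporting | Reporting | 60 | 80
    Test Report | Test report | 30 | 20
    Size Measurement | Measure the workload | 30 | 15
    Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 40 | 35
    Total | Total | 1620 | 1140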

    Brief summary

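      This was my first assignment of this kind, and with no prior experience I found it fairly difficult. I had to search online forums for other people's work and consulted many references before I could finish it. I still have many shortcomings and hope to keep improving.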
  • Original article: https://www.cnblogs.com/gaoyichao/p/13682075.html