  • First Individual Programming Assignment

    My GitHub

    Design and Implementation of the Computation Module Interface

    [Figure: detailed algorithm flowchart]

    Module Overview

    Basic idea: the cosine similarity algorithm (based on a reference blog post).
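    As a rough illustration of the idea, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a minimal standalone version written for this explanation, not the code from the assignment; the function name cosine_similarity and the choice to raise ValueError for a zero vector are assumptions.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat a zero vector as an error rather than guess a score.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two vectors that agree on two of their three non-zero positions.
    print(cosine_similarity([1, 1, 0, 1], [1, 0, 1, 1]))
```

    Vectors that point in the same direction score 1, and vectors with no shared non-zero position score 0.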
    one_hot performs the one-hot encoding:
        def one_hot(word_dict, keywords):
            # Start with one zero slot for every word in the dictionary
            cut_code = [0] * len(word_dict)
            for word in keywords:
                # Increment the slot that belongs to this keyword
                cut_code[word_dict[word]] += 1
            return cut_code
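    To show how one_hot might be used, here is a small sketch that builds a shared dictionary from two keyword lists and encodes both. The helper build_word_dict and the sample keywords are hypothetical; the post does not show how word_dict is constructed.

```python
def build_word_dict(keywords_a, keywords_b):
    """Hypothetical helper: map each distinct keyword to a vector index."""
    word_dict = {}
    for word in list(keywords_a) + list(keywords_b):
        if word not in word_dict:
            word_dict[word] = len(word_dict)
    return word_dict


def one_hot(word_dict, keywords):
    """Count how often each dictionary word occurs in keywords."""
    cut_code = [0] * len(word_dict)
    for word in keywords:
        cut_code[word_dict[word]] += 1
    return cut_code


if __name__ == "__main__":
    keywords_a = ["today", "weather", "sunny"]
    keywords_b = ["today", "weather", "rainy"]
    word_dict = build_word_dict(keywords_a, keywords_b)
    print(one_hot(word_dict, keywords_a))
    print(one_hot(word_dict, keywords_b))
```

    Because both vectors are built from the same dictionary, they have the same length and can be compared position by position.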
    
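    extract_keyword extracts the keywords:
        import html
        import re
        import jieba
        import jieba.analyse

        def extract_keyword(content):
            # Remove <style> blocks and any remaining HTML tags
            re_exp = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)
            content = re_exp.sub(' ', content)
            # Decode HTML entities such as &amp;
            content = html.unescape(content)
            # Segment in full mode and drop empty tokens
            seg = [i for i in jieba.cut(content, cut_all=True) if i != '']
            # Extract up to 200 keywords, without weights
            keywords = jieba.analyse.extract_tags("|".join(seg), topK=200, withWeight=False)
            return keywords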
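    The cleanup step at the start of extract_keyword can be tried on its own with the standard library only. This sketch reuses the same regular expression; the function name strip_markup and the sample string are made up for illustration.

```python
import html
import re

# Same pattern as in extract_keyword: drop <style> blocks, then any other tag.
TAG_PATTERN = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)


def strip_markup(content):
    """Replace tags with spaces, then decode HTML entities."""
    content = TAG_PATTERN.sub(' ', content)
    return html.unescape(content)


if __name__ == "__main__":
    sample = '<style>p { color: red; }</style><p>A &amp; B</p>'
    print(repr(strip_markup(sample)))
```

    Tags are removed before entities are decoded, so escaped text such as &lt;b&gt; survives as the literal characters <b> instead of being stripped as a tag.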
    

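    Performance Improvements to the Computation Module Interface

    The most expensive part


    As the figure shows, main.py accounts for the largest share of the running time.

    [Figure: performance profile]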
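    The post does not say which profiler produced the chart. One way to collect comparable numbers, assuming Python's built-in cProfile, is sketched below; busy_work is a stand-in workload and profile_call is a hypothetical helper, neither comes from the assignment.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Stand-in workload; swap in the real similarity computation."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(busy_work, 100_000)
    print(report)
```

    Sorting by cumulative time puts the functions that dominate the run at the top of the report.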

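    Unit Tests for the Computation Module

    Test results:


    The scores mostly fluctuate around 0.8, which is broadly in line with expectations.

    Part of the test code: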
    if __name__ == '__main__':
        # CosineSimilarity is the class under test, assumed to be defined earlier in this file.
        base_dir = 'F:/qq/sim_0.8/'
        # Every modified file is compared against the same original text.
        variants = ['orig_0.8_add.txt', 'orig_0.8_del.txt', 'orig_0.8_dis_1.txt',
                    'orig_0.8_dis_3.txt', 'orig_0.8_dis_7.txt', 'orig_0.8_dis_10.txt',
                    'orig_0.8_dis_15.txt', 'orig_0.8_mix.txt', 'orig_0.8_rep.txt']
        with open(base_dir + 'orig.txt', 'r', encoding="UTF-8") as x:
            content_x = x.read()
        for name in variants:
            with open(base_dir + name, 'r', encoding="UTF-8") as y:
                content_y = y.read()
            similarity = CosineSimilarity(content_x, content_y)
            similarity = similarity.main()
            print('Similarity: %.2f%%\n' % (similarity * 100))
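    The script above prints scores for manual inspection. A version with automatic checks could look roughly like the following unittest sketch. To keep it runnable without jieba, it uses a stand-in scorer based on difflib.SequenceMatcher, which is a different technique from the author's CosineSimilarity; the names similarity, SimilarityTest and run_tests are assumptions.

```python
import difflib
import unittest


def similarity(text_a, text_b):
    """Stand-in scorer (difflib ratio), NOT the author's CosineSimilarity."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()


class SimilarityTest(unittest.TestCase):
    def test_identical_texts_score_one(self):
        self.assertAlmostEqual(similarity("same text", "same text"), 1.0)

    def test_disjoint_texts_score_zero(self):
        self.assertAlmostEqual(similarity("abc", "xyz"), 0.0)

    def test_score_stays_in_unit_interval(self):
        score = similarity("the quick brown fox", "the quick red fox")
        self.assertGreaterEqual(score, 0.0)
        self.assertLessEqual(score, 1.0)


def run_tests():
    """Run the suite programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SimilarityTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

    Replacing the stand-in with a call to the real class would let the same test cases check the actual module.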
    

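    Exception Handling in the Computation Module

    Two edge cases were tested: a blank comparison document and a pair of identical documents.

      Result for the blank document:



    No exception was raised.

    Result for the identical documents:


    No exception was raised.
    Because time was limited, no exceptions have been found in the module so far.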
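    A blank document is a natural place for a division by zero, since a vector of all zeros has length zero. The post does not show how the module guards against this, so the wrapper below is only one possible approach. The name safe_compare, the scorer parameter, and the decision to return 0.0 for blank input are all assumptions.

```python
def safe_compare(text_a, text_b, scorer):
    """Return scorer(text_a, text_b), or 0.0 when no score can be computed."""
    # Assumption: a blank document shares nothing with any other document.
    if not text_a.strip() or not text_b.strip():
        return 0.0
    try:
        return scorer(text_a, text_b)
    except ZeroDivisionError:
        # A zero-length vector makes the cosine denominator zero.
        return 0.0


if __name__ == "__main__":
    def exact_match(a, b):  # hypothetical scorer, for illustration only
        return 1.0 if a == b else 0.0

    print(safe_compare("", "some text", exact_match))
    print(safe_compare("some text", "some text", exact_match))
```

    Checking for blank input before calling the scorer keeps the common case simple, and the except clause covers inputs that become empty only after keyword extraction.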

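    PSP Table

    PSP2.1                                 Personal Software Process Stages                             Estimated (min)  Actual (min)
    Planning                               Planning                                                     30               40
    Estimate                               Estimate how long the task will take                         20               20
    Development                            Development                                                  480              300
    Analysis                               Requirements analysis (including learning new technologies)  300              200
    Design Spec                            Write the design document                                    60               30
    Design Review                          Design review                                                30               20
    Coding Standard                        Coding standard (set suitable conventions for this project)  30               30
    Design                                 Detailed design                                              60               60
    Coding                                 Coding                                                       300              200
    Code Review                            Code review                                                  30               30
    Test                                   Testing (self-testing, fixing code, committing changes)      120              90
    Reporting                              Reporting                                                    60               80
    Test Report                            Test report                                                  30               20
    Size Measurement                       Workload measurement                                         30               15
    Postmortem & Process Improvement Plan  Postmortem and process improvement plan                      40               35
    Total                                  Total                                                        1620             1140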

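    Brief Summary

      This was my first assignment of this kind. With no prior experience, I found it fairly difficult. I had to search online forums for other people's work and consulted many references before I could finish. I still have many shortcomings and hope to keep improving.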
  • Original article: https://www.cnblogs.com/gaoyichao/p/13682075.html