    First Individual Assignment

    Source code

    Assignment page


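    Assignment requirements

    1. For source files (*.txt, *.cpp, *.h, *.cs, *.html, *.js, *.java, *.py, *.php, etc., i.e. every file in a folder), count the characters, words and lines and compute word frequencies; write the statistics in the specified format to the default output file; implement the other extended features; and process many files quickly.

    2. Use a performance profiler to analyze the program, find the bottlenecks and improve them.

    3. Analyze the code quality and eliminate all warnings.

    4. Design 10 test cases to make sure the program works correctly (for example: an empty file, a file containing a single word, a file with only one line, a typical file, and so on).

    5. Manage the code with GitHub.

    6. Write a blog post.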

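    Preparation

    Requirements analysis

    This assignment asks us to count the characters, words and phrases in an arbitrary file, or in all files under a given directory, and to save the statistics to a file. The main points that need special attention are:

    1. The requirements are fairly complex, with many details and special cases. In particular, the word or phrase that is output must be the spelling that is smallest in dictionary order among the spellings that actually occurred; it may not simply be converted to all upper case or all lower case. This is the requirement I found most unreasonable: adding it made the program almost twice as slow, yet most of the words actually output are in upper case and mixed-case spellings do not even occur, so it really feels unnecessary.
    2. The bar for code quality is high: besides running as fast as possible, the code must compile without any warnings.
    3. The program must run on several platforms, which places demands on the portability of the code.
    4. New tools and techniques have to be learned along the way.
    5. Write the blog post, and record and analyze the progress of the assignment.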

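    Coding standards

    Visual Studio can already format code automatically. On top of that, based on my own habits and Menci's summary, I adopted the following rules for now:

    • All #include directives must be placed at the beginning of the program.
    • The main function should be placed at the end of the program.
    • Among the #include directives, C standard library headers come before C++ standard library headers; other headers (if any) come last.
    • Indent every code block with 4 spaces or an equivalent Tab.
    • Opening braces stay on the same line, with exactly one space to their left. Every closing brace is indented to the same level as the enclosing code block.
    • Separate independent blocks of code with a blank line.
    • There must be no redundant blank line before a closing brace.
    • There must never be two consecutive blank lines.
    • Non-empty lines must not have trailing spaces.
    • There must be a blank line after all the #include directives.
    • If there is a using namespace std;, it must come right after the blank line that follows the #include directives, and it must be followed by a blank line.
    • The return type of main must be int; return 0; may be omitted.
    • An empty function body may be written as {}.
    • Pass arguments by reference, by const reference or by value, according to what is actually needed.
    • Use as few global variables as possible.
    • Define local variables where they are first used; a variable name should not repeat a name from an enclosing block, but it may share a name with a global variable.
    • A comma, and a semicolon inside a for statement, must be followed by one space and must not be preceded by a space.
    • Binary and ternary operators must have one space on each side. Unary operators must not have spaces on either side. A colon must have one space on each side.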
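    To make the rules concrete, here is a small function written to follow them. The function count_long_words is hypothetical, made up purely for illustration, and is not part of the project.

```cpp
#include <string>
#include <vector>

using namespace std;

// Hypothetical helper laid out per the rules above: headers first, a blank
// line, using namespace std; and another blank line; 4-space indentation;
// opening braces on the same line; a space after every comma and
// for-semicolon; spaces around binary operators; const reference parameter.
int count_long_words(const vector<string>& words, size_t min_length) {
    int result = 0;
    for (size_t i = 0; i < words.size(); i++) {
        if (words[i].size() >= min_length) {
            result++;
        }
    }
    return result;
}
```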

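    PSP table (updated as the work progresses)

    | PSP2.1 | Task | Planned time (min) | Actual time (min) |
    |---|---|---|---|
    | Planning | Planning | 30 | 20 |
    | Estimate | Estimate how long the task will take and plan the main steps | 30 | 20 |
    | Development | Development | 510 | 1780 |
    | Analysis | Requirements analysis (including learning new technologies) | 30 | 30 |
    | Design Spec | Write the design document | 30 | 30 |
    | Design Review | Design review (review the design document with colleagues) | 10 | 15 |
    | Coding Standard | Coding standard (define a suitable standard for this project) | 20 | 35 |
    | Design | Detailed design | 40 | 50 |
    | Coding | Coding | 300 | 900 |
    | Code Review | Code review | 40 | 320 |
    | Test | Testing (self-testing, fixing code, committing changes) | 40 | 400 |
    | Reporting | Reporting | 240 | 330 |
    | Test Report | Test report | 60 | 270 |
    | Size Measurement | Measure the amount of work | 30 | 20 |
    | Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 150 | 40 |
    | Summary | Total | 780 | |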

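    Code design

    1. Traversing the files

    Walk the folder to collect the names of all files under it (including those in subfolders) and store them in vector<string> files. The code is as follows: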

    /*
    the function to get the names of all the files and subfiles under homepath
    */
    #include <io.h>
    #include <stdint.h>
    #include <string.h>
    #include <string>
    #include <vector>
    #include "functions.h"
    using namespace std;
    void getallfiles(string homepath, vector<string>& files)
    {
        // file handle: _findfirst returns intptr_t, which is wider than long on 64-bit Windows
        intptr_t hFile = 0;
        // file info
        struct _finddata_t fileinfo;
        string p;
        if ((hFile = _findfirst(p.assign(homepath).append("\\*").c_str(), &fileinfo)) != -1)
        {
            // visit every entry under homepath
            do
            {
                // if it is a directory, recurse (skipping "." and "..")
                // if it is a file, push its full path into files
                if (fileinfo.attrib & _A_SUBDIR)
                {
                    if (strcmp(fileinfo.name, ".") != 0 && strcmp(fileinfo.name, "..") != 0)
                        getallfiles(p.assign(homepath).append("\\").append(fileinfo.name), files);
                }
                else
                {
                    files.push_back(p.assign(homepath).append("\\").append(fileinfo.name));
                }
            } while (_findnext(hFile, &fileinfo) == 0);
            _findclose(hFile);
        }
    }

    The Linux version:

    /*
    the function to get the names of all the files and subfiles under homepath
    */
    #include <dirent.h>
    #include <string.h>
    #include <string>
    #include <vector>
    #include "functions.h"
    using namespace std;
    void getallfiles(string homepath, vector<string>& files)
    {
        DIR* dir = opendir(homepath.c_str());
        if (dir == NULL)   // missing or unreadable directory
            return;
        struct dirent* ptr;
        while ((ptr = readdir(dir)) != NULL)
        {
            if (strcmp(ptr->d_name, ".") == 0 || strcmp(ptr->d_name, "..") == 0)
                continue;
            string path = homepath + "/" + ptr->d_name;
            // note: some file systems report DT_UNKNOWN here, which would need stat()
            if (ptr->d_type == DT_DIR)        // was the magic number 4
                getallfiles(path, files);
            else if (ptr->d_type == DT_REG)   // was the magic number 8
                files.push_back(path);
        }
        closedir(dir);   // release the directory handle
    }

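    2. Reading file contents. Using the file names obtained by the function above, each file is read in binary mode into memory in a single pass and pointed to by char* buffer; all later processing works on buffer, and the buffer is freed before the next file is read. Reading each file in one go reduces the number of I/O calls and saves running time. The code is as follows: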

    /*
    this function loads the whole content of a file into RAM at once and returns it through buffer
    */
    #include <stdio.h>
    #include <string>
    #include "functions.h"
    using namespace std;
    void getcontent(string path, char *& buffer, long &size)
    {
        buffer = NULL;
        size = 0;
        // open the file in binary mode
        FILE* fp = fopen(path.c_str(), "rb");
        if (fp == NULL)   // unreadable file: leave buffer empty
            return;
        // measure the file size
        fseek(fp, 0, SEEK_END);
        size = ftell(fp);
        if (size < 0)
            size = 0;
        rewind(fp);
        // allocate the same amount of space in RAM (the caller frees it with delete[])
        buffer = new char[size > 0 ? size : 1];
        // copy the file and keep the number of bytes actually read
        size_t result = fread(buffer, 1, size, fp);
        size = (long)result;
        fclose(fp);
    }
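    The same one-shot read can also be written with standard C++ streams, letting a std::vector own the memory so that no explicit delete[] is needed. This is a minimal sketch under that assumption; read_whole_file is a hypothetical name, not part of the project.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical RAII variant of getcontent(): reads the whole file in binary
// mode with a single read call. The vector releases its memory by itself.
// Returns false if the file cannot be opened or read.
bool read_whole_file(const std::string& path, std::vector<char>& buffer)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);   // open positioned at the end
    if (!in)
        return false;
    std::streamsize size = in.tellg();                          // file size in bytes
    if (size < 0)
        return false;
    in.seekg(0, std::ios::beg);
    buffer.resize(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(buffer.data(), size))
        return false;
    return true;
}
```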

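    3. The data structure is a HashTable object

    Member variables

      char WordTable[L1][200] stores the words that have been found. Words that differ only in letter case or in trailing digits count as the same word, and for each such group the table holds the spelling that is smallest in ASCII order among those seen so far.

      int WordFrequency[L1] stores the frequency of the word at the same index in WordTable.

      int Formar[L2] stores, for each phrase, the index in WordTable of its first word.

      int Latter[L2] stores, for each phrase, the index in WordTable of its second word.

      int PhraseFrequency[L2] stores the frequency of the phrase whose word indices are held at the same position in Formar[] and Latter[].

    Methods

      HashTable(int, int) // constructor that sets the lengths L1 and L2 (in the code below the constructor takes no arguments and uses MAXINITLENGTH and PHRASELENGTH instead)

      int append(char* sample, int formar); // adds a string already recognized as a word to the table; formar is the WordTable index of the previous word. The method returns this word's index in WordTable so that it can be passed in when the next word is added.

      get10words(); // prints the 10 most frequent words, using a simple repeated selection

      get10phrases(); // prints the 10 most frequent phrases

      Internal methods:

        int hash(char*); // takes a word and computes its initial index in WordTable; collisions are resolved in append, which calls this method

        int hash2(int formar, int latter); // takes the WordTable indices of the two words of a phrase and computes the phrase's initial index into Formar[], Latter[] and PhraseFrequency[]; collisions are resolved in append, which calls this method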
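    The repeated linear scans in get10words/get10phrases could alternatively be replaced by std::partial_sort over the occupied entries. The sketch below assumes the (word, frequency) pairs have already been collected from the table; top_k is a hypothetical name, and breaking ties by ASCII order is my assumption (the original code effectively prefers the lower slot index).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alternative to get10words(): std::partial_sort moves the k
// most frequent entries to the front in order, without sorting the rest.
std::vector<std::pair<std::string, int>>
top_k(std::vector<std::pair<std::string, int>> entries, std::size_t k)
{
    k = std::min(k, entries.size());
    std::partial_sort(entries.begin(), entries.begin() + static_cast<std::ptrdiff_t>(k), entries.end(),
        [](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
            if (a.second != b.second)
                return a.second > b.second;   // higher frequency first
            return a.first < b.first;         // then the smaller spelling first
        });
    entries.resize(k);
    return entries;
}
```

    With k = 10 this costs roughly one pass over the entries plus a small heap, instead of ten full passes over the table.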

    The code is as follows:

    #ifndef DATASTRUCTURE_H
    #define DATASTRUCTURE_H
    #define MAXINITLENGTH 3010349
    #define PHRASELENGTH 16785407
    //TODO: to think about whether add the safe operation
    #include<iostream>
    #include<string>
    #include<math.h>
    #include<string.h>
    using namespace std;
    #define LETTERH 64
    #define LETTERL 96
    #define NUMBER 48
    #define DELIMITER 0
    // classify a character; the returned value is also the offset subtracted
    // when hashing, so 'A' and 'a' both map to 1 and '0' maps to 0
    // (inline, because this header defines the function)
    inline int WhatKindChar(char c)
    {
        if (c >= 'A' && c <= 'Z')
            return LETTERH;
        if (c >= 'a' && c <= 'z')
            return LETTERL;
        if (c >= '0' && c <= '9')
            return NUMBER;
        return DELIMITER;
    }
    
    
    class HashTable
    {
    public:
        int* WordFrequency;
        char(*WordTable)[200];
        int* Formar;
        int* Latter;
        int* PhraseFrequency;
        HashTable()
        {
            WordFrequency = new int[MAXINITLENGTH];
            WordTable = new char[MAXINITLENGTH][200];
            Formar = new int[PHRASELENGTH];
            Latter = new int[PHRASELENGTH];
            PhraseFrequency = new int[PHRASELENGTH];
            for (int i = 0; i < MAXINITLENGTH; i++)
            {
                WordFrequency[i] = 0;
            }
            for (int i = 0; i < PHRASELENGTH; i++)
            {
                PhraseFrequency[i] = 0;
            }
        }
    
        int getwordnumber()
        {
            int result = 0;
            for (int i = 0; (i < MAXINITLENGTH); i++)
            {
                if (WordFrequency[i] != 0)
                    result++;
            }
            return result;
        }
        long getphrasenumber()
        {
            long result = 0;
            for (int i = 0; i < PHRASELENGTH; i++)
            {
                if (PhraseFrequency[i] != 0)
                    result++;
            }
            return result;
        }
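        // print the 10 most frequent words by repeated linear scans;
        // slots already printed (listed in index[]) and empty slots are skipped
        void get10words()
        {
            int index[10] = { 0 };
            for (int m = 0; m < 10; m++)
            {
                int flag = -1;
                for (int i = 0; i < MAXINITLENGTH; i++)
                {
                    bool picked = false;
                    for (int c = 0; c < m; c++)
                    {
                        if (i == index[c])
                            picked = true;
                    }
                    if (picked || WordFrequency[i] == 0)
                        continue;
                    if (flag == -1 || WordFrequency[i] > WordFrequency[flag])
                        flag = i;
                }
                if (flag == -1)   // fewer than 10 distinct words
                    break;
                index[m] = flag;
                cout << WordTable[flag] << ": " << WordFrequency[flag] << endl;
            }
        }
        // same selection for phrases
        void get10phrases()
        {
            int index[10] = { 0 };
            for (int m = 0; m < 10; m++)
            {
                int flag = -1;
                for (int i = 0; i < PHRASELENGTH; i++)
                {
                    bool picked = false;
                    for (int c = 0; c < m; c++)
                    {
                        if (i == index[c])
                            picked = true;
                    }
                    if (picked || PhraseFrequency[i] == 0)
                        continue;
                    if (flag == -1 || PhraseFrequency[i] > PhraseFrequency[flag])
                        flag = i;
                }
                if (flag == -1)   // fewer than 10 distinct phrases
                    break;
                index[m] = flag;
                cout << WordTable[Formar[flag]] << " " << WordTable[Latter[flag]] << ": " << PhraseFrequency[flag] << endl;
            }
        }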
        int append(char* sample, int formarindex) {
            char tochangestring[200];   // receives the preferred spelling from wordcompare
            unsigned int index = hash(sample);
            int c;   // index into the char arrays
            // quadratic probing until an empty slot or the same word is found
            for (int i = 1; WordFrequency[index] != 0 && wordcompare(WordTable[index], sample, tochangestring) == false; i++)
            {
                index = (index + i * i) % MAXINITLENGTH;
            }
            if (WordFrequency[index] == 0)
            {
                // new word: store it as it is
                WordFrequency[index]++;
                for (c = 0; sample[c] != '\0'; c++)
                {
                    WordTable[index][c] = sample[c];
                }
                WordTable[index][c] = '\0';
            }
            else
            {
                // known word: count it and keep the spelling that is smaller in ASCII order
                WordFrequency[index]++;
                if (strcmp(tochangestring, WordTable[index]) != 0)
                {
                    for (c = 0; tochangestring[c] != '\0'; c++)
                    {
                        WordTable[index][c] = tochangestring[c];
                    }
                    WordTable[index][c] = '\0';
                }
            }
            if (formarindex == -1)
                return (int)index;
            // then append the phrase (previous word, this word)
            unsigned int phraseindex = hash2(formarindex, index);
            for (int i = 1; PhraseFrequency[phraseindex] != 0 && (formarindex != Formar[phraseindex] || (int)index != Latter[phraseindex]); i++)
            {
                phraseindex = (phraseindex + i * i) % PHRASELENGTH;
            }
            if (PhraseFrequency[phraseindex] == 0)
            {
                Formar[phraseindex] = formarindex;
                Latter[phraseindex] = (int)index;
            }
            PhraseFrequency[phraseindex]++;
            return (int)index;
        }
        int getwordfrequency(char* sample)
        {
            unsigned int index = hash(sample);
            char tochangestring[200];
            for (int i = 1; (WordFrequency[index] != 0) && (wordcompare(sample, WordTable[index], tochangestring) == false); i++)
            {
                index = (index + i * i) % MAXINITLENGTH;
            }
            return WordFrequency[index];
        }
        int getphrasefrequency(char* formar, char* latter)
        {
            int formarindex = hash(formar);
            char tochangestring[200];
            for (int i = 1; (WordFrequency[formarindex] != 0) && (wordcompare(formar, WordTable[formarindex], tochangestring) == false); i++)
            {
                formarindex = (formarindex + i * i) % MAXINITLENGTH;
            }
            int latterindex = hash(latter);
            for (int i = 1; (WordFrequency[latterindex] != 0) && (wordcompare(latter, WordTable[latterindex], tochangestring) == false); i++)
            {
                latterindex = (latterindex + i * i) % MAXINITLENGTH;
            }
            int phraseindex = hash2(formarindex, latterindex);//hash again
            for (int i = 0; ((PhraseFrequency[phraseindex] != 0) && ((formarindex != Formar[phraseindex]) || (latterindex != Latter[phraseindex]))); i++)
            {
                phraseindex = (phraseindex + i * i) % PHRASELENGTH;
            }
            return PhraseFrequency[phraseindex];
        }
    private:
        // polynomial hash of a word that ignores letter case and trailing digits
        // ('A' and 'a' both contribute 1, since WhatKindChar's value is subtracted);
        // the running power is reduced modulo MAXINITLENGTH, so nothing overflows
        unsigned int hash(char* sample)
        {
            unsigned long long result = 0;
            unsigned long long power = 1;
            int samplelength;
            int j;
            for (samplelength = 0; sample[samplelength] != '\0'; samplelength++);   // length of sample
            for (j = samplelength - 1; (j > 0) && (WhatKindChar(sample[j]) == NUMBER); j--);   // skip trailing digits
            for (int m = 0; m <= j; m++)
            {
                result = (result + (unsigned long long)(sample[m] - WhatKindChar(sample[m])) * power) % MAXINITLENGTH;
                power = power * 13 % MAXINITLENGTH;
            }
            return (unsigned int)result;
        }
        // combine the two word indices; % (not &) keeps the result inside [0, PHRASELENGTH)
        unsigned int hash2(int formarindex, unsigned int latterindex)
        {
            unsigned int phraseindex = (unsigned int)formarindex * 7 + latterindex * 23;
            return phraseindex % PHRASELENGTH;
        }
        // ASCII case folding used when comparing words
        static char fold(char c)
        {
            return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
        }
    
        bool wordcompare(char* a, char* b, char* p)
        {/*
            checks whether a and b are the same word (ignoring letter case and trailing digits)
            if they are different, the function returns false and leaves p untouched
            if they are the same, it returns true and copies into p whichever of the two is smaller in strcmp order
         */
            int c;   // index into the char arrays
            int alen, blen;
            for (alen = 0; a[alen] != '\0'; alen++);
            for (blen = 0; b[blen] != '\0'; blen++);
    
            int wi, wj, wm;
            for (wi = alen - 1; (wi > 0) && (WhatKindChar(a[wi]) == NUMBER); wi--);
            for (wj = blen - 1; (wj > 0) && (WhatKindChar(b[wj]) == NUMBER); wj--);
            if (wi != wj)
                return false;   // the stems have different lengths
            // find the first position where the stems differ
            for (wm = 0; (wm <= wi) && (fold(a[wm]) == fold(b[wm])); wm++);
            if (wm <= wi)
                return false;   // the stems differ
            // same word: keep the smaller spelling
            const char* smaller = (strcmp(a, b) > 0) ? b : a;
            for (c = 0; smaller[c] != '\0'; c++)
                p[c] = smaller[c];
            p[c] = '\0';
            return true;
        }
    };
    
    #endif
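    Stripped of the fixed-size char arrays, the word part of the table boils down to open addressing with quadratic probing plus an equivalence test. The condensed sketch below uses std::string for readability; WordCounter and its members are hypothetical names, the probing sequence mirrors append(), and, as in the original, the capacity must stay well above the number of distinct words, otherwise the probe loop may never terminate.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical condensed version of HashTable's word part. Two words are the
// same when they match after ASCII case folding and dropping trailing digits;
// each slot keeps the spelling that is smallest in ASCII order.
class WordCounter {
public:
    explicit WordCounter(std::size_t capacity) : slots_(capacity) {}

    // adds one occurrence and returns the slot index, like HashTable::append
    std::size_t add(const std::string& word)
    {
        const std::string key = stem(word);
        std::size_t index = hash(key);
        for (std::size_t i = 1; slots_[index].count != 0 && stem(slots_[index].spelling) != key; i++)
            index = (index + i * i) % slots_.size();   // quadratic probing
        Slot& s = slots_[index];
        if (s.count == 0 || word < s.spelling)
            s.spelling = word;
        s.count++;
        return index;
    }
    const std::string& spelling(std::size_t index) const { return slots_[index].spelling; }
    int count(std::size_t index) const { return slots_[index].count; }

private:
    struct Slot {
        std::string spelling;
        int count = 0;
    };
    std::vector<Slot> slots_;

    // lower-cased word without trailing digits (at least one char is kept)
    static std::string stem(const std::string& w)
    {
        std::size_t end = w.size();
        while (end > 1 && w[end - 1] >= '0' && w[end - 1] <= '9')
            end--;
        std::string s = w.substr(0, end);
        for (char& c : s)
            if (c >= 'A' && c <= 'Z')
                c = static_cast<char>(c - 'A' + 'a');
        return s;
    }
    std::size_t hash(const std::string& key) const
    {
        std::size_t h = 0;
        for (unsigned char c : key)
            h = (h * 13 + c) % slots_.size();
        return h;
    }
};
```

    For example, adding "Windows95", "windows2000" and "WINDOWS" lands all three in one slot with a count of 3, and the stored spelling ends up as "WINDOWS".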

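    4. Scanning the characters in buffer: decide which runs of characters form words, and call the corresponding HashTable methods. The code is as follows: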

    /* this function counts the characters, lines, words and phrases of one file
    (already loaded into the memory buffer),
    and it must free the buffer to avoid leaking memory
    */
    
    #include <stdlib.h>
    #include <iostream>
    #include "functions.h"
    #include "datastructure.h"
    HashTable WordTable;
    long SumofChar = 0;
    long SumofWord = 0;
    long Sumofline = 0;
    long SumofPhrase = 0;
    int formarindex = -1;   // index of the previous word, used to build phrases
    using namespace std;
    
    // record one finished word and the phrase it forms with the previous word
    static void addword(char* sample)
    {
        if (formarindex != -1)   // the first word of a file starts no phrase
            SumofPhrase++;
        formarindex = WordTable.append(sample, formarindex);
        SumofWord++;
    }
    
    void statistics(char* buffer, long size)
    {
        formarindex = -1;              // phrases do not span files
        char sample[200] = { '\0' };   // the word currently being collected
        int samplelength = 0;
        int charkind;                  // what kind of char the current one is
        for (long i = 0; i < size; i++)
        {
            char ch = buffer[i];
            charkind = WhatKindChar(ch);
            if (ch >= 32 && ch <= 126)   // printable ASCII
                SumofChar++;
            if (ch == '\n')
                Sumofline++;
            bool letter = (charkind == LETTERH || charkind == LETTERL);
            // a word starts with 4 letters and may continue with letters or digits
            if ((samplelength < 4 && letter) || (samplelength >= 4 && charkind != DELIMITER))
            {
                if (samplelength < 199)   // leave room for '\0'
                {
                    sample[samplelength++] = ch;
                    sample[samplelength] = '\0';
                }
                continue;
            }
            // a delimiter (or a digit among the first 4 chars) ends the current candidate
            if (samplelength >= 4)
                addword(sample);
            sample[0] = '\0';
            samplelength = 0;
        }
        if (samplelength >= 4)   // a word that ends exactly at the end of the file
            addword(sample);
        delete[] buffer;   // allocated with new[] in getcontent
    }
    
    // convert upper-case ASCII letters to lower case in place (currently unused)
    void Low(char* sample)
    {
        for (int i = 0; sample[i] != '\0'; i++)
        {
            if (WhatKindChar(sample[i]) == LETTERH)
                sample[i] += ('a' - 'A');
        }
    }
    
    
    void outputresult()
    {
        cout <<"word(same): "<< SumofWord << endl;
        cout << "word(diff): " << WordTable.getwordnumber() << endl;
        cout << "Phrase(same): " << SumofPhrase << endl;
        cout << "Phrase(diff): " << WordTable.getphrasenumber() << endl;
        cout << "chars: " << SumofChar << endl;
        cout << "lines: " << Sumofline << endl;
        cout << "10 most common words are :" << endl;
        WordTable.get10words();
        cout << "10 most common phrases are:" << endl;
        WordTable.get10phrases();
    }
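    The word rule encoded in statistics() (start with 4 letters, continue with letters or digits, end at any other character) can be tested in isolation with a small tokenizer. The helper split_words below is hypothetical and not part of the project; it also returns a word that ends exactly at the end of the text.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-alone version of the word rule used by statistics():
// a word begins with 4 ASCII letters and may continue with letters or
// digits; any other character ends it (a digit among the first 4 chars
// discards the candidate).
std::vector<std::string> split_words(const std::string& text)
{
    auto is_letter = [](char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        bool extend = current.size() < 4 ? is_letter(c) : (is_letter(c) || is_digit(c));
        if (extend) {
            current += c;
            continue;
        }
        if (current.size() >= 4)
            words.push_back(current);
        current.clear();
    }
    if (current.size() >= 4)   // word at the very end of the text
        words.push_back(current);
    return words;
}
```

    For instance, "File123 abc 12abcd win10x, hello" yields File123, abcd and hello: "abc" is too short, and "win10x" is broken by the digit in its fourth position.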

    5. The main function parses the command-line arguments and calls the functions above. The code is as follows:

    #include <time.h>
    #include <iostream>
    #include <string>
    #include <vector>
    #include "functions.h"
    using namespace std;
    
    int main(int argc, char* argv[])
    {
        time_t start, end;
        time(&start);
        if (argc != 2) {
            cout << "wrong parameter, please input the folder's full path" << endl;
            return 1;
        }
        string homepath = argv[1];
        vector<string> files;
        char* buffer = NULL;
        long size = 0;
        // get all the file names and put them into files
        getallfiles(homepath, files);
        for (size_t i = 0; i < files.size(); i++)
        {
            // load the whole file into RAM at once; buffer is allocated in getcontent
            getcontent(files[i], buffer, size);
            // count everything; buffer is freed inside statistics
            statistics(buffer, size);
        }
        outputresult();
        time(&end);
        double cost = difftime(end, start);
        cout << "totally cost: " << cost << "s" << endl;
        return 0;
    }
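    time() reports whole seconds, so short runs all show up as 0 s or 1 s. A finer-grained alternative, sketched here as a hypothetical helper elapsed_ms (not part of the project), uses std::chrono::steady_clock, which is monotonic and unaffected by changes to the system clock.

```cpp
#include <chrono>

// Hypothetical helper: runs a callable and returns the elapsed wall-clock
// time in milliseconds, measured with the monotonic steady_clock.
template <typename F>
double elapsed_ms(F&& work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

    In main, the loop over files could then be wrapped as elapsed_ms([&] { ... }) and the result printed in milliseconds.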

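    Performance analysis and optimization on Windows

    My first version differed from the code above: it declared most variables as string and used string functions heavily, and the phrases in the data structure were not stored as indices as shown above; instead a separate string[] array held the phrase text.

    Running it on the test data was slow.

    I analyzed it with the Visual Studio performance profiler, with the following result:

    The profile shows that the program spent a large part of its time on string memory allocation and string operations; the Debug build took 9 minutes to run.

    So I decided to rework it:

    I replaced every string variable with char* and rewrote the code accordingly.

    I changed how the HashTable object stores phrases: instead of a string[] array holding the phrase text, two int arrays now hold the indices in WordTable[] of the phrase's two words, which made the program much faster.

    I also changed the hash table lengths to prime numbers of the same order of magnitude, which noticeably improved how evenly the keys are spread.

    Now the Debug build takes only 60 s and the Release build 30 s.

    The Visual Studio profile after the changes:

    The result looks much more reasonable.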
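    The phrase-storage change described above (keeping two word indices instead of the phrase text) can also be illustrated with a standard hash map keyed by the packed pair of indices. This is a swapped-in technique for illustration only, not what the project uses (it keeps its own Formar/Latter/PhraseFrequency arrays); PhraseCounter is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical illustration: a phrase is identified by the slot indices of
// its two words, packed into one 64-bit key, so no phrase text is copied.
class PhraseCounter {
public:
    // previous == -1 means there is no previous word (start of a file)
    void add(int previous, int current)
    {
        if (previous < 0)
            return;
        counts_[key(previous, current)]++;
    }
    int count(int previous, int current) const
    {
        auto it = counts_.find(key(previous, current));
        return it == counts_.end() ? 0 : it->second;
    }
    std::size_t distinct() const { return counts_.size(); }

private:
    static std::uint64_t key(int previous, int current)
    {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(previous)) << 32)
             | static_cast<std::uint32_t>(current);
    }
    std::unordered_map<std::uint64_t, int> counts_;
};
```

    Comparing two integers per probe is far cheaper than comparing two phrase strings, which is the same effect the author observed after switching to index arrays.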

    Version control with git

    Code management on GitHub

    Performance analysis on Linux

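    Steps:

      Install gperftools and its visualization tool:

        Note: installing it directly with apt install google-perftools did not seem to work; when I tried, it caused a shared-library name conflict.

        Clone the source code locally: https://github.com/gperftools/gperftools

        ./autogen.sh

        ./configure

        make

        make install

        make check

        make clean

        Link against the shared library: g++ -o gperf-test *.cpp -lprofiler

        env CPUPROFILE=./final.prof ./gperf-test ~/Documents/newsample (CPUPROFILE names the output profile; the rest is the executable and its argument)

        At this point the program may complain that the shared library cannot be found: the library is installed under /usr/local/lib/ rather than the usual /usr/lib/.

        Add /usr/local/lib/ to /etc/ld.so.conf (for example with vim).

        Run ldconfig (/sbin/ldconfig) as root (su).

        This solves the library problem.

        pprof --text ./gperf-test final.prof > prog.txt   converts the result to text

        cat prog.txt   shows the result

        pprof --pdf ./gperf-test final.prof > prog.pdf   converts the result to PDF

        google-chrome prog.pdf   opens it in the browser

        The result is shown in the figure.

      It shows that append consumes the most computing resources and time.

      Linking with -ltcmalloc at compile time enables memory-leak checking, as shown in the figure.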

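    Process improvement

    Looking back over the whole assignment, my initial overall design was too hasty, which cost me a lot of time in later optimization. In the future I should put more effort into design and analyze problems calmly and carefully.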

    Original post: https://www.cnblogs.com/huangzp1104/p/8679324.html