软件工程基础个人项目2014 - 走看看

zoukankan html css js c++ java

软件工程基础个人项目2014

第一篇专业相关的博客。

项目要求：http://www.cnblogs.com/jiel/p/3978727.html

1.项目预计时间

虽然大二时java写过比这个复杂的词频统计程序，但是现在对c++或者c#都不熟，因此还是有一定挑战性，前几天都不想写这个程序。

由于对于c++比c#更加了解，因此选择c#完成程序，一边加深对c#的理解。不过不打算直接修改java的代码，准备重新用c#写。

预计学习c#的时间：4小时

预计程序主要的类有两个，一个是遍历目录和文件的，一个是单词分析的，预计编写程序用时：6小时

预计调试时间：4小时

预计优化时间：4小时

2.项目实际用时

学习c#的时间（包含学习加上百度查阅不懂的用法）：4小时

编写程序用时：4小时

调试时间：2小时

优化用时：8小时

3.程序优化及性能分析

优化用时记录为8小时，实际上应该不止。事实上在项目的进行过程中我是边写程序边优化的，经常为了得到更优化的编写方案而卡很久，大部分时间都花在思考如何优化以及查询优化方法上了。

优化的用时花费在以下几个点：

遍历目录文件、单词分析，数据存储，数据排序

遍历目录文件优化主要是因为很久没写程序了，因此一开始进行目录遍历时不知道如何下手（c#不熟），查询了很多都不奏效，因此对目录的判断，目录的广度和深度优先扫描都做了小范围的优化。

单词分析的编写就花费了相当长的时间，优化主要是在查漏补缺各种没有考虑到的情况，添加一下关键性语句，对逻辑不严密的地方进行补充，这一部分的优化一直做到了最后，一直在发现问题并进行修改。

数据存储的优化花了相当多的功夫，其实这一部分是在编写程序中进行的，但我更倾向于把它分类到优化中。一开始也没多想准备用arraylist或者vector存储，后来偶然在网上发现学长的项目分析报告，在他的报告中，他提到dictionary能够极大地提高程序的性能。在msdn中查看之后，我了解到dictionary内部实现是哈希表结构，因此随机检索非常快。在项目中搜索重复单词时对大小写不敏感，而存储时敏感。因此可以以单词的小写作为键，单词的标准形式以及出现次数构成的对象作为值，这样方便进行单词的统计。由于哈希表根据键检索值的时间复杂度是O（1），因此我坚定了使用哈希表的想法。然而哈希表不能索引取值，也就不便进行排序。dictionary类的成员方法中没有排序的方法，但是项目要求必须排序，而且是多层排序。因此我在哈希表与动态数组之间抉择不定。虽然后来又发现了ordereddictionary类可以完美地解决哈希表的索引取值问题，进而可以自定义排序。然而ordereddictionary的空间消耗是dictionary的将近三倍，时间也慢一些，这是因为ordereddictionary内部结构式哈希表+数组。这样我又在dictionary和ordereddictionary之间抉择不定，很久才做出抉择使用dictionary。

数据排序也花了相当的功夫。在选择dictionary作为数据存储的结构之后，接下来需要寻找dictionary的排序方法，我想了很多方法，比如将键设计成小写字符串与索引拼起来，然后利用正则表达式识别索引，进而进行自定义的快速排序，但是构思了2个多小时后确定该方案无法实现。上网找了一段时间才找到system.linq命名空间的orderby方法可以对dictionary进行排序。但进行多层排序（类似动态排序）却有些困难。我在网上查阅了很多相关资料，还专门查看了动态排序的底层实现分析，却仍然没有结果，大概1小时候才找到利用thenby进行多层排序的方法。写完之后运行，结果是按照频度从大到小排序了，但是频度相同的字符串并没有按照ASCII码顺序排序。在网上寻找很久也没有找到答案，尝试了几个StringComparer的属性传入orderby，也仍然是按照unicode编码进行排序。最终在我查看msdn的过程中，发现orderby方法支持传入自定义的实现了比较器接口的比较器类，因此我自定义了一个将字符串按照ASCII码进行排序的类并传入orderby中，最终成功得到预期结果，优化ASCII排序至少花了2个小时的时间。

编写边优化完第一遍之后进行了程序的性能分析，检索vs2012的安装目录花费了6分46秒的时间，有些太慢，查看error list，发现两个警告。第一个是对于异常处理过多的警告，另一个是关于垃圾太多以致垃圾处理频繁的警告。对于第一个警告，我在程序中使用了很多try-catch语句，根据抛出的异常来判断一个路径代表的是目录还是文件，以及处理IO异常。由于程序处理的数据量比较大，因此try-catch语句的调用次数也相当可观。因此我去掉了所有的try-catch语句，改用其他方法来实现之前借助try-catch语句判断的功能。性能分析之后，异常处理的警告消失，垃圾处理的警告也消失了，检索vs2012目录的时间也由之前的6分多钟乃至接近十分钟变为4秒，最多25秒，变为秒级。对于检索vs2012安装目录（大小2.78 GB，占用空间2.86 GB，包含33,443 个文件，6,633 个文件夹），单个单词用时大约5秒，连续两个单词大约用时4秒，连续三个单词大约用时4秒。当然在变为秒级之前我对单词分析以及目录遍历也做了一定程度的优化。

由于当时做完第一次性能分析后就进行了优化并删除了性能分析的报告，因此只列出优化之后的性能分析报告，分四种模式的性能分析；

CPU sampling：

Instrumentation：

.NET memory allocation(sampling):

Resource contention data(concurrency):

4.项目的测试样例

（1）测试识别后缀为"txt", "cpp", "h", “cs”的文件

构建了一个文件夹，文件夹内有4个子文件夹，分别装有后缀为.txt,.cpp,.h,.cs的文件，每个文件内都有该文件独有的单词。如果最后输出文件中四个特殊单词都有，则说明程序可以识别这些文件。

四个文件的内容：

.cpp文件：hello, this is the c plus plus file

.cs文件:hello, this is the c sharp file

.h文件:hello, this is the header file

.txt文件:hello, this is the txt file

输出结果：

<file>:4
<hello>:4
<the>:4
<this>:4
<plus>:2
<header>:1
<sharp>:1
<txt>:1

（2）测试按照ASCII码排序

文件内容：

element Electricyt fjeowajfoap FJEOZZZZZZZZA
classpath 432 423 3&%#classpath Classpathjiou
ElemeNt ElectriCyt fjeoWAjfoap FJEOzzzzzzzZA

输出结果：

<ElectriCyt>:2
<ElemeNt>:2
<FJEOZZZZZZZZA>:2
<classpath>:2
<fjeoWAjfoap>:2
<Classpathjiou>:1

主要测试在次数相同的情况下对字符串按照ASCII码顺序排序，大写排在小写前面。

（3）测试超大文件夹（vs2012安装目录）

测试扫描超大文件夹，例如vs2012安装目录，测试程序的数据结构及算法是否合理。

vs2012安装目录信息：

文件夹名：Microsoft Visual Studio 11.0

大小：2.78 GB (2,995,686,049 字节)

占用空间：2.86 GB (3,080,278,016 字节)

包含：33,443 个文件，6,633 个文件夹

运行结果：

单个单词：

大约耗时5秒。

结果太长，只截部分：

连续两词：

大约耗时4秒。

输出结果：

<using System>:13086
<param name>:9541
<see cref>:4930
<virtual void>:4072
<unsigned int>:3858
<Return FALSE>:3759
<Return TRUE>:2909
<public static>:2546
<For the>:2291
<WITH THE>:2265

连续三词：

大约耗时4秒。

输出结果：

<Microsoft Foundation Classes>:2149
<localized string similar>:1520
<public static string>:1520
<ALL RIGHTS RESERVED>:1496
<public partial class>:1276
<using namespace Windows>:1198
<The message block>:994
<message block type>:976
<Reference and related>:812
<Foundation Classes Reference>:778

(4)测试单词大小写的统计情况

文件内容：

file123 file File 123file files
classpath&Classpath&&classPath*claSspAth)*)(*(^%^|~~

输出结果：

<Classpath>:4
<File>:2
<file123>:1
<files>:1

主要测试统计单词时的忽略大小写以及最后输出时的输出字典序最小的同类字符串，同时，123file中的file不能被识别为单词，因为不是被分隔符隔开的，因此File为2个而不是3个。

（5）测试多国语言的输入情况

文件内容：

Eventually I 一流learn一流ed to stop worrying and love the fまあlow. The pervasiveness of the new multiplicity, and my participation in it, altered my peまあrspective. Altered my Self. The まあtransition was gradual, but eventually I realized I waまあs on the other side.
Eveまあntu一流ally I learned to stop worrying and love the flow. The pervasiveness of the new multiplicity, and my participation in iまあt, altered my perspective. Altereまあd my Self. The transition was gradual, but eventually I realized I wまあas on the other side.
Eventua一流lly I learned ????to st一流op wo一流rryi一流ng and love まあthe flow. The pervasiveness of the new ????multiplicity, and my participatまあion???? in it, altereまあd my perspective. Altered my Self. The transition was gradual, but e????ventually I realized I was???? on the other side.
Eventuall????y I le????arned to stop worry????ing and love t一流he flow. The pervasiveness of the new ????multiplicity, and my participation in it, altered my perspe????ctive. A????ltered my Self. The tまあrans????ition was gradual, but eventually I realized I was ????on the otまあher side.
Eventually I learned to stop worrying and love the flまあowまあ. The pervasiveness of the new multiplicity, and my participation in it, altered my pまあerspective. Altered my Self. The transition was gradual, but eventually I realized I was on the other side.
Event一流ually I learned to stop worrying and love the flow. The pervasiveness of the new multiplicity, and my participation in it, altered my pe????rspecまあtive. Altered my Self. The transition was gradual, but eventually I realized I was on the other side.
Eventually I learne专业的留学趋势，预估出未????来的????申请结果，从而更有效的做申请定位，针对性的提高。截止目前，该公益????培训班得到全面推广，与北大、まあ北航、人まあ大、外经贸、复旦、上海交大、上海财经、上外、西安交大、西????南财大等 30 余所国内著名大学和高中均ま????あ签署合作协议，开展了公益培训课程まあ，报名d to stop worrying and love the flow. The pervasiveness of the new multiplicity, and my participation in it, altered my perspective. Altered my Self. The transition was gradual, but eventually I realized I was on the other side.
Evまあentually I learned to stop worrying and love the flow. The pervasiveness of the new multiplicity, and my parti????cipation in it, altered m????y まあperspective. Altered my Sel????f. The transition was gradual, but eventually I realized I was on theまあ other side.
t technologies have serve一流d that purposまあe in the pa一流st. Thanks to Twitter backchannels identified by hashtag s, I was able to pまあarticipate with friends and audiまあencまあe members at some talks at SXSW (5) this past year, despite まあbeing uまあnable to attend in person.
t technol一流ogies have s????まあerved that purpose in the past. Thanks to Twitter backchannels identified by hashtag s, I w????as able to participate witまあh fr????iends and aまあudience memb????ers at some talks at SX????SW (5) this past year, despite being unable to attend in person.
t technologies have served that pur一流pose in the past. Thanks to Twitter backchannels identified by hashtag s, I was???? able to participate with friends and audience members at some talks at SXSW (5) this past year, despite bein????g unable to attend in person.
t technologie专业的留学趋势，预估出未来的申请结果，从而更有效的做申请定位，针对性的提高。截止目前，该公益培训班得到全面推广，与北大、北航、人大、外经贸、复まあ旦、上海交大、上海财经、上外、西安交大、西南财大等 30 余所国内著名大学和高中均签署合作协议，开展了公益培训课程，报名s have まあserved ????that purpose in t????he past. Thanks to Twitter backchannels identified by hashtag s, I was able to participate with friends and audience members at some talks at SXSW (5) this past year, despite being unable to attend in person.
t technoまあlogies have served that purpose in the past. Thanks to Twitter backchannels identified by hashtag s, I???? was able to p????articipまあate with friまあends and audiencまあe members at some talks at SXSW (5) this past year, despite being unaまあble to attend in person.
t technolog一流ies hav一流e served that purp一流ose in the past. Thanks to Twitter backchannels identified by hashtag s, I was ab????le to partici????pate with friends and audienまあe members at some t????alks at SXSW (5) t????his past year, despite being unable to attend???? in person.
t technologies have served that purpose in the past. Thanks to Twitter backchannels identified by hashtag s, I was able to participate witまあh friends and aまあudience members ????at some talks at SXSW (5) this past year, despite being unable to attend in person.
专业的留学趋势，预估出未来的申请结果，从而更有效的做申请定位，针对性的提高。截止目前，该公益培训班得到全面推广，与北大、北航、人大、外经贸、复旦、上海交大、上海财经、上外、西安交大、西南财大等 30 余所国内著名大学和高中均签署合作协议，开展了公益培训课程，报名
【いくまあゆずやこまな】Carry Me Off【DANCEROID五人组】
????hui????huihiui????iuuyhui????
????????????????????????????

输出结果：

<The>:45
<and>:23
<was>:20
<Altered>:13
<past>:13
<Eventually>:10
<but>:8
<gradual>:8
<love>:8
<multiplicity>:8
<new>:8
<pervasiveness>:8
<realized>:8
<side>:8
<Self>:7
<Thanks>:7
<Twitter>:7
<attend>:7
<backchannels>:7
<despite>:7
<hashtag>:7
<identified>:7
<other>:7
<person>:7
<some>:7
<stop>:7
<that>:7
<transition>:7
<year>:7
<SXSW>:6
<able>:6
<being>:6
<flow>:6
<have>:6
<members>:6
<participation>:6
<talks>:6
<this>:6
<worrying>:6
<friends>:5
<learned>:5
<served>:5
<unable>:5
<with>:5
<participate>:4
<perspective>:4
<purpose>:4
<technologies>:3
<Altere>:2
<audience>:2
<udience>:2
<wit>:2
<Carry>:1
<DANCEROID>:1
<Eve>:1
<Event>:1
<Eventua>:1
<Eventuall>:1
<Off>:1
<Sel>:1
<alks>:1
<ally>:1
<arned>:1
<articip>:1
<articipate>:1
<ate>:1
<audi>:1
<audien>:1
<audienc>:1
<bein>:1
<ble>:1
<cipation>:1
<ctive>:1
<enc>:1
<ends>:1
<entually>:1
<ers>:1
<erspective>:1
<erved>:1
<fri>:1
<hav>:1
<her>:1
<his>:1
<hui>:1
<huihiui>:1
<iends>:1
<ies>:1
<ing>:1
<ion>:1
<ition>:1
<iuuyhui>:1
<learn>:1
<learne>:1
<lly>:1
<logies>:1
<low>:1
<ltered>:1
<memb>:1
<nable>:1
<ntu>:1
<ogies>:1
<ose>:1
<parti>:1
<partici>:1
<participat>:1
<pate>:1
<perspe>:1
<pose>:1
<pur>:1
<purp>:1
<purpos>:1
<rans>:1
<rryi>:1
<rspec>:1
<rspective>:1
<serve>:1
<techno>:1
<technol>:1
<technolog>:1
<technologie>:1
<tive>:1
<ually>:1
<una>:1
<ventually>:1
<worry>:1

主要测试能否将除英文以外的其他语言，比如中文、日文等，识别为分隔符。

（6）测试对分隔符的识别

文件内容：

fewo 3iafj 646ewaoij &*^fowae&^ijffjo&&(aiwe^jfgers13$%214h$srtg^$ssr^%$

输出结果：

<aiwe>:1
<fewo>:1
<fowae>:1
<ijffjo>:1
<jfgers13>:1
<srtg>:1
<ssr>:1

测试程序对分隔符的识别。

（7）测试空文件和空文件夹

输出结果均为空

（8）测试连续两词的识别

文件内容：

123file file File FILE
word ranking list useful not exceptional
word rANking list useful not exceptional
word ranking list useful not exceptIONal
word ranking list usEFul not exceptional
you are such a wonderful man

输出结果：

<list usEFul>:4
<not exceptIONal>:4
<rANking list>:4
<usEFul not>:4
<word rANking>:4
<File FILE>:2
<are such>:1
<wonderful man>:1
<you are>:1

主要测试程序对连续两词的识别以及排序。

（9）测试连续三词的识别

文件内容：

please be quiet pleaSe be quiEt pleaSE BE quiet
please be quiet
Please Be quiet
pLEase be quiet
please be qUIEt
word ranking list is great word raNKing list is great
word ranking list is gREat
word ranKIng list is great
how are you
how Are you
fine thank you and YOU
fine Thank you And you
fine thank YOU and"
this program really drives me crazy
this program rEAlly drives me crazy
this program really drives me cRAzy
THis program really drives me crazy

输出结果：

<THis program really>:4
<program rEAlly drives>:4
<word raNKing list>:4
<Thank you And>:3
<fine Thank you>:3
<how Are you>:2
<you And you>:2
<great word raNKing>:1

主要测试程序对连续三词的识别能力以及排序能力。

（10）测试长文章的识别

测试长文章的识别情况。这里选取的是本项目的项目要求。

文件内容：

作业提交截止时间：2014.09.25之前。

Individual Project - Word frequency program

Implement a console application to tally the frequency of words under a directory (2 modes).

For all text files (file extensions: "txt", "cpp", "h", “cs”) under a directory (recursively), calculate the frequency of each word, and output the result into a text file. Write the code in C++ or C#, using .Net Framework, the running environment is 32-bit Win7 or Win 8.

Run performance analysis tool on your code, find performance bottlenecks and improve.

Enable Code Quality Analysis for your code and get rid of all warnings.

Code Quality Analysis: http://msdn.microsoft.com/en-us/library/dd264897.aspx

Write 10 simple test cases to make sure your program can handle these cases correctly (e.g. a good test case could be: one of the sub-directories is empty).

Submission:

Submit your source code and exe to TA, TA will run it on his testing environment and check for

- correctness (incorrect program will get 0 points)

- performance

- write a blog (see blog requirement below)

Definition:

A word: a string with at least 3 English alphabet letters, then followed by optional alphanumerical characters. Words are separated by delimiters. If a string contains non-alphanumerical characters, it’s not a word. Word is case insensitive, i.e. “file”, “FILE” and “File” are considered the same word.

“file123” is a word, and “123file” is NOT a word.

　　- Alphabetic letters: A-Z, a-z.

　　- Alphanumerical characters: A-Z, a-z, 0-9.

　　- Delimiter: space, non-alphanumerical letters.

　　- Output text file: filename is <your email name>.txt

　　- Each line has this format

　　　　　　<word>: number

Where <word> is the string, it has to be the exact upper/lower case as shown in the text file. E.g. if only “File” and “file” appear in the test cases, the program should not show “FILE”. <word> should be the first word in dictionary order (based on ASCII). For exmaple, if only “File” and “file” appear in the text file, the program should output “File: 2”.

Where “number” is the number of times this word appears in the scan. The output should be sorted with most frequently word first. If 2 words have the same frequency, list the words by dictionary order.

Requirements:
1) Simple mode. Output simple word frequency.

Myapp.exe <directory-name>

Will output <your-name>.txt file in current directory, the text file contains word ranking list.

2) Extended mode.

在执行 Myapp.exe -e2 <directory-name>时，找出最频繁出现的连续两个词（列出前10名）。例如，在一本英文小说中，“good morning” 出现次数最多。

在执行 Myapp.exe -e3 <directory-name>时，找出最频繁出现的连续三个词（列出前10名）。例如“how are you"。

这里连续的词是指由单个空格分隔的词。

The app will output <your-name>.txt file in current directory, the text file contains word ranking list.

Blog Requirement:
You can publish this to BOTH your own blog, and your team blog (to help your team blog get some traffic)

1) Before you implement this project, Record your estimate about the time you WILL spend in each component of your program.

2) After you had implemented this project, record the ACTUAL time you spent in each component of your program.

3) Describe how much time you spent on improving the performance of your program, and show a performance analysis graph (generated by VS2012 perf analysis tool), if possible, please show the most costly function in your program.

4) Share your 10 test cases, and how did you make sure your program can produce the correct result. (programs with incorrect result will get 0 points, regardless of speed)

5) Describe what you had learned in this exercise.

输出结果：

单个单词：

<The>:28
<FILE>:18
<Word>:17
<your>:17
<and>:12
<program>:10
<You>:9
<Output>:7
<directory>:7
<text>:7
<Blog>:6
<Code>:6
<WILL>:6
<name>:6
<this>:6
<Analysis>:5
<frequency>:5
<performance>:5
<Alphanumerical>:4
<Each>:4
<For>:4
<Words>:4
<cases>:4
<exe>:4
<get>:4
<should>:4
<test>:4
<txt>:4
<Myapp>:3
<NOT>:3
<Project>:3
<Simple>:3
<Write>:3
<are>:3
<can>:3
<case>:3
<characters>:3
<contains>:3
<how>:3
<letters>:3
<list>:3
<number>:3
<result>:3
<show>:3
<string>:3
<time>:3
<with>:3
<Describe>:2
<Implement>:2
<Quality>:2
<Record>:2
<Requirement>:2
<Run>:2
<Where>:2
<all>:2
<appear>:2
<component>:2
<current>:2
<dictionary>:2
<environment>:2
<first>:2
<good>:2
<had>:2
<has>:2
<incorrect>:2
<make>:2
<mode>:2
<most>:2
<non>:2
<only>:2
<order>:2
<points>:2
<ranking>:2
<same>:2
<spent>:2
<sure>:2
<team>:2
<tool>:2
<under>:2
<ACTUAL>:1
<ASCII>:1
<After>:1
<Alphabetic>:1
<BOTH>:1
<Before>:1
<Definition>:1
<Delimiter>:1
<Enable>:1
<English>:1
<Extended>:1
<Framework>:1
<Individual>:1
<Net>:1
<Requirements>:1
<Share>:1
<Submission>:1
<Submit>:1
<Win>:1
<Win7>:1
<about>:1
<alphabet>:1
<app>:1
<appears>:1
<application>:1
<aspx>:1
<based>:1
<below>:1
<bit>:1
<bottlenecks>:1
<calculate>:1
<check>:1
<com>:1
<considered>:1
<console>:1
<correct>:1
<correctly>:1
<correctness>:1
<costly>:1
<could>:1
<cpp>:1
<delimiters>:1
<did>:1
<directories>:1
<email>:1
<empty>:1
<estimate>:1
<exact>:1
<exercise>:1
<exmaple>:1
<extensions>:1
<file123>:1
<filename>:1
<files>:1
<find>:1
<followed>:1
<format>:1
<frequently>:1
<function>:1
<generated>:1
<graph>:1
<handle>:1
<have>:1
<help>:1
<his>:1
<http>:1
<implemented>:1
<improve>:1
<improving>:1
<insensitive>:1
<into>:1
<learned>:1
<least>:1
<library>:1
<line>:1
<lower>:1
<microsoft>:1
<modes>:1
<morning>:1
<msdn>:1
<much>:1
<one>:1
<optional>:1
<own>:1
<perf>:1
<please>:1
<possible>:1
<produce>:1
<programs>:1
<publish>:1
<recursively>:1
<regardless>:1
<rid>:1
<running>:1
<scan>:1
<see>:1
<separated>:1
<shown>:1
<some>:1
<sorted>:1
<source>:1
<space>:1
<speed>:1
<spend>:1
<sub>:1
<tally>:1
<testing>:1
<then>:1
<these>:1
<times>:1
<traffic>:1
<upper>:1
<using>:1
<warnings>:1
<what>:1

连续两词：

<text file>:6
<your program>:6
<the text>:4
<Alphanumerical characters>:3
<test cases>:3
<time you>:3
<Blog Requirement>:2
<Code Quality>:2
<Quality Analysis>:2
<Will output>:2

连续三词：

<the text file>:4
<Code Quality Analysis>:2
<contains word ranking>:2
<file contains word>:2
<make sure your>:2
<sure your program>:2
<text file contains>:2
<the program should>:2
<time you spent>:2
<word ranking list>:2

如何保证程序能输出正确结果：

首先根据项目要求对程序的逻辑结构以及各种单词分析情况对程序进行了逻辑结构的分析与检查，减少、避免程序的逻辑漏洞。

其次，设计多样化的测试用例测试程序的正确性。

以上两步能够很大程度上地保证程序的正确性，但仍不能绝对地保证，程序在使用过程中有可能暴露出各种难以想象的错误，需要不断地进行维护。

5.项目经验与收获

虽说不是第一次做项目，但是用c#做项目还是第一次。在项目的进行过程中，我体会到c#语言的便利，尤其是c#提供的dictionary类，哈希表的结构为检索提供了O(1)的时间复杂度，大大减少了程序运行的时间，而哈希结构在我之前的项目中是没有被用过的。虽然哈希结构不利于索引，也没有排序的成员方法，但是c#提供了ordereddictionary牺牲空间，以内在的哈希表+数组的形式提供了索引取值，进而可以进行自定义排序；system.linq命名空间中的enumerable类的orderby和theyby方法可以对dictionary进行排序，同时进行多层排序以及按照不同编码规则的排序以及自定义的排序，为程序的多样化提供了便利。linq中的orderby排序方法在dictionary类的extension method中列有，但是因为不是成员方法所以我就没看，导致上网查资料花费了很多时间！下次一定要看看msdn中类下面的extension method，积累一些能够在该类上实行的方法。同时，c#高度的面向对象思想以及与java的相似性也使得我很快上手，这也充分体现了把一部分语言和思想学透理解，其他的语言就很容易上手的道理。

这次使用了新的IDE：vs2012.之前一直是用的VC6.0。vs2012并没有想象中那么耗内存，简洁而优雅的界面设计让我眼前一亮。最让我印象深刻的就是它的性能分析功能。在之前做java程序的时候，也使用过junit作为代码测试工具以及tptp作为性能分析工具。vs2012的性能分析工具跟tptp一样功能强大，并且有不同的模式，而vs2012的显示较tptp更为形象，主要表现在summary的折线图，hot path的分析以及function details中的图表分析，让我了解到自己的程序那些部分调用次数多，运行时间长；哪些部分占用内存较大，哪些部分占用CPU较大，以及在本项目中用不到的多线程竞争情况分析。最有用的是，在下方的error list中会提出error或者warning，我将两个warning修改之后程序效率提高了很多倍，从分钟级变为秒钟级。在warning中，vs指出了我的程序中的几个大问题，比如异常处理过多，垃圾太多导致垃圾处理频繁等。这些vs的性能分析以及警示都使程序开发者享有更好地编程体验，能够更快地发现错误以及程序中的问题并及时修复，不得不说这一次我对vs刮目相看了。有趣的是，eclipse（我开发java的IDE）虽然有tptp插件作为性能分析工具，但是到jre1.5之后就不再支持了，换句话说，tptp在2011年左右之后就不再更新了，导致我在新版本的Eclipse上无法使用tptp进行性能分析，只能使用Eclipse 进行java EE开发才能使用tptp进行性能分析。而vs2012的性能分析功能还在一直变得更好。

在调试的过程中，我经常会被一些莫名其妙的错误输出而困扰。这些错误输出大多是部分错误，而由于程序较长，找问题也有一定难度。每次发现问题都是非常细微的，比如少写了一句语句，或者某一句语句放错了位置，都有可能导致结果的错误。这也对我提出了要求。在写程序的时候就要逻辑清晰并且注意细节，这样在调试的时候才会轻松一些。

这次项目的经历，也让我明白msdn以及网络的巨大用处。在查询msdn的过程中，我系统地了解了dictionary等类的用法以及使用内涵，msdn中配有的用例也让我颇受启发。而网络上具有相似经历的网友的博客也让我受益匪浅。这些网友将他们的经验在cnblog、csdn等地进行分享，让我在茫然无助的时候看到了曙光，节省了很多时间，我也因此明白，分享是一个互惠互利的过程。因此这一次我把自己的项目经历写出来，供大家参考，也让有相同困惑的网友们了解一些我的经验教训，获得一些可能的启发。

同时要感谢我的同学们。在你们的博客中，我得到了很多启示。在与同学以及老师的讨论过程中，我们都收获了更多东西，既方便了他人参考，给予启迪，同时间接地让他人检查自己博客中的错误，从而达到双赢。面对即将开始的结对编程以及团队项目，我希望自己能够在与他人更好地合作中收获更多的知识与友谊，促进团队和自身的发展。

查看全文

相关阅读:
Neo4j简介
 HiBench算法简介
 Spark性能测试工具
 常用Benchmark
Mapreduce的性能调优
 YARN node labels
Yarn on Docker集群方案
 YARN on Docker
HDP YARN MapReduce参数调优建议
 JVM优化:生产环境参数实例及分析

原文地址：https://www.cnblogs.com/hks1994/p/3991963.html

Copyright © 2011-2022 走看看