
Personal Project: Word Frequency Counter (Make-up Features)

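Estimated time for each feature (and/or sub-feature)

Feature	Estimated (min)	Actual (min)
File storage, tokenization, word counting	60	82
Frequency sorting	20	27
Reading the books in a directory	15	26
Main function design	50	74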

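Word frequency counter PSP

Date	Type	Task	Start	End	Interrupted (min)	Planned (min)	Actual (min)
2016.10.07	Requirements analysis	Read the spec and analyse the requirements of each feature	14:59	15:38	3	30	36
2016.10.07	Coding and study	Design file storage, tokenization and word counting; read a classmate's code	15:44	17:11	5	60	82
2016.10.07	Coding and study	Frequency sorting, reading the books in a directory, main function design	19:00	21:26	19	85	127
2016.10.08	Coding and study	Learn about redirection	15:01	15:39	2	30	36
2016.10.08	Code review	Write the blog post, debug the run results	15:45	17:12	6	30	81
2016.10.08	Code review	Write the blog post, debug the run results	17:53	18:26	3	30	30
2016.10.09	PSP summary	Total up the times, summarise lessons learned, publish the blog post	9:48	10:57	7	30	62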

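Comparative analysis

I had been putting this assignment off for a long time, and this holiday I finally managed to catch up on part of it. Earlier I had collected some material to study from but could never get a handle on it; only after reading a classmate's code this time did the overall picture become clear. The gap between the estimated and actual times has the following main causes:

After reading the material I thought I understood the program's flow, but actually writing it raised many detail-level problems. For example, compiling and running the program at the DOS prompt produced errors such as "wordcounta.java:13: 错误: 找不到符号" (error: cannot find symbol).

While writing the code I made mistakes such as wrong types, malformed class declarations and incorrect method calls. Fixing these took a good deal of time.

Studying the code also raised many questions: which argument to pass to split() when tokenizing, how to use BufferedReader and FileReader, how to store Map entries in an ArrayList, and so on.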
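To make the split() question concrete, here is a minimal sketch of how the pattern "[^a-zA-Z]" behaves. The class name SplitDemo and the sample sentence are made up for illustration, and a StringReader stands in for the FileReader so that the example needs no file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Splits a line on every non-letter character and drops the empty tokens
    // that appear wherever two separators are adjacent.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^a-zA-Z]")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // BufferedReader wraps any Reader; with a real file this would be
        // new BufferedReader(new FileReader(path)).
        String text = "Hello, world! It's 2016.";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(tokenize(line));  // prints [Hello, world, It, s]
            }
        }
    }
}
```

Note that the apostrophe and the digits count as separators too, so "It's" becomes two words and "2016" disappears entirely.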

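Requirements analysis

The assignment requires four features.

First, count word frequencies for a small file supplied by the user, and output the total number of words and the frequency of each word. This method can be reused to provide the same function for the other requirements.

Second, the user enters a file name and the program counts word frequencies for that file. It outputs the total number of words and the frequency of each word.

Third, the user enters the directory that contains the files. The program lists all .txt files in that directory and then counts word frequencies for each one. The output should show the total number of words and the number of distinct words. With many books, printing the frequency of every word would make the output very long and inconvenient to read, so only the ten most frequent words of each document are shown.

Fourth, the user enters a redirection command, and the program counts word frequencies for the redirected file. It outputs the total number of words and the frequency of each word.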

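Implementation

Create the wordcount class, which implements the basic word-counting functionality. It contains the following three methods:

    public Map<String, Integer> map(File dir)

    public ArrayList<Map.Entry<String,Integer>> SortMap(Map<String,Integer> oldmap)

    public File[] Outputlist(Scanner sc)

public Map<String, Integer> map(File dir): reads the given File, splits each line into words (dropping spaces and punctuation) and stores them in the list ls, then counts the words in ls and stores the result in the map wc. In Map<String,Integer>, the String is the word and the Integer is its number of occurrences.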

public Map<String, Integer> map(File dir) throws IOException {
    List<String> ls = new ArrayList<String>();
    Map<String, Integer> wc = new TreeMap<String, Integer>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(dir))) {
        String readLine = null;  // current line; null marks the end of the file
        while ((readLine = reader.readLine()) != null) {
            String[] wordsArr1 = readLine.split("[^a-zA-Z]");  // split on every non-letter character
            for (String word : wordsArr1) {
                if (word.length() != 0) {  // skip zero-length tokens
                    ls.add(word);          // store each word in the list
                }
            }
        }
    }

    // count the frequency of each word
    for (String li : ls) {
        if (wc.get(li) != null) {  // get(li) returns the count so far for this word
            wc.put(li, wc.get(li) + 1);
        } else {
            wc.put(li, 1);
        }
    }
    return wc;
}
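The counting step can also be done while reading, without the intermediate list. The following is only a sketch of that variant, not the code used in this project: the class name MergeCountSketch is made up, it takes a Reader instead of a File so that it can be tried on a string, and it wraps IOException in UncheckedIOException for brevity. It relies on Map.merge, which is available from Java 8 on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class MergeCountSketch {
    // Counts words line by line; merge() inserts 1 for a new word and
    // otherwise adds 1 to the count already stored.
    static Map<String, Integer> count(Reader source) {
        Map<String, Integer> wc = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("[^a-zA-Z]")) {
                    if (!word.isEmpty()) {
                        wc.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wc;
    }

    public static void main(String[] args) {
        // TreeMap keeps the keys in alphabetical order
        System.out.println(count(new StringReader("to be or not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

Like the original method, this sketch is case-sensitive, so "The" and "the" are counted as different words.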

public ArrayList<Map.Entry<String,Integer>> SortMap(Map<String,Integer> oldmap): sorts the map entries in descending order of their Integer value (the word count).

public ArrayList<Map.Entry<String,Integer>> SortMap(Map<String,Integer> oldmap) {
    // copy the entries into a list so that they can be sorted
    ArrayList<Map.Entry<String,Integer>> list = new ArrayList<Map.Entry<String,Integer>>(oldmap.entrySet());

    Collections.sort(list, new Comparator<Map.Entry<String,Integer>>() {  // descending order
        @Override
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            return o2.getValue().compareTo(o1.getValue());
        }
    });

    return list;
}
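With Java 8 or later the same ordering can be written with the comparator factories on Map.Entry. The sketch below is an alternative, not the project's code, and the class and method names are made up. It also makes the tie-break explicit: words with equal counts are ordered alphabetically. In SortMap that order only arises implicitly, because the entries come from a TreeMap and Collections.sort is stable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortSketch {
    // Sorts by count, highest first; equal counts are ordered by word.
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return list;
    }

    public static void main(String[] args) {
        // A HashMap is used on purpose: the result no longer depends on
        // the iteration order of the input map.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("to", 2);
        counts.put("be", 2);
        counts.put("or", 1);
        counts.put("not", 1);
        System.out.println(sortByCount(counts));  // prints [be=2, to=2, not=1, or=1]
    }
}
```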

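public File[] Outputlist(Scanner sc): looks for .txt documents in the directory read from the input, stores them in a File array, prints their names and returns the array so that word frequencies can be counted for every document in it.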

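public File[] Outputlist(Scanner sc) throws IOException {
    File file = new File(sc.nextLine());
    // keep only regular .txt files, so that map() is never called on a subdirectory
    File[] tempList = file.listFiles(f -> f.isFile() && f.getName().toLowerCase().endsWith(".txt"));
    if (tempList == null) {  // listFiles returns null when the path is not a readable directory
        tempList = new File[0];
    }
    System.out.println("Books in this directory:");
    for (int i = 0; i < tempList.length; i++) {
        System.out.println(tempList[i].getName());
    }
    return tempList;
}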
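The directory listing could also be written with the java.nio.file API, which accepts a glob pattern directly. This is a sketch under assumptions rather than a drop-in replacement: the class ListTxtSketch and its method are invented for illustration, it returns a sorted List<Path> instead of File[], and whether the glob "*.txt" also matches "A.TXT" depends on the platform's file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListTxtSketch {
    // Returns the regular files in dir whose names match *.txt, sorted by path.
    static List<Path> listTxt(Path dir) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {  // skip directories named like x.txt
                    result.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);  // directory streams have no guaranteed order
        return result;
    }

    public static void main(String[] args) {
        // The directory comes from the first argument, or "." if none is given
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listTxt(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```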

Create four classes, wordcounta, wordcountb, wordcountc and wordcountd, one for each of the four required features.

wordcounta:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcounta {
    public static void main(String[] args) throws IOException {
        @SuppressWarnings("resource")
        Scanner input = new Scanner(System.in);
        wordcount yl = new wordcount();
        File file = new File(input.nextLine());  // path of the file to analyse
        Map<String, Integer> wc = yl.map(file);
        ArrayList<Map.Entry<String,Integer>> list = yl.SortMap(wc);
        int j = 0;  // total number of words, initially 0

        for (int k = 0; k < list.size(); k++) {
            j += list.get(k).getValue();
        }
        System.out.println("Total number of words: " + j);
        for (int k = 0; k < list.size(); k++) {
            System.out.println(list.get(k).getKey() + ": " + list.get(k).getValue());
        }
    }
}

Run result: [screenshot]

wordcountb:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountb {

    public static void main(String[] args) throws IOException {
        wordcount yl = new wordcount();
        Scanner inputxt = new Scanner(System.in);
        File[] tempList = yl.Outputlist(inputxt);
        for (int i = 0; i < tempList.length; i++) {  // process every file in the directory
            System.out.println(tempList[i].getName());
            Map<String, Integer> wc = yl.map(tempList[i]);  // count word frequencies
            ArrayList<Map.Entry<String,Integer>> list = yl.SortMap(wc);  // sort by frequency
            int j = 0;  // total number of words in this file
            for (int k = 0; k < list.size(); k++) {
                j += list.get(k).getValue();
            }

            System.out.println("Total number of words: " + j + "  " + "Number of distinct words: " + list.size());
            if (list.size() >= 10) {
                for (int m = 0; m < 10; m++) {  // only the ten most frequent words
                    System.out.println(list.get(m).getKey() + ": " + list.get(m).getValue());
                }
            } else {
                for (int m = 0; m < list.size(); m++) {
                    System.out.println(list.get(m).getKey() + ": " + list.get(m).getValue());
                }
                System.out.println("This document has fewer than ten distinct words");
            }
        }
    }
}
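The two branches that print the top ten differ only in their upper bound, so they can be folded into one loop by capping the bound with Math.min. A minimal sketch of that idea follows; the helper name topN and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TopNSketch {
    // Returns the first n elements of an already sorted list,
    // or the whole list if it has fewer than n elements.
    // n is assumed to be non-negative.
    static <T> List<T> topN(List<T> sorted, int n) {
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("the: 120", "of: 75", "and: 60");
        // Asking for ten entries from a list of three simply yields three
        for (String line : topN(ranked, 10)) {
            System.out.println(line);
        }
    }
}
```

In wordcountb the same effect could be had by looping up to Math.min(10, list.size()) and printing the "fewer than ten" message separately when list.size() < 10.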

Run result: [screenshot]

wordcountc:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountc {

    public static void main(String[] args) throws IOException {
        @SuppressWarnings("resource")
        Scanner input = new Scanner(System.in);
        String path = "D:\\小说\\";  // fixed directory that holds the books
        path += input.next();        // file name without the extension
        File file = new File(path + ".txt");
        wordcount yl = new wordcount();
        Map<String, Integer> wc = yl.map(file);
        ArrayList<Map.Entry<String,Integer>> list = yl.SortMap(wc);
        int j = 0;  // total number of words

        for (int k = 0; k < list.size(); k++) {
            j += list.get(k).getValue();
        }
        System.out.println("Total number of words: " + j);
        for (int k = 0; k < list.size(); k++) {
            System.out.println(list.get(k).getKey() + ": " + list.get(k).getValue());
        }
    }
}

Run result: [screenshot]

wordcountd:

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Scanner;

public class wordcountd {

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // copy standard input (for example a redirected file) into new.txt
            Scanner in = new Scanner(System.in);
            FileWriter out = new FileWriter("D:\\小说\\new.txt");
            while (in.hasNextLine()) {
                out.write(in.nextLine() + "\n");
            }
            out.close();
            in.close();
        }
        File file = new File("D:\\小说\\new.txt");
        wordcount yl = new wordcount();
        Map<String, Integer> wc = yl.map(file);
        ArrayList<Map.Entry<String,Integer>> list = yl.SortMap(wc);

        int j = 0;  // total number of words
        for (int k = 0; k < list.size(); k++) {
            j += list.get(k).getValue();
        }
        System.out.println("Total number of words: " + j);
        for (int k = 0; k < list.size(); k++) {
            System.out.println(list.get(k).getKey() + ": " + list.get(k).getValue());
        }
    }
}
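When input is redirected, for example with a command of the form java wordcountd < book.txt (book.txt being a placeholder name), the shell connects the file to standard input, so the program can count straight from System.in without first copying it to a temporary file. The sketch below shows that variant under assumptions: the class name StdinCountSketch is made up, UTF-8 is assumed as the input encoding, and the words are printed alphabetically rather than by frequency.

```java
import java.io.InputStream;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class StdinCountSketch {
    // Counts words from any input stream; with redirection this is System.in.
    static Map<String, Integer> count(InputStream in) {
        Map<String, Integer> wc = new TreeMap<>();
        // The scanner is deliberately not closed, since that would close System.in
        Scanner sc = new Scanner(in, "UTF-8");  // encoding is an assumption
        while (sc.hasNextLine()) {
            for (String word : sc.nextLine().split("[^a-zA-Z]")) {
                if (!word.isEmpty()) {
                    wc.merge(word, 1, Integer::sum);
                }
            }
        }
        return wc;
    }

    public static void main(String[] args) {
        Map<String, Integer> wc = count(System.in);
        int total = 0;
        for (int c : wc.values()) {
            total += c;
        }
        System.out.println("Total number of words: " + total);
        for (Map.Entry<String, Integer> e : wc.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```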

Run result: [screenshot]

HTTP: https://git.coding.net/YangXiaomoo/wordCountNO.1.git

SSH: git@git.coding.net:YangXiaomoo/wordCountNO.1.git

GIT: git://git.coding.net/YangXiaomoo/wordCountNO.1.git


Original post: https://www.cnblogs.com/YangXiaomoo/p/5939743.html