  • Word-frequency counting for text in Java

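    // Imports assume the Lucene 3.x package layout implied by Version.LUCENE_31.
    import java.io.*;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    File f = new File(path); // path: location of the input text file
    Map<String, Integer> map = new HashMap<>(); // term -> occurrence count
    Version matchVersion = Version.LUCENE_31;
    Analyzer analyzer = new StopAnalyzer(matchVersion); // lowercases and drops English stop words
    BufferedReader br = new BufferedReader(new FileReader(f)); // read the file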

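    TokenStream ts = analyzer.tokenStream(null, br); // tokenize with the analyzer to get a token stream
    ts = new PorterStemFilter(ts); // filter that reduces each token to its stem
    CharTermAttribute ca = ts.addAttribute(CharTermAttribute.class); // ca exposes the text of the current token
    ts.reset(); // required before consuming the stream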
    while (ts.incrementToken()) {
        String term = ca.toString();
        if (!map.containsKey(term)) {
            map.put(term, 1);
        } else {
            map.put(term, map.get(term) + 1);
        }
    }
    ts.end();
    ts.close();
    analyzer.close();
    br.close();

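    StringBuilder sb = new StringBuilder();
    File gh = new File(path); // same path as the input above, so the source file is overwritten
    for (Map.Entry<String, Integer> e : map.entrySet()) {
        sb.append(e.getKey()).append(' ').append(e.getValue()).append(' ');
    }
    BufferedWriter bw = new BufferedWriter(new FileWriter(gh));
    bw.write(sb.toString());
    bw.flush();
    bw.close();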
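
For trying out the counting logic without Lucene on the classpath, here is a minimal JDK-only sketch of the same idea. It is not equivalent to the snippet above: the class name `WordFrequencySketch` and the small `STOP_WORDS` set are hypothetical stand-ins for `StopAnalyzer`, the regex split is a rough substitute for Lucene's tokenizer, and stemming is omitted entirely. It also sorts the result by frequency, which the original does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class WordFrequencySketch {
    // Hypothetical stop-word list for illustration only; StopAnalyzer ships its own list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    // Lowercase the text, split on runs of non-letters, skip stop words, count the rest.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            freq.merge(token, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return freq;
    }

    // Render one "term count" line per entry, most frequent first, ties in alphabetical order.
    static String format(Map<String, Integer> freq) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : entries) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(count("The quick fox and the lazy dog; the fox is quick.")));
    }
}
```

Because no stemmer is applied, inflected forms such as "run" and "running" are counted separately here, whereas the `PorterStemFilter` in the Lucene version is meant to fold such forms together.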

  • Original article: https://www.cnblogs.com/altlb/p/6856296.html