zoukankan      html  css  js  c++  java
  • 基于Lucene3.5.0怎样从TokenStream获得Token

    通过学习Lucene3.5.0的doc文档,对不同release版本号 lucene版本号的API修改做分析。最后找到了有价值的修改信息。
  • LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
  • 以上信息可以知道,原来的通过的方法已经不可以提取响应的Token了
    StringReader reader = new StringReader(s);
    TokenStream ts =analyzer.tokenStream(s, reader);
    TermAttribute ta = ts.getAttribute(TermAttribute.class);

  • 通过分析Api文档信息 可知,CharTermAttribute已经成为替换TermAttribute的接口
  • 因此我编写了一个样例来更好的从TokenStream中提取Token


  • package com.segment;
    
    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.util.AttributeImpl;
    import org.wltea.analyzer.lucene.IKAnalyzer;
    
    
    public class Segment {
    	public static String show(Analyzer a, String s) throws Exception {
    
    		StringReader reader = new StringReader(s);
    		TokenStream ts = a.tokenStream(s, reader);
    		String s1 = "", s2 = "";
    		boolean hasnext= ts.incrementToken();
    		//Token t = ts.next();
    		while (hasnext) {
    			//AttributeImpl ta = new AttributeImpl();
    			CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
    			//TermAttribute ta = ts.getAttribute(TermAttribute.class);
    			
    			s2 = ta.toString() + " ";
    			s1 += s2;
    			hasnext = ts.incrementToken();
    		}
    		return s1;
    	}
    
    	public String segment(String s) throws Exception {
    		Analyzer a = new IKAnalyzer();
    		return show(a, s);
    	}
    	public static void main(String args[])
    	{
    		String name = "我是俊杰,我爱编程,我的測试用例";
    		Segment s = new Segment();
    		String test = "";
    		try {
    			System.out.println(test+s.segment(name));
    		} catch (Exception e) {
    			// TODO Auto-generated catch block
    			e.printStackTrace();
    		}
    	}
    
    }

查看全文
  • 相关阅读:
    hdu 2485 Destroying the bus stations 迭代加深搜索
    hdu 2487 Ugly Windows 模拟
    hdu 2492 Ping pong 线段树
    hdu 1059 Dividing 多重背包
    hdu 3315 My Brute 费用流,费用最小且代价最小
    第四天 下载网络图片显示
    第三天 单元测试和数据库操作
    第二天 布局文件
    第一天 安卓简介
    Android 获取存储空间
  • 原文地址:https://www.cnblogs.com/gcczhongduan/p/4014987.html
  • Copyright © 2011-2022 走看看