zoukankan      html  css  js  c++  java
  • 基于Lucene3.5.0怎样从TokenStream获得Token

    通过学习Lucene3.5.0的doc文档,对不同release版本号 lucene版本号的API修改做分析。最后找到了有价值的修改信息。
  • LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
  • 以上信息可以知道,原来的通过的方法已经不可以提取响应的Token了
    StringReader reader = new StringReader(s);
    TokenStream ts =analyzer.tokenStream(s, reader);
    TermAttribute ta = ts.getAttribute(TermAttribute.class);

  • 通过分析Api文档信息 可知,CharTermAttribute已经成为替换TermAttribute的接口
  • 因此我编写了一个样例来更好的从TokenStream中提取Token


  • package com.segment;
    
    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.util.AttributeImpl;
    import org.wltea.analyzer.lucene.IKAnalyzer;
    
    
    public class Segment {
    	public static String show(Analyzer a, String s) throws Exception {
    
    		StringReader reader = new StringReader(s);
    		TokenStream ts = a.tokenStream(s, reader);
    		String s1 = "", s2 = "";
    		boolean hasnext= ts.incrementToken();
    		//Token t = ts.next();
    		while (hasnext) {
    			//AttributeImpl ta = new AttributeImpl();
    			CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
    			//TermAttribute ta = ts.getAttribute(TermAttribute.class);
    			
    			s2 = ta.toString() + " ";
    			s1 += s2;
    			hasnext = ts.incrementToken();
    		}
    		return s1;
    	}
    
    	public String segment(String s) throws Exception {
    		Analyzer a = new IKAnalyzer();
    		return show(a, s);
    	}
    	public static void main(String args[])
    	{
    		String name = "我是俊杰,我爱编程,我的測试用例";
    		Segment s = new Segment();
    		String test = "";
    		try {
    			System.out.println(test+s.segment(name));
    		} catch (Exception e) {
    			// TODO Auto-generated catch block
    			e.printStackTrace();
    		}
    	}
    
    }

查看全文
  • 相关阅读:
    Atitit..组件化事件化的编程模型(2)Web datagridview 服务器端控件的实现原理and总结
    Atitit.dwr3 不能显示错误详细信息的解决方案,控件显示错误详细信息的解决方案 java .net php
    Atitit.实现继承的原理and方法java javascript .net c# php ...
    Atitit. 解压缩zip文件 的实现最佳实践 java c# .net php
    Atitit..文件上传组件选型and最佳实践总结(2)断点续传
    atitit.web的动态include 跟变量传递 java .net php
    Atitit. BigConfirmTips 控件 大数据量提示确认控件的原理and总结O9
    Atitit.guice3 ioc 最佳实践 o9o
    Atitit.hybrid混合型应用 浏览器插件,控件的实现方式 浏览器运行本地程序的解决方案大的总结提升用户体验and开发效率..
    atitit.提升开发效率使用服务器控件生命周期 asp.net 11个阶段 java jsf 的6个阶段比较
  • 原文地址:https://www.cnblogs.com/mfrbuaa/p/4266936.html
  • Copyright © 2011-2022 走看看