  • Integrating IKAnalyzer with Solr 5.3.1

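      Solr 5.3.1 does not support Chinese word segmentation out of the box, and mmseg4j's segmentation results were not good enough, so I chose IK instead. However, after following http://www.superwu.cn/2015/05/08/2134/ and dropping a jar found via Google into the Solr directory, Solr immediately threw the following exception.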

    SEVERE: Servlet.service() for servlet [default] in context with path [/solr] threw exception [Filter execution threw an exception] with root cause
    java.lang.AbstractMethodError
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:179)
        at org.apache.solr.handler.AnalysisRequestHandlerBase.analyzeValue(AnalysisRequestHandlerBase.java:91)
        at org.apache.solr.handler.FieldAnalysisRequestHandler.analyzeValues(FieldAnalysisRequestHandler.java:221)
        at org.apache.solr.handler.FieldAnalysisRequestHandler.handleAnalysisRequest(FieldAnalysisRequestHandler.java:182)
        at org.apache.solr.handler.FieldAnalysisRequestHandler.doAnalysis(FieldAnalysisRequestHandler.java:102)
        at org.apache.solr.handler.AnalysisRequestHandlerBase.handleRequestBody(AnalysisRequestHandlerBase.java:63)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2522)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2511)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)

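      At first I assumed this was a configuration problem, but no configuration made it work. After reading the source, I found that in the Lucene version shipped with Solr 5.3.1, the createComponents method of the Analyzer class no longer takes its second parameter (the Reader). A jar compiled against the old two-argument signature therefore never implements the method Lucene actually calls, which is what produces the AbstractMethodError. Modifying the IK source was unavoidable. For the changes, see http://iamyida.iteye.com/blog/2193513; alternatively, take the already-modified source and repackage it.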
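      Before patching anything, it can help to confirm which createComponents signature a given analyzer class actually declares. The following is a minimal reflection sketch, not part of the original post: the class name CreateComponentsCheck and the two stand-in analyzers are hypothetical and exist only so the example runs without Lucene. To check a real class such as IKAnalyzer, pass its fully qualified name as the first argument with both the IK jar and the Lucene jars on the classpath.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CreateComponentsCheck {

    // Hypothetical stand-in for an analyzer compiled against the old two-argument signature.
    static class OldStyleAnalyzer {
        protected Object createComponents(String fieldName, java.io.Reader reader) { return null; }
    }

    // Hypothetical stand-in for an analyzer compiled against the one-argument signature.
    static class NewStyleAnalyzer {
        protected Object createComponents(String fieldName) { return null; }
    }

    /** Returns the parameter lists of every createComponents method declared on the class. */
    static List<String> signatures(Class<?> analyzerClass) {
        List<String> found = new ArrayList<>();
        for (Method m : analyzerClass.getDeclaredMethods()) {
            if (m.getName().equals("createComponents")) {
                found.add(Arrays.toString(m.getParameterTypes()));
            }
        }
        return found;
    }

    /** True if the class declares createComponents(String), the form Lucene 5 calls. */
    static boolean hasOneArgSignature(Class<?> analyzerClass) {
        try {
            analyzerClass.getDeclaredMethod("createComponents", String.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no argument, inspect the stand-in; otherwise load the named class.
        Class<?> target = args.length > 0 ? Class.forName(args[0]) : NewStyleAnalyzer.class;
        System.out.println(target.getName() + " declares " + signatures(target)
                + ", one-argument form present: " + hasOneArgSignature(target));
    }
}
```

      If the one-argument form is missing, the jar was most likely built for an older Lucene and needs the source changes described in this post.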

      The main change is in IKAnalyzer.java:

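    /**
     * Overrides Analyzer.createComponents to build the tokenizer chain.
     * In Lucene 5 the only argument is the field name, not the text to analyze;
     * Lucene supplies the actual input later through Tokenizer.setReader().
     */
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Use the AttributeFactory-based IKTokenizer constructor (added to IKTokenizer.java)
        // rather than wrapping the field name in a Reader.
        // Requires: import org.apache.lucene.analysis.TokenStream;
        Tokenizer _IKTokenizer = new IKTokenizer(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY, this.useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }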

      Add the following constructor to IKTokenizer.java:

        public IKTokenizer(AttributeFactory factory, boolean useSmart) {  
            super(factory);  
            offsetAtt = addAttribute(OffsetAttribute.class);  
            termAtt = addAttribute(CharTermAttribute.class);  
            typeAtt = addAttribute(TypeAttribute.class);  
            _IKImplement = new IKSegmenter(input , useSmart);  
        }

      The remaining changes are minor and scattered; see the modified source files for details.

      Create a new project (IK-Analyzer-extra in the attachments) and add a factory class, IKTokenizerFactory, which makes the setup easier to extend and maintain.

    package org.wltea.analyzer.util;
    
    import java.util.Map;
    
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.util.TokenizerFactory;
    import org.apache.lucene.util.AttributeFactory;
    import org.wltea.analyzer.lucene.IKTokenizer;
    
    public class IKTokenizerFactory extends TokenizerFactory {
        private boolean useSmart;  
    
        public IKTokenizerFactory(Map<String, String> args) {  
            super(args);  
            useSmart = getBoolean(args, "useSmart", false);  
        }  
      
        @Override  
        public Tokenizer create(AttributeFactory attributeFactory) {  
            Tokenizer tokenizer = new IKTokenizer(attributeFactory,useSmart);  
            return tokenizer;  
        } 
    
    }

      Next, add the following configuration to schema.xml:

        <fieldType name="text_ik" class="solr.TextField">
          <!-- Tokenizer used at index time -->
          <analyzer type="index">
            <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="true"/>
          </analyzer>
          <!-- Tokenizer used at query time -->
          <analyzer type="query">
            <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="false"/>
          </analyzer>
        </fieldType>
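      One way to check that the text_ik field type works is to call Solr's field analysis handler, the same FieldAnalysisRequestHandler that appears in the stack trace above. The following is a rough sketch rather than anything from the original post: the base URL http://localhost:8080/solr, the core name collection1, and the class name FieldAnalysisProbe are assumptions to adapt to your deployment, and it assumes the handler is reachable at /analysis/field.

```java
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Scanner;

class FieldAnalysisProbe {

    /** Builds a field-analysis request URL for the given field type and sample text. */
    static String buildAnalysisUrl(String solrBase, String core, String fieldType, String text) {
        try {
            return solrBase + "/" + core + "/analysis/field"
                    + "?analysis.fieldtype=" + URLEncoder.encode(fieldType, "UTF-8")
                    + "&analysis.fieldvalue=" + URLEncoder.encode(text, "UTF-8")
                    + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Performs a GET request and returns the response body as a string. */
    static String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\A"); // read the whole stream as one token
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample Chinese text ("Chinese word segmentation") written as Unicode escapes.
        String sample = "\u4e2d\u6587\u5206\u8bcd";
        // Assumed base URL and core name; replace with your own.
        String url = buildAnalysisUrl("http://localhost:8080/solr", "collection1", "text_ik", sample);
        System.out.println(url);
        System.out.println(fetch(url));
    }
}
```

      If the jars are not loaded yet, or are built against the wrong Lucene version, this request should fail in much the same way as the Analysis page of the admin UI.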

      Finally, copy IK-Analyzer-5.3.1.jar and IK-Analyzer-extra-5.3.1.jar into the lib directory of the Solr web application.
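      If Solr then reports that it cannot load org.wltea.analyzer.util.IKTokenizerFactory, one quick check is whether the copied jar really contains that class. The following is a small sketch using only the JDK: the class name JarClassCheck is hypothetical, and the jar path is whatever location you copied the file to.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {

    /** Converts a fully qualified class name to its entry path inside a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** True if the jar at jarPath contains a .class entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassCheck <path-to-jar> [class-name]");
            return;
        }
        // Default to the factory class referenced from schema.xml.
        String className = args.length > 1 ? args[1] : "org.wltea.analyzer.util.IKTokenizerFactory";
        System.out.println(className + " present in " + args[0] + ": "
                + containsClass(args[0], className));
    }
}
```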

      Also note that the IK source code has moved to http://git.oschina.net/wltea/IK-Analyzer-2012FF/.

      Project files:

        http://pan.baidu.com/s/1skv1jCp

        http://pan.baidu.com/s/1c1o0gI8

      References:

      http://iamyida.iteye.com/blog/2220474

      http://iamyida.iteye.com/blog/2193513

  • Original article: https://www.cnblogs.com/rwxwsblog/p/5048935.html