zoukankan      html  css  js  c++  java
  • Build IKAnalyzer With Solr 5.1.0

    中文分詞裡IKAnalyzer和結巴是大家比較常用的分詞器, 不過IKAnalyzer已經很久沒有更新了, IKAnalyzer中文分词器V2012使用手册也跟IK Analyer 2012-FF Hotfix 1對不起來。我自己觀察的結果是

    1. IKAnalyzer中文分词器V2012使用手册是IK Analyer 2012 upgrade 6的使用手册, 不是IK Analyer 2012-FF Hotfix 1的使用手册
    2. IK Analyer 2012 upgrade 6支援Lucene 3.X API, 不支援Lucene 4.X API
    3. IK Analyer 2012-FF Hotfix 1支援Lucene 4.X API, 不支援Lucene 5.X API
    4. 如果你硬要在Solr 5.1.0上使用IK Analyer 2012-FF Hotfix 1, 會產生下列錯誤訊息

    java.lang.AbstractMethodError
    at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
    at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:179)
    at org.apache.lucene.document.Field.tokenStream(Field.java:556)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:606)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:947)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1102)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:103)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:364)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
    

    另外在IK Analyer 2012-FF Hotfix 1, IKTokenizerFactory被拿掉了, 如果你要在schema.xml裡使用filter element, 就會用到IKTokenizerFactory, 語法類似

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" />
        <filter class="solr.LowerCaseFilterFactory" min="0" max="32764" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" />
        <filter class="solr.LengthFilterFactory" min="2" max="7" />
      </analyzer>
    </fieldType>
    

    當然如果你沒有要使用任何的filters, 是可以不需要IKTokenizerFactory, 語法可以簡化成

    1
    2
    3
    4
    
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
      <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>
    

    在程式碼的差異上, Tokenizer的constructor不需要再提供input reader當參數, Analyzer class的createComponents method也不需要再提供input reader當參數,TokenizerFactory也是不再使用input reader當參數。

    詳細source code請到https://github.com/EugenePig/ik-analyzer-solr5下載

  • 相关阅读:
    我回来了
    wget 官方jdk
    linux rpm命令安装卸载 初步使用
    关于一些对location认识的误区(转)
    直接插入排序
    冒泡排序
    Wireshark下TCP三次握手四次挥手
    linux内存使用率详解
    Linux下硬盘使用率详解及shell脚本实现
    Linux下CPU使用率详解
  • 原文地址:https://www.cnblogs.com/seaspring/p/5587013.html
Copyright © 2011-2022 走看看