zoukankan      html  css  js  c++  java
  • Solr入门之(8)中文分词器配置

    Solr中虽然提供了一个中文分词器,但是效果很差,可以使用IKAnalyzerMmseg4j 或其他中文分词器。

    一、IKAnalyzer分词器配置

      1、下载IKAnalyzerIKAnalyzer2012_u6)包,当前使用版本IKAnalyzer2012_u6.jar

      2、将IKAnalyzer2012_u6包下的IKAnalyzer.cfg.xmlstopword.dic复制到solr应用/WEB-INF/classes下。

      3、在${solr_home}/[core路径下]/conf/schema.xml中增加一个自定义fieldType

    <!-- 中文IK分词 -->
        <fieldType name="text_ik_analyzer" positionIncrementGap="100" class="solr.TextField">
            <analyzer type="index">
                <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" enablePositionIncrements="true" words="stopwords.txt" ignoreCase="true"/>
                <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" catenateAll="0" catenateNumbers="1" catenateWords="1" generateNumberParts="1" generateWordParts="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true" synonyms="synonyms.txt"/>
                <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
                <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" catenateAll="0" catenateNumbers="0" catenateWords="0" generateNumberParts="1" generateWordParts="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>

     

      4、在schema.xml中增加一个字段:

    <field name="test_ik_field" type="text_ik_analyzer" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

      5、启动solr应用,即可在客户端界面查看分词效果。

        


    二、Mmseg4j分词器: 

     配置方式与上面类似,暂时未定义。

     

     

     

  • 相关阅读:
    Different AG groups have the exactly same group_id value if the group names are same and the ‘CLUSTER_TYPE = EXTERNAL/NONE’
    An example of polybase for Oracle
    use azure data studio to create external table for oracle
    Missing MSI and MSP files
    You may fail to backup log or restore log after TDE certification/key rotation.
    Password is required when adding a database to AG group if the database has a master key
    Use KTPASS instead of adden to configure mssql.keytab
    ardunio+舵机
    android webview 全屏100%显示图片
    glide 长方形图片显示圆角问题
  • 原文地址:https://www.cnblogs.com/tq03/p/3607964.html
Copyright © 2011-2022 走看看