  • [ Getting started with Solr ] Adding Chinese word segmentation (IKAnalyzer) to schema.xml

    An earlier post,

    http://www.cnblogs.com/huangfox/archive/2012/02/08/2342881.html

    described how to deploy Solr into Eclipse. This post builds on that setup and adds IKAnalyzer to it.

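    1. Download the IKAnalyzer source code and copy it into the solr3.5 project (the original post shows a screenshot of the project layout here).

    2. Configure IKAnalyzer in schema.xml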

    <!-- IKAnalyzer 3.2.8 Chinese word segmentation -->
    <fieldType name="text" class="solr.TextField">
        <!-- Index time: isMaxWordLength="false" selects the finest-grained segmentation -->
        <analyzer type="index">
            <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <!-- Query time: isMaxWordLength="true" selects maximum-word-length segmentation -->
        <analyzer type="query">
            <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
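
    The field type only takes effect once a field references it. A minimal sketch, assuming a hypothetical field named content (the name is not from the original post; any field name would do):

    ```xml
    <!-- Hypothetical field; "content" is an illustrative name. -->
    <!-- type="text" refers to the IKAnalyzer-backed fieldType defined above. -->
    <field name="content" type="text" indexed="true" stored="true"/>
    ```

    A declaration like this belongs in the <fields> section of schema.xml. Values indexed into it would go through the index analyzer, and queries against it through the query analyzer.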
    

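    3. Start Solr and verify

    On the analysis page of the Solr admin interface, set Field to type and enter text (the name of the field type defined above), enter some Chinese text in Field value, and click Analyze to see the segmentation result.

    The verbose output option shows detailed information about each token.

    For the full set of schema.xml configuration options, see the Solr wiki: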

    http://wiki.apache.org/solr/SchemaXml

    Data Types
    
    The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.
    
    Any subclass of FieldType may be used as a field type class, using either its full package name, or the "solr" alias if it is in the default Solr package. For common numeric types (integer, float, etc.) there are multiple implementations provided depending on your needs. Please see SolrPlugins for information on how to ensure that your own custom Field Types can be loaded into Solr.
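
    As an illustration of the two naming forms, the sketch below declares the same class twice, once through the "solr" alias and once by full package name. The type name string_full is made up for this example.

    ```xml
    <!-- Short form: the "solr" alias resolves to Solr's default packages. -->
    <fieldType name="string" class="solr.StrField"/>
    <!-- Long form: full package name, which is what a third-party class such as
         org.wltea.analyzer.solr.IKTokenizerFactory has to use. "string_full" is illustrative. -->
    <fieldType name="string_full" class="org.apache.solr.schema.StrField"/>
    ```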
    
    Common options that field types can have are...
    sortMissingLast=true|false
    sortMissingFirst=true|false
    indexed=true|false
    stored=true|false
    multiValued=true|false
    omitNorms=true|false
    omitTermFreqAndPositions=true|false  (Solr 1.4)
    omitPositions=true|false  (Solr 3.4)
    positionIncrementGap=N
    TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.
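
    A few of these options in use might look like the following sketch. All names and values here (text_general, tags, the gap of 100) are illustrative assumptions, not settings from the original post.

    ```xml
    <!-- Type-level defaults: documents without a value sort last; norms are omitted. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- positionIncrementGap puts a gap between the values of a multiValued field,
         so phrase queries do not match across value boundaries. -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
    <!-- A field inherits its type's defaults and can set options of its own. -->
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    ```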
    
    Field types that store text (TextField, StrField) support compression of stored contents:
    
    compressed=true|false
    compressThreshold=<integer>
    compressThreshold is the minimum length required for text compression to be invoked. This applies only if compressed=true; a common pattern is to set compressThreshold on the field type definition, and turn compression on and off in the individual field definitions.
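
    The pattern described above could be sketched as follows. The type and field names and the threshold value are assumptions made for illustration, and stored-field compression has not been supported in every Solr release, so check your version before relying on these attributes.

    ```xml
    <!-- Threshold set once on the type (illustrative value). -->
    <fieldType name="text_stored" class="solr.TextField" compressThreshold="1024"/>
    <!-- Compression switched on or off per field. -->
    <field name="body" type="text_stored" indexed="true" stored="true" compressed="true"/>
    <field name="title" type="text_stored" indexed="true" stored="true" compressed="false"/>
    ```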
    

      

  • Original article: https://www.cnblogs.com/huangfox/p/2342915.html