  • ElasticSearch with the IK Chinese Analyzer --- Installation Notes

    Note 1: Make sure the existing ES installation contains no indices beforehand; otherwise the ES cluster will not start and will report red status!

    Note 2: If the IK version you download is too new, you will hit an error about tokenStream being overridden: Caused by: java.lang.VerifyError: class org.wltea.analyzer.lucene.IKAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;. In that case, switch to an older version.

    1. Download the IK dictionary and configuration files

    http://download.csdn.net/detail/xxx0624/8464751

    Then unzip the file (you get an ik folder) and place it under ES's config folder, as sketched below.
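    A minimal shell sketch of this step, assuming ES lives under ~/elasticsearch and the archive was saved as ik.zip (both paths are assumptions):

    <pre>
    # unzip the downloaded archive; it should produce an ik folder
    unzip ik.zip
    # move the ik folder into ES's config directory (ES path is an assumption)
    mv ik ~/elasticsearch/config/ik
    </pre>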

    2. Download ik.jar

    http://download.csdn.net/detail/xxx0624/8464743

    After downloading, go into the plugins folder under the ES root (create it if it does not exist):

    Create a folder named analysis-ik inside it, then put the downloaded jar into that folder.

    The jar linked above is the latest build and may not work with your ES version; in that case, download an older release of the source from GitHub and build it with mvn clean package.
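    A shell sketch of this step; the ES root path and jar filename are assumptions, and the build commands only apply if you compile an older release from source:

    <pre>
    # create the plugin folder and drop the jar in (path and jar name assumed)
    mkdir -p ~/elasticsearch/plugins/analysis-ik
    cp elasticsearch-analysis-ik.jar ~/elasticsearch/plugins/analysis-ik/

    # if building an older release from source instead:
    # git clone https://github.com/medcl/elasticsearch-analysis-ik.git
    # cd elasticsearch-analysis-ik
    # git checkout <an older tag compatible with your ES version>
    # mvn clean package
    # then copy the jar from target/ into plugins/analysis-ik/
    </pre>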

    3. Edit elasticsearch.yml (in the config folder)

    Add (note the YAML indentation):

    index:
      analysis:
        analyzer:
          ik:
              alias: [ik_analyzer]
              type: org.elasticsearch.index.analysis.IkAnalyzerProvider
          ik_max_word:
              type: ik
              use_smart: false
          ik_smart:
              type: ik
              use_smart: true
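    Once ES restarts cleanly with IK installed, you can sanity-check the analyzer via the _analyze API. A rough sketch, assuming ES listens on localhost:9200 and using a throwaway index named test (both assumptions):

    <pre>
    # create a throwaway index, then analyze a Chinese sentence with the ik analyzer
    curl -XPUT 'http://localhost:9200/test'
    curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ik&pretty' -d '中华人民共和国'
    # the response should contain multi-character tokens rather than single characters
    </pre>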

    The official README is attached below:

    IK Analysis for ElasticSearch
    ==================================
    
    The IK Analysis plugin integrates the Lucene IK analyzer into elasticsearch and supports customized dictionaries.
    
    
    Version
    -------------
    IK version | ES version
    -----------|-----------------
    master     | 0.90.0 -> master
    1.1.4      | 0.90.0
    1.1.3      | 0.20.2
    1.1.2      | 0.19.x
    1.0.0      | 0.16.2 -> 0.19.0
    
    
    Install
    -------------
    you can download this plugin from RTF project(https://github.com/medcl/elasticsearch-rtf)
    https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/analysis-ik
    https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/config/ik
    
    <del>also remember to download the dict files,unzip these dict file into your elasticsearch's config folder,such as: your-es-root/config/ik</del>
    
    you need a service restart after that!
    
    Dict Configuration (es-root/config/ik/IKAnalyzer.cfg.xml)
    -------------
    
    https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/ik/IKAnalyzer.cfg.xml
    
    <pre><?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
    <properties>  
        <comment>IK Analyzer extended configuration</comment>
        <!-- users can configure their own extension dictionaries here -->
        <entry key="ext_dict">custom/mydict.dic;custom/sougou.dict</entry>
        <!-- users can configure their own extended stopword dictionaries here -->
        <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    </properties>
    
    </pre>
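    The dictionary files referenced by ext_dict are plain UTF-8 text with one term per line, resolved relative to the ik config folder. A sketch of adding a custom dictionary (the custom/mydict.dic path comes from the config above; the ES path and example terms are placeholders):

    <pre>
    # one term per line, saved as UTF-8 under config/ik/ (ES path assumed)
    mkdir -p ~/elasticsearch/config/ik/custom
    cat > ~/elasticsearch/config/ik/custom/mydict.dic <<'EOF'
    自定义词
    新词条
    EOF
    # restart ES so the new terms are loaded
    </pre>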
    
    Analysis Configuration (elasticsearch.yml)
    -------------
    
    <pre>
    index:
      analysis:                   
        analyzer:      
          ik:
              alias: [ik_analyzer]
              type: org.elasticsearch.index.analysis.IkAnalyzerProvider
          ik_max_word:
              type: ik
              use_smart: false
          ik_smart:
              type: ik
              use_smart: true
    </pre>
    Or
    <pre>
    index.analysis.analyzer.ik.type : "ik"
    </pre>
    
    You can set your preferred segmentation mode; the default `use_smart` is false.
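    The difference between the two modes is easiest to see by analyzing the same text with each; a sketch, reusing the assumed test index and host from earlier:

    <pre>
    # ik_max_word: fine-grained, exhaustive segmentation (more, possibly overlapping tokens)
    curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ik_max_word&pretty' -d '中华人民共和国'

    # ik_smart: coarse-grained segmentation (fewer tokens)
    curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ik_smart&pretty' -d '中华人民共和国'
    </pre>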
    
    Mapping Configuration
    -------------
    
    Here is a quick example:
    1.create an index
    
    <pre>
    
    curl -XPUT http://localhost:9200/index
    
    </pre>
    
    2.create a mapping
    
    <pre>
    
    curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
    {
        "fulltext": {
                 "_all": {
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "term_vector": "no",
                "store": "false"
            },
            "properties": {
                "content": {
                    "type": "string",
                    "store": "no",
                    "term_vector": "with_positions_offsets",
                    "indexAnalyzer": "ik",
                    "searchAnalyzer": "ik",
                    "include_in_all": "true",
                    "boost": 8
                }
            }
        }
    }'
    </pre>
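    To confirm the mapping was applied, you can read it back (hostname and index name follow the example above; `pretty` just formats the JSON):

    <pre>
    curl -XGET 'http://localhost:9200/index/fulltext/_mapping?pretty'
    </pre>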
    
    3.index some docs
    
    <pre>
    
    curl -XPOST http://localhost:9200/index/fulltext/1 -d'
    {content:"美国留给伊拉克的是个烂摊子吗"}
    '
    
    curl -XPOST http://localhost:9200/index/fulltext/2 -d'
    {content:"公安部:各地校车将享最高路权"}
    '
    
    curl -XPOST http://localhost:9200/index/fulltext/3 -d'
    {content:"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
    '
    
    curl -XPOST http://localhost:9200/index/fulltext/4 -d'
    {content:"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
    '
    </pre>
    
    4.query with highlighting
    
    <pre>
    
    curl -XPOST http://localhost:9200/index/fulltext/_search  -d'
    {
        "query" : { "term" : { "content" : "中国" }},
        "highlight" : {
            "pre_tags" : ["<tag1>", "<tag2>"],
            "post_tags" : ["</tag1>", "</tag2>"],
            "fields" : {
                "content" : {}
            }
        }
    }
    '
    </pre>
    
    here is the query result
    
    <pre>
    
    {
        "took": 14,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 2,
            "max_score": 2,
            "hits": [
                {
                    "_index": "index",
                    "_type": "fulltext",
                    "_id": "4",
                    "_score": 2,
                    "_source": {
                        "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                    },
                    "highlight": {
                        "content": [
                            "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                        ]
                    }
                },
                {
                    "_index": "index",
                    "_type": "fulltext",
                    "_id": "3",
                    "_score": 2,
                    "_source": {
                        "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                    },
                    "highlight": {
                        "content": [
                            "均每天扣1艘<tag1>中国</tag1>渔船 "
                        ]
                    }
                }
            ]
        }
    }
    
    </pre>
    
    
    have fun.