  • ES 5.0 pinyin analyzer on Mac

    Installation works the same way as for the IK Chinese analyzer plugin.

    First, download the source:

    https://github.com/medcl/elasticsearch-analysis-pinyin

    Then run:

    mvn package

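    After a successful build, a target folder is created. Under target/releases you will find the plugin zip; this is the installation file we need. (These steps were written for the IK analyzer, where the file is elasticsearch-analysis-ik-master/target/releases/elasticsearch-analysis-ik-5.1.1.zip; the pinyin build produces the corresponding elasticsearch-analysis-pinyin zip.) Unzip it.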

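    If mvn gives you trouble, you can import the project into Eclipse and run Maven clean and Maven install there. Then find the same releases folder, copy the zip out and unzip it; that works too.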

    Put the unzipped files into a subdirectory of plugins/ in the Elasticsearch installation directory.

    Restart Elasticsearch.
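    The build and install steps above can be collected into one shell function. This is a minimal sketch, not the plugin's official installer. It assumes bash with git, mvn and unzip on the PATH, an Elasticsearch 5.x installation directory as the first argument, and a plugin branch or tag (second argument) whose version matches your Elasticsearch version, since Elasticsearch refuses to load a plugin built for a different version. The function name install_pinyin_plugin and the target directory plugins/pinyin are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build elasticsearch-analysis-pinyin from source and
# unpack the release zip into <es_home>/plugins/pinyin.
# Assumes bash, git, mvn and unzip are installed.
#
# Usage: install_pinyin_plugin <es_home> <git_ref>
#   es_home - Elasticsearch 5.x installation directory
#   git_ref - plugin branch or tag matching your ES version (look it up in
#             the repository; this function does not guess it)
install_pinyin_plugin() {
  local es_home="$1" git_ref="$2"
  if [ -z "$es_home" ] || [ -z "$git_ref" ]; then
    echo "usage: install_pinyin_plugin <es_home> <git_ref>" >&2
    return 2
  fi

  local workdir
  workdir="$(mktemp -d)" || return 1

  # 1. Download the source at the requested branch or tag.
  git clone --depth 1 --branch "$git_ref" \
    https://github.com/medcl/elasticsearch-analysis-pinyin.git "$workdir/src" || return 1

  # 2. Build; the release zip is written to target/releases/.
  (cd "$workdir/src" && mvn package) || return 1

  # 3. Pick up the zip (the glob stays literal if nothing was built).
  local zips=("$workdir"/src/target/releases/*.zip)
  if [ ! -e "${zips[0]}" ]; then
    echo "no release zip found under $workdir/src/target/releases" >&2
    return 1
  fi

  # 4. Unzip into its own directory under plugins/. Assumption: the zip holds
  #    the plugin files at its top level; if they land in a nested folder,
  #    move them up one level before restarting.
  mkdir -p "$es_home/plugins/pinyin" || return 1
  unzip -o -q "${zips[0]}" -d "$es_home/plugins/pinyin" || return 1

  echo "Unpacked $(basename "${zips[0]}") into $es_home/plugins/pinyin; restart Elasticsearch."
}
```

    For example: install_pinyin_plugin "$HOME/elasticsearch-5.0.0" <ref>, where both the path and <ref> are placeholders for your own setup.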

    Test:
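    One way to check that the plugin loaded after the restart is to send some text through the pinyin analyzer that the plugin registers, using the _analyze API. A minimal sketch, assuming Elasticsearch 5.x on localhost:9200 (override with the hypothetical ES_URL variable); the sample text 刘德华 comes from the tests below.

```shell
#!/usr/bin/env bash
# Sketch: confirm the pinyin plugin is loaded by running its built-in
# "pinyin" analyzer through the _analyze API.
# Assumes Elasticsearch 5.x; ES_URL is a hypothetical override variable.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body: analyzer name plus the text to analyze.
ANALYZE_BODY='{"analyzer": "pinyin", "text": "刘德华"}'

# Print the tokens; the fallback message keeps the script from failing hard
# when no node is reachable.
curl -s -XGET "$ES_URL/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d "$ANALYZE_BODY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

    If the plugin is installed, the response lists pinyin tokens; if it is not, Elasticsearch reports that it cannot find the analyzer.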

     

    Combined Chinese and pinyin test:

    IK + pinyin analyzer configuration

    5.1 Create the index and configure the analyzer

    Create an index and set its analysis settings:

    curl -XPUT "http://localhost:9200/medcl/" -d'
    {
      "index": {
        "analysis": {
          "analyzer": {
            "ik_pinyin_analyzer": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "filter": ["my_pinyin", "word_delimiter"]
            }
          },
          "filter": {
            "my_pinyin": {
              "type": "pinyin",
              "first_letter": "prefix",
              "padding_char": " "
            }
          }
        }
      }
    }'
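    The searches below query a name.pinyin field, but the post never shows a mapping that defines it. The following is a sketch of that missing step, reconstructed as an assumption rather than taken from the post: map folks.name as a keyword field with a pinyin sub-field analyzed by the ik_pinyin_analyzer defined above. Run it after creating the index and before indexing documents, assuming Elasticsearch 5.x on localhost:9200. The term_vector setting is optional; it stores positions and offsets that the highlighter can use.

```shell
#!/usr/bin/env bash
# Sketch (assumed step, not shown in the original post): give folks.name a
# "pinyin" sub-field analyzed by the custom ik_pinyin_analyzer.
# Assumes Elasticsearch 5.x and that the medcl index above already exists;
# ES_URL is a hypothetical override variable.
ES_URL="${ES_URL:-http://localhost:9200}"

MAPPING_BODY='{
  "properties": {
    "name": {
      "type": "keyword",
      "fields": {
        "pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}'

# PUT /<index>/_mapping/<type> adds the mapping for type "folks".
curl -s -XPUT "$ES_URL/medcl/_mapping/folks?pretty" \
  -H 'Content-Type: application/json' \
  -d "$MAPPING_BODY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```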
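    5.2 Index the test documents

    Document 1:

    curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'

    Document 2:

    curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'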

    5.3 Test (1): pinyin analysis

    Each of the following four commands matches "刘德华":

    1. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"

    2. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"

    3. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"

    4. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
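    To see why all four queries hit, you can run 刘德华 through the index's own ik_pinyin_analyzer. Judging by the matches above, its output should include the per-character syllables liu, de and hua and the initials ldh, but the exact token list depends on the plugin version and its defaults. A minimal sketch, assuming the medcl index from 5.1 exists on localhost:9200.

```shell
#!/usr/bin/env bash
# Sketch: list the tokens the index-level ik_pinyin_analyzer produces for
# "刘德华", to compare with the q=name.pinyin:... searches above.
# Assumes Elasticsearch 5.x with the medcl index; ES_URL is a hypothetical
# override variable.
ES_URL="${ES_URL:-http://localhost:9200}"

ANALYZE_BODY='{"analyzer": "ik_pinyin_analyzer", "text": "刘德华"}'

# Index-scoped _analyze, so the custom analyzer defined on medcl is visible.
curl -s -XGET "$ES_URL/medcl/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d "$ANALYZE_BODY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```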

    5.4 Test (2): IK analysis

    curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
    {
      "query": {
        "match": {
          "name.pinyin": "国歌"
        }
      },
      "highlight": {
        "fields": {
          "name.pinyin": {}
        }
      }
    }'
    Result:
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 16.698704,
        "hits" : [
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "tina",
            "_score" : 16.698704,
            "_source" : {
              "name" : "中华人民共和国国歌"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>中华人民共和国</em><em>国歌</em>"
              ]
            }
          }
        ]
      }
    }

    5.5 Test (3): pinyin + IK analysis

    curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
    {
      "query": {
        "match": {
          "name.pinyin": "zhonghua"
        }
      },
      "highlight": {
        "fields": {
          "name.pinyin": {}
        }
      }
    }'
    Result:
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 5.9814634,
        "hits" : [
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "tina",
            "_score" : 5.9814634,
            "_source" : {
              "name" : "中华人民共和国国歌"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>中华人民共和国</em>国歌"
              ]
            }
          },
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "andy",
            "_score" : 2.2534127,
            "_source" : {
              "name" : "刘德华"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>刘德华</em>"
              ]
            }
          }
        ]
      }
    }
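    The second hit, 刘德华, contains no 中华 at all and scores well below the first. To see which indexed token the query zhonghua matched in that document and how its score was computed, the same search can be rerun with explain enabled, which attaches an _explanation to every hit. A minimal sketch, assuming the medcl index above on localhost:9200.

```shell
#!/usr/bin/env bash
# Sketch: repeat the "zhonghua" search with "explain": true so each hit
# carries an _explanation of which terms matched and how the score was built.
# Assumes Elasticsearch 5.x with the medcl index; ES_URL is a hypothetical
# override variable.
ES_URL="${ES_URL:-http://localhost:9200}"

SEARCH_BODY='{
  "explain": true,
  "query": {
    "match": {
      "name.pinyin": "zhonghua"
    }
  }
}'

curl -s -XPOST "$ES_URL/medcl/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$SEARCH_BODY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```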

    References:

    https://github.com/medcl/elasticsearch-analysis-pinyin

    http://blog.csdn.net/napoay/article/details/53907921

  • Original post: https://www.cnblogs.com/wangchuanfu/p/7239269.html