
ElasticSearch Study Notes: Adding a Custom Dictionary to the IK Analyzer

Prerequisite: the IK analyzer plugin is already installed.

1. Add a dictionary file under the IK analyzer's config directory

~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic
mydic.dic

Its content is:

我给祖国献石油
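
As a minimal sketch of step 1, the dictionary file could be created from the shell as follows. IK dictionaries hold one entry per line and should be UTF-8 encoded. The `IK_CONFIG_DIR` variable is a name introduced here for illustration only; it falls back to a scratch directory so the snippet can be tried without touching a real install.

```shell
# IK_CONFIG_DIR is a hypothetical variable used only in this sketch.
# Point it at your real directory, e.g.
#   ~/software/apache/elasticsearch-6.2.4/config/analysis-ik
# If unset, a temporary directory is used instead.
IK_CONFIG_DIR="${IK_CONFIG_DIR:-$(mktemp -d)}"

# Write one dictionary entry per line (UTF-8 text).
printf '%s\n' '我给祖国献石油' > "$IK_CONFIG_DIR/mydic.dic"

# Show what was written.
cat "$IK_CONFIG_DIR/mydic.dic"
```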

2. Configure the dictionary path: edit the IKAnalyzer.cfg.xml configuration file and register the new dictionary
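
The post does not show the edited file. As a rough sketch, assuming the stock IKAnalyzer.cfg.xml layout that ships with the plugin, and assuming mydic.dic sits in the same directory as the config file (as in step 1), the `ext_dict` entry might look like this. If several dictionaries are listed, they are separated with semicolons.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Extension dictionaries. Assumption: the path is resolved relative
         to the directory that contains this config file. -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- Extension stop-word dictionaries (left empty in this sketch). -->
    <entry key="ext_stopwords"></entry>
</properties>
```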

3. Restart Elasticsearch

4. Test

data.json

{
        "analyzer":"ik_max_word",
        "text": "我给祖国献石油"
}

IK tokenization result before adding the dictionary

curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "给",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

IK tokenization result after adding the dictionary; the tokens now include "我给祖国献石油"

curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json
{
  "tokens" : [
    {
      "token" : "我给祖国献石油",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
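
To compare the two responses at a glance, it can help to print only the token strings. The helper below is a small sketch that uses only grep and sed; `extract_tokens` is a name made up for this example, and it assumes token values contain no escaped double quotes.

```shell
# Print one token per line from an _analyze JSON response read on stdin.
# Works on both pretty-printed and compact responses; the "tokens" array
# key itself is not matched because the pattern requires the exact key "token".
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}

# Usage (assumes Elasticsearch is reachable on localhost:9200, as in the post):
#   curl -s -H 'Content-Type: application/json' \
#        'http://localhost:9200/_analyze' -d@data.json | extract_tokens
```

Running it once before and once after the dictionary change and diffing the two outputs makes the newly added token easy to spot.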

Original post: https://www.cnblogs.com/tonglin0325/p/14246882.html