  • Elasticsearch Study Notes: Adding a Custom Dictionary to the IK Analyzer

    Prerequisite: the IK analyzer plugin must already be installed. For setup, see

    Elasticsearch Study Notes: Tokenization

    1. Add a dictionary file under the IK analyzer's config directory

    ~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic
    mydic.dic
    

    The file contains a single entry:

    我给祖国献石油
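
A minimal sketch of creating that file from the shell. It assumes IK dictionaries are plain UTF-8 text with one entry per line; `DICT_DIR` is a hypothetical variable that defaults to a scratch directory here and should point at your real `analysis-ik` config directory.

```shell
# Hypothetical variable: set DICT_DIR to your real analysis-ik config directory.
# It falls back to a scratch directory so the snippet is safe to try out.
DICT_DIR="${DICT_DIR:-$(mktemp -d)}"

# Write one dictionary entry per line, UTF-8 encoded.
printf '%s\n' '我给祖国献石油' > "$DICT_DIR/mydic.dic"

# Show the file so the entry can be checked by eye.
cat "$DICT_DIR/mydic.dic"
```

More entries can be appended with `>>`, one per line.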
    

    2. Configure the dictionary path: edit the IKAnalyzer.cfg.xml configuration file and register the new dictionary
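
The edited file is not reproduced in this note, so the following is only a sketch of what the entry might look like. It assumes the stock `IKAnalyzer.cfg.xml` layout shipped with the plugin, where extension dictionaries are listed under the `ext_dict` key as paths relative to the directory holding the config file (multiple files separated by semicolons).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Custom extension dictionaries; mydic.dic sits next to this file -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- Custom stopword dictionaries (left empty in this sketch) -->
    <entry key="ext_stopwords"></entry>
</properties>
```

If the dictionary lives in a subdirectory, the value would presumably carry that relative path instead.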

    3. Restart Elasticsearch so the new dictionary is loaded

    4. Test

    data.json

    {
            "analyzer":"ik_max_word",
            "text": "我给祖国献石油"
    }
    

    IK tokenization result before adding the dictionary

    curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json
    {
      "tokens" : [
        {
          "token" : "我",
          "start_offset" : 0,
          "end_offset" : 1,
          "type" : "CN_CHAR",
          "position" : 0
        },
        {
          "token" : "给",
          "start_offset" : 1,
          "end_offset" : 2,
          "type" : "CN_CHAR",
          "position" : 1
        },
        {
          "token" : "祖国",
          "start_offset" : 2,
          "end_offset" : 4,
          "type" : "CN_WORD",
          "position" : 2
        },
        {
          "token" : "献",
          "start_offset" : 4,
          "end_offset" : 5,
          "type" : "CN_CHAR",
          "position" : 3
        },
        {
          "token" : "石油",
          "start_offset" : 5,
          "end_offset" : 7,
          "type" : "CN_WORD",
          "position" : 4
        }
      ]
    }
    

    IK tokenization result after adding the dictionary; the tokens now include "我给祖国献石油"

    curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json
    {
      "tokens" : [
        {
          "token" : "我给祖国献石油",
          "start_offset" : 0,
          "end_offset" : 7,
          "type" : "CN_WORD",
          "position" : 0
        },
        {
          "token" : "祖国",
          "start_offset" : 2,
          "end_offset" : 4,
          "type" : "CN_WORD",
          "position" : 1
        },
        {
          "token" : "献",
          "start_offset" : 4,
          "end_offset" : 5,
          "type" : "CN_CHAR",
          "position" : 2
        },
        {
          "token" : "石油",
          "start_offset" : 5,
          "end_offset" : 7,
          "type" : "CN_WORD",
          "position" : 3
        }
      ]
    }
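
Comparing two responses by eye gets tedious, so here is a small sketch that checks a saved `_analyze` response for a given token. `has_token` is a hypothetical helper, not part of Elasticsearch or IK, and the sample file stands in for real output saved from the curl command above. The token is interpolated into a regular expression, so this only suits plain text without regex metacharacters.

```shell
# has_token FILE TOKEN: exit 0 if the saved _analyze response in FILE
# contains a token whose text is exactly TOKEN (hypothetical helper).
has_token() {
  grep -Eq "\"token\"[[:space:]]*:[[:space:]]*\"$2\"" "$1"
}

# Stand-in for a saved response. In practice, produce it with:
#   curl -s -H 'Content-Type: application/json' \
#     'http://localhost:9200/_analyze?pretty=true' -d @data.json > after.json
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{
  "tokens" : [
    { "token" : "我给祖国献石油", "start_offset" : 0, "end_offset" : 7, "type" : "CN_WORD", "position" : 0 },
    { "token" : "祖国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 1 }
  ]
}
EOF

if has_token "$sample" '我给祖国献石油'; then
  echo "custom dictionary entry is active"
else
  echo "custom dictionary entry not found"
fi
```

If the entry is reported as not found after a restart, the dictionary path in IKAnalyzer.cfg.xml and the file's encoding are the first things to check.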
    

      

  • Original article: https://www.cnblogs.com/tonglin0325/p/14246882.html