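Prerequisite: the IK analysis plugin must already be installed (please refer to its installation instructions).
1. Add a dictionary file (here mydic.dic) under the IK plugin's config directory: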
~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic
mydic.dic
Its content (one word per line):
我给祖国献石油
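The first step can be scripted roughly as below. This is a minimal sketch that assumes the tarball layout shown above. IK_DIR is a hypothetical variable, so override it for other installs. Note that `>` overwrites an existing mydic.dic. Use `>>` to append further words instead.

```shell
#!/bin/sh
# IK_DIR (hypothetical variable): the IK plugin's config directory.
# The default is the path used in this post; override it for other installs.
IK_DIR="${IK_DIR:-$HOME/software/apache/elasticsearch-6.2.4/config/analysis-ik}"
mkdir -p "$IK_DIR"

# One word per line. Save this script as UTF-8 so the Chinese text
# is written to the dictionary as UTF-8 bytes.
printf '%s\n' '我给祖国献石油' > "$IK_DIR/mydic.dic"

# Same check as above: the file should now be listed.
ls "$IK_DIR" | grep mydic.dic
```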
2. Configure the dictionary path: edit the IKAnalyzer.cfg.xml configuration file and add the new dictionary to its ext_dict entry.
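Step 2 can be done with the sketch below. It assumes the same hypothetical IK_DIR as above. It writes a minimal IKAnalyzer.cfg.xml whose ext_dict entry points at mydic.dic. The script overwrites the existing file and keeps a .bak copy first, so if your file already lists other dictionaries, merge them by hand. As far as I know, IK resolves ext_dict paths relative to this config directory and separates several files with ";". Treat that as an assumption to check against your IK version.

```shell
#!/bin/sh
# IK_DIR (hypothetical variable): the IK plugin's config directory.
IK_DIR="${IK_DIR:-$HOME/software/apache/elasticsearch-6.2.4/config/analysis-ik}"
CFG="$IK_DIR/IKAnalyzer.cfg.xml"
mkdir -p "$IK_DIR"

# Keep a copy of the current configuration before overwriting it.
if [ -f "$CFG" ]; then
  cp "$CFG" "$CFG.bak"
fi

# Minimal config: register mydic.dic as an extension dictionary.
cat > "$CFG" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionaries; several files are separated by ';' -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- extension stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>
EOF
```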
3. Restart Elasticsearch so that the IK plugin loads the new dictionary.
4. Test
Request body (data.json):
{ "analyzer":"ik_max_word", "text": "我给祖国献石油" }
IK tokenization result before adding the dictionary:
curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d @data.json
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "给", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "祖国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
    { "token" : "献", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 3 },
    { "token" : "石油", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 4 }
  ]
}
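IK tokenization result after adding the dictionary. The tokens now include "我给祖国献石油", and the single-character tokens "我" and "给" no longer appear: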
curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d @data.json
{
  "tokens" : [
    { "token" : "我给祖国献石油", "start_offset" : 0, "end_offset" : 7, "type" : "CN_WORD", "position" : 0 },
    { "token" : "祖国", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 1 },
    { "token" : "献", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 2 },
    { "token" : "石油", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 3 }
  ]
}
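To check the before/after difference without reading the JSON by eye, you could save the _analyze response to a file and test for the custom word as a whole token. The has_token helper below is a hypothetical plain-grep sketch, so jq is not assumed. It matches both the pretty-printed and the compact JSON forms. Because the word is placed inside a regular expression, it is only meant for words without regex metacharacters.

```shell
#!/bin/sh
# has_token FILE WORD (hypothetical helper): exit status 0 if the _analyze
# response saved in FILE contains WORD as a whole token, non-zero otherwise.
# Tolerates both '"token" : "x"' (pretty=true) and '"token":"x"' (compact).
has_token() {
  grep -Eq "\"token\"[[:space:]]*:[[:space:]]*\"$2\"" "$1"
}

# Example (requires a running node with the IK plugin):
#   curl -s -H 'Content-Type: application/json' \
#     'http://localhost:9200/_analyze?pretty=true' -d @data.json > result.json
#   has_token result.json '我给祖国献石油' && echo "custom word recognised"
```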