Inverted Index Fundamentals
For the underlying tokenization principles, see https://www.cnblogs.com/LQBlog/articles/5743991.html
An analyzer processes text in three steps (a minimal sketch combining all three follows this list):
1. Character filters: clean up the raw text, e.g., stripping special characters
2. Tokenizer: split the text into individual terms
3. Token filters: post-process the terms, e.g., case conversion or replacing terms with synonyms
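To make the three stages concrete, here is a minimal sketch of a custom analyzer that wires one component into each stage. The index name my_index and analyzer name my_custom are placeholders, and the components chosen (html_strip, standard, lowercase) are just illustrative built-ins:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom": {
          "type": "custom",
          "char_filter": ["html_strip"],   # step 1: character filter
          "tokenizer": "standard",         # step 2: tokenizer
          "filter": ["lowercase"]          # step 3: token filters
        }
      }
    }
  }
}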
Results from several of ES's built-in analyzers
Example sentence: Set the shape to semi-transparent by calling set_trans(5)
Pattern analyzer (splits on a specified delimiter)
{ "settings": { "analysis": { "analyzer": { "comma": { "type": "pattern", "pattern":" " #分词符号 } } } } }
分词后结果
Set,the,shape,to,semi-transparent,by,calling,set_trans(5)
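A custom analyzer like comma lives in a specific index's settings, so a quick way to check it is to run the _analyze API (covered below) against that index. my_index here is a placeholder for whichever index defines the analyzer:

GET /my_index/_analyze
{
  "analyzer": "comma",
  "text": "Set the shape to semi-transparent by calling set_trans(5)"
}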
Standard analyzer
Suited to English text; this is ES's default analyzer.
Splits on word boundaries, strips punctuation, then lowercases each term.
Tokenization result:
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple analyzer
Splits wherever a non-letter character appears, and lowercases the result.
Tokenization result:
set, the, shape, to, semi, transparent, by, calling, set, trans
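The contrast with the standard analyzer is easiest to see on set_trans(5): simple splits at the underscore (a non-letter) and drops the digit entirely. A minimal sketch to verify this against the built-in analyzer:

GET /_analyze
{
  "analyzer": "simple",
  "text": "set_trans(5)"
}
# expected tokens: set, trans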
Language analyzers
Analyzers for specific languages; each ships with its own dictionary (stopword lists, stemming rules, and so on).
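For example, the built-in english analyzer removes English stopwords and stems each remaining term; a minimal sketch using the _analyze API described in the next section:

GET /_analyze
{
  "analyzer": "english",
  "text": "Set the shape to semi-transparent by calling set_trans(5)"
}
# stopwords such as "the", "to", and "by" are dropped, and "calling" is stemmed to "call"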
Testing an analyzer
GET request: http://127.0.0.1:9200/_analyze
body:
{ "analyzer":"standard",//分词器 "text":"Set the shape to semi-transparent by calling set_trans(5)"//测试分词的fulltext }
结果:
{ "tokens": [ { "token": "set",//被索引的词 "start_offset": 0,//原文本起始位置 "end_offset": 3,//原文本结束位置 "type": "<ALPHANUM>", "position": 0//第几个出现 }, { "token": "the", "start_offset": 4, "end_offset": 7, "type": "<ALPHANUM>", "position": 1 }, { "token": "shape", "start_offset": 8, "end_offset": 13, "type": "<ALPHANUM>", "position": 2 }, { "token": "to", "start_offset": 14, "end_offset": 16, "type": "<ALPHANUM>", "position": 3 }, { "token": "semi", "start_offset": 17, "end_offset": 21, "type": "<ALPHANUM>", "position": 4 }, { "token": "transparent", "start_offset": 22, "end_offset": 33, "type": "<ALPHANUM>", "position": 5 }, { "token": "by", "start_offset": 34, "end_offset": 36, "type": "<ALPHANUM>", "position": 6 }, { "token": "calling", "start_offset": 37, "end_offset": 44, "type": "<ALPHANUM>", "position": 7 }, { "token": "set_trans", "start_offset": 45, "end_offset": 54, "type": "<ALPHANUM>", "position": 8 }, { "token": "5", "start_offset": 55, "end_offset": 56, "type": "<NUM>", "position": 9 } ] }
Querying a document's stored tokenization (term vectors)
GET /${index}/${type}/${id}/_termvectors?fields=${fields_name}
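For instance, with a hypothetical index article, type doc, document id 1, and an analyzed field content:

GET /article/doc/1/_termvectors?fields=content

The response lists every term stored for that field along with its frequency, positions, and offsets.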
Querying how a given analyzer tokenizes text against a specific index
GET /${index}/_analyze?analyzer={analyzer_name}&text=2,3,4,5,100-100
Adding an analyzer
Adding a new analyzer to an existing index
POST /{index}/_close    # close the target index first; it cannot serve requests while settings are updated

PUT /{index}/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_word": {                  # the analyzer being added
          "tokenizer": "ik_max_word"
        }
      }
    }
  }
}

POST /{index}/_open     # reopen the index
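Once the index is reopened, the new analyzer can be sanity-checked with _analyze. This sketch assumes the IK analysis plugin is installed (ik_max_word is the IK tokenizer), and the sample text is arbitrary Chinese:

GET /{index}/_analyze
{
  "analyzer": "ik_word",
  "text": "中华人民共和国"
}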