# Check how the sample titles are tokenized (standard analyzer when none is given)
POST _analyze
{
  "text": [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid"
  ]
}

# Create the index: "body" feeds the completion suggester, "body1" feeds term and phrase
# ("ik_smart" requires the IK analysis plugin to be installed)
PUT /blogs_completion
{
  "mappings": {
    "tech": {
      "properties": {
        "body": { "type": "completion" },
        "body1": { "type": "text", "analyzer": "ik_smart" }
      }
    }
  }
}

# Index the sample documents (each action and each document must stay on its own line)
POST _bulk?refresh=true
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Lucene is cool", "body1": "Lucene is cool" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene", "body1": "Elasticsearch builds on top of lucene" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch rocks", "body1": "Elasticsearch rocks" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack", "body1": "Elastic is the company behind ELK stack" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "the elk stack rocks", "body1": "the elk stack rocks" }
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "elasticsearch is rock solid", "body1": "elasticsearch is rock solid" }

# Completion suggester: prefix match against "body"
POST blogs_completion/_search?pretty
{
  "size": 0,
  "suggest": {
    "blog-suggest": {
      "prefix": "Elasticsearc b",
      "completion": { "field": "body" }
    }
  }
}

# Term suggester: per-word correction against "body1"
POST /blogs_completion/_search?pretty
{
  "suggest": {
    "blog-suggest": {
      "text": "biuds",
      "term": { "suggest_mode": "missing", "field": "body1" }
    }
  }
}

# Phrase suggester: whole-phrase correction against "body1"
POST blogs_completion/_search
{
  "size": 0,
  "suggest": {
    "blog-suggest": {
      "text": "Elastcserch rock",
      "phrase": { "field": "body1" }
    }
  }
}

# Clean up when finished (run this last, otherwise the mapping above is lost)
DELETE /blogs_completion

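Autocomplete can be built on Elasticsearch's suggesters. Three suggester types are used here: completion, phrase, and term.
In terms of precision they rank completion > phrase > term. When defining the index mapping:
the completion suggester requires the field type to be completion;
the phrase and term suggesters require the field type to be text.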
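The completion suggester returns an options array whose entries are matched by left prefix: the input has to match the start of the indexed value.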
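To make the left-prefix behaviour concrete, here is a minimal Python sketch of the matching semantics only. It is not how Elasticsearch implements the completion suggester (which keeps an in-memory FST); the function name `complete` and the `size` parameter are made up for illustration, and case-insensitive comparison is assumed.

```python
from bisect import bisect_left

def complete(prefix, inputs, size=5):
    """Return up to `size` inputs that start with `prefix`, ignoring case."""
    p = prefix.lower()
    # Sorting by the lowercased form keeps all matches for a prefix contiguous.
    ordered = sorted(inputs, key=str.lower)
    keys = [s.lower() for s in ordered]
    out = []
    # Jump to the first key >= prefix, then scan while the prefix still matches.
    for i in range(bisect_left(keys, p), len(ordered)):
        if not keys[i].startswith(p) or len(out) == size:
            break
        out.append(ordered[i])
    return out

titles = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "the elk stack rocks",
    "elasticsearch is rock solid",
]

print(complete("Elasticsearch r", titles))
print(complete("elk", titles))
```

With the sample titles, "elk" yields nothing even though "the elk stack rocks" contains the word, because the match has to start at the left edge of the value.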
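The phrase suggester matches at the level of whole phrases across the indexed documents.
The term suggester corrects individual words. It is based on Levenshtein edit distance: an indexed term is returned as a suggestion when it can be reached from the input within a small number of single-character edits (bounded by max_edits, which may be 1 or 2 and defaults to 2).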
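The edit-distance idea can be sketched in a few lines of Python. This is plain textbook Levenshtein distance, not the optimized implementation Elasticsearch gets from Lucene; `suggest_terms` and the `vocabulary` list are hypothetical names standing in for the terms of an indexed field.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + cost)) # substitute or keep
        prev = cur
    return prev[-1]

def suggest_terms(word, vocabulary, max_edits=2):
    """Vocabulary terms within max_edits of word, closest first, exact match excluded."""
    scored = sorted((levenshtein(word, t), t) for t in vocabulary)
    return [t for d, t in scored if 0 < d <= max_edits]

vocabulary = ["lucene", "builds", "rocks", "rock", "elasticsearch", "elastic", "elk"]
print(levenshtein("biuds", "builds"))
print(suggest_terms("biuds", vocabulary))
```

Here "biuds" is two edits away from "builds" (insert "u", substitute "u" with "l"), so it falls inside the default bound of 2, while every other term in the list is too far away.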
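Recall ranks in the opposite order: completion < phrase < term.
A complete autocomplete solution should therefore combine them: when the completion suggester returns no options, fall back to the term suggester, which corrects the input word by word after analysis, to improve recall.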
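One way to wire up that fallback is a simple chain that tries the most precise suggester first and moves on only when it comes back empty. The Python sketch below shows the control flow only: `autocomplete`, `by_prefix`, `by_term`, and the `CORRECTIONS` table are hypothetical stand-ins, and in a real setup each callable would send the corresponding suggest request to Elasticsearch and extract the options from the response.

```python
def autocomplete(text, suggesters):
    """Try each (name, suggest) pair in order; return the first non-empty result."""
    for name, suggest in suggesters:
        options = suggest(text)
        if options:
            return name, options
    return None, []

# Stand-in data and suggesters, for illustration only.
TITLES = ["Elasticsearch rocks", "elasticsearch is rock solid"]
CORRECTIONS = {"elastcserch": ["elasticsearch"]}  # pretend term-suggester output

def by_prefix(text):
    # Mimics the completion suggester: left-prefix match, case-insensitive.
    return [t for t in TITLES if t.lower().startswith(text.lower())]

def by_term(text):
    # Mimics the term suggester: look up a correction for the misspelled word.
    return CORRECTIONS.get(text.lower(), [])

chain = [("completion", by_prefix), ("term", by_term)]

print(autocomplete("elasticsearch r", chain))
print(autocomplete("Elastcserch", chain))
```

A phrase-suggester step could be inserted between the two entries of `chain` in the same way, which keeps the order precision-first and recall-last.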