ES similarity algorithm settings (continued)

Tuning BM25

One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:

k1
This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
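
To make the effect of the two parameters concrete, here is a minimal sketch of the term-frequency part of the textbook Okapi BM25 formula. It is an illustration under assumptions, not Lucene's actual implementation: the IDF factor is left out, and the names bm25_tf, saturation_fraction, field_len, and avg_field_len are made up for this example.

```python
def bm25_tf(freq, k1=1.2, b=0.75, field_len=100, avg_field_len=100):
    """Term-frequency component of textbook Okapi BM25 (IDF omitted)."""
    # Length normalization: b=0 ignores field length, b=1 applies it fully.
    norm = 1.0 - b + b * (field_len / avg_field_len)
    return freq * (k1 + 1.0) / (freq + k1 * norm)


def saturation_fraction(freq, k1):
    """Share of the upper bound (k1 + 1) reached for an average-length field."""
    return bm25_tf(freq, k1=k1) / (k1 + 1.0)


if __name__ == "__main__":
    # Lower k1 reaches its ceiling after fewer occurrences of the term.
    for k1 in (0.5, 1.2, 3.0):
        row = [round(saturation_fraction(f, k1), 2) for f in (1, 2, 5, 10, 50)]
        print(f"k1={k1}: fraction of maximum at freq 1, 2, 5, 10, 50 -> {row}")

    # With b=0 a short field and a long field get the same value.
    for b in (0.0, 0.75, 1.0):
        short = bm25_tf(3, b=b, field_len=10)
        long_ = bm25_tf(3, b=b, field_len=1000)
        print(f"b={b}: short field {short:.3f}, long field {long_:.3f}")
```

In this form the contribution of a term can never exceed k1 + 1, which is why repeating a term many times stops paying off.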

The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.

The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:

PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "BM25"
        },
        "body": {
          "type": "string",
          "similarity": "default"
        }
      }
    }
  }
}

The title field uses BM25 similarity.

The body field uses the default similarity (see Lucene’s Practical Scoring Function).

Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.

Configuring BM25

Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:

PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type": "string",
          "similarity": "BM25"
        }
      }
    }
  }
}
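
When trying several parameter combinations, it can help to generate the request body rather than edit it by hand. The sketch below only builds and prints a body with the same shape as the request above; it does not contact a cluster. The helper name bm25_index_body is hypothetical, and the explicit "k1" setting is an addition that does not appear in the request above.

```python
import json


def bm25_index_body(k1=1.2, b=0.75, similarity_name="my_bm25"):
    """Build an index-creation body shaped like the request above (hypothetical helper)."""
    return {
        "settings": {
            "similarity": {
                # Custom similarity: BM25 with explicit k1 and b.
                similarity_name: {"type": "BM25", "k1": k1, "b": b}
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    # title uses the custom similarity, body uses stock BM25.
                    "title": {"type": "string", "similarity": similarity_name},
                    "body": {"type": "string", "similarity": "BM25"},
                }
            }
        },
    }


if __name__ == "__main__":
    # b=0 mirrors the example above: no field-length normalization on title.
    print(json.dumps(bm25_index_body(b=0), indent=2))
```

The printed JSON can be pasted after PUT /my_index. Because the similarity of an existing field cannot be changed, each variant needs a fresh index.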

Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html

Original article: https://www.cnblogs.com/bonelee/p/6472828.html
