  • Lucene's built-in scoring function

    For multiterm queries, Lucene takes the Boolean model, TF/IDF, and the vector space model and combines them in a single efficient package that collects matching documents and scores them as it goes.

    Consider a multiterm query like this:

    GET /my_index/doc/_search
    {
      "query": {
        "match": {
          "text": "quick fox"
        }
      }
    }

    As soon as a document matches a query, Lucene calculates its score for that query, combining the scores of each matching term. The formula used for scoring is called the practical scoring function. 

    score(q,d)  =
                queryNorm(q)
              · coord(q,d)
              · ∑ (
                    tf(t in d)
                  · idf(t)²
                  · t.getBoost()
                  · norm(t,d)
                ) (t in q)

    score(q,d) is the relevance score of document d for query q.

    queryNorm(q) is the query normalization factor (new).

    coord(q,d) is the coordination factor (new).

    The ∑ over (t in q) is the sum of the weights for each term t in the query q for document d.

    tf(t in d) is the term frequency for term t in document d.

    idf(t) is the inverse document frequency for term t.

    t.getBoost() is the boost that has been applied to the query (new).

    norm(t,d) is the field-length norm, combined with the index-time field-level boost, if any (new). Note that index-time field-level boosting is officially not recommended.

    You should recognize score, tf, and idf. The queryNorm, coord, t.getBoost, and norm are new.
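
    To make the structure of the formula concrete, here is a minimal Python sketch that multiplies the factors exactly as listed above. It is not Lucene code: the names TermStats and practical_score are hypothetical, and every factor is passed in as a precomputed number rather than derived from an index.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    """Per-term factors for one matching term t in document d (hypothetical type)."""
    tf: float     # tf(t in d)
    idf: float    # idf(t)
    boost: float  # t.getBoost()
    norm: float   # norm(t,d)


def practical_score(query_norm: float, coord: float, matching_terms: list) -> float:
    """score(q,d) = queryNorm(q) * coord(q,d) * sum over matching terms t in q of
    tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)."""
    term_sum = sum(t.tf * t.idf ** 2 * t.boost * t.norm for t in matching_terms)
    return query_norm * coord * term_sum


# Illustrative numbers only; they do not come from a real index.
example_terms = [
    TermStats(tf=1.0, idf=2.0, boost=1.0, norm=0.5),
    TermStats(tf=2.0, idf=1.0, boost=1.0, norm=0.5),
]
example_score = practical_score(query_norm=0.5, coord=1.0, matching_terms=example_terms)
```

    Note that idf is squared inside the sum, and that queryNorm and coord are applied once per document, outside the per-term sum.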

    We will talk more about query-time boosting later in this chapter, but first let’s get query normalization, coordination, and index-time field-level boosting out of the way.

    Query Normalization Factor

    queryNorm = 1 / √sumOfSquaredWeights 

    The sumOfSquaredWeights is calculated by adding together the IDF of each term in the query, squared.

    The same query normalization factor is applied to every document, and you have no way of changing it. For all intents and purposes, it can be ignored. (Every document gets the same factor, so it has no effect on the ranking.)
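
    The following Python sketch illustrates why the factor can be ignored for ranking purposes. It follows the description above (the sum of the squared IDFs of the query terms); the function names, IDF values, and document scores are all made up for illustration.

```python
import math


def query_norm(idfs):
    """queryNorm = 1 / sqrt(sumOfSquaredWeights), following the description above."""
    sum_of_squared_weights = sum(idf ** 2 for idf in idfs)
    return 1.0 / math.sqrt(sum_of_squared_weights)


def rank(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical unnormalized scores for three documents.
raw_scores = {"doc_a": 4.2, "doc_b": 1.3, "doc_c": 2.7}

# Hypothetical IDF values for a two-term query.
norm = query_norm([3.0, 4.0])

# Every document is multiplied by the same positive constant.
normalized = {doc: score * norm for doc, score in raw_scores.items()}
```

    Because every score is scaled by the same positive constant, rank(raw_scores) and rank(normalized) return the documents in the same order.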

    Query Coordination

    The coordination factor (coord) is used to reward documents that contain a higher percentage of the query terms. The more query terms that appear in the document, the greater the chances that the document is a good match for the query.

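    For a three-term query, for example, the coordination factor multiplies a document's score by the fraction of query terms that the document contains, so the document that contains all three terms ends up much more relevant than the document that contains just two of them.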
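
    A small Python sketch of this effect, under stated assumptions: coord is taken to be the number of matching query terms divided by the total number of query terms (as in Lucene's classic TF/IDF similarity), the query has three terms, and every term is given a made-up weight of 1.5. The function names are hypothetical.

```python
def coord(matching_terms: int, total_query_terms: int) -> float:
    """Fraction of the query terms that the document contains (assumed definition)."""
    return matching_terms / total_query_terms


def score_with_coord(term_weights, total_query_terms: int) -> float:
    """Sum the weights of the matching terms, then scale by the coordination factor."""
    return sum(term_weights) * coord(len(term_weights), total_query_terms)


TOTAL_TERMS = 3  # hypothetical three-term query
WEIGHT = 1.5     # hypothetical weight for each term

one_term = score_with_coord([WEIGHT], TOTAL_TERMS)
two_terms = score_with_coord([WEIGHT, WEIGHT], TOTAL_TERMS)
three_terms = score_with_coord([WEIGHT, WEIGHT, WEIGHT], TOTAL_TERMS)
```

    Without the coordination factor the three documents would score 1.5, 3.0, and 4.5. With it, they score approximately 0.5, 2.0, and 4.5, which widens the gap in favor of the document that matches every term.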

  • Original article: https://www.cnblogs.com/bonelee/p/6475880.html