  • Lucene 4: Understanding the Scoring (explain) Mechanism

    1. Goal

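    Use the explain() method to understand in depth how search results are scored.

    explain() makes it easy to see the inner workings of the score calculation. However, each call costs about as much as running the query itself, so it is better suited to debugging than to routine use on every hit.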
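    For reference, the factors that explain() prints come from Lucene 4's DefaultSimilarity, which is the similarity named in the output further down. The block below summarizes how they combine for a single-term query with no query-time boost. It is a summary of the printed factors, not a full treatment of Lucene scoring: multi-term queries add further factors such as coord.

```latex
\begin{aligned}
\text{score}(q,d)  &= \text{queryWeight} \times \text{fieldWeight} \\
\text{queryWeight} &= \text{idf}(t) \times \text{queryNorm} \\
\text{fieldWeight} &= \text{tf}(t,d) \times \text{idf}(t) \times \text{fieldNorm}(d) \\
\text{tf}(t,d)     &= \sqrt{\text{freq}} \\
\text{idf}(t)      &= 1 + \ln\frac{\text{maxDocs}}{\text{docFreq} + 1} \\
\text{queryNorm}   &= \frac{1}{\sqrt{\text{idf}(t)^2}} = \frac{1}{\text{idf}(t)}
\end{aligned}
```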

    2. Implementation code

    package com.clzhang.sample.lucene;
    
    import java.io.File;
    
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import org.apache.lucene.queryparser.classic.QueryParser;
    
    //import org.wltea.analyzer.lucene.IKAnalyzer;
    import com.chenlb.mmseg4j.Dictionary;
    import com.chenlb.mmseg4j.analysis.ComplexAnalyzer;
    
    import org.junit.Test;
    
    /**
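     * Uses explain() to understand search result scoring. explain() makes it easy to see
     * the inner workings of the score calculation, but each call costs about as much as
     * running the query itself.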
     * @author Administrator
     *
     */
    public class ExplainerDemo {
        // mmseg4j dictionary path
        private static final String MMSEG4J_DICT_PATH = "C:\\solr\\news\\conf";
        private static Dictionary dictionary = Dictionary.getInstance(MMSEG4J_DICT_PATH);
        
        // Lucene index directory
        private static final String LUCENE_INDEX_DIR = "C:\\solr\\news\\data\\index";
        
        @Test
        public void explainIt() throws Exception {
            String keyword = "苏州";
    
            FSDirectory directory = FSDirectory.open(new File(LUCENE_INDEX_DIR));
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(ireader);
            QueryParser parser = new QueryParser(Version.LUCENE_41, "text",
                    new ComplexAnalyzer(dictionary));
            Query query = parser.parse(keyword);
            System.out.println("Query: " + keyword);
    
            TopDocs topDocs = searcher.search(query, 10);
            for (ScoreDoc match : topDocs.scoreDocs) {
                // Generate Explanation
                Explanation explanation = searcher.explain(query, match.doc);
    
                Document doc = searcher.doc(match.doc);
                System.out.println("----------------------");
                System.out.println(doc.get("webTitle"));
                // Output Explanation
                System.out.println(explanation.toString());
            }
            ireader.close();
            directory.close();
        }
    }

    Output:

    Query: 苏州
    ----------------------
    苏州市司法局为何敢于如此明目张胆指鹿为马?
    1525.8616 = (MATCH) weight(text:苏州 in 32116) [DefaultSimilarity], result of:
      1525.8616 = score(doc=32116,freq=57.0 = termFreq=57.0
    ), product of:
        0.99999994 = queryWeight, product of:
          6.315791 = idf(docFreq=313, maxDocs=63907)
          0.15833329 = queryNorm
        1525.8617 = fieldWeight in 32116, product of:
          7.5498343 = tf(freq=57.0), with freq of:
            57.0 = termFreq=57.0
          6.315791 = idf(docFreq=313, maxDocs=63907)
          32.0 = fieldNorm(doc=32116)
    
    ----------------------
    给中共临汾市纪检委陈国荣的实名举报信
    1010.52655 = (MATCH) weight(text:苏州 in 5075) [DefaultSimilarity], result of:
      1010.52655 = score(doc=5075,freq=1.0 = termFreq=1.0
    ), product of:
        0.99999994 = queryWeight, product of:
          6.315791 = idf(docFreq=313, maxDocs=63907)
          0.15833329 = queryNorm
        1010.5266 = fieldWeight in 5075, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.315791 = idf(docFreq=313, maxDocs=63907)
          160.0 = fieldNorm(doc=5075)
    
    ----------------------
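    As a cross-check on the output above, the following minimal sketch recomputes the scores from the factors that explain() printed. It assumes the Lucene 4.x DefaultSimilarity formulas and a single-term query with no query-time boost. The class ScoreSketch is hypothetical and does not call Lucene. It also uses double arithmetic, whereas Lucene works in float, so the last digit may differ slightly from the printed values.

```java
// Hypothetical sketch: recomputes the explain() factors, assuming Lucene 4.x
// DefaultSimilarity and a single-term query with no query-time boost.
// Does not use Lucene itself; arithmetic is in double rather than float.
public class ScoreSketch {
    // tf = sqrt(freq)
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // score = queryWeight * fieldWeight
    static double score(double freq, long docFreq, long maxDocs, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryNorm = 1.0 / Math.sqrt(idf * idf); // sumOfSquaredWeights = idf^2 for one term
        double queryWeight = idf * queryNorm;           // therefore ~1.0
        double fieldWeight = tf(freq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Inputs copied from the explain() output above
        System.out.printf("idf       = %.6f%n", idf(313, 63907));
        System.out.printf("doc 32116 = %.4f%n", score(57, 313, 63907, 32.0));
        System.out.printf("doc 5075  = %.4f%n", score(1, 313, 63907, 160.0));
    }
}
```

    The recomputed values should agree with the explain() output to within float rounding. The queryWeight of 0.99999994 in the output is simply 1.0 after float rounding: for a single term, queryNorm is 1/idf.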

    Notes:

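    The fieldNorm values above are unusually large. With DefaultSimilarity, fieldNorm is the index-time field boost multiplied by the length norm 1/√(number of terms in the field), and it is stored lossily in a single byte. The length norm by itself never exceeds 1.0, so values such as 32.0 and 160.0 come from boosts set when the index was built: fields containing negative keywords were given a high boost so that documents containing those keywords rank first at query time. For example, document 5075 contains 苏州 (Suzhou) only once (tf = 1.0). Its fieldNorm of 160.0 still gives it a score of about 1010, not far below document 32116, which contains the term 57 times.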
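    The following hypothetical sketch shows why an index-time boost dominates the ranking, using the fieldNorm formula described above (fieldBoost / √numTerms). The boost of 320 and the field length of 100 terms are made-up values for illustration, not the values used in the original index. The sketch also ignores the one-byte quantization Lucene applies when storing the norm.

```java
// Hypothetical illustration of the Lucene 4.x DefaultSimilarity norm:
// fieldNorm = fieldBoost * 1/sqrt(numTerms). The one-byte quantization
// applied when the norm is stored is deliberately ignored here.
public class FieldNormSketch {
    static double fieldNorm(double fieldBoost, int numTerms) {
        return fieldBoost / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Made-up field of 100 terms: unboosted vs. boosted by 320
        System.out.println("unboosted: " + fieldNorm(1.0, 100));
        System.out.println("boosted:   " + fieldNorm(320.0, 100));
    }
}
```

    Because fieldNorm multiplies fieldWeight directly, raising a field's boost scales that field's contribution to the score by the same factor, regardless of how often the term occurs.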

  • Original article: https://www.cnblogs.com/nayitian/p/2876938.html