  • Lucene 4: Understanding the scoring (explain) mechanism

    1. Goal

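    Use the explain() method to understand in detail how search results are scored.

    explain() makes the inner workings of the score calculation easy to inspect,
    but producing an explanation costs about as much as running the query itself.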

    2. Implementation

    package com.clzhang.sample.lucene;
    
    import java.io.File;
    
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import org.apache.lucene.queryparser.classic.QueryParser;
    
    //import org.wltea.analyzer.lucene.IKAnalyzer;
    import com.chenlb.mmseg4j.Dictionary;
    import com.chenlb.mmseg4j.analysis.ComplexAnalyzer;
    
    import org.junit.Test;
    
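    /**
     * Uses explain() to show how search results are scored. The method makes the
     * inner workings of the score calculation easy to inspect, but it costs about
     * as much as running the query itself.
     * @author Administrator
     *
     */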
    public class ExplainerDemo {
        // Path to the mmseg4j dictionary
        private static final String MMSEG4J_DICT_PATH = "C:\\solr\\news\\conf";
        private static Dictionary dictionary = Dictionary.getInstance(MMSEG4J_DICT_PATH);
        
        // Directory that holds the Lucene index
        private static final String LUCENE_INDEX_DIR = "C:\\solr\\news\\data\\index";
        
        @Test
        public void explainIt() throws Exception {
            String keyword = "苏州";
    
            FSDirectory directory = FSDirectory.open(new File(LUCENE_INDEX_DIR));
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(ireader);
            QueryParser parser = new QueryParser(Version.LUCENE_41, "text",
                    new ComplexAnalyzer(dictionary));
            Query query = parser.parse(keyword);
            System.out.println("Query: " + keyword);
    
            TopDocs topDocs = searcher.search(query, 10);
            for (ScoreDoc match : topDocs.scoreDocs) {
                // Generate Explanation
                Explanation explanation = searcher.explain(query, match.doc);
    
                Document doc = searcher.doc(match.doc);
                System.out.println("----------------------");
                System.out.println(doc.get("webTitle"));
                // Output Explanation
                System.out.println(explanation.toString());
            }
            ireader.close();
            directory.close();
        }
    }

    Output:

    Query: 苏州
    ----------------------
    苏州市司法局为何敢于如此明目张胆指鹿为马?
    1525.8616 = (MATCH) weight(text:苏州 in 32116) [DefaultSimilarity], result of:
      1525.8616 = score(doc=32116,freq=57.0 = termFreq=57.0
    ), product of:
        0.99999994 = queryWeight, product of:
          6.315791 = idf(docFreq=313, maxDocs=63907)
          0.15833329 = queryNorm
        1525.8617 = fieldWeight in 32116, product of:
          7.5498343 = tf(freq=57.0), with freq of:
            57.0 = termFreq=57.0
          6.315791 = idf(docFreq=313, maxDocs=63907)
          32.0 = fieldNorm(doc=32116)
    
    ----------------------
    给中共临汾市纪检委陈国荣的实名举报信
    1010.52655 = (MATCH) weight(text:苏州 in 5075) [DefaultSimilarity], result of:
      1010.52655 = score(doc=5075,freq=1.0 = termFreq=1.0
    ), product of:
        0.99999994 = queryWeight, product of:
          6.315791 = idf(docFreq=313, maxDocs=63907)
          0.15833329 = queryNorm
        1010.5266 = fieldWeight in 5075, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.315791 = idf(docFreq=313, maxDocs=63907)
          160.0 = fieldNorm(doc=5075)
    
    ----------------------
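
    The factors in the explanations above can be checked by hand. The sketch below
    assumes the classic TF-IDF formulas used by DefaultSimilarity (tf = sqrt(freq),
    idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared
    weights)) and takes fieldNorm directly from the output. The class and method
    names are hypothetical and are not part of Lucene.

```java
// Minimal sketch: recompute the factors printed by explain() with the classic
// TF-IDF formulas. Names are illustrative only, not Lucene API.
class ExplainArithmetic {

    // tf(freq) = sqrt(freq)
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf(docFreq, maxDocs) = 1 + ln(maxDocs / (docFreq + 1))
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // queryNorm = 1 / sqrt(sumOfSquaredWeights); for a single unboosted term
    // the sum of squared weights is simply idf * idf.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // fieldWeight = tf * idf * fieldNorm
    static float fieldWeight(float freq, long docFreq, long maxDocs, float fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        long docFreq = 313, maxDocs = 63907;          // taken from the output above
        float idfValue = idf(docFreq, maxDocs);
        float queryWeight = idfValue * queryNorm(idfValue * idfValue);

        System.out.println("idf         = " + idfValue);
        System.out.println("queryWeight = " + queryWeight);
        // score = queryWeight * fieldWeight; fieldNorm values 32.0 and 160.0
        // are copied from the two explanations.
        System.out.println("doc 32116   = " + queryWeight * fieldWeight(57f, docFreq, maxDocs, 32f));
        System.out.println("doc 5075    = " + queryWeight * fieldWeight(1f, docFreq, maxDocs, 160f));
    }
}
```

    The printed values should agree with the explain() output up to float rounding.
    Note that the second document matches the term only once, yet its fieldNorm of
    160.0 lifts its score close to that of a document with 57 matches.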

    Notes:

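    The fieldNorm values in the results above are very large because, when the index was built, fields containing negative keywords were given a high boost. As a result, documents that contain negative keywords rank first at query time.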
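
    To see why an index-time boost inflates fieldNorm: in DefaultSimilarity the
    norm is the field boost multiplied by 1 / sqrt(number of terms in the field),
    and Lucene 4 then stores it in a single byte, so the value reported by
    explain() is a coarse approximation. The sketch below shows only the formula
    before that lossy encoding. The boost of 320 and the field length of 100 terms
    are made-up values for illustration, not the ones used to build this index,
    and the class name is hypothetical.

```java
// Minimal sketch of the index-time norm before Lucene's one-byte encoding.
// All inputs are hypothetical; this is not Lucene API.
class FieldNormSketch {

    // norm = boost * 1 / sqrt(numTerms)
    static float rawNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int numTerms = 100;                       // assumed field length in terms
        // Without a boost the norm is below 1 for any field longer than one term.
        System.out.println("boost 1   -> " + rawNorm(1f, numTerms));
        // A large boost pushes the norm, and therefore the score, far above 1.
        System.out.println("boost 320 -> " + rawNorm(320f, numTerms));
    }
}
```

    Because fieldNorm multiplies directly into fieldWeight, a boosted field can
    outweigh large differences in term frequency.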

  • Original article: https://www.cnblogs.com/nayitian/p/2876938.html