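  • Lucene 3.0: basic usage of IndexSearcher and points to note

    Compared with Lucene 2.4, Lucene 3.0 changed the IndexSearcher area considerably, as a look at the API makes plain.

    Search-related methods of the base class Searcher: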

     void search(Query query, Collector results) 
              Lower-level search API.
     void search(Query query, Filter filter, Collector results) 
              Lower-level search API.
     TopDocs search(Query query, Filter filter, int n) 
              Finds the top n hits for query, applying filter if non-null.
     TopFieldDocs search(Query query, Filter filter, int n, Sort sort) 
              Search implementation with arbitrary sorting.
     TopDocs search(Query query, int n) 
              Finds the top n hits for query.
    abstract  void search(Weight weight, Filter filter, Collector results) 
              Lower-level search API.
    abstract  TopDocs search(Weight weight, Filter filter, int n) 
              Expert: Low-level search implementation.
    abstract  TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort) 
              Expert: Low-level search implementation with arbitrary sorting.

    This article covers three of the methods in the table above (highlighted in yellow in the original post):

    1. search(Query query, int n)

    2. search(Query query, Collector results)

    3. search(Query query, Filter filter, int n, Sort sort)

    --------------------------------------------------------------------------------

    1. search(Query query, int n) example

    Returns the top n matching results.

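    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public void searcher(String queryString) {
        try {
            // Open the index directory on disk
            FSDirectory dir = FSDirectory.open(new File("d:/20101015index"));
            // Note 1: should IndexSearcher be constructed from an IndexReader or a Directory?
            IndexReader reader = IndexReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            // Usage: search(Query query, int n)
            QueryParser parser = new QueryParser(Version.LUCENE_30, "f1", new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse(queryString);
            TopDocs tds = searcher.search(query, 5);
            ScoreDoc[] sd = tds.scoreDocs;
            for (int i = 0; i < sd.length; i++) {
                System.out.println(reader.document(sd[i].doc));
                // Note 2: how to inspect the scoring details of each document.
                // explain(Query query, int doc) returns an Explanation that describes how doc scored against query.
                System.out.println("Explanation:" + searcher.explain(query, sd[i].doc));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }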

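    Notes:

    1. Should IndexSearcher be constructed from an IndexReader or a Directory?

    For details, see:

    Lucene question: IndexSearcher initialization. What is the difference between IndexSearcher(Directory dir) and IndexSearcher(IndexReader reader), and which one is the better choice?

    2. Inspecting the scoring details of each matching document.

    Someone asked about this online, so it is raised here. The example above prints the details with searcher.explain(query, sd[i].doc).

    3. In a real project, do not create the IndexReader and IndexSearcher instances inside the search method as the code above does. To reduce overhead and improve efficiency, these instances should be shared as singletons.

    In this respect the code above is a counterexample.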
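
    Note 3 can be sketched in plain Java. The holder below is only an illustration and not part of the Lucene API: SharedInstance and its Supplier factory are hypothetical names, and in a real project the factory would presumably wrap something like new IndexSearcher(IndexReader.open(dir)). It builds the expensive object once, on first use, and hands the same instance to every caller.

```java
import java.util.function.Supplier;

/** Hypothetical helper: creates a costly object once and shares it afterwards. */
final class SharedInstance<T> {
    private final Supplier<T> factory; // how to build the instance (assumed: opens reader and searcher)
    private T instance;                // stays null until the first call to get()

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, building it on the first call only. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

    A search method would then call get() instead of opening the index on every request. Closing the instance on shutdown and reopening it after the index changes are left out of this sketch.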

    2. search(Query query, Collector results)

    Before introducing this method, a quick look at Collector:

     Collectors are primarily meant to be used to gather raw results from a search, and implement sorting or custom result filtering, collation, etc.

    The one to focus on first:

    TopScoreDocCollector is a concrete subclass TopDocsCollector and sorts according to score + docID. 

    This is used internally by the IndexSearcher search methods that do not take an explicit Sort. It is likely the most frequently used collector.

    It is the most commonly used Collector subclass and sorts by relevance by default. The example below uses TopScoreDocCollector to gather the results and provides simple paging.

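    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public void searcher(String queryString, int start, int howMany) {
        try {
            // Open the index directory on disk
            FSDirectory dir = FSDirectory.open(new File("d:/20101015index"));
            // Note 1: should IndexSearcher be constructed from an IndexReader or a Directory?
            IndexReader reader = IndexReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            // Usage: search(Query query, Collector results)
            QueryParser parser = new QueryParser(Version.LUCENE_30, "f1", new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse(queryString);
            // Collect enough hits to cover the requested page
            int hm = start + howMany;
            TopScoreDocCollector res = TopScoreDocCollector.create(hm, false);
            searcher.search(query, res);
            // The total is only known after the search has run
            System.out.println("total hits :" + res.getTotalHits());
            // Note 2: paging is controlled here.
            TopDocs tds = res.topDocs(start, howMany);
            ScoreDoc[] sd = tds.scoreDocs;
            for (int i = 0; i < sd.length; i++) {
                System.out.println(reader.document(sd[i].doc));
                // System.out.println("Explanation:" + searcher.explain(query, sd[i].doc));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }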
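
    The paging in this example works by collecting the best start + howMany hits and then returning only the slice that begins at start. The sketch below imitates that idea in plain Java with a bounded priority queue. It illustrates the technique and is not Lucene's implementation; TopNPager, Hit, and page are hypothetical names, and start is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNPager {
    /** A document id plus its score (hypothetical stand-in for ScoreDoc). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Best hit first: higher score wins, ties go to the lower doc id. */
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns hits [start, start + howMany) of the ranking, keeping at most start + howMany hits. */
    static List<Hit> page(List<Hit> allHits, int start, int howMany) {
        int capacity = start + howMany;
        if (capacity <= 0) return new ArrayList<>();
        // Reversed order puts the worst kept hit at the head of the queue.
        PriorityQueue<Hit> heap = new PriorityQueue<>(capacity, BEST_FIRST.reversed());
        for (Hit h : allHits) {
            if (heap.size() < capacity) {
                heap.add(h);
            } else if (BEST_FIRST.compare(h, heap.peek()) < 0) {
                heap.poll(); // drop the current worst hit
                heap.add(h); // keep the better one
            }
        }
        List<Hit> ranked = new ArrayList<>(heap);
        ranked.sort(BEST_FIRST);
        if (start >= ranked.size()) return new ArrayList<>();
        return new ArrayList<>(ranked.subList(start, Math.min(capacity, ranked.size())));
    }
}
```

    Because start + howMany hits have to be kept, a deep page costs more memory and time than an early one in this approach, which is one reason paging efficiency deserves attention.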

    3. search(Query query, Filter filter, int n, Sort sort)

    Sorting this way is straightforward, so the example code is given directly. The points to watch are covered in detail in other posts.

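    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public void searcher(String queryString) throws IOException, ParseException {
        // Open the index directory on disk
        FSDirectory dir = FSDirectory.open(new File("d:/20101015index"));
        // Note 1: should IndexSearcher be constructed from an IndexReader or a Directory?
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Usage: search(Query query, Filter filter, int n, Sort sort)
        QueryParser parser = new QueryParser(Version.LUCENE_30, "f1", new StandardAnalyzer(Version.LUCENE_30));
        Query query = parser.parse(queryString);
        // Sort by field f1 as an int; this assumes every f1 value parses as an integer
        SortField sf = new SortField("f1", SortField.INT);
        Sort sort = new Sort(sf);
        // A null filter means no filtering
        TopDocs tds = searcher.search(query, null, 5, sort);
        ScoreDoc[] sd = tds.scoreDocs;
        for (int i = 0; i < sd.length; i++) {
            System.out.println(reader.document(sd[i].doc));
            // Note 2: how to inspect the scoring details of each document.
            // explain(Query query, int doc) returns an Explanation that describes how doc scored against query.
            System.out.println("Explanation:" + searcher.explain(query, sd[i].doc));
        }
    }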

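    Notes:

    Lucene question: How are search results sorted? Does sorting differ for fields of different types (for example, int)?

    Lucene question: How to sort on multiple fields?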
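
    One difference the first question points at is that text ordering and numeric ordering disagree on numbers stored as strings. The plain-Java sketch below illustrates only that general point. It does not use Lucene, and SortOrderDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortOrderDemo {
    /** Sorts the values as text, comparing character by character. */
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Parses each value as an int and sorts numerically. */
    static List<String> sortAsInts(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }
}
```

    For the values 10, 9, 2, sortAsStrings gives 10, 2, 9 while sortAsInts gives 2, 9, 10, which is presumably why the example passes SortField.INT for a field that holds numbers.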

    ---------------------------------------------------------

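    That is all for this introduction to basic searching in Lucene 3.0. When working on search, the main things to focus on are:

    1. Choosing a collector

    2. Paging, and paging efficiency

    3. Sorting, and sorting efficiency

    4. Searching across multiple indexes

    5. Keeping search real-time and efficient in real projects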

  • Original post: https://www.cnblogs.com/huangfox/p/1853086.html