lucene3.0_IndexSearcher pagination

    Series index:

    lucene3.0_basic usage and caveats roundup

    The vast majority of projects need to fetch search results page by page. Lucene provides a ready-made method for this, and it is easy to use.

    The main method (API) involved:

     TopDocs topDocs(int start, int howMany) 

    Returns the documents in the range [start .. start+howMany) that were collected by this collector. Note that if start >= pq.size(), an empty TopDocs is returned, and if pq.size() - start < howMany, then only the available documents in [start .. pq.size()) are returned.
    This method is useful to call in case pagination of search results is allowed by the search application, as well as it attempts to optimize the memory used by allocating only as much as requested by howMany.

    Given start and howMany, the following cases apply:

    1. If start is greater than or equal to the size of the current result set, an empty result is returned;

    2. If start + howMany exceeds the size of the current result set, only the documents from start to the end of the result set (that is, size - start of them) are returned.
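    The two cases above amount to clamping the requested window [start, start+howMany) against the number of collected hits. A minimal sketch of that arithmetic in plain Java, independent of Lucene (the method name `pageLength` and its parameters are purely illustrative, not part of the Lucene API):

    ```java
    public class PageClamp {
        // Returns how many documents topDocs(start, howMany) would yield,
        // given `size` collected hits: the window [start, start+howMany)
        // clamped to [0, size). A start at or past the end yields an empty page.
        static int pageLength(int size, int start, int howMany) {
            if (start >= size) {
                return 0; // case 1: start beyond the collected results
            }
            return Math.min(start + howMany, size) - start; // case 2: partial last page
        }

        public static void main(String[] args) {
            System.out.println(pageLength(12, 0, 5));  // full page: 5
            System.out.println(pageLength(12, 10, 5)); // partial last page: 2
            System.out.println(pageLength(12, 15, 5)); // start past the end: 0
        }
    }
    ```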

    Below is a simple example:

    package com.fox.search;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;
    import org.apache.lucene.util.Version;

    public class Searcher {

        String path = "d:/realtime";
        FSDirectory dir = null;
        IndexReader reader = null;
        IndexSearcher searcher = null;

        public Searcher() {
            try {
                dir = SimpleFSDirectory.open(new File(path));
                reader = IndexReader.open(dir);
                searcher = new IndexSearcher(reader);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /**
         * Fetches the documents in the given range (this method only prints the
         * document contents, standing in for actual retrieval).
         *
         * @param start   note: counting starts at 0
         * @param howMany
         */
        public void getResults(int start, int howMany) {
            try {
                QueryParser parser = new QueryParser(Version.LUCENE_30, "f",
                        new StandardAnalyzer(Version.LUCENE_30));
                Query query = parser.parse("a:fox");
                // collect just enough hits to cover the requested page
                TopScoreDocCollector results = TopScoreDocCollector.create(start + howMany, false);
                searcher.search(query, results);
                TopDocs tds = results.topDocs(start, howMany);
                ScoreDoc[] sd = tds.scoreDocs;
                for (int i = 0; i < sd.length; i++) {
                    System.out.println(reader.document(sd[i].doc));
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] fox) {
            Searcher s = new Searcher();
            System.out.println("Page 1: --------------------");
            s.getResults(0, 5);
            System.out.println("Page 2: --------------------");
            s.getResults(5, 5);
            System.out.println("Page 3: --------------------");
            s.getResults(10, 5);
        }

    }

    A couple of questions to leave for myself:

    1. Lucene returns results already sorted (by relevance, by default), so returning the first 5 results and returning the last 5 both go through the sorting stage. Are the two equally efficient?

    2. Experiments show that when searching over a large data set, given

    static TopScoreDocCollector create(int numHits, boolean docsScoredInOrder) 
              Creates a new TopScoreDocCollector given the number of hits to collect and whether documents are scored in order by the input Scorer to setScorer(Scorer).

    the larger numHits is, the lower the efficiency. Why is that?
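    One plausible explanation for the numHits question: a top-k collector keeps a priority queue of numHits entries, so every competitive hit costs roughly O(log numHits) heap work, for about O(n log k) total over n hits. Deeper pages need a larger start + howMany, hence a bigger heap and more work. A sketch of that top-k pattern using java.util.PriorityQueue (an illustration of the general technique, not Lucene's actual HitQueue implementation):

    ```java
    import java.util.PriorityQueue;

    public class TopKDemo {
        // Keeps only the k best scores seen so far in a min-heap, so each
        // insert/evict costs O(log k); a larger k means more heap work per hit.
        static double[] topK(double[] scores, int k) {
            PriorityQueue<Double> heap = new PriorityQueue<>(k); // min-heap
            for (double s : scores) {
                if (heap.size() < k) {
                    heap.offer(s);
                } else if (s > heap.peek()) { // competitive hit: evict the smallest
                    heap.poll();
                    heap.offer(s);
                }
            }
            double[] out = new double[heap.size()];
            for (int i = out.length - 1; i >= 0; i--) {
                out[i] = heap.poll(); // drain ascending, store descending
            }
            return out;
        }

        public static void main(String[] args) {
            double[] best = topK(new double[]{0.2, 0.9, 0.5, 0.7, 0.1}, 3);
            for (double b : best) {
                System.out.print(b + " "); // 0.9 0.7 0.5
            }
        }
    }
    ```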

    Original post: https://www.cnblogs.com/huangfox/p/1855490.html