  • Lucene Query Pagination

    The code below is a common way to query Lucene. It returns every index entry stored under the directory given by path:

    // Needs java.io.File, java.io.IOException, org.apache.lucene.document.Document,
    // org.apache.lucene.index.DirectoryReader, org.apache.lucene.search.*,
    // org.apache.lucene.store.Directory and org.apache.lucene.store.FSDirectory
    public String matchAll(String path) {
        // try-with-resources closes the reader and the directory when done
        try (Directory directory = FSDirectory.open(new File(path));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            MatchAllDocsQuery query = new MatchAllDocsQuery();

            // Ask for every hit in a single call
            ScoreDoc[] hits = searcher.search(query, null, Integer.MAX_VALUE).scoreDocs;
            StringBuilder buffer = new StringBuilder();
            for (int i = 0; i < hits.length; i++) {
                Document hitDocument = searcher.doc(hits[i].doc);
                // Append each entry as "key;value|"
                buffer.append(hitDocument.get("key")).append(';')
                      .append(hitDocument.get("value")).append('|');
            }
            return buffer.toString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    However, when the directory holds a very large number of index entries, executing the following line fails with java.lang.OutOfMemoryError: Java heap space:

    ScoreDoc[] hits = searcher.search(query, null, Integer.MAX_VALUE).scoreDocs;
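
    To get a feel for why the heap runs out, here is a rough estimate of what it costs to hold every hit at once. It is a sketch under stated assumptions, not a measurement: it assumes about 28 bytes per hit (a 24-byte ScoreDoc object plus a 4-byte array reference, which is typical for a 64-bit HotSpot JVM with compressed references). The class name HitHeapEstimate and the hit counts are made up for illustration. The stored field text that the loop appends to the buffer comes on top of this figure.

```java
// Back-of-the-envelope heap estimate for keeping all hits in memory.
// ASSUMPTION: per-hit sizes below are typical JVM values, not measured ones.
final class HitHeapEstimate {
    // Assumed: 12-byte object header + int doc + float score + int shardIndex
    static final long BYTES_PER_SCORE_DOC = 24L;
    // Assumed: one compressed reference per slot in the ScoreDoc[] array
    static final long BYTES_PER_REFERENCE = 4L;

    // Estimated bytes needed to hold hitCount ScoreDoc objects in an array
    static long estimateBytes(long hitCount) {
        return hitCount * (BYTES_PER_SCORE_DOC + BYTES_PER_REFERENCE);
    }

    public static void main(String[] args) {
        long allAtOnce = estimateBytes(100_000_000L); // hypothetical index size
        long onePage = estimateBytes(1_000_000L);     // hypothetical page size
        System.out.println("all hits at once: " + allAtOnce / 1_000_000 + " MB");
        System.out.println("one page:         " + onePage / 1_000_000 + " MB");
    }
}
```

    Under these assumptions, the cost grows linearly with the number of hits requested, so capping the number of hits per call caps the memory needed for the hit array.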

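    In that case, we can page through the results. For example, roughly 100 million entries can be split into 100 pages of 1 million each, so that each pass handles only 1 million index entries and the error above is avoided. In Lucene, the searchAfter method provides this. Its official API description is as follows: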

    public TopDocs searchAfter(ScoreDoc after,
                      Query query,
                      int n)
                        throws IOException
    Finds the top n hits for query where all results are after a previous result (after).

    By passing the bottom result from a previous page as after, this method can be used for efficient 'deep-paging' across potentially large result sets.

    Throws:
    BooleanQuery.TooManyClauses
    IOException

    Using this method, the query can be rewritten to process one page at a time:

    // Concatenates one page of hits as "key;value|" entries
    private String transToContent(IndexSearcher searcher, TopDocs topDocs) throws IOException {
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < scoreDocs.length; i++) {
            Document doc = searcher.doc(scoreDocs[i].doc);
            sb.append(doc.get("key")).append(';').append(doc.get("value")).append('|');
        }
        return sb.toString();
    }

    private void matchAll(String path) {
        // try-with-resources closes the reader and the directory when done
        try (Directory directory = FSDirectory.open(new File(path));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new MatchAllDocsQuery();

            // A null cursor makes the first call start from the top hit.
            // Preference.PAGE_SIZE is the author's own constant (hits per page),
            // defined elsewhere and not shown here.
            ScoreDoc after = null;
            TopDocs topDocs = searcher.searchAfter(after, query, Preference.PAGE_SIZE);
            int curPage = 1;
            while (topDocs.scoreDocs.length > 0) {
                System.out.println("Current Page:" + (curPage++));
                System.out.println(transToContent(searcher, topDocs));
                // The last hit of this page becomes the cursor for the next page
                after = topDocs.scoreDocs[topDocs.scoreDocs.length - 1];
                topDocs = searcher.searchAfter(after, query, Preference.PAGE_SIZE);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
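
    The loop above follows a general cursor pattern: fetch a page, process it, then pass the last item of that page as the starting point for the next request. The sketch below models that pattern in plain Java with no Lucene dependency, so the control flow can be run and checked in isolation. The class CursorPagingSketch and its searchAfter helper are hypothetical stand-ins written for this illustration; they are not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "search after" contract over a sorted array of ids.
final class CursorPagingSketch {

    // Hypothetical stand-in for searchAfter: returns up to n ids that come
    // strictly after the cursor; a null cursor means "start from the top".
    static List<Integer> searchAfter(Integer after, int[] sortedIds, int n) {
        List<Integer> page = new ArrayList<>();
        for (int id : sortedIds) {
            if (after != null && id <= after) {
                continue; // at or before the cursor, already returned earlier
            }
            if (page.size() == n) {
                break;    // page is full
            }
            page.add(id);
        }
        return page;
    }

    // Same loop shape as matchAll: stop when a request returns an empty page.
    static int countPages(int[] sortedIds, int pageSize) {
        int pages = 0;
        Integer after = null;
        List<Integer> page = searchAfter(after, sortedIds, pageSize);
        while (!page.isEmpty()) {
            pages++;
            after = page.get(page.size() - 1); // last item is the next cursor
            page = searchAfter(after, sortedIds, pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5, 6, 7};
        System.out.println("pages: " + countPages(ids, 3));
    }
}
```

    Note that the loop only learns it is finished when a request comes back empty, so it always issues one more request than there are non-empty pages.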
  • Original article: https://www.cnblogs.com/nashiyue/p/4652506.html