
Lucene: Finding Documents by Searching an Index

Overview: once an index has been built for every file with the .txt extension in a folder, look up the relevant documents by keyword.

For how to build the index for the text files, see: http://www.cnblogs.com/eczhou/archive/2011/11/21/2257753.html
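The indexing step only considers files whose names end in .txt. As a rough, JDK-only sketch of that selection step: the class name `TxtFileLister` and its methods are hypothetical, and the case-insensitive match and the non-recursive listing are assumptions, so the linked indexing article may well do this differently.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original article or of Lucene.
final class TxtFileLister {

  // Accepts regular files whose name ends in ".txt".
  // Case-insensitive matching is an assumption, not something the article states.
  static final FileFilter TXT_FILTER = new FileFilter() {
    public boolean accept(File f) {
      return f.isFile() && hasTxtSuffix(f.getName());
    }
  };

  static boolean hasTxtSuffix(String name) {
    return name.toLowerCase(Locale.ROOT).endsWith(".txt");
  }

  // Returns the .txt files directly inside dir (no recursion).
  // Returns an empty list if dir is not a readable directory.
  static List<File> listTxtFiles(File dir) {
    File[] found = dir.listFiles(TXT_FILTER); // null if dir is not a directory
    List<File> result = new ArrayList<File>();
    if (found != null) {
      Collections.addAll(result, found);
    }
    return result;
  }

  public static void main(String[] args) {
    // Lists the .txt files in the given folder, or in the current folder by default.
    File dir = new File(args.length > 0 ? args[0] : ".");
    for (File f : listTxtFiles(dir)) {
      System.out.println(f.getPath());
    }
  }
}
```

Each file returned this way would then be turned into one indexed document; the Searcher in this post expects those documents to carry a `contents` field to search and a stored `fullpath` field to display.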

Development environment: Lucene 3.4.0 + Eclipse Indigo + JDK 1.6.0.

The Searcher class in the mytest package looks up matching files in the index by keyword. The code is as follows:


package mytest;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

// From chapter 1

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Searcher {

  public static void main(String[] args) throws IllegalArgumentException,
        IOException, ParseException {
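    // The command-line check is disabled here; the index directory and
    // the query string are hardcoded below instead.
//    if (args.length != 2) {
//      throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
//        + " <index dir> <query>");
//    }

    String indexDir = "F:\\lucene\\dir";   //1 index directory (hardcoded)
    String q = "project";                  //2 query string (hardcoded)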

    search(indexDir, q);
  }

  public static void search(String indexDir, String q)
    throws IOException, ParseException {

    Directory dir = FSDirectory.open(new File(indexDir)); //3
    IndexSearcher is = new IndexSearcher(dir);            //3

    try {
      // Use the same Version constant for the parser and the analyzer.
      QueryParser parser = new QueryParser(Version.LUCENE_34,  //4
          "contents",                                          //4
          new StandardAnalyzer(Version.LUCENE_34));            //4
      Query query = parser.parse(q);                           //4

      long start = System.currentTimeMillis();
      TopDocs hits = is.search(query, 10);                 //5 top 10 hits only
      long end = System.currentTimeMillis();

      System.err.println("Found " + hits.totalHits +       //6
        " document(s) (in " + (end - start) +              //6
        " milliseconds) that matched query '" +            //6
        q + "':");                                         //6

      for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = is.doc(scoreDoc.doc);               //7
        System.out.println(doc.get("fullpath"));           //8
      }
    } finally {
      is.close();                                //9 also runs if parsing or searching throws
    }
  }
}

/*
#1 Parse provided index directory
#2 Parse provided query string
#3 Open index
#4 Parse query
#5 Search index
#6 Write search stats
#7 Retrieve matching document
#8 Display filename
#9 Close IndexSearcher
*/
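Steps #1 and #2 in the legend describe parsing the index directory and the query from the command line, while the listing hardcodes both. As a minimal sketch of what re-enabling the commented-out check might look like, using only the JDK: the class name `SearcherArgs` and its fields are hypothetical and are not part of the original Searcher class.

```java
// Hypothetical holder for the two command-line arguments; illustration only.
final class SearcherArgs {
  final String indexDir; // directory containing the Lucene index
  final String query;    // query string to hand to the query parser

  private SearcherArgs(String indexDir, String query) {
    this.indexDir = indexDir;
    this.query = query;
  }

  // Expects exactly two arguments: <index dir> <query>.
  static SearcherArgs parse(String[] args) {
    if (args == null || args.length != 2) {
      throw new IllegalArgumentException(
          "Usage: java mytest.Searcher <index dir> <query>");
    }
    return new SearcherArgs(args[0], args[1]);
  }

  public static void main(String[] args) {
    SearcherArgs parsed = parse(args);
    System.out.println("index dir: " + parsed.indexDir);
    System.out.println("query: " + parsed.query);
    // In the Searcher class, this is where
    // search(parsed.indexDir, parsed.query) would be called.
  }
}
```

A query made of several words has to be quoted on the command line so that it arrives as a single argument, for example `java mytest.Searcher F:\lucene\dir "open source project"`.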

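When run, the program prints the hit count and the elapsed search time to standard error, then prints the fullpath field of each matching document (at most 10) to standard output.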


Original article: https://www.cnblogs.com/eczhou/p/2258435.html