  • Lucene: Finding Documents via an Index

    What it does: once an index has been built for every file with the .txt extension in a folder, look up the documents that match a keyword.

    For how to build the index for the text files, see: http://www.cnblogs.com/eczhou/archive/2011/11/21/2257753.html
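To make "find documents by keyword from an index" concrete, here is a minimal plain-Java sketch of an inverted index, the general idea behind a keyword lookup. It is an illustration only and not how Lucene stores or searches its index; the class name `ToyIndex`, the file names, and the sample text are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: NOT Lucene's format or API, only an illustration.
public class ToyIndex {
    // term -> sorted set of paths of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // Split the text on non-alphanumeric characters and record each lowercased term.
    public void add(String path, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            Set<String> paths = postings.get(term);
            if (paths == null) {
                paths = new TreeSet<String>();
                postings.put(term, paths);
            }
            paths.add(path);
        }
    }

    // Look up a single keyword; returns an empty set when nothing matches.
    public Set<String> search(String keyword) {
        Set<String> paths = postings.get(keyword.toLowerCase(Locale.ROOT));
        return paths == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(paths);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt", "Apache Lucene is a search project"); // hypothetical file
        index.add("b.txt", "Notes about something else");        // hypothetical file
        System.out.println(index.search("project")); // prints [a.txt]
    }
}
```

The lookup never rescans the documents: all the work happens at indexing time, which is why the indexing step and the search step can live in separate programs, as they do in this post.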

    Development environment: Lucene 3.4.0, Eclipse Indigo, and JDK 1.6.0.

    The Searcher class in the mytest package looks up the matching files in the index by keyword. The code is as follows:

    package mytest;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // From chapter 1

    /**
     * This code was originally written for
     * Erik's Lucene intro java.net article
     */
    public class Searcher {

        public static void main(String[] args) throws IOException, ParseException {
            // Command-line handling is disabled; the values below are hard-coded instead.
            // if (args.length != 2) {
            //     throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
            //         + " <index dir> <query>");
            // }

            String indexDir = "F:\\lucene\\dir"; //1
            String q = "project";                //2

            search(indexDir, q);
        }

        public static void search(String indexDir, String q)
                throws IOException, ParseException {

            Directory dir = FSDirectory.open(new File(indexDir)); //3
            IndexSearcher is = new IndexSearcher(dir);            //3
            try {
                // Use the same Version for the parser and the analyzer
                // (the two were mismatched: LUCENE_30 and LUCENE_34).
                QueryParser parser = new QueryParser(Version.LUCENE_34, //4
                        "contents",                                     //4
                        new StandardAnalyzer(Version.LUCENE_34));       //4
                Query query = parser.parse(q);                          //4
                long start = System.currentTimeMillis();
                TopDocs hits = is.search(query, 10);                    //5
                long end = System.currentTimeMillis();

                System.err.println("Found " + hits.totalHits +          //6
                        " document(s) (in " + (end - start) +           //6
                        " milliseconds) that matched query '" +         //6
                        q + "':");                                      //6

                for (ScoreDoc scoreDoc : hits.scoreDocs) {
                    Document doc = is.doc(scoreDoc.doc);                //7
                    System.out.println(doc.get("fullpath"));            //8
                }
            } finally {
                // Release the searcher and the directory even if the search fails.
                is.close();                                             //9
                dir.close();
            }
        }
    }

    /*
    #1 Parse provided index directory
    #2 Parse provided query string
    #3 Open index
    #4 Parse query
    #5 Search index
    #6 Write search stats
    #7 Retrieve matching document
    #8 Display filename
    #9 Close IndexSearcher
    */
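Annotations #1 and #2 mention parsing the index directory and query, but the listing hard-codes both and leaves the argument check commented out. The following is a minimal sketch of one way to support both modes. The class `SearcherArgs` and its `resolve` method are hypothetical helpers, not part of Lucene or of the original article; the defaults are simply the hard-coded values from the listing.

```java
// Hypothetical helper: resolves the index directory and query for Searcher.
public class SearcherArgs {
    // Defaults copied from the hard-coded values in the listing above.
    public static final String DEFAULT_INDEX_DIR = "F:\\lucene\\dir";
    public static final String DEFAULT_QUERY = "project";

    // Returns {indexDir, query}: the defaults when no arguments are given,
    // the two supplied values when exactly two are given, and an error otherwise.
    public static String[] resolve(String[] args) {
        if (args.length == 0) {
            return new String[] { DEFAULT_INDEX_DIR, DEFAULT_QUERY };
        }
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                    + SearcherArgs.class.getName() + " <index dir> <query>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        String[] resolved = resolve(args);
        System.out.println("index dir: " + resolved[0]);
        System.out.println("query: " + resolved[1]);
        // In the real program, Searcher.search(resolved[0], resolved[1]) would be called here.
    }
}
```

Falling back to defaults keeps the program runnable from the IDE with no launch configuration, while still rejecting a half-specified command line.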

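    When run, the program prints a summary line (the hit count and the elapsed time in milliseconds) to standard error, then the full path of each matching file to standard output.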

  • Original post: https://www.cnblogs.com/eczhou/p/2258435.html