  • Lucene in 5 Minutes

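    Lucene makes it easy to add full-text search to your application. In fact, it is so easy that I will show you how in 5 minutes!
    (Translator's note: in practice, you first need to understand how a search engine works and the basic concepts of Lucene.)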
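
As a rough illustration of the core idea behind a search engine index, here is a toy inverted index in plain Java. It is only a sketch under simplifying assumptions: the class ToyInvertedIndex and its methods tokenize, add and search are hypothetical names made up for this example, the tokenizer is a crude stand-in for a real analyzer, and none of it reflects how Lucene actually stores its index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index (hypothetical, not Lucene code):
// maps each lowercased term to the ids of the documents that contain it.
class ToyInvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new LinkedHashMap<>();

    // Lowercase the text and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Store the document and record its id under every term it contains.
    void add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Skip the id if this term already occurred earlier in the same document.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
    }

    // Look up a single term and return the matching documents in insertion order.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        List<Integer> ids = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Integer>emptyList());
        for (int id : ids) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene in Action");
        index.add("Lucene for Dummies");
        index.add("Managing Gigabytes");
        // prints [Lucene in Action, Lucene for Dummies]
        System.out.println(index.search("lucene"));
    }
}
```

Because terms are lowercased both when documents are added and when they are looked up, the query "lucene" matches titles that spell it "Lucene". Using the same text processing on both sides is the reason a single analyzer is normally shared between indexing and searching.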

    1. Index.
       For this simple example, we build an in-memory index from a few strings.

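    // The same analyzer should be used for both indexing and searching.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    // addDoc is a small helper defined in the complete listing at the end.
    addDoc(w, "Lucene in Action");
    addDoc(w, "Lucene for Dummies");
    addDoc(w, "Managing Gigabytes");
    addDoc(w, "The Art of Computer Science");
    w.close();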


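    2. Query.
       Take the query string from the first command-line argument (falling back to "lucene" when none is given), parse it, and build a Lucene Query from it.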

    String querystr = args.length > 0 ? args[0] : "lucene";
    Query q = new QueryParser(Version.LUCENE_36, "title", analyzer).parse(querystr);


     

    3. Search.
       Create an IndexSearcher over the index, then instantiate a TopScoreDocCollector to collect the top 10 hits (translator's note: the search results) for the Query.

    int hitsPerPage = 10;
    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
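
To give a feel for what "collecting the top 10 hits" involves, the following sketch keeps only the k best-scoring candidates using a bounded min-heap. It is a minimal illustration, not Lucene's TopScoreDocCollector: the class TopKSketch, the Hit pair and the scores in main are all hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of top-k collection (not Lucene code).
class TopKSketch {
    // A (document id, score) pair, loosely analogous to a search hit.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Return the k highest-scoring hits, best first.
    static List<Hit> topK(List<Hit> candidates, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap: the weakest hit kept so far sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(k, 1), byScore);
        for (Hit h : candidates) {
            if (heap.size() < k) {
                heap.add(h);
            } else if (k > 0 && h.score > heap.peek().score) {
                // The new hit beats the weakest retained one, so swap them.
                heap.poll();
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Hit> candidates = Arrays.asList(
                new Hit(0, 0.2f), new Hit(1, 0.9f), new Hit(2, 0.5f), new Hit(3, 0.7f));
        for (Hit h : topK(candidates, 2)) {
            System.out.println("doc " + h.doc + " score " + h.score);
        }
    }
}
```

The heap never holds more than k entries, so memory stays bounded no matter how many documents match the query. With the made-up scores in main and k = 2, documents 1 and 3 are kept.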


     

    4. Display.
       Now that we have the search results, we simply print them.

    System.out.println("Found " + hits.length + " hits.");
    for(int i=0;i<hits.length;++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i + 1) + ". " + d.get("title"));
    }


     


    The complete code is as follows:
    (Translator's note: revised for Lucene 3.6, the latest version at the time of writing.)

    package hellolucene;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    
    import java.io.IOException;
    
    
    public class HelloLucene {
      public static void main(String[] args) throws IOException, ParseException {
        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    
        // 1. create the index
        Directory index = new RAMDirectory();
    
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
    
        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "Lucene in Action");
        addDoc(w, "Lucene for Dummies");
        addDoc(w, "Managing Gigabytes");
        addDoc(w, "The Art of Computer Science");
        w.close();
    
        // 2. query
        String querystr = args.length > 0 ? args[0] : "lucene";
    
        // the "title" arg specifies the default field to use
        // when no field is explicitly specified in the query.
        Query q = new QueryParser(Version.LUCENE_36, "title", analyzer).parse(querystr);
    
        // 3. search
        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
    
        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for(int i=0;i<hits.length;++i) {
          int docId = hits[i].doc;
          Document d = searcher.doc(docId);
          System.out.println((i + 1) + ". " + d.get("title"));
        }
    
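        // Close the searcher and the reader only once the documents
        // no longer need to be accessed. Closing the searcher does not
        // close a reader that was passed to its constructor.
        searcher.close();
        reader.close();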
      }
    
      private static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
        w.addDocument(doc);
      }
    }

    Output:

    Found 2 hits.
    1. Lucene in Action
    2. Lucene for Dummies

    Original article: http://www.lucenetutorial.com/lucene-in-5-minutes.html

  • Source of this translation: https://www.cnblogs.com/fdyang/p/2858725.html