  • [Hands-On Full-Text Search] Lucene Index Operations: Add, Delete, Update, Query

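    Preface

      Anyone who works on search has probably come across Lucene. It is open source and easy to pick up, and the official API documentation is enough to write small demos. It relies on an inverted index to make retrieval fast. This post walks through simple implementations of incrementally adding to an index, deleting from it, querying it by keyword, and updating it.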
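      To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is a conceptual toy, not how Lucene stores its index: the class name `InvertedIndexSketch` and the simple split-on-punctuation tokenizer are assumptions made for illustration. The point is only that a term lookup becomes a single map access instead of a scan over every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercase term to the IDs of the documents
// that contain it. Illustration only; Lucene's real structures are far richer.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and record term -> docId.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A lookup is one map access; no document text is scanned.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "hello,man!");
        index.add(2, "goodbye,man!");
        index.add(3, "hello,woman!");
        System.out.println(index.search("hello")); // documents 1 and 3
        System.out.println(index.search("man"));   // documents 1 and 2
    }
}
```

      Note that "woman" is its own term, so a search for "man" does not match document 3.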

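      What the author currently finds inconvenient is that, to run full-text search over file contents, you have to write the file-reading code yourself (Solr provides this for free). Index creation is also fairly slow and leaves a lot of room for optimization, which deserves careful study.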

      Creating an index

      The previous post already covered the overall flow of creating an index with Lucene; here is a brief recap:

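    // Lucene 4.x API: FSDirectory.open takes a java.io.File
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc); // without this call the document is never indexed
    iwriter.close();          // close() also commits pending changes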

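      1 Create a Directory that points at the index location

      2 Create the analyzer and the IndexWriter

      3 Create a Document and add the data to it

      4 Close the IndexWriter, which commits the changes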

    // Imports needed by index(), insert(), delete() and update() (Lucene 4.x):
    import java.io.File;
    import java.util.Date;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // analyzer, directory, indexWriter and INDEX_DIR are static fields of the
    // enclosing class, which is not shown here.

    /**
     * Build the index.
     *
     * @throws Exception
     */
    public static void index() throws Exception {

        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // One document per text, each with a "filename" and a "content" field
        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

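      Incrementally adding to the index

      Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene automatically merges the index files at an appropriate time.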

    /**
     * Add a document to the existing index.
     *
     * @throws Exception
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        // Opening a writer on an existing index directory appends to it
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
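      The "merge at an appropriate time" behaviour can be pictured with a small plain-Java model. This is a deliberately simplified sketch, not Lucene code: the class `SegmentMergeSketch`, the threshold `MAX_SEGMENTS`, and the merge-everything rule are all invented for illustration, and Lucene's real merge policies are considerably more nuanced. What it tries to show is that adding documents writes new data next to the old data rather than rewriting it, and that consolidation happens later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of incremental indexing. Each batch of new documents becomes a
// new immutable "segment"; when too many segments pile up they are merged.
class SegmentMergeSketch {
    static final int MAX_SEGMENTS = 3; // hypothetical threshold

    final List<List<String>> segments = new ArrayList<>();

    // Adding a batch never touches the segments that already exist.
    void commit(List<String> newDocs) {
        segments.add(new ArrayList<>(newDocs));
        if (segments.size() > MAX_SEGMENTS) {
            mergeAll();
        }
    }

    // Combine all segments into one, keeping every document.
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }

    int documentCount() {
        int count = 0;
        for (List<String> segment : segments) {
            count += segment.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SegmentMergeSketch index = new SegmentMergeSketch();
        index.commit(Arrays.asList("text1", "text2", "text3", "text4"));
        index.commit(Arrays.asList("text5"));
        System.out.println(index.segments.size() + " segments, "
                + index.documentCount() + " documents");
    }
}
```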

      

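      Deleting from the index

      Deletion also goes through IndexWriter, via its deleteDocuments method. You can delete by term, which removes every document that contains that term. If you only want to delete a single document, it is best to define a unique ID field and delete by that field.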

    /**
     * Delete documents from the index.
     *
     * @param str the term to delete by, matched against the "filename" field
     * @throws Exception
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // The term is compared exactly against the indexed tokens, which
        // StandardAnalyzer lowercases, so pass a value such as "text1".
        indexWriter.deleteDocuments(new Term("filename", str));

        indexWriter.close(); // close() commits the deletion

        Date date2 = new Date();
        System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
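      The difference between deleting by an arbitrary term and deleting by a unique ID can be sketched in plain Java. This is only a model of the semantics, not Lucene code: the class `DeleteByTermSketch` and its "id" field are hypothetical, and the documents are simple maps. It shows why a unique field is the safe choice when exactly one document should go.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of term-based deletion: every document whose field holds the
// given value is removed, however many there are.
class DeleteByTermSketch {
    final List<Map<String, String>> docs = new ArrayList<>();

    void add(String id, String filename) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);             // unique per document (assumption)
        doc.put("filename", filename); // may be shared by several documents
        docs.add(doc);
    }

    // Returns how many documents were removed.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    public static void main(String[] args) {
        DeleteByTermSketch index = new DeleteByTermSketch();
        index.add("1", "text1");
        index.add("2", "text1"); // same filename, different id
        index.add("3", "text2");
        // Deleting by the unique id removes exactly one document.
        System.out.println(index.deleteDocuments("id", "3"));
        // Deleting by a shared value removes every match.
        System.out.println(index.deleteDocuments("filename", "text1"));
    }
}
```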

      

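      Updating the index

      Lucene has no true in-place update. You can update the documents matched by a term on some field, but under the hood Lucene first deletes the matching documents and then adds the new one.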

    /**
     * Update a document in the index.
     *
     * @throws Exception
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes every document whose "filename" contains the term "text1",
        // then adds doc1 as a new document.
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        indexWriter.close(); // close() commits the update

        Date date2 = new Date();
        System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
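      The delete-then-add behaviour has two consequences that are easy to miss, and a plain-Java model makes them visible. This sketch is an assumption-laden illustration, not Lucene code: `UpdateSketch` and its list of {filename, content} pairs are invented for the example. It shows that an updated document ends up as a fresh entry, and that updating with a key that matches nothing simply adds the document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "update = delete + add".
class UpdateSketch {
    // Each document is a {filename, content} pair.
    final List<String[]> docs = new ArrayList<>();

    void addDocument(String filename, String content) {
        docs.add(new String[] {filename, content});
    }

    void deleteDocuments(String filename) {
        docs.removeIf(doc -> doc[0].equals(filename));
    }

    // Remove everything that matches the key, then append the replacement.
    void updateDocument(String filename, String newContent) {
        deleteDocuments(filename);
        addDocument(filename, newContent);
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.addDocument("text1", "hello,man!");
        index.addDocument("text2", "goodbye,man!");

        index.updateDocument("text1", "update,hello,man!"); // replaces text1
        index.updateDocument("text9", "brand new");         // no match: plain add

        for (String[] doc : index.docs) {
            System.out.println(doc[0] + " -> " + doc[1]);
        }
    }
}
```

      After the two updates the model holds three documents, with the rewritten text1 now stored after text2.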

      

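      Querying the index by keyword

      Lucene offers many kinds of queries, which are not covered in detail here. A search returns an array of ScoreDoc objects, loosely comparable to a ResultSet; for each hit we can load the document and read the stored fields we want by field name.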

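    // Additional imports for searching (Lucene 4.x):
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;

    /**
     * Keyword query.
     *
     * @param str the query string, parsed against the "content" field
     * @throws Exception
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        // Parse the query with the same analyzer that was used for indexing
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        // No filter, at most 1000 hits
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            // Load the stored fields of each hit by its document number
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }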


      References

      http://www.cnblogs.com/xing901022/p/3933675.html

  • Original article: https://www.cnblogs.com/1130136248wlxk/p/4998947.html