Full-Text Search with Lucene: Lucene Basics

Lucene is a full-text search engine library.

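I. Before learning Lucene, consider why Lucene exists.

1. In our earlier applications, search was backed by the database and done with SQL LIKE. LIKE is inefficient: a pattern with a leading wildcard such as '%keyword%' generally cannot use an ordinary index, so every row has to be scanned. That does not meet the application's requirements.

Building an index over the data with Lucene makes search fast enough for production use.

With Lucene we first build an index over all the content, then find matches by looking them up in that index instead of scanning the content itself. Fast retrieval is the reason Lucene exists.

2. Some searches are beyond what a database can reasonably do, for example matching against the contents of attachments. Implementing that on a traditional database is complex and slow. Once the attachment text has been extracted, Lucene can index and search it easily.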
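The contrast in point 1 between a LIKE scan and an index lookup can be sketched in plain Java. This is only an illustration under stated assumptions: the class InvertedIndexSketch and its methods are hypothetical names, the text is assumed to be ASCII words, and the map-based structure is far simpler than what Lucene actually stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene's data structure or API.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Store the document and record each of its terms in the postings map.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // LIKE '%needle%' style: every document is scanned on every query.
    List<Integer> scan(String needle) {
        String n = needle.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).toLowerCase(Locale.ROOT).contains(n)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Index style: a single map lookup, independent of the number of documents.
    List<Integer> lookup(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add("Lucene is a search library");
        idx.add("A database LIKE query scans rows");
        idx.add("Build the index first, then search it");
        System.out.println(idx.scan("search"));   // [0, 2]
        System.out.println(idx.lookup("search")); // [0, 2]
    }
}
```

The two methods are not equivalent: scan matches any substring, while lookup matches whole terms only. That difference is why a full-text engine needs a tokenization step before indexing.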

II. Application areas

　　OA (office automation) and CMS systems, and Internet companies in general.

III. Versions

　　2.9
　　3.0: major changes
　　3.5: some changes

IV. Lucene basics

　　1. Download

　　　　http://lucene.apache.org
　　　　Download version 3.5.0 from the official site.
　　　　Release archive: http://archive.apache.org/dist/lucene/java/

　　　　At the time of writing, the available releases had reached:

　　　　　　5.0.0/                          2015-02-19 08:46    -

　　　　　　5.1.0/                          2015-04-13 15:08    -

　　2. Using Lucene

　　　　Every full-text search tool is made up of these three parts:

　　　　1> Indexing

　　　　2> Analysis (tokenization)

　　　　3> Searching
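Of the three parts listed above, analysis is the least visible in application code, so here is a minimal sketch of what a tokenizer does: split text into terms, lowercase them, and drop common stop words. The class SimpleTokenizerSketch and its stop-word list are assumptions made for illustration; this is not a Lucene Analyzer, and real analyzers handle far more cases (other languages, numbers, punctuation rules).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tokenizer for illustration; not a Lucene Analyzer.
class SimpleTokenizerSketch {
    // A tiny, assumed stop-word list; real analyzers ship their own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "to"));

    // Split on anything that is not a letter or digit, lowercase, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                flush(current, tokens);
            }
        }
        flush(current, tokens); // emit the last token, if any
        return tokens;
    }

    // Emit the buffered token unless it is empty or a stop word, then reset the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [lucene, full, text, search, library]
        System.out.println(tokenize("Lucene is a Full-Text search library."));
    }
}
```

The same analysis has to be applied at index time and at query time; otherwise a query term such as "Lucene" would never match the indexed term "lucene".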

　　3. Basic example

　　　　1> Creating the index

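// Imports required by this method (Lucene 3.5)
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Build the index
public void index() {
    IndexWriter indexWriter = null;
    try {
        // 1. Create the Directory: where the index lives (memory or disk).
        //    Use new RAMDirectory() instead to keep the index in memory.
        Directory directory = FSDirectory.open(new File("/Users/apple/Downloads/lucene/index")); // on disk
        // 2. Create the IndexWriter, which writes the index
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
        indexWriter = new IndexWriter(directory, config);
        File dir = new File("/Users/apple/Downloads/lucene/lucene/example");
        File[] files = dir.listFiles(); // null if the path is not a readable directory
        if (files == null) {
            return;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip subdirectories
            }
            // 3. Create a Document
            Document doc = new Document();
            // 4. Add Fields to the Document
            doc.add(new Field("content", new FileReader(file))); // tokenized and indexed, not stored
            doc.add(new Field("name", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // 5. Add the Document to the index through the IndexWriter
            indexWriter.addDocument(doc);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (IOException e) { // also covers CorruptIndexException, a subclass
                e.printStackTrace();
            }
        }
    }
}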

　　　　2> Searching the index

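// Imports required by this method (Lucene 3.5)
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public void search() {
    IndexReader indexReader = null;
    IndexSearcher indexSearcher = null;
    try {
        // 1. Open the Directory that holds the index (on disk)
        Directory directory = FSDirectory.open(new File("/Users/apple/Downloads/lucene/index"));
        // 2. Create the IndexReader
        indexReader = IndexReader.open(directory);
        // 3. Create the IndexSearcher from the IndexReader
        indexSearcher = new IndexSearcher(indexReader);
        // 4. Create the Query.
        //    The QueryParser's second argument is the default field to search.
        QueryParser parser = new QueryParser(Version.LUCENE_35, "content", new StandardAnalyzer(Version.LUCENE_35));
        Query query = parser.parse("Lucene");
        // 5. Search and get a TopDocs holding at most the top 10 hits
        TopDocs topDocs = indexSearcher.search(query, 10);
        // 6. Get the ScoreDoc array from the TopDocs
        ScoreDoc[] sds = topDocs.scoreDocs;
        for (ScoreDoc sd : sds) {
            int docID = sd.doc; // document id
            // 7. Load the Document for this id through the IndexSearcher
            Document doc = indexSearcher.doc(docID);
            // 8. Read the stored values from the Document
            System.out.println(doc.get("name"));
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // 9. Close the searcher, then the reader it was built on
        //    (a searcher created from an IndexReader does not close that reader)
        try {
            if (indexSearcher != null) indexSearcher.close();
            if (indexReader != null) indexReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}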
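Step 5 above asks for only the 10 best hits rather than every match. One common way to keep the n best of many scored candidates is a bounded min-heap, sketched below in plain Java. The class TopNSketch and its Hit type are hypothetical stand-ins, and this is not presented as Lucene's own collector code; it only shows why asking for a small n is cheap even when many documents match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "keep the n best hits"; not Lucene's collector code.
class TopNSketch {
    // Stand-in for a search hit: a document id plus a relevance score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Return at most n hits, best score first.
    static List<Hit> topN(List<Hit> hits, int n) {
        // Min-heap on score: the weakest retained hit sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(
                Math.max(1, n), (a, b) -> Float.compare(a.score, b.score));
        for (Hit h : hits) {
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll(); // evict the current weakest hit
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        return best;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 0.5f), new Hit(1, 2.0f), new Hit(2, 1.0f), new Hit(3, 0.1f));
        for (Hit h : topN(hits, 2)) {
            System.out.println(h.doc); // prints 1, then 2
        }
    }
}
```

The heap never holds more than n + 1 entries, so memory stays bounded by n regardless of how many candidates are scored.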

　　4. System architecture

Original article: https://www.cnblogs.com/hnxubin/p/4472642.html