
Building a simple search engine with Lucene 5.3.1 + IKAnalyzer

Use case

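I recently needed to build a simple information display system. The items are much like ordinary news articles: each has a title and a body, and users must be able to find them by keyword. Because both the data and the search requirements are simple (search by title and content only), I didn't want to set up a full Solr search engine. Instead I used plain Lucene to write a small search component that can add documents to the index, delete them, and find items by keyword.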

Dependencies

The project uses Lucene 5.3.1, the latest version at the time of writing. The Maven configuration is:

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>5.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>5.3.1</version>
        </dependency>

Choosing a Chinese analyzer

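The project has no strict requirements on search precision, so the widely used IK Analyzer is good enough. The version used here is IKAnalyzer2012_V5.jar, which is easy to find with a web search (for example on Baidu).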
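To see how IK splits a piece of text (and therefore how a query will be parsed), it can help to print the tokens it produces. The following is a minimal sketch that walks a Lucene 5.x `TokenStream`; the class `TokenDemo` and its `tokenize` method are hypothetical names, and it assumes the IK jar exposes `org.wltea.analyzer.lucene.IKAnalyzer`, as IK Analyzer 2012 does.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

public class TokenDemo {

    /** Returns the terms the analyzer emits for the given text, in order. */
    public static List<String> tokenize(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // A TokenStream must be closed before the analyzer can hand out another one
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());   // current token's text
            }
            ts.end();                         // finalizes offsets; close() runs via try-with-resources
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new IKAnalyzer()) {
            System.out.println(tokenize(analyzer, "name", "中美两国要牢牢把握构建新型大国关系正确方向"));
        }
    }
}
```

Running the same method on a query string such as 奥巴马 shows exactly which terms `QueryParser` will receive.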

Test documents

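Two test news articles were placed in the C:/MyIndex/doc directory, as shown in the figure below.
[Figure: the test articles prepared for searching]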

Creating the index

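    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

    public static void indexFile() throws IOException {
        Analyzer analyzer = new IKAnalyzer();
        // Default OpenMode is CREATE_OR_APPEND: running this twice adds every file twice
        try (Directory dir = FSDirectory.open(Paths.get("C:/MyIndex"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            // Forward slashes: "C:\MyIndex\doc" does not compile (\M and \d are illegal escapes)
            File[] files = new File("C:/MyIndex/doc").listFiles();
            if (files == null) throw new IOException("C:/MyIndex/doc is not a readable directory");
            for (File f : files) {
                String title = f.getName();                     // the file name serves as the title
                StringBuilder content = new StringBuilder();
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        content.append(line).append('\n');  // keep line breaks between lines
                    }
                }
                Document doc = new Document();
                doc.add(new TextField("name", title, Field.Store.YES));
                doc.add(new TextField("content", content.toString(), Field.Store.YES));
                writer.addDocument(doc);
                System.out.println("add " + title);
            }
        } // closing the IndexWriter commits the added documents
    }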
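The use case also calls for deleting entries from the index, which the indexing code does not cover. One way to do this, sketched here under assumptions, is to give each document an untokenized key field and pass a `Term` on that key to `IndexWriter.updateDocument` or `IndexWriter.deleteDocuments`. The key field name `path`, the class `IndexMaintenance`, and its methods are all hypothetical. A `TextField` such as `name` would not work as a key, because its value is split into tokens and no single term equals the whole title.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

public class IndexMaintenance {

    // Hypothetical key field: a StringField is indexed as one untokenized term
    static final String KEY = "path";

    /** Adds the document, or atomically replaces any existing document with the same key. */
    public static void upsert(IndexWriter writer, String key, String title, String content) throws IOException {
        Document doc = new Document();
        doc.add(new StringField(KEY, key, Field.Store.YES));
        doc.add(new TextField("name", title, Field.Store.YES));
        doc.add(new TextField("content", content, Field.Store.YES));
        // updateDocument = delete all documents matching the term, then add doc
        writer.updateDocument(new Term(KEY, key), doc);
    }

    /** Deletes every document whose key equals the given value. */
    public static void delete(IndexWriter writer, String key) throws IOException {
        writer.deleteDocuments(new Term(KEY, key));
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new IKAnalyzer();
        try (Directory dir = FSDirectory.open(Paths.get("C:/MyIndex"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            upsert(writer, "demo-1", "示例标题", "示例内容");
            delete(writer, "demo-1");
            writer.commit(); // make the changes visible to newly opened readers
        }
    }
}
```

In `indexFile()`, something like `f.getPath()` could serve as the key, so that re-running the indexer replaces documents rather than duplicating them.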

Testing the search

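    import java.io.IOException;
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

    public static void search() throws IOException, ParseException {
        Analyzer analyzer = new IKAnalyzer();
        try (Directory dir = FSDirectory.open(Paths.get("C:/MyIndex"));
             IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("name", analyzer); // searches the title field only
            String queries = "奥巴马";
            Query query = parser.parse(queries);
            // toString() prints every clause with its field name, e.g. "name:奥 name:巴马"
            System.out.println("Searching for: " + query.toString());
            TopDocs results = searcher.search(query, 10);           // top 10 hits by score
            System.out.println("Total match：" + results.totalHits);
            int count = 1;
            for (ScoreDoc hit : results.scoreDocs) {
                Document doc = searcher.doc(hit.doc);               // load stored fields
                System.out.println(count + "  " + doc.get("name") + ", " + hit.score);
                count++;
            }
        }
    }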
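The goal stated at the start is to search by title and content, but `search()` only queries the `name` field. A minimal sketch of querying both fields at once uses `MultiFieldQueryParser` from the same lucene-queryparser module; with its default OR operator, a match in either field counts as a hit. The class `TitleContentSearch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

public class TitleContentSearch {

    /** Returns the stored "name" of the top n hits for keywords matched against name or content. */
    public static List<String> search(Directory dir, Analyzer analyzer, String keywords, int n)
            throws IOException, ParseException {
        List<String> titles = new ArrayList<>();
        try (IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Each analyzed keyword is expanded into one clause per field
            MultiFieldQueryParser parser =
                    new MultiFieldQueryParser(new String[] {"name", "content"}, analyzer);
            Query query = parser.parse(keywords);
            TopDocs results = searcher.search(query, n);
            for (ScoreDoc hit : results.scoreDocs) {
                Document doc = searcher.doc(hit.doc);
                titles.add(doc.get("name"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws IOException, ParseException {
        try (Directory dir = FSDirectory.open(Paths.get("C:/MyIndex"))) {
            for (String title : search(dir, new IKAnalyzer(), "奥巴马", 10)) {
                System.out.println(title);
            }
        }
    }
}
```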

Output

    add 习近平会见奥巴马： 中美两国要牢牢把握构建新型大国关系正确方向.txt
    add 最高法紧急下令暂缓运毒7.5公斤农民死刑执行.txt
    Searching for: name:奥 name:巴马
    Total match：1
    1  习近平会见奥巴马： 中美两国要牢牢把握构建新型大国关系正确方向.txt, 0.30935922
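In the output above, IK split the single keyword 奥巴马 (Obama) into 奥 and 巴马, and `QueryParser` joined the pieces with its default OR operator, so a title containing only 巴马 would also match. If every token should be required, a minimal sketch is to switch the parser's default operator to AND. The class `AndQueryDemo` and method `parseAll` are hypothetical names; `setDefaultOperator` is part of the classic query parser.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.wltea.analyzer.lucene.IKAnalyzer; // IK Analyzer 2012 package name (assumed)

public class AndQueryDemo {

    /** Parses keywords so that every analyzed token must match (AND instead of the default OR). */
    public static Query parseAll(Analyzer analyzer, String field, String keywords) throws ParseException {
        QueryParser parser = new QueryParser(field, analyzer);
        parser.setDefaultOperator(QueryParser.Operator.AND);
        return parser.parse(keywords);
    }

    public static void main(String[] args) throws ParseException {
        Query q = parseAll(new IKAnalyzer(), "name", "奥巴马");
        // Required (MUST) clauses are printed with a leading '+'
        System.out.println("Searching for: " + q);
    }
}
```

The trade-off is recall: with AND, a document must contain all the pieces IK produced, which is usually what a user typing a single name expects.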


Original article: https://www.cnblogs.com/whzhaochao/p/5023403.html