Creating an Index over Preprocessed Documents with Lucene (Runnable)

Date: 2015/3/18

杨鑫newlife

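Once the documents have been preprocessed, we can start using Lucene to work with their content.

The Lucene workflow used here is as follows:

First, build an index for the objects to be processed.

Second, construct a query object.

Third, search the index.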
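To make the three steps concrete, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene: the class `ToyInvertedIndex`, its method names, and the simple punctuation-based tokenizer are hypothetical stand-ins chosen for illustration (the post itself relies on Lucene with `MMAnalyzer` for word segmentation).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: a hand-rolled inverted index, NOT Lucene's API.
class ToyInvertedIndex {
    // term -> sorted set of names of the documents that contain the term
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Step 1: index a document by splitting its text into lowercase terms.
    public void addDocument(String name, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(name);
        }
    }

    // Step 2: build a "query object"; here it is simply the list of query terms.
    public static List<String> parseQuery(String query) {
        return tokenize(query);
    }

    // Step 3: search the index; return the documents containing every query term.
    public List<String> search(List<String> queryTerms) {
        TreeSet<String> result = null;
        for (String term : queryTerms) {
            TreeSet<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(result);
    }

    // Splits on anything that is not a letter or digit (stand-in for an analyzer).
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("a.txt", "Lucene builds an index");
        index.addDocument("b.txt", "An index speeds up search");
        // Only b.txt contains both "index" and "search".
        System.out.println(index.search(parseQuery("index search")));
    }
}
```

The sketch treats a query as a conjunction of terms, which is one simple choice among many; a real search library offers far richer query types and ranking.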

The code below, built on Lucene, covers the index-creation step.

Code:

package ch2.lucenedemo.process;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import jeasy.analysis.MMAnalyzer;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
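import org.apache.lucene.index.IndexWriter;

public class IndexProcessor {
    // Directory where the generated index files are stored
    // (backslashes must be escaped inside a Java string literal)
    private static final String INDEX_STORE_PATH = "E:\\Lucene项目\\索引文件夹";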

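    // Create the index for every .txt file in inputDir
    public void createIndex(String inputDir) {
        try {
            System.out.println("Program started. Creating index ->->->->->");
            // The 'true' argument creates a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(INDEX_STORE_PATH, new MMAnalyzer(), true);

            File filesDir = new File(inputDir);

            // Get the array of all files that need to be indexed
            File[] files = filesDir.listFiles();
            if (files == null) {
                // listFiles() returns null when inputDir is not a readable directory
                files = new File[0];
            }

            // Iterate over the array
            for (int i = 0; i < files.length; i++) {

                // Get the file name
                String fileName = files[i].getName();

                // Only index .txt files; endsWith is also safe for names without a dot
                if (files[i].isFile() && fileName.endsWith(".txt")) {

                    // Skip files whose content could not be read
                    String content = loadFileToString(files[i]);
                    if (content == null) {
                        continue;
                    }

                    // Create a new Document
                    Document doc = new Document();
                    System.out.println("Indexing file name ->->->->");
                    // Create a Field for the file name (stored and tokenized)
                    Field field = new Field("filename", fileName, Field.Store.YES, Field.Index.TOKENIZED);
                    doc.add(field);
                    System.out.println("Indexing file content ->->->->");
                    // Create a Field for the file content (tokenized but not stored)
                    field = new Field("content", content, Field.Store.NO, Field.Index.TOKENIZED);
                    doc.add(field);

                    // Add the Document to the IndexWriter
                    writer.addDocument(doc);
                }
            }
            writer.close();
            System.out.println("Index creation finished ->->->->");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }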

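    /*
     * Read the content of the file and return all of it as a single String.
     * Note: FileReader decodes the file with the platform default charset.
     */
    public String loadFileToString(File file) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            StringBuffer sb = new StringBuffer();
            String line = br.readLine();
            while (line != null) {
                // readLine() strips the line terminator, so add one back; otherwise
                // the last word of a line fuses with the first word of the next
                sb.append(line).append('\n');
                line = br.readLine();
            }
            br.close();
            return sb.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }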

    public static void main(String[] args) {
        IndexProcessor ip = new IndexProcessor();
        // Directory holding the preprocessed target files (backslashes escaped)
        ip.createIndex("E:\\Lucene项目\\目标文件");
    }
}
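The file-loading step above decodes text with the platform default charset, which can garble Chinese content if the files were saved in a different encoding. The following is a minimal sketch of an alternative, assuming Java 8 or later and assuming the preprocessed files are UTF-8 (the post does not state their encoding). The class `TextFileLoader` and its methods are hypothetical helpers, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the original post.
class TextFileLoader {
    // True for names ending in ".txt" (case-insensitive); safe for names without a dot.
    public static boolean isTxtFile(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    // Reads the whole file with an explicit charset, joining lines with '\n'.
    public static String load(Path file, Charset charset) throws IOException {
        List<String> lines = Files.readAllLines(file, charset);
        return String.join("\n", lines);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample file, then read it back.
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "Lucene \u7d22\u5f15\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(isTxtFile(tmp.getFileName().toString()));
        System.out.println(load(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Passing the charset explicitly keeps the result independent of the machine the indexer runs on; if the files use another encoding such as GBK, `Charset.forName("GBK")` could be passed instead.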


Original article: https://www.cnblogs.com/liguangsunls/p/6891270.html