<h1>
<span class="link_title"><a href="/tototuzuoquan/article/details/41794169">
2. Lucene 3.6.2 packages, a first Lucene example, the lukeall tool for inspecting index information, index contents as viewed in Luke, and the index lookup process
</a>
</span>
</h1>
<div class="article_manage clearfix">
<div class="article_r">
<span class="link_postdate">2014-12-07 23:39</span>
</div>
</div>
<div class="bog_copyright">
<p class="copyright_p">Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.</p>
</div>
1 Lucene directory overview
2 lucene-core-3.6.2.jar is the core JAR for Lucene development
The contrib directory contains a number of extension JARs
3 Example
Create the first Lucene project: lucene3_day1
(1) First convert the data into a Document object; each piece of data becomes a Field(String name, String value, Field.Store store, Field.Index index)
(2) Specify the index location: Directory directory = FSDirectory.open(new File("index")); // the "index" directory under the current working directory
(3) Choose an analyzer: Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
(4) Write the index:
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
// Write the document into the index
indexWriter.addDocument(document);
// Close the index writer
indexWriter.close();
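Step (3) chooses the analyzer that turns field text into index terms. As a rough illustration of what that step does, here is a minimal sketch in plain Java. It is not Lucene's StandardAnalyzer: the class `ToyAnalyzer` and its `analyze` method are hypothetical names introduced only for this illustration. It assumes a simplified rule set (lowercase everything, split on non-alphanumeric characters, emit each Han character as its own token) and leaves out stop-word removal.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an analyzer (hypothetical; NOT Lucene's StandardAnalyzer). */
class ToyAnalyzer {

    /** Splits text into lowercase tokens; each Han character becomes its own token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (Character.isLetterOrDigit(cp) && !han) {
                // Still inside a non-Han word: keep collecting, lowercased.
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // A Han character or a separator ends the current word.
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the tokens produced for the example article title.
        System.out.println(analyze("Lucene快速入门"));
    }
}
```

Under these simplified rules, the title from the example splits into the term `lucene` plus one term per Chinese character, which suggests why a query for `Lucene` against the title field can match the document.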
Writing the example:
Example directory layout:
Article.java
package cn.toto.lucene.quickstart;

public class Article {
    private int id;
    private String title;
    private String content;

    /** @return the id */
    public int getId() {
        return id;
    }

    /** @param id the id to set */
    public void setId(int id) {
        this.id = id;
    }

    /** @return the title */
    public String getTitle() {
        return title;
    }

    /** @param title the title to set */
    public void setTitle(String title) {
        this.title = title;
    }

    /** @return the content */
    public String getContent() {
        return content;
    }

    /** @param content the content to set */
    public void setContent(String content) {
        this.content = content;
    }
}
LuceneTest.java
package cn.toto.lucene.quickstart;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

/**
 * LuceneTest.java: a first Lucene test case.
 *
 * @author toto-pc
 * @date 2014-12-7
 * @note modified by 涂作权, 2014/12/07
 */
public class LuceneTest {
    @Test
    public void buildIndex() throws Exception {
        // Sample data (the title reads "Lucene quick start"; the content is a short description of Lucene)
        Article article = new Article();
        article.setId(100);
        article.setTitle("Lucene快速入门");
        article.setContent("Lucene是提供了一个简单却强大的应用程式接口,"
                + "能够做全文检索索引和搜寻,在Java开发环境里Lucene是"
                + "一个成熟的免费的开放源代码工具。");

        // Convert the data into a Document object (required by Lucene)
        Document document = new Document();
        document.add(new Field("id",   // field name
                article.getId() + "",  // field value
                Store.YES,             // keep the original value so it can be read back
                Index.ANALYZED));      // index the value after running it through the analyzer
        document.add(new Field("title", article.getTitle(), Store.YES, Index.ANALYZED));
        document.add(new Field("content", article.getContent(), Store.YES, Index.ANALYZED));

        // Build the index
        // Index location: the "index" directory under the current working directory
        Directory directory = FSDirectory.open(new File("index"));
        // Analyzer
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Index writer
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        // Write the document into the index
        indexWriter.addDocument(document);
        // Close the index writer
        indexWriter.close();
    }
}
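Result of running the unit test:
Contents of the index directory after the run:
4 The index contents can be inspected with the Luke tool (it is distributed as a JAR)
Download: http://code.google.com/p/luke/
How to open it:
If it cannot be opened this way, launch it from the command line: go to the directory that contains the JAR, hold Shift and right-click, choose "Open command window here", and run: java -jar lukeall-3.5.0.jar
Screenshot of the tool:
Result after clicking OK:
The Overview tab shows the index information, and the Documents tab shows the document objects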
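5 Searching
The query code that goes with the indexing code above is as follows:
// Add these imports to LuceneTest
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

@Test
public void searchIndex() throws Exception {
    // Build the Query object: search by title
    String queryString = "Lucene";
    // QueryParser arguments: 1) version, 2) default field, 3) analyzer
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "title", analyzer);
    Query query = queryParser.parse(queryString);

    // Search with the Query
    // Index location
    Directory directory = FSDirectory.open(new File("index"));
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Fetch the top 100 matching documents
    TopDocs topDocs = indexSearcher.search(query, 100);
    System.out.println("Total hits: " + topDocs.totalHits);

    // Read the results
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (int i = 0; i < scoreDocs.length; i++) {
        // Get the document ID first, then load the Document by that ID
        int docID = scoreDocs[i].doc;
        Document document = indexSearcher.doc(docID);
        System.out.println("id:" + document.get("id"));
        System.out.println("title:" + document.get("title"));
        System.out.println("content:" + document.get("content"));
    }

    indexSearcher.close();
    // A searcher built from an existing reader does not close that reader, so close it explicitly
    indexReader.close();
}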
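Run result:
-
Index contents as viewed in Luke:
The index holds two main kinds of information:
A Index term information
B Document object information
-
Every Field has a Store setting and an Index setting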
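To make the two settings concrete, the following is a minimal sketch in plain Java of what each switch controls. It is a toy model, not Lucene code: the class `ToyFieldDemo`, its `addField` method, and the boolean `store` and `index` parameters are hypothetical stand-ins for the Store and Index settings. The assumption is that storing keeps the original value for retrieval, while indexing makes the value's terms searchable.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the two per-field switches (hypothetical; not Lucene code). */
class ToyFieldDemo {
    // Stored values: document id -> (field name -> original value).
    final Map<Integer, Map<String, String>> stored = new HashMap<>();
    // Searchable terms: "field:term" -> ids of the documents containing the term.
    final Map<String, Set<Integer>> terms = new HashMap<>();

    void addField(int docId, String name, String value, boolean store, boolean index) {
        if (store) {
            // Keep the original text so it can be returned with search results.
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(name, value);
        }
        if (index) {
            // Break the text into lowercase terms and record which document has them.
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(name + ":" + term, k -> new TreeSet<>()).add(docId);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyFieldDemo demo = new ToyFieldDemo();
        demo.addField(100, "title", "Lucene quick start", true, true);   // retrievable and searchable
        demo.addField(100, "content", "full text search", false, true); // searchable only
        System.out.println(demo.terms.get("content:search"));
        System.out.println(demo.stored.get(100));
    }
}
```

In this model a field that is indexed but not stored can still be found by a search, yet its original text cannot be read back from the hit.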
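-
How the index entries relate to the Document contents
At search time, the index entries are used to look up the document objects
-
The index lookup process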
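The lookup process can be pictured as two steps: find the matching term in the term entries to get document IDs, then fetch each document object by its ID. The sketch below models this in plain Java under simplifying assumptions. It is not Lucene's implementation or on-disk format, and the class `ToyIndexLookup` with its `add` and `search` methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy two-part index: term entries plus document objects (hypothetical; not Lucene). */
class ToyIndexLookup {
    // Part A: term -> ids of the documents that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // Part B: document id -> document object (here just a title string).
    private final List<String> documents = new ArrayList<>();

    /** Adds a document and returns its id (its position in the document list). */
    int add(String title) {
        int docId = documents.size();
        documents.add(title);
        for (String term : title.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Avoid listing the same document twice for a repeated term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        return docId;
    }

    /** Step 1: look up the term to get document ids. Step 2: fetch each document by id. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        String key = term.toLowerCase(Locale.ROOT);
        for (int docId : postings.getOrDefault(key, Collections.<Integer>emptyList())) {
            hits.add(documents.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndexLookup index = new ToyIndexLookup();
        index.add("Lucene quick start");
        index.add("Learning Lucene");
        System.out.println(index.search("lucene"));
    }
}
```

This mirrors the shape of the search code in section 5, where each hit first yields a document ID and the Document is then loaded by that ID.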