Lucene 2.9.2 + PanGu Segment 2.3.1 (Part 1) Getting Started: Building a Simple Index and Searching (Original)

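A picture is worth a thousand words.

P.S. The screenshot above shows that the Chinese word segmentation worked and that the search returned hits.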

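Note: if you want to learn Lucene properly, I recommend reading Lucene in Action, 2nd edition. Also, 2.9.2 deprecates many of the older methods, so don't bother studying old code.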

Here is the code:
Building the index

public static void IndexFile(this IndexWriter writer, IO.FileInfo file)
{
    var watch = new Stopwatch();
    var startTime = DateTime.Now;
    watch.Start();
    Console.WriteLine("Indexing  {0}", file.Name);
    writer.AddDocument(file.GetDocument()); // build the Document and add it to the index
    watch.Stop();
    var timeSpan = DateTime.Now - startTime;
    Console.WriteLine("Indexing Completed! Cost time {0}[{1}]", timeSpan.ToString("c"), watch.ElapsedMilliseconds);
}
public static Document GetDocument(this IO.FileInfo file)
{
    var doc = new Document();
    doc.Add(new Field("contents", new IO.StreamReader(file.FullName))); // tokenized and indexed, not stored
    doc.Add(new Field("filename", file.Name,
        Field.Store.YES, Field.Index.ANALYZED));     // stored, tokenized by the analyzer
    doc.Add(new Field("fullpath", file.FullName,
        Field.Store.YES, Field.Index.NOT_ANALYZED)); // stored, indexed as one untokenized term
    return doc;
}
Output

Indexing Scott.txt
Indexing Completed! Cost time 00:00:02.4231386[2423]
Indexing 黄金瞳.txt
Indexing Completed! Cost time 00:00:00.0860049[85]
There are 2 doc Indexed!
Index Exit!

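Code walkthrough:

Line 14 (`var doc = new Document();` in GetDocument) creates the doc for the file. Document is one of Lucene's core objects; here is its definition: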

The Document class represents a collection of fields. Think of it as a virtual document—
a chunk of data, such as a web page, an email message, or a text file—that you
want to make retrievable at a later time. Fields of a document represent the document
or metadata associated with that document. The original source (such as a database
record, a Microsoft Word document, a chapter from a book, and so on) of
document data is irrelevant to Lucene. It’s the text that you extract from such binary
documents, and add as a Field instance, that Lucene processes. The metadata (such
as author, title, subject and date modified) is indexed and stored separately as fields
of a document.

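If you don't care about the details, think of a doc as one row in a database table. What a query finally returns is also a doc object, that is, one row.

Building an index, then, simply means adding many such rows to Lucene.

Line 19 (the `fullpath` field): the first argument, Field.Store.YES, needs no explanation. The second, NOT_ANALYZED, does not mean the field cannot be searched. It means the field value is indexed as a whole and has to be searched as a whole; it just isn't split into words.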
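To make the "searched as a whole" point concrete, here is a minimal sketch against an in-memory index. It is an illustration, not the code used in this post: the class and method names (FieldOptionsDemo, BuildSampleIndex, CountHits) and the sample values are made up, and WhitespaceAnalyzer stands in for PanGuAnalyzer so the example runs without PanGu's dictionary files.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class, not part of the article's project.
public static class FieldOptionsDemo
{
    // Builds a one-document in-memory index with the same three fields as GetDocument.
    public static RAMDirectory BuildSampleIndex()
    {
        var dir = new RAMDirectory();
        // Assumption: WhitespaceAnalyzer replaces PanGuAnalyzer for this illustration.
        var writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();
        doc.Add(new Field("contents", new StringReader("hello lucene")));
        doc.Add(new Field("filename", "Scott.txt", Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("fullpath", @"C:\docs\Scott.txt", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.AddDocument(doc);
        writer.Close();
        return dir;
    }

    // Counts documents whose field contains exactly this term (a TermQuery is not analyzed).
    public static int CountHits(IndexSearcher searcher, string field, string text)
    {
        return searcher.Search(new TermQuery(new Term(field, text)), 10).totalHits;
    }

    public static void Main()
    {
        var searcher = new IndexSearcher(BuildSampleIndex(), true); // true = read-only
        Console.WriteLine(CountHits(searcher, "fullpath", @"C:\docs\Scott.txt")); // 1: the whole value matches
        Console.WriteLine(CountHits(searcher, "fullpath", "Scott.txt"));          // 0: a fragment does not
        Console.WriteLine(CountHits(searcher, "contents", "hello"));              // 1: analyzed field matches per token

        var stored = searcher.Doc(0);
        Console.WriteLine(stored.Get("filename"));         // Scott.txt (Field.Store.YES)
        Console.WriteLine(stored.Get("contents") == null); // True: a reader-based field is indexed but not stored
        searcher.Close();
    }
}
```

The last two lines also show why the search code reads `filename` from the returned doc rather than `contents`: only fields added with Field.Store.YES come back with a hit.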
Searching

public ActionResult Index(string keyWord)
{
    var originalKeyWords = keyWord;
    ViewBag.TotalResult = 0;
    ViewBag.Results = new List<KeyValuePair<string, string>>();
    if (string.IsNullOrEmpty(keyWord))
    { ViewBag.Message = "Welcome Today!"; return View("Index"); }

    var q = keyWord;
    var search = new IndexSearcher(_indexDir, true); // true = open the index read-only
    // q = GetKeyWordsSplitBySpace(q, new PanGuTokenizer());
    // Parse the keywords against the "contents" field using the PanGu analyzer.
    var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "contents", new PanGuAnalyzer(false));
    var query = queryParser.Parse(q);
    var hits = search.Search(query, 100); //search.Search(bq, 100);
    var recCount = hits.totalHits;
    ViewBag.TotalResult = recCount;

    //show explain: for every document in the index, append the score
    //explanation followed by the list of all indexed terms
    for (int d = 0; d < search.MaxDoc(); d++)
    {
        ViewBag.Explain += search.Explain(query, d).ToHtml();
        var termReader = search.GetIndexReader().Terms();
        ViewBag.Explain += "<ul >";
        do
        {
            if (termReader.Term() != null)
                ViewBag.Explain += string.Format("<li>{0}</li>", termReader.Term().Text());
        } while (termReader.Next());
        ViewBag.Explain += "</ul>";
    }

    // Collect the stored file name of each hit for the view.
    foreach (var hit in hits.scoreDocs)
    {
        try
        {
            var doc = search.Doc(hit.doc);
            var fileName = doc.Get("filename");
            // fileName = highlighter.GetBestFragment(originalKeyWords, fileName);
            //var contents = GetBestFragment(originalKeyWords, new StreamReader(doc.Get("fullpath"), Encoding.GetEncoding("gb2312")));
            (ViewBag.Results as List<KeyValuePair<string, string>>)
                .Add(new KeyValuePair<string, string>(fileName, string.Empty));
        }
        catch (Exception exc)
        {
            Response.Write(exc.Message);
            throw;
        }
    }

    search.Close();
    ViewBag.Message = string.Format("????{0}", keyWord);
    return View("Index");
}
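Later posts will keep sharing this code, with comments included in it. Writing the explanations outside the code puts them too far from the lines they describe, and it is tiring.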
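Author: KKcat

Source: http://jinzhao.cnblogs.com/

Personal blog: http://jinzhao.me/

Copyright of this post is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original must be given in a clearly visible place on the article page; otherwise the author reserves the right to pursue legal liability.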
Original URL: https://www.cnblogs.com/jinzhao/p/2154229.html
