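  • How a Document is retrieved in Lucene.Net when the index is stored with FSDirectory

    The best way to keep from forgetting something is to write it down.

    Here is about the simplest search code there is: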

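            // Requires: using System.IO; using Lucene.Net.Index;
            //           using Lucene.Net.Search; using Lucene.Net.Store;
            public void Search()
            {
                var dir = FSDirectory.Open(new DirectoryInfo("xxx"));
                var searcher = new IndexSearcher(dir, true); // true = open read-only
                var query = new TermQuery(new Term("Title", "jinzhao"));
                var tops = searcher.Search(query, 100); // TopDocs with at most 100 hits
                // TopDocs itself is not enumerable; iterate over the hits it holds.
                // (Lucene.Net 3.x names shown; 2.9.x exposes the fields scoreDocs and doc.)
                foreach (var top in tops.ScoreDocs)
                {
                    var doc = searcher.Doc(top.Doc); // load the stored fields of this hit
                    Output(doc);
                }
            }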
    

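    The searcher.Doc call (highlighted in red in the original post) returns a complete Document. The Document comes from the IndexReader (Lucene.Net.Index.IndexReader) inside the searcher, through this method: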

    		public abstract Document Document(int n, FieldSelector fieldSelector);
    

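    This abstract class has several implementations, which the original post showed in a class diagram.

    They relate to each other as follows: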

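    MultiReader and ParallelReader each maintain a collection of IndexReaders (normally the implementations described below rather than a bare SegmentReader) and wrap access to several readers behind one. MultiReader does this with document-number offsets, the most common technique in Lucene; ParallelReader instead expects its readers to hold different fields of the same documents, so they share document numbers.

    DirectoryReader, like the other implementations apart from SegmentReader, models a directory, just like the index folder on disk. It maintains a set of SegmentReaders and uses the same offset principle.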
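
    The offset principle can be sketched without Lucene.Net at all. The class below is a hypothetical stand-in, not a Lucene.Net type: each inner array plays the role of one sub-reader, and a starts array records the global number of the first document in each one. Mapping a global document number to a sub-reader is then a binary search followed by a subtraction.

```csharp
using System;

// Hypothetical illustration of offset-based document numbering.
// None of these names come from Lucene.Net.
public sealed class OffsetReaderSketch
{
    private readonly string[][] segments; // each inner array stands in for one sub-reader
    private readonly int[] starts;        // starts[i] = global number of segment i's first document

    public OffsetReaderSketch(string[][] segments)
    {
        this.segments = segments;
        starts = new int[segments.Length + 1];
        for (int i = 0; i < segments.Length; i++)
            starts[i + 1] = starts[i] + segments[i].Length;
    }

    // Total number of documents across all segments.
    public int MaxDoc
    {
        get { return starts[segments.Length]; }
    }

    // Finds the last segment whose start offset is <= n.
    public int SegmentIndex(int n)
    {
        if (n < 0 || n >= MaxDoc) throw new ArgumentOutOfRangeException("n");
        int lo = 0, hi = segments.Length - 1;
        while (lo < hi)
        {
            int mid = (lo + hi + 1) / 2;
            if (starts[mid] <= n) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    // Translates the global number into a segment-local one and delegates.
    public string Document(int n)
    {
        int i = SegmentIndex(n);
        return segments[i][n - starts[i]];
    }
}
```

    With three segments holding 2, 1 and 3 documents, the starts array is 0, 2, 3, 6, so Document(3) lands in the third segment at local position 0.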

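    SegmentReader is the smallest unit for reading documents; it maintains no child IndexReaders. When it receives a document ID, it reads that document's fields through public sealed class FieldsReader (the document is the core of Lucene, and a document consists of a number of fields). Fields can be loaded immediately, immediately for selected fields only, lazily, and in a few other ways. The method is as follows: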

    		public /*internal*/ Document Doc(int n, FieldSelector fieldSelector)
    		{
    			SeekIndex(n);
    			long position = indexStream.ReadLong();
    			fieldsStream.Seek(position);
    			
    			Document doc = new Document();
    			int numFields = fieldsStream.ReadVInt();
    			for (int i = 0; i < numFields; i++)
    			{
    				int fieldNumber = fieldsStream.ReadVInt();
    				FieldInfo fi = fieldInfos.FieldInfo(fieldNumber);
    				FieldSelectorResult acceptField = fieldSelector == null?FieldSelectorResult.LOAD:fieldSelector.Accept(fi.name);
    				
    				byte bits = fieldsStream.ReadByte();
    				System.Diagnostics.Debug.Assert(bits <= FieldsWriter.FIELD_IS_COMPRESSED + FieldsWriter.FIELD_IS_TOKENIZED + FieldsWriter.FIELD_IS_BINARY);
    				
    				bool compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;
    				bool tokenize = (bits & FieldsWriter.FIELD_IS_TOKENIZED) != 0;
    				bool binary = (bits & FieldsWriter.FIELD_IS_BINARY) != 0;
    				//TODO: Find an alternative approach here if this list continues to grow beyond the
    				//list of 5 or 6 currently here.  See Lucene 762 for discussion
    				if (acceptField.Equals(FieldSelectorResult.LOAD))
    				{
    					AddField(doc, fi, binary, compressed, tokenize);
    				}
    				else if (acceptField.Equals(FieldSelectorResult.LOAD_FOR_MERGE))
    				{
    					AddFieldForMerge(doc, fi, binary, compressed, tokenize);
    				}
    				else if (acceptField.Equals(FieldSelectorResult.LOAD_AND_BREAK))
    				{
    					AddField(doc, fi, binary, compressed, tokenize);
    					break; //Get out of this loop
    				}
    				else if (acceptField.Equals(FieldSelectorResult.LAZY_LOAD))
    				{
    					AddFieldLazy(doc, fi, binary, compressed, tokenize);
    				}
    				else if (acceptField.Equals(FieldSelectorResult.SIZE))
    				{
    					SkipField(binary, compressed, AddFieldSize(doc, fi, binary, compressed));
    				}
    				else if (acceptField.Equals(FieldSelectorResult.SIZE_AND_BREAK))
    				{
    					AddFieldSize(doc, fi, binary, compressed);
    					break;
    				}
    				else
    				{
    					SkipField(binary, compressed);
    				}
    			}
    			
    			return doc;
    		}
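
    The first lines of Doc follow a two-stream pattern: a fixed-width index stream gives the position of a variable-length record in the data stream, and the record is then decoded with ReadVInt. Here is a minimal sketch of that pattern over in-memory streams. StoredFieldsSketch and its methods are hypothetical, and the layout is simplified on purpose: the index is assumed to hold nothing but one 8-byte big-endian pointer per document, and each record only a field count followed by field numbers. The real files carry more (header bytes, flag bits, field values).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the index-then-data lookup; not Lucene.Net code.
public static class StoredFieldsSketch
{
    // Reads a 64-bit value stored high byte first.
    public static long ReadLong(Stream s)
    {
        long value = 0;
        for (int i = 0; i < 8; i++)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            value = (value << 8) | (byte)b;
        }
        return value;
    }

    // Reads a variable-length int: 7 data bits per byte, low-order group first,
    // with the high bit set on every byte except the last.
    public static int ReadVInt(Stream s)
    {
        int value = 0;
        for (int shift = 0; ; shift += 7)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
    }

    // Returns the field numbers stored in record n.
    public static List<int> FieldNumbers(Stream index, Stream data, int n)
    {
        index.Seek(n * 8L, SeekOrigin.Begin);          // fixed-width entry for document n
        data.Seek(ReadLong(index), SeekOrigin.Begin);  // jump to the record itself
        int numFields = ReadVInt(data);
        var numbers = new List<int>(numFields);
        for (int i = 0; i < numFields; i++)
            numbers.Add(ReadVInt(data));
        return numbers;
    }
}
```

    Because every index entry has the same width, finding document n costs one multiplication and one seek, no matter how large the records are.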
    

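    The streams that Doc reads from, such as fieldsStream (highlighted in red in the original post), are IndexInput implementations; IndexInput is what performs the actual reads. An implementation is usually a public nested class of its storage class. The one used in this example looks like this: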

            public /*protected internal*/class SimpleFSIndexInput : BufferedIndexInput, System.ICloneable
            {
    
                protected internal class Descriptor : System.IO.BinaryReader
                {
                    // remember if the file is open, so that we don't try to close it
                    // more than once
                    protected internal volatile bool isOpen;
                    internal long position;
                    internal long length;
    
                    public Descriptor(/*FSIndexInput enclosingInstance,*/ System.IO.FileInfo file, System.IO.FileAccess mode)
                        : base(new System.IO.FileStream(file.FullName, System.IO.FileMode.Open, mode, System.IO.FileShare.ReadWrite))
                    {
                        isOpen = true;
                        length = file.Length;
                    }
    
                    public override void Close()
                    {
                        if (isOpen)
                        {
                            isOpen = false;
                            base.Close();
                        }
                    }
    
                    ~Descriptor()
                    {
                        try
                        {
                            Close();
                        }
                        finally
                        {
                        }
                    }
                }

                // ... (the remaining members of SimpleFSIndexInput are omitted from this excerpt)
            }
    

    As you can see, in the end the fields are read from the file by a System.IO.BinaryReader.
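
    To make that last step concrete, here is a small sketch of a positioned read using only the .NET base class library. It opens the file with the same FileMode and FileShare flags as the Descriptor constructor above; FileReadSketch and ReadAt are hypothetical names, and this is only an approximation of how a buffered input might fetch a block of bytes from the file.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of Lucene.Net.
public static class FileReadSketch
{
    // Reads up to 'count' bytes starting at byte offset 'position'.
    public static byte[] ReadAt(string path, long position, int count)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new BinaryReader(stream))
        {
            reader.BaseStream.Seek(position, SeekOrigin.Begin);
            return reader.ReadBytes(count); // shorter than 'count' if the file ends first
        }
    }
}
```

    FileShare.ReadWrite lets other handles keep reading and writing the same file while this one is open.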

    The end.

  • Original article: https://www.cnblogs.com/jinzhao/p/2537068.html