  • Confusion about the performance of Lucene's RAMDirectory versus FSDirectory

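    Lucene in Action (section 2.7.2) says that RAMDirectory always outperforms FSDirectory,
    and includes a test case to demonstrate it.
    I ran that test case myself and got the opposite result.
    This puzzles me: there is no reason memory should be slower than the file system.
    Here are the results: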
    RAMDirectory Time: 500 ms
    FSDirectory Time: 266 ms


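    Below is my code (essentially copied from the book's example; I only changed how the for loops are written and replaced the old-version methods with the ones recommended in Lucene 2.9)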
     

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;

    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    import junit.framework.TestCase;

    /**
     * Tests the performance difference between FSDirectory and RAMDirectory.
     * In theory RAMDirectory should be faster than FSDirectory, but the measured
     * result is the opposite. Why? lucifer 2010-5-10
     * @author lucifer
     */
    public class FSversusRAMDirectoryTest extends TestCase
    {
        private Directory fsDir;
        private Directory ramDir;
        private Collection<String> docs = loadDocuments(3000, 5);

        protected void setUp() throws Exception
        {
            String fsIndexDir = System.getProperty("java.io.tmpdir", "tmp") + File.separator + "fs-index";
            ramDir = new RAMDirectory();
            fsDir = FSDirectory.open(new File(fsIndexDir));
        }

        public void testTiming() throws IOException
        {
            // Note: the RAMDirectory run is timed first, the FSDirectory run second.
            long ramTiming = timeIndexWriter(ramDir);
            long fsTiming = timeIndexWriter(fsDir);

            // assertTrue(fsTiming > ramTiming);

            System.out.println("RAMDirectory Time: " + ramTiming + " ms");
            System.out.println("FSDirectory Time: " + fsTiming + " ms");
        }

        // Returns the wall-clock time, in milliseconds, taken to index all documents into dir.
        private long timeIndexWriter(Directory dir) throws IOException
        {
            long start = System.currentTimeMillis();
            addDocuments(dir);
            long stop = System.currentTimeMillis();
            return (stop - start);
        }

        private void addDocuments(Directory dir) throws IOException
        {
            /**
             * SimpleAnalyzer: splits text at non-letter characters and lowercases it.
             * If the create argument is set to false, RAMDirectory fails with a
             * file-not-found error (a new RAMDirectory holds no existing index to open).
             */
            IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

            /**
             * The following parameters affect FSDirectory performance.
             * MergeFactor must not be less than 2.
             * MaxMergeDocs can be set lower than MergeFactor; no exception was observed.
             */
            writer.setMergeFactor(10);
            writer.setMaxMergeDocs(10000);

            for (String word : docs)
            {
                Document doc = new Document();
                doc.add(new Field("keyword", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("unindexed", word, Field.Store.YES, Field.Index.NO));
                doc.add(new Field("unstored", word, Field.Store.NO, Field.Index.NOT_ANALYZED));
                doc.add(new Field("text", word, Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
            }
            writer.optimize();
            writer.close();
        }

        // Builds numDocs identical documents, each consisting of wordsPerDoc copies of "bibamus ".
        private Collection<String> loadDocuments(int numDocs, int wordsPerDoc)
        {
            Collection<String> docs = new ArrayList<String>(numDocs);
            for (int i = 0; i < numDocs; i++)
            {
                // Capacity is in characters: "bibamus " is 8 characters long.
                StringBuilder doc = new StringBuilder(wordsPerDoc * 8);
                for (int j = 0; j < wordsPerDoc; j++)
                {
                    doc.append("bibamus ");
                }
                docs.add(doc.toString());
            }

            return docs;
        }
    }
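
One thing the test above does not control for is measurement order: the RAMDirectory run goes first in testTiming, so one-time JVM costs such as class loading and JIT compilation would be charged to it. Whether that explains the numbers reported here is an assumption, not something the post confirms. A minimal sketch of a fairer timing loop, with untimed warm-up runs and a median over several timed runs, might look like the following. TimingHarness and medianNanos are hypothetical names, and the workload in main is a stand-in rather than real Lucene indexing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not part of Lucene or of the original test case. */
class TimingHarness {

    /**
     * Runs the task warmupRuns times without timing it, then times it
     * timedRuns times and returns the median sample in nanoseconds
     * (the upper median when timedRuns is even).
     */
    static long medianNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns < 1) {
            throw new IllegalArgumentException("timedRuns must be >= 1");
        }
        // Warm-up: lets class loading and JIT compilation happen off the clock.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        List<Long> samples = new ArrayList<Long>(timedRuns);
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        return samples.get(samples.size() / 2);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): build 3000 small strings,
        // loosely mirroring loadDocuments(3000, 5) in the test above.
        Runnable work = () -> {
            List<String> docs = new ArrayList<String>(3000);
            for (int i = 0; i < 3000; i++) {
                StringBuilder sb = new StringBuilder(40);
                for (int j = 0; j < 5; j++) {
                    sb.append("bibamus ");
                }
                docs.add(sb.toString());
            }
        };
        long median = medianNanos(work, 5, 11);
        System.out.println("Median time: " + (median / 1000) + " us");
    }
}
```

Applied to the test above, each task would wrap one call to addDocuments(dir) and convert its IOException into an unchecked exception, since Runnable.run cannot throw checked exceptions. Timing the two directories in both orders would show whether the ordering matters on a given machine.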

    (Reposted from: http://www.iteye.com/problems/42016)

  • Original address: https://www.cnblogs.com/fengweixin/p/3598136.html