  • lucene3.0_IndexWriter: basic usage and caveats

    Series index:

    lucene3.0: basic usage and caveats (summary)

    -------------------------------------------------

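    This post builds an index at a given location on disk and explains the problems that can come up along the way.

    The source code is as follows: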

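    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Field.Index;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriter.MaxFieldLength;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;
    import org.apache.lucene.util.Version;

    public void createIndex() {
        IndexWriter writer = null;
        FSDirectory dir = null;
        try {
            // Note 1: on Windows we usually use SimpleFSDirectory; on other operating systems, NIOFSDirectory.
            // NIOFSDirectory uses java.nio's FileChannel's positional io when reading to avoid synchronization
            // when reading from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor
            // choice for Windows, but on all other platforms this is the preferred choice.
            // SimpleFSDirectory.open(...) would resolve to the static FSDirectory.open(...), which picks the
            // implementation itself, so the constructor is used here to make the choice explicit.
            dir = new SimpleFSDirectory(new File("d:/20101015index"));
            // The third argument (create = true) overwrites any existing index at that path.
            writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);
            // Add documents.
            // Note 2: Field instances can be reused across documents, which saves the cost of constructing them.
            Field f1 = new Field("f1", "", Store.YES, Index.ANALYZED);
            Field f2 = new Field("f2", "", Store.YES, Index.ANALYZED);
            for (int i = 0; i < 500000; i++) {
                Document doc = new Document();
                f1.setValue("f1 hello doc" + i);
                doc.add(f1);
                f2.setValue("f2 world doc" + i);
                doc.add(f2);
                writer.addDocument(doc);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Note 3: closing the writer can fail (for example when the disk is full), in which case the
        // IndexWriter instance keeps holding the write lock. The next attempt to open the index for writing
        // (with an IndexWriter, or an IndexReader that modifies the index) then fails with a lock-obtain exception.
        // To avoid this, force the lock to be released (unlock) in a finally block.
        finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    if (dir != null && IndexWriter.isLocked(dir)) {
                        IndexWriter.unlock(dir);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }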

    Notes:

    1. Which FSDirectory subclass should you choose?

    From the API documentation:

    • SimpleFSDirectory is a straightforward implementation using java.io.RandomAccessFile. However, it has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file.
    • NIOFSDirectory uses java.nio's FileChannel's positional io when reading to avoid synchronization when reading from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for Windows, but on all other platforms this is the preferred choice.
    • MMapDirectory uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual memory relative to your index size, eg if you are running on a 64 bit JRE, or you are running on a 32 bit JRE but your index sizes are small enough to fit into the virtual memory space. Java has currently the limitation of not being able to unmap files from user code. The files are unmapped, when GC releases the byte buffers. Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some time later, will the file handle be closed. This will consume additional transient disk usage: on Windows, attempts to delete or overwrite the files will result in an exception; on other platforms, which typically have a "delete on last close" semantics, while such operations will succeed, the bytes are still consuming space on disk. For many applications this limitation is not a problem (e.g. if you have plenty of disk space, and you don't rely on overwriting files on Windows) but it's still an important limitation to be aware of. This class supplies a (possibly dangerous) workaround mentioned in the bug report, which may fail on non-Sun JVMs.

    In short: on Windows, stick with SimpleFSDirectory; on other systems, NIOFSDirectory is recommended. MMapDirectory is not covered here.
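The platform rule above can be sketched as a small helper. This is a minimal illustration in plain Java rather than Lucene code: `DirectoryChoice` and `recommend` are hypothetical names, and the method only returns the name of the class that the notes above suggest. In real code you would construct the matching class directly, for example `new SimpleFSDirectory(file)` or `new NIOFSDirectory(file)`.

```java
import java.util.Locale;

/** Sketch only: maps an OS name to the FSDirectory subclass suggested above. */
final class DirectoryChoice {

    /** Returns the simple class name recommended for the given os.name value. */
    static String recommend(String osName) {
        // On Windows, NIOFSDirectory is hampered by a Sun JRE bug, so SimpleFSDirectory is the safer pick.
        boolean windows = osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "SimpleFSDirectory" : "NIOFSDirectory";
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + " -> " + recommend(os));
    }
}
```

As far as I can tell, the static `FSDirectory.open(File)` in Lucene 3.0 makes the same platform-based choice internally, so calling it is usually simpler than writing the check by hand.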

    2. Reuse Field instances.

    The code above shows the reuse approach; the version without reuse looks like this:

    for (int i = 0; i < 500000; i++) {
        Field f1 = new Field("f1", "f1 hello doc" + i, Store.YES, Index.ANALYZED);
        Field f2 = new Field("f2", "f2 world doc" + i, Store.YES, Index.ANALYZED);
        Document doc = new Document();
        doc.add(f1);
        doc.add(f2);
        writer.addDocument(doc);
    }

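    In repeated tests, reusing Field instances was faster than creating new ones, but the gain is small: roughly 400 ms saved over 500,000 adds.

    Whatever you do, never call commit after every addDocument: a job that would otherwise finish in about 8 s ends up taking several minutes.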

    commit:

    This may be a costly operation, so you should test the cost in your application and do it only when really necessary.
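One common way to keep commit costs down is to commit once per batch of documents rather than once per document. The sketch below is a minimal illustration under stated assumptions: `Committable` is a hypothetical one-method stand-in for the `commit()` call on IndexWriter, and `BatchCommitter` and the batch size are illustrative names and values, not part of Lucene.

```java
import java.io.IOException;

/** Hypothetical stand-in for the single IndexWriter call this sketch needs. */
interface Committable {
    void commit() throws IOException;
}

/** Commits once every batchSize documents instead of after every add. */
final class BatchCommitter {
    private final Committable writer;
    private final int batchSize;
    private int pending = 0;

    BatchCommitter(Committable writer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.writer = writer;
        this.batchSize = batchSize;
    }

    /** Call after each addDocument; commits only when a full batch has accumulated. */
    void documentAdded() throws IOException {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Call once at the end to commit any leftover documents. */
    void flush() throws IOException {
        if (pending > 0) {
            writer.commit();
            pending = 0;
        }
    }
}
```

With a real writer, the adapter could be as small as an anonymous `Committable` whose `commit()` delegates to `writer.commit()`. The right batch size depends on how much uncommitted work you can afford to lose, so it is worth measuring in your own application.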

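    3. How do you close an IndexWriter instance correctly, and what should you do if closing fails?

    The comments in the code already explain this, but the API documentation puts it more clearly: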

    If an Exception is hit during close, eg due to disk full or some other reason, then both the on-disk index and the internal state of the IndexWriter instance will be consistent. However, the close will not be complete even though part of it (flushing buffered documents) may have succeeded, so the write lock will still be held.

    If you can correct the underlying cause (eg free up some disk space) then you can call close() again. Failing that, if you want to force the write lock to be released (dangerous, because you may then lose buffered docs in the IndexWriter instance) then you can do something like this:

    try {
        writer.close();
    } finally {
        if (IndexWriter.isLocked(directory)) {
            IndexWriter.unlock(directory);
        }
    }
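The API text suggests calling close() again once the underlying cause is fixed, and forcing the unlock only as a last resort. A minimal sketch of that order of operations follows. It is plain Java, not Lucene code: `LockedWriter` is a hypothetical interface standing in for the writer and its directory lock, and `SafeClose`, `closeOrUnlock`, and `maxAttempts` are illustrative names.

```java
import java.io.IOException;

/** Hypothetical stand-in for an IndexWriter together with its directory write lock. */
interface LockedWriter {
    void close() throws IOException;
    boolean isLocked() throws IOException;
    void unlock() throws IOException;
}

final class SafeClose {

    /**
     * Tries close() up to maxAttempts times. If the lock is still held afterwards,
     * releases it by force. Returns true when the lock had to be forced.
     */
    static boolean closeOrUnlock(LockedWriter w, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                w.close();
                break; // closed cleanly, stop retrying
            } catch (IOException e) {
                System.err.println("close attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        // Forcing the unlock is dangerous: buffered documents may be lost.
        if (w.isLocked()) {
            w.unlock();
            return true;
        }
        return false;
    }
}
```

In practice a retry only helps if something changes between attempts, such as freeing disk space, so a real implementation would probably surface the failure to an operator before giving up and forcing the unlock.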
  • Original article: https://www.cnblogs.com/huangfox/p/1852371.html