Notes on Upgrading from Lucene 3.0 to Lucene 4.x

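      Recently I had to upgrade the Lucene version used in our project. The project was built on Lucene 3.0, which is a very old release. In the old version we relied on a handful of Lucene features: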

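      1. Searching across multiple index directories.

      2. Building a RAMDirectory to hold the index in memory and speed up retrieval.

      3. A custom Lucene analyzer.

      4. Changing Lucene's default scoring algorithm.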

        Below is a before-and-after comparison of the code:

        1. Searching multiple index directories

       3.0: searching multiple index directories:

    // Initialize the nationwide index
    private boolean InitGlobal(String strRootPath) {
        try {

            IndexSearcher[] searchers = new IndexSearcher[2];

            MultiSearcher globalSearcher = null;
            if (Configution.IsMMap.equalsIgnoreCase("true")) {
                // Copy each on-disk index into memory before searching it
                searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX))));
                searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX))));
//                searchers[2] = new IndexSearcher(new RAMDirectory(FSDirectory
//                        .open(new File(strRootPath + "/" + LU_INDEX))));
                globalSearcher = new MultiSearcher(searchers);
            } else {
                // Search the on-disk indexes directly
                searchers[0] = new IndexSearcher(FSDirectory.open(new File(
                        strRootPath + "/" + GLABOL_INDEX)));
                searchers[1] = new IndexSearcher(FSDirectory.open(new File(
                        strRootPath + "/" + BUS_INDEX)));
//                searchers[2] = new IndexSearcher(FSDirectory.open(new File(
//                        strRootPath + "/" + LU_INDEX)));

                globalSearcher = new MultiSearcher(searchers);
            }
            System.out.println("finish Global");

            m_mapIndexName2Searcher.put("0", globalSearcher);
            m_mapAdmin2IndexName.put("0", "0");

            return true;

        } catch (Exception e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to initialize the nationwide index");
            return false;
        }
    }

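         OK, so MultiSearcher was the way to search multiple indexes in older Lucene versions. In later versions the MultiSearcher class itself was removed, which cost me a lot of time. For a project known for its rapid succession of versions, Lucene's code design is evidently not that good: the codebase uses very few interfaces and relies mostly on classes and abstract classes.

           4.x: searching multiple index directories: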

    // Initialize the nationwide index
    private boolean InitGlobal(String strRootPath) {
        try {

            IndexSearcher globalSearcher = null;
            if (Configution.IsMMap.equalsIgnoreCase("true")) {
                // In 4.x the copying RAMDirectory constructor also takes an IOContext
                IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX)), new IOContext()));

                IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX)), new IOContext()));

                // One MultiReader over both indexes replaces the MultiSearcher
                MultiReader mr = new MultiReader(irGlobal, irBus);

                globalSearcher = new IndexSearcher(mr);
            } else {

                IndexReader irGlobal = DirectoryReader.open(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX)));

                IndexReader irBus = DirectoryReader.open(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX)));

                MultiReader mr = new MultiReader(irGlobal, irBus);
                globalSearcher = new IndexSearcher(mr);
            }
            System.out.println("finish Global");

            m_mapIndexName2Searcher.put("0", globalSearcher);
            m_mapAdmin2IndexName.put("0", "0");

            return true;

        } catch (Exception e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to initialize the nationwide index");
            return false;
        }
    }


      After the change, a plain IndexSearcher replaces MultiSearcher, and the MultiReader passed to it takes care of searching the multiple index directories.
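
      One thing MultiSearcher offered that a single IndexSearcher does not is subSearcher(int) and subDoc(int), which told you which index a hit came from. As far as I understand it, a MultiReader numbers documents as consecutive ranges in sub-reader order, so the same mapping is simple arithmetic. The sketch below is plain Java with no Lucene dependency; the class DocIdMapper and its method names are hypothetical, and the constructor arguments stand in for each sub-reader's maxDoc().

```java
public class DocIdMapper {
    // starts[i] is the first global doc ID owned by sub-reader i;
    // the last entry is the total document count.
    private final int[] starts;

    /** maxDocs: assumed to be the maxDoc() of each sub-reader, in order. */
    public DocIdMapper(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Index of the sub-reader that owns the given global doc ID. */
    public int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int i = 0;
        while (globalDoc >= starts[i + 1]) {
            i++;
        }
        return i;
    }

    /** Doc ID relative to the owning sub-reader. */
    public int localDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }

    public static void main(String[] args) {
        // Two hypothetical indexes holding 100 and 50 documents.
        DocIdMapper mapper = new DocIdMapper(100, 50);
        System.out.println(mapper.subIndex(120) + " / " + mapper.localDoc(120)); // prints "1 / 20"
    }
}
```

      In this example global document 120 belongs to the second index (position 1) and is document 20 within it.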

      2. Building a RAMDirectory to hold the index in memory

        3.0: building the in-memory index directory:

                    searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + GLABOL_INDEX))));
                    searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + BUS_INDEX))));

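        The Directory is passed straight to the RAMDirectory constructor. Beware of a pitfall here: the whole index is copied into memory, so with a large index you will wait a long time!

        4.x: building the in-memory index directory: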

                    IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + GLABOL_INDEX)),new IOContext()));
                    
                    IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + BUS_INDEX)),new IOContext()));
                    
                    MultiReader mr = new MultiReader(irGlobal,irBus);

        In 4.x the 3.0-style constructor call no longer works; you also have to pass in an IOContext object. Sigh.

      3. Custom analyzer:

        3.0 custom analyzer:

    public class SingleAnalyzer extends Analyzer {

        // Pick a tokenizer per field; any other field name yields null.
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = null;
            if (fieldName.equals("name")) {
                result = new SingleTokenizer(reader);
            }
            if (fieldName.equals("totalcity")) {
                result = new IKTokenizer(reader, false);
            }

            // result = new StandardFilter(result);
            // result = new LowerCaseFilter(result);
            // result = new StopFilter(result, stopSet);
            return result;
        }

    }

      Just override the tokenStream method. Very simple.

        4.x custom analyzer:

    public class SingleAnalyzer extends Analyzer {

        // Replaces the 3.0 tokenStream(String, Reader) override.
        @Override
        protected TokenStreamComponents createComponents(String fieldName,
                Reader reader) {
            Tokenizer source = null;
            if (fieldName.equals("name")) {
                source = new SingleTokenizer(reader);
            } else if (fieldName.equals("totalcity")) {
                source = new IKTokenizer(reader, false);
            }
            // Note: source is still null for any other field name, so only
            // "name" and "totalcity" may be analyzed with this class.
            // No token filters are applied, so the tokenizer is both the
            // source and the end of the chain.
            return new TokenStreamComponents(source, source);
        }

    }

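      OK, in 4.x you need to override the createComponents method instead.

      4. Scoring algorithm:

        The scoring algorithm changed little between 3.x and 4.x, but the package moved. Sigh.

               3.x package: import org.apache.lucene.search.DefaultSimilarity; the class lives in org.apache.lucene.search.

               4.x package: import org.apache.lucene.search.similarities.*; the classes live in org.apache.lucene.search.similarities.

      5. Query expressions: the change shows up mainly in TermRangeQuery. In 3.x the lower and upper term arguments were of type String, whereas in 4.x they are BytesRef, a wrapper around the string's bytes. There are many other small changes as well.

        3.x TermRangeQuery: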

    String left = Long
            .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
    String right = Long
            .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
    String top = Long
            .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
    String bottom = Long
            .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

    TermRangeQuery query1 = new TermRangeQuery("lon", left, right,
            true, true);
    TermRangeQuery query2 = new TermRangeQuery("lat", bottom, top,
            true, true);
    searchQuery.add(query1, BooleanClause.Occur.MUST);
    searchQuery.add(query2, BooleanClause.Occur.MUST);

           4.x TermRangeQuery:

    String left = Long
            .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
    String right = Long
            .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
    String top = Long
            .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
    String bottom = Long
            .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

    // 4.x: wrap each String bound in a BytesRef
    BytesRef brLeft = new BytesRef(left);
    BytesRef brRight = new BytesRef(right);
    BytesRef brBottom = new BytesRef(bottom);
    BytesRef brTop = new BytesRef(top);

    TermRangeQuery query1 = new TermRangeQuery("lon",
            brLeft, brRight, true, true);
    TermRangeQuery query2 = new TermRangeQuery("lat",
            brBottom, brTop, true, true);
    searchQuery.add(query1, BooleanClause.Occur.MUST);
    searchQuery.add(query2, BooleanClause.Occur.MUST);
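
      A side note that applies to both versions: as far as I know, TermRangeQuery compares terms as byte strings, not as numbers, so bounds produced with Long.toString only behave like a numeric range when all values have the same number of digits. The sketch below is plain Java with no Lucene dependency and uses String.compareTo, which gives the same order for ASCII digits. It shows the difference and one common workaround, zero-padding to a fixed width. The class name PaddedLong, the width of 19, and the restriction to non-negative values are assumptions for illustration; the field values would also have to be indexed in the same padded form.

```java
import java.util.Locale;

public class PaddedLong {
    // Number of decimal digits in Long.MAX_VALUE.
    public static final int WIDTH = 19;

    /** Zero-pads a non-negative long so that string order matches numeric order. */
    public static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled: " + value);
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    /** True if lower <= term <= upper in lexicographic (string) order. */
    public static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "120", so 9 falls outside [5, 120].
        System.out.println(inRange("9", "5", "120")); // prints "false"
        // Padded: the same check gives the numeric answer.
        System.out.println(inRange(pad(9), pad(5), pad(120))); // prints "true"
    }
}
```

      If the scaled coordinates in your index always have the same digit count, the unpadded strings already sort correctly and nothing needs to change.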

      6. Closing the IndexSearcher

        In 3.x you close an IndexSearcher by calling its close method directly:

    public void UnInit() {
        if (!m_bIsInit)
            return;

        Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

        while (iter.hasNext()) {

            String key = (String) iter.next();

            MultiSearcher val = (MultiSearcher) m_mapIndexName2Searcher
                    .get(key);

            try {
                val.close(); // close the searcher
            } catch (IOException e) {
                e.printStackTrace();
                SearchLog.SearchLog.error("Failed to close the tiered index");
            }
        }

        m_mapIndexName2Searcher.clear();
        m_mapAdmin2IndexName.clear();
        m_mapIndexName2Searcher = null;
        m_mapAdmin2IndexName = null;
        m_bIsInit = false;
    }

      In 4.x IndexSearcher has no close method; you call getIndexReader and then close the IndexReader:

    public void UnInit() {
        if (!m_bIsInit)
            return;

        Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

        while (iter.hasNext()) {

            String key = (String) iter.next();

            IndexSearcher val = (IndexSearcher) m_mapIndexName2Searcher
                    .get(key);

            try {
                // IndexSearcher has no close() in 4.x; close its reader instead
                val.getIndexReader().close();
            } catch (IOException e) {
                e.printStackTrace();
                SearchLog.SearchLog.error("Failed to close the tiered index");
            }
        }

        m_mapIndexName2Searcher.clear();
        m_mapAdmin2IndexName.clear();
        m_mapIndexName2Searcher = null;
        m_mapAdmin2IndexName = null;
        m_bIsInit = false;
    }
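
      Since IndexReader implements java.io.Closeable in 4.x, the close loop can also be written once against that interface, so that one failing close does not stop the rest from being closed. This is only a sketch: the names ReaderCloser and closeAll are hypothetical, it has no Lucene dependency, and it counts failures instead of calling the project's logger. In the code above you would pass it a map of the readers obtained from getIndexReader().

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReaderCloser {
    /** Closes every value, keeps going after a failure, and returns the failure count. */
    public static int closeAll(Map<String, ? extends Closeable> resources) {
        int failures = 0;
        for (Map.Entry<String, ? extends Closeable> entry : resources.entrySet()) {
            try {
                entry.getValue().close();
            } catch (IOException ex) {
                failures++;
                System.err.println("failed to close " + entry.getKey() + ": " + ex.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-ins for index readers: one closes cleanly, one throws.
        Map<String, Closeable> readers = new LinkedHashMap<>();
        readers.put("ok", () -> { });
        readers.put("bad", () -> { throw new IOException("boom"); });
        System.out.println(closeAll(readers)); // prints "1"
    }
}
```

      As far as I know, closing a MultiReader built with the plain varargs constructor also closes its sub-readers, but the underlying Directory objects still have to be closed separately.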

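      In short, Lucene changes a great deal between versions, and many methods change when you upgrade. You need to look closely and experiment a lot to get through it. Once the upgrade is done, it is best to run a full functional test, because some features may behave differently or even break. Upgrading Lucene is not a pleasant job.

    Please credit the source when reposting: http://www.cnblogs.com/likehua/p/4387700.html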

        

      
