  • Notes on upgrading from Lucene 3.0 to Lucene 4.x

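      Recently I needed to upgrade the Lucene version used in our project. The project was built on Lucene 3.0, a very old release. In the old version we mainly relied on a few Lucene features:

      1. Querying indexes spread across multiple directories.

      2. Building a RAMDirectory to hold the index in memory and speed up retrieval.

      3. Building a custom Lucene analyzer.

      4. Modifying Lucene's default scoring algorithm.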

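        Below is a before-and-after comparison of the code:

        1. Searching multiple index directories

       3.0: building a searcher over multiple index directories: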

    // Initialize the nationwide index (Lucene 3.0)
    private boolean InitGlobal(String strRootPath) {
        try {
            IndexSearcher[] searchers = new IndexSearcher[2];
            MultiSearcher globalSearcher = null;

            if (Configution.IsMMap.equalsIgnoreCase("true")) {
                // Copy each on-disk index into memory before searching it
                searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX))));
                searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX))));
//                searchers[2] = new IndexSearcher(new RAMDirectory(FSDirectory
//                        .open(new File(strRootPath + "/" + LU_INDEX))));
                globalSearcher = new MultiSearcher(searchers);
            } else {
                // Search the on-disk indexes directly
                searchers[0] = new IndexSearcher(FSDirectory.open(new File(
                        strRootPath + "/" + GLABOL_INDEX)));
                searchers[1] = new IndexSearcher(FSDirectory.open(new File(
                        strRootPath + "/" + BUS_INDEX)));
//                searchers[2] = new IndexSearcher(FSDirectory.open(new File(
//                        strRootPath + "/" + LU_INDEX)));
                globalSearcher = new MultiSearcher(searchers);
            }
            System.out.println("finish Global");

            m_mapIndexName2Searcher.put("0", globalSearcher);
            m_mapAdmin2IndexName.put("0", "0");

            return true;

        } catch (Exception e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to initialize the nationwide index");
            return false;
        }
    }

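         In older Lucene versions, MultiSearcher was the way to search several indexes at once. In 4.x the MultiSearcher class has been removed entirely, which cost me a lot of time. Lucene is known for its rapid succession of versions, and to me this shows that its API design is not very stable: the codebase uses few interfaces and consists mostly of concrete and abstract classes.

           4.x: building a searcher over multiple index directories: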

    // Initialize the nationwide index (Lucene 4.x)
    private boolean InitGlobal(String strRootPath) {
        try {
            IndexSearcher globalSearcher = null;

            if (Configution.IsMMap.equalsIgnoreCase("true")) {
                // Copy each on-disk index into memory; RAMDirectory now needs an IOContext
                IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX)), new IOContext()));

                IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX)), new IOContext()));

                // One MultiReader over both indexes replaces new MultiSearcher(searchers)
                MultiReader mr = new MultiReader(irGlobal, irBus);
                globalSearcher = new IndexSearcher(mr);
            } else {
                // Read the on-disk indexes directly
                IndexReader irGlobal = DirectoryReader.open(FSDirectory
                        .open(new File(strRootPath + "/" + GLABOL_INDEX)));

                IndexReader irBus = DirectoryReader.open(FSDirectory
                        .open(new File(strRootPath + "/" + BUS_INDEX)));

                // One MultiReader over both indexes replaces new MultiSearcher(searchers)
                MultiReader mr = new MultiReader(irGlobal, irBus);
                globalSearcher = new IndexSearcher(mr);
            }
            System.out.println("finish Global");

            m_mapIndexName2Searcher.put("0", globalSearcher);
            m_mapAdmin2IndexName.put("0", "0");

            return true;

        } catch (Exception e) {
            e.printStackTrace();
            SearchLog.SearchLog.error("Failed to initialize the nationwide index");
            return false;
        }
    }
    

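      After the change, a plain IndexSearcher replaces MultiSearcher; passing it a MultiReader lets it search multiple index directories.

      2. Building a RAMDirectory to hold the index in memory

        3.0: building the in-memory index directory: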

                    searchers[0] = new IndexSearcher(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + GLABOL_INDEX))));
                    searchers[1] = new IndexSearcher(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + BUS_INDEX))));

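        The Directory is passed straight to the RAMDirectory constructor. Beware of a pitfall here: the constructor copies the whole index into memory, so with a large index you will be waiting a long time!

        4.x: building the in-memory index directory: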

                    IndexReader irGlobal = DirectoryReader.open(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + GLABOL_INDEX)),new IOContext()));
                    
                    IndexReader irBus = DirectoryReader.open(new RAMDirectory(FSDirectory
                            .open(new File(strRootPath + "/" + BUS_INDEX)),new IOContext()));
                    
                    MultiReader mr = new MultiReader(irGlobal,irBus);

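        In 4.x, calling the constructor the 3.0 way no longer works; you also have to pass an IOContext object (new IOContext() above; the predefined IOContext.DEFAULT should be equivalent).

      3. Custom analyzer

        3.0 custom analyzer: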

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    // SingleTokenizer and IKTokenizer come from the project / IK Analyzer

    public class SingleAnalyzer extends Analyzer {

        // Lucene 3.0: pick a tokenizer per field by overriding tokenStream
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = null;
            if (fieldName.equals("name")) {
                result = new SingleTokenizer(reader);
            } else if (fieldName.equals("totalcity")) {
                result = new IKTokenizer(reader, false);
            }
            // Note: result stays null for any other field name

            // result = new StandardFilter(result);
            // result = new LowerCaseFilter(result);
            // result = new StopFilter(result, stopSet);
            return result;
        }
    }

      Overriding the tokenStream method is all it takes; very simple.

        4.x custom analyzer:

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    // SingleTokenizer and IKTokenizer come from the project / IK Analyzer

    public class SingleAnalyzer extends Analyzer {

        // Lucene 4.x: tokenStream(...) is final in Analyzer, so the 3.0 override
        // shown earlier no longer compiles. Override createComponents instead.
        @Override
        protected TokenStreamComponents createComponents(String fieldName,
                Reader reader) {
            Tokenizer source;
            if (fieldName.equals("name")) {
                source = new SingleTokenizer(reader);
            } else if (fieldName.equals("totalcity")) {
                source = new IKTokenizer(reader, false);
            } else {
                // Guard added here: the original passed a null tokenizer for
                // any other field, which fails later with a NullPointerException.
                throw new IllegalArgumentException(
                        "No tokenizer configured for field: " + fieldName);
            }
            // No filters are chained, so the tokenizer is also the final stream
            return new TokenStreamComponents(source, source);
        }

    }

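      So in 4.x you need to override the createComponents method instead.

      4. Scoring algorithm

        The scoring algorithm changed little between 3.x and 4.x, but the package it lives in did change.

               3.x package: import org.apache.lucene.search.DefaultSimilarity; the class lives in org.apache.lucene.search.

               4.x package: import org.apache.lucene.search.similarities.*; the classes live in org.apache.lucene.search.similarities.

      5. Query expressions: the main change is in TermRangeQuery. In 3.x the lower and upper term arguments are of type String, but in 4.x they became BytesRef, a wrapper around the string's bytes. There are many other small changes as well.

        3.x TermRangeQuery: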

    String left = Long
            .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
    String right = Long
            .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
    String top = Long
            .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
    String bottom = Long
            .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

    TermRangeQuery query1 = new TermRangeQuery("lon", left, right,
            true, true);
    TermRangeQuery query2 = new TermRangeQuery("lat", bottom, top,
            true, true);
    searchQuery.add(query1, BooleanClause.Occur.MUST);
    searchQuery.add(query2, BooleanClause.Occur.MUST);

           4.x TermRangeQuery:

    String left = Long
            .toString((long) (rcBound.m_dLeft * COORDINATE_SCALE_FACTOR));
    String right = Long
            .toString((long) (rcBound.m_dRight * COORDINATE_SCALE_FACTOR));
    String top = Long
            .toString((long) (rcBound.m_dTop * COORDINATE_SCALE_FACTOR));
    String bottom = Long
            .toString((long) (rcBound.m_dBottom * COORDINATE_SCALE_FACTOR));

    // 4.x takes the range bounds as BytesRef instead of String
    // (TermRangeQuery.newStringRange(...) is a factory that still accepts Strings)
    BytesRef brLeft = new BytesRef(left);
    BytesRef brRight = new BytesRef(right);
    BytesRef brBottom = new BytesRef(bottom);
    BytesRef brTop = new BytesRef(top);

    TermRangeQuery query1 = new TermRangeQuery("lon",
            brLeft, brRight, true, true);
    TermRangeQuery query2 = new TermRangeQuery("lat",
            brBottom, brTop, true, true);
    searchQuery.add(query1, BooleanClause.Occur.MUST);
    searchQuery.add(query2, BooleanClause.Occur.MUST);
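
One thing worth checking after this change: a term range compares terms in byte order, not numerically, so bounds built with Long.toString only behave like numeric ranges when all values have the same number of digits. The sketch below uses only the JDK (no Lucene) to show the effect with plain string comparison, and one possible workaround, zero-padding to a fixed width. The class name, the WIDTH constant and the sample values are assumptions made for illustration; the padding would have to be applied at indexing time as well, and negative values are not handled.

```java
import java.util.Locale;

final class PaddedRangeDemo {
    // Hypothetical fixed width; it must cover the longest scaled coordinate.
    static final int WIDTH = 12;

    /** Encodes a non-negative long as a zero-padded decimal string. */
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    /** Inclusive range test using string ordering, the way a term range compares ASCII terms. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        long lower = 9_000_000L;
        long upper = 12_000_000L;
        long value = 10_500_000L; // numerically inside [lower, upper]

        // Unpadded: "10500000" sorts before "9000000", so the value is missed.
        System.out.println(inRange(Long.toString(value),
                Long.toString(lower), Long.toString(upper))); // prints false

        // Padded to equal width: string order now matches numeric order.
        System.out.println(inRange(pad(value), pad(lower), pad(upper))); // prints true
    }
}
```

If every coordinate in the index happens to have the same digit count, the unpadded form works too, which may be why the original code behaves correctly for its data.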

      6. Closing the IndexSearcher

        In 3.x, closing the searcher simply means calling its close method:

    public void UnInit() {
        if (!m_bIsInit)
            return;

        Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

        while (iter.hasNext()) {

            String key = (String) iter.next();

            MultiSearcher val = (MultiSearcher) m_mapIndexName2Searcher
                    .get(key);

            try {
                val.close(); // close the searcher directly
            } catch (IOException e) {
                e.printStackTrace();
                SearchLog.SearchLog.error("Failed to close the tiered index");
            }
        }

        m_mapIndexName2Searcher.clear();
        m_mapAdmin2IndexName.clear();
        m_mapIndexName2Searcher = null;
        m_mapAdmin2IndexName = null;
        m_bIsInit = false;
    }

      In 4.x, IndexSearcher has no close method of its own; call getIndexReader() and then close the returned IndexReader:

    public void UnInit() {
        if (!m_bIsInit)
            return;

        Iterator iter = m_mapIndexName2Searcher.keySet().iterator();

        while (iter.hasNext()) {

            String key = (String) iter.next();

            IndexSearcher val = (IndexSearcher) m_mapIndexName2Searcher
                    .get(key);

            try {
                // IndexSearcher has no close(); close its underlying reader
                val.getIndexReader().close();
            } catch (IOException e) {
                e.printStackTrace();
                SearchLog.SearchLog.error("Failed to close the tiered index");
            }
        }

        m_mapIndexName2Searcher.clear();
        m_mapAdmin2IndexName.clear();
        m_mapIndexName2Searcher = null;
        m_mapAdmin2IndexName = null;
        m_bIsInit = false;
    }
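
Because the searcher in the 4.x setup wraps a MultiReader, closing that one reader is what releases the individual index readers: as far as I know, a MultiReader built with the varargs constructor closes its sub-readers when it is closed. The sketch below models only that ownership pattern with JDK types. FakeReader, FakeMultiReader and FakeSearcher are hypothetical stand-ins, not Lucene classes, so treat it as an illustration of the idea rather than of Lucene's implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class CloseCascadeDemo {
    /** Hypothetical stand-in for a reader over one index; records whether it was closed. */
    static final class FakeReader implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a composite reader that owns its sub-readers. */
    static final class FakeMultiReader implements Closeable {
        private final List<Closeable> subReaders;

        FakeMultiReader(Closeable... subReaders) {
            this.subReaders = Arrays.asList(subReaders);
        }

        @Override
        public void close() throws IOException {
            IOException first = null;
            // Close every sub-reader even if one of them fails
            for (Closeable sub : subReaders) {
                try {
                    sub.close();
                } catch (IOException e) {
                    if (first == null) {
                        first = e;
                    }
                }
            }
            if (first != null) {
                throw first;
            }
        }
    }

    /** Hypothetical stand-in for a searcher: it exposes its reader but has no close(). */
    static final class FakeSearcher {
        private final Closeable reader;

        FakeSearcher(Closeable reader) {
            this.reader = reader;
        }

        Closeable getReader() {
            return reader;
        }
    }

    public static void main(String[] args) throws IOException {
        FakeReader global = new FakeReader();
        FakeReader bus = new FakeReader();
        FakeSearcher searcher = new FakeSearcher(new FakeMultiReader(global, bus));

        // Closing the composite reader reaches both underlying readers
        searcher.getReader().close();
        System.out.println(global.closed && bus.closed); // prints true
    }
}
```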

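      In short, Lucene changes a great deal between versions, and many methods change when you upgrade, so you need to read carefully and experiment a lot before the upgrade works. Once it is done, run a full functional test, because some behavior may have changed or even broken. Upgrading Lucene is not a pleasant job.

    Please credit the source when reposting: http://www.cnblogs.com/likehua/p/4387700.html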

        

      

  • Original article: https://www.cnblogs.com/likehua/p/4387700.html