  • Scheduled incremental sync of MySQL data to Solr with Solr DIH

    Base environment:

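    (2) Run the incremental import as a scheduled task:
    Many people use Windows Task Scheduler or Linux cron to request the delta-import URL periodically. That works and should cause no problems.
    A more convenient option, better integrated with Solr, is a scheduled delta-import that runs inside Solr itself.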
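    The cron-style approach mentioned above boils down to requesting the delta-import URL at a fixed interval. A minimal sketch, assuming a core named collection1 on localhost:8983 (placeholders, not taken from the setup in this post); delta_import_url and trigger_delta_import are hypothetical helper names, and clean=false is used so that existing documents are kept:

```python
import urllib.parse
import urllib.request


def delta_import_url(host, port, core):
    """Build the DIH delta-import URL for one core (host/port/core are placeholders)."""
    query = urllib.parse.urlencode(
        {"command": "delta-import", "clean": "false", "commit": "true"}
    )
    return f"http://{host}:{port}/solr/{core}/dataimport?{query}"


def trigger_delta_import(url, timeout=30):
    """Send one GET request to Solr and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

    A script that calls trigger_delta_import(delta_import_url("localhost", 8983, "collection1")) can then be run from cron or Task Scheduler, for example once a minute.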
    1. Download apache-solr-dataimportscheduler-1.0.jar and place it in the WEB-INF/lib directory of the solr application under Tomcat's webapps:

    Download: http://yunpan.cn/cdIpMthFdFcgn (extraction code: 5a1c)

    Because my setup uses Jetty + ZooKeeper,

    I placed apache-solr-dataimportscheduler-1.0.jar in solr-4.10.4/example/solr-webapp/webapp/WEB-INF/lib instead.

    2. Part of the configuration file db-data-config.xml

    Location: /solr-4.10.4/example/solr/collection1/conf

             <entity name="bns_sentence" pk="id"
                     query="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence"
                     deltaImportQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence where id='${dataimporter.delta.id}'"
                     deltaQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence where updatetime &gt; '${dataimporter.last_index_time}'">
                 <field column="id" name="id"/>
                 <field column="uid" name="uid"/>
                 <field column="createname" name="createname"/>
                 <field column="createheadimg" name="createheadimg"/>
                 <field column="wid" name="wid"/>
                 <field column="word" name="word"/>
                 <field column="content" name="content"/>
                 <field column="articlenum" name="articlenum"/>
                 <field column="state" name="state"/>
                 <field column="feel" name="feel"/>
                 <field column="forwardnum" name="forwardnum"/>
                 <field column="supportnum" name="supportnum"/>
                 <field column="updatetime" name="updatetime"/>
                 <field column="createtime" name="createtime"/>
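    How deltaQuery and deltaImportQuery cooperate during a delta-import can be sketched outside Solr. The following illustration uses an in-memory SQLite table in place of MySQL, with a reduced column set and made-up rows. It mimics the flow (deltaQuery finds rows changed since the last import, deltaImportQuery reloads each one by primary key) and is not DIH's actual implementation; delta_import is a hypothetical function name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bns_sentence (id INTEGER PRIMARY KEY, content TEXT, updatetime TEXT)"
)
conn.executemany(
    "INSERT INTO bns_sentence VALUES (?, ?, ?)",
    [
        (1, "old row", "2015-08-18 10:00:00"),
        (2, "changed row", "2015-08-19 23:40:00"),
    ],
)


def delta_import(conn, last_index_time):
    """Return the rows that a delta-import would re-index."""
    # Step 1 (deltaQuery): primary keys of rows changed since the last import.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM bns_sentence WHERE updatetime > ?", (last_index_time,)
        )
    ]
    # Step 2 (deltaImportQuery): reload each changed row by its primary key.
    docs = []
    for row_id in changed_ids:
        docs.extend(
            conn.execute(
                "SELECT id, content, updatetime FROM bns_sentence WHERE id = ?",
                (row_id,),
            ).fetchall()
        )
    return docs


docs = delta_import(conn, "2015-08-19 23:31:13")
```

    Only the row whose updatetime is later than the last index time ends up in docs.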

    3. Header and footer of the configuration file

    <?xml version="1.0" encoding="UTF-8" ?>
    <dataConfig>
        <!-- Note: with MySQL, batchSize="-1" is required; otherwise an exception is thrown -->
        <dataSource driver="com.mysql.jdbc.Driver"
                    url="jdbc:mysql://ip:3306/database"
                    user="username"
                    password="password"
                    batchSize="-1"/>
        <document>
            <entity name="tablename" pk="id">
                <!-- the query attributes and <field> elements from step 2 go on this entity -->
            </entity>
        </document>
        <!--deltaQuery="select id, content, avgfeel, state, sentencenum, articlenum,updatetime, createtime  from bns_word where  to_char(updatetime,'yyyy-mm-dd hh24:mi:ss')> '${dataimporter.last_index_time}'"-->
    </dataConfig>
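    Before restarting Solr it can help to check that the file is well-formed and that the settings discussed above are present. A small sketch using Python's standard XML parser; check_config and the sample document are illustrative assumptions, and the checks only cover the points raised in this post:

```python
import xml.etree.ElementTree as ET

SAMPLE_CONFIG = """<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://ip:3306/database"
                user="username"
                password="password"
                batchSize="-1"/>
    <document>
        <entity name="tablename" pk="id"
                query="select id from tablename"
                deltaImportQuery="select id from tablename where id='${dataimporter.delta.id}'"
                deltaQuery="select id from tablename where updatetime &gt; '${dataimporter.last_index_time}'"/>
    </document>
</dataConfig>
"""


def check_config(xml_text):
    """Parse the config and return a list of human-readable problems (empty if none)."""
    problems = []
    # ET.fromstring raises ParseError if the document is not well-formed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    data_source = root.find("dataSource")
    if data_source is None:
        problems.append("missing <dataSource>")
    elif data_source.get("batchSize") != "-1":
        problems.append('dataSource should set batchSize="-1" for MySQL')
    for entity in root.iter("entity"):
        for attr in ("query", "deltaQuery", "deltaImportQuery"):
            if entity.get(attr) is None:
                problems.append(f"entity {entity.get('name')!r} has no {attr}")
    return problems


problems = check_config(SAMPLE_CONFIG)
```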

    4. Edit the configuration file dataimport.properties

    I put it in the /solr-4.10.4/example/solr/conf directory.

    Its contents are as follows:

    #################################################
    #                                               #
    #       dataimport scheduler properties         #
    #                                               #
    #################################################
    
    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1
    
    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using single-core deployment
    syncCores=game,resource
    
    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=ip
    
    #  solr server port
    #  [defaults to 80 if empty]
    port=8983
    
    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solr
    
    #  URL params [mandatory]
    #  remainder of URL
    #  clean=false keeps existing documents; clean=true would wipe the index before every delta-import
    params=/dataimport?command=delta-import&clean=false&commit=true
    
    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    interval=1
    
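    #  interval between full index rebuilds, in minutes; default 7200, i.e. 5 days
    #  empty, 0, or commented out: never rebuild the index
    reBuildIndexInterval=7200
    
    #  parameters for the index rebuild
    reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true
    
    #  start time for the rebuild timer; the first actual run happens at reBuildIndexBeginTime + reBuildIndexInterval*60*1000
    #  two formats: 2012-04-11 03:10:00 or 03:10:00; with the latter, the date part is filled in with the date the service started
    reBuildIndexBeginTime=03:10:00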
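    To see which URLs these settings produce, the properties can be combined by hand. The sketch below follows the "Full URL" pattern visible in the scheduler log later in this post (http://server:port/webapp/core + params); parse_properties and scheduled_urls are hypothetical helpers, not part of the scheduler jar, and the sample uses clean=false:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as params keep their own "=".
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def scheduled_urls(props):
    """Build one delta-import URL per core listed in syncCores."""
    server = props.get("server") or "localhost"
    port = props.get("port") or "80"
    webapp = props.get("webapp", "solr")
    cores = [core for core in props.get("syncCores", "").split(",") if core]
    return [f"http://{server}:{port}/{webapp}/{core}{props['params']}" for core in cores]


SAMPLE = """
syncEnabled=1
syncCores=game,resource
server=ip
port=8983
webapp=solr
params=/dataimport?command=delta-import&clean=false&commit=true
interval=1
reBuildIndexInterval=7200
"""

props = parse_properties(SAMPLE)
urls = scheduled_urls(props)
# 7200 minutes expressed in days (there are 1440 minutes in a day).
rebuild_days = int(props["reBuildIndexInterval"]) / (60 * 24)
```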

    5. On the first start, this message appears:

     sorry, no dataimport-handler defined!
    

     Fix:

    In example/solr/collection1/conf, open solrconfig.xml and add:

     <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
         <lst name="defaults">
             <str name="config">db-data-config.xml</str>
         </lst>
     </requestHandler>
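    Whether the handler is registered can be checked by parsing solrconfig.xml. A rough sketch, assuming the handler is declared exactly as shown above; find_dih_config is a hypothetical helper, and the sample is a stripped-down stand-in for a real solrconfig.xml:

```python
import xml.etree.ElementTree as ET

DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"

SAMPLE = """<config>
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>
</config>"""


def find_dih_config(solrconfig_xml):
    """Return the DIH config file name registered for /dataimport, or None."""
    root = ET.fromstring(solrconfig_xml)
    for handler in root.iter("requestHandler"):
        if handler.get("name") == "/dataimport" and handler.get("class") == DIH_CLASS:
            node = handler.find("lst[@name='defaults']/str[@name='config']")
            return node.text if node is not None else None
    return None


config_file = find_dih_config(SAMPLE)
```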
    

    6. Errors after startup:

    - 2015-08-19 23:31:13.591; org.apache.solr.handler.dataimport.scheduler.BaseTimerTask; [game] <index update process> Response message                     Not Found
    INFO  - 2015-08-19 23:31:13.592; org.apache.solr.handler.dataimport.scheduler.BaseTimerTask; [game] <index update process> Response code                        404
    INFO  - 2015-08-19 23:31:13.592; org.apache.solr.core.SolrResourceLoader; JNDI not configured for solr (NoInitialContextEx)
    INFO  - 2015-08-19 23:31:13.593; org.apache.solr.core.SolrResourceLoader; solr home defaulted to 'solr/' (could not find system property or JNDI)
    INFO  - 2015-08-19 23:31:13.593; org.apache.solr.core.SolrResourceLoader; new SolrResourceLoader for deduced Solr Home: 'solr/'
    INFO  - 2015-08-19 23:31:13.609; org.apache.solr.handler.dataimport.scheduler.SolrDataImportProperties; Instance dir = solr/

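    Cause: the log above shows that Solr home defaulted to 'solr/' because neither the system property nor JNDI was set.

     Fix: start Solr with solr.solr.home set explicitly: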

     java -Dsolr.solr.home=/home/hadoop/cloudsolr/solr-4.10.4/example -DzkHost=192.168.0.157:2181,192.168.0.158:2181,192.168.0.159:2181 -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
    7. Error message:

    1045 [main] ERROR org.apache.solr.handler.dataimport.scheduler.SolrDataImportProperties  – Error locating DataImportScheduler dataimport.properties file
    java.io.FileNotFoundException: /home/hadoop/cloudsolr/solr-4.10.4/example/conf/dataimport.properties (No such file or directory)

    Fix: move dataimport.properties into the directory named in the error, here /home/hadoop/cloudsolr/solr-4.10.4/example/conf/.

    8. Error message:

    ter  – Could not start Solr. Check solr/home property and the logs
    1146 [main] ERROR org.apache.solr.core.SolrCore  – null:org.apache.solr.common.SolrException: solr.xml does not exist in /home/hadoop/cloudsolr/solr-4.10.4/example/solr.xml cannot start Solr
        at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:62)

    Fix: copy the corresponding solr.xml into the directory named in the error.

    9. Error message:

    in] ERROR org.apache.solr.servlet.SolrDispatchFilter  – Could not start Solr. Check solr/home property and the logs
    3230 [main] ERROR org.apache.solr.core.SolrCore  – null:org.apache.solr.common.SolrException: Found multiple cores with the name [collection1], with instancedirs [/home/hadoop/cloudsolr/solr-4.10.4/example/example-schemaless/solr/collection1/] and [/home/hadoop/cloudsolr/solr-4.10.4/example/solr/collection1/]
    Fix: rename the example core under example-schemaless/solr/collection1 to something else, and change the name in its core.properties as well.
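    Name clashes like this can be found before starting Solr by scanning the Solr home for core.properties files. A sketch under the assumption that a core's name comes from the name= entry in core.properties and falls back to the directory name; find_duplicate_cores is a hypothetical helper, demonstrated here on a temporary directory tree:

```python
import os
import tempfile
from collections import defaultdict


def find_duplicate_cores(solr_home):
    """Return {core_name: [instance dirs]} for names that occur more than once."""
    dirs_by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(solr_home):
        if "core.properties" not in filenames:
            continue
        name = os.path.basename(dirpath)  # fallback: directory name
        with open(os.path.join(dirpath, "core.properties"), encoding="utf-8") as fh:
            for line in fh:
                key, sep, value = line.strip().partition("=")
                if sep and key.strip() == "name":
                    name = value.strip()
        dirs_by_name[name].append(dirpath)
    return {name: sorted(dirs) for name, dirs in dirs_by_name.items() if len(dirs) > 1}


# Demonstration: two instance dirs that both declare the core name collection1.
with tempfile.TemporaryDirectory() as home:
    for sub in ("solr/collection1", "example-schemaless/solr/collection1"):
        os.makedirs(os.path.join(home, sub))
        with open(os.path.join(home, sub, "core.properties"), "w", encoding="utf-8") as fh:
            fh.write("name=collection1\n")
    duplicates = find_duplicate_cores(home)
```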
    10. Another error while the scheduler was running:
    dding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-http-8.1.10.v20130312.jar' to classloader
    481115 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.SolrDataImportProperties  – Instance dir = /home/hadoop/cloudsolr/solr-4.10.4/example/
    481116 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Disconnected from server        ip
    481117 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Process ended at ................ 20.08.2015 01:37:00 595
    541047 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Process started at .............. 20.08.2015 01:38:00 525
    541049 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Full URL                http://ip:8983/solr/game/dataimport?command=delta-import&clean=true&commit=true
    541057 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Response message            Not Found
    541058 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Response code            404
    541058 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – JNDI not configured for solr (NoInitialContextEx)
    541059 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – using system property solr.solr.home: /home/hadoop/cloudsolr/solr-4.10.4/example
    541059 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – new SolrResourceLoader for deduced Solr Home: '/home/hadoop/cloudsolr/solr-4.10.4/example/'
    541061 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-deploy-8.1.10.v20130312.jar' to classloader
    541061 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-xml-8.1.10.v20130312.jar' to classloader
    541062 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-servlet-8.1.10.v20130312.jar' to classloader
    541062 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-io-8.1.10.v20130312.jar' to classloader
    541063 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-util-8.1.10.v20130312.jar' to classloader
    541063 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-security-8.1.10.v20130312.jar' to classloader
    541064 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-server-8.1.10.v20130312.jar' to classloader
    541065 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-continuation-8.1.10.v20130312.jar' to classloader
    541065 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/ext/' to classloader
    541066 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-webapp-8.1.10.v20130312.jar' to classloader
    541067 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/servlet-api-3.0.jar' to classloader
    541067 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-jmx-8.1.10.v20130312.jar' to classloader
    541068 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-http-8.1.10.v20130312.jar' to classloader
    541085 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.SolrDataImportProperties  – Instance dir = /home/hadoop/cloudsolr/solr-4.10.4/example/
    541085 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Disconnected from server        ip
    541086 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Process ended at ................ 20.08.2015 01:38:00 564
    541086 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Process started at .............. 20.08.2015 01:38:00 564
    541087 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Full URL                http://ip:8983/solr/resource/dataimport?command=delta-import&clean=true&commit=true
    541091 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Response message            Not Found
    541091 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Response code            404
    541091 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – JNDI not configured for solr (NoInitialContextEx)
    541091 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – using system property solr.solr.home: /home/hadoop/cloudsolr/solr-4.10.4/example
    541091 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – new SolrResourceLoader for deduced Solr Home: '/home/hadoop/cloudsolr/solr-4.10.4/example/'
    541092 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-deploy-8.1.10.v20130312.jar' to classloader
    541092 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-xml-8.1.10.v20130312.jar' to classloader
    541092 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-servlet-8.1.10.v20130312.jar' to classloader
    541092 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-io-8.1.10.v20130312.jar' to classloader
    541093 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-util-8.1.10.v20130312.jar' to classloader
    541093 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-security-8.1.10.v20130312.jar' to classloader
    541093 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-server-8.1.10.v20130312.jar' to classloader
    541093 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-continuation-8.1.10.v20130312.jar' to classloader
    541094 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/ext/' to classloader
    541094 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-webapp-8.1.10.v20130312.jar' to classloader
    541094 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/servlet-api-3.0.jar' to classloader
    541094 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-jmx-8.1.10.v20130312.jar' to classloader
    541094 [Timer-0] INFO  org.apache.solr.core.SolrResourceLoader  – Adding 'file:/home/hadoop/cloudsolr/solr-4.10.4/example/lib/jetty-http-8.1.10.v20130312.jar' to classloader
    541106 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.SolrDataImportProperties  – Instance dir = /home/hadoop/cloudsolr/solr-4.10.4/example/
    541106 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Disconnected from server       ip
    541111 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Process ended at ................ 20.08.2015 01:38:00 589

    Cause:

    The scheduler jar does not support this Solr version.

    Fix:

     Switch to version 1.1 of the jar.
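    With this much classloader noise in the log, the failing requests are easy to miss. A small sketch that pulls the response codes out of the scheduler lines, assuming the line layout shown above; failed_cores is a hypothetical helper, and the sample lines are copied from the log in this step:

```python
import re

# Matches e.g. "[game] <index update process> Response code            404"
LINE_RE = re.compile(
    r"\[(?P<core>\w+)\] <index update process> Response code\s+(?P<code>\d{3})"
)


def failed_cores(log_text):
    """Return {core: status_code} for every scheduler response that is not 200."""
    failures = {}
    for match in LINE_RE.finditer(log_text):
        code = int(match.group("code"))
        if code != 200:
            failures[match.group("core")] = code
    return failures


SAMPLE_LOG = """\
541058 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [game] <index update process> Response code            404
541091 [Timer-0] INFO  org.apache.solr.handler.dataimport.scheduler.BaseTimerTask  – [resource] <index update process> Response code            404
"""

failures = failed_cores(SAMPLE_LOG)
```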

     

     Another cause: the comparison operator in deltaQuery has to be written as an XML entity:

    deltaQuery="select id, content, avgfeel, state, sentencenum, articlenum,updatetime, createtime  from bns_word  where  updatetime  &gt;=  '${dataimporter.last_index_time}'">
    

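     Writing greater-than, less-than and other special characters in XML (strictly, only < and & must be escaped inside an attribute value, but escaping the others is harmless):

    Original     <     <=     >     >=     &      '       "
    Replacement  &lt;  &lt;=  &gt;  &gt;=  &amp;  &apos;  &quot;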
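    The replacements in the table do not have to be typed by hand. As an illustration, Python's standard library can escape a query before it is pasted into the config; the query text below is a shortened variant of the deltaQuery above:

```python
from xml.sax.saxutils import escape, quoteattr

delta_query = (
    "select id from bns_word "
    "where updatetime >= '${dataimporter.last_index_time}'"
)

# escape() replaces &, < and > with their entities.
escaped = escape(delta_query)

# quoteattr() escapes the value and wraps it in suitable quotes for an attribute.
attribute = "deltaQuery=" + quoteattr(delta_query)
```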

    11. The console reports a successful import, but Solr queries return no data

    Cause:

    In the configuration file
    db-data-config.xml:
             <entity name="bns_sentence" pk="id"
                    query ="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence"
                    deltaImportQuery ="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence where id='${dataimporter.delta.id}'"
     
    The variable must be written as dataimporter.delta.id, with a lowercase id.
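    Why the case matters can be illustrated with a toy placeholder resolver. This is not DIH's resolver: it assumes that lookups are case-sensitive and that an unknown name resolves to an empty string, which is consistent with the symptom described here (the import "succeeds" but the query matches no rows). resolve_delta is a hypothetical function:

```python
import re

PLACEHOLDER = re.compile(r"\$\{dataimporter\.delta\.(\w+)\}")


def resolve_delta(template, delta_row):
    """Substitute ${dataimporter.delta.<column>} using one row returned by deltaQuery.

    The lookup is case-sensitive; unknown names become an empty string
    (an assumption made for this illustration).
    """
    return PLACEHOLDER.sub(lambda m: str(delta_row.get(m.group(1), "")), template)


row = {"id": 42}  # the column comes back from MySQL as lowercase "id"
wrong = resolve_delta("select id from bns_sentence where id='${dataimporter.delta.ID}'", row)
right = resolve_delta("select id from bns_sentence where id='${dataimporter.delta.id}'", row)
```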
    

    12. Error on startup after finishing the configuration:

    48 [coreLoadExecutor-5-thread-1] ERROR org.apache.solr.core.CoreContainer  – Error creating core [collection1]: RequestHandler init failure
    org.apache.solr.common.SolrException: RequestHandler init failure
    	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:881)
    	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:654)
    	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.solr.common.SolrException: RequestHandler init failure
    	at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:172)
    	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:800)
    	... 8 more
    Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
    	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:490)
    	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:421)
    	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
    	at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:624)
    	at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:158)
    	... 9 more
    Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    	at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    	at java.lang.Class.forName0(Native Method)
    	at java.lang.Class.forName(Class.java:274)
    	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
    	... 13 more
    

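     Cause: the class org.apache.solr.handler.dataimport.DataImportHandler is not on the classpath (see the ClassNotFoundException above).

    Fix:

    Package download: http://yunpan.cn/cHTNPkchYSCrX (extraction code: e5ee)

    Copy the following files from solr-4.10.4/dist
    solr-dataimporthandler-4.10.4.jar
    solr-dataimporthandler-extras-4.10.4.jar
    into the lib directory of the Solr web application, then restart. For example, on a Tomcat deployment: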

    [root@devnote ~]# cp solr-4.5.1/dist/solr-dataimporthandler-*.jar /opt/tomcat/webapps/solr/WEB-INF/lib/
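    The same copy step can be scripted. A sketch that copies every solr-dataimporthandler-*.jar from a dist directory into a lib directory; copy_dih_jars is a hypothetical helper, and the demonstration runs on temporary directories with empty stand-in files rather than a real Solr installation:

```python
import glob
import os
import shutil
import tempfile


def copy_dih_jars(dist_dir, lib_dir):
    """Copy solr-dataimporthandler-*.jar from dist_dir into lib_dir; return copied names."""
    copied = []
    pattern = os.path.join(dist_dir, "solr-dataimporthandler-*.jar")
    for jar in sorted(glob.glob(pattern)):
        shutil.copy2(jar, lib_dir)
        copied.append(os.path.basename(jar))
    return copied


# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    dist, lib = os.path.join(tmp, "dist"), os.path.join(tmp, "lib")
    os.makedirs(dist)
    os.makedirs(lib)
    for name in (
        "solr-dataimporthandler-4.10.4.jar",
        "solr-dataimporthandler-extras-4.10.4.jar",
        "solr-core-4.10.4.jar",
    ):
        open(os.path.join(dist, name), "w").close()
    copied = copy_dih_jars(dist, lib)
    in_lib = sorted(os.listdir(lib))
```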

    13. Deleting all data from Solr:

    http://ip:port/solr/corename/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&stream.contentType=text/xml;charset=utf-8&commit=true

    Reference: http://josh-persistence.iteye.com/blog/2017155
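    The percent-encoded stream.body in the URL above is simply the XML command <delete><query>*:*</query></delete>. A short sketch that builds the same URL, with ip, port and corename left as the placeholders used in this post; delete_all_url is a hypothetical helper name:

```python
from urllib.parse import quote


def delete_all_url(host, port, core):
    """Build the update URL that deletes every document in a core and commits."""
    # Keep *, : and / unescaped so the result matches the URL shown above.
    body = quote("<delete><query>*:*</query></delete>", safe="*:/")
    return (
        f"http://{host}:{port}/solr/{core}/update/"
        f"?stream.body={body}&stream.contentType=text/xml;charset=utf-8&commit=true"
    )


url = delete_all_url("ip", "port", "corename")
```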

    14. If Solr is integrated with Tomcat (see http://www.aboutyun.com/thread-10496-1-1.html), this step is required:

    Edit the web.xml file in Solr's WEB-INF directory and
    add the following child element to the <web-app> element:
    
        <listener>
            <listener-class>
                org.apache.solr.handler.dataimport.scheduler.ApplicationListener
            </listener-class>
        </listener>
    

     

     15. If an "Unsupported Media Type" error appears and the delta import fails

    Cause: my Solr is deployed under Tomcat, and /WEB-INF/lib contained apache-solr-dataimportscheduler-1.0.jar.

    Fix: delete apache-solr-dataimportscheduler-1.0.jar from /WEB-INF/lib and replace it with solr-dataimportscheduler-1.1.jar.
    Package download: http://yunpan.cn/cHTNPkchYSCrX (extraction code: e5ee)

  • Original article: https://www.cnblogs.com/zhanggl/p/4744199.html