  • Importing MySQL data into a Solr index

    Everything in this post was done on 64-bit CentOS with jdk1.7.79.

    Copy the MySQL JDBC driver jar into /home/hadoop/cloudsolr/solr-4.10.4/contrib/dataimporthandler/lib.

    Edit the solrconfig.xml of the core. My core is collection1, so the file is example/solr/collection1/conf/solrconfig.xml.

    Add the following to the configuration file:
       <lib dir="/home/hadoop/cloudsolr/solr-4.10.4/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
       <lib dir="/home/hadoop/cloudsolr/solr-4.10.4/contrib/dataimporthandler/lib/" regex=".*\.jar" />
    

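    Still in solrconfig.xml, register the handler. This is the version as I first entered it; the line break inside the class attribute causes the error described further down:

       <!-- the dataimport requestHandler -->
           <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.
    DataImportHandler">
                   <lst name="defaults">
                  <str name="config">db-data-config.xml</str>
                 </lst>
           </requestHandler>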
    

    vim db-data-config.xml

    <?xml version="1.0" encoding="UTF-8" ?>
    <dataConfig>
        <!-- Note: with MySQL, batchSize must be "-1", otherwise an exception is thrown -->
        <dataSource driver="com.mysql.jdbc.Driver"
                    url="jdbc:mysql://ip:3306/database"
                    user="laiba"
                    password="laiba123"
                    batchSize="-1"/>
        <document>
            <!-- deltaQuery finds the rows changed since the last import; deltaImportQuery then reloads each row by id.
                 The variable name after "delta." must match the case of the column returned by the query. -->
            <entity name="bns_article" pk="id"
                    query="select id,title,author,cover,digest,content from bns_article"
                    deltaImportQuery="select id,title,author,cover,digest,content from bns_article where id='${dataimporter.delta.id}'"
                    deltaQuery="select id,title,author,cover,digest,content from bns_article where updatetime > '${dataimporter.last_index_time}'">
                <field column="id" name="id"/>
                <field column="title" name="title"/>
                <field column="author" name="author"/>
                <field column="cover" name="cover"/>
                <field column="digest" name="digest"/>
                <field column="content" name="content"/>
            </entity>
        </document>
    </dataConfig>
    

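    When configuring an entity, note that in each field element, column is the column name in the MySQL table and name is the field name defined in Solr's schema, which is also the name shown in the admin page.

    Third: configure schema.xml and add the fields below (the database columns to be indexed). Following the IK analyzer setup from the previous post, a field can also be given a type that is tokenized.

    Add 2 fields: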

       <field name="cover" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="digest" type="string" indexed="true" stored="true" multiValued="false"/>
    

    After restarting the service, the following error appeared:

    HTTP ERROR 500
    
    Problem accessing /solr/. Reason:
    
        {msg=SolrCore 'collection1' is not available due to init failure: RequestHandler init failure,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: RequestHandler init failure
        at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)

    Cause:

       <!-- the dataimport requestHandler -->
           <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.
    DataImportHandler">
                   <lst name="defaults">
                  <str name="config">db-data-config.xml</str>
                 </lst>
           </requestHandler>
    The class attribute of <requestHandler name="/dataimport" ...> was broken across two lines, so the class name contained whitespace and the handler failed to initialize.

    Fix:
    Put <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> on a single line, with no whitespace inside the class name.
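
    A wrapped class attribute like this is hard to spot by eye. Below is a minimal sketch of a check that could be run against solrconfig.xml before restarting; it only assumes the file is well-formed XML, and the function name is my own, not part of Solr.

```python
import xml.etree.ElementTree as ET
from typing import List


def handlers_with_whitespace(solrconfig_xml: str) -> List[str]:
    """Return the names of requestHandler elements whose class attribute contains whitespace."""
    root = ET.fromstring(solrconfig_xml)
    bad = []
    for handler in root.iter("requestHandler"):
        cls = handler.get("class", "")
        # A line break inside an attribute value reaches us as whitespace,
        # so a wrapped class name is detectable with isspace().
        if any(ch.isspace() for ch in cls):
            bad.append(handler.get("name", "?"))
    return bad


if __name__ == "__main__":
    sample = """<config>
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.
DataImportHandler"/>
  <requestHandler name="/select" class="solr.SearchHandler"/>
</config>"""
    print(handlers_with_whitespace(sample))
```

    For the real file, read it with open(path, encoding="utf-8").read() and pass the text to handlers_with_whitespace; an empty list means no handler class was wrapped.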

    Start the cluster

    Import the data
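
    Besides the admin page, the import can be started by sending a command to the /dataimport handler registered above. This is a sketch only: the host and port (localhost:8983) are assumptions, and the helper names are my own.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr listens on its default port and the core is collection1.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def dataimport_url(command: str, base: str = SOLR_CORE_URL, **params: str) -> str:
    """Build a URL for the /dataimport handler, e.g. command='full-import'."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return f"{base}/dataimport?{urlencode(query)}"


def run_dataimport(command: str, **params: str) -> str:
    """Send the command to Solr and return the raw response body."""
    with urlopen(dataimport_url(command, **params)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call run_dataimport(...) against a running Solr.
    print(dataimport_url("full-import", clean="true", commit="true"))
```

    Passing "delta-import" as the command runs deltaQuery and deltaImportQuery instead of the full query.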

    Query
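
    The imported documents can also be queried over HTTP through the core's /select handler. A small sketch follows, again assuming localhost:8983 and the collection1 core; the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode

# Assumption: Solr listens on its default port and the core is collection1.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def select_url(q: str = "*:*", rows: int = 10, base: str = SOLR_CORE_URL) -> str:
    """Build a /select URL that asks for JSON output."""
    params = {"q": q, "rows": rows, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def docs_from_response(body: str) -> List[Dict[str, Any]]:
    """Extract the list of matching documents from a Solr JSON response."""
    return json.loads(body)["response"]["docs"]


if __name__ == "__main__":
    print(select_url("title:solr", rows=5))
```

    Fetch the URL with urllib.request.urlopen and pass the decoded body to docs_from_response to get the documents as dictionaries.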

    Reference: http://wiki.apache.org/solr/DIHQuickStart

    Importing multiple tables into Solr

    Configuration file: vim db-data-config.xml
      

        <document>
            <entity name="bns_article" pk="id"
                    query="select id,title,author,cover,digest,content from bns_article"
                    deltaImportQuery="select id,title,author,cover,digest,content from bns_article where id='${dataimporter.delta.id}'"
                    deltaQuery="select id,title,author,cover,digest,content from bns_article where updatetime > '${dataimporter.last_index_time}'">
                <field column="id" name="id"/>
                <field column="title" name="title"/>
                <field column="author" name="author"/>
                <field column="cover" name="cover"/>
                <field column="digest" name="digest"/>
                <field column="content" name="content"/>
            </entity>

            <entity name="bns_word" pk="id"
                    query="select id,content,avgfreel,state,sentencenum,articlenum,updatetime,createtime from bns_word"
                    deltaImportQuery="select id,content,avgfreel,state,sentencenum,articlenum,updatetime,createtime from bns_word where id='${dataimporter.delta.id}'"
                    deltaQuery="select id,content,avgfreel,state,sentencenum,articlenum,updatetime,createtime from bns_word where updatetime > '${dataimporter.last_index_time}'">
                <field column="id" name="id"/>
                <field column="content" name="content"/>
                <field column="avgfreel" name="avgfreel"/>
                <field column="state" name="state"/>
                <field column="sentencenum" name="sentencenum"/>
                <field column="articlenum" name="articlenum"/>
                <field column="updatetime" name="updatetime"/>
                <field column="createtime" name="createtime"/>
            </entity>
        </document>
    

    Configure schema.xml

    Add the fields:

       <field name="avgfreel" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="state" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="sentencenum" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="articlenum" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="updatetime" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="createtime" type="string" indexed="true" stored="true" multiValued="false"/>
    

    Entities for the newly added MySQL tables:

            <entity name="bns_sentence" pk="id"
                    query="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence"
                    deltaImportQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence where id='${dataimporter.delta.id}'"
                    deltaQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence">
                <field column="id" name="id"/>
                <field column="uid" name="uid"/>
                <field column="createname" name="createname"/>
                <field column="createheadimg" name="createheadimg"/>
                <field column="wid" name="wid"/>
                <field column="word" name="word"/>
                <field column="content" name="content"/>
                <field column="articlenum" name="articlenum"/>
                <field column="state" name="state"/>
                <field column="feel" name="feel"/>
                <field column="forwardnum" name="forwardnum"/>
                <field column="supportnum" name="supportnum"/>
                <field column="updatetime" name="updatetime"/>
                <field column="createtime" name="createtime"/>
            </entity>

            <entity name="bns_user" pk="id"
                    query="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user"
                    deltaImportQuery="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user where id='${dataimporter.delta.id}'"
                    deltaQuery="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user">
                <field column="id" name="id"/>
                <field column="username" name="username"/>
                <field column="password" name="password"/>
                <field column="money" name="money"/>
                <field column="nickname" name="nickname"/>
                <field column="headimg" name="headimg"/>
                <field column="sex" name="sex"/>
                <field column="articlenum" name="articlenum"/>
                <field column="sentencenum" name="sentencenum"/>
                <field column="wordnum" name="wordnum"/>
                <field column="createtime" name="createtime"/>
            </entity>
    

    Configure schema.xml

    Add the fields:

       <field name="uid" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="word" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="feel" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="forwardnum" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="supportnum" type="string" indexed="true" stored="true" multiValued="false"/>

       <field name="username" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="password" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="money" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="nickname" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="headimg" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="sex" type="string" indexed="true" stored="true" multiValued="false"/>
       <field name="wordnum" type="string" indexed="true" stored="true" multiValued="false"/>
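
    With this many fields, a name in db-data-config.xml that has no matching definition in schema.xml is easy to miss. Here is a rough sketch of a cross-check between the two files. It assumes both are well-formed XML and it ignores dynamicField rules, so a name it reports may still be accepted by Solr; the function names are my own.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def dih_field_names(data_config_xml: str) -> Set[str]:
    """Collect the Solr field names that the DIH entities map to."""
    root = ET.fromstring(data_config_xml)
    names = set()
    for field in root.iter("field"):
        # If name is omitted, the column name is used as the Solr field name.
        name = field.get("name") or field.get("column")
        if name:
            names.add(name)
    return names


def schema_field_names(schema_xml: str) -> Set[str]:
    """Collect the explicitly declared field names from schema.xml."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_in_schema(data_config_xml: str, schema_xml: str) -> List[str]:
    """Return DIH field names that have no <field> definition in the schema."""
    return sorted(dih_field_names(data_config_xml) - schema_field_names(schema_xml))


if __name__ == "__main__":
    data_config = """<dataConfig><document><entity name="bns_user">
      <field column="id" name="id"/>
      <field column="headimg" name="headimg"/>
    </entity></document></dataConfig>"""
    schema = """<schema><fields>
      <field name="id" type="string"/>
      <field name="heading" type="string"/>
    </fields></schema>"""
    print(missing_in_schema(data_config, schema))
```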
    

    Problem encountered: even when importing only a few records, indexing is very slow.
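
    I have not tracked down the cause. One way to watch an import while it runs is to poll the handler's status command. The sketch below assumes JSON output (wt=json), a response with "status" and "statusMessages" keys, and the localhost:8983 address; the helper names are hypothetical.

```python
import json
from typing import Dict, Tuple
from urllib.request import urlopen

# Assumption: Solr listens on its default port and the core is collection1.
STATUS_URL = "http://localhost:8983/solr/collection1/dataimport?command=status&wt=json"


def summarize_status(body: str) -> Tuple[str, Dict[str, str]]:
    """Return (status, statusMessages) from a DIH status response in JSON."""
    data = json.loads(body)
    return data.get("status", "unknown"), data.get("statusMessages", {})


def fetch_status(url: str = STATUS_URL) -> Tuple[str, Dict[str, str]]:
    """Query a running Solr instance and summarize the import status."""
    with urlopen(url) as resp:
        return summarize_status(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Solr instance at STATUS_URL.
    status, messages = fetch_status()
    print(status)
    for key, value in messages.items():
        print(f"{key}: {value}")
```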

  • Original post: https://www.cnblogs.com/zhanggl/p/4726901.html