  • Search engine: Apache Solr

    SOLR

     

    1. Solr server setup

    Java environment setup

    Download the Linux JDK 6 from this website:

    http://java.sun.com/javase/downloads/index.jsp

    After installing the JDK, edit /etc/profile and add these lines to the end of the file:

    JAVA_HOME=/usr/java/jdk1.6.0_16

    PATH=$JAVA_HOME/bin:$PATH

    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

    export JAVA_HOME

    export PATH

    export CLASSPATH

     

    /usr/java/jdk1.6.0_16 is the JDK installation folder. Change it if you installed the JDK somewhere else.

    Solr setup

    1. Download Solr (apache-solr-1.3.0.zip) from this website:

    http://ftp.kddilabs.jp/infosystems/apache/lucene/solr/

     

    2. Install Solr with the following steps:

    # unzip -q apache-solr-1.3.0.zip
    # cd apache-solr-1.3.0/example/
    # java -jar start.jar

    To confirm that Solr is running, load http://localhost:8983/solr/admin/ in a web browser. This is the main starting point for administering Solr.

    The Solr tutorial is at http://lucene.apache.org/solr/tutorial.html.

    2. Search Apache Solr with PHP

    This is a tutorial with a PHP Solr client example:

    http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

    We use the PHP Solr Client to access the Solr server. Download it from this website: http://code.google.com/p/solr-php-client/downloads/list

     

    Change the default Solr index data schema

    The Solr index data schema is defined in “apache-solr-1.3.0/example/solr/conf/schema.xml”.

    This is a snippet of the default schema:

    <schema name="example" version="1.1">
     ...
     <fields>
     <field name="id" type="string" indexed="true" stored="true" required="true" /> 
       <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
       <field name="name" type="text" indexed="true" stored="true"/>
     ...
     </fields>
     <uniqueKey>id</uniqueKey>
     ...
     <defaultSearchField>text</defaultSearchField>
     ...
    </schema>

    Edit the field elements and the default search field as follows:

    <field name="id" type="string" indexed="true" stored="true" required="true" />

     <field name="product_name" type="text" indexed="true" stored="true"/>

    <defaultSearchField>product_name</defaultSearchField>
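
    Because defaultSearchField must name a field that is actually declared, a quick sanity check before restarting can save a failed startup. The following is a minimal sketch, assuming the SimpleXML extension is available; the inline schema fragment and the helper name declaredFieldNames are illustrative only, and a real check would read conf/schema.xml instead.

```php
<?php
// Trimmed-down schema fragment used only for illustration.
$schemaXml = <<<XML
<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="product_name" type="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>product_name</defaultSearchField>
</schema>
XML;

// Hypothetical helper: returns the names of all <field> elements.
function declaredFieldNames($schemaXml)
{
    $schema = simplexml_load_string($schemaXml);
    $names = array();
    foreach ($schema->fields->field as $field) {
        $names[] = (string) $field['name'];
    }
    return $names;
}

$schema  = simplexml_load_string($schemaXml);
$default = (string) $schema->defaultSearchField;
$names   = declaredFieldNames($schemaXml);

// Report whether the default search field refers to a declared field.
echo in_array($default, $names, true)
    ? "defaultSearchField '$default' is declared\n"
    : "defaultSearchField '$default' is missing from <fields>\n";
```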

    To make this change take effect, restart the Solr server: stop the running instance, then start it again from the example folder:

    # java -jar start.jar

     

    Create index by PHP

    Using the PHP Solr Client, we can access Solr easily. This is an example of how to create an index with PHP.

    <?php

    require_once 'Apache/Solr/Service.php';

    // 10.60.0.111 is the IP address of the Solr server.
    $solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

    // Stop early if the server cannot be reached.
    if (!$solr->ping()) {
        exit('service not responding');
    }
    echo 'solr Service is available<br />';

    // Sample documents; the field names must match schema.xml.
    $parts = array(
        '1' => array(
            'id'           => 'a123',
            'product_name' => 'garoontest',
        ),
        '2' => array(
            'id'           => 'a456',
            'product_name' => 'share360,test',
        ),
    );

    // Convert each array into an Apache_Solr_Document.
    $documents = array();
    foreach ($parts as $item => $fields) {
        $part = new Apache_Solr_Document();
        foreach ($fields as $key => $value) {
            if (is_array($value)) {
                // Multi-valued field: add each value separately.
                foreach ($value as $datum) {
                    $part->setMultiValue($key, $datum);
                }
            } else {
                $part->$key = $value;
            }
        }
        $documents[] = $part;
    }

    // Send the documents, then commit so they become searchable.
    try {
        $solr->addDocuments($documents);
        $solr->commit();
        $solr->optimize();
    } catch (Exception $e) {
        echo $e->getMessage();
    }

    ?>
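
    For reference, addDocuments() posts the documents to Solr's update handler as an XML message. The sketch below builds such a message by hand for the two sample documents, assuming the standard <add><doc><field> update format. The helper name buildAddXml is hypothetical and not part of the PHP Solr Client, and the client itself may add further attributes to the message.

```php
<?php
// Hypothetical helper (not part of the PHP Solr Client): builds the XML
// body of a Solr <add> update message from an array of documents.
function buildAddXml(array $documents)
{
    $xml = '<add>';
    foreach ($documents as $fields) {
        $xml .= '<doc>';
        foreach ($fields as $name => $value) {
            // A multi-valued field is sent as repeated <field> elements.
            foreach ((array) $value as $datum) {
                $xml .= '<field name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">'
                      . htmlspecialchars($datum, ENT_NOQUOTES, 'UTF-8')
                      . '</field>';
            }
        }
        $xml .= '</doc>';
    }
    return $xml . '</add>';
}

$parts = array(
    array('id' => 'a123', 'product_name' => 'garoontest'),
    array('id' => 'a456', 'product_name' => 'share360,test'),
);

echo buildAddXml($parts), "\n";
```

    Escaping the values matters here: a product name containing < or & would otherwise produce an invalid message.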

    Search index by PHP

    This is an example of searching the index with PHP:

    <?php

    require_once 'Apache/Solr/Service.php';

    // 10.60.0.111 is the IP address of the Solr server.
    $solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

    // Stop early if the server cannot be reached.
    if (!$solr->ping()) {
        exit('service not responding');
    }
    echo 'success<br />';

    $offset = 0;    // index of the first result to return
    $limit  = 10;   // maximum number of results to return
    $query  = 'garoon';

    $response = $solr->search($query, $offset, $limit);

    if ($response->getHttpStatus() == 200) {
        if ($response->response->numFound > 0) {
            echo "$query <br />";
            // Print the stored fields of each matching document.
            foreach ($response->response->docs as $doc) {
                echo 'id: ' . $doc->id . ' product_name: ' . $doc->product_name . ' --';
                echo '<br />';
            }
            echo '<br />';
        }
    } else {
        echo $response->getHttpStatusMessage();
    }

    ?>
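
    Under the hood, a search is an HTTP request to Solr's select handler. The sketch below builds a roughly equivalent URL by hand, which is handy for testing a query in a browser. The helper name buildSelectUrl is hypothetical, and the wt=json parameter is an assumption about the response format requested; the client may send additional parameters.

```php
<?php
// Hypothetical helper: builds the URL of a Solr select request.
function buildSelectUrl($host, $port, $path, $query, $offset, $limit)
{
    $params = array(
        'q'     => $query,   // query string
        'start' => $offset,  // index of the first result to return
        'rows'  => $limit,   // maximum number of results
        'wt'    => 'json',   // response format (assumed)
    );
    // http_build_query() URL-encodes each value.
    return 'http://' . $host . ':' . $port . rtrim($path, '/') . '/select?'
         . http_build_query($params, '', '&');
}

echo buildSelectUrl('10.60.0.111', 8983, '/solr', 'garoon', 0, 10), "\n";
```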

    Delete index by PHP

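    <?php

    require_once 'Apache/Solr/Service.php';

    // 10.60.0.111 is the IP address of the Solr server.
    $solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

    // Stop early if the server cannot be reached.
    if (!$solr->ping()) {
        exit('service not responding <br />');
    }
    echo 'solr Service is available<br />';

    // Delete the document whose id is "a123".
    $response = $solr->deleteById('a123');
    echo $response->getHttpStatusMessage();

    // Commit so that the deletion becomes visible to searches.
    $solr->commit();

    ?>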
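
    A delete is also an XML message posted to the update handler. The sketch below shows the two basic forms, delete by id and delete by query, in their simplest shape; the helper names are hypothetical, and the client may add extra attributes to the <delete> element.

```php
<?php
// Hypothetical helpers: build the XML bodies of Solr delete messages.
function buildDeleteByIdXml($id)
{
    return '<delete><id>' . htmlspecialchars($id, ENT_NOQUOTES, 'UTF-8') . '</id></delete>';
}

function buildDeleteByQueryXml($query)
{
    return '<delete><query>' . htmlspecialchars($query, ENT_NOQUOTES, 'UTF-8') . '</query></delete>';
}

echo buildDeleteByIdXml('a123'), "\n";
echo buildDeleteByQueryXml('product_name:garoontest'), "\n";
```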

    Update index by PHP

    To update a document in the index, there are two methods:

        Method 1: delete the document by id, and then add a new one to the index.

        Method 2: use the add method to add the document directly. Because id is the unique key field (uniqueKey), the Solr server overwrites the old document with the new one.
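
    The overwrite behaviour of Method 2 can be pictured with a toy in-memory model, where the index is an array keyed by id. This is only an illustration of the semantics, not how Solr stores data; the function name addToIndex is hypothetical.

```php
<?php
// Toy in-memory model of an index keyed by the uniqueKey field "id".
function addToIndex(array $index, array $doc)
{
    // Adding a document with an existing id replaces the old document.
    $index[$doc['id']] = $doc;
    return $index;
}

$index = array();
$index = addToIndex($index, array('id' => 'a123', 'product_name' => 'garoontest'));
$index = addToIndex($index, array('id' => 'a123', 'product_name' => 'garoon v2'));

echo count($index), "\n";                   // prints 1
echo $index['a123']['product_name'], "\n";  // prints garoon v2
```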

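    How can Solr support full-text search in Chinese, Japanese, and English? Apache provides a CJK library for this purpose; for details, see: http://chaifeng.com/blog/2008/01/_apache_solr.html

    By default, Apache Solr does not support Chinese search: if a document contains Chinese, its content can only be found by searching for a complete Chinese sentence.
    The following uses the Apache Solr example application (schema.xml). Note the parts that need to be changed.
    Find the following three lines: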
         <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
           <analyzer type="index">
             <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    Change them to:
         <fieldType name="text" class="solr.TextField">
           <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
             <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
    Find the following two lines:
           <analyzer type="query">
             <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    Change them to:
           <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
             <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
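    Once the changes are made, restart Apache Solr and Chinese text can be searched. Documents that were imported before the change must be imported again.
    Remember that the positionIncrementGap="100" attribute from the original configuration must be removed; otherwise an exception is thrown.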
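
    The reason the CJK analyzer makes partial matches possible is that it indexes overlapping pairs of adjacent characters (bigrams) instead of whole whitespace-separated runs. The sketch below is a rough approximation of that idea for a run of CJK characters only; the function name cjkBigrams is hypothetical, and the real tokenizer also handles Latin text, digits, and punctuation.

```php
<?php
// Rough approximation of bigram tokenization for a run of CJK characters.
function cjkBigrams($text)
{
    // Split the UTF-8 string into individual characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (count($chars) < 2) {
        return $chars;
    }
    // Emit every pair of adjacent characters.
    $tokens = array();
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $tokens[] = $chars[$i] . $chars[$i + 1];
    }
    return $tokens;
}

echo implode(' ', cjkBigrams('中文检索')), "\n";  // prints 中文 文检 检索
```

    With tokens like these in the index, a query for 中文 or 检索 can match the document without quoting the whole sentence.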

    Note: when programming in PHP, make sure the source files are saved in UTF-8 encoding; otherwise creating the index will fail.
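
    One way to catch encoding problems before they reach Solr is to validate each field value first. The sketch below relies on the fact that preg_match() with the /u modifier fails on byte sequences that are not valid UTF-8; the helper name isUtf8 is hypothetical.

```php
<?php
// Returns true when $value is valid UTF-8. With the /u modifier,
// preg_match() returns false for invalid UTF-8 input.
function isUtf8($value)
{
    return preg_match('//u', $value) === 1;
}

var_dump(isUtf8('garoontest'));     // ASCII is valid UTF-8
var_dump(isUtf8("\xE4\xB8\xAD"));   // UTF-8 bytes of 中
var_dump(isUtf8("\xD6\xD0"));       // GBK bytes of 中, not valid UTF-8
```

    Values that fail the check can be converted (for example with iconv, if the source encoding is known) before the document is added.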

  • Original article: https://www.cnblogs.com/likwo/p/1591322.html