  • Search engine: Zend_Lucene

    Zend Lucene

     

    1. General

    Zend_Search_Lucene is a general-purpose text search engine written entirely in PHP 5. It stores its index on the filesystem and does not require a database server.

    2. How to install Zend Lucene

    Download site: http://www.zend.com/community/downloads

    Zend Framework version: Zend Framework 1.9 minimal

    Download Zend Framework 1.9 minimal from the download site.

    Delete everything in the Zend folder except the following files and directories:

    Exception.php

    Loader/

    Loader.php

    Search/
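
    For require_once 'Zend/Search/Lucene.php' to resolve, the directory that contains the trimmed Zend folder has to be on PHP's include_path. The following is a minimal sketch of that setup, not part of the original project: the ./library location and the helper name missingZendEntries are assumptions made for illustration.

```php
<?php
// Hypothetical helper: report which of the entries listed above are missing
// from a trimmed-down Zend/ directory.
function missingZendEntries($zendDir)
{
    $required = array('Exception.php', 'Loader', 'Loader.php', 'Search');
    $missing = array();
    foreach ($required as $entry) {
        if (!file_exists($zendDir . DIRECTORY_SEPARATOR . $entry)) {
            $missing[] = $entry;
        }
    }
    return $missing;
}

// Assumed layout: the Zend/ folder sits in ./library next to this script.
$libraryDir = __DIR__ . '/library';

// Put the library directory on the include path so that
// require_once 'Zend/Search/Lucene.php' can be resolved.
set_include_path($libraryDir . PATH_SEPARATOR . get_include_path());

$missing = missingZendEntries($libraryDir . '/Zend');
if ($missing) {
    echo 'Missing: ' . implode(', ', $missing) . "\n";
} else {
    echo "Zend folder looks complete.\n";
}
```

    Running the check once after trimming the folder is a cheap way to catch an accidentally deleted file before the first require_once fails.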

     

    3. How to create an index

    The following example creates an index:

    <?php
    // File name: createindex.php
    require_once 'Zend/Search/Lucene.php';

    $productsData = array(
        0 => array("PID" => 1, "url" => "http://www.cybozu.jp", "productName" => "garoon", "Description" => "garoon Description", "lan" => "en"),
        1 => array("PID" => 2, "url" => "http://www.cybozu.jp", "productName" => "share360", "Description" => "share360 Description", "lan" => "en"),
        2 => array("PID" => 3, "url" => "http://www.cybozu.jp a", "productName" => "日本語の製品名前", "Description" => "日本語の製品", "lan" => "jp"),
        3 => array("PID" => 4, "url" => "http://www.cybozu.jp a", "productName" => "中文产品名", "Description" => "中文产品描述", "lan" => "zh")
    );

    // Passing true as the second argument creates a new index in the 'index' directory.
    $index = new Zend_Search_Lucene('index', true);

    foreach ($productsData as $productData)
    {
        // Build a fresh document for each product so that fields do not pile up across iterations.
        $doc = new Zend_Search_Lucene_Document();
        $doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productData['PID'], 'UTF-8'));
        $doc->addField(Zend_Search_Lucene_Field::Text('url', $productData['url'], 'UTF-8'));
        $doc->addField(Zend_Search_Lucene_Field::Text('productName', $productData['productName'], 'UTF-8'));
        $doc->addField(Zend_Search_Lucene_Field::Text('Description', $productData['Description'], 'UTF-8'));
        // lan is stored but not indexed; it is only read back from search hits.
        $doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productData['lan'], 'UTF-8'));
        $index->addDocument($doc);
    }

    // Commit and optimize once, after all documents have been added.
    $index->commit();
    $index->optimize();

    echo 'index has been created!';

    In the KB project the index data comes from a database. Using the method above, we can index all of the text stored in the database.

     

    4. Searching the index

    After creating an index, we can search it as follows:

    <?php
    // File name: search.php
    require_once 'Zend/Search/Lucene.php';

    $index = new Zend_Search_Lucene('index');
    $keywords = 'garoon';

    echo "Index contains {$index->count()} documents.\n";

    $query = Zend_Search_Lucene_Search_QueryParser::parse($keywords, 'utf-8');
    $hits = $index->find($query);

    foreach ($hits as $hit)
    {
        echo 'PID: '.$hit->PID.'<br>';
        echo 'Score: '.$hit->score.'<br>';
        echo 'url: '.$hit->url.'<br>';
        echo 'productName: '.$hit->productName.'<br>';
        echo 'lan: '.$hit->lan.'<br>';
    }

    To search text in multiple languages, we can read the stored lan value of each hit and display the results differently depending on it.
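
    One way to display results per language is to bucket the hits by their lan value before rendering. The sketch below is an illustration only: groupHitsByLanguage is a hypothetical helper, and the stand-in objects replace real hits so that the code runs without an index. In real code you would pass in the array returned by $index->find($query).

```php
<?php
// Hypothetical helper: bucket search hits by their stored 'lan' value.
function groupHitsByLanguage($hits)
{
    $groups = array();
    foreach ($hits as $hit) {
        $groups[$hit->lan][] = $hit;
    }
    return $groups;
}

// Stand-in objects so the sketch runs on its own; they only mimic the
// fields that the search example reads from each hit.
$hits = array(
    (object) array('PID' => 1, 'productName' => 'garoon', 'lan' => 'en'),
    (object) array('PID' => 3, 'productName' => '日本語の製品名前', 'lan' => 'jp'),
    (object) array('PID' => 2, 'productName' => 'share360', 'lan' => 'en'),
);

foreach (groupHitsByLanguage($hits) as $lan => $group) {
    echo $lan . ': ' . count($group) . " hit(s)\n";
}
```

    Because lan is stored as an unindexed field, it cannot be used in the query itself; grouping or filtering has to happen on the returned hits, as above.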

     

    5. Deleting and updating the index

    To update a document, we first find it in the index by keyword and delete it. Once the old document has been deleted, we add the new one. The following example deletes the product with PID 1 and adds it again with an updated description.

    <?php
    require_once 'Zend/Search/Lucene.php';

    $index = new Zend_Search_Lucene('index');

    // New product data to write to the index
    $productNewData = array("PID" => 1, "url" => "http://www.cybozu.jp", "productName" => "garoon", "Description" => "update garoon Description", "lan" => "en");

    // Find every document whose PID is 1
    $keywords = "PID:1";
    $hits = $index->find($keywords);

    // Delete the old document(s)
    foreach ($hits as $hit)
    {
        echo 'PID: '.$hit->PID.' has been deleted <br>';
        $index->delete($hit->id);
    }
    $index->commit();

    // Add the new product data to the index
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productNewData['PID'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $productNewData['url'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::Text('productName', $productNewData['productName'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::Text('Description', $productNewData['Description'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productNewData['lan'], 'UTF-8'));
    $index->addDocument($doc);

    $index->commit();
    $index->optimize();
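
    The delete-then-add steps above can be wrapped in one helper so that every update goes through the same path. This is a sketch under stated assumptions: replaceByPid is a hypothetical name, $index is expected to be a Zend_Search_Lucene instance (only find, delete, addDocument and commit are called on it), and $newDoc is a Zend_Search_Lucene_Document prepared as in the example above.

```php
<?php
// Hypothetical helper: replace every document whose PID keyword matches.
// $index  - object exposing find(), delete(), addDocument() and commit(),
//           as Zend_Search_Lucene does
// $pid    - product ID stored in the 'PID' keyword field
// $newDoc - the replacement document, already populated with fields
// Returns the number of old documents that were deleted.
function replaceByPid($index, $pid, $newDoc)
{
    $deleted = 0;
    foreach ($index->find('PID:' . $pid) as $hit) {
        // $hit->id is the internal document number within the index
        $index->delete($hit->id);
        $deleted++;
    }
    $index->addDocument($newDoc);
    $index->commit();
    return $deleted;
}
```

    A call such as replaceByPid($index, 1, $doc) would then stand in for the find, delete, add and commit steps. Optimizing is left out of the helper on purpose, since it is more expensive and can be run separately.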

     

    6. How to search Japanese or Chinese text with Lucene

    By default, Lucene can only search English text. In this project we must search English, Japanese, and Chinese text, so we have to replace Lucene's default analyzer.

    The following class extends Lucene's common analyzer:

    <?php

    // File Name:chinese.php

    require_once 'Zend/Search/Lucene/Analysis/Analyzer.php';

    require_once 'Zend/Search/Lucene/Analysis/Analyzer/Common.php';

     

    class CN_Lucene_Analyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common

    {

        private $_position;

        private $_cnStopWords = array( );

        

        public function setCnStopWords( $cnStopWords )

        {

            $this->_cnStopWords = $cnStopWords;

        }

     

        /**

        * Reset token stream

        */

        public function reset()

        {

            $this->_position = 0;

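            // Punctuation to strip before tokenizing. Extend this list with any
            // full-width (CJK) punctuation your content uses. Note that the " " entry
            // may have been a full-width space in the original listing; as a plain
            // space it joins adjacent words together.

            $search = array(",", "/", "\\", ".", ";", ":", "\"", "!", "~", "`", "^", "(", ")", "?", "-", "'", "<", ">", "$", "&", "%", "#", "@", "+", "=", "{", "}", "[", "]", "“", "”", "‘", "’", "—", " ", "…" );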

        

            $this->_input = str_replace( $search, '', $this->_input );

            $this->_input = str_replace( $this->_cnStopWords, ' ', $this->_input );

        }

     

        /**

        * Tokenization stream API

        * Get next token

        * Returns null at the end of stream

        *

        * @return Zend_Search_Lucene_Analysis_Token|null

        */

        public function nextToken()

        {

            if ($this->_input === null)

            {

                return null;

            }

            $len = strlen($this->_input);

            //print "Old string".$this->_input."<br />";

            while ($this->_position < $len)

            {

                // Skip spaces at the beginning

                while ($this->_position < $len && $this->_input[$this->_position] == ' ')

                {

                    $this->_position++;

                }

                // Stop if only trailing spaces were left

                if ($this->_position >= $len)

                {

                    break;

                }

                $termStartPosition = $this->_position;

                $temp_char = $this->_input[$this->_position];

                $isCnWord = false;

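                if(ord($temp_char)>127)

                {

                    // Non-ASCII byte: treat the text as CJK and build a token of
                    // two characters. Each character is assumed to occupy 3 bytes
                    // in UTF-8.

                    $i = 0;

                    while( $this->_position < $len && ord( $this->_input[$this->_position] )>127 )

                    {

                        $this->_position = $this->_position + 3;

                        $i ++;

                        if($i==2)

                        {

                            $isCnWord = true;

                            break;

                        }

                    }

     

                    // A single leftover character does not form a token

                    if($i==1) continue;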

                }

                else

                {

                    while ($this->_position < $len && ctype_alnum( $this->_input[$this->_position] ))

                    {

                        $this->_position++;

                    }

                    //echo $this->_position.":".$this->_input[$this->_position-1]."\n";

                }

                if ($this->_position == $termStartPosition)

                {

                    $this->_position++;

                    continue;

                }

        

                $tmp_str = substr($this->_input, $termStartPosition, $this->_position - $termStartPosition);

                

                $token = new Zend_Search_Lucene_Analysis_Token( $tmp_str, $termStartPosition,$this->_position );

                

                $token = $this->normalize($token);

     

                if($isCnWord)

                {

                    $this->_position = $this->_position - 3;

                }

     

                if ($token !== null)

                {

                    return $token;

                }

            }

            

            return null;

        }

    }

     

    With chinese.php in place, we can search Japanese and Chinese text in the KB. We must also add the following code before both creating an index and searching:

     

    require_once 'chinese.php';

    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new CN_Lucene_Analyzer());
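
    The analyzer in chinese.php splits CJK text into overlapping two-character tokens and keeps runs of ASCII letters and digits as single tokens. The standalone sketch below shows the same idea without Zend, so the effect on a sample string is easy to inspect. bigramTokens is a hypothetical function written for illustration; unlike the analyzer it works on whole UTF-8 characters rather than assuming 3 bytes per character, and it does not apply stop words.

```php
<?php
// Hypothetical illustration of bigram tokenization, independent of Zend.
// ASCII alphanumeric runs become one token each; runs of non-ASCII
// characters are emitted as overlapping pairs of characters.
function bigramTokens($text)
{
    $tokens = array();
    // Find runs of ASCII alphanumerics or runs of non-ASCII characters.
    preg_match_all('/[A-Za-z0-9]+|[^\x00-\x7F]+/u', $text, $matches);
    foreach ($matches[0] as $run) {
        if (preg_match('/^[A-Za-z0-9]/', $run)) {
            $tokens[] = $run;
            continue;
        }
        // Split the non-ASCII run into individual UTF-8 characters.
        $chars = preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY);
        // A run of one character yields no token, as in chinese.php.
        for ($i = 0; $i + 1 < count($chars); $i++) {
            $tokens[] = $chars[$i] . $chars[$i + 1];
        }
    }
    return $tokens;
}

print_r(bigramTokens('garoon 中文产品名'));
// Tokens: garoon, 中文, 文产, 产品, 品名
```

    Overlapping pairs let a two-character query such as 产品 match anywhere inside a longer phrase without a dictionary, at the cost of a larger index.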

     

    7. Does Zend Lucene need downtime?

    Zend Lucene does not require any downtime. When a new article is added, we can add it to the index at the same time. When an article is edited, we delete the old document and add the new one to the index.

     

     

     

  • Original article: https://www.cnblogs.com/likwo/p/1591319.html