  • Sphinx quick start

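    1. Create a configuration file. For example, copy one of the existing templates to a new configuration file in the sphinx/etc directory.
      # MySQL data source configuration; for details see: http://www.coreseek.cn/products-install/mysql/
      # First import var/test/documents.sql into the database, then set the MySQL user, password, and database below
      
      # Source definition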
      source mysql
      {
          type                    = mysql
      
          sql_host                = localhost
          sql_user                = root
          sql_pass                = 
          sql_db                    = test
          sql_port                = 3306
          sql_query_pre            = SET NAMES utf8
      
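          sql_query                = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents
                                                                    # The first column of sql_query (id) must be an integer
                                                                    # title and content are string/text fields and are full-text indexed
          sql_attr_uint            = group_id             # The value read from SQL must be an integer
          sql_attr_timestamp        = date_added          # The value read from SQL must be an integer; used as a timestamp attribute
      
          sql_query_info_pre      = SET NAMES utf8                                        # Sets the correct character set for command-line queries
          sql_query_info            = SELECT * FROM documents WHERE id=$id                # Fetches the original row from the database for command-line queries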
      }
      
      # Index definition
      index mysql
      {
          source            = mysql                   # Name of the corresponding source
          path            = C:\wamp\apps\sphinx\var   # Change to the actual absolute path in use, e.g. /usr/local/coreseek/var/...
          docinfo            = extern
          mlock            = 0
          morphology        = none
          min_word_len        = 1
          html_strip                = 0
      
          # Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
          #charset_dictpath = /usr/local/mmseg3/etc/ # On BSD and Linux; must end with /
          charset_dictpath = C:\wamp\apps\sphinx\etc                        # On Windows; must end with /, preferably an absolute path, e.g. C:/usr/local/coreseek/etc/...
          charset_type        = zh_cn.utf-8
      }
      
      # Global indexer definition
      indexer
      {
          mem_limit            = 128M
      }
      
      # searchd service definition
      searchd
      {
          listen                  =   9312
          read_timeout        = 5
          max_children        = 30
          max_matches            = 1000
          seamless_rotate        = 0
          preopen_indexes        = 0
          unlink_old            = 1
          pid_file = C:\wamp\apps\sphinx\var\log\searchd_mysql.pid    # Change to the actual absolute path in use, e.g. /usr/local/coreseek/var/...
          log = C:\wamp\apps\sphinx\var\log\searchd_mysql.log         # Change to the actual absolute path in use, e.g. /usr/local/coreseek/var/...
          query_log = C:\wamp\apps\sphinx\var\log\query_mysql.log     # Change to the actual absolute path in use, e.g. /usr/local/coreseek/var/...
          binlog_path =                                               # Disables the binlog
      }
    2. Install searchd as a Windows service:

      c:\wamp\apps\sphinx\bin>searchd --install --config c:\wamp\apps\sphinx\etc\sphinx_mysql.conf

       

        Coreseek Fulltext 4.0 [ Sphinx 1.11-dev (r2540)]
        Copyright (c) 2007-2011,
        Beijing Choice Software Technologies Inc (http://www.coreseek.com)

        Installing service...
        Service 'searchd' installed succesfully.

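        The searchd service should now appear in the list under Control Panel -> Administrative Tools -> Services, but it has not been started yet: before starting it, we still need to finish the configuration and build the index with indexer. See the quick start tutorial for those steps.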

    3. Build the index

      c:\wamp\apps\sphinx\bin>indexer --all --config C:\wamp\apps\sphinx\etc\sphinx.conf

      

      indexing index 'mysql'...
      collected 3 docs, 0.0 MB
      sorted 0.0 Mhits, 100.0% done
      total 3 docs, 7545 bytes
      total 0.026 sec, 287198 bytes/sec, 114.19 docs/sec
      total 2 reads, 0.000 sec, 4.2 kb/call avg, 0.0 msec/call avg
      total 9 writes, 0.000 sec, 2.2 kb/call avg, 0.0 msec/call avg

      Running indexer with no arguments prints its help options:

    --config <file>         read configuration from specified file
                            (default is csft.conf)
    --all                   reindex all configured indexes
    --quiet                 be quiet, only print errors
    --verbose               verbose indexing issues report
    --noprogress            do not display progress
                            (automatically on if output is not to a tty)
    --buildstops <output.txt> <N>
                            build top N stopwords and write them to given file
    --buildfreqs            store words frequencies to output.txt
                            (used with --buildstops only)
    --merge <dst-index> <src-index>
                            merge 'src-index' into 'dst-index'
                            'dst-index' will receive merge result
                            'src-index' will not be modified
    --merge-dst-range <attr> <min> <max>
                            filter 'dst-index' on merge, keep only those documents
                            where 'attr' is between 'min' and 'max' (inclusive)
    --merge-klists
    --merge-killlists       merge src and dst kill-lists (default is to
                            apply src kill-list to dst index)
    --dump-rows <FILE>      dump indexed rows into FILE
    
    Examples:

    4. Run a search

      c:\wamp\apps\sphinx\bin>search.exe -c c:\wamp\apps\sphinx\etc\sphinx.conf twitter

      This specifies the configuration file and searches for "twitter".

    5. Integrate with PHP

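    <?php
    /*
     * Test Sphinx from PHP
     */
    include_once('sphinxapi.php');

    $sp = new SphinxClient();

    $sp->SetServer('127.0.0.1', 9312);  // searchd host and port
    $sp->SetConnectTimeout(5);          // connection timeout, in seconds
    $sp->SetLimits(0, 10);              // ($offset, $limit): the first 10 matches

    // Search keywords; fall back to an empty string if the form field is missing
    $keywords = isset($_POST['kw']) ? trim($_POST['kw']) : '';

    // '*' searches all indexes; a specific index name can be given instead
    $res = $sp->Query($keywords, '*');
    if ($res === false) {
        // Query() returns false on failure
        die('Sphinx query failed: ' . $sp->GetLastError());
    }
    print_r($res);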

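    The result looks like the following. The matches array holds the matching results, and each of its keys is the primary key (document id) of a matched row.

    Extract those keys and use them in an SQL IN clause against the database to fetch the matching rows.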

    Array
    (
        [error] => 
        [warning] => 
        [status] => 0
        [fields] => Array
            (
                [0] => title
                [1] => content
            )
    
        [attrs] => Array
            (
                [group_id] => 1
                [date_added] => 2
            )
    
        [matches] => Array
            (
                [2] => Array
                    (
                        [weight] => 2
                        [attrs] => Array
                            (
                                [group_id] => 3
                                [date_added] => 1270135548
                            )
    
                    )
    
                [1] => Array
                    (
                        [weight] => 1
                        [attrs] => Array
                            (
                                [group_id] => 2
                                [date_added] => 1270131607
                            )
    
                    )
    
            )
    
        [total] => 2
        [total_found] => 2
        [time] => 0.001
        [words] => Array
            (
                [twitter] => Array
                    (
                        [docs] => 2
                        [hits] => 5
                    )
    
            )
    
    )
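
    The key-extraction step described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original author's code: the helper name buildMatchQuery is hypothetical, the documents table and id column are taken from the sql_query in the configuration, and it assumes the default result shape in which matches is keyed by document id (that is, SetArrayResult(true) has not been called). It builds a parameterized IN query and uses MySQL's FIELD() so the rows come back in the same order as the Sphinx matches.

```php
<?php
/**
 * Hypothetical helper (not part of sphinxapi.php): builds a parameterized
 * SELECT for the documents matched by SphinxClient::Query().
 *
 * @param array|false $res result array from Query(), or false on failure
 * @return array|null array(sql string, list of integer ids), or null if nothing matched
 */
function buildMatchQuery($res)
{
    if (!is_array($res) || empty($res['matches'])) {
        return null; // query failed or nothing matched
    }

    // The keys of 'matches' are the document ids, in the order Sphinx returned them.
    $ids = array_map('intval', array_keys($res['matches']));

    // One "?" placeholder per id, e.g. "?,?" for two matches.
    $marks = implode(',', array_fill(0, count($ids), '?'));

    // FIELD() keeps the rows in the same order as the Sphinx matches.
    $sql = "SELECT * FROM documents WHERE id IN ($marks) ORDER BY FIELD(id, $marks)";

    return array($sql, $ids);
}

// Example using the result shown above, reduced to its 'matches' keys.
$res = array('matches' => array(2 => array('weight' => 2), 1 => array('weight' => 1)));
list($sql, $ids) = buildMatchQuery($res);
echo $sql, "\n";
```

    Because the placeholders appear twice (once in IN, once in FIELD), the ids have to be bound twice. With a PDO connection, assumed here as $pdo, that would be $stmt = $pdo->prepare($sql); followed by $stmt->execute(array_merge($ids, $ids));.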
  • Original post: https://www.cnblogs.com/gophper/p/4398518.html