  • Sphinx real-time indexing and highlighting

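    Last time I covered the differences between coreseek and sphinx, documented the coreseek installation in detail, and showed how to add the sphinx module to PHP; for the details, see my earlier article "coreseek详解". This time the focus is on how sphinx achieves (near) real-time indexing by combining a main index with a delta index.

    First, go to the coreseek configuration directory and edit the original configuration file. In brief, a coreseek configuration file is divided into the main data source, the delta data source, the main index, the delta index, the indexer settings, and the searchd daemon settings. Large systems may also need distributed indexes and distributed delta indexes, but those are too complex to cover here. Below is the sphinx configuration file I used in my project.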

     
    ## Main data source
    source main
    {
      type					= mysql
      sql_host				= localhost
      sql_user				= root
      sql_pass				=
      sql_db					= test
      sql_port				= 3306	# optional, default is 3306
      sql_sock				= /tmp/mysql.sock
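      sql_query_pre			= SET NAMES utf8
    #	 sql_query_pre			= SET SESSION query_cache_type=OFF
      # Record the highest post.id covered by this main build in sph_counter (row count_id = 1)
      sql_query_pre			= REPLACE INTO sph_counter SELECT 1, MAX(id) FROM post
      # Main index: every row up to and including the recorded id
      sql_query				= SELECT id, title, content FROM post WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE count_id = 1)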
      sql_ranged_throttle	= 0
      sql_query_info		= SELECT * FROM post WHERE id=$id
    }
     
     
    # Delta data source
    source delta : main
    {
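      # Overrides the inherited sql_query_pre lines, so a delta run does not update sph_counter
      sql_query_pre			= SET NAMES utf8
      # Delta index: only rows added after the last main build
      sql_query				= SELECT id, title, content FROM post WHERE id > (SELECT max_doc_id FROM sph_counter WHERE count_id = 1)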
    }
     
    # Main index
    index main
    {
      source			= main
      path			= /usr/local/coreseek/var/data/main
      docinfo			= extern
      mlock			= 0
      morphology		= none	
      min_word_len		= 1
      charset_type		= zh_cn.utf-8
      charset_dictpath	= /usr/local/mmseg/etc/
      html_strip				= 0
    }
    # Delta index
    index delta : main
    {
      source=delta
      path			= /usr/local/coreseek/var/data/delta
    #	morphology		= stem_en
    }
     
    ## Indexer
    indexer
    {
      mem_limit			= 128M
    }
     
    ### searchd daemon settings
    searchd
    {
     
      log					= /usr/local/coreseek/var/log/searchd.log
      query_log			= /usr/local/coreseek/var/log/query.log
     
      read_timeout		= 5
     
      client_timeout		= 300
     
      max_children		= 30
     
      pid_file			= /usr/local/coreseek/var/log/searchd.pid
     
      max_matches			= 1000
     
      seamless_rotate		= 1
     
      preopen_indexes		= 0
     
      unlink_old			= 1
     
      mva_updates_pool	= 1M
     
      max_packet_size		= 8M
     
      max_filters			= 256
     
      max_filter_values	= 4096
    }

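    Pay attention to how the SQL statements above are written: they are the core of the setup and decide whether sphinx can be configured successfully. Below are the structures of the sph_counter and post tables. Note that sph_counter is the table that supports sphinx real-time indexing.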

     
    CREATE TABLE `post` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(254) NOT NULL,
      `content` text,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=42 DEFAULT CHARSET=utf8;
     
    CREATE TABLE `sph_counter` (
      `count_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `max_doc_id` int(11) DEFAULT NULL,
      PRIMARY KEY (`count_id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
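To see why the two sql_query statements never overlap, it can help to trace them by hand. The following is only a sketch that mimics the id comparison in plain bash, with no database involved; the function name partition_ids and the sample ids are made up for illustration. It assumes the main build stored max_doc_id = 3 in sph_counter and that rows 4 and 5 were inserted afterwards.

```shell
#!/bin/bash
# Illustration only: mimics which ids each sql_query in the config would select.
# Usage: partition_ids MAX_DOC_ID ID...
# Prints one "main <id>" or "delta <id>" line per id.
partition_ids() {
  local max_doc_id=$1
  shift
  local id
  for id in "$@"; do
    if [ "$id" -le "$max_doc_id" ]; then
      echo "main $id"    # id <= max_doc_id: selected by source main
    else
      echo "delta $id"   # id > max_doc_id: selected by source delta
    fi
  done
}

# Counter was set to 3 at the last main rebuild; rows 4 and 5 arrived later.
partition_ids 3 1 2 3 4 5
```

Each id lands in exactly one of the two sources, and the next main rebuild moves the boundary up to the new MAX(id), which empties the delta again.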

    The following program (a search form plus find.php) shows how sphinx implements highlighting and real-time indexing.

     
    <html>
      <head>
        <title>sphinx</title>
        <meta charset="utf-8" />
      </head>
      <body>
        <form action="find.php" method="post">
          <input type="text"  name="search"/>
          <input type="submit" value="Submit">
        </form>
      </body>
    </html>
     
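    <?php
    // find.php: query searchd, load the matching rows from MySQL, print highlighted excerpts.
    header("content-type:text/html;charset=utf-8");
    $keyword = isset($_POST['search']) ? $_POST['search'] : '';
    $sphinx = new SphinxClient();
    $sphinx->SetServer("localhost", 9312);
    $sphinx->SetMatchMode(SPH_MATCH_ANY);
    // "*" searches every index, so both main and delta are queried
    $result = $sphinx->query($keyword, "*");
    if ($result === false || empty($result['matches'])) {
      exit('no results');
    }
    // The keys of $result['matches'] are document ids, i.e. post.id
    $ids = implode(',', array_map('intval', array_keys($result['matches'])));
    // mysqli is used here because the mysql_* functions were removed in PHP 7
    $conn = mysqli_connect('localhost', 'root', '', 'test') or die('mysql connect failed');
    mysqli_set_charset($conn, 'utf8');
    $res = mysqli_query($conn, "select * from post where id in($ids)");
    $opt = array("before_match" => "<font style='font-weight:bold;color:#f00'>", "after_match" => "</font>");
    while ($row = mysqli_fetch_assoc($res)) {
      echo '<pre>';
      // sphinx highlighting: wraps matched keywords in before_match/after_match
      $rows = $sphinx->buildExcerpts($row, "main", $keyword, $opt);
      print_r($rows);
      echo '</pre>';
    }
    $sphinx->close();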

    Running find.php with a search term prints the matching rows with the keyword highlighted.
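The visible effect of the before_match and after_match options in the buildExcerpts call above is that each matched keyword is wrapped in the two strings. The sketch below imitates only that wrapping step in bash, for one literal keyword; it is not how sphinx implements excerpts (the real call also tokenizes the query and trims the text to passages around the matches). The function name highlight and the default tags are assumptions made for this illustration.

```shell
#!/bin/bash
# Illustration only: wraps literal occurrences of one keyword in before/after strings.
# Ignores tokenization, case folding, and passage trimming done by the real buildExcerpts.
# Usage: highlight KEYWORD TEXT [BEFORE] [AFTER]
highlight() {
  local keyword=$1
  local text=$2
  local before="${3:-<b>}"
  local after="${4:-</b>}"
  local replacement="$before$keyword$after"
  local out
  # Quoting the pattern makes it a literal string rather than a glob;
  # quoting the replacement keeps "&" literal on bash 5.2 and later.
  out=${text//"$keyword"/"$replacement"}
  printf '%s\n' "$out"
}

highlight "sphinx" "coreseek is built on sphinx"
```

Passing the same strings as in the $opt array, for example `highlight "sphinx" "$text" "<font style='font-weight:bold;color:#f00'>" "</font>"`, gives markup in the style used by find.php.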

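    At this point most of the work is done, but indexing is still not real-time: rows added to the database table after the index was built cannot be found by a search. To fix that I wrote a shell script, main.sh: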

     
    #!/bin/bash
    /usr/local/coreseek/bin/indexer main --rotate >>/usr/local/coreseek/var/log/main.log

    And a second script, delta.sh:

     
    #!/bin/bash
    /usr/local/coreseek/bin/indexer delta --rotate >>/usr/local/coreseek/var/log/delta.log

    Then add both scripts to the Linux crontab, so that delta.sh runs every five minutes and main.sh runs once a day at 03:00:

     
    */5 * * * * /usr/local/coreseek/init/delta.sh
    00 03 * * * /usr/local/coreseek/init/main.sh
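Since main.sh and delta.sh differ only in the index name, they could be folded into one wrapper that also records when each run started and how it ended. This is a minimal sketch, not part of the original setup: the file name reindex.sh, the function run_index, and the INDEXER and LOG_DIR variables are hypothetical, and the defaults simply reuse the paths from the scripts above.

```shell
#!/bin/bash
# Sketch of a single wrapper that could replace main.sh and delta.sh.
# INDEXER and LOG_DIR are illustrative variables; defaults match the paths used above.
INDEXER="${INDEXER:-/usr/local/coreseek/bin/indexer}"
LOG_DIR="${LOG_DIR:-/usr/local/coreseek/var/log}"

# run_index NAME: rebuild one index with --rotate and append
# timestamped start/end lines (including the exit status) to NAME.log.
run_index() {
  local name=$1
  local log="$LOG_DIR/$name.log"
  local status=0
  echo "$(date '+%F %T') start $name" >>"$log"
  "$INDEXER" "$name" --rotate >>"$log" 2>&1 || status=$?
  echo "$(date '+%F %T') end $name exit=$status" >>"$log"
  return "$status"
}

# Only run when an index name is given, e.g. ./reindex.sh delta
if [ "$#" -gt 0 ]; then
  run_index "$1"
fi
```

With such a wrapper the two crontab entries would call the same file, once with the argument delta and once with main, and a failed rebuild would show up in the log as a nonzero exit value.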

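    That is all. One more point: every table that sphinx indexes must have a primary key, because sphinx uses the first column returned by sql_query as the unique document id.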

    Keep going!
  • Original article: https://www.cnblogs.com/doubilaile/p/4641926.html