  • Chinese word segmentation: Sphinx and SCWS

    1. Install Sphinx
    cd /usr/local/src
    wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
    tar -zxvf sphinx-2.2.11-release.tar.gz
    cd sphinx-2.2.11-release
    yum install mysql56u-libs
    ./configure --prefix=/usr/local/sphinx --with-mysql
    make
    make install
    2. Install the Sphinx client library, libsphinxclient
    cd /usr/local/src/sphinx-2.2.11-release/api/libsphinxclient
    ./configure --prefix=/usr/local/libsphinxclient
    make
    make install
    3. Install the PHP extension
    cd /usr/local/src
    # For PHP 7: wget https://github.com/php/pecl-search_engine-sphinx/archive/php7.zip
    # For PHP versions below 7:
    wget http://pecl.php.net/get/sphinx-1.3.3.tgz
    tar -zxvf sphinx-1.3.3.tgz
    cd sphinx-1.3.3
    phpize
    ./configure --with-sphinx=/usr/local/libsphinxclient --with-php-config=/usr/bin/php-config
    make
    make install
    vim /etc/php.d/50-sphinx.ini
    # add this line to the file:
    extension = sphinx.so
    service php-fpm restart
    php -m | grep sphinx
    # expected output:
    sphinx

    Manual:
    http://docs.php.net/manual/zh/book.sphinx.php

    4. Build the indexes and start the service
    cp /usr/local/sphinx/etc/sphinx.conf.dist /usr/local/sphinx/etc/sphinx.conf
    /usr/local/sphinx/bin/indexer --all
    /usr/local/sphinx/bin/searchd
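
sphinx.conf.dist is only a sample, so the copied file normally has to be edited to point at your own database before indexer can build anything useful. As a rough sketch of what a minimal configuration might look like, the snippet below writes one to a scratch file. The database name, credentials, table, index name and all paths are placeholders invented for illustration, not values from this guide.

```shell
# Sketch only: every name, credential and path below is a placeholder.
CONF="${CONF:-/tmp/sphinx.conf.example}"   # hypothetical scratch path

cat > "$CONF" <<'EOF'
source src_articles
{
    type          = mysql
    sql_host      = localhost
    sql_user      = sphinx_user
    sql_pass      = change_me
    sql_db        = blog
    sql_port      = 3306
    sql_query_pre = SET NAMES utf8
    # The first selected column must be a unique positive integer document ID.
    sql_query     = SELECT id, title, content FROM articles
}

index articles
{
    source = src_articles
    path   = /usr/local/sphinx/var/data/articles
}

searchd
{
    listen   = 9312
    log      = /usr/local/sphinx/var/log/searchd.log
    pid_file = /usr/local/sphinx/var/log/searchd.pid
}
EOF

echo "wrote $CONF"
```

After reviewing the file, indexer and searchd can both be pointed at it with their --config option, for example /usr/local/sphinx/bin/indexer --config "$CONF" --all.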

    II. PHP word segmentation with SCWS
    Official site: http://www.ftphp.com/scws/
    1. Install
    cd /usr/local/src
    wget http://www.xunsearch.com/scws/down/scws-1.2.1.tar.bz2
    tar -jxvf scws-1.2.1.tar.bz2
    cd scws-1.2.1
    ./configure --prefix=/usr/local/scws
    make && make install
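    2. Dictionary
    wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
    Extract scws-dict-chs-utf8.tar.bz2 into the SCWS etc directory: /opt/server/scws/etc here, or /usr/local/scws/etc if you used the --prefix shown above.

    Dictionary: dict.utf8.xdb
    Rules: rules.utf8.ini

    3. PHP extension

    The source is in the phpext directory.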
    cd /usr/local/src/scws-1.2.1/phpext/
    phpize
    ./configure --with-scws=/usr/local/scws --with-php-config=/usr/bin/php-config
    make
    make install
    vim /etc/php.d/50-scws.ini
    # add this line to the file:
    extension = scws.so
    service php-fpm restart
    php -m | grep scws
    # expected output:
    scws
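
The test script in the next step does not call set_dict() or set_rule(), so the extension has to find the dictionary and rule files by itself. The SCWS PHP extension reads its defaults from the scws.default.charset and scws.default.fpath ini settings. A minimal sketch of such an ini file follows; the scratch path is hypothetical, and /usr/local/scws/etc is an assumption based on the --prefix used earlier.

```shell
# Sketch: write to a scratch file first; in this guide the real target would
# be /etc/php.d/50-scws.ini.
INI="${INI:-/tmp/50-scws.ini.example}"   # hypothetical scratch path

cat > "$INI" <<'EOF'
extension = scws.so
; Charset assumed when a script does not call set_charset().
scws.default.charset = utf8
; Directory searched for the dictionary and rule files (assumed location).
scws.default.fpath = /usr/local/scws/etc
EOF

cat "$INI"
```

Restart php-fpm after changing the real ini file so the new defaults take effect.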
    4. Segmentation test
    Documentation: http://www.ftphp.com/scws/docs.php

    See the test file test_all.php for details.
    Test file:
    vim /data/html/fenci1.php
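    <?php
    // Create a segmenter and declare the input as UTF-8.
    $so = scws_new();
    $so->set_charset('utf8');
    // set_dict() and set_rule() are not called here, so the extension tries to load
    // the dictionary and rule files from the path configured in the ini file.
    $so->send_text("我是一个中国人,我会C++语言,我也有很多T恤衣服");
    echo "<pre>";
    // get_result() returns the words in batches until the text is exhausted.
    while ($tmp = $so->get_result()) {
        print_r($tmp);
    }
    echo "</pre>";
    $so->close();
    ?>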

    Output when the page is requested:
    Array
    (
    [0] => Array
    (
    [word] => 我
    [off] => 0
    [len] => 3
    [idf] => 0
    [attr] => r
    )

    [1] => Array
    (
    [word] => 是
    [off] => 3
    [len] => 3
    [idf] => 0
    [attr] => v
    )

    [2] => Array
    (
    [word] => 一个
    [off] => 6
    [len] => 6
    [idf] => 4.289999961853
    [attr] => m
    )

    [3] => Array
    (
    [word] => 中国人
    [off] => 12
    [len] => 9
    [idf] => 4.9000000953674
    [attr] => n
    )

    )
    …………………………
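
The off and len values in this output are byte counts, not character counts: in UTF-8 each of these Chinese characters takes three bytes, which is why 一个 starts at offset 6 and has length 6. The sketch below cuts the same byte ranges out of the input string to show the correspondence. slice_bytes is a hypothetical helper written for this illustration, not part of SCWS, and it assumes the script is saved as UTF-8 and that tail -c and head -c are available.

```shell
# slice_bytes TEXT OFFSET LENGTH: print LENGTH bytes of TEXT starting at the
# 0-based byte OFFSET (hypothetical helper, not part of SCWS).
slice_bytes() {
    # tail -c +N starts at the N-th byte (1-based), hence OFFSET + 1.
    printf '%s' "$1" | tail -c "+$(( $2 + 1 ))" | head -c "$3"
}

text='我是一个中国人,我会C++语言,我也有很多T恤衣服'

slice_bytes "$text" 6 6;  echo    # off=6,  len=6
slice_bytes "$text" 12 9; echo    # off=12, len=9
```

Run as UTF-8, the two calls print 一个 and 中国人, matching entries [2] and [3] above.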

    III. Indexing
  • Original post: https://www.cnblogs.com/chenjiahe/p/6116688.html