zoukankan      html  css  js  c++  java
  • IK分词器插件

    (1)源码 
    https://github.com/medcl/elasticsearch-analysis-ik

    这里写图片描述 
    (2)releases 
    https://github.com/medcl/elasticsearch-analysis-ik/releases

    这里写图片描述

    (3)复制zip地址 
    https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip

    4.2 安装插件

    (1)elasticsearch-plugin

    [es@node1 elasticsearch-6.1.1]$ bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
    -> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
    [=================================================] 100%   
    -> Installed analysis-ik
    [es@node1 elasticsearch-6.1.1]$ ll plugins/
    total 0
    drwxr-xr-x 2 es es 199 Jan  7 08:52 analysis-ik
    [es@node1 elasticsearch-6.1.1]$ 

    (2)查看目录

    [es@node1 elasticsearch-6.1.1]$ ll plugins/analysis-ik/
    total 1420
    -rw-r--r-- 1 es es 263965 Jan  7 08:52 commons-codec-1.9.jar
    -rw-r--r-- 1 es es  61829 Jan  7 08:52 commons-logging-1.2.jar
    -rw-r--r-- 1 es es  51658 Jan  7 08:52 elasticsearch-analysis-ik-6.1.1.jar
    -rw-r--r-- 1 es es 736658 Jan  7 08:52 httpclient-4.5.2.jar
    -rw-r--r-- 1 es es 326724 Jan  7 08:52 httpcore-4.4.4.jar
    -rw-r--r-- 1 es es   2666 Jan  7 08:52 plugin-descriptor.properties
    [es@node1 elasticsearch-6.1.1]$ 

    4.3 重启elasticsearch

    [es@node1 elasticsearch-6.1.1]$ bin/elasticsearch
    [2018-01-07T09:01:17,283][INFO ][o.e.n.Node ] [] initializing ...
    [2018-01-07T09:01:17,421][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]
    [2018-01-07T09:01:17,422][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]
    [2018-01-07T09:01:17,484][INFO ][o.e.n.Node ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override
    [2018-01-07T09:01:17,484][INFO ][o.e.n.Node ] version[6.1.1], pid[3445], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]
    [2018-01-07T09:01:17,485][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]
    [2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [aggs-matrix-stats]
    [2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [analysis-common]
    [2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [ingest-common]
    [2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-expression]
    [2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-mustache]
    [2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-painless]
    [2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [mapper-extras]
    [2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [parent-join]
    [2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [percolator]
    [2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [reindex]
    [2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [repository-url]
    [2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [transport-netty4]
    [2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [tribe]
    [2018-01-07T09:01:19,003][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded plugin [analysis-ik]
    [2018-01-07T09:01:21,678][INFO ][o.e.d.DiscoveryModule ] [cNWkQjt] using discovery type [zen]
    [2018-01-07T09:01:22,567][INFO ][o.e.n.Node ] initialized
    [2018-01-07T09:01:22,568][INFO ][o.e.n.Node ] [cNWkQjt] starting ...
    [2018-01-07T09:01:22,803][INFO ][o.e.t.TransportService ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}
    [2018-01-07T09:01:22,837][INFO ][o.e.b.BootstrapChecks ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
    [2018-01-07T09:01:25,940][INFO ][o.e.c.s.MasterService ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}
    [2018-01-07T09:01:25,949][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
    [2018-01-07T09:01:25,993][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}
    [2018-01-07T09:01:25,993][INFO ][o.e.n.Node ] [cNWkQjt] started
    [2018-01-07T09:01:26,077][INFO ][o.w.a.d.Monitor ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml
    [2018-01-07T09:01:26,799][INFO ][o.e.g.GatewayService ] [cNWkQjt] recovered [2] indices into cluster_state
    [2018-01-07T09:01:27,526][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][2], [test][0]] ...]).

    4.3 测试IK中文分词器的基本功能

    (1)ik_smart 
    其中pretty本意”漂亮的”,表示以美观的形式打印出JSON格式响应。

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text":"安徽省长江流域"
    }

    分词结果

    {
      "tokens": [ { "token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 }, { "token": "长江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 1 } ] }

    这里写图片描述 
    (2)ik_max_word

    GET _analyze?pretty
    {
      "analyzer": "ik_max_word",
      "text":"安徽省长江流域"
    }

    分词结果

    {
      "tokens": [ { "token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 }, { "token": "安徽", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 }, { "token": "省长", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 }, { "token": "长江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 3 }, { "token": "长江", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 4 }, { "token": "江流", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 5 }, { "token": "流域", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 6 } ] }

    这里写图片描述

    (3)新词

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text": "王者荣耀"
    }

    分词结果

    {
      "tokens": [ { "token": "王者", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }, { "token": "荣耀", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 } ] }

    这里写图片描述

    4.4 扩展字典

    (1)查看已有词典

    [es@node1 analysis-ik]$ pwd
    /opt/elasticsearch-6.1.1/config/analysis-ik
    [es@node1 analysis-ik]$ ll
    total 8260
    -rw-rw---- 1 es bigdata 5225922 Jan 7 08:52 extra_main.dic -rw-rw---- 1 es bigdata 63188 Jan 7 08:52 extra_single_word.dic -rw-rw---- 1 es bigdata 63188 Jan 7 08:52 extra_single_word_full.dic -rw-rw---- 1 es bigdata 10855 Jan 7 08:52 extra_single_word_low_freq.dic -rw-rw---- 1 es bigdata 156 Jan 7 08:52 extra_stopword.dic -rw-rw---- 1 es bigdata 625 Jan 7 08:52 IKAnalyzer.cfg.xml -rw-rw---- 1 es bigdata 3058510 Jan 7 08:52 main.dic -rw-rw---- 1 es bigdata 123 Jan 7 08:52 preposition.dic -rw-rw---- 1 es bigdata 1824 Jan 7 08:52 quantifier.dic -rw-rw---- 1 es bigdata 164 Jan 7 08:52 stopword.dic -rw-rw---- 1 es bigdata 192 Jan 7 08:52 suffix.dic -rw-rw---- 1 es bigdata 752 Jan 7 08:52 surname.dic [es@node1 analysis-ik]$

    (2)自定义词典

    [es@node1 analysis-ik]$ mkdir custom
    [es@node1 analysis-ik]$ vi custom/new_word.dic
    [es@node1 analysis-ik]$ cat custom/new_word.dic 
    老铁
    王者荣耀
    洪荒之力
    共有产权房
    一带一路
    [es@node1 analysis-ik]$ 

    (3)更新配置

    [es@node1 analysis-ik]$ vi IKAnalyzer.cfg.xml 
    [es@node1 analysis-ik]$ cat IKAnalyzer.cfg.xml 
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!--用户可以在这里配置自己的扩展字典 -->
        <entry key="ext_dict">custom/new_word.dic</entry>
         <!--用户可以在这里配置自己的扩展停止词字典-->
        <entry key="ext_stopwords"></entry>
        <!--用户可以在这里配置远程扩展字典 -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!--用户可以在这里配置远程扩展停止词字典-->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    </properties>
    [es@node1 analysis-ik]$

    (4)重启elasticsearch

    [es@node1 elasticsearch-6.1.1]$ bin/elasticsearch
    [2018-01-07T10:00:23,032][INFO ][o.e.n.Node ] [] initializing ...
    [2018-01-07T10:00:23,170][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]
    [2018-01-07T10:00:23,171][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]
    [2018-01-07T10:00:23,209][INFO ][o.e.n.Node ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override
    [2018-01-07T10:00:23,210][INFO ][o.e.n.Node ] version[6.1.1], pid[3574], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]
    [2018-01-07T10:00:23,210][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]
    [2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [aggs-matrix-stats]
    [2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [analysis-common]
    [2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [ingest-common]
    [2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-expression]
    [2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-mustache]
    [2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-painless]
    [2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [mapper-extras]
    [2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [parent-join]
    [2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [percolator]
    [2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [reindex]
    [2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [repository-url]
    [2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [transport-netty4]
    [2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [tribe]
    [2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded plugin [analysis-ik]
    [2018-01-07T10:00:27,866][INFO ][o.e.d.DiscoveryModule ] [cNWkQjt] using discovery type [zen]
    [2018-01-07T10:00:28,794][INFO ][o.e.n.Node ] initialized
    [2018-01-07T10:00:28,795][INFO ][o.e.n.Node ] [cNWkQjt] starting ...
    [2018-01-07T10:00:29,047][INFO ][o.e.t.TransportService ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}
    [2018-01-07T10:00:29,093][INFO ][o.e.b.BootstrapChecks ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
    [2018-01-07T10:00:32,210][INFO ][o.e.c.s.MasterService ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}
    [2018-01-07T10:00:32,217][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
    [2018-01-07T10:00:32,285][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}
    [2018-01-07T10:00:32,286][INFO ][o.e.n.Node ] [cNWkQjt] started
    [2018-01-07T10:00:32,326][INFO ][o.w.a.d.Monitor ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml
    [2018-01-07T10:00:32,905][INFO ][o.w.a.d.Monitor ] [Dict Loading] custom/new_word.dic
    [2018-01-07T10:00:33,279][INFO ][o.e.g.GatewayService ] [cNWkQjt] recovered [2] indices into cluster_state
    [2018-01-07T10:00:34,092][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][3]] ...]).

    从输出信息中可以看到

    [Dict Loading] custom/new_word.dic

    说明自定义词典已经加载了。

    (5)重启Kibana 
    重启Kibana后,从新执行下面命令

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text":"王者荣耀"
    }

    分词结果

    {
      "tokens": [ { "token": "王者荣耀", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 } ] }

    这里写图片描述

  • 相关阅读:
    sp2010 升级sp2013 用户无法打开网站
    powerviot install in sharepoint 2013
    can not connect cube in performancce dashboard
    westrac server security configure user info
    添加报表服务在多服务器场
    sharepoint 2013 office web app 2013 文档在线浏览 IE11 浏览器不兼容解决方法
    delete job definition
    目前付款申请单内网打开慢的问题
    item style edit in sharepoint 2013
    Could not load file or assembly '$SharePoint.Project.AssemblyFullName$'
  • 原文地址:https://www.cnblogs.com/larry-luo/p/11133336.html
Copyright © 2011-2022 走看看