zoukankan      html  css  js  c++  java
  • elasticsearch5.6.8中文分词器

    安装分词器,务必确保版本一致!

    下载地址:https://github.com/medcl/elasticsearch-analysis-ik

    为了保证一致,我特地将elasticsearch进行降级。

    ik_smart

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 1
        }
      ]
    }
    

    ik_max_word

    GET _analyze?pretty
    {
      "analyzer": "ik_max_word",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "中华人民",
          "start_offset": 0,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 1
        },
        {
          "token": "中华",
          "start_offset": 0,
          "end_offset": 2,
          "type": "CN_WORD",
          "position": 2
        },
        {
          "token": "华人",
          "start_offset": 1,
          "end_offset": 3,
          "type": "CN_WORD",
          "position": 3
        },
        {
          "token": "人民共和国",
          "start_offset": 2,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 4
        },
        {
          "token": "人民",
          "start_offset": 2,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 5
        },
        {
          "token": "共和国",
          "start_offset": 4,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 6
        },
        {
          "token": "共和",
          "start_offset": 4,
          "end_offset": 6,
          "type": "CN_WORD",
          "position": 7
        },
        {
          "token": "国",
          "start_offset": 6,
          "end_offset": 7,
          "type": "CN_CHAR",
          "position": 8
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 9
        }
      ]
    }
    
  • 相关阅读:
    C++迭代器
    JdbcTemplateUtil 工具类分享
    PE和CDlinux二合一启动盘制作
    程序员自述——2019新年篇
    HTML/CSS常用单词
    JAVA学习常用单词
    Spring集成Mybatis3
    Spring集成struts2
    解决VS2010打开Web页面时经常由于内存较低而导致VS2010自动关闭的问题
    年终总结
  • 原文地址:https://www.cnblogs.com/jiqing9006/p/9277055.html
Copyright © 2011-2022 走看看