  • Chinese analyzer (IK) for Elasticsearch 5.6.8

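    When installing the analyzer plugin, make sure its version exactly matches your Elasticsearch version!

    Download: https://github.com/medcl/elasticsearch-analysis-ik

    To keep the two versions identical, I deliberately downgraded Elasticsearch.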
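The version check can be done before installing anything. The following is a minimal sketch, not part of the original post: it assumes the plugin archive carries its version in the file name (for example `elasticsearch-analysis-ik-5.6.8.zip`, a naming assumption) and that you have read the Elasticsearch version from the `version.number` field returned by `GET /`. The helper names are hypothetical.

```python
import re


def plugin_version_from_filename(filename: str) -> str:
    """Extract an x.y.z version from an archive name.

    Assumes names like 'elasticsearch-analysis-ik-5.6.8.zip' (an assumption).
    """
    match = re.search(r"(\d+\.\d+\.\d+)", filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return match.group(1)


def versions_match(es_version: str, plugin_filename: str) -> bool:
    """Return True only when the plugin version equals the Elasticsearch version."""
    return es_version.strip() == plugin_version_from_filename(plugin_filename)


# es_version would normally come from the "version.number" field of GET /
print(versions_match("5.6.8", "elasticsearch-analysis-ik-5.6.8.zip"))
print(versions_match("6.2.4", "elasticsearch-analysis-ik-5.6.8.zip"))
```

If the check fails, either pick the plugin release that matches your cluster or, as done here, change the Elasticsearch version to match the plugin.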

    ik_smart (coarsest-grained segmentation: the text is split into the fewest, longest words)

    GET _analyze?pretty
    {
      "analyzer": "ik_smart",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 1
        }
      ]
    }
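The same request can be sent from a script instead of the console. This is a rough sketch using only the Python standard library; the base URL `http://localhost:9200` and the helper names `analyze` and `token_texts` are assumptions for illustration, and the sample response is copied from the output above.

```python
import json
import urllib.request


def analyze(text, analyzer="ik_smart", base_url="http://localhost:9200"):
    """POST the text to the _analyze endpoint and return the parsed JSON.

    base_url is an assumption; point it at your own node.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def token_texts(analyze_response):
    """Keep only the token strings, in the order the analyzer emitted them."""
    return [entry["token"] for entry in analyze_response["tokens"]]


# Response shown above for ik_smart, trimmed to the fields used here.
sample_response = {
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
        {"token": "国歌", "start_offset": 7, "end_offset": 9},
    ]
}
print(token_texts(sample_response))  # ['中华人民共和国', '国歌']
```

With a running node, `token_texts(analyze("中华人民共和国国歌"))` should give the same list, provided the IK plugin is installed.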
    

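    ik_max_word (finest-grained segmentation: every word the dictionary can find is emitted, including overlapping ones)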

    GET _analyze?pretty
    {
      "analyzer": "ik_max_word",
      "text": "中华人民共和国国歌"
    }
    
    {
      "tokens": [
        {
          "token": "中华人民共和国",
          "start_offset": 0,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 0
        },
        {
          "token": "中华人民",
          "start_offset": 0,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 1
        },
        {
          "token": "中华",
          "start_offset": 0,
          "end_offset": 2,
          "type": "CN_WORD",
          "position": 2
        },
        {
          "token": "华人",
          "start_offset": 1,
          "end_offset": 3,
          "type": "CN_WORD",
          "position": 3
        },
        {
          "token": "人民共和国",
          "start_offset": 2,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 4
        },
        {
          "token": "人民",
          "start_offset": 2,
          "end_offset": 4,
          "type": "CN_WORD",
          "position": 5
        },
        {
          "token": "共和国",
          "start_offset": 4,
          "end_offset": 7,
          "type": "CN_WORD",
          "position": 6
        },
        {
          "token": "共和",
          "start_offset": 4,
          "end_offset": 6,
          "type": "CN_WORD",
          "position": 7
        },
        {
          "token": "国",
          "start_offset": 6,
          "end_offset": 7,
          "type": "CN_CHAR",
          "position": 8
        },
        {
          "token": "国歌",
          "start_offset": 7,
          "end_offset": 9,
          "type": "CN_WORD",
          "position": 9
        }
      ]
    }
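To make the difference between the two outputs concrete, here is a small sketch that works directly on the tokens listed above. It checks that each `start_offset`/`end_offset` pair slices the token out of the input text, and that only `ik_max_word` produces overlapping tokens. The function names are hypothetical, and the slice check assumes the text contains only characters from the Basic Multilingual Plane, so that Python string indices line up with the offsets Elasticsearch reports.

```python
text = "中华人民共和国国歌"

# (token, start_offset, end_offset) copied from the two responses above.
smart_tokens = [("中华人民共和国", 0, 7), ("国歌", 7, 9)]
max_word_tokens = [
    ("中华人民共和国", 0, 7),
    ("中华人民", 0, 4),
    ("中华", 0, 2),
    ("华人", 1, 3),
    ("人民共和国", 2, 7),
    ("人民", 2, 4),
    ("共和国", 4, 7),
    ("共和", 4, 6),
    ("国", 6, 7),
    ("国歌", 7, 9),
]


def offsets_consistent(source, tokens):
    """True if slicing the source with each offset pair yields the token."""
    return all(source[start:end] == token for token, start, end in tokens)


def has_overlap(tokens):
    """True if any two tokens cover a shared character range."""
    for i, (_, start_a, end_a) in enumerate(tokens):
        for _, start_b, end_b in tokens[i + 1:]:
            if start_a < end_b and start_b < end_a:
                return True
    return False


print(offsets_consistent(text, smart_tokens))     # True
print(offsets_consistent(text, max_word_tokens))  # True
print(has_overlap(smart_tokens))                  # False
print(has_overlap(max_word_tokens))               # True
```

A common way to use the two analyzers together is `ik_max_word` at index time and `ik_smart` at search time, though which combination works best depends on your data.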
    
  • Original article: https://www.cnblogs.com/jiqing9006/p/9277055.html