  • es

    Elasticsearch built-in analyzers

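    • Standard Analyzer: the default analyzer; splits text on word boundaries and lowercases tokens
    • Simple Analyzer: splits on any non-letter character (digits and symbols are discarded) and lowercases tokens
    • Stop Analyzer: lowercases tokens and removes stop words (the, is, a)
    • Whitespace Analyzer: splits on whitespace; does not lowercase
    • Keyword Analyzer: no tokenization; the whole input is emitted as a single token
    • Pattern Analyzer: splits on a regular expression, \W+ by default (non-word characters)
    • Language analyzers: analyzers for more than 30 languages
    • Custom Analyzer: a user-defined analyzer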

    Standard Analyzer (the default analyzer)

    Splits text on word boundaries and lowercases each token.

    GET /_analyze
    {
      "analyzer": "standard",
      "text": "Trying Out Kibana! "
    }
    
    Result
    {
      "tokens" : [
        {
          "token" : "trying",
          "start_offset" : 0,
          "end_offset" : 6,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "out",
          "start_offset" : 7,
          "end_offset" : 10,
          "type" : "<ALPHANUM>",
          "position" : 1
        },
        {
          "token" : "kibana",
          "start_offset" : 11,
          "end_offset" : 17,
          "type" : "<ALPHANUM>",
          "position" : 2
        }
      ]
    }
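
To make the offsets in the response above concrete, here is a rough Python sketch that imitates the behavior for plain ASCII input. It is only an approximation: the real standard analyzer segments text by Unicode word-boundary rules, while this sketch simply matches runs of ASCII letters and digits. The function name `standard_like` is hypothetical and not part of Elasticsearch.

```python
import re

def standard_like(text):
    """Approximate the standard analyzer for simple ASCII text (assumption:
    runs of letters/digits stand in for real Unicode word segmentation)."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),   # lowercase each token
            "start_offset": match.start(),    # index of the first character
            "end_offset": match.end(),        # index just past the last character
            "position": position,             # ordinal of the token in the stream
        })
    return tokens

for tok in standard_like("Trying Out Kibana! "):
    print(tok)
```

Note that `end_offset` is exclusive, which is why "trying" spans 0 to 6 even though its last character sits at index 5.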
    
    

    Simple Analyzer

    Splits on any non-letter character (digits and symbols are discarded) and lowercases each token.

    GET /_analyze
    {
      "analyzer": "simple",
      "text": "Try78ing 12 Out 1212 Kib45ana! "
    }
    
    Result
    {
      "tokens" : [
        {
          "token" : "try",
          "start_offset" : 0,
          "end_offset" : 3,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "ing",
          "start_offset" : 5,
          "end_offset" : 8,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "out",
          "start_offset" : 12,
          "end_offset" : 15,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "kib",
          "start_offset" : 21,
          "end_offset" : 24,
          "type" : "word",
          "position" : 3
        },
        {
          "token" : "ana",
          "start_offset" : 26,
          "end_offset" : 29,
          "type" : "word",
          "position" : 4
        }
      ]
    }
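
The splitting rule can be sketched in a few lines of Python. This is an illustration of the idea, not Elasticsearch's implementation; `simple_like` is a hypothetical name, and `str.isalpha` is assumed to be a reasonable stand-in for "is a letter".

```python
def simple_like(text):
    """Keep maximal runs of letters, lowercased; everything else is a separator."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            # A digit, symbol, or space ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

print(simple_like("Try78ing 12 Out 1212 Kib45ana! "))
# ['try', 'ing', 'out', 'kib', 'ana']
```

Because digits act as separators, "Try78ing" breaks into "try" and "ing", and the purely numeric words disappear entirely.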
    
    

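    Stop Analyzer

    Lowercases tokens and removes stop words (the, is, a). The sample text below contains no stop words, so the output is the same as the Simple Analyzer's.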

    GET /_analyze
    {
      "analyzer": "stop",
      "text": "Try78ing 12 Out 1212 Kib45ana! "
    }
    
    
    Result
    
    {
      "tokens" : [
        {
          "token" : "try",
          "start_offset" : 0,
          "end_offset" : 3,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "ing",
          "start_offset" : 5,
          "end_offset" : 8,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "out",
          "start_offset" : 12,
          "end_offset" : 15,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "kib",
          "start_offset" : 21,
          "end_offset" : 24,
          "type" : "word",
          "position" : 3
        },
        {
          "token" : "ana",
          "start_offset" : 26,
          "end_offset" : 29,
          "type" : "word",
          "position" : 4
        }
      ]
    }
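
Since the sample text above contains no stop words, the filtering step is easier to see with a different sentence. The sketch below is an approximation in Python; `stop_like` is a hypothetical name, and the stop word set is written out here on the assumption that it matches the default English list.

```python
import re

# Assumption: the default English stop word list.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})

def stop_like(text, stop_words=ENGLISH_STOP_WORDS):
    """Split on non-letters, lowercase, then drop stop words."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    return [w for w in words if w not in stop_words]

print(stop_like("The quick fox is in a box"))
# ['quick', 'fox', 'box']
```

Lowercasing happens before the stop word check, so "The" is removed just like "the".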
    
    

    Whitespace Analyzer

    Splits on whitespace only; tokens are not lowercased.

    GET /_analyze
    {
      "analyzer": "whitespace",
      "text": "Try78ing 12 Out 1212 Kib45ana! "
    }
    
    Result
    {
      "tokens" : [
        {
          "token" : "Try78ing",
          "start_offset" : 0,
          "end_offset" : 8,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "12",
          "start_offset" : 9,
          "end_offset" : 11,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "Out",
          "start_offset" : 12,
          "end_offset" : 15,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "1212",
          "start_offset" : 16,
          "end_offset" : 20,
          "type" : "word",
          "position" : 3
        },
        {
          "token" : "Kib45ana!",
          "start_offset" : 21,
          "end_offset" : 30,
          "type" : "word",
          "position" : 4
        }
      ]
    }
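
A minimal Python sketch of the same behavior, including offsets, might look like this. `whitespace_like` is a hypothetical helper; it treats any run of non-whitespace characters as one token and leaves case and punctuation untouched.

```python
import re

def whitespace_like(text):
    """Emit each run of non-whitespace characters unchanged, with offsets."""
    return [
        {"token": m.group(), "start_offset": m.start(), "end_offset": m.end()}
        for m in re.finditer(r"\S+", text)
    ]

for tok in whitespace_like("Try78ing 12 Out 1212 Kib45ana! "):
    print(tok)
```

The last token keeps its trailing "!" and its capital "K", which is the main practical difference from the analyzers shown earlier.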
    
    
    

    Keyword Analyzer

    Does not tokenize; the entire input is emitted as a single token.

    GET /_analyze
    {
      "analyzer": "keyword",
      "text": "Try78ing 12 Out 1212 Kib45ana! "
    }
    Result
    {
      "tokens" : [
        {
          "token" : "Try78ing 12 Out 1212 Kib45ana! ",
          "start_offset" : 0,
          "end_offset" : 31,
          "type" : "word",
          "position" : 0
        }
      ]
    }
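
In code terms this analyzer is close to an identity function. The short Python sketch below (with the hypothetical name `keyword_like`) shows why `end_offset` is 31 in the response above: the single token covers the whole input, trailing space included.

```python
def keyword_like(text):
    """Return the whole input as one token; nothing is split or lowercased."""
    return [{
        "token": text,
        "start_offset": 0,
        "end_offset": len(text),  # covers every character, including spaces
        "position": 0,
    }]

print(keyword_like("Try78ing 12 Out 1212 Kib45ana! "))
```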
    
    

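    Pattern Analyzer

    Splits text using a regular expression; the default is \W+ (one or more non-word characters). Tokens are lowercased by default.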

    GET /_analyze
    {
      "analyzer": "pattern",
      "text": "Try78ing 12 Out 1212 Kib45ana! "
    }
    Result
    {
      "tokens" : [
        {
          "token" : "try78ing",
          "start_offset" : 0,
          "end_offset" : 8,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "12",
          "start_offset" : 9,
          "end_offset" : 11,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "out",
          "start_offset" : 12,
          "end_offset" : 15,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "1212",
          "start_offset" : 16,
          "end_offset" : 20,
          "type" : "word",
          "position" : 3
        },
        {
          "token" : "kib45ana",
          "start_offset" : 21,
          "end_offset" : 29,
          "type" : "word",
          "position" : 4
        }
      ]
    }
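
The same split can be reproduced with Python's `re` module. This is a sketch under assumptions: `pattern_like` is a hypothetical name, and `re.ASCII` is used so that `\W` means "anything other than a-z, A-Z, 0-9, or underscore", which may differ from how Elasticsearch treats non-ASCII text.

```python
import re

def pattern_like(text, pattern=r"\W+", lowercase=True):
    """Split text wherever the separator pattern matches; drop empty pieces."""
    parts = re.split(pattern, text, flags=re.ASCII)
    tokens = [p for p in parts if p]  # a trailing separator yields '' at the end
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_like("Try78ing 12 Out 1212 Kib45ana! "))
# ['try78ing', '12', 'out', '1212', 'kib45ana']

print(pattern_like("a,b,,c", pattern=","))
# ['a', 'b', 'c']
```

Unlike the Simple Analyzer, digits count as word characters here, so "Try78ing" stays in one piece.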
    
    
    

    Language analyzers: Elasticsearch provides analyzers for more than 30 languages

    Custom Analyzer

    A user-defined analyzer.
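
A custom analyzer is assembled from zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. The Python sketch below models that pipeline to show how the pieces compose. All function names are hypothetical, and the tiny stop word set is an assumption taken from the examples in the list at the top (the, is, a); real analyzers are defined in index settings rather than in application code.

```python
import re

def strip_html(text):
    """Character filter: replace HTML tags with a space before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text):
    """Tokenizer: split on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=frozenset({"the", "is", "a"})):
    """Token filter: drop stop words (assumed minimal list)."""
    return [t for t in tokens if t not in stop_words]

def custom_analyze(text, char_filters, tokenizer, token_filters):
    """Run char filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

print(custom_analyze(
    "<b>The</b> Kibana UI is a Tool",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, stop_filter],
))
# ['kibana', 'ui', 'tool']
```

The order of token filters matters: placing `stop_filter` before `lowercase_filter` would let "The" through, because the stop word check is case-sensitive.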

  • Original article: https://www.cnblogs.com/smallyi/p/13430614.html