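Elasticsearch built-in analyzers
- Standard Analyzer: the default analyzer; splits text on word boundaries and lowercases
- Simple Analyzer: splits on any non-letter character (digits and symbols are dropped) and lowercases
- Stop Analyzer: lowercases and removes stop words (the, is, a, ...)
- Whitespace Analyzer: splits on whitespace only; does not lowercase
- Keyword Analyzer: no tokenization; the whole input is emitted as a single token
- Pattern Analyzer: splits with a regular expression, default \W+ (runs of non-word characters), and lowercases
- Language Analyzers: analyzers for more than 30 languages
- Custom Analyzer: a user-defined analyzer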
Standard Analyzer (default)
Splits text on word boundaries and lowercases the tokens; punctuation such as "!" is dropped.
GET /_analyze
{
"analyzer": "standard",
"text": "Trying Out Kibana! "
}
Result
{
"tokens" : [
{
"token" : "trying",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "out",
"start_offset" : 7,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "kibana",
"start_offset" : 11,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
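Outside Kibana Dev Tools, the same request can be sent over plain HTTP. The following is a minimal Python sketch, not part of the original notes. It assumes an unsecured test cluster at http://localhost:9200, and the helper names build_analyze_request and analyze are hypothetical.

```python
import json
import urllib.request

# Assumed address of a local, unsecured test cluster; adjust for your setup.
BASE_URL = "http://localhost:9200"


def build_analyze_request(analyzer, text, base_url=BASE_URL):
    """Build (but do not send) a POST /_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text):
    """Send the request and return only the token strings."""
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Example (needs a running cluster, so it is left commented out):
# analyze("standard", "Trying Out Kibana! ")
```

The _analyze endpoint accepts both GET and POST; POST is used here because some HTTP clients do not send a body with GET.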
Simple Analyzer
Splits on any non-letter character (digits and symbols are dropped) and lowercases the tokens.
GET /_analyze
{
"analyzer": "simple",
"text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
"tokens" : [
{
"token" : "try",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "ing",
"start_offset" : 5,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "out",
"start_offset" : 12,
"end_offset" : 15,
"type" : "word",
"position" : 2
},
{
"token" : "kib",
"start_offset" : 21,
"end_offset" : 24,
"type" : "word",
"position" : 3
},
{
"token" : "ana",
"start_offset" : 26,
"end_offset" : 29,
"type" : "word",
"position" : 4
}
]
}
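The splitting rule can be approximated locally with a regular expression. This is only a rough Python sketch of the behavior shown above, not how Elasticsearch implements it; the function name simple_analyze is hypothetical.

```python
import re
from typing import List


def simple_analyze(text: str) -> List[str]:
    # [^\W\d_]+ matches runs of letters only: word characters minus
    # digits and the underscore. Everything else acts as a separator.
    return [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]


print(simple_analyze("Try78ing 12 Out 1212 Kib45ana! "))
# ['try', 'ing', 'out', 'kib', 'ana']
```

The digits inside "Try78ing" and "Kib45ana" split each word in two, which is why the result has five tokens and none of the numbers.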
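Stop Analyzer
Works like the Simple Analyzer (splits on non-letters, lowercases) and additionally removes stop words (the, is, a, ...). The sample text below contains no stop words, so the output is identical to the Simple Analyzer's.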
GET /_analyze
{
"analyzer": "stop",
"text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
"tokens" : [
{
"token" : "try",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "ing",
"start_offset" : 5,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "out",
"start_offset" : 12,
"end_offset" : 15,
"type" : "word",
"position" : 2
},
{
"token" : "kib",
"start_offset" : 21,
"end_offset" : 24,
"type" : "word",
"position" : 3
},
{
"token" : "ana",
"start_offset" : 26,
"end_offset" : 29,
"type" : "word",
"position" : 4
}
]
}
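Because the sample text has no stop words, the request above does not show the filtering step. A rough Python sketch of the idea follows. It assumes a three-word stop list taken from the examples in these notes, which is only a subset of the real default English list; the names STOP_WORDS and stop_analyze are hypothetical.

```python
import re

# Illustrative subset only; the real default English list is longer.
STOP_WORDS = {"the", "is", "a"}


def stop_analyze(text):
    # Step 1: same splitting as the simple analyzer (letters only, lowercased).
    tokens = [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]
    # Step 2: drop every token that appears in the stop-word list.
    return [t for t in tokens if t not in STOP_WORDS]


print(stop_analyze("The Kibana console is a handy tool"))
# ['kibana', 'console', 'handy', 'tool']
```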
Whitespace Analyzer
Splits on whitespace only. Tokens keep their original case, digits, and punctuation.
GET /_analyze
{
"analyzer": "whitespace",
"text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
"tokens" : [
{
"token" : "Try78ing",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "12",
"start_offset" : 9,
"end_offset" : 11,
"type" : "word",
"position" : 1
},
{
"token" : "Out",
"start_offset" : 12,
"end_offset" : 15,
"type" : "word",
"position" : 2
},
{
"token" : "1212",
"start_offset" : 16,
"end_offset" : 20,
"type" : "word",
"position" : 3
},
{
"token" : "Kib45ana!",
"start_offset" : 21,
"end_offset" : 30,
"type" : "word",
"position" : 4
}
]
}
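The start_offset, end_offset, and position fields in the result can be reproduced locally. This is a minimal Python sketch of whitespace splitting with character offsets; the function name whitespace_analyze is hypothetical and the "type" field is omitted.

```python
import re


def whitespace_analyze(text):
    # \S+ matches each run of non-whitespace characters. m.start() and
    # m.end() give the character offsets, and the running index is the
    # token position.
    return [
        {
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": i,
        }
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


for tok in whitespace_analyze("Try78ing 12 Out 1212 Kib45ana! "):
    print(tok)
```

Note that end_offset is exclusive: "Try78ing" occupies characters 0 through 7 and is reported as 0 to 8.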
Keyword Analyzer
Does not tokenize; the entire input, including the trailing space, is emitted as one token.
GET /_analyze
{
"analyzer": "keyword",
"text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
"tokens" : [
{
"token" : "Try78ing 12 Out 1212 Kib45ana! ",
"start_offset" : 0,
"end_offset" : 31,
"type" : "word",
"position" : 0
}
]
}
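Pattern Analyzer
Splits with a regular expression. The default pattern is \W+ (runs of non-word characters), and tokens are lowercased by default.
GET /_analyze
{
"analyzer": "pattern",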
"text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
"tokens" : [
{
"token" : "try78ing",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "12",
"start_offset" : 9,
"end_offset" : 11,
"type" : "word",
"position" : 1
},
{
"token" : "out",
"start_offset" : 12,
"end_offset" : 15,
"type" : "word",
"position" : 2
},
{
"token" : "1212",
"start_offset" : 16,
"end_offset" : 20,
"type" : "word",
"position" : 3
},
{
"token" : "kib45ana",
"start_offset" : 21,
"end_offset" : 29,
"type" : "word",
"position" : 4
}
]
}
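The default behavior amounts to a regex split followed by lowercasing. The Python sketch below is a rough approximation; the function name pattern_analyze and its parameters are hypothetical. It passes re.ASCII on the assumption that \W in the default pattern only treats ASCII letters, digits, and underscore as word characters.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    # Split on the separator pattern and discard the empty strings that
    # re.split produces at the edges of the input.
    tokens = [t for t in re.split(pattern, text, flags=re.ASCII) if t]
    return [t.lower() for t in tokens] if lowercase else tokens


print(pattern_analyze("Try78ing 12 Out 1212 Kib45ana! "))
# ['try78ing', '12', 'out', '1212', 'kib45ana']
```

Unlike the Simple Analyzer, digits count as word characters here, so "Try78ing" stays in one piece and the numbers survive as tokens.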
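Language Analyzers
Elasticsearch ships analyzers for more than 30 languages.
Custom Analyzer
A user-defined analyzer, assembled from zero or more character filters, exactly one tokenizer, and zero or more token filters.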
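A custom analyzer runs character filters first, then a tokenizer, then token filters. The Python sketch below only models that order of operations and is not Elasticsearch configuration. Every name in it (make_analyzer, strip_html, whitespace_tokenizer, lowercase) is hypothetical, and the HTML stripping is a deliberately crude stand-in.

```python
import re


def make_analyzer(char_filters, tokenizer, token_filters):
    """Compose the three stages into a single analyze(text) function."""
    def analyze(text):
        for char_filter in char_filters:      # 1. rewrite the raw text
            text = char_filter(text)
        tokens = tokenizer(text)              # 2. cut the text into tokens
        for token_filter in token_filters:    # 3. modify or drop tokens
            tokens = token_filter(tokens)
        return tokens
    return analyze


def strip_html(text):
    # Crude tag removal, for illustration only.
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


my_analyzer = make_analyzer([strip_html], whitespace_tokenizer, [lowercase])
print(my_analyzer("<b>Trying</b> Out Kibana!"))
# ['trying', 'out', 'kibana!']
```

Swapping a single stage changes the result without touching the others, which is the point of defining a custom analyzer instead of picking a built-in one.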