  • Elasticsearch in depth: normalizer

    The normalizer property of keyword fields is similar to an analyzer, except that it guarantees that the analysis chain produces a single token.

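    The normalizer is applied before the keyword is indexed, and again at search time when the keyword field is searched, whether through a query parser such as the match query or through a term-level query such as the term query.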

    PUT index
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "my_normalizer": {
              "type": "custom",
              "char_filter": [],
              "filter": ["lowercase", "asciifolding"]
            }
          }
        }
      },
      "mappings": {
        "_doc": {
          "properties": {
            "foo": {
              "type": "keyword",
              "normalizer": "my_normalizer"
            }
          }
        }
      }
    }
    
    PUT index/_doc/1
    {
      "foo": "BÀR"
    }
    
    PUT index/_doc/2
    {
      "foo": "bar"
    }
    
    PUT index/_doc/3
    {
      "foo": "baz"
    }
    
    POST index/_refresh
    
    GET index/_search
    {
      "query": {
        "term": {
          "foo": "BAR"
        }
      }
    }
    
    GET index/_search
    {
      "query": {
        "match": {
          "foo": "BAR"
        }
      }
    }

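    Both queries above match documents 1 and 2, because the indexed value BÀR and the query term BAR are both normalized to bar, at index time and at query time respectively.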

    {
      "took": $body.took,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped" : 0,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "index",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "foo": "bar"
            }
          },
          {
            "_index": "index",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "foo": "BÀR"
            }
          }
        ]
      }
    }
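
The matching behaviour above can be imitated outside Elasticsearch to see why the query term BAR finds both documents. The following is a minimal sketch, not Lucene's actual filter implementation: `normalize_keyword`, `index` and `term_query` are hypothetical names, and accent stripping via Unicode decomposition is only an approximation of the `asciifolding` filter (it covers cases such as À but not every character the real filter folds).

```python
import unicodedata


def normalize_keyword(value: str) -> str:
    """Approximate a lowercase + asciifolding normalizer (assumption: NFKD
    decomposition plus dropping combining marks stands in for asciifolding)."""
    lowered = value.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    # The whole input stays one string: a normalizer never splits into tokens.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Toy inverted index: normalized keyword -> list of document ids.
docs = {"1": "BÀR", "2": "bar", "3": "baz"}
index = {}
for doc_id, value in docs.items():
    index.setdefault(normalize_keyword(value), []).append(doc_id)


def term_query(value: str) -> list:
    """Normalize the query term the same way, then look it up."""
    return sorted(index.get(normalize_keyword(value), []))


print(term_query("BAR"))  # ['1', '2']
```

Because the same function runs on both the stored value and the query term, the lookup compares normalized forms only, which is the effect the normalizer setting has on a keyword field.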

    In addition, because keywords are converted before they are indexed, aggregations return the normalized values:

    GET index/_search
    {
      "size": 0,
      "aggs": {
        "foo_terms": {
          "terms": {
            "field": "foo"
          }
        }
      }
    }

    This returns:

    {
      "took": 43,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped" : 0,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 0.0,
        "hits": []
      },
      "aggregations": {
        "foo_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "bar",
              "doc_count": 2
            },
            {
              "key": "baz",
              "doc_count": 1
            }
          ]
        }
      }
    }
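
The bucket keys above follow directly from counting the normalized values rather than the original `_source` values. A rough sketch of that counting, under the same assumptions as before (the hypothetical `normalize_keyword` helper only approximates `lowercase` plus `asciifolding`):

```python
import unicodedata
from collections import Counter


def normalize_keyword(value: str) -> str:
    """Approximation of a lowercase + asciifolding normalizer (assumption)."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Values of the foo field in documents 1, 2 and 3.
values = ["BÀR", "bar", "baz"]

# A terms aggregation buckets the indexed (normalized) values.
buckets = Counter(normalize_keyword(v) for v in values)

for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

The original spelling BÀR never appears as a bucket key, since only the normalized form is stored in the index structures that aggregations read.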
  • Original source: https://www.cnblogs.com/gmhappy/p/11864041.html