
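ES Custom Mapping (Part 5)

When we insert data into ES and the target index does not exist, ES creates the index automatically with a default mapping. Sometimes the default mapping does not meet our needs, and then we can define the mapping ourselves.

A mapping is the structure of an index. It is comparable to a table schema in a relational database and defines field names, field types, and inverted-index settings.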

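Mapping types

Each index has a mapping type, which determines how its documents are indexed. A mapping type contains:

Meta-fields: control how a document's associated metadata is handled, e.g. _index, _type, _source
Fields or properties: the list of fields (properties) that belong to the document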

Field data types

String: text or keyword
Numeric: integer, long, short, byte, double, float, etc.
Boolean: boolean
Date: date
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range
Array: no dedicated type; any field can hold an array of values of the same type
Object: object
Nested: nested
Geo: geo_point, geo_shape
Specialized: ip, join, token_count, percolator, etc.

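keyword fields are not analyzed. The whole value is indexed as a single term, which suits exact matching, sorting, and aggregations. text fields are analyzed, meaning they are split into terms for full-text search. Because keyword values are not split, keyword fields generally take less space and are cheaper to index than text fields.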
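The following toy Python model illustrates the difference. It is not Elasticsearch code. `analyze_text` is a crude, hypothetical stand-in for the standard analyzer: it splits on word characters and lowercases. The real standard analyzer follows Unicode segmentation rules and, for example, splits Chinese text into single characters. `analyze_keyword` keeps the whole value as one term, so a search for part of the value finds nothing.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Crude stand-in for a text field's standard analyzer: split into words and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value: str) -> list[str]:
    """keyword fields are not analyzed: the whole value is a single term."""
    return [value]


def matches(query_term: str, indexed_terms: list[str]) -> bool:
    """A term lookup succeeds only if that exact term exists in the index."""
    return query_term in indexed_terms


value = "Quick Brown Fox"
print(analyze_text(value))                       # ['quick', 'brown', 'fox']
print(analyze_keyword(value))                    # ['Quick Brown Fox']
print(matches("brown", analyze_text(value)))     # True
print(matches("brown", analyze_keyword(value)))  # False
```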

Custom mapping

PUT mapping_test
{
  "mappings": {
    "test1": {
      "properties": {
        "name": {"type": "text"},
        "age": {"type": "long"}
      }
    }
  }
}

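Parameters

mapping_test: the index name
mappings: keyword
test1: the mapping type name (ES 6.x style; ES 7.x deprecates and ES 8.x removes mapping types, so there properties goes directly under mappings)
properties: keyword
name, age: field names

This creates a new index, mapping_test, with our custom mapping. The following response means the index was created successfully: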

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "mapping_test"
}

View the mapping:

GET mapping_test/_mapping

Result:

{
  "mapping_test" : {
    "mappings" : {
      "test1" : {
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}
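If you build mappings in code instead of typing them into Kibana, the request body is just nested JSON. The following minimal Python sketch uses only the standard library. `create_index_body` is a hypothetical helper, and sending the request is left out. The sketch builds the ES 6.x body used above (with the type name) and also the typeless form used from ES 7.x on.

```python
import json
from typing import Optional


def create_index_body(properties: dict, type_name: Optional[str] = None) -> dict:
    """Build the JSON body of a create-index request.

    With type_name set, properties are nested under the type (ES 6.x style,
    as in the mapping_test example). Without it, the body is typeless,
    the form used by ES 7.x and later.
    """
    mappings = {"properties": properties}
    if type_name is not None:
        mappings = {type_name: mappings}
    return {"mappings": mappings}


fields = {"name": {"type": "text"}, "age": {"type": "long"}}
print(json.dumps(create_index_body(fields, type_name="test1"), indent=2))
print(json.dumps(create_index_body(fields), indent=2))
```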

Mapping parameters

analyzer

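The analyzer for the field. The default is standard, and a third-party analyzer can be specified instead. The example below uses ik_smart, which is provided by the IK analysis plugin; the plugin must be installed before the index is created. Creating an index that already exists fails, so if mapping_test is left over from the previous example, delete it first with DELETE mapping_test: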

# use the ik_smart analyzer from the IK Chinese analysis plugin
PUT mapping_test
{
  "mappings": {
    "test1": {
      "properties": {
        "name": {
            "type": "text",
            "analyzer": "ik_smart"
        }
      }
    }
  }
}

boost

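Boosts the field's relevance score at query time. The higher a document's score, the higher it ranks in the results. boost sets the weight, and the default is 1.0: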

PUT mapping_test
{
  "mappings": {
    "test1": {
      "properties": {
        "name": {
            "type": "text",
            "boost": 2
        }
      }
    }
  }
}
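The following toy Python model shows why boost changes ranking. It assumes that each field's score is simply multiplied by its boost and that the per-field scores are then added up. Real scoring uses BM25, and the base scores below are made-up numbers. `total_score` and `rank` are hypothetical helpers.

```python
def total_score(field_scores: dict[str, float], boosts: dict[str, float]) -> float:
    """Toy model: sum of per-field scores, each multiplied by that field's boost (default 1.0)."""
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


def rank(docs: dict[str, dict[str, float]], boosts: dict[str, float]) -> list[str]:
    """Return document ids ordered from highest to lowest total score."""
    return sorted(docs, key=lambda doc_id: total_score(docs[doc_id], boosts), reverse=True)


# Made-up base scores: doc "a" matched in name, doc "b" matched in description.
docs = {"a": {"name": 0.5}, "b": {"description": 0.8}}
print(rank(docs, {}))             # ['b', 'a']  (no boost)
print(rank(docs, {"name": 2.0}))  # ['a', 'b']  (name boosted to 2)
```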

copy_to

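copy_to copies the values of several fields into a target field, which can then be queried as a single field. The following example copies the values of first_name and last_name into full_name:

# create the index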
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

# index a document
PUT my_index/doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

Search:

GET my_index/doc/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John"
      }
    }
  }
}

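Result. Note that full_name does not appear in _source, because copy_to changes what gets indexed, not the stored source document: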

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith"
        }
      }
    ]
  }
}
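The following toy Python model illustrates copy_to. `index_with_copy_to` is a hypothetical helper, not an ES API. Each value is indexed under its own field and also under every copy_to target, while the source document is left unchanged.

```python
def index_with_copy_to(source: dict, properties: dict) -> dict:
    """Toy model of copy_to: return field -> list of indexed values.

    The source document itself is never modified, which is why full_name
    does not show up in _source.
    """
    indexed: dict = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        targets = properties.get(field, {}).get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts one field name or a list
            targets = [targets]
        for target in targets:
            indexed.setdefault(target, []).append(value)
    return indexed


properties = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text"},
}
source = {"first_name": "John", "last_name": "Smith"}
indexed = index_with_copy_to(source, properties)
print(indexed["full_name"])   # ['John', 'Smith']
print("full_name" in source)  # False
```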

dynamic

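When an index is created, its fields are fixed by the mapping. The dynamic setting decides whether documents may introduce new fields, and it has three values:

true (default): new fields are allowed, and ES adds them to the mapping automatically
false: new fields are accepted and kept in _source, but they are not added to the mapping or indexed, so they cannot be searched
strict: new fields are rejected, and indexing a document with an unknown field throws an error; the field must first be added to the mapping explicitly

1. dynamic set to true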

PUT s1
{
  "mappings": {
    "doc": {
      "dynamic": true,
      "properties": {
        "name": {"type": "text"}
      }
    }
  }
}

# index a document that introduces a new field, age
PUT s1/doc/1
{
  "name": "rose",
  "age": 19
}

# age can be used as the query condition
GET s1/doc/_search
{
  "query": {
    "match": {
      "age": 19
    }
  }
}

Creating the index, indexing the document, and searching on age all work.

2. dynamic set to false

PUT s2
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "name": {"type": "text"}
      }
    }
  }
}

# index a document that introduces a new field, age
PUT s2/doc/1
{
  "name": "rose",
  "age": 19
}

# search using age as the query condition
GET s2/doc/_search
{
  "query": {
    "match": {
      "age": 19
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

The index is created and the document is indexed, but a search on the new field returns no hits.

3. dynamic set to strict

PUT s3
{
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "properties": {
        "name": {"type": "text"}
      }
    }
  }
}

PUT s3/doc/1
{
  "name": "rose",
  "age": 19
}

In strict mode the document is rejected with an error:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [age] within [doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [age] within [doc] is not allowed"
  },
  "status": 400
}
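The three modes can be summarized with a toy Python model. `apply_dynamic`, `guess_type`, and `StrictDynamicMappingError` are hypothetical names. `guess_type` is heavily simplified: real dynamic mapping also detects dates and maps strings to text with a keyword sub-field.

```python
class StrictDynamicMappingError(Exception):
    """Mirrors strict_dynamic_mapping_exception: strict mode met an unknown field."""


def guess_type(value) -> str:
    """Very simplified dynamic type detection."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def apply_dynamic(doc: dict, properties: dict, dynamic=True) -> dict:
    """Return the mapping properties after indexing doc under the given dynamic setting."""
    updated = dict(properties)
    for field, value in doc.items():
        if field in updated:
            continue
        if dynamic == "strict":
            raise StrictDynamicMappingError(
                f"mapping set to strict, dynamic introduction of [{field}] is not allowed"
            )
        if dynamic is True:
            updated[field] = {"type": guess_type(value)}
        # dynamic is False: the field stays only in _source, not in the mapping
    return updated


props = {"name": {"type": "text"}}
doc = {"name": "rose", "age": 19}
print(apply_dynamic(doc, props, True))   # age is added as long
print(apply_dynamic(doc, props, False))  # mapping unchanged
try:
    apply_dynamic(doc, props, "strict")
except StrictDynamicMappingError as err:
    print(err)
```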

index

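index defaults to true. When it is set to false, ES does not index the field, so the field cannot be used as a query condition, and searching on it returns an error: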

PUT s5
{
  "mappings": {
    "doc": {
      "properties": {
        "t1": {
          "type": "text",
          "index": true
        },
        "t2": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}

PUT s5/doc/1
{
  "t1": "论母猪的产前保养",
  "t2": "论母猪的产后护理"
}

GET s5/doc/_search
{
  "query": {
    "match": {
      "t1": "母猪"
    }
  }
}

# t2 has index set to false; use it as the query condition
GET s5/doc/_search
{
  "query": {
    "match": {
      "t2": "母猪"
    }
  }
}

Because t2 has index set to false, the search fails:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": """failed to create query: {
  "match" : {
    "t2" : {
      "query" : "母猪",
      "operator" : "OR",
      "prefix_length" : 0,
      "max_expansions" : 50,
      "fuzzy_transpositions" : true,
      "lenient" : false,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "boost" : 1.0
    }
  }
}""",
        "index_uuid": "jTRViM6SSRSERtEcSTSOFQ",
        "index": "s5"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "s5",
        "node": "d8Q4szIXR8KlHOram-TICA",
        "reason": {
          "type": "query_shard_exception",
          "reason": """failed to create query: {
  "match" : {
    "t2" : {
      "query" : "母猪",
      "operator" : "OR",
      "prefix_length" : 0,
      "max_expansions" : 50,
      "fuzzy_transpositions" : true,
      "lenient" : false,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "boost" : 1.0
    }
  }
}""",
          "index_uuid": "jTRViM6SSRSERtEcSTSOFQ",
          "index": "s5",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [t2] since it is not indexed."
          }
        }
      }
    ]
  },
  "status": 400
}

ignore_above

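Strings longer than ignore_above are not indexed or stored. For arrays of strings, ignore_above is applied to each element separately, and elements longer than ignore_above are not indexed or stored. The original value is still kept in _source.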

PUT s6
{
  "mappings": {
    "doc": {
      "properties": {
        "t1": {
          "type": "keyword",
          "ignore_above": 10
        }
      }
    }
  }
}

PUT s6/doc/1
{
  "t1": "123456"
}

# longer than ignore_above (10)
PUT s6/doc/2
{
  "t1": "1234567891011121314151617181920"
}

# the search returns no hits
GET s6/doc/_search
{
  "query": {
    "match": {
      "t1": "1234567891011121314151617181920"
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

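Note: ignore_above cannot be used on text fields; here it is set on a keyword field. Values longer than ignore_above are not indexed, so searching for them returns nothing, although the document itself is still stored.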
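The following toy Python model illustrates the rule. `indexed_terms` is a hypothetical helper. Like ignore_above, it counts characters and checks each array element on its own. A value whose length equals the limit is still indexed.

```python
def indexed_terms(values, ignore_above: int) -> list[str]:
    """Toy model of ignore_above on a keyword field.

    Accepts a single string or an array of strings. Each element is checked
    separately, and elements longer than ignore_above characters are skipped
    (they remain in _source but produce no term).
    """
    if isinstance(values, str):
        values = [values]
    return [v for v in values if len(v) <= ignore_above]


print(indexed_terms("123456", 10))                           # ['123456']
print(indexed_terms("1234567891011121314151617181920", 10))  # []
print(indexed_terms(["short", "this one is too long"], 10))  # ['short']
```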

index_options

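Controls what the inverted index records. Options:

docs: document IDs only
freqs: document IDs and term frequencies
positions: document IDs, term frequencies, and term positions
offsets: document IDs, term frequencies, term positions, and character offsets

text fields default to positions, and other fields default to docs. The more the index records, the more space it uses.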

fields

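fields defines sub-fields (multi-fields) for a field, and a field can have several. Each sub-field indexes the same value in a different way. In the example below, the keyword sub-field name_pinyin is attached under name_cn, so a person's name can be searched both as analyzed text and as an exact value. A sub-field is queried by its dotted path, name_cn.name_pinyin. Despite its name, this sub-field holds the same value as name_cn (张三), not its pinyin. Indexing pinyin would require a pinyin analyzer on the sub-field, for example one from a third-party plugin: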

PUT s7
{
  "mappings": {
    "doc": {
      "properties": {
        "name_cn": {
          "type": "text",
          "fields": {
            "name_pinyin": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

PUT s7/doc/1
{
  "name_cn": "张三"
}

# query the sub-field by its full path
GET s7/doc/_search
{
  "query": {
    "match": {
      "name_cn.name_pinyin": "张三"
    }
  }
}
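The following toy Python model illustrates multi-fields. One value is analyzed once per field, and each sub-field gets the path parent.sub. The field names `name` and `raw` and the helper functions are hypothetical. An English value is used because the crude analyzer here does not split Chinese the way the standard analyzer does.

```python
import re


def standard_like(value: str) -> list[str]:
    """Crude stand-in for the standard analyzer: split into words and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def keyword(value: str) -> list[str]:
    """keyword sub-field: the whole value is one term."""
    return [value]


def index_multi_field(field: str, value: str, analyzer, sub_fields: dict) -> dict:
    """Index one value under the parent field and under every sub-field (path = parent.sub)."""
    terms = {field: analyzer(value)}
    for sub_name, sub_analyzer in sub_fields.items():
        terms[f"{field}.{sub_name}"] = sub_analyzer(value)
    return terms


terms = index_multi_field("name", "Zhang San", standard_like, {"raw": keyword})
print(terms)  # {'name': ['zhang', 'san'], 'name.raw': ['Zhang San']}
```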

null_value

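Defines how null values are handled. A field whose value is null is not indexed, so it cannot be searched. null_value replaces an explicit null with the given value at index time. This is similar to DEFAULT in MySQL, except that the stored _source is not changed. text fields do not support this parameter, and the replacement value must have the same type as the field.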

PUT s8
{
  "mappings": {
    "doc": {
      "properties": {
        "name_cn": {
          "type": "keyword",
          "null_value": "张三"
        }
      }
    }
  }
}
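The following toy Python model illustrates null_value. `value_to_index` is a hypothetical helper. Only an explicit null is replaced at index time; an empty array is not replaced, and _source keeps the original null.

```python
def value_to_index(raw, null_value=None):
    """Toy model of null_value: an explicit null (None) is indexed as null_value.

    Without null_value, None means nothing is indexed (returns None).
    An empty list is not an explicit null, so it is not replaced.
    The document's _source is never rewritten.
    """
    if raw is None:
        return null_value
    return raw


source = {"name_cn": None}
print(value_to_index(source["name_cn"], null_value="张三"))  # 张三
print(value_to_index(source["name_cn"]))                     # None (not indexed)
print(value_to_index([], null_value="张三"))                  # []
print(source)                                                # {'name_cn': None}
```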

search_analyzer

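Specifies the analyzer used at search time. Be careful with this setting. As discussed in the earlier article on ES analysis, text is analyzed at two points: at index time and at search time. By default the search-time analyzer is the same as the index-time analyzer, so setting analyzer alone is usually enough. If you set both, keep them consistent; otherwise a search may fail to match data that is actually there.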

PUT s10
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
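The following toy Python model illustrates the mismatch described above. `match`, `standard_like`, and `whitespace_like` are hypothetical stand-ins, not the real analyzers. The index-time analyzer lowercases but the search-time analyzer does not, so a query for "Hello" misses a document containing "Hello World".

```python
import re


def standard_like(value: str) -> list[str]:
    """Crude stand-in for the standard analyzer: split into words and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def whitespace_like(value: str) -> list[str]:
    """Split on whitespace only and keep the original case."""
    return value.split()


def match(query: str, field_value: str, index_analyzer, search_analyzer) -> bool:
    """Toy match query: a hit if any search-time term equals an index-time term."""
    indexed = set(index_analyzer(field_value))
    return any(term in indexed for term in search_analyzer(query))


doc = "Hello World"
print(match("hello", doc, standard_like, standard_like))    # True: same analyzer
print(match("Hello", doc, standard_like, whitespace_like))  # False: "Hello" vs indexed "hello"
```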

Original article: https://www.cnblogs.com/midworld/p/13782875.html
