Elasticsearch: removal of mapping types

Removal of mapping types

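Before 8.x, Elasticsearch's logical storage model mimicked the three levels of a relational database: index (database) / type (table) / document (record). This structure is deprecated in 7.x, and 8.x removes types entirely, leaving a two-level structure: index (database) / document (record). Because of this change, the following APIs are affected in 7.x: index creation, put mapping, get mapping, put template, get template, and get field mappings.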

What is a mapping type?

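In Elasticsearch, every document is stored in an index and assigned to a mapping type. A mapping type represented the kind of document or entity being indexed. For example, a twitter index might have a user type and a tweet type. Each mapping type had its own fields, so the user type might have full_name, user_name, and email fields, while the tweet type might have content, tweeted_at, and user_name fields. Each document carried a _type metadata field holding its type name, and a search could be limited to one or more types by naming them in the URL: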
```
GET twitter/user,tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}
```
Drawbacks of mapping types

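In a relational database, tables are independent of each other: columns with the same name in different tables do not interfere. Mapping types in Elasticsearch do not work that way.
1. Fields with the same name in different mapping types of the same index are backed by the same Lucene field, so they must have the same mapping. For example, you cannot make a field named deleted a date in one type and a boolean in another type of the same index.
2. Different entities stored in the same index usually have few fields in common. Because Elasticsearch stores data per index, this makes the stored data sparse and hurts Lucene's ability to compress documents.
Mapping types therefore add little value, so Elasticsearch deprecated them in 7.x (and removes them in 8.x). This is the better choice for both query and storage efficiency.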
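To make the first drawback concrete, here is a minimal Python sketch that scans a legacy multi-type mapping (a plain dict shaped like the pre-7.x `mappings` body used later in this post) for same-named fields with different datatypes. The helper `find_field_conflicts` is hypothetical, and it only compares the `type` parameter; Elasticsearch actually requires the whole field mapping to agree, so this is a simplification.

```python
from collections import defaultdict


def find_field_conflicts(mappings: dict) -> dict:
    """Return {field: {type_name: datatype}} for every field name that has
    different datatypes in different mapping types of one index.

    `mappings` is assumed to use the pre-7.x multi-type shape:
    {"user": {"properties": {...}}, "tweet": {"properties": {...}}}
    """
    seen = defaultdict(dict)  # field name -> {mapping type -> datatype}
    for type_name, type_mapping in mappings.items():
        for field, spec in type_mapping.get("properties", {}).items():
            # Fields declared only with sub-"properties" default to "object".
            seen[field][type_name] = spec.get("type", "object")
    # Conflicting = more than one distinct datatype for the same name.
    return {f: t for f, t in seen.items() if len(set(t.values())) > 1}


legacy = {
    "user": {"properties": {"deleted": {"type": "date"},
                            "user_name": {"type": "keyword"}}},
    "tweet": {"properties": {"deleted": {"type": "boolean"},
                             "user_name": {"type": "keyword"}}},
}
print(find_field_conflicts(legacy))
# {'deleted': {'user': 'date', 'tweet': 'boolean'}}
```

user_name is not reported because both types map it to keyword, which is exactly the case a single shared Lucene field can handle.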

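Alternatives to mapping types
- Index per document type
Instead of using mapping types to tell kinds of documents apart, store each kind in its own index. In the twitter example above, users go into a user index and tweets into a tweet index, so storage and queries are completely independent.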

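Benefits:
1. The data is denser, which helps Lucene compress it.
2. Every document in an index represents the same kind of entity, so the term statistics used for full-text relevance scoring are more accurate.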
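The index-per-type approach can be sketched in Python as a transformation from a legacy multi-type mapping to one typeless create-index body per former type. This is an illustration under assumptions: the helper `split_by_type` is hypothetical, Python only builds the request bodies (no client library is used), and naming each new index after its old type is just one possible choice.

```python
import json

# Legacy multi-type mapping from the twitter example in this post.
LEGACY_MAPPINGS = {
    "user": {"properties": {"name": {"type": "text"},
                            "user_name": {"type": "keyword"},
                            "email": {"type": "keyword"}}},
    "tweet": {"properties": {"content": {"type": "text"},
                             "user_name": {"type": "keyword"},
                             "tweeted_at": {"type": "date"}}},
}


def split_by_type(mappings: dict) -> dict:
    """Return {new_index_name: create_index_body}, one index per old type.

    Naming each new index after its old type is an assumption made for
    this sketch; any naming scheme works.
    """
    return {
        # Typeless (7.x+) body: "properties" sits directly under "mappings".
        type_name: {"mappings": {"properties": dict(spec["properties"])}}
        for type_name, spec in mappings.items()
    }


if __name__ == "__main__":
    for index_name, body in split_by_type(LEGACY_MAPPINGS).items():
        # Print in the console's "METHOD path" + JSON body form.
        print(f"PUT {index_name}")
        print(json.dumps(body, indent=2))
```

Each resulting index holds only one entity's fields, which is what makes the data denser and the scoring statistics cleaner.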
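- Custom type field
The number of primary shards in a cluster is limited, and you may not want to spend an entire shard on a collection of only a few documents. In that case, you can add your own custom type field that works much like the old mapping type.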

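Using the user/tweet example above:

The old approach, with two mapping types in one index (only possible in indices created before 6.0):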
```
PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    },
    "tweet": {
      "properties": {
        "content": { "type": "text" },
        "user_name": { "type": "keyword" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}

PUT twitter/user/kimchy
{
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}

PUT twitter/tweet/1
{
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}

GET twitter/tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}
```
From 7.x onward, use a custom type field instead:
```
PUT twitter
{
  "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "name": { "type": "text" },
      "user_name": { "type": "keyword" },
      "email": { "type": "keyword" },
      "content": { "type": "text" },
      "tweeted_at": { "type": "date" }
    }
  }
}
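```

From 7.x onward the type is stored explicitly in the custom type field, whereas before 7.x it was stored implicitly in the _type metadata field.

```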
PUT twitter/_doc/user-kimchy
{
  "type": "user", 
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}

PUT twitter/_doc/tweet-1
{
  "type": "tweet", 
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}

GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "user_name": "kimchy"
        }
      },
      "filter": {
        "match": {
          "type": "tweet" 
        }
      }
    }
  }
}
```
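The custom-type-field migration above can be sketched in Python: merge the old per-type mappings into one typeless mapping with a keyword field for the old type, prefix document ids with the type (as with user-kimchy and tweet-1), and wrap queries in a type filter. All helper names are hypothetical and Python only builds request bodies. The filter here is a `term` query, which matches a keyword field exactly; the `match` filter in the example above also works on a keyword field.

```python
# Pre-7.x multi-type mapping from the twitter example above.
LEGACY = {
    "user": {"properties": {"name": {"type": "text"},
                            "user_name": {"type": "keyword"},
                            "email": {"type": "keyword"}}},
    "tweet": {"properties": {"content": {"type": "text"},
                             "user_name": {"type": "keyword"},
                             "tweeted_at": {"type": "date"}}},
}


def merge_with_type_field(mappings: dict, type_field: str = "type") -> dict:
    """Collapse a multi-type mapping into one typeless mapping plus an
    explicit keyword field that records the old type name."""
    properties = {type_field: {"type": "keyword"}}
    for spec in mappings.values():
        for field, field_spec in spec["properties"].items():
            existing = properties.get(field)
            if existing is not None and existing != field_spec:
                # Same name, different mapping: one Lucene field can't hold both.
                raise ValueError(f"conflicting mappings for field {field!r}")
            properties[field] = field_spec
    return {"mappings": {"properties": properties}}


def to_typeless_doc(type_name: str, doc_id, source: dict, type_field: str = "type"):
    """Return (new_id, new_source). Prefixing the id with the old type keeps
    ids from different types from colliding in the shared index; the type
    name is written into the body last so it always wins."""
    return f"{type_name}-{doc_id}", {**source, type_field: type_name}


def search_one_type(type_name: str, query: dict, type_field: str = "type") -> dict:
    """Restrict `query` to one former type with a non-scoring filter."""
    return {"query": {"bool": {"must": query,
                               "filter": {"term": {type_field: type_name}}}}}


print(to_typeless_doc("user", "kimchy", {"name": "Shay Banon"}))
# ('user-kimchy', {'name': 'Shay Banon', 'type': 'user'})
```

Putting the type filter in `filter` rather than `must` keeps it out of relevance scoring, so scores depend only on the user_name match.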
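- Parent/child without mapping types
Previously, a parent-child relationship was defined by making one mapping type the parent and one or more other mapping types its children. Without types, this syntax can no longer be used. The parent-child functionality keeps working as before, but the relationship between documents is now expressed with the join field.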
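A minimal Python sketch of the join-field form, assuming a hypothetical field named `my_join_field` with a `question` parent and an `answer` child (the names are illustrative). The `join` field type, its `relations` parameter, and the `name`/`parent` keys in a child document are Elasticsearch features; the helper functions are hypothetical and only build request paths and bodies.

```python
JOIN_FIELD = "my_join_field"  # hypothetical field name


def join_mapping(parent: str, child: str) -> dict:
    """Typeless mapping that declares parent -> child with a join field."""
    return {"mappings": {"properties": {
        JOIN_FIELD: {"type": "join", "relations": {parent: child}},
    }}}


def parent_doc(relation: str, source: dict) -> dict:
    """Parent body: the join field holds only the relation name."""
    return {**source, JOIN_FIELD: relation}


def child_request(index: str, doc_id: str, relation: str, parent_id: str, source: dict):
    """(path, body) for a child document. routing must be the parent's id
    so that parent and child are stored on the same shard."""
    path = f"PUT /{index}/_doc/{doc_id}?routing={parent_id}"
    body = {**source, JOIN_FIELD: {"name": relation, "parent": parent_id}}
    return path, body


print(child_request("my-index", "3", "answer", "1", {"text": "an answer"}))
# ('PUT /my-index/_doc/3?routing=1', {'text': 'an answer', 'my_join_field': {'name': 'answer', 'parent': '1'}})
```

Parent and child now live in the same index and the same _doc "type"; only the join field distinguishes them.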

Typeless APIs in 7.0

7.x is a transition release that reduces the practical role of types. If you still specify a type in regular API calls, you get a deprecation warning:
```
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
```
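- Index APIs
The index creation, index template, and mapping APIs support a new URL parameter, include_type_name, which specifies whether mapping definitions in requests and responses contain the type name. It defaults to true in 6.8, to match the pre-7.0 behavior of using type names in mappings. It defaults to false in 7.0 and will be removed in 8.0.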
```
PUT /my-index-000001?include_type_name=false
{
  "mappings": {
    "properties": { 
      "foo": {
        "type": "keyword"
      }
    }
  }
}
```
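The mapping is defined without a type name.
- Document APIs
In 7.x, the index API must be called with the {index}/_doc path when the _id is generated automatically, and with {index}/_doc/{id} when the id is given explicitly.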
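Converting an existing typed create-index body to the typeless form shown above is a small dict transformation. A minimal Python sketch, assuming the body is a plain dict; the helper `to_typeless_mapping` is hypothetical, and the caller names the type explicitly so that a body that is already typeless is left alone instead of being guessed at.

```python
def to_typeless_mapping(body: dict, type_name: str = "_doc") -> dict:
    """Drop the type level from a create-index body.

    {"mappings": {"_doc": {"properties": {...}}}}  (include_type_name=true)
    becomes
    {"mappings": {"properties": {...}}}           (include_type_name=false)

    The caller names the type explicitly, so a body that is already
    typeless is returned unchanged instead of being mis-detected.
    """
    mappings = body.get("mappings", {})
    if set(mappings) == {type_name}:
        # Keep other top-level keys (e.g. settings), replace only mappings.
        return {**body, "mappings": mappings[type_name]}
    return body


typed = {"mappings": {"_doc": {"properties": {"foo": {"type": "keyword"}}}}}
print(to_typeless_mapping(typed))
# {'mappings': {'properties': {'foo': {'type': 'keyword'}}}}
```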
```
PUT /my-index-000001/_doc/1
{
  "foo": "baz"
}
```
Response:
```
{
  "_index": "my-index-000001",
  "_id": "1",
  "_type": "_doc",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```
The get and delete APIs use the same {index}/_doc/{id} path:
```
GET /my-index-000001/_doc/1

DELETE /my-index-000001/_doc/1
```
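The path rules for the document APIs can be summarized in a short Python sketch. The helpers `doc_path` and `index_request` are hypothetical; they only build method and path strings for the typeless endpoints. Percent-encoding the id is a precaution for ids containing characters such as `/`.

```python
from typing import Optional, Tuple, Union
from urllib.parse import quote


def doc_path(index: str, doc_id: Optional[Union[str, int]] = None) -> str:
    """Typeless document path: /{index}/_doc for an auto-generated _id,
    /{index}/_doc/{id} for an explicit one. Both parts are percent-encoded
    because ids may contain characters such as '/'."""
    base = f"/{quote(index, safe='')}/_doc"
    if doc_id is None:
        return base
    return f"{base}/{quote(str(doc_id), safe='')}"


def index_request(index: str, source: dict, doc_id=None) -> Tuple[str, str, dict]:
    """(method, path, body) for indexing: POST lets Elasticsearch generate
    the _id, PUT stores the document under the id you give."""
    method = "POST" if doc_id is None else "PUT"
    return method, doc_path(index, doc_id), source


print(index_request("my-index-000001", {"foo": "baz"}, 1)[:2])
# ('PUT', '/my-index-000001/_doc/1')
print("GET", doc_path("my-index-000001", 1))
# GET /my-index-000001/_doc/1
print("DELETE", doc_path("my-index-000001", "a/b"))
# DELETE /my-index-000001/_doc/a%2Fb
```

Index, get, and delete all share the same {index}/_doc/{id} path; only the HTTP method changes.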
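- Search APIs
When calling a search API such as _search, _msearch, or _explain, the URL should not contain types. In addition, the _type field should not be used in queries, aggregations, or scripts.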
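A typed search such as `GET twitter/user,tweet/_search` from the beginning of this post can be rewritten mechanically once documents carry a custom type field. A minimal Python sketch; the helper `typeless_search` is hypothetical and the field name `type` is an assumption carried over from the custom type field approach described earlier.

```python
def typeless_search(path: str, query: dict, type_field: str = "type"):
    """Rewrite a typed search such as 'twitter/user,tweet/_search' into
    'twitter/_search' plus a terms filter on a custom type field.

    Assumes documents were migrated to carry `type_field`; the default
    field name "type" is an assumption of this sketch.
    """
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[2] == "_search":
        index, types, _ = parts
        # terms (not term) because the old URL could name several types.
        type_filter = {"terms": {type_field: types.split(",")}}
        body = {"query": {"bool": {"must": query, "filter": type_filter}}}
        return f"{index}/_search", body
    return path, {"query": query}  # already typeless: leave untouched


path, body = typeless_search("twitter/user,tweet/_search",
                             {"match": {"user_name": "kimchy"}})
print(path)
# twitter/_search
print(body["query"]["bool"]["filter"])
# {'terms': {'type': ['user', 'tweet']}}
```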
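- Types in responses
The document and search APIs continue to return a _type key in responses, to avoid breaking existing response parsing. However, the key is deprecated and should no longer be referenced. It will be removed entirely in 8.0.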
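Client code that never reads _type keeps working when the key disappears. A minimal Python sketch of that idea, with hypothetical helpers `doc_ref` and `hits_by_id`; the sample responses below are hand-written in the shape shown earlier in this post, not captured from a real cluster.

```python
def doc_ref(response: dict) -> tuple:
    """Identify the document in an index/get/delete response by _index and
    _id only. _type is never read: 7.x still sends "_doc", 8.0 drops it."""
    return response["_index"], response["_id"]


def hits_by_id(search_response: dict) -> dict:
    """Map (_index, _id) -> _source for every search hit, again without
    touching _type, so the same code handles both response shapes."""
    hits = search_response.get("hits", {}).get("hits", [])
    return {(h["_index"], h["_id"]): h.get("_source", {}) for h in hits}


# Hand-written sample responses (not captured from a real cluster).
resp_7x = {"_index": "my-index-000001", "_type": "_doc", "_id": "1", "result": "created"}
resp_8x = {"_index": "my-index-000001", "_id": "1", "result": "created"}
assert doc_ref(resp_7x) == doc_ref(resp_8x) == ("my-index-000001", "1")

search = {"hits": {"hits": [{"_index": "twitter", "_type": "_doc",
                             "_id": "tweet-1", "_source": {"type": "tweet"}}]}}
print(hits_by_id(search))
# {('twitter', 'tweet-1'): {'type': 'tweet'}}
```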
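- Index templates
It is recommended to make index templates typeless by re-adding them with include_type_name set to false. Under the hood, typeless templates use the dummy type _doc when creating indices.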
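Re-adding a template in typeless form amounts to dropping the type level from its mappings and sending the body back with the legacy `PUT _template/<name>` API. A minimal Python sketch, assuming the stored template was fetched with `GET _template?include_type_name=true` and therefore has the shape `{name: {..., "mappings": {"_doc": {...}}}}`; the helper `readd_typeless` and the sample template are hypothetical.

```python
import json


def readd_typeless(get_template_response: dict, type_name: str = "_doc") -> dict:
    """Return {template_name: body} in typeless form, ready to be PUT back
    with PUT _template/<name>.

    Assumes the input came from GET _template?include_type_name=true, i.e.
    {name: {"index_patterns": [...], "mappings": {type_name: {...}}, ...}}.
    """
    result = {}
    for name, template in get_template_response.items():
        body = dict(template)  # shallow copy keeps order, settings, aliases
        mappings = template.get("mappings", {})
        if set(mappings) == {type_name}:
            body["mappings"] = mappings[type_name]  # drop the type level
        result[name] = body
    return result


stored = {
    "template_1": {
        "order": 0,
        "index_patterns": ["te*"],
        "settings": {"index": {"number_of_shards": "1"}},
        "mappings": {"_doc": {"properties": {"host_name": {"type": "keyword"}}}},
        "aliases": {},
    }
}
for name, body in readd_typeless(stored).items():
    print(f"PUT _template/{name}")
    print(json.dumps(body, indent=2))
```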

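The official documentation also shows how to migrate data from the old multi-type layout to the new one, and the schedule for removing mapping types. See the reference below.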

References

Elasticsearch Reference: Removal of mapping types (removal-of-types)

Original article: https://www.cnblogs.com/chengmuyu/p/14318577.html