Elastic Stack Notes (5): Elasticsearch 5.6 Mappings

Blog: http://www.moonxy.com

1. Preface

Relational databases are familiar to all of us, and Elasticsearch can also be viewed as a kind of database, so its concepts are often compared with relational ones, as follows:

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

As shown above, an Elasticsearch index corresponds to a database, a type to a table, a mapping to a table schema, a document to a row, and so on.

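Elasticsearch does have characteristics of its own, however:

Elasticsearch has no transactions in the traditional sense;

Elasticsearch is a document-oriented database;

Elasticsearch 5.6 itself provides no authentication or authorization features out of the box.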

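2. Mappings

To treat date fields as dates, numeric fields as numbers, and string fields as either full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. This type and field information is stored in the mapping. When creating an index you can define field types and their attributes in advance, much like defining column attributes in a database. The documentation links below all refer to the latest official version at the time of writing, 6.2.

Elasticsearch official documentation: Elasticsearch Reference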

2.1 Field datatypes

Field datatypes documentation: Field datatypes

Core datatypes

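String datatypes

text and keyword

text: for full-text search. The value is analyzed (split into terms) before it is indexed.

keyword: for exact values. The value is not analyzed and is indexed as a single term, so a search matches only against the whole value. It is well suited to filtering, sorting, and aggregations (grouping).

Elasticsearch 1.x and 2.x had a single string type; from 5.x onwards it is split into text and keyword.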
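
The difference between text and keyword can be illustrated with a toy inverted index. This is only a sketch: the analyze function below is an assumed stand-in (lowercase and split on non-word characters), not the real standard analyzer, and build_index and lookup are hypothetical helpers written for this example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on non-word characters."""
    return [term for term in re.split(r"\W+", text.lower()) if term]


def build_index(docs, field_type):
    """Build a term -> set of doc ids inverted index for a single field."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # text: one entry per analyzed term; keyword: the whole value as one term
        terms = analyze(value) if field_type == "text" else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Return the ids of documents whose field contains exactly this term."""
    return index.get(term, set())


docs = {1: "Quick Brown Fox", 2: "Lazy Dog"}
text_index = build_index(docs, "text")
keyword_index = build_index(docs, "keyword")

print(lookup(text_index, "quick"))               # found via an analyzed term
print(lookup(keyword_index, "quick"))            # not found: only the whole value is a term
print(lookup(keyword_index, "Quick Brown Fox"))  # found via the exact value
```

A single word finds the document through the text-style index, while the keyword-style index only answers for the complete, unmodified value.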

Numeric datatypes

long, integer, short, byte, double, float, half_float, scaled_float

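Date datatype

date

JSON has no date type, so in Elasticsearch a date can be any of the following:

a string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30";

a long number representing milliseconds-since-the-epoch;

an integer representing seconds-since-the-epoch.

Date formats can be customised. If no format is specified, the default is:

"strict_date_optional_time||epoch_millis"

Under this default, a value such as "2015/01/01 12:10:30" or a seconds-since-the-epoch integer is only interpreted correctly if a matching format (for example "yyyy/MM/dd HH:mm:ss" or "epoch_second") is declared on the field.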

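To make the accepted date inputs concrete, here is a rough Python imitation of how values could be interpreted. It is a sketch under assumptions: parse_default_date and parse_custom_date are hypothetical helpers, Python's ISO parser only approximates strict_date_optional_time, and dates without a time zone are treated as UTC.

```python
from datetime import datetime, timezone


def parse_default_date(value):
    """Approximate "strict_date_optional_time||epoch_millis" (hypothetical helper)."""
    if isinstance(value, int):
        # A number is read as milliseconds since the epoch
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    # A string is read as an ISO-style date with an optional time part
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def parse_custom_date(value, pattern="%Y/%m/%d %H:%M:%S"):
    """Approximate a custom format such as "yyyy/MM/dd HH:mm:ss" (hypothetical helper)."""
    return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)


print(parse_default_date("2015-01-01"))
print(parse_default_date(1420070400000))
print(parse_custom_date("2015/01/01 12:10:30"))
```

The first two calls describe the same instant, which is why a field using the default format can hold both ISO strings and millisecond timestamps.
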
Boolean datatype

boolean

Accepts true and false.

Binary datatype

binary

Range datatypes

integer_range, float_range, long_range, double_range, date_range

Complex datatypes

Array datatype

Array support does not require a dedicated type.

Object datatype

object for single JSON objects

Nested datatype

nested for arrays of JSON objects
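
Why arrays of JSON objects get their own nested type can be sketched in a few lines. The code assumes the usual behaviour of a plain object field, where an array of objects is flattened into one list of values per inner field; the helper names and sample data are made up for illustration, and analysis (lowercasing and so on) is ignored.

```python
def flatten_object_array(field, objects):
    """Flatten an array of objects into per-field value lists, as an object field would."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flat(flat_doc, first, last):
    """Object-style match: each condition is checked against its own value list."""
    return first in flat_doc["user.first"] and last in flat_doc["user.last"]


def matches_nested(objects, first, last):
    """Nested-style match: both conditions must hold within the same inner object."""
    return any(o["first"] == first and o["last"] == last for o in objects)


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)

print(flat)
print(matches_flat(flat, "Alice", "Smith"))     # True: the pairing was lost
print(matches_nested(users, "Alice", "Smith"))  # False: no single user matches
```

The flattened form cannot tell that "Alice" and "Smith" belong to different users, which is the situation the nested type is meant to handle.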

Geo datatypes

Geo-point datatype

geo_point for lat/lon points

Geo-Shape datatype

geo_shape for complex shapes like polygons

Specialised datatypes

IP datatype

ip for IPv4 and IPv6 addresses

Completion datatype

completion to provide auto-complete suggestions

Token count datatype

token_count to count the number of tokens in a string

mapper-murmur3

murmur3 to compute hashes of values at index-time and store them in the index. This type is provided by the mapper-murmur3 plugin; it hashes field values, not the index itself.

Percolator type

Accepts queries from the query-dsl

join datatype

Defines parent/child relation for documents within the same index

Multi-fields

It is often useful to index the same field in different ways for different purposes. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations. Alternatively, you could index a text field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
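
As a small illustration of the fields parameter, the sketch below builds a create-index request body as a Python dictionary and prints it as JSON. The index layout (type article, field title) follows the forum example used in these notes; the sub-field name raw and the helper text_with_keyword are assumptions made for this example.

```python
import json


def text_with_keyword(analyzer=None):
    """Return a text field definition with a keyword sub-field named "raw"."""
    field = {
        "type": "text",
        # "raw" is an arbitrary sub-field name chosen for this sketch
        "fields": {"raw": {"type": "keyword"}},
    }
    if analyzer is not None:
        field["analyzer"] = analyzer
    return field


mapping = {
    "mappings": {
        "article": {
            "properties": {
                "title": text_with_keyword(),
            }
        }
    }
}

# This JSON could be used as the body of a create-index request such as PUT forum
print(json.dumps(mapping, indent=2))
```

With such a mapping, full-text queries would target title, while sorting and aggregations would target title.raw.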

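2.2 Meta-fields

Meta-fields are the fields in a mapping that describe the document itself. Broadly, they fall into identity meta-fields, document source meta-fields, indexing meta-fields, routing meta-fields, and other (custom) meta-fields.

Meta-fields documentation: Meta-Fields

Meta-fields are used to customise how a document's associated metadata is treated. Examples include the document's _index, _type, _id, and _source fields.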

Identity meta-fields

_index

The index to which the document belongs.

_uid

A composite field consisting of the _type and the _id.

_type

The document's mapping type.

_id

The document's ID.
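
The identity meta-fields can be pictured on a single search hit. The sketch below uses a made-up hit for the forum index and derives a _uid-style value by joining _type and _id with "#", which is how the composite value is written in 5.x as far as I know; the uid helper is hypothetical.

```python
# A made-up search hit; the values are illustrative only
hit = {
    "_index": "forum",
    "_type": "article",
    "_id": "1",
    "_source": {"title": "Hello"},
}


def uid(hit):
    """Compose a _uid-style value from _type and _id (assumed "type#id" form)."""
    return f'{hit["_type"]}#{hit["_id"]}'


print(hit["_index"], hit["_type"], hit["_id"])
print(uid(hit))
```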

Document source meta-fields

_source

The original JSON representing the body of the document.

_size

The size of the _source field in bytes, provided by the mapper-size plugin.

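Indexing meta-fields

_all

A catch-all field that indexes the values of all other fields. It is disabled by default in 6.x (the documentation version quoted here), whereas in 5.x it is enabled by default.

_all concatenates the values of all other fields into one big string, separated by spaces. The _all field is analyzed and indexed, but not stored.

_field_names

All fields in the document which contain non-null values.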
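
What _all and _field_names hold for a given document can be sketched as follows. This is a simplification under assumptions: the document is flat (no inner objects), analysis is not applied, and build_all and field_names are hypothetical helpers.

```python
def build_all(source):
    """Join all non-null field values into one space-separated string."""
    return " ".join(str(value) for value in source.values() if value is not None)


def field_names(source):
    """List the names of fields that contain a non-null value."""
    return sorted(name for name, value in source.items() if value is not None)


doc = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-10-24",
    "nickname": None,
}

print(build_all(doc))
print(field_names(doc))
```

The null nickname contributes nothing to either result, which matches the "non-null values" wording above.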

Routing meta-field

_routing

A custom routing value which routes a document to a particular shard.
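
Routing follows the formula shard_num = hash(_routing) % num_primary_shards, where the routing value defaults to the document's _id. The sketch below shows the idea only: zlib.crc32 is a stand-in hash chosen for this example (Elasticsearch uses its own Murmur3-based hash), so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Map a routing value to a shard number with a stand-in hash function."""
    # zlib.crc32 is only a placeholder for Elasticsearch's internal hash
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


for routing in ["user1", "user2", "user1"]:
    print(routing, "->", pick_shard(routing, 5))
```

Documents sharing a routing value always land on the same shard, which is what makes a custom _routing useful for keeping related documents together.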

Other meta-field

_meta

Application specific metadata, typically used for custom meta-fields.

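2.3 Mapping parameters

Elasticsearch provides a wide range of mapping parameters for configuring how a field is mapped. Common features such as a field's analyzer, its boost (weight), its date format, and the choice of similarity (scoring) model are all configured through mapping parameters.

Mapping parameters documentation: Mapping parameters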
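Taking the analyzer parameter as an example, the analyzer is specified when the index is created:

PUT forum
{
  "mappings": {
    "article": {
      "properties": {
        "id": { "type": "text" },
        "title": { "type": "text" },
        "postdate": { "type": "date" },
        "content": { "type": "text", "analyzer": "ik_max_word" }
      }
    }
  }
}

The analyzer parameter sets the analyzer for a text field. It applies at both index time and search time, unless a separate search_analyzer is configured. The standard analyzer is used by default. Third-party analyzers can also be specified, such as the IK analyzers: ik_smart performs smart, coarse-grained segmentation, while ik_max_word performs the finest-grained segmentation.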

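Taking the index parameter as another example: index controls whether a field is indexed. A field that is not indexed cannot be searched. The value is either true or false.

In Elasticsearch 1.x and 2.x, "index": "not_analyzed" meant that the value was indexed without being analyzed. In 5.x, "not_analyzed" is deprecated and index accepts only true or false. The 5.x string type is split into two types, keyword and text; to skip analysis, simply use keyword: "the keyword field for not_analyzed exact string values".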
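
The change from string to text and keyword can be expressed as a small translation rule. The function below is a hypothetical helper written for this example, not an Elasticsearch API; in particular, mapping "index": "no" to a text field with "index": false is a choice made in this sketch.

```python
def upgrade_string_field(old):
    """Translate a 1.x/2.x string field definition into a 5.x style one."""
    if old.get("type") != "string":
        return dict(old)  # other types are left untouched
    new = {key: value for key, value in old.items() if key not in ("type", "index")}
    index = old.get("index", "analyzed")
    if index == "not_analyzed":
        new["type"] = "keyword"  # exact value, no analysis
    elif index == "no":
        new["type"] = "text"     # sketch-specific choice for unindexed strings
        new["index"] = False
    else:
        new["type"] = "text"     # analyzed full-text field
    return new


print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
print(upgrade_string_field({"type": "string", "analyzer": "ik_max_word"}))
print(upgrade_string_field({"type": "string", "index": "no"}))
```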

The following API shows how an analyzer tokenizes a piece of text:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国人"
}

The response is:

{
  "tokens": [
    { "token": "中国人", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 },
    { "token": "中国", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "国人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 2 },
    { "token": "人", "start_offset": 2, "end_offset": 3, "type": "CN_WORD", "position": 3 }
  ]
}
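
The offsets in this response can be read as character positions in the input text. The sketch below hard-codes the tokens from the response above (no cluster is contacted) and slices the original string with each token's start_offset and end_offset; the slice helper is written for this example.

```python
text = "中国人"

# Tokens copied from the _analyze response shown above
tokens = [
    {"token": "中国人", "start_offset": 0, "end_offset": 3, "position": 0},
    {"token": "中国", "start_offset": 0, "end_offset": 2, "position": 1},
    {"token": "国人", "start_offset": 1, "end_offset": 3, "position": 2},
    {"token": "人", "start_offset": 2, "end_offset": 3, "position": 3},
]


def slice_token(source_text, token):
    """Cut the token's character range out of the original text."""
    return source_text[token["start_offset"]:token["end_offset"]]


for token in tokens:
    print(token["position"], token["token"], slice_token(text, token))
```

Each slice reproduces its token, and the overlapping ranges show how ik_max_word emits every word it can find in the input rather than a single segmentation.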
Original article: https://www.cnblogs.com/cnjavahome/p/9153139.html
