
Elasticsearch Quick Start

1. Basic Elasticsearch operations

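Preface

Let's start a single Elasticsearch node and Kibana.
Everything that follows is done in the Console under Dev Tools in Kibana.
The responses shown in this article come from Elasticsearch 6.5.4 (the version.created value 6050499 in the GET t1 output), so the requests use the 6.x index/type/id form.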

Creating a document

Now let's store the personal details of Xiaohei's youngest aunt in Elasticsearch. We simply enter:

PUT t1/doc/1
{
 "name": "小黑的小姨妈",
 "age": 18
}

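PUT creates the document (or replaces it if a document with that ID already exists). Kibana accepts the verb in lowercase, but uppercase is recommended. The response comes back as JSON, in RESTful style: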

{
  "_index" : "t1",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

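The result field is the operation type. Here it is created, meaning the document was created for the first time. If you run the same command again, result becomes updated. Look closely and you will see that _version starts at 1 and goes up by one every time you run the command: it counts how many times the document has been written.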
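
The created/updated and _version behaviour described above can be pictured with a toy in-memory store. This is only a sketch in Python: TinyIndex and its put method are hypothetical names invented for illustration, not part of Elasticsearch or of any client library.

```python
class TinyIndex:
    """Toy stand-in for an index: maps a document id to (version, source)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        # The first write of an id gets version 1; every later write adds 1.
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        result = "created" if version == 1 else "updated"
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}


index = TinyIndex()
first = index.put("1", {"name": "小黑的小姨妈", "age": 18})
second = index.put("1", {"name": "小黑的小姨妈", "age": 18})
print(first["result"], first["_version"])
print(second["result"], second["_version"])
```

The second call reports updated even though the body is identical, which mirrors what happens when the same PUT is run twice in the Console.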

Listing all indices

Now let's learn another command:

GET _cat/indices?v

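The result looks like the figure below:

The figure shows the indices in the current cluster, including each index's health, UUID, primary and replica shard counts, size, and so on. Did you spot the t1 index we created?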

Viewing a specific index

Let's look at the t1 index on its own:

GET t1

The response is:

{
  "t1" : {
    "aliases" : { },
    "mappings" : {
      "doc" : {
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1553163739688",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "_7jNW5XATheeK84zKkPwlw",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "t1"
      }
    }
  }
}

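This returns the mappings and settings the t1 index was created with.

Retrieving a document

Now let's look at the document we just created: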

GET t1/doc/1

The response is:

{
  "_index" : "t1",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "小黑的小姨妈",
    "age" : 18
  }
}

This returns the document we just created.
Let's add two more aunts for Xiaohei:

PUT t1/doc/2
{
 "name": "小黑的二姨妈",
 "age": 16
}
PUT t1/doc/3
{
 "name": "小黑的三姨妈",
 "age": 19
}

We have just learned how to look up one of Xiaohei's aunts. How do we retrieve all of them?

GET t1/doc/_search

The response is:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "小黑的二姨妈",
          "age" : 16
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "小黑的小姨妈",
          "age" : 18
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "小黑的三姨妈",
          "age" : 19
        }
      }
    ]
  }
}

Now Xiaohei has fallen out with his aunts and wants them gone. What should he do?

Deleting an index

We can simply delete the whole t1 index:

DELETE /t1

DELETE is the delete command. The response is:

{
  "acknowledged" : true
}

The response acknowledges the deletion.

If you list the indices again, t1 is gone, and so are all of its documents.

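2. Elasticsearch CRUD

Preface

We have already covered the basic Elasticsearch operations.
Next, let me tell you a true story...
A story like this has to open in the voice of the narrator Zhao Zhongxiang: the rainy season is coming, and it is once again mating season for the animals...

CRUD: Create

An adaptation of 知否知否应是绿肥红瘦 (The Story of Minglan), screenplay: Zhang Kai

Let's cut to the mansion of a certain official in the Northern Song dynasty. The master of the house is: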

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

Officially, he appears to have two wives:

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}

And, as the saying goes, the red flag at home stays flying while colourful flags flutter outside:

PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

Note: when you run PUT, the document is created if it does not exist and replaced if it does.

Let's fetch one with GET:

GET zhifou/doc/1

The result:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "顾老二",
    "age" : 30,
    "from" : "gu",
    "desc" : "皮肤黑、武器长、性格直",
    "tags" : [
      "黑",
      "长",
      "直"
    ]
  }
}

The query works fine. But you might object that Gu Lao'er should not be described as dark-skinned. Fine, let's change desc and tags:

PUT zhifou/doc/1
{
  "desc":"皮肤很黄，武器很长，性格很直",
  "tags":["很黄","很长", "很直"]
}

Here we changed only desc and tags; name, age and from are unchanged. Can we get away with leaving them out? Let's check:

GET zhifou/doc/1

The result:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "desc" : "皮肤很黄，武器很长，性格很直",
    "tags" : [
      "很黄",
      "很长",
      "很直"
    ]
  }
}

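Uh-oh, an accident! The change was applied, but the result is not what we wanted: the name, age and from fields are all gone!
Note: PUT does not merge. It replaces the entire stored document with the request body, so any field you leave out is dropped. In the example above we sent only desc and tags, so only those two fields survive.

Clearly this is an illness, and it needs treating! How? Hop in, and let's keep going.

CRUD: Update

First, let's restore the scene of the accident: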

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

We want to change 黑 to 黄 in the description:

POST zhifou/doc/1/_update
{
  "doc": {
    "desc": "皮肤很黄，武器很长，性格很直",
    "tags": ["很黄","很长", "很直"]
  }
}

In this example we use POST with _update after the ID, and put the fields to change inside doc.

Let's query again:

GET zhifou/doc/1

The result:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 5,
  "found" : true,
  "_source" : {
    "name" : "顾老二",
    "age" : 30,
    "from" : "gu",
    "desc" : "皮肤很黄，武器很长，性格很直",
    "tags" : [
      "很黄",
      "很长",
      "很直"
    ]
  }
}

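As the result shows, the other fields are unchanged and only desc and tags were modified.

Note: POST has other uses too, but here it performs a partial update. POST works together with _update, and the fields to change go inside doc.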
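
The difference between a full PUT and a partial update via _update can be sketched with plain Python dictionaries. The function names put_replace and post_update are made up for this illustration, and the merge shown here is shallow, which is enough for the flat documents in this article.

```python
def put_replace(stored, body):
    """PUT index/type/id: the request body becomes the whole document."""
    return dict(body)


def post_update(stored, partial):
    """POST index/type/id/_update with {"doc": partial}: merge into the stored document."""
    merged = dict(stored)
    merged.update(partial)  # shallow merge; fields not mentioned are kept
    return merged


original = {
    "name": "顾老二",
    "age": 30,
    "from": "gu",
    "desc": "皮肤黑、武器长、性格直",
    "tags": ["黑", "长", "直"],
}
change = {"desc": "皮肤很黄，武器很长，性格很直", "tags": ["很黄", "很长", "很直"]}

after_put = put_replace(original, change)
after_update = post_update(original, change)
print(sorted(after_put))     # only the fields that were sent
print(sorted(after_update))  # all five fields
```

The first call loses name, age and from, which is the accident shown earlier; the second keeps them.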

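Having written all this, I realise I got something wrong above: Shitou (石头) was not involved with Gu Lao'er, he was involved with Xiaotao! So that document was a bad example. Let's get rid of it!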

CRUD: Delete

DELETE zhifou/doc/4

Simple: the DELETE command removes that bad example.

The response:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "4",
  "_version" : 4,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}

Let's query it once more:

GET zhifou/doc/4

The result:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "4",
  "found" : false
}

Here found: false means the document does not exist.

CRUD: Read

Without noticing, we have already become familiar with the simplest kind of query, a GET on a specific document:

GET zhifou/doc/1

The result:

{
  "_index" : "zhifou",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 5,
  "found" : true,
  "_source" : {
    "name" : "顾老二",
    "age" : 30,
    "from" : "gu",
    "desc" : "皮肤很黄，武器很长，性格很直",
    "tags" : [
      "很黄",
      "很长",
      "很直"
    ]
  }
}

Querying is not always that simple. To find out what happens next, tune in to the next instalment: complex queries.

that's all

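3. Two ways to query Elasticsearch

Preface

The simple stuff is no challenge, so let's try something harder: how do we find everyone who comes from the Gu family? Elasticsearch offers two ways to query:

Query string: the simple form, in which the query is passed much like a URL parameter. This is known as simple search or query-string search.
Query DSL: the query is written in the DSL, a rich and flexible query language provided by Elasticsearch. It takes the form of a JSON request body sent to Elasticsearch over its RESTful API.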
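
To make the contrast concrete, here is a small Python sketch that builds the same search in both forms as Console-style text. The helper names query_string_request and dsl_request are hypothetical; nothing is sent to a server.

```python
import json
from urllib.parse import urlencode


def query_string_request(index, doc_type, field, value):
    """Query-string form: the condition travels in the q URL parameter."""
    params = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"GET {index}/{doc_type}/_search?{params}"


def dsl_request(index, doc_type, field, value):
    """Query DSL form: the condition travels in a JSON request body."""
    body = {"query": {"match": {field: value}}}
    return f"GET {index}/{doc_type}/_search\n{json.dumps(body, indent=2)}"


print(query_string_request("zhifou", "doc", "from", "gu"))
print(dsl_request("zhifou", "doc", "from", "gu"))
```

The query-string form is compact for quick checks; the DSL form grows more gracefully once sorting, paging and compound conditions are added.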

Preparing the data

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}


PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

Query string

GET zhifou/doc/_search?q=from:gu

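We still use GET, this time against _search. What is the condition? It asks which documents have a from field equal to gu. Finally, don't forget the ? separator between _search and the q parameter.

The result: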

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

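Let's focus on hits. hits is the result set: every document whose from field matches gu. Most important of all is _score. What is it? It is a relevance score computed by the scoring algorithm: the better a document matches the query, the higher its score. We will come back to how the algorithm works later.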
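
For the curious, the two scores in the response above (0.6931472 and 0.2876821) are what the BM25 formula, the default similarity in Elasticsearch 6.x, produces on very small shards. The Python sketch below rests on several assumptions and is not Elasticsearch's code: default parameters k1 = 1.2 and b = 0.75, a term frequency of 1, a field length equal to the shard average, and scoring done per shard with the shard sizes named in the comments. The article does not show the real shard layout, so those sizes are guesses that happen to reproduce the numbers.

```python
import math


def bm25(doc_count, doc_freq, freq=1.0, field_len=1.0, avg_field_len=1.0, k1=1.2, b=0.75):
    """BM25 score of one term in one document, computed from per-shard statistics."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_len / avg_field_len))
    return idf * tf_norm


# Assumed: a shard holding 2 documents, 1 of which has from = gu.
score_two_doc_shard = bm25(doc_count=2, doc_freq=1)
# Assumed: a shard holding a single document, which has from = gu.
score_one_doc_shard = bm25(doc_count=1, doc_freq=1)

print(round(score_two_doc_shard, 7), round(score_one_doc_shard, 7))
```

Under these assumptions a matching document looks "rarer", and so scores higher, on a shard where another document lacks the term. That would explain why documents with the same from value get different scores.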

Structured query

Now let's use the DSL to run the same query and find everyone from the Gu family.

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  }
}

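In this example the query is built up step by step: the condition goes inside match, and match returns every document whose from field contains gu.
The result, of course, is the same as before: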

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

see also: [Elasticsearch query rules (1): match and term](https://www.jianshu.com/p/eb30eee13923)

that's all

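4. Elasticsearch: term and match

Preface

Now it's time to learn the two most commonly used queries, match and term.
We'll be moving fast, so buckle up, keep your eyes open, and don't get lost along the way!

The match query

Preparing the data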

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}


PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

The match family: match (query by condition)

Let's see who comes from the Gu family.

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  }
}

In this example the query is built up step by step: the condition goes inside match, and match returns every document whose from field contains gu.
The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

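The match family: match_all (query everything)

Besides querying by condition, we can also retrieve every document of type doc in the zhifou index, that is, query everything: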

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  }
}

match_all has an empty value, meaning there is no condition, so everything is returned, just like select * from table_name.

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

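This returns every document of type doc in the zhifou index!

The match family: match_phrase (phrase query)

We now have a basic understanding of match: it takes a hash map containing the field we want to search and the string to search for. In other words, a document matches as long as it contains any of the keywords we are looking for. That convenience also brings some problems.
First, let's create some sample documents: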

PUT t1/doc/1
{
  "title": "中国是世界上人口最多的国家"
}
PUT t1/doc/2
{
  "title": "美国是世界上军事实力最强大的国家"
}
PUT t1/doc/3
{
  "title": "北京是中国的首都"
}

Now, when we search for 中国 (China), we want only documents about China to come back. Let's first try a match query:

GET t1/doc/_search
{
  "query": {
    "match": {
      "title": "中国"
    }
  }
}

The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.68324494,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.68324494,
        "_source" : {
          "title" : "中国是世界上人口最多的国家"
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "北京是中国的首都"
        }
      },
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.39556286,
        "_source" : {
          "title" : "美国是世界上军事实力最强大的国家"
        }
      }
    ]
  }
}

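The documents about China were returned as expected, but so was the one about the United States (美国), which is not what we wanted. Why? When Elasticsearch analyzes the documents, the default analyzer splits Chinese text one character at a time. So when we search for 中国, both 中 and 国 count as matches, and the 国 in 美国 matches too.
We, however, think of 中国 as a phrase, a word with a specific meaning. Out of the box, Elasticsearch is weak at Chinese word segmentation; plugins for Chinese are covered later.
For now there is still a way around it, the phrase query: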

GET t1/doc/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "中国"
      }
    }
  }
}

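Here match_phrase searches the documents for the given phrase, and 中国 is exactly such a phrase, so the right documents come back.
Now suppose we want documents related to both 中国 (China) and 世界 (world) but have forgotten the words in between. What do we do? match will not do, so let's keep trying match_phrase: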

GET t1/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "中国世界"
    }
  }
}

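The result is empty, because no document contains the phrase 中国世界.
We want to search for the two terms 中国 and 世界 but do not know how many other words sit between them, so the search needs some leeway. This is what slop is for: it is roughly like the regular expression 中国.*?世界, with a limit on the size of the gap. slop defaults to 0, which is why the previous search found nothing. Now let's specify a gap: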

GET t1/doc/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "中国世界",
        "slop": 2
      }
    }
  }
}

Now up to 2 positions of slack are allowed between the terms, and the query returns a result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.7445889,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.7445889,
        "_source" : {
          "title" : "中国是世界上人口最多的国家"
        }
      }
    ]
  }
}

Adjust slop as needed.
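
The behaviour of match, match_phrase and slop on the three titles can be imitated in a few lines of Python. This is a deliberately simplified model: analyze stands in for the standard analyzer by emitting one token per character (as happens for Chinese text), and the slop check only allows extra positions between tokens, whereas the real implementation also tolerates some reordering. All function names are made up for the sketch.

```python
from itertools import product

titles = {
    "1": "中国是世界上人口最多的国家",
    "2": "美国是世界上军事实力最强大的国家",
    "3": "北京是中国的首都",
}


def analyze(text):
    """Simplified analyzer: one token per character, punctuation dropped."""
    return [ch for ch in text if ch.isalnum()]


def match(query, text):
    """True if the text contains any token of the query."""
    doc_tokens = set(analyze(text))
    return any(tok in doc_tokens for tok in analyze(query))


def match_phrase(query, text, slop=0):
    """True if the query tokens occur in order with at most slop extra positions."""
    doc_tokens = analyze(text)
    positions = [[i for i, tok in enumerate(doc_tokens) if tok == q] for q in analyze(query)]
    if any(not p for p in positions):
        return False
    for combo in product(*positions):
        # Shift each position by its index in the query; an exact phrase gives equal values.
        shifted = [pos - i for i, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


def search(fn, query, **kwargs):
    return [doc_id for doc_id, title in titles.items() if fn(query, title, **kwargs)]


print(search(match, "中国"))
print(search(match_phrase, "中国"))
print(search(match_phrase, "中国世界"))
print(search(match_phrase, "中国世界", slop=2))
```

In this model a slop of 1 would already be enough for document 1, because only the single character 是 sits between 中国 and 世界.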

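The match family: match_phrase_prefix (leftmost-prefix query)

It is 2:30 a.m., and the lonely bachelor Xiaohei decides to search for a few beautiful girls to keep him company. His English is shaky, though, and after typing bea he no longer knows how to spell beautiful. Our smart search has to help him out, so Elasticsearch looks in its term dictionary for words starting with bea, and finds two: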

PUT t3/doc/1
{
  "title": "maggie",
  "desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
  "title": "sun and beach",
  "desc": "I like basking on the beach"
}

match and match_phrase are not a good fit here, because what Xiaohei typed is not a complete word. So what do we do? We use match_phrase_prefix:

GET t3/doc/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": "bea"
    }
  }
}

The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.39556286,
    "hits" : [
      {
        "_index" : "t3",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.39556286,
        "_source" : {
          "title" : "maggie",
          "desc" : "beautiful girl,you are beautiful so"
        }
      },
      {
        "_index" : "t3",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "sun and beach",
          "desc" : "I like basking on the beach"
        }
      }
    ]
  }
}

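A prefix query is similar to a phrase query but goes one step further: it does prefix matching on the last term of the phrase (for example, a search for you are bea). It is widely used, for instance for suggestions in a search box. When searching this way, it is best to set max_expansions to cap the number of terms the prefix may expand to, because the candidate set can be very large and, left unlimited, it hurts query performance.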

GET t3/doc/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": {
        "query": "bea",
        "max_expansions": 1
      }
      
    }
  }
}

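However, if you try adding max_expansions you will find that it does not behave as you might expect: instead of a single document, several are returned.
max_expansions caps how many terms the final prefix may be expanded into. The same parameter also appears on fuzzy queries, which match terms within a given edit (Levenshtein) distance. What is edit distance? It is a string metric that measures how different two strings are: the minimum number of single-character edits (substitutions, insertions or deletions) needed to turn one string into the other. The Russian scientist Vladimir Levenshtein introduced the concept in 1965.
To paraphrase a passage from the Elastic blog post on fuzzy search (linked at the end of this section): the max_expansions setting defines the maximum number of terms the fuzzy query will match before halting the search, and it can have a dramatic effect on performance. Cutting down the number of terms has a downside, however: some valid results may be missed because the query terminates early. It is important to understand that the max_expansions limit works at the shard level, so even when it is set to 1, multiple terms may match, each coming from a different shard. This can make it look as if max_expansions has no effect, so counting the unique terms that come back is not a valid way to check whether max_expansions is working.
If that passage is hard to follow, the point to remember is that the parameter works at the shard level, that is, inside Lucene, which is beyond the scope of this article.
The quick takeaway: prefix queries can be expensive, so use this parameter to limit the expansion.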
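
One way to picture the shard-level behaviour is the Python sketch below. It assumes, purely for illustration, that the two t3 documents were routed to different shards; the shard names, the term sets and the functions expand_prefix and search_prefix are all hypothetical.

```python
def expand_prefix(prefix, shard_terms, max_expansions):
    """Return up to max_expansions terms on one shard that start with prefix."""
    candidates = sorted(t for t in shard_terms if t.startswith(prefix))
    return candidates[:max_expansions]


# Assumed layout: each shard has its own term dictionary for the desc field.
shards = {
    "shard_0": {"beautiful", "girl", "you", "are", "so"},
    "shard_1": {"i", "like", "basking", "on", "the", "beach"},
}


def search_prefix(prefix, max_expansions):
    """Apply the expansion limit independently on every shard."""
    matched = {}
    for name, terms in shards.items():
        expanded = expand_prefix(prefix, terms, max_expansions)
        if expanded:
            matched[name] = expanded
    return matched


result = search_prefix("bea", max_expansions=1)
print(result)
```

Each shard respects the limit of one expansion, yet the combined response still contains two different terms and therefore two documents.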

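The match family: multi_match (multi-field query)

Suppose we have an index with 50 fields and want to search for the same keyword in several of them. How do we do that?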

PUT t3/doc/1
{
  "title": "maggie is beautiful girl",
  "desc": "beautiful girl you are beautiful so"
}
PUT t3/doc/2
{
  "title": "beautiful beach",
  "desc": "I like basking on the beach,and you? beautiful girl"
}

First, the way we already know:

GET t3/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        }
      ]
    }
  }
}

Here must requires the keyword to appear in both fields. That works, but what if there are many fields? We can use multi_match:

GET t3/doc/_search
{
  "query": {
    "multi_match": {
      "query": "beautiful",
      "fields": ["title", "desc"]
    }
  }
}

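We simply list the fields in fields to search several fields at once. Note that, unlike the bool/must query above, multi_match by default returns a document if any one of the listed fields matches.
Beyond that, multi_match can even stand in for match_phrase and match_phrase_prefix; you only need to set type: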

GET t3/doc/_search
{
  "query": {
    "multi_match": {
      "query": "gi",
      "fields": ["title"],
      "type": "phrase_prefix"
    }
  }
}
GET t3/doc/_search
{
  "query": {
    "multi_match": {
      "query": "girl",
      "fields": ["title"],
      "type": "phrase"
    }
  }
}

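Summary:

match: returns every document that contains any of the analyzed query tokens.
match_all: returns everything.
match_phrase: phrase query; builds on match by requiring the tokens to appear as a phrase, with slop to allow gaps between tokens.
match_phrase_prefix: prefix query; does prefix matching on the last term of the phrase. Useful for search suggestions, but pair it with max_expansions (which defaults to 50).
multi_match: multi-field query; very flexible, and can also do the work of match_phrase and match_phrase_prefix.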

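The term query

By default, during analysis (when a document is tokenized and stored in the inverted index), Elasticsearch tokenizes the document. For example, the default standard analyzer will:

remove most punctuation;
split the text into individual terms, which we call tokens;
lowercase the tokens.

The tokens are then stored in the inverted index. The original document is of course kept as well; the inverted index is what gets searched.
For example, Beautiful girl! looks like this after analysis: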

POST _analyze
{
  "analyzer": "standard",
  "text": "Beautiful girl!"
}
# Result
["beautiful", "girl"]

On analysis: https://www.cnblogs.com/Neeo/articles/10593037.html

When you run a match query, Elasticsearch analyzes the query text in the same way:

PUT w10
{
  "mappings": {
    "doc":{
      "properties":{
        "t1":{
          "type": "text"
        }
      }
    }
  }
}

PUT w10/doc/1
{
  "t1": "Beautiful girl!"
}
PUT w10/doc/2
{
  "t1": "sexy girl!"
}
GET w10/doc/_search
{
  "query": {
    "match": {
      "t1": "Beautiful girl!"
    }
  }
}

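In other words, the query text Beautiful girl! is analyzed into ["beautiful", "girl"], and each token is then looked up separately in index w10. As a result, both documents are returned.

That is very handy in some cases, but what if we want to search for the exact value? That is, an exact query that treats Beautiful girl! as a single token rather than two.

This is where the term query comes in: term looks up the query text as-is, without analyzing it.

That comes with a catch, though. If the field you are querying is of type text (like t1 above), the document was analyzed at index time as described earlier, so you may get disappointing results or none at all: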

GET w10/doc/_search
{
  "query": {
    "term": {
      "t1": "Beautiful girl!"
    }
  }
}

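The query above returns nothing, because after analysis neither document in w10 contains the token Beautiful girl!. An empty result is therefore exactly what we should expect.

So we arrive at a rule: do not use term on text fields; to query a text field, use match instead.

Got it? Then here is another example. What do you think the result is?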

GET w10/doc/_search
{
  "query": {
    "term": {
      "t1": "Beautiful"
    }
  }
}

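The answer: nothing is returned! During analysis Elasticsearch lowercases the tokens, so the inverted index holds lowercase beautiful while we searched for capitalized Beautiful.

To get a result, do this: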

GET w10/doc/_search
{
  "query": {
    "term": {
      "t1": "beautiful"
    }
  }
}

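So which field types is term suitable for? Elasticsearch stores a keyword field in the inverted index as a single token, for example, so term and keyword go well together.

Finally, what if you want to match several exact values with term? All I can say is: dear user, we recommend uninstalling ES! (Just a low-key, slightly awkward joke.)

The real recommendation is the terms query: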

GET w10/doc/_search
{
  "query": {
    "terms": {
      "t1": ["beautiful", "sexy"]
    }
  }
}
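
The interplay between analysis, match, term and terms can be reproduced with a tiny inverted index in Python. The analyze function below is a rough stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and all function names are invented for this sketch.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, drop punctuation, split."""
    return re.findall(r"[a-z0-9]+", text.lower())


docs = {1: "Beautiful girl!", 2: "sexy girl!"}

# Inverted index: token -> ids of the documents that contain it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for token in analyze(text):
        inverted[token].add(doc_id)


def term(value):
    """term: look the value up exactly as given, with no analysis."""
    return set(inverted.get(value, set()))


def terms(values):
    """terms: union of several exact lookups."""
    found = set()
    for value in values:
        found |= term(value)
    return found


def match(text):
    """match: analyze the query text first, then look up each token."""
    return terms(analyze(text))


print(match("Beautiful girl!"))
print(term("Beautiful girl!"))
print(term("Beautiful"))
print(term("beautiful"))
print(terms(["beautiful", "sexy"]))
```

Only the lookups whose value is literally present in the inverted index return anything, which is why the capitalized and unanalyzed forms come back empty.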

Corrections welcome. That's all.

see also: [Elastic blog: how to use fuzzy searches in Elasticsearch](https://www.elastic.co/blog/found-fuzzy-search) | [Elasticsearch fuzzy matching: max_expansions & min_similarity](https://stackoverflow.com/questions/7148615/elasticsearch-fuzzy-matching-max-expansions-min-similarity) | [How should max_expansions be used in a match_phrase_prefix full-text query?](https://segmentfault.com/q/1010000017179306/a-1020000017196690) | [term query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query)

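5. Sorting query results

Preface

We have learned several kinds of query, but so far the order of the results has been decided by Elasticsearch. Let's impose our own order on the results.

Preparing the data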

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}


PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

Sorting: sort

Descending: desc

Sorting brings to mind ascending and descending order. For example, let's find everyone in the Gu household and sort them by the age field in descending order:

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

Here we add sort on top of the conditional query. We sort by the age field, and order controls the direction: desc means descending.

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        },
        "sort" : [
          30
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        },
        "sort" : [
          29
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        },
        "sort" : [
          22
        ]
      }
    ]
  }
}

In the example above, the results come back in descending order.

Ascending: asc

And what about ascending order?

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

To sort in ascending order, just change order to asc.

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        },
        "sort" : [
          22
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        },
        "sort" : [
          25
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        },
        "sort" : [
          29
        ]
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        },
        "sort" : [
          30
        ]
      }
    ]
  }
}

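As you can see, the results are returned in order of age from smallest to largest.

Not every data type can be sorted

You may ask: besides age, can we sort by other fields? Let's try: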

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "name": {
        "order": "asc"
      }
    }
  ]
}

Here we sort by the name field. The result:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "zhifou",
        "node": "wrtr435jSgi7_naKq2Y_zQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

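The result is not what we expected: an error!

Note: you can only sort on fields that are sortable. The error message explains why name fails: it is a text field, and fielddata is disabled on text fields by default. Fields you can sort on directly include:

numbers
dates
keyword fields (as the error message suggests)

Analyzed text fields cannot be sorted unless fielddata is enabled, which can use a lot of memory.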
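
The error message itself points to a way out: sort on a keyword field instead. The Python sketch below only builds the request text. It assumes that zhifou was created by dynamic mapping and therefore has a name.keyword sub-field, like the one visible in the t1 mapping earlier in this article; the helper name sort_by is made up.

```python
import json


def sort_by(field, order="asc"):
    """Build a match_all search body sorted on the given field."""
    return {"query": {"match_all": {}}, "sort": [{field: {"order": order}}]}


# Assumption: dynamic mapping added a keyword sub-field under name.
body = sort_by("name.keyword")
print("GET zhifou/doc/_search")
print(json.dumps(body, indent=2))
```

Pasting the printed request into the Console should sort by the exact, unanalyzed name values, provided the sub-field exists in the index mapping.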

Corrections welcome. That's all.

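6. Paginating query results

Preface

As the data grows, result lists get longer and longer. Often we only want a handful of hits and do not need everything displayed. What then? This is where pagination comes in.

Preparing the data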

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}


PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

Pagination: from/size

Let's see how Elasticsearch paginates results:

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 1
}

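Here we query everything, sorted by age in descending order, and add two properties, from and size, to control which slice of the result set is returned.

from: the zero-based offset of the first hit to return, that is, how many hits to skip
size: how many hits to return

The result of the example above: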

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        },
        "sort" : [
          25
        ]
      }
    ]
  }
}

Here we skip the first 2 hits of the result set and return 1 document.

What if we want to skip the first 2 hits and return the next 2?

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 2
}

Here from is 2, so the first 2 hits are skipped, and size is 2, so 2 hits are returned.

You can also do this:

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 4,
  "size": 2
}

Here we skip the first 4 hits and ask for 2.

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        },
        "sort" : [
          18
        ]
      }
    ]
  }
}

Why is there only one hit? We only have 5 documents; after skipping the first 4, only 1 is left, so only 1 is returned even though size is 2.
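The offset behaviour can be modelled in a few lines of Python. This is only a sketch of the from/size semantics on the five sample ages, not how elasticsearch implements paging; the name search_page is made up for illustration, and the defaults mirror elasticsearch's documented defaults of from 0 and size 10.

```python
# Ages of the five sample documents, keyed by _id (taken from the data above).
AGES = {"1": 30, "2": 18, "3": 22, "4": 29, "5": 25}


def search_page(ages, from_=0, size=10):
    """Hypothetical model of: sort by age descending, then apply from/size.

    from_ is a zero-based offset: that many hits are skipped,
    then at most `size` hits are returned.
    """
    ordered = sorted(ages.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ordered[from_:from_ + size]]


print(search_page(AGES, from_=2, size=1))  # the third hit only
print(search_page(AGES, from_=2, size=2))  # the third and fourth hits
print(search_page(AGES, from_=4, size=2))  # only one hit is left
```

A slice never fails when it runs past the end of the list, which matches what we saw: asking for 2 hits at offset 4 simply returns the 1 hit that remains.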

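By now the query has picked up more and more parts: we started with a simple query, then added conditions, sorting, and limits on the returned results. So for elasticsearch, every part of the request body is pluggable; the parts are sibling keys separated by commas (,). For example, we can limit the returned results without sorting: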

GET zhifou/doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 4,
  "size": 2
}

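In the example above there is no sort, so the order of the hits is not something we chose (every match_all hit has the same score, 1.0). The first 4 hits are skipped and at most 2 are returned.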

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

Again only 1 hit is left after the offset, so only 1 is returned.

Corrections are welcome. That's all.

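7. elasticsearch: Boolean Queries

Preface

The bool query is the most commonly used compound query. It combines several sub-queries, and elasticsearch returns a document only when it satisfies the combination of clauses. A bool query supports 4 kinds of clauses:

must (and)
should (or)
must_not (not)
filter

Let's see how each of them works.

Prepare the data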

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}



PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

must

Now let's use a bool query to find all documents whose from is gu:

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ]
    }
  }
}

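In the example above, we use must inside the bool property (field) as the query condition. The condition itself is wrapped in match: every document whose from is gu.
Note that must takes a list, so several conditions can sit side by side; a document is returned only if it satisfies every one of them.

The result: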

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

As the result shows, all documents whose from is gu were found.

Now, how do we query documents whose from is gu and whose age is 30?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "age": 30
          }
        }
      ]
    }
  }
}

In the example above, we add one more condition, age equal to 30, to the must list.

The result:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.287682,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.287682,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      }
    ]
  }
}

The matching document was found.

Note: you may have noticed a pattern by now: any property whose value is a list can hold several conditions side by side.

should

So, how do we query documents whose from is gu or whose tags contain 闭月?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "闭月"
          }
        }
      ]
    }
  }
}

For an or relationship must won't do; use should instead. A document is returned as long as it matches at least one of the conditions.

The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

All matching documents were returned.

must_not

What if I want the documents whose from is not gu, whose tags do not contain 可爱, and whose age is not 18?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "可爱"
          }
        },
        {
          "match": {
            "age": 18
          }
        }
      ]
    }
  }
}

Here neither must nor should will do; we use must_not, with three conditions in its list, the last one being age equal to 18.

The result:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        }
      }
    ]
  }
}

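Only 魏行首 is left: she is the only one who is not from the gu family, has no 可爱 tag, and is not 18!
One more remark: it makes no difference whether the 18 in the age condition is written as an integer or as a string.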
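To see the three clauses side by side, here is a small Python sketch that replays the queries from the must, should and must_not sections on the sample data. It is a deliberately simplified model: match is treated as plain equality (or list membership for tags), with no text analysis and no scoring, and the helper names matches and bool_ids are invented for this example.

```python
# Only the fields used by the queries above are kept.
DOCS = [
    {"_id": "1", "age": 30, "from": "gu", "tags": ["黑", "长", "直"]},
    {"_id": "2", "age": 18, "from": "sheng", "tags": ["白", "富", "美"]},
    {"_id": "3", "age": 22, "from": "gu", "tags": ["造数据", "真", "难"]},
    {"_id": "4", "age": 29, "from": "gu", "tags": ["粗", "大", "猛"]},
    {"_id": "5", "age": 25, "from": "广云台", "tags": ["闭月", "羞花"]},
]


def matches(doc, field, value):
    """Simplified match: equality, or membership for list-valued fields."""
    stored = doc[field]
    return value in stored if isinstance(stored, list) else stored == value


def bool_ids(docs, must=(), should=(), must_not=()):
    """Return the _ids selected by a bool query built from (field, value) pairs."""
    hits = []
    for doc in docs:
        if not all(matches(doc, f, v) for f, v in must):
            continue  # must: every clause has to match
        if any(matches(doc, f, v) for f, v in must_not):
            continue  # must_not: no clause may match
        # should on its own: at least one clause has to match
        if should and not must and not any(matches(doc, f, v) for f, v in should):
            continue
        hits.append(doc["_id"])
    return hits


print(bool_ids(DOCS, must=[("from", "gu"), ("age", 30)]))
print(bool_ids(DOCS, should=[("from", "gu"), ("tags", "闭月")]))
print(bool_ids(DOCS, must_not=[("from", "gu"), ("tags", "可爱"), ("age", 18)]))
```

The ids come back in document order here, not in score order as in the real responses.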

filter

So, how do we query documents whose from is gu and whose age is greater than 25?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gt": 25
          }
        }
      }
    }
  }
}

This is where filter comes in. The filter condition is expressed with range; gt means greater than, and the bound here is 25.

The result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      }
    ]
  }
}

In the result, only the documents with age greater than 25 remain.

What about from being gu and age greater than or equal to 30?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  }
}

Greater than or equal is written gte.

The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      }
    ]
  }
}

And what about age less than 25?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "lt": 25
          }
        }
      }
    }
  }
}

Less than is written lt. The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

How about age less than or equal to 18?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "lte": 18
          }
        }
      }
    }
  }
}

Less than or equal is written lte. The result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        }
      }
    ]
  }
}

How do we query documents whose from is gu and whose age is between 25 and 30?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 25,
            "lte": 30
          }
        }
      }
    }
  }
}

Here gte and lte together bound the range, with both ends included. The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30,
          "from" : "gu",
          "desc" : "皮肤黑、武器长、性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        }
      }
    ]
  }
}

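Now, how do we query documents whose from is sheng and whose age is less than or equal to 25? We can probably guess the result already: just one hit, because 大娘子 is the only document from the sheng family.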

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "sheng"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 25
          }
        }
      }
    }
  }
}

Just as I expected!

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        }
      }
    ]
  }
}

But what happens if my hand slips and I change must to should?

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "sheng"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 25
          }
        }
      }
    }
  }
}

The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "大娘子",
          "age" : 18,
          "from" : "sheng",
          "desc" : "肤白貌美，娇憨可爱",
          "tags" : [
            "白",
            "富",
            "美"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "name" : "魏行首",
          "age" : 25,
          "from" : "广云台",
          "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
          "tags" : [
            "闭月",
            "羞花"
          ]
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22,
          "from" : "gu",
          "desc" : "mmp，没怎么看，不知道怎么形容",
          "tags" : [
            "造数据",
            "真",
            "难"
          ]
        }
      }
    ]
  }
}

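The result is a little surprising: 龙套偏房 and 魏行首 are not from the sheng family, yet they were returned. What's going on, buddy? When a bool query contains a filter (or must) clause, its should clauses become optional: minimum_should_match defaults to 0, so should only contributes to the score. 龙套偏房 and 魏行首 pass the age filter, so they are returned with a score of 0.0, while 大娘子 also matches the should clause and scores higher. So combining should with filter may not give the result you expect! Use must instead, or set minimum_should_match to 1 in the bool query.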
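The effect of minimum_should_match can be sketched in Python. This is a simplified model of the rule for this one query shape (a single should match on from plus a range filter on age), assuming equality in place of match and ignoring scores; the function name should_filter_ids is hypothetical.

```python
# Only the fields used by the query above are kept.
DOCS = [
    {"_id": "1", "age": 30, "from": "gu"},
    {"_id": "2", "age": 18, "from": "sheng"},
    {"_id": "3", "age": 22, "from": "gu"},
    {"_id": "4", "age": 29, "from": "gu"},
    {"_id": "5", "age": 25, "from": "广云台"},
]


def should_filter_ids(docs, should_from, max_age, minimum_should_match=None):
    """Model a bool query with one should clause and a filter of age <= max_age."""
    if minimum_should_match is None:
        # A filter clause is present, so the default is 0: should is optional.
        minimum_should_match = 0
    hits = []
    for doc in docs:
        if doc["age"] > max_age:
            continue  # rejected by the filter
        should_matched = 1 if doc["from"] == should_from else 0
        if should_matched >= minimum_should_match:
            hits.append(doc["_id"])
    return hits


print(should_filter_ids(DOCS, "sheng", 25))                          # filter alone decides
print(should_filter_ids(DOCS, "sheng", 25, minimum_should_match=1))  # should is required
```

With the default, the three documents aged 25 or less all come back; with minimum_should_match set to 1, only the document from sheng does.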

Note: filter works inside the bool query. For example, let's change the previous query and move filter out of bool.

GET zhifou/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "sheng"
          }
        }
      ]
    },
    "filter": {
      "range": {
        "age": {
          "lte": 25
        }
      }
    }
  }
}

As shown above, filter is now at the same level as bool. The result:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 12,
        "col": 5
      }
    ],
    "type": "parsing_exception",
    "reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 12,
    "col": 5
  },
  "status": 400
}

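An error! So where filter is placed matters.

Summary:

must: and relationship, like and in a relational database.
should: or relationship, like or in a relational database.
must_not: not relationship, like not in a relational database.
filter: filter condition.
range: the range to filter by.
gt: greater than, like > in a relational database.
gte: greater than or equal, like >= in a relational database.
lt: less than, like < in a relational database.
lte: less than or equal, like <= in a relational database.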
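The four range keywords map directly onto comparison operators. The following Python sketch applies them to the sample ages; it only illustrates the comparison semantics, and the names RANGE_OPS and range_ids are made up for this example.

```python
import operator

# Each range keyword and the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,   # >
    "gte": operator.ge,  # >=
    "lt": operator.lt,   # <
    "lte": operator.le,  # <=
}

# Ages of the five sample documents, keyed by _id.
AGES = {"1": 30, "2": 18, "3": 22, "4": 29, "5": 25}


def range_ids(ages, **bounds):
    """Return the sorted _ids whose age satisfies every given bound."""
    return sorted(
        doc_id
        for doc_id, age in ages.items()
        if all(RANGE_OPS[name](age, limit) for name, limit in bounds.items())
    )


print(range_ids(AGES, gt=25))
print(range_ids(AGES, lte=18))
print(range_ids(AGES, gte=25, lte=30))
```

Giving two bounds at once, as in the last call, corresponds to putting gte and lte in the same range object.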

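8. elasticsearch: Filtering Query Results

Preface

In the future a document may have many fields (yes, many! Don't be fooled by the handful of fields in our examples), and by default every query returns all of them. With a lot of data that is wasteful: if all I want is a girl's phone number, why hand me her hobbies and measurements as well? Are you hinting, hey buddy, want to chat her up?
So we filter the result and tell elasticsearch plainly: buddy, I am only here to look up one thing!

Prepare the data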

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

Result filtering: _source

Now, out of all the fields I only want to see name and age and nothing else. How?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "name": "顾老二"
    }
  },
  "_source": ["name", "age"]
}

As shown above, _source in the query restricts the returned fields to name and age.

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "name" : "顾老二",
          "age" : 30
        }
      }
    ]
  }
}

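When there is a lot of data, we should return only the fields we need; smaller responses make queries more efficient. Listing measurements is so low anyway; show a picture if you dare!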
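Conceptually, a _source list is a projection of each document onto the requested fields. Here is a minimal Python sketch of that idea, assuming flat top-level field names only; filter_source is a hypothetical helper, not an elasticsearch API.

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields of a document's _source."""
    return {name: source[name] for name in fields if name in source}


doc = {
    "name": "顾老二",
    "age": 30,
    "from": "gu",
    "desc": "皮肤黑、武器长、性格直",
    "tags": ["黑", "长", "直"],
}

print(filter_source(doc, ["name", "age"]))
```

Fields that the document does not have are simply skipped in this sketch.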

Corrections are welcome. That's all.

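9. elasticsearch: Highlighting

Preface

If the result set contains many matching hits, how can we spot the one we want at a glance? For example, when we search for elasticsearch on a site such as SegmentFault (思否), every occurrence of elasticsearch in the result list is highlighted.

How do we do that?

Prepare the data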

PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

Default highlighting

Let's run a query:

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "name": "石头"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

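In the example above, we use the highlight property to highlight the results; just add the names of the fields to highlight under fields, and elasticsearch does the rest.

The result: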

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.5098256,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 1.5098256,
        "_source" : {
          "name" : "石头",
          "age" : 29,
          "from" : "gu",
          "desc" : "粗中有细，狐假虎威",
          "tags" : [
            "粗",
            "大",
            "猛"
          ]
        },
        "highlight" : {
          "name" : [
            "<em>石</em><em>头</em>"
          ]
        }
      }
    ]
  }
}

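In the result above, elasticsearch automatically wraps the matched terms in em tags so that the page can render them.

Custom highlighting

But you may say: I don't want em tags; I'm awesome enough to deserve a b tag! Fine, elasticsearch anticipated your awesomeness, so we can define our own tags.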

GET zhifou/chengyuan/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "from": {}
    }
  }
}

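In highlight, pre_tags supplies the opening part of our custom tag, where we can also add attributes and styles, and post_tags supplies the closing part; together they form a complete tag. Which fields get highlighted is still decided by fields. Note that the sample response below highlights the name field (老二), so it appears to have been captured from a run that matched and highlighted name rather than from.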

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "chengyuan",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "老二",
          "age" : 30,
          "sex" : "male",
          "birth" : "1070-10-11",
          "from" : "gu",
          "desc" : "皮肤黑，武器长，性格直",
          "tags" : [
            "黑",
            "长",
            "直"
          ]
        },
        "highlight" : {
          "name" : [
            "<b class='key' style='color:red'>老</b><b class='key' style='color:red'>二</b>"
          ]
        }
      }
    ]
  }
}

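Note: quotation marks inside the attributes or styles of a custom tag must be ASCII single quotes, to keep them apart from the double quotes of the surrounding elasticsearch request.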
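The shape of the highlight fragments above can be reproduced with a short Python sketch. It assumes that every Chinese character is indexed as its own term, which is what the sample responses suggest (each character is wrapped separately), and it ignores everything else the real highlighter does; the function highlight here is a toy model, with em as the default tag.

```python
def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every character of `text` that also occurs in `query`.

    Assumption: each character is a separate term, as in the responses above.
    """
    terms = set(query)
    return "".join(
        f"{pre_tag}{ch}{post_tag}" if ch in terms else ch
        for ch in text
    )


print(highlight("石头", "石头"))
print(highlight("老二", "老二", "<b class='key' style='color:red'>", "</b>"))
```

Because the custom tag uses single quotes inside, it can be written as an ordinary double-quoted string, in Python just as in the JSON request body.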

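With that, the basic queries we have learned cover the vast majority of use cases. Next, let's look at more functions for processing results.

Corrections are welcome. That's all.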

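10. elasticsearch: Aggregation Functions

Preface

Everyone is familiar with aggregate functions, and elasticsearch does nothing exotic with them, so this chapter is relatively simple. Just remember avg, max, min and sum, and how each one is used. Let's start with the average.

Prepare the data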

PUT zhifou/doc/1
{
  "name":"顾老二",
  "age":30,
  "from": "gu",
  "desc": "皮肤黑、武器长、性格直",
  "tags": ["黑", "长", "直"]
}

PUT zhifou/doc/2
{
  "name":"大娘子",
  "age":18,
  "from":"sheng",
  "desc":"肤白貌美，娇憨可爱",
  "tags":["白", "富","美"]
}

PUT zhifou/doc/3
{
  "name":"龙套偏房",
  "age":22,
  "from":"gu",
  "desc":"mmp，没怎么看，不知道怎么形容",
  "tags":["造数据", "真","难"]
}


PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

PUT zhifou/doc/5
{
  "name":"魏行首",
  "age":25,
  "from":"广云台",
  "desc":"仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp，最后竟然没有嫁给顾老二！",
  "tags":["闭月","羞花"]
}

avg

The requirement now is the average age of the people whose from is gu.

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

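In the example above, we first match the documents whose from is gu, and then compute an average over them. This uses an aggregation, whose syntax lives under aggs; my_avg is a name we choose for the result, which holds the computed average. Which field is aggregated? age. And which metric? avg, the average age.

The response: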

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "石头",
          "age" : 29
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "顾老二",
          "age" : 30
        }
      },
      {
        "_index" : "zhifou",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "龙套偏房",
          "age" : 22
        }
      }
    ]
  },
  "aggregations" : {
    "my_avg" : {
      "value" : 27.0
    }
  }
}

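At the end of the response is the average: 27.

We already used _source to trim the fields, but that's not enough. What if I don't want to see the documents at all, only the average? Don't forget size!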

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0, 
  "_source": ["name", "age"]
}

Just add size to the previous query. size is the number of hits to output; with 0, no hits are output.

The response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_avg" : {
      "value" : 27.0
    }
  }
}

In the response, total under hits is 3, so three documents match; the average at the end is 27.

max

How do we get the maximum?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "size": 0
}

Just replace avg with max in the query.

The response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_max" : {
      "value" : 30.0
    }
  }
}

The response shows that the greatest age is 30.

min

And the minimum?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "size": 0
}

The minimum is written min.

The response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_min" : {
      "value" : 22.0
    }
  }
}

The smallest age in the response is 22.

sum

What if we want to know the total of their ages?

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "size": 0
}

In the example above, the total is computed with sum.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_sum" : {
      "value" : 81.0
    }
  }
}

The response shows that the ages add up to 81.
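As a cross-check, the four metrics can be recomputed by hand in Python from the three ages matched by the query (the documents whose from is gu). This only verifies the arithmetic; it says nothing about how elasticsearch computes aggregations.

```python
from statistics import mean

# Ages of the three documents whose from is gu (taken from the data above).
GU_AGES = [30, 22, 29]

metrics = {
    "avg": mean(GU_AGES),
    "max": max(GU_AGES),
    "min": min(GU_AGES),
    "sum": sum(GU_AGES),
}

print(metrics)
```

The values agree with the my_avg, my_max, my_min and my_sum results shown above.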

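Grouped queries

Now I want to group everyone by age into 15~20, 20~25 and 25~30, and compute the average age of each group.

Looking at the requirement, we should build the groups first.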

GET zhifou/doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      }
    }
  }
}

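In the example above, under our own name age_group inside aggs, we use range to do the grouping: field says to group by age, ranges lists the groups, and from and to bound each group. We define three groups as required.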

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_group" : {
      "buckets" : [
        {
          "key" : "15.0-20.0",
          "from" : 15.0,
          "to" : 20.0,
          "doc_count" : 1
        },
        {
          "key" : "20.0-25.0",
          "from" : 20.0,
          "to" : 25.0,
          "doc_count" : 1
        },
        {
          "key" : "25.0-30.0",
          "from" : 25.0,
          "to" : 30.0,
          "doc_count" : 2
        }
      ]
    }
  }
}

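The response contains the three groups. doc_count is the number of documents in each group; the three groups together hold 4 documents. The remaining document has age 30 and falls outside every group, because each range includes its from value but excludes its to value.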

Next, we compute the average age within each group.

GET zhifou/doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

In the example above, under the range grouping we nest another aggs that averages age within each group, and that's it.

{
 "took" : 1,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "skipped" : 0,
   "failed" : 0
 },
 "hits" : {
   "total" : 5,
   "max_score" : 0.0,
   "hits" : [ ]
 },
 "aggregations" : {
   "age_group" : {
     "buckets" : [
       {
         "key" : "15.0-20.0",
         "from" : 15.0,
         "to" : 20.0,
         "doc_count" : 1,
         "my_avg" : {
           "value" : 18.0
         }
       },
       {
         "key" : "20.0-25.0",
         "from" : 20.0,
         "to" : 25.0,
         "doc_count" : 1,
         "my_avg" : {
           "value" : 22.0
         }
       },
       {
         "key" : "25.0-30.0",
         "from" : 25.0,
         "to" : 30.0,
         "doc_count" : 2,
         "my_avg" : {
           "value" : 27.0
         }
       }
     ]
   }
 }
}

In the result we can clearly see each group's average age (in the value of my_avg).
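The grouping and the nested average can be modelled in Python as follows. This is a sketch of the bucket semantics only, assuming that each range includes its from value and excludes its to value (which is what the doc_count values above imply); range_buckets is a made-up name.

```python
# Ages of all five sample documents and the three requested ranges.
AGES = [30, 18, 22, 29, 25]
RANGES = [(15, 20), (20, 25), (25, 30)]


def range_buckets(ages, ranges):
    """Group ages into half-open ranges [low, high) and average each group."""
    buckets = []
    for low, high in ranges:
        members = [age for age in ages if low <= age < high]
        buckets.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(members),
            # An empty bucket has no average.
            "my_avg": sum(members) / len(members) if members else None,
        })
    return buckets


for bucket in range_buckets(AGES, RANGES):
    print(bucket)
```

Age 25 lands in the 25~30 group rather than 20~25, and age 30 lands nowhere, which is why the groups hold 4 documents in total.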

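Note: an aggregation always runs on the documents matched by the query: the query selects the results first, and the aggregation then processes them.

Summary:

avg: average
max: maximum
min: minimum
sum: sum

Corrections are welcome. That's all.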


Original article: https://www.cnblogs.com/bubu99/p/13580674.html