Elasticsearch resources:
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
https://www.elastic.co/webinars/getting-started-kibana?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
https://www.elastic.co/webinars/getting-started-logstash?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
Elasticsearch listens on port 9200 by default; opening the URL below shows basic information about the node:
http://localhost:9200/
Elasticsearch: The Definitive Guide (the second link is the version built from the master branch)
https://www.elastic.co/guide/en/elasticsearch/guide/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
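A shard is one partition of an index. A single index can grow too large to store and search efficiently in one piece, so it is split into N shards that are stored separately and can be spread across the nodes of the cluster. In other words, sharding is a form of horizontal partitioning.
A replica is a copy of a shard. Replicas mainly provide high availability by avoiding a single point of failure.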
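To make the partitioning idea concrete, here is a minimal sketch of how a document could be assigned to one of N primary shards by hashing its routing value (the document ID by default). It is an illustration under assumptions: Elasticsearch uses a Murmur3 hash internally, `zlib.crc32` is only a stand-in, and `pick_shard` is a hypothetical helper name.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, num_primary_shards)."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard is derived from the hash, the same ID always lands on the same shard, which is also why the number of primary shards cannot simply be changed after the index is created.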
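List the indices (note: _cat does not stand for "category"; the Elasticsearch reference calls these the compact and aligned text (CAT) APIs, which return human-readable tables)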
GET /_cat/indices?v
Create an index
PUT /customer?pretty
GET /_cat/indices?v
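Create a document. PUT indexes the document under the ID you specify; POST omits the ID and lets Elasticsearch generate a random unique one. The pretty parameter stands for pretty-print and makes Elasticsearch return human-friendly, indented JSON.
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}

GET /customer/_doc/1?pretty

POST /customer/_doc?pretty
{
  "name": "Jane Doe"
}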
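Outside the Kibana console the same calls are plain HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them; it assumes a node at `http://localhost:9200`, and `build_request` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node setup


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )


# PUT with an explicit ID, POST to let Elasticsearch generate one.
put_req = build_request("PUT", "/customer/_doc/1?pretty", {"name": "John Doe"})
post_req = build_request("POST", "/customer/_doc?pretty", {"name": "Jane Doe"})

# To actually send a request (requires a running node):
# with urllib.request.urlopen(put_req) as resp:
#     print(resp.read().decode("utf-8"))
```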
Update a document (under the hood the old document is deleted and a new one is indexed)
POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe" }
}

POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}

POST /customer/_doc/1/_update?pretty
{
  "script" : "ctx._source.age += 5"
}
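The effect of the three update requests can be imitated locally. This is a simplified model, not Elasticsearch code: `apply_partial_doc` is a hypothetical helper that merges the `doc` object into the stored `_source` (nested objects are merged, other values are overwritten), and the scripted update is imitated with plain Python.

```python
import copy


def apply_partial_doc(source, partial):
    """Return a new _source with `partial` merged in, like a partial update."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Objects are merged field by field.
            merged[key] = apply_partial_doc(merged[key], value)
        else:
            # Scalars and arrays are overwritten; new fields are added.
            merged[key] = copy.deepcopy(value)
    return merged


source = {"name": "John Doe"}
source = apply_partial_doc(source, {"name": "Jane Doe"})
source = apply_partial_doc(source, {"name": "Jane Doe", "age": 20})
source["age"] += 5  # stands in for the script "ctx._source.age += 5"
print(source)  # {'name': 'Jane Doe', 'age': 25}
```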
Delete a document
DELETE /customer/_doc/2?pretty
Bulk operations (bulk indexing, then a bulk update and delete)
POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
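The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and every line (including the last) must end with a newline. A minimal sketch of producing such a body from Python follows; `bulk_body` is a hypothetical helper name, and the actions are the ones from the second request above.

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs into the newline-delimited bulk format.

    `source` is None for actions such as delete that carry no second line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires a newline after every line, including the last one.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ({"delete": {"_id": "2"}}, None),
])
print(body, end="")
```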
Bulk-import data from a file
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
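Search. Note that the endpoint here is _search, where the update requests used _update in the same position. q=* matches all documents, sort=account_number:asc sorts by account_number in ascending order, and pretty was explained above. In the response, hits holds the matching documents and hits.total is the total number of matches, not the number of documents returned; only 10 documents are returned by default, which can be changed with the size parameter.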
GET /bank/_search?q=*&sort=account_number:asc&pretty
The equivalent query using a request body
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
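To fetch a slice from the middle of the result set, set from, the zero-based offset of the first hit to return; from: 10 below skips the first 10 hits and is an offset, not a document id. In my test a fractional value such as 5.98 was rounded down to 5. Note that the earlier sorted queries did not return a meaningful max_score, because sorting on a field skips scoring; this query has no sort clause, so max_score carries a value again.
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}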
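Paging through results usually means translating a page number into from and size. A minimal sketch, assuming 1-based page numbers; `page_to_from_size` and `search_body` are hypothetical helper names.

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page, page_size=10):
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, page_size))
    return body


print(search_body(2))  # second page: from=10, size=10
```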
Return only selected fields (like SELECT col1, col2 ...)
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
Search on a specific field (like WHERE)
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
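Note the difference between the two queries below, match and match_phrase. match returns every document that contains at least one of the terms and ranks the hits by score. match_phrase requires the whole phrase to match, with the terms in the same relative positions, so mill must come immediately before lane. In my tests a document that matched every term still scored only about 13.2, while an address such as 198 Mill2 Lane, which differs only in Mill2, scored 8.3, the same as documents that match only Lane, because Mill2 does not match mill at all.
GET /bank/_search
{
  "query": { "match": { "address": "198 Mill Lane" } }
}

GET /bank/_search
{
  "query": { "match_phrase": { "address": "198 Mill Lane" } }
}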
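The difference can be imitated with a toy model. This is a rough sketch, not how Elasticsearch is implemented: `tokens` is a crude stand-in for the standard analyzer, scoring is ignored entirely, and the third sample address is made up for illustration.

```python
import re


def tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value, query):
    """True if the field shares at least one token with the query (OR semantics)."""
    return bool(set(tokens(field_value)) & set(tokens(query)))


def match_phrase(field_value, query):
    """True if the query tokens appear in the field consecutively and in order."""
    field, phrase = tokens(field_value), tokens(query)
    n = len(phrase)
    return n > 0 and any(
        field[i:i + n] == phrase for i in range(len(field) - n + 1)
    )


addresses = ["198 Mill Lane", "198 Mill2 Lane", "789 Madison Street"]
for address in addresses:
    print(address,
          match(address, "198 Mill Lane"),
          match_phrase(address, "198 Mill Lane"))
```

In this model 198 Mill2 Lane still satisfies match through the tokens 198 and lane, but fails match_phrase because mill2 is a different token from mill.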
bool query with must: the equivalent of "AND" in a WHERE clause
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
bool query with should: the equivalent of "OR" in a WHERE clause
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
To negate a WHERE condition, use must_not; the query below returns addresses that contain neither mill nor lane
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
The clauses can also be combined
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
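The AND / OR / NOT reading of must, should and must_not can be checked with a small local model. It is a simplification under assumptions: scoring and the filter clause are ignored, `has_term` is a crude stand-in for match, and the four sample documents are invented for illustration.

```python
def has_term(doc, field, term):
    """Simplified match: the term equals one lowercase whitespace token of the field."""
    return term.lower() in str(doc.get(field, "")).lower().split()


def bool_matches(doc, must=(), should=(), must_not=()):
    """Decide whether a document is selected by (field, term) bool clauses."""
    if any(has_term(doc, f, t) for f, t in must_not):
        return False
    if not all(has_term(doc, f, t) for f, t in must):
        return False
    # With no must clauses, at least one should clause has to match.
    if should and not must:
        return any(has_term(doc, f, t) for f, t in should)
    return True


docs = [
    {"address": "198 Mill Lane", "age": 40, "state": "ID"},
    {"address": "5 Mill Road", "age": 40, "state": "CA"},
    {"address": "7 Oak Lane", "age": 31, "state": "TX"},
    {"address": "9 Elm Street", "age": 25, "state": "ID"},
]
mill, lane = ("address", "mill"), ("address", "lane")

and_hits = [d["address"] for d in docs if bool_matches(d, must=[mill, lane])]
or_hits = [d["address"] for d in docs if bool_matches(d, should=[mill, lane])]
not_hits = [d["address"] for d in docs if bool_matches(d, must_not=[mill, lane])]
combined = [d["address"] for d in docs
            if bool_matches(d, must=[("age", "40")], must_not=[("state", "ID")])]
print(and_hits, or_hits, not_hits, combined)
```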
Filters
The filter clause lives inside the bool query. Clauses in filter do not contribute to the document score; the score of 1 shown for every hit comes from the match_all clause in must.
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 2000,
            "lte": 3000
          }
        }
      }
    }
  }
}
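When the bounds come from user input it can be handy to build this body in code. A minimal sketch; `range_filter_query` and `in_range` are hypothetical helper names, and `in_range` only mirrors the inclusive gte/lte bounds locally.

```python
import json


def range_filter_query(field, gte, lte):
    """Build a bool query whose range condition sits in the non-scoring filter clause."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        }
    }


def in_range(value, gte, lte):
    """Local equivalent of the range check: both bounds are inclusive."""
    return gte <= value <= lte


print(json.dumps(range_filter_query("balance", 2000, 3000), indent=2))
```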
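Grouping (aggregations)
Grouping works like SQL GROUP BY. The example below groups on the state field and counts the documents in each group. group_by_state is just a name chosen for the aggregation; a terms aggregation returns the document count (doc_count) of each bucket by default.
size is set to 0 because only the aggregation results are wanted, not the search hits; with size > 0 the matching documents would also appear in the response.
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}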
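For intuition, the same grouping can be done locally with a counter. This is only a model of what the terms aggregation returns (the ten largest buckets by default), not how Elasticsearch computes it; `terms_buckets` is a hypothetical helper and the sample documents are invented.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Count documents per distinct field value, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


docs = [
    {"state": "TX"}, {"state": "ID"}, {"state": "TX"},
    {"state": "MD"}, {"state": "TX"}, {"state": "ID"},
]
print(terms_buckets(docs, "state"))
```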
A more complex example: besides the count per group, it also averages the balance field. Note that the average is nested inside group_by_state, and that the groups are then ordered by that average in descending order.
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
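A local model of this request, roughly SELECT state, COUNT(*), AVG(balance) ... GROUP BY state ORDER BY AVG(balance) DESC. It is a sketch with invented sample data, and `avg_balance_by_state` is a hypothetical helper name.

```python
from collections import defaultdict


def avg_balance_by_state(docs):
    """Group by state, average the balance per group, sort by the average descending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {"key": state, "doc_count": len(vals), "average_balance": sum(vals) / len(vals)}
        for state, vals in balances.items()
    ]
    return sorted(buckets, key=lambda b: b["average_balance"], reverse=True)


docs = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "ID", "balance": 5000},
    {"state": "MD", "balance": 1500},
]
for bucket in avg_balance_by_state(docs):
    print(bucket)
```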
An even more complex example: documents are first bucketed by age range, then by a second-level aggregation field (gender), and the average balance is computed for each gender within each age range.
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
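One detail worth spelling out is which bucket a boundary value such as 30 falls into: in a range aggregation the from value is included and the to value is excluded. A minimal local model, with `age_bucket` as a hypothetical helper name:

```python
def age_bucket(age, ranges=((20, 30), (30, 40), (40, 50))):
    """Return the (from, to) bucket containing age, or None if no bucket matches.

    `from` is inclusive and `to` is exclusive, as in the range aggregation.
    """
    for low, high in ranges:
        if low <= age < high:
            return (low, high)
    return None


for age in (20, 29, 30, 50):
    print(age, "->", age_bucket(age))
```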
A fragment of the response
"aggregations": {
  "group_by_age": {
    "buckets": [
      {
        "key": "20.0-30.0",   # first-level aggregation key
        "from": 20,
        "to": 30,
        "doc_count": 451,
        "group_by_gender": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [   # second-level aggregation buckets
            {
              "key": "M",
              "doc_count": 232,
              "average_balance": {
                "value": 27374.05172413793
              }
            },
            {
              "key": "F",
              "doc_count": 219,
              "average_balance": {
                "value": 25341.260273972603
              }
            }
          ]
        }
      },
      ... ...
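Nested buckets like these are often easier to work with once flattened into rows. A minimal sketch, assuming the aggregation names used above; `flatten_buckets` is a hypothetical helper, and the sample holds only the first age bucket because the rest of the response is elided in the fragment.

```python
def flatten_buckets(aggregations):
    """Turn the nested group_by_age / group_by_gender buckets into flat rows."""
    rows = []
    for age in aggregations["group_by_age"]["buckets"]:
        for gender in age["group_by_gender"]["buckets"]:
            rows.append((
                age["key"],
                gender["key"],
                gender["doc_count"],
                round(gender["average_balance"]["value"], 2),
            ))
    return rows


# Only the first age bucket; the remaining buckets are elided in the fragment.
sample = {
    "group_by_age": {
        "buckets": [
            {
                "key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 451,
                "group_by_gender": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {"key": "M", "doc_count": 232,
                         "average_balance": {"value": 27374.05172413793}},
                        {"key": "F", "doc_count": 219,
                         "average_balance": {"value": 25341.260273972603}},
                    ],
                },
            }
        ]
    }
}
rows = flatten_buckets(sample)
for row in rows:
    print(row)
```

The two gender counts (232 and 219) add up to the doc_count of 451 on the enclosing age bucket.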