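1. Review
(1) What is ES?
ES is an open-source, distributed (full-text) search engine built on Apache Lucene. It exposes a simple RESTful API that hides Lucene's complexity.
Besides being a full-text search engine, ES can also be described as:
- A distributed real-time document store in which every field is indexed and searchable
- A distributed real-time analytics search engine
- A system that can scale to hundreds or thousands of servers and handle petabytes of structured or unstructured data
(2) Data organization
- Physical: nodes and shards
- Logical: indices, types, documents
(3) Simple operations
GET
PUT
DELETE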
2. CRUD operations in ES
(1) Delete the old data and prepare (create) fresh data
DELETE s18

PUT s18/doc/1
{
  "name": "yangyazhou",
  "age": 81,
  "sex": "男",
  "tags": "闷骚",
  "b": "19900715"
}

PUT s18/doc/2
{
  "name": "yangtao",
  "age": 18,
  "sex": "男",
  "tags": "浪",
  "b": "19970521"
}

PUT s18/doc/3
{
  "name": "cancan",
  "age": 16,
  "sex": "女",
  "tags": "学习认真",
  "b": "19980101"
}

PUT s18/doc/4
{
  "name": "guchenxu",
  "age": 22,
  "sex": "男",
  "tags": "幽默",
  "b": "19930302"
}

PUT s18/doc/5
{
  "name": "yangwenyu",
  "age": 23,
  "sex": "男",
  "tags": "正人君子",
  "b": "19941201"
}
Run the five PUT requests above. The sample values are kept in Chinese because later queries match on them: 男 means male and 女 means female.
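(2) Retrieve data
GET s18/doc/1

GET s18/doc/_search

# Query string
GET s18/doc/_search?q=age:22
(3) Update with PUT: the whole document is replaced, so only the fields you send survive and every other field is lost (not recommended)
PUT s18/doc/5
{
  "tags": "帅气"
}

GET s18/doc/5
The GET above shows the document as it now stands, with only the tags field left. Restore the data next.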
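(4) Update only the specified fields and leave the rest unchanged
# Restore the data
PUT s18/doc/5
{
  "name": "yangwenyu",
  "age": 23,
  "sex": "男",
  "tags": "正人君子",
  "b": "19941201"
}

# Use POST with _update to modify specific fields
POST s18/doc/5/_update
{
  "doc": {
    "tags": "帅气"
  }
}

# Check the result
GET s18/doc/5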
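Can documents be deleted by condition?
# Delete one document, or the whole index
DELETE s18/doc/5
DELETE s18
Deleting by query, as shown below, is not recommended:
POST s18/doc/_delete_by_query?q=age:18

# Query string
GET s18/doc/_search?q=age:22
Only the simplest mapping needs to be memorized:
PUT creates, GET retrieves, POST updates, DELETE deletes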
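The verb-to-operation mapping can also be seen from a script. The following is a minimal Python sketch, not an official client: it only builds the four requests with the standard library and does not send them. The base URL http://localhost:9200 and the helper name build_request are assumptions made for illustration.

```python
import json
from urllib.request import Request

# Assumption: a local node on the default port; adjust for your own cluster.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an ES endpoint."""
    data = None
    headers = {}
    if body is not None:
        # Keep non-ASCII values readable and encode the JSON body as UTF-8.
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}/{path}", data=data, headers=headers, method=method)


# One request per CRUD operation, mirroring the Console commands in the notes.
create = build_request("PUT", "s18/doc/1", {"name": "yangyazhou", "age": 81})
read = build_request("GET", "s18/doc/1")
update = build_request("POST", "s18/doc/1/_update", {"doc": {"tags": "帅气"}})
delete = build_request("DELETE", "s18/doc/1")

for req in (create, read, update, delete):
    print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to urllib.request.urlopen while a node is running.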
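3. Two ways to query in ES
# Two ways to query
# Way 1: query string
GET s18/doc/_search?q=age:22

# Way 2: DSL
GET s18/doc/_search
{
  "query": {
    "match": {
      "age": "18"
    }
  }
}

GET s18/doc/_search
{
  "query": {
    "match": {
      "age": 18
    }
  }
}
# ES converts the value internally, so the string "18" and the number 18 match the same documents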
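The two query styles differ only in where the condition lives: in the URL, or in a JSON body. A small Python sketch of that difference follows; the function names query_string_path and dsl_body are made up for this illustration, and nothing is sent to a server.

```python
import json
from urllib.parse import urlencode


def query_string_path(index, doc_type, field, value):
    """Way 1: put the condition in the URL as ?q=field:value."""
    # safe=":" keeps the colon unescaped so the path reads like the Console form.
    query = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"{index}/{doc_type}/_search?{query}"


def dsl_body(field, value):
    """Way 2: put the condition in a JSON body as a match query."""
    return {"query": {"match": {field: value}}}


print(query_string_path("s18", "doc", "age", 22))
print(json.dumps(dsl_body("age", 18)))
```

The DSL form takes more typing, but it is the one that later sections extend with sort, from/size, bool, highlight, and aggs.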
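4. Complex queries: match and match_all
match
GET s18/doc/_search
{
  "query": {
    "match": {
      "tags": "浪"
    }
  }
}

# Error: the value of a match query cannot be a list
GET s18/doc/_search
{
  "query": {
    "match": {
      "tags": ["浪", "闷骚"]
    }
  }
}

# Separate the terms with a space
GET s18/doc/_search
{
  "query": {
    "match": {
      "tags": "浪 闷骚"
    }
  }
}

# Separate the terms with a comma
GET s18/doc/_search
{
  "query": {
    "match": {
      "tags": "浪,闷骚"
    }
  }
}
# A document is returned as soon as it matches any one of the terms. The two spellings differ only in form; ES converts them internally.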
match_all usage
# The two requests below are equivalent
GET s18/doc/_search

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  }
}
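5. Sorting with sort (usually on numeric fields such as age, salary, or score)
desc means descending, from largest to smallest
asc means ascending, from smallest to largest
Note: not every field can be sorted on. Pick a field whose ordering is meaningful.
# Sorting: sort
GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": { "order": "desc" } }
  ]
}

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": { "order": "asc" } }
  ]
}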
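6. Pagination in ES (a benefit of the structured query: clauses are pluggable)
GET s18/doc/_search

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 2
}
# The request above returns the 1st and 2nd documents

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 2,
  "size": 2
}
# The request above returns the 3rd and 4th documents

GET s18/doc/_search

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 4,
  "size": 10
}
# The request above asks for the 5th through 14th documents. If fewer exist, it returns as many as there are; with only one left, it returns just that one.
Pagination simply means choosing which slice of the results to show: from is the zero-based offset of the first hit, and size is how many hits to return.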
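Applications usually think in page numbers rather than offsets. The sketch below, in Python, shows one way to translate a 1-based page number into from and size, and mimics the slicing on an in-memory list. The names page_params, paged_search_body, and take are hypothetical helpers, and the list slice only imitates what the server does.

```python
def page_params(page, size):
    """Translate a 1-based page number into ES from/size values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    return {"from": (page - 1) * size, "size": size}


def paged_search_body(page, size):
    """Attach pagination to a match_all query body."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, size))
    return body


def take(hits, offset, size):
    """Mimic from/size on a plain list: a short tail is returned as-is."""
    return hits[offset:offset + size]


hits = ["doc1", "doc2", "doc3", "doc4", "doc5"]
print(paged_search_body(2, 2))   # from=2, size=2: the 3rd and 4th documents
print(take(hits, 4, 10))         # only one document is left, so one comes back
```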
7. bool queries in ES: should (or), must (and), must_not (not)
# Find documents whose name is yangwenyu or whose age is 18
GET s18/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "yangwenyu" } },
        { "match": { "age": "18" } }
      ]
    }
  }
}
# The order of these results comes from relevance scoring, which is handled by ES's internal algorithm

# Find documents whose sex is 男 and whose age is 81
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 81 } },
        { "match": { "sex": "男" } }
      ]
    }
  }
}

# Find documents that are neither male nor aged 18: must_not
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "sex": "男" } },
        { "match": { "age": 18 } }
      ]
    }
  }
}
# gt (greater than): find male documents with age greater than 20
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sex": "男" } }
      ],
      "filter": {
        "range": { "age": { "gt": 20 } }
      }
    }
  }
}

# gte (greater than or equal): find males aged 23 or over
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sex": "男" } }
      ],
      "filter": {
        "range": { "age": { "gte": 23 } }
      }
    }
  }
}

# lt (less than): find females under 20
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sex": "女" } }
      ],
      "filter": {
        "range": { "age": { "lt": 20 } }
      }
    }
  }
}

# lte (less than or equal): meant to find males aged 23 or under.
# With should next to filter, the sex clause becomes optional, so non-male documents aged 23 or under come back too.
GET s18/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "sex": "男" } }
      ],
      "filter": {
        "range": { "age": { "lte": 23 } }
      }
    }
  }
}

# When combining with filter, prefer must over should to keep unwanted documents out of the results
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sex": "男" } }
      ],
      "filter": {
        "range": { "age": { "lte": 23 } }
      }
    }
  }
}

# Find non-male documents aged 23 or under
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "sex": "男" } }
      ],
      "filter": {
        "range": { "age": { "lte": 23 } }
      }
    }
  }
}
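The difference between must and should next to a filter is easy to get wrong, so here is a small in-memory Python model of the bool clauses, run against the five sample documents. It is a deliberate simplification and not how ES is implemented: clauses are plain predicates, matching is exact equality instead of analyzed text, and there is no scoring. The names bool_match and search are invented for this sketch.

```python
DOCS = [
    {"name": "yangyazhou", "age": 81, "sex": "男"},
    {"name": "yangtao", "age": 18, "sex": "男"},
    {"name": "cancan", "age": 16, "sex": "女"},
    {"name": "guchenxu", "age": 22, "sex": "男"},
    {"name": "yangwenyu", "age": 23, "sex": "男"},
]


def bool_match(doc, must=(), should=(), must_not=(), filters=()):
    """Simplified bool semantics; every clause is a predicate on a document."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # should is only required when there is no must or filter clause;
    # otherwise it is optional (in ES it would merely affect the score).
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


def search(**clauses):
    return [doc["name"] for doc in DOCS if bool_match(doc, **clauses)]


def is_male(doc):
    return doc["sex"] == "男"


def age_lte_23(doc):
    return doc["age"] <= 23


print(search(must=[is_male], filters=[age_lte_23]))
# ['yangtao', 'guchenxu', 'yangwenyu']
print(search(should=[is_male], filters=[age_lte_23]))
# ['yangtao', 'cancan', 'guchenxu', 'yangwenyu']  (cancan is the unwanted hit)
print(search(must_not=[is_male], filters=[age_lte_23]))
# ['cancan']
```

The second call shows why must is the safer choice alongside filter: with should, the female document cancan passes the age filter and is returned.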
9. Highlighting in ES
Highlighting marks the matched keywords in the results, so you can see which terms produced the hit.
# Highlight query
# Find the document whose name is cancan
GET s18/doc/_search
{
  "query": {
    "match": {
      "name": "cancan"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

GET s18/doc/_search
{
  "query": {
    "match": {
      "name": "cancan"
    }
  },
  "highlight": {
    "pre_tags": "<b style='color:red;font-size:20px;' class='wangdi'>",
    "post_tags": "</b>",
    "fields": {
      "name": {}
    }
  }
}
# For now the result is only JSON; the highlighting becomes visible once a front end renders it

PUT s18/doc/7
{
  "name": "wangdi",
  "desc": "骚的打漂"
}

GET s18/doc/_search
{
  "query": {
    "match": {
      "desc": "打漂"
    }
  },
  "highlight": {
    "pre_tags": "<b style='color:red;font-size:20px;' class='wangdi'>",
    "post_tags": "</b>",
    "fields": {
      "desc": {}
    }
  }
}
# The request above highlights only "打漂"
# Highlighting is a way of drawing attention to the important part
10. Filtering the returned fields
# Result filtering
GET s18/doc/_search
{
  "query": {
    "match": {
      "name": "yangtao"
    }
  },
  "_source": "name"
}

GET s18/doc/_search
{
  "query": {
    "match": {
      "name": "yangtao"
    }
  },
  "_source": ["name", "age", "sex"]
}
Ask only for the fields you actually need; this reduces the load on the server.
11. Aggregations in ES
# Aggregation queries
# sum: total age of all males
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": { "field": "age" }
    }
  }
}

# max: the oldest male
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_max": {
      "max": { "field": "age" }
    }
  }
}

# min: the youngest age
GET s18/doc/_search
{
  "aggs": {
    "my_min": {
      "min": { "field": "age" }
    }
  }
}

# avg: the average age
GET s18/doc/_search
{
  "aggs": {
    "my_avg": {
      "avg": { "field": "age" }
    }
  }
}

# Grouping by age (10-20, 20-30, 30-100): how many people are in each age range?
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_group": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 10, "to": 20 },
          { "from": 20, "to": 30 },
          { "from": 30, "to": 100 }
        ]
      }
    }
  }
}

# Grouping by age (10-20, 20-30, 30-100), then summing the ages in each group
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "group": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 10, "to": 20 },
          { "from": 20, "to": 30 },
          { "from": 30, "to": 100 }
        ]
      },
      "aggs": {
        "my_sum": {
          "sum": { "field": "age" }
        }
      }
    }
  }
}
Group first, then aggregate within each group.
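"Group first, then aggregate" can be traced by hand on the sample data. The Python sketch below imitates a range bucket with a nested sum on the four male documents. It is only an in-memory model: the function name range_buckets and the shape of the returned dictionaries are assumptions for illustration, and a real response formats its buckets differently (for example, ES reports sums as floating-point values). It relies on the range aggregation including the from value and excluding the to value.

```python
MALES = [
    {"name": "yangyazhou", "age": 81},
    {"name": "yangtao", "age": 18},
    {"name": "guchenxu", "age": 22},
    {"name": "yangwenyu", "age": 23},
]

RANGES = [
    {"from": 10, "to": 20},
    {"from": 20, "to": 30},
    {"from": 30, "to": 100},
]


def range_buckets(docs, field, ranges):
    """Step 1: group documents into ranges. Step 2: sum the field per group."""
    buckets = []
    for r in ranges:
        # from is inclusive, to is exclusive
        members = [d for d in docs if r["from"] <= d[field] < r["to"]]
        buckets.append({
            "from": r["from"],
            "to": r["to"],
            "doc_count": len(members),
            "my_sum": sum(d[field] for d in members),
        })
    return buckets


for bucket in range_buckets(MALES, "age", RANGES):
    print(bucket)
# {'from': 10, 'to': 20, 'doc_count': 1, 'my_sum': 18}
# {'from': 20, 'to': 30, 'doc_count': 2, 'my_sum': 45}
# {'from': 30, 'to': 100, 'doc_count': 1, 'my_sum': 81}
```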
12. ES mappings: dynamic
homework:
(1) Write a Python script that starts ES and Kibana with one command
(2) Inverted index: draw out the table