ElasticSearch Basics: Study Notes

Going through my old study notes tonight, I found two fairly detailed ones on ES, so I am sharing them here.

Terminology

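document: the main carrier of data for indexing and search; each doc written into ES is one document.
field: an individual field within a document.
term: the unit of search, representing one word of the text.
token: one occurrence of a term in a field, including the term text, its start and end offsets, its type and similar information.
Lucene uses an inverted index internally, which maps terms to the documents that contain them.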
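The following toy inverted index in Python makes the term-to-document mapping concrete. It is only a sketch. build_inverted_index is a hypothetical helper, and tokenization is a naive lowercase whitespace split rather than a Lucene analyzer. Each posting also records the term's positions in the document, which is the kind of information phrase queries rely on.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (a toy posting list).

    docs: dict of doc_id -> text. Tokenization is a naive lowercase
    whitespace split, NOT a Lucene analyzer.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            # record every position at which the term occurs in this doc
            index[term].setdefault(doc_id, []).append(position)
    return index

docs = {1: "Quick brown fox", 2: "brown dog", 3: "quick dog"}
idx = build_inverted_index(docs)
print(sorted(idx["brown"]))  # [1, 2]
print(dict(idx["quick"]))    # {1: [0], 3: [0]}
```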

Single-instance installation

Requires Java 1.8 or later

https://www.elastic.co

tar -zxvf elasticsearch-5.2.2.tar.gz 

cd elasticsearch-5.2.2

sh ./bin/elasticsearch # a log line containing "started" means startup succeeded

# open 127.0.0.1:9200 to see the node information as JSON


{
  "name" : "EDedFce",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "IM4Uhhw3Qkm9fKSo5TkXkg",
  "version" : {
    "number" : "6.2.0",
    "build_hash" : "37cdac1",
    "build_date" : "2018-02-01T17:31:12.527918Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
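The same check can be scripted. This is a minimal sketch that uses only the Python standard library. fetch_cluster_info and summarize are hypothetical helper names, and the default URL assumes a node listening on 127.0.0.1:9200. So that it runs without a live node, the demo passes summarize a trimmed copy of the response above.

```python
import json
import urllib.request

def fetch_cluster_info(base_url="http://127.0.0.1:9200"):
    # GET / on a running node returns the JSON document shown above
    with urllib.request.urlopen(base_url + "/") as resp:
        return json.loads(resp.read().decode("utf-8"))

def summarize(info):
    # keep only the fields that are handy for a quick sanity check
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "es_version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }

# trimmed copy of the response above, so the demo runs without a live node
sample = {
    "name": "EDedFce",
    "cluster_name": "elasticsearch",
    "version": {"number": "6.2.0", "lucene_version": "7.2.1"},
    "tagline": "You Know, for Search",
}
print(summarize(sample))
# against a real node: print(summarize(fetch_cluster_info()))
```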

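bin---scripts for starting ES
config---contains elasticsearch.yml, the ES configuration file
data---shard data of the current node; it can be copied directly to another node and used there
logs---log files
plugins---commonly used plugins; any extra plugins can be placed here.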

Installing elasticsearch-head

Install Node.js and npm first

Installation

Fetch the Node.js package source

# 4.x
curl --silent --location https://rpm.nodesource.com/setup_4.x | bash -
# 5.x
curl --silent --location https://rpm.nodesource.com/setup_5.x | bash -

Install

yum install -y nodejs

Check that the installation succeeded

node -v
# v4.4.0

npm -v
# 2.14.20

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start

Add configuration to elasticsearch

vim config/elasticsearch.yml

# allow cross-origin requests (CORS)
http.cors.enabled: true
http.cors.allow-origin: "*"

# network.host: 0.0.0.0  allows remote access
# bootstrap.system_call_filter: false  CentOS 6.x does not support seccomp, so this check must be turned off for remote access

# save

# restart elasticsearch; the -d flag runs it as a daemon
./bin/elasticsearch -d 

# restart elasticsearch-head
cd elasticsearch-head 
npm run start

Open http://localhost:9100/

Distributed installation

Configuring the master

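Elastic is essentially a distributed database: multiple servers can work together, and each server can run multiple Elastic instances. A single Elastic instance is called a node, and a group of nodes forms a cluster. Nodes join a cluster by its cluster name.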

vim config/elasticsearch.yml

cluster.name: wali # cluster name
node.name: master # node name
node.master: true # declare this node as master (master-eligible)

network.host: 127.0.0.1 # the IP this node binds to and that other nodes use to reach it

kill the process
./bin/elasticsearch -d # start it again

Configuring the slaves

# copy the archive into es_slave, extract it there and make two copies

mkdir es_slave
cp elasticsearch-5.5.5.tar.gz es_slave
cd es_slave
tar -zxvf elasticsearch-5.5.5.tar.gz
cp -r  elasticsearch-5.5.5 es_slave1
cp -r  elasticsearch-5.5.5 es_slave2

cd es_slave1
vim ./config/elasticsearch.yml

cluster.name: wali # cluster name
node.name: slave1

network.host: 127.0.0.1
http.port: 8200 # use a different port
discovery.zen.ping.unicast.hosts: ["127.0.0.1"] # find the master through this IP

./bin/elasticsearch -d # start this slave

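# repeat the same steps for es_slave2 (for example with node.name: slave2 and its own http.port), then open 127.0.0.1:9100 to check that everything looks normal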

Basic concepts

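Index: a collection of documents with similar properties, identified by a name (for example, one index for customer data and another for product data; roughly a database)
Type: an index can define one or more types, and every document must belong to a type (documents with the same fields form a type; roughly a table). ES 6.0 and later allow only one type per index, and 7.0 removes types.
Document: the basic unit of data that can be indexed (roughly a row)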

e.g. book index > popular-science type > book A

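Shard: every index has several shards, and each shard is a Lucene index. (When an index holds a lot of data, it puts heavy IO pressure on a single machine. Shards spread that load and allow horizontal or vertical scaling. ES defaults to 5 shards before version 7.0 and 1 from 7.0 on. The shard count can only be set when the index is created and cannot be changed afterwards.)
Replica: a copy of a shard serves as its replica. (When a primary shard fails, a replica takes its place. ES defaults to one replica, and the replica count can be changed dynamically.)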
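The primary shard count is fixed at creation time because of the routing rule. Elasticsearch documents it as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The sketch below only illustrates the rule. pick_shard is a hypothetical helper, and zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster.

```python
import zlib

def pick_shard(routing, num_primary_shards):
    # Documented rule: shard = hash(_routing) % number_of_primary_shards,
    # where _routing defaults to the document _id.
    # zlib.crc32 is only a stand-in; Elasticsearch uses Murmur3.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

ids = [str(i) for i in range(1, 11)]
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in ids}
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
print("5 primaries:", with_5)
print("IDs that would map to a different shard with 6 primaries:", moved)
```

Changing the divisor reassigns existing IDs to different shards. That is why the count cannot simply be edited once documents have been indexed.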

RESTful API

API format: http://<ip>:<port>/<index>/<type>/<document ID>
HTTP verbs: GET/PUT/POST/DELETE
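Assembling these URLs by hand is error-prone, so a small builder can help. doc_url is a hypothetical helper and not part of any client library. It follows the <index>/<type>/<ID> layout above and percent-encodes each path segment.

```python
from urllib.parse import quote

def doc_url(host, port, index, doc_type=None, doc_id=None):
    # Layout used in these notes: http://<ip>:<port>/<index>/<type>/<id>
    if doc_id is not None and doc_type is None:
        raise ValueError("a document ID needs a type in this URL layout")
    parts = [p for p in (index, doc_type, doc_id) if p is not None]
    # percent-encode each segment so IDs with '/' or spaces stay one segment
    path = "/".join(quote(str(p), safe="") for p in parts)
    return f"http://{host}:{port}/{path}"

print(doc_url("172.28.28.15", 9200, "people"))            # index only
print(doc_url("172.28.28.15", 9200, "people", "man", 1))  # one document
```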

Creating an index

Structured index
Unstructured index (empty mappings)

127.0.0.1:9200/people

Create the people index with the man type

PUT 172.28.28.15:9200/people

{
	"settings":{
		"number_of_shards":3, //number of shards
		"number_of_replicas":1 //number of replicas
	},
	"mappings":{ //mappings
		"man":{
			"properties":{ //mapped fields
				"name":{
					"type":"text"
				},
				"country":{
					"type":"keyword"
				},
				"age":{
					"type":"integer"
				},
				"date":{
					"type":"date",
					"format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
				}
			}
		}
	}
}

Insertion

Insert with a specified document ID

PUT 172.28.28.15:9200/people/man/1

{
	"name":"瓦力",
	"country":"China",
	"age":30,
	"date":"1999-01-03"
}



{
    "_index": "people",
    "_type": "man",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Insert with an auto-generated document ID

POST 172.28.28.15:9200/people/man  # this must be a POST request

{
	"name":"超重瓦力",
	"country":"China",
	"age":40,
	"date":"1999-01-03"
}

{
    "_index": "people",
    "_type": "man",
    "_id": "5KKFeWEB7ApVdIXNUWwb", // this random ID was generated automatically
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}
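The two insert styles differ only in method and path. PUT carries an ID in the URL, while POST omits it so that ES generates _id. The sketch below builds both requests with urllib but does not send them. index_request is a hypothetical helper. It also sets Content-Type: application/json, which ES 6.0 and later require for requests with a JSON body.

```python
import json
import urllib.request

def index_request(base, index, doc_type, body, doc_id=None):
    # With an ID: PUT /<index>/<type>/<id>
    # Without one: POST /<index>/<type>, and ES generates a random _id
    url = f"{base}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url += f"/{doc_id}"
        method = "PUT"
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    # ES 6.0+ rejects JSON bodies sent without an explicit Content-Type
    return urllib.request.Request(url, data=data, method=method,
                                  headers={"Content-Type": "application/json"})

doc = {"name": "超重瓦力", "country": "China", "age": 40, "date": "1999-01-03"}
auto_id = index_request("http://172.28.28.15:9200", "people", "man", doc)
fixed_id = index_request("http://172.28.28.15:9200", "people", "man", doc, doc_id=1)
print(auto_id.get_method(), auto_id.full_url)
print(fixed_id.get_method(), fixed_id.full_url)
# urllib.request.urlopen(auto_id) would actually send it to a live node
```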

Update

Updating the document directly

POST 172.28.28.15:9200/people/man/1/_update # specify the ID and _update

//the fields to update go inside doc
{
	"doc":{
		"name":"谁是瓦力呢" 
	}
}

{
    "_index": "people",
    "_type": "man",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

Updating with a script

POST 172.28.28.15:9200/people/man/1/_update
{
	"script":{
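		"lang":"painless", //several languages are supported; painless is the built-in one
		"inline":"ctx._source.age += 10"  //ctx is the context, _source is the document; newer versions name this key "source"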
	}
}


# parameters can also be passed separately via params
{
	"script":{
		"lang":"painless",
		"inline":"ctx._source.age = params.age",
		"params":{
			"age":100
		}
	}
}
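The effect of each kind of update on _source can be mimicked locally. This is a Python sketch, not Painless. apply_partial_doc follows the documented behaviour of a doc update: fields are merged into the existing source, with nested objects merged recursively. apply_age_script hard-codes the two example scripts above. Both helper names are hypothetical.

```python
import copy

def apply_partial_doc(source, partial):
    # A "doc" update merges the given fields into _source: nested objects
    # are merged recursively, everything else is overwritten.
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def apply_age_script(source, params=None):
    # Python stand-in for the two Painless scripts above:
    #   without params: ctx._source.age += 10
    #   with params:    ctx._source.age = params.age
    ctx = {"_source": copy.deepcopy(source)}
    if params is None:
        ctx["_source"]["age"] += 10
    else:
        ctx["_source"]["age"] = params["age"]
    return ctx["_source"]

original = {"name": "瓦力", "country": "China", "age": 30, "date": "1999-01-03"}
after_doc = apply_partial_doc(original, {"name": "谁是瓦力呢"})
after_script = apply_age_script(after_doc)
after_params = apply_age_script(after_script, {"age": 100})
print(after_doc["name"], after_script["age"], after_params["age"])
```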

Deletion

Delete a document

DELETE 172.28.28.15:9200/people/man/1

# result
{
    "_index": "people",
    "_type": "man",
    "_id": "1",
    "_version": 5,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 4,
    "_primary_term": 1
}

Delete an index

DELETE 172.28.28.15:9200/people

# result
{
    "acknowledged": true
}

Search

DEMO

# create the index first
PUT 172.28.28.15:9200/book
{
	"settings":{
		"number_of_shards":3,
		"number_of_replicas":1
	},
	"mappings":{
		"novel":{
			"properties":{
				"word_count":{
					"type":"integer"
				},
				"author":{
					"type":"keyword"
				},
				"title":{
					"type":"text"
				},
				"publish_date":{
					"format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
					"type":"date"
				}
			}
		}
	}
}


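# insert data (only the first request is shown with its URL; the other documents are inserted the same way with IDs 2 to 10)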
PUT 172.28.28.15:9200/book/novel/1
{
	"author":"王五",
	"title":"菜谱",
	"word_count":5000,
	"publish_date":"2002-10-01"
}

{
	"author":"瓦力",
	"title":"ElasticSearch入门",
	"word_count":3000,
	"publish_date":"2017-10-01"
}

{
	"author":"哈哈瓦力",
	"title":"ElasticSearch精通",
	"word_count":3000,
	"publish_date":"2016-10-01"
}

{
	"author":"赵六",
	"title":"PHP入门",
	"word_count":4000,
	"publish_date":"2017-10-01"
}

{
	"author":"JayI",
	"title":"CDN",
	"word_count":8000,
	"publish_date":"2015-10-01"
}

{
	"author":"Lily",
	"title":"多多",
	"word_count":18000,
	"publish_date":"2013-10-01"
}

{
	"author":"张三丰",
	"title":"太极拳",
	"word_count":9000,
	"publish_date":"2012-12-01"
}
{
	"author":"路路",
	"title":"剑谱",
	"word_count":7000,
	"publish_date":"2011-11-01"
}
{
	"author":"Fufy",
	"title":"魔术",
	"word_count":4000,
	"publish_date":"2010-11-01"
}
{
	"author":"tmall",
	"title":"开店指南",
	"word_count":9000,
	"publish_date":"2010-11-01"
}
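You can send ten separate PUT requests, but the _bulk endpoint loads them all in one call. Its body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below builds such a body for the first three books. bulk_body is a hypothetical helper, and the IDs count up from first_id.

```python
import json

def bulk_body(index, doc_type, docs, first_id=1):
    # _bulk takes newline-delimited JSON: one action line, then the document
    # source on the next line; the whole body must end with a newline.
    lines = []
    for offset, doc in enumerate(docs):
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(first_id + offset)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"

books = [
    {"author": "王五", "title": "菜谱", "word_count": 5000, "publish_date": "2002-10-01"},
    {"author": "瓦力", "title": "ElasticSearch入门", "word_count": 3000, "publish_date": "2017-10-01"},
    {"author": "哈哈瓦力", "title": "ElasticSearch精通", "word_count": 3000, "publish_date": "2016-10-01"},
]
body = bulk_body("book", "novel", books)
print(body, end="")
```

The resulting string would be POSTed to http://172.28.28.15:9200/_bulk with the header Content-Type: application/x-ndjson.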

Fetch the document with ID 1

GET http://172.28.28.15:9200/book/novel/1

# result
{
    "_index": "book",
    "_type": "novel",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "author": "王五",
        "title": "菜谱",
        "word_count": 5000,
        "publish_date": "2002-10-01"
    }
}

Conditional queries

POST http://172.28.28.15:9200/book/_search //use the _search endpoint

# query all documents
{
	"query":{
		"match_all":{}
	}
}


# result
{
    "took": 27, # time taken, in milliseconds
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 10, # total number of matching documents
        "max_score": 1,
        "hits": [  # 10 hits are returned by default; only the first is shown here
            {
                "_index": "book",
                "_type": "novel",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "author": "瓦力",
                    "title": "ElasticSearch入门",
                    "word_count": 3000,
                    "publish_date": "2017-10-01"
                }
            }
        ]
    }
}

{
	"query":{
		"match_all":{}
	},
	"from":1, # offset of the first hit to return
	"size":1 # number of hits to return
}
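For page-based navigation, from is simply the number of hits to skip. Here is a small sketch with a hypothetical page_to_from_size helper, where pages are counted from 1.

```python
def page_to_from_size(page, page_size=10):
    # page is 1-based; "from" is the number of hits to skip
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

# second page with one hit per page: the same request body as above
query = {"query": {"match_all": {}}, **page_to_from_size(2, page_size=1)}
print(query)
```

By default, from + size may not exceed 10000 (the index.max_result_window setting). Deep paging therefore needs another approach such as search_after or scroll.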

Keyword query

{
	"query":{
		"match":{
			"title":"PHP"
		}
	}
}

Specifying sort order

Sort by date, newest first

{
	"query":{
		"match":{
			"title":"谱"
		}
	},
	"sort":[
		{"publish_date":{
				"order":"desc"
			}
		}
	]
}

Aggregation queries (aggs)

Aggregate by word_count (several aggregations can be defined in one request)

{
	"aggs":{
		"group_by_word_count":{ # name of the aggregation, chosen by you
			"terms":{
				"field":"word_count"
			}
		}
	}
}

# result
 "aggregations": {
        "group_by_word_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 3000,
                    "doc_count": 2
                },
                {
                    "key": 4000,
                    "doc_count": 2
                },
                {
                    "key": 9000,
                    "doc_count": 2
                },
                {
                    "key": 5000,
                    "doc_count": 1
                },
                {
                    "key": 7000,
                    "doc_count": 1
                },
                {
                    "key": 8000,
                    "doc_count": 1
                },
                {
                    "key": 18000,
                    "doc_count": 1
                }
            ]
        }
    }
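A terms aggregation is essentially a group-by count. The Python sketch below recomputes the buckets from the ten word_count values inserted earlier. terms_buckets is a hypothetical helper. It breaks ties by key ascending, which reproduces the bucket order in the response above.

```python
from collections import Counter

# word_count of the ten books inserted above, in insertion order
word_counts = [5000, 3000, 3000, 4000, 8000, 18000, 9000, 7000, 4000, 9000]

def terms_buckets(values, size=10):
    # group-by count, ordered by doc_count descending; ties broken by key
    # ascending (this reproduces the bucket order in the response above)
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]

for bucket in terms_buckets(word_counts):
    print(bucket)
```

By default, terms returns only the top 10 buckets. Setting a size parameter inside terms changes that, which the size argument here imitates.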

Statistics

Compute statistics over a field

{
	"aggs":{
		"grades_word_count":{
			"stats":{
				"field":"word_count"
			}
		}
	}
}

# result
"aggregations": {
        "grades_word_count": {
            "count": 10,
            "min": 3000,
            "max": 18000,
            "avg": 7000,
            "sum": 70000
        }
}

//compute only the minimum
{
	"aggs":{
		"grades_word_count":{
			"min":{
				"field":"word_count"
			}
		}
	}
}

# result
"aggregations": {
    "grades_word_count": {
        "value": 3000
    }
}
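The stats numbers are easy to verify by hand. The sketch below recomputes them from the same ten word_count values. stats is a hypothetical helper, and an empty input returns None for min, max and avg. The min aggregation's "value" is just the min entry.

```python
# word_count of the ten books inserted above
word_counts = [5000, 3000, 3000, 4000, 8000, 18000, 9000, 7000, 4000, 9000]

def stats(values):
    # the same five numbers the stats aggregation reports
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}

result = stats(word_counts)
print(result)
print("min only:", result["min"])  # what the min aggregation returns as "value"
```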

Advanced queries

Leaf queries (query a specific field for specific values)

Query context

During the query, ES decides whether each document satisfies the query and also computes a _score that indicates how well the document matches it.

Full-text queries work on text data
Term-level queries work on structured data such as numbers and dates
Full-text matching (keyword: match)

{
	"query":{
		"match":{
			"author":"瓦力"
		}
	}
}

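# returns every book whose author is 瓦力 (author is a keyword field, so this is an exact match and 哈哈瓦力 is not returned)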

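Searching for ElasticSearch入门 analyzes the query text into terms and returns every document that matches at least one of them. With the default standard analyzer, Latin text is split into words and Chinese text into single characters. The terms here are therefore elasticsearch, 入 and 门, so PHP入门 is returned as well.

(roughly: title like '%ElasticSearch%' or title like '%入%' or title like '%门%')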

{
	"query":{
		"match":{
			"title":"ElasticSearch入门"
		}
	}
}
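Looking at the analysis step explains why PHP入门 also shows up. Below is a rough Python stand-in for the standard analyzer: text is lowercased, each run of ASCII letters or digits becomes one token, and each CJK ideograph becomes its own token. It is combined with the match query's default or operator. analyze and match_any are hypothetical helpers. The real analyzer follows Unicode text segmentation rules and covers far more scripts than this regex.

```python
import re

# Rough stand-in for the standard analyzer: lowercase, runs of ASCII letters
# or digits become one token, and every CJK ideograph is a token of its own.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")

def analyze(text):
    return TOKEN_RE.findall(text.lower())

def match_any(query, text):
    # match uses operator "or" by default: one shared term is enough
    return bool(set(analyze(query)) & set(analyze(text)))

titles = ["菜谱", "ElasticSearch入门", "ElasticSearch精通", "PHP入门", "CDN",
          "多多", "太极拳", "剑谱", "魔术", "开店指南"]
print(analyze("ElasticSearch入门"))
print([t for t in titles if match_any("ElasticSearch入门", t)])
```

The same analysis explains the sort example: the query 谱 matches both 菜谱 and 剑谱.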

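Phrase matching with match_phrase: a book is returned only if its title contains every term of ElasticSearch入门, adjacent and in the same order

(roughly: title like '%ElasticSearch入门%')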

{
	"query":{
		"match_phrase":{
			"title":"ElasticSearch入门"
		}
	}
}
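match_phrase also checks positions: the query's terms must appear next to each other, in the same order. The sketch below uses a rough approximation of the standard analyzer that lowercases text and splits Chinese into single characters. match_phrase here is a hypothetical Python function, not the ES query itself.

```python
import re

# rough approximation of the standard analyzer (not the real one)
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")

def analyze(text):
    return TOKEN_RE.findall(text.lower())

def match_phrase(query, text):
    # every query term must occur, at consecutive positions, in order
    q, t = analyze(query), analyze(text)
    if not q:
        return False
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

titles = ["菜谱", "ElasticSearch入门", "ElasticSearch精通", "PHP入门", "CDN",
          "多多", "太极拳", "剑谱", "魔术", "开店指南"]
print([t for t in titles if match_phrase("ElasticSearch入门", t)])
```

ES's match_phrase also accepts a slop parameter that tolerates terms being a few positions apart. The sketch does not model it.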

Multi-field matching with multi_match: any record whose author or title contains 瓦力 is returned

(author = '瓦力' or title like '%瓦力%')

{
	"query":{
		"multi_match":{
			"query":"瓦力",
			"fields":["author","title"]
		}
	}
}

Query string queries (combine conditions with query syntax)

(title like '%ElasticSearch%' AND title LIKE '%精通%')
{
	"query":{
		"query_string":{
			"query":"ElasticSearch AND 精通"
		}
	}
}

(title like '%ElasticSearch%' AND title like '%精通%') OR (title like '%PHP%')
{
	"query":{
		"query_string":{
			"query":"(ElasticSearch AND 精通) OR PHP"
		}
	}
}

# multi-field query
(title like '%ElasticSearch%' OR title like '%瓦力%') OR (author like '%ElasticSearch%' OR author like '%瓦力%')
{
	"query":{
		"query_string":{
			"query":"ElasticSearch OR 瓦力",
			"fields":["title","author"]
		}
	}
}

# structured-data query (term: an exact value)

(word_count = 10000)
{
	"query":{
		"term":{
			"word_count":10000
		}
	}
}

# range query on structured data (dates, numbers, etc.)
(word_count >= 1000 and word_count <= 5000)
{
	"query":{
		"range":{
			"word_count":{
				"gte":1000,
				"lte":5000
			}
		}
	}
}
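A range query simply tests each bound. gte/gt set the lower end, lte/lt set the upper end, and a missing bound is left open. The sketch applies that logic to the ten books. in_range is a hypothetical helper. Comparing publish_date as strings only works because every date here is a zero-padded yyyy-MM-dd; ES itself parses dates.

```python
# (title, word_count, publish_date) of the ten books inserted above
books = [
    ("菜谱", 5000, "2002-10-01"), ("ElasticSearch入门", 3000, "2017-10-01"),
    ("ElasticSearch精通", 3000, "2016-10-01"), ("PHP入门", 4000, "2017-10-01"),
    ("CDN", 8000, "2015-10-01"), ("多多", 18000, "2013-10-01"),
    ("太极拳", 9000, "2012-12-01"), ("剑谱", 7000, "2011-11-01"),
    ("魔术", 4000, "2010-11-01"), ("开店指南", 9000, "2010-11-01"),
]

def in_range(value, gte=None, gt=None, lte=None, lt=None):
    # a missing bound is open; gte/lte are inclusive, gt/lt exclusive
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True

# word_count >= 1000 and word_count <= 5000
print([title for title, wc, _ in books if in_range(wc, gte=1000, lte=5000)])
# publish_date >= 2015-01-01 (string comparison is safe only for zero-padded yyyy-MM-dd)
print([title for title, _, date in books if in_range(date, gte="2015-01-01")])
```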

Filter context

During the query, ES only checks whether each document satisfies the condition, a plain yes or no, and computes no _score (unlike query context)

(word_count = 10000)

{
	"query":{
		"bool":{
			"filter":{
				"term":{
					"word_count":10000
				}
			}
		}
	}
}

Compound queries combine leaf queries with some logic

Constant score query
Bool query
...and more
Constant score query

# every hit gets a score of 1
{
	"query":{
		"constant_score":{
			"filter":{
				"match":{
					"title":"ElasticSearch"
				}
			}
		}
	}
}

# set the score to 2

{
	"query":{
		"constant_score":{
			"filter":{
				"match":{
					"title":"ElasticSearch"
				}
			},
			"boost":2
		}
	}
}

Bool query

# should is OR (where author = '瓦力' or title like '%ElasticSearch%')

{
	"query":{
		"bool":{
			"should":[
				{
					"match":{
						"author":"瓦力"
					}
				},
				{
					"match":{
						"title":"ElasticSearch"
					}
				}
			]
		}
	}
}

# must is AND (where author = '瓦力' and title like '%ElasticSearch%')
{
	"query":{
		"bool":{
			"must":[
				{
					"match":{
						"author":"瓦力"
					}
				},
				{
					"match":{
						"title":"ElasticSearch"
					}
				}
			]
		}
	}
}


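# with a filter: the filter keeps only books with a word_count of 3000. Because the bool query has a filter clause, the should clauses become optional and only affect the score; add "minimum_should_match":1 to require at least one of them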
{
	"query":{
		"bool":{
			"should":[
				{
					"match":{
						"author":"瓦力"
					}
				},
				{
					"match":{
						"title":"ElasticSearch"
					}
				}
			],
			"filter":[
				{
					"term":{
						"word_count":3000
					}
				}
			]
		}
	}
}
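The four bool clauses combine according to a few rules, sketched in Python below. bool_match and search are hypothetical helpers, and each clause is a plain predicate. The score is a toy count of matching must and should clauses, not ES's relevance score. The key rule is the default for should. With no must or filter clause, at least one should clause must match. Once a must or filter clause is present, should clauses become optional and only raise the score. The demo filters on word_count 4000 instead of 3000. Both 3000-word books also match the should clauses, which would hide the effect.

```python
books = [
    {"author": "王五", "title": "菜谱", "word_count": 5000},
    {"author": "瓦力", "title": "ElasticSearch入门", "word_count": 3000},
    {"author": "哈哈瓦力", "title": "ElasticSearch精通", "word_count": 3000},
    {"author": "赵六", "title": "PHP入门", "word_count": 4000},
    {"author": "JayI", "title": "CDN", "word_count": 8000},
    {"author": "Lily", "title": "多多", "word_count": 18000},
    {"author": "张三丰", "title": "太极拳", "word_count": 9000},
    {"author": "路路", "title": "剑谱", "word_count": 7000},
    {"author": "Fufy", "title": "魔术", "word_count": 4000},
    {"author": "tmall", "title": "开店指南", "word_count": 9000},
]

def bool_match(doc, must=(), filters=(), should=(), must_not=(),
               minimum_should_match=None):
    """Return a toy score if doc matches, else None. Clauses are predicates."""
    if minimum_should_match is None:
        # default: should is optional once a must or filter clause exists
        minimum_should_match = 0 if (must or filters) else (1 if should else 0)
    if not all(c(doc) for c in must):
        return None
    if not all(c(doc) for c in filters):  # filters never add to the score
        return None
    if any(c(doc) for c in must_not):
        return None
    should_hits = sum(1 for c in should if c(doc))
    if should_hits < minimum_should_match:
        return None
    return len(must) + should_hits  # toy score: matched scoring clauses

def search(docs, **clauses):
    hits = [(d["title"], bool_match(d, **clauses)) for d in docs]
    return [(title, score) for title, score in hits if score is not None]

def is_wali(d):  # like a term query on the keyword field author
    return d["author"] == "瓦力"

def has_es(d):  # crude stand-in for match on title
    return "elasticsearch" in d["title"].lower()

def wc_4000(d):
    return d["word_count"] == 4000

print(search(books, should=[is_wali, has_es]))   # OR
print(search(books, must=[is_wali, has_es]))     # AND
print(search(books, should=[is_wali, has_es], filters=[wc_4000]))
print(search(books, should=[is_wali, has_es], filters=[wc_4000],
             minimum_should_match=1))
print(len(search(books, must_not=[is_wali])))    # everyone except 瓦力
```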


# must_not: books whose author is not 瓦力

{
	"query":{
		"bool":{
			"must_not":{
				"term":{
					"author":"瓦力"
				}
			}
		}
	}
}


Original article: https://www.cnblogs.com/jaychan/p/14984059.html