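I. Preface
1. Why deploy Sentinel
In the Redis master-slave setup from the previous article, when the master fails you have to promote a slave to master by hand. Sentinel automates that process.
Its main functions are:
1) Monitoring: it continuously checks that the master and slave instances are running normally
2) Notification: it can raise alerts through scripts or an API
3) Automatic failover
Sentinel does more than this; for the other features see https://redis.io/topics/sentinel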
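II. Deploying Sentinel for Redis
1. Environment
The system is based on CentOS-6.7-x86_64-minimal.iso
Redis is installed from the redis-3.2.4 source package; Sentinel is supported from version 2.8 onward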
192.168.56.101 host101
192.168.56.102 host102
192.168.56.103 host103

master    192.168.56.101:6379
slave1    192.168.56.102:6379
slave2    192.168.56.103:6379
sentinel1 192.168.56.101:26379
sentinel2 192.168.56.102:26379
sentinel3 192.168.56.103:26379
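2. Redis configuration
# Redis configuration files do not accept a trailing comment on a directive line,
# so each note below sits on its own line above the directive it describes.

# Master configuration
# Copy the default configuration file, then change the following settings
[root@host101 ~]# cp /usr/local/src/redis-3.2.4/redis.conf /etc/redis/6379.conf
[root@host101 ~]# vim /etc/redis/6379.conf
# listen address
bind 0.0.0.0
# run as a daemon
daemonize yes
# pid file
pidfile /var/run/redis_6379.pid
# log file
logfile "/var/log/redis_6379.log"
# data file
dbfilename dump.rdb
# data directory
dir /var/lib/redis
# connection password
requirepass mima

# slave1 configuration
[root@host102 ~]# vim /etc/redis/6379.conf
# listen address
bind 0.0.0.0
# run as a daemon
daemonize yes
# pid file
pidfile /var/run/redis_6379.pid
# log file
logfile "/var/log/redis_6379.log"
# data file
dbfilename dump.rdb
# data directory
dir /var/lib/redis
# connection password
requirepass mima
# the master to replicate from
slaveof 192.168.56.101 6379
# password the slave uses to authenticate against the master
masterauth mima
# make the slave read-only
slave-read-only yes

# slave2 configuration
[root@host103 ~]# vim /etc/redis/6379.conf
# listen address
bind 0.0.0.0
# run as a daemon
daemonize yes
# pid file
pidfile /var/run/redis_6379.pid
# log file
logfile "/var/log/redis_6379.log"
# data file
dbfilename dump.rdb
# data directory
dir /var/lib/redis
# connection password
requirepass mima
# the master to replicate from
slaveof 192.168.56.101 6379
# password the slave uses to authenticate against the master
masterauth mima
# make the slave read-only
slave-read-only yes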
3. Start Redis and check the replication status
# Start the redis service; do this on all three hosts
[root@host101 redis]# /usr/local/redis/bin/redis-server /etc/redis/6379.conf

# Check the replication status
[root@host101 redis]# /usr/local/redis/bin/redis-cli -a mima
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.56.102,port=6379,state=online,offset=15,lag=0
slave1:ip=192.168.56.103,port=6379,state=online,offset=15,lag=1
master_repl_offset:15
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:14

# Test reads and writes
[root@host101 redis]# /usr/local/redis/bin/redis-cli -a mima
127.0.0.1:6379> set name marry
OK
127.0.0.1:6379> get name
"marry"
127.0.0.1:6379> exit
[root@host101 redis]# /usr/local/redis/bin/redis-cli -h 192.168.56.103 -a mima
192.168.56.103:6379> get name
"marry"
192.168.56.103:6379> set name marry3
(error) READONLY You can't write against a read only slave.
192.168.56.103:6379> exit
[root@host101 redis]# /usr/local/redis/bin/redis-cli -h 192.168.56.102 -a mima
192.168.56.102:6379> info replication
# Replication
role:slave
master_host:192.168.56.101
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:537
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
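The manual check above (role is master, both slaves online) can also be scripted. The following is a minimal sketch, assuming the `info replication` text has already been captured as a string; the helper names `parse_info` and `replica_states` are hypothetical and not part of Redis or any client library.

```python
def parse_info(text):
    """Parse the 'key:value' lines of a Redis INFO section into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Replication" header
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_states(info):
    """Return {"ip:port": state} for each slaveN entry reported by a master."""
    states = {}
    for key, value in info.items():
        # slave0, slave1, ... but not slave_priority, slave_read_only, ...
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            states[f'{fields["ip"]}:{fields["port"]}'] = fields["state"]
    return states


# Sample taken from the master's output shown above.
SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.56.102,port=6379,state=online,offset=15,lag=0
slave1:ip=192.168.56.103,port=6379,state=online,offset=15,lag=1
master_repl_offset:15
"""

info = parse_info(SAMPLE)
states = replica_states(info)
all_online = (
    info["role"] == "master"
    and len(states) == int(info["connected_slaves"])
    and all(state == "online" for state in states.values())
)
print(states, all_online)
```

In practice the text could come from `redis-cli -a mima info replication`; the sketch only covers the parsing step.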
4. Configure and start Sentinel
# Use the same configuration on all three hosts
# (as with redis.conf, notes must be on their own lines, not after a directive)
[root@host101 ~]# grep -Ev "^$|#" /usr/local/src/redis-3.2.4/sentinel.conf > /etc/redis/sentinel.conf
[root@host101 ~]# vim /etc/redis/sentinel.conf
# port Sentinel listens on
port 26379
# run in the background as a daemon
daemonize yes
# log file
logfile /var/log/sentinel.log
dir /tmp
# monitor the group named mymaster: master address, port, quorum
sentinel monitor mymaster 192.168.56.101 6379 2
# if the master is unreachable for 5000 ms (5 seconds) in a row, consider it down
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
# failover timeout
sentinel failover-timeout mymaster 60000
# password used to authenticate against the monitored instances
sentinel auth-pass mymaster mima
# by default Sentinel only serves connections from the loopback address, which
# prevents the Sentinels from talking to each other; either bind it to the
# network interface or turn protected-mode off
protected-mode no

# Start it on all three hosts
[root@host101 ~]# /usr/local/redis/bin/redis-sentinel /etc/redis/sentinel.conf

# Check the Sentinel startup log
# When everything is working, the other Sentinels show up, here as +sentinel-address-switch entries
[root@host101 ~]# more /var/log/sentinel.log
24470:X 08 Dec 10:53:12.205 * Increased maximum number of open files to 10032 (it was originally set to 1024).
(Redis ASCII-art startup banner omitted: Redis 3.2.4 (00000000/0) 64 bit, Running in sentinel mode, Port: 26379, PID: 24470, http://redis.io)
24470:X 08 Dec 10:53:12.206 # Sentinel ID is 106e22fad7ad280b2c38542c164f7060b6587d68
24470:X 08 Dec 10:53:12.206 # +monitor master mymaster 192.168.56.101 6379 quorum 2
24470:X 08 Dec 10:53:14.321 * +sentinel-address-switch master mymaster 192.168.56.101 6379 ip 192.168.56.102 port 26379 for fae94df5596af315af0f5f97fe7ade3fad0b8a98
24470:X 08 Dec 10:53:14.336 * +sentinel-address-switch master mymaster 192.168.56.101 6379 ip 192.168.56.103 port 26379 for 8ea722390cabf3ad304b20f8cc42157603d21d84

# Note: once Sentinel is running, it rewrites its own configuration file /etc/redis/sentinel.conf
[root@host101 ~]# cat /etc/redis/sentinel.conf
port 26379
daemonize yes
dir "/tmp"
logfile "/var/log/sentinel.log"
sentinel myid 106e22fad7ad280b2c38542c164f7060b6587d68
sentinel monitor mymaster 192.168.56.101 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel auth-pass mymaster mima
protected-mode no
# Generated by CONFIG REWRITE
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-slave mymaster 192.168.56.102 6379
sentinel known-slave mymaster 192.168.56.103 6379
sentinel known-sentinel mymaster 192.168.56.103 26379 8ea722390cabf3ad304b20f8cc42157603d21d84
sentinel known-sentinel mymaster 192.168.56.102 26379 fae94df5596af315af0f5f97fe7ade3fad0b8a98
sentinel current-epoch 0
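Because Sentinel persists what it has discovered into sentinel.conf, the rewritten file is a convenient place to confirm that each node knows about both slaves and both peer Sentinels. The sketch below reads those directives from the file text; `summarize_sentinel_conf` is a hypothetical helper written for this article, and the "+ 1" assumes the local Sentinel is not listed among its own `known-sentinel` entries, as in the file shown above.

```python
def summarize_sentinel_conf(text, master_name):
    """Collect the monitor / known-slave / known-sentinel entries for one master."""
    summary = {"master": None, "quorum": None, "slaves": [], "sentinels": []}
    for line in text.splitlines():
        parts = line.split()
        # Only lines of the form: sentinel <directive> <master_name> <args...>
        if len(parts) < 3 or parts[0] != "sentinel" or parts[2] != master_name:
            continue
        directive, args = parts[1], parts[3:]
        if directive == "monitor":
            summary["master"] = f"{args[0]}:{args[1]}"
            summary["quorum"] = int(args[2])
        elif directive == "known-slave":
            summary["slaves"].append(f"{args[0]}:{args[1]}")
        elif directive == "known-sentinel":
            summary["sentinels"].append(f"{args[0]}:{args[1]}")
    return summary


# Excerpt of the rewritten /etc/redis/sentinel.conf shown above.
SAMPLE_CONF = """\
port 26379
sentinel myid 106e22fad7ad280b2c38542c164f7060b6587d68
sentinel monitor mymaster 192.168.56.101 6379 2
sentinel down-after-milliseconds mymaster 5000
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 192.168.56.102 6379
sentinel known-slave mymaster 192.168.56.103 6379
sentinel known-sentinel mymaster 192.168.56.103 26379 8ea722390cabf3ad304b20f8cc42157603d21d84
sentinel known-sentinel mymaster 192.168.56.102 26379 fae94df5596af315af0f5f97fe7ade3fad0b8a98
sentinel current-epoch 0
"""

summary = summarize_sentinel_conf(SAMPLE_CONF, "mymaster")
total_sentinels = len(summary["sentinels"]) + 1  # peers plus the local Sentinel
majority = total_sentinels // 2 + 1
print(summary)
print("sentinels:", total_sentinels, "majority:", majority, "quorum:", summary["quorum"])
```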
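4. What the parameters above do, in detail
sentinel monitor mymaster 192.168.56.101 6379 2
4.1 This line tells Sentinel to monitor a master named mymaster at 192.168.56.101:6379. What does the 2 at the end of the line mean? Networks are unreliable, and a single Sentinel can wrongly conclude that a master is dead simply because of network congestion. When Sentinels run as a cluster the problem becomes easy to solve: several Sentinels talk to each other to confirm whether the master is really dead. The 2 is the quorum: the master is only treated as truly unavailable once 2 Sentinels in the cluster consider it dead. (The Sentinels in a cluster also communicate with one another, using a gossip protocol.)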
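The quorum rule can be written down in a few lines. This is an illustrative sketch, not Sentinel's source code; the function names are hypothetical. The second function adds one fact from the Redis Sentinel documentation that the paragraph above does not cover: the quorum only governs failure detection, while the Sentinel that actually performs the failover must also be authorized by a majority of all Sentinels.

```python
def is_odown(down_votes, quorum):
    """The master is treated as really down once `quorum` Sentinels report it down."""
    return down_votes >= quorum


def votes_needed_to_lead(total_sentinels, quorum):
    """Votes a Sentinel needs before it may run the failover:
    a majority of all Sentinels, and never fewer than the quorum."""
    majority = total_sentinels // 2 + 1
    return max(majority, quorum)


# The setup in this article: 3 Sentinels, quorum 2.
print(is_odown(1, 2))              # one Sentinel alone cannot declare the master down
print(is_odown(2, 2))              # two Sentinels agreeing is enough
print(votes_needed_to_lead(3, 2))  # votes needed to be elected failover leader
```

One consequence for this three-node layout: if two of the three hosts are lost, the surviving Sentinel can reach neither the quorum nor a majority, so no automatic failover takes place.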
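sentinel down-after-milliseconds mymaster 5000
4.2 Sentinel sends heartbeat PINGs to the master to check that it is alive. If the master does not answer with PONG within a "certain period of time", or answers with an error, this Sentinel subjectively (on its own) considers the master unavailable (subjectively down, SDOWN for short). down-after-milliseconds sets that "certain period of time", in milliseconds.
Note, however, that the Sentinel does not start a failover at this point. It first asks the other Sentinels in the cluster for their view. If enough Sentinels also subjectively consider the master dead, the master is then considered objectively dead (objectively down, ODOWN for short, as opposed to the subjectively down state above). The number of Sentinels that must agree is the quorum set in the previous directive.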
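The two-stage decision described above (each Sentinel's private SDOWN timer, then the shared ODOWN count) can be sketched as follows. This is a simplified model under stated assumptions: timestamps are plain integers in milliseconds, the Sentinel names and reply times are invented for illustration, and real Sentinels exchange their views over the network rather than through a dict.

```python
DOWN_AFTER_MS = 5000  # sentinel down-after-milliseconds mymaster 5000
QUORUM = 2            # last argument of "sentinel monitor"


def is_sdown(now_ms, last_valid_reply_ms, down_after_ms=DOWN_AFTER_MS):
    """One Sentinel's own view: no valid PING reply for longer than the limit."""
    return now_ms - last_valid_reply_ms > down_after_ms


def sentinels_reporting_down(now_ms, last_replies):
    """Names of the Sentinels that currently see the master as SDOWN."""
    return [name for name, seen in last_replies.items() if is_sdown(now_ms, seen)]


def is_odown(now_ms, last_replies, quorum=QUORUM):
    """ODOWN: at least `quorum` Sentinels agree that the master is down."""
    return len(sentinels_reporting_down(now_ms, last_replies)) >= quorum


# Hypothetical time (ms) at which each Sentinel last got a valid reply from the master.
last_replies = {"sentinel1": 10_000, "sentinel2": 9_800, "sentinel3": 14_500}

print(sentinels_reporting_down(15_000, last_replies), is_odown(15_000, last_replies))
print(sentinels_reporting_down(15_100, last_replies), is_odown(15_100, last_replies))
```

At 15,000 ms only sentinel2 has waited longer than 5 seconds, so the master is SDOWN for that one Sentinel but not ODOWN; 100 ms later sentinel1 crosses the limit as well and the quorum of 2 is reached.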
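sentinel failover-timeout mymaster 60000
4.3 The failover timeout, in milliseconds. Once a failover has started, if no failover operation has been triggered within this time, the current Sentinel considers this failover attempt to have failed.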
5. Test failover
# On any sentinel node, check the status
[root@host101 ~]# /usr/local/redis/bin/redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.56.101:6379,slaves=2,sentinels=3

# On a slave host, follow the sentinel log
[root@host102 redis]# tail -f /var/log/sentinel.log

# On the master, kill the redis-server process
[root@host101 ~]# killall redis-server

# Back on the slave host, the sentinel log prints the following after about 5 seconds
[root@host102 redis]# tail -f /var/log/sentinel.log
6711:X 08 Dec 11:16:15.164 # +sdown master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.223 # +odown master mymaster 192.168.56.101 6379 #quorum 2/2
6711:X 08 Dec 11:16:15.224 # +new-epoch 1
6711:X 08 Dec 11:16:15.224 # +try-failover master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.226 # +vote-for-leader fae94df5596af315af0f5f97fe7ade3fad0b8a98 1
6711:X 08 Dec 11:16:15.232 # 8ea722390cabf3ad304b20f8cc42157603d21d84 voted for fae94df5596af315af0f5f97fe7ade3fad0b8a98 1
6711:X 08 Dec 11:16:15.232 # 106e22fad7ad280b2c38542c164f7060b6587d68 voted for fae94df5596af315af0f5f97fe7ade3fad0b8a98 1
6711:X 08 Dec 11:16:15.293 # +elected-leader master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.293 # +failover-state-select-slave master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.346 # +selected-slave slave 192.168.56.102:6379 192.168.56.102 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.346 * +failover-state-send-slaveof-noone slave 192.168.56.102:6379 192.168.56.102 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:15.447 * +failover-state-wait-promotion slave 192.168.56.102:6379 192.168.56.102 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:16.273 # +promoted-slave slave 192.168.56.102:6379 192.168.56.102 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:16.273 # +failover-state-reconf-slaves master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:16.333 * +slave-reconf-sent slave 192.168.56.103:6379 192.168.56.103 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.283 * +slave-reconf-inprog slave 192.168.56.103:6379 192.168.56.103 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.283 * +slave-reconf-done slave 192.168.56.103:6379 192.168.56.103 6379 @ mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.358 # -odown master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.358 # +failover-end master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.358 # +switch-master mymaster 192.168.56.101 6379 192.168.56.102 6379
6711:X 08 Dec 11:16:17.358 * +slave slave 192.168.56.103:6379 192.168.56.103 6379 @ mymaster 192.168.56.102 6379
6711:X 08 Dec 11:16:17.358 * +slave slave 192.168.56.101:6379 192.168.56.101 6379 @ mymaster 192.168.56.102 6379
6711:X 08 Dec 11:16:22.402 # +sdown slave 192.168.56.101:6379 192.168.56.101 6379 @ mymaster 192.168.56.102 6379

# Check the sentinel status again: the master is now the former slave 192.168.56.102:6379
[root@host101 ~]# /usr/local/redis/bin/redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.56.102:6379,slaves=2,sentinels=3
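Of all the events in the failover log, `+switch-master` is the one that records the address change itself: master name, old address, new address. A small sketch for pulling it out of a log is shown below; the helper `find_switches` is hypothetical, and the field order is taken from the log line above.

```python
import re

# +switch-master <master name> <old ip> <old port> <new ip> <new port>
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switches(log_text):
    """Return (master_name, old_address, new_address) for each +switch-master event."""
    switches = []
    for line in log_text.splitlines():
        match = SWITCH_RE.search(line)
        if match:
            switches.append((
                match["name"],
                f'{match["old_ip"]}:{match["old_port"]}',
                f'{match["new_ip"]}:{match["new_port"]}',
            ))
    return switches


# Three lines copied from the sentinel log above.
SAMPLE_LOG = """\
6711:X 08 Dec 11:16:17.358 # +failover-end master mymaster 192.168.56.101 6379
6711:X 08 Dec 11:16:17.358 # +switch-master mymaster 192.168.56.101 6379 192.168.56.102 6379
6711:X 08 Dec 11:16:17.358 * +slave slave 192.168.56.103:6379 192.168.56.103 6379 @ mymaster 192.168.56.102 6379
"""

switches = find_switches(SAMPLE_LOG)
print(switches)
```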
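6. Bring the original master 192.168.56.101:6379 back
# Add the password redis-server uses to authenticate against its master (the original master was never given this option), then start the service
[root@host101 ~]# echo "masterauth mima" >> /etc/redis/6379.conf
[root@host101 ~]# /usr/local/redis/bin/redis-server /etc/redis/6379.conf
# Check the sentinel log, check the replication status, and test reads and writes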
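The missing `masterauth` on the original master only surfaces when that node has to rejoin as a slave, so it is worth checking before a failover ever happens. The sketch below flags a configuration that sets `requirepass` without a matching `masterauth`. It assumes every node shares one password, as in this article; `read_directives` and `failover_auth_problems` are hypothetical helpers, and the parser is deliberately simple (it does not handle quoted values).

```python
def read_directives(conf_text):
    """Map each directive name in a redis.conf text to its argument string."""
    directives = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        directives[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def failover_auth_problems(conf_text):
    """List reasons this node could not authenticate to a new master after failover."""
    directives = read_directives(conf_text)
    password = directives.get("requirepass")
    if password is None:
        return []  # no password in use, nothing to check
    if "masterauth" not in directives:
        return ["requirepass is set but masterauth is missing"]
    if directives["masterauth"] != password:
        return ["masterauth differs from requirepass"]
    return []


# The original master's settings from step 2, before and after the fix above.
ORIGINAL_MASTER_CONF = """\
bind 0.0.0.0
daemonize yes
requirepass mima
"""
FIXED_MASTER_CONF = ORIGINAL_MASTER_CONF + "masterauth mima\n"

print(failover_auth_problems(ORIGINAL_MASTER_CONF))
print(failover_auth_problems(FIXED_MASTER_CONF))
```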
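Summary: Sentinel only makes Redis itself highly available; it does not make access from the front-end services highly available. Common approaches to that problem: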
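1) keepalived: a keepalived virtual IP gives the master and slave a single access address. When the master fails, keepalived runs a script that promotes the slave to master; once the old master recovers, it first resynchronizes and then automatically becomes master again. The advantage is that applications do not need to know about the switch, because the virtual IP they connect to does not change. The disadvantage is that keepalived adds deployment complexity, and its use cases are limited: its core protocol, VRRP, only works inside a LAN, not across networks or over a WAN, and it is basically unusable on a network you do not control unless the VIP is meant for LAN use only;
2) ZooKeeper: ZooKeeper monitors the master and slave instances and maintains the currently valid IP; applications fetch the IP from ZooKeeper and then access Redis;
3) Sentinel: Sentinel monitors the master and slave instances and performs failover automatically. The drawback is that the master and slave instances have different addresses (IP and port), so after a failover the application does not know the new address. For this reason Jedis 2.2.2 added Sentinel support: a Jedis instance obtained through redis.clients.jedis.JedisSentinelPool.getResource() is updated to the new master address promptly.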
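What a Sentinel-aware client does underneath is ask a Sentinel for the current master address with the `SENTINEL get-master-addr-by-name` command. The sketch below shows that lookup using only the Python standard library and a hand-written encoding of the Redis protocol. It is an illustration of the idea, not how Jedis is implemented: the function names are hypothetical, it reads the reply with a single `recv`, and it assumes the Sentinels accept unauthenticated connections as configured in this article. A real application would normally use the Sentinel support in its client library instead.

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_addr_reply(reply):
    """Parse a two-element reply (ip, port); return None for anything else,
    including the null reply sent when the master name is unknown."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 6:
        return None
    return lines[2].decode(), int(lines[4])


def current_master(sentinels, master_name, timeout=1.0):
    """Ask each Sentinel in turn; return (ip, port) from the first one that answers."""
    request = encode_command("SENTINEL", "get-master-addr-by-name", master_name)
    for host, port in sentinels:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(request)
                addr = parse_addr_reply(conn.recv(4096))
        except OSError:
            continue  # this Sentinel is unreachable, try the next one
        if addr:
            return addr
    return None


# Usage against the three Sentinels from this article (needs the cluster running):
#   sentinels = [("192.168.56.101", 26379), ("192.168.56.102", 26379), ("192.168.56.103", 26379)]
#   print(current_master(sentinels, "mymaster"))
```

Trying the Sentinels one after another means the lookup still works when the host that held the old master, and its Sentinel, are down.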