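There are many high-availability architectures for Redis today, such as keepalived+redis, redis cluster, twemproxy, and codis, each with its own strengths and weaknesses. This post sets those aside and focuses on the Redis Sentinel high-availability architecture.
Its main functions are:
- It periodically checks that redis is running as expected;
- If a redis node runs into trouble, it can notify another process (for example, its clients);
- It can fail over automatically. When a master node becomes unavailable, it promotes one of the master's slaves (if there is more than one) to be the new master, and the remaining slaves are reconfigured to follow the address of the promoted slave.
For more detailed configuration and background, I recommend reading the following articles. I won't repeat them here and will go straight to the setup: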
http://segmentfault.com/a/1190000002680804
http://segmentfault.com/a/1190000002685515
The Redis Sentinel architecture is shown in the figure below:
Redis Sentinel recommends using 3 or more nodes. The articles linked above explain why.
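One way to see why 3 or more Sentinels are recommended: a failover has to be authorized by a majority of the Sentinel processes, whatever quorum is configured. The sketch below is plain arithmetic, not Sentinel code, and the function names are my own. It shows how many Sentinels can be lost before a majority is no longer reachable.

```python
def majority(sentinels):
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels):
    """How many Sentinels can be lost while a majority is still reachable."""
    return sentinels - majority(sentinels)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print("sentinels=%d majority=%d tolerated_failures=%d"
              % (n, majority(n), tolerated_failures(n)))
```

With 2 Sentinels, losing either one already blocks failover, while 3 tolerate one loss and the 5 used in this post tolerate two.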
Environment:
Redis Sentinel, 5 servers:
10.36.30.203 10.36.30.204 10.37.124.202 10.37.124.203 10.37.124.204
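This is not wasteful. Using 5 nodes makes the monitoring of redis safer and more robust, and a Redis Sentinel group can be reused to monitor multiple Redis instances, so the servers are not wasted.
Redis, 2 servers, 1 master and 1 slave: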
10.69.25.173 master 10.69.30.170 slave
The configuration file on all 5 Sentinels is as follows:
    port 26379
    dir "/data/redis/sentinel/26379"
    daemonize yes
    logfile "/data/redis/sentinel/26379/sentinel.log"

    # 6379
    sentinel monitor master-6379 10.69.25.173 6379 3
    sentinel down-after-milliseconds master-6379 15000
    sentinel parallel-syncs master-6379 1
    sentinel failover-timeout master-6379 180000
    sentinel client-reconfig-script master-6379 /sh/redis/notify.py
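The line sentinel client-reconfig-script master-6379 /sh/redis/notify.py makes Sentinel run a script that sends an alert email after a master/slave switch. For the meaning of the other parameters, see the articles linked above. Create the referenced directories yourself.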
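Since one Sentinel group can watch several Redis instances, the per-master stanza simply repeats once per master. As a rough sketch, the helper below renders that stanza from the directives used in this post. The function name is my own, and the second master (master-6380 on port 6380) is a hypothetical example that is not part of this setup.

```python
def render_monitor_block(name, ip, port, quorum,
                         down_after_ms=15000, failover_timeout_ms=180000):
    """Render the sentinel.conf lines for one monitored master."""
    lines = [
        "sentinel monitor %s %s %d %d" % (name, ip, port, quorum),
        "sentinel down-after-milliseconds %s %d" % (name, down_after_ms),
        "sentinel parallel-syncs %s 1" % name,
        "sentinel failover-timeout %s %d" % (name, failover_timeout_ms),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    masters = [
        ("master-6379", "10.69.25.173", 6379),  # the master from this post
        ("master-6380", "10.69.25.173", 6380),  # hypothetical second instance
    ]
    for name, ip, port in masters:
        print(render_monitor_block(name, ip, port, quorum=3))
        print("")
```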
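The notify.py script is shown below. It must exist on all 5 servers, because you cannot know in advance which node will be elected leader (I have not yet seen anyone online cover sending alert emails on failover):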
    #!/usr/bin/python
    # coding: utf8
    """Sentinel client-reconfig-script: log the failover and send an alert email."""
    import sys
    import time
    import smtplib
    import logging
    from email.mime.text import MIMEText
    from email.header import Header

    alarm_mail = ['xxxxxx@163.com']

    # SMTP settings (placeholders, fill in your own)
    mail_host = 'xxxxx'
    mail_port = 25
    mail_user = 'xxxxxxx'
    mail_pass = 'xxxxxxxx'
    mail_send_from = 'xxxxxxx'


    def setup_logging():
        # Append everything to a log file and echo INFO and above to the console.
        logging.basicConfig(level=logging.DEBUG,
                            format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                            datefmt='%Y-%m-%d %H:%M:%S',
                            filename='/sh/redis/failover.log',
                            filemode='a')
        console = logging.StreamHandler()
        console.setLevel(logging.INFO)
        formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
        console.setFormatter(formatter)
        logging.getLogger('').addHandler(console)


    def send_mail(to_list, sub, content):
        me = mail_send_from
        msg = MIMEText(content, _subtype='html', _charset='utf-8')
        msg['Subject'] = Header(sub, 'utf-8')
        msg['From'] = Header(me, 'utf-8')
        msg['To'] = ", ".join(to_list)
        try:
            smtp = smtplib.SMTP()
            smtp.connect(mail_host, mail_port)
            smtp.login(mail_user, mail_pass)
            smtp.sendmail(me, to_list, msg.as_string())
            smtp.quit()
            return True
        except Exception as error:
            logging.error("Failed to send mail: %s" % (error))
            return False


    def main():
        setup_logging()
        failover_time = time.strftime("%Y-%m-%d %H:%M:%S")
        # Sentinel passes: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
        try:
            master_name = sys.argv[1]
            role = sys.argv[2]
            from_ip = sys.argv[4]
            from_port = sys.argv[5]
            to_ip = sys.argv[6]
            to_port = sys.argv[7]
        except Exception as error:
            logging.error('Failed to read arguments from Sentinel: %s ' % (error))
            sys.exit(1)

        sub = 'redis %s failover' % (master_name)
        notify_message = ("%s %s failover finished: sentinel found redis master %s:%s down, "
                          "failed over to slave %s:%s"
                          % (failover_time, master_name, from_ip, from_port, to_ip, to_port))
        # Only the Sentinel that led the failover sends the alert, so a single email goes out.
        if role == 'leader':
            logging.info(notify_message)
            send_mail(alarm_mail, sub, notify_message)


    if __name__ == "__main__":
        main()
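The script above depends on the positional arguments Sentinel passes to a client-reconfig-script: master name, role (leader or observer), state, then the old and new master address. As a sketch for checking that handling without triggering a real failover, the argument parsing can be pulled into a small function and fed a hand-written argument list. The names ReconfigEvent and parse_reconfig_args are my own, and the sample values are taken from this post's environment.

```python
from collections import namedtuple

# Field order follows the arguments Sentinel passes to the script.
ReconfigEvent = namedtuple(
    "ReconfigEvent",
    "master_name role state from_ip from_port to_ip to_port")


def parse_reconfig_args(argv):
    """Turn sys.argv-style input into a ReconfigEvent, or raise ValueError."""
    if len(argv) != 8:
        raise ValueError("expected 7 arguments, got %d" % (len(argv) - 1))
    name, role, state, from_ip, from_port, to_ip, to_port = argv[1:8]
    return ReconfigEvent(name, role, state,
                         from_ip, int(from_port), to_ip, int(to_port))


# Hand-written sample of what a leader Sentinel would pass.
sample = ["notify.py", "master-6379", "leader", "failover",
          "10.69.25.173", "6379", "10.69.30.170", "6379"]
event = parse_reconfig_args(sample)

if __name__ == "__main__":
    print(event)
```

Running notify.py by hand with the same seven arguments is a quick way to confirm that the mail settings work before relying on it.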
10.69.25.173 master
10.69.30.170 slave
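Install redis yourself and set up the replication relationship.
Now start Sentinel on each of the 5 Sentinel servers. There are two ways to start it; see the articles linked earlier for both.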
redis-sentinel sentinel.conf
After startup, check the log on any one of the servers. It shows the following:
    [18219] 12 Dec 09:56:47.161 # Sentinel runid is f3086fc39145cb3d832785899699050d2c7f3b08
    [18219] 12 Dec 09:56:47.161 # +monitor master master-6379 10.69.25.173 6379 quorum 1
    [18219] 12 Dec 09:56:47.183 * +slave slave 10.69.30.170:6379 10.69.30.170 6379 @ master-6379 10.69.25.173 6379
The +slave entry here means a slave was discovered.
Now look at the log on another sentinel server:
    [1480] 12 Dec 09:58:37.250 # Sentinel runid is 812f9f8b860dcc73d4b587e3bdf85df13808a3cd
    [1480] 12 Dec 09:58:37.250 # +monitor master master-6379 10.69.25.173 6379 quorum 1
    [1480] 12 Dec 09:58:38.252 * +slave slave 10.69.30.170:6379 10.69.30.170 6379 @ master-6379 10.69.25.173 6379
    [1480] 12 Dec 09:58:38.304 * +sentinel sentinel 10.36.30.204:26379 10.36.30.204 26379 @ master-6379 10.69.25.173 6379
    [1480] 12 Dec 09:58:38.388 * +sentinel sentinel 10.37.124.202:26379 10.37.124.202 26379 @ master-6379 10.69.25.173 6379
    [1480] 12 Dec 09:58:38.461 * +sentinel sentinel 10.37.124.203:26379 10.37.124.203 26379 @ master-6379 10.69.25.173 6379
    [1480] 12 Dec 09:58:39.423 * +sentinel sentinel 10.37.124.204:26379 10.37.124.204 26379 @ master-6379 10.69.25.173 6379
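The +sentinel entries mean that the other sentinel servers were discovered. The whole cluster is now up and working.
First, connect to a sentinel (any one will do) and check which server is currently the master: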
redis-cli -p 26379
    127.0.0.1:26379> info Sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    master0:name=master-6379,status=ok,address=10.69.25.173:6379,slaves=1,sentinels=5
    127.0.0.1:26379>
The current master is 10.69.25.173:6379. Now kill the redis process on that server and see whether a switch happens:
pkill -9 redis
Checking again shows that the former slave is now the master:

    127.0.0.1:26379> info Sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    master0:name=master-6379,status=ok,address=10.69.30.170:6379,slaves=1,sentinels=5
    127.0.0.1:26379>

An alert email is received as well, carrying the failover message built by notify.py.
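The master0 line of the INFO Sentinel output above is what a script needs in order to find the current master. As a minimal sketch (the function name parse_master_line is my own), it can be split into fields like this:

```python
def parse_master_line(line):
    """Split a 'master0:name=...,address=ip:port,...' line into its fields."""
    key, _, value = line.partition(":")
    fields = dict(item.split("=", 1) for item in value.split(","))
    # The address field is "ip:port"; expose both parts separately as well.
    host, _, port = fields["address"].rpartition(":")
    fields["host"] = host
    fields["port"] = int(port)
    return key, fields


line = ("master0:name=master-6379,status=ok,"
        "address=10.69.30.170:6379,slaves=1,sentinels=5")
key, master = parse_master_line(line)

if __name__ == "__main__":
    print(key, master["name"], master["host"], master["port"])
```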
Likewise, if the redis that was just killed is started again, Sentinel reconfigures it as a slave of 10.69.30.170.
    [1480] 12 Dec 10:01:48.921 # +new-epoch 1
    [1480] 12 Dec 10:01:48.933 # +vote-for-leader 92517289efcb4ae695eff3e064fde7f4e0e43a1f 1
    [1480] 12 Dec 10:01:48.955 # +sdown master master-6379 10.69.25.173 6379
    [1480] 12 Dec 10:01:48.955 # +odown master master-6379 10.69.25.173 6379 #quorum 1/1
    [1480] 12 Dec 10:01:48.955 # Next failover delay: I will not start a failover before Sat Dec 12 10:07:49 2015
    [1480] 12 Dec 10:01:50.067 # +config-update-from sentinel 10.37.124.203:26379 10.37.124.203 26379 @ master-6379 10.69.25.173 6379
    [1480] 12 Dec 10:01:50.067 # +switch-master master-6379 10.69.25.173 6379 10.69.30.170 6379
    [1480] 12 Dec 10:01:50.067 * +slave slave 10.69.25.173:6379 10.69.25.173 6379 @ master-6379 10.69.30.170 6379
    [1480] 12 Dec 10:02:05.109 # +sdown slave 10.69.25.173:6379 10.69.25.173 6379 @ master-6379 10.69.30.170 6379
    [1480] 12 Dec 10:03:19.241 # -sdown slave 10.69.25.173:6379 10.69.25.173 6379 @ master-6379 10.69.30.170 6379
    [1480] 12 Dec 10:03:29.219 * +convert-to-slave slave 10.69.25.173:6379 10.69.25.173 6379 @ master-6379 10.69.30.170 6379
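The +switch-master line in the log above lists the master name, the old address, and the new address. Sentinel also publishes the same event over Pub/Sub on a channel named +switch-master, so a client could listen for it instead of reading logs. The sketch below parses that payload. The watch_failovers function is only an illustration, assuming redis-py is installed, and is not run here; both function names are my own.

```python
from collections import namedtuple

SwitchMaster = namedtuple("SwitchMaster",
                          "master_name old_ip old_port new_ip new_port")


def parse_switch_master(payload):
    """Parse '<master name> <old ip> <old port> <new ip> <new port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError("unexpected +switch-master payload: %r" % payload)
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


def watch_failovers(sentinel_host, sentinel_port=26379):
    """Illustration only: yield a SwitchMaster for every failover announced."""
    import redis  # imported lazily so the parser works without redis-py
    client = redis.Redis(host=sentinel_host, port=sentinel_port,
                         decode_responses=True)
    pubsub = client.pubsub()
    pubsub.subscribe("+switch-master")
    for message in pubsub.listen():
        if message["type"] == "message":
            yield parse_switch_master(message["data"])


# Payload copied from the +switch-master log line in this post.
event = parse_switch_master("master-6379 10.69.25.173 6379 10.69.30.170 6379")

if __name__ == "__main__":
    print(event)
```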
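So how does a client learn that a master/slave switch has happened? In Java, the jedis client makes this convenient. In PHP or Python, we can do the check ourselves. Another option is DNS: update the DNS record after the switch.
Here is a simple daemon I wrote in Python (I don't know PHP, sadly).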
    #!/usr/bin/python
    from __future__ import print_function

    import binascii
    import os

    import redis

    sentinel_server = ['10.36.30.203:26379', '10.36.30.204:26379', '10.37.124.202:26379',
                       '10.37.124.203:26379', '10.37.124.204:26379']

    # Current master address, filled in by get_sentinel().
    master_host = None
    master_port = None


    def queue(host, port):
        # Push a random 32-character hex string onto the test queue.
        payload = binascii.hexlify(os.urandom(16)).decode('ascii')
        pool = redis.ConnectionPool(host=host, port=port, db=0)
        r = redis.Redis(connection_pool=pool)
        r.lpush('low_task_queue', payload)


    def get_sentinel():
        # Ask each Sentinel in turn for the current master; stop at the first that answers.
        global master_host
        global master_port
        for server in sentinel_server:
            host, port = server.split(':')
            try:
                r = redis.Redis(host=host, port=int(port))
                address = r.info('sentinel')['master0']['address'].split(':')
                master_host = address[0]
                master_port = int(address[1])
            except Exception as error:
                print('connect to sentinel error: %s' % (error))
            else:
                break


    if __name__ == "__main__":
        get_sentinel()
        while True:
            try:
                queue(master_host, master_port)
            except Exception as error:
                # The write failed, so the master may have changed: look it up again.
                print('connect to redis error: %s' % (error))
                get_sentinel()
                continue
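As an alternative to looking up the master by hand, redis-py ships a Sentinel helper class (redis.sentinel.Sentinel) that queries the Sentinels for you. The following is a sketch under the assumption that redis-py is installed; parse_endpoints and master_client are names of my own, and master_client is not called here because it needs a live Sentinel group.

```python
def parse_endpoints(endpoints):
    """Convert 'host:port' strings into (host, port) tuples."""
    result = []
    for endpoint in endpoints:
        host, _, port = endpoint.rpartition(":")
        result.append((host, int(port)))
    return result


def master_client(endpoints, master_name, timeout=0.5):
    """Return a client that asks the Sentinels where the current master is."""
    from redis.sentinel import Sentinel  # lazy import: requires redis-py
    sentinel = Sentinel(parse_endpoints(endpoints), socket_timeout=timeout)
    return sentinel.master_for(master_name, socket_timeout=timeout)


sentinel_server = ['10.36.30.203:26379', '10.36.30.204:26379',
                   '10.37.124.202:26379', '10.37.124.203:26379',
                   '10.37.124.204:26379']
endpoints = parse_endpoints(sentinel_server)

if __name__ == "__main__":
    print(endpoints)
    # Against a live setup one would then write, for example:
    # client = master_client(sentinel_server, 'master-6379')
    # client.lpush('low_task_queue', 'hello')
```

As far as I know, the client returned by master_for looks the master up again when it has to reconnect, which removes the manual retry loop, though timeouts and error handling still need tuning for real use.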
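If DNS is introduced, the architecture could look like the figure below:
That covers the basic tests. Further testing is left to the reader.
Summary:
Redis Sentinel is a fairly reliable way to achieve high availability, and I plan to use it in production later. Note that 3 or more Redis Sentinel nodes are recommended. It is more dependable than keepalived+redis, which also cannot manage multiple instances, and that is rather inconvenient.
References: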
http://segmentfault.com/a/1190000002680804
http://segmentfault.com/a/1190000002685515