Testing the failover
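Note: during this test, you should keep a tab open with the consistency test application running.
To trigger a failover, the simplest thing we can do is crash a single process, in our case a single master.
We can identify a master and crash it with the following commands: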
192.168.137.4:7004> CLUSTER nodes
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 master - 0 1583521243434 18 connected 0-5460
3c6510bd29af80703ae7c0be5a5884caaa60cd4e 192.168.137.2:7001@17001 slave a7287834bc7db37249614d23e06ed8f9a6c7b3d3 0 1583521243000 14 connected
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521243936 19 connected 5461-10922
1b83e27acd5235726aea44702526a8ca0ede9a48 192.168.137.2:7000@17000 slave 9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 0 1583521242520 18 connected
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521242000 14 connected 10923-16383
191d7306b81ffa85b5837898562eb6bf1479122c 192.168.137.3:7002@17002 slave bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 0 1583521243231 19 connected
192.168.137.4:7004>
192.168.137.4:7004> CLUSTER info
cluster_state:ok
node3:/root/cluster/7005#redis-cli -h 192.168.137.4 -p 7004 -c cluster nodes | grep master
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 master - 0 1583521306528 18 connected 0-5460
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521304504 19 connected 5461-10922
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521302000 14 connected 10923-16383
So 7003, 7004 and 7005 are the masters. Let's crash node 7003 with the DEBUG SEGFAULT command:
node3:/root/cluster/7005#redis-cli -h 192.168.137.3 -p 7003 "info" | grep role
role:master
node3:/root/cluster/7005#redis-cli -h 192.168.137.3 -p 7003 debug segfault
Error: Server closed the connection
node3:/root/cluster/7005#redis-cli -h 192.168.137.4 -p 7004 -c cluster nodes | grep master
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 master,fail - 1583521519531 1583521517516 18 disconnected 0-5460
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521525104 19 connected 5461-10922
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521524000 14 connected 10923-16383
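The `grep master` filtering above can also be done programmatically, which is handy when a script has to wait for a failover to complete. The following is a minimal sketch, assuming the CLUSTER NODES output has already been captured as text (for example from redis-cli). The names `Node`, `parse_cluster_nodes` and `failed_masters` are hypothetical helpers introduced for illustration, not part of Redis or any client library.

```python
from typing import NamedTuple


class Node(NamedTuple):
    node_id: str
    addr: str          # ip:port, without the @cluster-bus-port suffix
    flags: frozenset   # e.g. {"master", "fail"}
    master_id: str     # "-" for masters
    link_state: str    # "connected" or "disconnected"
    slots: tuple       # slot ranges served, empty for replicas


def parse_cluster_nodes(text):
    """Parse the text returned by CLUSTER NODES into Node records."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:  # skip blank or truncated lines
            continue
        nodes.append(Node(
            node_id=parts[0],
            addr=parts[1].split("@")[0],
            flags=frozenset(parts[2].split(",")),
            master_id=parts[3],
            link_state=parts[7],
            slots=tuple(parts[8:]),
        ))
    return nodes


def failed_masters(nodes):
    """Masters that the cluster has marked with the FAIL flag."""
    return [n for n in nodes if {"master", "fail"} <= n.flags]


# Sample input: the three master lines printed right after the crash.
SAMPLE = """\
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 master,fail - 1583521519531 1583521517516 18 disconnected 0-5460
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521525104 19 connected 5461-10922
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521524000 14 connected 10923-16383
"""

nodes = parse_cluster_nodes(SAMPLE)
print([n.addr for n in failed_masters(nodes)])
```

With the sample above, only 192.168.137.3:7003 is reported as a failed master. A node that is merely suspected of failing carries the flag `fail?` instead, which this check deliberately does not match.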
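In the output of the consistency test you can see that during the failover the system was unable to accept 578 reads and 577 writes; however, no inconsistency was created in the database.
This may sound unexpected, since in the first part of this tutorial we stated that Redis Cluster can lose writes during a failover because it uses asynchronous replication.
What we did not say is that this is unlikely to happen, because Redis sends the reply to the client and the commands to replicate to the slaves at about the same time, so the window in which data can be lost is very small. However, the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis Cluster.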
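To make the lost-write window concrete, here is a deliberately simplified toy model. It is not Redis code and does not reproduce Redis timing: `ToyMaster` and `run` are hypothetical names, and the replication step is modeled as an explicit `flush_to` call so that a crash can be placed exactly between the acknowledgement and the replication of the last write.

```python
class ToyMaster:
    """Toy master: acknowledges a write before the replica has received it."""

    def __init__(self):
        self.data = {}
        self.pending = []  # acknowledged writes not yet delivered to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # the client sees this reply immediately

    def flush_to(self, replica):
        """Deliver all pending writes to the replica (asynchronous step)."""
        for key, value in self.pending:
            replica[key] = value
        self.pending.clear()


def run(crash_before_last_flush):
    """Return the acknowledged keys that are missing after the replica is promoted."""
    master, replica, acknowledged = ToyMaster(), {}, []
    for i in range(3):
        key = f"key{i}"
        if master.write(key, i) == "OK":
            acknowledged.append(key)
        is_last = i == 2
        if not (is_last and crash_before_last_flush):
            master.flush_to(replica)
    # The master crashes here and the replica takes over with its own data.
    return [k for k in acknowledged if k not in replica]


print(run(crash_before_last_flush=False))  # nothing lost
print(run(crash_before_last_flush=True))   # the last acknowledged write is lost
```

In this model a write is lost only when the crash lands in the gap between the reply and the delivery to the replica. In a real cluster that gap is very short, which is consistent with the test above showing no inconsistency.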
We can now check the cluster setup after the failover
(note that in the meantime I restarted the crashed instance, so it rejoined the cluster as a slave):
node3:/root/cluster/7005#redis-cli -h 192.168.137.4 -p 7004 -c cluster nodes
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 slave 1b83e27acd5235726aea44702526a8ca0ede9a48 0 1583521963360 20 connected
3c6510bd29af80703ae7c0be5a5884caaa60cd4e 192.168.137.2:7001@17001 slave a7287834bc7db37249614d23e06ed8f9a6c7b3d3 0 1583521961858 14 connected
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521962351 19 connected 5461-10922
1b83e27acd5235726aea44702526a8ca0ede9a48 192.168.137.2:7000@17000 master - 0 1583521962000 20 connected 0-5460
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521962000 14 connected 10923-16383
191d7306b81ffa85b5837898562eb6bf1479122c 192.168.137.3:7002@17002 slave bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 0 1583521962353 19 connected
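Comparing the listing before the crash with the one after the failover shows which slot range changed owner. The sketch below does this comparison on the master lines copied from the two listings in this section. The functions `slot_owners` and `moved_slots` are hypothetical helpers written for this example, and the code assumes each master line lists its slot ranges from the ninth field onward, as in the output shown here.

```python
def slot_owners(cluster_nodes_text):
    """Map each slot range to the address of the master that serves it."""
    owners = {}
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 9:  # replicas and slotless nodes have no slot field
            continue
        if "master" in parts[2].split(","):
            addr = parts[1].split("@")[0]
            for slot_range in parts[8:]:
                owners[slot_range] = addr
    return owners


def moved_slots(before_text, after_text):
    """Return {slot_range: (old_owner, new_owner)} for ranges whose owner changed."""
    old, new = slot_owners(before_text), slot_owners(after_text)
    return {s: (old[s], new.get(s)) for s in old if new.get(s) != old[s]}


# Master lines from the listing taken before the crash.
BEFORE = """\
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 master - 0 1583521243434 18 connected 0-5460
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521243936 19 connected 5461-10922
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521242000 14 connected 10923-16383
"""

# Relevant lines from the listing taken after the failover.
AFTER = """\
9809b72ec290d73d99a3e1b0d12c4c7bf8583c45 192.168.137.3:7003@17003 slave 1b83e27acd5235726aea44702526a8ca0ede9a48 0 1583521963360 20 connected
bf0edaba80c4f31e9b56101572d2a5ccc8aa145c 192.168.137.4:7005@17005 master - 0 1583521962351 19 connected 5461-10922
1b83e27acd5235726aea44702526a8ca0ede9a48 192.168.137.2:7000@17000 master - 0 1583521962000 20 connected 0-5460
a7287834bc7db37249614d23e06ed8f9a6c7b3d3 192.168.137.4:7004@17004 myself,master - 0 1583521962000 14 connected 10923-16383
"""

print(moved_slots(BEFORE, AFTER))
```

For this data the only change is slot range 0-5460, which moved from 192.168.137.3:7003 to its former slave 192.168.137.2:7000. The restarted 7003 instance now appears as a slave of 7000, and the configuration epoch for that range went from 18 to 20.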