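The Sentinel mechanism
Detecting a failed master
quorum
The quorum is the minimum number of Sentinels that must agree that the master is unreachable before it is considered failed.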
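Performing a failover
One of the Sentinels must be elected leader of the failover and be authorized to carry it out, and this requires the agreement of more than half of the Sentinels.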
So for example if you have 5 Sentinel processes, and the quorum for a given master set to the value of 2, this is what happens:
- If two Sentinels agree at the same time about the master being unreachable, one of the two will try to start a failover.
- If there are at least a total of three Sentinels reachable, the failover will be authorized and will actually start.
In practical terms this means during failures Sentinel never starts a failover if the majority of Sentinel processes are unable to talk (aka no failover in the minority partition).
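The two-step rule in the example above can be sketched in a few lines of Python. This is a simplified model for illustration, not Sentinel's actual implementation: the function names are made up, and the model only captures the two counts described above (the quorum needed to detect the failure, and the majority of all Sentinels needed to authorize the failover).

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is more than half of the total."""
    return total_sentinels // 2 + 1


def can_trigger_failover(agreeing: int, quorum: int) -> bool:
    """Step 1 (hypothetical helper): enough Sentinels agree the master is unreachable."""
    return agreeing >= quorum


def is_failover_authorized(reachable: int, total_sentinels: int) -> bool:
    """Step 2 (hypothetical helper): a majority of all Sentinels can be reached."""
    return reachable >= majority(total_sentinels)


# The example from the text: 5 Sentinels, quorum = 2.
TOTAL, QUORUM = 5, 2
print(can_trigger_failover(agreeing=2, quorum=QUORUM))             # True
print(is_failover_authorized(reachable=3, total_sentinels=TOTAL))  # True
print(is_failover_authorized(reachable=2, total_sentinels=TOTAL))  # False
```

The last line is the minority-partition case: two Sentinels can still agree that the master is down, but they cannot obtain authorization, so no failover starts.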
Sentinel setup with one master and one replica
+----+       +----+
| M1 |-------| R1 |
| S1 |       | S2 |
+----+       +----+
Configuration: quorum = 1
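If M1 fails, R1 is promoted to master, because the two Sentinels can agree that M1 has failed and can also authorize the failover. On the surface this works, but consider the following case.
If the machine running M1 stops entirely, S1 stops working at the same time. S2 is then only one of two Sentinels, which is not a majority, so it cannot authorize a failover and the whole system becomes unavailable.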
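The two cases for this setup can be checked with a small Python sketch. It is a toy model under stated assumptions, not Sentinel code: `failover_outcome` is a hypothetical helper, and it assumes every surviving Sentinel sees the master as unreachable.

```python
def failover_outcome(total_sentinels: int, quorum: int, surviving_sentinels: int) -> str:
    """Hypothetical helper: describe what happens once the master is unreachable."""
    majority = total_sentinels // 2 + 1  # more than half of all configured Sentinels
    if surviving_sentinels < quorum:
        return "failure not detected"
    if surviving_sentinels < majority:
        return "failure detected, but failover not authorized"
    return "failover proceeds"


# Only the M1 process dies: S1 and S2 are both still running.
print(failover_outcome(total_sentinels=2, quorum=1, surviving_sentinels=2))
# -> failover proceeds

# The whole M1 machine stops: S1 goes down with it, only S2 is left.
print(failover_outcome(total_sentinels=2, quorum=1, surviving_sentinels=1))
# -> failure detected, but failover not authorized
```

With quorum = 1 the lone Sentinel S2 is enough to detect the failure, but the majority of two Sentinels is two, so authorization is impossible.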
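Note that a failover requires the agreement of more than half of the Sentinels. In the setup above, it would be very dangerous if one side performed a failover on its own, without authorization: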
+----+           +------+
| M1 |----//-----| [M1] |
| S1 |           | S2   |
+----+           +------+
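The configuration above ends up with two masters in a perfectly symmetrical way (assuming S2 could fail over without authorization).
Clients may write to both sides indefinitely, and when the network partition heals there is no way to tell which side's configuration is the correct one.
So always deploy at least three Sentinels, on three different machines.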
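A short Python sketch shows why three is the smallest useful number. It only does the majority arithmetic; `tolerated_sentinel_failures` is a hypothetical helper, and it assumes authorization needs more than half of all configured Sentinels, as described above.

```python
def tolerated_sentinel_failures(total_sentinels: int) -> int:
    """Hypothetical helper: how many Sentinels can be lost while a majority remains."""
    majority = total_sentinels // 2 + 1
    return total_sentinels - majority


for n in (1, 2, 3, 5):
    print(f"{n} Sentinel(s): can lose {tolerated_sentinel_failures(n)}")
# 1 Sentinel(s): can lose 0
# 2 Sentinel(s): can lose 0
# 3 Sentinel(s): can lose 1
# 5 Sentinel(s): can lose 2
```

Two Sentinels tolerate no more failures than one, while three tolerate the loss of one machine, provided each Sentinel runs on its own machine.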