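  • A brief introduction to how Redis Sentinel works

    Some notes on my understanding of Redis Sentinel.

    Any middleware deployed as a single node carries the risk of a single point of failure, so the first architecture that comes to mind is master-replica.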

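    Redis master-replica architecture

    How Redis master-replica replication works

    Replication comes in two forms, full synchronization and partial synchronization:

    1. Full synchronization: when a replica connects to the master, it sends a SYNC command (PSYNC in newer versions). The master runs BGSAVE to produce an RDB file and, at the same time, starts buffering the write commands it receives. Once the RDB file is ready, the master sends it to the replica, which saves it to local disk and then loads it into memory. The master then sends the buffered write commands to the replica, streamed in the same format as the Redis protocol itself.

    2. Partial synchronization: a replica can send PSYNC master_run_id offset to request a partial resync. The master and its replicas each track the replication offset. If the data from the requested offset onward is still available on the master (in its replication backlog), the master sends just that part to the replica; if it is no longer available, a full synchronization is performed instead.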
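
    To make the full-versus-partial decision concrete, here is a minimal Python sketch. It is only my simplified model, not Redis's actual code: the MasterState class, its field names, and choose_sync are all made up for illustration. The backlog numbers in the usage lines are borrowed from the INFO output later in this post, and "?" with offset -1 is what a brand-new replica sends.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical, simplified view of the master's replication state."""
    run_id: str         # ID the replica must quote back in PSYNC
    backlog_first: int  # oldest offset still held in the backlog
    backlog_len: int    # number of bytes currently held in the backlog


def choose_sync(master: MasterState, replica_run_id: str, replica_offset: int) -> str:
    """Return "partial" if the request can be served from the backlog, else "full"."""
    if replica_run_id != master.run_id:
        return "full"  # the replica was following a different master or history
    backlog_end = master.backlog_first + master.backlog_len
    if master.backlog_first <= replica_offset <= backlog_end:
        return "partial"  # everything from replica_offset onward is still buffered
    return "full"  # the requested data is no longer in the backlog


master = MasterState(
    run_id="46ef90de89e6771b67bc2b43371da2f97a03b4d1",
    backlog_first=631,
    backlog_len=78924,
)
print(choose_sync(master, master.run_id, 70000))  # partial
print(choose_sync(master, "?", -1))               # full
```

    The second call mirrors the replica log below, where "Partial resynchronization not possible (no cached master)" is followed by a full resync.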

    A replica requesting a full synchronization from the master:
    The master's log:

    38233:M 01 Sep 2020 16:55:02.260 * Replica 127.0.0.1:6380 asks for synchronization
    38233:M 01 Sep 2020 16:55:02.260 * Full resync requested by replica 127.0.0.1:6380
    38233:M 01 Sep 2020 16:55:02.260 * Starting BGSAVE for SYNC with target: disk
    38233:M 01 Sep 2020 16:55:02.261 * Background saving started by pid 38299
    38299:C 01 Sep 2020 16:55:02.323 * DB saved on disk
    38299:C 01 Sep 2020 16:55:02.323 * RDB: 4 MB of memory used by copy-on-write
    38233:M 01 Sep 2020 16:55:02.378 * Background saving terminated with success
    38233:M 01 Sep 2020 16:55:02.378 * Synchronization with replica 127.0.0.1:6380 succeeded

    The log of the replica on port 6380:

    $ tail -f 6380.log
    38295:S 01 Sep 2020 16:55:02.259 * Connecting to MASTER 127.0.0.1:6379
    38295:S 01 Sep 2020 16:55:02.259 * MASTER <-> REPLICA sync started
    38295:S 01 Sep 2020 16:55:02.259 * Non blocking connect for SYNC fired the event.
    38295:S 01 Sep 2020 16:55:02.260 * Master replied to PING, replication can continue...
    38295:S 01 Sep 2020 16:55:02.260 * Partial resynchronization not possible (no cached master)
    38295:S 01 Sep 2020 16:55:02.262 * Full resync from master: 46ef90de89e6771b67bc2b43371da2f97a03b4d1:0
    38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: receiving 175 bytes from master
    38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Flushing old data
    38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Loading DB in memory
    38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Finished with success

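    Sentinel architecture

    That takes care of the single point of failure, but switching over from the master to a replica still has to be done by hand. Can the switchover be automated? Of course it can!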

    Master: 6379
    Slaves: 6380, 6381
    Sentinels: 26379, 26380, 26381

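    How Sentinel works

    1. Automatic discovery among Sentinels

    1. Every 2 seconds, each Sentinel publishes a message to the Pub/Sub channel __sentinel__:hello on the master and replicas it monitors.
    2. Each Sentinel also subscribes to the __sentinel__:hello channel on the master and replicas, which is how it automatically discovers the other Sentinels.

    What the Sentinels publish to the __sentinel__:hello channel: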

    127.0.0.1:6381> PSUBSCRIBE *
    Reading messages... (press Ctrl-C to quit)
    1) "psubscribe"
    2) "*"
    3) (integer) 1
    1) "pmessage"
    2) "*"
    3) "__sentinel__:hello"
    4) "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
    1) "pmessage"
    2) "*"
    3) "__sentinel__:hello"
    4) "127.0.0.1,26379,b60bd3e15db23a9862d213e7703001c72d48dc73,1,mymaster,127.0.0.1,6381,1"
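
    Each hello payload is a comma-separated record with eight fields. The sketch below splits one into named fields. The field names are my own labels, based on my reading of how Sentinel composes the message (announcing Sentinel's address and run ID, its current epoch, then the master's name, address and config epoch); treat them as an assumption, not an official schema.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    """Hypothetical field names for one __sentinel__:hello payload."""
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a hello payload into its eight comma-separated fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, run_id, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), run_id, int(epoch), name, m_ip, int(m_port), int(m_epoch))


hello = parse_hello(
    "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
)
print(hello.sentinel_port, hello.master_name, hello.master_port)  # 26380 mymaster 6381
```

    A Sentinel that receives such a message from an address it has not seen before can add that address to its list of known Sentinels for mymaster.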

    Pub/Sub events on a Sentinel node itself, showing that it has automatically discovered another Sentinel:

    $ src/redis-cli -p 26379
    127.0.0.1:26379> PSUBSCRIBE *
    Reading messages... (press Ctrl-C to quit)
    1) "psubscribe"
    2) "*"
    3) (integer) 1
    1) "pmessage"
    2) "*"
    3) "+sentinel"
    4) "sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379"

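    2. How the replicas (slaves) are discovered

    A Sentinel learns which replicas exist from the master: it sends the INFO command to the master and reads the list of attached replicas from the reply.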
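
    As a rough illustration of that step, the sketch below pulls the replica addresses out of the text of an INFO replication reply. The function name and the regular expression are mine; the sample line is the slave0 entry from the INFO output shown later in this post. It only handles the ip= and port= fields and is not how Sentinel itself parses the reply.

```python
import re
from typing import List, Tuple

# Lines such as "slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0"
SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def replicas_from_info(info_text: str) -> List[Tuple[str, int]]:
    """Return (ip, port) for every slaveN line in an INFO replication reply."""
    replicas = []
    for line in info_text.splitlines():
        match = SLAVE_LINE.match(line.strip())
        if not match:
            continue
        # Turn "ip=...,port=...,state=..." into a dict of field -> value.
        fields = dict(pair.split("=", 1) for pair in match.group(1).split(","))
        replicas.append((fields["ip"], int(fields["port"])))
    return replicas


sample = """# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0
"""
print(replicas_from_info(sample))  # [('127.0.0.1', 6380)]
```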

    3. Walking through an automatic failover

    3.1. The master goes down

    Kill the master process by hand.

    3.2. Sentinel detects that the master is down

    1 Check the Sentinel log:

    $ tail -f 26379.log
    (the master on port 6379 was shut down by hand)
    38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2

    2 Check the Pub/Sub channels among the Sentinels:

    3) "+sdown"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+odown"
    4) "master mymaster 127.0.0.1 6379 #quorum 2/2"
    1) "pmessage"
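
    The step from +sdown (one Sentinel's own, subjective view) to +odown (enough Sentinels agree) boils down to a count against the quorum. Here is a minimal sketch under that reading. The function and its argument are hypothetical, and real Sentinels gather these opinions by asking each other over the network rather than from a ready-made dict.

```python
from typing import Dict


def is_objectively_down(sdown_reports: Dict[str, bool], quorum: int) -> bool:
    """sdown_reports maps a Sentinel ID to whether that Sentinel sees the master as down."""
    agreeing = sum(1 for down in sdown_reports.values() if down)
    return agreeing >= quorum


# Two Sentinels agree and the configured quorum is 2, as in "#quorum 2/2" above.
reports = {
    "b60bd3e15db23a9862d213e7703001c72d48dc73": True,
    "fc976b271914f43a4a318dfe8c1f41a2e747f8d8": True,
}
print(is_objectively_down(reports, quorum=2))  # True
```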

    3.3. Sentinel leader election

    One of the three Sentinels has to be chosen to carry out this automatic failover, so the Sentinels first hold a vote.
    1 Check the Sentinel log:

    $ tail -f 26379.log
    38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
    38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this Sentinel votes for Sentinel b60bd, which is itself, in epoch 1)
    38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (Sentinel fc976b casts its vote for Sentinel b60bd; the trailing 1 is the epoch)
    38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
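
    The vote counting can be sketched as follows. This is my simplified reading of the rule, namely that the winner needs votes from a majority of the known Sentinels and at least the configured quorum; the real election also involves epochs and timeouts that are left out here. The function name and the votes dict are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int, quorum: int) -> Optional[str]:
    """votes maps voter Sentinel ID -> the Sentinel ID it voted for in this epoch."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    needed = max(total_sentinels // 2 + 1, quorum)  # majority, and at least the quorum
    return candidate if count >= needed else None


B60BD = "b60bd3e15db23a9862d213e7703001c72d48dc73"
FC976 = "fc976b271914f43a4a318dfe8c1f41a2e747f8d8"

# The two votes seen in the log above, with 3 Sentinels deployed and quorum 2.
winner = elect_leader({B60BD: B60BD, FC976: B60BD}, total_sentinels=3, quorum=2)
print(winner == B60BD)  # True
```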

    3.4. Choosing a suitable replica as the new master
    1. Check the Sentinel log:

    $ tail -f 26379.log
    38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected as the new master)
    38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is told to become the new master)
    38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
    38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
    38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

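    2. How the replica is selected, in order of precedence (Sentinel's documentation also lists replica-priority, which is compared before the offset):
       1. The replica must be alive
       2. The largest replication offset wins
       3. On a tie, the smallest run ID wins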
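
    The three rules listed here translate naturally into a filter plus a sort key. The sketch below is only an illustration of that ordering: the Replica class, its fields, and the run IDs and offsets in the example are all made up, and replica-priority, which real Sentinel also consults, is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical summary of what a Sentinel knows about one replica."""
    addr: str
    run_id: str
    repl_offset: int
    alive: bool


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick an alive replica with the largest offset; break ties by smallest run ID."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        return None
    # Negating the offset makes min() prefer the largest offset first.
    return min(candidates, key=lambda r: (-r.repl_offset, r.run_id))


replicas = [
    Replica("127.0.0.1:6380", run_id="bbbb", repl_offset=65000, alive=True),
    Replica("127.0.0.1:6381", run_id="aaaa", repl_offset=65000, alive=True),
    Replica("127.0.0.1:6382", run_id="0000", repl_offset=99999, alive=False),
]
print(select_replica(replicas).addr)  # 127.0.0.1:6381
```

    In the example, the replica with the highest offset is skipped because it is not alive, and the remaining two tie on offset, so the smaller run ID decides.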
    3. 6381 has been promoted to master:

    $ src/redis-cli -p 6381
    127.0.0.1:6381> info Replication
    # Replication
    role:master
    connected_slaves:1
    slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0
    master_replid:d349582dc829f56d1da32e2d2f1434c6f2c44802
    master_replid2:46ef90de89e6771b67bc2b43371da2f97a03b4d1
    master_repl_offset:79554
    second_repl_offset:65161
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:631
    repl_backlog_histlen:78924
    127.0.0.1:6381>

    3.5. The complete logs for the steps above:
    1. Pub/Sub messages among the Sentinels:

    $ src/redis-cli -p 26379
    127.0.0.1:26379> PSUBSCRIBE *
    Reading messages... (press Ctrl-C to quit)
    1) "psubscribe"
    2) "*"
    3) (integer) 1
    1) "pmessage"
    2) "*"
    3) "+sdown"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+odown"
    4) "master mymaster 127.0.0.1 6379 #quorum 2/2"
    1) "pmessage"
    2) "*"
    3) "+new-epoch"
    4) "1"
    1) "pmessage"
    2) "*"
    3) "+try-failover"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+vote-for-leader"
    4) "b60bd3e15db23a9862d213e7703001c72d48dc73 1"
    1) "pmessage"
    2) "*"
    3) "+elected-leader"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+failover-state-select-slave"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+selected-slave"
    4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+failover-state-send-slaveof-noone"
    4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+failover-state-wait-promotion"
    4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "-role-change"
    4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 new reported role is master"
    1) "pmessage"
    2) "*"
    3) "+promoted-slave"
    4) "slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+failover-state-reconf-slaves"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+slave-reconf-sent"
    4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "-odown"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+slave-reconf-inprog"
    4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+slave-reconf-done"
    4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+failover-end"
    4) "master mymaster 127.0.0.1 6379"
    1) "pmessage"
    2) "*"
    3) "+switch-master"
    4) "mymaster 127.0.0.1 6379 127.0.0.1 6381"
    1) "pmessage"
    2) "*"
    3) "+slave"
    4) "slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381"
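
    Of all these events, +switch-master is the one a client usually cares about, because its payload carries the master name followed by the old and the new address. Below is a small sketch of reading it; the function name and the returned tuple layout are my own choices, and a real client would more likely rely on its Redis library's Sentinel support than parse events by hand.

```python
from typing import Tuple

Address = Tuple[str, int]


def parse_switch_master(payload: str) -> Tuple[str, Address, Address]:
    """Split a +switch-master payload into (master name, old address, new address)."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


name, old_addr, new_addr = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6381")
print(name, old_addr, new_addr)  # mymaster ('127.0.0.1', 6379) ('127.0.0.1', 6381)
```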

    2. The complete Sentinel log:

    $ tail -f 26379.log
    38501:X 01 Sep 2020 17:15:48.851 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    38501:X 01 Sep 2020 17:15:48.851 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=38501, just started
    38501:X 01 Sep 2020 17:15:48.851 # Configuration loaded
    38502:X 01 Sep 2020 17:15:48.854 * Running mode=sentinel, port=26379.
    38502:X 01 Sep 2020 17:15:48.855 # Sentinel ID is b60bd3e15db23a9862d213e7703001c72d48dc73
    38502:X 01 Sep 2020 17:15:48.855 # +monitor master mymaster 127.0.0.1 6379 quorum 2
    38502:X 01 Sep 2020 17:15:48.855 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (a replica is discovered)
    38502:X 01 Sep 2020 17:17:59.296 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 (a replica is discovered)
    38502:X 01 Sep 2020 17:20:31.119 * +sentinel sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 (another Sentinel is discovered)
    38502:X 01 Sep 2020 17:22:42.715 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:23:42.558 * +reboot slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:23:42.659 # -sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

    .............
    ............

    (the master on port 6379 was shut down by hand)
    38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
    38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
    38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this Sentinel votes for Sentinel b60bd, which is itself, in epoch 1)
    38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (Sentinel fc976b casts its vote for Sentinel b60bd; the trailing 1 is the epoch)
    38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected as the new master)
    38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is told to become the new master)
    38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
    38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
    38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
    38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
    38502:X 01 Sep 2020 17:27:10.258 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

  • Original article: https://www.cnblogs.com/yangweiqiang/p/13627041.html