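  • How to rescue a PXC cluster node that was killed with kill -9

    When a node is shut down or crashes and you want it to rejoin the cluster via IST (Incremental State Transfer) after a restart, the cluster must contain a node whose gcache still holds all the transactions generated while the node was down.

    Re-initializing the node and rejoining from scratch should be the last resort.

    The steps below simulate a node that crashed unexpectedly or was killed with kill -9.

    1. Check the state in grastate.dat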

    cat /data/database/pxc3306/grastate.dat

    # GALERA saved state
    version: 2.1
    uuid:    b1ac3465-05a1-11e8-9776-77ea0d01e209
    seqno:   -1
    safe_to_bootstrap: 0
    

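    Explanation:

    • uuid: the wsrep_cluster_state_uuid of the cluster this node belongs to.
    • seqno: the sequence number of the last transaction committed on this node (its wsrep_last_committed), which is also the starting point for the next IST. The value is -1 while the node is running or after an unclean shutdown.
    • safe_to_bootstrap: whether it is safe to bootstrap the cluster from this node. It is set to 1 only on the last node to leave the cluster in a clean shutdown, and 0 otherwise.

    Normally, a restarting node starts IST from the seqno in grastate.dat. After an unclean shutdown, or when startup fails for some reason, the value is -1 and the position has to be recovered with the help of the wsrep_recovery log.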
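
    The check described above can be scripted. The following is only a minimal sketch: the function name grastate_seqno and the temporary sample file are made up for illustration, and on a real node the function would be pointed at /data/database/pxc3306/grastate.dat instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the seqno recorded in a grastate.dat file.
grastate_seqno() {
    awk '$1 == "seqno:" { print $2 }' "$1"
}

# Sample file with the same contents as the grastate.dat shown in step 1.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
    '# GALERA saved state' \
    'version: 2.1' \
    'uuid:    b1ac3465-05a1-11e8-9776-77ea0d01e209' \
    'seqno:   -1' \
    'safe_to_bootstrap: 0' > "$sample"

if [ "$(grastate_seqno "$sample")" = "-1" ]; then
    echo "seqno is -1: recover the position from the wsrep_recovery log"
else
    echo "seqno is $(grastate_seqno "$sample"): IST can start from here"
fi
```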

    2. wsrep_recovery

    The node fails to start:

    /usr/bin/mysqld_safe --defaults-file=/etc/pxc3306.cnf &

    2018-04-03T08:18:47.310804Z mysqld_safe Logging to '/data/database/pxc3306/pxc3306.log'.
    2018-04-03T08:18:47.315813Z mysqld_safe Logging to '/data/database/pxc3306/pxc3306.log'.
    2018-04-03T08:18:47.371770Z mysqld_safe Starting mysqld daemon with databases from /data/database/pxc3306
    2018-04-03T08:18:47.404646Z mysqld_safe WSREP: Running position recovery with --log_error='/data/database/pxc3306/wsrep_recovery.5Dl9fN' --pid-file='/data/database/pxc3306/node-1-recover.pid'
    

    The output shows a log file named wsrep_recovery.xxxx, which contains the Recovered position.

    Alternatively, run the recovery explicitly:

    /usr/bin/mysqld_safe --defaults-file=/etc/pxc3306.cnf --wsrep-recover &

    3. Find the Recovered position

    grep Recovered /data/database/pxc3306/wsrep_recovery.5Dl9fN

    2018-04-03T08:18:54.534461Z 0 [Note] WSREP: Recovered position: bf26341f-43cb-11e8-a863-62c0eb4d9e79:634
    

    The uuid is bf26341f-43cb-11e8-a863-62c0eb4d9e79 and the seqno is 634.
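
    Splitting the recovered position into its two parts can also be scripted. This is a sketch under assumptions: recovered_position is a hypothetical helper name, and it relies on the position being the last whitespace-separated field of the log line, as in the line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "uuid:seqno" pair from "Recovered position"
# log lines. Reads the files given as arguments, or stdin if there are none.
recovered_position() {
    awk '/WSREP: Recovered position:/ { print $NF }' "$@"
}

# Log line copied from the wsrep_recovery log above.
line='2018-04-03T08:18:54.534461Z 0 [Note] WSREP: Recovered position: bf26341f-43cb-11e8-a863-62c0eb4d9e79:634'

position="$(printf '%s\n' "$line" | recovered_position)"
uuid="${position%%:*}"    # text before the first colon
seqno="${position##*:}"   # text after the last colon

echo "uuid=$uuid"
echo "seqno=$seqno"
```

    On a real node the helper could be given the log file directly, for example recovered_position /data/database/pxc3306/wsrep_recovery.5Dl9fN.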

    4. Edit grastate.dat

    Write the recovered uuid and seqno back into the file:

    vim /data/database/pxc3306/grastate.dat

    # GALERA saved state
    version: 2.1
    uuid:    bf26341f-43cb-11e8-a863-62c0eb4d9e79
    seqno:   634
    safe_to_bootstrap: 0
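
    Editing the file by hand works fine; as an alternative, the same change can be made with sed. The sketch below is an illustration only: set_grastate_position is a hypothetical helper, and it runs against a temporary copy rather than the real /data/database/pxc3306/grastate.dat. It writes through the existing file instead of replacing it, so the file's owner and permissions stay as they are.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the uuid and seqno lines of a grastate.dat file.
# Usage: set_grastate_position FILE UUID SEQNO
set_grastate_position() {
    local file="$1" uuid="$2" seqno="$3" tmp rc
    tmp="$(mktemp)" || return 1
    # Rewrite into a scratch file first, then copy the result back in place.
    sed -e "s/^uuid:.*/uuid:    $uuid/" \
        -e "s/^seqno:.*/seqno:   $seqno/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Temporary copy that mirrors the file after an unclean shutdown.
state="$(mktemp)"
trap 'rm -f "$state"' EXIT
printf '%s\n' \
    '# GALERA saved state' \
    'version: 2.1' \
    'uuid:    b1ac3465-05a1-11e8-9776-77ea0d01e209' \
    'seqno:   -1' \
    'safe_to_bootstrap: 0' > "$state"

set_grastate_position "$state" bf26341f-43cb-11e8-a863-62c0eb4d9e79 634
cat "$state"
```

    Keeping a backup copy of the real grastate.dat before changing it is a sensible precaution.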
    

    5. Find a suitable donor node

    This step is important: it determines whether the crashed node can rejoin the cluster via IST.

    Run on node 1:

    (root@localhost) [mydb_1]>  show status  like 'wsrep_local_cached_downto';
    +---------------------------+-------+
    | Variable_name             | Value |
    +---------------------------+-------+
    | wsrep_local_cached_downto | 625   |
    +---------------------------+-------+
    1 row in set (0.00 sec)
    

    Run on node 2:

    (root@localhost) [(none)]> show status  like 'wsrep_local_cached_downto';
    +---------------------------+-------+
    | Variable_name             | Value |
    +---------------------------+-------+
    | wsrep_local_cached_downto | 678   |
    +---------------------------+-------+
    1 row in set (0.01 sec)
    

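    Node 1's gcache reaches back to seqno 625, which is lower than the 634 the crashed node needs. Its gcache therefore still holds every transaction the cluster produced while the node was down, so node 1 can serve as the donor. Node 2's gcache only reaches back to 678, so it cannot.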
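
    The comparison can be written down as a small check. This is a sketch, not an official rule: can_serve_ist is a hypothetical helper, and it uses the same conservative comparison as the text (the donor's wsrep_local_cached_downto must not be greater than the joiner's recovered seqno).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a donor whose gcache reaches back to
# CACHED_DOWNTO can serve IST to a joiner whose recovered seqno is JOINER_SEQNO.
# Usage: can_serve_ist CACHED_DOWNTO JOINER_SEQNO
can_serve_ist() {
    [ "$1" -le "$2" ]
}

joiner_seqno=634
# wsrep_local_cached_downto values taken from the two outputs above.
for candidate in node1:625 node2:678; do
    name="${candidate%%:*}"
    downto="${candidate##*:}"
    if can_serve_ist "$downto" "$joiner_seqno"; then
        echo "$name: IST possible (gcache reaches back to $downto)"
    else
        echo "$name: IST not possible (gcache starts at $downto)"
    fi
done
```

    Because the gcache is a fixed-size buffer, wsrep_local_cached_downto keeps advancing on a busy cluster, so the value is best checked right before restarting the node.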

    6. Restart the node

    Specify node 1 as the donor:

    /usr/bin/mysqld_safe --defaults-file=/etc/pxc3306.cnf --wsrep_sst_donor=pxc-node-0 &

    2018-04-03T08:26:05.689594Z mysqld_safe Logging to '/data/database/pxc3306/pxc3306.log'.
    2018-04-03T08:26:05.694919Z mysqld_safe Logging to '/data/database/pxc3306/pxc3306.log'.
    2018-04-03T08:26:05.741478Z mysqld_safe Starting mysqld daemon with databases from /data/database/pxc3306
    2018-04-03T08:26:05.757902Z mysqld_safe Skipping wsrep-recover for bf26341f-43cb-11e8-a863-62c0eb4d9e79:634 pair
    2018-04-03T08:26:05.760692Z mysqld_safe Assigning bf26341f-43cb-11e8-a863-62c0eb4d9e79:634 to wsrep_start_position
    

    The node comes back up. Check the ports; everything looks fine:

    netstat -ltnpa | grep mysqld

    tcp        0      0 0.0.0.0:4567                0.0.0.0:*                   LISTEN      4473/mysqld         
    tcp        0      0 30.0.0.199:33819            30.0.0.198:4567             ESTABLISHED 4473/mysqld         
    tcp    73021      0 30.0.0.199:4568             30.0.0.198:56028            ESTABLISHED 4473/mysqld         
    tcp        0      0 30.0.0.199:54614            30.0.0.196:4567             ESTABLISHED 4473/mysqld         
    tcp        0      0 :::3306                     :::*                        LISTEN      4473/mysqld  
    

    Port 4568, which is dedicated to IST, is in use.

    7. Check the status

    Log in to the node and check its status:

    mysql> show status like 'wsrep_cluster_status';
    +----------------------+---------+
    | Variable_name        | Value   |
    +----------------------+---------+
    | wsrep_cluster_status | Primary |
    +----------------------+---------+
    1 row in set (0.01 sec)
    

    Running an arbitrary SELECT fails with the following error:

    ERROR 1047 (08S01): WSREP has not yet prepared node for application use
    

    This is because the data has not finished synchronizing yet:

    mysql>  show status like 'wsrep_last_committed'; 
    +----------------------+-------+
    | Variable_name        | Value |
    +----------------------+-------+
    | wsrep_last_committed | 655   |
    +----------------------+-------+
    1 row in set (0.00 sec)
    

    Once wsrep_last_committed matches the value on the other nodes, the node has fully recovered.
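
    Comparing the two values can be scripted as well. The sketch below makes several assumptions: status_value is a hypothetical helper, the input mimics the tab-separated output of mysql -N -B -e "SHOW STATUS LIKE 'wsrep_last_committed'", and the peer value 677 is chosen purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>value" lines, as printed by
# mysql -N -B -e "SHOW STATUS LIKE '...'", and print the value column.
status_value() {
    awk -F '\t' '{ print $2 }'
}

# Sample inputs: 655 is the value shown above, 677 is an assumed peer value.
joiner="$(printf 'wsrep_last_committed\t655\n' | status_value)"
peer="$(printf 'wsrep_last_committed\t677\n' | status_value)"

if [ "$joiner" -ge "$peer" ]; then
    echo "caught up at seqno $joiner"
else
    echo "still syncing: $joiner of $peer"
fi
```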

    The log on node 1 shows the transfer:

    2018-04-20T07:57:24.655491Z 0 [Note] WSREP: Member 0.0 (pxc-node-2) requested state transfer from '*any*'. Selected 1.0 (pxc-node-0)(SYNCED) as donor.
    2018-04-20T07:57:24.655541Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 677)
    2018-04-20T07:57:24.655702Z 13 [Note] WSREP: IST request: bf26341f-43cb-11e8-a863-62c0eb4d9e79:634-677|tcp://30.0.0.226:4568
    2018-04-20T07:57:24.655758Z 13 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2018-04-20T07:57:24.657885Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '30.0.0.226:4444/xtrabackup_sst//1' --socket '/tmp/pxc3306.sock' --datadir '/data/database/pxc3306/' --defaults-file '/etc/pxc3306.cnf' --defaults-group-suffix ''  --binlog 'host-30-0-0-225-bin' --gtid 'bf26341f-43cb-11e8-a863-62c0eb4d9e79:654' --bypass)
    2018-04-20T07:57:24.665844Z 13 [Note] WSREP: DONOR thread signaled with 0
    2018-04-20T07:57:24.696526Z 0 [Note] WSREP: async IST sender starting to serve tcp://30.0.0.226:4568 sending 655-677
    2018-04-20T07:57:25.632154Z WSREP_SST: [INFO] Bypassing SST. Can work it through IST
    2018-04-20T07:57:25.675074Z 0 [Note] WSREP: 1.0 (pxc-node-0): State transfer to 0.0 (pxc-node-1) complete.
    
  • Original article: https://www.cnblogs.com/wshenjin/p/8709311.html