zoukankan      html  css  js  c++  java
  • mysql group replication 主节点宕机恢复

    一、mysql group replication 生来就要面对两个问题:

      一、主节点宕机如何恢复。

      二、多数节点离线的情况下、余下节点如何继续承载业务。

      在这里我们只讨论第一个问题、也就是说当主结点宕机之后、我们怎么把它从新加入到高可用集群中去。这个问题又可以细分成

      两种情况:

        1、温和打击:主结点的数据还在、宕机期间集群中的其它结点的binlog日志也都还在

              这种情况下重新启动mysql group replication 就可修复问题。

        2、毁灭打击:主结点的数据都没有了

              这种情况下要从其余结点备份恢复宕机结点、然后再重启mysql group replication 就可修复问题。

      详细的修复步骤请看后面的例子

    二、环境介绍:

      环境简介

    主机名         ip地址        mgr角色

    mtls17        10.186.19.17      primary    

    mtls18        10.186.19.18      seconde

    mtls19        10.186.19.19      seconde

      集群状态:

    mysql> select * from replication_group_members;
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
    | group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
    | group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    3 rows in set (0.00 sec)
    
    mysql> show global status like 'group_replication_primary_member';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
    +----------------------------------+--------------------------------------+
    1 row in set (0.00 sec)

      说明:

      由上面的信息可以看出mtls17上的mysql为集群当前的primary结点、并且集群的各结点的状态正常。

    三、情况下的故障模拟 + 解决:

      1、模拟mtls17结点宕机

    ps -ef | grep mysql
    mysql    24125     1  0 00:04 ?        00:00:14 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
    root     26125 22481  0 00:36 pts/0    00:00:00 grep --color=auto mysql
    [root@mtls17 data]# kill -9 24125
    [root@mtls17 data]# ps -ef | grep mysql
    root     26128 22481  0 00:37 pts/0    00:00:00 grep --color=auto mysql

      

      2、查看余下两个结点的情况

    mysql> melect * from replication_group_members;
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
    | group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    2 rows in set (0.00 sec)
    
    mysql> show global status like 'group_replication_primary_member';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
    +----------------------------------+--------------------------------------+
    1 row in set (0.00 sec)

      由上面可以看出在mtls17结点上的mysql被kill掉之后、余下的两个结点组成了新的集群、并且mtls18上的mysql

      成为了primary

      

      3、解决primary宕机恢复的问题

    systemctl start mysql
    [root@mtls17 data]# mysql -uroot -pmtls0352
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or g.
    Your MySQL connection id is 3
    Server version: 5.7.20-log MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
    
    mysql> start group_replication;
    Query OK, 0 rows affected (4.03 sec)
    
    mysql>

      4、检查问题是否正常解决

    select * from replication_group_members;
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
    | group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
    | group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    3 rows in set (0.00 sec)
    
    mysql> show global status like 'group_replication_primary_member';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
    +----------------------------------+--------------------------------------+
    1 row in set (0.00 sec)

      总论:之前的主结点在宕机之后、通过重启服务、重启mysql-group-replication成功的解决了问题。

    四、模拟primary结点上的数据已经丢失的情况下、如果恢复结点:

      1、退出服务、删除数据

    [root@mtsl18 ~]# ps -ef | grep mysql
    mysql    10843     1  0 00:04 ?        00:00:19 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
    root     13290  9197  0 00:50 pts/0    00:00:00 grep --color=auto mysql
    [root@mtsl18 ~]# kill -9 10843
    [root@mtsl18 ~]# rm -rf /database/mysql/data/3306
    [root@mtsl18 ~]# ps -ef | grep mysql
    root     13339  9197  0 00:50 pts/0    00:00:00 grep --color=auto mysql

      这个实验是接着情况一做下去的、所以primary在mtls18上、所以我们在mtls18上做退出服务、删除数据的动作

      2、查看集群的状态:

    mysql> select * from replication_group_members;
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
    | group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    2 rows in set (0.00 sec)
    
    mysql> show global status like 'group_replication_primary_member';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
    +----------------------------------+--------------------------------------+
    1 row in set (0.01 sec)

      说明:当mtls18宕机后primary就从mtls18切到了mtls17上去了

      3、通过meb备份mtls19用于还原宕机的mtls18

    mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp 
    --host=localhost --user=root --password=mtls0352 
    --backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging 
    backup-to-image
    
    MySQL Enterprise Backup version 4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01] 
    Copyright (c) 2003, 2017, Oracle and/or its affiliates. All Rights Reserved.
    
    171201 01:01:36 MAIN    INFO: A thread created with Id '140141436434240' 
    171201 01:01:36 MAIN    INFO: Starting with following command line ...
     mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost 
            --user=root --password=xxxxxxxx --backup-dir=/tmp/ 
            --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging 
            backup-to-image 
    
    171201 01:01:36 MAIN    INFO: 
    171201 01:01:36 MAIN    INFO: MySQL server version is '5.7.20-log'
    .......
    ........
    171201 01:01:40 MAIN    INFO: Full Image Backup operation completed successfully.
    171201 01:01:40 MAIN    INFO: Backup image created successfully.
    171201 01:01:40 MAIN    INFO: Image Path = /tmp/2017-12-01T12:30:00.mbi
    171201 01:01:40 MAIN    INFO: MySQL binlog position: filename mysql-bin.000002, position 1082
    
    -------------------------------------------------------------
       Parameters Summary         
    -------------------------------------------------------------
       Start LSN                  : 2609664
       End LSN                    : 2610075
    -------------------------------------------------------------
    
    mysqlbackup completed OK!

      4、传输备份到mtls18

    scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/

      5、还原备份

    mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi 
    > --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ 
    > copy-back-and-apply-log
    MySQL Enterprise Backup version 4.1.0 Linux-2.6.39-400.215.10.el5uek-x86_64 [2017/03/01] 
    Copyright (c) 2003, 2017, Oracle and/or its affiliates. All Rights Reserved.
    
    171201 01:09:59 MAIN    INFO: A thread created with Id '140530650736448' 
    171201 01:09:59 MAIN    INFO: Starting with following command line ...
     mysqlbackup --defaults-file=/etc/my.cnf 
            --backup-image=/tmp/2017-12-01T12:30:00.mbi --backup-dir=/tmp/ 
            --datadir=/database/mysql/data/3306/ copy-back-and-apply-log 
    
    171201 01:09:59 MAIN    INFO: 
    IMPORTANT: Please check that mysqlbackup run completes successfully.
    .....
    .....
    171201 01:10:08 PCR1    INFO: The first data file is '/database/mysql/data/3306/ibdata1'
                                  and the new created log files are at '/database/mysql/data/3306/'
    171201 01:10:08 MAIN    INFO: MySQL server version is '5.7.20-log'
    171201 01:10:08 MAIN    INFO: Restoring ...5.7.20-log version
    171201 01:10:08 MAIN    INFO: Apply-log operation completed successfully.
    171201 01:10:08 MAIN    INFO: Full Backup has been restored successfully.
    
    mysqlbackup completed OK!

       6、重启mtls18上的mysql

    [root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/3306
    [root@mtsl18 tmp]# systemctl start mysql
    [root@mtsl18 tmp]# ps -ef | grep mysql
    mysql    14205     1 24 01:11 ?        00:00:01 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
    root     14237  9197  0 01:11 pts/0    00:00:00 grep --color=auto mysql

      7、重启mysql group replication

    mysql -uroot -pmtls0352
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or g.
    Your MySQL connection id is 4
    Server version: 5.7.20-log MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
    
    mysql> reset master;
    Query OK, 0 rows affected (0.10 sec)
    
    mysql> reset slave;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> set sql_log_bin=0;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
    Query OK, 0 rows affected (0.10 sec)
    
    mysql> set sql_log_bin=1;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> change master to
        ->     master_user='mgr_usr',
        ->     master_password='mgr10352'
        ->     for channel 'group_replication_recovery';
    Query OK, 0 rows affected, 2 warnings (0.21 sec)
    
    mysql> start group_replication;
    Query OK, 0 rows affected (3.46 sec)

      8、检查集群的状态是否正常

    mysql> select * from replication_group_members;
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    | group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17      |        3306 | ONLINE       |
    | group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19      |        3306 | ONLINE       |
    | group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18      |        3306 | ONLINE       |
    +---------------------------+--------------------------------------+-------------+-------------+--------------+
    3 rows in set (0.01 sec)
    
    mysql> show global status like 'group_replication_primary_member';
    +----------------------------------+--------------------------------------+
    | Variable_name                    | Value                                |
    +----------------------------------+--------------------------------------+
    | group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
    +----------------------------------+--------------------------------------+
    1 row in set (0.01 sec)

    五、总结:

      对于两种primary宕故障的修复总结:

        1、数据没有丢、binlog日志也没有丢 那直接重启mysql group replication 就行、它会自动修复问题。

        2、数据丢失的情况、先备份还原-->重启mysql group replication 就行。

      对于mysql group replication 维护操作复杂性的总结:  

        总的来说mysql group replication 对dba还是比较友好的、几个小小的操作就能恢复故障的集群。

    六、我写的关于mysql group replication 的相关文章 

      1、mysql group replication 安装与配置详解:http://www.cnblogs.com/JiangLe/p/6727281.html#3849996

      2、mysql group replication 在mysql-5.7.20版本下的可用性报告:http://www.cnblogs.com/JiangLe/p/7809229.html

      3、mysql group replication 主节宕机点恢复 https://i.cnblogs.com/EditPosts.aspx?postid=7941929

      4、mysql group replication 多数据结点丢失情况下的恢复

      5、我写的全自动化安装mysql-group-replication 开源工具 https://github.com/Neeky/mysqltools

    ----

  • 相关阅读:
    bzoj1607: [Usaco2008 Dec]Patting Heads 轻拍牛头
    bzoj1016: [JSOI2008]最小生成树计数
    bzoj1051: [HAOI2006]受欢迎的牛
    bzoj1003: [ZJOI2006]物流运输
    bzoj1079: [SCOI2008]着色方案
    bzoj1179: [Apio2009]Atm
    bzoj1877: [SDOI2009]晨跑
    bzoj1821: [JSOI2010]Group 部落划分 Group
    bzoj1305: [CQOI2009]dance跳舞
    bzoj1858: [Scoi2010]序列操作
  • 原文地址:https://www.cnblogs.com/JiangLe/p/7941929.html
Copyright © 2011-2022 走看看