zoukankan      html  css  js  c++  java
  • 027:高级复制


    一.半同步和无损复制

    主从复制基本上都是 异步复制, Master并不关Slave节点有没有获取到数据 ,所以复制效率很高,但是数据有可能会丢失。

    • 从 MySQL5.5 开始,MySQL推出了semi-sync replication (半同步复制)

      • 至少有一个Slave节点收到binlog后再返回( IO线程接收到即可 )
      • 减少数据丢失风险
      • 不能完全避免数据丢失
      • 超时后,切换回异步复制
    • 从 MySQL5.7.2 开始,MySQL推出了lossless semi-sync replication (无损复制)

      • 二进制日志(binlog)先写远程( IO线程接收到即可 )
      • 可保证数据完全不丢失

    1.1. loss less / semi-sync replication插件安装

    • 1、手工安装
    mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    
    • 2、写入配置文件
    [mysqld]
    plugin_dir=/usr/local/mysql/lib/plugin
    plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
    

    上述操作 仅仅是加载了插件 ,还 未启动 对应的功能,需要配置额外的参数:

    [mysqld]
    # 等同于 rpl_semi_sync_master_enabled = 1
    loose_rpl_semi_sync_master_enabled = 1
    # 等同于 rpl_semi_sync_slave_enabled = 1
    loose_rpl_semi_sync_slave_enabled = 1
    # 超时5秒后,则切换回异步方式
    loose_rpl_semi_sync_master_timeout = 5000
    

    使用 loose_ 前缀表示如果没有加载 semi_sync 的插件,则 忽略该参数
    当Slave在Timeout后,又追上了Master了( IO线程 ),则会 自动切换回半同步复制

    注意:半同步复制 / 无损复制主从 上都要 安装插件和开启功能

    注意:要保证任意时刻发生一台机器宕机都不丢失数据的前提是 master sync_binlog设置为1,slave sync_relay_log设置为1。

    1.2. semi-sync replication

    semi-sync replication 称为 半同步复制 ,在一个事务 提交(commit) 的过程时,在 InnoDB 层的 commit log 步骤后,Master节点需要收到 至少一个 Slave节点回复的 ACK (表示 收到了binlog )后,才能继续下一个事务;
    如果在一定时间内(Timeout)内 没有收到ACK ,则 切换为异步模式 ,具体流程如下:

    对应的配置参数如下:

    [mysqld]
    # 开启主的半同步复制
    rpl_semi_sync_master_enabled=1
    # 开启从的半同步复制
    rpl_semi_sync_slave_enabled=1
    # 超时1秒,切回异步
    rpl_semi_sync_master_timeout=1000
    # 至少收到 1 个 slave发回的ack
    rpl_semi_sync_master_wait_for_slave_count=1
    

    1.3. loss less semi-sync replication

    loss less semi-sync replication 称为 无损复制 ,在一个事务 提交(commit) 的过程时,在 MySQL 层的 write binlog 步骤后,Master节点需要收到 至少一个 Slave节点回复的 ACK (表示 收到了binlog )后,才能继续下一个事务;
    如果在一定时间内(Timeout)内 没有收到ACK ,则 切换为异步模式 ,具体流程如下:

    对应的配置参数如下:

    [mysqld]
    # 开启主的半同步复制
    rpl_semi_sync_master_enabled=1
    # 开启从的半同步复制
    rpl_semi_sync_slave_enabled=1
    # 超时1秒,切回异步
    rpl_semi_sync_master_timeout=1000
    [mysqld57]
    # 控制 半同步复制 还是 无损复制 的参数
    # - AFTER_SYNC 表示的是无损复制;(5.7 默认)
    # - AFTER_COMMIT 表示的是半同步复制;
    rpl_semi_sync_master_wait_point=AFTER_SYNC
    # 至少收到 1 个 slave发回的ack
    rpl_semi_sync_master_wait_for_slave_count=1
    

    1.4. 半同步复制与无损复制的对比

    • ACK的时间点不同

      • 半同步复制InnoDB层Commit Log后等待ACK,主从切换会有数据丢失风险。
      • 无损复制MySQL Server层Write binlog后等待ACK,主从切换会有数据变多风险。
    • 主从数据一致性

      • 半同步复制意味着在Master节点上,这个刚刚提交的事务对数据库的修改,对其他事务是可见的。因此,如果在等待Slave ACK的时候crash了,那么会对其他事务出现幻读数据丢失
      • 无损复制在write binlog完成后,就传输binlog,但还没有去写commit log,意味着当前这个事务对数据库的修改,其他事务也是不可见的。因此,不会出现幻读,数据丢失风险。
      • 因此5.7.2引入了无损复制(after_sync)模式,带来的主要收益是解决 after_commit导致的master crash数据丢失问题,因此在引入after_sync模式后,所有提交的数据已被复制,故障切换时数据一致性将得到提升。

    参考- MySQL半同步复制数据一致性分析

    1.5. 演示无损/半同步复制

    • master server
    mysql root@localhost:employees> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    Query OK, 0 rows affected
    Time: 0.063s
    
    mysql root@localhost:employees>  INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    Query OK, 0 rows affected
    Time: 0.005s
    
    mysql root@localhost:employees> set global rpl_semi_sync_master_enabled = 1;
    Query OK, 0 rows affected
    Time: 0.020s
    
    mysql root@localhost:employees> set global rpl_semi_sync_master_timeout = 5000;
    Query OK, 0 rows affected
    Time: 0.001s
    
    mysql root@localhost:employees> show global status like "%rpl%";
    +--------------------------------------------+-------+
    | Variable_name                              | Value |
    +--------------------------------------------+-------+
    | Rpl_semi_sync_master_clients               | 0     |
    | Rpl_semi_sync_master_net_avg_wait_time     | 0     |
    | Rpl_semi_sync_master_net_wait_time         | 0     |
    | Rpl_semi_sync_master_net_waits             | 0     |
    | Rpl_semi_sync_master_no_times              | 0     |
    | Rpl_semi_sync_master_no_tx                 | 0     |
    | Rpl_semi_sync_master_status                | ON    |   -- status ok
    | Rpl_semi_sync_master_timefunc_failures     | 0     |
    | Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
    | Rpl_semi_sync_master_tx_wait_time          | 0     |
    | Rpl_semi_sync_master_tx_waits              | 0     |
    | Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
    | Rpl_semi_sync_master_wait_sessions         | 0     |
    | Rpl_semi_sync_master_yes_tx                | 0     |
    | Rpl_semi_sync_slave_status                 | OFF   |
    +--------------------------------------------+-------+
    15 rows in set
    Time: 0.013s
    mysql root@localhost:employees>
    
    • slave server
    (root@localhost) 09:59:14 [employees]> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    Query OK, 0 rows affected (0.05 sec)
    
    (root@localhost) 18:07:15 [employees]> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    Query OK, 0 rows affected (0.00 sec)
    
    (root@localhost) 18:07:29 [employees]> set global rpl_semi_sync_slave_enabled = 1;
    Query OK, 0 rows affected (0.00 sec)
    
    (root@localhost) 18:14:59 [employees]> show global variables like '%semi%';
    +-------------------------------------------+------------+
    | Variable_name                             | Value      |
    +-------------------------------------------+------------+
    | rpl_semi_sync_master_enabled              | OFF        |
    | rpl_semi_sync_master_timeout              | 10000      |
    | rpl_semi_sync_master_trace_level          | 32         |
    | rpl_semi_sync_master_wait_for_slave_count | 1          |
    | rpl_semi_sync_master_wait_no_slave        | ON         |
    | rpl_semi_sync_master_wait_point           | AFTER_SYNC |
    | rpl_semi_sync_slave_enabled               | ON         |
    | rpl_semi_sync_slave_trace_level           | 32         |
    +-------------------------------------------+------------+
    8 rows in set (0.02 sec)
    

    半复制切换异步同步过程的状态

    • master server
    (root@localhost) 10:11:10 [tablespace]>  show global status like "%rpl%";
    +--------------------------------------------+-------+
    | Variable_name                              | Value |
    +--------------------------------------------+-------+
    | Rpl_semi_sync_master_clients               | 1     |  -- 半同步复制的client数量
    | Rpl_semi_sync_master_net_avg_wait_time     | 0     |
    | Rpl_semi_sync_master_net_wait_time         | 0     |  
    | Rpl_semi_sync_master_net_waits             | 1     |  -- master总的等待slave的次数
    | Rpl_semi_sync_master_no_times              | 0     |  -- 切成异步的次数(no = number of)
    | Rpl_semi_sync_master_no_tx                 | 0     |  -- 切成异步后提交的事物数
    | Rpl_semi_sync_master_status                | ON    |  -- 半同步复制的状态
    | Rpl_semi_sync_master_timefunc_failures     | 0     |
    | Rpl_semi_sync_master_tx_avg_wait_time      | 2209  |  -- master等待事物的平均时间
    | Rpl_semi_sync_master_tx_wait_time          | 2209  |  -- master等待事物的总的时间
    | Rpl_semi_sync_master_tx_waits              | 1     |  -- master等待事物的次数
    | Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
    | Rpl_semi_sync_master_wait_sessions         | 0     |
    | Rpl_semi_sync_master_yes_tx                | 1     |
    | Rpl_semi_sync_slave_status                 | OFF   |
    +--------------------------------------------+-------+
    15 rows in set (0.03 sec)
    
    • slave server
    (root@localhost) 13:47:48 [tablespace]> stop slave io_thread;  --停掉IO线程
    Query OK, 0 rows affected (0.05 sec)
    
    • master server
    (root@localhost) 14:48:11 [tablespace]>  show global status like "%rpl%";
    +--------------------------------------------+-------+
    | Variable_name                              | Value |
    +--------------------------------------------+-------+
    | Rpl_semi_sync_master_clients               | 1     |
    | Rpl_semi_sync_master_net_avg_wait_time     | 0     |
    | Rpl_semi_sync_master_net_wait_time         | 0     |
    | Rpl_semi_sync_master_net_waits             | 1     |
    | Rpl_semi_sync_master_no_times              | 0     |
    | Rpl_semi_sync_master_no_tx                 | 0     |
    | Rpl_semi_sync_master_status                | ON    |
    | Rpl_semi_sync_master_timefunc_failures     | 0     |
    | Rpl_semi_sync_master_tx_avg_wait_time      | 2209  |
    | Rpl_semi_sync_master_tx_wait_time          | 2209  |
    | Rpl_semi_sync_master_tx_waits              | 1     |
    | Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
    | Rpl_semi_sync_master_wait_sessions         | 0     |
    | Rpl_semi_sync_master_yes_tx                | 1     |
    | Rpl_semi_sync_slave_status                 | OFF   |
    +--------------------------------------------+-------+
    15 rows in set (0.00 sec)
    
    (root@localhost) 14:48:14 [tablespace]> insert into qqq values(99);  --插入sql
    Query OK, 1 row affected (5.04 sec)    -- 等待5秒后,切成异步
    
    (root@localhost) 14:49:28 [tablespace]>  show global status like "%rpl%";
    +--------------------------------------------+-------+
    | Variable_name                              | Value |
    +--------------------------------------------+-------+
    | Rpl_semi_sync_master_clients               | 0     |
    | Rpl_semi_sync_master_net_avg_wait_time     | 0     |
    | Rpl_semi_sync_master_net_wait_time         | 0     |
    | Rpl_semi_sync_master_net_waits             | 1     | 
    | Rpl_semi_sync_master_no_times              | 1     |  --切成异步的次数
    | Rpl_semi_sync_master_no_tx                 | 1     |  --切成异步后的事物数
    | Rpl_semi_sync_master_status                | OFF   |  --status 为off
    | Rpl_semi_sync_master_timefunc_failures     | 0     |
    | Rpl_semi_sync_master_tx_avg_wait_time      | 2209  |
    | Rpl_semi_sync_master_tx_wait_time          | 2209  |
    | Rpl_semi_sync_master_tx_waits              | 1     |
    | Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
    | Rpl_semi_sync_master_wait_sessions         | 0     |
    | Rpl_semi_sync_master_yes_tx                | 1     |
    | Rpl_semi_sync_slave_status                 | OFF   |
    +--------------------------------------------+-------+
    15 rows in set (0.00 sec)
    
    (root@localhost) 14:49:57 [tablespace]>
    

    1.6. 两种复制方式的性能

    备注:上图是Facebook的测试性能图;其中Y轴是QPS,X轴是并发数

    • 蓝色的 Normal Slave异步复制
      • 性能很好,但是随着并发数的增长,性能有所下降
    • 绿色的 Enhanced mysqlbinlog无损复制
      • 随着并发数的增长,性能几乎是线性增长的,在高并发下,性能会优于异步复制
    • 紫色的 Normal Semi Slave 是 半同步复制
      • 性能较低

    无损复制性能优于半同步复制的原因

    1. 等待ACK回包问题上,其实两种复制的开销是一样的,没有区别,都是网络的等待开销。
    2. 无损复制由于在 write binlog (commit 的第二步)后,需要等待ACK,后续的事务无法提交,这样就 堆积N多个需要落盘的事务(半同步复制由于已经提交了事务,没有堆积事务的效果),通过 组提交 机制一次 fsync的多个事务(半同步复制也有组提交,只是一次 fsync 的事务数没那么多), 相当于提高了I/O性能 ;所以线程(事务)越多,效果越明显,以至于有上图中超过异步复制的效果。(无损复制的组提交比例比原版的高3~4倍)

    产生上述测试效果的前提:测试用例是 IO Bound 的(比如数据量有 100G,而 buffer pool 只有 10G),且并发数足够多。

    下面这两个参数不要去设置,设置了反而性能差

    (root@localhost) 10:10:47 [tablespace]> show variables like "%binlog_group%";
    +-----------------------------------------+-------+
    | Variable_name                           | Value |
    +-----------------------------------------+-------+
    | binlog_group_commit_sync_delay          | 0     |
    | binlog_group_commit_sync_no_delay_count | 0     |   -- 等待一组里面有多少事务我才提交
    +-----------------------------------------+-------+
    2 rows in set (0.01 sec)
    
    (root@localhost) 10:10:50 [tablespace]> show variables like "%binlog_max%";
    +-----------------------------+-------+
    | Variable_name               | Value |
    +-----------------------------+-------+
    | binlog_max_flush_queue_time | 0     |               -- 等待多少时间后才进行组提交
    +-----------------------------+-------+
    1 row in set (0.00 sec)
    
    (root@localhost) 10:11:10 [tablespace]>
    

    1.7. rpl_semi_sync_master_wait_for_slave_count

    该参数控制Master在收到 多少个 Slave的ACK 后,才可以继续commit。配置多个ACK和配置一个ACK的效果是类似的,因为他们是 并行执行 的(理论上来说不会有两倍的等待时间), 取决于最慢的那个 。


    二. 并行复制(Multi-Threaded Slave)

    2.1. MTS介绍

    在官方文档中,并行复制的叫法为 Multi-Threaded Slave (MTS)

    • MySQL的并行复制基于组提交:

    一个组提交中的事务都是可以并行执行的 ,因为既然处于组提交中,这意味着事务之间没有冲突(不会去更新同一行数据),否则不可能在同一个组里面。
    Slave上开启并行复制,需要在配置文件中增加以下参数:

    [mysqld]
    slave-parallel-type=LOGICAL_CLOCK   
    slave_preserve_commit_order=1
    slave-parallel-workers=4
    
    • slave-parallel-type 参数

      • DATABASE 基于库级别的并行复制,如果只有一个库,就还是串行(为了兼容5.6)。
      • LOGICAL_CLOCK 基于逻辑时钟,主上怎么并行执行,从上也是怎么并行回放的。
    • slave-parallel-workers 并行复制的线程数,一般设置为一个组内提交的事务数,线上设置为32足够了

    • slave_preserve_commit_order Slave上commit的顺序保持一致,必须为1,否则可能会有GAP锁产生

    2.2. 动态调整复制线程数

    配置并行复制后,Slave节点可以看到4个 Coordinator 线程

    mysql root@localhost:(none)> show processlist;
    +----+-------------+-----------+--------+---------+--------+--------------------------------------------------------+------------------+
    | Id | User        | Host      | db     | Command | Time   | State                                                  | Info             |
    +----+-------------+-----------+--------+---------+--------+--------------------------------------------------------+------------------+
    | 1  | system user |           | <null> | Connect | 148155 | Waiting for master to send event                       | <null>           |
    | 2  | system user |           | <null> | Connect | 57792  | Slave has read all relay log; waiting for more updates | <null>           |
    | 3  | system user |           | <null> | Connect | 57791  | Waiting for an event from Coordinator                  | <null>           |
    | 4  | system user |           | <null> | Connect | 148155 | Waiting for an event from Coordinator                  | <null>           |
    | 7  | system user |           | <null> | Connect | 148155 | Waiting for an event from Coordinator                  | <null>           |
    | 8  | system user |           | <null> | Connect | 148155 | Waiting for an event from Coordinator                  | <null>           |
    | 12 | root        | localhost | <null> | Query   | 0      | starting                                               | show processlist |
    +----+-------------+-----------+--------+---------+--------+--------------------------------------------------------+------------------+
    7 rows in set
    Time: 0.019s
    mysql root@localhost:(none)>
    
    -- 动态调整方式如下:
    
    mysql root@localhost:(none)> set global slave_parallel_workers=8;
    Query OK, 0 rows affected
    Time: 0.003s
    mysql root@localhost:(none)> stop slave;  一定要重启一下slave才能有效
    Query OK, 0 rows affected
    Time: 0.038s
    mysql root@localhost:(none)> start slave;
    Query OK, 0 rows affected
    Time: 0.080s
    mysql root@localhost:(none)> show processlist;
    +----+-------------+-----------+--------+---------+------+--------------------------------------------------------+------------------+
    | Id | User        | Host      | db     | Command | Time | State                                                  | Info             |
    +----+-------------+-----------+--------+---------+------+--------------------------------------------------------+------------------+
    | 12 | root        | localhost | <null> | Query   | 0    | starting                                               | show processlist |
    | 13 | system user |           | <null> | Connect | 3    | Waiting for master to send event                       | <null>           |
    | 14 | system user |           | <null> | Connect | 3    | Slave has read all relay log; waiting for more updates | <null>           |
    | 15 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 16 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 17 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 18 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 19 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 20 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 21 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    | 22 | system user |           | <null> | Connect | 3    | Waiting for an event from Coordinator                  | <null>           |
    +----+-------------+-----------+--------+---------+------+--------------------------------------------------------+------------------+
    11 rows in set
    Time: 0.019s
    mysql root@localhost:(none)>
    

    特别注意:

    这里的 并行复制 指的是 SQL Thread (回放线程),而非IO Thread (IO线程)
    Waiting for master to send event 这个 Stateshow processlist 中只有一个,即只有一个 IO Thread

    线上环境可以配置成两台Slave做无损复制(保证数据不丢),其他的Slave做异步复制(配置为只读,用于负载均衡),都指向同一台Master。


    三. GTID

    3.1. GTID的介绍

    • 1.Global Transaction Id entifier -- 全局事物ID

    • 2.GTID = Server_UUID + Transaction_ID

      • Server_UUID 是全局唯一的
      • Transaction_ID 是自增的
    • 3.GTID 的作用是替代 Filename + Position

    (root@localhost) 14:54:25 [tablespace]> show variables like "server_uuid";
    +---------------+--------------------------------------+
    | Variable_name | Value                                |
    +---------------+--------------------------------------+
    | server_uuid   | 9dc847d8-bf72-11e7-9ec4-000c2998e4f1 |
    +---------------+--------------------------------------+
    1 row in set (0.01 sec)
    

    在MySQL中看到的 UUID ,实际是保存在 $DATADIR/auto.cnf 中的,且该文件是服务器初始化的时候自动生成的。

    [root@node1 mysqldata]# cat /r2/mysqldata/auto.cnf
    [auto]
    server-uuid=9dc847d8-bf72-11e7-9ec4-000c2998e4f1
    [root@node1 mysqldata]#
    

    通过 冷备 做备份,拷贝 $DATADIR 时,记得要把备份中的 auto.cnf 给删除、

    3.2. GTID的意义

    • 未使用GTID

      • Master宕机 后,一个 Slave选举提升New Master,如果需要重建复制关系,就需要把另外两个SlaveCHANGE MASTER 指向 New Master
      • 那问题来了,原来Slave是指向 MasterFilename_M + Position_M 的位置,现在要指向 New Master 上新的 Filename_N + Position_N 的位置,由于基于二进制日志复制 ,定位两个位置步骤繁琐; 此时两个Slave要继续重建复制关系(CHANGE MASTER)会比较麻烦。
    • 使用GTID

      • 和上面一样的场景,选举机制提升为New Master的GTID 执行到最新事务, 两个Slave需要重新指向 New Master ,由于 使用了GTID ,目前 Slave-A 获取到的事务对应GTID为GTID_ASlave-B 获取到的事务对应GTID为GTID_B;
      • 此时 New Master 上GTID是存在 GTID_A 和 GTID_B,那两个Slave就可以直接使用 GTID_A 和 GTID_B 这两个GTID,通过指向 New Master 接着重建复制;

    3.3. GTID的配置

    [mysqld]
    log_bin = binlog
    log_slave_updates = 1
    gtid_mode = ON 
    enforce-gtid-consistency = true
    
    • 注意:

      • 1.MySQL5.6 必须开启参数 log_slave_updates (5.6版本的限制)

      • 2.MySQL5.6 升级到gtid模式需要停机重启

      • 3.MySQL5.7 版本开始可以不开启 log_slave_updates

      • 4.MySQL5.7.6 版本开始可以在线升级gtid模式

    3.4. 基于GTID的复制

    3.4.1. 配置基于GTID的复制

    配置信息 主库(master) 从库(slave)
    主机 node1.gczheng.com node3.gczheng.com
    IP 192.168.88.88 192.168.88.100
    Port 3306 3306
    MySQL版本 MySQL5.7.18 MySQL5.7.18
    Server_ID 8888 88100

    数据库的备份和还原步骤略过,这里使用 mydumper备份,master上创建账号repl@'192.168.88.100',密码和权限同repl@'192.168.88.99'

    • 1.master节点的配置文件加入
    [mysqld]
    gtid_mode = ON 
    enforce-gtid-consistency = true
    
    • 2.slave 节点加入
    [mysqld]
    gtid_mode = ON 
    enforce-gtid-consistency = true
    
    
    • 3.查看metadata信息
    [root@node3 alldb]# cat metadata
    Started dump at: 2018-02-15 18:53:58                    --dump时间
    SHOW MASTER STATUS:
    	Log: binlog.000014                                  --Filename
    	Pos: 1361                                           --Pos
    	GTID:9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-52      --GTID时间
    

    注意:metadata中的 GTID:9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-52 表示为:这个GTID对应的 事物 已在备份中了,Slave还原备份后,进行复制时要跳过这个GTID即对应的事物。

    3.4.2. 跳过GTIDs

    • Slave节点

    如果备份使用 mysqldump 进行备份, 请跳过该步骤 ,因为在备份中已经存在下面这个语句:

    --
    -- GTID state at the beginning of the backup 
    --
    SET @@GLOBAL.GTID_PURGED='9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-52';
    

    因为我们使用 mydumper 进行备份, myloader还原并不会帮我们执行上述语句,需要手工执行该语句,让 Slave 知道这个GTID需要跳过:

    mysql> SET @@GLOBAL.GTID_PURGED='9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-52';
    ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
    -- 这里报出一个错误,意思为如果要设置 @@GLOBAL.GTID_PURGED ,则 @@GLOBAL.GTID_EXECUTED 必须为空。
    

    需要使用 reset master 命令,将 @@GLOBAL.GTID_EXECUTED清空。

    (root@localhost) 09:22:38 [tablespace]> reset master;
    Query OK, 0 rows affected (0.07 sec)
    
    (root@localhost) 09:23:18 [tablespace]> SET @@GLOBAL.GTID_PURGED='9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-52';
    Query OK, 0 rows affected (0.02 sec)
    

    至此,我们已经跳过了备份中存在的GTIDs所对应的事物

    3.4.3. CHANGE MASTER

    • Slave节点
    (root@localhost) 10:33:35 [tablespace]> change master to master_host="192.168.88.88", master_port=3306, master_user='repl',master_password='123456', master_auto_position=1;
    Query OK, 0 rows affected, 2 warnings (0.04 sec)
    
    (root@localhost) 10:33:46 [tablespace]> start slave;
    Query OK, 0 rows affected (0.03 sec)
    
    (root@localhost) 10:58:50 [tablespace]> show slave status  G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.88.88
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000014
              Read_Master_Log_Pos: 1893
                   Relay_Log_File: node3-relay-bin.000007
                    Relay_Log_Pos: 445
            Relay_Master_Log_File: binlog.000014
                 Slave_IO_Running: Yes                          --IO线程ok
                Slave_SQL_Running: Yes                          --SQL线程ok
                  Replicate_Do_DB:
              Replicate_Ignore_DB:
               Replicate_Do_Table:
           Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
                       Last_Errno: 0
                       Last_Error:
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 1893
                  Relay_Log_Space: 2598
                  Until_Condition: None
                   Until_Log_File:
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File:
               Master_SSL_CA_Path:
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
                   Last_SQL_Errno: 0
                   Last_SQL_Error:
      Replicate_Ignore_Server_Ids:
                 Master_Server_Id: 8888
                      Master_UUID: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1
                 Master_Info_File: /r2/mysqldata/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind:
          Last_IO_Error_Timestamp:
         Last_SQL_Error_Timestamp:
                   Master_SSL_Crl:
               Master_SSL_Crlpath:
               Retrieved_Gtid_Set: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:29-54   --获取到GTID
                Executed_Gtid_Set: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:29-54   --执行到GTID
                    Auto_Position: 1
             Replicate_Rewrite_DB:
                     Channel_Name:
               Master_TLS_Version:
    1 row in set (0.00 sec)
    
    ERROR:
    No query specified
    

    3.4.4. 复制完成

    • master节点
    (root@localhost) 10:57:09 [tablespace]> show slave hosts;
    +-----------+------+------+-----------+--------------------------------------+
    | Server_id | Host | Port | Master_id | Slave_UUID                           |
    +-----------+------+------+-----------+--------------------------------------+
    |     88100 |      | 3306 |      8888 | 5d4e3453-0fc5-11e8-b387-000c29c9f498 |  --新建slave
    |      8899 |      | 3306 |      8888 | 1951c7ee-c1fb-11e7-ac44-005056305232 |
    +-----------+------+------+-----------+--------------------------------------+
    2 rows in set (0.00 sec)
    
    (root@localhost) 10:58:34 [tablespace]> insert into qqq values(101);  --插入a的值101
    Query OK, 1 row affected (0.02 sec)
    
    (root@localhost) 10:59:07 [tablespace]> select * from  qqq;
    +------+
    | a    |
    +------+
    |    1 |
    |    2 |
    |    3 |
    |    4 |
    |   10 |
    |   99 |
    |  101 |
    +------+
    7 rows in set (0.00 sec)
    
    • slave节点
    (root@localhost) 11:02:32 [tablespace]> select * from  qqq;
    +------+
    | a    |
    +------+
    |    1 |
    |    2 |
    |    3 |
    |    4 |
    |   10 |
    |   99 |
    |  101 |  --slave同步到值101
    +------+
    7 rows in set (0.00 sec)
    
    (root@localhost) 11:03:08 [tablespace]>
    
    

    3.5. GTID 与 Filename-Pos的对应

    在 binlog 中,多了一个 GTID 的 event ,如下所示

    (root@localhost) 11:43:24 [tablespace]> show binlog events;
    +---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+
    | Log_name      | Pos | Event_type     | Server_id | End_log_pos | Info                                                               |
    +---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+
    | binlog.000001 |   4 | Format_desc    |     88100 |         123 | Server ver: 5.7.18-log, Binlog ver: 4                              |
    | binlog.000001 | 123 | Previous_gtids |     88100 |         154 |                                                                    |
    | binlog.000001 | 154 | Gtid           |      8888 |         219 | SET @@SESSION.GTID_NEXT= '9dc847d8-bf72-11e7-9ec4-000c2998e4f1:55' | -- 一个GTID 的 event
    | binlog.000001 | 219 | Query          |      8888 |         282 | BEGIN                                                              |
    | binlog.000001 | 282 | Table_map      |      8888 |         334 | table_id: 227 (tablespace.qqq)                                     |
    | binlog.000001 | 334 | Write_rows     |      8888 |         374 | table_id: 227 flags: STMT_END_F                                    |
    | binlog.000001 | 374 | Xid            |      8888 |         405 | COMMIT /* xid=1407 */                                              |
    +---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+
    7 rows in set (0.00 sec)
    
    (root@localhost) 11:43:39 [tablespace]>
    
    

    通过 扫描 binlog中的GTID值,就可以知道 GTIDFilename-Pos对应的关系 ,但是如果binlog非常大,扫描的量也是会很大的,所以在binlog开头部分有一个 Previous_gtids 的event,如下所示:

    (root@localhost) 11:44:57 [tablespace]> show binlog events in 'binlog.000001' G;
    *************************** 1. row ***************************
       Log_name: binlog.000001
            Pos: 4
     Event_type: Format_desc
      Server_id: 88100
    End_log_pos: 123
           Info: Server ver: 5.7.18-log, Binlog ver: 4
    *************************** 2. row ***************************
       Log_name: binlog.000001
            Pos: 123
     Event_type: Previous_gtids      -- 表示在次之前,GTID运行到的范围是哪里
      Server_id: 88100
    End_log_pos: 154
           Info:
    *************************** 3. row ***************************
       Log_name: binlog.000001
            Pos: 154
     Event_type: Gtid
      Server_id: 8888
    End_log_pos: 219
           Info: SET @@SESSION.GTID_NEXT= '9dc847d8-bf72-11e7-9ec4-000c2998e4f1:55'
    *************************** 4. row ***************************
       Log_name: binlog.000001
            Pos: 219
     Event_type: Query
      Server_id: 8888
    End_log_pos: 282
           Info: BEGIN
    *************************** 5. row ***************************
       Log_name: binlog.000001
            Pos: 282
     Event_type: Table_map
      Server_id: 8888
    End_log_pos: 334
           Info: table_id: 227 (tablespace.qqq)
    *************************** 6. row ***************************
       Log_name: binlog.000001
            Pos: 334
     Event_type: Write_rows
      Server_id: 8888
    End_log_pos: 374
           Info: table_id: 227 flags: STMT_END_F
    *************************** 7. row ***************************
       Log_name: binlog.000001
            Pos: 374
     Event_type: Xid
      Server_id: 8888
    End_log_pos: 405
           Info: COMMIT /* xid=1407 */
    7 rows in set (0.01 sec)
    
    ERROR:
    No query specified
    
    (root@localhost) 11:45:19 [tablespace]>
    
    

    如果我要的GTIDPrevious_gtids 的大,就扫描当前文件,反之则扫描之前的文件,依次类推。因为binlog在rotate(rotate events)的时候,是知道当前最大的GTID的,可以将该值写入到下一个新的binlog的开头,即 Previous_gtid

    3.6. GTID复制出错的处理

    演示人为的在从机上误操作导致的复制失败,如何恢复?

    3.6.1. 演示

    • 1.在从机上插入一条记录(模拟误操作)

      • slave节点

    现在 Slave 上插入一条记录(现实中如果配置了readonly,在app中是无法插入的,app不会给root权限)

    mysql root@localhost:(none)> select * from ttt.t1;
    +---+
    | a |  --a是主键
    +---+
    | 1 |
    | 2 |
    | 3 |
    +---+
    3 rows in set
    Time: 0.012s
    mysql root@localhost:(none)> insert into ttt.t1 values(null);
    Query OK, 1 row affected
    Time: 0.003s
    mysql root@localhost:(none)> select * from ttt.t1;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    | 4 |
    +---+
    4 rows in set
    Time: 0.014s
    mysql root@localhost:(none)>
    
    
    • 2.在主机上插入同样的记录

      • master节点
    mysql root@localhost:ttt> select * from t1;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    +---+
    3 rows in set
    Time: 0.009s
    mysql root@localhost:ttt> insert into t1 values(null);
    Query OK, 1 row affected
    Time: 0.007s
    mysql root@localhost:ttt>
    
    
    • 3.查看 Slave上的状态

      • slave节点
    mysql root@localhost:(none)> show slave status G;
    ***************************[ 1. row ]***************************
    Slave_IO_State                | Waiting for master to send event
    Master_Host                   | 192.168.88.88
    Master_User                   | repl
    Master_Port                   | 3306
    Connect_Retry                 | 60
    Master_Log_File               | binlog.000014
    Read_Master_Log_Pos           | 6195
    Relay_Log_File                | node2-relay-bin.000002
    Relay_Log_Pos                 | 568
    Relay_Master_Log_File         | binlog.000014
    Slave_IO_Running              | Yes
    Slave_SQL_Running             | No
    Replicate_Do_DB               |
    Replicate_Ignore_DB           |
    Replicate_Do_Table            |
    Replicate_Ignore_Table        |
    Replicate_Wild_Do_Table       |
    Replicate_Wild_Ignore_Table   |
    Last_Errno                    | 1062
    Last_Error                    | Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '9dc847d8-bf72-11e7-9ec4-000c2998e4f1:74' at master log binlog.000014, end_log_pos 6164. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
    Skip_Counter                  | 0
    Exec_Master_Log_Pos           | 5944
    Relay_Log_Space               | 1026
    Until_Condition               | None
    Until_Log_File                |
    Until_Log_Pos                 | 0
    Master_SSL_Allowed            | No
    Master_SSL_CA_File            |
    Master_SSL_CA_Path            |
    Master_SSL_Cert               |
    Master_SSL_Cipher             |
    Master_SSL_Key                |
    Seconds_Behind_Master         | <null>
    Master_SSL_Verify_Server_Cert | No
    Last_IO_Errno                 | 0
    Last_IO_Error                 |
    Last_SQL_Errno                | 1062 --错误代码1062,主键重复
    Last_SQL_Error                | Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '9dc8
    47d8-bf72-11e7-9ec4-000c2998e4f1:74' at master log binlog.000014, end_log_pos 6164. See error log and/or performance_schema.replication_applier_status_by_worker table for m
    ore details about this failure or others, if any.
    Replicate_Ignore_Server_Ids   |
    Master_Server_Id              | 8888
    Master_UUID                   | 9dc847d8-bf72-11e7-9ec4-000c2998e4f1
    Master_Info_File              | mysql.slave_master_info
    SQL_Delay                     | 0
    SQL_Remaining_Delay           | <null>
    Slave_SQL_Running_State       |
    Master_Retry_Count            | 86400
    Master_Bind                   |
    Last_IO_Error_Timestamp       |
    Last_SQL_Error_Timestamp      | 180219 16:07:45
    Master_SSL_Crl                |
    Master_SSL_Crlpath            |
    Retrieved_Gtid_Set            | 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:73-74
    Executed_Gtid_Set             | 1951c7ee-c1fb-11e7-ac44-005056305232:1,
    9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-73  --回放到这个事务时报错
    Auto_Position                 | 0
    Replicate_Rewrite_DB          |
    Channel_Name                  |
    Master_TLS_Version            |
    

    其实从数据一致性看,目前主从数据是一致的,只是复制过来的日志在回放时,发现已有了该部分数据(Error:1062),从而引发了复制异常(SQL回放线程停止)。 我们只需要告诉MySQL, 跳过 这部分一样的GTID,继续复制,即可。

    3.6.2. 处理复制错误

    这里的 跳过 的方法很巧妙,步骤如下

    • 1.将Slave上的 gtid_next 指向 执行失败 的那个 gtid

      • 这里执行失败的 gtid 报错信息中已经给出: '9dc847d8-bf72-11e7-9ec4-000c2998e4f1:73-74'
      • 如果不看报错信息,可以看 Retrieved_Gtid_Set 和 Executed_Gtid_Set 的对比结果
    • 2.执行一个空的事物,即 begin;commit;

      • 这样就把 失败的gtid 对应到了一个 空的事务 上,这个步骤即为 “跳过” 的意思
    • 3.将 gtid_next 设置(还原)为 automatic

    操作:
    - slave节点

    mysql root@localhost:(none)> select @@gtid_next;    -- 当前为默认值,AUTOMATIC
    +-------------+
    | @@gtid_next |
    +-------------+
    | AUTOMATIC   |
    +-------------+
    1 row in set
    Time: 0.010s
    
    -- 步骤1 : 设置 gtid_next 为回放失败的gtid
    mysql root@localhost:(none)> set  gtid_next="9dc847d8-bf72-11e7-9ec4-000c2998e4f1:74";
    Query OK, 0 rows affected
    Time: 0.001s
    
    -- 步骤2 : 执行一个空的事物,让回放失败的gtid对应到这个空的事物
    mysql root@localhost:(none)> begin;
    Query OK, 0 rows affected
    Time: 0.000s
    mysql root@localhost:(none)> commit;
    Query OK, 0 rows affected
    Time: 0.000s
    
    -- -- 步骤3 : 还原gtid_next为automatic
    mysql root@localhost:(none)> set  gtid_next="automatic";
    Query OK, 0 rows affected
    Time: 0.000s
    
    -- -- 步骤4 : 重新启动slave
    mysql root@localhost:(none)> stop slave;
    Query OK, 0 rows affected
    Time: 0.001s
    mysql root@localhost:(none)> start slave;
    Query OK, 0 rows affected
    Time: 0.046s
    mysql root@localhost:(none)> show slave status G;
    ***************************[ 1. row ]***************************
    Slave_IO_State                | Waiting for master to send event
    Master_Host                   | 192.168.88.88
    Master_User                   | repl
    Master_Port                   | 3306
    Connect_Retry                 | 60
    Master_Log_File               | binlog.000014
    Read_Master_Log_Pos           | 6195
    Relay_Log_File                | node2-relay-bin.000005
    Relay_Log_Pos                 | 357
    Relay_Master_Log_File         | binlog.000014
    Slave_IO_Running              | Yes
    Slave_SQL_Running             | Yes
    Replicate_Do_DB               |
    Replicate_Ignore_DB           |
    Replicate_Do_Table            |
    Replicate_Ignore_Table        |
    Replicate_Wild_Do_Table       |
    Replicate_Wild_Ignore_Table   |
    Last_Errno                    | 0
    Last_Error                    |
    Skip_Counter                  | 0
    1 row in set
    Time: 0.010s
    mysql root@localhost:(none)>
    

    skip_gtid_errors.sh 脚本

    #!/bin/bash
    # line V1.0
    # mail:gczheng@139.com
    # data:2018-08-30
    # script_name:skip_gtid_errors.sh  
    
    user=root
    pass='iforgot'
    host='localhost'
    
    for((i=1;i<1000;i++))
    do
    #sql21=`mysql -u${user} -p${pass} -h${host} -e "show slave statusG;" 2>/dev/null|grep -v Last_Error | grep 'executing transaction'| awk '{print $21}'`
    
    Errno=`mysql -u${user} -p${pass} -h${host} -e "show slave statusG;" 2>/dev/null|grep 'Last_SQL_Errno' | awk -F ":" '{print $2}'`
    
    Seconds=`mysql -u${user} -p${pass} -h${host} -e "show slave statusG;" 2>/dev/null|grep 'Seconds_Behind_Master' | awk -F ":" '{print $2}'`
    
    Last_seen_transaction=`mysql -u${user} -p${pass} -h${host} -Ne "select LAST_SEEN_TRANSACTION from performance_schema.replication_applier_status_by_worker" 2>/dev/null`
    
    if [ $Errno -eq 1061 ] || [$Errno -eq 1062 ] || [$Errno -eq 1217 ] || [$Errno -eq 1050 ] 
       then
        	echo "----------------------"
    	echo $(date +'%Y-%m-%d %H:%M:%S')  && echo "Last_seen_transaction:${Last_seen_transaction}" &&  echo -e "Last_Errno:33[31m ${Errno}33[0m"
            mysql -u${user} -p${pass} -h${host} -e "STOP SLAVE;SET @@SESSION.GTID_NEXT='${Last_seen_transaction}';BEGIN; COMMIT;SET @@SESSION.GTID_NEXT = AUTOMATIC;START SLAVE;" 2>/dev/null 
        else
            sleep 3
    	echo "----------------------" 
    	echo  $(date +'%Y-%m-%d %H:%M:%S') && echo  "Seconds_Behind_Master: ${Seconds} " &&  echo -e "Last_Errno:33[32m ${Errno}33[0m"
    fi
    done
    

    3.6.3. 测试复制

    • 1.Master端插入一个测试数据

      • *master节点
    mysql root@localhost:ttt> insert into t1 values(null);
    Query OK, 1 row affected
    Time: 0.007s
    mysql root@localhost:ttt>
    
    • 2.查看Slave2上的数据是否同步

      • *slave节点
    mysql root@localhost:(none)> select * from ttt.t1;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    | 4 |
    | 5 |   -- 立即同步过来的数据
    +---+
    5 rows in set
    Time: 0.011s
    mysql root@localhost:(none)>
    

    至此,GTID复制出错的处理就完成了。

    注意:这里仅仅是 跳过错误 ,和原来的 sql_slave_skip_counter (该功能在GTID下失效)功能类似,无法保证主从数据是一致的(需要人工介入进行确认,比如仅仅主键一样,其他列不一样)

    如果出现了很多的GTID的错误,可能是从机上有大量的操作,建议 重新搭建主从复制
    但还是要在源头上避免此类情况的发生,确保在从机上开启 read_only=1 ,并且避免人工的误操作。 最后强调: GTID基于事务 的复制,一致性要求很高, 强烈建议 在 Slave 上开启 read_only=1

    3.7. 其他注意事项

    1. reset slave all
      使用该命令时 不会清空数据 ,仅仅是清空 show slave statusG 里面的信息,所以在使用该命令之前,请 先记录show slave statusG 的信息。
    2. GTID
      在开启GTID后,不能在一个事物中使用创建临时表的语句,需要使得 autocommit=1; 才可以。
      在开启GTID后,不能使用 create table select ... 的语法来创建表了,因为这其实是多个事物了,GTID没法对应

    四. 级联复制

    4.1. 级联复制的介绍

    slave Bmaster A 上复制, slave Csalve B 上复制,此时 slave B 上就要开启 log_slave_updates 产生binlog ,没有 binlog 则无法传递给
    salve c

    master 配置log_bin参数是当变更时产生binlog, 复制关系中slave获取master的binlog写入relay-log中,slave自身不产生binlog

    4.2. 级联复制的场景

    4.2.1. 跨机房的复制

    在跨机房搭建复制时,如果 master 挂了,深圳slave 提升为 New Master ,此时 北京slave 是不需要去做 CHANGE MASTER 操作的。

    缺点是复制的延迟会更大(跨机房的延迟本来就很难避免)。

    4.2.2. 库的拆分

    生产库 压力很大时或者需要做统计分析时,需要把 逻辑库拆分出去,就可以使用 级联复制 ,让 逻辑库 形成单独的库。

    4.3. 级联复制测试

    配置信息 主库(master) 从库(slave A) 从库(slave B) 从库(slave C)
    主机 node1.gczheng.com node2.gczheng.com node3.gczheng.com proxy.gczheng.com
    IP 192.168.88.88 192.168.88.99 192.168.88.100 192.168.88.77
    Port 3306 3306 3306 3306
    MySQL版本 MySQL5.7.18 MySQL5.7.18 MySQL5.7.18 MySQL5.7.18
    Server_ID 8888 8899 88100 8877
    • Master --> Slave A --> Slave C 是级联复制,Master到Slave A 和 Master到Slave B 之前已部署完,直接做 Slave A到Slave C 的级联复制

    • 1、在Master端新建一个DB名为 mts库 ,用于在 Slave3 上测试参数 replicate_do_db

      • master节点
    (root@localhost) 14:54:28 [tablespace]> create database mts;
    Query OK, 1 row affected (0.03 sec)
    
    (root@localhost) 14:54:33 [tablespace]> use mts;
    Database changed
    
    (root@localhost) 14:55:34 [mts]> create table t(a int);
    Query OK, 0 rows affected (0.07 sec)
    
    (root@localhost) 14:55:38 [mts]> insert into t values(1),(2),(3);
    Query OK, 3 rows affected (0.01 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    (root@localhost) 14:56:10 [mts]> select * from mts.t;
    +------+
    | a    |
    +------+
    |    1 |
    |    2 |
    |    3 |
    +------+
    3 rows in set (0.00 sec)
    
    (root@localhost) 14:56:22 [mts]>
    
    
    • 2、在Slave C上增加如下参数
    [mysqld]
    report-host=192.168.88.77 
    # 只复制mts的库 
    replicate_do_db=mts
    
    • 3、确认在slave A 已添加log_slave_updates,并建立通用账号'repl'@'192.168.88.%'
    [mysqld]
    log_slave_updates=1
    
    • 4、change master

      • Slave C 节点
    (root@localhost) 16:19:58 [mts]> change master to master_host="192.168.88.99", master_port=3306, master_user='repl', master_password='123456', master_auto_position=1;
    Query OK, 0 rows affected, 2 warnings (0.02 sec)
    
    (root@localhost) 16:20:31 [mts]> start slave;
    Query OK, 0 rows affected (0.03 sec)
    
    (root@localhost) 16:20:38 [mts]> show slave status G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.88.99
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000035
              Read_Master_Log_Pos: 2707
                   Relay_Log_File: proxy-relay-bin.000002
                    Relay_Log_Pos: 405
            Relay_Master_Log_File: binlog.000035
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: mts
              Replicate_Ignore_DB:
               Replicate_Do_Table:
           Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
                       Last_Errno: 0
                       Last_Error:
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 2707
                  Relay_Log_Space: 612
                  Until_Condition: None
                   Until_Log_File:
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File:
               Master_SSL_CA_Path:
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
                   Last_SQL_Errno: 0
                   Last_SQL_Error:
      Replicate_Ignore_Server_Ids:
                 Master_Server_Id: 8899
                      Master_UUID: 1951c7ee-c1fb-11e7-ac44-005056305232
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind:
          Last_IO_Error_Timestamp:
         Last_SQL_Error_Timestamp:
                   Master_SSL_Crl:
               Master_SSL_Crlpath:
               Retrieved_Gtid_Set:
                Executed_Gtid_Set: 1951c7ee-c1fb-11e7-ac44-005056305232:1-17,
    9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-29:47-58
                    Auto_Position: 1
             Replicate_Rewrite_DB:
                     Channel_Name:
               Master_TLS_Version:
    1 row in set (0.00 sec)
    
    ERROR:
    No query specified
    
    (root@localhost) 16:20:44 [mts]>
    
    

    复制关系:Master-->Slave A-->Slave C ,且使用了基于GTID的复制


    五.多源复制

    5.1. 多源复制的介绍

    多源复制官方文档

    多源复制 是指多主一从的复制,从 MySQL5.7.6 才有的功能,如下图所示:

    在语法层面上,只是在原来的change master 的基础上,增加了 for channel 'channel_name'

    5.2. 多源复制的演示

    主机信息:

    配置信息 主库(master A) 主库(master B) 从库(slave )
    主机 node1.gczheng.com node2.gczheng.com node3.gczheng.com
    IP 192.168.88.88 192.168.88.99 192.168.88.100
    Port 3306 3306 3306
    MySQL版本 MySQL5.7.18 MySQL5.7.18 MySQL5.7.18
    Server_ID 8888 8899 88100

    master A 和 master B 统一创建同步user `repl@'192.168.88.%'' password为123456

      1. master准备数据
      • master A 节点
    mysql root@localhost:(none)> create database dbn1;
    Query OK, 1 row affected
    Time: 0.011s
    】
    mysql root@localhost:(none)> use dbn1
    You are now connected to database "dbn1" as user "root"
    Time: 0.007s
    
    mysql root@localhost:dbn1> create table tn1(a int auto_increment primary key);
    Query OK, 0 rows affected
    Time: 0.062s
    
    mysql root@localhost:dbn1> insert into dbn1.tn1 values(null);
    Query OK, 1 row affected
    Time: 0.068s
    
    mysql root@localhost:dbn1> insert into dbn1.tn1 values(null);
    Query OK, 1 row affected
    Time: 0.014s
    
    mysql root@localhost:dbn1> select * from dbn1.tn1;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    +---+
    2 rows in set
    Time: 0.010s
    
    mysql root@localhost:dbn1> show master status G;
    ***************************[ 1. row ]***************************
    File              | binlog.000014
    Position          | 3953
    Binlog_Do_DB      |
    Binlog_Ignore_DB  |
    Executed_Gtid_Set | 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-64 --执行到64
    1 row in set
    Time: 0.007s
    mysql root@localhost:dbn1>
    
    
    - **master B 节点**
    
    mysql gcdb@localhost:(none)> create database dbn2;
    Query OK, 1 row affected
    Time: 0.010s
    
    mysql gcdb@localhost:(none)> use dbn2;
    You are now connected to database "dbn2" as user "gcdb"
    Time: 0.015s
    
    mysql gcdb@localhost:dbn2> create table tn2(a int auto_increment primary key);
    Query OK, 0 rows affected
    Time: 0.061s
    
    mysql gcdb@localhost:dbn2> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.015s
    
    mysql gcdb@localhost:dbn2> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.003s
    
    mysql gcdb@localhost:dbn2> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.003s
    
    mysql gcdb@localhost:dbn2> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.002s
    
    mysql gcdb@localhost:dbn2> select * from  dbn2.tn2;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    | 4 |
    +---+
    4 rows in set
    Time: 0.012s
    
    mysql gcdb@localhost:dbn2> show master status G;
    ***************************[ 1. row ]***************************
    File              | binlog.000001
    Position          | 1687
    Binlog_Do_DB      |
    Binlog_Ignore_DB  |
    Executed_Gtid_Set | 1951c7ee-c1fb-11e7-ac44-005056305232:1-7    --执行到7
    1 row in set
    Time: 0.012s
    mysql gcdb@localhost:dbn2>
    
    • 2.mysqldump 备份导出导入

      • Slave 节点
    [root@node3 ~]# mysqldump -ugcdb -piforgot -h192.168.88.99 -P 3306 -B dbn2 >/tmp/dbn2.sql
    [root@node3 ~]# mysqldump -ugcdb -piforgot -h192.168.88.88 -P 3306 -B dbn1 >/tmp/dbn1.sql
    
    [root@node3 tmp]# less dbn2.sql
    
    -----------省略其他输出------------
    
    --在恢复备份的时候,会自动执行该语句,就不需要我们手工跳过了,如果是mydumper之类的,需要手工跳过
    
    SET @@GLOBAL.GTID_PURGED='1951c7ee-c1fb-11e7-ac44-005056305232:1-7';
    
    ------------省略其他输出------------
    
    [root@node3 tmp]# mysql -uroot -piforgot < /tmp/dbn2.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1840 (HY000) at line 24: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
    
    --还是之前的问题,需要在Slave上,先操作一下 reset master ,以清空 @@GLOBAL.GTID_EXECUTED 
    
    [root@node3 tmp]# mysql -uroot -piforgot < /tmp/dbn2.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    
    [root@node3 tmp]# mysql -uroot -piforgot < /tmp/dbn1.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1840 (HY000) at line 24: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
    
    [root@node3 tmp]# mysql -uroot -piforgot < /tmp/dbn1.sql
    mysql: [Warning] Using a password on the command line interface can be insecure.
    [root@node3 tmp]#
    
    
    • 3 change master

    先在Slave上设置 需要复制的库,如果不设置的话,默认会同步系统的库(mysql),这样可能会复制出错(因为master amaster b上的mysql库中可能有相同的记录)。

    • Slave 节点
    [mysqld]
    
    # 只复制dbn1 和 dbn2,这个步骤很重要,且有多个时,必须分多行写
    
    replicate_do_db=dbn1 
    replicate_do_db=dbn2
    
    #mysql5.7多源复制必须添加的参数(不加报错),5.7版本之前不用加
    
    master_info_repository=TABLE
    relay_log_info_repository=TABLE
    
    (root@localhost) 13:57:46 [(none)]> change master to master_host='192.168.88.88', master_port=3306, master_user='repl', master_password='123456', master_auto_position=1 for channel 'ch1';
    Query OK, 0 rows affected, 2 warnings (0.01 sec)
    
    (root@localhost) 13:58:12 [(none)]> change master to master_host='192.168.88.99', master_port=3306, master_user='repl', master_password='123456', master_auto_position=1 for channel 'ch2';
    Query OK, 0 rows affected, 2 warnings (0.01 sec)
    
    (root@localhost) 13:59:19 [(none)]> start slave for channel 'ch1';
    Query OK, 0 rows affected (0.00 sec)
    
    (root@localhost) 13:59:59 [(none)]> start slave for channel 'ch2';
    Query OK, 0 rows affected (0.00 sec)
    
    (root@localhost) 14:39:13 [(none)]> show slave status for channel 'ch1' G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.88.88
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000014
              Read_Master_Log_Pos: 4349
                   Relay_Log_File: node3-relay-bin-ch1.000002
                    Relay_Log_Pos: 405
            Relay_Master_Log_File: binlog.000014
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: dbn1,dbn2
              Replicate_Ignore_DB:
               Replicate_Do_Table:
           Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
                       Last_Errno: 0
                       Last_Error:
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 4349
                  Relay_Log_Space: 616
                  Until_Condition: None
                   Until_Log_File:
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File:
               Master_SSL_CA_Path:
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
                   Last_SQL_Errno: 0
                   Last_SQL_Error:
      Replicate_Ignore_Server_Ids:
                 Master_Server_Id: 8888
                      Master_UUID: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind:
          Last_IO_Error_Timestamp:
         Last_SQL_Error_Timestamp:
                   Master_SSL_Crl:
               Master_SSL_Crlpath:
               Retrieved_Gtid_Set:
                Executed_Gtid_Set: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-66
                    Auto_Position: 1
             Replicate_Rewrite_DB:
                     Channel_Name: ch1
               Master_TLS_Version:
    1 row in set (0.00 sec)
    
    ERROR:
    No query specified
    
    (root@localhost) 14:41:49 [(none)]> show slave status for channel 'ch2' G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.88.99
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: binlog.000001
              Read_Master_Log_Pos: 2083
                   Relay_Log_File: node3-relay-bin-ch2.000002
                    Relay_Log_Pos: 801
            Relay_Master_Log_File: binlog.000001
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: dbn1,dbn2
              Replicate_Ignore_DB:
               Replicate_Do_Table:
           Replicate_Ignore_Table:
          Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
                       Last_Errno: 0
                       Last_Error:
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 2083
                  Relay_Log_Space: 1012
                  Until_Condition: None
                   Until_Log_File:
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File:
               Master_SSL_CA_Path:
                  Master_SSL_Cert:
                Master_SSL_Cipher:
                   Master_SSL_Key:
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error:
                   Last_SQL_Errno: 0
                   Last_SQL_Error:
      Replicate_Ignore_Server_Ids:
                 Master_Server_Id: 8899
                      Master_UUID: 1951c7ee-c1fb-11e7-ac44-005056305232
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind:
          Last_IO_Error_Timestamp:
         Last_SQL_Error_Timestamp:
                   Master_SSL_Crl:
               Master_SSL_Crlpath:
               Retrieved_Gtid_Set: 1951c7ee-c1fb-11e7-ac44-005056305232:8-9
                Executed_Gtid_Set: 9dc847d8-bf72-11e7-9ec4-000c2998e4f1:1-66
                    Auto_Position: 1
             Replicate_Rewrite_DB:
                     Channel_Name: ch2
               Master_TLS_Version:
    1 row in set (0.00 sec)
    
    ERROR:
    No query specified
    
    (root@localhost) 14:41:54 [(none)]>
    
    
    • 验证

      • master A 节点
    mysql root@localhost:(none)> insert into dbn1.tn1 values(null);
    Query OK, 1 row affected
    Time: 0.013s
    mysql root@localhost:(none)> insert into dbn1.tn1 values(null);
    Query OK, 1 row affected
    Time: 0.012s
    mysql root@localhost:(none)>
    
    
    • master B 节点
    mysql root@localhost:(none)> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.003s
    mysql root@localhost:(none)> insert into dbn2.tn2 values(null);
    Query OK, 1 row affected
    Time: 0.003s
    mysql root@localhost:(none)>
    
    
    • slave A 节点
    (root@localhost) 14:41:54 [(none)]> select * from dbn1.tn1;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    | 4 |
    +---+
    4 rows in set (0.00 sec)
    
    (root@localhost) 14:45:14 [(none)]> select * from dbn2.tn2;
    +---+
    | a |
    +---+
    | 1 |
    | 2 |
    | 3 |
    | 4 |
    | 5 |
    | 6 |
    +---+
    6 rows in set (0.00 sec)
    
    (root@localhost) 14:45:17 [(none)]>
    
    

    至此,多源复制的主从搭建就完成了。

    5.3.使用场景

    如果 Master B 上也有一个 dbn1 的库,会有问题么?

    • 如果不做额外的配置,是 会有错误 的;

    • 如果配置了 slave_skip_errors = ddl_exist_errors ,且 没有重复数据 话,复制关系还是正常的。

      这种操作可以起到 数据聚合 的效果。将分库分表后的数据聚合在一起,以供其他应用进行分析(前提是数据不能有重复)。
      但是最合适的场景还是将不同的库进行复制。

    • 中间件的unique key

    中间件可以保证分区键是唯一的(比如order_id),但是对于其他唯一索引来说,需要业务层去保证

  • 相关阅读:
    <%!%>和<%%>的区别
    <jsp:param>传参乱码问题
    RedHat6.4 安装yum源
    J2EE中getParameter与getAttribute以及对应的EL表达式
    Ubuntu xrdp 遠端桌面連線 a
    Ubuntu SSH server 快速安裝和設定 a
    Ubuntu 12.04 快速安裝 Scrapy a
    Linux 壓縮與解壓縮指令備忘錄 a
    vim 設定快捷鍵為 [Ctrl] + [s] 存檔 a
    Linux 常用指令備忘錄 a
  • 原文地址:https://www.cnblogs.com/gczheng/p/8480210.html
Copyright © 2011-2022 走看看