专职DBA-MySQL主从延时复制 作者:周万春 微信:lovemysql3306 本次实验环境延用MySQL主从异步复制的搭建环境 MySQL主从复制生产环境的常见延迟原因 易导致复制延迟的原因: 1.一个主库的从库太多 2.从库硬件比主库查 3.慢SQL语句过多 4.主从复制的设计问题 5.主从复制之间的网络延迟 6.主库读写压力太大 7.业务上大事务太多 MySQL主从复制数据一致性企业级方案 1.采用半同步复制方案 2.当复制发生延迟时让程序改读主库 MySQL多线程复制解决复制延迟实践 (1).查看当前slave服务器的SQL线程状态 Slave [(none)]> show processlist; (2).配置多线程复制 默认为0表示单线程复制 停止主从复制,在线修改线程数 Slave [(none)]> stop slave; Slave [(none)]> set global slave_parallel_workers = 4; Slave [(none)]> show global variables like "%parallel%"; +------------------------------+---------------+ | Variable_name | Value | +------------------------------+---------------+ | innodb_parallel_read_threads | 4 | | slave_parallel_type | LOGICAL_CLOCK | | slave_parallel_workers | 4 | +------------------------------+---------------+ 3 rows in set (0.02 sec) (3).启动主从复制,查看SQL线程数 Slave [(none)]> start slave; Slave [(none)]> show processlist; (4).想永久生效就写入my.cnf [root@db02 ~]# vim /data/mysql/mysql3306/conf/my3306.cnf [mysqld] slave_parallel_workers = 4 让MySQL主从复制的从库只读访问 1.read_only参数允许数据库更新的条件 (1).具有super权限的用户可以更新,不受read_only参数影响。 例如:root@localhost (2).来自从服务器具备主从复制权限的线程可以更新,不受read-only参数的影响。例如:repl@% 在my.cnf文件中配置 [root@db02 ~]# vim /data/mysql/mysql3306/conf/my3306.cnf [mysqld] read_only 然后重启数据库 [root@db02 ~]# mysqladmin.shutdown [root@db02 ~]# mysqld.start MySQL主从复制读写分离Web用户生产设置方案 在配置好MySQL主从复制,并实现了读写分离以后,数据库授权程序访问的用户设置方法: 1.主库和从库使用不同的用户,授予不同的权限。 主库上对web_w用户的授权 grant select,insert,update,delete on `web`.* to 'web_w'@'10.0.0.%' identified by '123'; 从库上对web_r用户的授权 grant select on `web`.* to 'web_r'@'10.0.0.%' identified by '123'; 2.网站程序访问主库和从库时使用一套用户密码。 (1).主库和从库使用相同的用户,但授予不同的权限。 忽略主库的mysql授权库同步 [root@db01 ~]# vim /data/mysql/mysql3306/conf/my3306.cnf binlog-ignore-db = mysql #mysql库不记录binlog日志 replicate-ignore-db = mysql #忽略复制mysql库 在主库上创建完web用户和权限之后,在从库上revoke回收对应的更新权限 主库:grant select,insert,update,delete on `web`.* to 'web'@'10.0.0.%' identified by '123'; 从库:grant select on `web`.* to 'web'@'10.0.0.%' identified by '123'; 在从库上设置read-only参数,让从库只读 [root@db02 ~]# vim /data/mysql/mysql3306/conf/my3306.cnf [mysqld] read_only 然后重启数据库 [root@db02 ~]# mysqladmin.shutdown [root@db02 ~]# mysqld.start MySQL主从延时复制方案及恢复实践 [root@db02 ~]# mysql.dba Slave [(none)]> stop slave; Slave [(none)]> change master to master_delay = 60; Slave [(none)]> start slave; Slave [(none)]> show slave statusG *************************** 1. row *************************** SQL_Delay: 60 #延迟60秒进行复制 SQL_Remaining_Delay: NULL #还剩多少秒执行复制 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates #sql线程状态 [root@db01 ~]# mysql.dba Master [(none)]> create database app02; Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | mysql | | performance_schema | | sys | +--------------------+ 5 rows in set (0.00 sec) 但是中继日志里面已经有创建的语句了,说明IO线程还是实时在工作的。 [root@db02 ~]# cd /data/mysql/mysql3306/data/ [root@db02 /data/mysql/mysql3306/data]# mysqlbinlog db02-relay-bin.000002 |less SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; create database app02 /*!*/; 过了1分钟后 Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | app02 | | mysql | | performance_schema | | sys | +--------------------+ 6 rows in set (0.00 sec) MySQL的延时复制实际上影响的只是SQL线程将数据应用到从库。 而IO线程早已把主库更新的数据写到了从库的中继日志里面。 因此,在延时复制期间,即使主库宕机了,从库到了延时复制的时间,也依然会把数据更新到与主库宕机时一致。 使用MySQL主从延时复制进行数据恢复实践 1.模拟环境,将从库延迟调整为3600秒 [root@db02 ~]# mysql.dba Slave [(none)]> stop slave; Slave [(none)]> change master to master_delay = 3600; Slave [(none)]> start slave; Slave [(none)]> show slave statusG 2.模拟在主库写入数据,每隔5秒写入一个库,就当是模拟用户写入数据了 for n in {03..05} do mysql.dba -e "create database app$n;" sleep 5 done [root@db01 ~]# for n in {03..05} > do > mysql -S /data/mysql/3306/mysql.sock -p123 -e "create database app$n;" > sleep 5 > done [root@db01 ~]# 3.模拟人为破坏数据,也可以是不带where的update语句。 Master [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | app02 | | app03 | | app04 | | app05 | | mysql | | performance_schema | | sys | +--------------------+ 9 rows in set (0.01 sec) 删除app05数据库,后面要做的就是把这个数据库恢复回来,别的数据还得保留。 Master [(none)]> drop database app05; Query OK, 0 rows affected (0.01 sec) 现在,所有的从库都已经是坏数据了,只有延迟从库是好的,但是是一个小时之前的数据。 4.当数据库出现误删数据的情况时,特别是update不加条件破坏数据,要想完整恢复数据,最好选择对外停止访问措施,此时需要牺牲用户体验了。 [root@db01 ~]# iptables -I INPUT -p tcp --dport 3306 ! -s 10.0.0.11 -j DROP 非10.0.0.11禁止访问数据库3306端口。 5.登录主库从库查看binlog发送接收进行确认。 Master [(none)]> show processlist; +----+------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+ | 6 | repl | 10.0.0.12:60192 | NULL | Binlog Dump GTID | 272 | Master has sent all binlog to slave; waiting for more updates | NULL | | 10 | root | localhost | NULL | Query | 0 | starting | show processlist | +----+------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+ 2 rows in set (0.00 sec) 从库 Slave [(none)]> show processlist; +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ | 9 | root | localhost | NULL | Query | 0 | starting | show processlist | | 22 | system user | | NULL | Connect | 311 | Waiting for master to send event | NULL | | 23 | system user | | NULL | Connect | 211 | Waiting until MASTER_DELAY seconds after master executed event | NULL | | 24 | system user | | NULL | Connect | 311 | Waiting for an event from Coordinator | NULL | | 25 | system user | | NULL | Connect | 311 | Waiting for an event from Coordinator | NULL | | 26 | system user | | NULL | Connect | 311 | Waiting for an event from Coordinator | NULL | | 27 | system user | | NULL | Connect | 311 | Waiting for an event from Coordinator | NULL | +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ 7 rows in set (0.00 sec) Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | app02 | | mysql | | performance_schema | | sys | +--------------------+ 6 rows in set (0.01 sec) 6.在从库上停止主从复制,并查看数据库是否已同步过来。 Slave [(none)]> stop slave sql_thread; Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | app02 | | mysql | | performance_schema | | sys | +--------------------+ 6 rows in set (0.00 sec) 因为还未到延迟时间,所以数据不会复制到该延迟从库。 7.根据relay-log.info记录的SQL线程读取relay-log的位置,解析未应用到从库的relay-bin日志。 进入中继日志所在的目录 [root@db02 ~]# cd /data/mysql/mysql3306/data/ [root@db02 /data/mysql/mysql3306/data]# ls -l *relay* -rw-r----- 1 mysql mysql 206 Jul 16 01:55 db02-relay-bin.000001 中继日志 -rw-r----- 1 mysql mysql 1282 Jul 16 02:05 db02-relay-bin.000002 中继日志 -rw-r----- 1 mysql mysql 48 Jul 16 01:55 db02-relay-bin.index 中继日志索引 -rw-r----- 1 mysql mysql 61 Jul 16 02:09 relay-log.info 线程读取中继日志的位置信息 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.1 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.2 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.3 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.4 [root@db02 /data/mysql/mysql3306/data]# cat relay-log.info 7 ./db02-relay-bin.000002 #SQL线程读取中继日志的文件名信息 320 #SQL线程读取中继日志的位置点信息 mysql-bin.000005 350 3600 4 1 8.解析SQL线程未回放的全部剩余relay-bin中继日志数据。 [root@db02 /data/mysql/mysql3306/data]# mysqlbinlog --start-position=320 db02-relay-bin.000002 > /tmp/relay.sql 9.找到破坏数据库的SQL语句,并从已解析的SQL语句中将其删除掉,这里使用的是"drop database app05" [root@db02 ~]# egrep "drop database app05" /tmp/relay.sql drop database app05 [root@db02 ~]# sed -i '/drop database app05/d' /tmp/relay.sql [root@db02 ~]# egrep "drop database app05" /tmp/relay.sql [root@db02 ~]# egrep "^drop database app05" /tmp/relay.sql 10.将解析后并处理好的relay.sql数据文件恢复到延迟从库。 [root@db02 ~]# mysql.dba Slave [(none)]> set global read_only=off; Slave [(none)]> set global super_read_only=off; Slave [(none)]> source /tmp/relay.sql; Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app01 | | app02 | | app03 | | app04 | | app05 | | mysql | | performance_schema | | sys | +--------------------+ 9 rows in set (0.01 sec) 之前的删除的app05数据库已经恢复找回来了!!! 利用延时从库恢复数据库完毕,此时还需要将此从库切换为主库,作为新主库对外提供用户访问。再对其他遭到破坏的主从数据库进行修复。 Slave [(none)]> start slave sql_thread; ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is '12344d69-e756-11e9-b42c-000c29a86f2f:9'. Slave [(none)]> set gtid_next='12344d69-e756-11e9-b42c-000c29a86f2f:9'; Slave [(none)]> begin;commit; Slave [(none)]> set gtid_next='automatic'; Slave [(none)]> start slave; Slave [(none)]> show slave statusG