
MySQL MHA

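Introduction to MHA for MySQL High Availability

I. Introduction to MHA for MySQL High Availability

1.1.1 Introduction to MHA for MySQL High Availability

About MHA

Software overview
MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a MySQL replication environment. When a master fails, MHA can complete the failover automatically within 10 to 30 seconds, and while doing so it preserves data consistency as far as possible.
MHA also supports online master switching: it can safely move the role of the running master to a new master (by promoting a slave), in roughly 0.5 to 2 seconds.
The software has two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master at a regular interval; when the master fails, it promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The whole failover is designed to be transparent to applications.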

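Advantages of MHA

1) Master failover and slave promotion can be done very quickly
Automatic failover is fast.
2) Master crash does not result in data inconsistency
A master crash does not cause data-consistency problems.
3) No need to modify current MySQL settings (MHA works with regular MySQL)
No major changes to the existing MySQL environment are required.
4) No need to increase lots of servers
No extra servers are needed (a single manager can manage a hundred or more replication groups).
5) No performance penalty
MHA works with both semi-synchronous and asynchronous replication. To monitor MySQL it only sends a ping to the master every N seconds (3 by default), so the overhead is negligible. Performance is effectively the same as a plain master-slave replication setup.
6) Works with any storage engine
MHA supports every storage engine that replication supports; it is not limited to InnoDB.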

MHA deployment environment (prerequisite: master-slave replication is already set up and healthy)

Host	IP	Role
db01	10.4.7.51	Master
db02	10.4.7.52	Slave
db03	10.4.7.53	Slave (manager)

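Environment preparation

Link the key binaries into /usr/bin (on every node). Note that ln without -s creates hard links, as the link count of 2 in the listing shows; ln -s would create symbolic links instead.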
[root@db01 /server/tools]# ln /application/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
[root@db01 /server/tools]# ln /application/mysql/bin/mysql /usr/bin/mysql
[root@db01 /server/tools]# ll /usr/bin/mysql*
-rwxr-xr-x 2 mysql mysql 10423101 Apr 13  2019 /usr/bin/mysql
-rwxr-xr-x 2 mysql mysql 11310574 Apr 13  2019 /usr/bin/mysqlbinlog

Set up SSH key trust between all nodes (the manager and every node)
[root@db01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:uQWCxD7h9GGC61OnRopzLPUSzehCQbbVuAhGr9/Em4c root@db01
The key's randomart image is:
+---[RSA 2048]----+
|o+ ++            |
|oo=o=oo          |
|o.oXo=...        |
| o*.X o. o       |
|.B * *  S .      |
|= X * +  o       |
| = = E ..        |
|      .          |
|                 |
+----[SHA256]-----+

On db01 (both the public and the private key are copied to the other nodes, so prepare everything on db01):
[root@db01 ~]# cd ~/.ssh/
[root@db01 ~/.ssh]# ll
total 12
-rw------- 1 root root 1679 Jun 30 00:47 id_rsa
-rw-r--r-- 1 root root  391 Jun 30 00:47 id_rsa.pub
-rw-r--r-- 1 root root  342 Jun 30 00:52 known_hosts
[root@db01 ~/.ssh]# cp id_rsa.pub authorized_keys
[root@db01 ~/.ssh]# ll
total 16
-rw-r--r-- 1 root root  391 Jun 30 01:05 authorized_keys
-rw------- 1 root root 1679 Jun 30 00:47 id_rsa
-rw-r--r-- 1 root root  391 Jun 30 00:47 id_rsa.pub
-rw-r--r-- 1 root root  342 Jun 30 00:52 known_hosts

Copy the public and private keys to db02:
[root@db01 ~/.ssh]# scp -rp /root/.ssh/* 10.4.7.52:/root/.ssh/
id_rsa                                                                                             100% 1679     1.8MB/s   00:00    
id_rsa.pub                                                                                         100%  391   206.6KB/s   00:00    
known_hosts                                                                                        100%  342   369.9KB/s   00:00    
authorized_keys                                                                                    100%  391   336.7KB/s   00:00 

Copy the public and private keys to db03:
[root@db01 ~/.ssh]# scp -rp /root/.ssh/* 10.4.7.53:/root/.ssh/
id_rsa                                                                                             100% 1679     1.7MB/s   00:00    
id_rsa.pub                                                                                         100%  391   501.2KB/s   00:00    
known_hosts                                                                                        100%  342   285.8KB/s   00:00    
authorized_keys                                                                                    100%  391    14.9KB/s   00:00 
Verify:
db01
[root@db01 ~/.ssh]# ssh 10.4.7.52 hostname
db02
[root@db01 ~/.ssh]# ssh 10.4.7.53 hostname
db03

db02
[root@db02 ~]# ssh 10.4.7.51 hostname
db01
[root@db02 ~]# ssh 10.4.7.53 hostname
db03

db03
[root@db03 ~]# ssh 10.4.7.51 hostname
db01
[root@db03 ~]# ssh 10.4.7.52 hostname
db02
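
The manual checks above cover every ordered pair of hosts. As a minimal sketch (not part of MHA), the same list of checks can be generated rather than typed by hand. The function name ssh_check_commands is hypothetical, and the -o BatchMode=yes option is added here as an assumption so that a missing key fails immediately instead of prompting for a password.

```python
from itertools import permutations

# Host list taken from the environment table in this article.
HOSTS = ["10.4.7.51", "10.4.7.52", "10.4.7.53"]


def ssh_check_commands(hosts, user="root"):
    """Return one shell command per ordered (source, target) pair of distinct hosts."""
    commands = []
    for src, dst in permutations(hosts, 2):
        # Hop to src first, then try to reach dst from there.
        commands.append(
            f"ssh {user}@{src} ssh -o BatchMode=yes {user}@{dst} hostname"
        )
    return commands


for cmd in ssh_check_commands(HOSTS):
    print(cmd)
```

Three hosts give six commands, which matches the six connections that masterha_check_ssh reports further down.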

Download and install MHA

GitHub download page: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
Network-drive mirror: https://pan.baidu.com/s/1xfr0IQeu6Z9Ct8FiLRhdbw
Extraction code: 6fbi
[root@db01 /server/tools]# unzip MHA-2019-6.28.zip
[root@db01 /server/tools]# ls 
Atlas-2.2.1.el6.x86_64.rpm  MHA-2019-6.28.zip                        mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
email_2019-最新.zip         mha4mysql-manager-0.56-0.el6.noarch.rpm  percona-xtrabackup-24-2.4.12-1.el7.x86_64.rpm
master_ip_failover.txt      mha4mysql-node-0.56-0.el6.noarch.rpm

Install the Node package and its dependency on all nodes (db01, db02, db03)
[root@db01 /server/tools]# yum install perl-DBD-MySQL -y
[root@db01 /server/tools]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-node-0.56-0.el6        ################################# [100%]
[root@db01 /server/tools]# rpm -qa mha4mysql-node
mha4mysql-node-0.56-0.el6.noarch

Create an MHA management user on the master

db01 [(none)]>grant all privileges on *.* to mha@'10.4.7.%' identified by 'mha';
Query OK, 0 rows affected, 1 warning (0.00 sec)
db01 [(none)]>flush privileges;
Query OK, 0 rows affected (0.01 sec)

db01 [(none)]>select user,host from mysql.user;
+---------------+-----------+
| user          | host      |
+---------------+-----------+
| mha           | 10.4.7.%  |
| rep           | 10.4.7.%  |
| mysql.session | localhost |
| mysql.sys     | localhost |
| root          | localhost |
+---------------+-----------+
5 rows in set (0.01 sec)

Install the manager and its dependencies on slave db03 (in production the manager runs on a dedicated host)

[root@db03 /server/tools]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
[root@db03 /server/tools]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-manager-0.56-0.el6     ################################# [100%]
[root@db03 /server/tools]# rpm -qa mha4mysql-manager
mha4mysql-manager-0.56-0.el6.noarch

Prepare the configuration file (db03)

Create the configuration directory
[root@db03 /server/tools]# mkdir -p /etc/mha
Create the log directory
[root@db03 /server/tools]# mkdir -p /var/log/mha/app1
Create the configuration file
[root@db03 /server/tools]# vim /etc/mha/app1.cnf
[root@db03 /server/tools]# cat /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager           #manager log file
manager_workdir=/var/log/mha/app1               #manager working directory
master_binlog_dir=/application/mysql/log_bin    #directory holding the master's binlog files
user=mha                                        #MHA management user
password=mha                                    #password of the MHA user
ping_interval=2                                 #check the master's heartbeat (status) every 2 seconds
repl_password=123456                            #password of the replication user
repl_user=rep                                   #replication user
ssh_user=root                                   #SSH user
[server1]
hostname=10.4.7.51
port=3306
[server2]
hostname=10.4.7.52
candidate_master=1                             #candidate for the new master
port=3306
[server3]
hostname=10.4.7.53
port=3306
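
The layout of this file is plain INI, so it can be inspected with standard tooling. The following is a minimal sketch, assuming Python's configparser and a shortened copy of the file above; load_servers is a hypothetical helper and is not part of MHA, which reads the file with its own Perl code.

```python
import configparser

# Shortened copy of /etc/mha/app1.cnf as shown above.
SAMPLE = """
[server default]
user=mha
ping_interval=2
[server1]
hostname=10.4.7.51
port=3306
[server2]
hostname=10.4.7.52
candidate_master=1    #candidate for the new master
port=3306
[server3]
hostname=10.4.7.53
port=3306
"""


def load_servers(text):
    """Return the [serverN] sections as a list of dicts, in file order."""
    # Treat " #..." after a value as a comment, as in the annotated listing.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    servers = []
    for name in parser.sections():
        if name == "server default" or not name.startswith("server"):
            continue
        section = parser[name]
        servers.append({
            "section": name,
            "hostname": section["hostname"],
            "port": section.getint("port", fallback=3306),
            "candidate_master": section.get("candidate_master", fallback="0") == "1",
        })
    return servers


for server in load_servers(SAMPLE):
    print(server)
```

A check like this is handy after a failover, when a server section has been removed from the file and needs to be added back.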

Aside: checking MySQL status with mysqladmin
[root@db01 ~]# mysqladmin -uroot -p123456 ping 
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqld is alive

The running threads can be listed as well:
[root@db01 ~]# mysqladmin -uroot -p123456 processlist
mysqladmin: [Warning] Using a password on the command line interface can be insecure.
+----+------+-----------------+----+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host            | db | Command          | Time | State                                                         | Info             |
+----+------+-----------------+----+------------------+------+---------------------------------------------------------------+------------------+
| 2  | rep  | 10.4.7.53:5637  |    | Binlog Dump GTID | 9102 | Master has sent all binlog to slave; waiting for more updates |                  |
| 3  | rep  | 10.4.7.52:14832 |    | Binlog Dump GTID | 9102 | Master has sent all binlog to slave; waiting for more updates |                  |
| 6  | root | localhost       |    | Query            | 0    | starting                                                      | show processlist |
+----+------+-----------------+----+------------------+------+---------------------------------------------------------------+------------------+

Status checks (db03)

[root@db03 /server/tools]# masterha_check_ssh  --conf=/etc/mha/app1.cnf
Mon Jun 29 20:24:57 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 29 20:24:57 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Jun 29 20:24:57 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Jun 29 20:24:57 2020 - [info] Starting SSH connection tests..
Mon Jun 29 20:24:58 2020 - [debug] 
Mon Jun 29 20:24:57 2020 - [debug]  Connecting via SSH from root@10.4.7.51(10.4.7.51:22) to root@10.4.7.52(10.4.7.52:22)..
Mon Jun 29 20:24:57 2020 - [debug]   ok.
Mon Jun 29 20:24:57 2020 - [debug]  Connecting via SSH from root@10.4.7.51(10.4.7.51:22) to root@10.4.7.53(10.4.7.53:22)..
Mon Jun 29 20:24:57 2020 - [debug]   ok.
Mon Jun 29 20:24:58 2020 - [debug] 
Mon Jun 29 20:24:57 2020 - [debug]  Connecting via SSH from root@10.4.7.52(10.4.7.52:22) to root@10.4.7.51(10.4.7.51:22)..
Mon Jun 29 20:24:57 2020 - [debug]   ok.
Mon Jun 29 20:24:57 2020 - [debug]  Connecting via SSH from root@10.4.7.52(10.4.7.52:22) to root@10.4.7.53(10.4.7.53:22)..
Mon Jun 29 20:24:58 2020 - [debug]   ok.
Mon Jun 29 20:24:59 2020 - [debug] 
Mon Jun 29 20:24:58 2020 - [debug]  Connecting via SSH from root@10.4.7.53(10.4.7.53:22) to root@10.4.7.51(10.4.7.51:22)..
Mon Jun 29 20:24:58 2020 - [debug]   ok.
Mon Jun 29 20:24:58 2020 - [debug]  Connecting via SSH from root@10.4.7.53(10.4.7.53:22) to root@10.4.7.52(10.4.7.52:22)..
Mon Jun 29 20:24:58 2020 - [debug]   ok.
Mon Jun 29 20:24:59 2020 - [info] All SSH connection tests passed successfully.

[root@db03 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf 
Mon Jun 29 21:07:59 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 29 21:07:59 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Jun 29 21:07:59 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Jun 29 21:07:59 2020 - [info] MHA::MasterMonitor version 0.56.
Mon Jun 29 21:08:00 2020 - [info] GTID failover mode = 1
Mon Jun 29 21:08:00 2020 - [info] Dead Servers:
Mon Jun 29 21:08:00 2020 - [info] Alive Servers:
Mon Jun 29 21:08:00 2020 - [info]   10.4.7.51(10.4.7.51:3306)
Mon Jun 29 21:08:00 2020 - [info]   10.4.7.52(10.4.7.52:3306)
Mon Jun 29 21:08:00 2020 - [info]   10.4.7.53(10.4.7.53:3306)
Mon Jun 29 21:08:00 2020 - [info] Alive Slaves:
Mon Jun 29 21:08:00 2020 - [info]   10.4.7.52(10.4.7.52:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 21:08:00 2020 - [info]     GTID ON
Mon Jun 29 21:08:00 2020 - [info]     Replicating from 10.4.7.51(10.4.7.51:3306)
Mon Jun 29 21:08:00 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 21:08:00 2020 - [info]   10.4.7.53(10.4.7.53:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 21:08:00 2020 - [info]     GTID ON
Mon Jun 29 21:08:00 2020 - [info]     Replicating from 10.4.7.51(10.4.7.51:3306)
Mon Jun 29 21:08:00 2020 - [info] Current Alive Master: 10.4.7.51(10.4.7.51:3306)
Mon Jun 29 21:08:00 2020 - [info] Checking slave configurations..
Mon Jun 29 21:08:00 2020 - [info]  read_only=1 is not set on slave 10.4.7.52(10.4.7.52:3306).
Mon Jun 29 21:08:00 2020 - [info]  read_only=1 is not set on slave 10.4.7.53(10.4.7.53:3306).
Mon Jun 29 21:08:00 2020 - [info] Checking replication filtering settings..
Mon Jun 29 21:08:00 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Jun 29 21:08:00 2020 - [info]  Replication filtering check ok.
Mon Jun 29 21:08:00 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Jun 29 21:08:00 2020 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 29 21:08:00 2020 - [info] HealthCheck: SSH to 10.4.7.51 is reachable.
Mon Jun 29 21:08:00 2020 - [info] 
10.4.7.51(10.4.7.51:3306) (current master)
 +--10.4.7.52(10.4.7.52:3306)
 +--10.4.7.53(10.4.7.53:3306)

Mon Jun 29 21:08:00 2020 - [info] Checking replication health on 10.4.7.52..
Mon Jun 29 21:08:00 2020 - [info]  ok.
Mon Jun 29 21:08:00 2020 - [info] Checking replication health on 10.4.7.53..
Mon Jun 29 21:08:00 2020 - [info]  ok.
Mon Jun 29 21:08:00 2020 - [warning] master_ip_failover_script is not defined.
Mon Jun 29 21:08:00 2020 - [warning] shutdown_script is not defined.
Mon Jun 29 21:08:00 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

9. Start MHA (db03)

[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[2] 2012

Check the MHA status:
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2012) is running(0:PING_OK), master:10.4.7.51
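
For scripted monitoring, the status line can be parsed instead of read by eye. This is a minimal sketch that assumes the running-state format shown above; parse_status is a hypothetical helper, and any line in a different format simply yields None.

```python
import re

# Matches lines such as:
#   app1 (pid:2012) is running(0:PING_OK), master:10.4.7.51
STATUS_RE = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)


def parse_status(line):
    """Return the fields of a 'running' status line, or None if it does not match."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["code"] = int(info["code"])
    return info


print(parse_status("app1 (pid:2012) is running(0:PING_OK), master:10.4.7.51"))
```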

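10. MHA ships a number of utilities. The common ones are:

Manager node:
  masterha_check_ssh: checks the SSH environment MHA depends on;
  masterha_check_repl: checks the MySQL replication environment;
  masterha_manager: the main MHA service program;
  masterha_check_status: reports the running state of MHA;
  masterha_master_monitor: monitors the availability of the MySQL master;
  masterha_master_switch: switches the master node;
  masterha_conf_host: adds or removes a configured node;
  masterha_stop: stops the MHA service.
Node (these tools are normally invoked by MHA Manager scripts; no manual action is needed):
  save_binary_logs: saves and copies the master's binary logs;
  apply_diff_relay_logs: identifies differential relay-log events and applies them to the other slaves;
  purge_relay_logs: purges relay logs (without blocking the SQL thread).
Custom extensions:
  secondary_check_script: checks master availability over multiple network routes;
  master_ip_failover_script: updates the master IP used by applications;
  report_script: sends reports;
  init_conf_load_script: loads initial configuration parameters;
  master_ip_online_change_script: updates the master node's IP address.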

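1.1.2 Evolution of master-slave replication architectures

1. Basic master-slave (no dependency on any other software)

One master, one slave
One master, multiple slaves

Multi-level (cascading) master-slave
--------------> Most small and medium-sized companies still use the architectures above; some use RDS
Dual master
--------------> Medium-sized companies, combined with high availability (MMM) or distributed architectures (Mycat, DBLE)
Ring
Multiple masters, one slave
--------------> Almost nobody uses these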

High-performance architecture: read/write splitting
mysql-proxy  --->  0.8, project discontinued
360          --->  Atlas and Atlas-sharding, built on mysql-proxy; not updated since 2016
MySQL        --->  mysql-router 
Percona      --->  ProxySQL 
Mariadb      --->  Maxscale

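High-availability architecture
1. Enterprise availability levels (annual uptime) and the yearly downtime each one allows
99%         (1 - 99%) x 365 d = 3.65 d = 87.6 hours
99.9%       (1 - 99.9%) x 365 d = 0.365 d = 8.76 hours                                ---> Internet grade
99.99%      (1 - 99.99%) x 365 d = 0.0365 d = 0.876 hours (about 52.6 minutes)        ---> near-financial grade
99.999%     (1 - 99.999%) x 365 d = 0.00365 d = 0.0876 hours (about 5.3 minutes)      ---> financial grade
99.9999%    (1 - 99.9999%) x 365 d = 0.000365 d = 0.00876 hours (about 31.5 seconds)  ---> "zero" downtime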
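
The downtime figures follow directly from the availability percentage. A short sketch of the arithmetic, assuming a 365-day year as in the table (annual_downtime_hours is a name chosen here for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, using the 365-day year from the table


def annual_downtime_hours(availability_percent):
    """Hours of downtime per year permitted by a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


for level in (99.0, 99.9, 99.99, 99.999, 99.9999):
    hours = annual_downtime_hours(level)
    print(f"{level}% -> {hours:.5f} hours ({hours * 60:.2f} minutes)")
```

The values are floating-point approximations, so compare them with a tolerance rather than for exact equality.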

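2. High-availability products
(1) Load balancing
	 LVS, F5, nginx: provide a degree of high availability
(2) Active-standby systems (single active node)
	 KA, HA (RoseHA, RHCS), PowerHA, mc_sg, MHA, MMM: can deliver three to four nines
(3) Multi-active systems
	 PXC (free), MGC (free), MySQL Cluster (paid), InnoDB Cluster (8.0, free)
	 Oracle RAC (paid)
	 Sybase cluster
	 DB2 Cluster
	 
3. Distributed architecture (the current trend)
Mycat 1.65
DBLE (built on Mycat)

4. NewSQL
RDBMS + NoSQL + distributed
sp 
TiDB
SequoiaDB
PolarDB
OceanBase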

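MHA architecture model

Manager: the MHA management node; it switches to a slave when the master goes down
Node: a node managed by MHA
MySQL replication (one master, two slaves, three separate hosts)
MHA software components

Manager:
masterha_manager            starts MHA
masterha_check_ssh          checks the SSH configuration for MHA
masterha_check_repl         checks MySQL replication
masterha_master_monitor     detects whether the master is down
masterha_check_status       reports the current running state of MHA
masterha_master_switch      controls failover (automatic or manual)
masterha_conf_host          adds or removes configured server entries

Node:
These tools are normally invoked by MHA Manager scripts; no manual action is needed
save_binary_logs            saves and copies the master's binary logs
apply_diff_relay_logs       identifies differential relay-log events and applies them to the other slaves
purge_relay_logs            purges relay logs (without blocking the SQL thread)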

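How the software works

Manager startup
(1) Read the configuration file given by --conf=/etc/mha/app1.cnf
(2) Obtain the node information (one master, two slaves)
(3) Call masterha_check_ssh to verify mutual SSH trust as ssh_user=root
(4) Call masterha_check_repl to check replication
(5) The manager has started successfully.
(6) masterha_master_monitor keeps monitoring the master at ping_interval=2:
	 network, host, and database status (mha)
(7) The manager detects that the master is down
(8) Master election begins
	 Rule 1: check whether a "forced master" parameter is set
	 Rule 2: determine which of the two slaves is more up to date
	 Rule 3: fall back to the order in the configuration file
(9) Check whether the dead master is reachable over SSH
	Reachable: S1 and S2 immediately save (save_binary_logs) the missing part of the binlog locally
	Not reachable:
		In classic (position-based) mode: call apply_diff_relay_logs to compute the relay-log difference between S1 and S2;
		this requires a complex comparison of the log contents
		In GTID mode: call apply_diff_relay_logs to compute the relay-log difference between S1 and S2;
		only the GTID numbers need to be compared, which is more efficient
	Finally, apply the missing data (data compensation)
(10) Remove the slave role from S1
(11) S2 and S1 form a new master-slave relationship
(12) Remove the failed node from the configuration file
(13) The manager's work is done and it exits (one-shot high availability).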
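
Steps (6) and (7) amount to a polling loop with a failure threshold. The sketch below illustrates that idea only; it is not MHA's implementation. The threshold of three consecutive failures is an assumption taken from the parameter notes in this article, and monitor, ping and max_checks are hypothetical names. The ping callable is injected so the loop can be exercised without a database.

```python
def monitor(ping, max_failures=3, ping_interval=2,
            sleep=lambda seconds: None, max_checks=100):
    """Poll ping() until it fails max_failures times in a row.

    Returns the number of the check that triggered failover, or None if the
    master stayed healthy for max_checks checks.
    """
    failures = 0
    for check in range(1, max_checks + 1):
        if ping():
            failures = 0          # any success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                return check      # the caller would start failover here
        sleep(ping_interval)
    return None


# Simulated master: one transient failure, then a real outage.
answers = iter([True, False, True, False, False, False])
print(monitor(lambda: next(answers)))
```

In a real script, pass time.sleep as the sleep argument and a function that actually pings the master; the default no-op sleep only exists so the example finishes instantly.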

Additional features:
(1) Binlog server
(2) Application transparency (VIP)
(3) Real-time notification of administrators (send_report)
(4) Self-healing (not yet developed)

Failure simulation and recovery

Step 1: check that MHA is running
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2487) is running(0:PING_OK), master:10.4.7.51

Step 2: stop MySQL on the master
[root@db01 ~]# systemctl stop mysqld
[root@db01 ~]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2020-07-01 00:06:11 CST; 8s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 1368 ExecStart=/application/mysql/bin/mysqld --defaults-file=/etc/my.cnf (code=exited, status=0/SUCCESS)
 Main PID: 1368 (code=exited, status=0/SUCCESS)

Jun 30 20:38:02 db01 systemd[1]: Started MySQL Server.
Jul 01 00:06:00 db01 systemd[1]: Stopping MySQL Server...
Jul 01 00:06:11 db01 systemd[1]: Stopped MySQL Server.

Step 3: check on db02 (a slave) whether it has taken over as master
db02 [(none)]>show slave status\G
Empty set (0.00 sec)

#No slave information is shown any more, which confirms the switch

Step 4: check the slave status on db03
db03 [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.7.52   #the master is now .52 (db02)
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 234
               Relay_Log_File: db03-relay-bin.000003
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Check the log
[root@db03 ~]#  tail -f /var/log/mha/app1/manager
Master 10.4.7.51(10.4.7.51:3306) is down!

Check MHA Manager logs at db03:/var/log/mha/app1/manager for details.

Started automated(non-interactive) failover.
Selected 10.4.7.52(10.4.7.52:3306) as a new master.
10.4.7.52(10.4.7.52:3306): OK: Applying all logs succeeded.
10.4.7.53(10.4.7.53:3306): OK: Slave started, replicating from 10.4.7.52(10.4.7.52:3306)
10.4.7.52(10.4.7.52:3306): Resetting slave info succeeded.
Master failover to 10.4.7.52(10.4.7.52:3306) completed successfully. #"completed successfully." also confirms that the switch worked

After a successful failover the MHA manager process exits on its own
db03 [(none)]>^DBye
[1]+  Done                    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1

Step 5: repair the failure
Start MySQL on the failed master (db01)
[root@db01 ~]# systemctl start mysqld
[root@db01 ~]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-07-01 00:30:10 CST; 2s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
 Main PID: 2134 (mysqld)
   CGroup: /system.slice/mysqld.service
           └─2134 /application/mysql/bin/mysqld --defaults-file=/etc/my.cnf

Jul 01 00:30:10 db01 systemd[1]: Started MySQL Server.

Step 6: make the recovered db01 (the former master) a slave
The manager log records the statement needed to set db01 up as a slave
[root@db03 ~]# cat /var/log/mha/app1/manager|grep -i 'change master to'
Wed Jul  1 00:06:12 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.4.7.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';

Extracted statement (the log masks the password as 'xxx'; fill in the real replication password):
CHANGE MASTER TO 
MASTER_HOST='10.4.7.52', 
MASTER_PORT=3306, 
MASTER_AUTO_POSITION=1, 
MASTER_USER='rep', 
MASTER_PASSWORD='123456';
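
Extracting the statement can be automated. A minimal sketch, assuming the log line format shown above and a password without quote characters; extract_change_master is a hypothetical helper, not an MHA tool.

```python
import re

# Log line copied from the manager log shown above.
LOG_LINE = (
    "Wed Jul  1 00:06:12 2020 - [info]  All other slaves should start "
    "replication from here. Statement should be: CHANGE MASTER TO "
    "MASTER_HOST='10.4.7.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, "
    "MASTER_USER='rep', MASTER_PASSWORD='xxx';"
)


def extract_change_master(log_line, repl_password):
    """Pull the CHANGE MASTER TO statement out of a log line and unmask the password."""
    match = re.search(r"CHANGE MASTER TO .*;", log_line)
    if match is None:
        return None
    statement = match.group(0)
    # MHA logs the password as 'xxx'; substitute the real one.
    return statement.replace(
        "MASTER_PASSWORD='xxx'", f"MASTER_PASSWORD='{repl_password}'"
    )


print(extract_change_master(LOG_LINE, "123456"))
```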

db01 [(none)]>CHANGE MASTER TO 
    -> MASTER_HOST='10.4.7.52', 
    -> MASTER_PORT=3306, 
    -> MASTER_AUTO_POSITION=1, 
    -> MASTER_USER='rep', 
    -> MASTER_PASSWORD='123456';
Query OK, 0 rows affected, 2 warnings (0.04 sec)

db01 [(none)]>start slave;
Query OK, 0 rows affected (0.00 sec)

Check that the slave was set up successfully
db01 [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.7.52
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 234
               Relay_Log_File: db01-relay-bin.000003
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes              #replication is working
            Slave_SQL_Running: Yes              #replication is working
            

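Step 7: update the manager configuration file
MHA removes the failed node from the configuration file on every failover, so once the node is repaired it has to be added back.
[root@db03 ~]# vim /etc/mha/app1.cnf
Add the following:
[server1]
hostname=10.4.7.51
port=3306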

[root@db03 ~]# cat  /etc/mha/app1.cnf 
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/application/mysql/log_bin
password=mha
ping_interval=2
repl_password=123456
repl_user=rep
ssh_user=root
user=mha

[server1]
hostname=10.4.7.51
port=3306

[server2]
candidate_master=1
hostname=10.4.7.52
port=3306

[server3]
hostname=10.4.7.53
port=3306

Step 8: start MHA
[root@db03 ~]#  nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[4] 2839
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2839) is running(0:PING_OK), master:10.4.7.52

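Additional manager parameters

Notes:
Which slave takes over when the master goes down?
1. If all slaves have identical logs, the new master is chosen by the order in the configuration file.
2. If the slaves' logs differ, the slave closest to the master is chosen automatically.
3. If a node has been given priority (candidate_master=1), that node is preferred.
However, if that node is more than 100 MB of logs behind the master, it is not chosen. Setting check_repl_delay=0 disables this lag check and forces the candidate to be selected.
(1) ping_interval=1
#Interval, in seconds, between pings to the monitored master; failover starts automatically after three attempts go unanswered

(2) candidate_master=1
#Marks the server as a candidate master. With this set, the slave is promoted on failover even if it does not hold the latest events in the cluster

(3) check_repl_delay=0
#By default, MHA will not choose a slave that is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time.
With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful on a host with candidate_master=1, because it ensures the candidate becomes the new master during the switch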
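
The selection rules described above can be written down as a small function. This is only a sketch of the rules as this article states them, not MHA's actual election code; the Slave record, its field names, and the behaviour of returning None when no slave is eligible are all assumptions made for illustration.

```python
from dataclasses import dataclass

DELAY_LIMIT = 100 * 1024 * 1024  # the 100 MB relay-log lag limit quoted above


@dataclass
class Slave:
    host: str
    config_order: int             # position of the [serverN] section in the file
    lag_bytes: int                # how far the slave is behind the master
    candidate_master: bool = False
    check_repl_delay: bool = True


def pick_new_master(slaves):
    """Apply the three rules: candidate first, then freshest data, then file order."""
    def eligible(slave):
        return (not slave.check_repl_delay) or slave.lag_bytes < DELAY_LIMIT

    candidates = [s for s in slaves if s.candidate_master and eligible(s)]
    if candidates:
        return min(candidates, key=lambda s: s.config_order)
    pool = [s for s in slaves if eligible(s)]
    if not pool:
        return None
    # Least lag wins; configuration order breaks ties.
    return min(pool, key=lambda s: (s.lag_bytes, s.config_order))


slaves = [
    Slave("10.4.7.52", 2, lag_bytes=0, candidate_master=True),
    Slave("10.4.7.53", 3, lag_bytes=0),
]
print(pick_new_master(slaves).host)
```

Setting check_repl_delay=False on a candidate models check_repl_delay=0: the candidate stays eligible however far behind it is.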

1.1.3 MHA VIP support

MHA VIP support (manager, db03)

Perl script:
Official source: https://github.com/yoshinorim/mha4mysql-manager/tags
Start from the master_ip_failover script in samples and modify it
[root@db03 ~]# cat /usr/local/bin/master_ip_failover 
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
		$command,        $ssh_user,         $orig_master_host,
		$orig_master_ip, $orig_master_port, $new_master_host,
		$new_master_ip,  $new_master_port,  $new_master_user,
		$new_master_password
   );

my $vip = '10.4.7.55';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
		'command=s'             => \$command,
		'ssh_user=s'            => \$ssh_user,
		'orig_master_host=s'    => \$orig_master_host,
		'orig_master_ip=s'      => \$orig_master_ip,
		'orig_master_port=i'    => \$orig_master_port,
		'new_master_host=s'     => \$new_master_host,
		'new_master_ip=s'       => \$new_master_ip,
		'new_master_port=i'     => \$new_master_port,
		'new_master_user=s'     => \$new_master_user,
		'new_master_password=s' => \$new_master_password,
	  );

exit &main();

sub main {
	print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
	if ( $command eq "stop" || $command eq "stopssh" ) {
		my $exit_code = 1;
		eval {
			print "Disabling the VIP on old master: $orig_master_host \n";
			&stop_vip();
			$exit_code = 0;
		};
		if ($@) {
			warn "Got Error: $@\n";
			exit $exit_code;
		}
		exit $exit_code;
	}
	elsif ( $command eq "start" ) {
		my $exit_code = 10;
		eval {
			print "Enabling the VIP - $vip on the new master - $new_master_host \n";
			&start_vip();
			$exit_code = 0;
		};
		if ($@) {
			warn $@;
			exit $exit_code;
		}
		exit $exit_code;
	}
	elsif ( $command eq "status" ) {
		print "Checking the Status of the script.. OK \n";
		exit 0;
	}
	else {
		&usage();
		exit 1;
	}
}
sub start_vip() {
	# \@ keeps Perl from treating @$new_master_host as an array dereference
	`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
	return 0  unless  ($ssh_user);
	`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
	print
		"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}


What to change:
my $vip = '10.4.7.55/24';    #adjust to your environment; nothing else needs changing
my $key = '1';     #adjust to your environment; nothing else needs changing
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";  #adjust to your environment; nothing else needs changing
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";   #adjust to your environment; nothing else needs changing

After the change:
my $vip = '10.4.7.55/24';  #VIP address
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; #mind the network interface name
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

How the command expands:
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
[root@db03 ~]# ifconfig eth0:1 10.4.7.55/24
With $key = '1' and $vip = '10.4.7.55/24', the two are the same command
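
The interpolation the Perl script performs can be reproduced in a few lines to preview the exact commands before a failover runs them. A small sketch; vip_commands is a hypothetical helper that only builds the strings and executes nothing.

```python
def vip_commands(interface, key, vip):
    """Build the start/stop commands the way the Perl script interpolates them."""
    alias = f"{interface}:{key}"  # e.g. eth0:1, an interface alias
    return {
        "start": f"/sbin/ifconfig {alias} {vip}",
        "stop": f"/sbin/ifconfig {alias} down",
    }


commands = vip_commands("eth0", "1", "10.4.7.55/24")
print(commands["start"])
print(commands["stop"])
```

Printing the commands for the real interface name (eth0, eth1, ens33 and so on) is a quick way to catch a mismatch between the script and the host.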

Run dos2unix on the script (it was uploaded from another machine, so convert it to avoid errors)
[root@db03 ~]# dos2unix /usr/local/bin/master_ip_failover 
dos2unix: converting file /usr/local/bin/master_ip_failover to Unix format ...


Make it executable:
[root@db03 ~]# chmod +x /usr/local/bin/master_ip_failover
[root@db03 ~]# ll /usr/local/bin/master_ip_failover
-rwxr-xr-x 1 root root 4386 Jul  1 01:58 /usr/local/bin/master_ip_failover

Add the Perl script to the manager configuration file

Parameter:
master_ip_failover_script='path'
[root@db03 ~]# vim /etc/mha/app1.cnf
master_ip_failover_script=/usr/local/bin/master_ip_failover (add under [server default])

Add the VIP manually on the master (the first binding has to be done by hand)

[root@db02 ~]# ifconfig eth0:1 10.4.7.55/24
[root@db02 ~]# ifconfig 
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.4.7.55  netmask 255.255.255.0  broadcast 10.4.7.255
        ether 00:0c:29:0d:b2:e5  txqueuelen 1000  (Ethernet)

If you are not sure which host is the master, check like this:
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2839) is running(0:PING_OK), master:10.4.7.52
master:10.4.7.52 (the current master)

Restart MHA to load the change

Stop:
[root@db03 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.

Start again:
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 6191

Check:
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:6621) is running(0:PING_OK), master:10.4.7.52

1.1.4 MHA email alerts

Setting up MHA email alerts (manager)

Network-drive mirror: https://pan.baidu.com/s/1xfr0IQeu6Z9Ct8FiLRhdbw
Extraction code: 6fbi
email_2019-最新.zip contains three files
[root@db03 /server/tools]# unzip email_2019-最新.zip 
[root@db03 /server/tools]# ll email/*
-rw-r--r-- 1 root root    35 Dec 27  2017 email/send
-rw-r--r-- 1 root root 80213 Sep 30  2009 email/sendEmail
-rw-r--r-- 1 root root   203 Apr 19  2019 email/testpl
Copy them to /usr/local/bin/
[root@db03 /server/tools]# cp -a email/* /usr/local/bin/
Make them executable:
[root@db03 /server/tools]# chmod +x /usr/local/bin/*
[root@db03 /usr/local/bin]# ll
total 92
-rwxr-xr-x 1 root root  2456 Jul  1  2020 master_ip_failover
-rwxr-xr-x 1 root root    35 Dec 27  2017 send
-rwxr-xr-x 1 root root 80213 Jun 30 23:23 sendEmail
-rwxr-xr-x 1 root root   207 Jul  1 00:02 testpl

Edit the manager configuration file

[root@db03 /usr/local/bin]# vim /etc/mha/app1.cnf 
report_script=/usr/local/bin/send

Stop the manager:
[root@db03 /usr/local/bin]#  masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.
[4]   Exit 1                  nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1  (wd: ~)
(wd now: /usr/local/bin)

Restart MHA to load the change
[root@db03 /usr/local/bin]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[7] 9423
[root@db03 /usr/local/bin]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:9423) is running(0:PING_OK), master:10.4.7.52

Stop the master to verify the email alert


[root@db02 ~]# systemctl stop mysqld

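1.1.5 binlog-server

About the binlog server

Binlogs are central to backups. Previously a binlog file could only be backed up locally first and then transferred to a remote server. Since MySQL 5.6, the mysqlbinlog command can pull the logs of a remote server into a local directory, which makes it much easier to keep a safe copy of the binlogs.

Configuring the binlog server

Use an additional machine running MySQL 5.6 or later, with GTID supported and enabled. Here we simply use the second slave (db03).
Binlog server configuration: edit the manager configuration file
[root@db03 ~]# vim /etc/mha/app1.cnf 
[binlog1]
no_master=1          #never promote this entry to master
hostname=10.4.7.53   #host that stores the backup
master_binlog_dir=/data/mysql/binlog #directory for the backed-up binlogs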

Create the binlog directory and give ownership to the mysql user
[root@db03 ~]# mkdir -p /data/mysql/binlog
[root@db03 ~]# chown -R mysql.mysql /data/mysql/*
[root@db03 ~]# ll -d  /data/mysql/binlog
drwxr-xr-x 2 mysql mysql 6 Jul  2 12:46 /data/mysql/binlog

cd into /data/mysql/binlog (the pull must be run from inside the directory you created)
[root@db03 ~]# cd /data/mysql/binlog/
[root@db03 /data/mysql/binlog]#

Check which binlog file the slave has reached
db03 [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.7.51
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000005   #the binlog file the slave is currently reading

#Note: start pulling from the binary log file the slave has currently reached
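
Reading the starting file name off the status output can be scripted. A minimal sketch, assuming the vertical output format shown above; master_log_file is a hypothetical helper. The pattern is anchored to the start of the line so that Relay_Master_Log_File is not matched by mistake.

```python
import re

# Excerpt of the status output shown above.
STATUS = """
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.4.7.51
              Master_Log_File: mysql-bin.000005
        Relay_Master_Log_File: mysql-bin.000004
"""


def master_log_file(status_text):
    """Return the Master_Log_File value from vertical slave status output, or None."""
    match = re.search(r"^\s*Master_Log_File:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


print(master_log_file(STATUS))
```

The Relay_Master_Log_File value in the excerpt is made up for the example, to show that the two fields are told apart.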


Pull the logs and check the result:
[root@db03 /data/mysql/binlog]# mysqlbinlog  -R --host=10.4.7.51 --user=mha --password=mha --raw  --stop-never mysql-bin.000005 &
[4] 48164
[root@db03 /data/mysql/binlog]# mysqlbinlog: [Warning] Using a password on the command line interface can be insecure.

[root@db03 /data/mysql/binlog]# 
[root@db03 /data/mysql/binlog]# ll
total 4
-rw-r----- 1 root root 234 Jul  2 12:56 mysql-bin.000005

Flush the binlog manually on the master
[root@db01 ~]# mysqladmin -uroot -p123456 flush-logs
mysqladmin: [Warning] Using a password on the command line interface can be insecure.

Check the backup directory:
[root@db03 /data/mysql/binlog]# ll
total 8
-rw-r----- 1 root root 281 Jul  2 12:58 mysql-bin.000005
-rw-r----- 1 root root 234 Jul  2 12:58 mysql-bin.000006

Restart MHA (stop, then start)
[root@db03 /data/mysql/binlog]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.


[root@db03 /data/mysql/binlog]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[5] 49158
[root@db03 /data/mysql/binlog]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:49158) is running(0:PING_OK), master:10.4.7.51

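Common parameters

    -R | --read-from-remote-server
    Read the binlog from the remote master instead of a local file; this is what enables the binlog backup.

    --raw
    Store the copied binlog in its original binary format; without this option the output is text.

    -r | --result-file
    Output location:
    with --raw, the value of -r gives the directory and file-name prefix for the binlog files; without --raw, it gives the directory and file name of the text output.

    -t | --to-last-log
    Start from the specified binlog and keep pulling up to the last binlog currently on the master.

    --stop-never
    Keep pulling from the master continuously: back up to the current last binlog and carry on from there. This option implies -t.

    --stop-never-slave-server-id
    Default 65535. Used to avoid server-ID conflicts when several mysqlbinlog processes or slave servers are connected.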
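
The options above combine into a single command line. As a minimal sketch, the helper below assembles the argument list for a continuous raw backup; mysqlbinlog_backup_argv is a hypothetical name, and the bare --password (which makes mysqlbinlog prompt instead of taking the password on the command line) is a choice made here rather than what the transcript above uses. Nothing is executed.

```python
def mysqlbinlog_backup_argv(host, user, start_file,
                            result_prefix=None, stop_never=True):
    """Assemble a mysqlbinlog command line for pulling binlogs from a remote master."""
    argv = [
        "mysqlbinlog",
        "--read-from-remote-server",   # -R: fetch from the master
        f"--host={host}",
        f"--user={user}",
        "--password",                  # no value: prompt for it
        "--raw",                       # keep the binary format
    ]
    # --stop-never implies --to-last-log, so only one of the two is needed.
    argv.append("--stop-never" if stop_never else "--to-last-log")
    if result_prefix:
        argv.append(f"--result-file={result_prefix}")
    argv.append(start_file)            # first binlog file to pull
    return argv


print(" ".join(mysqlbinlog_backup_argv("10.4.7.51", "mha", "mysql-bin.000005")))
```

The list form can be handed to subprocess.Popen with cwd set to the backup directory, which replaces the manual cd step.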


Original article: https://www.cnblogs.com/woaiyunwei/p/13210749.html