  • [MySQL] MHA Deployment and MasterFailover Code Analysis

    Official site: https://code.google.com/p/mysql-master-ha/

    Reference: http://blog.csdn.net/wulantian/article/details/13287975

    Reference: http://www.cnblogs.com/wingsless/p/4033093.html

    Reference: http://www.cnblogs.com/xuanzhi201111/p/4231412.html#jtss-tsina

    Reference: http://ylw6006.blog.51cto.com/470441/890360/

    Reference: http://os.51cto.com/art/201307/401702.htm

    Reference: http://www.it165.net/database/html/201508/13780.html

    Download: http://pan.baidu.com/s/1pJ0VkSz

    Deployment:

    [root@dns packages]# tar -zxvf mha4mysql-manager-0.53.tar.gz
    [root@dns mha4mysql-manager-0.53]# perl Makefile.PL
    [root@dns mha4mysql-manager-0.53]# make && make install
    There are many configuration parameters; see lib/MHA/Config.pm for the full list.
    [root@dns mha4mysql-manager-0.53]# vim /etc/masterha/3307.cnf
    [server default]
    manager_log=/data1/masterha/3307_leju/manager.log
    manager_workdir=/data1/masterha/3307_leju
    multi_tier_slave=3
    password=passwd
    ping_interval=1
    repl_password=x8uf7nbv5x64
    repl_user=repl
    # Script that shuts down the failed host after a failure (its main purpose is to power off the host so that split brain cannot occur)
    shutdown_script="/samples/scripts/power_manager"
    # Location of the pid file used by shutdown_script; the master instance is force-killed using the pid stored in this file
    master_pid_file=/data1/mysql/3307_leju/mysql.pid
    ssh_user=root
    user=ha

    [server1]
    hostname=10.207.0.125
    master_binlog_dir="/data1/mysql/3307_leju"
    port=3307

    [server2]
    hostname=10.207.0.126
    master_binlog_dir="/data1/mysql/3307_leju"
    port=3307

    [server3]
    hostname=10.207.0.127
    master_binlog_dir="/data1/mysql/3307_leju"
    port=3307

    [server4]
    hostname=10.207.0.128
    master_binlog_dir="/data1/mysql/3307_leju"
    port=3307

    Check the replication topology before starting:
    [root@dns bin]# masterha_check_repl --conf=/etc/masterha/3307.cnf

    Start the manager. masterha_manager accepts many options; they are not listed here.
    [root@dns bin]# masterha_manager --conf=/etc/masterha/3307.cnf --remove_dead_master_conf --ignore_last_failover
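
    The check-then-start sequence can be wrapped in a small script so that the manager is never launched against a broken topology. This is only a sketch: start_mha and the CONF, CHECK_CMD and MANAGER_CMD variables are hypothetical names, and it assumes masterha_check_repl exits non-zero when the check fails (the "Got exit code 1" log later in this post suggests it does).

```shell
#!/bin/sh
# Sketch only: start_mha and the variables below are hypothetical names,
# not part of MHA. The commands and flags are the ones used in this post.
CONF="${CONF:-/etc/masterha/3307.cnf}"
CHECK_CMD="${CHECK_CMD:-masterha_check_repl}"
MANAGER_CMD="${MANAGER_CMD:-masterha_manager}"

# Run the replication check first; start the manager only if it passes.
start_mha() {
  if ! "$CHECK_CMD" --conf="$CONF"; then
    echo "replication check failed, manager not started" >&2
    return 1
  fi
  "$MANAGER_CMD" --conf="$CONF" --remove_dead_master_conf --ignore_last_failover
}
```

    Calling start_mha at the end of the script (or from an init script) runs both steps; overriding CONF selects a different instance's configuration file.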

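    Functional analysis:

    Note that the MHA user needs the SUPER, SELECT, REPLICATION SLAVE and REPLICATION CLIENT privileges. When testing MHA 0.56 with insufficient privileges, Perl reported no error at all; the failover simply hung, and the cause was hard to track down.

    MHA's failover is implemented in lib/MHA/MasterFailover.pm.

    After installation the file is located at /usr/local/share/perl5/MHA/MasterFailover.pm

    The main logic is in the do_master_failover function. Code excerpt: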

    sub do_master_failover {
      my $error_code = 1;
      my ( $dead_master, $new_master );

      eval {
        my @servers_config = init_config();
        $log->info("Starting master failover.");
        $log->info("* Phase 1: Configuration Check Phase..\n");
        $dead_master = check_settings( @servers_config );
        $log->info("** Phase 1: Configuration Check Phase completed.\n");
        $log->info("* Phase 2: Dead Master Shutdown Phase..\n");
        force_shutdown($dead_master);
        $log->info("* Phase 2: Dead Master Shutdown Phase completed.\n");
        $log->info("* Phase 3: Master Recovery Phase..\n");
        $log->info("* Phase 3.1: Getting Latest Slaves Phase..\n");
        check_set_latest_slaves();
        $log->info("* Phase 3.2: Saving Dead Master's Binlog Phase..\n");
        save_master_binlog($dead_master);
        $log->info("* Phase 3.3: Determining New Master Phase..\n");
        my $latest_base_slave = find_latest_base_slave($dead_master);
        $new_master = select_new_master( $dead_master, $latest_base_slave );
        my ( $master_log_file, $master_log_pos ) =
          recover_master( $dead_master, $new_master, $latest_base_slave );
        $new_master->{activated} = 1;
        $log->info("* Phase 3: Master Recovery Phase completed.\n");
        $log->info("* Phase 4: Slaves Recovery Phase..\n");
        $error_code = recover_slaves( $dead_master, $new_master, $latest_base_slave, $master_log_file, $master_log_pos );
        if ( $g_remove_dead_master_conf && $error_code == 0 ) {
          MHA::Config::delete_block_and_save( $g_config_file, $dead_master->{id}, $log );
        }
        cleanup();
      };
      ...
    }

    Adding a few lines of code lets an MHA failover also update DNS, so that front-end requests follow the switch to the new master.

    # Requires "use DBI;" at the top of the module
    # Connect to the DNS database
    my $dbh = DBI->connect("DBI:mysql:database=dblayer;mysql_socket=/tmp/mysql_3307.sock","dblayer","dblayer");
    # Take the dead master's DNS name offline by clearing IsLive (the Bind DLZ MySQL driver query is extended to check the IsLive column)
    $dbh->do("UPDATE dns_records SET IsLive=0 WHERE data=? AND port=?", undef, $dead_master->{ip}, $dead_master->{port});
    # Look up the dead master's host (in Bind's dns_records table an IP and port map to one unique host, which supports multiple instances per machine)
    my $sth = $dbh->prepare("SELECT host FROM dns_records WHERE data=? AND port=?");
    $sth->execute($dead_master->{ip}, $dead_master->{port});
    my ($host) = $sth->fetchrow_array();
    $sth->finish;
    # Point that host at the new master
    $dbh->do("UPDATE dns_records SET HOST=? WHERE data=? AND port=?", undef, $host, $new_master->{ip}, $dead_master->{port});
    # Record the failover in the log table
    $dbh->do("INSERT INTO masterfailover_log(old_master_ip,new_master_ip,port,type) VALUES (?,?,?,1)", undef, $dead_master->{ip}, $new_master->{ip}, $dead_master->{port});
    # Disconnect
    $dbh->disconnect();

    After shutdown_script was added, the following error appeared:

    Thu Sep 24 14:21:42 2015 - [info]   /usr/local/mha_manager/samples/scripts/power_manager --command=status --ssh_user=root --host=10.207.0.125 --ip=10.207.0.125 
    Undefined subroutine &main::FIXME_xxx called at /usr/local/mha_manager/samples/scripts/power_manager line 387.
    Thu Sep 24 14:21:42 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln235]  Failed to get power status with return code 1:0.
    Thu Sep 24 14:21:42 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/local/bin/masterha_check_repl line 48
    Thu Sep 24 14:21:42 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
    Thu Sep 24 14:21:42 2015 - [info] Got exit code 1 (Not master dead).
    
    MySQL Replication Health is NOT OK!

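    Opening power_manager shows that the failing part is a call to an undefined function, FIXME_xxx. That code is meant to control the host through its server management address, and simply removing the undefined function call causes further problems.

    The usage notes in the comments of power_manager include the following: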

    # killing mysqld with specified pid file. This is useful when you run multiple MySQL instances and want to stop only specified instance
    
    power_manager --command=stopssh --host=master_server --ssh_user=root --pid_file=/var/lib/mysql/mysqld.pid

    So instead of setting shutdown_script in the configuration, call power_manager directly at the end of the force_shutdown_internal function in MasterFailover.pm.

    system("/usr/local/mha_manager/samples/scripts/power_manager --command=stopssh --host='$dead_master->{ip}' --ssh_user=root --pid_file='$dead_master->{master_pid_file}'");

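    Because power_manager contains the undefined function, calling it directly still fails with the undefined-function error whenever stopssh returns an abnormal value, that is, when the failed master is reachable over ssh but the pid no longer exists, or when ssh is not reachable at all. So modify power_manager: change the return values as below and comment out the code that follows, which also keeps it from falling through to killall -9 mysql mysqld.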

    sub stopssh {
      ...
      if ($pid_file) {
        $command =
    "\"if [ ! -e $pid_file ]; then exit 1; fi; pid=\\\`cat $pid_file\\\`; rm -f $pid_file; kill -9 \\\$pid; a=\\\`ps ax | grep $pid_file | grep -v grep | wc | awk {'print \\\$1'}\\\`; if [ \"a\\\$a\" = \"a0\" ]; then exit 10; fi; sleep 1; a=\\\`ps ax | grep $pid_file | grep -v grep | wc | awk {'print \\\$1'}\\\`; if [ \"a\\\$a\" = \"a0\" ]; then exit 10; else exit 1; fi\"";
        ( $high_ret, $low_ret ) = MHA::ManagerUtil::exec_system(
          "ssh $ssh_user_host $MHA::ManagerConst::SSH_OPT_CHECK $command");
        if ( $high_ret == $SSH_STOP_OK && $low_ret == 0 ) {
          print "ssh reachable. mysqld stopped. power off not needed.\n";
          return $high_ret;
        }
        elsif ( $high_ret == 1 && $low_ret == 0 ) {
          print "ssh reachable. not found $pid_file,mysqld stopped. power off not needed.\n";
          return 10;
        }
        else {
          print "Killing mysqld instance based on $pid_file failed.\n";
          return 10;
        }
      }

      # The killall -9 mysql mysqld code that follows is commented out
    }
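
    The one-line remote command that stopssh sends over ssh is hard to read, so here is the same logic unrolled as a local shell function. It is a sketch for reading purposes, not a replacement for power_manager: stop_by_pid_file is a hypothetical name, and the return codes (10 for "stopped", 1 for "not stopped or no pid file") are taken from the command string above.

```shell
#!/bin/sh
# Sketch only: stop_by_pid_file is a hypothetical name. It mirrors the remote
# command built in stopssh and returns the same codes (10 = stopped, 1 = not).
stop_by_pid_file() {
  pid_file=$1
  # No pid file: nothing can be killed by pid.
  [ -e "$pid_file" ] || return 1
  pid=$(cat "$pid_file")
  rm -f "$pid_file"
  kill -9 "$pid" 2>/dev/null
  # Look for a process whose command line still mentions the pid file.
  # Check twice, one second apart, as the original command does.
  for attempt in 1 2; do
    left=$(ps ax | grep -- "$pid_file" | grep -v grep | wc -l)
    [ "$left" -eq 0 ] && return 10
    [ "$attempt" -eq 1 ] && sleep 1
  done
  return 1
}
```

    With the modified stopssh above, a missing pid file (exit code 1) is treated as "mysqld already stopped" and mapped to 10, which is why the elsif branch returns 10 rather than propagating the 1.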

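    After adding report_script=/usr/local/bin/send_report to the configuration,

    modify mha_manager/samples/scripts/send_report so that it notifies ZABBIX and sends mail. In ZABBIX, add an item on the mha_manager host, with the Item type set to Zabbix trapper and Type of information set to Character.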

    my @array=split(/ /,$subject);
    my $sender=join('',$array[0],$array[7]);
    
    system("/data1/scripts/zabbix_sender --zabbix-server serverip --port 10051 --host 'mhaserver' --key mysql.MHA --value '$sender'");
    system("/bin/echo '$body' | /usr/bin/mutt -s '$subject' 'jiangxu@test.com'");
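
    The two system() calls build shell command lines by interpolating $subject and $body inside single quotes, so a quote character in either value would break the command. A sketch of the same two notifications with every value passed as a separately quoted argument follows. The function and variable names (zabbix_value, notify, ZABBIX_SENDER, MUTT) are hypothetical; the zabbix_sender and mutt options are the ones used in the Perl code above.

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical; the zabbix_sender
# and mutt invocations use the same options as the Perl system() calls.
ZABBIX_SENDER="${ZABBIX_SENDER:-/data1/scripts/zabbix_sender}"
MUTT="${MUTT:-/usr/bin/mutt}"

# Join the 1st and 8th space-separated words of the subject,
# like join('', $array[0], $array[7]) in the Perl code.
zabbix_value() {
  printf '%s\n' "$1" | cut -d' ' -f1,8 | tr -d ' '
}

# Send the trapper value to ZABBIX, then mail the report body.
notify() {
  subject=$1
  body=$2
  "$ZABBIX_SENDER" --zabbix-server serverip --port 10051 \
    --host mhaserver --key mysql.MHA --value "$(zabbix_value "$subject")"
  printf '%s\n' "$body" | "$MUTT" -s "$subject" 'jiangxu@test.com'
}
```

    In Perl the same effect is available through the list form of system(), which bypasses the shell for the zabbix_sender call; the mail step still needs a pipe or a temporary file for the body.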
  • Original post: https://www.cnblogs.com/jiangxu67/p/4487248.html