zoukankan      html  css  js  c++  java
  • Seconds_Behind_Master

    http://blog.chinaunix.net/uid-28212952-id-3494560.html

    今天同事遇到一个故障,xtrabackup备份中flush tables with read lock进程状态是flushing table,持续了大约60000s,首先分析是被长事务堵住了,show processlists,未看到长事务,因为slave上不提供应用服务,所以只有xtrabackup备份一个进程,show slave statusG发现Seconds_Behind_Master=0,但slave io thread执行的master binlog 要比 slave sql thread执行的master binlog多几十个,显然Seconds_Behind_Master不该为0(暂时未找到原因),抽样观察 slave sql thread执行的pos不变,怀疑是sql thread卡住引起flush tables with read lock 处于flushing table,mysqlbinlog 解析relaylog 发现当前slave正在执行delete操作,后检查binlog_format=row且对应表无主键,master上一条无主键表的delete执行记录在binlog中每row对应一个event,当master binlog传到slave上执行时,每个event对要在slave进行一次全表扫描,至此原因找到了,解决方案:

    第一种

    1.stop slave; #会hang 几个或十几个小时进行delete操作的回滚

    2.对未主键表创建主键

    第二种

    1.master  binlog_format=statment

    2.master 备份

    3.重做slave

    Seconds_Behind_Master问题

    show slave statusG 显示slave在复制中的信息,其中Seconds_Behind_Master常被用作衡量slave延迟秒数,虽然Seconds_Behind_Master值被许多认为不能够准确反映slave延迟情况,但由于show slave statusG简单易用,对于无须精确的场景足够了(精确方式如在master上表中定期插入timestamp,在slave上通过系统时间和timestamp进行对比得到精确slave延迟),简单看了下代码time_diff(Seconds_Behind_Master值)

    sql/slave.cc
    bool show_master_info(THD* thd, Master_info* mi)
    long time_diff= ((long)(time(0) - mi->rli.last_master_timestamp)
                           - mi->clock_diff_with_master);

    static int get_master_version_and_clock(MYSQL* mysql, Master_info* mi)
    clock_diff_with_master= 0; /* The "most sensible" value */

     
    sql/rpl_rli.cc
    void Relay_log_info::stmt_done(my_off_t event_master_log_pos,time_t event_creation_time)
    last_master_timestamp= event_creation_time;
     

    event_creation_time(代码中暂时未找到定义),从字面感觉在slave上对应的是Exec_Master_Log_Pos字段值,因此Seconds_Behind_Master值就是master当前系统时间戳(slave当前系统时间戳 - slave与master间的时间戳差) - slave sql thread当前已经执行完的master binlog pos.

    manual中对Seconds_Behind_Master的解释如下,但与代码的计算公式不一致,Seconds_Behind_Master值是slave io thread当前已经获取的master binlog pos - slave sql thread当前已经执行完的master binlog pos

    In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread.

    If the network connection between master and slave is fast, the slave I/O thread is very close to the master, so this field is a good approximation of how late the slave SQL thread is compared to the master. If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, soSeconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.

    个人倾向于代码中的解释,设计实验来验证,用取一天前的备份集新做了一个slave,change master且start slave后,Seconds_Behind_Master显示为60000多秒(20小时左右),显然是master当前系统时间戳 - slave sql thread当前已经执行完的master binlog pos的结果
  • 相关阅读:
    python第三天
    python第二天
    python第一天
    Linux之VIM常用功能
    Linux输入输出管理
    Linux文件操作及管理
    Linux虚拟机基本操作
    JAVA堆,栈的区别,用AarrayList、LinkedList自定义栈
    mysql优化limit
    MySql五大引擎的区别以及优劣之分
  • 原文地址:https://www.cnblogs.com/zengkefu/p/5673923.html
Copyright © 2011-2022 走看看