zoukankan      html  css  js  c++  java
  • mysql小白系列_14 线上故障分析与排错

    1.重现故障5---线上执行update报错,并处理。(表结构和UPDATE语句自己构造,请给出详细步骤)
    1)update故障出现ERROR 1206 (HY000): The total number of locks exceeds the lock table size
    root@slave01 15:34:  [test]> select count(*) from AZ2C_D01600_1;
    +----------+
    | count(*) |
    +----------+
    | 10394605 |
    +----------+
    1 row in set (10.86 sec)
    
    root@slave01 16:03:  [test]> desc AZ2C_D01600_1;
    +-------+--------------+------+-----+---------+-------+
    | Field | Type         | Null | Key | Default | Extra |
    +-------+--------------+------+-----+---------+-------+
    | f1    | varchar(255) | YES  |     | NULL    |       |
    | f2    | varchar(255) | YES  |     | NULL    |       |
    | f3    | varchar(255) | YES  |     | NULL    |       |
    | f4    | varchar(255) | YES  |     | NULL    |       |
    | f5    | varchar(255) | YES  |     | NULL    |       |
    | f6    | varchar(255) | YES  |     | NULL    |       |
    | f7    | varchar(255) | YES  |     | NULL    |       |
    +-------+--------------+------+-----+---------+-------+
    7 rows in set (0.00 sec)
    
    root@slave01 16:03:  [test]> show create table AZ2C_D01600_1;
    +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table         | Create Table                                                                                                                                                                                                                                                                                                     |
    +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | AZ2C_D01600_1 | CREATE TABLE `AZ2C_D01600_1` (
      `f1` varchar(255) DEFAULT NULL,
      `f2` varchar(255) DEFAULT NULL,
      `f3` varchar(255) DEFAULT NULL,
      `f4` varchar(255) DEFAULT NULL,
      `f5` varchar(255) DEFAULT NULL,
      `f6` varchar(255) DEFAULT NULL,
      `f7` varchar(255) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
    +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    
    root@slave01 16:03:  [test]> update AZ2C_D01600_1 set f3='asdfasdf ' where f1 like '%sdfsfd%';
    ERROR 1206 (HY000): The total number of locks exceeds the lock table size
    root@slave01 16:04:  [test]> 
    
    2)检查参数innodb_buffer_pool_size
    root@slave01 16:06:  [test]> show variables like 'innodb_buffer_pool_size';
    +-------------------------+---------+
    | Variable_name           | Value   |
    +-------------------------+---------+
    | innodb_buffer_pool_size | 5242880 |
    +-------------------------+---------+
    1 row in set (0.00 sec)
    
    [root@slave01 ~]# cat /etc/my.cnf |grep innodb_buffer_pool_size
    innodb_buffer_pool_size = G
    [root@slave01 ~]# ssh master cat /etc/my.cnf |grep innodb_buffer_pool_size
    innodb_buffer_pool_size = G
    [root@slave01 ~]# ssh slave02 cat /etc/my.cnf |grep innodb_buffer_pool_size
    innodb_buffer_pool_size = G
    

    当前主库是slave01,一主二从的innodb_buffe_pool_size均设置错误

    3)首先修正从库错误参数,并重启生效
    root@slave02 16:11:  [(none)]> select @@innodb_buffer_pool_size;
    +---------------------------+
    | @@innodb_buffer_pool_size |
    +---------------------------+
    |                2147483648 |
    +---------------------------+
    1 row in set (0.00 sec)
    
    root@master 16:12:  [(none)]> select @@innodb_buffer_pool_size;
    +---------------------------+
    | @@innodb_buffer_pool_size |
    +---------------------------+
    |                2147483648 |
    +---------------------------+
    1 row in set (0.00 sec)
    
    4)准备切换到从库
    • 1.停止mha服务/usr/local/bin/masterha_stop --conf=/mysqlData/mha/etc/app.cnf
    • 2.如果两次mhafaliover时间不超过8小时,需要删除文件/mysqlData/mha/etc/app/app.failover.complete
    • 3.在mha配置文件/mysqlData/mha/etc/app.conf确定新的mastercandidate_master=1
    5)停止zabbix或者其他监控告警,停止mha服务,切换到从库
    • 1.关闭原主库的事件调度event_scheduler=OFF
    • 2.切换新主库/usr/local/bin/masterha_master_switch --master_state=alive --conf=/mysqlData/mha/etc/app.cnf
    Wed Mar 14 22:58:51 2018 - [info] * Phase 5: New master cleanup phase..
    Wed Mar 14 22:58:51 2018 - [info] 
    Wed Mar 14 22:58:51 2018 - [info]  slave01: Resetting slave info succeeded.
    Wed Mar 14 22:58:51 2018 - [info] Switching master to slave01(192.168.3.21:3306) completed successfully.
    
    • 3.更改原来主库的参数innodb_buffer_pool_size为正确值,并重启
    • 4.启动新slave复制
    CHANGE MASTER TO 
    MASTER_HOST='192.168.3.21', 
    MASTER_PORT=3306, 
    MASTER_AUTO_POSITION=1, 
    MASTER_USER='repl', 
    MASTER_PASSWORD='Rep12#3@';
    
    #因为手工在从库做了修改,导致主从同步失败
    #启用了GTID,使用mysqldump备份恢复进行重建从库
    
    SET @@GLOBAL.GTID_PURGED='73c029dc-2034-11e8-90a5-005056a365b6:1-594908,
    856d79f8-2038-11e8-b511-005056a330bb:1-3,
    b658767f-2044-11e8-951f-005056a330bb:1-188730';
    
    --
    -- Position to start replication or point-in-time recovery from
    --
    
    -- CHANGE MASTER TO MASTER_LOG_FILE='binlog.000135', MASTER_LOG_POS=1060653593;
    
    --
    -- Current Database: `db1`
    --
    
    CREATE DATABASE /*!32312 IF NOT EXISTS*/ `db1` /*!40100 DEFAULT CHARACTER SET utf8 */;
    #直接在mysqldmp恢复后的从库上执行change master后start slave,无法同步
    #重启myslq后报错Slave failed to initialize relay log info structure from the repository,reset slave就OK了
    
    root@master 17:52:  [(none)]> reset slave;
    Query OK, 0 rows affected (0.01 sec)
    
    root@master 17:53:  [(none)]> CHANGE MASTER TO
        -> MASTER_HOST='172.16.3.153',
        -> MASTER_PORT=3306,
        -> MASTER_USER='repl',
        -> MASTER_PASSWORD='Rep12#3@',
        -> master_auto_position=1;
    Query OK, 0 rows affected, 2 warnings (0.04 sec)
    

    https://www.cnblogs.com/mysql-dba/p/7201513.html
    http://blog.csdn.net/wll_1017/article/details/70332107
    https://www.topjishu.com/7997.html
    http://www.mamicode.com/info-detail-1457556.html

    2.请给出MySQL数据丢失的最佳解决方案?(自由发挥)

    数据写入内存,但未写log未落盘

    1)数据丢失后的解决方案?
    • 业务重做
    • 找客服,客服安抚客户
    2)防止数据丢失
    • 主从复制采用半同步方式
    • 双1
    • 超融合存储
    • 换oracle

    https://www.cnblogs.com/netfocus/p/3655958.html

  • 相关阅读:
    python数据类型--字符串
    Cannot open include file: 'afxcontrolbars.h': No such file or directory
    关于找不到tbb_debug.dll解决办法
    Android fill_parent、wrap_content和match_parent的区别
    多个摄像头同步工作【转】
    如何查找openCV函数源代码[转]
    Opencv 完美配置攻略 2014 (Win8.1 + Opencv 2.4.8 + VS 2013)[转]
    CvCaptureFromCam
    OpenCV与相机的关系[转]
    10亿以内和987654互质正整数的和[转自深圳-冒泡]
  • 原文地址:https://www.cnblogs.com/jenvid/p/8593423.html
Copyright © 2011-2022 走看看