zoukankan      html  css  js  c++  java
  • mysql-mmm故障整理

    Auth: Jin
    Date: 20140414

    1.master-slave同步问题
    1)故障描述和错误代码:
    监控报警slave故障
    登录slave服务器查看
    mysql> show slave statusG
    发现有Errno
    详细的记录在另外电脑,电脑硬盘坏了,没有具体的报错了
    2)排查步骤:
    (1)查看错误日志
    tail -n 100 /var/log/mysql/mysql_7.error
    没有发现问题
    cat /mysql/data7/master.info
    18
    mysql-bin.000106
    847011590
    查看master信息
    # cat /mysql/data7/relay-log.info
    binlogs/mysql-relay-bin.000022
    846786796
    mysql-bin.000106
    846786650
    8
    从master读取到slave的信息
    ll /mysql/data7/binlogs/
    total 826996
    -rw-rw----. 1 mysql mysql 199 Apr 8 17:39 mysql-relay-bin.000021
    -rw-rw----. 1 mysql mysql 846828651 Apr 14 14:31 mysql-relay-bin.000022
    -rw-rw----. 1 mysql mysql 62 Apr 8 17:39 mysql-relay-bin.index
    tail -f /mysql/data7/binlogs/mysql-relay-bin.000022
    都正常
    初步判断IO线程正常
    (2)仔细看slave status
    mysql> show slave statusG
    可以看到IO线程是正常的,SQL线程有问题
    3)解决办法
    重新按最后的position重新同步

    4)总结:
    (1)需要熟悉repliction的原理
    (2)一般情况之需要看slave的status即可,info文件信息都读取进status了,根据status判断是IO的问题,还是SQL的问题


    2,master当机,无法自动恢复问题
    1)故障秒素和错误代码
    监控报警master服务器宕了
    登录mmm_control查看,发现writer切换到backup master上去了。原来的master处于正在恢复总
    2)排查步骤
    查看master服务-正常
    slave status也OK,没有找到自动切换的原因
    后查资料说一个bug,把检测时间修改为了10可避免,我设置的是60
    3)解决办法
    (1)手动把“出故障的”master为online
    (2)将writer角色mv给master

    Auth: Jin
    Date: 20140505
    Title: SQL注入攻击锁表,引起MMM slave无法复制

    故障描述:slave复制不正常
    root@JQ-pdt-host-3 ~]# mmm_control @3307 show
    db2(192.168.201.2) slave/REPLICATION_DELAY. Roles:
    db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
    db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
    db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
    db3(192.168.201.3) master/ONLINE. Roles:

    [root@JQ-pdt-host-2 ~]# mysql -h 127.0.0.1 -P 3307 -u root -ppassword
    Welcome to the MySQL monitor. Commands end with ; or g.
    Your MySQL connection id is 2155233
    Server version: 5.5.30-log MySQL Community Server (GPL)

    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

    mysql> show slave statusG;
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 192.168.201.216
    Master_User: dbslave
    Master_Port: 3307
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000109
    Read_Master_Log_Pos: 671221507
    Relay_Log_File: mysql-relay-bin.000031
    Relay_Log_Pos: 668578820
    Relay_Master_Log_File: mysql-bin.000109
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB: mysql,test,performance_schema
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 668578674
    Relay_Log_Space: 671221852
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 1503
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 33071


    mysql> show slave statusG
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 192.168.201.216
    Master_User: dbslave
    Master_Port: 3307
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000109
    Read_Master_Log_Pos: 671759015
    Relay_Log_File: mysql-relay-bin.000031
    Relay_Log_Pos: 668578820
    Relay_Master_Log_File: mysql-bin.000109
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB: mysql,test,performance_schema
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 668578674
    Relay_Log_Space: 671759360
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 1724
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 33071
    1 row in set (0.00 sec)


    原因分析1:SQL线程阻塞了
    Seconds_Behind_Master: 在增加
    Exec_Master_Log_Pos: 668578674 没有变化
    根据网上的资料,
    http://blog.csdn.net/zbszhangbosen/article/details/8494921
    Seconds_Behind_Master表示slave上SQL thread与IO thread之间的延迟,我们都知道在MySQL的复制环境中,slave先从master上将
    binlog拉取到本地(通过IO thread),
    然后通过SQLthread将binlog重放,而Seconds_Behind_Master表示本地relaylog中未被执行完的那部分的差值

    原因分析2:查询slave没有将relaylog应用到数据库的原因

    操作
    1.mmm剔除db2
    [root@JQ-pdt-host-3 ~]# mmm_control @3307 set_offline db2
    无法关闭

    2.查看错误日志
    [root@JQ-pdt-host-217 ~]# tail -f /var/log/mysql/mysql_7.error
    140501 19:32:05 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 117120783, Error_code: 1032
    140502 17:02:17 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 233682298, Error_code: 1032
    140502 17:55:03 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 244489088, Error_code: 1032
    140503 3:23:02 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 308850228, Error_code: 1032
    140503 4:15:10 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 313199736, Error_code: 1032
    140503 8:29:47 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 332919503, Error_code: 1032
    140503 18:47:15 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 391309758, Error_code: 1032
    140504 1:38:20 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 431859737, Error_code: 1032
    140505 9:30:08 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 606849321, Error_code: 1032
    140505 14:17:00 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record
    in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,
    end_log_pos 638173055, Error_code: 1032
    master
    140328 9:32:19 [Note] Event Scheduler: Loaded 0 events
    140328 9:32:19 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
    Version: '5.5.30-log' socket: '/mysql/data7/mysql.sock' port: 3307 MySQL Community Server (GPL)
    140328 9:44:30 [Note] Error reading relay log event: slave SQL thread was killed
    140328 9:44:30 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (
    server_errno=2013)
    140328 9:44:30 [Note] Slave I/O thread killed while reading event
    140328 9:44:30 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000153', position 496435278
    140328 9:44:43 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000153' at position
    496435278, relay log 'binlogs/mysql-relay-bin.000254' position: 959809
    140328 9:44:43 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log
    'mysql-bin.000153' at position 496435278
    140328 15:04:47 [Note] Slave: received end packet from server, apparent master shutdown:
    140328 15:04:47 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000153' at
    position 532390356
    140328 15:04:47 [ERROR] Slave I/O: error reconnecting to master 'dbslave@192.168.201.3:3307' - retry-time: 60
    retries: 86400, Error_code: 2003
    140328 15:07:47 [Note] Slave: connected to master 'dbslave@192.168.201.3:3307',replication resumed in log 'mysql-
    bin.000153' at position 532390356
    140328 15:44:47 [Note] Error reading relay log event: slave SQL thread was killed
    140328 15:44:47 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (
    server_errno=2013)
    140328 15:44:47 [Note] Slave I/O thread killed while reading event
    140328 15:44:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000154', position 4451889
    140328 15:45:02 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000154' at position
    4451889, relay log 'binlogs/mysql-relay-bin.000257' position: 253
    140328 15:45:02 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log
    'mysql-bin.000154' at position 4451889
    报错信息都是比较老的

    3.查看线程
    mysql> show processlist |
    | 2 | system user | | NULL | Connect | 2438 | Waiting for table level lock | NULL
    |
    | 30 | dbproxy | 192.168.201.218:36994 | ask | Query | 2431 | Waiting for table level lock |
    SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
    | 33 | dbproxy | 192.168.201.218:37000 | ask | Query | 2435 | Waiting for table level lock |
    SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
    | 37 | dbproxy | 192.168.201.243:55050 | ask | Query | 2420 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 41 | dbproxy | 192.168.201.243:55053 | ask | Query | 2408 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 42 | dbproxy | 192.168.201.243:55059 | ask | Query | 2390 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 46 | dbproxy | 192.168.201.243:55062 | ask | Query | 2437 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 50 | dbproxy | 192.168.201.243:55065 | ask | Query | 2407 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 51 | dbproxy | 192.168.201.243:55069 | ask | Query | 2393 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 55 | dbproxy | 192.168.201.218:37714 | ask | Query | 2434 | Waiting for table level lock |
    SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
    | 82 | dbproxy | 192.168.201.243:55224 | ask | Query | 2425 | Waiting for table level lock |
    select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
    | 120 | dbproxy | 192.168.201.243:55455 | ask | Query | 2439 | User sleep |
    select * from ask_question where question_id=''+(select(sleep(3)))+'' |
    | 258 | dbproxy | 192.168.201.243:54612 | ask | Query | 2409 | Waiting for table level lock |
    select * from ask_question where question_id=''+(select(sleep(3)))+'' |
    | 259 | dbproxy | 192.168.201.243:54615 | ask | Query | 2436 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 1269 | dbproxy | 192.168.201.243:39716 | ask | Query | 2412 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 1270 | dbproxy | 192.168.201.243:39720 | ask | Query | 2416 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 1610 | dbproxy | 192.168.201.243:59369 | ask | Query | 2435 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 1614 | dbproxy | 192.168.201.243:59372 | ask | Query | 2414 | Waiting for table level lock |
    select * from `ask_question` where question_id = '618401' |
    | 2080 | dbproxy | 192.168.201.243:37535 | ask | Query | 2406 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 2081 | dbproxy | 192.168.201.243:37538 | ask | Query | 2407 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4866 | dbproxy | 192.168.201.243:37556 | ask | Query | 2436 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4867 | dbproxy | 192.168.201.243:37559 | ask | Query | 2430 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4868 | dbproxy | 192.168.201.243:37562 | ask | Query | 2426 | Waiting for table level lock |
    SELECT COUNT(*) AS `numrows`
    FROM (`ask_question`)
    WHERE `question_uid` = '1112844184'
    AND `question |
    | 4869 | dbproxy | 192.168.201.243:37565 | ask | Query | 2398 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4870 | dbproxy | 192.168.201.243:37568 | ask | Query | 2435 | Waiting for table level lock |
    select q.question_id tid,q.question_username author,q.question_uid authorid,q.question_title subject |
    | 4871 | dbproxy | 192.168.201.243:37571 | ask | Query | 2413 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4872 | dbproxy | 192.168.201.243:37574 | ask | Query | 2415 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 4873 | dbproxy | 192.168.201.243:37577 | ask | Query | 2404 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 4874 | dbproxy | 192.168.201.243:43949 | ask | Query | 2421 | Waiting for table level lock |
    SELECT COUNT(1) AS count FROM ask_question WHERE (question_uid=793 OR question_expert_id=43) AND que |
    | 4876 | dbproxy | 192.168.201.243:43971 | ask | Query | 2419 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 4877 | dbproxy | 192.168.201.243:43974 | ask | Query | 2400 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 4878 | dbproxy | 192.168.201.243:43977 | ask | Query | 2393 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 9496 | dbproxy | 192.168.201.218:34009 | ask | Query | 2430 | Waiting for table level lock |
    SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
    | 15948 | dbproxy | 192.168.201.243:45947 | ask | Query | 2433 | Waiting for table level lock |
    SELECT COUNT(*) AS `numrows`
    FROM (`ask_question`)
    WHERE `question_uid` = '43606'
    AND `question_cate |
    | 15949 | dbproxy | 192.168.201.243:45952 | ask | Query | 2438 | Waiting for table level lock |
    select q.*,c.category_name,c.category_parent_id from ask_question as q left join ask_category as c o |
    | 15950 | dbproxy | 192.168.201.243:45956 | ask | Query | 2425 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 932995 | dbproxy | 192.168.201.243:33088 | ask | Query | 2401 | Waiting for table level lock |
    select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
    | 2055023 | dbproxy | 192.168.201.243:41700 | ask | Query | 2390 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055027 | dbproxy | 192.168.201.243:41717 | ask | Query | 2392 | Waiting for table level lock |
    SELECT COUNT(1) AS count FROM ask_question WHERE question_uid=745 |
    | 2055028 | dbproxy | 192.168.201.243:41721 | ask | Query | 2392 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055029 | dbproxy | 192.168.201.243:41724 | ask | Query | 2392 | Waiting for table level lock |
    select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
    | 2055030 | dbproxy | 192.168.201.243:41727 | ask | Query | 2390 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055031 | dbproxy | 192.168.201.243:41730 | ask | Query | 2378 | Waiting for table level lock |
    select * from ask_question where question_id=''+(select(sleep(3)))+'' |
    | 2055032 | dbproxy | 192.168.201.243:41733 | ask | Query | 2389 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 2055033 | dbproxy | 192.168.201.243:41736 | ask | Query | 2378 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 2055034 | dbproxy | 192.168.201.243:41739 | ask | Query | 2388 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 2055036 | dbproxy | 192.168.201.243:41742 | ask | Query | 2388 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055039 | dbproxy | 192.168.201.243:41747 | ask | Query | 2383 | Waiting for table level lock |
    select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
    | 2055040 | dbproxy | 192.168.201.243:41752 | ask | Query | 2381 | Waiting for table level lock |
    SELECT COUNT(*) AS `numrows`
    FROM (`ask_question` as q)
    LEFT JOIN `ask_question_attention` as qa ON |
    | 2055044 | dbproxy | 192.168.201.243:41764 | ask | Query | 2379 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055047 | dbproxy | 192.168.201.243:41789 | ask | Query | 2377 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055051 | dbproxy | 192.168.201.243:41792 | ask | Query | 2371 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055052 | dbproxy | 192.168.201.243:41799 | ask | Query | 2376 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 2055053 | dbproxy | 192.168.201.243:41804 | ask | Query | 2367 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 2055054 | dbproxy | 192.168.201.243:41810 | ask | Query | 2373 | Waiting for table level lock |
    SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
    | 2055055 | dbproxy | 192.168.201.243:41819 | ask | Query | 2368 | Waiting for table level lock |
    select * from `ask_question` where question_id = '626326' |
    | 2055063 | dbproxy | 192.168.201.243:41843 | ask | Query | 2367 | Waiting for table level lock |
    SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
    | 2056431 | root | 127.0.0.1:60233 | NULL | Query | 0 | NULL | show
    processlist |
    +---------+-------------+-----------------------+------+---------+---------+----------------------------------

    引起原因:锁表(全局表读锁)

    4.kill线程
    kill不掉

    5.重启mysql
    [root@JQ-pdt-host-2 ~]# mysqld_multi stop 7 --password=password
    [root@JQ-pdt-host-2 ~]# mysqld_multi report 7
    Reporting MySQL servers
    MySQL server from group: mysqld7 is not running
    [root@JQ-pdt-host-2 ~]# mysqld_multi start 7
    [root@JQ-pdt-host-2 ~]# mysqld_multi report 7
    Reporting MySQL servers
    MySQL server from group: mysqld7 is running

    6.确认复制
    mysql> show slave statusG
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 192.168.201.216
    Master_User: dbslave
    Master_Port: 3307
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000109
    Read_Master_Log_Pos: 674890825
    Relay_Log_File: mysql-relay-bin.000033
    Relay_Log_Pos: 66526
    Relay_Master_Log_File: mysql-bin.000109
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB: mysql,test,performance_schema
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 674890825
    Relay_Log_Space: 66682
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 33071
    1 row in set (0.00 sec)
    正常

    7.确认mmm
    [root@JQ-pdt-host-3 ~]# mmm_control @3307 show
    db2(192.168.201.2) slave/ADMIN_OFFLINE. Roles:
    db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
    db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
    db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
    db3(192.168.201.3) master/ONLINE. Roles:

    [root@JQ-pdt-host-3 ~]# mmm_control @3307 set_online db2
    OK: State of 'db2' changed to ONLINE. Now you can wait some time and check its new roles!
    [root@JQ-pdt-host-3 ~]# mmm_control @3307 show
    db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
    db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.23)
    db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
    db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
    db3(192.168.201.3) master/ONLINE. Roles:


    8.另外一台场一样操(不用se offline)
    [root@JQ-pdt-host-217 ~]# mysqld_multi stop 7 --password=password
    [root@JQ-pdt-host-217 ~]# mysqld_multi report 7
    Reporting MySQL servers
    MySQL server from group: mysqld7 is not running
    [root@JQ-pdt-host-217 ~]# mysqld_multi start 7
    [root@JQ-pdt-host-3 ~]# mmm_control @3307 show
    db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
    db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.23)
    db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
    db217(192.168.201.217) slave/ONLINE. Roles: reader(192.168.201.21)
    db3(192.168.201.3) master/ONLINE. Roles:

    9.联系开发 提交slow-log

    在这个过程中后三台slave搞锁了


    10.查询前端错误和访问日志
    grep 'share.html' /var/log/nginx/ask_access.log
    183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?
    question_id=2000000000%27%28%22%29%29%28%28%29%27%27 HTTP/1.1" 500 5 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

    183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%29%20AND
    %20993%3D1067%20AND%20%289822%3D9822 HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

    183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%20AND%205341%3D5341
    HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"
    14.17.87.22 - - [05/May/2014:16:49:39 +0800] "GET /question/share.html?question_id=999999.9+union+all+select
    +0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830
    303536-- HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
    14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000%3B+if+
    %281%3D1%29+waitfor+delay+%2700%3A00%3A01%27-- HTTP/1.1" 500 5 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT
    5.1; SV1; .NET CLR 2.0.50727) Havij"

    14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000+and+if
    %281%3D1%2CBENCHMARK%28218400%2CMD5%280x41%29%29%2C0%29 HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0;
    Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij
    14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
    %28%28database%28%29%29%29%3C32%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
    "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
    14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
    %28%28database%28%29%29%29%3C16%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
    "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"

    14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length
    %28%28database%28%29%29%29%3C4%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293
    "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
    121.8.210.24 - - [05/May/2014:18:43:53 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,0)) and '
    HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:43:57 +0800] "GET /question/share.html?
    question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64)
    AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:44:01
    +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com"
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:44:03 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,)) and '
    HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:44:25 +0800] "GET /question/share.html?
    question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
    WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - -
    [05/May/2014:18:44:26 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"
    500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:44:27 +0800] "GET /question/share.html?
    question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
    WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - -
    [05/May/2014:18:44:34 +0800] "GET /question/share.html?question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"
    500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:44:35 +0800] "GET /question/share.html?
    question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT
    6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - -
    [05/May/2014:18:46:58 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"
    500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:46:58 +0800] "GET /question/share.html?
    question_id=' union select(0)from(select(sleep(9))) HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;
    WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - -
    [05/May/2014:18:46:59 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"
    500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
    Chrome/28.0.1500.63 Safari/537.36"
    121.8.210.24 - - [05/May/2014:18:47:37 +0800] "GET /question/share.html?
    question_id='%2b(select(0)from(select(sleep(0)))v)%2b' HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows
    NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"


    11.确定原因
    SQL注入引起的, 日志内可见如下访问:
    "GET /question/share.html?question_id=620779%20AND%20100%3E(SELECT%20COUNT(*)%20FROM
    %20information_schema.columns,information_schema.columns%20T2,information_schema.columns
    %20T3,information_schema.columns%20T4,information_schema.columns%20T5,information_schema.columns
    %20T6,information_schema.columns%20T7,information_schema.columns%20T8,information_schema.columns
    %20T9,information_schema.columns%20T10%20WHERE%207551=7551)%20AND%20(SELECT%20LENGTH(CURRENT_USER))%20%3C
    %202147483647 HTTP/1.1" 200 5 "-" "-"
    "GET /question/share.html?question_id='%2b(select(0)from(select(sleep(3)))v)%2b'%22%2b(select(0)from(select(sleep
    (0)))v)%2b%22*/ HTTP/1.1" 500

    排查 web_code/application/controllers/quesiton.php::share 对question_id未做SQL注入防范,已修正

    12.解决办法
    开发patch补丁


    总结:
    本次故障原因
    1.开发未做SQL注入防范,漏洞被别人抓住
    2.SQL注入引起slave锁表,进一步造成slave SQL线程无法正常工作,脱离MMM

    MMM故障排查总结
    1.查看MMM状态
    2.查看MYSQL服务状态
    3.查看单个mysql 同步状态和proceslist状态
    4.查看单个mysql errorlog和slow log
    5.前端access 和 error log

  • 相关阅读:
    Redis概述与安装
    CentOS-Linux系统下安装MySQL
    CentOS-Linux系统下安装Tomcat
    CentOS-Linux系统下安装JDK
    占位符
    设计模式之装饰者模式
    Servlet3.0
    注解入门
    PHP如何实现99乘法表?
    2017年PHP程序员未来路在何方?(转载)
  • 原文地址:https://www.cnblogs.com/diege/p/3664078.html
Copyright © 2011-2022 走看看