我的具体情况是:使用RobotFramework测试时,本来可以通过的一个case报错了,报错为:InternalError: (1205, u'Lock wait timeout exceeded; try restarting transaction。
找到问题所在了,就是线上开发环境和开本地环境同时跑AT,结果因为争抢数据库资源导致数据库死锁。解决方法其实可以简化为两步:1.查出锁死的数据库线程SELECT trx_mysql_thread_id FROM information_schema.INNODB_TRX;
;2.将查出的线程杀死 kill 。
前言:朋友咨询我说执行简单的update语句失效,症状如下:
mysql> update order_info set province_id=15 ,city_id= 1667 where order_from=10 and order_out_sn='1407261241xxxx';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>
1,查看数据库的隔离级别:
mysql> select @@tx_isolation;
---------------------------------
| @@tx_isolation |
---------------------------------
| REPEATABLE-READ |
---------------------------------
1 row in set (0.00 sec)
mysql>
2,去查看现当前库的线程情况:
mysql> show processlist;
--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | User | Host | db | Command | Time | State | Info |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | event_scheduler | localhost | NULL | Daemon | 9635385 | Waiting on empty queue | NULL |
| 9930577 | business_web | 192.168.1.21:45503 | business_db | Sleep | 153 | | NULL |
| 9945825 | business_web | 192.168.1.25:49518 | business_db | Sleep | 43 | | NULL |
| 9946322 | business_web | 192.168.1.23:44721 | business_db | Sleep | 153 | | NULL |
| 9960167 | business_web | 192.168.3.28:2409 | business_db | Sleep | 93 | | NULL |
| 9964484 | business_web | 192.168.1.21:24280 | business_db | Sleep | 7 | | NULL |
| 9972499 | business_web | 192.168.3.28:35752 | business_db | Sleep | 13 | | NULL |
| 10000117 | business_web | 192.168.3.28:9149 | business_db | Sleep | 6 | | NULL |
| 10002523 | business_web | 192.168.3.29:42872 | business_db | Sleep | 6 | | NULL |
| 10007545 | business_web | 192.168.1.21:51379 | business_db | Sleep | 155 | | NULL |
--------------------------------------------------------------------------------------------------------------------------------------------------------
没有看到正在执行的慢SQL记录线程,再去查看innodb的事务表INNODB_TRX,看下里面是否有正在锁定的事务线程,看看ID是否在show full processlist里面的sleep线程中,如果是,就证明这个sleep的线程事务一直没有commit或者rollback而是卡住了,我们需要手动kill掉。
mysql> SELECT * FROM information_schema.INNODB_TRX;
*************************** 1. row ***************************
trx_id: 20866
trx_state: LOCK WAIT
trx_started: 2014-07-31 10:42:35
trx_requested_lock_id: 20866:617:3:3
trx_wait_started: 2014-07-30 10:42:35
trx_weight: 2
trx_mysql_thread_id: 9930577
trx_query: delete from dltask where id=1
trx_operation_state: starting index read
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 2
trx_lock_memory_bytes: 376
trx_rows_locked: 1
trx_rows_modified: 0
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_is_read_only: 0
trx_autocommit_non_locking: 0
3,看到有这条9930577的sql,kill掉,执行kill 9930577;
mysql> kill 9930577;
Query OK, 0 rows affected (0.00 sec)
mysql>
然后再去查询INNODB_TRX表,就没有阻塞的事务sleep线程存在了,如下所示:
mysql> SELECT * FROM INNODB_TRX;
Empty set (0.00 sec)
ERROR:
No query specified
mysql>
再去执行update语句,就能正常执行了,如下所示:
mysql> update order_info set province_id=15 ,city_id= 1667 where order_from=10 and order_out_sn='1407261241xxxx';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql>
4,总结分析
表数据量也不大,按照普通的情况来说,简单的update应该不会造成阻塞的,mysql都是autocommit,不会出现update卡住的情况,去查看下autocommit的值。
mysql> select @@autocommit;
-------------------------------------
| @@autocommit |
-------------------------------------
| 0 |
-------------------------------------
1 row in set (0.00 sec)
mysql>
看到亮闪闪的0,这个设置导致原来的update语句如果没有commit的话,你再重新执行update语句,就会等待锁定,当等待时间过长的时候,就会报ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction的错误。
所以赶紧commit刚才执行的update语句,之后 set global autocommit=1;