zoukankan      html  css  js  c++  java
  • gdb

    问题描述

    如下图,mysql_upgrade 过程中,执行 DROP DATABASE IF EXISTS performance_schema 一直在等待 metadata lock

    屏幕快照 2017-04-01 14.30.03.png

    问题排查

    简单粗暴的方法

    有一种简单的解决方法,把其他连接kill掉,释放 metadata lock

    对于这个案例,占用元数据锁的是 Id = 107768,User = xx1 的连接

    但是这种方法指标不治本,案例中占用元数据锁的连接,是一个agent服务建立的

    mysql_upgrade也是程序执行,不能每次都手工kill连接,需要查明为什么占用锁

    详细查明问题原因

    据业务方反馈,agent服务和调用mysql_upgrade的代码和5.6也在用,没有出现问题。

    怀疑是5.7引入的bug

    根据上述现象,显然是agent占了metadata lock,大概率不是mysql的bug

    为了说服业务方,我们继续排查是在等待什么锁

    查询 performance_schema.metadata_locks

    首先想到5.7的 performance_schema.metadata_locks ,很遗憾这张表里并没有记录

    screenshot.png

    gdb 获取元数据锁信息

    我们尝试使用 gdb 获取锁等待信息

    ps aux | grep 端口号,找出mysqld进程号 pid,pstack pid > stack.log

    在stack.log中搜索 acquire_lock(请求mdl锁的函数),可以看出是 thread 3 在请求元数据锁

    screenshot.png

    gdb -p pid
    thread 3
    切换到目标线程
    
    #0  0x0000003fe940ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x0000000000bd3fb2 in native_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
    ) 
    #2  my_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
    ) 
    #3  inline_mysql_cond_timedwait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
    ) 
    #4  MDL_wait::timed_wait (this=0x7eff640e05d8, owner=0x7eff640e0540, abs_timeout=0x7effa83b2ce0, set_status_on_timeout=Unhandled dwarf expression opcode 0xf3
    ) 
    #5  0x0000000000bd6048 in MDL_context::acquire_lock (this=0x7eff640e05d8, mdl_request=0x7eff640aa870, lock_wait_timeout=Unhandled dwarf expression opcode 0xf3
    ) 
    
    f 5
    跳转到  MDL_context::acquire_lock
    acquire_lock 函数参数中有 MDL_request
    MDL_request::MDL_key 中有详细的锁信息
    
    p mdl_request->key
    
    {m_length = 34, m_db_name_length = 18, m_ptr = "03performance_schema00global_status", '00' <repeats 353 times>, static m_namespace_to_wait_state_name = { {m_key = 0, 
      m_name = 0x130aa9b "Waiting for global read lock", m_flags = 0}, {m_key = 0, m_name = 0x130abb0 "Waiting for tablespace metadata lock", m_flags = 0}, {m_key = 0, 
      m_name = 0x130abd8 "Waiting for schema metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130ac00 "Waiting for table metadata lock", m_flags = 0}, {m_key = 0, 
      m_name = 0x130ac20 "Waiting for stored function metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130ac50 "Waiting for stored procedure metadata lock", m_flags = 0}, {m_key = 0, 
      m_name = 0x130ac80 "Waiting for trigger metadata lock", m_flags = 0}, {m_key = 0, m_name = 0x130aca8 "Waiting for event metadata lock", m_flags = 0}, {m_key = 0, 
      m_name = 0x130aab8 "Waiting for commit lock", m_flags = 0}, {m_key = 0, m_name = 0x130aad0 "User lock", m_flags = 0}, {m_key = 0, m_name = 0x130acc8 "Waiting for locking service lock", 
      m_flags = 0} } }
    

    上述信息可以看出,正在请求performance_schema.global_status这张表的锁

    排查业务代码

    和业务方确认,agent中确实执行了 “show global status” , 但是已经设置了autocommit

    简化逻辑后,agent代码如下

    import MySQLdb
    from time import sleep
    conn = MySQLdb.connect(host='47.93.49.119', port=3001, user='xx1')
    conn.autocommit = True
    cur=conn.cursor()
    cur.execute("show global status")
    while 1:
        sleep(1)
    

    代码中确实设置了autocommit,但是并没有生效(如果执行了commit,不可能不释放元数据锁)

    MySQLdb.connect 返回 Connection 类,根据上述代码,autocommit是 Connection的成员属性

    class Connection(_mysql.connection):
    

    Connection 继承自_mysql.connection,_mysql 是c语言实现的python库,查看_mysql.c

    static PyMethodDef _mysql_ConnectionObject_methods[] = {
        {
            "affected_rows",
            (PyCFunction)_mysql_ConnectionObject_affected_rows,
            METH_VARARGS,
            _mysql_ConnectionObject_affected_rows__doc__
        },
        {
            "autocommit",
            (PyCFunction)_mysql_ConnectionObject_autocommit,
            METH_VARARGS,
            _mysql_ConnectionObject_autocommit__doc__
        },
        {
            "commit",
            (PyCFunction)_mysql_ConnectionObject_commit,
            METH_VARARGS,
            _mysql_ConnectionObject_commit__doc__
        },
    

    autommit 并不是成员属性,而是一个成员方法

    结论

    conn.autocommit = True 强行将 autocommit 的函数指针赋值为 True,并没有真正设置autocommit

    5.6中没有发现这个问题

    一是 agent 中只有查询语句,不设autocommit也能返回查询结果

    二是 5.6中 “show global status” 查询的是 information_shcema,5.7中是performance_schema,5.6中不会影响 drop database performance_schema

    ###########################################################

    MDL(Meta Data LocK)的作用

    在MySQL5.1及之前的版本中,如果有未提交的事务trx,当执行DROP/RENAME/ALTER TABLE RENAME操作时,不会被其他事务阻塞住。这会导致如下问题(MySQL bug#989)

    master: 未提交的事务,但SQL已经完成(binlog也准备好了),表schema发生更改,在commit的时候不会被察觉到.

    slave: 在binlog里是以事务提交顺序记录的,DDL隐式提交,因此在备库先执行DDL,后执行事务trx,由于trx作用的表已经发生了改变,因此trx会执行失败。 在DDL时的主库DML压力越大,这个问题触发的可能性就越高

    在5.5引入了MDL(meta data lock)锁来解决在这个问题

    MDL锁的类型

    metadata lock也是一种锁。每个metadata lock都会定义锁住的对象,锁的持有时间和锁的类型

    属性范围作用
    GLOBAL 全局锁 主要作用是防止DDL和写操作的过程中执行 set golbal_read_only =on 或flush tables with read lock;
    commit 提交保护锁 主要作用是执行flush tables with read lock后,防止已经开始在执行的写事务提交
    SCHEMA 库锁 对象
    TABLE 表锁 对象
    FUNCTION 函数锁 对象
    PROCEDURE 存储过程锁 对象
    TRIGGER 触发器锁 对象
    EVENT 事件锁 对象

    这些锁具有以下层级关系 MDL_SCOPE.png

    MDL锁的简单示例

    在实际工作中,最常见的MDL冲突就DDL的操作被没用提交的事务所阻塞。 我们下面通过一个具体的实例来演示DDL加MDL锁的过程。在这个实例中,利用gdb来跟踪DDL申请MDL锁的过程。

    会话1:

    mysql> create table ti(id int primary key, c1 int, key(c1)) engine=InnoDB  
    stats_auto_recalc=default;
    Query OK, 0 rows affected (0.03 sec)
    
    mysql> insert into ti values (1,1), (2,2);
    Query OK, 2 rows affected (0.03 sec)
    Records: 2  Duplicates: 0  Warnings: 0
    
    mysql> start transaction;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select * from ti;
    +----+------+
    | id | c1   |
    +----+------+
    |  1 |    1 |
    |  2 |    2 |
    +----+------+
    2 rows in set (0.00 sec)
    
    
    

    再开启第二个会话,利用gdb来跟踪mysql加MDL的过程 会话2:

    [root@localhost mysql]# ps -ef|grep mysql
    root      3336  2390  0 06:33 pts/2    00:00:01 /u02/mysql/bin/mysqld --basedir=/u02/mysql/ --datadir=/u02/mysql/data 
    --plugin-dir=/u02/mysql//lib/plugin --user=root 
    --log-error=/u02/mysql/tmp/error1.log --open-files-limit=10240 
    --pid-file=/u02/mysql/tmp/mysql.pid 
    --socket=/u02/mysql/tmp/mysql.sock --port=3306
    
    [root@localhost mysql]# gdb -p 3336
    ----在GDB设置以下断点
    (gdb) b MDL_context::acquire_lock
    Breakpoint 1 at 0x730cab: file /u02/mysql-server-5.6/sql/mdl.cc, line 2187.
    (gdb) b lock_rec_lock
    Breakpoint 2 at 0xb5ef50: file /u02/mysql-server-5.6/storage/innobase/lock/lock0lock.cc, line 2296.
    
    (gdb) c
    Continuing.....
    

    开启第三个会话

    mysql> alter table ti stats_auto_recalc=1;
    这个操作被hang住
    

    在会话2中执行下面的操作

    (gdb) p mdl_request
    $1 = (MDL_request *) 0x7f697d1c3bd0
    (gdb) p *mdl_request
    $2 = {
    type = MDL_INTENTION_EXCLUSIVE, duration = MDL_STATEMENT, next_in_list = 0x7f697002a560, prev_in_list = 0x7f697d1c3df8, ticket = 0x0, key = {m_length = 3, m_db_name_length = 0,
        m_ptr = '00' <repeats 20 times>, "0|02p0000010060<34}i1770000>240344000000000000	00pi177000000	00pi1770000`>34}i1770000V3123440000000000240>34}i177000033336125400b010000a?0001", '00' <repeats 20 times>, "0|02p00000100220<34}i1770000>240344000000000034023602pi17700003333612540000000000a?0001", '00' <repeats 12 times>"340, >34}i177000060|02p000001003506222003000000003333612540000000000$226363", '00' <repeats 14 times>,
    "?34}i177000060|02p0000010000=34}i1770000>240344000000000000"...,
    static m_namespace_to_wait_state_name = {
    {m_key = 101,
            m_name = 0xf125a2 "Waiting for global read lock", m_flags = 0}, 
    {m_key = 102, 
    	m_name = 0xf125c0 "Waiting for schema metadata lock", m_flags = 0}, 
    {m_key = 103,
            m_name = 0xf125e8 "Waiting for table metadata lock", m_flags = 0}, 
    {m_key = 104, 
    	m_name = 0xf12608 "Waiting for stored function metadata lock", m_flags = 0}, 
    {m_key = 105,
            m_name = 0xf12638 "Waiting for stored procedure metadata lock", m_flags = 0}, 
    {m_key = 106, 
    	m_name = 0xf12668 "Waiting for trigger metadata lock", m_flags = 0}, 
    {m_key = 107,
            m_name = 0xf12690 "Waiting for event metadata lock", m_flags = 0}, 
    {m_key = 108, 
    	m_name = 0xf126b0 "Waiting for commit lock", m_flags = 0}}}}
    (gdb)
    

    从上面的输出中,我只能看到申请了一个语句级别的MDL_INTENTION_EXCLUSIVE。并没有看到什么其他有意义的信息。我们继续gdb跟踪

    (gdb) p *(mdl_request->next_in_list)
    $3 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_TRANSACTION, next_in_list = 0x7f697002a388, prev_in_list = 0x7f697d1c3bd8, ticket = 0x0, key = {m_length = 7, m_db_name_length = 4,
        m_ptr = "01test0000217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217217", 
    static m_namespace_to_wait_state_name = {
    {m_key = 101,
            m_name = 0xf125a2 "Waiting for global read lock", m_flags = 0}, 
    {m_key = 102, 
    	m_name = 0xf125c0 "Waiting for schema metadata lock", m_flags = 0}, 
    {m_key = 103,
            m_name = 0xf125e8 "Waiting for table metadata lock", m_flags = 0}, 
    {m_key = 104, 
    	m_name = 0xf12608 "Waiting for stored function metadata lock", m_flags = 0}, 
    {m_key = 105,
            m_name = 0xf12638 "Waiting for stored procedure metadata lock", m_flags = 0}, 
    {m_key = 106, 
    	m_name = 0xf12668 "Waiting for trigger metadata lock", m_flags = 0}, 
    {m_key = 107,
            m_name = 0xf12690 "Waiting for event metadata lock", m_flags = 0}, 
    {m_key = 108, 
    	m_name = 0xf126b0 "Waiting for commit lock", m_flags = 0}}}}        
    

    从上面的输出中,我们看到了需要在test(见输出中的 m_ptr = “01test)数据库上加一把事务级的MDL_INTENTION_EXCLUSIVE锁。它并没有告诉我们最终的MDL会落在哪个对象上。我们继续跟踪

    $4 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION, next_in_list = 0x0, prev_in_list = 0x7f697002a568, ticket = 0x0, key = {m_length = 9, m_db_name_length = 4,
        m_ptr = "02test00ti", '00' <repeats 378 times>, 
    static m_namespace_to_wait_state_name = {
    {m_key = 101, 
    	m_name = 0xf125a2 "Waiting for global read lock", m_flags = 0}, 
    {m_key = 102,
    	m_name = 0xf125c0 "Waiting for schema metadata lock", m_flags = 0}, 
    {m_key = 103, 
    	m_name = 0xf125e8 "Waiting for table metadata lock", m_flags = 0}, 
    {m_key = 104,
            m_name = 0xf12608 "Waiting for stored function metadata lock", m_flags = 0}, 
    {m_key = 105, 
    	m_name = 0xf12638 "Waiting for stored procedure metadata lock", m_flags = 0}, 
    {m_key = 106,
            m_name = 0xf12668 "Waiting for trigger metadata lock", m_flags = 0}, 
    {m_key = 107, 
    	m_name = 0xf12690 "Waiting for event metadata lock", m_flags = 0}, 
    {m_key = 108,
            m_name = 0xf126b0 "Waiting for commit lock", m_flags = 0}}}}
    

    从上面的输出中,我们可以看出最终是要在test数据库的ti对象上加一把MDL_SHARED_UPGRADABLE锁。在做DDL时会先加MDL_SHARED_UPGRADABLE锁,然后升级到MDL_EXCLUSIVE锁

    我来执行下面的过程 会话1

    mysql> commit;
    Query OK, 0 rows affected (5.51 sec)
    
    

    会话2

    (gdb) p *mdl_request
    $5 = {type = MDL_EXCLUSIVE, duration = MDL_TRANSACTION, next_in_list = 0x20302000000, prev_in_list = 0x200000001, ticket = 0x0, key = {m_length = 9, m_db_name_length = 4,
        m_ptr = "02test00ti00000000@3122003000000003333612540000000000260<34}i17700003023622540000000000300<34}i177000060|02pi1770000320<34}i1770000360236344000000000000	00pi1770000(}02pi1770000360<34}i17700002343123440000000000H24502pi17700003333612540000000000233600001", '00' <repeats 12 times>, "`S05pi177000060|02p0000010060=34}i1770000>240344000000000000	00pi177000000	00pi1770000200=34}i17700002313103440000000000240=34}i1770000l-d0t0000H34400010000000023360000100000000226"..., 
    static m_namespace_to_wait_state_name = {
    {m_key = 101, 
    	m_name = 0xf125a2 "Waiting for global read lock", m_flags = 0}, 
    {m_key = 102, 
    	m_name = 0xf125c0 "Waiting for schema metadata lock", m_flags = 0}, 
    {m_key = 103,
            m_name = 0xf125e8 "Waiting for table metadata lock", m_flags = 0}, 
    {m_key = 104, 
    	m_name = 0xf12608 "Waiting for stored function metadata lock", m_flags = 0}, 
    {m_key = 105,
            m_name = 0xf12638 "Waiting for stored procedure metadata lock", m_flags = 0}, 
    {m_key = 106, 
    	m_name = 0xf12668 "Waiting for trigger metadata lock", m_flags = 0}, 
    {m_key = 107,
            m_name = 0xf12690 "Waiting for event metadata lock", m_flags = 0}, 
    {m_key = 108, 
    	m_name = 0xf126b0 "Waiting for commit lock", m_flags = 0}}}}       
    

    从上面的输出中,我们看到了最终是在test.ti上申请了事务级别的MDL_EXCLUSIVE锁。

    会话3

    mysql> alter table ti stats_auto_recalc=1;
    Query OK, 0 rows affected (22 min 58.99 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    

    小结

    本例只是简单的演示了,在同一个事务的不同时期加的不同的MDL的锁。MYSQL中DDL的操作不属于事务操作的范围。这就给mysql主备基于语句级别同步带来了困难。mysql主备在同步的过程中,为了保证主备结构一致性,而引入了MDL机制。为了尽可能的降低MDL带来的影响。请在业务低谷的时候,执行DDL操作。

  • 相关阅读:
    jmeter如何操作数据库
    jmeter压力测试
    cmd中用ping命令时,提示ping命令不是外部或内部命令问题
    scrapy post Request payload类型值
    scrapy-deltafetch实现增量爬取
    django虚拟环境搭建笔记
    python Image模块基本语法
    登录北京住房公积金,使用已注册过账号
    登录北京社保网站
    python通过pop3方式登录邮箱(qq,新浪,网易)
  • 原文地址:https://www.cnblogs.com/igoodful/p/12152507.html
Copyright © 2011-2022 走看看