  • MySQL redo lock deadlock: investigation and resolution

    Copyright notice: this article was originally written by 张青林 (Zhang Qinglin); please credit the source when republishing.

    Original article: https://www.qcloud.com/community/article/181

    Source: 腾云阁 (Tencent Cloud Community) https://www.qcloud.com/community

    On Monday morning I checked in with my colleagues about last week's testing and was told that, under a multi-instance workload, MySQL Server hung and testing could not proceed. The stock build did not have this problem; it appeared only in our new build. My heart sank. Fortunately the live environment was still intact, which gave us a good setting for debugging, so I plunged straight into the investigation... The problem instance showed the following symptoms:

    • The problem appeared at a concurrency level of 384;
    • The MySQL server could not execute transactional statements; even simple SELECT statements failed;
    • All threads were stuck waiting and could not be KILLed.

    Collecting the on-site evidence

    • First, dump the stacks of the problem instance with pstack, so that individual threads can be located later:
      pstack <pid-of-hung-mysqld> > 20002.info
    • Then aggregate the thread stacks in 20002.info with pt-pmp:
      [root@TENCENT64 ~]# pt-pmp 20002.info
      305 select(libc.so.6),os_thread_sleep(os0thread.cc:261),srv_conc_enter_innodb_with_atomics(srv0conc.cc:276),srv_conc_enter_innodb(srv0conc.cc:276),innobase_srv_conc_enter_innodb(ha_innodb.cc:1173),ha_innobase::index_read(ha_innodb.cc:1173),handler::index_read_idx_map(handler.cc:6846),handler::ha_index_read_idx_map(handler.cc:2787),join_read_(handler.cc:2787),join_read__table(handler.cc:2787),make_join_statistics(sql_optimizer.cc:3592),JOIN::optimize(sql_optimizer.cc:363),mysql_execute_select(sql_select.cc:1087),mysql_select(sql_select.cc:1087),handle_select(sql_select.cc:110),execute_sqlcom_select(sql_parse.cc:5156),mysql_execute_command(sql_parse.cc:2656),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
       68 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),trx_prepare(trx0trx.cc:2206),trx_prepare_for_mysql(trx0trx.cc:2258),innobase_xa_prepare(ha_innodb.cc:13618),ha_prepare_low(handler.cc:2104),ha_commit_trans(handler.cc:1434),trans_commit(transaction.cc:228),mysql_execute_command(sql_parse.cc:4302),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
       61 select(libc.so.6),os_thread_sleep(os0thread.cc:261),srv_conc_enter_innodb_with_atomics(srv0conc.cc:276),srv_conc_enter_innodb(srv0conc.cc:276),innobase_srv_conc_enter_innodb(ha_innodb.cc:1173),ha_innobase::index_read(ha_innodb.cc:1173),ha_index_read_map(handler.cc:2752),handler::read_range_first(handler.cc:2752),handler::multi_range_read_next(handler.cc:5864),QUICK_RANGE_SELECT::get_next(opt_range.cc:10644),rr_quick(records.cc:369),mysql_update(sql_update.cc:742),mysql_execute_command(sql_parse.cc:3336),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
       14 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),trx_undo_assign_undo(trx0undo.cc:1813),trx_undo_report_row_operation(trx0rec.cc:1282),btr_cur_upd_lock_and_undo(btr0cur.cc:1710),btr_cur_update_in_place(btr0cur.cc:1710),btr_cur_optimistic_update(btr0cur.cc:2170),row_upd_clust_rec(row0upd.cc:2132),row_upd_clust_step(row0upd.cc:2466),row_upd(row0upd.cc:2521),row_upd_step(row0upd.cc:2521),row_update_for_mysql(row0mysql.cc:1779),ha_innobase::update_row(ha_innodb.cc:7091),handler::ha_update_row(handler.cc:7305),mysql_update(sql_update.cc:819),mysql_execute_command(sql_parse.cc:3336),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
       10 __io_getevents_0_4(libaio.so.1),os_aio_linux_collect(os0file.cc:4975),os_aio_linux_handle(os0file.cc:4975),fil_aio_wait(fil0fil.cc:5796),io_handler_thread(srv0start.cc:492),start_thread(libpthread.so.0),clone(libc.so.6)
        5 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),block_until_new_connection(mysql_thread.h:1162),one_thread_per_connection_end(mysql_thread.h:1162),do_handle_one_connection(sql_connect.cc:997),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
        4 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),btr_pcur_commit_specify_mtr(btr0pcur.ic:386),ibuf_btr_pcur_commit_specify_mtr(btr0pcur.ic:386),ibuf_merge_or_delete_for_page(btr0pcur.ic:386),buf_page_io_complete(buf0buf.cc:4313),fil_aio_wait(fil0fil.cc:5832),io_handler_thread(srv0start.cc:492),start_thread(libpthread.so.0),clone(libc.so.6)
        3 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),btr_pcur_commit_specify_mtr(btr0pcur.ic:386),ibuf_btr_pcur_commit_specify_mtr(btr0pcur.ic:386),ibuf_delete_rec(btr0pcur.ic:386),ibuf_merge_or_delete_for_page(ibuf0ibuf.cc:4884),buf_page_io_complete(buf0buf.cc:4313),fil_aio_wait(fil0fil.cc:5832),io_handler_thread(srv0start.cc:492),start_thread(libpthread.so.0),clone(libc.so.6)
        3 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),srv_worker_thread(srv0srv.cc:2499),start_thread(libpthread.so.0),clone(libc.so.6)
        1 sigwait(libpthread.so.0),signal_hand(mysqld.cc:3217),start_thread(libpthread.so.0),clone(libc.so.6)
        1 select(libc.so.6),os_thread_sleep(os0thread.cc:261),srv_conc_enter_innodb_with_atomics(srv0conc.cc:276),srv_conc_enter_innodb(srv0conc.cc:276),innobase_srv_conc_enter_innodb(ha_innodb.cc:1173),ha_innobase::index_read(ha_innodb.cc:1173),ha_innobase::index_first(ha_innodb.cc:7883),handler::ha_index_first(handler.cc:2861),join_read_first(sql_executor.cc:2539),sub_select(sql_executor.cc:1259),do_select(sql_executor.cc:936),JOIN::exec(sql_executor.cc:936),mysql_execute_select(sql_select.cc:1101),mysql_select(sql_select.cc:1101),handle_select(sql_select.cc:110),execute_sqlcom_select(sql_parse.cc:5156),mysql_execute_command(sql_parse.cc:2656),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),rw_lock_s_lock_spin(sync0rw.cc:429),rw_lock_s_lock_func(sync0rw.ic:369),pfs_rw_lock_s_lock_func(sync0rw.ic:369),buf_page_get_gen(sync0rw.ic:369),trx_undo_page_get(trx0undo.ic:170),trx_undo_get_first_rec(trx0undo.ic:170),trx_purge_read_undo_rec(trx0purge.cc:741),trx_purge_choose_next_log(trx0purge.cc:782),trx_purge_get_next_rec(trx0purge.cc:888),trx_purge_fetch_next_rec(trx0purge.cc:960),trx_purge_attach_undo_recs(trx0purge.cc:960),trx_purge(trx0purge.cc:960),srv_do_purge(srv0srv.cc:2602),srv_purge_coordinator_thread(srv0srv.cc:2602),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),rw_lock_s_lock_spin(sync0rw.cc:429),rw_lock_s_lock_func(sync0rw.ic:369),pfs_rw_lock_s_lock_func(sync0rw.ic:369),buf_page_get_gen(sync0rw.ic:369),btr_block_get_func(btr0btr.ic:60),btr_cur_latch_leaves(btr0btr.ic:60),btr_cur_open_at_rnd_pos_func(btr0cur.cc:1047),btr_pcur_open_at_rnd_pos_func(btr0pcur.ic:548),ibuf_merge_pages(btr0pcur.ic:548),ibuf_merge(ibuf0ibuf.cc:2759),ibuf_contract_in_background(ibuf0ibuf.cc:2759),srv_master_do_active_tasks(srv0srv.cc:2116),srv_master_thread(srv0srv.cc:2116),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),mtr_add_dirtied_pages_to_flush_list(sync0sync.ic:218),mtr_log_reserve_and_write(mtr0mtr.cc:270),mtr_commit(mtr0mtr.cc:270),row_upd_sec_index_entry(row0upd.cc:1819),row_upd_sec_step(row0upd.cc:1861),row_upd(row0upd.cc:1861),row_upd_step(row0upd.cc:1861),row_update_for_mysql(row0mysql.cc:1779),ha_innobase::update_row(ha_innodb.cc:7091),handler::ha_update_row(handler.cc:7305),mysql_update(sql_update.cc:819),mysql_execute_command(sql_parse.cc:3336),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_write_up_to(sync0sync.ic:218),log_write_up_to(log0log.cc:1651),buf_flush_write_block_low(buf0flu.cc:919),buf_flush_page(buf0flu.cc:919),buf_flush_try_neighbors(buf0flu.cc:1276),buf_flush_page_and_try_neighbors(buf0flu.cc:1350),buf_do_flush_list_batch(buf0flu.cc:1350),buf_flush_batch(buf0flu.cc:1700),buf_flush_list(buf0flu.cc:1700),page_cleaner_do_flush_batch(buf0flu.cc:2166),page_cleaner_flush_pages_if_needed(buf0flu.cc:2166),buf_flush_page_cleaner_thread(buf0flu.cc:2166),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),row_ins_sec_index_entry_low(row0ins.cc:2779),row_ins_sec_index_entry(row0ins.cc:2930),row_upd_sec_index_entry(row0upd.cc:1834),row_upd_sec_step(row0upd.cc:1861),row_upd(row0upd.cc:1861),row_upd_step(row0upd.cc:1861),row_update_for_mysql(row0mysql.cc:1779),ha_innobase::update_row(ha_innodb.cc:7091),handler::ha_update_row(handler.cc:7305),mysql_update(sql_update.cc:819),mysql_execute_command(sql_parse.cc:3336),mysql_parse(sql_parse.cc:6408),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),ibuf_mtr_commit(ibuf0ibuf.ic:58),ibuf_merge_or_delete_for_page(ibuf0ibuf.ic:58),buf_page_io_complete(buf0buf.cc:4313),fil_aio_wait(fil0fil.cc:5832),io_handler_thread(srv0start.cc:492),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_get_lsn(sync0sync.ic:218),srv_error_monitor_thread(sync0sync.ic:218),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),buf_dump_thread(buf0dump.cc:594),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),srv_monitor_thread(srv0srv.cc:1551),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),lock_wait_timeout_thread(lock0wait.cc:503),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),ib_wqueue_timedwait(ut0wqueue.cc:154),fts_optimize_thread(fts0opt.cc:3000),start_thread(libpthread.so.0),clone(libc.so.6)
        1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),dict_stats_thread(dict0stats_bg.cc:349),start_thread(libpthread.so.0),clone(libc.so.6)
        1 poll(libc.so.6),vio_io_wait(viosocket.c:771),vio_socket_io_wait(viosocket.c:68),vio_read(viosocket.c:123),net_read_raw_loop(net_serv.cc:669),net_read_packet_header(net_serv.cc:751),net_read_packet(net_serv.cc:751),my_net_read(net_serv.cc:894),do_command(sql_parse.cc:970),do_handle_one_connection(sql_connect.cc:990),handle_one_connection(sql_connect.cc:906),start_thread(libpthread.so.0),clone(libc.so.6)
        1 poll(libc.so.6),handle_connections_sockets(mysqld.cc:6320),mysqld_main(mysqld.cc:5730),__libc_start_main(libc.so.6),_start
      

    Problem analysis

    The stacks fall into a few categories of threads:

    • User threads waiting to enter the InnoDB engine layer. The test environment sets innodb_thread_concurrency=16; when the number of active threads inside InnoDB exceeds this value, new arrivals must queue, hence the large number of queued threads. The effects of this parameter would make a good article in themselves; for space reasons they are not covered here, see the official documentation;
    • Background threads that need to write redo log, mainly the page cleaner thread and the async I/O threads;
    • The purge thread reading pages, and the master thread operating on the change buffer;
    • A large number of user threads that need to write redo log.

    From this classification it is clear that every thread that needs to write redo log is waiting on log_sys->mutex. So which thread is holding this mutex that protects the redo log buffer? Following that lead, we need to answer three questions:

    • Question 1: which thread holds log_sys->mutex?
    • Question 2: why did the thread holding log_sys->mutex stop making progress; is it waiting on another lock, or is something else going on?
    • Question 3: assuming no hardware fault, what does the whole resource-contention sequence look like?

    Question 1: from the surface inward

    In looking for the owner of log_sys->mutex, two observations help locate the thread quickly:

    • Since log_sys->mutex can be held by only one thread at a time, any stack that pt-pmp reports for more than one thread can be ruled out;
    • Since the thread already holds log_sys->mutex, it should still be in the middle of writing log, so the logging code paths are the place to look, i.e. stacks containing mtr_log_reserve_and_write or log_write_up_to.

    Following this reasoning, the thread below quickly turned up in the pstack output:

    Thread 446 (Thread 0x7fbea736d700 (LWP 32591)):
    #0  0x00007fd1b72c8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00000000009f47f1 in os_cond_wait (fast_mutex=0x7defd680, cond=0x7defd6c0) at ../storage/innobase/os/os0sync.cc:214
    #2  os_event_wait_low (event=event@entry=0x7defd680, reset_sig_count=<optimized out>) at ../storage/innobase/os/os0sync.cc:592
    #3  0x0000000000a5ca1e in sync_array_wait_event (arr=0x340b850, index=1) at ../storage/innobase/sync/sync0arr.cc:424
    #4  0x0000000000a5e750 in mutex_spin_wait (mutex=mutex@entry=0x7defd410, file_name=file_name@entry=0xd2d348 "../storage/innobase/mtr/mtr0mtr.cc", line=line@entry=215) at ../storage/innobase/sync/sync0sync.cc:580
    #5  0x00000000009ebc27 in mutex_enter_func (line=215, file_name=0xd2d348 "../storage/innobase/mtr/mtr0mtr.cc", mutex=0x7defd410) at ../storage/innobase/include/sync0sync.ic:218
    #6  pfs_mutex_enter_func (line=215, file_name=0xd2d348 "../storage/innobase/mtr/mtr0mtr.cc", mutex=0x7defd410) at ../storage/innobase/include/sync0sync.ic:250
    #7  mtr_add_dirtied_pages_to_flush_list (mtr=0x7fbea736ae10) at ../storage/innobase/mtr/mtr0mtr.cc:215
    #8  0x00000000009ec34c in mtr_log_reserve_and_write (mtr=<optimized out>) at ../storage/innobase/mtr/mtr0mtr.cc:270
    #9  mtr_commit (mtr=mtr@entry=0x7fbea736ae10) at ../storage/innobase/mtr/mtr0mtr.cc:324
    #10 0x0000000000a4a477 in row_upd_sec_index_entry (node=node@entry=0x7fbb9c01cd18, thr=thr@entry=0x7fbb9c01d018) at ../storage/innobase/row/row0upd.cc:1819
    #11 0x0000000000a4e73c in row_upd_sec_step (thr=<optimized out>, node=0x7fbb9c01cd18) at ../storage/innobase/row/row0upd.cc:1861
    #12 row_upd (thr=0x7fbb9c01d018, node=0x7fbb9c01cd18) at ../storage/innobase/row/row0upd.cc:2556
    #13 row_upd_step (thr=thr@entry=0x7fbb9c01d018) at ../storage/innobase/row/row0upd.cc:2673
    #14 0x0000000000a339c8 in row_update_for_mysql (mysql_rec=mysql_rec@entry=0x7fbb9c017528 "377366
    401350e/016571626760676467656363-46485490867-60617341003-10452679142-20155412750-96257700347-89668812213-35633691718-21407253672-69113394932", ' ' <repeats 72 times>..., prebuilt=0x7fbb9c01c568) at ../storage/innobase/row/row0mysql.cc:1779
    #15 0x00000000009ac99e in ha_innobase::update_row (this=0x7fbb9c016f10, old_row=0x7fbb9c017528 "377366
    401350e/016571626760676467656363-46485490867-60617341003-10452679142-20155412750-96257700347-89668812213-35633691718-21407253672-69113394932", ' ' <repeats 72 times>..., new_row=0x7fbb9c017300 "377366
    401351e/016571626760676467656363-46485490867-60617341003-10452679142-20155412750-96257700347-89668812213-35633691718-21407253672-69113394932", ' ' <repeats 72 times>...) at ../storage/innobase/handler/ha_innodb.cc:7091
    #16 0x00000000005b4332 in handler::ha_update_row (this=0x7fbb9c016f10, old_data=0x7fbb9c017528 "377366
    401350e/016571626760676467656363-46485490867-60617341003-10452679142-20155412750-96257700347-89668812213-35633691718-21407253672-69113394932", ' ' <repeats 72 times>..., new_data=0x7fbb9c017300 "377366
    401351e/016571626760676467656363-46485490867-60617341003-10452679142-20155412750-96257700347-89668812213-35633691718-21407253672-69113394932", ' ' <repeats 72 times>...) at ../sql/handler.cc:7305
    #17 0x000000000076b145 in mysql_update (thd=thd@entry=0x7e604470, table_list=<optimized out>, fields=..., values=..., conds=0x7fbe2800aca0, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=found_return@entry=0x7fbea736bac0, updated_return=updated_return@entry=0x7fbea736c020) at ../sql/sql_update.cc:819
    #18 0x00000000006fc2dd in mysql_execute_command (thd=thd@entry=0x7e604470) at ../sql/sql_parse.cc:3336
    #19 0x0000000000700618 in mysql_parse (thd=thd@entry=0x7e604470, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fbea736c770)
    ...
    

    At this point a brief overview of how MySQL writes redo log is useful (omitting the undo and buffer pool parts). When data is modified, MySQL first writes redo records whose format depends on the operation type. The main steps are:

    • Record the data before the operation; different operation types produce different redo record types, see the file src/storage/innobase/include/mtr0mtr.h;
    • Record the data after the operation; the contents again depend on the type, see the function recv_parse_or_apply_log_rec_body();
    • Write the records into the redo log buffer and add the pages dirtied by this mtr to the flush list of the buffer pool;
    • At commit time, decide whether to sync the log according to the value of innodb_flush_log_at_trx_commit (a minimal model of this decision follows the list).
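
    The last step is worth spelling out. Below is a minimal, self-contained model (plain C++, not the actual InnoDB code; the function name is invented for illustration) of the commonly documented semantics of innodb_flush_log_at_trx_commit at COMMIT:

    // Minimal model of the commit-time log-flush decision; illustrative
    // only, not InnoDB source.
    void commit_log_action(int flush_log_at_trx_commit)
    {
            switch (flush_log_at_trx_commit) {
            case 0:
                    /* Do nothing at commit; the log buffer is written out
                    and synced roughly once per second in the background. */
                    break;
            case 1:
                    /* Write the log buffer to the redo log file and
                    fsync() it, making the commit durable (the default). */
                    break;
            case 2:
                    /* Write the log buffer to the file (OS page cache) but
                    skip the fsync(); an OS crash may lose roughly the last
                    second of commits. */
                    break;
            }
    }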

    The Thread 446 stack shown earlier hung exactly in the step that adds dirty pages to the flush list after writing redo: the thread had acquired log_sys->mutex and then hung while acquiring log_sys->log_flush_order_mutex, while a large number of threads were waiting for it to release log_sys->mutex. Question 1 now has its answer. So what exactly is log_sys->log_flush_order_mutex, and which thread is holding it?

    Note: the MySQL buffer pool maintains a dirty-page list ordered by LSN (the flush list), so that checkpointing and log_free_check can quickly determine how far the redo log can be advanced. Inserting a dirty page into the flush list must be done under a lock to preserve the LSN ordering of the list. If log_sys->mutex itself were used for this, it would suffer heavy contention under high concurrency and hurt performance, so a separate mutex was added to protect the LSN ordering of dirty pages. The code comment explains it (a minimal model of the handoff follows the comment):

            ib_mutex_t     log_flush_order_mutex;/*!< mutex to serialize access to
                                            the flush list when we are putting
                                            dirty blocks in the list. The idea
                                            behind this mutex is to be able
                                            to release log_sys->mutex during
                                            mtr_commit and still ensure that
                                            insertions in the flush_list happen
                                            in the LSN order. */
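
    To make the handoff concrete, here is a minimal, runnable model (plain C++ with std::mutex, not InnoDB code; all names are illustrative) of the idea in the comment above: the LSN is assigned under the log mutex, the flush-order mutex is taken before the log mutex is released, and the flush-list insertion then happens outside the log mutex while still preserving LSN order:

    #include <list>
    #include <mutex>

    std::mutex log_mutex;                 // models log_sys->mutex
    std::mutex flush_order_mutex;         // models log_sys->log_flush_order_mutex
    std::list<unsigned long> flush_list;  // dirty pages, kept in LSN order
    unsigned long system_lsn = 0;

    void mtr_commit_model(bool made_dirty)
    {
            std::unique_lock<std::mutex> log_lock(log_mutex);
            unsigned long lsn = ++system_lsn;  // "write redo", assign the LSN

            // Take the flush-order mutex BEFORE releasing the log mutex, so
            // no later mtr can insert its (larger) LSN ahead of ours.
            if (made_dirty)
                    flush_order_mutex.lock();

            log_lock.unlock();  // other mtrs may now write redo concurrently

            if (made_dirty) {
                    flush_list.push_back(lsn);  // insertion stays in LSN order
                    flush_order_mutex.unlock();
            }
    }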
    

    Question 2: out of ammunition

    The investigation of Question 1 identified the thread owning log_sys->mutex: it hung while acquiring log_sys->log_flush_order_mutex. The thread stacks can therefore be grouped as follows:

    • Thread 446: holds log_sys->mutex and is waiting for log_sys->log_flush_order_mutex so it can add dirty pages to the flush list of the buffer pool;
    • Threads that need log_sys->mutex in order to write log or to read log state;
    • An unknown thread that holds log_sys->log_flush_order_mutex and hung while doing something else.

    The key, then, is to find the thread holding log_sys->log_flush_order_mutex.

    To find that thread we did the following:

    • Search for every call site that acquires log_sys->log_flush_order_mutex:

      [root@TENCENT64 /data/workplace/CDB]# grep log_flush_order_mutex_enter storage/innobase/* -r --color
      storage/innobase/buf/buf0buf.cc:        log_flush_order_mutex_enter();
      storage/innobase/include/log0log.h:#define log_flush_order_mutex_enter() do {           
      storage/innobase/log/log0recv.cc:               log_flush_order_mutex_enter();
      storage/innobase/mtr/mtr0mtr.cc:                log_flush_order_mutex_enter();
      
    • Cross-check those call sites against the threads in the pstack output; a careful reading of the related code showed that essentially no thread was holding log_sys->log_flush_order_mutex;

    • Attach gdb to the MySQL server and print log_sys->log_flush_order_mutex: the result was {waiters = 1, lock_word = 0}!!! In other words, Thread 446 is waiting on a mutex that is actually free, yet the mutex does record a waiter (a sample gdb session follows the definition below). Because ours was a release build, much useful information was unavailable, and the problem is very hard to reproduce under a debug build. log_flush_order_mutex is defined as follows:
    /** InnoDB mutex */
    struct ib_mutex_t {
            os_event_t      event;  /*!< Used by sync0arr.cc for the wait queue */
            volatile lock_word_t    lock_word;      /*!< lock_word is the target  of the atomic test-and-set instruction when atomic operations are enabled. */
    
    #if !defined(HAVE_ATOMIC_BUILTINS)
            os_fast_mutex_t
            os_fast_mutex;  /*!< We use this OS mutex in place of lock_word when atomic operations are not enabled */
    #endif  
            ulint   waiters;        /*!< This ulint is set to 1 if there are (or may be) threads waiting in the global wait array for this mutex to be released. Otherwise, this is 0. */
            UT_LIST_NODE_T(ib_mutex_t)      list; /*!< All allocated mutexes are put into a list. Pointers to the next and prev. */
    
    #ifdef UNIV_SYNC_DEBUG
            const char*     file_name;      /*!< File where the mutex was locked */
            ulint   line;           /*!< Line where the mutex was locked */
            ulint   level;          /*!< Level in the global latching order */
    #endif /* UNIV_SYNC_DEBUG */
            const char*     cfile_name;/*!< File name where mutex created */
            ulint           cline;  /*!< Line where created */
            ulong           count_os_wait;  /*!< count of os_wait */
    #ifdef UNIV_DEBUG
    
    /** Value of mutex_t::magic_n */
    # define MUTEX_MAGIC_N  979585UL
    
            os_thread_id_t thread_id; /*!< The thread id of the thread which locked the mutex. */
            ulint           magic_n;        /*!< MUTEX_MAGIC_N */
            const char*     cmutex_name;    /*!< mutex name */
            ulint           ib_mutex_type;  /*!< 0=usual mutex, 1=rw_lock mutex */
    
    #endif /* UNIV_DEBUG */
    
    #ifdef UNIV_PFS_MUTEX
            struct PSI_mutex* pfs_psi;      /*!< The performance schema instrumentation hook */
    #endif
    };
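
    For reference, the gdb inspection mentioned above boils down to a session along these lines (a hypothetical transcript; the values shown are the ones we observed, and addresses and formatting will differ elsewhere):

    gdb -p <pid-of-hung-mysqld>
    (gdb) p log_sys->log_flush_order_mutex.waiters
    $1 = 1
    (gdb) p log_sys->log_flush_order_mutex.lock_word
    $2 = 0 '\000'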
    

    This analysis gives the answer to Question 2:

    • Only two threads are related to log_sys->log_flush_order_mutex: one is Thread 446, the other is whichever thread most recently called log_flush_order_mutex_exit();
    • Some existing thread failed to wake Thread 446 while releasing log_sys->log_flush_order_mutex, leaving Thread 446 hung, which in turn prevented every other thread from acquiring log_sys->mutex and made the instance unusable;
    • No thread currently holds log_sys->log_flush_order_mutex.

    Question 3: a way out of the dead end

    From Question 2 we know that no thread holds log_sys->log_flush_order_mutex, so why was Thread 446 never woken up? A lost signal, or a program bug? If it were a lost signal, why does the problem reproduce so reliably? The official bug list had nothing similar, and a search of the community turned up very little. The analysis seemed to have reached a dead end and the pressure quietly mounted... Still, every problem has a cause, and once the cause is found, a solution follows. So I turned back to the stack of Thread 446 and examined the function:

    /************************************************************//**
    Append the dirty pages to the flush list. */
    static
    void
    mtr_add_dirtied_pages_to_flush_list(
    /*================================*/
            mtr_t*  mtr)    /*!< in/out: mtr */
    {
            ut_ad(!srv_read_only_mode);
    
            /* No need to acquire log_flush_order_mutex if this mtr has
            not dirtied a clean page. log_flush_order_mutex is used to
            ensure ordered insertions in the flush_list. We need to
            insert in the flush_list iff the page in question was clean
            before modifications. */
            if (mtr->made_dirty) {
                    log_flush_order_mutex_enter();
            }
    
            /* It is now safe to release the log mutex because the
            flush_order mutex will ensure that we are the first one
            to insert into the flush list. */
            log_release();
    
            if (mtr->modifications) {
                    mtr_memo_note_modifications(mtr);
            }
    
            if (mtr->made_dirty) {
                    log_flush_order_mutex_exit();
            }
    }
    

    From the analysis of Question 2, some thread failed to wake Thread 446 on its way through log_flush_order_mutex_exit, so the next step was to follow that function and see how it wakes other threads. With no better lead, walking through the code step by step was the only option, hoping for some insight. Descending through the call chain, my attention settled on mutex_exit_func, whose comment stood out:

    /******************************************************************//**
    NOTE! Use the corresponding macro mutex_exit(), not directly this function!
    Unlocks a mutex owned by the current thread. */
    UNIV_INLINE
    void
    mutex_exit_func(
    /*============*/
            ib_mutex_t*     mutex)  /*!< in: pointer to mutex */
    {
            ut_ad(mutex_own(mutex));
    
            ut_d(mutex->thread_id = (os_thread_id_t) ULINT_UNDEFINED);
    
    #ifdef UNIV_SYNC_DEBUG
            sync_thread_reset_level(mutex);
    #endif
            mutex_reset_lock_word(mutex);
    
            /* A problem: we assume that mutex_reset_lock word
            is a memory barrier, that is when we read the waiters
            field next, the read must be serialized in memory
            after the reset. A speculative processor might
            perform the read first, which could leave a waiting
            thread hanging indefinitely.
    
            Our current solution call every second
            sync_arr_wake_threads_if_sema_free()
            to wake up possible hanging threads if
            they are missed in mutex_signal_object. */
    
            if (mutex_get_waiters(mutex) != 0) {
    
                    mutex_signal_object(mutex);
            }
    
    #ifdef UNIV_SYNC_PERF_STAT
            mutex_exit_count++;
    #endif
    }
    

    Two points emerge from this comment:

    • mutex_reset_lock_word is assumed to act as a memory barrier, but a speculative processor may in fact perform the read of the waiters field before the reset of lock_word; when the store and the load are reordered this way, a waiting thread can be left hanging indefinitely (a model of this race follows the list);
    • A dedicated function, sync_arr_wake_threads_if_sema_free(), is called every second precisely to wake up threads that were missed this way.
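
    The race is easy to model. Below is a minimal, self-contained sketch (plain C++ atomics, not InnoDB code; names are illustrative) of the release/wait protocol: each side performs a relaxed store followed by a relaxed load, and because a store may become visible after a subsequent load (x86 permits exactly this store-load reordering), the releaser can read waiters == 0 while the waiter reads lock_word == 1, so nobody is ever woken:

    #include <atomic>

    std::atomic<int> lock_word{1};  // 1 = locked, 0 = free
    std::atomic<int> waiters{0};    // 1 = a thread is (or may be) sleeping

    void mutex_exit_model()
    {
            // mutex_reset_lock_word(): free the mutex.
            lock_word.store(0, std::memory_order_relaxed);
            // The bug described above: without a full barrier here, the
            // load below can be satisfied before the store is visible.
            if (waiters.load(std::memory_order_relaxed) != 0) {
                    /* mutex_signal_object(): wake up any sleeper. */
            }
    }

    void mutex_wait_model()
    {
            // mutex_set_waiters(): announce that we are going to sleep.
            waiters.store(1, std::memory_order_relaxed);
            // The same problem on this side: the load of lock_word may be
            // reordered before the store of waiters.
            if (lock_word.load(std::memory_order_relaxed) != 0) {
                    /* sync_array_wait_event(): sleep, possibly forever,
                    if the releaser saw waiters == 0. */
            }
    }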

    So the signal was not lost: with the memory barrier issue, the order in which instructions take effect across threads really can differ from program order. The problem definitely exists, but since sync_arr_wake_threads_if_sema_free() exists precisely to work around it, why did the hang still occur? This lead felt like a ray of hope... A search showed that sync_arr_wake_threads_if_sema_free is called only from srv_error_monitor_thread, the thread that monitors MySQL's internals for anomalies and prints error messages; the notorious 600-second suicide (the server aborts itself when a semaphore wait exceeds 600 seconds) is also its doing. Two questions follow:

    • The machine had been hanging all weekend; why was no anomaly detected and no abort triggered?
    • Since sync_arr_wake_threads_if_sema_free can do the wakeup, why didn't it?

    Following this thread, the pstack stack of srv_error_monitor_thread shows that it, too, hung while acquiring log_sys->mutex, so it could run neither sync_arr_wake_threads_if_sema_free() nor its regular anomaly checks, which answers both questions above. The full stack:

    Thread 470 (Thread 0x7fbeb0a2e700 (LWP 20032)):
    #0  0x00007fd1b72c8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00000000009f47f1 in os_cond_wait (fast_mutex=0x7defd5f0, cond=0x7defd630) at ../storage/innobase/os/os0sync.cc:214
    #2  os_event_wait_low (event=event@entry=0x7defd5f0, reset_sig_count=<optimized out>) at ../storage/innobase/os/os0sync.cc:592
    #3  0x0000000000a5ca1e in sync_array_wait_event (arr=0x340b850, index=86) at ../storage/innobase/sync/sync0arr.cc:424
    #4  0x0000000000a5e750 in mutex_spin_wait (mutex=mutex@entry=0x7defd3c8, file_name=file_name@entry=0xd2d380 "../storage/innobase/include/log0log.ic", line=line@entry=409) at ../storage/innobase/sync/sync0sync.cc:580
    #5  0x0000000000a552e9 in mutex_enter_func (line=409, file_name=0xd2d380 "../storage/innobase/include/log0log.ic", mutex=0x7defd3c8) at ../storage/innobase/include/sync0sync.ic:218
    #6  pfs_mutex_enter_func (line=409, file_name=0xd2d380 "../storage/innobase/include/log0log.ic", mutex=0x7defd3c8) at ../storage/innobase/include/sync0sync.ic:250
    #7  log_get_lsn () at ../storage/innobase/include/log0log.ic:409
    #8  srv_error_monitor_thread (arg=<optimized out>) at ../storage/innobase/srv/srv0srv.cc:1722
    #9  0x00007fd1b72c4df3 in start_thread () from /lib64/libpthread.so.0
    #10 0x00007fd1b63ac1bd in clone () from /lib64/libc.so.6
    

    With this, the picture is clear. The process can be summarized as follows:

    • Thread 446 acquired log_sys->mutex but was never woken from its wait on log_sys->log_flush_order_mutex;
    • Thread XXX hit the memory-barrier problem while releasing log_sys->log_flush_order_mutex and failed to wake Thread 446;
    • Thread 470 hung while acquiring log_sys->mutex and therefore could not run sync_arr_wake_threads_if_sema_free(), hanging the whole instance;
    • Thread 470 needs the log_sys->mutex held by Thread 446, while Thread 446 can release log_sys->mutex only after being woken by Thread 470.

    Combined with the observed state of log_sys->log_flush_order_mutex, the whole sequence of the hang is as follows:

    THD_1                                          THD_2                                         THD_N                          srv_error_monitor_thread

    mutex_enter(log_sys->mutex)
    ...
    mutex_enter(log_sys->log_flush_order_mutex)
    mutex_exit(log_sys->mutex)
    mutex_exit(log_sys->log_flush_order_mutex)     mutex_enter(log_sys->mutex)                   mutex_enter(log_sys->mutex)    mutex_enter(log_sys->mutex)
        mutex_get_waiters()                        ...
                                                   mutex_enter(log_sys->log_flush_order_mutex)
                                                       mutex_set_waiters()
                                                       sync_array_wait_event()
        mutex_reset_lock_word()
    

    For an introduction to memory barriers, see reference ①.

    Fixing the problem

    With the root cause understood, the problem can be fixed. There are two options:

    • Simply remove the log_get_lsn() check at this call site; it was added by developers purely as a diagnostic for locating LSN anomalies, and it crashes the server when it fires anyway, so it is of limited value;
    • Keep the check but change log_get_lsn to log_peek_lsn, which first does a try-lock and returns immediately without checking when the lock cannot be taken; this is the more elegant option (a sketch follows the list).
      After this change the problem did not reproduce again in testing.
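
    A sketch of the second option (assuming InnoDB's try-lock primitive mutex_enter_nowait(), which returns 0 on success; this shows the shape of the fix, not its verbatim upstream code):

    /* Read the current LSN only if log_sys->mutex can be taken without
    waiting; return 0 when the mutex is busy, so the caller simply skips
    this round of checking instead of blocking. */
    lsn_t
    log_peek_lsn(void)
    {
            if (0 == mutex_enter_nowait(&(log_sys->mutex))) {
                    lsn_t   lsn = log_sys->lsn;

                    mutex_exit(&(log_sys->mutex));

                    return(lsn);
            }

            return(0);      /* busy: skip the check this time */
    }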

    Beyond the fix

    Although our problem was solved, the official version clearly carries the same bug, so why was there nothing in the bug list? Checking the latest upstream code showed the problem had already been fixed there, using the second approach above. The commit message:

    commit 79032a7ae1b4e000028a64ab0d2f216d4c23767b
    Author: Shaohua Wang <shaohua.wang@oracle.com>
    Date:   Tue Jan 12 15:08:09 2016 +0800
    
    BUG#22530768 Innodb freeze running REPLACE statements
    
    we can see from the hang stacktrace, srv_monitor_thread is blocked
    when getting log_sys::mutex, so that sync_arr_wake_threads_if_sema_free
    cannot get a change to break the mutex deadlock.
    
    The fix is simply removing any mutex wait in srv_monitor_thread.
    
    Patch is reviewed by Sunny over IM.
    

    Scope of the bug: MySQL 5.6.28 and all earlier versions are affected.
