zoukankan      html  css  js  c++  java
  • Mysql5.7 的错误日志中最常见的note级别日志解释

     
     
     
    在使用mysql5.7的时候,发现了不少在mysql5.6上不曾见过的日志,级别为note, 最常见的note日志以下三种,下面我们来逐个解释。
    第一种,Aborted connection . 如上图,信息如下:
    2016-03-17T14:44:24.102542Z 59 [Note] Aborted connection 59 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx’ (Got an error reading communication packets)
    2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)
    2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)
     
    2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got  timeout  reading communication packets)
     
    上面两个是连接中断的错误信息,原因不一样,Got an error reading communication packets  的原因是因为网络等原因导致Got  timeout  reading communication packets 这个错误的原因是会话的idle时间达到了数据库指定的timeout时间。
     
    第二种:SLAVE多线程同步的信息

    信息如下:
    2016-03-14T15:48:26.432150Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over
     overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000
    2016-03-14T15:50:28.398745Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over
     overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000
     
    我们通过源代码,找到下面一段,该段实现了上述日志的输出。
      if ((my_now
     – rli->mts_last_online_stat)>=
               mts_online_stat_period)
            {
             sql_print_information(“Multi-threadedslave
     statistics%s: “
                                    “seconds
     elapsed = %lu; “
                                    “events
     assigned = %llu; “
                                    “worker
     queues filled over overrun level = %lu;”
                                    “waited
     due a Worker queue full = %lu; “
                                    “waited
     due the total size = %lu; “
                                    “waited
     at clock conflicts = %llu “
                                   “waited(count)
     when Workers occupied = %lu “
                                    “waited
     when Workers occupied = %llu”,
                                    rli->get_for_channel_str(),
                                    static_cast<unsignedlong>
                                    (my_now – rli->mts_last_online_stat),
                                    rli->mts_events_assigned,
                                    rli->mts_wq_overrun_cnt,
                                    rli->mts_wq_overfill_cnt,
                                    rli->wq_size_waits_cnt,
                                    rli->mts_total_wait_overlap,
                                    rli->mts_wq_no_underrun_cnt,
                                    rli->mts_total_wait_worker_avail);
              rli->mts_last_online_stat=my_now;  
     
     通过这一句(my_now
     – rli->mts_last_online_stat),  以及最后一句rli->mts_last_online_stat=my_now;   可以得知,
     seconds elapsed 就是上一次统计跟这一次统计的时间间隔。
    而mts_online_stat_period =120秒,硬代码,因此就是几乎每隔120秒,就有上述日志的输出。
     通过进一步查看原代码,初步了解上述日志信息的含义,如下:
      
    events assigned:总共有多少个event被分配执行,计的是总数。
    worker queues filled over overrun level:多线程同步中,worker 的私有队列长度超长的次数,计的是总数。
    waited due a Worker queue full :因为worker的队列超长而产生等待的次数,计的是总数。
    waited due the total size :超过最大size的次数,这个由参数slave_pending_jobs_size_max  指定。
    waited at clock conflicts :因为逻辑时间产生冲突的等待时间,单位是纳秒。
    waited (count) when Workers occupied :因为workder被占用而出现等待的次数。(总计值)。
    waited when Workersoccupied :因为workder被占用而出现等待的总时间,总计值,单位是纳秒。
     
     
    第三种:page_cleaner线程的输出日志
     

    如图,信息如下:
     
    2016-03-24T17:45:28.005117Z 0 [Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms.The
     settings might not be optimal. (flushed=1519 and evicted=0, during the time.)
     
    查找源代码,发现是上面的日志由下面一段代码输出:
            if (ret_sleep
     == OS_SYNC_TIME_EXCEEDED) {
                ulint 
     curr_time = ut_time_ms();
     
                if (curr_time
     >next_loop_time + 3000) {
                    if (warn_count
     == 0) {
                        ib::info()
     << “page_cleaner: 1000ms”
                            ” intended
     loop took “
                            <<1000 + curr_time
                               – next_loop_time
                            <<“ms. The
     settings might not”
                            ” be optimal.
     (flushed=”
                            <<n_flushed_last
                            <<” and evicted=”
                            <<n_evicted
                            <<“, during
     the time.)”;
                        if (warn_interval
     >300) {
                            warn_interval= 600;
                        }else {
                            warn_interval*= 2;
                        }
     
                        warn_count= warn_interval;
                    } else {
                        –warn_count;
                    }
                } else {
                    /* reset counter */
                    warn_interval= 1;
                    warn_count= 0;
                }
     
                next_loop_time= curr_time + 1000;
                n_flushed_last= n_evicted = 0;
            }
     
     
    通过分析源代码, 输出上述日志的条件是curr_time> next_loop_time + 3000 ,即比原定的循环时间next_loop_time多3000毫秒,而next_loop_time的标准时间是1000毫秒,即1秒钟做一次刷新页的操作。
    loop took 4750ms ,即是这次刷新循环的实际经历时间
     
    后面还有一个(flushed=1519 and evicted=0,during the time.)这样的日志,即对应n_flushed_lastn_evicted 变量,而这两个变量又由n_flushed_list与n_flushed_lru赋值。
     
    ./storage/innobase/buf/buf0flu.cc:3322:                 n_flushed_last +=n_flushed_list;
    ./storage/innobase/buf/buf0flu.cc:3321:                 n_evicted += n_flushed_lru;
     
     
    而n_flushed_list与n_flushed_lru赋值函数为pc_wait_finished,如下,我们来看看该函数的注释。
    pc_wait_finished(&n_flushed_lru,&n_flushed_list);
     
     
    /**
    Wait until all flush requests are finished.
    @param n_flushed_lru    numberof pages flushed from the end of the LRU list.
    @param n_flushed_list   numberof pages flushed from the end of the
                flush_list.
    @return         trueif all flush_list flushing batch were success. */
    static
    bool
    pc_wait_finished(
        ulint*  n_flushed_lru,
        ulint*  n_flushed_list)
    {
    。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
    }
    通过源代码的注释,我们获知如下信息:
    n_flushed_lru   number of pages flushed from the end of the LRU list.
    n_flushed_lru  这个值表示从lru 列表尾部刷新的页数,也就是日志中如evicted=0 指标的所表示的值,如果该值不为零,则存在innodb buffer不够的嫌疑。
    n_flushed_list  这个是从刷新列表中刷新的页数,也就是脏页数,也就是日志中flushed=1519 的值。
     
     
  • 相关阅读:
    11G-使用跨平台增量备份减少可移动表空间的停机时间 XTTS (Doc ID 1389592.1)
    如何在asm上定位数据块
    log file switch (checkpoint incomplete)
    踩坑记录(git)-误提交target文件夹删除办法
    踩坑记录(java)-双层增强for调用remove(obj)报错 java.util.ConcurrentModificationException(并发修改异常)
    Zookeeper服务端的创建及简单的客户端创建节点
    SpringAOP(注解方式实现面向切面编程)之常用Before、After、Around
    EasyExcel示例(阿里巴巴)基于Maven
    Redis简单命令(部分示例代码)
    sqlserver日志处理不当而造成的隐患
  • 原文地址:https://www.cnblogs.com/DataArt/p/9984671.html
Copyright © 2011-2022 走看看