  • mcelog usage in detail

    Starting mcelog manually:

    # mcelog --daemon

    Run mcelog in daemon mode, waiting for errors from the kernel.

    Starting mcelog as a background service:

    RHEL 7:

        systemctl start mcelog

        systemctl enable mcelog

    RHEL 6:

        service mcelogd start

        chkconfig mcelogd on

    Viewing the mcelog log:

    # vim /var/log/mcelog

    Checking whether the mcelog daemon has detected any errors:

    # mcelog --client

    Query a currently running mcelog daemon for errors

    Decoding the MCE output printed when the system hits a fault:

    # mcelog --ascii < file.log

    or:

    # mcelog --ascii --file file.log

    Decode machine check ASCII output from kernel logs

    Example of such fault output:

    [Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 22: be200000000c110a

    [Hardware Error]: RIP !INEXACT! 10:<ffffffff81014527> {mwait_idle+0x77/0xd0}

    [Hardware Error]: TSC 103e7072fa77de ADDR c5f17ee00 MISC b0fe435602184086 

    [Hardware Error]: PROCESSOR 0:306e4 TIME 1462390781 SOCKET 0 APIC 1

    [Hardware Error]: Run the above through 'mcelog --ascii'

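    In file.log, the leading "[Hardware Error]: " prefix must be removed from every line (the final "Run the above through" line is dropped as well):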

    CPU 12: Machine Check Exception: 5 Bank 22: be200000000c110a

    RIP !INEXACT! 10:<ffffffff81014527> {mwait_idle+0x77/0xd0}

    TSC 103e7072fa77de ADDR c5f17ee00 MISC b0fe435602184086 

    PROCESSOR 0:306e4 TIME 1462390781 SOCKET 0 APIC 1
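
The prefix removal described above can be scripted. The following is a minimal sketch, not part of mcelog: `strip_hw_prefix` is a hypothetical helper name, and it assumes each relevant line contains the literal text "[Hardware Error]: ", possibly preceded by a kernel timestamp.

```shell
# Keep only machine check lines and strip everything up to and
# including the "[Hardware Error]: " marker.
strip_hw_prefix() {
    sed -n -e "/Run the above through/d" \
           -e 's/^.*\[Hardware Error\]: //p'
}

# Demonstration with two of the lines from the example above.
strip_hw_prefix <<'EOF'
[Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 22: be200000000c110a
[Hardware Error]: Run the above through 'mcelog --ascii'
EOF
# prints: CPU 12: Machine Check Exception: 5 Bank 22: be200000000c110a
```

A saved kernel log could then be prepared with something like `strip_hw_prefix < saved-kernel.log > file.log` (the input file name is a placeholder) and decoded with `mcelog --ascii --file file.log`.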

    mcelog logs and accounts machine checks (in particular memory, IO, and CPU hardware errors) on modern x86 Linux systems.

    mcelog is required by both 32bit x86 Linux kernels (since 2.6.30) and 64bit Linux kernels (since early 2.6 kernel releases) to log machine checks and should run on all Linux systems that need error handling.

    The mcelog daemon accounts memory and some other errors in various ways. mcelog --client can be used to query a running daemon. The daemon can also execute triggers when configurable error thresholds are exceeded. This is used to implement a range of automatic predictive failure analysis algorithms, including bad page offlining and automatic cache error handling. User-defined actions can also be configured.

    All errors are logged to /var/log/mcelog or syslog or the journal.

    For memory errors it supports modern x86 systems with integrated memory controllers; for CPU errors all modern x86 systems are supported.

    Traditionally mcelog was run as a cronjob, but this usage is now deprecated. The modern way to run it is to start it at boot time and keep it running as a daemon. In addition, it can be used to decode fatal machine checks on the command line (but this is usually not needed anymore on modern kernels, which log those automatically after reboot).

    For installation information and how to set up a mcelog package (if you're a distributor) please see the README.

    mcelog.conf reference

    mcelog is configured through the /etc/mcelog.conf configuration file.

    General format

    optionname = value

    White space is currently not allowed in the value, except at the end, where it is dropped.

    In general, all command line options that are not commands work here. See man mcelog or mcelog --help for a list. For example, to enable the --no-syslog option use

    no-syslog = yes (or no to disable)

    When the option takes an argument:

    logfile = /tmp/logfile
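
Because the format is just `optionname = value`, a script can read a setting back with standard tools. The sketch below is illustrative only: `conf_get` is a hypothetical helper, not something mcelog provides, and it deliberately looks only at options that appear before the first `[section]` header (sections such as `[server]` are described further down).

```shell
# Print the value of a global option from a mcelog.conf-style file.
# usage: conf_get FILE OPTION
conf_get() {
    awk -v key="$2" '
        /^[ \t]*\[/ { exit }                 # stop at the first [section]
        {
            eq = index($0, "=")
            if (eq == 0) next                # not an "option = value" line
            name  = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    ' "$1"
}

# Demonstration against a throwaway file rather than /etc/mcelog.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
daemon = yes
logfile = /tmp/logfile

[server]
socket-path = /var/run/mcelog-client
EOF
conf_get "$conf" logfile    # prints: /tmp/logfile
rm -f "$conf"
```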

    Below are the options that are not command line options.

    Set cpu type for which mcelog decodes events:

    cpu = type

    For valid values of type, see mcelog --help. If this value is set incorrectly, the decoded output will likely be incorrect. By default, when this parameter is not set, mcelog uses the CPU it is running on. On very new kernels, the mcelog events reported by the kernel also carry the CPU type, which is used when available and not overridden.

    Enable daemon mode:

    daemon = yes

    By default mcelog just processes the currently pending events and exits. In daemon mode it will keep running as a daemon in the background and poll the kernel for events and then decode them.

    Filter out known broken events by default.

    filter = yes

    Don't log memory errors individually. They still get accounted if that is enabled.

    filter-memory-errors = yes

    Output in undecoded raw format, which is easier for machines to parse (default is decoded).

    raw = yes

    Set the CPU MHz used to decode uptime from the time stamp counter (TSC). The output is unreliable: a lot of systems don't have a linear time stamp clock, and the output is wrong then. Normally mcelog tries to figure out whether the TSC is reliable and only then uses the current frequency. Setting a frequency forces timestamp decoding. This setting is obsolete with modern kernels, which report the event time directly.

    cpumhz = 1800.00

    Log output options

    Log decoded machine checks in syslog (default: stdout, or syslog for the daemon)

    syslog = yes

    Log decoded machine checks in syslog with error level

    syslog-error = yes

    Never log anything to syslog

    no-syslog = yes

    Append log output to logfile instead of stdout. This only applies when no syslog logging is active.

    logfile = filename

    Use SMBIOS information to decode DIMMs (needs root). Using this function is not recommended right now, and it is generally not needed. The exception is memdb prepopulation, which is configured separately below.

    dmi = no

    When in daemon mode run as this user after set up. Note that the triggers will run as this user too. Setting this to non root will mean that triggers cannot take some corrective action, like offlining objects.

    run-credentials-user = root

    Group to run the daemon as. Defaults to the group of the run-credentials-user.

    run-credentials-group = nobody

    The [server] config section

    User allowed to access the client socket. When set to *, any user matches. root is always allowed access. Default: root only.

    client-user = root

    Group allowed to access mcelog. When no group is configured, any group matches (but the user is still checked). When set to *, any group matches.

    client-group = root

    Path to the Unix socket for client<->server communication. When no socket-path is configured, the server will not start.

    socket-path = /var/run/mcelog-client

    When mcelog starts it checks if a server is already running. This configures the timeout for this check.

    initial-ping-timeout = 2

    The [dimm] config section

    Is in-memory DIMM error tracking enabled? This only works on supported systems with an integrated memory controller, and only takes effect in daemon mode.

    dimm-tracking-enabled = yes

    Use DMI information from the BIOS to prepopulate the DIMM database. Note that this might not work with every BIOS and requires mcelog to run as root. The alternative is to let mcelog create DIMM objects on demand.

    dmi-prepopulate = yes

    Execute these triggers when the rate of corrected or uncorrected errors per DIMM exceeds the threshold. Note that when the hardware does not report DIMMs, this might also be per channel. The default of 10/24h is reasonable for server-quality DDR3 DIMMs as of 2009/10.

    uc-error-trigger = dimm-error-trigger

    uc-error-threshold = 1 / 24h

    ce-error-trigger = dimm-error-trigger

    ce-error-threshold = 10 / 24h
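
A threshold such as `10 / 24h` reads as "10 errors within 24 hours". To make that concrete, here is a small sketch that turns the notation into an event count and a window in seconds. `threshold_to_seconds` is a hypothetical helper written for illustration, and the handling of `m` and `d` suffixes is an assumption (only `h` appears in this document); mcelog's own parser may differ.

```shell
# Convert "COUNT / PERIOD" into "COUNT SECONDS".
# Assumed suffixes: m = minutes, h = hours, d = days.
# A bare number is passed through unchanged.
threshold_to_seconds() {
    printf '%s\n' "$1" | awk -F/ '{
        count = $1 + 0
        period = $2
        gsub(/[ \t]/, "", period)             # "24h"
        unit = substr(period, length(period)) # last character
        n = period + 0                        # leading number, e.g. 24
        if (unit == "d")      n *= 86400
        else if (unit == "h") n *= 3600
        else if (unit == "m") n *= 60
        print count, n
    }'
}

threshold_to_seconds "10 / 24h"   # prints: 10 86400
```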

    The [socket] config section

    Enable memory error accounting per socket.

    socket-tracking-enabled = yes

    Threshold and trigger for uncorrected memory errors on a socket.

    mem-uc-error-trigger = socket-memory-error-trigger

    mem-uc-error-threshold = 100 / 24h

    Trigger script for corrected memory errors on a socket.

    mem-ce-error-trigger = socket-memory-error-trigger

    Threshold at which corrected errors on the socket fire the trigger.

    mem-ce-error-threshold = 100 / 24h

    Should socket error threshold events be logged explicitly?

    mem-ce-error-log = yes

    Trigger script for uncorrected bus error events

    bus-uc-threshold-trigger = bus-error-trigger

    Trigger script for uncorrected IOMCA errors

    iomca-threshold-trigger = iomca-error-trigger

    Trigger script for other uncategorized errors

    unknown-threshold-trigger = unknown-error-trigger

    The [cache] config section

    Processing of cache error thresholds reported by Intel CPUs.

    cache-threshold-trigger = cache-error-trigger

    Should cache threshold events be logged explicitly?

    cache-threshold-log = yes

    The [page] config section

    Memory error accounting per 4k memory page. Threshold for the corrected memory errors trigger script.

    memory-ce-threshold = 10 / 24h

    Trigger script for corrected errors.

    memory-ce-trigger = page-error-trigger

    Should page threshold events be logged explicitly?

    memory-ce-log = yes

    Specify the internal action mcelog takes when a page error threshold is exceeded. This is done in addition to executing the trigger script, if one is available.

    memory-ce-action = off|account|soft|hard|soft-then-hard

    memory-ce-action = soft

    off: no action
    account: only account errors
    soft: try to soft-offline the page without killing any processes
      This requires an up-to-date kernel. Might not be successful.
    hard: try to hard-offline the page by killing processes
      Requires an up-to-date kernel. Might not be successful.
    soft-then-hard: first try to soft-offline, then try hard-offlining
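
Since accounting and offlining work on 4k pages, it can help to see which page a reported address belongs to. The sketch below masks off the low 12 bits of an address; `page_base` is a hypothetical helper name, and the input is the ADDR value from the example machine check earlier in this article. This only illustrates the page arithmetic and does not offline anything.

```shell
# Print the base address of the 4k page containing a physical address.
# usage: page_base HEX_ADDR   (hex digits without a leading 0x)
page_base() {
    printf '%x\n' $(( 0x$1 & ~0xfff ))
}

page_base c5f17ee00   # prints: c5f17e000
```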

    The [trigger] config section

    Maximum number of running triggers

    children-max = 2

    Execute triggers found in this directory

    directory = /etc/mcelog
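
A trigger is an executable that the daemon runs when a threshold is exceeded, for example a script named page-error-trigger placed in the directory above. The sketch below shows what a very small trigger body might look like. The environment variable names `MESSAGE` and `THRESHOLD` are an assumption here, taken from the comments in the trigger scripts shipped with mcelog rather than from this article; check the scripts installed on your system before relying on them. `report_trigger` is a hypothetical function name.

```shell
#!/bin/sh
# Print a one-line summary of the event that fired the trigger.
# MESSAGE and THRESHOLD are assumed to be set by mcelog; fall back
# to placeholder text when they are missing.
report_trigger() {
    echo "mcelog trigger: ${MESSAGE:-no message} (threshold: ${THRESHOLD:-unknown})"
}

# Demonstration with made-up values in place of what the daemon would set.
MESSAGE="test event"
THRESHOLD="10 in 24h"
report_trigger   # prints: mcelog trigger: test event (threshold: 10 in 24h)
```

Keep in mind that triggers run as the user configured with run-credentials-user, so a non-root setting limits what such a script can do.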

    More detailed information: http://www.mcelog.org


  • Original article: https://www.cnblogs.com/DataArt/p/10374165.html