  • Red Hat Enterprise Linux 5

    Issue

    Red Hat Enterprise Linux Server release 5.7 (Tikanga)

    Nodes in the Red Hat cluster are intermittently evicted, with the following messages logged on the remaining node:

    Sep 10 23:32:09 dl380ceda openais[8293]: [TOTEM] The token was lost in the OPERATIONAL state.
    Sep 10 23:32:09 dl380ceda openais[8293]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
    Sep 10 23:32:09 dl380ceda openais[8293]: [TOTEM] Transmit multicast socket send buffer size (320000 bytes).
    Sep 10 23:32:09 dl380ceda openais[8293]: [TOTEM] entering GATHER state from 2.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] entering GATHER state from 0.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] Creating commit token because I am the rep.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] Storing new sequence id for ring 90
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] entering COMMIT state.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] entering RECOVERY state.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] position [0] member 172.19.20.6:
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] previous ring seq 140 rep 172.19.20.6
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] aru 84 high delivered 84 received flag 1
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] Did not need to originate any messages in recovery.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] Sending initial ORF token
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] CLM CONFIGURATION CHANGE
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] New Configuration:
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ]        r(0) ip(172.19.20.6)
    Sep 10 23:32:11 dl380ceda kernel: dlm: closing connection to node 2
    Sep 10 23:32:11 dl380ceda fenced[8328]: dl380cedbhb not a cluster member after 0 sec post_fail_delay
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] Members Left:
    Sep 10 23:32:11 dl380ceda fenced[8328]: fencing node "dl380cedbhb"
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ]        r(0) ip(172.19.20.16)
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] Members Joined:
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] CLM CONFIGURATION CHANGE
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] New Configuration:
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ]        r(0) ip(172.19.20.6)
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] Members Left:
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] Members Joined:
    Sep 10 23:32:11 dl380ceda openais[8293]: [SYNC ] This node is within the primary component and will provide service.
    Sep 10 23:32:11 dl380ceda openais[8293]: [TOTEM] entering OPERATIONAL state.
    Sep 10 23:32:11 dl380ceda openais[8293]: [CLM  ] got nodejoin message 172.19.20.6
    Sep 10 23:32:11 dl380ceda openais[8293]: [CPG  ] got joinlist message from node 1
    Sep 10 23:32:20 dl380ceda qdiskd[8312]: <notice> Writing eviction notice for node 2
    Sep 10 23:32:21 dl380ceda qdiskd[8312]: <notice> Node 2 evicted
    Sep 10 23:32:26 dl380ceda fenced[8328]: fence "dl380cedbhb" success

    In addition, qdisk timeouts are also seen:

    Sep 15 11:58:37 dl380ceda qdiskd[22894]: <warning> qdisk cycle took more than 1 second to complete (2.240000)

    Solution

    By default the totem token is set to 10 seconds, which is in most cases too short. The same is true for the qdisk cycle, which defaults to 1 second.

    Consider the following a best practice; adjust the values to your own situation.

    The qdisk and totem token settings in /etc/cluster/cluster.conf look like this:

            <quorumd interval="3" label="rhclquorum" min_score="1" tko="10" votes="1">
                    <heuristic interval="15" program="/bin/ping -c 1 -t 5 <ip-address-network-switch>" tko="10" score="1"/>
            <totem token="75000"/>

    The qdisk interval defaults to 1 second, and this interval is also the timeout for each cycle. If the storage is very reliable and fast this can be a proper setting, but when the load on the system is higher (CPU/disk I/O/network I/O) it is very easy for the qdisk cycle to take more than 1 second.

    If the qdisk warning happens once per month it is not an issue, but if it appears a couple of times per day there could be a problem with the shared storage or a very high load on the server.

    If this warning happens too often, install and run collectl; it captures system resource/performance information, which helps determine whether the warnings are caused by a high load on a system resource.
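
    Before (or alongside) collectl, a quick way to quantify how often the warning fires is to count it per day in the messages log. A minimal sketch, assuming the default /var/log/messages location and the exact warning text shown above:

        # Sketch: count "qdisk cycle took more than" warnings per day in /var/log/messages.
        # Assumes the stock syslog format shown above ("Sep 15 11:58:37 host qdiskd[...]: ...").
        from collections import defaultdict

        def count_qdisk_warnings(path="/var/log/messages"):
            per_day = defaultdict(int)
            with open(path) as log:
                for line in log:
                    if "qdisk cycle took more than" in line:
                        day = " ".join(line.split()[:2])   # e.g. "Sep 15"
                        per_day[day] += 1
            for day in sorted(per_day):
                print("%s: %d warning(s)" % (day, per_day[day]))

        if __name__ == "__main__":
            count_qdisk_warnings()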

    It is better to use an interval of 2, 3, 4 or 5 seconds, depending on the configuration and the system/network/I/O load.

    The tko: this specifies the number of failed qdisk cycles needed before a node is declared down; 10 is the default, and 20 is also fine.

    The totem token: this is an important one; it specifies how long a token rotation may take before a node is declared down.

    The default is 10 seconds, which is too short: under high load (CPU, disk I/O, network I/O) it is easy for the rotation to take longer.

    What value is good? This is described in the qdisk man page of RHEL 6; a description of this cannot be found in RHEL 5.

    The same RHEL 6 formula can be used for RHEL 5:

           totem token = interval * (tko + master_wait + upgrade_wait) + interval

    Below is the complete description of this topic from the RHEL 6 documentation:

    3.3.1. Quorum Disk Timings

    Qdiskd should not be used in environments requiring failure detection times of less than approximately 10 seconds.

    Qdiskd will attempt to automatically configure timings based on the totem timeout and the TKO. If configuring manually, Totem's token timeout must be set to a value at least 1 interval greater than the following function:

             interval * (tko + master_wait + upgrade_wait)

    So, if user has an interval of 2, a tko of 7, master_wait of 2 and upgrade_wait of 2, the token timeout should be at least 24 seconds (24000 msec).

    It is recommended to have at least 3 intervals to reduce the risk of quorum loss during heavy I/O load. As a rule of thumb, using a totem timeout more than 2x of qdiskd's timeout will result in good behavior.

    An improper timing configuration will cause CMAN to give up on qdiskd, causing a temporary loss of quorum during master transition.

    With tko = 10, master_wait = 5 (according to the man pages it is tko/2) and upgrade_wait = 2, the totem token should be at least:

        qdisk interval      minimum totem token                   I would set it to
        1 second            1 * (10 + 5 + 2) + 1 = 18 seconds     30 seconds
        2 seconds           2 * (10 + 5 + 2) + 2 = 36 seconds     50 seconds
        3 seconds           3 * (10 + 5 + 2) + 3 = 54 seconds     75 seconds
        4 seconds           4 * (10 + 5 + 2) + 4 = 72 seconds     85 seconds
        5 seconds           5 * (10 + 5 + 2) + 5 = 90 seconds     100 seconds
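
    The arithmetic above is easy to reproduce programmatically, which also makes it simple to plug in other tko or interval values. A minimal sketch of the formula, using the master_wait = tko/2 and upgrade_wait = 2 values quoted from the man pages above:

        # Sketch: minimum totem token for a given qdisk interval, using the formula
        # totem token = interval * (tko + master_wait + upgrade_wait) + interval
        def min_totem_token(interval, tko=10, upgrade_wait=2, master_wait=None):
            if master_wait is None:
                master_wait = tko // 2              # man pages: master_wait defaults to tko/2
            return interval * (tko + master_wait + upgrade_wait) + interval

        for interval in range(1, 6):
            seconds = min_totem_token(interval)
            print("interval %d s -> totem token >= %d s (%d ms)" % (interval, seconds, seconds * 1000))

    This reproduces the 18/36/54/72/90 second minimums in the table above; the recommended values in the last column simply add a generous margin on top of the minimum.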

    About the heuristic: this is an additional check to verify that the node is connected to the network. A single ping sample is considered enough to check whether the network is up and running.

    Intervals of 5, 10 or 15 seconds are good values. I would add a tko parameter of 10 or 20 to it, so the node is only declared down after 10 or 20 failures of the heuristic. The -t 5 flag on the ping command limits the TTL of the sample.

    It is very important that the server or network device used as the ping target is always online; preferably use a network switch. It is also possible to add more heuristics, for example 3: when one network device/system fails but the network is still online (the other 2 heuristics still pass), the node is not fenced directly. It is only removed when the total score of the still-passing heuristics drops below min_score (roughly half the maximum score), so with equal scores that means 2 of the 3 heuristics failing.
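
    The scoring arithmetic behind that last point is simple: qdiskd adds up the scores of the heuristics that currently pass, and the node only considers itself fit while that total is at least min_score. The following sketch only illustrates that arithmetic; the heuristic scores and min_score value are made-up examples, not taken from the configuration above:

        # Sketch: the qdisk heuristic scoring arithmetic described above (illustrative values only).
        def node_remains_fit(heuristic_results, min_score):
            # heuristic_results: list of (score, passing) pairs, one per <heuristic> entry
            passing_total = sum(score for score, passing in heuristic_results if passing)
            return passing_total >= min_score

        # Three equally weighted heuristics; min_score set to require more than half of the total.
        print(node_remains_fit([(1, True), (1, True), (1, False)], min_score=2))   # True: 1 failure tolerated
        print(node_remains_fit([(1, True), (1, False), (1, False)], min_score=2))  # False: 2 failures, node removed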

    A description of the parameters can be found within the man pages of qdisk, cman and openais.conf.

    Below are some considerations when nodes are being fenced (from the RHEL 6 Cluster Administration guide):

    9.10. Fencing Occurs at Random:

    If user finds that a node is being fenced at random, check for the following conditions.

    • The root cause of fences is always a node losing token, meaning that it lost communication with the rest of the cluster and stopped returning heartbeat.

    • Any situation that results in a system not returning heartbeat within the specified token interval could lead to a fence. By default the token interval is 10 seconds. It can be specified by adding the desired value (in milliseconds) to the token parameter of the totem tag in the cluster.conf file (for example, setting totem token=30000 for 30 seconds).

    • Ensure that the network is sound and working as expected.

    • Ensure that exotic bond modes and VLAN tagging are not in use on interfaces that the cluster uses for inter-node communication.

    • Take measures to determine if the system is freezing or kernel panicking. Set up the kdump utility and see if user gets a core during one of these fences.

    • Make sure some situation is not arising that user is wrongly attributing to a fence, for example the quorum disk ejecting a node due to a storage failure or a third party product like Oracle RAC rebooting a node due to some outside condition. The messages logs are often very helpful in determining such problems. Whenever fences or node reboots occur it should be standard practice to inspect the messages logs of all nodes in the cluster from the time the reboot/fence occurred.

    • Thoroughly inspect the system for hardware faults that may lead to the system not responding to heartbeat when expected.


    Source: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c03045256
