  • Implementing two-node hot standby and clustering on Red Hat 9

    http://www.5dmail.net/html/2004-11-1/200411193630.htm        Linux two-node hot standby

    Installing the cluster software on Red Hat 9 Linux is fairly simple. The following packages are required:

    #rpm -ivh heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
    #rpm -ivh net-snmp-5.0.6-17.i386.rpm
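    After installation, start configuring the primary server. The configuration files live in /etc/ha.d. The rpm install does not create them, so copy the three files ha.cf, authkeys, and haresources from /usr/share/doc/heartbeat-1.0.4 to /etc/ha.d. The annotated contents of ha.cf, authkeys, and haresources follow in that order.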
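The copy step above can be sketched as a small shell function. This is a minimal sketch, not part of the original article: the function name `install_ha_configs` and its optional arguments are hypothetical, and the default paths are the ones named in the text. It also sets mode 600 on authkeys, which the comments in the authkeys file require.

```shell
# Copy the three sample files into the heartbeat configuration directory.
# Both arguments are optional and default to the paths named in the text.
install_ha_configs() {   # usage: install_ha_configs [src-dir] [dest-dir]
    src=${1:-/usr/share/doc/heartbeat-1.0.4}
    dest=${2:-/etc/ha.d}
    for f in ha.cf authkeys haresources; do
        cp "$src/$f" "$dest/$f" || return 1
    done
    # authkeys must be readable by its owner only ("Must be mode 600")
    chmod 600 "$dest/authkeys"
}
```

Run as root, `install_ha_configs` with no arguments would copy from the documentation directory into /etc/ha.d.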
    # There are lots of options in this file. All you have to have is a set
    # of nodes listed {"node ..."}
    # and one of {serial, bcast, mcast, or ucast}
    # ATTENTION: As the configuration file is read line by line,
    #     THE ORDER OF DIRECTIVES MATTERS!
    #     In particular, make sure that the timings and udpport
    #     et al are set before the heartbeat media are defined!
    #     All will be fine if you keep them ordered as in this
    #     example.
    #       Note on logging:
    #       If any of debugfile, logfile and logfacility are defined then they
    #       will be used. If debugfile and/or logfile are not defined and
    #       logfacility is defined then the respective logging and debug
    #       messages will be logged to syslog. If logfacility is not defined
    #       then debugfile and logfile will be used to log messages. If
    #       logfacility is not defined and debugfile and/or logfile are not
    #       defined then defaults will be used for debugfile and logfile as
    #       required and messages will be sent there.
    # File to write debug messages to
    debugfile /var/log/ha-debug                             【file where heartbeat records debug messages】
    # File to write other messages to
    logfile /var/log/ha-log                                       【log file】
    # Facility to use for syslog()/logger
    logfacility local0                                               【also log to syslog; optional】
    # A note on specifying "how long" times below...
    # The default time unit is seconds
    #   10 means ten seconds
    # You can also specify them in milliseconds
    #   1500ms means 1.5 seconds
    # keepalive: how long between heartbeats?
    keepalive 3                                                     【send a keepalive message every 3 seconds】
    # deadtime: how long-to-declare-host-dead?
    deadtime 15                                                     【if no keepalive message arrives for 15 seconds, the node is considered dead】
    # warntime: how long before issuing "late heartbeat" warning?
    # See the FAQ for how to use warntime to tune deadtime.
    warntime 10                                                     【write a "late heartbeat" warning to the log when no heartbeat has arrived for 10 seconds】
    # Very first dead time (initdead)
    # On some machines/OSes, etc. the network takes a while to come up
    # and start working right after you've been rebooted. As a result
    # we have a separate dead time for when things first come up.
    # It should be at least twice the normal dead time.
    initdead 60                                                         【after a node reboots, the network may take a while to come up; this interval is separate from deadtime and is set on its own】
    # nice_failback: determines whether a resource will
    # automatically fail back to its "primary" node, or remain
    # on whatever node is serving it until that node fails.
    # The default is "off", which means that it WILL fail
    # back to the node which is declared as primary in haresources
    # "on" means that resources only move to new nodes when
    # the nodes they are served on die. This is deemed as a
    # "nice" behavior (unless you want to do active-active).
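    nice_failback on                                                  【when the primary node fails and later recovers, it does not reclaim the primary role; it takes over again only when the currently active node fails】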
    # hopfudge maximum hop count minus number of nodes in config
    #hopfudge 1
    # Baud rate for serial ports...
    # (must precede "serial" directives)
    #baud 19200
    # serial serialportname ...
    #serial /dev/ttyS0 # Linux
    #serial /dev/cuaa0 # FreeBSD
    #serial /dev/cua/a # Solaris
    # What UDP port to use for communication?
    #   [used by bcast and ucast]
    #udpport 694
    # What interfaces to broadcast heartbeats over?
    #bcast eth1   # Linux
    #bcast eth1 eth2 # Linux
    #bcast le0   # Solaris
    #bcast le1 le2   # Solaris
    # Set up a multicast heartbeat medium
    # mcast [dev] [mcast group] [port] [ttl] [loop]
    # [dev]   device to send/rcv heartbeats on
    # [mcast group] multicast group to join (class D multicast address
    #    224.0.0.0 - 239.255.255.255)
    # [port]   udp port to sendto/rcvfrom (no reason to differ
    #    from the port used for broadcast heartbeats)
    # [ttl]   the ttl value for outbound heartbeats. This affects
    #    how far the multicast packet will propagate. (1-255)
    # [loop]   toggles loopback for outbound multicast heartbeats.
    #    if enabled, an outbound packet will be looped back and
    #    received by the interface it was sent on. (0 or 1)
    #    This field should always be set to 0.
    mcast eth1 225.0.0.22 694 1 0                               【send keepalive messages to multicast group 225.0.0.22 on port 694】
    # Set up a unicast / udp heartbeat medium
    # ucast [dev] [peer-ip-addr]
    # [dev]   device to send/rcv heartbeats on
    # [peer-ip-addr] IP address of peer to send packets to
    #ucast eth0
    # Watchdog is the watchdog timer. If our own heart doesn't beat for
    # a minute, then our machine will reboot.
    #watchdog /dev/watchdog
    #       "Legacy" STONITH support
    #       Using this directive assumes that there is one stonith
    #       device in the cluster. Parameters to this device are
    #       read from a configuration file. The format of this line is:
    #         stonith <stonith_type> <configfile>
    #       NOTE: it is up to you to maintain this file on each node in the
    #       cluster!
    #stonith baytech /etc/ha.d/conf/stonith.baytech
    #       STONITH support
    #       You can configure multiple stonith devices using this directive.
    #       The format of the line is:
    #         stonith_host <hostfrom> <stonith_type> <params...>
    #         <hostfrom> is the machine the stonith device is attached
    #              to or * to mean it is accessible from any host.
    #         <stonith_type> is the type of stonith device (a list of
    #              supported drivers is in /usr/lib/stonith.)
    #         <params...> are driver specific parameters. To see the
    #              format for a particular device, run:
    #           stonith -l -t <stonith_type>
    # Note that if you put your stonith device access information in
    # here, and you make this file publicly readable, you're asking
    # for a denial of service attack ;-)
    #stonith_host *     baytech mylogin mysecretpassword
    #stonith_host ken3 rps10 /dev/ttyS1 kathy 0
    #stonith_host kathy rps10 /dev/ttyS1 ken3 0
    # Tell what machines are in the cluster
    # node nodename ... -- must match uname -n
    node rh-9-a                                                【defines a node name; it must be the node's hostname】
    node rh-9-b
    # Less common options...
    # Treats each host listed after ping as a pseudo-cluster-member
    #ping www.163.com www.google.com
    # Started and stopped with heartbeat. Restarted unless it exits
    #     with rc=100
    #respawn userid /path/name/to/run
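Two rules stated in the ha.cf comments above lend themselves to an automated check: initdead should be at least twice deadtime, and every node name must match `uname -n`. The following is a minimal sketch under stated assumptions, not part of the original article. The function names `ha_value` and `check_ha_cf` are hypothetical, and timing values are assumed to be whole seconds (the `ms` suffix is not handled).

```shell
# Print the first value of a directive; commented-out lines never match
# because their first field starts with "#".
ha_value() {   # usage: ha_value <directive> <file>
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Check a ha.cf file against two rules quoted in its own comments.
check_ha_cf() {   # usage: check_ha_cf <file> [hostname]
    file=$1
    host=${2:-$(uname -n)}
    deadtime=$(ha_value deadtime "$file")
    initdead=$(ha_value initdead "$file")
    rc=0
    # "It should be at least twice the normal dead time."
    if [ -n "$deadtime" ] && [ -n "$initdead" ] &&
       [ "$initdead" -lt $((2 * deadtime)) ]; then
        echo "initdead ($initdead) is less than twice deadtime ($deadtime)"
        rc=1
    fi
    # "node nodename ... -- must match uname -n"
    if ! awk -v h="$host" '
            $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }' "$file"; then
        echo "no node directive matches $host"
        rc=1
    fi
    return $rc
}
```

Calling `check_ha_cf /etc/ha.d/ha.cf` on each node before starting heartbeat would print any violated rule and return a non-zero status.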
    # Authentication file. Must be mode 600
    # Must have exactly one auth directive at the front.
    # auth send authentication using this method-id
    # Then, list the method and key that go with that method-id
    # Available methods: crc, sha1, md5. Crc doesn't need/want a key.
    # You normally only have one authentication method-id listed in this file
    # Put more than one to make a smooth transition when changing auth
    # methods and/or keys.
    # sha1 is believed to be the "best", md5 next best.
    # crc adds no security, except from packet corruption.
    #   Use only on physically secure networks.
    auth 3                      【selects the authentication method; 3 is the method-id of the entry to use】
    #1 crc
    #2 sha1 HI!
    3 md5 Hello!              【authenticate with md5, using the key Hello!】
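Since the comments above call sha1 the "best" method and require mode 600, an authkeys file could also be generated rather than edited by hand. This is only a sketch under assumptions: the function name `write_authkeys`, the method-id 1, and the 16-byte random key are illustrative choices, not values from the original article.

```shell
# Write an authkeys file with one sha1 entry and a random key.
write_authkeys() {   # usage: write_authkeys <file>
    file=$1
    # 16 random bytes rendered as 32 hexadecimal characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    old_umask=$(umask)
    umask 077                      # create the file without group/other access
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$file"
    umask "$old_umask"
    chmod 600 "$file"              # also covers a file that already existed
}
```

The same file has to be present on every node, so it would be generated once and then copied to the other node.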
    # This is a list of resources that move from machine to machine as
    # nodes go down and come up in the cluster. Do not include
    # "administrative" or fixed IP addresses in this file.
    # The haresources files MUST BE IDENTICAL on all nodes of the cluster.
    # The node names listed in front of the resource group information
    # is the name of the preferred node to run the service. It is
    # not necessarily the name of the current machine. If you are running
    # nice_failback OFF then these services will be started
    # up on the preferred nodes - any time they're up.
    # If you are running with nice_failback ON, then the node information
    # will be used in the case of a simultaneous start-up.
    # BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
    # If your files are different then almost certainly something
    # won't work right.
    # We refer to this file when we're coming up, and when a machine is being
    # taken over after going down.
    # You need to make this right for your installation, then install it in
    # /etc/ha.d
    # Each logical line in the file constitutes a "resource group".
    # A resource group is a list of resources which move together from
    # one node to another - in the order listed. It is assumed that there
    # is no relationship between different resource groups. The
    # resources in a resource group are started left-to-right, and stopped
    # right-to-left. Long lists of resources can be continued from line
    # to line by ending the lines with backslashes ("\").
    # These resources in this file are either IP addresses, or the name
    # of scripts to run to "start" or "stop" the given resource.
    # The format is like this:
    #node-name resource1 resource2 ... resourceN
    # If the resource name contains an :: in the middle of it, the
    # part after the :: is passed to the resource script as an argument.
    #       Multiple arguments are separated by the :: delimiter
    # In the case of IP addresses, the resource script name IPaddr is
    # implied.
    # For example, the IP address <ip> could also be represented
    # as IPaddr::<ip>
    # THIS IS IMPORTANT!!     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    # The given IP address is directed to an interface which has a route
    # to the given address. This means you have to have a net route
    # set up outside of the High-Availability structure. We don't set it
    # up here -- we key off of it.
    # The broadcast address for the IP alias that is created to support
    # an IP address defaults to the highest address on the subnet.
    # The netmask for the IP alias that is created defaults to the same
    # netmask as the route that it selected in the step above.
    # The base interface for the IP alias that is created defaults to the
    # same interface as the route that it selected in the step above.
    # If you want to specify that this IP address is to be brought up
    # on a subnet with a particular netmask, you would append the prefix
    # length: IPaddr::<ip>/<prefix-length> .
    # If you wished to tell it the broadcast address for this subnet,
    # then you would specify that this way:
    #   IPaddr::<ip>/<prefix-length>/<broadcast>
    # If you wished to tell it that the interface to add the address to
    # is eth0, then you would need to specify it this way:
    #   IPaddr::<ip>/<prefix-length>/eth0
    #       And this way to specify both the broadcast address and the
    #       interface:
    #   IPaddr::<ip>/<prefix-length>/eth0/<broadcast>
    # The IP addresses you list in this file are called "service" addresses,
    # since they're the publicly advertised addresses that clients
    # use to get at highly available services.
    # For a hot/standby (non load-sharing) 2-node system with only
    # a single service address,
    # you will probably only put one system name and one IP address in here.
    # The name you give the address to is the name of the default "hot"
    # system.
    # Where the nodename is the name of the node which "normally" owns the
    # resource. If this machine is up, it will always have the resource
    # it is shown as owning.
    # The string you put in for nodename must match the uname -n name
    # of your machine. Depending on how you have it administered, it could
    # be a short name or a FQDN.
    # Simple case: One service address, default subnet and netmask
    #   No servers that go up and down with the IP address
    # Assuming the administrative addresses are on the same subnet...
    # A little more complex case: One service address, default subnet
    # and netmask, and you want to start and stop http when you get
    # the IP address...
    #just.linux-ha.org http
    # A little more complex case: Three service addresses, default subnet
    # and netmask, and you want to start and stop http when you get
    # the IP address...
    #just.linux-ha.org httpd
    # One service address, with the subnet, interface and bcast addr
    #       explicitly defined.
    #just.linux-ha.org httpd
    #       An example where a shared filesystem is to be used.
    #       Note that multiple arguments are passed to this script using
    #       the delimiter '::' to separate each argument.
    rh-9-a                              【defines the public IP, netmask, and interface name used by the primary node】
    # Regarding the node-names in this file:
    # They must match the names of the nodes listed in ha.cf, which in turn
    # must match the `uname -n` of some node in the cluster. So they aren't
    # virtual in any sense of the word.
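To illustrate the shape of a haresources line (preferred node, then a service address with netmask and interface, then any services), here is a small helper that assembles one. Everything in it is a hypothetical illustration: the function name `haresources_line` is made up, and 192.0.2.10/24 is a documentation-only address, not a value from this article. It assumes the `IPaddr::address/prefix/interface` form described in the comments above.

```shell
# Build one haresources line: the preferred node, the service address
# with prefix length and interface, then any extra resources.
haresources_line() {   # usage: haresources_line <node> <ip> <prefix> <iface> [resource...]
    node=$1
    ip=$2
    prefix=$3
    iface=$4
    shift 4
    line="$node IPaddr::$ip/$prefix/$iface"
    for r in "$@"; do
        line="$line $r"            # resources start left-to-right
    done
    printf '%s\n' "$line"
}

# Hypothetical values: node name from ha.cf, documentation address, httpd service
haresources_line rh-9-a 192.0.2.10 24 eth0 httpd
```

The resulting line must appear identically in /etc/ha.d/haresources on both nodes, as the comments above stress.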
    Start heartbeat with its init script (stop and restart are invoked the same way):
    /etc/rc.d/init.d/heartbeat start
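When starting heartbeat it helps to look at the log right away. The wrapper below is a rough sketch, not something from the original article: the function name `heartbeat_ctl` and the two environment variables are hypothetical, while the default paths are the init script and the logfile named in the text.

```shell
# Defaults come from the text; override them via the environment if needed.
HEARTBEAT_INIT=${HEARTBEAT_INIT:-/etc/rc.d/init.d/heartbeat}
HA_LOG=${HA_LOG:-/var/log/ha-log}

# Run the init script with a validated action, then show recent log lines.
heartbeat_ctl() {   # usage: heartbeat_ctl start|stop|restart
    case $1 in
        start|stop|restart) ;;
        *) echo "usage: heartbeat_ctl start|stop|restart" >&2; return 2 ;;
    esac
    "$HEARTBEAT_INIT" "$1" || return 1
    # Print the tail of the log so problems are visible immediately.
    if [ -f "$HA_LOG" ]; then
        tail -n 20 "$HA_LOG"
    fi
}
```

For example, `heartbeat_ctl start` on rh-9-a and then on rh-9-b would start both nodes and print the last lines of each node's log.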
