zoukankan      html  css  js  c++  java
  • PGPool-II+PG流复制实现HA主备切换

    基于PGPool的双机集群如下图所示:pg主节点和备节点实现流复制热备,pgpool1,pgpool2作为中间件,将主备pg节点加入集群,实现读写分离,负载均衡和HA故障自动切换。两pgpool节点可以委托一个虚拟ip节点作为应用程序访问的地址,两节点之间通过watchdog进行监控,当pgpool1宕机时,pgpool2会自动接管虚拟ip继续对外提供不间断服务。
     
    1.	主机规划
    192.168.20.201 redis01
    192.168.20.202 redis02
    192.168.20.203 vip
    2.	配置主机ssh互信
    ssh-keygen -t rsa
    ssh-copy-id -i ~/.ssh/id_rsa.pub   redis02
    
    3.	安装pgpool
    wget http://www.pgpool.net/mediawiki/images/pgpool-II-4.1.1.tar.gz
    [postgres@redis01 ~]$ tar -zxf pgpool-II-4.1.0.tar.gz
    ./configure  --prefix=/u01/pgpool --with-pgsql=/u01/pgsql
    make && make install
    4.	配置pgpool
    4.1配置环境变量
    pgpool装在了postgres账户下,在该账户中添加环境变量,master,slave节点都执行。
    [postgres@redis01 ~]$ cat .bash_profile 
    # .bash_profile
    export PGHOME=/u01/pgsql  
    export PGDATA=/u01/pgsql/data        
    export PGPOOLHOME=/u01/pgpool
    export PGPASSFILE=/u01/pgsql/data/.pgpass
    export PATH=$PATH:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin
    export LD_LIBRARY_PATH=$PGHOME/lib:$PGPOOLHOME/lib
    4.2配置pool_hba.conf
    pool_hba.conf是对登录用户进行验证的,要和pg的pg_hba.conf保持一致,要么都是trust,要么都是md5验证方式,这里采用了md5验证方式如下设置:
    # IPv4 local connections:
    host    all             all             127.0.0.1/32            trust
    host    postgres        postgres        192.168.20.0/24       md5
    host    postgres        rep             192.168.20.0/24       md5
    host    all         all         0.0.0.0/0             md5
    
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    local   replication     all                                     trust
    host    replication     all             127.0.0.1/32            trust
    host    replication     rep         192.168.20.0/24       md5
    host    replication     all             ::1/128                 trust
    4.3配置pcp.conf
    pcp.conf配置用于pgpool自己登陆管理使用的,一些操作pgpool的工具会要求提供密码等,配置如下:
    pg_md5 posgres
    [postgres@redis01 etc]$ cat pcp.conf
    # USERID:MD5PASSWD
    postgres:e8a48653851e28c69d0506508fb27fc5
    #pcp.conf是pgpool管理器自己的用户名和密码,用于管理集群
    pg_md5 -p -m -u postgres pool_passwd
    #数据库登录用户是postgres,这里输入登录密码,不能出错
    #输入密码后,在pgpool/etc目录下会生成一个pool_passwd文件
    4.4配置系统命令权限
    配置 ifconfig, arping 执行权限 ,执行failover_stream.sh需要用到,可以让其他普通用户执行。
    chmod u+s /sbin/ip
    chmod u+s /usr/sbin/arping
    4.5配置pgpool.conf
    后台连接设置
    # - Backend Connection Settings -
    backend_hostname0 = 'redis01'
    backend_port0 = 5432
    backend_weight0 = 1
    backend_data_directory0 = '/u01/pgsql/data'
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    backend_application_name0 = 'server1'
    
    backend_hostname1 = 'redis02'
    backend_port1 = 5432
    backend_weight1 = 1
    backend_data_directory1 = '/u01/pgsql/data'
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    backend_application_name1 = 'server2'
    负载均衡
    load_balance_mode = on
    postgres流复制
    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    健康检查设置
    #------------------------------------------------------------------------------
    # HEALTH CHECK GLOBAL PARAMETERS
    #------------------------------------------------------------------------------
    health_check_period = 10
    health_check_timeout = 20
    health_check_user = 'postgres'
    health_check_password = 'postgres'
    health_check_database = ''
    health_check_max_retries = 0
    health_check_retry_delay = 1
    connect_timeout = 10000
    灾备切换设置
    failover_command = '/u01/pgpool/failover_stream.sh %H %h'
                                       # Executes this command at failover
                                       # Special values:
                                       #   %d = failed node id
                                       #   %h = failed node host name
                                       #   %p = failed node port number
                                       #   %D = failed node database cluster path
                                       #   %m = new master node id
                                       #   %H = new master node hostname
                                       #   %M = old master node id
                                       #   %P = old primary node id
                                       #   %r = new master port number
                                       #   %R = new master database cluster path
                                       #   %N = old primary node hostname
                                       #   %S = old primary node port number
                                       #   %% = '%' character
    
    开启看门狗
    #------------------------------------------------------------------------------
    # WATCHDOG
    #------------------------------------------------------------------------------
    
    # - Enabling -
    use_watchdog = on
    # -Connection to up stream servers -
    trusted_servers = '192.168.20.201,192.168.20.202'
    ping_path = '/bin'
    # - Watchdog communication Settings -
    wd_hostname = 'redis01'
    wd_port = 9000
    wd_priority = 1
    wd_authkey = ''
    wd_ipc_socket_dir = '/tmp'
    
    配置vip
    # - Virtual IP control Setting -
    
    delegate_IP = '192.168.20.205'
    if_cmd_path = '/sbin'
    if_up_cmd = ' /sbin/ip addr add $_IP_$/24 dev eth2 label eth2:0'
    if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth2'
    arping_path = '/usr/sbin'
    arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth2'
    
    其他pgpool连接设置
    # - Other pgpool Connection Settings -
    other_pgpool_hostname0 = 'redis02'
    other_pgpool_port0 =9999 
    other_wd_port0 = 9000
    4.6配置failover_stream.sh脚本
    [postgres@redis01 pgpool]$ cat failover_stream.sh 
    #! /bin/sh 
    # Failover command for streaming replication. 
    # Arguments: $1: new master hostname. 
    
    new_master=$1 
    old_master=$2
    RECOVERYCONF=$PGDATA/recovery.conf
    touch_command="touch $RECOVERYCONF"
    trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" 
    delrecovery_command="rm -f  $PGDATA/recovery.done && rm -f  $PGDATA/recovery.conf"
    echo1_cmd="echo -e  " primary_conninfo = 'host=$new_master port=5432  user=rep password=rep passfile=''/u01/pgsql/data/.pgpass''' " >> $PGDATA/recovery.conf"
    echo2_cmd="echo -e " recovery_target_timeline = 'latest' "  >> $PGDATA/recovery.conf"
    echo3_cmd="echo -e " standby_mode = 'on' "  >> $PGDATA/recovery.conf"
    
    
    
    # Prompte standby database. 
    /usr/bin/ssh -T $new_master $trigger_command 
    # create recovery.conf 
    /usr/bin/ssh -T $old_master $delrecovery_command
    /usr/bin/ssh -T $old_master  $touch_command
    /usr/bin/ssh -T $old_master  $echo3_cmd
    /usr/bin/ssh -T $old_master  $echo1_cmd
    /usr/bin/ssh -T $old_master  $echo2_cmd
    exit 0;
    检测到postgresql down机以后把standby转成primary模式可读写,在down机的主机生成recovery.conf,down机的postgresql启动起来以后自动转成standby。
    
    
    
    
    4.7启动pgpool
    启动pgpool以前先启动postgresql, postgresql已配置好流复制
    启动postgressql
    pg_ctl start
    查看流复制
    [postgres@redis01 etc]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
      client_addr   | application_name | sync_state 
    ----------------+------------------+------------
     192.168.20.202 | walreceiver      | sync
    启动pgpool
    pgpool -n  -D  > /u01/pgpool/log/pgpool.log 2>&1 &
    查看pgpool节点
    [postgres@redis01 etc]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
    node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-23 22:04:41
     1       | redis02  | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-23 22:04:41
    
    查看vip
    [postgres@redis01 ~]$ ifconfig -a
    eth1      Link encap:Ethernet  HWaddr 08:00:27:E9:92:7B  
              inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fee9:927b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:11 errors:0 dropped:0 overruns:0 frame:0
              TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1921 (1.8 KiB)  TX bytes:1968 (1.9 KiB)
    
    eth2      Link encap:Ethernet  HWaddr 08:00:27:CF:77:EB  
              inet addr:192.168.20.201  Bcast:192.168.20.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fecf:77eb/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:32488 errors:0 dropped:0 overruns:0 frame:0
              TX packets:31045 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:3737971 (3.5 MiB)  TX bytes:3848610 (3.6 MiB)
    
    eth2:0    Link encap:Ethernet  HWaddr 08:00:27:CF:77:EB  
              inet addr:192.168.20.205  Bcast:0.0.0.0  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    
    5.	演示postgresql down机
    
    [postgres@redis01 ~]$ pg_ctl stop -m fast
    waiting for server to shut down..... done
    server stopped
    
    redis01 postgressql日志
    2020-02-23 22:45:21.402 CST [2577] LOG:  received fast shutdown request
    2020-02-23 22:45:21.682 CST [2577] LOG:  aborting any active transactions
    2020-02-23 22:45:21.682 CST [2577] LOG:  background worker "logical replication launcher" (PID 2586) exited with exit code 1
    2020-02-23 22:45:21.683 CST [2580] LOG:  shutting down
    2020-02-23 22:45:23.174 CST [2577] LOG:  database system is shut down
    
    redis02 postgressql日志
    2020-02-23 22:45:23.174 CST [12119] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
    		Is the server running on host "redis01" (192.168.20.201) and accepting
    		TCP/IP connections on port 5432?
    2020-02-23 22:45:28.180 CST [12126] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
    		Is the server running on host "redis01" (192.168.20.201) and accepting
    		TCP/IP connections on port 5432?
    2020-02-23 22:45:31.395 CST [3584] LOG:  received promote request
    2020-02-23 22:45:31.395 CST [3584] LOG:  redo done at 0/26000028
    2020-02-23 22:45:31.405 CST [3584] LOG:  selected new timeline ID: 12
    2020-02-23 22:45:32.389 CST [3584] LOG:  archive recovery complete
    2020-02-23 22:45:32.402 CST [3582] LOG:  database system is ready to accept connections
    
    查看主备
    [postgres@redis02 ~]$ pg_controldata|grep cluster
    Database cluster state:               in production
    [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
     client_addr | application_name | sync_state 
    -------------+------------------+------------
    (0 rows)
    
    Redis02主机上的postgressql数据库直接从standby转成primary
    
    查看pgpool集群
    [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
     node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-23 22:45:33
     1       | redis02  | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-23 22:45:33
    (2 rows)
    从pgpool 来看也是把Redis02主机上的postgressql数据库直接从standby转成primary
    
    启动down机的主节点;启动起来以后转成standby
    [postgres@redis01 pgpool]$ pg_ctl start
    waiting for server to start....2020-02-23 22:58:00.891 CST [9600] LOG:  listening on IPv4 address "192.168.20.201", port 5432
    2020-02-23 22:58:00.897 CST [9600] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
    2020-02-23 22:58:00.912 CST [9600] LOG:  redirecting log output to logging collector process
    2020-02-23 22:58:00.912 CST [9600] HINT:  Future log output will appear in directory "log".
     done
    server started
    
    [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
      client_addr   | application_name | sync_state 
    ----------------+------------------+------------
     192.168.20.201 | walreceiver      | sync
    (1 row)
    
    down机的节点加入pgpool集群
    [postgres@redis01 pgpool]$ pcp_attach_node -d -U postgres -h 192.168.20.205 -p 9898 -n 0
    Password: 
    DEBUG: recv: tos="m", len=8
    DEBUG: recv: tos="r", len=21
    DEBUG: send: tos="C", len=6
    DEBUG: recv: tos="c", len=20
    pcp_attach_node -- Command Successful
    DEBUG: send: tos="X", len=4
    [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
     node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | up     | 0.500000  | standby | 0          | true              | 0                 |                   |                        | 2020-02-23 23:00:11
     1       | redis02  | 5432 | up     | 0.500000  | primary | 0          | false             | 0                 |                   |                        | 2020-02-23 22:45:33
    (2 rows)
    
    到此灾备演练完毕。
    

      

  • 相关阅读:
    Windows Azure Cloud Service (14) 使用Windows Azure诊断收集日志记录数据
    Windows Azure Cloud Service (13) 用Visual Studio 2010 将应用程序部署到Windows Azure平台
    Windows Azure Cloud Service (15) 多个VM Instance场景下如何处理ASP.NET Session
    Windows Azure Storage (5) Windows Azure Drive
    Windows Azure Storage (7) 使用工具管理Windows Azure Storage
    SQL Azure(二) SQL Azure vs SQL Server
    webbrowser的自动提交
    提取视频的背景声音的软件
    Listview列排序的bug原因
    两个奇怪的问题
  • 原文地址:https://www.cnblogs.com/omsql/p/12354992.html
Copyright © 2011-2022 走看看