  • PGPool-II + PG Streaming Replication for HA Primary/Standby Failover

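    The two-node PGPool cluster works as follows (the original post shows an architecture diagram here): the PostgreSQL primary and standby nodes form a hot standby through streaming replication. pgpool1 and pgpool2 act as middleware: they add the primary and standby PostgreSQL nodes to the cluster and provide read/write splitting, load balancing, and automatic HA failover. The two pgpool nodes delegate a virtual IP that applications use as their access address, and they monitor each other through the watchdog. When pgpool1 goes down, pgpool2 automatically takes over the virtual IP and continues to provide uninterrupted service.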
     
    1.	Host planning
    192.168.20.201 redis01
    192.168.20.202 redis02
    192.168.20.205 vip
    2.	Configure SSH trust between the hosts
    Run on redis01, then repeat on redis02 with redis01 as the target:
    ssh-keygen -t rsa
    ssh-copy-id -i ~/.ssh/id_rsa.pub   redis02
    
    3.	Install pgpool
    wget http://www.pgpool.net/mediawiki/images/pgpool-II-4.1.1.tar.gz
    [postgres@redis01 ~]$ tar -zxf pgpool-II-4.1.1.tar.gz
    [postgres@redis01 ~]$ cd pgpool-II-4.1.1
    ./configure  --prefix=/u01/pgpool --with-pgsql=/u01/pgsql
    make && make install
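The download, extract, and build steps must all name the same release. A minimal sketch that keeps the version in one variable and prints the commands for review; it assumes version 4.1.1 as in the download URL above, and the function names are hypothetical.

```shell
#!/bin/sh
# Keep the pgpool-II version in one place so that wget, tar and cd agree.
PGPOOL_VERSION="${PGPOOL_VERSION:-4.1.1}"   # assumption: taken from the download URL

# Print the tarball file name for a given version.
pgpool_tarball() {
    printf 'pgpool-II-%s.tar.gz\n' "$1"
}

# Print the download URL for a given version (same URL pattern as above).
pgpool_url() {
    printf 'http://www.pgpool.net/mediawiki/images/%s\n' "$(pgpool_tarball "$1")"
}

# Print the build steps instead of running them, so they can be reviewed first.
print_build_steps() {
    v=$1
    echo "wget $(pgpool_url "$v")"
    echo "tar -zxf $(pgpool_tarball "$v")"
    echo "cd pgpool-II-$v"
    echo "./configure --prefix=/u01/pgpool --with-pgsql=/u01/pgsql"
    echo "make && make install"
}

print_build_steps "$PGPOOL_VERSION"
```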
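    4.	Configure pgpool
    4.1 Configure environment variables
    pgpool is installed under the postgres account. Add the following environment variables to that account's profile on both the master and slave nodes.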
    [postgres@redis01 ~]$ cat .bash_profile 
    # .bash_profile
    export PGHOME=/u01/pgsql  
    export PGDATA=/u01/pgsql/data        
    export PGPOOLHOME=/u01/pgpool
    export PGPASSFILE=/u01/pgsql/data/.pgpass
    export PATH=$PATH:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin
    export LD_LIBRARY_PATH=$PGHOME/lib:$PGPOOLHOME/lib
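Later steps rely on these variables being set on both nodes. A small sketch of a check that could be run after logging in again; the function name `check_pgpool_env` is hypothetical.

```shell
#!/bin/sh
# Report which of the variables from .bash_profile are unset or empty.
# Prints the missing names, one per line; returns non-zero if any is missing.
check_pgpool_env() {
    missing=0
    for name in PGHOME PGDATA PGPOOLHOME PGPASSFILE; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_pgpool_env || echo "fix .bash_profile first"` prints the names of any missing variables before the warning.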
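    4.2 Configure pool_hba.conf
    pool_hba.conf authenticates users who log in through pgpool. It must be consistent with PostgreSQL's pg_hba.conf: use either trust or md5 in both. This setup uses md5, configured as follows: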
    # IPv4 local connections:
    host    all             all             127.0.0.1/32            trust
    host    postgres        postgres        192.168.20.0/24       md5
    host    postgres        rep             192.168.20.0/24       md5
    host    all         all         0.0.0.0/0             md5
    
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    local   replication     all                                     trust
    host    replication     all             127.0.0.1/32            trust
    host    replication     rep         192.168.20.0/24       md5
    host    replication     all             ::1/128                 trust
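Since the two files have to agree on the authentication method, a rough comparison can catch a mismatch early. The sketch below assumes entries in the CIDR form used above, where the method is the fifth field; the function name `hba_methods` is hypothetical, and matching sets of methods is only a coarse consistency check.

```shell
#!/bin/sh
# Print the distinct auth methods used by active "host" entries in an
# hba-style file read from stdin. Assumes CIDR addresses, so the method
# is field 5 (type, database, user, address, method).
hba_methods() {
    awk '$1 == "host" { print $5 }' | sort -u
}

# Possible use (paths follow the environment variables set earlier):
#   [ "$(hba_methods < "$PGPOOLHOME/etc/pool_hba.conf")" = \
#     "$(hba_methods < "$PGDATA/pg_hba.conf")" ] || echo "auth methods differ"
```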
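    4.3 Configure pcp.conf
    pcp.conf holds the user name and password that pgpool's own management (PCP) tools authenticate with. Generate the MD5 hash of the password with pg_md5 and store it as follows:
    pg_md5 postgres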
    [postgres@redis01 etc]$ cat pcp.conf
    # USERID:MD5PASSWD
    postgres:e8a48653851e28c69d0506508fb27fc5
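    # pcp.conf holds pgpool's own administrative user name and password, used to manage the cluster
    pg_md5 -p -m -u postgres pool_passwd
    # The database login user is postgres; enter its database password at the prompt, exactly
    # After the password is entered, a pool_passwd file is generated in the pgpool/etc directory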
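The two files store different hashes: pcp.conf holds the plain MD5 of the password, while pool_passwd uses PostgreSQL's md5 format (the password concatenated with the user name, hashed, and prefixed with `md5`). The sketch below reproduces both formats with `md5sum` for illustration; the function names are hypothetical, and `pg_md5` remains the tool to use in practice.

```shell
#!/bin/sh
# pcp.conf entry: "user:" followed by the plain MD5 of the password.
pcp_entry() {
    user=$1; password=$2
    hash=$(printf '%s' "$password" | md5sum | cut -d' ' -f1)
    printf '%s:%s\n' "$user" "$hash"
}

# pool_passwd entry in PostgreSQL's md5 format:
# "user:md5" followed by MD5(password concatenated with user name).
pool_passwd_entry() {
    user=$1; password=$2
    hash=$(printf '%s%s' "$password" "$user" | md5sum | cut -d' ' -f1)
    printf '%s:md5%s\n' "$user" "$hash"
}
```

Comparing the output of `pcp_entry` with the line in pcp.conf is a quick way to confirm which password a stored hash corresponds to.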
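    4.4 Configure permissions on system commands
    Set the setuid bit on ip and arping. pgpool runs as the non-root postgres user, and the watchdog's if_up_cmd, if_down_cmd and arping_cmd need these binaries to bring the virtual IP up and down.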
    chmod u+s /sbin/ip
    chmod u+s /usr/sbin/arping
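Whether the bit is actually set can be verified with the standard `-u` file test. A minimal sketch; the function name `check_setuid` is hypothetical.

```shell
#!/bin/sh
# Succeed only if every given file exists and has the setuid bit set.
# Prints one line for each path that fails the check.
check_setuid() {
    rc=0
    for path in "$@"; do
        if [ ! -u "$path" ]; then
            echo "not setuid: $path"
            rc=1
        fi
    done
    return "$rc"
}
```

On each pgpool host this could be called as `check_setuid /sbin/ip /usr/sbin/arping`.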
    4.5 Configure pgpool.conf
    Backend connection settings
    # - Backend Connection Settings -
    backend_hostname0 = 'redis01'
    backend_port0 = 5432
    backend_weight0 = 1
    backend_data_directory0 = '/u01/pgsql/data'
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    backend_application_name0 = 'server1'
    
    backend_hostname1 = 'redis02'
    backend_port1 = 5432
    backend_weight1 = 1
    backend_data_directory1 = '/u01/pgsql/data'
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    backend_application_name1 = 'server2'
    Load balancing
    load_balance_mode = on
    PostgreSQL streaming replication
    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    Health check settings
    #------------------------------------------------------------------------------
    # HEALTH CHECK GLOBAL PARAMETERS
    #------------------------------------------------------------------------------
    health_check_period = 10
    health_check_timeout = 20
    health_check_user = 'postgres'
    health_check_password = 'postgres'
    health_check_database = ''
    health_check_max_retries = 0
    health_check_retry_delay = 1
    connect_timeout = 10000
    Failover settings
    failover_command = '/u01/pgpool/failover_stream.sh %H %h'
                                       # Executes this command at failover
                                       # Special values:
                                       #   %d = failed node id
                                       #   %h = failed node host name
                                       #   %p = failed node port number
                                       #   %D = failed node database cluster path
                                       #   %m = new master node id
                                       #   %H = new master node hostname
                                       #   %M = old master node id
                                       #   %P = old primary node id
                                       #   %r = new master port number
                                       #   %R = new master database cluster path
                                       #   %N = old primary node hostname
                                       #   %S = old primary node port number
                                       #   %% = '%' character
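To make the placeholder mechanism concrete, the sketch below mimics the substitution for the two placeholders used in this setup. It is an illustration only, not pgpool's implementation; the function name is hypothetical, and it assumes host names without `/` characters.

```shell
#!/bin/sh
# Illustration only: substitute the two placeholders used above.
#   %H -> new master hostname, %h -> failed node hostname
expand_failover_command() {
    template=$1; new_master=$2; failed_host=$3
    printf '%s\n' "$template" |
        sed -e "s/%H/$new_master/g" -e "s/%h/$failed_host/g"
}

expand_failover_command '/u01/pgpool/failover_stream.sh %H %h' redis02 redis01
# prints: /u01/pgpool/failover_stream.sh redis02 redis01
```

So when redis01 fails while redis02 is the surviving node, the script receives redis02 as `$1` and redis01 as `$2`.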
    
    Enable the watchdog
    #------------------------------------------------------------------------------
    # WATCHDOG
    #------------------------------------------------------------------------------
    
    # - Enabling -
    use_watchdog = on
    # -Connection to up stream servers -
    trusted_servers = '192.168.20.201,192.168.20.202'
    ping_path = '/bin'
    # - Watchdog communication Settings -
    wd_hostname = 'redis01'
    wd_port = 9000
    wd_priority = 1
    wd_authkey = ''
    wd_ipc_socket_dir = '/tmp'
    
    Configure the VIP
    # - Virtual IP control Setting -
    
    delegate_IP = '192.168.20.205'
    if_cmd_path = '/sbin'
    if_up_cmd = ' /sbin/ip addr add $_IP_$/24 dev eth2 label eth2:0'
    if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth2'
    arping_path = '/usr/sbin'
    arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth2'
    
    Other pgpool connection settings
    # - Other pgpool Connection Settings -
    other_pgpool_hostname0 = 'redis02'
    other_pgpool_port0 = 9999
    other_wd_port0 = 9000
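The watchdog settings above are written for redis01. On redis02 the node's own watchdog host name and the peer entry swap. A sketch that derives the second node's file from the first, assuming both nodes otherwise share the same pgpool.conf; the function name is hypothetical.

```shell
#!/bin/sh
# Read a pgpool.conf written for host $1 on stdin and print the variant
# for its watchdog peer $2: wd_hostname and other_pgpool_hostname0 swap,
# every other line is passed through unchanged.
peer_pgpool_conf() {
    self=$1; peer=$2
    sed -e "/^wd_hostname[ =]/s/'$self'/'$peer'/" \
        -e "/^other_pgpool_hostname0[ =]/s/'$peer'/'$self'/"
}

# Possible use:
#   peer_pgpool_conf redis01 redis02 < pgpool.conf > pgpool.conf.redis02
```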
    4.6 Configure the failover_stream.sh script
    [postgres@redis01 pgpool]$ cat failover_stream.sh 
    #!/bin/sh
    # Failover command for streaming replication.
    # Arguments: $1: new master hostname, $2: failed (old master) hostname.

    new_master=$1
    old_master=$2
    # recovery.conf is only read by PostgreSQL 11 and earlier.
    RECOVERYCONF=$PGDATA/recovery.conf
    trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"
    delrecovery_command="rm -f $PGDATA/recovery.done $RECOVERYCONF"
    conninfo="host=$new_master port=5432 user=rep password=rep passfile=''/u01/pgsql/data/.pgpass''"

    # Promote the standby database.
    /usr/bin/ssh -T "$new_master" "$trigger_command"

    # Recreate recovery.conf on the failed node so that it rejoins as a standby.
    /usr/bin/ssh -T "$old_master" "$delrecovery_command"
    printf '%s\n' \
        "standby_mode = 'on'" \
        "primary_conninfo = '$conninfo'" \
        "recovery_target_timeline = 'latest'" |
        /usr/bin/ssh -T "$old_master" "cat > $RECOVERYCONF"
    exit 0
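    When pgpool detects that PostgreSQL is down on the primary, the script promotes the standby to a read-write primary and writes a recovery.conf on the failed host, so that when the failed PostgreSQL instance is started again it comes up as a standby.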
    
    
    
    
    4.7 Start pgpool
    Start PostgreSQL before starting pgpool. Streaming replication between the PostgreSQL nodes is already configured.
    Start PostgreSQL
    pg_ctl start
    Check streaming replication
    [postgres@redis01 etc]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
      client_addr   | application_name | sync_state 
    ----------------+------------------+------------
     192.168.20.202 | walreceiver      | sync
    Start pgpool
    pgpool -n  -D  > /u01/pgpool/log/pgpool.log 2>&1 &
    List the pgpool nodes
    [postgres@redis01 etc]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
     node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-23 22:04:41
     1       | redis02  | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-23 22:04:41
    
    Check the VIP
    [postgres@redis01 ~]$ ifconfig -a
    eth1      Link encap:Ethernet  HWaddr 08:00:27:E9:92:7B  
              inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fee9:927b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:11 errors:0 dropped:0 overruns:0 frame:0
              TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1921 (1.8 KiB)  TX bytes:1968 (1.9 KiB)
    
    eth2      Link encap:Ethernet  HWaddr 08:00:27:CF:77:EB  
              inet addr:192.168.20.201  Bcast:192.168.20.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fecf:77eb/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:32488 errors:0 dropped:0 overruns:0 frame:0
              TX packets:31045 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:3737971 (3.5 MiB)  TX bytes:3848610 (3.6 MiB)
    
    eth2:0    Link encap:Ethernet  HWaddr 08:00:27:CF:77:EB  
              inet addr:192.168.20.205  Bcast:0.0.0.0  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
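
The same check can be scripted, for instance to find out which pgpool node currently holds the VIP. The sketch below filters `ip -o addr show` output rather than ifconfig output; the function name `has_address` is hypothetical.

```shell
#!/bin/sh
# Read `ip -o addr show` style output on stdin and succeed if the given
# IPv4 address is assigned to any interface. -F matches the dots literally.
has_address() {
    grep -qF "inet $1/"
}

# Possible use on either pgpool host:
#   ip -o addr show dev eth2 | has_address 192.168.20.205 && echo "this node holds the VIP"
```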
    
    5.	Demonstrate a PostgreSQL outage
    
    [postgres@redis01 ~]$ pg_ctl stop -m fast
    waiting for server to shut down..... done
    server stopped
    
    PostgreSQL log on redis01
    2020-02-23 22:45:21.402 CST [2577] LOG:  received fast shutdown request
    2020-02-23 22:45:21.682 CST [2577] LOG:  aborting any active transactions
    2020-02-23 22:45:21.682 CST [2577] LOG:  background worker "logical replication launcher" (PID 2586) exited with exit code 1
    2020-02-23 22:45:21.683 CST [2580] LOG:  shutting down
    2020-02-23 22:45:23.174 CST [2577] LOG:  database system is shut down
    
    PostgreSQL log on redis02
    2020-02-23 22:45:23.174 CST [12119] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
    		Is the server running on host "redis01" (192.168.20.201) and accepting
    		TCP/IP connections on port 5432?
    2020-02-23 22:45:28.180 CST [12126] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
    		Is the server running on host "redis01" (192.168.20.201) and accepting
    		TCP/IP connections on port 5432?
    2020-02-23 22:45:31.395 CST [3584] LOG:  received promote request
    2020-02-23 22:45:31.395 CST [3584] LOG:  redo done at 0/26000028
    2020-02-23 22:45:31.405 CST [3584] LOG:  selected new timeline ID: 12
    2020-02-23 22:45:32.389 CST [3584] LOG:  archive recovery complete
    2020-02-23 22:45:32.402 CST [3582] LOG:  database system is ready to accept connections
    
    Check the primary/standby roles
    [postgres@redis02 ~]$ pg_controldata|grep cluster
    Database cluster state:               in production
    [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
     client_addr | application_name | sync_state 
    -------------+------------------+------------
    (0 rows)
    
    The PostgreSQL database on redis02 has switched directly from standby to primary.
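
The role can also be derived from pg_controldata in a script. A minimal sketch that maps the "Database cluster state" line to a role; the function name is hypothetical, and any state other than the two running ones (for example a stopped cluster) is reported as unknown.

```shell
#!/bin/sh
# Read pg_controldata output on stdin and print "primary", "standby",
# or "unknown" based on the "Database cluster state" line.
cluster_role() {
    state=$(sed -n 's/^Database cluster state: *//p')
    case $state in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}

# Possible use:
#   pg_controldata | cluster_role
```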
    
    Check the pgpool cluster
    [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
     node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2020-02-23 22:45:33
     1       | redis02  | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 |                   |                        | 2020-02-23 22:45:33
    (2 rows)
    pgpool likewise shows that the PostgreSQL database on redis02 has switched directly from standby to primary.
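
For monitoring, the down backends can be pulled out of this table. The sketch below parses the default psql table layout shown above; the function name `down_nodes` is hypothetical.

```shell
#!/bin/sh
# Read `show pool_nodes` table output on stdin and print "node_id hostname"
# for every backend whose status column is "down".
down_nodes() {
    awk -F'|' 'NF >= 4 {
        # Trim the padding psql adds around the first four columns.
        for (i = 1; i <= 4; i++) gsub(/^ +| +$/, "", $i)
        if ($4 == "down") print $1, $2
    }'
}

# Possible use:
#   psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes' | down_nodes
```

With the table above as input, the function reports node 0 (redis01).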
    
    Start the failed former primary; once it is up, it runs as a standby
    [postgres@redis01 pgpool]$ pg_ctl start
    waiting for server to start....2020-02-23 22:58:00.891 CST [9600] LOG:  listening on IPv4 address "192.168.20.201", port 5432
    2020-02-23 22:58:00.897 CST [9600] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
    2020-02-23 22:58:00.912 CST [9600] LOG:  redirecting log output to logging collector process
    2020-02-23 22:58:00.912 CST [9600] HINT:  Future log output will appear in directory "log".
     done
    server started
    
    [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;'
      client_addr   | application_name | sync_state 
    ----------------+------------------+------------
     192.168.20.201 | walreceiver      | sync
    (1 row)
    
    Reattach the recovered node to the pgpool cluster
    [postgres@redis01 pgpool]$ pcp_attach_node -d -U postgres -h 192.168.20.205 -p 9898 -n 0
    Password: 
    DEBUG: recv: tos="m", len=8
    DEBUG: recv: tos="r", len=21
    DEBUG: send: tos="C", len=6
    DEBUG: recv: tos="c", len=20
    pcp_attach_node -- Command Successful
    DEBUG: send: tos="X", len=4
    [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes'
     node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
     0       | redis01  | 5432 | up     | 0.500000  | standby | 0          | true              | 0                 |                   |                        | 2020-02-23 23:00:11
     1       | redis02  | 5432 | up     | 0.500000  | primary | 0          | false             | 0                 |                   |                        | 2020-02-23 22:45:33
    (2 rows)
    
    This completes the failover drill.
    

      

  • Original article: https://www.cnblogs.com/omsql/p/12354992.html