基于PGPool的双机集群如下图所示:pg主节点和备节点实现流复制热备,pgpool1,pgpool2作为中间件,将主备pg节点加入集群,实现读写分离,负载均衡和HA故障自动切换。两pgpool节点可以委托一个虚拟ip节点作为应用程序访问的地址,两节点之间通过watchdog进行监控,当pgpool1宕机时,pgpool2会自动接管虚拟ip继续对外提供不间断服务。 1. 主机规划 192.168.20.201 redis01 192.168.20.202 redis02 192.168.20.203 vip 2. 配置主机ssh互信 ssh-keygen -t rsa ssh-copy-id -i ~/.ssh/id_rsa.pub redis02 3. 安装pgpool wget http://www.pgpool.net/mediawiki/images/pgpool-II-4.1.1.tar.gz [postgres@redis01 ~]$ tar -zxf pgpool-II-4.1.0.tar.gz ./configure --prefix=/u01/pgpool --with-pgsql=/u01/pgsql make && make install 4. 配置pgpool 4.1配置环境变量 pgpool装在了postgres账户下,在该账户中添加环境变量,master,slave节点都执行。 [postgres@redis01 ~]$ cat .bash_profile # .bash_profile export PGHOME=/u01/pgsql export PGDATA=/u01/pgsql/data export PGPOOLHOME=/u01/pgpool export PGPASSFILE=/u01/pgsql/data/.pgpass export PATH=$PATH:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin export LD_LIBRARY_PATH=$PGHOME/lib:$PGPOOLHOME/lib 4.2配置pool_hba.conf pool_hba.conf是对登录用户进行验证的,要和pg的pg_hba.conf保持一致,要么都是trust,要么都是md5验证方式,这里采用了md5验证方式如下设置: # IPv4 local connections: host all all 127.0.0.1/32 trust host postgres postgres 192.168.20.0/24 md5 host postgres rep 192.168.20.0/24 md5 host all all 0.0.0.0/0 md5 # IPv6 local connections: host all all ::1/128 trust # Allow replication connections from localhost, by a user with the # replication privilege. local replication all trust host replication all 127.0.0.1/32 trust host replication rep 192.168.20.0/24 md5 host replication all ::1/128 trust 4.3配置pcp.conf pcp.conf配置用于pgpool自己登陆管理使用的,一些操作pgpool的工具会要求提供密码等,配置如下: pg_md5 posgres [postgres@redis01 etc]$ cat pcp.conf # USERID:MD5PASSWD postgres:e8a48653851e28c69d0506508fb27fc5 #pcp.conf是pgpool管理器自己的用户名和密码,用于管理集群 pg_md5 -p -m -u postgres pool_passwd #数据库登录用户是postgres,这里输入登录密码,不能出错 #输入密码后,在pgpool/etc目录下会生成一个pool_passwd文件 4.4配置系统命令权限 配置 ifconfig, arping 执行权限 ,执行failover_stream.sh需要用到,可以让其他普通用户执行。 chmod u+s /sbin/ip chmod u+s /usr/sbin/arping 4.5配置pgpool.conf 后台连接设置 # - Backend Connection Settings - backend_hostname0 = 'redis01' backend_port0 = 5432 backend_weight0 = 1 backend_data_directory0 = '/u01/pgsql/data' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_application_name0 = 'server1' backend_hostname1 = 'redis02' backend_port1 = 5432 backend_weight1 = 1 backend_data_directory1 = '/u01/pgsql/data' backend_flag1 = 'ALLOW_TO_FAILOVER' backend_application_name1 = 'server2' 负载均衡 load_balance_mode = on postgres流复制 master_slave_mode = on master_slave_sub_mode = 'stream' 健康检查设置 #------------------------------------------------------------------------------ # HEALTH CHECK GLOBAL PARAMETERS #------------------------------------------------------------------------------ health_check_period = 10 health_check_timeout = 20 health_check_user = 'postgres' health_check_password = 'postgres' health_check_database = '' health_check_max_retries = 0 health_check_retry_delay = 1 connect_timeout = 10000 灾备切换设置 failover_command = '/u01/pgpool/failover_stream.sh %H %h' # Executes this command at failover # Special values: # %d = failed node id # %h = failed node host name # %p = failed node port number # %D = failed node database cluster path # %m = new master node id # %H = new master node hostname # %M = old master node id # %P = old primary node id # %r = new master port number # %R = new master database cluster path # %N = old primary node hostname # %S = old primary node port number # %% = '%' character 开启看门狗 #------------------------------------------------------------------------------ # WATCHDOG #------------------------------------------------------------------------------ # - Enabling - use_watchdog = on # -Connection to up stream servers - trusted_servers = '192.168.20.201,192.168.20.202' ping_path = '/bin' # - Watchdog communication Settings - wd_hostname = 'redis01' wd_port = 9000 wd_priority = 1 wd_authkey = '' wd_ipc_socket_dir = '/tmp' 配置vip # - Virtual IP control Setting - delegate_IP = '192.168.20.205' if_cmd_path = '/sbin' if_up_cmd = ' /sbin/ip addr add $_IP_$/24 dev eth2 label eth2:0' if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth2' arping_path = '/usr/sbin' arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth2' 其他pgpool连接设置 # - Other pgpool Connection Settings - other_pgpool_hostname0 = 'redis02' other_pgpool_port0 =9999 other_wd_port0 = 9000 4.6配置failover_stream.sh脚本 [postgres@redis01 pgpool]$ cat failover_stream.sh #! /bin/sh # Failover command for streaming replication. # Arguments: $1: new master hostname. new_master=$1 old_master=$2 RECOVERYCONF=$PGDATA/recovery.conf touch_command="touch $RECOVERYCONF" trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" delrecovery_command="rm -f $PGDATA/recovery.done && rm -f $PGDATA/recovery.conf" echo1_cmd="echo -e " primary_conninfo = 'host=$new_master port=5432 user=rep password=rep passfile=''/u01/pgsql/data/.pgpass''' " >> $PGDATA/recovery.conf" echo2_cmd="echo -e " recovery_target_timeline = 'latest' " >> $PGDATA/recovery.conf" echo3_cmd="echo -e " standby_mode = 'on' " >> $PGDATA/recovery.conf" # Prompte standby database. /usr/bin/ssh -T $new_master $trigger_command # create recovery.conf /usr/bin/ssh -T $old_master $delrecovery_command /usr/bin/ssh -T $old_master $touch_command /usr/bin/ssh -T $old_master $echo3_cmd /usr/bin/ssh -T $old_master $echo1_cmd /usr/bin/ssh -T $old_master $echo2_cmd exit 0; 检测到postgresql down机以后把standby转成primary模式可读写,在down机的主机生成recovery.conf,down机的postgresql启动起来以后自动转成standby。 4.7启动pgpool 启动pgpool以前先启动postgresql, postgresql已配置好流复制 启动postgressql pg_ctl start 查看流复制 [postgres@redis01 etc]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;' client_addr | application_name | sync_state ----------------+------------------+------------ 192.168.20.202 | walreceiver | sync 启动pgpool pgpool -n -D > /u01/pgpool/log/pgpool.log 2>&1 & 查看pgpool节点 [postgres@redis01 etc]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | redis01 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:04:41 1 | redis02 | 5432 | up | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:04:41 查看vip [postgres@redis01 ~]$ ifconfig -a eth1 Link encap:Ethernet HWaddr 08:00:27:E9:92:7B inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0 inet6 addr: fe80::a00:27ff:fee9:927b/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:11 errors:0 dropped:0 overruns:0 frame:0 TX packets:19 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1921 (1.8 KiB) TX bytes:1968 (1.9 KiB) eth2 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB inet addr:192.168.20.201 Bcast:192.168.20.255 Mask:255.255.255.0 inet6 addr: fe80::a00:27ff:fecf:77eb/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:32488 errors:0 dropped:0 overruns:0 frame:0 TX packets:31045 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3737971 (3.5 MiB) TX bytes:3848610 (3.6 MiB) eth2:0 Link encap:Ethernet HWaddr 08:00:27:CF:77:EB inet addr:192.168.20.205 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 5. 演示postgresql down机 [postgres@redis01 ~]$ pg_ctl stop -m fast waiting for server to shut down..... done server stopped redis01 postgressql日志 2020-02-23 22:45:21.402 CST [2577] LOG: received fast shutdown request 2020-02-23 22:45:21.682 CST [2577] LOG: aborting any active transactions 2020-02-23 22:45:21.682 CST [2577] LOG: background worker "logical replication launcher" (PID 2586) exited with exit code 1 2020-02-23 22:45:21.683 CST [2580] LOG: shutting down 2020-02-23 22:45:23.174 CST [2577] LOG: database system is shut down redis02 postgressql日志 2020-02-23 22:45:23.174 CST [12119] FATAL: could not connect to the primary server: could not connect to server: Connection refused Is the server running on host "redis01" (192.168.20.201) and accepting TCP/IP connections on port 5432? 2020-02-23 22:45:28.180 CST [12126] FATAL: could not connect to the primary server: could not connect to server: Connection refused Is the server running on host "redis01" (192.168.20.201) and accepting TCP/IP connections on port 5432? 2020-02-23 22:45:31.395 CST [3584] LOG: received promote request 2020-02-23 22:45:31.395 CST [3584] LOG: redo done at 0/26000028 2020-02-23 22:45:31.405 CST [3584] LOG: selected new timeline ID: 12 2020-02-23 22:45:32.389 CST [3584] LOG: archive recovery complete 2020-02-23 22:45:32.402 CST [3582] LOG: database system is ready to accept connections 查看主备 [postgres@redis02 ~]$ pg_controldata|grep cluster Database cluster state: in production [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;' client_addr | application_name | sync_state -------------+------------------+------------ (0 rows) Redis02主机上的postgressql数据库直接从standby转成primary 查看pgpool集群 [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | redis01 | 5432 | down | 0.500000 | standby | 0 | false | 0 | | | 2020-02-23 22:45:33 1 | redis02 | 5432 | up | 0.500000 | primary | 0 | true | 0 | | | 2020-02-23 22:45:33 (2 rows) 从pgpool 来看也是把Redis02主机上的postgressql数据库直接从standby转成primary 启动down机的主节点;启动起来以后转成standby [postgres@redis01 pgpool]$ pg_ctl start waiting for server to start....2020-02-23 22:58:00.891 CST [9600] LOG: listening on IPv4 address "192.168.20.201", port 5432 2020-02-23 22:58:00.897 CST [9600] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-02-23 22:58:00.912 CST [9600] LOG: redirecting log output to logging collector process 2020-02-23 22:58:00.912 CST [9600] HINT: Future log output will appear in directory "log". done server started [postgres@redis02 ~]$ psql -c 'SELECT client_addr,application_name,sync_state FROM pg_stat_replication;' client_addr | application_name | sync_state ----------------+------------------+------------ 192.168.20.201 | walreceiver | sync (1 row) down机的节点加入pgpool集群 [postgres@redis01 pgpool]$ pcp_attach_node -d -U postgres -h 192.168.20.205 -p 9898 -n 0 Password: DEBUG: recv: tos="m", len=8 DEBUG: recv: tos="r", len=21 DEBUG: send: tos="C", len=6 DEBUG: recv: tos="c", len=20 pcp_attach_node -- Command Successful DEBUG: send: tos="X", len=4 [postgres@redis01 pgpool]$ psql -h 192.168.20.205 -p 9999 -c 'show pool_nodes' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | redis01 | 5432 | up | 0.500000 | standby | 0 | true | 0 | | | 2020-02-23 23:00:11 1 | redis02 | 5432 | up | 0.500000 | primary | 0 | false | 0 | | | 2020-02-23 22:45:33 (2 rows) 到此灾备演练完毕。