  • PostgreSQL high availability with repmgr, part 9: auto failover with 1 primary + 2 standbys

    os: Ubuntu 16.04
    postgresql: 9.6.8
    repmgr: 4.1.1

    192.168.56.101 node1
    192.168.56.102 node2
    192.168.56.103 node3

    Set up 1 primary + 2 standbys

    The detailed setup steps are omitted here; see the earlier posts in this series.

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
    ----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
     1  | node1 | primary | * running |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
     2  | node2 | standby |   running | node1    | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
     3  | node3 | standby |   running | node1    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2
     
    

    Manually stop the primary on node1 to simulate a failure

    On node1:

    $ sudo pg_ctlcluster 9.6 main stop
    

    Check on node2:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
    ----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
     1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
     2  | node2 | primary | * running |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
     3  | node3 | standby |   running | node2    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - when attempting to connect to node "node1" (ID: 1), following error encountered :
    "could not connect to server: Connection refused
    	Is the server running on host "192.168.56.101" and accepting
    	TCP/IP connections on port 5432?"
    	
    

    The output shows that PostgreSQL on node2 has been promoted to the new primary,
    and that the upstream of PostgreSQL on node3 has switched from node1 to node2.
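
To check which node is currently the running primary without reading the table by eye, the `cluster show` output can be filtered with a small helper. This is only a sketch: the function name `current_primary` is made up for illustration, and it assumes the table layout shown above (pipe-separated columns, with the running primary marked `* running` in the Status column).

```shell
#!/bin/sh
# Hypothetical helper: print the name of the node reported as the running primary.
# Reads `repmgr cluster show` table output on stdin.
current_primary() {
    awk -F'|' '
        # Header and data rows have 7 pipe-separated fields;
        # the separator row and the WARNING lines have no pipes and are skipped.
        NF >= 7 {
            # Trim surrounding whitespace from every field.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            # Field 2 = Name, field 3 = Role, field 4 = Status.
            if ($3 == "primary" && $4 == "* running") print $2
        }
    '
}
```

It could be used as `repmgr -f /etc/repmgr.conf cluster show | current_primary`. A failed former primary (listed as `primary` with status `- failed`) is ignored, because only rows with status `* running` match.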

    Check on node3:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
    ----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
     1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
     2  | node2 | primary | * running |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
     3  | node3 | standby |   running | node2    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - when attempting to connect to node "node1" (ID: 1), following error encountered :
    "could not connect to server: Connection refused
    	Is the server running on host "192.168.56.101" and accepting
    	TCP/IP connections on port 5432?"
    	
    

    Power off the node2 virtual machine

    PostgreSQL on node2 is now the new primary. To continue the HA test, power off the node2 virtual machine.

    Check on node3:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
    ----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
     1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
     2  | node2 | primary | - failed  |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
     3  | node3 | primary | * running |          | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2
    
    WARNING: following issues were detected
      - when attempting to connect to node "node1" (ID: 1), following error encountered :
    "could not connect to server: Connection refused
    	Is the server running on host "192.168.56.101" and accepting
    	TCP/IP connections on port 5432?"
      - when attempting to connect to node "node2" (ID: 2), following error encountered :
    "timeout expired"
    
    $ tail -f /var/log/postgresql/repmgrd.log
    [2018-09-26 10:54:59] [INFO] node "node3" (node ID: 3) monitoring upstream node "node2" (node ID: 2) in normal state
    [2018-09-26 10:54:59] [DETAIL] last monitoring statistics update was 5 seconds ago
    [2018-09-26 10:55:11] [WARNING] unable to connect to upstream node "node2" (node ID: 2)
    [2018-09-26 10:55:11] [INFO] checking state of node 2, 1 of 10 attempts
    [2018-09-26 10:55:13] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:18] [INFO] checking state of node 2, 2 of 10 attempts
    [2018-09-26 10:55:20] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:25] [INFO] checking state of node 2, 3 of 10 attempts
    [2018-09-26 10:55:27] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:32] [INFO] checking state of node 2, 4 of 10 attempts
    [2018-09-26 10:55:34] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:39] [INFO] checking state of node 2, 5 of 10 attempts
    [2018-09-26 10:55:41] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:46] [INFO] checking state of node 2, 6 of 10 attempts
    [2018-09-26 10:55:48] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:55:53] [INFO] checking state of node 2, 7 of 10 attempts
    [2018-09-26 10:55:55] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:56:00] [INFO] checking state of node 2, 8 of 10 attempts
    [2018-09-26 10:56:02] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:56:07] [INFO] checking state of node 2, 9 of 10 attempts
    [2018-09-26 10:56:09] [INFO] sleeping 5 seconds until next reconnection attempt
    [2018-09-26 10:56:14] [INFO] checking state of node 2, 10 of 10 attempts
    
    [2018-09-26 10:56:16] [WARNING] unable to reconnect to node 2 after 10 attempts
    [2018-09-26 10:56:16] [NOTICE] this node is the only available candidate and will now promote itself
    [2018-09-26 10:56:16] [INFO] promote_command is:
      "/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
    [2018-09-26 10:56:16] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgrd.log"
    
    [2018-09-26 10:56:18] [NOTICE] promoting standby to primary
    [2018-09-26 10:56:18] [DETAIL] promoting server "node3" (ID: 3) using "sudo pg_ctlcluster 9.6 main promote"
    [2018-09-26 10:56:18] [DETAIL] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
    [2018-09-26 10:56:19] [NOTICE] STANDBY PROMOTE successful
    [2018-09-26 10:56:19] [DETAIL] server "node3" (ID: 3) was successfully promoted to primary
    [2018-09-26 10:56:19] [INFO] switching to primary monitoring mode
    [2018-09-26 10:56:19] [NOTICE] monitoring cluster primary "node3" (node ID: 3)
    [2018-09-26 10:56:29] [INFO] monitoring primary node "node3" (node ID: 3) in normal state
    [2018-09-26 10:56:39] [INFO] monitoring primary node "node3" (node ID: 3) in normal state
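
The log above also shows how long repmgrd waits before promoting: 10 reconnection attempts, a 5-second sleep between attempts, and roughly 2 seconds per attempt (matching `connect_timeout=2` in the connection string). The arithmetic can be sketched as below. The function name is hypothetical, and the assumption that the attempt count and the sleep interval come from the `reconnect_attempts` and `reconnect_interval` settings in repmgr.conf is mine; the post does not show that file.

```shell
#!/bin/sh
# Hypothetical helper: rough time (seconds) repmgrd spends retrying an
# unreachable upstream before it gives up and starts the failover.
#   $1 = number of reconnection attempts   (10 in the log above)
#   $2 = sleep between attempts, seconds   (5 in the log above)
#   $3 = time each attempt blocks, seconds (2, from connect_timeout=2)
failover_detection_seconds() {
    attempts=$1
    interval=$2
    connect_timeout=$3
    # Every attempt can block for connect_timeout; a sleep follows
    # every attempt except the last one.
    echo $(( attempts * connect_timeout + (attempts - 1) * interval ))
}
```

With the values from this test, `failover_detection_seconds 10 5 2` gives 65, which matches the gap in the log between the first warning at 10:55:11 and "unable to reconnect to node 2 after 10 attempts" at 10:56:16.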
    

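    Auto failover with 1 primary + 2 standbys behaves essentially the same as with 1 primary + 1 standby. The extra standby simply adds one more layer of HA: in this test the cluster stayed available through two successive primary failures.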

  • Original post: https://www.cnblogs.com/ctypyb2002/p/9792864.html