  • A look at pgpool-II's health check

    In pgpool-II, two configuration file parameters relate to the health check:

    health_check_period
    health_check_timeout

    First, here is how the official documentation explains them:

    http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html

    health_check_period
    This parameter specifies the interval between the health checks in seconds. 
    
    Default is 0, which means health check is disabled. You need to reload pgpool.conf if you change health_check_period.
    health_check_timeout
    pgpool-II periodically tries to connect to the backends to detect any error on the servers or networks. This error check procedure is called "health check".

    If an error is detected, pgpool-II tries to perform failover or degeneration.

    This parameter serves to prevent the health check from waiting for a long time in a case such as an unplugged network cable. The timeout value is in seconds. Default value is 20.

    0 disables timeout (waits until TCP/IP timeout).

    This health check requires one extra connection to each backend,
    so max_connections in the postgresql.conf needs to be incremented as needed. You need to reload pgpool.conf if you change this value.

    What actually happens? Taking pgpool-II 3.1 as an example (some unimportant code has been removed for readability):

    /*                                    
    * pgpool main program                                    
    */                                    
    int main(int argc, char **argv)                                    
    {                                    
        ……                                
        /*                                
         * This is the main loop                                
         */                                
        for (;;)                                
        {                                
            CHECK_REQUEST;             
            /* do we need health checking for PostgreSQL? */   
            if (pool_config->health_check_period > 0)                            
            {                          
                ……                     
                if (pool_config->health_check_timeout > 0)                        
                {                        
                    /*                    
                     * set health checker timeout. we want to detect  
                     * communication path failure much earlier before 
                     * TCP/IP stack detects it.                    
                     */                    
                    pool_signal(SIGALRM, health_check_timer_handler); 
                    alarm(pool_config->health_check_timeout);                    
                }                        
                                        
                /*                        
                 * do actual health check. trying to connect to the backend   
                 */                        
                errno = 0;                        
                health_check_timer_expired = 0;                        
                POOL_SETMASK(&UnBlockSig);                        
                sts = health_check();                        
                POOL_SETMASK(&BlockSig);                        
                                        
                if (pool_config->parallel_mode || pool_config->enable_query_cache) 
                    sys_sts = system_db_health_check();                    
                                        
                if ((sts > 0 || sys_sts < 0) &&
                    (errno != EINTR || (errno == EINTR && health_check_timer_expired)))
                {
                    if (sts > 0)
                    {
                        sts--;
                        if (!pool_config->parallel_mode)
                        {
                            if (POOL_DISALLOW_TO_FAILOVER(BACKEND_INFO(sts).flag))
                            {
                                pool_log("health_check: %d failover is canceld because failover is disallowed", sts);
                            }
                            else
                            {
                                pool_log("set %d th backend down status", sts);
                                Req_info->kind = NODE_DOWN_REQUEST;
                                Req_info->node_id[0] = sts;
                                failover();
                                /* need to distribute this info to children */
                            }
                        }
                        else
                        {
                            retrycnt++;
                            pool_signal(SIGALRM, SIG_IGN);    /* Cancel timer */
                            if (retrycnt > NUM_BACKENDS)
                            {
                                /* retry count over */
                                pool_log("set %d th backend down status", sts);
                                Req_info->kind = NODE_DOWN_REQUEST;
                                Req_info->node_id[0] = sts;
                                failover();
                                retrycnt = 0;
                            }
                            else
                            {
                                /* continue to retry */
                                sleep_time = pool_config->health_check_period/NUM_BACKENDS;
                                pool_debug("retry sleep time: %d seconds", sleep_time);
                                pool_sleep(sleep_time);
                                continue;
                            }
                        }
                    }
                    ……
                }
                if (pool_config->health_check_timeout > 0)
                {
                    /* seems ok. cancel health check timer */
                    pool_signal(SIGALRM, SIG_IGN);
                }
                sleep_time = pool_config->health_check_period;
                pool_sleep(sleep_time);
            }
            else
            {
                for (;;)
                {
                    int r;
                    struct timeval t = {3, 0};
                    POOL_SETMASK(&UnBlockSig);
                    r = pool_pause(&t);
                    POOL_SETMASK(&BlockSig);
                    if (r > 0)
                        break;
                }
            }
        }
        pool_shmem_exit(0);
    }
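
The retry branch in the code above does two pieces of arithmetic that are easy to miss: the retry pause is health_check_period / NUM_BACKENDS using C integer division, and failover() is only reached once retrycnt exceeds NUM_BACKENDS. The following is a minimal sketch of just that arithmetic, not pgpool-II code. The function names retry_sleep_time and failures_until_failover are hypothetical, and it assumes retrycnt starts at 0, which the excerpt does not show.

```c
#include <assert.h>

/*
 * Hypothetical helper: the pause used by the quoted "continue to retry"
 * branch, i.e. health_check_period / NUM_BACKENDS with integer division.
 */
int retry_sleep_time(int health_check_period, int num_backends)
{
    assert(num_backends > 0);   /* division by zero backends is undefined */
    return health_check_period / num_backends;
}

/*
 * Hypothetical helper: count consecutive failed checks until the quoted
 * branch would call failover(). Assumes retrycnt starts at 0.
 */
int failures_until_failover(int num_backends)
{
    int retrycnt = 0;
    int failures = 0;

    for (;;)
    {
        failures++;                 /* one more failed health check */
        retrycnt++;
        if (retrycnt > num_backends)
            return failures;        /* "retry count over": failover() would run here */
    }
}
```

Under these assumptions, a health_check_period smaller than the number of backends gives a retry pause of 0 seconds, so the retries run back to back.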

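    This makes things fairly clear.

    First, the role of health_check_period: a non-zero value enables the health check. In the quoted loop the same value is also passed to pool_sleep() after each pass, so it sets the pause between checks.

    Second, the role of health_check_timeout: if it is greater than 0, a timer is armed with alarm(); when the timer expires, health_check_timer_handler is triggered, interrupting the pending call to health_check().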

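    Third, and this is the most frustrating part:

    In the main loop, as long as health_check_period is non-zero, health_check() is executed over and over, once per pass.
    That is generally far more frequent than the default health_check_timeout of 20 seconds.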
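
One way to put a number on how often the loop runs health_check() is a small model of the quoted control flow: one check per pass, followed by pool_sleep(health_check_period). The sketch below is a simplified model rather than pgpool-II code. The function name count_health_checks and the check_cost parameter (seconds spent inside one successful check) are hypothetical, and the retry and failover branches are ignored.

```c
#include <assert.h>

/*
 * Hypothetical model of the quoted main loop on a simulated clock.
 * Returns how many times health_check() would start within `window`
 * seconds, assuming every check succeeds and takes `check_cost` seconds.
 */
int count_health_checks(int health_check_period, int check_cost, int window)
{
    int now = 0;        /* simulated time in seconds */
    int checks = 0;

    assert(check_cost >= 0);

    if (health_check_period <= 0)
        return 0;       /* 0 disables the health check entirely */

    while (now < window)
    {
        checks++;                       /* sts = health_check(); */
        now += check_cost;
        now += health_check_period;     /* pool_sleep(health_check_period); */
    }
    return checks;
}
```

In this model a period of 1 second starts 20 checks within a single default health_check_timeout of 20 seconds, while a period of 10 seconds starts only 2.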

    If you add the -d option when running the pgpool command, you can see this directly: pgpool-II keeps calling health_check() to check the state of each node.

    One could say that with health_check() being hammered in the main loop like this, health_check_timeout exists in name only.

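    I just do not know which version introduced this behavior. One could say the pgpool-II developers were careless and did not keep the code and the documentation in sync. Perhaps this is a common ailment of many open source projects.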

  • Original article: https://www.cnblogs.com/gaojian/p/2611935.html