What happens if I kill the bgwriter process directly?
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process
postgres  3104  3101  0 11:09 ?        00:00:00 postgres: writer process
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process
root      3109  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]# kill 3104
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process
postgres  3110  3101  0 11:10 ?        00:00:00 postgres: writer process
root      3112  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]# kill 3110
[root@localhost postgresql-9.2.0]# ps -ef|grep post
root      2928  2897  0 10:34 pts/1    00:00:00 su - postgres
postgres  2929  2928  0 10:34 pts/1    00:00:00 -bash
postgres  3101  2929  0 11:09 pts/1    00:00:00 ./postgres -D /usr/local/pgsql/data
postgres  3103  3101  0 11:09 ?        00:00:00 postgres: checkpointer process
postgres  3105  3101  0 11:09 ?        00:00:00 postgres: wal writer process
postgres  3106  3101  0 11:09 ?        00:00:00 postgres: autovacuum launcher process
postgres  3107  3101  0 11:09 ?        00:00:00 postgres: stats collector process
postgres  3114  3101  0 11:10 ?        00:00:00 postgres: writer process
root      3116  2977  0 11:10 pts/2    00:00:00 grep post
[root@localhost postgresql-9.2.0]#
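I killed the bgwriter process several times, and each time it was started again.
So what is the reason?
It comes from the monitoring done in postmaster.c. Let's look at the code. For simplicity, treat postmaster and postgres as the same thing.
After postmaster has started its child processes, it keeps watching over them: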
/*
 * Reaper -- signal handler to cleanup after a child process dies.
 */
static void
reaper(SIGNAL_ARGS)
{
    int         save_errno = errno;
    int         pid;            /* process id of dead child process */
    int         exitstatus;     /* its exit status */

    /* These macros hide platform variations in getting child status */
#ifdef HAVE_WAITPID
    int         status;         /* child exit status */

#define LOOPTEST()      ((pid = waitpid(-1, &status, WNOHANG)) > 0)
#define LOOPHEADER()    (exitstatus = status)
#else                           /* !HAVE_WAITPID */
#ifndef WIN32
    union wait  status;         /* child exit status */

#define LOOPTEST()      ((pid = wait3(&status, WNOHANG, NULL)) > 0)
#define LOOPHEADER()    (exitstatus = status.w_status)
#else                           /* WIN32 */
#define LOOPTEST()      ((pid = win32_waitpid(&exitstatus)) > 0)
#define LOOPHEADER()
#endif   /* WIN32 */
#endif   /* HAVE_WAITPID */

    PG_SETMASK(&BlockSig);

    ereport(DEBUG4,
            (errmsg_internal("reaping dead processes")));

    while (LOOPTEST())
    {
        LOOPHEADER();
        ……
        /*
         * Was it the bgwriter? Normal exit can be ignored; we'll start a new
         * one at the next iteration of the postmaster's main loop, if
         * necessary. Any other exit condition is treated as a crash.
         */
        if (pid == BgWriterPID)
        {
            BgWriterPID = 0;
            if (!EXIT_STATUS_0(exitstatus))
                HandleChildCrash(pid, exitstatus,
                                 _("background writer process"));
            continue;
        }
        ……
    }
    ……
}
[Author: 技术者高健 @ 博客园, mail: luckyjackgao@gmail.com ]
Since I am on the Linux platform, HAVE_WAITPID is defined:
[root@localhost postgresql-9.2.0]# find ./ -name "*.h"|xargs grep "HAVE_WAITPID"
./src/include/pg_config.h:#define HAVE_WAITPID 1
[root@localhost postgresql-9.2.0]#
So the loop can be read as:
while (((pid = waitpid(-1, &status, WNOHANG)) > 0))
{
    exitstatus = status;
    ……
    /*
     * Was it the bgwriter? Normal exit can be ignored; we'll start a new
     * one at the next iteration of the postmaster's main loop, if
     * necessary. Any other exit condition is treated as a crash.
     */
    if (pid == BgWriterPID)
    {
        BgWriterPID = 0;
        if (!(exitstatus==0))
            HandleChildCrash(pid, exitstatus,
                             _("background writer process"));
        continue;
    }
    ……
}
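waitpid is used to detect that a child process has ended.
Its arguments:
pid=-1 means wait for any child process, the same as wait().
WNOHANG means: if no child matching pid has ended yet, waitpid() returns 0 immediately instead of waiting. If one has ended, it returns that child's process ID.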
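HandleChildCrash, however, is only called when the exit status is non-zero, that is, when the exit is treated as a crash. A plain kill sends SIGTERM, and as the comment in the code says, a normal exit of the bgwriter can be ignored: reaper just resets BgWriterPID to 0, and the postmaster starts a new bgwriter at the next iteration of its main loop. This appears to be the path taken here, since in the ps output above only the writer process gets a new PID while the other child processes keep theirs.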
End