Traffic Server Process Model

    1. Overview

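    Traffic Server consists of three processes that work together to serve Traffic Server requests and to manage, control, and monitor the health of the system. Figure 1 shows how the three processes relate to one another; each is described below.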

     

    Figure 1: Relationships between the processes

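    1) The traffic_server process is Traffic Server's transaction-processing engine. It accepts connections, processes protocol requests, and serves content from the local cache or from origin servers.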

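    2) The traffic_manager process is the command-and-control facility for Traffic Server: it starts, monitors, and reconfigures the traffic_server process. It is also responsible for the proxy auto-configuration (PAC) port, the statistics interface, cluster administration, and virtual IP (VIP) failover.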

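    If traffic_manager detects that traffic_server has failed, it restarts the process immediately and also maintains a queue of all incoming connections. Connections that arrive during the few seconds before traffic_server is back up are held in this queue and processed in FIFO order, so the queue absorbs incoming connections whenever the server is being restarted after a failure.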
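A minimal sketch of the restart half of this behavior, assuming a plain fork()/waitpid() supervisor. The connection queue is not modeled. fake_worker() and supervise() are hypothetical names used only for illustration; they are not Traffic Server functions.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the worker process: it "crashes" by exiting
// with a non-zero status, as a failed traffic_server would.
static void fake_worker()
{
  _exit(1);
}

// Spawn the worker, block in waitpid() until it exits, and respawn it after
// an abnormal exit, at most max_restarts times.
// Returns the number of restarts performed, or -1 on a fork/wait error.
static int supervise(int max_restarts)
{
  int restarts = 0;
  for (;;) {
    pid_t pid = fork();
    if (pid < 0)
      return -1;                 // fork failed
    if (pid == 0)
      fake_worker();             // child: never returns

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
      if (errno != EINTR)        // retry only if interrupted by a signal
        return -1;
    }
    bool clean_exit = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (clean_exit || restarts >= max_restarts)
      return restarts;
    ++restarts;                  // the worker died: start it again
  }
}
```

Blocking in waitpid() is the simplest way for a parent to learn that its child died. supervise(3) spawns the crashing worker four times and returns 3.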
     

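    3) The traffic_cop process monitors the health of both traffic_server and traffic_manager. It periodically (several times a minute) queries them with a heartbeat request that fetches a synthetic web page. If a failure occurs (no response arrives within the timeout interval, or an incorrect response is received), traffic_cop restarts traffic_manager and traffic_server. This design protects traffic_server twice, once through traffic_manager and once through traffic_cop. That protection matters because traffic_server is the process that does the actual work and must keep running.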
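traffic_cop's real heartbeat fetches a synthetic web page over HTTP. The sketch below replaces that with a hypothetical PING/OK exchange over a connected socket. Its only purpose is to show the decision rule: a heartbeat fails on a send error, on a timeout, or on a wrong reply. heartbeat() is an illustrative name and is not part of Traffic Server.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// One heartbeat: send a probe on fd, then wait at most timeout_ms for the
// reply "OK". A send error, a timeout, or any other reply counts as a
// failed heartbeat, which is the point at which a monitor would restart
// the process.
static bool heartbeat(int fd, int timeout_ms)
{
  static const char probe[] = "PING";
  // MSG_NOSIGNAL: report a dead peer as an error instead of raising SIGPIPE
  if (send(fd, probe, sizeof(probe) - 1, MSG_NOSIGNAL) != (ssize_t)(sizeof(probe) - 1))
    return false;

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int rc;
  do {
    rc = poll(&pfd, 1, timeout_ms);
  } while (rc < 0 && errno == EINTR);
  if (rc <= 0)
    return false;                // timed out (0) or poll() failed (< 0)

  char reply[8];
  ssize_t n = recv(fd, reply, sizeof(reply), 0);
  return n == 2 && std::memcmp(reply, "OK", 2) == 0;
}
```

For a quick local check, socketpair(AF_UNIX, SOCK_STREAM, 0, sv) creates two connected ends. If "OK" is written on sv[1] before heartbeat(sv[0], 100) is called, the heartbeat succeeds. If sv[1] stays silent, it times out after 100 ms.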

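    4) Traffic Server uses a multi-threaded, asynchronous event-processing model. It does not create a thread per connection. Instead, it creates a configurable number of worker threads up front, and each one runs its own asynchronous event-processing loop. traffic_server creates several groups of threads and dispatches each Event, according to its type, onto the event queue of a suitable thread. A thread handles an Event by invoking the callback of the Continuation attached to it, which moves that Continuation to its next state. The sequence of transitions from the initial state to the final state is the complete processing of the event. The thread itself never exits; it simply waits for the next event.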
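The Event/Continuation vocabulary can be made concrete with a small single-thread sketch in standard C++. Continuation, Event and EventThread are hypothetical types, not Traffic Server's API. Unlike a real event thread, this one stops when it receives a null sentinel event, so that the example can terminate.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// A continuation: some state plus a handler that consumes an event and
// advances that state.
struct Continuation {
  int state = 0;
  std::function<void(Continuation &, int)> handler;
};

struct Event {
  Continuation *cont;            // nullptr is the shutdown sentinel
  int code;
};

// One worker thread draining its own FIFO event queue. After handling an
// event it does not exit; it blocks until the next one arrives.
class EventThread {
public:
  EventThread() : worker_([this] { run(); }) {}
  ~EventThread()
  {
    schedule(Event{nullptr, 0}); // queued behind any pending events
    worker_.join();
  }

  void schedule(Event e)
  {
    {
      std::lock_guard<std::mutex> lk(mu_);
      q_.push_back(e);
    }
    cv_.notify_one();
  }

private:
  void run()
  {
    for (;;) {
      std::unique_lock<std::mutex> lk(mu_);
      cv_.wait(lk, [this] { return !q_.empty(); });
      Event e = q_.front();
      q_.pop_front();
      lk.unlock();
      if (!e.cont)
        return;                      // sentinel: stop the loop
      e.cont->handler(*e.cont, e.code); // run the continuation's callback
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Event> q_;
  std::thread worker_;           // declared last: starts after the queue exists
};
```

Suppose three events are scheduled for a continuation whose handler increments state. Once the EventThread is destroyed, state is 3, because the destructor queues the sentinel behind those events and then joins the thread.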

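    This article focuses on how Traffic Server's three processes relate to each other and how that relationship is implemented. The multi-threaded asynchronous event model is not analyzed in depth. The process model is shown in the following diagram: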

    2. Implementation Principles

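    Basic principle: traffic_manager and traffic_server each have their own lockfile, manager_lockfile and server_lockfile respectively. A running process holds a write lock on its lockfile. traffic_cop monitors traffic_manager and traffic_server through these two lockfiles, and traffic_manager monitors traffic_server through server_lockfile in the same way. Figure 2 illustrates this relationship: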
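The article's Open() detects a running process by trying to take the lock with F_SETLK and then reading the PID that the holder wrote into the file. fcntl(F_GETLK) offers another option, shown here only for comparison and not used by the listing at the end of this article: it asks the kernel directly which process holds the lock. lock_holder() and demo_lock_probe() are hypothetical helpers, and the demo assumes /tmp is writable.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Ask the kernel whether another process holds a conflicting lock on the
// whole file, without taking the lock. Returns the holder's PID, 0 if the
// file is unlocked, or -1 on error. F_GETLK never reports locks held by
// the calling process itself.
static pid_t lock_holder(const char *path)
{
  int fd = open(path, O_RDWR | O_CLOEXEC);
  if (fd < 0)
    return -1;
  struct flock fl = {};
  fl.l_type = F_WRLCK;           // "could I take a write lock?"
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;                  // 0 = up to the end of the file
  int rc = fcntl(fd, F_GETLK, &fl);
  close(fd);                     // safe: this process holds no lock on the file
  if (rc < 0)
    return -1;
  return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

// Self-check: a forked child takes the lock, and the parent probes it
// before, during, and after the child's lifetime.
static bool demo_lock_probe()
{
  char path[] = "/tmp/lockprobeXXXXXX";
  int fd = mkstemp(path);
  if (fd < 0)
    return false;
  int ready[2];
  if (pipe(ready) != 0) {
    close(fd);
    unlink(path);
    return false;
  }
  bool ok = (lock_holder(path) == 0);    // nobody holds it yet

  pid_t child = fork();
  if (child == 0) {                      // child: lock, report, then wait
    struct flock fl = {};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    char c = (fcntl(fd, F_SETLK, &fl) == 0) ? 1 : 0;
    if (write(ready[1], &c, 1) != 1)
      _exit(1);
    pause();                             // hold the lock until killed
    _exit(0);
  }
  close(ready[1]);                       // parent: EOF if the child dies early
  if (child < 0) {
    ok = false;
  } else {
    char c = 0;
    ok = ok && read(ready[0], &c, 1) == 1 && c == 1;
    ok = ok && lock_holder(path) == child;  // the child is reported as holder
    kill(child, SIGKILL);
    waitpid(child, nullptr, 0);
    ok = ok && lock_holder(path) == 0;      // the lock died with the child
  }
  close(ready[0]);
  close(fd);
  unlink(path);
  return ok;
}
```

demo_lock_probe() returns true if the probe first reports no holder, then reports the child's PID while the child holds the lock, and finally reports no holder again after the child is killed. The last step is the "the lock disappears when the process dies" property that this monitoring scheme depends on.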

    Figure 2: The processes and their lockfiles

     

    Key implementation:

     

    Key class: Lockfile

    Lockfile::Open(pid_t *holding_pid) in detail:

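    Explanation: Lockfile::Open(pid_t *holding_pid) returns one of three kinds of values, described below. When it gets the lock, it also sets the close-on-exec flag (FD_CLOEXEC) on the lockfile descriptor. With this flag set, the kernel closes the descriptor automatically when the process (typically a freshly forked child) successfully calls a function from the exec() family, so the new program does not inherit the open lockfile.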

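    (1) A return value of 1 means the write lock on the lockfile was acquired. This in turn means the process associated with the lockfile is not running, because a running process would hold the lock and the attempt to lock the file would fail;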

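    (2) A return value of 0 means some process already holds the lock. The PID of that process is read from the lockfile and returned through holding_pid (0 if the file contents cannot be parsed). The holder wrote its PID into the lockfile with Get() while it was running;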

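    (3) A negative value (-errno) indicates an error. There are three cases: opening fname fails, reading the close-on-exec flag fails, or setting the close-on-exec flag fails. A read error while fetching the holder's PID in case (2) also returns -errno.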
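The close-on-exec step can be isolated as follows. set_cloexec() and has_cloexec() are hypothetical helper names, and the read-modify-write they perform mirrors what Open() does with F_GETFD and F_SETFD. Passing O_CLOEXEC to open() sets the flag atomically instead, which closes the window between open() and fcntl() in a multithreaded program that forks.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Set FD_CLOEXEC on fd while preserving any other descriptor flags.
// Returns 0 on success or -errno on failure, like Open()'s error convention.
static int set_cloexec(int fd)
{
  int flags;
  do {
    flags = fcntl(fd, F_GETFD, 0);
  } while (flags < 0 && errno == EINTR);
  if (flags < 0)
    return -errno;

  int rc;
  do {
    rc = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
  } while (rc < 0 && errno == EINTR);
  return rc < 0 ? -errno : 0;
}

// True if fd is valid and will be closed automatically on a successful exec.
static bool has_cloexec(int fd)
{
  int flags = fcntl(fd, F_GETFD, 0);
  return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

A descriptor returned by a plain pipe() call does not have the flag. After set_cloexec() it does, and set_cloexec(-1) returns -EBADF.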

    The key process-killing functions are summarized below:

    // kill
    // Sends signal sig to the process pid (with SIGKILL, this kills it)
    // return: 0 on success, -1 on error
    1. int kill(pid_t pid, int sig);


    // ink_killall
    // Sends sig to every process whose program name is pname
    // return: 0 on success, -1 on error
    2. int ink_killall(const char *pname, int sig);

    ink_killall calls:
      3. ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
      4. ink_killall_kill_pidv(pidv, pidvcnt, sig);

    // ink_killall_get_pidv_xmalloc
    // Collects the PIDs of all processes whose program name is pname into the array pidv,
    // and stores the number of PIDs in pidvcnt
    // return: -1 on error (pidv set to NULL, pidvcnt set to 0);
    //          0 on success (pidv: heap-allocated PID array; pidvcnt: number of PIDs;
    //          if pidvcnt is 0, pidv is set to NULL)
    3. int ink_killall_get_pidv_xmalloc(const char *pname, pid_t **pidv, int *pidvcnt);


     
    // ink_killall_kill_pidv
    // Calls kill(pidv[i], sig) for each PID recorded in pidv
    // return: 0 on success, -1 on error
    4. int ink_killall_kill_pidv(pid_t *pidv, int pidvcnt, int sig);
    ink_killall_kill_pidv calls:
      1. kill(pid_t pid, int sig);


     

    // safe_kill
    // Safely kills all processes whose program name is pname. lockfile_name is the lockfile
    // associated with that process; group selects whether the whole process group of the
    // lock holder (i.e. the processes it created) is killed as well
    // return: void
    5. static void safe_kill(const char *lockfile_name, const char *pname, bool group);
    safe_kill calls:
      6. Lockfile::Kill(killsig, coresig, pname);
      7. Lockfile::KillGroup(killsig, coresig, pname);

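    // Lockfile::Kill
    // Probes the lockfile; if a process holds it, kills that process and all processes
    // whose program name is pname. sig is normally the kill signal; initial_sig (default 0,
    // i.e. none) is sent to the lock holder first, for example to make it dump core
    // return: void
    6. void Lockfile::Kill(int sig, int initial_sig, const char *pname);
    Lockfile::Kill calls:
      8. lockfile_kill_internal(pid, initial_sig, pid, pname, sig);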



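    // Lockfile::KillGroup
    // Like Kill, but signals the entire process group of the lock holder, so the processes
    // it created (and all of their threads) die as well; sig is the kill signal
    // and initial_sig is as in Kill
    // return: void
    7. void Lockfile::KillGroup(int sig, int initial_sig, const char *pname);
    Lockfile::KillGroup calls:
      8. lockfile_kill_internal(holding_pid, initial_sig, pid, pname, sig);
         (pid is -pgid of the holder, or holding_pid if its group cannot be used)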

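    // lockfile_kill_internal
    // If init_sig > 0, first sends init_sig to init_pid; then repeatedly sends sig to pid
    // (and, on Linux, to every process that was named pname when the call started)
    // until kill(pid, sig) fails, i.e. pid no longer exists
    // return: void
    8. static void lockfile_kill_internal(pid_t init_pid, int init_sig, pid_t pid, const char *pname, int sig);
    lockfile_kill_internal calls:
      3. ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
      1. kill(init_pid, init_sig);
      1. kill(pid, sig);
      4. ink_killall_kill_pidv(pidv, pidvcnt, sig);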
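The name matching in ink_killall_get_pidv_xmalloc() relies on the comm field of /proc/<pid>/stat, a line such as "1234 (traffic_server) S 1 ...". The following is a small parser for that field. It is a sketch that assumes the Linux proc(5) line format, and stat_comm() is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Extract the command name (comm) from one line of /proc/<pid>/stat.
// comm is enclosed in parentheses; searching for the LAST ')' keeps a
// name that itself contains ')' intact. Returns "" for a malformed line.
static std::string stat_comm(const std::string &line)
{
  std::string::size_type lparen = line.find('(');
  std::string::size_type rparen = line.rfind(')');
  if (lparen == std::string::npos || rparen == std::string::npos || rparen < lparen)
    return std::string();
  return line.substr(lparen + 1, rparen - lparen - 1);
}
```

Two caveats apply when processes are matched by this name. First, the kernel truncates comm to 15 characters: traffic_manager is exactly 15 characters long and still fits, but a longer binary name would not match in full. Second, a name that contains ')' gets cut short if parsing stops at the first ')', as the strchr() call in the listing does.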

    See the source code for the implementation details.
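KillGroup() relies on the convention that kill() with a negative pid signals a whole process group. The following sketch demonstrates that convention; kill_group() and demo_group_kill() are hypothetical helpers, not part of the listing.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Send sig to every member of process group pgid via kill(-pgid, sig).
// Refuse values that would hit far more than intended: kill(0, ...) means
// "my own group", kill(-1, ...) means "every process I may signal", and
// our own group would include the caller.
static int kill_group(pid_t pgid, int sig)
{
  if (pgid <= 1 || pgid == getpgrp())
    return -1;
  return kill(-pgid, sig);
}

// Self-check: start a child in a new process group, let it fork a
// grandchild (which inherits the group), then kill the whole group.
static bool demo_group_kill()
{
  pid_t leader = fork();
  if (leader < 0)
    return false;
  if (leader == 0) {
    setpgid(0, 0);               // child: leader of a new process group
    if (fork() == 0) {           // grandchild: same group as its parent
      pause();
      _exit(0);
    }
    pause();
    _exit(0);
  }
  setpgid(leader, leader);       // set it from the parent too, to avoid a race
  int rc = kill_group(leader, SIGKILL);
  if (rc != 0)
    kill(leader, SIGKILL);       // fall back so the child does not linger
  int status = 0;
  if (waitpid(leader, &status, 0) != leader)
    return false;
  return rc == 0 && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

demo_group_kill() returns true when the group leader is reported as killed by SIGKILL. The guard in kill_group() plays the same role as the check in KillGroup() that avoids signalling the caller's own group.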

    2. How traffic_cop monitors traffic_manager and traffic_server

    After traffic_cop starts, its main function calls a check function, which periodically calls check_programs() to monitor traffic_manager and traffic_server. check_programs() is somewhat involved; its flowchart is shown below.
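The core decision in one pass of check_programs() can be condensed into a small function. This is a simplified model, not the listing itself. LockState, Action and decide() are hypothetical names, and holder_alive stands for the kill(holding_pid, 0) test. The model assumes that a dead holder leaves the lock free once the stale manager processes have been killed and Open() is retried.

```cpp
#include <cassert>

// Result of probing manager_lockfile with Lockfile::Open().
enum class LockState { Free, Held, Error };

// What one pass of the monitoring loop does next.
enum class Action { None, KillServerThenSpawnManager, HeartbeatManager };

static Action decide(LockState manager_lock, bool holder_alive)
{
  // Held by a PID that no longer exists: after killing any stale
  // managers and re-probing, the lock is assumed to be free.
  if (manager_lock == LockState::Held && !holder_alive)
    manager_lock = LockState::Free;

  switch (manager_lock) {
  case LockState::Free:          // no manager: clear stray servers, start one
    return Action::KillServerThenSpawnManager;
  case LockState::Held:          // manager presumed alive: check it is not wedged
    return Action::HeartbeatManager;
  case LockState::Error:         // the listing sends Open() errors down this branch too
    return Action::HeartbeatManager;
  }
  return Action::None;
}
```

decide(LockState::Free, false) yields KillServerThenSpawnManager. That is the path taken in the first log excerpt of the simulation test, where the dead manager's lock had already been released.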

    3. Simulation Test

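    Based on these principles, I imitated the three processes traffic_cop, traffic_manager and traffic_server, and implemented traffic_cop as a daemon. traffic_manager monitors traffic_server in the same way that traffic_cop monitors traffic_manager and traffic_server, so that part is not described again. The health-check functions heartbeat_manager(), server_up() and heartbeat_server() involve port communication. Because that communication does not affect the mechanism being simulated, I left their bodies out and made them simply return a success value (1). (The programs need the manager_lockfile and server_lockfile files at run time; create these two files in the directory that contains the executables.)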

    After starting the programs, running the command ps -axj | grep binary produces the following output:

    The first four columns are PPID / PID / PGID / SID (parent process ID, process ID, process group ID, session ID).
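The same four identifiers can be read for the current process with standard POSIX calls. ProcIds, current_ids() and demo_setsid() are hypothetical helpers. The demo relies on one certain fact: a process that has just called setsid() leads both a new session and a new process group, so its PID, PGID and SID are equal.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// The four identifiers that `ps -axj` prints first, for one process.
struct ProcIds {
  pid_t ppid;   // parent process ID  (PPID)
  pid_t pid;    // process ID         (PID)
  pid_t pgid;   // process group ID   (PGID)
  pid_t sid;    // session ID         (SID)
};

static ProcIds current_ids()
{
  ProcIds ids;
  ids.ppid = getppid();
  ids.pid = getpid();
  ids.pgid = getpgid(0);         // 0 means "the calling process"
  ids.sid = getsid(0);
  return ids;
}

// Self-check: a forked child (never a group leader) calls setsid() and
// reports through its exit status whether PID == PGID == SID.
static bool demo_setsid()
{
  pid_t child = fork();
  if (child < 0)
    return false;
  if (child == 0) {
    if (setsid() < 0)
      _exit(2);
    ProcIds ids = current_ids();
    _exit(ids.pid == ids.pgid && ids.pgid == ids.sid ? 0 : 1);
  }
  int status = 0;
  if (waitpid(child, &status, 0) != child)
    return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```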

     

    The figure shows the expected relationships among the processes.

    When traffic_manager exits abnormally, traffic_cop restarts it, as the log file shows (excerpt):

    ==============traffic_server is running, pid:'5443'!

    ----------------traffic_manager is running, pid:'5436'!

    ==============traffic_server is running, pid:'5443'!

    ---------------traffic_manager has a expcetion and eixt!

    Entering check_programs()

    traffic_manager not running, making sure traffic_server is dead

    Entering safe_kill

    Leaving safe_kill

    Entering spwan_manager()!

    Leaving spwan_manager()!

    Leaving check_programs

    ----------------traffic_manager is running, pid:'5463'!

    Entering spwan_server()!

    Leaving spwan_server()!

    ==============traffic_server is running, pid:'5467'!

    The log shows that at one point traffic_manager had PID 5436 and traffic_server had PID 5443. Next, traffic_manager failed ("traffic_manager has a expcetion and eixt!"). In its periodic check_programs() pass, traffic_cop found that traffic_manager was not running ("traffic_manager not running"). It killed traffic_server ("making sure traffic_server is dead") and spawned a new traffic_manager ("Entering spwan_manager()!"), whose PID is now 5463. Once running, traffic_manager found that traffic_server was not running and called spwan_server() to start a new one, with PID 5467. This shows that traffic_cop's monitoring works.

    When traffic_server exits abnormally, traffic_manager detects the exit and restarts traffic_server, as the log file also shows (excerpt):

    ==============traffic_server is running, pid:'7703'!

    ----------------traffic_manager is running, pid:'7699'!

    =================traffic_server has a expcetion and exit!

    Entering safe_kill

    Leaving safe_kill

    --------------Entering spwan_server()!

    --------------Leaving spwan_server()!

     

    ----------------traffic_manager is running, pid:'7699'!

    ==============traffic_server is running, pid:'7712'!

    The log shows that at one point traffic_manager had PID 7699 and traffic_server had PID 7703. traffic_server then exited abnormally, and traffic_manager called spwan_server() to start a new traffic_server with PID 7712. traffic_manager's PID is still 7699, so the manager process itself was not replaced. This shows that traffic_manager does its job of monitoring traffic_server.

    4. Summary

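    Why use three processes instead of two, with traffic_manager alone supervising traffic_server? The role traffic_manager plays in the system means that two processes are not enough. In particular, when traffic_manager detects that traffic_server has failed, it queues incoming requests for a while, so it must itself listen on ports during that time. The system therefore cannot guarantee that traffic_manager will never fail; it can crash too. For this reason the design adds the traffic_cop daemon, whose only role is to monitor the other two processes. In principle, such a daemon does not terminate abnormally, which makes this three-tier design safer and more reliable than a two-tier one. When the three processes work together, server failures are transparent to clients. (That is the design intent rather than an absolute guarantee: if traffic_manager and traffic_server fail at the same time, client requests cannot be accepted during the few seconds that traffic_cop needs to restart them, although this is unlikely.) Clients do not notice any problem with their requests, though they may see somewhat higher latency. The architecture shows that a server must be as stable and safe as possible, and its design must carefully account for failure cases.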

    Source code:

    1.lock_and_kill.h

    #ifndef LOCK_AND_KILL_H
    #define LOCK_AND_KILL_H
    #include <sys/types.h>
    #include <string.h>
    #define PATH_NAME_MAX 4096

    /*-------------------------------------------------------------------------
       ink_killall
       - Sends signal 'sig' to all processes with the name 'pname'
       - Returns: -1 error
                   0 okay
      -------------------------------------------------------------------------*/
    int ink_killall(const char *pname, int sig);

    /*-------------------------------------------------------------------------
       ink_killall_get_pidv_xmalloc
       - Gets the pids of all processes named 'pname' and stores them in a
         malloc'd pid_t array, 'pidv'
       - Returns: -1 error (pidv: set to NULL; pidvcnt: set to 0)
                   0 okay (pidv: malloc'd pid vector; pidvcnt: number of pids;
                       if pidvcnt is set to 0, then pidv will be set to NULL)
      -------------------------------------------------------------------------*/
    int ink_killall_get_pidv_xmalloc(const char *pname, pid_t **pidv, int *pidvcnt);

    /*-------------------------------------------------------------------------
       ink_killall_kill_pidv
       - Sends signal 'sig' to every pid in 'pidv'
       - Returns: -1 error
                   0 okay
      -------------------------------------------------------------------------*/
    int ink_killall_kill_pidv(pid_t *pidv, int pidvcnt, int sig);

    class Lockfile
    {
    public:
      // fd starts at -1 ("not open"): with 0, Close() would close stdin.
      Lockfile(void) : fd(-1)
      {
        fname[0] = '\0';
      }

      Lockfile(const char *filename) : fd(-1)
      {
        SetLockfileName(filename);
      }

      ~Lockfile(void)
      {
      }

      void SetLockfileName(const char *filename)
      {
        // Bounded copy: strcpy() could overflow fname.
        strncpy(fname, filename, sizeof(fname) - 1);
        fname[sizeof(fname) - 1] = '\0';
      }

      const char *GetLockfileName(void)
      {
        return fname;
      }

      // Open() -- the key function
      //
      // Tries to open and write-lock the lock file, returning:
      //   -errno on error
      //   0 if someone is holding the lock (with holding_pid set)
      //   1 if we now have a writable lock file
      int Open(pid_t *holding_pid);

      // Get()
      //
      // Gets write access to a lock file, and if successful, truncates
      // file, and writes the current process ID.  Returns:
      //   -errno on error
      //   0 if someone is holding the lock (with holding_pid set)
      //   1 if we now have a writable lock file
      int Get(pid_t *holding_pid);

      // Close()
      //
      // Closes the file handle on the opened Lockfile, releasing the lock.
      void Close(void);

      // Kill()
      // KillGroup()
      //
      // Ensures no one is holding the lock. It tries to open the lock file
      // and if that does not succeed, it kills the process holding the lock.
      // If the lock file open succeeds, it closes the lock file releasing
      // the lock.
      //
      // The initial signal can be used to generate a core from the process while
      // still ensuring it dies.
      void Kill(int sig, int initial_sig = 0, const char *pname = NULL);
      void KillGroup(int sig, int initial_sig = 0, const char *pname = NULL);

    private:
      char fname[PATH_NAME_MAX];
      int fd;
    };

    #endif

    2.lock_and_kill.cpp

    #include <stdio.h>
    #include <stdlib.h>
    #include <dirent.h>
    #include <unistd.h>
    #include <fcntl.h>      // open(), fcntl(), struct flock
    #include <sys/file.h>
    #include <errno.h>
    #include <signal.h>

    #include "lock_and_kill.h"

    #define PROC_BASE "/proc"
    #define INITIAL_PIDVSIZE 32
    #define LOCKFILE_BUF_LEN 16
    // Buffer for one line of /proc/<pid>/stat. Named STAT_LINE_MAX so that it
    // does not clash with the LINE_MAX that <limits.h> may already define.
    #define STAT_LINE_MAX 1024

    int
    ink_killall(const char *pname, int sig)
    {
      int err;
      pid_t *pidv;
      int pidvcnt;

      if (ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt) < 0) {
        return -1;
      }

      if (pidvcnt == 0) {
        free(pidv);             // pidv is NULL here; free(NULL) is a no-op
        return 0;
      }

      err = ink_killall_kill_pidv(pidv, pidvcnt, sig);
      free(pidv);
      return err;
    }

    int
    ink_killall_get_pidv_xmalloc(const char *pname, pid_t **pidv, int *pidvcnt)
    {
      DIR *dir;
      FILE *fp;
      struct dirent *de;
      pid_t pid, self;
      char buf[STAT_LINE_MAX], *p, *comm;
      int pidvsize = INITIAL_PIDVSIZE;

      if (!pidv || !pidvcnt)
        return -1;              // nowhere to report a result
      *pidv = NULL;
      *pidvcnt = 0;
      if (!pname)
        return -1;

      self = getpid();
      if (!(dir = opendir(PROC_BASE)))
        return -1;

      if (!(*pidv = (pid_t *)malloc(pidvsize * sizeof(pid_t)))) {
        closedir(dir);
        return -1;
      }

      while ((de = readdir(dir))) {
        // Only numeric entries under /proc are processes; skip ourselves.
        if (!(pid = (pid_t)atoi(de->d_name)) || pid == self)
          continue;
        snprintf(buf, sizeof(buf), PROC_BASE "/%d/stat", (int)pid);
        if (!(fp = fopen(buf, "r")))
          continue;             // the process may have exited meanwhile
        // The stat line looks like "1234 (traffic_server) S ..."; the
        // program name (comm) sits between the parentheses.
        if (fgets(buf, sizeof(buf), fp) && (p = strchr(buf, '('))) {
          comm = p + 1;
          if ((p = strchr(comm, ')'))) {
            *p = '\0';
            if (strcmp(comm, pname) == 0) {
              if (*pidvcnt >= pidvsize) {
                pidvsize *= 2;
                pid_t *grown = (pid_t *)realloc(*pidv, pidvsize * sizeof(pid_t));
                if (!grown) {   // clean up everything before failing
                  fclose(fp);
                  closedir(dir);
                  free(*pidv);
                  *pidv = NULL;
                  *pidvcnt = 0;
                  return -1;
                }
                *pidv = grown;
              }
              (*pidv)[(*pidvcnt)++] = pid;
            }
          }
        }
        fclose(fp);
      }
      closedir(dir);

      if (*pidvcnt == 0) {
        free(*pidv);
        *pidv = NULL;
      }
      return 0;
    }

    int
    ink_killall_kill_pidv(pid_t *pidv, int pidvcnt, int sig)
    {
      int err = 0;
      if (!pidv || (pidvcnt <= 0))
        return -1;
      while (pidvcnt > 0) {
        pidvcnt--;
        if (kill(pidv[pidvcnt], sig) < 0)
          err = -1;
      }
      return err;
    }

    ////////////////////// Lockfile member functions //////////////////////
    int
    Lockfile::Open(pid_t *holding_pid)
    {
      char buf[LOCKFILE_BUF_LEN];
      int err;
      *holding_pid = 0;

    // Evaluate the return value before close(), which may overwrite errno,
    // and mark the descriptor as closed.
    #define FAIL(x)                 \
      {                             \
        int fail_ret_ = (x);        \
        if (fd >= 0) {              \
          close(fd);                \
          fd = -1;                  \
        }                           \
        return fail_ret_;           \
      }

      struct flock lock;
      char *t;
      int size;

      // Try and open the Lockfile. Create it if it does not already
      // exist.
      do {
        fd = open(fname, O_RDWR | O_CREAT, 0644);
      } while ((fd < 0) && (errno == EINTR));

      if (fd < 0)
        return (-errno);

      // Try to write-lock the whole file without blocking. If another
      // process holds the lock, fcntl() fails with EAGAIN (or EACCES).
      lock.l_type = F_WRLCK;
      lock.l_start = 0;
      lock.l_whence = SEEK_SET;
      lock.l_len = 0;

      do {
        err = fcntl(fd, F_SETLK, &lock);
      } while ((err < 0) && (errno == EINTR));

      if (err < 0) {
        // We couldn't get the lock. Try and read the process id of the
        // process holding the lock from the lockfile.
        t = buf;

        for (size = LOCKFILE_BUF_LEN - 1; size > 0;) {
          do {
            err = read(fd, t, size);
          } while ((err < 0) && (errno == EINTR));

          if (err < 0)
            FAIL(-errno);
          if (err == 0)
            break;

          size -= err;
          t += err;
        }
        *t = '\0';

        int parsed;
        if (sscanf(buf, "%d", &parsed) == 1)
          *holding_pid = (pid_t)parsed;
        else
          *holding_pid = 0;     // empty or garbled lockfile
        FAIL(0);
      }

      // We got the lock: set the close-on-exec flag so that we don't
      // accidentally pass the descriptor to a child process on fork/exec.
      int flags;
      do {
        flags = fcntl(fd, F_GETFD, 0);
      } while ((flags < 0) && (errno == EINTR));

      if (flags < 0)
        FAIL(-errno);

      do {
        err = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
      } while ((err < 0) && (errno == EINTR));

      if (err < 0)
        FAIL(-errno);

      // Success: fd stays open. The lock is released when fd is closed.
      return (1);
    #undef FAIL
    }

    int
    Lockfile::Get(pid_t *holding_pid)
    {
      char buf[LOCKFILE_BUF_LEN];
      int err;
      *holding_pid = 0;

      fd = -1;

      // Open the Lockfile and get the lock. On success Open() returns 1
      // and leaves fd open and locked.
      err = Open(holding_pid);
      if (err != 1)
        return err;

      if (fd < 0) {
        return -1;
      }

      // Truncate the Lockfile, erasing any stale process id.
      do {
        err = ftruncate(fd, 0);
      } while ((err < 0) && (errno == EINTR));

      if (err < 0) {
        err = -errno;           // save errno before close() can change it
        close(fd);
        fd = -1;
        return err;
      }

      // Write our process id to the Lockfile.
      snprintf(buf, sizeof(buf), "%d\n", (int)getpid());

      do {
        err = write(fd, buf, strlen(buf));
      } while ((err < 0) && (errno == EINTR));

      if (err != (int)strlen(buf)) {
        err = (err < 0) ? -errno : -EIO;   // a short write leaves errno unset
        close(fd);
        fd = -1;
        return err;
      }
      return (1);               // success: keep fd open to hold the lock
    }

    void
    Lockfile::Close(void)
    {
      if (fd != -1) {
        close(fd);
        fd = -1;
      }
    }

    //-------------------------------------------------------------------------
    // Lockfile::Kill() and Lockfile::KillGroup()
    //
    // Open the lockfile. If we succeed it means there was no process
    // holding the lock. We'll just close the file and release the lock
    // in that case. If we don't succeed in getting the lock, the
    // process id of the process holding the lock is returned. We
    // repeatedly send the KILL signal to that process until doing so
    // fails. That is, until kill says that the process id is no longer
    // valid (we killed the process), or that we don't have permission
    // to send a signal to that process id (the process holding the lock
    // is dead and a new process has replaced it).
    //
    // INKqa11325 (Kevlar: linux machine hosed up if specific threads
    // killed): Unfortunately, it's possible on Linux that the main PID of
    // the process has been successfully killed (and is waiting to be
    // reaped while in a defunct state), while some of the other threads
    // of the process just don't want to go away.  Integrate ink_killall
    // into Kill() and KillGroup() just to make sure we really kill
    // everything and so that we don't spin hard while trying to kill a
    // defunct process.
    //-------------------------------------------------------------------------

    static void
    lockfile_kill_internal(pid_t init_pid, int init_sig, pid_t pid, const char *pname, int sig)
    {
      int err;

    // __linux__ is always predefined on Linux; plain "linux" is not
    // defined in strict -std= modes.
    #if defined(__linux__)

      pid_t *pidv = NULL;       // stays NULL (and is safe to free) without pname
      int pidvcnt = 0;

      // Need to grab pname's pid vector before we issue any kill signals.
      // Specifically, this prevents the race-condition in which
      // traffic_manager spawns a new traffic_server while we still think
      // we're killall'ing the old traffic_server.
      if (pname) {
        // Only records the PIDs of the processes currently named pname;
        // the signals are sent in the loop below.
        ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
      }

      if (init_sig > 0) {
        kill(init_pid, init_sig);
        // sleep for a bit and give time for the first signal to be
        // delivered
        sleep(1);
      }

      int saved_errno;
      do {
        err = kill(pid, sig);
        saved_errno = errno;    // sleep() and the calls below may change errno
        if (err == 0) {
          sleep(1);
        }
        if (pidvcnt > 0) {
          ink_killall_kill_pidv(pidv, pidvcnt, sig);
          sleep(1);
        }
      } while ((err == 0) || ((err < 0) && (saved_errno == EINTR)));

      free(pidv);

    #else

      if (init_sig > 0) {
        kill(init_pid, init_sig);
        // sleep for a bit and give time for the first signal to be
        // delivered
        sleep(1);
      }

      do {
        err = kill(pid, sig);
      } while ((err == 0) || ((err < 0) && (errno == EINTR)));

    #endif  // linux check
    }

    void
    Lockfile::Kill(int sig, int initial_sig, const char *pname)
    {
      int err;
      pid_t pid;
      pid_t holding_pid;

      err = Open(&holding_pid);
      if (err == 1) {             // got the lock: no matching process is running
        Close();                  // nothing to kill; just release the lock
      } else if (err == 0) {      // someone else has the lock
        pid = holding_pid;        // PID of the lock holder
        if (pid != 0) {           // only kill when the PID is valid
          lockfile_kill_internal(pid, initial_sig, pid, pname, sig);
        }
      }
    }

    // KillGroup(): like Kill(), but signals the lock holder's whole process
    // group. kill() with a negative pid sends the signal to every process in
    // the group whose ID is -pid, so children of the holder die as well.
    void
    Lockfile::KillGroup(int sig, int initial_sig, const char *pname)
    {
      int err;
      pid_t pid;
      pid_t holding_pid;

      err = Open(&holding_pid);
      if (err == 1) {             // success getting the lock file: nobody holds it
        Close();
      } else if (err == 0 && holding_pid != 0) {  // someone else has the lock
        // (holding_pid must be non-zero: getpgid(0) would return our own group)
        do {
          pid = getpgid(holding_pid);             // process group of the holder
        } while ((pid < 0) && (errno == EINTR));

        // If the group is unknown, or it is our own group, signal only the
        // holder; otherwise target the whole group.
        if ((pid < 0) || (pid == getpgrp()))
          pid = holding_pid;
        else
          pid = -pid;

        if (pid != 0) {
          // We kill the holding_pid instead of the process_group
          // initially since there is no point trying to get core files
          // from a group since the core file of one overwrites the core
          // file of another one
          lockfile_kill_internal(holding_pid, initial_sig, pid, pname, sig);
        }
      }
    }

    3.log.h

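    #ifndef LOG_H
    #define LOG_H
    #include <stdio.h>

    // Appends message c to log.txt in the current directory.
    // static inline: the header can be included from several source files
    // without duplicate-definition link errors; const char * accepts literals.
    static inline void write_to_log(const char *c)
    {
      FILE *fp = fopen("log.txt", "ab");
      if (fp) {
        fputs(c, fp);
        fclose(fp);
      }
    }

    #endif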

    4.traffic_cop.cpp

    #include "lock_and_kill.h"
    #include "log.h"
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <signal.h>
    #include <sys/param.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <string.h>
    #include <stdio.h>
    #include <sys/stat.h>

    #define NOWARN_UNUSED(x) (void)(x)

    static char cop_lockfile[PATH_NAME_MAX];
    static char manager_lockfile[PATH_NAME_MAX];
    static char server_lockfile[PATH_NAME_MAX];

    static char manager_binary[PATH_NAME_MAX] = "traffic_manager";
    static char server_binary[PATH_NAME_MAX] = "traffic_server";
    static int killsig = SIGKILL;
    static int coresig = 0;
    static int server_not_found = 0;
    static int server_failures = 0;
    static int manager_failures = 0;

    static const int sleep_time = 10;           // 10 sec
    static const int manager_timeout = 3 * 60;  //  3 min
    static const int server_timeout = 3 * 60;   //  3 min
    static const int kill_timeout = 1 * 60;     //  1 min

    // SIGALRM handler used while safe_kill() runs: re-arm the alarm rather
    // than letting SIGALRM terminate traffic_cop in the middle of a kill.
    static void sig_alarm_warn(int signum = 0)
    {
      NOWARN_UNUSED(signum);
      alarm(kill_timeout);
    }

    // Fatal-signal handler: abort (and dump core).
    static void sig_fatal(int signum)
    {
      NOWARN_UNUSED(signum);
      abort();
    }

    static void set_alarm_warn()
    {
      struct sigaction action;
      action.sa_handler = sig_alarm_warn;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGALRM, &action, NULL);
    }

    static void set_alarm_death()
    {
      struct sigaction action;
      action.sa_handler = sig_fatal;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGALRM, &action, NULL);
    }

    // SIGCHLD handler: reap every exited child without blocking, so that
    // spawned processes do not linger as zombies.
    static void sig_child(int signum)
    {
      NOWARN_UNUSED(signum);
      pid_t pid = 0;
      int status = 0;
      for (;;) {
        pid = waitpid(WAIT_ANY, &status, WNOHANG);

        if (pid <= 0) {
          break;
        }
        // TSqa03086 - We can not log the child status signal from
        //   the signal handler since syslog can deadlock.  Record
        //   the pid and the status in a global for logging
        //   next time through the event loop.  We will occasionally
        //   lose some information if we get two sig childs in rapid
        //   succession
        // child_pid = pid;
        // child_status = status;
      }
    }

    static void init_signals()
    {
      struct sigaction action;
      write_to_log("Entering init_signals()\n");
      action.sa_handler = sig_child;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGCHLD, &action, NULL);

      // By default a SIGALRM is fatal; safe_kill() temporarily replaces
      // this handler with sig_alarm_warn.
      action.sa_handler = sig_fatal;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGALRM, &action, NULL);
      write_to_log("leaving init_signals()\n\n");
    }

    // Kill the process holding lockfile_name (and every process named pname).
    // While the kill is in progress, SIGALRM only re-arms itself every
    // kill_timeout seconds; afterwards SIGALRM is made fatal again.
    static void safe_kill(const char *lockfile_name, const char *pname, bool group)
    {
      Lockfile lockfile(lockfile_name);
      write_to_log("Entering safe_kill\n");
      set_alarm_warn();
      alarm(kill_timeout);

      if (group) {
        lockfile.KillGroup(killsig, coresig, pname);
      } else {
        lockfile.Kill(killsig, coresig, pname);
      }
      alarm(0);
      set_alarm_death();
      write_to_log("Leaving safe_kill\n\n");
    }

    // Simplified health checks: instead of talking to the processes over
    // their ports, these stubs always report success (1).
    static int server_up()
    {
      return 1;
    }

    static int heartbeat_manager()
    {
      // A real check would react to a failed heartbeat, e.g. with:
      // safe_kill(manager_lockfile, manager_binary, true);
      return 1;
    }

    static int heartbeat_server()
    {
      // safe_kill(server_lockfile, server_binary, false);
      // server_failures = 0;
      return 1;
    }

    static void spawn_manager()
    {
      pid_t pid;
      write_to_log("Entering spwan_manager()!\n\n");  // logged once, before fork()
      pid = fork();
      if (pid == 0) {
        // Child: replace ourselves with the manager binary. execv() needs a
        // NULL-terminated argv whose first element is the program name.
        char *const argv[] = { manager_binary, NULL };
        execv(manager_binary, argv);
        write_to_log("somehow execv failed!\n");
        _exit(1);               // _exit: don't flush the parent's stdio buffers again
      } else if (pid == -1) {
        write_to_log("unable to fork !\n");
        exit(1);
      }

      manager_failures = 0;
      write_to_log("Leaving spwan_manager()!\n\n");
    }

    static void init_lockfiles()
    {
      // In ATS the paths are built relative to the runtime directory:
      // Layout::relative_to(cop_lockfile, sizeof(cop_lockfile), Layout::get()->runtimedir, COP_LOCK);
      // Layout::relative_to(manager_lockfile, sizeof(manager_lockfile), Layout::get()->runtimedir, MANAGER_LOCK);
      // Layout::relative_to(server_lockfile, sizeof(server_lockfile), Layout::get()->runtimedir, SERVER_LOCK);
      // The simulation simply uses files in the current directory.

      write_to_log("Entering init_lockfiles()\n");
      strcpy(cop_lockfile, "cop_lockfile");
      strcpy(manager_lockfile, "manager_lockfile");
      strcpy(server_lockfile, "server_lockfile");

      strcpy(manager_binary, "manager_binary");
      strcpy(server_binary, "server_binary");

      write_to_log("leaving init_lockfiles()\n\n");
    }

    // Ensure only one traffic_cop runs: take cop_lockfile with Get() and exit
    // if another cop holds it or on error. ~Lockfile() does not close fd, so
    // the lock stays held after this function returns.
    static void check_lockfile()
    {
      write_to_log("Entering check_lockfile()\n");
      int err;
      pid_t holding_pid;
      Lockfile cop_lf(cop_lockfile);
      err = cop_lf.Get(&holding_pid);

      if (err < 0) {
        write_to_log("leaving check_lockfile(),and err<0\n\n");
        exit(1);
      } else if (err == 0) {
        write_to_log("leaving check_lockfile(),and err==0\n\n");
        exit(1);
      }
      write_to_log("leaving check_lockfile()\n\n");
    }

    static void check_programs()
    {
      int err;
      pid_t holding_pid;

      // stdout was closed by init_daemon(), so all output goes through write_to_log().
      write_to_log("Entering check_programs()\n");

      // Try to open the manager's lockfile. If we can lock it, no manager
      // process is running.
      Lockfile manager_lf(manager_lockfile);
      err = manager_lf.Open(&holding_pid);

      // The value of err tells us the state of the manager process.
      if (err == 0) {
        write_to_log("in check_programs(), manager_lockfile, err==0\n");

        // The lock is held; make sure the holding pid is still alive.
        if (kill(holding_pid, 0) == -1) {
          char buf[100];
          snprintf(buf, sizeof(buf), "holding_pid is %d, and invalid\n", (int)holding_pid);
          write_to_log(buf);

          ink_killall(manager_binary, killsig);
          sleep(1);  // give signals a chance to be received
          err = manager_lf.Open(&holding_pid);
        }
      }

      if (err > 0) {  // we got the manager lockfile: no manager is running
        // 'lockfile_open' returns the file descriptor of the opened
        // lockfile.  We need to close this before spawning the
        // manager so that the manager can grab the lock.
        manager_lf.Close();

        // Make sure we don't have a stray traffic server running.
        write_to_log("traffic_manager not running, making sure traffic_server is dead\n");
        safe_kill(server_lockfile, server_binary, false);
        spawn_manager();
      } else {
        // err == 0: a live manager holds the lock.
        // err < 0: Open() returned a negative value; this can mean the lock
        // was taken but setting the lockfile's flags (close-on-exec) failed.
        //
        // If there is a manager running we want to heartbeat it to
        // make sure it hasn't wedged. If the manager test succeeds we
        // check to see if the server is up. (That is, it hasn't been
        // brought down via the UI).  If the manager thinks the server
        // is up, we make sure there is actually a server process
        // running. If there is we test it.
        alarm(2 * manager_timeout);
        err = heartbeat_manager();
        alarm(0);

        if (err < 0) {
          // heartbeat_manager() reported a failure: stop checking this round.
          return;
        }

        if (server_up() <= 0) {
          // The manager is running. If the server is down, we expect the
          // manager to start a new one, so return.
          return;
        }

        Lockfile server_lf(server_lockfile);
        err = server_lf.Open(&holding_pid);

        if (err == 0) {
          if (kill(holding_pid, 0) == -1) {
            ink_killall(server_binary, killsig);
            sleep(1);  // give signals a chance to be received
            err = server_lf.Open(&holding_pid);
          }
        }

        if (err > 0) {
          // The manager thinks the server is up, but nobody holds the server
          // lockfile. If this happens more than once, kill the manager's
          // process group; the next check will then respawn the manager.
          server_lf.Close();
          server_not_found += 1;

          if (server_not_found > 1) {
            server_not_found = 0;
            safe_kill(manager_lockfile, manager_binary, true);
          }
        } else {
          alarm(2 * server_timeout);
          heartbeat_server();
          alarm(0);
        }
      }
      write_to_log("Leaving check_programs\n\n");
    }

    static void init()
    {
      write_to_log("Entering init()\n");
      init_signals();
      init_lockfiles();
      check_lockfile();
      write_to_log("Leaving init()\n\n");
    }

    static void millisleep(int ms)
    {
      struct timespec ts;
      ts.tv_sec = ms / 1000;
      ts.tv_nsec = (ms - ts.tv_sec * 1000) * 1000 * 1000;
      nanosleep(&ts, NULL);
    }

    // Changed function from taking no argument and returning void
    // to taking a void* and returning a void*. The change was made
    // so that we can call ink_thread_create() on this function
    // in the case of running cop as a win32 service.

    static void* check(void* arg)
    {
      write_to_log("Entering check()\n\n");
      for (;;) {
        // Watchdog alarm covering one full round of checks
        // (the SIGALRM handler is installed by init_signals()).
        alarm(2 * (sleep_time + manager_timeout * 2 + server_timeout));

        check_programs();
        millisleep(sleep_time * 1000);
      }
      write_to_log("Leaving check()\n\n");  // not reached: the loop never exits
      return arg;
    }

    void init_daemon(void)
    {
      pid_t pid;
      struct rlimit rl;
      struct sigaction sa;
      //umask(0);
      if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        exit(1);
      }

      if ((pid = fork()) < 0) {
        exit(1);  // fork failed: exit
      } else if (pid > 0) {
        exit(0);  // parent: exit
      }

      // First child: keeps running in the background.
      setsid();   // becomes leader of a new session and process group,
                  // detached from the controlling terminal

      // Ignore SIGHUP: it can be sent to the second child when the
      // session leader (the first child) exits.
      sa.sa_handler = SIG_IGN;
      sigemptyset(&sa.sa_mask);
      sa.sa_flags = 0;

      if (sigaction(SIGHUP, &sa, NULL) < 0) {
        exit(1);
      }

      if ((pid = fork()) < 0) {
        exit(1);  // fork failed: exit
      } else if (pid > 0) {
        exit(0);  // parent (the first child): exit
      }
      // Second child: carries on. It is no longer a session leader, so it
      // cannot acquire a controlling terminal.
      umask(0);
      if (rl.rlim_max == RLIM_INFINITY) {
        rl.rlim_max = 1024;
      }

      for (rlim_t i = 0; i < rl.rlim_max; ++i) {  // close all open file descriptors
        close((int)i);
      }

      //chdir("/tmp");  // change the working directory to /tmp
      return;
    }

    int main()
    {
      init_daemon();  // daemonize
      write_to_log("Entering main()\n");
      signal(SIGHUP, SIG_IGN);
      signal(SIGTSTP, SIG_IGN);
      signal(SIGTTOU, SIG_IGN);
      signal(SIGTTIN, SIG_IGN);
      //setsid();
      init();
      check(NULL);  // loops forever
      write_to_log("leaving main()\n\n");
      return 0;
    }
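Every decision in traffic_cop above hinges on the Open() return convention: 1 means the lock was free (so the process is not running), 0 means some process holds it and its pid is read from the file that Get() wrote. The sketch below reproduces that convention with POSIX fcntl() record locks. That lock_and_kill.h works this way is an assumption, and probe_lock, take_lock and probe_from_child are hypothetical names. Because fcntl() locks belong to a process (a process can always re-lock its own file), the probe has to run in another process, which probe_from_child does with fork().

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical probe following the Lockfile::Open() convention:
//   1  lock was free; it is now held through *fd_out (close it to release)
//   0  another process holds it; the pid written in the file -> *holding_pid
//  -1  error
static int probe_lock(const char* path, int* fd_out, pid_t* holding_pid) {
  // O_CLOEXEC plays the role of the close-on-exec flag set by Lockfile::Open().
  int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
  if (fd < 0) return -1;
  struct flock lk;
  std::memset(&lk, 0, sizeof(lk));
  lk.l_type = F_WRLCK;  // exclusive write lock
  lk.l_whence = SEEK_SET;
  lk.l_start = 0;
  lk.l_len = 0;         // 0 means "to end of file", i.e. the whole file
  if (fcntl(fd, F_SETLK, &lk) == 0) {
    *fd_out = fd;
    return 1;
  }
  if (errno != EACCES && errno != EAGAIN) {  // a real error, not contention
    close(fd);
    return -1;
  }
  // Held by someone else: read the pid the holder wrote into the file.
  char buf[32];
  ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);
  close(fd);
  *holding_pid = 0;  // unknown until the holder has written its pid
  if (n > 0) {
    buf[n] = '\0';
    *holding_pid = (pid_t)std::atoi(buf);
  }
  return 0;
}

// Like Lockfile::Get(): take the lock, then record our own pid in the file.
static int take_lock(const char* path, int* fd_out, pid_t* holding_pid) {
  int err = probe_lock(path, fd_out, holding_pid);
  if (err != 1) return err;
  char buf[32];
  int len = std::snprintf(buf, sizeof(buf), "%d\n", (int)getpid());
  if (ftruncate(*fd_out, 0) < 0 || pwrite(*fd_out, buf, (size_t)len, 0) != len) {
    close(*fd_out);
    return -1;
  }
  return 1;
}

// Run probe_lock() in a forked child and report its result (and, if the
// lock is held, the pid it read) back to the caller through a pipe.
static int probe_from_child(const char* path, pid_t* holder) {
  int pfd[2];
  if (pipe(pfd) < 0) return -1;
  pid_t child = fork();
  if (child < 0) { close(pfd[0]); close(pfd[1]); return -1; }
  if (child == 0) {
    close(pfd[0]);
    int fd = -1;
    pid_t h = 0;
    int msg[2];
    msg[0] = probe_lock(path, &fd, &h);
    if (msg[0] == 1) close(fd);  // release at once: we only looked
    msg[1] = (int)h;
    ssize_t w = write(pfd[1], msg, sizeof(msg));
    _exit(w == (ssize_t)sizeof(msg) ? 0 : 1);
  }
  close(pfd[1]);
  int msg[2] = {-1, 0};
  ssize_t got = read(pfd[0], msg, sizeof(msg));
  close(pfd[0]);
  waitpid(child, NULL, 0);
  if (got != (ssize_t)sizeof(msg)) return -1;
  *holder = (pid_t)msg[1];
  return msg[0];
}
```

In check_programs() terms, a 1 from the probe leads to spawning the process, while a 0 whose pid fails kill(pid, 0) marks a stale holder.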


    5.traffic_manager.cpp

    #include "lock_and_kill.h"
    #include "log.h"
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <signal.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <string.h>
    #include <stdio.h>

    #define NOWARN_UNUSED(x) (void)(x)
    static char manager_lockfile[4096] = "manager_lockfile";
    static char server_lockfile[4096] = "server_lockfile";
    static int server_failures = 0;
    static int killsig = SIGKILL;
    static int coresig = 0;
    static char server_binary[4096] = "server_binary";
    static const int sleep_time = 10;           // 10 sec
    static const int manager_timeout = 3 * 60;  //  3 min
    static const int server_timeout = 3 * 60;   //  3 min
    static const int kill_timeout = 1 * 60;     //  1 min

    static void sig_alarm_warn(int signum = 0)
    {
      NOWARN_UNUSED(signum);
      alarm(kill_timeout);
    }

    static void sig_fatal(int signum)
    {
      NOWARN_UNUSED(signum);
      abort();
    }

    static void set_alarm_warn()
    {
      struct sigaction action;
      action.sa_handler = sig_alarm_warn;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGALRM, &action, NULL);
    }

    static void set_alarm_death()
    {
      struct sigaction action;
      action.sa_handler = sig_fatal;
      sigemptyset(&action.sa_mask);
      action.sa_flags = 0;
      sigaction(SIGALRM, &action, NULL);
    }

    static void sig_child(int signum)
    {
      NOWARN_UNUSED(signum);
      pid_t pid = 0;
      int status = 0;
      for (;;) {
        pid = waitpid(WAIT_ANY, &status, WNOHANG);

        if (pid <= 0) {
          break;
        }
        // TSqa03086 - We can not log the child status signal from
        //   the signal handler since syslog can deadlock.  Record
        //   the pid and the status in a global for logging
        //   next time through the event loop.  We will occasionally
        //   lose some information if we get two sig childs in rapid
        //   succession
        // child_pid = pid;
        // child_status = status;
      }
    }

    static void safe_kill(const char* lockfile_name, const char* pname, bool group)
    {
      Lockfile lockfile(lockfile_name);
      write_to_log("Entering safe_kill\n");
      // While killing, an expired alarm only re-arms itself instead of aborting.
      set_alarm_warn();
      alarm(kill_timeout);

      if (group == true) {
        lockfile.KillGroup(killsig, coresig, pname);
      } else {
        lockfile.Kill(killsig, coresig, pname);
      }
      alarm(0);
      // From here on, an expired alarm aborts the process.
      set_alarm_death();
      write_to_log("Leaving safe_kill\n\n");
    }

    static void spawn_server()
    {
      int err;
      write_to_log("--------------Entering spawn_server()!\n\n");
      err = fork();
      if (err == 0) {
        // Child: execv() needs a NULL-terminated argv whose first entry is
        // the program name; passing NULL for argv is not portable.
        char* const argv[] = {server_binary, NULL};
        execv(server_binary, argv);

        write_to_log("--------------somehow execv failed!\n");
        _exit(1);  // _exit: do not flush the parent's copied stdio buffers
      } else if (err == -1) {
        write_to_log("--------------unable to fork server!\n");
        exit(1);
      }

      server_failures = 0;
      write_to_log("--------------Leaving spawn_server()!\n\n");
    }

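    void check_server()
    {
      int err;
      pid_t holding_pid;
      Lockfile server_lf(server_lockfile);
      // Use Open(), as traffic_cop does: unlike Get(), it does not write
      // our own pid into the server's lockfile.
      err = server_lf.Open(&holding_pid);

      if (err == 0) {
        // The lock is held; make sure the holding pid is still alive.
        if (kill(holding_pid, 0) == -1) {
          ink_killall(server_binary, killsig);
          sleep(1);  // give signals a chance to be received
          err = server_lf.Open(&holding_pid);
        }
      }

      if (err > 0) {
        // Nobody holds the server lockfile, so the server is down. Release
        // the lock so the new server can take it, then respawn the server.
        server_lf.Close();
        safe_kill(server_lockfile, server_binary, false);
        spawn_server();
      }
    }
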
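    int main()
    {
      pid_t holding_pid = 0;
      Lockfile manager_lf(manager_lockfile);
      // Take the manager lockfile and record our pid in it (this is what
      // traffic_cop probes). Quit if another manager already holds it.
      if (manager_lf.Get(&holding_pid) <= 0) {
        write_to_log("----------------could not get manager_lockfile, exiting!\n");
        exit(1);
      }
      srand((unsigned)getpid());  // a different failure pattern per process

      while (1) {
        // stdout is inherited from the daemonized traffic_cop, which closed
        // it, so only write_to_log() is used here.
        char buf[100];
        snprintf(buf, sizeof(buf), "----------------traffic_manager is running, pid:'%d'!\n", (int)getpid());
        write_to_log(buf);

        sleep(5);
        int c = rand() % 10;

        if (c == 1) {  // simulate a manager failure
          write_to_log("----------------traffic_manager has an exception and exits!\n");
          exit(1);
        } else {       // check on the server process
          check_server();
        }
      }
    }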
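The traffic_manager above notices a dead server only by polling the server lockfile every few seconds. A common alternative, shown here purely as a swapped-in technique and not what the listing does, is to learn about the death directly from waitpid() on the child the supervisor forked (the listing's unused sig_child() handler already reaps children with waitpid()). The sketch also passes execv() a proper argv array. spawn and supervise are hypothetical names, and the max_spawns limit is an assumption that keeps the example finite.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>

// Fork and exec `path`. argv must be NULL-terminated and normally starts
// with the program name. Returns the child's pid, or -1 if fork() failed.
static pid_t spawn(const char* path, char* const argv[]) {
  pid_t pid = fork();
  if (pid == 0) {
    execv(path, argv);
    _exit(127);  // only reached if execv() failed; _exit skips stdio flushing
  }
  return pid;
}

// Hypothetical supervisor: relaunch the child each time it exits
// unsuccessfully, up to max_spawns launches in total.
// Returns the number of launches performed.
static int supervise(const char* path, char* const argv[], int max_spawns) {
  int spawns = 0;
  while (spawns < max_spawns) {
    pid_t pid = spawn(path, argv);
    if (pid < 0) break;  // fork() failed
    ++spawns;
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
      if (errno != EINTR) return spawns;  // cannot reap the child: give up
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) break;  // clean exit
    std::fprintf(stderr, "child %d failed (status 0x%x)\n", (int)pid, (unsigned)status);
  }
  return spawns;
}
```

For example, supervising `/bin/sh -c "exit 1"` with max_spawns set to 3 launches the shell three times, while `/bin/sh -c "exit 0"` is launched once. The trade-off: waitpid() reports a death immediately but only to the parent, whereas the lockfile scheme lets an unrelated process such as traffic_cop check liveness too.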


    6.traffic_server.cpp

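    #include "log.h"
    #include "lock_and_kill.h"
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    static char server_lockfile[4096] = "server_lockfile";

    int main()
    {
      pid_t holding_pid = 0;
      Lockfile server_lf(server_lockfile);
      // Take the server lockfile and record our pid in it (traffic_manager
      // and traffic_cop probe this lock). Quit if another server holds it.
      if (server_lf.Get(&holding_pid) <= 0) {
        write_to_log("=================could not get server_lockfile, exiting!\n");
        exit(1);
      }
      srand((unsigned)getpid());  // a different failure pattern per process

      while (1) {
        char buf[100];
        snprintf(buf, sizeof(buf), "==============traffic_server is running, pid:'%d'!\n", (int)getpid());
        write_to_log(buf);
        sleep(5);
        int c = rand() % 100;

        if (c < 30) {  // simulate a server failure (about 30% of rounds)
          write_to_log("=================traffic_server has an exception and exits!\n");
          exit(1);
        }
      }
      return 0;  // not reached
    }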

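    I wrote the notes above some time ago while studying Traffic Server, and I hope they help anyone interested; corrections are welcome. This is only a brief analysis of how Traffic Server controls its processes. Much of the test code is simplified (the heartbeat checks, for example), as noted in the code.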
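Section 1 describes traffic_cop's heartbeat as fetching a synthetic web page and treating a missing or bad reply within a timeout as a failure; the listings simplify this away and only bound each heartbeat with alarm(). Below is a minimal sketch of the timeout part alone, assuming a pipe stands in for the HTTP connection and using poll() instead of alarm(). wait_for_heartbeat is a hypothetical name, not part of Traffic Server.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical heartbeat check: wait up to timeout_ms for the peer to send
// one byte on read_fd. Returns 1 if a heartbeat arrived, 0 on timeout
// (which the caller treats as a failure), and -1 on error or EOF.
static int wait_for_heartbeat(int read_fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = read_fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int n;
  do {
    n = poll(&pfd, 1, timeout_ms);
  } while (n < 0 && errno == EINTR);  // a signal interrupted us: wait again
  if (n < 0) return -1;
  if (n == 0) return 0;               // no reply within the timeout
  char byte;
  return read(read_fd, &byte, 1) == 1 ? 1 : -1;  // 0 bytes means the peer closed
}
```

A caller in the style of check_programs() would treat a 0 like a failed heartbeat_manager() and then count failures or kill the process group as the listing does.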

    Original article: https://www.cnblogs.com/liushaodong/p/2933280.html