zoukankan      html  css  js  c++  java
  • 第08课:【实战】Redis网络通信模块源码分析(1)

      我们这里先研究redis-server端的网络通信模块。除去Redis本身的业务功能以外,Redis的网络通信模块实现思路和细节非常有代表性。由于网络通信模块的设计也是Linux C++后台开发一个很重要的模块,虽然网络上有很多现成的网络库,但是简单易学且可以作为典范的并不多,而redis-server就是这方面值得借鉴学习的材料之一。

    8.1侦听socket初始化工作

      通过前面课程的介绍,我们知道网络通信在应用层上的大致流程如下:

      *服务器端侦听socket;

      *将侦听socket绑定到需要的IP地址和端口上(调用Soket API bind函数);

      *启动侦听(调用socket API listen函数);

      *无限等待客户端连接到来,调用Socket API accept函数接受客户端连接,并称生一个与该客户端对应的客户端socket;

      *处理客户端socket上网络数据的收发,必要时关闭该socket。

    全局搜索了一下Redis的代码,寻找调用了bind()函数的代码,经过过滤和筛选,我们确定了位于anet.c的anetListen()函数。

      

    static int (char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
        if (bind(s,sa,len) == -1) {
            anetSetError(err, "bind: %s", strerror(errno));
            close(s);
            return ANET_ERR;
        }
    
        if (listen(s, backlog) == -1) {
            anetSetError(err, "listen: %s", strerror(errno));
            close(s);
            return ANET_ERR;
        }
        return ANET_OK;
    }
    

      用GDB的b命令在这个函数上加个断点,然后重新运行redis-server:

    (gdb) b anetListen
    Breakpoint 1 at 0x555555588620: file anet.c, line 440.
    (gdb) info breakpoints 
    Num     Type           Disp Enb Address            What
    1       breakpoint     keep y   0x0000555555588620 in anetListen at anet.c:440
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /home/wzq/Desktop/redis-5.0.3/src/redis-server 
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    11580:C 14 Jan 2019 11:27:06.118 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    11580:C 14 Jan 2019 11:27:06.118 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=11580, just started
    11580:C 14 Jan 2019 11:27:06.118 # Warning: no config file specified, using the default config. In order to specify a config file use /home/wzq/Desktop/redis-5.0.3/src/redis-server /path/to/redis.conf
    11580:M 14 Jan 2019 11:27:06.119 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    
    Breakpoint 1, anetListen (err=0x5555559161e0 <server+576> "", s=6, sa=0x555555b2b240, len=28, backlog=511) at anet.c:440
    warning: Source file is more recent than executable.
    440	static int (char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
    (gdb) 
    

      在GDB中断在这个函数时,使用bt命令查看一下此时的调用堆栈:

    (gdb) bt
    #0  anetListen (err=0x5555559161e0 <server+576> "", s=6, sa=0x555555b2b240, len=28, backlog=511) at anet.c:440
    #1  0x00005555555887a4 in _anetTcpServer (err=0x5555559161e0 <server+576> "", port=<optimized out>, bindaddr=<optimized out>, af=10, backlog=511)
        at anet.c:487
    #2  0x000055555558cf07 in listenToPort (port=6379, fds=0x55555591610c <server+364>, count=0x55555591614c <server+428>) at server.c:1924
    #3  0x0000555555591ed0 in initServer () at server.c:2055
    #4  0x0000555555585103 in main (argc=<optimized out>, argv=0x7fffffffccc8) at server.c:4160
    

      通过这个堆栈,结合堆栈#1的6379端口号可以确认这就是我们要找的逻辑,并且这个逻辑在主线程(因为从堆栈上看,最顶层堆栈是main()函数)中进行。

    我们看下堆栈#1处的代码:

    static int _anetTcpServer(char *err, int port, char *bindaddr, int af, int backlog)
    {
        int s = -1, rv;
        char _port[6];  /* strlen("65535") */
        struct addrinfo hints, *servinfo, *p;
    
        snprintf(_port,6,"%d",port);
        memset(&hints,0,sizeof(hints));
        hints.ai_family = af;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;    /* No effect if bindaddr != NULL */
    
        if ((rv = getaddrinfo(bindaddr,_port,&hints,&servinfo)) != 0) {
            anetSetError(err, "%s", gai_strerror(rv));
            return ANET_ERR;
        }
        for (p = servinfo; p != NULL; p = p->ai_next) {
            if ((s = socket(p->ai_family,p->ai_socktype,p->ai_protocol)) == -1)
                continue;
    
            if (af == AF_INET6 && anetV6Only(err,s) == ANET_ERR) goto error;
            if (anetSetReuseAddr(err,s) == ANET_ERR) goto error;
            if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) s = ANET_ERR;
            goto end;
        }
        if (p == NULL) {
            anetSetError(err, "unable to bind socket, errno: %d", errno);
            goto error;
        }
    
    error:
        if (s != -1) close(s);
        s = ANET_ERR;
    end:
        freeaddrinfo(servinfo);
        return s;
    }
    

      将堆栈切换至#1,然后输入info arg查看传入给这个额函数的参数。

    (gdb) f 1
    #1  0x00005555555887a4 in _anetTcpServer (err=0x5555559161e0 <server+576> "", port=<optimized out>, bindaddr=<optimized out>, af=10, backlog=511)
        at anet.c:487
    487	        if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) s = ANET_ERR;
    (gdb) info args 
    err = 0x5555559161e0 <server+576> ""
    port = <optimized out>
    bindaddr = <optimized out>
    af = 10
    backlog = 511
    (gdb) 
    

      使用系统API getaddrinfo 来解析得到当前主机的IP地址和端口信息。这里没有选择使用gethostbyname这个API是因为gethostbyname仅用于解析ipv4相关的主机信息,而getaddrinfo既可以用于ipv4也可以用于ipv6,这个函数的签名如下:

    int getaddrinfo(const char *node,const char *service,const struct addrinfo *hints,struct addrinfo **res);
    

      这个函数的具体用法可以在Linux man手册上查看。通常服务器端在调用getaddrinfo之前,将hints参数的ai_flags设置为AL_PASSIVE,用于bind,主机名nodename通常会设置为NULL,返回通配地址[::]。当然,客户端调用getaddrinfo时,hints参数的ai_flags一般不设置AL_PASSIVE,但是主机名node和服务名service(更愿意称之为端口)则应该不为空。

      解析完协议信息后,利用得到的协议信息创建侦听socket,并开启该socket的reuseAddr选项。然后调用anetListen函数,在该函数中先bind后listen。至此,redis-server就可以在6379端口上接受客户端连接了。

    8.2接受客户端连接

      同样的道理,要研究redis-server如何接受客户端连接,只要搜索socket API accept函数即可。

      经定位,我们最终在anet.c文件中找到anetGenericAccept函数:

    static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
        int fd;
        while(1) {
            fd = accept(s,sa,len);
            if (fd == -1) {
                if (errno == EINTR)
                    continue;
                else {
                    anetSetError(err, "accept: %s", strerror(errno));
                    return ANET_ERR;
                }
            }
            break;
        }
        return fd;
    }
    

      我们用b命令在这个函数加个断点,然后重新运行redis-server。一直到程序全部运行起来,GDB都没有触发该断点,这时新打开一个redis-cli,以模拟新客户端连接到redis-server上的行为。断点触发了,此时查看一下调用堆栈。

    Thread 1 "redis-server" hit Breakpoint 2, anetGenericAccept (err=err@entry=0x5555559161e0 <server+576> "", s=s@entry=7, sa=sa@entry=0x7fffffffc9f0, 
        len=len@entry=0x7fffffffc9ec) at anet.c:531
    531	static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
    (gdb) bt
    #0  anetGenericAccept (err=err@entry=0x5555559161e0 <server+576> "", s=s@entry=7, sa=sa@entry=0x7fffffffc9f0, len=len@entry=0x7fffffffc9ec)
        at anet.c:531
    #1  0x00005555555893e2 in anetTcpAccept (err=err@entry=0x5555559161e0 <server+576> "", s=s@entry=7, ip=ip@entry=0x7fffffffcab0 "", 
        ip_len=ip_len@entry=46, port=port@entry=0x7fffffffcaac) at anet.c:552
    #2  0x000055555559aad2 in acceptTcpHandler (el=<optimized out>, fd=7, privdata=<optimized out>, mask=<optimized out>) at networking.c:728
    #3  0x000055555558806c in aeProcessEvents (eventLoop=eventLoop@entry=0x7ffff6c320a0, flags=flags@entry=11) at ae.c:443
    #4  0x000055555558841b in aeMain (eventLoop=0x7ffff6c320a0) at ae.c:501
    #5  0x00005555555851d4 in main (argc=<optimized out>, argv=0x7fffffffccc8) at server.c:4197
    

      分析这个调用堆栈,梳理一下这个调用流程。在main函数的initServer函数中创建侦听socket、绑定地址然后开启侦听,接着调用aeMain函数启动一个循环不断地处理“事件”。

    void aeMain(aeEventLoop *eventLoop) {
        eventLoop->stop = 0;
        while (!eventLoop->stop) {
            if (eventLoop->beforesleep != NULL)
                eventLoop->beforesleep(eventLoop);
            aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
        }
    }
    

      循环的退出条件是eventLoop->stop 为 1.事件处理的代码如下:

    int aeProcessEvents(aeEventLoop *eventLoop, int flags)
    {
        int processed = 0, numevents;
    
        /* Nothing to do? return ASAP */
        if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;
    
        /* Note that we want call select() even if there are no
         * file events to process as long as we want to process time
         * events, in order to sleep until the next time event is ready
         * to fire. */
        if (eventLoop->maxfd != -1 ||
            ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
            int j;
            aeTimeEvent *shortest = NULL;
            struct timeval tv, *tvp;
    
            if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
                shortest = aeSearchNearestTimer(eventLoop);
            if (shortest) {
                long now_sec, now_ms;
    
                aeGetTime(&now_sec, &now_ms);
                tvp = &tv;
    
                /* How many milliseconds we need to wait for the next
                 * time event to fire? */
                long long ms =
                    (shortest->when_sec - now_sec)*1000 +
                    shortest->when_ms - now_ms;
    
                if (ms > 0) {
                    tvp->tv_sec = ms/1000;
                    tvp->tv_usec = (ms % 1000)*1000;
                } else {
                    tvp->tv_sec = 0;
                    tvp->tv_usec = 0;
                }
            } else {
                /* If we have to check for events but need to return
                 * ASAP because of AE_DONT_WAIT we need to set the timeout
                 * to zero */
                if (flags & AE_DONT_WAIT) {
                    tv.tv_sec = tv.tv_usec = 0;
                    tvp = &tv;
                } else {
                    /* Otherwise we can block */
                    tvp = NULL; /* wait forever */
                }
            }
    
            /* Call the multiplexing API, will return only on timeout or when
             * some event fires. */
            numevents = aeApiPoll(eventLoop, tvp);
    
            /* After sleep callback. */
            if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
                eventLoop->aftersleep(eventLoop);
    
            for (j = 0; j < numevents; j++) {
                aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
                int mask = eventLoop->fired[j].mask;
                int fd = eventLoop->fired[j].fd;
                int fired = 0; /* Number of events fired for current fd. */
    
                /* Normally we execute the readable event first, and the writable
                 * event laster. This is useful as sometimes we may be able
                 * to serve the reply of a query immediately after processing the
                 * query.
                 *
                 * However if AE_BARRIER is set in the mask, our application is
                 * asking us to do the reverse: never fire the writable event
                 * after the readable. In such a case, we invert the calls.
                 * This is useful when, for instance, we want to do things
                 * in the beforeSleep() hook, like fsynching a file to disk,
                 * before replying to a client. */
                int invert = fe->mask & AE_BARRIER;
    
                /* Note the "fe->mask & mask & ..." code: maybe an already
                 * processed event removed an element that fired and we still
                 * didn't processed, so we check if the event is still valid.
                 *
                 * Fire the readable event if the call sequence is not
                 * inverted. */
                if (!invert && fe->mask & mask & AE_READABLE) {
                    fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
    
                /* Fire the writable event. */
                if (fe->mask & mask & AE_WRITABLE) {
                    if (!fired || fe->wfileProc != fe->rfileProc) {
                        fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                        fired++;
                    }
                }
    
                /* If we have to invert the call, fire the readable event now
                 * after the writable one. */
                if (invert && fe->mask & mask & AE_READABLE) {
                    if (!fired || fe->wfileProc != fe->rfileProc) {
                        fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                        fired++;
                    }
                }
    
                processed++;
            }
        }
        /* Check time events */
        if (flags & AE_TIME_EVENTS)
            processed += processTimeEvents(eventLoop);
    
        return processed; /* return the number of processed file/time events */
    }
    

      这段代码先通过flag参数检查是否有事件需要处理。如果有定时器事件(AE_TIME_EVENTS标志),则寻找最近要到期的定时器。

    /* Search the first timer to fire.
     * This operation is useful to know how many time the select can be
     * put in sleep without to delay any event.
     * If there are no timers NULL is returned.
     *
     * Note that's O(N) since time events are unsorted.
     * Possible optimizations (not needed by Redis so far, but...):
     * 1) Insert the event in order, so that the nearest is just the head.
     *    Much better but still insertion or deletion of timers is O(N).
     * 2) Use a skiplist to have this operation as O(1) and insertion as O(log(N)).
     */
    static aeTimeEvent *aeSearchNearestTimer(aeEventLoop *eventLoop)
    {
        aeTimeEvent *te = eventLoop->timeEventHead;
        aeTimeEvent *nearest = NULL;
    
        while(te) {
            if (!nearest || te->when_sec < nearest->when_sec ||
                    (te->when_sec == nearest->when_sec &&
                     te->when_ms < nearest->when_ms))
                nearest = te;
            te = te->next;
        }
        return nearest;
    }
    

      这段代码有详细的注释,也非常好理解。注释告诉我们,由于这里的定时器集合是无序的,所以需要遍历一下这个链表,算法复杂度是O(n)。同时,注释中也“暗示”了我们将来Redis在这块的优化方向,即把这个链表按到期时间从小到大排序,这样链表的头部就是我们要的最近时间点的定时器对象,算法复杂度是O(l)。或者使用Redis中的skiplist,算法复杂度是O(log(N))。

      接着获取当前系统时间(aeGetTime(&now_sec,&now_ms);)将最早到期的定时器事件减去当前系统时间获得一个间隔。这个时间间隔作为numevents = aeApiPoll(eventLoop,tvp);调用的参数,aeApiPoll()在Linux平台上使用epoll技术,Redis在这个IO复用技术上、在不同的操作系统平台上使用不同的系统函数,在Windows系统上使用select,在Mac系统上使用kqueue。这里重点看下Linux平台下的实现:

      

    static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
        aeApiState *state = eventLoop->apidata;
        int retval, numevents = 0;
    
        retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
                tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
        if (retval > 0) {
            int j;
    
            numevents = retval;
            for (j = 0; j < numevents; j++) {
                int mask = 0;
                struct epoll_event *e = state->events+j;
    
                if (e->events & EPOLLIN) mask |= AE_READABLE;
                if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
                if (e->events & EPOLLERR) mask |= AE_WRITABLE;
                if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
                eventLoop->fired[j].fd = e->data.fd;
                eventLoop->fired[j].mask = mask;
            }
        }
        return numevents;
    }
    

      epoll_wait这个函数的签名如下:

    int epoll_wait(int epfd,struct epoll_event *events,int maxevents,int timeout);
    

      最后一个参数timeout的设置非常有讲究,如果传入进来的tvp是NULL,根据上文的分析,说明定时器事件,则将等待时间设置为-1,这会让epoll_wait无限期地挂起来,直到有事件时才会被唤醒。挂起的好处就是不浪费CPU时间片。反之,将timeout设置成最近的定时器事件间隔,将epoll_wait的等待时间设置为最近的定时器事件来临的时间间隔,可以及时唤醒epoll_wait,这样程序流可以尽快处理这个到期的定时器事件(下文会介绍)。

      对于epoll_wait这种系统调用,所有的fd(对于网络通信,也叫socket)信息包括侦听fd和普通客户端fd都记录在事件循环对象aeEventLoop的apidata字段中,当某个fd上有事件触发时,从apidata中找到该fd,并把事件类型(mask字段)一起记录到aeEventLoop的fired字段中去。我们先吧这个流程介绍完,再介绍epoll_wait函数中使用的epfd是在何时何地创建的,侦听fd、客户端fd是如何挂载到epfd上去的。

      在得到了有事件的fd以后,接下来就要处理这些事件了。在主循环aeProcessEvents中从aeEventLoop对象的fired数组中取出上一步记录的fd,然后根据事件类型(读事件和写事件)分别进行处理。

    for (j = 0; j < numevents; j++) {
                aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
                int mask = eventLoop->fired[j].mask;
                int fd = eventLoop->fired[j].fd;
                int fired = 0; /* Number of events fired for current fd. */
    
                /* Normally we execute the readable event first, and the writable
                 * event laster. This is useful as sometimes we may be able
                 * to serve the reply of a query immediately after processing the
                 * query.
                 *
                 * However if AE_BARRIER is set in the mask, our application is
                 * asking us to do the reverse: never fire the writable event
                 * after the readable. In such a case, we invert the calls.
                 * This is useful when, for instance, we want to do things
                 * in the beforeSleep() hook, like fsynching a file to disk,
                 * before replying to a client. */
                int invert = fe->mask & AE_BARRIER;
    
                /* Note the "fe->mask & mask & ..." code: maybe an already
                 * processed event removed an element that fired and we still
                 * didn't processed, so we check if the event is still valid.
                 *
                 * Fire the readable event if the call sequence is not
                 * inverted. */
                if (!invert && fe->mask & mask & AE_READABLE) {
                    fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
    
                /* Fire the writable event. */
                if (fe->mask & mask & AE_WRITABLE) {
                    if (!fired || fe->wfileProc != fe->rfileProc) {
                        fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                        fired++;
                    }
                }
    
                /* If we have to invert the call, fire the readable event now
                 * after the writable one. */
                if (invert && fe->mask & mask & AE_READABLE) {
                    if (!fired || fe->wfileProc != fe->rfileProc) {
                        fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                        fired++;
                    }
                }
    
                processed++;
            }
    

      该事件字段rfileProc和写事件字段wfileProc都是函数指针,在程序早期设置好,这里直接调用就可以了。

    typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);
    
    
    /* File event structure */
    typedef struct aeFileEvent {
        int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */
        aeFileProc *rfileProc;
        aeFileProc *wfileProc;
        void *clientData;
    } aeFileEvent;
    

      我们通过搜索关键字epoll_create在ae_poll.c文件中找到EPFD的创建函数aeApiCreate。

    static int aeApiCreate(aeEventLoop *eventLoop) {
        aeApiState *state = zmalloc(sizeof(aeApiState));
    
        if (!state) return -1;
        state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
        if (!state->events) {
            zfree(state);
            return -1;
        }
        state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
        if (state->epfd == -1) {
            zfree(state->events);
            zfree(state);
            return -1;
        }
        eventLoop->apidata = state;
        return 0;
    }
    

      使用GDB的b命令在这个函数上加个断点,然后使用run命令重新运行一下redis-server,触发断点,使用bt命令查看此时的调用堆栈。发现EPFD也是在上文介绍的initServer函数中创建的。

    (gdb) bt
    #0  aeCreateEventLoop (setsize=10128) at ae.c:79
    #1  0x0000555555591aa0 in initServer () at server.c:2044
    #2  0x0000555555585103 in main (argc=<optimized out>, argv=0x7fffffffccc8) at server.c:4160
    

      在aeCreateEventLoop中不仅创建了EPFD,也创建了整个事件循环需要的aeEventLoop对象,并把这个对象记录在Redis的一个全局变量的el字段中。这个全局变量叫server,这是个结构体类型。其定义如下:

    struct redisServer server; /* Server global state */
    
    struct redisServer {
        /* General */
     
        int lua_repl;         /* Script replication flags for redis.set_repl(). */
        int lua_timedout;     /* True if we reached the time limit for script
                                 execution. */
        int lua_kill;         /* Kill the script if true. */
        int lua_always_replicate_commands; /* Default replication type. */
        /* Lazy free */
        int lazyfree_lazy_eviction;
        int lazyfree_lazy_expire;
        int lazyfree_lazy_server_del;
        /* Latency monitor */
        long long latency_monitor_threshold;
        dict *latency_events;
         /*
            省略部分  
        */
    
        /* Mutexes used to protect atomic variables when atomic builtins are
         * not available. */
        pthread_mutex_t lruclock_mutex;
        pthread_mutex_t next_client_id_mutex;
        pthread_mutex_t unixtime_mutex;
    };
    

      

      

      

  • 相关阅读:
    PHP 多个文件上传
    Java实现蓝桥杯有歧义的号码
    Java实现蓝桥杯有歧义的号码
    Java实现蓝桥杯有歧义的号码
    Java实现蓝桥杯互补二元组
    Java实现蓝桥杯互补二元组
    Java实现蓝桥杯互补二元组
    Java实现蓝桥杯快乐数
    Java实现蓝桥杯快乐数
    Java实现蓝桥杯快乐数
  • 原文地址:https://www.cnblogs.com/wzqstudy/p/10268169.html
Copyright © 2011-2022 走看看