  • Kernel implementation of epoll

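    1. Kernel implementation basics
    Compared with select, epoll is a much more targeted implementation. For every file it monitors, epoll installs its own function (ep_poll_callback in this code base) as the wakeup callback on that file's poll wait queue. When the file becomes ready, the callback links the item for the woken fd onto the ready list (rdllist). Later, when epoll has to work out which events actually occurred, it polls only the descriptors on the ready list instead of walking the whole descriptor set the way select does. That is roughly the whole idea.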
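    To make the difference concrete, here is a small user-space toy model. It is only a sketch: none of the names (toy_item, toy_epoll, toy_wake, toy_scan_all, toy_scan_ready) exist in the kernel, and the real code uses list_head and locking that are left out here. toy_wake stands in for the wakeup callback, and the two scan functions count how many items each approach has to look at.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model. All names are hypothetical, not kernel code. */
struct toy_item {
    int fd;
    int ready;              /* set when the "file" has an event pending */
    int on_ready_list;      /* guards against linking the item twice */
    struct toy_item *next;  /* link in the ready list */
};

struct toy_epoll {
    struct toy_item *ready_head;
    struct toy_item *ready_tail;
};

/* Plays the role of the wakeup callback: the woken item queues itself. */
static void toy_wake(struct toy_epoll *ep, struct toy_item *it)
{
    assert(ep != NULL && it != NULL);
    it->ready = 1;
    if (it->on_ready_list)
        return;             /* already queued, nothing more to do */
    it->on_ready_list = 1;
    it->next = NULL;
    if (ep->ready_tail)
        ep->ready_tail->next = it;
    else
        ep->ready_head = it;
    ep->ready_tail = it;
}

/* select-style check: look at every item. Returns the number examined. */
static size_t toy_scan_all(const struct toy_item *items, size_t n,
                           size_t *nready)
{
    size_t visited = 0;

    *nready = 0;
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (items[i].ready)
            (*nready)++;
    }
    return visited;
}

/* epoll-style check: look only at items that queued themselves. */
static size_t toy_scan_ready(const struct toy_epoll *ep, size_t *nready)
{
    size_t visited = 0;

    *nready = 0;
    for (const struct toy_item *it = ep->ready_head; it; it = it->next) {
        visited++;
        if (it->ready)
            (*nready)++;
    }
    return visited;
}
```

    With 1000 items of which two have been woken, toy_scan_all examines all 1000 while toy_scan_ready examines two. Both find the same two ready items.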
    2. A brief look at the code
    Call chain: ep_poll() -> ep_events_transfer()
    /*
     * Perform the transfer of events to user space.
     */
    static int ep_events_transfer(struct eventpoll *ep,
                      struct epoll_event __user *events, int maxevents)
    {
        int eventcnt = 0;
        struct list_head txlist;

        INIT_LIST_HEAD(&txlist);

        /*
         * We need to lock this because we could be hit by
         * eventpoll_release_file() and epoll_ctl(EPOLL_CTL_DEL).
         */
        down_read(&ep->sem);

        /* Collect/extract ready items */
        if (ep_collect_ready_items(ep, &txlist, maxevents) > 0) {
            /* Build result set in userspace */
            eventcnt = ep_send_events(ep, &txlist, events);

            /* Reinject ready items into the ready list */
            ep_reinject_items(ep, &txlist);
        }

        up_read(&ep->sem);

        return eventcnt;
    }


    /*
     * Walk through the transfer list we collected with ep_collect_ready_items()
     * and, if 1) the item is still "alive" 2) its event set is not empty 3) it's
     * not already linked, links it to the ready list. Same as above, we are holding
     * "sem" so items cannot vanish underneath our nose.
     */
    static void ep_reinject_items(struct eventpoll *ep, struct list_head *txlist)
    {
        int ricnt = 0, pwake = 0;
        unsigned long flags;
        struct epitem *epi;

        write_lock_irqsave(&ep->lock, flags);

        while (!list_empty(txlist)) {
            epi = list_entry(txlist->next, struct epitem, txlink);

            /* Unlink the current item from the transfer list */
            ep_list_del(&epi->txlink);

            /*
             * If the item is no more linked to the interest set, we don't
             * have to push it inside the ready list because the following
             * ep_release_epitem() is going to drop it. Also, if the current
             * item is set to have an Edge Triggered behaviour, we don't have
             * to push it back either.
             */
            if (ep_rb_linked(&epi->rbn) && !(epi->event.events & EPOLLET) &&
                (epi->revents & epi->event.events) && !ep_is_linked(&epi->rdllink)) {
                list_add_tail(&epi->rdllink, &ep->rdllist);
                ricnt++;
            }
        }

        if (ricnt) {
            /*
             * Wake up ( if active ) both the eventpoll wait list and the ->poll()
             * wait list.
             */
            if (waitqueue_active(&ep->wq))
                __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
                         TASK_INTERRUPTIBLE);
            if (waitqueue_active(&ep->poll_wait))
                pwake++;
        }

        write_unlock_irqrestore(&ep->lock, flags);

        /* We have to call this outside the lock */
        if (pwake)
            ep_poll_safewake(&psw, &ep->poll_wait);
    }
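    The most puzzling part here is ep_reinject_items. After the events have been handed to user space, it puts the items it has just delivered back onto the ready list (as the condition above shows, only level-triggered items that still report events). That looks unconventional at first, and it confused me for quite a while.
    3. Why reinject
    This has to be read together with the epoll design described at the start. select walks every descriptor on every call. Suppose select returns and user space ignores the event, or some error path fails to handle it. No harm done: on the next call the kernel patiently polls everything again, and an event that was not handled is still visible.
    epoll is different. Its only chance to record a state change is the moment the event happens. If epoll simply removed the item from the ready list after delivering the event, and user space did not handle that event, the next epoll_wait would miss it. User space would not be woken until another event occurred and the wakeup callback ran again. This is the price epoll pays for the optimization over select.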
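    The effect of the reinject can be observed from user space. The sketch below is my own illustration, not code from the kernel, and count_reports is a hypothetical helper name. It makes a pipe readable, never reads the data, and counts how many consecutive epoll_wait calls report the event. I am assuming a Linux system with epoll_create1 available. A level-triggered item should be reported every time, while an EPOLLET item, which ep_reinject_items skips, should be reported only once.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with EPOLLIN | extra_flags, make it
 * readable, then call epoll_wait() `rounds` times without reading.
 * Returns how many calls reported the event, or -1 on a setup error.
 */
static int count_reports(unsigned int extra_flags, int rounds)
{
    int pipefd[2];
    int reported = -1;
    struct epoll_event ev = { .events = EPOLLIN | extra_flags };
    struct epoll_event out;

    assert(rounds >= 0);
    if (pipe(pipefd) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1) {
            reported = 0;
            for (int i = 0; i < rounds; i++) {
                /* timeout 0: return immediately instead of sleeping */
                if (epoll_wait(epfd, &out, 1, 0) == 1)
                    reported++;
            }
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return reported;
}
```

    Calling count_reports(0, 3) exercises the level-triggered path and count_reports(EPOLLET, 3) the edge-triggered one.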
    4. Entering the kernel wait again
    static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
               int maxevents, long timeout)
    retry:
        write_lock_irqsave(&ep->lock, flags);

        res = 0;
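        if (list_empty(&ep->rdllist)) {
            /*
             * On re-entry this condition is false, because the reinjected
             * items are still on the ready list, so we fall through to
             * ep_events_transfer() below. It polls every ready item again
             * and drops the ones whose event has gone away. If nothing is
             * left to report we jump back to retry, and this time the list
             * is empty, so we wait here.
             */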
    ……
        }
        /* Is it worth to try to dig for events ? */
        eavail = !list_empty(&ep->rdllist);

        write_unlock_irqrestore(&ep->lock, flags);

        /*
         * Try to transfer events to user space. In case we get 0 events and
         * there's still timeout left over, we go trying again in search of
         * more luck.
         */
        if (!res && eavail &&
            !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
            goto retry;
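    The retry path above can also be seen from user space. This is a hedged sketch under the same assumptions as before (Linux, epoll_create1 available), and wait_after_drain is a hypothetical name. The event is reported once, user space then consumes the data, and the following epoll_wait finds nothing left to report, so it sleeps until the timeout expires.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Report a readable pipe once, consume the byte, then wait again.
 * Returns the result of the second epoll_wait(), or -1 on a setup error.
 */
static int wait_after_drain(int timeout_ms)
{
    int pipefd[2];
    int result = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;
    char c;

    assert(timeout_ms >= 0);    /* a negative timeout would block forever */
    if (pipe(pipefd) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 0) == 1 &&  /* event reported once */
            read(pipefd[0], &c, 1) == 1)          /* user space handles it */
            result = epoll_wait(epfd, &out, 1, timeout_ms);
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

    A return value of 0 from wait_after_drain means the second wait timed out, which is what the comment in ep_poll describes: the stale item is dropped and the caller goes back to waiting.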
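    5. File wait queues
    select creates its wait queue entries and adds them on every call. epoll's wait queue entries, by contrast, stay in place once they have been created. They go away when the file returned by epoll_create is released, or when a monitored file is closed without first being removed from the epoll set, which is the path eventpoll_release_file handles: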


    /*
     * This is called from eventpoll_release() to unlink files from the eventpoll
     * interface. We need to have this facility to cleanup correctly files that are
     * closed without being removed from the eventpoll interface.
     */
    void eventpoll_release_file(struct file *file)
    {
        struct list_head *lsthead = &file->f_ep_links;
        struct eventpoll *ep;
        struct epitem *epi;

        /*
         * We don't want to get "file->f_ep_lock" because it is not
         * necessary. It is not necessary because we're in the "struct file"
         * cleanup path, and this means that noone is using this file anymore.
         * The only hit might come from ep_free() but by holding the semaphore
         * will correctly serialize the operation. We do need to acquire
         * "ep->sem" after "epmutex" because ep_remove() requires it when called
         * from anywhere but ep_free().
         */
        mutex_lock(&epmutex);

        while (!list_empty(lsthead)) {
            epi = list_entry(lsthead->next, struct epitem, fllink);

            ep = epi->ep;
            ep_list_del(&epi->fllink);
            down_write(&ep->sem);
            ep_remove(ep, epi);
            up_write(&ep->sem);
        }

        mutex_unlock(&epmutex);
    }
    An item's entries are also removed when the item is deleted with the EPOLL_CTL_DEL command of epoll_ctl.
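    From user space, the persistent registration looks like this. Again this is only a sketch of mine with hypothetical helper names (one_cycle, registration_demo), assuming Linux with epoll_create1. The fd is registered once and then reported on two separate write/wait/read cycles without being added again. After EPOLL_CTL_DEL a new write is no longer reported.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* One write / epoll_wait / read cycle. Returns what epoll_wait reported. */
static int one_cycle(int epfd, int rfd, int wfd)
{
    struct epoll_event out;
    char c;

    if (write(wfd, "x", 1) != 1)
        return -1;
    int n = epoll_wait(epfd, &out, 1, 0);   /* timeout 0: do not block */
    if (read(rfd, &c, 1) != 1)              /* drain so cycles are independent */
        return -1;
    return n;
}

/*
 * *before_del: how many of two cycles were reported after a single ADD.
 * *after_del:  what epoll_wait reported for a cycle after EPOLL_CTL_DEL.
 * Returns 0 on success, -1 on a setup error.
 */
static int registration_demo(int *before_del, int *after_del)
{
    int pipefd[2];
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN };

    assert(before_del != NULL && after_del != NULL);
    if (pipe(pipefd) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0) {
            *before_del = 0;
            for (int i = 0; i < 2; i++) {
                if (one_cycle(epfd, pipefd[0], pipefd[1]) == 1)
                    (*before_del)++;
            }
            /* The event argument is ignored for EPOLL_CTL_DEL. */
            if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) == 0) {
                *after_del = one_cycle(epfd, pipefd[0], pipefd[1]);
                rc = 0;
            }
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```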
    6. TODO
    Verify the description above with a kernel debugger.

  • Original article: https://www.cnblogs.com/tsecer/p/10487539.html