5.4. Interaction Between Devices and Kernel


Table of contents: http://www.cnblogs.com/WuCountry/archive/2008/11/15/1333960.html

[Illustrations are not included; readers are advised to download the original book from the web.]

Nearly all devices (including NICs) interact with the kernel in one of two ways:

Polling

Driven on the kernel side. The kernel checks the device status at regular intervals to see if it has anything to say.

Interrupt

Driven on the device side. The device sends a hardware signal (by generating an interrupt) to the kernel when it needs the kernel's attention.
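
The two models can be contrasted with a small user-space sketch. Nothing here is kernel code: `toy_device`, `toy_poll`, and `toy_raise_interrupt` are hypothetical names invented for illustration, and the "hardware" is just a flag in memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy device; not a kernel data structure. */
struct toy_device {
    bool data_ready;                            /* status bit set by the "hardware" */
    void (*on_interrupt)(struct toy_device *);  /* handler installed by the driver */
    int frames_handled;                         /* how many events were serviced */
};

/* Driver work that is the same in both models. */
static void toy_service(struct toy_device *dev)
{
    dev->data_ready = false;
    dev->frames_handled++;
}

/* Polling: the kernel side asks the device whether it has anything to say.
 * Returns true if there was work to do. */
static bool toy_poll(struct toy_device *dev)
{
    if (!dev->data_ready)
        return false;
    toy_service(dev);
    return true;
}

/* Interrupt: the device side signals the event and the installed
 * handler runs immediately, with no periodic check. */
static void toy_raise_interrupt(struct toy_device *dev)
{
    dev->data_ready = true;
    if (dev->on_interrupt != NULL)
        dev->on_interrupt(dev);
}
```

With polling, a call to `toy_poll` on an idle device is wasted work; with interrupts, `toy_service` runs only when the device has raised an event.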

In Chapter 9, you can find a detailed discussion of NIC driver design alternatives as well as software interrupts. You will also see how Linux can use a combination of polling and interrupts to increase performance. In this chapter, we will look only at the interrupt-based case.

I won't go into detail on how interrupts are reported by the hardware, the difference between hardware exceptions and device interrupts, how the driver and bus kernel infrastructures are designed, etc.; you can refer to Linux Device Drivers and Understanding the Linux Kernel for those topics. But I'll give a brief overview of interrupts to help you understand how device drivers initialize and register the devices they are responsible for, with special attention to the networking aspect.

5.4.1. Hardware Interrupts
You do not need to know the low-level background about how hardware interrupts are handled. However, there are details worth mentioning because they can make it easier to understand how NIC device drivers are written, and therefore how they interact with the upper networking layers.

Every interrupt runs a function called an interrupt handler, which must be tailored to the device and therefore is installed by the device driver. Typically, when a device driver registers an NIC, it requests and assigns an IRQ. It then registers and (if the driver is unloaded) unregisters a handler for a given IRQ with the following two architecture-dependent functions. They are defined in kernel/irq/manage.c and are overridden by architecture-specific functions in arch/XXX/kernel/irq.c, where XXX is the architecture-specific directory:

int request_irq(unsigned int irq, void (*handler)(int, void*, struct pt_regs*), unsigned long irqflags, const char * devname, void *dev_id)

This function registers a handler, first making sure that the requested interrupt is a valid one, and that it is not already allocated to another device unless both devices understand shared IRQs (see the later section "Interrupt sharing").

void free_irq(unsigned int irq, void *dev_id)

Given the device identified by dev_id, this function removes the handler and disables the IRQ line if no more devices are registered for that IRQ. Note that to identify the handler, the kernel needs both the IRQ number and the device identifier. This is especially important with shared IRQs, as explained in the later section "Interrupt sharing."

When the kernel receives an interrupt notification, it uses the IRQ number to find the driver's handler and then executes this handler. To find handlers, the kernel stores the associations between IRQ numbers and function handlers in a global table. The association can be either one-to-one or one-to-many, because the Linux kernel allows multiple devices to use the same IRQ, a feature described in the later section "Interrupt sharing."
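
The registration table described above can be modeled in user space as follows. This is only a sketch under stated assumptions: every `toy_` name is hypothetical, the table size of 16 is arbitrary, and the sharing checks discussed in "Interrupt sharing" are deliberately left out. What it does show is why removal needs both the IRQ number and the device identifier.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_IRQS 16  /* assumed table size for this sketch */

typedef void (*toy_handler_t)(int irq, void *dev_id);

struct toy_action {
    toy_handler_t handler;
    void *dev_id;              /* identifies the device the handler serves */
    struct toy_action *next;   /* next handler registered on the same IRQ */
};

/* Global table: one list of handlers per IRQ number. */
static struct toy_action *toy_irq_table[TOY_NR_IRQS];

/* Register a handler; returns 0 on success, -1 on an invalid request. */
static int toy_request_irq(unsigned int irq, toy_handler_t handler, void *dev_id)
{
    struct toy_action *a, **p;

    if (irq >= TOY_NR_IRQS || handler == NULL)
        return -1;
    a = malloc(sizeof(*a));
    if (a == NULL)
        return -1;
    a->handler = handler;
    a->dev_id = dev_id;
    a->next = NULL;
    for (p = &toy_irq_table[irq]; *p != NULL; p = &(*p)->next)
        ;                      /* walk to the tail of the list */
    *p = a;
    return 0;
}

/* Remove the handler identified by the pair (irq, dev_id). */
static void toy_free_irq(unsigned int irq, void *dev_id)
{
    struct toy_action **p;

    if (irq >= TOY_NR_IRQS)
        return;
    for (p = &toy_irq_table[irq]; *p != NULL; p = &(*p)->next) {
        if ((*p)->dev_id == dev_id) {
            struct toy_action *victim = *p;
            *p = victim->next;
            free(victim);
            return;
        }
    }
}

/* Look up the IRQ and run every registered handler; returns the count. */
static int toy_dispatch(unsigned int irq)
{
    struct toy_action *a;
    int n = 0;

    if (irq >= TOY_NR_IRQS)
        return 0;
    for (a = toy_irq_table[irq]; a != NULL; a = a->next) {
        a->handler((int)irq, a->dev_id);
        n++;
    }
    return n;
}

/* Example handler: dev_id points to a per-device event counter. */
static void toy_count_handler(int irq, void *dev_id)
{
    (void)irq;
    (*(int *)dev_id)++;
}
```

Registering `toy_count_handler` twice on the same IRQ with two different counters gives a one-to-many association; `toy_free_irq` with one counter's address then removes only that entry.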

In the following sections, you will see common examples of the information exchanged between devices and drivers by means of interrupts, and how an IRQ can be shared by multiple devices under some conditions.

5.4.1.1. Interrupt types
With an interrupt, an NIC can tell its driver several different things. Among them are:

Reception of a frame

This is the most common and standard situation.

Transmission failure

This kind of notification is generated on Ethernet devices only after a feature called exponential binary backoff has failed (this feature is implemented at the hardware level by the NIC). Note that the driver will not relay this notification to higher network layers; they will come to know about the failure by other means (timer timeouts, negative ACKs, etc.).

DMA transfer has completed successfully

Given a frame to send, the buffer that holds it is released by the driver once the frame has been uploaded into the NIC's memory for transmission on the medium. With synchronous transmissions (no DMA), the driver knows right away when the frame has been uploaded to the NIC. But with DMA, which uses asynchronous transmissions, the device driver needs to wait for an explicit interrupt from the NIC. You can find an example of each case at points where dev_kfree_skb[*] is called within the driver code drivers/net/3c59x.c (DMA) and drivers/net/3c509.c (non-DMA).

[*] Chapter 11 describes this function in detail.

Device has enough memory to handle a new transmission

It is common for an NIC device driver to disable transmissions by stopping the egress queue when that queue does not have sufficient free space to hold a frame of maximum size (e.g., 1,536 bytes for an Ethernet NIC). The queue is then re-enabled when memory becomes available. The rest of this section goes into this case in more detail.

The final case in the previous list covers a sophisticated way of throttling transmissions that can improve efficiency if done properly. In this system, a device driver disables transmissions for lack of queuing space, asks the NIC to issue an interrupt when the available memory is bigger than a given amount (typically the device's Maximum Transmission Unit, or MTU), and then re-enables transmissions when the interrupt comes.

A device driver can also disable the egress queue before a transmission (to prevent the kernel from generating another transmission request on the device), and re-enable it only if there is enough free memory on the NIC; if not, the device asks for an interrupt that allows it to resume transmission at a later time. Here is an example of this logic, taken from the el3_start_xmit routine, which the drivers/net/3c509.c driver installs as its hard_start_xmit function in its net_device structure:

The hard_start_xmit virtual function is described in Chapter 11.

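static int
el3_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    ... ... ...
    /* Stop the egress queue so the kernel submits no further frames. */
    netif_stop_queue(dev);
    ... ... ...
    /* Restart the queue only if the NIC has room for a 1,536-byte frame. */
    if (inw(ioaddr + TX_FREE) > 1536)
        netif_start_queue(dev);
    else
        /* Otherwise, ask the NIC to interrupt once that much memory is free. */
        outw(SetTxThreshold + 1536, ioaddr + EL3_CMD);
    ... ... ...
}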

The driver stops the device queue with netif_stop_queue, thus inhibiting the kernel from submitting further transmission requests. The driver then checks whether the device's memory has enough free space for a packet of 1,536 bytes. If so, the driver starts the queue to allow the kernel once again to submit transmission requests; otherwise, it instructs the device (by writing to a configuration register with an outw call) to generate an interrupt when that condition is met. An interrupt handler will then re-enable the device queue with netif_start_queue so that the kernel can restart transmissions.
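
The stop, check, and re-enable sequence can be traced with a user-space model. The sketch below is not the 3c509 driver: `toy_txnic` and its fields are hypothetical stand-ins for the NIC's free-memory counter and threshold register, and the "interrupt" is simulated by an ordinary function call.

```c
#include <stdbool.h>

#define TOY_MAX_FRAME 1536u  /* same threshold as in the driver excerpt */

/* Hypothetical model of the transmit side of an NIC. */
struct toy_txnic {
    unsigned int tx_free;       /* free bytes in the NIC's transmit memory */
    unsigned int tx_threshold;  /* nonzero: interrupt requested above this level */
    bool queue_stopped;         /* state of the egress queue */
};

/* Models the transmit routine: stop the queue, upload the frame, then
 * either restart the queue or arm the threshold interrupt.
 * Returns 0 on success, -1 if the frame does not fit. */
static int toy_start_xmit(struct toy_txnic *nic, unsigned int frame_len)
{
    if (frame_len > nic->tx_free)
        return -1;
    nic->queue_stopped = true;                /* like netif_stop_queue */
    nic->tx_free -= frame_len;                /* frame copied to the NIC */
    if (nic->tx_free > TOY_MAX_FRAME)
        nic->queue_stopped = false;           /* like netif_start_queue */
    else
        nic->tx_threshold = TOY_MAX_FRAME;    /* like writing SetTxThreshold */
    return 0;
}

/* Models the NIC draining its memory. When the armed threshold is
 * exceeded, the simulated interrupt handler restarts the queue. */
static void toy_tx_space_freed(struct toy_txnic *nic, unsigned int bytes)
{
    nic->tx_free += bytes;
    if (nic->tx_threshold != 0 && nic->tx_free > nic->tx_threshold) {
        nic->tx_threshold = 0;
        nic->queue_stopped = false;
    }
}
```

Starting with 4,000 free bytes, a first 1,500-byte frame leaves the queue running, while a second one leaves only 1,000 bytes free, so the queue stays stopped until enough memory drains to exceed the threshold.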

The netif_xxx_queue routines are described in the section "Enabling and Disabling Transmissions" in Chapter 11.

5.4.1.2. Interrupt sharing
IRQ lines are a limited resource. A simple way to increase the number of devices a system can host is to allow multiple devices to share a common IRQ. Normally, each driver registers its own handler with the kernel for that IRQ. Instead of having the kernel receive the interrupt notification, find the right device, and invoke its handler, the kernel simply invokes all the handlers of those devices that registered for the same shared IRQ. It is up to the handlers to filter spurious invocations, such as by reading a register on their devices.

For a group of devices to share an IRQ line, all of them must have device drivers capable of handling shared IRQs. In other words, each time a device registers for an IRQ line, it needs to explicitly say whether it supports interrupt sharing. For example, the first device that registers for one IRQ, saying something like "assign me IRQ n and use this routine fn as the handler," must also specify whether it is willing to share the IRQ with other devices. When another device driver tries to register the same IRQ number, it is refused if either it, or the driver to which the IRQ is currently assigned, is incapable of sharing IRQs.
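
Both sharing rules, the registration check and the filtering of spurious invocations, can be sketched in a few lines. This is an illustration only: `TOY_SHIRQ` is a hypothetical stand-in for the kernel's sharing flag, and `toy_shared_dev` models a device whose status register tells the handler whether the interrupt was really its own.

```c
#include <stdbool.h>

#define TOY_SHIRQ 0x1u  /* hypothetical flag: "this driver can share its IRQ" */

/* Hypothetical device with a simulated interrupt status register. */
struct toy_shared_dev {
    unsigned int status;  /* nonzero when this device raised the interrupt */
    int serviced;         /* number of interrupts actually handled */
};

/* Registration rule: a free line can always be taken; a line in use can
 * be joined only if both the current owner and the newcomer allow sharing. */
static bool toy_may_register(bool line_in_use,
                             unsigned int owner_flags,
                             unsigned int new_flags)
{
    if (!line_in_use)
        return true;
    return (owner_flags & TOY_SHIRQ) != 0 && (new_flags & TOY_SHIRQ) != 0;
}

/* Handler on a shared line: every handler is invoked, so each one reads
 * its own status register and ignores interrupts raised by other devices.
 * Returns true if the interrupt belonged to this device. */
static bool toy_shared_handler(struct toy_shared_dev *dev)
{
    if (dev->status == 0)
        return false;     /* spurious invocation for this device */
    dev->status = 0;      /* acknowledge the event */
    dev->serviced++;
    return true;
}
```

The design pushes the "whose interrupt is this?" question onto the drivers, which is what lets the kernel side stay a simple loop over all registered handlers.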

5.4.1.3. Organization of IRQs to handler mappings
The mapping of IRQs to handlers is stored in a vector of lists, one list of handlers for each IRQ (see Figure 5-2). A list includes more than one element only when multiple devices share the same IRQ. The size of the vector (i.e., the number of possible IRQ numbers) is architecture dependent and can vary from 15 (on an x86) to more than 200. With the introduction of interrupt sharing, even more devices can be supported on a system at once.

The section "Hardware Interrupts" already introduced the two functions provided by the kernel to register and unregister a handler, respectively. Let's now see the data structure used to store the mappings.

Mappings are defined with irqaction data structures. The request_irq function introduced in the earlier section "Hardware Interrupts" is a wrapper around setup_irq, which takes an irqaction structure as input and inserts it into the global irq_desc vector. irq_desc is defined in kernel/irq/handle.c and can be overridden in the per-architecture files arch/XXX/kernel/irq.c. setup_irq is defined in kernel/irq/manage.c and can be overridden in the per-architecture files arch/XXX/kernel/irq.c.

The kernel function that handles interrupts and passes them to drivers is architecture dependent. It is called handle_IRQ_event on most architectures.

Figure 5-2 shows how irqaction instances are stored: there is an instance of irq_desc for each possible IRQ and an instance of irqaction for each successfully registered IRQ handler. The vector of irq_desc instances is called irq_desc as well, and its size is given by the architecture-dependent symbol NR_IRQS.

Note that when you have more than one irqaction instance for a given IRQ number (that is, for a given element of the irq_desc vector), interrupt sharing is required (each structure must have the SA_SHIRQ flag set).

Figure 5-2. Organization of IRQ handlers

Let's now see what information about IRQ handlers is stored in the fields of an irqaction data structure:

void (*handler)(int irq, void *dev_id, struct pt_regs *regs)

Function provided by the device driver to handle notifications of interrupts: whenever the kernel receives an interrupt on line irq, it invokes handler. Here are the function's input parameters:

int irq

IRQ number that generated the notification. Most of the time it is not used by the NICs' device drivers to accomplish their job; the device ID is sufficient.

void *dev_id

Device identifier. The same driver can be responsible for different devices at the same time, so it needs the device ID to process the notification correctly.

struct pt_regs *regs

Structure used to save the content of the processor's registers at the moment the interrupt interrupted the current process. It is normally not used by the interrupt handler.

unsigned long flags

Set of flags. The possible values SA_XXX are defined in include/asm-XXX/signal.h. Here are the main ones from the x86 architecture file:

SA_SHIRQ

When set, the device driver can handle shared IRQs.

SA_SAMPLE_RANDOM

When set, the device is making itself available as a source of random events. This can be useful to help the kernel generate random numbers for internal use, and is called contributing to system entropy. This is further described in the later section "Initializing the Device Handling Layer: net_dev_init."

SA_INTERRUPT

When set, the handler runs with interrupts disabled on the local processor. This should be specified only for handlers that can get done very quickly. See one of the handle_IRQ_event instances for an example (for instance, kernel/irq/handle.c).

There are other values, but they are either obsolete or used only by particular architectures.

void *dev_id

Pointer to the net_device data structure associated with the device. The reason it is declared void * is that NICs are not the only devices to use IRQs. Because various device types use different data structures to identify and represent device instances, a generic type declaration is used.

struct irqaction *next

All the devices sharing the same IRQ number are linked together in a list with this pointer.

const char *name

Device name. You can read it by dumping the contents of /proc/interrupts.
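
The fields listed above can be gathered into a C declaration to make the layout concrete. This is a sketch for illustration, not the kernel's definition: `toy_irqaction` contains only the fields discussed in this section, `TOY_SA_SHIRQ` uses a made-up value, and the two helper functions are hypothetical.

```c
#include <stddef.h>

struct pt_regs;  /* opaque here; only a pointer to it is passed around */

#define TOY_SA_SHIRQ 0x1UL  /* made-up value standing in for SA_SHIRQ */

/* Sketch of the per-handler record, limited to the fields described above. */
struct toy_irqaction {
    void (*handler)(int irq, void *dev_id, struct pt_regs *regs);
    unsigned long flags;         /* SA_XXX-style flags */
    void *dev_id;                /* device identifier, generic on purpose */
    struct toy_irqaction *next;  /* next handler sharing the same IRQ */
    const char *name;            /* device name */
};

/* Number of handlers chained on one IRQ. */
static size_t toy_chain_length(const struct toy_irqaction *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* A chain with more than one entry is consistent only if every entry
 * has the sharing flag set. Returns 1 if consistent, 0 otherwise. */
static int toy_chain_is_valid(const struct toy_irqaction *head)
{
    const struct toy_irqaction *a;

    if (toy_chain_length(head) <= 1)
        return 1;
    for (a = head; a != NULL; a = a->next)
        if ((a->flags & TOY_SA_SHIRQ) == 0)
            return 0;
    return 1;
}
```

A single handler on an IRQ is always consistent, whatever its flags; as soon as a second record is chained through next, both must carry the sharing flag.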

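================================
/\_/\
(=^o^=) Wu.Country@侠缘
(~)@(~) A lifetime devoted to doing one thing well!
--------------------------------
Learning without thinking leads to confusion; thinking without learning leads to idleness!
================================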

Original post: https://www.cnblogs.com/WuCountry/p/1382106.html