Use Reentrant Functions for Safer Signal Handling
How and when to employ reentrancy to keep your code bug free
Dipak Jha (dipakjha@in.ibm.com), Software Engineer, IBM
Date: 20 Jan 2005
Summary: If you deal with concurrent access to functions, whether by threads or by processes, you can run into problems caused by non-reentrant functions. In this article, learn through code samples how anomalies can result if reentrancy is not ensured, especially with regard to signals. Five recommended programming practices are included, along with a discussion of a proposed compiler model in which the compiler front end deals with reentrancy.
In the early days of programming, non-reentrancy was not a threat to programmers; functions did not have concurrent access and there were no interrupts. In many older implementations of the C language, functions were expected to work in an environment of single-threaded processes.
Now, however, concurrent programming is common practice, and you need to be aware of the pitfalls. This article describes some potential problems caused by non-reentrant functions in parallel and concurrent programming. Signal generation and handling in particular add extra complexity. Because signals are asynchronous, it is difficult to track down a bug that occurs when a signal handler calls a non-reentrant function.
This article:
- Defines reentrancy and refers to the POSIX listing of reentrant functions
- Provides examples to show problems caused by non-reentrancy
- Suggests ways to ensure reentrancy of the underlying function
- Discusses dealing with reentrancy at the compiler level
What is reentrancy?
A reentrant function is one that can be used by more than one task concurrently without fear of data corruption. Conversely, a non-reentrant function is one that cannot be shared by more than one task unless mutual exclusion to the function is ensured, either by using a semaphore or by disabling interrupts during critical sections of code. A reentrant function can be interrupted at any time and resumed at a later time without loss of data. Reentrant functions either use local variables or protect their data when global variables are used.
A reentrant function:
- Does not hold static data over successive calls
- Does not return a pointer to static data; all data is provided by the caller of the function
- Uses local data or ensures protection of global data by making a local copy of it
- Must not call any non-reentrant functions
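The difference can be sketched with two versions of a swap routine. This is a minimal illustration rather than code from the article: the names swap_shared, swap_local, and scratch are hypothetical.

```c
/* Non-reentrant: the scratch variable is shared by every caller, so a
   signal handler (or another task) that calls swap_shared() in the middle
   of an interrupted call overwrites the saved value. */
static int scratch;

void swap_shared(int *x, int *y)
{
    scratch = *x;
    *x = *y;
    /* An interrupt here that also calls swap_shared() changes scratch. */
    *y = scratch;
}

/* Reentrant: all state lives in the caller's objects and in a local
   variable on this call's own stack frame. */
void swap_local(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}
```

Both versions give the same result when nothing interrupts them; only swap_local keeps that guarantee when a second call begins before the first has finished.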
Don't confuse reentrancy with thread-safety. From the programmer's perspective, these are two separate concepts: a function can be reentrant, thread-safe, both, or neither. A non-reentrant function generally cannot be used safely by multiple threads unless access to it is serialized. Moreover, it may be impossible to make a non-reentrant function thread-safe.
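To make the distinction concrete, here is a hedged sketch of a function that is thread-safe but not reentrant, next to one that is reentrant but thread-safe only when each caller supplies its own state. The names next_id, next_id_r, id_lock, and last_id are hypothetical, and the spin lock built on C11 atomic_flag is just one possible locking choice.

```c
#include <stdatomic.h>

static atomic_flag id_lock = ATOMIC_FLAG_INIT;
static int last_id = 0;

/* Thread-safe: the spin lock serializes access to last_id.
   Not reentrant: if a signal handler calls next_id() while the
   interrupted code holds id_lock, the handler spins forever. */
int next_id(void)
{
    int id;

    while (atomic_flag_test_and_set(&id_lock))
        ;   /* busy-wait until the lock is free */
    id = ++last_id;
    atomic_flag_clear(&id_lock);
    return id;
}

/* Reentrant: no hidden state. It is thread-safe only as long as
   different threads do not share the same counter object. */
int next_id_r(int *counter)
{
    return ++*counter;
}
```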
IEEE Std 1003.1 lists 118 reentrant UNIX® functions, which aren't duplicated here. See Resources for a link to the list at unix.org.
The rest of the functions are non-reentrant for any of the following reasons:
- They call malloc or free
- They are known to use static data structures
- They are part of the standard I/O library (many implementations of which use global data structures)
Signals and non-reentrant functions
A signal is a software interrupt. It lets a programmer handle an asynchronous event. To send a signal to a process, the kernel sets a bit in the signal field of the process table entry, corresponding to the type of signal received. The ANSI C prototype of the signal function is:
    void (*signal(int sigNum, void (*sigHandler)(int)))(int);
Or, in another representation:
    typedef void sigHandler(int);
    sigHandler *signal(int, sigHandler *);
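The return type in the prototype is a pointer to the handler that was previously installed. The following sketch shows this by installing the same handler twice and comparing the second return value with the handler; the names on_usr1 and signal_returns_previous are hypothetical, and SIGUSR1 is assumed to be available (it is a POSIX signal, not an ANSI C one).

```c
#include <signal.h>

static void on_usr1(int signum)
{
    (void)signum;   /* nothing to do; the handler exists only to be installed */
}

/* Installs on_usr1 for SIGUSR1 twice and reports whether the second call
   returned the handler installed by the first. Returns 1 if so, 0 if not,
   and -1 if signal() failed. */
int signal_returns_previous(void)
{
    void (*previous)(int);

    if (signal(SIGUSR1, on_usr1) == SIG_ERR)
        return -1;
    previous = signal(SIGUSR1, on_usr1);
    if (previous == SIG_ERR)
        return -1;
    return previous == on_usr1;
}
```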
When a process handles a signal that it catches, the normal sequence of instructions being executed by the process is temporarily interrupted by the signal handler. The process then continues executing, but the instructions in the signal handler are now executed. If the signal handler returns, the process continues executing the normal sequence of instructions it was executing when the signal was caught.
Now, in the signal handler you can't tell what the process was executing when the signal was caught. What if the process was in the middle of allocating additional memory on its heap using malloc, and you call malloc from the signal handler? Or what if the process was in the middle of a function that manipulates a global data structure, and you call the same function from the signal handler? In the case of malloc, havoc can result for the process, because malloc usually maintains a linked list of all its allocated areas and it may have been in the middle of changing this list.
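One common way to avoid calling malloc from a handler is to let the handler record only that the signal arrived and to do the allocation later from normal program flow. The sketch below assumes this deferred-work pattern; the names work_pending, on_alarm, install_alarm_handler, and service_pending_signal are hypothetical.

```c
#include <signal.h>
#include <stdlib.h>

/* Set by the handler, cleared by normal program flow. */
static volatile sig_atomic_t work_pending = 0;

/* The handler only records that the signal arrived. */
static void on_alarm(int signum)
{
    (void)signum;
    work_pending = 1;
}

int install_alarm_handler(void)
{
    return signal(SIGALRM, on_alarm) == SIG_ERR ? -1 : 0;
}

/* Called from normal program flow, never from the handler. Returns a
   newly allocated buffer if a signal was recorded, otherwise NULL.
   The caller owns the buffer and must free() it. */
char *service_pending_signal(size_t size)
{
    if (!work_pending)
        return NULL;
    work_pending = 0;
    return malloc(size);   /* safe here: we are not inside a handler */
}
```

A main loop would call service_pending_signal at a point where it is known not to be inside malloc, for example once per iteration.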
An interrupt can even be delivered between the beginning and end of a C operator that requires multiple instructions. At the programmer level, the operation may appear atomic (that is, it cannot be divided into smaller operations), but it might actually take more than one processor instruction to complete. For example, take this piece of C code:
    temp += 1;
On an x86 processor, that statement might compile to:
    mov ax,[temp]
    inc ax
    mov [temp],ax
This is clearly not an atomic operation.
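Where a C11 compiler is available, one way to make such an increment indivisible is to use the atomic types from <stdatomic.h>. This is a sketch under that assumption, not something the article prescribes; the names ticks, count_tick, read_ticks, on_usr1, and install_tick_handler are hypothetical.

```c
#include <signal.h>
#include <stdatomic.h>

static atomic_int ticks = 0;

/* atomic_fetch_add performs the read-modify-write as one indivisible
   operation, so a handler that interrupts count_tick() cannot cause a
   lost update. */
static void on_usr1(int signum)
{
    (void)signum;
    atomic_fetch_add(&ticks, 1);
}

void count_tick(void)
{
    atomic_fetch_add(&ticks, 1);
}

int read_ticks(void)
{
    return atomic_load(&ticks);
}

int install_tick_handler(void)
{
    /* Atomic objects are only guaranteed usable in handlers when they
       are lock-free, so refuse to install the handler otherwise. */
    if (!atomic_is_lock_free(&ticks))
        return -1;
    return signal(SIGUSR1, on_usr1) == SIG_ERR ? -1 : 0;
}
```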
This example shows what can happen if a signal handler runs in the middle of modifying a variable:
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>   /* alarm */

    struct two_int { int a, b; } data;

    void signal_handler(int signum) {
        printf("%d, %d\n", data.a, data.b);
        alarm(1);   /* re-arm the one-second timer */
    }

    int main(void) {
        static struct two_int zeros = { 0, 0 }, ones = { 1, 1 };

        signal(SIGALRM, signal_handler);

        data = zeros;

        alarm(1);

        while (1) {
            data = zeros;
            data = ones;
        }
    }
This program fills data with zeros, ones, zeros, ones, and so on, alternating forever. Meanwhile, once per second, the alarm signal handler prints the current contents. (Calling printf in the handler is safe in this program, because it is certainly not being called outside the handler when the signal happens.) What output do you expect from this program? It should print either 0, 0 or 1, 1. But the actual output is as follows:
    0, 0
    1, 1
    (Skipping some output...)
    0, 1
    1, 1
    1, 0
    1, 0
    ...
On most machines, it takes several instructions to store a new value in data, and the value is stored one word at a time. If the signal is delivered between these instructions, the handler might find that data.a is 0 and data.b is 1, or vice versa. On the other hand, if we compile and run this code on a machine where it is possible to store an object's value in one instruction that cannot be interrupted, then the handler will always print 0, 0 or 1, 1.
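One way to get the single-store behavior portably is to make the value that changes fit in a sig_atomic_t. In the sketch below the two patterns are constant and only an index is switched, so the handler can never observe a half-updated pair. The names patterns, current, select_pattern, install_pattern_handler, last_seen_a, and last_seen_b are hypothetical. The C standard is stricter about what a handler may read than common platforms are, so treat this as an illustration.

```c
#include <signal.h>

struct two_int { int a, b; };

/* The two possible values never change after initialization. */
static const struct two_int patterns[2] = { { 0, 0 }, { 1, 1 } };

/* Index of the pattern currently in effect. A sig_atomic_t can be read
   or written without being interrupted partway through. */
static volatile sig_atomic_t current = 0;

/* What the handler observed the last time it ran. */
static volatile sig_atomic_t seen_a = -1, seen_b = -1;

static void on_alarm(int signum)
{
    sig_atomic_t i = current;      /* one indivisible read */

    (void)signum;
    seen_a = patterns[i].a;
    seen_b = patterns[i].b;
}

void select_pattern(int use_ones)
{
    current = (use_ones != 0);     /* one indivisible write */
}

int install_pattern_handler(void)
{
    return signal(SIGALRM, on_alarm) == SIG_ERR ? -1 : 0;
}

int last_seen_a(void) { return seen_a; }
int last_seen_b(void) { return seen_b; }
```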
Another complication with signals is that you can't be sure your code is free of signal bugs just by running test cases. This complication is due to the asynchronous nature of signal generation.
Non-reentrant functions and static variables
Suppose that the signal handler uses gethostbyname, which is non-reentrant. This function returns its value in a static object:
    static struct hostent host; /* result stored here */
It reuses the same object each time. In the following example, if the signal happens to arrive during a call to gethostbyname in main, or even after a call while the program is still using the value, it will clobber the value that the program asked for.
    #include <netdb.h>
    #include <signal.h>

    void sig_handler(int signum);

    int main(void) {
        struct hostent *hostPtr;
        /* ... */
        signal(SIGALRM, sig_handler);
        /* ... */
        hostPtr = gethostbyname(hostNameOne);
        /* ... */
    }

    void sig_handler(int signum) {
        struct hostent *hostPtr;
        /* ... */
        /* This call to gethostbyname may clobber the value stored
           during the call inside main() */
        hostPtr = gethostbyname(hostNameTwo);
        /* ... */
    }
However, if the program does not use gethostbyname or any other function that returns information in the same object, or if it always blocks signals around each use, you're safe.
Many library functions return values in a fixed object, always reusing the same object, and they can all cause the same problem. If a function uses and modifies an object that you supply, it is potentially non-reentrant; two calls can interfere if they use the same object.
A similar case arises when you do I/O using streams. Suppose the signal handler prints a message with fprintf and the program was in the middle of an fprintf call using the same stream when the signal was delivered. Both the signal handler's message and the program's data could be corrupted, because both calls operate on the same data structure: the stream itself.
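A handler that needs to emit a message can bypass the stream and call write directly, since POSIX lists write among the functions that are safe to call from a signal handler. The following is a minimal sketch under that assumption; signal_safe_puts and report_alarm are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg to fd with write() instead of going through a stdio stream.
   The length is computed by hand so that the function depends on nothing
   but write(). Returns the number of bytes written, or -1 on error. */
ssize_t signal_safe_puts(int fd, const char *msg)
{
    size_t len = 0;

    while (msg[len] != '\0')
        len++;
    return write(fd, msg, len);
}

/* Example handler: reports the signal on standard error without stdio. */
void report_alarm(int signum)
{
    int saved_errno = errno;   /* write() may change errno */

    (void)signum;
    signal_safe_puts(STDERR_FILENO, "alarm received\n");
    errno = saved_errno;
}
```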
Things become even more complicated when you're using a third-party library, because you may not know which parts of the library are reentrant and which are not. As with the standard library, there can be many library functions that return values in fixed objects, always reusing the same objects, which causes the functions to be non-reentrant.
The good news is, these days many vendors have taken the initiative to provide reentrant versions of the standard C library. You'll need to go through the documentation provided with any given library to know if there is any change in the prototypes and therefore in the usage of the standard library functions.
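A familiar example of such a prototype change is strtok_r, the POSIX counterpart of strtok: it takes an extra argument in which the caller holds the scan position that strtok keeps in hidden static storage. The sketch below assumes a POSIX system; count_words is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Counts space-separated words in line, which is modified in place.
   The scan position lives in the caller-side variable save, so an
   interrupted call is not disturbed by another tokenizing call made
   on a different string. */
int count_words(char *line)
{
    char *save = NULL;
    char *tok;
    int count = 0;

    for (tok = strtok_r(line, " ", &save); tok != NULL;
         tok = strtok_r(NULL, " ", &save))
        count++;
    return count;
}
```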
Practices to ensure reentrancy
Sticking to these five best practices will help you maintain reentrancy in your programs.
Practice 1
Returning a pointer to static data may cause a function to be non-reentrant. For example, a strToUpper function, converting a string to uppercase, could be implemented as follows:
1 char *strToUpper(char *str) 2 { 3 /*Returning pointer to static data makes it non-reentrant */ 4 static char buffer[STRING_SIZE_LIMIT]; 5 int index; 6 7 for (index = 0; str[index]; index++) 8 buffer[index] = toupper(str[index]); 9 buffer[index] = '