  • The problem with POSIX semaphores (a must-read when using semaphores for inter-process mutual exclusion)

    Summary

    It has been a number of years since I've used named semaphores. The last time may well have been back in my OS/2 days. But I recently needed to coordinate some work between several applications running in a Linux environment, and named semaphores looked like the right solution.

    Or so I thought...

    Here is what I discovered, which hopefully explains why POSIX named semaphores as implemented in Linux are not working for me:

    IPC

    I was a bit confused when I started looking into inter-process communication on Linux systems. It took me a while to understand that there are two different IPC systems: the traditional SysV-based IPC, and the newer POSIX-based IPC.

    The SysV semaphores are in #include <sys/sem.h>. This includes:

    1. int semctl(int, int, int, ...);
    2. int semget(key_t, int, int);
    3. int semop(int, struct sembuf *, size_t);

    Meanwhile, the new POSIX semaphores are in #include <semaphore.h>. A partial list of this API includes:

    1. int sem_close(sem_t *);
    2. sem_t *sem_open(const char *, int, ...);
    3. int sem_post(sem_t *);
    4. int sem_wait(sem_t *);

    Of particular note: the SysV IPC command-line tools such as ipcmk, ipcs and ipcrm do not work with POSIX semaphores, though that is of minimal importance when writing self-contained C/C++ applications.

    Using semaphores

    Opening a POSIX named semaphore is very simple. Note the following source code extract:
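
    As a minimal sketch of such an open call (the helper name open_named_sem, the initial count of 1, and the owner-only permissions are assumptions chosen for illustration, not taken from the original code):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>     /* assert */
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>  /* sem_open, SEM_FAILED */
#include <stdio.h>      /* perror */
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */

/* Hypothetical helper: open the named semaphore, creating it with an
 * initial count of 1 if it does not exist yet. Returns NULL on failure. */
sem_t *open_named_sem(const char *name)
{
    assert(name != NULL);

    /* The mode and initial value are only used when the semaphore is
     * actually created; an existing semaphore keeps its current count. */
    sem_t *sem = sem_open(name, O_CREAT, S_IRUSR | S_IWUSR, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return NULL;
    }
    return sem;
}
```

    The name should start with a slash, for example "/my_app_sem" (a made-up name). On older glibc versions the program may need to be linked with -pthread.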

    To wait on the semaphore, you'd call sem_wait( sem ), and to release it, sem_post( sem ).

    The problem

    In a simple scenario, all of this works very well. Client applications open/create the semaphore, call sem_wait() when it is needed, and then sem_post() when finished. But in non-trivial, embedded and/or commercial software, where you have to be prepared for and recover from external signals, there is a problem. There are at least two signals that cannot be caught: SIGKILL and SIGSTOP.

    Unfortunately, if a client application receives one of these signals between the call to sem_wait() and sem_post(), the semaphore is now unusable. Or, at the very least, you're leaking one count every time this happens. If your initial semaphore count is 1, then a single instance of someone sending SIGKILL to a client application will starve all other clients waiting for that semaphore.
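
    The lost count can be reproduced with a small experiment. The sketch below is only an illustration under assumptions: the function name count_after_kill is hypothetical, and the pipe is just a convenient way for the child to say "I now hold the semaphore". A child takes the semaphore and is then killed with SIGKILL; the parent reads the count afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the semaphore count the parent sees after a
 * child holding the semaphore has been killed, or -1 if setup failed. */
int count_after_kill(const char *name)
{
    int ready[2];
    int value = -1;
    char byte;

    assert(name != NULL);
    sem_unlink(name);  /* ignore errors: the name may simply not exist */
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, S_IRUSR | S_IWUSR, 1);
    if (sem == SEM_FAILED)
        return -1;
    if (pipe(ready) == -1) {
        sem_close(sem);
        sem_unlink(name);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: take the semaphore (count 1 -> 0), tell the parent,
         * then sleep forever. sem_post() is never reached. */
        close(ready[0]);
        if (sem_wait(sem) == -1 || write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    close(ready[1]);
    if (pid > 0) {
        /* Parent: wait until the child holds the semaphore, kill it,
         * reap it, and only then look at the count. */
        int held = (read(ready[0], &byte, 1) == 1);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        if (held && sem_getvalue(sem, &value) == -1)
            value = -1;
    }
    close(ready[0]);
    sem_close(sem);
    sem_unlink(name);
    return value;
}
```

    On Linux the returned count is 0 rather than 1: nothing posts the semaphore back when the child dies, so any other client calling sem_wait() at that point would block indefinitely.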

    What I was expecting was either for the semaphore to be auto-posted back when the application gets cleaned up -- somewhat like open files get automatically closed and memory freed -- or for an additional sem_*() API that a watcher could query to determine when this situation has occurred. If there were a reliable way to determine that an application had been killed between the two calls, the watcher itself could decide to call sem_post() and "recover" the lost semaphore.

    Note that the old SysV-style semaphores do have a way to clean up after themselves to prevent this type of problem. The SEM_UNDO flag can be passed to semop(), which causes that process's changes to the semaphore to be reverted when the process terminates, including abnormal termination. Sadly, this is specific to SysV and does not apply to POSIX semaphores.
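
    For comparison, here is the same kind of experiment with a SysV semaphore. This is a sketch under assumptions: the function name sysv_value_after_kill is hypothetical, the code targets Linux (where the caller has to define union semun itself), and a private semaphore set is used so that no key is needed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller must define union semun (see semctl(2)). */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Hypothetical demo: a child decrements a semaphore that starts at 1,
 * passing sem_flg (0 or SEM_UNDO) to semop(), and is then killed.
 * Returns the semaphore value seen afterwards, or -1 on setup failure. */
int sysv_value_after_kill(int sem_flg)
{
    int ready[2];
    int value = -1;
    char byte;
    union semun arg = { .val = 1 };

    assert(sem_flg == 0 || sem_flg == SEM_UNDO);
    int semid = semget(IPC_PRIVATE, 1, 0600);
    if (semid == -1)
        return -1;
    if (semctl(semid, 0, SETVAL, arg) == -1 || pipe(ready) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        struct sembuf take = { .sem_num = 0, .sem_op = -1,
                               .sem_flg = (short)sem_flg };
        close(ready[0]);
        if (semop(semid, &take, 1) == -1 || write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();            /* never releases the semaphore itself */
    }

    close(ready[1]);
    if (pid > 0) {
        int held = (read(ready[0], &byte, 1) == 1);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);  /* the kernel applies undo at process exit */
        if (held)
            value = semctl(semid, 0, GETVAL);
    }
    close(ready[0]);
    semctl(semid, 0, IPC_RMID); /* remove the semaphore set */
    return value;
}
```

    Called with SEM_UNDO, the value is back at 1 after the child is killed; called with 0, it stays at 0, which mirrors the POSIX behaviour described above.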

    Alternatives

    Depending on how the semaphore is used, there may be alternatives to POSIX semaphores:

    • SysV semaphores (though this feels like a step backwards...!)
    • other resources which are automatically cleaned up when a process is terminated:
      • sockets bound to a particular port
      • file locks
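
    The socket variant can be sketched roughly as follows. The function name port_mutex_try_lock, the choice of a UDP socket, and binding to the loopback address are all assumptions made for this illustration. Holding the bound socket means holding the "mutex"; the kernel closes the socket when the process dies, however it dies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to take the "mutex" identified by a port
 * number. Returns a socket descriptor on success (close() it to release
 * the mutex), or -1 if the port is already bound or on any other error. */
int port_mutex_try_lock(unsigned short port)
{
    struct sockaddr_in addr;

    assert(port != 0);  /* port 0 would let the kernel pick any free port */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* SO_REUSEADDR and SO_REUSEPORT are deliberately not set, so a
     * second bind to the same address and port fails with EADDRINUSE. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

    This is a try-lock only: a caller that needs to wait has to poll. Any unrelated program that happens to use the same port would also look like a lock holder.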

    Without adding much more complexity, the socket and file solutions only work when the semaphore is boolean, used like a "named mutex". The file lock solution proved to be adequate for what I was working on. This alternative solution caused me to replace all of my calls to sem_wait() and sem_post() with calls to lockf(...).

    fd = open( filename,
        O_RDWR      |   // open the file for both read and write access
        O_CREAT     |   // create file if it does not already exist
        O_CLOEXEC   ,   // close on execute
        S_IRUSR     |   // user permission: read
        S_IWUSR     );  // user permission: write

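    // F_TLOCK does not block: it returns -1 (errno EACCES or EAGAIN) if
    // another process holds the lock; use F_LOCK to block like sem_wait()
    lockf( fd, F_TLOCK, 0 ); // lock the "semaphore"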

    // do some work here

    // note that close() automatically releases the file lock
    // so technically the call with F_ULOCK is not necessary
    lockf( fd, F_ULOCK, 0 );

    close( fd ); 

    This is still not a perfect solution, since the lock can be circumvented by manually deleting the lock file. But in my case, an uncaught signal was both more likely and more devastating than someone removing the lock file.

    Remember to check return values for errors, especially after open() and the first lockf(), to ensure you really have waited on the "semaphore".
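
    A version of the file lock approach with the return values checked might look like the sketch below. The helper names named_mutex_lock and named_mutex_unlock are hypothetical, and the sketch uses the blocking F_LOCK instead of F_TLOCK so that it waits the way sem_wait() does; that is a choice made for this example, not necessarily what the original code did.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: block until the lock on `path` is acquired.
 * Returns the open file descriptor that holds the lock, or -1 on error. */
int named_mutex_lock(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    /* F_LOCK blocks until the lock is free; retry if a signal handler
     * interrupted the wait, give up on any other error. */
    while (lockf(fd, F_LOCK, 0) == -1) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Hypothetical helper: release the lock and close the descriptor.
 * Returns 0 on success, -1 if either step reported an error. */
int named_mutex_unlock(int fd)
{
    int rc = lockf(fd, F_ULOCK, 0);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

    A caller would keep the returned descriptor for the duration of the critical section and pass it to named_mutex_unlock() afterwards. If the process is killed in between, the kernel closes the descriptor and the lock is released with it.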
  • Original article: https://www.cnblogs.com/super119/p/3092267.html