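InnoDB implements two widely used synchronization primitives: mutex_t (exclusive) and rw_lock_t (shared for reads, exclusive for writes). Both are InnoDB's own implementations, tuned for the performance requirements of a database. Because concurrency is one of InnoDB's main selling points, these two primitives play a very important role across the code base (mutex_t in particular runs through almost the whole system). Before introducing them, we first need to look at an underlying module, os_event. It implements a basic event signal/wait mechanism, and both mutex_t and rw_lock_t rely on os_event for their wake-up notifications.
note: InnoDB likes to name the modules that wrap system calls os_xxxxx.
First, the signal/wait flow of os_event: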
thread A calls os_event_reset(event_1) [start listening for the event]
thread B calls os_event_set(event_1) [signal the event]
thread A calls os_event_wait(event_1) [wait for the event]
thread A's wait completes
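1. Once thread A has called os_event_reset(), it is already listening for event_1; it does not start listening only when it calls wait. In other words, A also sees a signal sent between its reset and its wait (the implementation code further down shows how).
2. os_event_set notifies in thundering-herd fashion (it calls pthread_cond_broadcast) and wakes every waiter. This certainly adds CPU overhead, but it meets the needs of rw_lock_t. Here is the relevant explanation from the pthread manual: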
The pthread_cond_broadcast() function is used whenever the shared-variable state has been changed in a way that more than one thread can proceed with its task. Consider a single producer/multiple consumer problem, where the producer can insert multiple items on a list that is accessed one item at a time by the consumers. By calling the pthread_cond_broadcast() function, the producer would notify all consumers that might be waiting, and thereby the application would receive more throughput on a multi-processor. In addition, pthread_cond_broadcast() makes it easier to implement a read-write lock. The pthread_cond_broadcast() function is needed in order to wake up all waiting readers when a writer releases its lock. Finally, the two-phase commit algorithm can use this broadcast function to notify all clients of an impending transaction commit.
3. os_event_wait is the usual combination of a pthread_mutex and a pthread_cond; there are plenty of write-ups of this pattern online.
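To make point 2 concrete, here is a minimal sketch of the broadcast behaviour using plain pthreads rather than InnoDB's wrappers. The names (waiter, broadcast_demo, N_WAITERS, ready, woken) are made up for this illustration and do not appear in InnoDB. Several threads block on one condition variable, and a single pthread_cond_broadcast releases all of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_WAITERS 4   /* hypothetical number of waiting threads */

static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;   /* predicate, protected by mtx */
static int woken = 0;   /* waiters that got past the wait, protected by mtx */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!ready) {
        /* Re-check the predicate in a loop: wakeups may be spurious. */
        pthread_cond_wait(&cond, &mtx);
    }
    woken++;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Starts N_WAITERS threads, broadcasts once, and returns how many
   threads were released by that single broadcast. */
int broadcast_demo(void)
{
    pthread_t tid[N_WAITERS];
    int i, rc;

    ready = 0;
    woken = 0;
    for (i = 0; i < N_WAITERS; i++) {
        rc = pthread_create(&tid[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }

    pthread_mutex_lock(&mtx);
    ready = 1;
    pthread_cond_broadcast(&cond);   /* wake every current waiter */
    pthread_mutex_unlock(&mtx);

    for (i = 0; i < N_WAITERS; i++) {
        pthread_join(tid[i], NULL);
    }
    return woken;
}
```

Because the predicate is set under the mutex, a waiter that has not yet reached pthread_cond_wait simply sees ready == 1 and skips the wait, so every thread finishes regardless of scheduling order.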
Now let's look at how os_event is implemented.
Here is the event structure:
struct os_event_struct {
    os_fast_mutex_t os_mutex;     /*!< this mutex protects the next fields */
    ibool           is_set;       /*!< this is TRUE when the event is in the
                                  signaled state, i.e., a thread does not
                                  stop if it tries to wait for this event */
    ib_int64_t      signal_count; /*!< this is incremented each time the
                                  event becomes signaled */
    os_cond_t       cond_var;     /*!< condition variable is used in waiting
                                  for the event */
    UT_LIST_NODE_T(os_event_struct_t) os_event_list;
                                  /*!< list of all created events */
};
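1) is_set and signal_count together form the event's state flag.
When a thread signals the event (event_set) and the event is not already set, is_set becomes TRUE and signal_count is incremented (signal_count only ever increases):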
os_fast_mutex_lock(&(event->os_mutex));
if (event->is_set) {
    /* Do nothing */
} else {
    event->is_set = TRUE;
    event->signal_count += 1;
    os_cond_broadcast(&(event->cond_var));
}
os_fast_mutex_unlock(&(event->os_mutex));
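When a thread starts listening for the event (event_reset), is_set is set to FALSE and the current signal_count is returned (assume the calling thread keeps the return value in old_signal_count):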
os_fast_mutex_lock(&(event->os_mutex));
if (!event->is_set) {
    /* Do nothing */
} else {
    event->is_set = FALSE;
}
ret = event->signal_count;
os_fast_mutex_unlock(&(event->os_mutex));
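(old_signal_count == signal_count && is_set == false) is the test for whether an event has arrived between reset and wait: if the expression is true, no event has arrived. (There is also a timed-wait variant, but its implementation is much the same.)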
os_fast_mutex_lock(&event->os_mutex);
// signal_count starts at 1 when the event is initialized, because in
// os_event_wait_low a value of 0 is the marker for "ignore any event
// notification sent between reset and wait".
// In other words, passing 0 as old_signal_count amounts to listening
// for the event only from the cond_wait onwards.
if (!reset_sig_count)
{
    reset_sig_count = event->signal_count;
}
while (!event->is_set && event->signal_count == reset_sig_count)
{
    os_cond_wait(&(event->cond_var), &(event->os_mutex));
    /* Solaris manual said that spurious wakeups may occur: we have to check if the event really has been signaled after we came here to wait */
}
os_fast_mutex_unlock(&event->os_mutex);
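Putting the three excerpts together, the following is a self-contained sketch of the same logic written directly against pthreads. It is modelled on the snippets above and is not InnoDB's actual code: the demo_event_* names and demo_reset_set_wait are hypothetical, and the os_fast_mutex_t / os_cond_t wrappers are replaced by the raw pthread types. The driver function runs the reset, set, wait sequence from the beginning of this article, with the set guaranteed to happen before the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for os_event_struct; not InnoDB's definition. */
typedef struct {
    pthread_mutex_t mutex;         /* protects the fields below */
    pthread_cond_t  cond;
    int             is_set;
    int64_t         signal_count;  /* starts at 1; 0 is reserved for wait */
} demo_event_t;

static void demo_event_init(demo_event_t *ev)
{
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->is_set = 0;
    ev->signal_count = 1;
}

static void demo_event_set(demo_event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    if (!ev->is_set) {
        ev->is_set = 1;
        ev->signal_count += 1;
        pthread_cond_broadcast(&ev->cond);
    }
    pthread_mutex_unlock(&ev->mutex);
}

static int64_t demo_event_reset(demo_event_t *ev)
{
    int64_t ret;

    pthread_mutex_lock(&ev->mutex);
    ev->is_set = 0;
    ret = ev->signal_count;
    pthread_mutex_unlock(&ev->mutex);
    return ret;
}

static void demo_event_wait(demo_event_t *ev, int64_t reset_sig_count)
{
    pthread_mutex_lock(&ev->mutex);
    if (!reset_sig_count) {
        reset_sig_count = ev->signal_count;
    }
    while (!ev->is_set && ev->signal_count == reset_sig_count) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}

static void *setter_thread(void *arg)
{
    demo_event_set((demo_event_t *)arg);
    return NULL;
}

/* Thread A resets, thread B sets, then A waits. Returns 1 if the wait
   came back and signal_count advanced by exactly one. */
int demo_reset_set_wait(void)
{
    demo_event_t ev;
    pthread_t b;
    int64_t old;
    int rc, ok;

    demo_event_init(&ev);
    old = demo_event_reset(&ev);               /* A: start listening */
    rc = pthread_create(&b, NULL, setter_thread, &ev);
    assert(rc == 0);
    (void)rc;
    pthread_join(b, NULL);                     /* B's set is done by now */
    demo_event_wait(&ev, old);                 /* A: must not block */
    ok = (ev.signal_count == old + 1);
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.mutex);
    return ok;
}
```

Joining thread B before calling wait forces the signal to land in the window between reset and wait, which is exactly the case the wait predicate has to handle without blocking.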
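The advantage of this design is as follows.
event_reset sets is_set to false, which masks every notification sent before the reset, so a stale event_set that set is_set long ago is not mistaken for a new signal. On its own, however, this design is flawed. Consider the following sequence:
A: event_reset
B: event_set
C: event_reset
A: event_wait
Here B's notification is accidentally wiped out by C, so A would miss it and keep waiting. That is why the signal_count check is also needed: if A records the old value of signal_count when it resets, then even after C sets is_set back to false, (old_signal_count == signal_count && is_set == false) still evaluates to false, and A's wait still returns.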
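The A/B/C sequence can be replayed in a few lines of single-threaded C that track only the two state fields. This is an illustrative sketch: event_state_t, the state_* helpers and replay_lost_wakeup are hypothetical names, and locking is left out because nothing runs concurrently here. It compares a wait predicate that looks at is_set alone with the one that also compares signal_count.

```c
#include <assert.h>
#include <stdint.h>

/* Only the two state fields of the event; no mutex or condition
   variable, since this replay is single-threaded. */
typedef struct {
    int     is_set;
    int64_t signal_count;
} event_state_t;

static void state_set(event_state_t *s)
{
    if (!s->is_set) {
        s->is_set = 1;
        s->signal_count++;
    }
}

static int64_t state_reset(event_state_t *s)
{
    s->is_set = 0;
    return s->signal_count;
}

/* Would a waiter block if it only looked at is_set? */
static int blocks_flag_only(const event_state_t *s)
{
    return !s->is_set;
}

/* Would a waiter block with the combined is_set + signal_count test? */
static int blocks_with_count(const event_state_t *s, int64_t old_count)
{
    return !s->is_set && s->signal_count == old_count;
}

/* Replays A: reset, B: set, C: reset, A: wait, and reports whether A
   would block under each predicate. */
void replay_lost_wakeup(int *flag_only_blocks, int *with_count_blocks)
{
    event_state_t s = { 0, 1 };
    int64_t a_old;

    a_old = state_reset(&s);              /* A: remembers the count */
    state_set(&s);                        /* B: is_set = 1, count + 1 */
    assert(s.signal_count == a_old + 1);
    (void)state_reset(&s);                /* C: clears is_set again */

    *flag_only_blocks  = blocks_flag_only(&s);
    *with_count_blocks = blocks_with_count(&s, a_old);
}
```

With is_set alone, A would block because C cleared the flag; with the counter, A notices that signal_count has moved past the value it recorded and proceeds.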
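2) os_mutex keeps modifications to the members of os_event consistent under concurrency, and it is also used together with cond_var to wait for the event (os_fast_mutex_t and os_cond_t are thin wrappers around pthread_mutex and pthread_cond).
3) Every os_event is added to a global doubly linked list, and the os_event_list member is the node that links the event back into that list: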
/* The os_sync_mutex can be NULL because during startup an event can be created [ because it's embedded in the mutex/rwlock ] before this module has been initialized */
if (os_sync_mutex != NULL) {
    os_mutex_enter(os_sync_mutex);
}
/* Put to the list of events */
UT_LIST_ADD_FIRST(os_event_list, os_event_list, event);
os_event_count++;
if (os_sync_mutex != NULL) {
    os_mutex_exit(os_sync_mutex);
}
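That is all there is to os_event's members, and the implementation is fairly simple: the whole event behaviour comes down to modifying and checking is_set and signal_count. rw_lock and mutex, covered later, are somewhat more complicated.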