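Having written down the title, I suddenly noticed it reads a little racy: change the 旋 in 自旋 ("self-spinning", i.e. spinning) to 慰 and you get 自慰 ("self-pleasuring"). One character apart, a completely different effect and an entirely different nature. The lewd see lewdness; the virtuous do not necessarily see virtue. This article set out to investigate whether the implementation of lock, or Monitor.Enter, uses spinning. Many opinions online say that lock is a lightweight thread-synchronization mechanism in C#/.NET, similar to the Windows critical section (CriticalSection). So how similar is it, and in what ways? That question stirred this peeper's little bit of curiosity.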
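Let's first look at what a CriticalSection is and how it is used. For the concepts, this article draws heavily on valdok's CodeProject article "Fast critical sections with timeout". In that article the author implements a faster CriticalSection, and his analysis of spinning and of CriticalSection is spot on. Interested readers should go and read it.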
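MSDN describes CriticalSection in part as follows. Note in particular the "more efficient mechanism" and the "processor-specific test and set instruction":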
A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process. Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned.
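From this passage we can conclude that a critical section, at least on its fast path, synchronizes threads differently from a Mutex and similar objects. Mutex and Event objects are usually waited on with WaitForSingleObject or the other WaitForXXXXObject(s) functions. These wait for a kernel object's state to change and therefore involve switching between user mode and kernel mode. A critical section instead uses a "more efficient mechanism", "a processor-specific test and set instruction", that is, one based on a processor instruction. In practice this technique is implemented as spinning: retrying that instruction in a loop instead of going to sleep in the kernel.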
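Here is a minimal sketch of a single test-and-set in portable C++, to make "a processor-specific test and set instruction" concrete. It rests on some assumptions. std::atomic_flag stands in for whatever Windows uses internally, and g_flag, try_lock_once and unlock_flag are hypothetical names. One test-and-set either takes the lock or reports that someone else holds it. Nothing here waits or spins yet.

```cpp
#include <atomic>
#include <cassert>

// The lock word: clear means free, set means taken.
std::atomic_flag g_flag = ATOMIC_FLAG_INIT;

// One attempt. test_and_set sets the flag and returns its PREVIOUS value
// in a single indivisible operation, so at most one caller sees "false".
// Returns true if this call took the lock.
bool try_lock_once() {
    return !g_flag.test_and_set(std::memory_order_acquire);
}

// Release: clear the flag so the next test_and_set can succeed.
void unlock_flag() {
    g_flag.clear(std::memory_order_release);
}
```

The first call to try_lock_once succeeds and a second one fails until unlock_flag is called. A caller that fails must decide what to do next, and that decision (retry, or sleep) is what the rest of this article is about.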
Next, valdok's own description of critical sections:
Minding all the above, we introduce critical sections. It's a hybrid. When you attempt to lock it (call EnterCriticalSection), the idea is to perform the following steps:
- Check if this thread already owns the critical section. If it does, the lock/unlock is omitted (skip the rest).
- Attempt to lock a dedicated variable via the interlocked instruction (similar to what we've done). If the lock succeeds, return (skip the rest).
- Optionally, retry the second step a number of times. This can be enabled by calling InitializeCriticalSectionAndSpinCount instead of InitializeCriticalSection. This step is always skipped on single-processor machines.
- After we've tried all of the above, call a kernel-mode WaitXXXX function.
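Steps 1 and 2 can be sketched in portable C++. This is not how Windows implements EnterCriticalSection. The sketch uses std::atomic<std::thread::id> as the "dedicated variable", and OwnerLock, try_enter and leave are hypothetical names. Steps 3 and 4 (spinning and the kernel wait) are deliberately left out, so a contended try_enter simply returns false.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch of steps 1 and 2 only: owner check, then one CAS.
class OwnerLock {
public:
    // Returns true if the calling thread now holds the lock.
    bool try_enter() {
        const std::thread::id me = std::this_thread::get_id();
        // Step 1: only this thread can have stored its own id here,
        // so an equal value means we already own the lock: just recurse.
        if (owner_.load(std::memory_order_relaxed) == me) {
            ++recursion_;
            return true;
        }
        // Step 2: one atomic attempt to replace "no owner" with our id.
        std::thread::id expected{};  // default-constructed id means "no thread"
        if (owner_.compare_exchange_strong(expected, me, std::memory_order_acquire)) {
            recursion_ = 1;
            return true;
        }
        return false;  // someone else owns it; no spinning, no waiting
    }

    // Must be called once per successful try_enter, on the owning thread.
    void leave() {
        assert(owner_.load(std::memory_order_relaxed) == std::this_thread::get_id());
        if (--recursion_ == 0)
            owner_.store(std::thread::id{}, std::memory_order_release);
    }

private:
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    int recursion_ = 0;  // only ever touched by the current owner
};
```

Each successful try_enter must be matched by one leave on the same thread. EnterCriticalSection and LeaveCriticalSection follow the same pairing rule.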
In the article, valdok gives a line of code that implements spinning:
while (InterlockedCompareExchange(&nLockCount, 1, 0));
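The one-liner above needs some context before it can run. The portable C++ sketch below makes a few substitutions. std::atomic<long>::compare_exchange_strong stands in for InterlockedCompareExchange. Because it returns true on success rather than the old value, the loop condition is negated. spin_lock, spin_unlock and run_counter are hypothetical names. Releasing the lock by storing 0 back is an assumption, since the one-liner does not show the release side.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// The lock word from the one-liner: 0 = free, 1 = taken.
std::atomic<long> nLockCount{0};

// Spin until we are the thread that changes nLockCount from 0 to 1.
void spin_lock() {
    long expected = 0;
    // On failure compare_exchange_strong writes the current value (1)
    // into expected, so it has to be reset before the next attempt.
    while (!nLockCount.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
        expected = 0;
    }
}

// Release by writing 0 back, which lets one spinning thread through.
void spin_unlock() {
    nLockCount.store(0, std::memory_order_release);
}

// Runs `threads` threads that each increment a plain (non-atomic) counter
// `iterations` times inside the spin lock, and returns the final count.
long run_counter(int threads, int iterations) {
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                spin_lock();
                ++counter;  // protected by the spin lock
                spin_unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because every increment happens inside the lock, run_counter(4, 10000) yields exactly 40000. A waiting thread burns CPU for the whole time it is blocked, because nothing bounds the loop.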
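InterlockedCompareExchange belongs to the Interlocked family of functions. Every function with this prefix updates its variable atomically at the processor level, even on multiprocessor machines. On x86 the instruction carries a LOCK prefix, which keeps other processors from touching that memory location until the operation completes. Historically this was done by locking the bus; on modern processors it is usually done by locking the cache line. InterlockedCompareExchange compares the value its first parameter points to with its third parameter. If they are equal, it stores the second parameter there. In either case it returns the original value, which is 0 here the first time.

So only the first call takes effect. As long as nobody writes 0 back to nLockCount, later calls leave it unchanged and return 1. Now suppose two threads execute this line at the same moment. Only one of them leaves the while loop and continues, that is, it acquires the lock. The other keeps looping, because for it InterlockedCompareExchange keeps returning 1 (TRUE) until the owner releases the lock by setting nLockCount back to 0. That is an implementation of spinning.

A spin like this has no exit condition of its own, however, so if the lock is held for a long time the waiting thread spins and waits indefinitely. A real critical section therefore needs a way out. The spin count set with InitializeCriticalSectionAndSpinCount is one such condition: if the lock still has not been acquired after that many spins, the thread calls WaitXXXX and enters kernel mode. With this in mind, valdok's EnterCriticalSection-like implementation is much easier to follow: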
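// Attempt spin-lock: retry the interlocked lock up to m_dwSpinMax times
for (DWORD dwSpin = 0; dwSpin < m_dwSpinMax; dwSpin++)
{
    if (PerfLockImmediate(dwThreadID))
        return true;    // acquired without entering the kernel
    YieldProcessor();   // CPU hint that this is a spin-wait loop (PAUSE on x86)
}
// Ensure we have the kernel event created
AllocateKernelSemaphore();
// Spinning failed: wait in kernel mode, honoring dwTimeout
bool bVal = PerfLockKernel(dwThreadID, dwTimeout);
WaiterMinus();
return bVal;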
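The spin-then-block shape of the fragment above can be sketched in portable C++. This is a sketch under assumptions, not valdok's code:
- a std::condition_variable plays the role of his kernel semaphore;
- cpu_relax approximates YieldProcessor with _mm_pause on x86-64 and does nothing elsewhere;
- there is no timeout and no recursion;
- SpinThenBlockLock, cpu_relax and run_two_threads are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
// Spin-wait hint, comparable to YieldProcessor (PAUSE) on x86-64.
inline void cpu_relax() { _mm_pause(); }
#else
// Standard C++ has no portable pause hint; do nothing.
inline void cpu_relax() {}
#endif

class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(unsigned spin_max) : spin_max_(spin_max) {}

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true);  // seq_cst
    }

    void lock() {
        // Phase 1: bounded spinning, like the m_dwSpinMax loop above.
        for (unsigned i = 0; i < spin_max_; ++i) {
            if (try_lock()) return;
            cpu_relax();
        }
        // Phase 2: register as a waiter, then sleep until the lock can be taken.
        std::unique_lock<std::mutex> lk(m_);
        waiters_.fetch_add(1);
        cv_.wait(lk, [this] { return try_lock(); });  // re-checked on every wakeup
        waiters_.fetch_sub(1);
    }

    void unlock() {
        locked_.store(false);       // seq_cst store, ordered before the load below
        if (waiters_.load() > 0) {  // take the mutex only if someone may be asleep
            std::lock_guard<std::mutex> lk(m_);
            cv_.notify_one();
        }
    }

private:
    const unsigned spin_max_;
    std::atomic<bool> locked_{false};
    std::atomic<int> waiters_{0};
    std::mutex m_;
    std::condition_variable cv_;
};

// Usage: two threads bump a shared counter under the lock; returns the total.
long run_two_threads(SpinThenBlockLock& lock, int iterations) {
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

With spin_max set to 0, every contended lock() goes straight to the blocking path. That is roughly what a critical section does on a single-processor machine, where spinning cannot help because the owner cannot run at the same time.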
The code of PerfLockImmediate:
inline bool PerfLockImmediate(DWORD dwThreadID)
{
    // True only if m_nLocker was 0; it now holds our thread ID
    return !_InterlockedCompareExchange((long*) &m_nLocker, dwThreadID, 0);
}
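The `!` in PerfLockImmediate works because InterlockedCompareExchange returns the value the target held before the call, not a success flag. The sketch below emulates that contract with std::atomic. The function names are hypothetical. Like the original, it assumes 0 never occurs as a thread ID, because 0 means "unlocked".

```cpp
#include <atomic>
#include <cassert>

// Hypothetical emulation of InterlockedCompareExchange's contract:
// if destination == comparand, store exchange; either way, return the
// value destination held before the call.
long interlocked_compare_exchange(std::atomic<long>& destination, long exchange, long comparand) {
    long observed = comparand;
    // On success observed keeps comparand (which was the old value);
    // on failure compare_exchange_strong loads the current value into it.
    destination.compare_exchange_strong(observed, exchange);
    return observed;
}

// Mirrors PerfLockImmediate: succeeds only if the old value was 0,
// and leaves the owning thread's ID in the locker variable.
bool perf_lock_immediate(std::atomic<long>& locker, long thread_id) {
    assert(thread_id != 0);  // 0 is reserved for "unlocked"
    return !interlocked_compare_exchange(locker, thread_id, 0);
}
```

Storing the thread ID instead of a plain 1 is what later lets the code answer "does this thread already own the lock?", the first step in valdok's list.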
Part of the code of PerfLockKernel:
switch (WaitForSingleObject(m_hSemaphore, dwWait))
{
case WAIT_OBJECT_0:
    bWaiter = false;
    break;
case WAIT_TIMEOUT:
    bWaiter = true;
    break;
default:
    TestSys(TRUE);
}
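Readers without a Windows box can mimic the shape of this switch with a small timed semaphore. This is a hypothetical stand-in:
- TimedSemaphore, WaitResult and still_waiting_after are invented names;
- std::condition_variable::wait_for replaces WaitForSingleObject;
- the default: error branch has no counterpart here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Stand-ins for WAIT_OBJECT_0 and WAIT_TIMEOUT.
enum class WaitResult { Signaled, Timeout };

class TimedSemaphore {
public:
    // Like releasing a semaphore by one unit.
    void release() {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Like WaitForSingleObject on a semaphore: take one unit, or give up
    // once the timeout expires.
    WaitResult wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        if (!cv_.wait_for(lk, timeout, [this] { return count_ > 0; }))
            return WaitResult::Timeout;
        --count_;
        return WaitResult::Signaled;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    long count_ = 0;
};

// Mirrors the switch above: false when the semaphore was acquired,
// true when the wait timed out.
bool still_waiting_after(TimedSemaphore& sem, std::chrono::milliseconds timeout) {
    switch (sem.wait(timeout)) {
    case WaitResult::Signaled: return false;
    case WaitResult::Timeout:  return true;
    }
    return true;  // unreachable; keeps compilers quiet
}
```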
A quick aside: why exactly do Interlocked operations perform better than WaitXXXX, and how big is the gap? valdok says:
Interlocked operations cost us from tens to hundreds of processor cycles, whereas every kernel-mode call, including WaitForSingleObject and ReleaseMutex, for example, cost thousands of cycles, so that on multi-processor systems for short-time locking - spinning may be a preferred way to go.
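Taking valdok's figures at face value, the trade-off can be written as a rough break-even estimate. The symbols are assumptions of this sketch:
- $c_i$ is the cost of one interlocked attempt;
- $c_k$ is the cost of one kernel-mode call, and blocking is assumed to cost at least one such call;
- $S$ is the number of spin iterations.

```latex
S \, c_i < c_k \iff S < \frac{c_k}{c_i},
\qquad c_i \approx 10\text{--}100,\quad c_k \gtrsim 1000 \ \text{cycles}
\;\Longrightarrow\; \frac{c_k}{c_i} \gtrsim 10\text{--}100 .
```

By these figures, a lock that is usually released within a few tens of spin iterations is cheaper to spin on than to block on. That is the case for a finite spin count. The estimate only holds if another processor can release the lock while we spin, which is why the spin step is skipped on single-processor machines.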
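Now that we understand what spinning is, it is time to see whether Monitor.Enter and EnterCriticalSection are twins, or whether Monitor.Enter is merely an admiring imitator. We'll get to that in a moment; first, dinner.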