Because disabling interrupts does not work on multiple processors, system designers started to
invent hardware support for locking. The earliest multiprocessor systems, such as the Burroughs
B5000 in the early 1960s, had such support; today all systems provide this type of support, even
single-CPU systems.
The simplest bit of hardware support to understand is the test-and-set instruction, also known
as atomic exchange. To understand how test-and-set works, let's first try to build a simple lock
without it.
In this first (failed) attempt, the idea is quite simple: use a flag variable to indicate whether
some thread has possession of the lock. The first thread that enters the critical section will call
lock(), which tests whether the flag is equal to 1 (in this case, it is not), and then sets the flag
to 1 to indicate that the thread now holds the lock. When finished with the critical section, the
thread calls unlock() and clears the flag, thus indicating that the lock is no longer held.
    typedef struct __lock_t { int flag; } lock_t;

    void init(lock_t* mutex) {
        mutex->flag = 0;            // 0 -> lock is available, 1 -> held
    }

    void lock(lock_t* mutex) {
        while (mutex->flag == 1)    // TEST the flag
            ;                       // spin-wait (do nothing)
        mutex->flag = 1;            // now SET it!
    }

    void unlock(lock_t* mutex) {
        mutex->flag = 0;
    }
If another thread happens to call lock() while that first thread is in the critical section, it will
simply spin-wait in the while loop for that thread to call unlock() and clear the flag. Once the first
thread does so, the waiting thread will fall out of the while loop, set the flag to 1 for itself, and
proceed into the critical section.
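Unfortunately, the code has two problems: one of correctness, and another of performance. The
correctness problem is simple to see once you get used to thinking about concurrent programming.
Imagine the following interleaving, assuming flag = 0 to begin. Thread 1 calls lock() and finds
that the flag is 0, but it is interrupted before it can set the flag to 1. Thread 2 then runs, calls
lock(), also finds the flag at 0, and sets it to 1. When Thread 1 resumes, it sets the flag to 1 as
well.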
As you can see from this interleaving, with timely (untimely?) interrupts, we can easily produce
a case where both threads set the flag to 1 and both threads are thus able to enter the critical
section. This behavior is what professionals call "bad" - we have obviously failed to provide the
most basic requirement: providing mutual exclusion.
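The bad schedule can be replayed deterministically. The sketch below is only an illustration, not code from the text: it splits the broken lock() into its two steps using the hypothetical helpers test_step() and set_step(), then plays the scheduler by hand in replay_bad_interleaving() (also a made-up name).

```c
#include <stdbool.h>

typedef struct { int flag; } lock_t;

/* Step 1 of the broken lock(): the test. True if the lock looks free. */
static bool test_step(const lock_t *m) { return m->flag == 0; }

/* Step 2 of the broken lock(): the set. */
static void set_step(lock_t *m) { m->flag = 1; }

/* Replays the bad schedule by hand and returns how many threads
   end up inside the critical section at the same time. */
static int replay_bad_interleaving(void) {
    lock_t m = { 0 };
    int in_critical_section = 0;

    bool t1_saw_free = test_step(&m);   /* Thread 1 sees flag == 0 */
    /* interrupt: switch to Thread 2 before Thread 1 sets the flag */
    bool t2_saw_free = test_step(&m);   /* Thread 2 also sees flag == 0 */
    if (t2_saw_free) { set_step(&m); in_critical_section++; }
    /* interrupt: switch back to Thread 1, which already passed its test */
    if (t1_saw_free) { set_step(&m); in_critical_section++; }

    return in_critical_section;
}
```

Calling replay_bad_interleaving() returns 2: both threads passed the test before either one performed the set, so both hold the "lock".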
The performance problem, which we will address more later on, is the way a thread waits to
acquire a lock that is already held: it endlessly checks the value of flag, a technique
known as spin-waiting. Spin-waiting wastes time waiting for another thread to release a lock. The
waste is exceptionally high on a uniprocessor, where the thread that the waiter is waiting for
cannot even run (at least, until a context switch occurs!). Thus, as we move forward and develop
more sophisticated solutions, we should also consider ways to avoid this kind of waste.
Building A Working Spin Lock
While the idea behind the example above is a good one, it is not possible to implement without
some support from the hardware. Fortunately, some systems provide an instruction to support
the creation of simple locks based on this concept. This more powerful instruction has different
names: on SPARC, it is the load/store unsigned byte instruction (ldstub), whereas on x86, it is
the atomic exchange instruction (xchg). It basically does the same thing across platforms, and is
generally referred to as test-and-set. We define what the test-and-set instruction does with the
following C code snippet:
    int TestAndSet(int* old_ptr, int new) {
        int old = *old_ptr;    // fetch old value at old_ptr
        *old_ptr = new;        // store 'new' into old_ptr
        return old;            // return the old value
    }
What the test-and-set instruction does is as follows. It returns the old value pointed to by
old_ptr, and simultaneously updates said value to new. The key, of course, is that this sequence
of operations is performed atomically. The reason it is called test-and-set is that it enables you
to test the old value (which is what is returned) while simultaneously setting the memory location
to a new value; as it turns out, this slightly more powerful instruction is enough to build a simple
spin lock, as we now examine in figure 28.3. Or better yet: figure it out first yourself!
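Note that the C snippet above only describes what the instruction does; compiled as ordinary C, its load and store are not atomic. As a minimal sketch, assuming a C11 compiler that provides <stdatomic.h>, the same behavior can be obtained from atomic_exchange. The wrapper name test_and_set is chosen here for illustration and does not come from the text.

```c
#include <stdatomic.h>

/* Atomically store new_val into *ptr and return the value it held before.
   On x86, compilers typically implement atomic_exchange with xchg. */
static int test_and_set(atomic_int *ptr, int new_val) {
    return atomic_exchange(ptr, new_val);
}
```

On a flag that starts at 0, the first call test_and_set(&flag, 1) returns 0 and leaves the flag at 1; a second call returns 1 and leaves the flag unchanged.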
Let's make sure we understand why this lock works. Imagine first the case where a thread calls
lock() and no other thread currently holds the lock; thus, flag should be 0. When the thread calls
TestAndSet(flag, 1), the routine will return the old value of flag, which is 0; thus, the calling
thread, which is testing the value of flag, will not get caught spinning in the while loop and will
acquire the lock. The thread will also atomically set the value to 1, thus indicating that the lock
is now held. When the thread is finished with its critical section, it calls unlock() to set the flag
back to zero.
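    typedef struct __lock_t { int flag; } lock_t;

    void init(lock_t* lock) {
        lock->flag = 0;    // 0 -> lock is available, 1 -> lock is held
    }

    void lock(lock_t* lock) {
        // TestAndSet() must execute atomically for this to be correct
        while (TestAndSet(&lock->flag, 1) == 1)
            ;              // spin-wait (do nothing)
    }

    void unlock(lock_t* lock) {
        lock->flag = 0;
    }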
The second case we can imagine arises when one thread already has the lock held (i.e., flag is 1).
In this case, a second thread will call lock() and then call TestAndSet(flag, 1) as well. This time,
TestAndSet() will return the old value at flag, which is 1 (because the lock is held), while
simultaneously setting it to 1 again. As long as the lock is held by another thread, TestAndSet()
will repeatedly return 1, and thus this thread will spin and spin until the lock is finally released.
When the flag is finally set to 0 by some other thread, this thread will call TestAndSet() again,
which will now return 0 while atomically setting the value to 1, and thus acquire the lock and enter
the critical section.
By making both the test of the old lock value and set of the new value a single atomic operation,
we ensure that only one thread acquires the lock. And that's how to build a working mutual
exclusion primitive!
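To see such a lock at work, here is a minimal sketch that assumes C11 atomics and POSIX threads are available (with GCC or Clang, build with -pthread). The names spin_init, spin_lock, spin_unlock, worker, and run_two_threads are illustrative and not from the text. Two threads increment a shared counter, each increment guarded by the lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int flag; } lock_t;    /* 0 = free, 1 = held */

static void spin_init(lock_t *l)   { atomic_init(&l->flag, 0); }

/* Spin until the atomic exchange returns 0, meaning we took the lock. */
static void spin_lock(lock_t *l) {
    while (atomic_exchange(&l->flag, 1) == 1)
        ;   /* spin-wait */
}

static void spin_unlock(lock_t *l) { atomic_store(&l->flag, 0); }

static lock_t counter_lock;
static long counter;

static void *worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_lock(&counter_lock);
        counter++;                      /* critical section */
        spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads that each increment the shared counter iters times
   and returns the final count. */
static long run_two_threads(long iters) {
    pthread_t a, b;
    spin_init(&counter_lock);
    counter = 0;
    pthread_create(&a, NULL, worker, &iters);
    pthread_create(&b, NULL, worker, &iters);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Because every increment happens while the lock is held, run_two_threads(n) returns exactly 2 * n. With the plain flag lock from the failed attempt, increments could be lost.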
You may also now understand why this type of lock is usually referred to as a spin lock. It is the
simplest type of lock to build, and simply spins, using CPU cycles, until the lock becomes available.
To work correctly on a single processor, it requires a preemptive scheduler (i.e., one that will
interrupt a thread via a timer, in order to run a different thread, from time to time). Without
preemption, spin locks don't make much sense on a single CPU, as a thread spinning on a CPU
will never relinquish it.
TIP: Think About Concurrency As A Malicious Scheduler
From this example, you might get a sense of the approach you need to take to understand
concurrent execution. What you should try to do is to pretend you are a malicious scheduler, one
that interrupts threads at the most inopportune of times in order to foil their feeble attempts at
building synchronization primitives. What a mean scheduler you are! Although the exact sequence
of interrupts may be improbable, it is possible, and that is all we need to demonstrate that a
particular approach does not work. It can be useful to think maliciously! (At least, sometimes.)
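Playing the malicious scheduler can even be automated for tiny cases. The following sketch is a toy model written for illustration, not part of the original text: it models each of two threads as a small state machine and tries every schedule of four steps. The names step and count_violations, and the state names, are hypothetical.

```c
enum { TESTING = 0, SETTING = 1, IN_CS = 2 };

/* Run one step of a thread. With atomic_tas == 0 the test and the set are
   separate steps (the broken flag lock); with atomic_tas == 1 they are one
   indivisible step (test-and-set). */
static void step(int *pc, int *flag, int atomic_tas) {
    if (*pc == TESTING) {
        int old = *flag;
        if (atomic_tas) {
            *flag = 1;
            if (old == 0) *pc = IN_CS;
        } else if (old == 0) {
            *pc = SETTING;      /* passed the test, has not set the flag yet */
        }
    } else if (*pc == SETTING) {
        *flag = 1;
        *pc = IN_CS;
    }
    /* A thread already in the critical section stays there. */
}

/* Try all 16 schedules of four steps over two threads; bit i of the
   schedule picks which thread runs at step i. Returns the number of
   schedules that leave both threads in the critical section. */
static int count_violations(int atomic_tas) {
    int violations = 0;
    for (int schedule = 0; schedule < 16; schedule++) {
        int flag = 0;
        int pc[2] = { TESTING, TESTING };
        for (int i = 0; i < 4; i++) {
            int t = (schedule >> i) & 1;
            step(&pc[t], &flag, atomic_tas);
        }
        if (pc[0] == IN_CS && pc[1] == IN_CS) violations++;
    }
    return violations;
}
```

In this model, count_violations(0) returns 4: the schedules in which both threads test before either one sets. count_violations(1) returns 0, since an indivisible test-and-set leaves the scheduler no gap to exploit.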