One final hardware primitive is the fetch-and-add instruction, which atomically increments a value while returning
the old value at a particular address. The C pseudocode for the fetch-and-add instruction looks like this:
int FetchAndAdd(int* ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}
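The pseudocode only describes the effect of the instruction; written as ordinary C it would not be atomic. As a minimal sketch, assuming a C11 compiler with `<stdatomic.h>`, the same behavior can be requested through `atomic_fetch_add`. The name `fetch_and_add_c11` and the use of `atomic_int` instead of a plain `int` are assumptions made for this illustration, not part of the original pseudocode.

```c
#include <stdatomic.h>

// Sketch: fetch-and-add expressed with the C11 atomic_fetch_add operation.
// Assumption: the shared value is declared atomic_int rather than plain int.
int fetch_and_add_c11(atomic_int *ptr) {
    // Adds 1 to *ptr and returns the value it held before the addition,
    // all as one indivisible step.
    return atomic_fetch_add(ptr, 1);
}
```

Called on a variable holding 5, this returns 5 and leaves 6 behind, matching the "return the old value" behavior described above.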
In this example, we will use fetch-and-add to build a more interesting ticket lock, as introduced by Mellor-Crummey
and Scott. The lock and unlock code looks like what you see in the following figure.
typedef struct __lock_t {
    int ticket;  // next ticket number to hand out
    int turn;    // ticket number currently allowed to hold the lock
} lock_t;

void lock_init(lock_t* lock) {
    lock->ticket = 0;
    lock->turn = 0;
}

void lock(lock_t* lock) {
    int myturn = FetchAndAdd(&lock->ticket);
    while (lock->turn != myturn)
        ; // spin until it is this thread's turn
}

void unlock(lock_t* lock) {
    lock->turn = lock->turn + 1;
}
Instead of a single value, this solution uses a ticket and turn variable in combination to build a lock. The basic operation
is pretty simple: when a thread wishes to acquire a lock, it first does an atomic fetch-and-add on the ticket value; that
value is now considered this thread's turn. The globally shared lock->turn is then used to determine which thread's
turn it is; when myturn == turn for a given thread, it is that thread's turn to enter the critical section. Unlock is accomplished
simply by incrementing the turn such that the next waiting thread (if there is one) can now enter the critical section.
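The ticket and turn mechanics described above can be sketched as compilable C. This is only an illustration under stated assumptions: it uses C11 atomics in place of the FetchAndAdd pseudocode, POSIX threads for the workers, and hypothetical names (`ticket_lock_t`, `ticket_lock_acquire`, `ticket_lock_release`, `run_ticket_demo`) that do not come from the original figure. A thread spins while the shared turn differs from its own ticket.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_int ticket;  // next ticket number to hand out
    atomic_int turn;    // ticket number currently allowed in
} ticket_lock_t;

static ticket_lock_t demo_lock;
static long demo_counter;  // shared state, protected by demo_lock

static void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->ticket, 0);
    atomic_init(&l->turn, 0);
}

static void ticket_lock_acquire(ticket_lock_t *l) {
    // Take a ticket atomically; the old value becomes this thread's turn.
    int myturn = atomic_fetch_add(&l->ticket, 1);
    while (atomic_load(&l->turn) != myturn)
        ; // spin until the shared turn reaches our ticket
}

static void ticket_lock_release(ticket_lock_t *l) {
    // Pass the lock to the holder of the next ticket, if any.
    atomic_fetch_add(&l->turn, 1);
}

static void *demo_worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        ticket_lock_acquire(&demo_lock);
        demo_counter++;  // critical section
        ticket_lock_release(&demo_lock);
    }
    return NULL;
}

// Hypothetical driver: runs up to 16 workers that each increment the shared
// counter `iters` times, and returns the final counter value.
long run_ticket_demo(int nthreads, long iters) {
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    ticket_lock_init(&demo_lock);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

If the lock provides mutual exclusion, `run_ticket_demo(2, 100)` returns 200, since no increment is lost. The program typically needs to be built with the compiler's `-pthread` option. Keep the thread count at or below the number of cores when experimenting, because waiting threads do nothing but spin.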
Note one important difference with this solution versus our previous attempts: it ensures progress for all threads. Once
a thread is assigned its ticket value, it will be scheduled at some point in the future (once those in front of it have passed
through the critical section and released the lock). In our previous attempts, no such guarantee existed; a thread
spinning on test-and-set (for example) could spin forever even as other threads acquire and release the lock.
Too Much Spinning: What Now?
Our hardware-based locks are simple (only a few lines of code) and they work (you could even prove that
if you would like to, by writing some code), which are two excellent properties of any system or code. However, in
some cases, these solutions can be quite inefficient. Imagine you are running two threads on a single processor.
Now imagine that one thread (thread 0) is in a critical section and thus has a lock held, and unfortunately gets
interrupted. The second thread (thread 1) now tries to acquire the lock, but finds that it is held. Thus, it begins to
spin. And spin. Then it spins some more. Finally, a timer interrupt goes off and thread 0 runs again and releases
the lock; the next time thread 1 runs, it won't have to spin so much and will be able to acquire the lock.
Thus, any time a thread gets caught spinning in a situation like this, it wastes an entire time slice doing nothing but
checking a value that is not going to change! The problem gets worse with N threads contending for a lock; N - 1
time slices may be wasted in a similar manner, simply spinning and waiting for a single thread to release the lock.
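To make the N - 1 figure concrete, here is a small back-of-the-envelope model in C. Everything in it is an assumption made for illustration: a single CPU, strict round-robin scheduling, a fixed slice length measured in abstract ticks, and a hypothetical function name `spin_ticks_until_release`. Thread 0 holds the lock and has just been preempted; every other thread wants the lock and spins for its whole slice.

```c
// Hypothetical model, not a real scheduler. Assumes nthreads >= 1, slice > 0.
// Thread 0 holds the lock and was just preempted with `remaining` ticks of
// critical-section work left. Threads 1..nthreads-1 all spin on the lock.
// Returns the number of CPU ticks spent spinning before the lock is released.
long spin_ticks_until_release(int nthreads, int slice, int remaining) {
    long spun = 0;
    int current = 1 % nthreads;  // thread scheduled right after the holder
    while (remaining > 0) {
        for (int tick = 0; tick < slice && remaining > 0; tick++) {
            if (current == 0)
                remaining--;  // the lock holder makes real progress
            else
                spun++;       // a waiter burns this tick spinning
        }
        current = (current + 1) % nthreads;  // round-robin to the next thread
    }
    return spun;
}
```

With 4 threads, a slice of 10 ticks, and 1 tick of work left in the critical section, the model reports 30 spun ticks: three full slices (N - 1) are wasted so that the holder can run for a single tick.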
And thus, our next problem.
The Crux: How to avoid spinning
How can we develop a lock that does not needlessly waste time spinning on the CPU?
Hardware support alone cannot solve this problem. We will need OS support too! Let's now figure out just how
that might work.