The simple thread example we showed above was useful in showing how threads are
created and how they can run in different orders depending on how the scheduler decides
to run them. What it doesn't show you, though, is how threads interact when they access
shared data.
The Heart Of The Problem: Uncontrolled Scheduling
To understand why this happens, we must understand the code sequence that the compiler
generates for the update to counter. In this case, we wish to simply add a number (1) to counter.
Thus, the code sequence for doing so might look something like this (in x86):
mov 0x8049a1c, %eax
add $0x01, %eax
mov %eax, 0x8049a1c
This example assumes that the variable counter is located at address 0x8049a1c. In this
three-instruction sequence, the x86 mov instruction is used first to get the memory value at
the address and put it into register eax. Then, the add is performed, adding 1 to the contents
of the eax register, and finally, the contents of eax are stored back into memory at the same
address.
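For reference, the C code that compiles down to such a sequence is nothing more than an ordinary increment of a global variable. The following is a minimal sketch (the variable name counter matches the text; the file name and everything else are our own illustrative choices), which you can use to look at the assembly your compiler actually produces:

#include <stdio.h>

static int counter = 50;          /* shared global; the text assumes it lives at 0x8049a1c */

void increment(void) {
    counter = counter + 1;        /* compiles to roughly: load, add 1, store */
}

int main(void) {
    increment();
    printf("counter = %d\n", counter);
    return 0;
}

Compiling without optimization (for example, gcc -O0 -c counter.c) and disassembling with a tool such as objdump -d should show a load/add/store triple much like the one above, though the exact addresses and registers will differ from machine to machine.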
Let us imagine one of our two threads (Thread 1) enters this region of code, and is thus about
to increment counter by one. It loads the value of counter (let's say it's 50 to begin with) into
its register eax. Thus, eax=50 for thread 1. Then it adds one to the register; thus eax=51.
Now, something unfortunate happens: a timer interrupt goes off; thus, the OS saves the state
of the currently running thread (its PC, its registers including eax, etc.) to the thread's TCB.
Now something worse happens: Thread 2 is chosen to run, and it enters this same piece of code.
It also executes the first instruction, getting the value of counter and putting it into its eax (
remember: each thread when running has its own private registers; the registers are virtualized
by the context-switch code that saves and restores them). The value of counter is still 50 at this
point, and thus thread 2 has eax=50. Let's then assume that Thread 2 executes the next two
instructions, incrementing eax by 1 (thus eax=51), and then saving the contents of eax into
counter (address 0x8049a1c). Thus, the global variable counter now has the value 51.
Finally, another context switch occurs, and Thread 1 resumes running. Recall that it had just
executed the mov and add, and is now about to perform the final mov instruction. Recall also
that eax=51. Thus, the final mov instruction executes, and saves the value to memory; the
counter is set to 51 again.
Put simply, what has happened is this: the code to increment counter has been run twice, but
counter, which starts at 50, is now only equal to 51. A "correct" version of this program should
have resulted in the variable counter being equal to 52.
Let's look at a detailed execution trace to understand the problem better. Assume, for this
example, that the above code is loaded at address 100 in memory, like the following sequence
(note for those of you used to nice, RISC-like instruction sets: x86 has variable-length
instructions; this mov instruction takes up 5 bytes of memory, and the add only 3):
100 mov 0x8049a1c, %eax
105 add $0x1, %eax
108 mov %eax, 0x8049a1c
With these assumptions, what happens is shown in Figure 26.7. Assume the counter starts at
value 50, and trace through this example to make sure you understand what is going on.
What we have demonstrated here is called a race condition: the results depend on the timing
of the code's execution. With some bad luck (i.e., context switches that occur at untimely points
in the execution), we get the wrong result. In fact, we may get a different result each time; thus,
instead of a nice deterministic computation (which we are used to from computers), we call this
result indeterminate, where it is not known what the output will be, and it is indeed likely to be
different across runs.
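If you would like to observe this indeterminism directly, a short test program is enough; the following is a minimal sketch (the two-thread setup and the loop count of one million are our own choices, not from the text) in which each thread runs the unprotected increment many times:

#include <stdio.h>
#include <pthread.h>

static volatile int counter = 0;            /* shared global, as in the text */

static void *worker(void *arg) {
    (void)arg;                              /* unused */
    for (int i = 0; i < 1000000; i++)
        counter = counter + 1;              /* the unprotected critical section */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d (expected 2000000)\n", counter);
    return 0;
}

Build it with gcc -pthread; on most machines the printed value will fall short of 2,000,000 and will change from run to run, which is exactly the indeterminate behavior described above.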
Because multiple threads executing this code can result in a race condition, we call this code
a critical section. A critical section is a piece of code that accesses a shared variable (or more
generally, a shared resource) and must not be concurrently executed by more than one thread.
What we really want for this code is what we call mutual exclusion. This property guarantees
that if one thread is executing within the critical section, the others will be prevented from
doing so.
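Locks are the subject of an upcoming chapter, but as a quick preview, the standard way to enforce mutual exclusion here is to surround the critical section with a lock; a minimal sketch using a POSIX mutex (the lock variable and loop bound are our own) looks like this:

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int counter = 0;

static void *worker(void *arg) {
    (void)arg;                              /* unused */
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);          /* only one thread at a time may proceed */
        counter = counter + 1;              /* the critical section */
        pthread_mutex_unlock(&lock);        /* allow the next thread in */
    }
    return NULL;
}

With the lock held around the update, the load/add/store triple behaves as a single, indivisible step from the perspective of other threads, and the two-thread test above reliably ends with counter equal to 2,000,000.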
Virtually all of these terms, by the way, were coined by Edsger Dijkstra, who was a pioneer in
the field and indeed won the Turing Award because of this and other work; see his 1968 paper
on "Cooperating Sequential Processes" for an amazingly clear description of the problem. We
will be hearing more about Dijkstra in this section of the book.