http://people.cs.clemson.edu/~mark/multithreading.html
History of Multithreading
Mark Smotherman Last major update: April 2005 (Minor update: November 2007)
Summary: Multithreading first appeared in the 1950s, and simultaneous multithreading (SMT) was investigated by IBM in 1968.
(under construction)
Most attempts at a history of multithreaded processors start with the PPUs in the CDC 6600. However, the ideas of sharing a single execution path among multiple threads of execution started at least a decade earlier.
First, there are a number of types of multithreading.
- "multiple sequence"
- multiple program counters
- some degree of program-controlled switching (coroutines)
- a.k.a. explicit-switch, static, blocked multithreading in Ungerer's taxonomy
- e.g., break bits in instruction format in DYSEAC (1954) and TX-2 (1957)
- e.g., Transputer (1985)
- "fine-grain MT (FGMT)"
- switch threads on each cycle
- a.k.a. "vertical multithreading"
- a.k.a. "interleaved multithreading (IMT)"
- a.k.a. "time-sharing multithreading"
- a.k.a. "hardware multiprogramming" in H800 (1958) [see also Singer System 10 - 1970 FJCC]
- e.g., CDC 6600 PPUs
- e.g., HEP
- e.g., Tera MTA - 128 threads
- in-order core
- goal is to cover memory latency and improve overall throughput
- "coarse-grain MT (CGMT)"
- a.k.a. "concurrent multithreading (CMT)"
- a.k.a. "switch-on-event multithreading (SOEMT)"
- a.k.a. "dynamic blocked multithreading (BMT)"
- e.g., IBM AS400 PPC Pulsar
- two threads, 3-cycle thread switch, priority scheme for scheduling, also can set to switch on L2 miss
- R. Eickemeyer, R. Johnson, S. Kunkel, B.-H. Lim, M. Squillante, and C. Wu, "Evaluation of Multithreaded Processors and Thread Switch Policies", 1997 International Symposium on High Performance Computing, Springer LNCS 1336, Nov. 1997, Fukuoka, Japan, pp. 75-90.
- J.M. Borkenhagen, R.J. Eickemeyer, R.N. Kalla, and S.R. Kunkel, "A multithreaded PowerPC processor for commercial servers," IBM J. Res. Dev., vol. 44, no. 6, 2000, pp. 885-898.
- goal is to cover latencies and improve overall throughput
- "simultaneous MT (SMT)"
- a.k.a. "horizontal multithreading"
- e.g., U. Wash. work, DEC 21464 (EV8) plans for 4-way SMT (Joel Emer of Compaq apparently claimed that EV8 would achieve an IPC speed up of 2x at a cost of 10%. --check for quote)
- e.g., Intel P4 "hyper-threading"
- multiple instructions per cycle, out-of-order core
- goal is to increase IPC
- implicit MT (IMT)
- compiler-based speculative threads with hardware support beyond regular SMP
- e.g., Manoj Franklin, Multiscalar
- e.g., T.N. Vijaykumar
- dynamic MT (DMT)
- hardware-based speculative threads
- e.g., Akkary & Driscoll, "A Dynamic Multithreading Processor"
- forking threads at call sites and after loops
- 31st Annual ACM/IEEE International Symposium on Microarchitecture, http://computer.org/proceedings/micro/8609/8609toc.htm
- http://www.ee.princeton.edu/~rblee/ELE572Papers/DynamicMultithreadingProc_akkary.pdf
- Antonio Gonzales, Speculative Multithreading (SpMT), forking threads for loop bodies with data value prediction
- S. Wallace, B. Calder, and D. M. Tullsen, "Threaded multiple path execution"
- H. Wang, P. Wang, R.D. Weldon, S.M. Ettinger, H. Saito, M. Girkar, S.S-W. Liao, amnd J. Shen, "Speculative Precomputation: Exploring the Use of Multithreading for Latency." Intel Technology Journal. http://developer.intel.com/technology/itj/2002/volume06issue01/ (Feb 2002).
thread-level speculation
http://citeseer.nj.nec.com/purser00study.html
http://citeseer.nj.nec.com/luk01tolerating.html
http://citeseer.nj.nec.com/collins01dynamic.html
http://citeseer.nj.nec.com/zilles01executionbased.html
http://citeseer.nj.nec.com/reinhardt00transient.html
http://citeseer.nj.nec.com/steffan00scalable.html
http://citeseer.nj.nec.com/85835.html
http://citeseer.nj.nec.com/sohi95multiscalar.html
http://citeseer.nj.nec.com/rauchwerger95lrpd.html
Patents
- US 6,493,820: Processor having multiple program counters and trace buffers outside an execution pipeline.
- US 6,463,522: Memory system for ordering load and store instructions in a processor that performs multithread execution.
- Fischer
- US 6,260,150: Foreground and background context controller setting processor to power saving mode when all contexts are inactive
- US 6,243,736: Context controller having status-based background functional task resource allocation capability and processor employing the same
- US 6,205,468: System for multitasking management employing context controller having event vector selection by priority encoding of context events
A partial timeline of multithreaded systems
1950s
- NBS SEAC (1950) and DYSEAC (1954) - two threads
- Lincoln Labs TX-2 (late 1950s) - up to 33 threads
- Bull Gamma 60 (late 1950s)
- Honeywell 800 (late 1950s)
1960s
- CDC 6600 (1964) - multithreading in peripheral processors
- IBM ACS-360 (1968) - successor design to IBM ACS-1 was planned to be 2-way multithreaded
A second instruction counter and a second set of registers were added to the simulator to make the ACS-360 into a multithreaded design. Instructions were tagged with an additional "red/blue" bit to designate the instruction stream and register set; and, as project members had expected, the utilization of the functional units was increased since more independent instructions were available.
(apparently this same scheme was later investigated for use on the S/370 model 195)
- Mike Flynn papers on multithreading (check Spring AFIPS, 1967; ONR 1969; IFIP 1971; IEEE Computer 1972; AFIPS 1972)
[see also: M. J. Flynn and A. Podvin, "Shared Resource Multiprocessing," IEEE Computer, Mar. 1972, pp. 20-28.]
1970s
- HEP (1978) - Burton Smith
- Xerox Alto (1979) - microtasking
1980s
- HEP-2 and HEP-3 designs (a.k.a. Vulture) - Denelcor closes in 1985
- Transputer (1985)
- Horizon (1988) - Burton Smith
- Stellar GS-1000 (1988) - four-way multithreading
- ... others ... MASA ...
1990s
- Tera MTA () - Burton Smith (impl. in GaAs)
- SMT work at Univ. of Washington
- Dean Tullsen, Susan Eggers, and Henry Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," ISCA 22, June, 1995.
- Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo, and Rebecca L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," ISCA 23, May, 1996.
From interview with Joel Emer:
"The challenge in processor design has been getting better single-threaded performance. I hadn't seriously considered multithreading since I finished my Ph.D. on that subject in the late '70s. Then I saw this research from the University of Washington [Dean Tullsen, Susan Eggers, Henry Levy, 1995] where they proposed doing simultaneous multithreading. I was really inspired by this idea of doing things simultaneously. I thought, 'Well that's compatible with the out-of-order design style that we're moving towards. And that gives us the opportunity to develop the best single-threaded machine we can, and do multithreading as well.'
We had this persistent problem of low utilization of individual components. That meant that there was a lot of opportunity to get more performance out of the machine. Their idea was 'Hey, just fill those opportunities with instructions from another program.' I was working with Rebecca Stamm at Digital Equipment Corporation and we worked through a way of incorporating this SMT idea into an out-of-order design. We began to collaborate with the University of Washington and studying the application of SMT in an out-of-order context.
... So in the first go around, Dean [Tullsen] wrote a simulator and came back and said, 'I ran the out-of-order SMT, and I got performance that's worse than if I didn't run multiple threads.' Then he noticed that he was having resource clogging problems. One of the threads would get a huge number of instructions in some part of the machine, and it would just bottleneck that part. None of the other threads could get any work done.
We had a conference call and out of that we realized that resource allocation, or choosing what to do, was the key to getting it to work. We proposed some schemes and then explored 'How do we choose which instructions should be executing now?'
The result was a second paper on SMT that Rebecca and I coauthored with the researchers from the University of Washington. And that's the seminal paper that explains how important it is to choose which instruction stream to fetch instructions from at any point in time."
- Michael Fischer - combination of foreground (preemptive) and background (fixed time slices)
- ... others ... April/Sparcle/Alewife [IEEE Micro, June 1993], *T, ... Cal Tech Mosaic T [myricom] ...
2000s
- Cray/Tera MTA-2 (CMOS)
- Intel Pentium 4 HT
- Intel Tech Journal special issue on hyper-threading technology, Feb. 2002
Other References
- Greg Byrd and Mark Holliday, "Multithreaded processor architectures," IEEE Spectrum, 32(8):38-46, August 1995.
- Theo Ungerer, Borut Robic, Jurij Silc, "A survey of processors with explicit multithreading," ACM Computing Surveys, Volume 35, Issue 1, March 2003.
see also "Multithreaded processors," The Computer Journal, Volume 45, Issue 3, 2002, pp. 320-348.
- Iannucci, Gao, Halstead, Smith, Multithreaded Computer Architecture: A Summary of the State of the Art, Kluwer Academic Publishers, 1994
- Jon Stokes, "Introduction to Multithreading, Superthreading and Hyperthreading," Ars Technica, October 2002.