  • Operating Systems: Three Easy Pieces --- Lock-based Concurrent Data Structures (Note)

    Before moving beyond locks, we will first describe how to use locks in some common data

    structures. Adding locks to a data structure to make it usable by threads makes the structure

    thread safe. Of course, exactly how such locks are added determines both the correctness

    and performance of the data structure. And thus, our challenge:

              CRUX: How To Add Locks To Data Structures

    When given a particular data structure, how should we add locks to it, in order to make it work

    correctly? Further, how do we add locks such that the data structure yields high performance,

    enabling many threads to access the structure at once, i.e., concurrently?

    Of course, we will be hard pressed to cover all data structures or all methods for adding 

    concurrency, as this is a topic that has been studied for years, with literally thousands of 

    research papers published about it. Thus, we hope to provide a sufficient introduction to the 

    type of thinking required, and refer you to some good sources of material for further inquiry

    on your own. We found Moir and Shavit's survey to be a great source of information.

                  Concurrent Counters

    One of the simplest data structures is a counter. It is a structure that is commonly used and has

    a simple interface. We define a simple nonconcurrent counter in Figure 29.1.
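
    Figure 29.1 is not reproduced in these notes. As a stand-in, a minimal sketch of such a
    counter in C might look like the following. The names (counter_t, counter_init, and so on)
    are chosen here for illustration and are not taken from the figure.

```c
#include <assert.h>
#include <stddef.h>

/* A plain counter with no synchronization.
 * It is NOT safe to share between threads: value++ is a
 * read-modify-write sequence that concurrent callers can interleave. */
typedef struct {
    int value;
} counter_t;

void counter_init(counter_t *c)      { assert(c != NULL); c->value = 0; }
void counter_increment(counter_t *c) { assert(c != NULL); c->value++; }
void counter_decrement(counter_t *c) { assert(c != NULL); c->value--; }
int  counter_get(const counter_t *c) { assert(c != NULL); return c->value; }
```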

                  Simple But Not Scalable

    As you can see, the non-synchronized counter is a trivial data structure, requiring a tiny amount

    of code to implement. We now have our next challenge: how can we make this code thread safe?

    Figure 29.2 shows how we do so.

    This concurrent counter is simple and works correctly. In fact, it follows a design pattern

    common to the simplest and most basic concurrent data structures: it simply adds a single 

    lock, which is acquired when calling a routine that manipulates the data structure, and is

    released when returning from the call. In this manner, it is similar to a data structure built

    with monitors, where locks are acquired and released automatically as you call and return

    from object methods.
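
    The single-lock pattern described above can be sketched with POSIX threads as follows. This
    is an illustrative version under assumed names (locked_counter_t and its functions), not a
    copy of Figure 29.2. Every routine takes the one lock on entry and drops it before returning.

```c
#include <assert.h>
#include <pthread.h>

/* Counter protected by a single mutex. */
typedef struct {
    int             value;
    pthread_mutex_t lock;   /* protects value */
} locked_counter_t;

void locked_counter_init(locked_counter_t *c) {
    c->value = 0;
    int rc = pthread_mutex_init(&c->lock, NULL);
    assert(rc == 0);        /* initialization is not expected to fail here */
    (void)rc;
}

void locked_counter_increment(locked_counter_t *c) {
    pthread_mutex_lock(&c->lock);     /* acquire on entry */
    c->value++;
    pthread_mutex_unlock(&c->lock);   /* release before returning */
}

void locked_counter_decrement(locked_counter_t *c) {
    pthread_mutex_lock(&c->lock);
    c->value--;
    pthread_mutex_unlock(&c->lock);
}

int locked_counter_get(locked_counter_t *c) {
    pthread_mutex_lock(&c->lock);
    int v = c->value;                 /* copy while holding the lock */
    pthread_mutex_unlock(&c->lock);
    return v;
}
```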

    At this point, you have a working concurrent data structure. The problem you might have is

    performance. If your data structure is too slow, you will have to do more than just add a single

    lock; such optimizations, if needed, are thus the topic of the rest of the chapter. Note that if the

    data structure is not too slow, you are done! No need to do something fancy if something simple

    will work.

    To understand the performance costs of the simple approach, we run a benchmark in which

    each thread updates a single shared counter a fixed number of times; we then vary the number

    of threads. Figure 29.3 shows the total time taken, with one to four threads active; each

    thread updates the counter one million times. This experiment was run upon an iMac with four

    Intel 2.7GHz i5 CPUs; with more CPUs active, we hope to get more total work done per unit 

    time.
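
    A benchmark of this shape could be sketched as below. This is a hedged reconstruction of the
    experiment, not the harness used for the figure: run_benchmark, bench_worker, and MAX_THREADS
    are hypothetical names, and timing uses the POSIX clock_gettime call with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64   /* arbitrary upper bound for this sketch */

static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
static long bench_counter = 0;       /* the single shared counter */

/* Each worker increments the shared counter a fixed number of times. */
static void *bench_worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&bench_lock);
        bench_counter++;
        pthread_mutex_unlock(&bench_lock);
    }
    return NULL;
}

/* Runs nthreads workers, each doing `iterations` increments.
 * Returns elapsed wall-clock seconds; stores the final count in *final_count. */
double run_benchmark(int nthreads, long iterations, long *final_count) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    struct timespec start, end;

    bench_counter = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, bench_worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *final_count = bench_counter;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

    Calling run_benchmark(n, 1000000, &total) for n from 1 to 4 and printing the returned time
    gives one data point per thread count. Absolute timings depend heavily on the machine.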

    From the top line in the figure (labeled precise), you can see that the performance of the

    synchronized counter scales poorly. Whereas a single thread can complete the million counter

    updates in a tiny amount of time (roughly 0.03 seconds), having two threads each update

    the counter one million times concurrently leads to a massive slowdown (taking over 5

    seconds!). It only gets worse with more threads.

    Ideally, you would like to see the threads complete just as quickly on multiple processors as the

    single thread does on one. Achieving this end is called perfect scaling; even though more work

    is done, it is done in parallel, and hence the time taken to complete the task is not increased.

                        Scalable Counting

    Amazingly, researchers have studied how to build more scalable counters for years. Even more

    amazing is the fact that scalable counters matter, as recent work in operating system 

    performance analysis has shown; without scalable counting, some workloads running on Linux

    suffer from serious scalability problems on multicore machines.

    Though many techniques have been developed to attack this problem, we will now describe one

    particular approach. The idea, introduced in recent research, is known as a sloppy counter.

    The sloppy counter works by representing a single logical counter with numerous local physical

    counters, one per CPU core, as well as a single global counter. Specifically, on a machine with

    four CPUs, there are four local counters and one global one. In addition to these counters, 

    there are also locks: one for each local counter, and one for the global counter.

    The basic idea of sloppy counting is as follows. When a thread running on a given core wishes

    to increment the counter, it increments its local counter; access to this local counter is 

    synchronized via the corresponding local lock. Because each CPU has its own local counter, 

    threads across CPUs can update local counters without contention, and thus counter updates

    are scalable.

    However, to keep the global counter up to date (in case a thread wishes to read its value), the

    local values are periodically transferred to the global counter, by acquiring the global lock and

    incrementing it by the local counter's value; the local counter is then reset to zero.

    How often this local-to-global transfer occurs is determined by a threshold, which we call S here

    (for sloppiness). The smaller S is, the more the counter behaves like the non-scalable counter

    above; the bigger S is, the more scalable the counter, but the further off the global value

    might be from the actual count. One could simply acquire all the local locks and the global lock

    (in a specified order, to avoid deadlock) to get an exact value, but that is not scalable.
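
    The scheme described so far (per-CPU local counters, a global counter, one lock each, and a
    threshold S) can be sketched in C as follows. This is a rough illustration under assumptions:
    NUMCPUS is fixed at 4 to match the example machine, the caller passes a cpu_hint (for
    instance a thread ID) that is mapped onto a local slot, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUMCPUS 4   /* assumed core count, matching the four-CPU example */

typedef struct {
    int             global;           /* global count */
    pthread_mutex_t glock;            /* global lock */
    int             local[NUMCPUS];   /* per-CPU counts */
    pthread_mutex_t llock[NUMCPUS];   /* per-CPU locks */
    int             threshold;        /* S: local-to-global transfer point */
} sloppy_counter_t;

void sloppy_init(sloppy_counter_t *c, int threshold) {
    assert(threshold > 0);
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    for (int i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

/* Add `amount` to the caller's local counter; once the local value
 * reaches the threshold, move it into the global counter and reset it. */
void sloppy_update(sloppy_counter_t *c, unsigned int cpu_hint, int amount) {
    unsigned int cpu = cpu_hint % NUMCPUS;
    pthread_mutex_lock(&c->llock[cpu]);
    c->local[cpu] += amount;
    if (c->local[cpu] >= c->threshold) {
        pthread_mutex_lock(&c->glock);
        c->global += c->local[cpu];
        pthread_mutex_unlock(&c->glock);
        c->local[cpu] = 0;
    }
    pthread_mutex_unlock(&c->llock[cpu]);
}

/* Approximate read: takes only the global lock, so it may lag. */
int sloppy_get(sloppy_counter_t *c) {
    pthread_mutex_lock(&c->glock);
    int v = c->global;
    pthread_mutex_unlock(&c->glock);
    return v;
}

/* Exact read: takes every lock in a fixed order (locals 0..N-1, then
 * global) to avoid deadlock. Correct, but it serializes all updaters. */
int sloppy_get_exact(sloppy_counter_t *c) {
    for (int i = 0; i < NUMCPUS; i++)
        pthread_mutex_lock(&c->llock[i]);
    pthread_mutex_lock(&c->glock);
    int v = c->global;
    for (int i = 0; i < NUMCPUS; i++)
        v += c->local[i];
    pthread_mutex_unlock(&c->glock);
    for (int i = NUMCPUS - 1; i >= 0; i--)
        pthread_mutex_unlock(&c->llock[i]);
    return v;
}
```

    Note that sloppy_update always takes a local lock before the global lock, which is the same
    order sloppy_get_exact uses, so the two routines cannot deadlock against each other.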

    To make this clear, let's look at an example. In this example, the threshold S is set to 5, and

    there are threads on each of four CPUs updating their local counters L1, L2, L3, and L4. The

    global counter value (G) is also shown in the trace, with time increasing downward. At each

    time step, a local counter may be incremented; if the local value reaches the threshold S, the

    local value is transferred to the global counter and the local counter is reset.
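
    The trace figure itself is not included in these notes. To generate a comparable trace, one
    could use a small single-threaded simulation like the sketch below. It involves no locks or
    real threads; trace_state_t, trace_step, and the bump[] schedule are all hypothetical, and
    any output reflects whatever schedule you feed it rather than the original figure.

```c
#include <assert.h>
#include <stdio.h>

#define TRACE_CPUS 4

typedef struct {
    int local[TRACE_CPUS];   /* L1..L4 */
    int global;              /* G */
    int threshold;           /* S */
} trace_state_t;

void trace_init(trace_state_t *t, int threshold) {
    assert(threshold > 0);
    for (int i = 0; i < TRACE_CPUS; i++)
        t->local[i] = 0;
    t->global = 0;
    t->threshold = threshold;
}

/* One time step: every CPU i with bump[i] != 0 increments its local
 * counter; a local counter that reaches S is added to G and reset. */
void trace_step(trace_state_t *t, const int bump[TRACE_CPUS]) {
    for (int i = 0; i < TRACE_CPUS; i++) {
        if (!bump[i])
            continue;
        t->local[i]++;
        if (t->local[i] == t->threshold) {
            t->global += t->local[i];
            t->local[i] = 0;
        }
    }
}

/* Print one row of the trace: time, L1..L4, then G. */
void trace_print(const trace_state_t *t, int time) {
    printf("%4d  %2d %2d %2d %2d  %3d\n", time,
           t->local[0], t->local[1], t->local[2], t->local[3], t->global);
}
```

    Calling trace_step once per time step and trace_print after each call produces a table with
    time increasing downward, in the same layout the paragraph above describes.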

    The lower line in Figure 29.3 (labeled sloppy) shows the performance of sloppy

    counters with a threshold S of 1024. Performance is excellent; the time taken to update the

    counter four million times on four processors is hardly higher than the time taken to update it

    one million times on one processor.

    Figure 29.6 shows the importance of threshold value S, with four threads each incrementing

    the counter 1 million times on four CPUs. If S is low, performance is poor (but the global

    count is always quite accurate); if S is high, performance is excellent, but the global count

    lags (by at most the number of CPUs multiplied by S). This accuracy/performance trade-off is what

    sloppy counters enable.

    A rough version of such a sloppy counter is found in Figure 29.5. Read it, or better yet,

    run it yourself in some experiments to better understand how it works.

  • Original article: https://www.cnblogs.com/miaoyong/p/4932054.html