  • Architecture Notes (1): Fundamentals of Computer Design

     

    References:

    1. John L. Hennessy, David A. Patterson. Computer Architecture: A Quantitative Approach, 3rd ed. China Machine Press. 2005

    2. David A. Patterson, John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, 3rd ed. Elsevier. 2005

    3. Douglas E. Comer. Network Systems Design Using Network Processors. Publishing House of Electronics Industry. 2004

    Chapter 1: Fundamentals of Computer Design

    1, Benchmark Suites

    Desktop benchmarks: SPEC CPU 2000 (www.spec.org), Business Winstone, CC Winstone, Winbench

    Server benchmarks: TPC (www.tpc.org)

    Embedded benchmarks: EEMBC (www.eembc.org)

    Comparing and summarizing performance: arithmetic mean, weighted arithmetic mean, geometric mean.
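The three summary statistics listed above can be sketched in a few lines. This is a minimal illustration, not a benchmark tool: the function names, the sample times, and the weights are all made up for the example. The arithmetic means are applied here to execution times and the geometric mean to ratios against a reference machine, which is one common way to use them.

```python
import math

def arithmetic_mean(times):
    """Plain average of execution times (every program counts equally)."""
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    """Average of execution times, weighted by how often each program runs."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical data: two programs, times in seconds.
times = [2.0, 8.0]
weights = [0.75, 0.25]   # assumed workload mix
ratios = [2.0, 8.0]      # assumed times normalized to a reference machine

print(arithmetic_mean(times))
print(weighted_arithmetic_mean(times, weights))
print(geometric_mean(ratios))
```

The weighted mean moves toward the program that dominates the assumed workload, while the geometric mean is less sensitive to a single large ratio than the arithmetic mean.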

    2, Amdahl's law

    "Overall system speed is governed by the slowest component." This statement is attributed to Gene Amdahl, chief architect of IBM's first mainframe series and founder of Amdahl Corporation and other companies. Applied to networking, Amdahl's law says that the slowest device in the network determines the maximum speed of the network.

    Amdahl's law, named after computer architect Gene Amdahl, gives the maximum expected improvement to an overall system when only part of the system is improved. It is a demonstration of the law of diminishing returns: even if one part of a computer is sped up a hundred-fold or more, when that improvement affects only 12% of the overall task, the overall speedup can be at most 1/(1 − 0.12) ≈ 1.136 times.

    More technically, the law is concerned with the speedup achievable from an improvement that affects a proportion P of a computation, where the improved portion itself runs S times faster. For example, if an improvement can speed up 30% of the computation, P is 0.3; if the improvement makes the affected portion twice as fast, S is 2. Amdahl's law states that the overall speedup from applying the improvement is

    1/((1-P)+P/S).

    To see how this formula was derived, assume that the running time of the old computation was 1, for some unit of time. The running time of the new computation will be the length of time the unimproved fraction takes (which is 1 − P) plus the length of time the improved fraction takes. The length of time for the improved part of the computation is the length of the improved part's former running time divided by the speedup, making the length of time of the improved part P/S. The final speedup is computed by dividing the old running time by the new running time, which is what the above formula does.
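The derivation above translates directly into code. The sketch below is only an illustration: the function name `amdahl_speedup` is an assumption, and its parameters `p` and `s` stand for P and S from the text.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work runs s times faster.

    Old running time is normalized to 1; new running time is the
    unimproved part (1 - p) plus the improved part (p / s).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    new_time = (1.0 - p) + p / s
    return 1.0 / new_time

# The example from the text: 30% of the computation made twice as fast.
print(amdahl_speedup(0.3, 2))

# The 12% case: even a 100-fold improvement stays below 1 / (1 - 0.12).
print(amdahl_speedup(0.12, 100), 1 / (1 - 0.12))
```

With P = 0.3 and S = 2 the new running time is 0.7 + 0.15 = 0.85, so the overall speedup is 1/0.85, roughly 1.18.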

    In the special case of parallelization, Amdahl's law states that if F is the fraction of a calculation that is sequential (i.e. cannot benefit from parallelisation), and (1 − F) is the fraction that can be parallelised, then the maximum speedup that can be achieved by using N processors is

    1/(F + (1 − F)/N).

    In the limit, as N tends to infinity, the maximum speedup tends to 1/F. In practice, the performance gained per additional processor falls rapidly once (1 − F)/N is small compared to F.

    As an example, if F is only 10%, the problem can be sped up by at most a factor of 10, no matter how large N is. For this reason, parallel computing is useful only for small numbers of processors or for problems with very low values of F: so-called embarrassingly parallel problems. A great part of the craft of parallel programming consists of attempting to reduce F to the smallest possible value.
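To see the 1/F ceiling numerically, the parallel form of the law can be evaluated for a growing number of processors. This is a small sketch; the function name `parallel_speedup` and the chosen processor counts are assumptions made for the example, with `f` and `n` standing for F and N.

```python
def parallel_speedup(f, n):
    """Maximum speedup on n processors when a fraction f is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (f + (1.0 - f) / n)

f = 0.10  # 10% of the calculation is sequential, as in the example above
for n in (1, 2, 10, 100, 1000, 1_000_000):
    print(f"N = {n:>9}: speedup = {parallel_speedup(f, n):.3f}")

print(f"limit 1/F = {1 / f:.1f}")
```

With F = 0.1, ten processors already give a speedup of about 5.3, and going from 1,000 to 1,000,000 processors barely changes the result, because the sequential 10% dominates the running time.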

    3, CPU time, IC and CPI

    CPU time = CPU clock cycles for a program × Clock cycle time = CPU clock cycles for a program / Clock rate

    IC (instruction count): the number of instructions executed by the program

    CPI (clock cycles per instruction) = CPU clock cycles for a program / Instruction count, so CPU time = IC × CPI × Clock cycle time
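Combining the definitions above, the clock cycles for a program equal IC × CPI, so CPU time follows from three numbers. The sketch below uses a hypothetical function name and made-up sample values purely for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = (IC * CPI) / clock rate."""
    clock_cycles = instruction_count * cpi   # CPU clock cycles for the program
    return clock_cycles / clock_rate_hz      # same as cycles * clock cycle time

# Hypothetical program: 1e9 instructions, CPI of 2.0, 1 GHz clock.
print(cpu_time(1_000_000_000, 2.0, 1_000_000_000))  # 2.0 seconds

# Halving CPI at the same IC and clock rate halves the CPU time.
print(cpu_time(1_000_000_000, 1.0, 1_000_000_000))  # 1.0 seconds
```

The formula makes the three levers explicit: a change helps only if it lowers the product of instruction count, CPI, and clock cycle time.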

    4, Principles of Locality

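    Principle of locality: programs tend to reuse data and instructions they have used recently. A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of the code.

    There are two kinds of locality. Temporal locality: recently accessed items are likely to be accessed again in the near future. Spatial locality: items whose addresses are near one another tend to be referenced close together in time.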
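One way to make the two kinds of locality concrete is to count hits in a toy cache. The sketch below simulates a direct-mapped cache; the cache geometry (64 lines of 8 words), the function name `hit_rate`, and the three access patterns are all assumptions chosen for illustration, not a model of any real machine.

```python
def hit_rate(addresses, num_lines=64, block_size=8):
    """Fraction of accesses that hit in a toy direct-mapped cache.

    Each cache line remembers which memory block it currently holds.
    """
    lines = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // block_size      # which memory block the word is in
        index = block % num_lines       # which cache line that block maps to
        if lines[index] == block:
            hits += 1                   # block already cached: hit
        else:
            lines[index] = block        # miss: load the block, evict the old one
    return hits / len(addresses)

sequential = list(range(4096))          # neighbours in order: spatial locality
strided = list(range(0, 4096 * 8, 8))   # one word per block, never reused
hot_loop = list(range(16)) * 256        # same 16 words again and again: temporal locality

print("sequential:", hit_rate(sequential))
print("strided:   ", hit_rate(strided))
print("hot loop:  ", hit_rate(hot_loop))
```

Under these assumptions the sequential scan hits on 7 of every 8 accesses, the strided scan never hits, and the hot loop misses only on its first touch of each block.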

    5, Others

    Hand-coded assembly can be almost 10 times faster than compiler-generated code from a high-level language.

    Peak performance can be almost 10 times higher than observed performance.

    Tuning parameters can affect the results of a benchmark test.

    Computer architecture is a kind of art.

    All rights reserved; reposting is welcome.
  • Original post: https://www.cnblogs.com/xiaotie/p/231195.html