    Tools/Plugins -- CACTI: A Cache/Memory Analysis Tool

    @(Tools/Plugins)

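    I recently came across a tool that can estimate DRAM access power. It is useful for designs that need to analyze the access power and latency of off-chip memory (DRAM), for example deep learning accelerators.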

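    1. Introduction

    CACTI is an analytical tool that takes a set of cache/memory parameters as input and computes the access time, power, cycle time, and area. The current release is version 7.0, and it supports analysis of the following memory types: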

    • direct mapped caches
    • set-associative caches
    • fully associative caches
    • Embedded DRAM memories
    • Commodity DRAM memories

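    It also provides the following features:

    • Support for multi-ported uniform cache access (UCA) and multi-banked, multi-ported non-uniform cache access (NUCA).

    • Leakage power calculation that takes the operating temperature into account.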

    • Router power model.

    • Interconnect model with different delay, power, and area properties including low-swing wire model.

    • An interface to perform trade-off analysis involving power, delay, area, and bandwidth.

    • All process specific values used by the tool are obtained from ITRS and currently, the tool supports 90nm, 65nm, 45nm, and 32nm technology nodes.

    • Chip IO model to calculate latency and energy for DDR bus. Users can model different loads (fan-outs) and evaluate the impact on frequency and energy. This model can be used to study LR-DIMMs, R-DIMMs, etc.

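    2. Usage

    Source code: https://github.com/HewlettPackard/cacti
    Technical report: http://www.hpl.hp.com/techreports/2013/HPL-2013-79.pdf

    I could not get it to build on Windows (the C++ toolchain there lacks pthread, and I did not find a simple workaround), so I tested it on CentOS instead. The basic usage is as follows: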

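    1. Download the C++ source from the repository above and copy it to the CentOS machine.
    2. Enter the source directory and run make on the command line.
    3. Once the executable named cacti has been built, run
      ./cacti -infile ***.cfg
      The .cfg file describes the memory being modeled and should be edited to match the DRAM you are using. Here I simply ran one of the bundled sample configuration files: ./cacti -infile sample_config_files/ddr3_cache.cfg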
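
The invocation in step 3 could be wrapped in a small script when many configurations need to be evaluated. The following is a minimal sketch, assuming Python 3 and a `cacti` binary already built in the source directory. The helper names `cacti_command` and `run_cacti` are hypothetical, and only the `-infile` option shown above is used.

```python
import subprocess
from pathlib import Path


def cacti_command(config, binary="./cacti"):
    """Build the argument list for `./cacti -infile <config>`."""
    return [binary, "-infile", config]


def run_cacti(config, source_dir="."):
    """Run CACTI inside source_dir and return whatever it printed to stdout.

    check=False is deliberate: a non-zero exit status does not raise, so
    the report text is still returned if the process ends abnormally
    after printing it.
    """
    result = subprocess.run(
        cacti_command(config),
        cwd=source_dir,          # relative paths are resolved from here
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    sample = "sample_config_files/ddr3_cache.cfg"
    print(" ".join(cacti_command(sample)))
    if Path("cacti").is_file():  # only run when the binary has been built
        print(run_cacti(sample))
```

Capturing stdout as a string makes it straightforward to save one report per configuration file or to post-process the numbers.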

    The run prints a detailed analysis report, reproduced here:

    Cache size                    : 8388608
    Block size                    : 64
    Associativity                 : 8
    Read only ports               : 0
    Write only ports              : 0
    Read write ports              : 1
    Single ended read ports       : 0
    Cache banks (UCA)             : 1
    Technology                    : 0.022
    Temperature                   : 360
    Tag size                      : 42
    array type                    : Cache
    Model as memory               : 0
    Model as 3D memory       	 : 0
    Access mode                   : 0
    Data array cell type          : 0
    Data array peripheral type    : 0
    Tag array cell type           : 0
    Tag array peripheral type     : 0
    Optimization target           : 2
    Design objective (UCA wt)     : 0 0 0 100 0
    Design objective (UCA dev)    : 20 100000 100000 100000 100000
    Cache model                   : 0
    Nuca bank                     : 0
    Wire inside mat               : 1
    Wire outside mat              : 1
    Interconnect projection       : 1
    Wire signaling               : 1
    Print level                   : 1
    ECC overhead                  : 1
    Page size                     : 8192
    Burst length                  : 8
    Internal prefetch width       : 8
    Force cache config            : 0
    Subarray Driver direction       : 1
    iostate                       : READ
    dram_ecc                      : NO_ECC
    io_type                     : DDR3
    dram_dimm                      : UDIMM
    IO Area (sq.mm) = inf
    IO Timing Margin (ps) = 35.8333
    IO Votlage Margin (V) = 0.155
    IO Dynamic Power (mW) = 1282.42 PHY Power (mW) = 232.752 PHY Wakeup Time (us) = 27.503
    IO Termination and Bias Power (mW) = 3136.7
    
    ---------- CACTI (version 7.0.3DD Prerelease of Aug, 2012), Uniform Cache Access SRAM Model ----------
    
    Cache Parameters:
        Total cache size (bytes): 8388608
        Number of banks: 1
        Associativity: 8
        Block size (bytes): 64
        Read/write Ports: 1
        Read ports: 0
        Write ports: 0
        Technology size (nm): 22
    
        Access time (ns): 3.03414
        Cycle time (ns):  1.84197
        Total dynamic read energy per access (nJ): 0.381869
        Total dynamic write energy per access (nJ): 0.446873
        Total leakage power of a bank (mW): 2520.29
        Total gate leakage power of a bank (mW): 4.71441
        Cache height x width (mm): 3.07383 x 2.89775
    
        Best Ndwl : 8
        Best Ndbl : 8
        Best Nspd : 2
        Best Ndcm : 1
        Best Ndsam L1 : 8
        Best Ndsam L2 : 1
    
        Best Ntwl : 16
        Best Ntbl : 8
        Best Ntspd : 8
        Best Ntcm : 1
        Best Ntsam L1 : 8
        Best Ntsam L2 : 2
        Data array, H-tree wire type: Global wires with 30% delay penalty
        Tag array, H-tree wire type: Global wires with 30% delay penalty
    
    Time Components:
    
      Data side (with Output driver) (ns): 3.03414
    	H-tree input delay (ns): 0.860695
    	Decoder + wordline delay (ns): 0.607741
    	Bitline delay (ns): 0.473783
    	Sense Amplifier delay (ns): 0.00189739
    	H-tree output delay (ns): 1.09002
    
      Tag side (with Output driver) (ns): 0.866708
    	H-tree input delay (ns): 0.250295
    	Decoder + wordline delay (ns): 0.0962495
    	Bitline delay (ns): 0.078
    	Sense Amplifier delay (ns): 0.00189739
    	Comparator delay (ns): 0.0162774
    	H-tree output delay (ns): 0.440265
    
    
    Power Components:
    
      Data array: Total dynamic read energy/access  (nJ): 0.360657
    	Total energy in H-tree (that includes both address and data transfer) (nJ): 0.270396
    	Output Htree inside bank Energy (nJ): 0.263979
    	Decoder (nJ): 0.000237668
    	Wordline (nJ): 0.000275334
    	Bitline mux & associated drivers (nJ): 0
    	Sense amp mux & associated drivers (nJ): 0
    	Bitlines precharge and equalization circuit (nJ): 0.00163006
    	Bitlines (nJ): 0.0612354
    	Sense amplifier energy (nJ): 0.0018371
    	Sub-array output driver (nJ): 0.0249178
    	Total leakage power of a bank (mW): 2357.99
    	Total leakage power in H-tree (that includes both address and data network) ((mW)): 18.9776
    	Total leakage power in cells (mW): 0
    	Total leakage power in row logic(mW): 0
    	Total leakage power in column logic(mW): 0
    	Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.0916133
    
      Tag array:  Total dynamic read energy/access (nJ): 0.0212128
    	Total leakage read/write power of a bank (mW): 162.298
    	Total energy in H-tree (that includes both address and data transfer) (nJ): 0.00268136
    	Output Htree inside a bank Energy (nJ): 0.00104879
    	Decoder (nJ): 0.000585105
    	Wordline (nJ): 0.000356972
    	Bitline mux & associated drivers (nJ): 0
    	Sense amp mux & associated drivers (nJ): 0.000288214
    	Bitlines precharge and equalization circuit (nJ): 0.00153419
    	Bitlines (nJ): 0.0132631
    	Sense amplifier energy (nJ): 0.00155643
    	Sub-array output driver (nJ): 8.13397e-05
    	Total leakage power of a bank (mW): 162.298
    	Total leakage power in H-tree (that includes both address and data network) ((mW)): 0.23223
    	Total leakage power in cells (mW): 0
    	Total leakage power in row logic(mW): 0
    	Total leakage power in column logic(mW): 0
    	Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.00146699
    
    
    Area Components:
    
      Data array: Area (mm2): 7.28836
    	Height (mm): 3.07383
    	Width (mm): 2.3711
    	Area efficiency (Memory cell area/Total area) - 73.1983 %
    		MAT Height (mm): 0.716448
    		MAT Length (mm): 0.540768
    		Subarray Height (mm): 0.328909
    		Subarray Length (mm): 0.26532
    
      Tag array: Area (mm2): 0.377107
    	Height (mm): 0.716051
    	Width (mm): 0.526648
    	Area efficiency (Memory cell area/Total area) - 74.9106 %
    		MAT Height (mm): 0.173381
    		MAT Length (mm): 0.063873
    		Subarray Height (mm): 0.0822272
    		Subarray Length (mm): 0.027995
    
    Wire Properties:
    
      Delay Optimal
    	Repeater size - 42.0297 
    	Repeater spacing - 0.0329013 (mm) 
    	Delay - 0.216837 (ns/mm) 
    	PowerD - 0.000279845 (nJ/mm) 
    	PowerL - 0.0215298 (mW/mm) 
    	PowerLgate - 9.15623e-05 (mW/mm)
    	Wire width - 0.022 microns
    	Wire spacing - 0.022 microns
    
      5% Overhead
    	Repeater size - 17.0297 
    	Repeater spacing - 0.0329013 (mm) 
    	Delay - 0.226875 (ns/mm) 
    	PowerD - 0.0001818 (nJ/mm) 
    	PowerL - 0.00872349 (mW/mm) 
    	PowerLgate - 3.70994e-05 (mW/mm)
    	Wire width - 0.022 microns
    	Wire spacing - 0.022 microns
    
      10% Overhead
    	Repeater size - 15.0297 
    	Repeater spacing - 0.0329013 (mm) 
    	Delay - 0.235988 (ns/mm) 
    	PowerD - 0.000174237 (nJ/mm) 
    	PowerL - 0.00769899 (mW/mm) 
    	PowerLgate - 3.27424e-05 (mW/mm)
    	Wire width - 0.022 microns
    	Wire spacing - 0.022 microns
    
      20% Overhead
    	Repeater size - 12.0297 
    	Repeater spacing - 0.0329013 (mm) 
    	Delay - 0.257722 (ns/mm) 
    	PowerD - 0.00016297 (nJ/mm) 
    	PowerL - 0.00616223 (mW/mm) 
    	PowerLgate - 2.62069e-05 (mW/mm)
    	Wire width - 0.022 microns
    	Wire spacing - 0.022 microns
    
      30% Overhead
    	Repeater size - 10.0297 
    	Repeater spacing - 0.0329013 (mm) 
    	Delay - 0.28134 (ns/mm) 
    	PowerD - 0.000155511 (nJ/mm) 
    	PowerL - 0.00513773 (mW/mm) 
    	PowerLgate - 2.18498e-05 (mW/mm)
    	Wire width - 0.022 microns
    	Wire spacing - 0.022 microns
    
      Low-swing wire (1 mm) - Note: Unlike repeated wires, 
    	delay and power values of low-swing wires do not
    	have a linear relationship with length. 
    	delay - 0.0902442 (ns) 
    	powerD - 2.8399e-06 (nJ) 
    	PowerL - 1.71796e-07 (mW) 
    	PowerLgate - 1.29017e-09 (mW)
    	Wire width - 4.4e-08 microns
    	Wire spacing - 4.4e-08 microns
    
    
    Segmentation fault
    
    

    In this report, the following lines

    Cache Parameters:
        Total dynamic read energy per access (nJ): 0.381869
        Total dynamic write energy per access (nJ): 0.446873
    

    give the dynamic energy of a single read access and a single write access.
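
To turn these per-access figures into an estimate for a whole workload, the values can be pulled out of the report text and multiplied by access counts. The sketch below is illustrative only: the helper names `find_value` and `workload_energy_nj` and the access counts are hypothetical, and the excerpt is copied from the report above. It also checks that the data-array and tag-array read energies from the "Power Components" section add up to the reported total.

```python
import re

# Lines copied from the report above.
REPORT_EXCERPT = """\
    Total dynamic read energy per access (nJ): 0.381869
    Total dynamic write energy per access (nJ): 0.446873
  Data array: Total dynamic read energy/access  (nJ): 0.360657
  Tag array:  Total dynamic read energy/access (nJ): 0.0212128
"""


def find_value(report, label):
    """Return the number that follows `label:` in the report.

    Whitespace inside the label is matched loosely because the report
    does not use consistent spacing.
    """
    words = [re.escape(word) for word in label.split()]
    pattern = r"\s+".join(words) + r"\s*:\s*([-+0-9.eE]+)"
    match = re.search(pattern, report)
    if match is None:
        raise KeyError(label)
    return float(match.group(1))


def workload_energy_nj(report, reads, writes):
    """Dynamic energy in nJ for a given number of read and write accesses."""
    e_read = find_value(report, "Total dynamic read energy per access (nJ)")
    e_write = find_value(report, "Total dynamic write energy per access (nJ)")
    return reads * e_read + writes * e_write


if __name__ == "__main__":
    # Hypothetical access counts, chosen only for illustration.
    total_nj = workload_energy_nj(REPORT_EXCERPT, reads=1_000_000, writes=250_000)
    print(f"dynamic energy: {total_nj / 1e6:.4f} mJ")

    # Cross-check: data array + tag array should match the reported total.
    data = find_value(REPORT_EXCERPT, "Data array: Total dynamic read energy/access (nJ)")
    tag = find_value(REPORT_EXCERPT, "Tag array: Total dynamic read energy/access (nJ)")
    total = find_value(REPORT_EXCERPT, "Total dynamic read energy per access (nJ)")
    print(f"data + tag = {data + tag:.6f} nJ, reported total = {total:.6f} nJ")
```

This covers dynamic energy only. Leakage is reported separately as power in mW, so it would have to be multiplied by the run time to be compared on the same footing.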

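    For the meaning of each entry in the configuration file, see the technical report linked above. I plan to look into it further when I have time.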

  • Original article: https://www.cnblogs.com/lyc-seu/p/12934186.html