  • HLS Coding Style: Arrays and Data Types

     

    Data Types for Efficient Hardware

    Native C data types all fall on 8-bit boundaries (8, 16, 32, or 64 bits), whereas RTL buses
    (that is, the hardware) can have arbitrary widths.
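
    The gap between byte-aligned C types and arbitrary-width buses can be illustrated in plain
    C. The sketch below is only an illustration and is not Vivado HLS code: it emulates a 12-bit
    unsigned value inside a 16-bit native type by masking, which is roughly the wrap-around
    behavior an arbitrary precision type (such as the int32 type from ap_cint.h used later in
    this section) would give without carrying the four unused bits. The names U12_BITS,
    U12_MASK, wrap_u12, and add_u12 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical example width: a 12-bit bus has no matching native C type. */
#define U12_BITS 12u
#define U12_MASK ((1u << U12_BITS) - 1u)

/* Keep only the low 12 bits, as a 12-bit register would. */
static uint16_t wrap_u12(uint32_t v)
{
    return (uint16_t)(v & U12_MASK);
}

/* 12-bit addition carried in a 16-bit container; the upper 4 bits stay zero. */
uint16_t add_u12(uint16_t a, uint16_t b)
{
    return wrap_u12((uint32_t)a + (uint32_t)b);
}
```

    In hardware, those four spare bits of the 16-bit container would cost registers and routing
    for no benefit, which is the motivation for sizing types to the data.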

    Arrays

    If you specify a very large array, it might cause C simulation to run out of memory and fail,
    as shown in the following example:

    #include "ap_cint.h"
    int i, acc;
    // Use an arbitrary precision type
    int32 la0[10000000], la1[10000000];
    for (i = 0; i < 10000000; i++) {
        acc = acc + la0[i] + la1[i];
    }

    A solution is to use dynamic memory allocation for simulation but a fixed-size array for
    synthesis, as shown in the next example. The memory required for simulation is then
    allocated on the heap, which is managed by the OS and can use local disk space to grow.

    #include <stdlib.h>   // malloc, used only in the simulation branch
    #include "ap_cint.h"
    int i, acc;
    #ifdef __SYNTHESIS__
    // Use an arbitrary precision type & array for synthesis
    int32 la0[10000000], la1[10000000];
    #else
    // Use an arbitrary precision type & dynamic memory for simulation
    int32 *la0 = malloc(10000000 * sizeof(int32));
    int32 *la1 = malloc(10000000 * sizeof(int32));
    #endif
    for (i = 0; i < 10000000; i++) {
        acc = acc + la0[i] + la1[i];
    }
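
    The fragment above omits the housekeeping a simulation build needs. As a minimal sketch in
    standard C, assuming int32_t as a stand-in for the int32 type from ap_cint.h and a
    hypothetical function name sum_heap_arrays, the simulation branch might check the
    allocations and release the memory like this:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sums two heap-allocated arrays of n elements each.
 * Returns the sum, or -1 if either allocation fails.
 * int32_t stands in for the int32 type from ap_cint.h (assumption). */
int64_t sum_heap_arrays(size_t n)
{
    int32_t *la0 = malloc(n * sizeof *la0);
    int32_t *la1 = malloc(n * sizeof *la1);
    int64_t acc = 0;
    size_t i;

    if (la0 == NULL || la1 == NULL) {
        free(la0);   /* free(NULL) is a no-op */
        free(la1);
        return -1;
    }

    /* Fill with known test data so the result is predictable. */
    for (i = 0; i < n; i++) {
        la0[i] = 1;
        la1[i] = 2;
    }

    for (i = 0; i < n; i++) {
        acc += la0[i] + la1[i];
    }

    free(la0);
    free(la1);
    return acc;
}
```

    Calling sum_heap_arrays(10000000) exercises the same array size as the example without
    placing 80 MB of arrays in static or stack storage.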

    Arrays are typically implemented as a memory (RAM, ROM, or FIFO) after synthesis. As
    discussed in Arrays on the Interface, arrays on the top-level function interface are
    synthesized as RTL ports that access a memory outside the design. Arrays internal to the
    design are synthesized to internal block RAM, LUTRAM, UltraRAM, or registers, depending
    on the optimization settings.

    Cases in which arrays can create issues in the RTL include:
    • Array accesses can often create bottlenecks to performance. When implemented as a
    memory, the number of memory ports limits access to the data.
    • Array initialization, if not performed carefully, can result in undesirably long reset and
    initialization in the RTL.
    • Some care must be taken to ensure arrays that only require read accesses are
    implemented as ROMs in the RTL.

    Array Accesses and Performance

    The following code example shows a case in which accesses to an array can limit
    performance in the final RTL design.

    #include "array_mem_bottleneck.h"
    dout_t array_mem_bottleneck(din_t mem[N]) {
        dout_t sum = 0;
        int i;
        SUM_LOOP: for (i = 2; i < N; ++i)
            sum += mem[i] + mem[i-1] + mem[i-2];
        return sum;
    }

    Trying to pipeline SUM_LOOP with an initiation interval of 1 results in the following message
    (after failing to achieve a throughput of 1, Vivado HLS relaxes the constraint):

    INFO: [SCHED 61] Pipelining loop 'SUM_LOOP'.
    WARNING: [SCHED 69] Unable to schedule 'load' operation ('mem_load_2',
    bottleneck.c:62) on array 'mem' due to limited memory ports.
    INFO: [SCHED 61] Pipelining result: Target II: 1, Final II: 2, Depth: 3.

    A dual-port RAM could be used, but this allows only two accesses per clock cycle. Three
    reads are required to calculate the value of sum, and so three accesses per clock cycle are
    required to pipeline the loop with a new iteration every clock cycle.

    This code can be rewritten as shown in the following code example so that the loop can be
    pipelined with a throughput of 1. By performing pre-reads and manually pipelining the data
    accesses, only one array read is specified in each iteration of the loop. This ensures that
    only a single-port RAM is required to achieve the performance.

    #include "array_mem_perform.h"
    dout_t array_mem_perform(din_t mem[N]) {
        din_t tmp0, tmp1, tmp2;
        dout_t sum = 0;
        int i;
        tmp0 = mem[0];
        tmp1 = mem[1];
        SUM_LOOP: for (i = 2; i < N; i++) {
            tmp2 = mem[i];
            sum += tmp2 + tmp1 + tmp0;
            tmp0 = tmp1;
            tmp1 = tmp2;
        }
        return sum;
    }
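
    Before relying on the rewritten form, it is worth confirming in C simulation that it computes
    the same result as the original. The following is a small sketch of such a check. N, the
    typedefs, the test vector, and the function names are assumptions standing in for the
    contents of the header files, which are not shown in this article.

```c
#define N 8              /* hypothetical size; the real value comes from the header */
typedef int din_t;       /* assumed types */
typedef int dout_t;

/* Original form: three reads of mem per loop iteration. */
dout_t sum_three_reads(const din_t mem[N])
{
    dout_t sum = 0;
    int i;
    for (i = 2; i < N; ++i)
        sum += mem[i] + mem[i - 1] + mem[i - 2];
    return sum;
}

/* Rewritten form: one read of mem per loop iteration. */
dout_t sum_one_read(const din_t mem[N])
{
    din_t tmp0 = mem[0];
    din_t tmp1 = mem[1];
    din_t tmp2;
    dout_t sum = 0;
    int i;
    for (i = 2; i < N; i++) {
        tmp2 = mem[i];
        sum += tmp2 + tmp1 + tmp0;
        tmp0 = tmp1;   /* slide the three-element window forward */
        tmp1 = tmp2;
    }
    return sum;
}

/* Returns 1 if both forms agree on a fixed test vector, 0 otherwise. */
int forms_agree(void)
{
    static const din_t mem[N] = {3, -1, 4, 1, -5, 9, 2, -6};
    return sum_three_reads(mem) == sum_one_read(mem);
}
```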

    Vivado HLS includes optimization directives for changing how arrays are implemented and
    accessed. In most cases these directives are sufficient and no changes to the code are
    required. Arrays can be partitioned into blocks or into their individual elements. In some
    cases, Vivado HLS partitions arrays into individual elements automatically. This is
    controllable using the configuration settings for auto-partitioning.
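
    As a sketch of the directive-based alternative, the original three-read loop can be left
    unchanged and the array partitioned instead. The pragma spellings below follow the Vivado
    HLS ARRAY_PARTITION and PIPELINE directives. N, the typedefs, and the function name are
    assumptions standing in for the header, and whether the tool actually reaches an initiation
    interval of 1 for a given device has not been verified here. A standard C compiler typically
    ignores the pragmas (possibly with a warning), so the function still simulates normally.

```c
#define N 8              /* hypothetical size; the real value comes from the header */
typedef int din_t;       /* assumed types */
typedef int dout_t;

dout_t array_mem_partitioned(din_t mem[N])
{
/* Split mem into individual elements so the three reads no longer
 * compete for the ports of a single memory. */
#pragma HLS ARRAY_PARTITION variable=mem complete dim=1
    dout_t sum = 0;
    int i;

    SUM_LOOP: for (i = 2; i < N; ++i) {
#pragma HLS PIPELINE II=1
        sum += mem[i] + mem[i - 1] + mem[i - 2];
    }
    return sum;
}
```

    The trade-off is area: complete partitioning replaces one memory with N separate elements,
    so for large arrays the manual rewrite shown earlier is likely the cheaper option.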

    FIFO Accesses

    Accesses to a FIFO must be in sequential order starting from location zero.

    Arrays on the Interface

    Vivado HLS synthesizes arrays into memory elements by default. When you use an array as
    an argument to the top-level function, Vivado HLS assumes the following:
    • Memory is off-chip.
      Vivado HLS synthesizes interface ports to access the memory.
    • Memory is standard block RAM with a latency of 1.
      The data is ready 1 clock cycle after the address is supplied.

    To configure how Vivado HLS creates these ports:
    • Specify the interface as a RAM or FIFO interface using the INTERFACE directive.
    • Specify the RAM as a single or dual-port RAM using the RESOURCE directive.
    • Specify the RAM latency using the RESOURCE directive.
    • Use array optimization directives (Array_Partition, Array_Map, or
    Array_Reshape) to reconfigure the structure of the array and therefore, the number
    of I/O ports.

    Array Interfaces

    The resource directive can explicitly specify which type of RAM is used, and therefore which
    RAM ports are created (single-port or dual-port). If no resource is specified, Vivado HLS
    uses:
    • A single-port RAM by default.
    • A dual-port RAM if it reduces the initiation interval or reduces latency.

    #include "array_RAM.h"
    void array_RAM (dout_t d_o[4], din_t d_i[4], didx_t idx[4]) {
        int i;
        For_Loop: for (i = 0; i < 4; i++) {
            d_o[i] = d_i[idx[i]];
        }
    }

    A single-port RAM interface is used because the for-loop ensures that only one element
    can be read and written in each clock cycle. There is no advantage in using a dual-port RAM
    interface.


    If the for-loop is unrolled, Vivado HLS uses a dual-port RAM. Doing so allows multiple
    elements to be read at the same time and improves the initiation interval. The type of RAM
    interface can be explicitly set by applying the resource directive.
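
    The following sketch shows one way the resource directive might be applied to this
    example, together with unrolling the loop. The core name RAM_2P_BRAM and the UNROLL
    pragma are Vivado HLS spellings. The typedefs and the function name array_RAM_dual are
    assumptions, since array_RAM.h is not shown, and the resulting interface has not been
    checked against a synthesis report.

```c
typedef int din_t;            /* assumed types; the real ones come from array_RAM.h */
typedef int dout_t;
typedef unsigned char didx_t;

void array_RAM_dual(dout_t d_o[4], din_t d_i[4], didx_t idx[4])
{
/* Request a dual-port block RAM interface for the input array. */
#pragma HLS RESOURCE variable=d_i core=RAM_2P_BRAM
    int i;

    For_Loop: for (i = 0; i < 4; i++) {
/* Unrolling exposes several reads of d_i that can be scheduled together. */
#pragma HLS UNROLL
        d_o[i] = d_i[idx[i]];
    }
}
```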

    FIFO Interfaces

    Vivado HLS allows array arguments to be implemented as FIFO ports in the RTL. If a FIFO
    port is to be used, be sure that the accesses to and from the array are sequential. Vivado
    HLS determines whether the accesses are sequential.

    #include "array_FIFO.h"
    void array_FIFO (dout_t d_o[4], din_t d_i[4], didx_t idx[4]) {
        int i;
    #pragma HLS INTERFACE ap_fifo port=d_i
    #pragma HLS INTERFACE ap_fifo port=d_o
        // Breaks FIFO interface d_o[3] = d_i[2];
        For_Loop: for (i = 0; i < 4; i++) {
            d_o[i] = d_i[idx[i]];
        }
    }

    In this case, the behavior of variable idx determines whether or not a FIFO interface can be
    successfully created.
    • If idx is incremented sequentially, a FIFO interface can be created.
    • If random values are used for idx, a FIFO interface fails when implemented in RTL.

    Because this interface might not work, Vivado HLS issues a message during synthesis but
    still creates a FIFO interface.

    @W [XFORM-124] Array 'd_i': may have improper streaming access(es).

    Note: FIFO ports cannot be synthesized for arrays that are read from and written to. Separate
    input and output arrays must be created.
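
    Because the tool only warns about this case, a C test bench can guard against it. The helper
    below is a hypothetical test-bench utility, not part of Vivado HLS: it checks that an index
    array holds 0, 1, ..., n-1 in order, which is the "incremented sequentially" condition
    described above. The element type unsigned char is an assumption for didx_t.

```c
#include <stddef.h>

/* Returns 1 if idx[] holds 0, 1, ..., n-1 in order, which is the access
 * pattern a FIFO can serve; returns 0 otherwise. */
int idx_is_sequential(const unsigned char idx[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (idx[i] != i)
            return 0;
    }
    return 1;
}
```

    A test bench could call this on its stimulus before invoking array_FIFO and reject index
    patterns that the FIFO interface cannot serve.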

    The following general rules apply to arrays that are implemented with a Streaming interface:

    • The array must be written and read in only one loop or function. This can be
    transformed into a point-to-point connection that matches the characteristics of FIFO
    links.
    • The array reads must be in the same order as the array writes. Because random access is
    not supported for FIFO channels, the array must be used in the program following first
    in, first out semantics.
    • The index used to read from and write to the FIFO must be analyzable at compile time.
    Array addressing based on run time computations cannot be analyzed for FIFO
    semantics and prevents the tool from converting an array into a FIFO.
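
    A function that follows these rules might look like the sketch below. It reuses the ap_fifo
    pragmas from the earlier example but indexes both arrays directly with the loop counter, so
    the access order is fixed at compile time. The typedefs and the function name
    array_FIFO_inorder are assumptions, and the "+ 1" is only placeholder processing.

```c
typedef int din_t;    /* assumed types; the real ones come from array_FIFO.h */
typedef int dout_t;

void array_FIFO_inorder(dout_t d_o[4], din_t d_i[4])
{
#pragma HLS INTERFACE ap_fifo port=d_i
#pragma HLS INTERFACE ap_fifo port=d_o
    int i;

    /* Each array is accessed exactly once per iteration, in index order
     * 0..3, and the index is the loop counter rather than run time data. */
    For_Loop: for (i = 0; i < 4; i++) {
        d_o[i] = d_i[i] + 1;
    }
}
```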

    Array Initialization

    RECOMMENDED: As discussed in Type Qualifiers, although not a requirement, Xilinx
    recommends specifying arrays that are to be implemented as memories with the static
    qualifier. This not only ensures that Vivado HLS implements the array with a memory in the
    RTL, it also allows the initialization behavior of static types to be used.

    In the following code, an array is initialized with a set of values. Each time the function is
    executed, array coeff is assigned these values. After synthesis, each time the design
    executes, the RAM that implements coeff is loaded with these values. For a single-port
    RAM this takes 9 clock cycles, one per element. For an array of 1024 elements it would take
    1024 clock cycles, during which time no operations depending on coeff could occur.

    int coeff[9] = {-2, 8, -4, 10, 14, 10, -4, 8, -2};

    The following code uses the static qualifier to define array coeff. The array is initialized
    with the specified values at the start of execution. Each time the function is executed, array
    coeff remembers its values from the previous execution. A static array behaves in C code
    as a memory does in RTL.

    static int coeff[9] = {-2, 8, -4, 10, 14, 10, -4, 8, -2};

    In addition, if the variable has the static qualifier, Vivado HLS initializes the variable in the
    RTL design and in the FPGA bitstream. This removes the need for multiple clock cycles to
    initialize the memory and ensures that initializing large memories is not an operational
    overhead.
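
    The difference in C semantics is easy to observe in simulation. In this small sketch (the
    function names and the shortened three-element array are hypothetical), both functions
    modify the first element and return it. Only the static version remembers the change on the
    next call.

```c
/* Automatic array: re-initialized on every call, so the update is lost. */
int bump_auto(void)
{
    int coeff[3] = {-2, 8, -4};
    coeff[0] += 1;
    return coeff[0];
}

/* Static array: initialized once, keeps its contents between calls. */
int bump_static(void)
{
    static int coeff[3] = {-2, 8, -4};
    coeff[0] += 1;
    return coeff[0];
}
```

    Repeated calls to bump_auto always return -1, while bump_static returns -1, then 0, then 1,
    which mirrors a memory that is loaded once and then retains its contents.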

    Implementing ROMs

    Vivado HLS does not require that an array be specified with the static qualifier to
    synthesize a memory or the const qualifier to infer that the memory should be a ROM.
    Vivado HLS analyzes the design and attempts to create the optimal hardware.

    Xilinx highly recommends using the static qualifier for arrays that are intended to be
    memories. As noted in Array Initialization, a static type behaves almost identically to a
    memory in RTL.


    The const qualifier is also recommended when arrays are only read, because Vivado HLS
    cannot always infer that a ROM should be used by analysis of the design. The general rule
    for the automatic inference of a ROM is that a local, static (non-global) array is written to
    before being read. The following practices in the code can help infer a ROM:
    • Initialize the array as early as possible in the function that uses it.
    • Group writes together.
    • Do not interleave array (ROM) initialization writes with non-initialization code.
    • Do not store different values to the same array element (group all writes together in
    the code).
    • Element value computation must not depend on any non-constant (at compile-time)
    design variables, other than the initialization loop counter variable.
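
    A function written to follow these practices might look like the sketch below. The table
    size, the names, and the choice of a squares table are hypothetical, and whether the tool
    infers a ROM for a given version and configuration has not been verified here. The point is
    the shape of the code: all initialization writes come first, each element is written once,
    and each value depends only on the loop counter.

```c
#define TABLE_SIZE 16   /* hypothetical table depth */

int square_lookup(unsigned char idx)
{
    int table[TABLE_SIZE];
    int i;

    /* Initialization is grouped at the top of the function. Each element
     * is written exactly once, from the loop counter alone. */
    for (i = 0; i < TABLE_SIZE; i++) {
        table[i] = i * i;
    }

    /* After initialization the array is only read. */
    return table[idx % TABLE_SIZE];
}
```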


    If complex assignments are used to initialize a ROM (for example, functions from the
    math.h library), placing the array initialization into a separate function allows a ROM to be
    inferred. In the following example, array sin_table[256] is inferred as a memory and
    implemented as a ROM after RTL synthesis.

    #include "array_ROM_math_init.h"
    #include <math.h>
    void init_sin_table(din1_t sin_table[256])
    {
        int i;
        for (i = 0; i < 256; i++) {
            dint_t real_val = sin(M_PI * (dint_t)(i - 128) / 256.0);
            sin_table[i] = (din1_t)(32768.0 * real_val);
        }
    }
    dout_t array_ROM_math_init(din1_t inval, din2_t idx)
    {
        short sin_table[256];
        init_sin_table(sin_table);
        return (int)inval * (int)sin_table[idx];
    }

    TIP: Because the sin() calls evaluate to constant values, no core is required in the RTL
    design to implement the sin() function.

    Data Types

    Standard Types

    Composite Data Types

    Vivado HLS supports composite data types for synthesis:
    • struct
    • enum
    • union

    Structs

    When structs are used as arguments to the top-level function, the ports created by
    synthesis are a direct reflection of the struct members. Scalar members are implemented as
    standard scalar ports and arrays are implemented, by default, as memory ports.

    In this design example, struct data_t is defined in the header file shown in the
    following code example. This struct has two data members:
    • An unsigned short A (16-bit).
    • An array B of four unsigned char elements (8-bit each).

    typedef struct {
        unsigned short A;
        unsigned char B[4];
    } data_t;
    data_t struct_port(data_t i_val, data_t *i_pt, data_t *o_pt);

    In the following code example, the struct is used as both a pass-by-value argument (from
    i_val to the return of o_val) and as a pointer (*i_pt to *o_pt).

    #include "struct_port.h"
    data_t struct_port(
        data_t i_val,
        data_t *i_pt,
        data_t *o_pt
    ) {
        data_t o_val;
        int i;
        // Transfer pass-by-value structs
        o_val.A = i_val.A+2;
        for (i = 0; i < 4; i++) {
            o_val.B[i] = i_val.B[i]+2;
        }
        // Transfer pointer structs
        o_pt->A = i_pt->A+3;
        for (i = 0; i < 4; i++) {
            o_pt->B[i] = i_pt->B[i]+3;
        }
        return o_val;
    }

    All function arguments and the function return are synthesized into ports as follows:
    • Struct element A results in a 16-bit port.
    • Struct element B results in a RAM port, accessing 4 elements.
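
    The widths and depth quoted above come directly from the member types, and they can be
    checked in plain C. The helper names in this sketch are hypothetical, and the value 16 for
    member A assumes a platform where unsigned short is 16 bits wide, which the C standard
    permits but does not guarantee.

```c
#include <limits.h>
#include <stddef.h>

typedef struct {
    unsigned short A;
    unsigned char B[4];
} data_t;

/* Width in bits of scalar member A (the width of the scalar port). */
size_t width_of_A(void)
{
    return sizeof(((data_t *)0)->A) * CHAR_BIT;
}

/* Number of elements in B (the depth of the memory port). */
size_t depth_of_B(void)
{
    return sizeof(((data_t *)0)->B) / sizeof(((data_t *)0)->B[0]);
}

/* Width in bits of one element of B (the data width of the memory port). */
size_t width_of_B_elem(void)
{
    return sizeof(((data_t *)0)->B[0]) * CHAR_BIT;
}
```

    The sizeof operands are not evaluated, so the null pointer is never dereferenced.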

    Global Variables

    Global variables can be freely used in the code and are fully synthesizable. By default, global
    variables are not exposed as ports on the RTL interface.

    The following code example shows the default synthesis behavior of global variables. It uses
    three global variables.
    • Values are read from array Ain.
    • Array Aint is used to transform and pass values from Ain to Aout.
    • The outputs are written to array Aout.

    din_t Ain[N];
    din_t Aint[N];
    dout_t Aout[N/2];
    void types_global(din1_t idx) {
        int i, lidx;
        // Move elements in the input array
        for (i = 0; i < N; ++i) {
            lidx = i;
            if (lidx+idx > N-1)
                lidx = i-N;   // lidx is negative on this path, so as written the
                              // accesses below index before the start of the arrays
            Aint[lidx] = Ain[lidx+idx] + Ain[lidx];
        }
        // Sum to half the elements
        for (i = 0; i < (N/2); i++) {
            Aout[i] = (Aint[i] + Aint[i+1])/2;
        }
    }

    By default, after synthesis, the only port on the RTL design is port idx. Global variables are
    not exposed as RTL ports by default. In the default case:
    • Array Ain is an internal RAM that is read from.
    • Array Aout is an internal RAM that is written to.

    Pointers

    Reference:

    1. Xilinx UG902

  • Original article: https://www.cnblogs.com/wordchao/p/10950110.html