  • HLS Interface

    Managing Interfaces

    Vivado HLS supports two solutions for specifying the type of I/O protocol used:
    • Interface Synthesis, where the port interface is created based on efficient industry
    standard interfaces.
    • Manual interface specification where the interface behavior is explicitly described in
    the input source code. This allows any arbitrary I/O protocol to be used.

    Interface Synthesis

    #include "sum_io.h"

    dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
      dout_t temp;
      *sum = in1 + in2 + *sum;
      temp = in1 + in2;
      return temp;
    }

    The example above includes:
    • Two pass-by-value inputs in1 and in2.
    • A pointer sum that is both read from and written to.
    • A function return, the value of temp.
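
    To run the function outside the tool, the types from sum_io.h need definitions. The
    following is a minimal sketch that assumes all three types are plain int; the real header
    is not shown here and may well use arbitrary-precision types instead.

```cpp
#include <cassert>

// Assumed stand-ins for the typedefs in sum_io.h (hypothetical, not from the source).
typedef int din_t;
typedef int dio_t;
typedef int dout_t;

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
    dout_t temp;
    *sum = in1 + in2 + *sum;  // sum is read and then written: an input/output argument
    temp = in1 + in2;
    return temp;              // the return value maps to the ap_return port
}
```

    Calling sum_io(1, 2, &s) with s initially 5 returns 3 and leaves s at 8, which mirrors the
    read on sum_i and the write on sum_o described below.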

    Vivado HLS creates three types of ports on the RTL design:
    • Clock and Reset ports: ap_clk and ap_rst.
    • Block-Level interface protocol. These are shown expanded in the preceding figure:
    ap_start, ap_done, ap_ready, and ap_idle.
    • Port Level interface protocols. These are created for each argument in the top-level
    function and the function return (if the function returns a value). In this example, these
    ports are: in1, in2, sum_i, sum_o, sum_o_ap_vld, and ap_return.

    • The design starts when ap_start is asserted High.
    • The ap_idle signal is asserted Low to indicate the design is operating.
    • The input data is read at any clock after the first cycle. Vivado HLS schedules when the
    reads occur. The ap_ready signal is asserted high when all inputs have been read.
    • When output sum is calculated, the associated output handshake (sum_o_ap_vld)
    indicates that the data is valid.
    • When the function completes, ap_done is asserted. This also indicates that the data on
    ap_return is valid.
    • Port ap_idle is asserted High to indicate that the design is waiting to start again.

    Clock and Reset Ports

    Block-Level Interface Protocol

    By default, a block-level interface protocol is added to the design. These signals control the
    block, independently of any port-level I/O protocols. These ports control when the block
    can start processing data (ap_start), indicate when it is ready to accept new inputs
    (ap_ready) and indicate if the design is idle (ap_idle) or has completed operation
    (ap_done).

    Port-Level Interface Protocol

    The I/O protocol created depends on the type of C argument and on the default. After the
    block-level protocol has been used to start the operation of the block, the port-level I/O
    protocols are used to sequence data into and out of the block.
    By default input pass-by-value arguments and pointers are implemented as simple wire
    ports with no associated handshaking signal. In the above example, the input ports are
    therefore implemented without an I/O protocol, only a data port. If the port has no I/O
    protocol, (by default or by design) the input data must be held stable until it is read.

    By default output pointers are implemented with an associated output valid signal to
    indicate when the output data is valid. In the above example, the output port is
    implemented with an associated output valid port (sum_o_ap_vld) which indicates when the
    data on the port is valid and can be read. If there is no I/O protocol associated with the
    output port, it is difficult to know when to read the data. It is always a good idea to use an
    I/O protocol on an output.

    Function arguments which are both read from and written to are split into separate input and
    output ports. In the above example, sum is implemented as input port sum_i and output
    port sum_o with associated I/O protocol port sum_o_ap_vld.

    If the function has a return value, an output port ap_return is implemented to provide the
    return value. When the design completes one transaction - this is equivalent to one
    execution of the C function - the block-level protocols indicate the function is complete
    with the ap_done signal. This also indicates the data on port ap_return is valid and can
    be read.

    Interface Synthesis I/O Protocols

    The types of interfaces that are created by interface synthesis depend on the type of C
    argument, the default interface mode, and the INTERFACE optimization directive. The
    following figure shows the interface protocol mode you can specify on each type of C
    argument. This figure uses the following abbreviations:
    • D: Default interface mode for each type.
    • I: Input arguments, which are only read.
    • O: Output arguments, which are only written to.
    • I/O: Input/Output arguments, which are both read and written.

    Block-Level Interface Protocols

    Vivado HLS uses the interface types ap_ctrl_none, ap_ctrl_hs, and ap_ctrl_chain
    to specify whether the RTL is implemented with block-level handshake signals. Block-level
    handshake signals specify the following:
    • When the design can start to perform the operation
    • When the operation ends
    • When the design is idle and ready for new inputs
    You can specify these block-level I/O protocols on the function or the function return. If the
    C code does not return a value, you can still specify the block-level I/O protocol on the
    function return.
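
    As a rough illustration, the block-level protocol can be set on the return port of a
    function that returns void. The function name scale and its arguments are hypothetical;
    a standard C++ compiler ignores the HLS pragma, so the code also runs as ordinary software.

```cpp
#include <cassert>

// Hypothetical top-level function used only for illustration.
void scale(int in, int *out) {
// ap_ctrl_hs is the default; ap_ctrl_none or ap_ctrl_chain could be named here instead.
#pragma HLS INTERFACE ap_ctrl_hs port=return
    *out = in * 2;
}
```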

    After reset, the following occurs:
    1. The block waits for ap_start to go High before it begins operation.
    2. Output ap_idle goes Low immediately to indicate the design is no longer idle.

    3. The ap_start signal must remain High until ap_ready goes High. Once ap_ready
    goes High:
    ° If ap_start remains High the design will start the next transaction.
    ° If ap_start is taken Low, the design will complete the current transaction and halt
    operation.
    4. Data can be read on the input ports.
    Note: The input ports can use a port-level I/O protocol that is independent of this block-level
    I/O protocol. For details, see Port-Level I/O Protocols.
    5. Data can be written to the output ports.
    Note: The output ports can use a port-level I/O protocol that is independent of this block-level
    I/O protocol. For details, see Port-Level I/O Protocols.
    6. Output ap_done goes High when the block completes operation.
    Note: If there is an ap_return port, the data on this port is valid when ap_done is High.
    Therefore, the ap_done signal also indicates when the data on output ap_return is valid.
    7. When the design is ready to accept new inputs, the ap_ready signal goes High.
    Following is additional information about the ap_ready signal:
    ° The ap_ready signal is inactive until the design starts operation.
    ° In non-pipelined designs, the ap_ready signal is asserted at the same time as
    ap_done.
    ° In pipelined designs, the ap_ready signal might go High at any cycle after
    ap_start is sampled High. This depends on how the design is pipelined.
    ° If the ap_start signal is Low when ap_ready is High, the design executes until
    ap_done is High and then stops operation.
    ° If the ap_start signal is High when ap_ready is High, the next transaction starts
    immediately, and the design continues to operate.
    8. The ap_idle signal indicates when the design is idle and not operating. Following is
    additional information about the ap_idle signal:
    ° If the ap_start signal is Low when ap_ready is High, the design stops operation,
    and the ap_idle signal goes High one cycle after ap_done.
    ° If the ap_start signal is High when ap_ready is High, the design continues to
    operate, and the ap_idle signal remains Low.
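
    The sequence above can be sketched as a small cycle-level software model. This is not
    generated RTL, only an approximation under stated assumptions: the block is not pipelined
    (so ap_ready and ap_done rise together) and every transaction takes a fixed number of
    cycles. The struct name CtrlHsModel and its members are made up for this sketch.

```cpp
#include <cassert>

// Cycle-level model of a non-pipelined block using the ap_ctrl_hs protocol.
// Assumption: each transaction takes a fixed number of cycles (at least 1).
struct CtrlHsModel {
    int latency;     // cycles per transaction
    int remaining;   // cycles left in the current transaction, 0 when none is running
    bool ap_idle, ap_ready, ap_done;

    explicit CtrlHsModel(int cycles)
        : latency(cycles), remaining(0),
          ap_idle(true), ap_ready(false), ap_done(false) {}

    // Advance one clock cycle; ap_start is the value sampled in this cycle.
    void tick(bool ap_start) {
        ap_done = false;
        ap_ready = false;
        if (remaining == 0) {
            if (!ap_start) {      // no transaction pending: report idle
                ap_idle = true;
                return;
            }
            remaining = latency;  // begin a new transaction
        }
        ap_idle = false;          // idle drops as soon as the block starts
        --remaining;
        if (remaining == 0) {     // final cycle: done and ready are asserted together
            ap_done = true;
            ap_ready = true;
        }
    }
};
```

    With a latency of 2, two ticks with ap_start High end with ap_done High. A following tick
    with ap_start Low raises ap_idle one cycle after ap_done, while a tick with ap_start still
    High starts the next transaction and keeps ap_idle Low.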

    Port-Level Interface Protocols: AXI4 Interfaces

    • AXI4-Stream interface: Specify on input arguments or output arguments only, not on
    input/output arguments.
    • AXI4-Lite interface: Specify on any type of argument except arrays. You can group
    multiple arguments into the same AXI4-Lite interface.
    • AXI4 master interface: Specify on arrays and pointers (and references in C++) only. You
    can group multiple arguments into the same AXI4 interface.

    Port-Level Interface Protocols: No I/O Protocol

    The ap_none and ap_stable modes specify that no I/O protocol be added to the port.
    When these modes are specified the argument is implemented as a data port with no other
    associated signals. The ap_none mode is the default for scalar inputs. The ap_stable
    mode is intended for configuration inputs which only change when the device is in reset
    mode.

    Port-Level Interface Protocols: Wire Handshakes

    The ap_hs port-level I/O protocol provides the greatest flexibility in the development
    process, allowing both bottom-up and top-down design flows. Two-way handshakes safely
    perform all intra-block communication, and manual intervention or assumptions are not
    required for correct operation. The ap_hs port-level I/O protocol provides the following
    signals:
    • Data port
    • Acknowledge signal to indicate when data is consumed
    • Valid signal to indicate when data is read

    For inputs, the following occurs:
    • After start is applied, the block begins normal operation.
    • If the design is ready for input data but the input valid is Low, the design stalls and
    waits for the input valid to be asserted to indicate a new input value is present.
    Note: The preceding figure shows this behavior. In this example, the design is ready to read data
    input in on clock cycle 4 and stalls waiting for the input valid before reading the data.

    • When the input valid is asserted High, an output acknowledge is asserted High to
    indicate the data was read.

    For outputs, the following occurs:
    • After start is applied, the block begins normal operation.
    • When an output port is written to, its associated output valid signal is simultaneously
    asserted to indicate valid data is present on the port.
    • If the associated input acknowledge is Low, the design stalls and waits for the input
    acknowledge to be asserted.
    • When the input acknowledge is asserted, the output valid is deasserted on the next
    clock edge.
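
    The output-side behavior can be approximated in software as follows. This is a sketch
    only: the struct ApHsOutput and its member names are invented here, and the model tracks
    just the valid flag and the stall condition, not real signal timing.

```cpp
#include <cassert>

// Simplified model of one output port that uses the ap_hs protocol.
struct ApHsOutput {
    bool vld;   // output valid, driven by the design
    int data;   // value on the data port

    ApHsOutput() : vld(false), data(0) {}

    // The design writes the port: data and valid are driven together.
    void write(int value) {
        data = value;
        vld = true;
    }

    // One clock edge. ack is the acknowledge from the consumer.
    // Returns true while the design has to stall.
    bool tick(bool ack) {
        if (vld && !ack) {
            return true;   // data is pending and not yet acknowledged
        }
        vld = false;       // acknowledged (or nothing pending): valid is dropped
        return false;
    }
};
```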

    Port-Level Interface Protocols: Memory Interfaces

    ap_memory, bram

    The ap_memory and bram interface port-level I/O protocols are used to implement array
    arguments. This type of port-level I/O protocol can communicate with memory elements
    (for example, RAMs and ROMs) when the implementation requires random accesses to the
    memory address locations.

    After reset, the following occurs:
    • After start is applied, the block begins normal operation.
    • Reads are performed by applying an address on the output address ports while
    asserting the output signal d_ce.
    Note: For a default block RAM, the design expects the input data d_q0 to be available in the
    next clock cycle. You can use the RESOURCE directive to indicate the RAM has a longer read
    latency.

    • Write operations are performed by asserting output ports d_ce and d_we while
    simultaneously applying the address and output data d_d0.
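
    The read and write behavior can be mimicked with a small software model of the RAM on the
    other side of the interface. The sketch assumes a 16-word memory and the default one-cycle
    read latency; RamModel and the argument name d_address are hypothetical.

```cpp
#include <cassert>

// Model of a single-port RAM connected to an ap_memory interface.
// Assumptions: 16 words of storage, one-cycle read latency.
struct RamModel {
    int mem[16];
    int d_q0;   // read data, available after the edge on which the read was issued

    RamModel() : d_q0(0) {
        for (int i = 0; i < 16; ++i) mem[i] = 0;
    }

    // One clock edge with the control values driven by the design.
    void tick(bool d_ce, bool d_we, int d_address, int d_d0) {
        if (!d_ce) return;            // chip enable Low: no access, d_q0 holds its value
        if (d_we) {
            mem[d_address] = d_d0;    // write: address and data applied together
        } else {
            d_q0 = mem[d_address];    // read: data is visible in the next cycle
        }
    }
};
```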

    ap_fifo
    An ap_fifo interface is the most hardware-efficient approach when the design requires
    access to a memory element and the access is always performed in a sequential manner,
    that is, no random access is required. The ap_fifo port-level I/O protocol supports the
    following:
    • Allows the port to be connected to a FIFO
    • Enables complete, two-way empty-full communication
    • Works for arrays, pointers, and pass-by-reference argument types

    For inputs, the following occurs:
    • After start is applied, the block begins normal operation.
    • If the input port is ready to be read but the FIFO is empty as indicated by input port
    in_empty_n Low, the design stalls and waits for data to become available.
    • When the FIFO contains data as indicated by input port in_empty_n High, an output
    acknowledge in_read is asserted High to indicate the data was read in this cycle.

    For outputs, the following occurs:
    • After start is applied, the block begins normal operation.
    • If an output port is ready to be written to but the FIFO is full as indicated by
    out_full_n Low, the data is placed on the output port but the design stalls and waits
    for the space to become available in the FIFO.
    • When space becomes available in the FIFO as indicated by out_full_n High, the
    output acknowledge signal out_write is asserted to indicate the output data is valid.
    • If the top-level function or the top-level loop is pipelined using the -rewind option,
    Vivado HLS creates an additional output port with the suffix _lwr. When the last write
    to the FIFO interface completes, the _lwr port goes active-High.
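
    The empty and full checks can be sketched in software with a std::queue standing in for
    the external FIFO. The helper names fifo_read and fifo_write and the capacity parameter
    are assumptions made for this example; a return value of false corresponds to a stall.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Attempt one read from the input FIFO in the current cycle.
// Returns true when in_read would be asserted, false when the design stalls.
bool fifo_read(std::queue<int> &fifo, int &value) {
    bool in_empty_n = !fifo.empty();   // Low means the FIFO is empty
    if (!in_empty_n) return false;
    value = fifo.front();
    fifo.pop();
    return true;
}

// Attempt one write to the output FIFO in the current cycle.
// Returns true when out_write would be asserted, false when the design stalls.
bool fifo_write(std::queue<int> &fifo, std::size_t capacity, int value) {
    bool out_full_n = fifo.size() < capacity;   // Low means the FIFO is full
    if (!out_full_n) return false;
    fifo.push(value);
    return true;
}
```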

    Interface Synthesis and Structs

    Structs on the interface are by default decomposed into their member elements and ports
    are implemented separately for each member element.

    Arrays of structs are implemented as multiple arrays, with a separate array for each member
    of the struct.

    The DATA_PACK optimization directive is used for packing all the elements of a struct into a
    single wide vector. This allows all members of the struct to be read and written to
    simultaneously.

    The single wide-vector created by using the DATA_PACK directive allows more data to be
    accessed in a single clock cycle. This is the case when the struct contains an array. When
    data can be accessed in a single clock cycle, Vivado HLS automatically unrolls any loops
    consuming this data, if doing so improves the throughput. The loop can be fully or partially
    unrolled to create enough hardware to consume the additional data in a single clock cycle.

    If a struct port using DATA_PACK is to be implemented with an AXI4 interface you may wish
    to consider using the DATA_PACK byte_pad option. The byte_pad option is used to
    automatically align the member elements to 8-bit boundaries. This alignment is sometimes
    required by Xilinx IP.

    For the following example code, the options for implementing a struct port are shown in
    the following figure.

    typedef struct {
      int12 A;
      int18 B[4];
      int6 C;
    } my_data;

    void foo(my_data *a);

    • By default, the members are implemented as individual ports. The array has multiple
    ports (data, addr, etc.)
    • Using DATA_PACK results in a single wide port.
    • Using DATA_PACK with struct_level byte padding aligns the entire struct to the next
    8-bit boundary.
    • Using DATA_PACK with field_level byte padding aligns each struct member to the
    next 8-bit boundary.
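
    The resulting port widths for my_data can be worked out by hand. The sketch below does
    the arithmetic, assuming that field_level padding rounds each array element of B up
    individually; the helper names are invented for this example.

```cpp
#include <cassert>

// Round a bit width up to the next multiple of 8.
int pad8(int bits) { return (bits + 7) / 8 * 8; }

// Member widths of my_data: int12 A, int18 B[4], int6 C.
const int A_BITS = 12;
const int B_BITS = 18;
const int B_ELEMS = 4;
const int C_BITS = 6;

// DATA_PACK with no padding: all members concatenated.
int packed_width() { return A_BITS + B_ELEMS * B_BITS + C_BITS; }

// struct_level: the whole packed vector is padded once.
int struct_level_width() { return pad8(packed_width()); }

// field_level: every member (and, by assumption, every array element) is padded.
int field_level_width() {
    return pad8(A_BITS) + B_ELEMS * pad8(B_BITS) + pad8(C_BITS);
}
```

    Under these assumptions the packed port is 90 bits wide, struct_level padding gives
    96 bits, and field_level padding gives 120 bits.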

    Interface Synthesis and Multi-Access Pointers

    Using pointers which are accessed multiple times can introduce unexpected behavior after
    synthesis. In the following example pointer d_i is read four times and pointer d_o is
    written to twice: the pointers perform multiple accesses.

    #include "pointer_stream_bad.h"

    void pointer_stream_bad ( dout_t *d_o, din_t *d_i) {
      din_t acc = 0;

      acc += *d_i;
      acc += *d_i;
      *d_o = acc;
      acc += *d_i;
      acc += *d_i;
      *d_o = acc;
    }

    After synthesis this code will result in an RTL design which reads the input port once and
    writes to the output port once. As with any standard C compiler, Vivado HLS will optimize
    away the redundant pointer accesses. To implement the above code with the “anticipated”
    4 reads on d_i and 2 writes to d_o, the pointers must be specified as volatile, as
    shown in the next example.

    #include "pointer_stream_better.h"

    void pointer_stream_better ( volatile dout_t *d_o, volatile din_t *d_i) {
      din_t acc = 0;

      acc += *d_i;
      acc += *d_i;
      *d_o = acc;
      acc += *d_i;
      acc += *d_i;
      *d_o = acc;
    }

    How AXI Works

    • Productivity: By standardizing on the AXI interface, developers need to learn only a single protocol for IP.
    • Flexibility: Providing the right protocol for the application:
    ° AXI4 is for memory-mapped interfaces and allows high throughput bursts of up to
    256 data transfer cycles with just a single address phase.
    ° AXI4-Lite is a light-weight, single transaction memory-mapped interface. It has a
    small logic footprint and is a simple interface to work with both in design and usage.
    ° AXI4-Stream removes the requirement for an address phase altogether and allows
    unlimited data burst size. AXI4-Stream interfaces and transfers do not have address
    phases and are therefore not considered to be memory-mapped.
    • Availability: By moving to an industry standard, you have access not only to the Vivado IP Catalog, but also to a worldwide community of ARM partners.
    ° Many IP providers support the AXI protocol.

    The AXI specifications describe an interface between a single AXI master and AXI slave,
    representing IP cores that exchange information with each other. Multiple memory-mapped
    AXI masters and slaves can be connected together using AXI infrastructure IP blocks.

    The AXI Interconnect is architected using a traditional, monolithic crossbar approach.

    Both AXI4 and AXI4-Lite interfaces consist of five different channels:
    • Read Address Channel
    • Write Address Channel
    • Read Data Channel
    • Write Data Channel
    • Write Response Channel

    Data can move in both directions between the master and slave simultaneously, and data
    transfer sizes can vary. The limit in AXI4 is a burst transaction of up to 256 data transfers.
    AXI4-Lite allows only one data transfer per transaction.

    AXI4:
    • Provides separate data and address connections for reads and writes, which allows
    simultaneous, bidirectional data transfer.
    • Requires a single address and then bursts up to 256 words of data.

    The AXI4 protocol describes options that allow AXI4-compliant systems to achieve very
    high data throughput. Some of these features, in addition to bursting, are: data upsizing
    and downsizing, multiple outstanding addresses, and out-of-order transaction processing.

    At a hardware level, AXI4 allows systems to be built with a different clock for each AXI
    master-slave pair. In addition, the AXI4 protocol allows the insertion of register slices (often
    called pipeline stages) to aid in timing closure.

    AXI4-Lite is similar to AXI4 with some exceptions: The most notable exception is that
    bursting is not supported.

    The AXI4-Stream protocol defines a single channel for transmission of streaming data. The
    AXI4-Stream channel models the write data channel of AXI4. Unlike AXI4, AXI4-Stream
    interfaces can burst an unlimited amount of data. You can split, merge, interleave, upsize,
    and downsize AXI4-Stream compliant interfaces.

    • Memory-Mapped Protocols: In memory-mapped protocols (AXI3, AXI4, and
    AXI4-Lite), all transactions involve the concept of transferring a target address within a
    system memory space and data.
    Memory-mapped systems often provide a more homogeneous way to view the system,
    because the IP operates around a defined memory map.
    • AXI4-Stream Protocol: Use the AXI4-Stream protocol for applications that typically
    focus on a data-centric and data-flow paradigm where the concept of an address is not
    present or not required. Each AXI4-Stream acts as a single unidirectional channel with a
    handshaking data flow.
    At this lower level of operation (compared to the memory-mapped protocol types), the
    mechanism to move data between IP is defined and efficient, but there is no unifying
    address context between IP. The AXI4-Stream IP can be better optimized for
    performance in data flow applications, but also tends to be more specialized around a
    given application space.
    • Infrastructure IP: An infrastructure IP is a building block used to help assemble
    systems. Infrastructure IP tends to be a generic IP that moves or transforms data
    around the system using general-purpose AXI4 interfaces and does not interpret data.
    Examples of infrastructure IP are:
    ° AXI Register slices (for pipelining)
    ° AXI FIFOs (for buffering/clock conversion)
    ° AXI Interconnect IP and AXI SmartConnect IP (for connecting memory-mapped IP
    together)
    ° AXI Direct Memory Access (DMA) engines (for memory-mapped to stream
    conversion)
    ° AXI Performance Monitors and Protocol Checkers (for analysis and debug)
    ° AXI Verification IP (for simulation-based verification and performance analysis)
    These IP are useful for connecting IP together into a system, but are not generally
    endpoints for data.

    AXI4 Protocols

    AXI4-Stream

    About AXI4-Stream:

    The interface can be used to connect a single master, that generates data, to a
    single slave, that receives data. The protocol can also be used when connecting larger numbers
    of master and slave components. The protocol supports multiple data streams using the same set
    of shared wires, allowing a generic interconnect to be constructed that can perform upsizing,
    downsizing and routing operations.

    The following stream terms are used in this specification:
    Transfer

    A single transfer of data across an AXI4-Stream interface. A single transfer is
    defined by a single TVALID, TREADY handshake.
    Packet

    A group of bytes that are transported together across an AXI4-Stream interface.
    A packet is similar to an AXI4 burst. A packet may consist of a single transfer or
    multiple transfers. Infrastructure components can use packets to deal more
    efficiently with a stream in packet-sized groups.
    Frame

    The highest level of byte grouping in an AXI4-Stream. A frame contains an
    integer number of packets. A frame can be a very large number of bytes, for
    example an entire video frame buffer.
    Data Stream

    The transport of data from one source to one destination.
    A data stream can be:
    • a series of individual byte transfers
    • a series of byte transfers grouped together in packets.
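
    The transfer and packet terms can be made concrete by counting them in a recorded
    sequence of clock cycles. The sketch below uses TLAST, the packet-boundary signal of the
    AXI4-Stream specification, which the definitions above do not name explicitly. The types
    Beat and StreamStats and the function count_stream are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signal values observed in one clock cycle (only a subset of the interface).
struct Beat {
    bool tvalid;
    bool tready;
    bool tlast;
};

struct StreamStats {
    int transfers;
    int packets;
};

// Count completed transfers and packets in a sequence of cycles.
StreamStats count_stream(const std::vector<Beat> &cycles) {
    StreamStats s = {0, 0};
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        const Beat &b = cycles[i];
        if (b.tvalid && b.tready) {      // TVALID and TREADY both High: one transfer
            ++s.transfers;
            if (b.tlast) ++s.packets;    // TLAST marks the final transfer of a packet
        }
    }
    return s;
}
```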

    Some examples of different data stream styles that might use the defined AXI4-Stream byte types

    Interface signals:

    Using AXI4 Interfaces

    AXI4-Stream Interfaces

    An AXI4-Stream interface can be applied to any input argument and any array or pointer
    output argument. Since an AXI4-Stream interface transfers data in a sequential streaming
    manner it cannot be used with arguments which are both read and written. An AXI4-Stream
    interface is always sign-extended to the next byte. For example, a 12-bit data value is sign-extended to 16 bits.
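
    The 12-bit example can be reproduced in software. This is only a sketch of the
    arithmetic, not of what the tool generates; padded_bits and sign_extend16 are names
    chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Width of the port after rounding up to the next byte boundary.
int padded_bits(int bits) { return (bits + 7) / 8 * 8; }

// Sign-extend the low `bits` bits of raw to 16 bits (valid for 1 <= bits <= 16).
int16_t sign_extend16(uint16_t raw, int bits) {
    uint16_t sign = (uint16_t)(1u << (bits - 1));     // position of the sign bit
    uint16_t mask = (uint16_t)((1u << bits) - 1u);    // keeps only the low `bits` bits
    uint16_t v = raw & mask;
    return (int16_t)((v ^ sign) - sign);              // standard XOR/subtract extension
}
```

    A 12-bit value of 0xFFF becomes -1 on the 16-bit port, and 0x7FF stays 2047.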

    There are two basic ways to use an AXI4-Stream in your design.
    • Use an AXI4-Stream without side-channels.
    • Use an AXI4-Stream with side-channels.
    This second use model provides additional functionality, allowing the optional
    side-channels which are part of the AXI4-Stream standard, to be used directly in the C code.

    AXI4-Stream Interfaces without Side-Channels
    An AXI4-Stream is used without side-channels when the function argument does not
    contain any AXI4 side-channel elements.

    void example(int A[50], int B[50]) {
      //Set the HLS native interface types
      #pragma HLS INTERFACE axis port=A
      #pragma HLS INTERFACE axis port=B
      int i;

      for(i = 0; i < 50; i++){
        B[i] = A[i] + 5;
      }
    }

    AXI4-Stream Interfaces with Side-Channels
    Side-channels are optional signals which are part of the AXI4-Stream standard. The
    side-channel signals may be directly referenced and controlled in the C code using a struct,
    provided the member elements of the struct match the names of the AXI4-Stream side-channel signals.

    An example of this is provided with Vivado
    HLS. The Vivado HLS include directory contains the file ap_axi_sdata.h. This header
    file contains the following structs:

    #include "ap_int.h"

    template<int D,int U,int TI,int TD>
    struct ap_axis{
      ap_int<D> data;
      ap_uint<D/8> keep;
      ap_uint<D/8> strb;
      ap_uint<U> user;
      ap_uint<1> last;
      ap_uint<TI> id;
      ap_uint<TD> dest;
    };

    template<int D,int U,int TI,int TD>
    struct ap_axiu{
      ap_uint<D> data;
      ap_uint<D/8> keep;
      ap_uint<D/8> strb;
      ap_uint<U> user;
      ap_uint<1> last;
      ap_uint<TI> id;
      ap_uint<TD> dest;
    };

    The following example shows how the side-channels can be used directly in the C code and
    implemented on the interface. In this example a signed 32-bit data type is used.

    #include "ap_axi_sdata.h"

    void example(ap_axis<32,2,5,6> A[50], ap_axis<32,2,5,6> B[50]){
      //Map ports to Vivado HLS interfaces
      #pragma HLS INTERFACE axis port=A
      #pragma HLS INTERFACE axis port=B
      int i;

      for(i = 0; i < 50; i++){
        B[i].data = A[i].data.to_int() + 5;
        B[i].keep = A[i].keep;
        B[i].strb = A[i].strb;
        B[i].user = A[i].user;
        B[i].last = A[i].last;
        B[i].id = A[i].id;
        B[i].dest = A[i].dest;
      }
    }

    AXI4-Lite Interface

    You can use an AXI4-Lite interface to allow the design to be controlled by a CPU or
    microcontroller. Using the Vivado HLS AXI4-Lite interface, you can:
    • Group multiple ports into the same AXI4-Lite interface.
    • Output C driver files for use with the code running on a processor.

    The following example shows how Vivado HLS implements multiple arguments, including
    the function return, as an AXI4-Lite interface. Because each directive uses the same name
    for the bundle option, each of the ports is grouped into the same AXI4-Lite interface.

    void example(char *a, char *b, char *c)
    {
      #pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
      #pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
      #pragma HLS INTERFACE s_axilite port=b bundle=BUS_A
      #pragma HLS INTERFACE s_axilite port=c bundle=BUS_A offset=0x0400
      #pragma HLS INTERFACE ap_vld port=b

      *c += *a + *b;
    }

    By default, Vivado HLS automatically assigns the address for each port that is grouped into
    an AXI4-Lite interface. Vivado HLS provides the assigned addresses in the C driver files. To explicitly define the address, you can use the
    offset option, as shown for argument c in the example above.

    After synthesis, Vivado HLS implements the ports in the AXI4-Lite port, as shown in the
    following figure. Vivado HLS creates the interrupt port by including the function return in
    the AXI4-Lite interface. You can program the interrupt through the AXI4-Lite interface. You
    can also drive the interrupt from the following block-level protocols:
    • ap_done: Indicates when the function completes all operations.
    • ap_ready: Indicates when the function is ready for new input data.

    Control Clock and Reset in AXI4-Lite Interfaces

    By default, Vivado HLS uses the same clock for the AXI4-Lite interface and the synthesized
    design. Vivado HLS connects all registers in the AXI4-Lite interface to the clock used for the
    synthesized logic (ap_clk).
    Optionally, you can use the INTERFACE directive clock option to specify a separate clock
    for each AXI4-Lite port.

    • RECOMMENDED: For ease of use during the operation of the design, Xilinx recommends that you do not include additional I/O protocols in the ports grouped into an AXI4-Lite interface. However, Xilinx recommends that you include the block-level I/O protocol associated with the return port in the AXI4-Lite interface.
    • IMPORTANT: In an AXI4-Lite interface, Vivado HLS reserves addresses 0x0000 through 0x000C for the block-level I/O protocol signals and interrupt controls.
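
    From the processor side, the reserved control addresses are usually accessed through the
    generated C driver. The sketch below shows the idea with raw register accesses. It assumes
    the control register sits at offset 0x00 with ap_start, ap_done, ap_idle, and ap_ready in
    bits 0 to 3; that layout is an assumption here and should be checked against the
    generated driver header. All function and constant names are hypothetical, and hardware
    side effects such as clear-on-read are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the control register (verify against the generated driver files).
static const uint32_t CTRL_OFFSET = 0x00;
static const uint32_t AP_START = 1u << 0;
static const uint32_t AP_DONE  = 1u << 1;
static const uint32_t AP_IDLE  = 1u << 2;
static const uint32_t AP_READY = 1u << 3;

// Word-wide register access relative to the base address of the AXI4-Lite interface.
inline void reg_write(volatile uint32_t *base, uint32_t offset, uint32_t value) {
    base[offset / 4] = value;
}

inline uint32_t reg_read(volatile uint32_t *base, uint32_t offset) {
    return base[offset / 4];
}

// Set ap_start while leaving the other control bits unchanged.
void start_block(volatile uint32_t *base) {
    reg_write(base, CTRL_OFFSET, reg_read(base, CTRL_OFFSET) | AP_START);
}

// True when the ap_done bit is set.
bool is_done(volatile uint32_t *base) {
    return (reg_read(base, CTRL_OFFSET) & AP_DONE) != 0;
}
```

    On real hardware, base would point at the mapped address of the interface. For a quick
    software check, a plain array of four 32-bit words can stand in for the register block.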

    AXI4 Master Interface

    You can use an AXI4 master interface on array or pointer/reference arguments, which
    Vivado HLS implements in one of the following modes:
    • Individual data transfers
    • Burst mode data transfers

    With individual data transfers, Vivado HLS reads or writes a single element of data for each
    address. The following example shows a single read and single write operation. In this
    example, Vivado HLS generates an address on the AXI interface to read a single data value
    and an address to write a single data value. The interface transfers one data value per
    address.

    void bus (int *d) {
      static int acc = 0;

      acc += *d;
      *d = acc;
    }

    With burst mode transfers, Vivado HLS reads or writes data using a single base address
    followed by multiple sequential data samples, which makes this mode capable of higher
    data throughput. Burst mode of operation is possible when you use the C memcpy function
    or a pipelined for loop.

    #include <string.h>  // memcpy

    void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a
      #pragma HLS INTERFACE s_axilite port=return
      //Port a is assigned to an AXI4 master interface
      int i;
      int buff[50];

      //memcpy creates a burst access to memory
      memcpy(buff,(const int*)a,50*sizeof(int));

      for(i=0; i < 50; i++){
        buff[i] = buff[i] + 100;
      }

      memcpy((int *)a,buff,50*sizeof(int));
    }

    The following example shows the same code as the preceding example but uses a for loop
    to copy the data out:

    #include <string.h>  // memcpy

    void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a
      #pragma HLS INTERFACE s_axilite port=return
      //Port a is assigned to an AXI4 master interface
      int i;
      int buff[50];

      //memcpy creates a burst access to memory
      memcpy(buff,(const int*)a,50*sizeof(int));

      for(i=0; i < 50; i++){
        buff[i] = buff[i] + 100;
      }

      //The pipelined for loop creates a burst write to memory
      for(i=0; i < 50; i++){
        #pragma HLS PIPELINE
        a[i] = buff[i];
      }
    }

    When using a for loop to implement burst reads or writes, follow these requirements:
    • Pipeline the loop
    • Access addresses in increasing order
    • Do not place accesses inside a conditional statement
    • For nested loops, do not flatten loops, because this inhibits the burst operation

    In the following example, Vivado HLS implements the port reads as burst transfers. Port a
    is specified without using the bundle option and is implemented in the default AXI
    interface. Port b is specified using a named bundle and is implemented in a separate AXI
    interface called d2_port.

    void example(volatile int *a, int *b){
      #pragma HLS INTERFACE s_axilite port=return
      #pragma HLS INTERFACE m_axi depth=50 port=a
      #pragma HLS INTERFACE m_axi depth=50 port=b bundle=d2_port
      int i;
      int buff[50];

      //copy data in
      for(i=0; i < 50; i++){
        #pragma HLS PIPELINE
        buff[i] = a[i] + b[i];
      }
      ...
    }

    Controlling AXI4 Burst Behavior

    An optimal AXI4 interface is one in which the design never stalls while waiting to access the
    bus, and after bus access is granted, the bus never stalls while waiting for the design to
    read/write. To create the optimal AXI4 interface, the following options are provided in the
    INTERFACE directive to specify the behavior of the bursts and optimize the efficiency of the
    AXI4 interface.

    Some of these options use internal storage to buffer data and may have an impact on area
    and resources:
    • latency: Specifies the expected latency of the AXI4 interface, allowing the design to
    initiate a bus request a number of cycles (latency) before the read or write is expected.
    If this figure is too low, the design will be ready too soon and may stall waiting for the
    bus. If this figure is too high, bus access may be granted but the bus may stall waiting
    on the design to start the access.
    • max_read_burst_length: Specifies the maximum number of data values read
    during a burst transfer.
    • num_read_outstanding: Specifies how many read requests can be made to the AXI4
    bus, without a response, before the design stalls. This implies internal storage in the
    design, a FIFO of size:
    num_read_outstanding*max_read_burst_length*word_size.
    • max_write_burst_length: Specifies the maximum number of data values written
    during a burst transfer.
    • num_write_outstanding: Specifies how many write requests can be made to the
    AXI4 bus, without a response, before the design stalls. This implies internal storage in
    the design, a FIFO of size:
    num_write_outstanding*max_write_burst_length*word_size.

    The following example can be used to help explain these options:

    #pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0 \
        depth=1024*1024*16/(512/8) \
        latency=100 \
        num_read_outstanding=32 \
        num_write_outstanding=32 \
        max_read_burst_length=16 \
        max_write_burst_length=16

    The interface is specified as having a latency of 100. Vivado HLS seeks to schedule the
    request for burst access 100 clock cycles before the design is ready to access the AXI4 bus.
    To further improve bus efficiency, the options num_write_outstanding and
    num_read_outstanding ensure the design contains enough buffering to store up to 32
    read and write accesses. This allows the design to continue processing until the bus
    requests are serviced. Finally, the options max_read_burst_length and
    max_write_burst_length ensure the maximum burst size is 16 and that the AXI4
    interface does not hold the bus for longer than this.
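
    The FIFO sizing formula from the option list can be worked through for this example. The sketch below is only an illustration: the helper fifo_bits is hypothetical, and it assumes word_size is expressed in bits, which the text does not state.

```c
#include <assert.h>

/* Storage implied by outstanding requests, in bits:
   num_outstanding * max_burst_length * word_size.
   Assumption: word_size is given in bits. */
unsigned long fifo_bits(unsigned num_outstanding,
                        unsigned max_burst_length,
                        unsigned word_bits) {
    assert(num_outstanding > 0 && max_burst_length > 0 && word_bits > 0);
    return (unsigned long)num_outstanding * max_burst_length * word_bits;
}
```

    With 32 outstanding requests and a maximum burst length of 16, a 32-bit data word gives 32*16*32 = 16384 bits (2048 bytes) of buffering per direction. If the port were 512 bits wide, as the 512/8 term in the depth expression may suggest, the same settings would give 262144 bits (32768 bytes) per direction, which shows why these options can have a noticeable impact on area.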

    Controlling the Address Offset in an AXI4 Interface

    By default, the AXI4 master interface starts all read and write operations from address
    0x00000000. For example, given the following code, the design reads data from byte
    addresses 0x00000000 to 0x000000c7 (50 32-bit words, or 200 bytes). The design then
    writes data back to the same addresses.

    #include <string.h>

    void example(volatile int *a){
    #pragma HLS INTERFACE m_axi depth=50 port=a
    #pragma HLS INTERFACE s_axilite port=return bundle=AXILiteS
      int i;
      int buff[50];
      // Burst read of 50 words from port a
      memcpy(buff,(const int*)a,50*sizeof(int));
      for(i=0; i < 50; i++){
        buff[i] = buff[i] + 100;
      }
      // Burst write of the results back to the same addresses
      memcpy((int *)a,buff,50*sizeof(int));
    }
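
    The address range quoted above follows from simple arithmetic, sketched here with a hypothetical helper last_byte_address that is not part of any Xilinx API. It assumes byte addressing and a base that is itself a byte address.

```c
#include <assert.h>
#include <stdint.h>

/* Last byte address touched by a transfer of `words` elements of
   `word_bytes` bytes each, starting at byte address `base`. */
uint32_t last_byte_address(uint32_t base, uint32_t words, uint32_t word_bytes) {
    assert(words > 0 && word_bytes > 0);
    return base + words * word_bytes - 1;
}
```

    For 50 words of 4 bytes starting at 0x00000000, the result is 0x000000c7 (200 bytes, so the last byte is number 199). With an illustrative base of 0x00001000, the same transfer would end at 0x000010c7.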

    To apply an address offset, use the offset option of the INTERFACE directive
    (offset=<mode> in the pragma form), and specify one of the following modes:
    • off: Does not apply an offset address. This is the default.
    • direct: Adds a 32-bit port to the design for applying an address offset.
    • slave: Adds a 32-bit register inside the AXI4-Lite interface for applying an address
    offset.

    If you use the slave option in an AXI interface, you must use an AXI4-Lite port on the
    design interface. Xilinx recommends that you implement the AXI4-Lite interface using the
    following pragma:

    #pragma HLS INTERFACE s_axilite port=return

    In addition, if you use the slave option and the design has several AXI4-Lite interfaces,
    you must ensure that the AXI master port offset register is bundled into the correct
    AXI4-Lite interface. In the following example, port a is implemented as an AXI master
    interface with an offset, and the design has two AXI4-Lite interfaces called AXI_Lite_1
    and AXI_Lite_2:

    #pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
    #pragma HLS INTERFACE s_axilite port=return bundle=AXI_Lite_1
    #pragma HLS INTERFACE s_axilite port=b bundle=AXI_Lite_2

    The following INTERFACE directive is required to ensure that the offset register for port a is
    bundled into the AXI4-Lite interface called AXI_Lite_1:

    #pragma HLS INTERFACE s_axilite port=a bundle=AXI_Lite_1
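
    Put together, the four directives might sit in a top-level function as in the following sketch. The function body, the scalar argument b, and the constant N are hypothetical and only give the pragmas a context; the original text shows the directives alone.

```c
#include <assert.h>

#define N 50   /* matches depth=50; the loop bodies are illustrative */

/* Hypothetical top-level function: adds scalar b to N words accessed
   through AXI master port a. */
void example(volatile int *a, int b) {
#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE s_axilite port=return bundle=AXI_Lite_1
#pragma HLS INTERFACE s_axilite port=b bundle=AXI_Lite_2
/* Places the offset register for port a in AXI_Lite_1. */
#pragma HLS INTERFACE s_axilite port=a bundle=AXI_Lite_1
    int i;
    int buff[N];

    for (i = 0; i < N; i++) {
#pragma HLS PIPELINE
        buff[i] = a[i] + b;
    }
    for (i = 0; i < N; i++) {
#pragma HLS PIPELINE
        a[i] = buff[i];
    }
}
```

    In this arrangement the block-level control signals and the offset register for port a share AXI_Lite_1, while the scalar b is reached through AXI_Lite_2.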

    Reference

    1. Xilinx UG902, Vivado Design Suite User Guide: High-Level Synthesis

    2. Xilinx UG1037, Vivado Design Suite AXI Reference Guide

    3. AMBA 4 AXI4-Stream Protocol:  https://static.docs.arm.com/ihi0051/a/IHI0051A_amba4_axi4_stream_v1_0_protocol_spec.pdf

  • Original article: https://www.cnblogs.com/wordchao/p/10981199.html