  • Xilinx common modules summary (Verilog) [03]

    Author: Gui.

    Date: 2018-05-10 21:03:44

    Link: http://www.cnblogs.com/xingshansi/p/9021919.html 


    Preface

    This post mainly records commonly used basic modules.

    I. Module summary

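    • 17- Autocorrelation (xcorr)

    The implementation mainly follows the post 工程应用中的自相关操作 (autocorrelation in engineering applications). The derivation there shows that autocorrelation reduces to two basic operations: complex multiplication and recursion (autoregression, IIR filtering, and similar all need the latter as well).

    Modules involved: 1) 10- conjugate complex multiplication; 2) 11- complex delay; 3) recursion -> 1- three-operand add/subtract (instantiated separately for I and Q).

    Path: Evernote: 0019/015 

    MATLAB simulation (in practice, dividing by N amounts to a shift, and whether or not it is applied makes little difference):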

    clc;clear all;close all;
    fs = 960e6;
    f0 = 20e6;
    t = [0:255]/fs;
    sig = [zeros(1,384),exp(1j*(2*pi*t*f0+pi/4)),zeros(1,384)];
    sig = awgn(sig,5);
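    % Autocorrelation: N-point sliding window at lag k
    N = 64;
    k = 1;
    R = zeros(1,length(sig));
    for i = N+1:length(sig)-k
        % Recursive update from the previous sample: add the newest product,
        % drop the one that leaves the window
        R(i) = R(i-1) + 1/N * (sig(i)*conj(sig(i+k)) - sig(i-N)*conj(sig(i-N+k)));
    end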
    figure()
    subplot 211
    plot(abs(sig)/max(abs(sig)));
    subplot 212
    plot(abs(R)/max(abs(R)));
    

      

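    Verilog [complete analysis, implementation, and simulation flow]

    Without loss of generality, take k = 1 as an example.

    Theoretical analysis: 

    Basic modules used: 1) 10- conjugate multiplication; 2) 11- complex delay; 3) 1- three-operand add/subtract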
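
    The analysis figure did not survive in this copy of the post. As a hedged reconstruction from the MATLAB loop earlier in this section (not the author's original derivation), the N-point sliding autocorrelation at lag k and its recursive update can be written as below, where x is the complex input, n the sample index, and the symbol R_k is introduced here only for illustration:

    ```latex
    \[
    R_k(n) = \frac{1}{N}\sum_{i=n-N+1}^{n} x(i)\,x^{*}(i+k)
    \]
    \[
    R_k(n) = R_k(n-1) + \frac{1}{N}\left[\,x(n)\,x^{*}(n+k) - x(n-N)\,x^{*}(n-N+k)\,\right]
    \]
    ```

    Read this way, the three modules map onto the terms directly: the conjugate multiplier produces x(n)x*(n+k), the delay module supplies the same product N samples earlier, and the three-operand add/subtract forms "previous sum + newest product - oldest product". The factor 1/N becomes a right shift when N is a power of two.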

    Hardware implementation

    The delay can also be built from primitives, using an SRL (shift-register chain) followed by an FDRE:

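    // Total delay in clock cycles = SRL address {A3,A2,A1,A0} + 2
    // (SRL16E contributes address + 1, the FDRE one more); here 4'b0110 + 2 = 8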
    parameter width = 16; //data width
    genvar ii;
    
    generate
        for(ii = 0; ii < width; ii++)
        begin:delay
            
    //            (* HLUTNM = ii *)
            SRL16E #(
                .INIT(16'h0000) // Initial Value of Shift Register
                ) SRL16E_u1 (
                .Q(out_cache[ii]),       // SRL data output
                .A0(1'b0),     // Select[0] input
                .A1(1'b1),     // Select[1] input
                .A2(1'b1),     // Select[2] input
                .A3(1'b0),     // Select[3] input
                .CE(1'b1),     // Clock enable input
                .CLK(clk),   // Clock input
                .D(ddata[ii])        // SRL data input
                );    
    
    FDRE #(
            .INIT(1'b0) // Initial value of register (1'b0 or 1'b1)
            ) FDRE_u1 (
            .Q(out[ii]),      // 1-bit Data output
            .C(clk),      // 1-bit Clock input
            .CE(1'b1),    // 1-bit Clock enable input
            .R(sclr),      // 1-bit Synchronous reset input
            .D(out_cache[ii])       // 1-bit Data input
            ); 
    end
    endgenerate

     For details see scm.pdf / hdl.pdf:

     

    Functional simulation

     1) Generate the simulation data:

    clc;clear all;close all;
    fs = 960e6;
    f0 = 20e6;
    t = [0:255]/fs;
    sig = [zeros(1,384),sin(2*pi*t*f0+pi/4),zeros(1,384)];
    sig = awgn(sig,10);
    L = length(sig);
    N = 17;
    t0 = [0:L-1]/fs;
    y_n = round(sig*(2^(N-2)-1));
    %write data
    fid = fopen('sig.txt','w');
    for k=1:length(y_n)
        B_s=dec2bin(y_n(k)+((y_n(k))<0)*2^N,N);
        for j=1:N
            if B_s(j)=='1'
                tb=1;
            else
                tb=0;
            end
            fprintf(fid,'%d',tb);
        end
        fprintf(fid,'\n');
    end
    
    fprintf(fid,';');
    fclose(fid);
    

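      2) Simulation results:

    MATLAB test data:

    Vivado output:

       

    The simulation shows that the two results match exactly, so the Xcorr module works. [For an X-point autocorrelation, only the number of delay cycles applied to the conjugate-multiply result needs to change, which makes the module parameterizable.]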
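
    To make the parameterization concrete, here is a minimal sketch of the recursive part only, for one component (I or Q) of the conjugate-multiply result. The module name xcorr_acc, its ports, and the default widths are assumptions made for illustration; the author's own code lives in the Evernote path and is not reproduced here. The 1/N scaling is left out, so the output is the plain window sum.

    ```systemverilog
    // Sliding-window accumulator for one component (I or Q) of the
    // conjugate-multiply result. Names and widths are illustrative.
    module xcorr_acc #(
        parameter int WIDTH = 34,   // width of one product component
        parameter int N     = 64    // window length = number of delay cycles
    ) (
        input  logic                              clk,
        input  logic                              sclr,  // synchronous clear
        input  logic signed [WIDTH-1:0]           p,     // I or Q part of x(n)*conj(x(n+k))
        output logic signed [WIDTH+$clog2(N)-1:0] r      // sum of the last N values of p
    );
        logic signed [WIDTH-1:0] dly [0:N-1];  // before each clock edge, dly[N-1] holds p(n-N)

        always_ff @(posedge clk) begin
            if (sclr) begin
                for (int i = 0; i < N; i++) dly[i] <= '0;
                r <= '0;
            end else begin
                dly[0] <= p;
                for (int i = 1; i < N; i++) dly[i] <= dly[i-1];
                // three-operand add/subtract: previous sum + newest - oldest
                r <= r + p - dly[N-1];
            end
        end
    endmodule
    ```

    Changing N is the only edit needed for a different window length. Because the delay line here has a reset, synthesis will likely build it from flip-flops rather than SRLs; the SRL + FDRE structure shown earlier is the resource-friendly alternative.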

    Code path: Evernote 0019/015Xcorr

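    • 18- Data rate conversion module

    Path: Evernote-1/0019/017

    For clock domain crossing of a single-bit signal, the usual approach is a two-flop synchronizer: two cascaded registers in the destination domain, which lowers the probability of metastability propagating. For multi-bit signals, the two common approaches to cross-domain data transfer are: 1) an asynchronous FIFO; 2) an asynchronous dual-port RAM. (ug473.pdf)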
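
    The single-bit case can be sketched as follows. This is a minimal example rather than code from the original post: the module name sync_2ff and its port names are made up for illustration, and ASYNC_REG is the Vivado attribute normally attached to synchronizer flops.

    ```systemverilog
    // Two-flop synchronizer for a single-bit signal crossing into clk_dst.
    // Module and port names are illustrative, not from the original post.
    module sync_2ff (
        input  logic clk_dst,   // destination-domain clock
        input  logic d_async,   // single-bit signal from another clock domain
        output logic q_sync     // synchronized copy, two clk_dst cycles later
    );
        // ASYNC_REG asks Vivado to keep the two flops close together and
        // to treat them as a synchronizer chain.
        (* ASYNC_REG = "TRUE" *) logic meta, stable;

        always_ff @(posedge clk_dst) begin
            meta   <= d_async;  // first stage may go metastable
            stable <= meta;     // second stage gives it one cycle to settle
        end

        assign q_sync = stable;
    endmodule
    ```

    This structure is only safe for a single bit (or a Gray-coded value that changes one bit at a time); a multi-bit bus passed through independent synchronizers can arrive skewed, which is why the FIFO and dual-port RAM approaches are used for data.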

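    1) FIFO (first in, first out)

     For more details see Evernote-3-FPGA/024-FIFO核使用 (FIFO core usage) and the post 基础003_V7-Memory Resources.

    Key point 1: minimum depth calculation.

    Key point 2: data throughput relationship: in theory, the amount of data written in must be less than or equal to the amount read out.

    FIFOs are mostly used together with high-speed interfaces.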
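
    The post does not spell out the depth calculation in key point 1. A commonly used estimate, given here as a sketch under stated assumptions rather than the author's formula, covers the simplest case: a burst of B words written one per write clock at frequency f_wr, drained continuously at one word per read clock at f_rd < f_wr, with synchronizer latency ignored. While the burst is being written, B·f_rd/f_wr words are read out, so the FIFO must hold the remainder. The numbers in the example are hypothetical.

    ```latex
    \[
    D_{\min} = B - B\,\frac{f_{rd}}{f_{wr}} = B\left(1 - \frac{f_{rd}}{f_{wr}}\right)
    \]
    \[
    B = 120,\quad f_{wr} = 100\ \text{MHz},\quad f_{rd} = 80\ \text{MHz}
    \;\Rightarrow\;
    D_{\min} = 120\,(1 - 0.8) = 24\ \text{words}
    \]
    ```

    In practice a few extra words are added for the pointer synchronization delay, and the depth is rounded up to what the FIFO primitive or IP core offers.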

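    2) Dual-port RAM

    For use of the IP core, see Evernote-3/FPGA/025-双端口RAM参数设置 (dual-port RAM parameter settings). The biggest difference between a dual-port RAM and a FIFO is that the RAM exposes read and write addresses, so an offset in the data can be corrected easily. 

    Application example: two data streams are offset from each other and need to be aligned. Using a dual-port RAM for this, instantiate the Xilinx IP core; for details see the post DUAL PORT RAM应用实例 (dual-port RAM application example). 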
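
    The address-offset idea can be illustrated with an inferred RAM instead of the IP core. The sketch below is an assumption-laden simplification: it uses a single clock, and the module name dpram_align, the offset port, and the parameter defaults are all hypothetical. The read address trails the write address by a programmable amount, so the earlier of two streams can be delayed until it lines up with the later one.

    ```systemverilog
    // Programmable delay built on an inferred simple dual-port RAM.
    // All names are illustrative; the original post uses the Xilinx IP core.
    module dpram_align #(
        parameter int WIDTH = 17,
        parameter int AW    = 6              // RAM depth is 2**AW words
    ) (
        input  logic             clk,
        input  logic [AW-1:0]    offset,     // valid range: 1 .. 2**AW-1
        input  logic [WIDTH-1:0] din,
        output logic [WIDTH-1:0] dout        // din delayed by offset + 1 cycles
    );
        logic [WIDTH-1:0] mem [0:2**AW-1];
        logic [AW-1:0]    waddr = '0;        // free-running write address
        logic [AW-1:0]    raddr;

        // Read address trails the write address by 'offset' (wraps modulo 2**AW).
        assign raddr = waddr - offset;

        always_ff @(posedge clk) begin
            mem[waddr] <= din;               // write port
            dout       <= mem[raddr];        // read port, one cycle of read latency
            waddr      <= waddr + 1'b1;
        end
    endmodule
    ```

    Until the RAM has been written at least offset times, the read port returns whatever the memory held before, so the first outputs should be ignored. For two streams in different clock domains the write and read ports would each get their own clock and address counter, which is the configuration the IP core provides.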

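    • 19- Convolution module

    Path: 1/0019/019

    Analysis: for the theory of convolution and several implementation approaches, see the post 信号处理——卷积(convolution)的实现.

    Only one general-purpose approach is given here, and only for real-valued data; complex data can be handled by splitting it into real and imaginary parts. Define the sequence h:

    The corresponding convolution:

    The implementation is: 1) a direct delay line built from registers, 2) accumulation (the accumulation length equals the length of the shorter sequence, here M), which completes the convolution.

    Delay line + registers

    delay_all.sv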

    `timescale 1ns/1ps
    /*
    Function: delay line with all Num taps exposed
    Author: Gui.
    Date: 2018-05-14 16:31:09
    */
    /*
     Equivalent to: dout <= {dout[Num-2:0],din};  // SLICEM, without reset
    */
    module delay_all(din, clk, sclr, dout);
    parameter datwidth = 17;
    parameter Num = 8;
    
    input [datwidth-1:0] din;
    input clk;
    input sclr;
    output [Num-1:0][datwidth-1:0] dout;
    // dat[0] is registered below; dat[1..Num-1] are driven by the delay instances
    logic [Num-1:0][datwidth-1:0] dat;
    
    genvar ii;
    generate 
    for (ii = 1;ii < Num; ii++)
    begin:delayall
        delay #(
            .datwidth(datwidth)
        )
        u1(
            .din(dat[ii-1]),
            .clk(clk),
            .sclr(sclr),
            .dout(dat[ii])
        );
    end
    endgenerate
    
    // First tap only: resetting the whole of dat here would add a second
    // driver to the taps already driven by the delay instances.
    always @(posedge clk)
    if(sclr)
    begin
        dat[0] <= 0;
    end 
    else
    begin
        dat[0] <= din;
    end 
    
    assign dout = dat;
    
    endmodule

    delay.sv

    `timescale 1ns/1ps
    /*
    Function: one-clock delay of a datwidth-bit word, built from FDRE primitives
    Author: Gui.
    Date: 2018-05-14 16:21:09
    */
    module delay(din, clk, sclr, dout);
    parameter datwidth = 17;
    input [datwidth-1:0] din;
    input clk;
    input sclr;
    
    output [datwidth-1:0] dout;
    
    wire [datwidth-1:0] q;
    
    genvar ii;
    
    generate
    for(ii = 0; ii < datwidth; ii++)
        begin:delay
        FDRE #(
                .INIT(1'b0) // Initial value of register (1'b0 or 1'b1)
                ) FDRE_u1 (
                .Q(q[ii]),      // 1-bit Data output
                .C(clk),      // 1-bit Clock input
                .CE(1'b1),    // 1-bit Clock enable input
                .R(sclr),      // 1-bit Synchronous reset input
                .D(din[ii])       // 1-bit Data input
                ); 
        end
    endgenerate
    
    // Drive the output port from the register outputs
    assign dout = q;
    
    endmodule

    This delay operation is equivalent to a single statement:

     dout <= {dout[Num-2:0],din};

    Without a reset signal, the tools implement it with SLICEM resources in the CLB.
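
    As a self-contained illustration of that one-line form, the sketch below wraps it in a module. The name shift_delay and its ports are assumptions for this example. Only the last tap is read out here, since that is the shape synthesis tools typically map onto SRL primitives; when every tap is used, as in delay_all, ordinary flip-flops are the more likely result.

    ```systemverilog
    // Reset-free shift register; only the last tap is read, the form that
    // synthesis tools typically map onto SRL primitives in SLICEM LUTs.
    module shift_delay #(
        parameter int WIDTH = 17,
        parameter int NUM   = 8      // delay in clock cycles, NUM >= 2
    ) (
        input  logic             clk,
        input  logic [WIDTH-1:0] din,
        output logic [WIDTH-1:0] dout
    );
        logic [NUM-1:0][WIDTH-1:0] taps;   // taps[0] is the newest sample

        always_ff @(posedge clk)
            taps <= {taps[NUM-2:0], din};  // shift by one word, no reset branch

        assign dout = taps[NUM-1];         // din delayed by NUM cycles
    endmodule
    ```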

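    Accumulation, grouping the terms into 3-input adders:

    localparam ParaNum = (Num % 3)?(Num/3 + 1):(Num/3);// ceil(Num/3): number of parallel groups
    localparam Actbits = ParaNum*3;// Num rounded up to a multiple of 3

    This approach is barely adequate when there are few convolution coefficients; for a high order it both consumes memory and increases group delay. For high orders, see the implementation approach in the post FIR调用DSP48E_05. For use of the IP core, the corresponding document is pg149-fir-compiler.pdf.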
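
    The formula images for this section are missing from this copy, so the sketch below assumes the standard definition y(n) = sum of h(m)·x(n-m) for m = 0..M-1. It combines the two steps described above, a delay line and an accumulation, in one small module. The name conv_direct, the port list, and the widths are illustrative assumptions, and a flat accumulation loop stands in for the author's 3-input adder grouping.

    ```systemverilog
    // Direct-form convolution y(n) = sum over m of h[m] * x(n-m), m = 0..M-1.
    // Illustrative names; a flat accumulation stands in for the 3-input
    // adder grouping used in the original design.
    module conv_direct #(
        parameter int WIDTH = 17,    // sample and coefficient width
        parameter int M     = 4      // number of coefficients
    ) (
        input  logic                                clk,
        input  logic signed [WIDTH-1:0]             x,   // input sample
        input  logic [M-1:0][WIDTH-1:0]             h,   // coefficients, two's complement
        output logic signed [2*WIDTH+$clog2(M)-1:0] y
    );
        logic signed [WIDTH-1:0]             taps [0:M-1];  // taps[m] holds x(n-m)
        logic signed [2*WIDTH+$clog2(M)-1:0] acc;

        // Delay line
        always_ff @(posedge clk) begin
            taps[0] <= x;
            for (int m = 1; m < M; m++)
                taps[m] <= taps[m-1];
        end

        // Multiply each tap by its coefficient and accumulate
        always_comb begin
            acc = '0;
            for (int m = 0; m < M; m++)
                acc = acc + $signed(h[m]) * taps[m];
        end

        // Output register
        always_ff @(posedge clk)
            y <= acc;
    endmodule
    ```

    The output lags the input by two clock cycles (one for the first tap, one for the output register). The accumulator is widened by $clog2(M) bits so that M full-width products cannot overflow. For large M the single combinational sum becomes a long path, which is the motivation for grouping the additions and pipelining between groups.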

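    • 20- FIR filter module

    The principle is exactly the same as 19- convolution. For filter coefficient design, see the posts fdatool的滤波器设计 and FIR特性及仿真实现_01.

    For a basic filter implementation, you can use Fdatool -> Targets -> Generate HDL.

    • 21- Hilbert transform

     The principle is exactly the same as 19- convolution. In engineering practice a Hilbert transformer can also follow the FIR design approach, with the same design tools, so it is not described further here.

    The main modules at present:

    • 22- IIR filter module

     This function is rarely used at present. Broken down, it is delays plus additions and subtractions, so it differs little from the modules above. It can likewise be generated directly as a Verilog file from MATLAB: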

    // -------------------------------------------------------------
    //
    // Module: filteriir
    // Generated by MATLAB(R) 8.1 and the Filter Design HDL Coder 2.9.3.
    // Generated on: 2018-05-15 13:03:46
    // -------------------------------------------------------------
    
    // -------------------------------------------------------------
    // HDL Code Generation Options:
    //
    // Name: filteriir
    // TargetLanguage: Verilog
    // TestBenchStimulus: step ramp chirp 
    // GenerateHDLTestBench: off
    
    // Filter Specifications:
    //
    // Sampling Frequency : N/A (normalized frequency)
    // Response           : Lowpass
    // Specification      : N,F3dB
    // Filter Order       : 10
    // 3-dB Point         : 0.45
    // -------------------------------------------------------------
    
    // -------------------------------------------------------------
    // HDL Implementation    : Fully parallel
    // Multipliers           : 15
    // Folding Factor        : 1
    // -------------------------------------------------------------
    // Filter Settings:
    //
    // Discrete-Time IIR Filter (real)
    // -------------------------------
    // Filter Structure    : Direct-Form II, Second-Order Sections
    // Number of Sections  : 5
    // Stable              : Yes
    // Linear Phase        : No
    // -------------------------------------------------------------
    
    `timescale 1 ns / 1 ns
    
    module filteriir
                   (
                    clk,
                    clk_enable,
                    reset,
                    filter_in,
                    filter_out
                    );
    
      input   clk; 
      input   clk_enable; 
      input   reset; 
      input   [63:0] filter_in; //double
      output  [63:0] filter_out; //double
    
    ////////////////////////////////////////////////////////////////
    //Module Architecture: filteriir
    ////////////////////////////////////////////////////////////////
      // Local Functions
      // Type Definitions
      // Constants
      parameter scaleconst1 = 3.6533535137021494E-01; //double
      parameter coeff_b1_section1 = 1.0000000000000000E+00; //double
      parameter coeff_b2_section1 = 2.0000000000000000E+00; //double
      parameter coeff_b3_section1 = 1.0000000000000000E+00; //double
      parameter coeff_a2_section1 = -2.7099751179194387E-01; //double
      parameter coeff_a3_section1 = 7.3233891727280376E-01; //double
      parameter scaleconst2 = 2.9120577213381943E-01; //double
      parameter coeff_b1_section2 = 1.0000000000000000E+00; //double
      parameter coeff_b2_section2 = 2.0000000000000000E+00; //double
      parameter coeff_b3_section2 = 1.0000000000000000E+00; //double
      parameter coeff_a2_section2 = -2.1600986428424435E-01; //double
      parameter coeff_a3_section2 = 3.8083295281952212E-01; //double
      parameter scaleconst3 = 2.4834107896254057E-01; //double
      parameter coeff_b1_section3 = 1.0000000000000000E+00; //double
      parameter coeff_b2_section3 = 2.0000000000000000E+00; //double
      parameter coeff_b3_section3 = 1.0000000000000000E+00; //double
      parameter coeff_a2_section3 = -1.8421380307753588E-01; //double
      parameter coeff_a3_section3 = 1.7757811892769818E-01; //double
      parameter scaleconst4 = 2.2434814972791137E-01; //double
      parameter coeff_b1_section4 = 1.0000000000000000E+00; //double
      parameter coeff_b2_section4 = 2.0000000000000000E+00; //double
      parameter coeff_b3_section4 = 1.0000000000000000E+00; //double
      parameter coeff_a2_section4 = -1.6641639010121584E-01; //double
      parameter coeff_a3_section4 = 6.3808989012861264E-02; //double
      parameter scaleconst5 = 2.1350378853875973E-01; //double
      parameter coeff_b1_section5 = 1.0000000000000000E+00; //double
      parameter coeff_b2_section5 = 2.0000000000000000E+00; //double
      parameter coeff_b3_section5 = 1.0000000000000000E+00; //double
      parameter coeff_a2_section5 = -1.5837228791342844E-01; //double
      parameter coeff_a3_section5 = 1.2387442068467328E-02; //double
      // Signals
      real input_register; // double
      real scale1; // double
      real scaletypeconvert1; // double
      // Section 1 Signals 
      real a1sum1; // double
      real a2sum1; // double
      real b1sum1; // double
      real b2sum1; // double
      real delay_section1 [0:1] ; // double
      real inputconv1; // double
      real a2mul1; // double
      real a3mul1; // double
      real b1mul1; // double
      real b2mul1; // double
      real b3mul1; // double
      real scale2; // double
      real scaletypeconvert2; // double
      // Section 2 Signals 
      real a1sum2; // double
      real a2sum2; // double
      real b1sum2; // double
      real b2sum2; // double
      real delay_section2 [0:1] ; // double
      real inputconv2; // double
      real a2mul2; // double
      real a3mul2; // double
      real b1mul2; // double
      real b2mul2; // double
      real b3mul2; // double
      real scale3; // double
      real scaletypeconvert3; // double
      // Section 3 Signals 
      real a1sum3; // double
      real a2sum3; // double
      real b1sum3; // double
      real b2sum3; // double
      real delay_section3 [0:1] ; // double
      real inputconv3; // double
      real a2mul3; // double
      real a3mul3; // double
      real b1mul3; // double
      real b2mul3; // double
      real b3mul3; // double
      real scale4; // double
      real scaletypeconvert4; // double
      // Section 4 Signals 
      real a1sum4; // double
      real a2sum4; // double
      real b1sum4; // double
      real b2sum4; // double
      real delay_section4 [0:1] ; // double
      real inputconv4; // double
      real a2mul4; // double
      real a3mul4; // double
      real b1mul4; // double
      real b2mul4; // double
      real b3mul4; // double
      real scale5; // double
      real scaletypeconvert5; // double
      // Section 5 Signals 
      real a1sum5; // double
      real a2sum5; // double
      real b1sum5; // double
      real b2sum5; // double
      real delay_section5 [0:1] ; // double
      real inputconv5; // double
      real a2mul5; // double
      real a3mul5; // double
      real b1mul5; // double
      real b2mul5; // double
      real b3mul5; // double
      real output_typeconvert; // double
      real output_register; // double
    
      // Block Statements
      always @ (posedge clk or posedge reset)
        begin: input_reg_process
          if (reset == 1'b1) begin
            input_register <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              input_register <= $bitstoreal(filter_in);
            end
          end
        end // input_reg_process
    
      always @* scale1 <= input_register * scaleconst1;
    
      always @* scaletypeconvert1 <= scale1;
    
    
      //   ------------------ Section 1 ------------------
    
      always @ (posedge clk or posedge reset)
        begin: delay_process_section1
          if (reset == 1'b1) begin
            delay_section1[0] <= 0.0000000000000000E+00;
            delay_section1[1] <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              delay_section1[1] <= delay_section1[0];
              delay_section1[0] <= a1sum1;
            end
          end
        end // delay_process_section1
    
      always @* inputconv1 <= scaletypeconvert1;
    
    
      always @* a2mul1 <= delay_section1[0] * coeff_a2_section1;
    
      always @* a3mul1 <= delay_section1[1] * coeff_a3_section1;
    
      always @* b1mul1 <= a1sum1;
    
    
      always @* b2mul1 <= delay_section1[0] * coeff_b2_section1;
    
      always @* b3mul1 <= delay_section1[1];
    
    
      always @* a2sum1 <= inputconv1 - a2mul1;
    
      always @* a1sum1 <= a2sum1 - a3mul1;
    
      always @* b2sum1 <= b1mul1 + b2mul1;
    
      always @* b1sum1 <= b2sum1 + b3mul1;
    
      always @* scale2 <= b1sum1 * scaleconst2;
    
      always @* scaletypeconvert2 <= scale2;
    
    
      //   ------------------ Section 2 ------------------
    
      always @ (posedge clk or posedge reset)
        begin: delay_process_section2
          if (reset == 1'b1) begin
            delay_section2[0] <= 0.0000000000000000E+00;
            delay_section2[1] <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              delay_section2[1] <= delay_section2[0];
              delay_section2[0] <= a1sum2;
            end
          end
        end // delay_process_section2
    
      always @* inputconv2 <= scaletypeconvert2;
    
    
      always @* a2mul2 <= delay_section2[0] * coeff_a2_section2;
    
      always @* a3mul2 <= delay_section2[1] * coeff_a3_section2;
    
      always @* b1mul2 <= a1sum2;
    
    
      always @* b2mul2 <= delay_section2[0] * coeff_b2_section2;
    
      always @* b3mul2 <= delay_section2[1];
    
    
      always @* a2sum2 <= inputconv2 - a2mul2;
    
      always @* a1sum2 <= a2sum2 - a3mul2;
    
      always @* b2sum2 <= b1mul2 + b2mul2;
    
      always @* b1sum2 <= b2sum2 + b3mul2;
    
      always @* scale3 <= b1sum2 * scaleconst3;
    
      always @* scaletypeconvert3 <= scale3;
    
    
      //   ------------------ Section 3 ------------------
    
      always @ (posedge clk or posedge reset)
        begin: delay_process_section3
          if (reset == 1'b1) begin
            delay_section3[0] <= 0.0000000000000000E+00;
            delay_section3[1] <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              delay_section3[1] <= delay_section3[0];
              delay_section3[0] <= a1sum3;
            end
          end
        end // delay_process_section3
    
      always @* inputconv3 <= scaletypeconvert3;
    
    
      always @* a2mul3 <= delay_section3[0] * coeff_a2_section3;
    
      always @* a3mul3 <= delay_section3[1] * coeff_a3_section3;
    
      always @* b1mul3 <= a1sum3;
    
    
      always @* b2mul3 <= delay_section3[0] * coeff_b2_section3;
    
      always @* b3mul3 <= delay_section3[1];
    
    
      always @* a2sum3 <= inputconv3 - a2mul3;
    
      always @* a1sum3 <= a2sum3 - a3mul3;
    
      always @* b2sum3 <= b1mul3 + b2mul3;
    
      always @* b1sum3 <= b2sum3 + b3mul3;
    
      always @* scale4 <= b1sum3 * scaleconst4;
    
      always @* scaletypeconvert4 <= scale4;
    
    
      //   ------------------ Section 4 ------------------
    
      always @ (posedge clk or posedge reset)
        begin: delay_process_section4
          if (reset == 1'b1) begin
            delay_section4[0] <= 0.0000000000000000E+00;
            delay_section4[1] <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              delay_section4[1] <= delay_section4[0];
              delay_section4[0] <= a1sum4;
            end
          end
        end // delay_process_section4
    
      always @* inputconv4 <= scaletypeconvert4;
    
    
      always @* a2mul4 <= delay_section4[0] * coeff_a2_section4;
    
      always @* a3mul4 <= delay_section4[1] * coeff_a3_section4;
    
      always @* b1mul4 <= a1sum4;
    
    
      always @* b2mul4 <= delay_section4[0] * coeff_b2_section4;
    
      always @* b3mul4 <= delay_section4[1];
    
    
      always @* a2sum4 <= inputconv4 - a2mul4;
    
      always @* a1sum4 <= a2sum4 - a3mul4;
    
      always @* b2sum4 <= b1mul4 + b2mul4;
    
      always @* b1sum4 <= b2sum4 + b3mul4;
    
      always @* scale5 <= b1sum4 * scaleconst5;
    
      always @* scaletypeconvert5 <= scale5;
    
    
      //   ------------------ Section 5 ------------------
    
      always @ (posedge clk or posedge reset)
        begin: delay_process_section5
          if (reset == 1'b1) begin
            delay_section5[0] <= 0.0000000000000000E+00;
            delay_section5[1] <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              delay_section5[1] <= delay_section5[0];
              delay_section5[0] <= a1sum5;
            end
          end
        end // delay_process_section5
    
      always @* inputconv5 <= scaletypeconvert5;
    
    
      always @* a2mul5 <= delay_section5[0] * coeff_a2_section5;
    
      always @* a3mul5 <= delay_section5[1] * coeff_a3_section5;
    
      always @* b1mul5 <= a1sum5;
    
    
      always @* b2mul5 <= delay_section5[0] * coeff_b2_section5;
    
      always @* b3mul5 <= delay_section5[1];
    
    
      always @* a2sum5 <= inputconv5 - a2mul5;
    
      always @* a1sum5 <= a2sum5 - a3mul5;
    
      always @* b2sum5 <= b1mul5 + b2mul5;
    
      always @* b1sum5 <= b2sum5 + b3mul5;
    
      always @* output_typeconvert <= b1sum5;
    
    
      always @ (posedge clk or posedge reset)
        begin: Output_Register_process
          if (reset == 1'b1) begin
            output_register <= 0.0000000000000000E+00;
          end
          else begin
            if (clk_enable == 1'b1) begin
              output_register <= output_typeconvert;
            end
          end
        end // Output_Register_process
    
      // Assignment Statements
      assign filter_out = $realtobits(output_register);
    endmodule  // filteriir

    • 23- Lookup tables and carry chains (LUT, CARRY; combinational logic elements)

    The lookup tables are LUT2, LUT5, and LUT6. A basic example: a 12-bit OR reduction.

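    // Behavioral form (use either this or the primitives below, not both,
    // otherwise user_out has two drivers):
    // assign user_out = |datin[11:0];
    // Equivalent LUT primitive form
    wire out, out1_14;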
    LUT6 #(
        .INIT(64'hFFFFFFFFFFFFFFFE))
    out1 (
        .I0(datin[3]),
        .I1(datin[2]),
        .I2(datin[5]),
        .I3(datin[4]),
        .I4(datin[7]),
        .I5(datin[6]),
        .O(out)
    );
    
    LUT6 #(
        .INIT(64'hFFFFFFFFFFFFFFFE))
    out2 (
        .I0(datin[9]),
        .I1(datin[8]),
        .I2(datin[11]),
        .I3(datin[10]),
        .I4(datin[1]),
        .I5(datin[0]),
        .O(out1_14)
    );
    
    LUT2 #(
        .INIT(4'hE))
    out3 (
        .I0(out),
        .I1(out1_14),
        .O(user_out)
    );
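
    The INIT values in the listing follow from the LUT truth table: bit number addr of INIT is the output when the inputs, read as the binary number {I5,...,I0}, equal addr. The simulation-only sketch below (module and function names are made up for this example) rebuilds both constants for the OR function, where the output is 0 only when every input is 0.

    ```systemverilog
    // Simulation-only check of the INIT constants used for the OR gates above.
    module lut_init_demo;
        // INIT bit 'addr' is the LUT output when {I5,...,I0} == addr.
        function automatic logic [63:0] or6_init();
            logic [63:0] init;
            for (int addr = 0; addr < 64; addr++)
                init[addr] = (addr != 0);   // 6-input OR: 0 only at address 0
            return init;
        endfunction

        function automatic logic [3:0] or2_init();
            logic [3:0] init;
            for (int addr = 0; addr < 4; addr++)
                init[addr] = (addr != 0);   // 2-input OR: 0 only at address 0
            return init;
        endfunction

        initial begin
            $display("LUT6 OR INIT = 64'h%h", or6_init());  // fffffffffffffffe
            $display("LUT2 OR INIT = 4'h%h",  or2_init());  // e
        end
    endmodule
    ```

    Because OR is symmetric in its inputs, the order in which the datin bits are wired to I0..I5 in the listing does not affect the result.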

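    Follow-up 1: extend to matrix operations and adaptive processing modules

    Follow-up 2: study interface protocols and communication network protocols

    Follow-up 3: learn to build modules with sysgen

    Follow-up 4: HLS (optional) / MicroBlaze

    Follow-up 5: survey tools that make FPGA algorithm design and implementation smarter; Python, for example, seems to have such packages.

    Remember: keep reading papers as well, and ask senior colleagues for advice often.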
