How to resolve the delay problem between two modules in Verilog - cryptography

I am trying to implement BIST (built-in self-test). The test pattern generator (TPG) module generates a test pattern from an LFSR on every posedge clk (always #1 clk = ~clk). I am using a 32-stage LFSR. Each TPG output is sent to the AES encryption module, which produces the ciphertext after a #141 delay. During this #141 delay, many of the TPG outputs are lost. How can I solve this problem?
One solution I thought of is to store a thousand test patterns in an array (reg [127:0] arr [0:999]) and then send them to the AES encryption module one at a time. Is this a good solution? I suspect it increases the on-chip memory. If so, what should I do instead? The code snippet is given below.
wire [127:0] state_byte;
TPG T (clk, rst, bistMode, state_byte);
AES_encryption ENC_TB(key_byte, state_byte, clk, rst, encryptionEnable, state_out_byte, load, ready);
The code of the TPG module is given below:
module TPG (input wire clk, input wire rst, input wire sel, output reg[127:0] valueO);
    integer i;
    reg [31:0] patternGenerate[0:3],temp;
    always @(posedge clk)begin
        if(sel == 1)begin
            if(rst)begin
                valueO = 128'b0;
                temp = 32'b11111111111111111111111111111111;
            end
            else
            begin
                for(i=0;i<4;i=i+1)begin
                    temp = {(temp[31] ^ temp[25] ^ temp[22] ^ temp[21] ^ temp[15] ^ temp[11] ^ temp[10] ^ temp[9] ^ temp[7] ^ temp[6] ^ temp[4] ^ temp[3] ^ temp[1] ^ temp[0]), temp[31:1]};
                    patternGenerate[i] = temp;
                end
                valueO = {patternGenerate[3],patternGenerate[2],patternGenerate[1],patternGenerate[0]};
            end
        end
    end
endmodule
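One common alternative to buffering patterns is to stall the generator so that it only advances when the consumer can take a new block. The following is a minimal sketch, not a drop-in fix: the module name TPG_gated, the advance input, and the lfsr_step function are hypothetical, and it assumes the AES core can provide a signal that is high for exactly one clock cycle each time it is able to accept a new block (whether the existing ready or load signals behave that way is not stated in the question). The taps and the word order of valueO are copied from the TPG module above.

```verilog
// Hypothetical variant of TPG: the LFSR steps only when `advance` is high.
module TPG_gated (
    input  wire         clk,
    input  wire         rst,
    input  wire         sel,      // BIST mode, as in the original TPG
    input  wire         advance,  // assumed: one-cycle pulse, "consumer can take a new block"
    output reg  [127:0] valueO
);
    reg [31:0] lfsr;

    // One LFSR shift, using the same taps as the original module.
    function [31:0] lfsr_step;
        input [31:0] s;
        begin
            lfsr_step = {(s[31] ^ s[25] ^ s[22] ^ s[21] ^ s[15] ^ s[11] ^ s[10] ^
                          s[9]  ^ s[7]  ^ s[6]  ^ s[4]  ^ s[3]  ^ s[1]  ^ s[0]),
                         s[31:1]};
        end
    endfunction

    // Four consecutive steps, computed combinationally from the current state.
    wire [31:0] s1 = lfsr_step(lfsr);
    wire [31:0] s2 = lfsr_step(s1);
    wire [31:0] s3 = lfsr_step(s2);
    wire [31:0] s4 = lfsr_step(s3);

    always @(posedge clk) begin
        if (sel) begin
            if (rst) begin
                lfsr   <= 32'hFFFF_FFFF;
                valueO <= 128'b0;
            end else if (advance) begin
                lfsr   <= s4;                  // state after four shifts
                valueO <= {s4, s3, s2, s1};    // same ordering as the original
            end
            // otherwise hold: valueO stays stable while the consumer is busy
        end
    end
endmodule
```

With a scheme like this, at most one pattern is produced per encryption, so no pattern storage array is needed. Something (for example a start pulse after reset) still has to request the first pattern, since the consumer cannot signal completion before it has received anything.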

Related

Continuous assignment with 0 delay not getting the expected value after a signal positive edge

I have implemented an 8 bit serial-in parallel-out register in SystemVerilog and I'm trying to test it. I'm using Icarus Verilog as simulator.
In the test bench, I send 8 bits and wait for the rising edge of a signal, then check the resulting parallel buffer. The problem is that, after waiting for the rising edge, the parallel buffer does not have the expected value. However, if I add a #0 delay before the assert, it works.
The signal on which I'm waiting the rising edge, and the buffer that should contain the expected value are assigned as:
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
I know buff contains the right value, so how is it that on the rising edge of rdy, rdy is effectively 1 but out_data is still 0?
Wave dump
Note: See how when rdy goes high out_data is 0xaa.
Code
Serial-in parallel-out register
module sipo_reg(
input wire in_data,
output wire [7:0] out_data,
output wire rdy,
input wire en,
input wire rst,
input wire clk
);
reg [7:0] buff;
reg [2:0] i;
reg was_enabled;
wire _clk;
always @(posedge _clk, posedge rst) begin
if (rst) begin
buff <= 0;
i <= 7;
was_enabled <= 0;
end else begin
was_enabled <= 1;
buff[i] <= in_data;
i <= i == 0 ? 7 : (i - 1);
end
end
assign _clk = en | clk; // I know this is a very bad practice, I'm on it...
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
endmodule
Test bench:
module utils_sipo_reg_tb;
reg clk = 1'b1;
wire _clk;
always #2 clk = ~clk;
assign _clk = clk | en;
reg in_data = 1'b0, rst = 1'b0, en = 1'b0;
wire [7:0] out_data;
wire rdy;
sipo_reg dut(in_data, out_data, rdy, en, rst, clk);
integer i = 0;
initial begin
$dumpfile(`VCD);
$dumpvars(1, utils_sipo_reg_tb);
en = 1;
#4 rst = 1;
#4 rst = 0;
assert(out_data === 8'b0);
assert(rdy === 1'b0);
//
// read 8 bits works
//
#4 en = 0;
for (i = 0; i < 8; i = i + 1) begin
@(negedge _clk) in_data = ~in_data;
end
en = 1;
@(posedge rdy);
assert(rdy === 1'b1);
assert(out_data === 8'haa); // <-- This fails, but works if I add a '#0' delay.
#20;
$finish;
end
endmodule
I have tried to replace these lines
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
by these
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{((i == 7) & was_enabled)}} & buff;
because I suspected the simulator was 'calculating' out_data after rdy because the former depends on the latter. However, they are still continuous assignments with 0 delay, I would expect them to get their value at the exact same time (unless a delay is added).
Would it be a good design practice to add a few picoseconds of delay after each @(posedge signal) to make sure everything is settled by the simulator?
You have a race condition in your testbench because you are trying to sample a signal at a time where it is changing. All digital systems have inherent race conditions, and the way to deal with them is to only sample your signals when you know they are stable.
In your case, you could use a small numeric delay as you have suggested. However, since you have a clock signal, if you know that changes to signals only occur on the posedge of the clock, you could sample signals at the negedge:
@(posedge rdy);
@(negedge clk);
assert(rdy === 1'b1);
assert(out_data === 8'haa);
This is a more robust approach than using a numeric delay since it scales better (no need to worry about picking the best numeric delay value).
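As a self-contained illustration of sampling on the opposite clock edge, here is a minimal sketch. The module and signal names (sample_demo, cnt, out) are hypothetical and are not taken from the question's design. It only mirrors the structure: state updated with nonblocking assignments on posedge clk, and two chained continuous assignments derived from it.

```verilog
// Hypothetical stand-alone demo: check outputs on negedge clk, half a
// period after the posedge on which they changed.
module sample_demo;
    reg        clk = 1'b0;
    reg  [3:0] cnt = 4'd0;

    // Chained continuous assignments, like rdy and out_data in the question.
    wire       rdy = (cnt == 4'd3);
    wire [3:0] out = {4{rdy}} & 4'hA;

    always #2 clk = ~clk;

    // All state changes happen on the rising clock edge.
    always @(posedge clk) cnt <= cnt + 4'd1;

    initial begin
        @(posedge rdy);   // rdy has just changed; out may not have been updated yet
        @(negedge clk);   // wait until everything driven from posedge clk has settled
        if (rdy === 1'b1 && out === 4'hA)
            $display("PASS");
        else
            $display("FAIL: rdy=%b out=%h", rdy, out);
        $finish;
    end
endmodule
```

The check runs while the clock is low, so it does not depend on the order in which the simulator evaluates the two continuous assignments within the time step where rdy rises.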
This is a synchronous design, and your assertion should be synchronous as well. That means using only one edge, the (positive) clock edge. Once you start using other signal edges, you run into race conditions between statements waiting for the signal to change, which includes both the @(posedge rdy) procedural event control and the assign out_data = {8{rdy}} & buff; continuous assignment.
There are two approaches to fixing this in your testbench:
Do not use @(posedge rdy) in your procedural code. Use
@(posedge clk iff (rdy === 1'b1));
assert(out_data === 8'haa);
Since i and was_enabled are both updated with nonblocking assignments, rdy gets sampled with its old value, as well as out_data in the assertion that follows.
Another option is to use a concurrent assertion, which sits outside of any procedural code:
assert property (@(posedge clk) $rose(rdy) |-> out_data === 8'haa);
This reads "When rdy has risen, this implies on the same cycle that out_data must be 8'haa".

Checker not found. Problem in Verilog (ModelSim)

module Vr_ALU (A, B, ALUCtrl, ALUOut, Zero);
input [31:0] A;
input [31:0] B;
input [2:0] ALUCtrl;
output [31:0] ALUOut;
output Zero;
wire [31:0] sig_a;
wire [31:0] sig_b;
wire [31:0] sig_sum;
wire sig_cin;
wire sig_cout;
always @(*) begin
if(ALUCtrl==2'b010)
Vr_ripple_adder_M_bits U1(.A(sig_a), .B(sig_b), .CIN(sig_cin), .S(sig_sum), .COUT(sig_cout));
else if(ALUCtrl==2'b110)
Vr_ripple_adder_M_bits U2(.A(sig_a), .B(~sig_b), .CIN(~sig_cin), .S(sig_sum), .COUT(~sig_cout));
else ALUOut = 2'bx;
end
assign Zero = (ALUCtrl==2'b110 && ALUOut==0)? 1:0;
endmodule
In this code, I am trying to make the module work as an adder when ALUCtrl is 010 and as a subtractor when ALUCtrl is 110. But I get the error 'checker not found. Instantiation must be of a visible checker'.
Need help.
You cannot instantiate modules inside always blocks, and you cannot instantiate a module conditionally on a run-time signal. Modules represent hardware, and as such they are always present.
Instead, you can use muxes to switch the inputs and outputs of a single module instance. For example,
reg [31:0] sig_b_temp;
reg sig_cin_temp;
wire sig_cout_temp; // driven by the adder's COUT port, so it must be a net
reg sig_cout; // you need 'reg' for this example.
// muxes
always @(*) begin
    if (ALUCtrl == 3'b010) begin
        // inputs
        sig_cin_temp = sig_cin;
        sig_b_temp = sig_b;
        // outputs
        sig_cout = sig_cout_temp;
    end
    else begin
        // inputs
        sig_cin_temp = ~sig_cin;
        sig_b_temp = ~sig_b;
        // outputs
        sig_cout = ~sig_cout_temp;
    end
end
// single module instance
Vr_ripple_adder_M_bits U1(.A(sig_a),
    .B(sig_b_temp),
    .CIN(sig_cin_temp),
    .S(sig_sum),
    .COUT(sig_cout_temp));
Note, the code above will not compile with

Error with verilog generate loop : Unable to bind wire/reg/memory

I am building a signed multiplier in Verilog based on the Row Adder Tree (binary tree) architecture and the modified Baugh-Wooley algorithm.
However, I am running into the generate loop errors shown below when I add the partial products across subsequent layers of the binary tree.
Do you have any idea how to get rid of these errors?
edaplayground online code
Is a generate loop the only feasible way (given the large width of the multiplicand and multiplier) to add the partial products across the layers of a binary tree?
module multiply(clk, reset, in_valid, out_valid, in_A, in_B, out_C); // C=A*B
parameter A_WIDTH = 16;
parameter B_WIDTH = 16;
input clk, reset;
input in_valid; // to signify that in_A, in_B are valid
input signed [(A_WIDTH-1):0] in_A;
input signed [(B_WIDTH-1):0] in_B;
output reg signed [(A_WIDTH+B_WIDTH-1):0] out_C;
output reg out_valid; // to signify that out_C is valid
/*
This multiplier code architecture requires an area of O(N*M*logN) and time O(logN)
with M being the length or bitwidth of the multiplicand
see https://i.imgur.com/NaqjC6G.png or
Row Adder Tree Multipliers in http://www.andraka.com/multipli.php or
https://pdfs.semanticscholar.org/415c/d98dafb5c9cb358c94189927e1f3216b7494.pdf#page=10
regarding the mechanisms within all layers
In the case of an adder tree, the adders making up the levels closer to the input
take up real estate (remember the structure of row adder tree). As the size of
the input multiplicand bitwidth grows, it becomes more and more difficult to find a
placement that does not use long routes involving multiple switch nodes. The result
is the maximum clocking speed degrades quickly as the size of the bitwidth grows.
For signed multiplication, see also modified baugh-wooley algorithm for trick in
skipping sign extension, thus smaller final routed silicon area.
https://stackoverflow.com/questions/54268192/understanding-modified-baugh-wooley-multiplication-algorithm/
All layers are pipelined, so throughput = one result for each clock cycle
but each multiplication result still have latency = NUM_OF_INTERMEDIATE_LAYERS
*/
// The multiplication of two numbers is equivalent to adding as many copies of one
// of them, the multiplicand, as the value of the other one, the multiplier.
localparam SMALLER_WIDTH = (A_WIDTH <= B_WIDTH) ? A_WIDTH : B_WIDTH;
localparam LARGER_WIDTH = (A_WIDTH > B_WIDTH) ? A_WIDTH : B_WIDTH;
wire [(LARGER_WIDTH-1):0] MULTIPLICAND = (A_WIDTH > B_WIDTH) ? in_A : in_B ;
wire [(SMALLER_WIDTH-1):0] MULTIPLIPLIER = (A_WIDTH <= B_WIDTH) ? in_A : in_B ;
localparam NUM_OF_INTERMEDIATE_LAYERS = $clog2(SMALLER_WIDTH);
/*Stage 1: Binary multiplications to generate partial products rows*/
// first layer has "SMALLER_WIDTH" entries of data of width "LARGER_WIDTH"
// This resulted in a binary tree with faster vertical addition processes as we have
// lesser (NUM_OF_INTERMEDIATE_LAYERS) rows to add
reg [(LARGER_WIDTH-1):0] partial_products [0:(SMALLER_WIDTH-1)];
generate
genvar first_layer_index; // all partial products rows are in first layer
for(first_layer_index=0; first_layer_index<SMALLER_WIDTH; first_layer_index=first_layer_index+1) begin: first_layer
always @(posedge clk, posedge reset)
begin
if(reset) partial_products[first_layer_index] <= 0;
else begin
partial_products[first_layer_index] <= (MULTIPLICAND & MULTIPLIPLIER[first_layer_index]); // generation of partial products rows
end
end
end
endgenerate
/*Stage 2 : Intermediate partial products additions*/
// intermediate partial product rows
// Imagine a rhombus of height of "NUM_OF_INTERMEDIATE_LAYERS"
// and width of "LARGER_WIDTH" being re-arranged into binary row adder tree
// such that additions can be done in O(logN) time
generate
genvar layer;
for(layer=1; layer<NUM_OF_INTERMEDIATE_LAYERS; layer=layer+1) begin: middle_layers
// number of leafs (or children) in each layer within the binary tree
localparam NUM_OF_PP_ADDITION = (SMALLER_WIDTH >> layer);
reg [(LARGER_WIDTH+layer-1):0] middle_rows[0:(NUM_OF_PP_ADDITION-1)];
integer pp_index; // leaf index within each layer of the tree
always @(posedge clk, posedge reset)
begin
if(reset)
begin
for(pp_index=0; pp_index<NUM_OF_PP_ADDITION ; pp_index=pp_index+1)
middle_rows[pp_index] <= 0;
end
else begin
for(pp_index=0; pp_index<NUM_OF_PP_ADDITION ; pp_index=pp_index+1)
middle_rows[pp_index] <=
middle_layers[layer-1].middle_rows[1<<pp_index] +
(middle_layers[layer-1].middle_rows[(1<<pp_index) + 1]) << 1;
end
end
end
endgenerate
/*Stage 3 : Adding the final two partial products*/
wire sign_bit = in_A[A_WIDTH-1] ^ in_B[B_WIDTH-1];
always @(posedge clk, posedge reset)
begin
if(reset)
begin
out_C <= 0;
out_valid <= 0;
end
else out_C <= 0;// {sign_bit, };
end
endmodule
iverilog '-Wall' '-g2012' design.sv testbench.sv && unbuffer vvp a.out
design.sv:107: error: Unable to bind wire/reg/memory 'middle_layers[(layer)-('sd1)].middle_rows[('sd1)<<(pp_index)]' in 'test.mul.middle_layers[1]'
design.sv:108: error: Unable to bind wire/reg/memory 'middle_layers[(layer)-('sd1)].middle_rows[(('sd1)<<(pp_index))+('sd1)]' in 'test.mul.middle_layers[1]'
2 error(s) during elaboration.
Your mistake is that there is no block named middle_layers[0] in your code.
You start with
for(layer=1; ...) begin: middle_layers
    reg [(LARGER_WIDTH+layer-1):0] middle_rows;
    always begin
        reset middle_rows;
        for ... middle_layers[layer - 1] ...
    end
end
so for layer == 1 the reference to the previous block, middle_layers[0], fails because that block was never generated.
I guess you would need something like the following:
for(layer=0; ...) begin: middle_layers
    reg [(LARGER_WIDTH+layer-1):0] middle_rows;
    if (layer > 0) begin
        always begin
            reset middle_rows
            for ... middle_layers[layer - 1] ...
        end
    end
    else begin
        always begin
            reset middle_rows
            // no for
        end
    end
end
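To make the idea of a generate loop that starts at layer 0 and refers back to the previous layer concrete, here is a minimal sketch. It is not the asker's multiplier: the module adder_tree_sketch, its parameters W and N, and the port in_flat are hypothetical, and it simply sums N unsigned values with a pipelined binary tree (no partial-product shifting, no Baugh-Wooley correction). It assumes N is a power of two.

```verilog
// Hypothetical sketch: pipelined binary adder tree built with a generate loop.
module adder_tree_sketch #(
    parameter W = 8,   // width of each input value (assumed)
    parameter N = 8    // number of inputs, assumed to be a power of two
) (
    input  wire                   clk,
    input  wire [N*W-1:0]         in_flat,  // N inputs packed into one vector
    output wire [W+$clog2(N)-1:0] sum
);
    localparam LAYERS = $clog2(N);

    genvar layer, k;
    generate
        // Layer 0 exists, so tree[layer-1] is a valid name for every layer >= 1.
        for (layer = 0; layer <= LAYERS; layer = layer + 1) begin : tree
            localparam COUNT = N >> layer;        // entries in this layer
            reg [W+layer-1:0] rows [0:COUNT-1];   // one extra bit per layer for the carry

            for (k = 0; k < COUNT; k = k + 1) begin : node
                if (layer == 0) begin : leaf
                    // First layer: register the inputs, no reference to a previous layer.
                    always @(posedge clk)
                        rows[k] <= in_flat[k*W +: W];
                end else begin : add
                    // Later layers: add two neighbouring entries of the previous layer.
                    always @(posedge clk)
                        rows[k] <= tree[layer-1].rows[2*k] + tree[layer-1].rows[2*k+1];
                end
            end
        end
    endgenerate

    // The last layer holds a single entry: the total.
    assign sum = tree[LAYERS].rows[0];
endmodule
```

The index into the previous layer is a constant at elaboration time because it is built from genvars, and the generate-if keeps layer 0 from ever naming tree[-1].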

Variable assignment in SystemVerilog generate statement

I have created a simple module that I replicate several times using the Verilog generate statement. However, it seems that the generate statement somehow affects variable assignment in the module. Here's the code:
module test();
timeunit 10ns;
timeprecision 1ns;
wire[3:0] out;
reg[3:0] values[0:4] = {5, 6, 7, 8, 9};
logic clk;
generate
genvar i;
for (i=0; i < 5; i++) begin: M1
MUT mut(
.out,
.in(values[i]),
.clk
);
end
endgenerate
initial begin
#1 clk = 0;
$monitor("%b %b %b %b %b\n", M1[0].mut.out, M1[1].mut.out, M1[2].mut.out, M1[3].mut.out, M1[4].mut.out);
#10 $stop;
end
always #1 clk++;
endmodule
module MUT(output [3:0] out, input [3:0] in, input clk);
reg[3:0] my_reg[0:7];
assign out = my_reg[7];
always @(posedge clk) begin
my_reg[7] <= in; //5
end
endmodule
The expected output of this test program would be 0101 0110 0111 1000 1001, however the output I get is xxxx xxxx xxxx xxxx. It seems that the values in the values variable in the test module are not getting assigned to the out variable in the MUT module. However, when I replace my_reg[7] <= in; with say, my_reg[7] <= 5;, the code works as expected. The code also works when I assign directly to out (after declaring it as register) i.e. out <= in;. There's no problem if I replicate the MUT modules 'manually' without using any generate statements.
You are not connecting the outputs to separate wires. So they are implicitly tied together (just as the clock is shared), which results in multiple drivers on each bit.
Just add
wire[3:0] out[0:4];
generate
genvar i;
for (i=0; i < 5; i++) begin: M1
MUT mut(
.out(out[i]), // Connect to different wires
.in(values[i]),
.clk
);
end
endgenerate
Try to initialize clk variable with 0.

How to pass array structure between two verilog modules

I am trying to pass an array structure such as reg [0:31]instructionmem[0:31] between two modules.
I coded it as follows :
Module No 1:
module module1(instructionmem);
output reg [0:31]instructionmem[0:31];
------------------
----lines of code---
---------------
endmodule
Module No 2:
module module2(instructionmem);
input [0:31]instructionmem[0:31];
--------------------------------
-----line of code---------------
-------------------------------
endmodule
Testbench:
module test_bench();
wire [0:31]instructionmem[0:31];
module1 m1(instructionmem);
module2 m2(instructionmem);
endmodule
I am getting errors for this implementation. How can such array structures be passed between modules?
This is not possible in Verilog. (See sec. 12.3.3, Syntax 12-4 of the Verilog 2005 standard document, IEEE Std. 1364-2005.)
Instead you should "flatten" the array and pass it as a simple vector, e.g.:
module module1(instructionmem);
output [32*32-1:0] instructionmem;
reg [31:0] instructionmem_array [31:0];
genvar i;
generate for (i = 0; i < 32; i = i+1) begin:instmem
assign instructionmem[32*i +: 32] = instructionmem_array[i];
end endgenerate
endmodule
module module2(instructionmem);
input [32*32-1:0] instructionmem;
reg [31:0] instructionmem_array [31:0];
integer i;
always @*
for (i = 0; i < 32; i = i+1)
instructionmem_array[i] = instructionmem[32*i +: 32];
endmodule
module test_bench(instructionmem);
output [32*32-1:0] instructionmem;
module1 m1(instructionmem);
module2 m2(instructionmem);
endmodule
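The flattening above relies on the indexed part-select flat[32*i +: 32], which selects 32 bits starting at bit 32*i. As a quick way to convince yourself that packing and unpacking are inverses, here is a small self-checking sketch. The module flatten_demo and its names (WORDS, mem, flat, back) are hypothetical, and it uses only 4 words to keep the example short.

```verilog
// Hypothetical round-trip check: array -> flat vector -> array.
module flatten_demo;
    localparam WORDS = 4;

    reg  [31:0]         mem  [0:WORDS-1];  // source array
    wire [32*WORDS-1:0] flat;              // flattened form, as passed through a port
    reg  [31:0]         back [0:WORDS-1];  // array rebuilt on the receiving side

    integer j;  // loop index used only by the unpacking block
    integer k;  // loop index used only by the test sequence

    // Pack: word i occupies bits [32*i+31 : 32*i] of the flat vector.
    genvar i;
    generate
        for (i = 0; i < WORDS; i = i + 1) begin : pack
            assign flat[32*i +: 32] = mem[i];
        end
    endgenerate

    // Unpack: the same part-select, read in a procedural loop.
    always @*
        for (j = 0; j < WORDS; j = j + 1)
            back[j] = flat[32*j +: 32];

    initial begin
        for (k = 0; k < WORDS; k = k + 1)
            mem[k] = 32'hA000_0000 + k;
        #1;  // let the continuous assignments and the always block settle
        for (k = 0; k < WORDS; k = k + 1)
            if (back[k] !== mem[k])
                $display("mismatch at word %0d: %h vs %h", k, back[k], mem[k]);
        $display("round-trip check finished");
        $finish;
    end
endmodule
```

Separate loop variables are used for the unpacking block and the test sequence so that the two processes do not interfere with each other's index.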