Error with verilog generate loop : Unable to bind wire/reg/memory - time-complexity

I am building a signed multiplier verilog code based on Row Adder Tree (binary tree) architecture and modified baugh-wooley algorithm.
However, I am facing issue with generate loop as follows when I add the partial products across subsequent layer of the binary tree.
Do you guys have any idea how to get away from those error ?
edaplayground online code
Is using generate loop the only feasible way (given large length of multiplicand and multiplier) to do the additions of partial products across layers of a binary tree ?
module multiply(clk, reset, in_valid, out_valid, in_A, in_B, out_C); // C=A*B
parameter A_WIDTH = 16;
parameter B_WIDTH = 16;
input clk, reset;
input in_valid; // to signify that in_A, in_B are valid
input signed [(A_WIDTH-1):0] in_A;
input signed [(B_WIDTH-1):0] in_B;
output reg signed [(A_WIDTH+B_WIDTH-1):0] out_C;
output reg out_valid; // to signify that out_C is valid
/*
This multiplier code architecture requires an area of O(N*M*logN) and time O(logN)
with M being the length or bitwidth of the multiplicand
see https://i.imgur.com/NaqjC6G.png or
Row Adder Tree Multipliers in http://www.andraka.com/multipli.php or
https://pdfs.semanticscholar.org/415c/d98dafb5c9cb358c94189927e1f3216b7494.pdf#page=10
regarding the mechanisms within all layers
In the case of an adder tree, the adders making up the levels closer to the input
take up real estate (remember the structure of row adder tree). As the size of
the input multiplicand bitwidth grows, it becomes more and more difficult to find a
placement that does not use long routes involving multiple switch nodes. The result
is the maximum clocking speed degrades quickly as the size of the bitwidth grows.
For signed multiplication, see also modified baugh-wooley algorithm for trick in
skipping sign extension, thus smaller final routed silicon area.
https://stackoverflow.com/questions/54268192/understanding-modified-baugh-wooley-multiplication-algorithm/
All layers are pipelined, so throughput = one result for each clock cycle
but each multiplication result still have latency = NUM_OF_INTERMEDIATE_LAYERS
*/
// The multiplication of two numbers is equivalent to adding as many copies of one
// of them, the multiplicand, as the value of the other one, the multiplier.
localparam SMALLER_WIDTH = (A_WIDTH <= B_WIDTH) ? A_WIDTH : B_WIDTH;
localparam LARGER_WIDTH = (A_WIDTH > B_WIDTH) ? A_WIDTH : B_WIDTH;
wire [(LARGER_WIDTH-1):0] MULTIPLICAND = (A_WIDTH > B_WIDTH) ? in_A : in_B ;
wire [(SMALLER_WIDTH-1):0] MULTIPLIPLIER = (A_WIDTH <= B_WIDTH) ? in_A : in_B ;
localparam NUM_OF_INTERMEDIATE_LAYERS = $clog2(SMALLER_WIDTH);
/*Stage 1: Binary multiplications to generate partial products rows*/
// first layer has "SMALLER_WIDTH" entries of data of width "LARGER_WIDTH"
// This resulted in a binary tree with faster vertical addition processes as we have
// lesser (NUM_OF_INTERMEDIATE_LAYERS) rows to add
reg [(LARGER_WIDTH-1):0] partial_products [0:(SMALLER_WIDTH-1)];
generate
genvar first_layer_index; // all partial products rows are in first layer
for(first_layer_index=0; first_layer_index<SMALLER_WIDTH; first_layer_index=first_layer_index+1) begin: first_layer
always #(posedge clk, posedge reset)
begin
if(reset) partial_products[first_layer_index] <= 0;
else begin
partial_products[first_layer_index] <= (MULTIPLICAND & MULTIPLIPLIER[first_layer_index]); // generation of partial products rows
end
end
end
endgenerate
/*Stage 2 : Intermediate partial products additions*/
// intermediate partial product rows
// Imagine a rhombus of height of "NUM_OF_INTERMEDIATE_LAYERS"
// and width of "LARGER_WIDTH" being re-arranged into binary row adder tree
// such that additions can be done in O(logN) time
generate
genvar layer;
for(layer=1; layer<NUM_OF_INTERMEDIATE_LAYERS; layer=layer+1) begin: middle_layers
// number of leafs (or children) in each layer within the binary tree
localparam NUM_OF_PP_ADDITION = (SMALLER_WIDTH >> layer);
reg [(LARGER_WIDTH+layer-1):0] middle_rows[0:(NUM_OF_PP_ADDITION-1)];
integer pp_index; // leaf index within each layer of the tree
always #(posedge clk, posedge reset)
begin
if(reset)
begin
for(pp_index=0; pp_index<NUM_OF_PP_ADDITION ; pp_index=pp_index+1)
middle_rows[pp_index] <= 0;
end
else begin
for(pp_index=0; pp_index<NUM_OF_PP_ADDITION ; pp_index=pp_index+1)
middle_rows[pp_index] <=
middle_layers[layer-1].middle_rows[1<<pp_index] +
(middle_layers[layer-1].middle_rows[(1<<pp_index) + 1]) << 1;
end
end
end
endgenerate
/*Stage 3 : Adding the final two partial products*/
wire sign_bit = in_A[A_WIDTH-1] ^ in_B[B_WIDTH-1];
always #(posedge clk, posedge reset)
begin
if(reset)
begin
out_C <= 0;
out_valid <= 0;
end
else out_C <= 0;// {sign_bit, };
end
endmodule
iverilog '-Wall' '-g2012' design.sv testbench.sv && unbuffer vvp a.out
design.sv:107: error: Unable to bind wire/reg/memory 'middle_layers[(layer)-('sd1)].middle_rows[('sd1)<<(pp_index)]' in 'test.mul.middle_layers[1]'
design.sv:108: error: Unable to bind wire/reg/memory 'middle_layers[(layer)-('sd1)].middle_rows[(('sd1)<<(pp_index))+('sd1)]' in 'test.mul.middle_layers[1]'
2 error(s) during elaboration.

your mistake is that there is no block named multiple_layers[0] in your code.
you start with
for(layer=1; ...) begin: multile_layers
reg [(LARGER_WIDTH+layer-1):0] middle_rows;
always begin
reset middle rows;
for ... multiple_layers [layer - 1] ...
end
end
so, the last reference to the previous block failed.
I guess you would need something like the following
for(layer=0; ...) begin: multile_layers
reg [(LARGER_WIDTH+layer-1):0] middle_rows;
if (layer > 1) begin
always begin
reset middle rows
for ... multiple_layers [layer - 1] ...
end
end
else begin
always begin
reset middle_rows
// no for
end
end
end

Related

Continuous assignment with 0 delay not getting the expected value after a signal positive edge

I have implemented an 8 bit serial-in parallel-out register in SystemVerilog and I'm trying to test it. I'm using Icarus Verilog as simulator.
In the test bench, I send 8 bits and wait for the rising edge of a signal, then check the obtained parallel buffer. The problem is that, after waiting for the rising edge, the parallel buffer has not the expected value. However, if I add a #0 delay to the assert it does work.
The signal on which I'm waiting the rising edge, and the buffer that should contain the expected value are assigned as:
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
I know buff contains the right value, then, how is that on the rising edge of rdy, rdy is effectively 1 but out_data is still 0?
Wave dump
Note: See how when rdy goes high out_data is 0xaa.
Code
Serial-in parallel-out register
module sipo_reg(
input wire in_data,
output wire [7:0] out_data,
output wire rdy,
input wire en,
input wire rst,
input wire clk
);
reg [7:0] buff;
reg [2:0] i;
reg was_enabled;
wire _clk;
always #(posedge _clk, posedge rst) begin
if (rst) begin
buff <= 0;
i <= 7;
was_enabled <= 0;
end else begin
was_enabled <= 1;
buff[i] <= in_data;
i <= i == 0 ? 7 : (i - 1);
end
end
assign _clk = en | clk; // I know this is a very bad practice, I'm on it...
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
endmodule
Test bench:
module utils_sipo_reg_tb;
reg clk = 1'b1;
wire _clk;
always #2 clk = ~clk;
assign _clk = clk | en;
reg in_data = 1'b0, rst = 1'b0, en = 1'b0;
wire [7:0] out_data;
wire rdy;
sipo_reg dut(in_data, out_data, rdy, en, rst, clk);
integer i = 0;
initial begin
$dumpfile(`VCD);
$dumpvars(1, utils_sipo_reg_tb);
en = 1;
#4 rst = 1;
#4 rst = 0;
assert(out_data === 8'b0);
assert(rdy === 1'b0);
//
// read 8 bits works
//
#4 en = 0;
for (i = 0; i < 8; i = i + 1) begin
#(negedge _clk) in_data = ~in_data;
end
en = 1;
#(posedge rdy);
assert(rdy === 1'b1);
assert(out_data === 8'haa); // <-- This fails, but works if I add a '#0' delay.
#20;
$finish;
end
endmodule
I have tried to replace these lines
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{rdy}} & buff;
by these
assign rdy = (i == 7) & was_enabled;
assign out_data = {8{((i == 7) & was_enabled)}} & buff;
because I suspected the simulator was 'calculating' out_data after rdy because the former depends on the latter. However, they are still continuous assignments with 0 delay, I would expect them to get their value at the exact same time (unless a delay is added).
Would it be a good design practice to add a few picoseconds of delay after each #(posedge signal) to make sure everything is settled by the simulator?
You have a race condition in your testbench because you are trying to sample a signal at a time where it is changing. All digital systems have inherent race conditions, and the way to deal with them is to only sample your signals when you know they are stable.
In your case, you could use a small numeric delay as you have suggested. However, since you have a clock signal, if you know that changes to signals only occur on the posedge of the clock, you could sample signals at the negedge:
#(posedge rdy);
#(negedge clk);
assert(rdy === 1'b1);
assert(out_data === 8'haa);
This is a more robust approach than using a numeric delay since it scales better (no need to worry about picking the best numeric delay value).
This is a synchronous design and your assertion should synchronous as well. That means only using one edge, the (positive) clock edge. Once you start using other signal edges, you run into race conditions between statements waiting for the signal to change, which includes both the #(posedge rdy) procedural delay and the assign out_data = {8{rdy}} & buff; continuous assignment.
There are two approaches to fixing this in your testbench:
Do not use #(posedge rdy) in your prodedural code. Use
#(posedge clk iff (rdy === 1'b1));
assert(out_data === 8'haa);
Since i and was_enabled are both updated with nonblocking assignments, rdy gets sampled with its old value, as well as out_data in the assertion that follows.
Another option is using a concurrent assertion which is outside of any procedural code
assert property (#(posedge clk) $rose(rdy) |-> out_data === 8'haa);
This reads "When rdy has risen, this implies on the same cycle that out_data must be 8'haa"

Error (10170): expecting "<=", or "=", or "+=", or "-=", or "*=", or "/=", or "%=", or "&=", or "|=", or "^=", etc

module accumulator (
input [7:0] A ,
input reset,
input clk,
output reg carryout,
output reg overflow,
output reg [8:0] S,
output reg HEX0,
output reg HEX1,
output reg HEX2,
output reg HEX3
);
reg signA;
reg signS;
reg [7:0] magA;
reg [7:0] magS;
reg Alarger;
initial begin
S = 9'b000000000;
end
always_ff # (posedge clk, posedge reset) begin
if (reset) begin
S = 9'b000000000;
end
else begin
begin
signA <= A[7]; //Is A negative or positive
signS <= S[7];
S <= A + S;
end
if (signA == 1) begin //A is negative so magnitude is of 2s compliment
magA <= (~A[7:0] + 1'b1);
end
else begin
magA <= A;
end
if (signS == 1) begin //sum is negative so magnitude is of 2s compliment
magS <= (~S[7:0] + 1'b1);
end
else begin
magS <= S;
end
if (magA > magS) begin
Alarger <= 1'b1; //Magnitude of A is larger than magnitude of sum
end
else begin
Alarger <= 1'b0;
end
if ((signA == 1) & (Alarger == 1) & (S[7] == 0)) begin
overflow <= 1'b1;
end
else begin
overflow <= 1'b0;
end
if ((signS == 1) & (Alarger == 0) & (S[7] == 0)) begin
overflow <= 1'b1;
end
else begin
overflow <= 1'b0;
end
if ((signS == 1) & (signA == 1) & (S[7] == 0)) begin
overflow <= 1'b1;
end
else begin
overflow <= 1'b0;
end
if ((signS == 0) & (signA == 0) & (S[7] == 1)) begin
overflow <= 1'b1;
end
else begin
overflow <= 1'b0;
end
if (S[8] == 1) begin //carryout occurred
carryout <= 1'b1;
overflow <= 1'b0;
S <= 9'b000000000; //sum no longer valid
end
else begin
carryout <= 1'b0;
end
display_hex h1 //display of A
(
.bin (magA),
.hexl (HEX2),
.hexh (HEX3)
);
display_hex h2 //display of sum
(
.bin (S[7:0]),
.hexl (HEX0),
.hexh (HEX1)
);
end
end
endmodule
I am trying to make an accumulator that adds A (8 digit binary value that can be signed or unsigned) repeatedly to the sum. Once the sum is computed, then sum and A should display the value on 4 hex display LEDs (2 LEDs for A and 2 LEDs for sum). However, I am having a hard time getting it to compile. I have searched the error code and it seems like a general error for a syntax error and can have several meanings.
The error is the result of these two lines:
display_hex h1 //display of A
(
.bin (magA),
.hexl (HEX2),
.hexh (HEX3)
);
display_hex h2 //display of sum
(
.bin (S[7:0]),
.hexl (HEX0),
.hexh (HEX1)
);
Here, it appears you have a module named display_hex which converts an 8-bit value into the needed digits for a seven segment display. You are trying to use the module as if it were a function and modules are very much NOT functions. Modules in Verilog (or SystemVerilog as you are using, but the difference is really token at this point) can be though of as a group of hardware that takes in some inputs and spits out some outputs; and its important to note that they are static things. They either exist in the design or they do not; just like using ICs on a breadboard. The top module is the breadboard and the modules you declare under that module are components you are plugging into the board. The inputs and outputs are the various connections (pins) you must wire up to make everything work.
That said, always blocks (like the always_ff you are using) form a way of describing the logic and registers inside modules. Thus, you do thinks like assign logic/reg variables inside them to describe how the hardware behaves. If you look at your logic, you'll notice that the module declarations are dependent on reset; ie if reset is asserted, these modules wont exist, which doesnt make any sense. Electrical signals don't make entire ICs in a circuit disappear! Thus, you need to pull your module declaration out of your logical description of your acculumator, like so:
module accumulator (
...
);
...
display_hex h1 //display of A
(
.bin (magA),
.hexl (HEX2),
.hexh (HEX3)
);
display_hex h2 //display of sum
(
.bin (S[7:0]),
.hexl (HEX0),
.hexh (HEX1)
);
...
always_ff #(posedge clk, posedge reset) begin
// Your accumulator logic here
...
end
endmodule
Notice that the module declarations for the display_hex modules are stand alone, as I am declaring these modules exist, not dependence on anything!
However, there are a number of issues with your design besides that:
As you are using SystemVerilog constructs (always_ff), you should declare all of your variables type logic, not reg or left blank (ie, input clk should be input logic clk, reg signA should be logic signA). The logic type just makes everything easier, so use it :)
In your always_ff block, you do reset correctly except that the assignment should really be NBA (use S <= 9'b0;, not S = 9'b0; in the if (reset))
You use NBA inside your always_ff, which is correct, however, it appears you need to use these values right away in the following logic. This will not work as you expect, or at least it will not act within the same clock cycle. To fix this, youll need to decide what should be a register and what should just be values resulting from intermediate logic, then create a separate always_comb for the intermediate values.
I am making the assumption that the HEX variables are meant for seven segment displays, so they should probably declared at least [6:0] HEXn
I was not able to reproduce the exact error, but moving the instantiations of display_hex outside always_ff resolves the main issue:
module accumulator
(
/* ... */
);
// ...
always_ff # (posedge clk, posedge reset) begin
/* ... */
end
display_hex h1 (
/* ... */
);
display_hex h2 (
/* ... */
);
endmodule
Another thing: The code drives variable S from initial as well as always. This creates multiple drivers and the code will not compile. To fix this, remove the initial completely, you don't need it since S will be set to 0 when reset is asserted.
OR
You can move all the logic into the initial block; it'd look something like this (but this, most probably, won't synthesize):
initial begin
S = 0;
forever begin
wait #(posedge clock);
// Do stuff here ..
end
end

I wrote a verilog code but got errors like not a constant or unknown type

How can I remove the errors mentioned in the title?
reg [3:0]count;
reg [6:0]seg;
always # (posedge clock) begin
if (reset)
count = 0;
else
count = count+1'b0; //starting A counter
end
begin // this seems to need an "always #*" just before "begin"
case(count)
4'b0000: seg = 7'b0000001;
endcase
end
assign {a,b,c,d,e,f,g} = seg;
endmodule // where is "module"?
It shows error 44 i.e. count is not a constant
Error 1059 i.e. seg is of unknown type. Please help.
can't figure out what to do next
This seems to be a 7-segment counter that displays an hexadecimal value based on the value of a running 4-bit counter.
First, you need to declare inputs and outputs of your module, which can be done (in Verilog 2001) as this:
module counter7seg (
input wire clock,
input wire reset,
output wire a,
output wire b,
output wire c,
output wire d,
output wire e,
output wire f,
output wire g
);
This part of your description looks almost fine: a 4 bit counter. But you need (as Morgan pointed) to fix two things: assignments within a clocked always must be nonblocking ( <= instead of = ), and your counter, to be able to count, must add 1, not 0:
always # (posedge clock) begin
if (reset)
count <= 0;
else
count <= count+1'b1; //starting A counter
end
The following part, when you take the value of count and derive values for the segments, is incomplete: you need 16 cases here.
I can see that you use negative logic to switch on segments, so effectively, 0000 becomes 0000001, 0001 becomes (just guessing by the contents of picture below) 1001111,.... 1000 becomes 0000000, and so on.
always #* begin
case(count) // 16 cases here + default
4'b0000: seg = 7'b0000001; // 0
4'b0001: seg = 7'b1001111; // 1
4'b0010: seg = 7'b0110110; // 2
//......
//......
4'b1000: seg = 7'b0000000; // 8
//......
//......
4'b1111: seg = 7'b0110000; // a nice capital F
default: seg = 7'b1111111; // a default case with all segments off
endcase
end

Verilog I/O reading a character

I seem to have some issues anytime I try anything with I/O for verilog. Modelsim either throws function not supported for certain functions or does nothing at all. I simply need to read a file character by character and send each bit through the port. Can anyone assist
module readFile(clk,reset,dEnable,dataOut,done);
parameter size = 4;
//to Comply with S-block rules which is a 4x4 array will multiply by
// size so row is the number of size bits wide
parameter bits = 8*size;
input clk,reset,dEnable;
output dataOut,done;
wire [1:0] dEnable;
reg dataOut,done;
reg [7:0] addr;
integer file;
reg [31:0] c;
reg eof;
always#(posedge clk)
begin
if(file == 0 && dEnable == 2'b10)begin
file = $fopen("test.kyle");
end
end
always#(posedge clk) begin
if(addr>=32 || done==1'b1)begin
c <= $fgetc(file);
// c <= $getc();
eof <= $feof(file);
addr <= 0;
end
end
always#(posedge clk)
begin
if(dEnable == 2'b10)begin
if($feof(file))
done <= 1'b1;
else
addr <= addr+1;
end
end
//done this way because blocking statements should not really be used
always#(addr)
begin:Access_Data
if(reset == 1'b0) begin
dataOut <= 1'bx;
file <= 0;
end
else if(addr<32)
dataOut <= c[31-addr];
end
endmodule
I would suggest reading the entire file at one time into an array, and then iterate over the array to output the values.
Here is a snippet of how to read bytes from a file into a SystemVerilog queue. If you need to stick to plain old Verilog you can do the same thing with a regular array.
reg [8:0] c;
byte q[$];
int i;
// Read file a char at a time
file = $fopen("filename", "r");
c = $fgetc(file);
while (c != 'h1ff) begin
q.push_back(c);
$display("Got char [%0d] 0x%0h", i++, c);
c = $fgetc(file);
end
Note that c is defined as a 9-bit reg. The reason for is that $fgetc will return -1 when it reaches the end of the file. In order to differentiate between EOF and a valid 0xFF you need this extra bit.
I'm not familiar with $feof and don't see it in the Verilog 2001 spec, so that may be something specific to Modelsim. Or it could be the source of the "function not supported."

Why is XST optimizing away my registers and how do I stop it?

I have a simple verilog program that increments a 32 bit counter, converts the number to an ASCII string using $sformat and then pushes the string to the host machine 1 byte at a time using an FTDI FT245RL.
Unfortunately Xilinx XST keeps optimizing away the string register vector. I've tried mucking around with various initialization and access routines with no success. I can't seem to turn off optimization, and all of the examples I find online differ very little from my initialization routines. What am I doing wrong?
module counter(CK12, TXE_, WR, RD_, LED, USBD);
input CK12;
input TXE_;
output WR;
output RD_;
output [7:0] LED;
inout [7:0] USBD;
reg [31:0] count = 0;
reg [7:0] k;
reg wrf = 0;
reg rd = 1;
reg [7:0] lbyte = 8'b00000000;
reg td = 1;
parameter MEM_SIZE = 88;
parameter STR_SIZE = 11;
reg [MEM_SIZE - 1:0] str;
reg [7:0] strpos = 8'b00000000;
initial
begin
for (k = 0; k < MEM_SIZE; k = k + 1)
begin
str[k] = 0;
end
end
always #(posedge CK12)
begin
if (TXE_ == 0 && wrf == 1)
begin
count = count + 1;
wrf = 0;
end
else if (wrf == 0) // If we've already lowered the strobe, latch the data
begin
if(td)
begin
$sformat(str, "%0000000000d\n", count);
strpos = 0;
td = 0;
end
str = str << 8;
wrf = 1;
strpos = strpos + 1;
if(strpos == STR_SIZE)
td = 1;
end
end
assign RD_ = rd;
assign WR = wrf;
assign USBD = str[87:80];
assign LED = count[31:24];
endmodule
Loading device for application
Rf_Device from file '3s100e.nph' in
environment /opt/Xilinx/10.1/ISE.
WARNING:Xst:1293 - FF/Latch str_0
has a constant value of 0 in block
. This FF/Latch will be
trimmed during the optimization
process.
WARNING:Xst:1896 - Due to other
FF/Latch trimming, FF/Latch str_1
has a constant value of 0 in block
. This FF/Latch will be
trimmed during the optimization
process.
WARNING:Xst:1896 - Due to other
FF/Latch trimming, FF/Latch str_2
has a constant value of 0 in block
. This FF/Latch will be
trimmed during the optimization
process.
The $sformat task is unlikely to be synthesisable - consider what hardware the compiler would need to produce to implement this function! This means your 'str' register never gets updated, so the compiler thinks it can optimize it away. Consider a BCD counter, and maybe a lookup table to convert the BCD codes to ASCII codes.
AFAIK 'initial' blocks are not synthesisable. To initialize flops, use a reset signal. Memories need a 'for' loop like you have, but which triggers only after reset.