XOR outputs one because of propagation delay in (basic) counter (FPGA) - hardware

I have a basic counter example, counter is 6-bits wide.
reg[5:0] currcounterval_reg;
always @(posedge clk_g0)
currcounterval_reg <= currcounterval_reg+ 1'b1;
My constraints:
The clock runs at 83 MHz (12 ns period) on a Virtex-7 chip. The counter has no reset and its output is connected to board pins. When I run the circuit, I see switching in the signals as shown in the attached (hardware run) output. In the attached screenshot, if you look at the counter, after 7 (seven) I should get '8', but it first switches to '12' because bit 2 goes to zero later than the other bits. I have an XOR gate downstream where I compare the outputs of two counters. How do I avoid getting into this problem?
Whatever I do with constraints, it doesn't go away. Please help me with some strategies to remove the switching.
Please feel free to ask me if you have more questions.
You can find my waveform here:
http://i.imgur.com/btEMiFD.png?1
Thanks.

I am guessing that it is an improper constraint issue, but try coding the synchronous counter explicitly to solve the issue.
reg[5:0] counter;
always @(posedge clk_g0)
begin
counter[0] <= ~counter[0] ;
counter[1] <= (counter[0] ) ? ~counter[1] : counter[1];
counter[2] <= (&counter[1:0] ) ? ~counter[2] : counter[2];
counter[3] <= (&counter[2:0] ) ? ~counter[3] : counter[3];
counter[4] <= (&counter[3:0] ) ? ~counter[4] : counter[4];
counter[5] <= (&counter[4:0] ) ? ~counter[5] : counter[5];
end

You need to make the XOR output synchronous. The XOR output should go to the D input of a FF that is clocked on the same rising edge that increments the counters. Bring the Q output of the FF to an FPGA pin rather than the XOR output.
You shouldn't try to match the combinational logic delays and routing delays of the two counters. The delay paths from two counters to your XOR gates will almost certainly be different so a fast XOR (like on a Virtex 7) will generate glitches. In a good synchronous design that doesn't matter, because you only care that the combinational logic outputs are valid before the next clock edge.
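A minimal sketch of that arrangement, assuming two free-running 6-bit counters and a registered compare (the signal names here are illustrative, not taken from your design):
reg [5:0] cnt_a, cnt_b;
reg       mismatch_q;                 // registered XOR/compare result
always @(posedge clk_g0) begin
    cnt_a      <= cnt_a + 1'b1;
    cnt_b      <= cnt_b + 1'b1;
    // The compare below uses the registered counter values; any decode
    // glitches settle before the next clock edge, and only the clean,
    // registered value in mismatch_q ever reaches the board pin.
    mismatch_q <= |(cnt_a ^ cnt_b);
end
assign glitch_free_pin = mismatch_q;  // drive the FF output to the pin (output port, illustrative name), not the raw XOR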

Related

I am getting an error when checking my SystemVerilog code in Quartus II

I checked this simple code with Quartus II. First it gave me an error about the 5000-iteration for-loop limit, so I tried increasing the Verilog constant loop limit in the settings, and now it is giving me this error:
Error (293007): Current module quartus_map ended unexpectedly. Verify that you have sufficient memory available to compile your design. You can view disk space and physical RAM requirements on the System and Software Requirements page of the Intel FPGA website (http://dl.altera.com/requirements/).
Is this related to a tool limitation, or am I doing something wrong in my code?
Here is my code:
module Branch_status_table #(parameter BST_length = 16383) //16383
(
output reg [2:1] status,
output reg [32:1] PC_predict_o,
input wire [2:1] status_update,
input wire [32:1] PC_in, PC_update,
input wire [32:1] PC_predict_update,
input wire clk,en_1,RST
);
wire [14:1] PC_index, PC_index_update;
//Internal memory
reg [2:1] status_bits [BST_length:0];
reg [32:1] PC_predict [BST_length:0];
reg [16:1] PC [BST_length:0];
//Combinational
assign PC_index = PC_in [16:3];
assign PC_index_update = PC_update [16:3];
//
initial begin
for ( int i=0; i <= BST_length; i=i+1) begin
status_bits[i] <= 0;
PC_predict[i] <= 0;
PC[i]<=0;
end
end
//Prediction
always_ff @(posedge clk) begin
if ( (PC[PC_index]==PC_in[32:17]) && (status_bits[PC_index]!=0) ) begin
status <= status_bits [PC_index];
PC_predict_o <= PC_predict [PC_index];
end
else begin
status <= 0;
PC_predict_o <= 0;
end
end
//Update
always_ff @(posedge clk) begin
if (en_1==1) begin
status_bits[PC_index_update] <= status_update;
PC [PC_index_update] <= PC_update[32:17] ;
PC_predict[PC_index_update] <= PC_predict_update;
end
else begin
status_bits[PC_index_update] <= status_bits[PC_index_update] ;
PC [PC_index_update] <= PC [PC_index_update] ;
PC_predict[PC_index_update] <= PC_predict[PC_index_update] ;
end
end
endmodule
There is a coding issue and maybe a resource utilization issue.
The coding issue:
The code infers block RAM and is attempting to initialize/reset it.
In general you can't reset block RAMs using a single initial block.
That style of array initialization can be done in a testbench, not in RTL.
Synthesis tools have physical limits.
To reset/initialize a block RAM you must write 0 to each address. Use the same type of synchronous process (always @(posedge clk)), because the memory is a synchronous device. Put a mux in front of the write port and use a state machine at start-up to write 0's to every address; when that state machine finishes, move to a state where the BRAM behaves the way you normally want it to (see the sketch below).
This page discusses the issue.
https://www.edaboard.com/threads/how-to-clear-reset-my-bram-in-vhdl-xilinx.247572/
This is a Xilinx related answer; other vendors work the same way.
Summarizing the coding issue: You probably don't need to initialize and you can't do it this way:
initial begin
for ( int i=0; i <= BST_length; i=i+1) begin
status_bits[i] <= 0;
PC_predict[i] <= 0;
PC[i]<=0;
end
end
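If you do need the contents cleared in hardware, here is a minimal sketch of the write-port mux and start-up state machine described above. Only the status_bits memory is shown, init_done/init_addr are illustrative names, the initial values assume FPGA power-up/bitstream initialization of registers, and this block would replace the existing update process for that array (a memory should be written from a single process):
logic        init_done = 1'b0;   // 0 = clearing the RAM, 1 = normal operation
logic [13:0] init_addr = '0;     // steps through all 16384 addresses once
always_ff @(posedge clk) begin
    if (!init_done) begin
        status_bits[init_addr] <= '0;        // write 0 to the current address
        if (init_addr == 14'h3FFF)
            init_done <= 1'b1;               // last address cleared, switch over
        init_addr <= init_addr + 1'b1;
    end
    else if (en_1) begin
        status_bits[PC_index_update] <= status_update;   // the normal write port
    end
end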
Potential Utilization Issue:
FPGAs have finite resources.
The tool seems to be indicating that the code infers more block RAM than the part has. To verify this, change parameter BST_length from 16K to something small like 8 and see whether the utilization issue ("insufficient memory") goes away. If it goes away, then re-design using less memory.
This can also be analyzed by hand, knowing how much BRAM the part has and how much you are inferring (see the rough estimate below); don't infer more than the part has.
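As a rough estimate from the posted code (BST_length = 16383, so 16,384 entries per array):
status_bits : 16,384 x  2 bits =  32,768 bits
PC_predict  : 16,384 x 32 bits = 524,288 bits
PC          : 16,384 x 16 bits = 262,144 bits
total                          = 819,200 bits, i.e. roughly 800 Kbit of inferred RAM
Compare that total against the embedded-memory capacity of the device you are targeting.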
The tool gives different answers for small code changes because there are two issues tied to the same BRAM inference; when the code changes a little, the tool trips over the other issue first. This is how tools sometimes work: a first error can mask a second one by aborting the run before the second is ever reached.

Simple oscillator in VHDL

I've just started with VHDL, and I have a problem understanding how exactly a process operates. Here is an example of a simple oscillator:
timer_1Hz: process(clk) is
variable timer : integer := 0;
constant period : integer := 50E6;
begin
--if rising_edge(clk) then
timer := (timer+1) rem period;
if (timer=0) then
led <= not led;
end if;
--end if;
end process timer_1Hz;
clk is an input (clock) signal with 50 MHz frequency, and 50% duty cycle.
Now, as I understand it, the process timer_1Hz will be triggered on any change of the clk signal, whether that is a transition from 0 to 1 or from 1 to 0.
I expected the above example to flash the LEDs with a frequency of 0.5 Hz, since the rising_edge test was commented out. In other words, I expected the body to be triggered twice per clock period, on the rising and the falling edge. However, that doesn't seem to work, i.e., the LEDs never turn on.
If I include the rising_edge test, the LEDs blink with a 1 Hz frequency, just as I expected.
Can someone please explain what I am missing in the unexpected case?
The code will not work without the rising-edge test you removed. It might work in simulation, but not on a real FPGA board. Why? Because the sensitivity list is (mostly) important for simulation and not (so much) for synthesis.
For synthesis you always have to think about the hardware you would be implementing. In a practical sense, a process is NOT "run" when certain events occur; without an edge test there is no clock edge for the synthesizer to map the register updates onto.
If you really want a 0.5 Hz output, just use 25E6 instead of 50E6 with the original (rising_edge) code.

Output skew when using clocking blocks

I am using a clocking block in my interface for signal aliasing. I want to concatenate some of the bits together to form a bus, and then drive this bus from my driver. So, for example:
interface bus_intf (clk);
input logic clk;
logic[1:0] x_lsb;
logic[1:0] x_msb;
clocking driver_bus @(posedge clk);
default input #1step output #0;
output x_bus = {x_msb, x_lsb};
endclocking
endinterface
Now the problem with this is, in one of my assertions I need to read bus_intf.driver_bus.x_bus. As stated in the SV manual, an output variable from a clocking block should not be read by the testbench, and if it is, the simulator spits out an error (or a warning, in my case).
So I modified the interface:
interface bus_intf (clk);
input logic clk;
logic[1:0] x_lsb;
logic[1:0] x_msb;
clocking driver_bus @(posedge clk);
default input #1step output #0;
inout x_bus = {x_msb, x_lsb};
endclocking
endinterface
The problem now is, in my waveform I see two signals being created - x_bus and x_bus__o. I understand why Questasim did this - it is to separate the inout declaration so I can view both versions.
However, the problem now is that all my clocking drives are delayed by one clock cycle! So x_bus__o, which is connected to the DUT, is one clock cycle later than x_bus. This is in spite of me explicitly stating that the output skew is #0.
Any idea why this happens? Am I doing something wrong or have I misunderstood?
I've put your code on EDAPlayground and tried it out. It seems to be working as expected. Here's my test harness:
module top;
bit clk;
always #1 clk = ~clk;
bus_intf busif(clk);
initial begin
@busif.driver_bus;
$display("time = ", $time);
busif.driver_bus.x_bus <= 'hf;
repeat (2)
@(negedge clk);
$display("time = ", $time);
busif.driver_bus.x_bus <= 'ha;
#100;
$finish();
end
always @(busif.x_lsb)
$display("time = ", $time, " x_lsb = ", busif.x_lsb);
always @(busif.x_msb)
$display("time = ", $time, " x_msb = ", busif.x_msb);
endmodule
The link is here if you want to try it online: http://www.edaplayground.com/x/Utf
If I drive x_bus at a posedge, then the value will be written immediately, as would be expected due to the #0 output delay. If I drive x_bus at a negedge (or at any other time aside from a posedge), then it will wait until the next posedge to drive the value. I see this behavior regardless of whether x_bus is declared as output or inout.
Check to see when you are scheduling your writes; this might be the reason you see some delays on your waves.
When you have bidirectional flow through a clocking block, a signal going from the verification side to the hardware and back has to go through two virtual D-FFs, so the original observation is correct: the 1step input is one D-FF toward the design, and the return is one more D-FF back, appearing at 0 ns (i.e., just after the clock). Clocking blocks are not useful for a signal that requires a single-cycle turn-around, so avoid them if that is a requirement. For most designs it is simply not necessary. Monitors will observe the signals with a pipeline delay of one cycle, which is generally not a problem.

Combinational Logic Timing

I am currently trying to implement a data path that calculates the following in one clock cycle:
Take inputs A and B and add them.
Shift the result of the addition one bit to the right (dividing by 2).
Subtract the shifted result from another input C.
The behavioral architecture of the entity is simply shown below.
signal sum_out : std_logic_vector (7 downto 0);
signal shift_out : std_logic_vector (7 downto 0);
process (clock, data_in_a, data_in_b, data_in_c)
begin
if clock'event and clock = '1' then
sum_out <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
shift_out <= '0' & sum_out(7 downto 1);
data_out <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
end if;
end process;
When I simulate the above code, I do get the result I expect. However, I get it after 3 clock cycles instead of 1 as I wish. The simulation waveform is shown below.
I am not yet familiar with implementing designs with timing concerns. I was wondering whether there are ways to achieve the above calculation in one clock cycle. If there are, how can I implement them?
To do this with signals, simply register only the last element in the chain (data_out). This analyzes (compiles); I didn't write a testbench to verify it in simulation.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity signal_single_clock is
port (
signal clock: in std_logic;
signal data_in_a: in std_logic_vector(7 downto 0);
signal data_in_b: in std_logic_vector(7 downto 0);
signal data_in_c: in std_logic_vector(7 downto 0);
signal data_out: out std_logic_vector(7 downto 0)
);
end entity;
architecture behave of signal_single_clock is
signal sum_out : std_logic_vector (7 downto 0);
signal shift_out : std_logic_vector (7 downto 0);
begin
sum_out <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
shift_out <= '0' & sum_out(7 downto 1);
single_reg:
process (clock)
begin
if clock'event and clock = '1' then
data_out <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
end if;
end process;
end architecture;
When you assign a new value to a signal inside a process, this new value will be available only after the process finishes execution. Therefore, anytime you read the signal's value you will be using the original value from when the process started executing.
On the other hand, assignments to variables take place immediately, and the new value can be used in the subsequent statements if you wish.
So, to solve your problem, simply implement sum_out, shift_out, and data_out using variables instead of signals. Then simply copy the value of data_out to an output port of your entity.
Without using variables:
sum <= in_a + in_b;
process (clock)
begin
if rising_edge(clock) then
data_out <= in_c - ('0' & sum(7 downto 1));
end if;
end process;
All declarations except clock are unsigned(7 downto 0); why make it more complicated than that?
The original, pipelined to 3 cycles, will probably work at higher clock rates.
EDIT following comment:
I wanted to demonstrate that VHDL doesn't really have to be that verbose.
However there seem to be a lot of people "teaching" VHDL who are focussing on trivial elements and missing the big picture entirely, so I'll say a little bit about that.
VHDL is a strongly typed language, to prevent mistakes that creep in when types are mistaken for each other and (e.g.) you add two large numbers and get a negative result.
It does NOT follow from that, that you need type conversions all over the place.
Indeed, if you need a lot of type conversions, it's a sign that your design is probably wrong, and it's time to rethink that instead of ploughing ahead down the wrong path.
Code - in ANY language - should be as clean and simple as possible.
Otherwise it's hard to read, and there are probably bugs in it.
The big difference between a C-like language and VHDL is this:
In C, using the correct data types you can write sum = in_a + in_b;
and it will work. Using the wrong data types you can also write sum = in_a + in_b;
and it will compile just fine; what it actually does is another matter! The bugs are hidden: it is up to you to determine the correct types, and if you get it wrong there is very little you can do except keep on testing.
in VHDL, using the right types you can write sum <= in_a + in_b;
and using the wrong types, the compiler forces you to write something like sum <= std_logic_vector(unsigned(in_a) + unsigned(in_b)); which is damn ugly, but will (probably: see note 1) still work correctly.
So to answer the question : how do I decide to use unsigned or std_logic_vector?
I see that I need three inputs and an output. I could just make them std_logic_vector, but I stop and ask: what do they represent?
Numbers.
Can they be negative? Not according to my reading of the specification (your question).
So, unsigned numbers... (Note 1)
Do I need non-arithmetic operations on them? Yes, there's a shift. (Note 2)
So, numeric_std.unsigned, which is related to std_logic_vector, instead of natural, which is just an integer.
Now you can't avoid type conversions altogether. Coding standards may impose restrictions such as "all top level ports must be std_logic_vector" and you must implement the external entity specification you are asked to; intermediate signals for type conversions are sometimes cleaner than the alternatives, e.g. in_a <= unsigned(data_in_a);
Or if you are getting instructions, characters and the numbers above from the same memory, for example, you might decide the memory contents must be std_logic_vector because it doesn't just contain numbers. But pick the correct place to convert type and you will find maybe 90% of the type conversions disappear. Take that as a design guideline.
(Note 1: but what happens if C < (A+B)/2? Should data_out be signed? Even thinking along these lines has surfaced a likely bug that std_logic_vector left hidden...
The right answer depends on unknowns including the purpose of data_out: if it is really supposed to be unsigned, e.g. a memory address, you may want to flag an error instead of making it signed.)
(Note 2: there isn't a synthesis tool left alive that won't translate
signal a : natural; ... x <= a/2 into a shift right, so natural would also work, unless there were other reasons to choose unsigned. A lot of people seem to still be taught that integers aren't synthesisable, and that's just wrong.)

signal vs variable

VHDL provides two major object types to hold data, namely signal and variable, but I can't find anything clear on when to use one over the other. Can anyone shed some light on their strengths, limitations, scope, synthesis, and the situations in which using one would be better than the other?
Signals can be used to communicate values between processes; variables cannot. There are shared variables, which can in older compilers, but you really are asking for problems (race conditions) if you do that, unless you use protected types, which are a bit like classes. Then they are safe to use for communication, but not (as far as I know) synthesisable.
This fundamental restriction on communication comes from the way updates on signals and variables work.
The big distinction comes because variables update immediately when they are assigned to (with the := operator). Signals have an update scheduled when assigned to (with the <= operator), but the value that anyone sees when they read the signal will not change until some time passes.
(Aside: that amount of time could be as small as a delta cycle, which is the smallest amount of time in a VHDL simulator; no "real" time passes. Something like wait for 0 ps; causes the simulator to wait for the next delta cycle before continuing.)
If you need the same logic to feed into multiple flipflops a variable is a good way of factoring that logic into a single point, rather than copying/pasting code.
In terms of logic: within a clocked process, signals always infer a flipflop. Variables can be used both for combinatorial logic and to infer a flipflop, sometimes both for the same variable. Some find this confusing; personally, I think it's fine:
process (clk)
variable something : std_logic;
begin
if rising_edge(clk) then
if reset = '1' then
something := '0';
else
output_b <= something or input_c; -- using the previous clock's value of 'something' infers a register
something := input_a and input_b; -- comb. logic for a new value
output_a <= something or input_c; -- which is used immediately, not registered here
end if;
end if;
end process;
One thing to watch when using variables is that, because no register output is used when they are read after they are written, you can get long chains of logic, which can lead to missing your fmax target.
One thing to watch when using signals (in clocked processes) is that they always infer a register, and hence add latency.
As others have said signals get updated with their new value at the end of the time slice, but variables are updated immediately.
-- inside some process
-- varA = sigA = 0, sigB = 2
varA := sigB + 1; -- varA is now 3
sigC <= varA + 1; -- sigC will be 4
sigA <= sigB + 1; -- sigA will be 3
sigD <= sigA + 1; -- sigD will be 1 (the original sigA + 1)
For hardware design, I use variables very infrequently. It's normally when I'm hacking in some feature that really needs the code to be refactored, but I'm on a deadline. I avoid them because I find the mental models of working with signals and variables too different to live nicely in one piece of code. That's not to say it can't be done, but I think most RTL engineers avoid mixing them... and you can't avoid signals.
Other points:
Signals have architecture (or wider) scope. Variables are local to the process that declares them.
Both synthesize.