"unsigned" type conversion demands input in sequential process sensitivity list - alias

I have an address counter in a VHDL sequential process. Its idle value is set in a configuration register to a certain max value; afterwards, anytime it enters a certain state it should increment by one.
To get the maximum value, I declare a subset of an input std_logic_vector as an alias.
I declared address_int as an unsigned variable. I then defined a sequential process with a clk and a reset in the sensitivity list. When the reset is asserted, the address counter is set to the alias value. After reset is released, the counter is rolled over/incremented on rising edges when in a certain state.
The synthesis tool gives me this message:
*WARNING:Xst:819 line 134: The following signals are missing in the process sensitivity list: DL_CADU_SIZE*
And all the address lines have become asynchronous signals! What is going on here? Is there some strange behavior with unsigned that doesn't occur with integers? I usually use integers here, but the conversion seemed more straightforward from unsigned for purposes of code maintenance. I have tried ditching the alias and doing the straight conversion, but it didn't help.
library IEEE;
use ieee.std_logic_1164.a
use ieee.numeric_std.all;
-- entity declaration, ports, architecture, etc.
signal address_int : unsigned(8 downto 0);
alias aMaxWords : std_logic_vector(8 downto 0) is DL_CADU_SIZE(10 downto 2);
begin
WADDR <= std_logic_vector(address_int);
OUT_PROC: process (CLK_CORE, RST_N_CORE)
begin
if RST_N_CORE = '0' then
address_int <= unsigned(aMaxWords);
elsif rising_edge(CLK_CORE) then
if next_state = WRITE_WORD then
if address_int = unsigned(aMaxWords) then
address_int <= (others => '0');
else
address_int <= address_int + 1;
end if;
end if; -- WRITE_WORD
end if; -- rising_edge
end process OUT_PROC;
end RTL;

This:
if RST_N_CORE = '0' then
address_int <= unsigned(aMaxWords)
describes an async reset - therefore aMaxWords will be treated as asynchronous by the synthesiser irrespective of whether it is or not.
What the synthesiser interprets your code as is "while rst_n_core is low, copy the value of aMaxWords to address_int" so if aMaxWords changes during reset, the value must be copied across. The lack of that signal in your sensitivity list means that the synthesiser is making a circuit which behaves differently to what the language says it should, hence the warning.
It really shouldn't do this: without the signal in the sensitivity list, it ought to capture the signal on the falling edge of the reset line. But as that's not how most chips work, the synthesiser designers (in their infinite wisdom) decided many years ago to assume the designer intended to have that signal in the sensitivity list, and issue a warning, rather than saying "this can't work, fix it". So then you get code which works differently in simulation and synthesis. End rant.

Your reset code:
if RST_N_CORE = '0' then
address_int <= unsigned(aMaxWords);
is wrong. The definition of reset is set your circuit to known-state. But your code assign it to a signal. You should assign it as all 0 or all 1 for reset, or your aMaxWords must be constant (note that your synthesizer may be not enough intellegent for known it, then should assign it as constant) :
if RST_N_CORE = '0' then
address_int <= (others => '0');
or
if RST_N_CORE = '0' then
address_int <= (others => '1');

Related

I am getting error when check my systemverilog code in quartus II

I have this simple code checked with Quartus II. First, It gives me error 5000 iterations for loop limit then I try to change verilog constant loop limit variable in settings and now it is giving me this error
Error (293007): Current module quartus_map ended unexpectedly. Verify that you have sufficient memory available to compile your design. You can view disk space and physical RAM requirements on the System and Software Requirements page of the Intel FPGA website (http://dl.altera.com/requirements/).
Is this something related to tool limitation or am I doing something wrong with my code ?
Here is my code:
module Branch_status_table #(parameter BST_length = 16383) //16383
(
output reg [2:1] status,
output reg [32:1] PC_predict_o,
input wire [2:1] status_update,
input wire [32:1] PC_in, PC_update,
input wire [32:1] PC_predict_update,
input wire clk,en_1,RST
);
wire [14:1] PC_index, PC_index_update;
//Internal memory
reg [2:1] status_bits [BST_length:0];
reg [32:1] PC_predict [BST_length:0];
reg [16:1] PC [BST_length:0];
//Combinational
assign PC_index = PC_in [16:3];
assign PC_index_update = PC_update [16:3];
//
initial begin
for ( int i=0; i <= BST_length; i=i+1) begin
status_bits[i] <= 0;
PC_predict[i] <= 0;
PC[i]<=0;
end
end
//Prediction
always_ff #(posedge clk) begin
if ( (PC[PC_index]==PC_in[32:17]) && (status_bits[PC_index]!=0) ) begin
status <= status_bits [PC_index];
PC_predict_o <= PC_predict [PC_index];
end
else begin
status <= 0;
PC_predict_o <= 0;
end
end
//Update
always_ff #(posedge clk) begin
if (en_1==1) begin
status_bits[PC_index_update] <= status_update;
PC [PC_index_update] <= PC_update[32:17] ;
PC_predict[PC_index_update] <= PC_predict_update;
end
else begin
status_bits[PC_index_update] <= status_bits[PC_index_update] ;
PC [PC_index_update] <= PC [PC_index_update] ;
PC_predict[PC_index_update] <= PC_predict[PC_index_update] ;
end
end
endmodule
There a coding issue and maybe a resource utilization issue.
The coding issue:
The code infers block ram, and is attempting initialize/reset it.
In general you can't reset block rams using a single initial block.
That style of initialization of arrays can be done in a testbench, not in RTL.
Synthesis tools have physical limits.
To reset/initialize a block ram you must write 0 to each address. Use the same type of synchronous process (always #(posedge clk)) because the memory is a synchronous device. Put a mux in front of the write port and use a state machine a start up to write 0's to every address, then when the state machine finishes the state that move to an state where the BRAM behaves like you normally want it to.
This page discusses the issue.
https://www.edaboard.com/threads/how-to-clear-reset-my-bram-in-vhdl-xilinx.247572/
This is a Xilinx related answer; other vendors work the same way.
Summarizing the coding issue: You probably don't need to initialize and you can't do it this way:
initial begin
for ( int i=0; i <= BST_length; i=i+1) begin
status_bits[i] <= 0;
PC_predict[i] <= 0;
PC[i]<=0;
end
end
Potential Utilization Issue:
FPGAs' have finite resources.
The tool seems to be indicating that the code infers more block ram than the part has. To verify this, change parameter BST_length from 16K to something small like 8 and see if the utilization issue ("insufficient memory") goes away. If it goes away then re-design using less memory.
This can also be analyzed by hand knowing how much BRAM the part has and how much you are inferring. Don't infer more than the part has.
The tool is giving different answers with simple code changes because of two issues related to the same inference of BRAM. When the code changes a little the tool switches and informs about the other issue. This is how the tools work sometimes, a first error can mask a second by erroring out so that the 2nd error is not reached.

VHDL decimation(?) of data in specific way

My VHDL code doesn't behave as i expected.
What i want: i have 32 bit input data stream, and decimated 32 bit data output in some specific order.
Let's say each 32 bit data split into two 16 bit data.
first case: every second 16 bit of 32 bit data present on output;
second case: every fourth 16 bit of 32 bit data present on output;
third case fourth 16 bit of 32 bit data present on output
and so on.
Like in the picture:pic1
Here is first case of implementation:
process (CLK_IN, RST_IN)
begin
if (RST_IN = '1') then
rx_data_half_a <= (others => '0');
elsif rising_edge(CLK_IN) then
rx_data_half_a <= DATA_IN(15 downto 0);
end if;
end process;
process (CLK_IN, RST_IN)
begin
if (RST_IN = '1') then
rx_data_half_a0 <= (others => '0');
rx_data_half_a1 <= (others => '0');
elsif rising_edge(CLK_IN) then
rx_data_half_a0 <= rx_data_half_a;
rx_data_half_a1 <= rx_data_half_a0;
rx_data_half_a2 <= rx_data_half_a1;
DATA_OUT <= rx_data_half_a0 & rx_data_half_a;
end if;
end process;
And the testbench is looking like that:
sim
Instead of 00002222 44446666 ...
I get: 00002222 22224444 44446666 ...
I already do this job using memory (just counting specific addresses) but i dont' want to use it. I think there's much easiest way to implement this.
It is possible to do with registers without reducing the frequency?
Can you give me some advise?
You need a minimal state machine (eg: a counter) to track the input and update the data registers at the appropriate time. You logic is running every clock cycle and has no idea it needs to "skip" any of the incoming samples.
Since you are decimating, it is not possible to do this in registers or in memory without "reducing the frequency" in that you will have half (or 1/4 or whatever your decimation ratio is set to) as many output elements as input elements. If you use a memory you could burst at the full rate for a while, but you will still have to pause periodically and "re-fill" the buffer.

bubble sort in vhdl

Can anyone help me in writing VHDL code for bubble sort given an array of data as input?
I have declared in_array as input which contains 15 array elements. i want to bubble sort them in descending order.
in_array is input array.
sorted_array is out put array.
in_array_sig is signal of type in_array
I am facing problem with statements inside process
Below is my code:
architecture behav of Bubblesort is
signal in_array_sig : bubble;
signal temp : std_logic_vector(3 downto 0);
signal i : integer := 0;
begin
in_array_sig <= in_array;
proc1 : process(clk, reset)
begin
if reset = '0' then
if (clk'event and clk = '1') then
while (i <= 15) loop
if (in_array_sig(i) < in_array_sig(i + 1)) then
temp <= in_array_sig(i);
in_array_sig(i) <= in_array_sig(i + 1);
in_array_sig(i + 1) <= temp;
end if;
i <= i + 1;
end loop;
end if;
end if;
end process;
sorted_array <= in_array_sig;
end behav;
I am beginner in VHDL coding. Kindly help me with this.
The lack of a Minimal Complete and Verifiable example makes it hard to provide an answer about all the the things stopping your code from bubble sorting accurately. These can be described in the order you'd encounter them troubleshooting.
proc1 : process(clk, reset)
begin
if reset = '0' then
if (clk'event and clk = '1') then
while (i <= 15) loop
if (in_array_sig(i) < in_array_sig(i + 1)) then
temp <= in_array_sig(i);
in_array_sig(i) <= in_array_sig(i + 1);
in_array_sig(i + 1) <= temp;
end if;
i <= i + 1;
end loop;
end if;
end if;
end process;
Before starting note that the clock is gated with reset. You could qualify assignments with reset making it an enable instead.
Problems
The first thing we'd find producing an MCVe and a testbench is that the process never suspends. This is caused by the condition in the while loop depending on i and i a signal being updated within the process. i shouldn't be a signal here (and alternatively you could use a for loop here).
This also points out that temp is a signal and suffers the same problem, you can't use the 'new' value of temp until the process has suspended and resumed. Signals are scheduled for update, a signal assignment without a waveform element containing an after clause have an implicit after clause with zero delay. Signal updates do no occur while any process scheduled to resume has yet to resume and subsequently suspend. This allows the semblance of concurrency for signals who's assignments are found in sequential statements (a concurrent statement has an equivalent process containing equivalent sequential statements). So neither i nor temp can update during execution of a processes sequence of statements and both want to be variables.
We'd also get bitten using a signal for in_array_sig. As you increment i the previously indexed in_array_sig(i + 1) becomes the next loop iteration's in_array_sig(i). Without an intervening process suspend and resume the original value is available. in_array_sig wants to be a variable as well.
If we were to fix all these we'd also likely note that i is not initialized (this would be taken care of in a for loop iteration scheme) and we might also find that we get a bound error in a line using an (i + 1) index for in_array_sig. It's not clear without the author of the question providing an MCVe whether the array size is 16 (numbered 0 to 15) or 17. If the former i = 15 + 1 would be out of the index range for the undisclosed array type for in_array, in_array_sig, and sorted_array.
If we were to insure the index range is met noting that we only need 1 fewer tests and swaps than the number of elements in an array we'd find that what the process isn't a complete bubble sort. We would see the largest binary value of in_array_sig end up as the right most element of sorted_array. However the order of the remaining elements isn't guaranteed.
To perform a complete bubble sort we need another loop nesting the first one. Also the now 'inner' for loop can have a decreasing number of elements to traverse because each iteration leaves a largest remaining element rightmost until the order is assured to be complete.
Fixes
Fixing the above would give us something that looks like this:
architecture foo of bubblesort is
use ieee.numeric_std.all;
begin
BSORT:
process (clk)
variable temp: std_logic_vector (3 downto 0);
variable var_array: bubble;
begin
var_array := in_array;
if rising_edge(clk) then
for j in bubble'LEFT to bubble'RIGHT - 1 loop
for i in bubble'LEFT to bubble'RIGHT - 1 - j loop
if unsigned(var_array(i)) > unsigned(var_array(i + 1)) then
temp := var_array(i);
var_array(i) := var_array(i + 1);
var_array(i + 1) := temp;
end if;
end loop;
end loop;
sorted_array <= var_array;
end if;
end process;
end architecture foo;
Note the loop iteration schemes are described in terms of type bubble bounds, the outer is one shorter than the length and the inner is one shorter for each iteration. Also note the sorted_array assignment is moved into the process where the in_array_sig variable replacement var_array is visible.
Also of note is the use of the unsigned greater than operator. The ">" for std_logic_vector allows meta-values and 'H' and 'L' values to distort relational comparison while the operator for unsigned is arithmetic.
Results
Throw in package and entity declarations:
library ieee;
use ieee.std_logic_1164.all;
package array_type is
type bubble is array (0 to 15) of std_logic_vector(3 downto 0);
end package;
library ieee;
use ieee.std_logic_1164.all;
use work.array_type.all;
entity bubblesort is
port (
signal clk: in std_logic;
signal reset: in std_logic;
signal in_array: in bubble;
signal sorted_array: out bubble
);
end entity;
along with a testbench:
library ieee;
use ieee.std_logic_1164.all;
use work.array_type.all;
entity bubblesort_tb is
end entity;
architecture fum of bubblesort_tb is
signal clk: std_logic := '0';
signal reset: std_logic := '0';
signal in_array: bubble :=
(x"F", x"E", x"D", x"C", x"B", x"A", x"9", x"8",
x"7", x"6", x"5", x"4", x"3", x"2", x"1", x"0");
signal sorted_array: bubble;
begin
DUT:
entity work.bubblesort(foo)
port map (
clk => clk,
reset => reset,
in_array => in_array,
sorted_array => sorted_array
);
CLOCK:
process
begin
wait for 10 ns;
clk <= not clk;
if now > 30 ns then
wait;
end if;
end process;
end architecture;
and we get:
something that works.
The reset as enable has not been included in process BSORT in architecture and can be added in, inside the if statement with a clock edge condition.
And about here we get to Matthew Taylor's point in a comment about describing hardware.
Depending on the synthesis tool the process may or may not be realizable as hardware. If not you'd need intermediary variables holding the array portions used in each successive iteration of the inner loop.
There's also the issue of how much you can do in a clock cycle. Worst case there is a delay depth comprised of fifteen element comparisons and fifteen 2:2 selectors conditionally swapping element pairs.
If you were to pick a clock speed that was incompatible with the synthesized delay you'd need to re-architect the implementation from a software loop emulation to something operating across successive clocks.
That could be as simple as allowing more clock periods by using that enable to determine when the bubble sort is valid for loading into the sorted_array register. It could be more complex also allowing different and better performing sorting methods or a modification to bubble sort to say detect no more swaps can be necessary.

Combinational Logic Timing

I am currently trying to implement a data path, which calculates the following, in one clock cycle.
Takes input A and B and add them.
Shift the result of addition, one bit to right. (Dividing by 2)
Subtract the shifted result from another input C.
The behavioral architecture of the entity is simply shown below.
signal sum_out : std_logic_vector (7 downto 0);
signal shift_out : std_logic_vector (7 downto 0);
process (clock, data_in_a, data_in_b, data_in_c)
begin
if clock'event and clock = '1' then
sum_out <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
shift_out <= '0' & sum_out(7 downto 1);
data_out <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
end if;
end process;
When I simulate the above code, I do get the result I expect to get. However, I get the result, after 3 clock cycles, instead 1 as I wish. The simulation wave form is shown below.
I am not yet familiar with implementing designs with timing concerns. I was wondering, if there are ways to achieve above calculations, in one clock cycle. If there are, how can I implement them?
Do do this with signals simply register only the last element in the chain (data_out). This analyzes, I didn't write a test bench to verify simulation.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity signal_single_clock is
port (
signal clock: in std_logic;
signal data_in_a: in std_logic_vector(7 downto 0);
signal data_in_b: in std_logic_vector(7 downto 0);
signal data_in_c: in std_logic_vector(7 downto 0);
signal data_out: out std_logic_vector(7 downto 0)
);
end entity;
architecture behave of signal_single_clock is
signal sum_out : std_logic_vector (7 downto 0);
signal shift_out : std_logic_vector (7 downto 0);
begin
sum_out <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
shift_out <= '0' & sum_out(7 downto 1);
single_reg:
process (clock)
begin
if clock'event and clock = '1' then
data_out <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
end if;
end process;
end architecture;
When you assign a new value to a signal inside a process, this new value will be available only after the process finishes execution. Therefore, anytime you read the signal's value you will be using the original value from when the process started executing.
On the other hand, assignments to varibles take place immediately, and the new value can be used in the subsequent statements if you wish.
So, to solve you problem, simply implement sum_out, shift_out, and data_out using variables, instead of signals. Then simply copy the value of data_out to an output port of your entity.
Without using variables:
sum <= in_a + in_b;
process (clock)
begin
if rising_edge(clock) then
data_out <= in_c - ('0' & sum(7 downto 1));
end if;
end process;
All declarations except clock are unsigned(7 downto 0); why make it more complicated than that?
The original, pipelined to 3 cycles, will probably work at higher clock rates.
EDIT following comment:
I wanted to demonstrate that VHDL doesn't really have to be that verbose.
However there seem to be a lot of people "teaching" VHDL who are focussing on trivial elements and missing the big picture entirely, so I'll say a little bit about that.
VHDL is a strongly typed language, to prevent mistakes that creep in when types are mistaken for each other and (e.g.) you add two large numbers and get a negative result.
It does NOT follow from that, that you need type conversions all over the place.
Indeed, if you need a lot of type conversions, it's a sign that your design is probably wrong, and it's time to rethink that instead of ploughing ahead down the wrong path.
Code - in ANY language - should be as clean and simple as possible.
Otherwise it's hard to read, and there are probably bugs in it.
The big difference between a C-like language and VHDL is this:
In C, using the correct data types you can write sum = in_a + in_b;
and it will work. Using the wrong data types you can also write sum = in_a + in_b;
and it will compile just fine; what it actually does is another matter! The bugs are hidden : it is up to you to determine the correct types, and if you get it wrong there is very little you can do except keep on testing.
in VHDL, using the right types you can write sum <= in_a + in_b;
and using the wrong types, the compiler forces you to write something like sum <= std_logic_vector(unsigned(in_a) + unsigned(in_b)); which is damn ugly, but will (probably: see note 1) still work correctly.
So to answer the question : how do I decide to use unsigned or std_logic_vector?
I see that I need three inputs and an output. I could just make them std_logic_vectorbut I stop and ask: what do they represent?
Numbers.
Can they be negative? Not according to my reading of the specification (your question).
So, unsigned numbers... (Note 1)
Do I need non-arithmetic operations on them? Yes there's a shift.(Note 2)
So, numeric_std.unsigned which is related to std_logic_vector instead of natural which is just an integer.
Now you can't avoid type conversions altogether. Coding standards may impose restrictions such as "all top level ports must be std_logic_vector" and you must implement the external entity specification you are asked to; intermediate signals for type conversions are sometimes cleaner than the alternatives, e.g. in_a <= unsigned(data_in_a);
Or if you are getting instructions, characters and the numbers above from the same memory, for example, you might decide the memory contents must be std_logic_vector because it doesn't just contain numbers. But pick the correct place to convert type and you will find maybe 90% of the type conversions disappear. Take that as a design guideline.
(Note 1 : but what happens if C < (A+B)/2 ? Should data_out be signed? Even thinking along these lines has surfaced a likely bug that std_logic_vector left hidden...
The right answer depends on unknowns including the purpose of data_out : if it is really supposed to be unsigned, e.g. a memory address, you may want to flag an error instead of making it signed)
(Note 2 : there isn't a synthesis tool left alive that won't translate
signal a : natural; ... x <= a/2 into a shift right, so natural would also work, unless there were other reasons to choose unsigned. A lot of people seem to still be taught that integers aren't synthesisable, and that's just wrong.)

Translating a VHDL monitor into a PSL assertion

I have an interesting question about PSL assertion. Here is a VHDL monitor process. It is a process dedicated to an assertion, and thus a non-synthesizable one. This monitor checks the current FSM state and stores the values of two registers: "input1" and "reg136". Finally, it triggers an "assert" statement to compare the values of these registers.
process (clk)
variable var_a : signed(7 downto 0);
variable var_b : signed(7 downto 0);
begin
if (rising_edge(clk)) then
case state is
when s0 =>
var_a := signed(input1);
when s22 =>
var_t34 := signed(reg136);
when s85 =>
assert (var_t34 < var_a)
report "Assertion xxx failed : (t34 < a)";
end case;
end if;
end process;
Question is: is there a way to translate this monitor into a PSL (Property Specification Language) assertion ?
IMPORTANT: the registers "input1" and "reg136" can only be read when the fsm state is at state s0 and s22 respectively. Otherwise, the data contained in these registers does not belong to the asserted variables "a" and "t34". As a result, a PSL statement wouls need a way to read and store the values on the right fsm states.
Thank you !
Actually, I think I found a way to do that using the prev PSL function.
assert always (state = s85) -> (prev(reg136, 85-22) < prev(input1, 85-0)) #rising_edge(clk)
It requires the fsm to increment its state register by one on each clock cycle or to know how many clock cycles are needed to go from state s0 or s22 to state s85.
Can someone confirm that this would work ? I do not have a PSL-ready simulator to check this...