Optimizing design with many identical units that could be shared

I have a design that generates a video signal on demand, without using RAM resources for a framebuffer.
I have a hierarchy that represents the screen layout, with a top-level block generating the pixel clock and sync signals, and producing a signal that gives the coordinates of the next pixel. Below that are various blocks sharing the same interface:
type point is record
    valid : std_logic;
    x     : unsigned(11 downto 0);
    y     : unsigned(11 downto 0);
end record;

type color is record
    r : std_logic;
    g : std_logic;
    b : std_logic;
end record;

component source is
    port(
        pos : in  point;
        col : out color);
end component;
The general idea is that each of these blocks either generates a signal directly, or contains sub-blocks.
I'd like to stick with the pixel-on-demand scheme, as it allows me to write things like
architecture syn of zoom2_block is
    signal slave_pos : point;
begin
    slave_pos.valid <= pos.valid;
    slave_pos.x     <= "0" & pos.x(10 downto 1);
    slave_pos.y     <= "0" & pos.y(10 downto 1);

    slave : source port map(
        pos => slave_pos,
        col => col);
end architecture;
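To illustrate the "contains sub-blocks" case, a container such as a split_screen block might look like the following sketch. Assumptions not in the original: a 640-pixel-wide screen split at x = 320, an entity split_screen_block with the same pos/col ports, and sub-blocks that output an all-zero color while pos.valid = '0'.

```vhdl
-- Hypothetical split_screen sketch: two sub-sources, each fed translated
-- coordinates with valid gated to its half of the screen. The outputs are
-- merged by OR, which is safe because at most one slave sees a valid
-- position (and invalid slaves are assumed to output zeros).
architecture syn of split_screen_block is
    signal left_pos, right_pos : point;
    signal left_col, right_col : color;
begin
    left_pos.valid  <= pos.valid when pos.x < 320 else '0';
    left_pos.x      <= pos.x;
    left_pos.y      <= pos.y;

    right_pos.valid <= pos.valid when pos.x >= 320 else '0';
    right_pos.x     <= pos.x - 320;  -- translate into the right half's space
    right_pos.y     <= pos.y;

    left_src  : source port map (pos => left_pos,  col => left_col);
    right_src : source port map (pos => right_pos, col => right_col);

    col.r <= left_col.r or right_col.r;
    col.g <= left_col.g or right_col.g;
    col.b <= left_col.b or right_col.b;
end architecture;
```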
Now, the innermost pixel generators for several blocks are fairly similar (e.g. font pixel lookup), and because only one pixel will ever be passed outside, I wonder whether I can somehow share blocks, e.g. like the font in the hierarchy
output
  source : split_screen
    source : zoom
      source : text
        font
    source : text
      font
The text blocks themselves cannot be shared, because they contain the actual character codes fed to the font blocks -- but the two font blocks are exactly identical: each takes a coordinate and a character code and returns the appropriate pixel value, with no state. Since the font data is large, not being able to share them is a problem.
Ideas I've had so far:
Have every block output '-' while pos.valid = '0', in the hope that the compiler will notice that only one block in the hierarchy can be valid at any given time. I'm not sure the synthesizer will catch this.
Create a special component that arbitrates access to the font block, with a generic array(1 to N) of point interfaces, selecting the first input with pos.valid = '1'. This would still require me to build a hierarchy that is no longer a tree.
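The arbitration idea could be sketched roughly like this. Everything here is an assumption, not part of the original design: the point_array type, the priority loop, and the omission of the character code (a real version would multiplex the character code alongside the coordinates).

```vhdl
-- Hypothetical font arbiter: N requesters share one stateless font block.
-- Because at most one requester has pos.valid = '1' for any given pixel,
-- a simple combinational select suffices; no handshaking is required.
-- Assumes the shared package also declares:
--   type point_array is array (positive range <>) of point;
entity font_arbiter is
    generic (N : positive := 2);
    port (
        req : in  point_array(1 to N);
        col : out color);
end entity;

architecture syn of font_arbiter is
    signal merged : point;
begin
    process (req)
    begin
        merged <= req(1);  -- default; its valid bit decides whether it counts
        for i in 2 to N loop
            if req(i).valid = '1' then
                merged <= req(i);  -- at most one input is valid, so order is moot
            end if;
        end loop;
    end process;

    font_inst : source port map (pos => merged, col => col);
end architecture;
```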
Can this be done?

Related

How do I assign data to an internal input port

I have an FPGA trying to read/write values to SDRAM on the same chip. What the sdram sees as IN, the top level sees as OUT, and vice versa. SDRAM "paths" are instantiated and brought to the top level. These paths have no direction. However, I know that the top level reads and writes to the sdram. I tried a variation of the code shown and it compiled. The code below is an example that passes two values to the SDRAM and reads a third value. I have assigned a direction to the paths. Is my logic correct in that it sends two values and receives a third?
use IEEE.STD_LOGIC_UNSIGNED.ALL; -- see page 36 of Circuit Design with VHDL

entity sigma_k_top is
    port(
        -- ------------------------------------------------------------------
        -- Global signals ----------------------------------------------------
        CLK   : in  std_logic;
        RESET : in  std_logic;
        A     : out std_logic_vector(15 downto 0);
        B     : out std_logic_vector(15 downto 0);
        C     : in  std_logic_vector(15 downto 0));
end entity sigma_k_top;

architecture rtl of function_top is
    signal cntr  : std_logic_vector(31 downto 0);
    signal sig_A : std_logic_vector(15 downto 0);
    signal sig_B : std_logic_vector(15 downto 0);
    signal sig_C : std_logic_vector(15 downto 0);
begin
    sdram_inst : entity work.sdram
        port map (
            CLK => sdram_CLK_in,  -- CLK shared by all
            A   => sdram_A_in,    -- Write to sdram
            B   => sdram_B_in,    -- Write to sdram
            C   => sdram_C_out);  -- Read from sdram

    transfer : process(CLK)
    begin
        if rising_edge(CLK) then
            cntr <= cntr + 1;
            if cntr = 1000 then
                sig_A <= "1000000000000000";
                sig_B <= "1000000000000000";
            end if;
            if cntr = 1001 then
                if C(0) = '1' then
                    sig_A <= sig_A - 1; -- Writing
                    sig_B <= sig_B + 1; -- Writing
                    xfer  <= C;         -- Reading
                end if;
            end if;
            if cntr > 2000 then
                cntr <= (others => '0');
            end if;
        end if;
    end process;

    -- ------------------------------------------------------------------------
    -- Top-level ports ----------------------------------------------------------
    TEST_LED(7 downto 0) <= xfer(7 downto 0); -- Making some sdram output visible
    A <= sig_A; -- Sending value to sdram
    B <= sig_B; -- Sending value to sdram
end architecture rtl;
What inputs and outputs exist to/from the RAM can vary based on how you intend to use it. If the RAM really exists on the FPGA chip itself, an example might be that you want to use a simple single-port RAM, such as a Xilinx block RAM library component.
As it appears from the code that the sdram is instantiated under the FPGA's top level (the RAM is contained within the FPGA chip), it seems that the RAM's inputs/outputs should also be the top level's inputs/outputs. It would be reversed if the sdram were outside the FPGA (and thus outside the FPGA's top level).
In general, RAMs tend to be sequential elements that require at the least:
- A clock (typically a 1-bit signal)
- An address (tends to be log2(n) bits wide, where n is the number of entries in the RAM array; so if the array has 64 elements, you need at least 6 bits to address everything. The same address signal can serve both reads and writes, or you might have two separate address signals.)
- A write enable (in the simplest form a 1-bit signal; the most typical use is to assert it for one clock cycle to update the data at the current address)
- Data (the width varies and tends to be flexible/configurable on an FPGA; storing 16 bits of data in each RAM entry is perfectly valid. This could be a single signal or two separate ones for read and write data.)
As long as the signal vectors going to/from the RAM provide at least these basic functions, you should be able to use it at least as a simple RAM. Note, by the way, that in your code the sdram_* signals are neither declared nor connected to anything other than the sdram instance itself.
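As a reference point, the basic signal set listed above can be written as an inferable single-port RAM. This is only a sketch; the 64x16 geometry is an illustrative assumption, but the template is one that most FPGA synthesis tools will map onto a block RAM primitive.

```vhdl
-- Minimal single-port RAM with the basics: clock, address, write enable, data.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity simple_ram is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);   -- 64 entries -> 6 address bits
        din  : in  std_logic_vector(15 downto 0);
        dout : out std_logic_vector(15 downto 0));
end entity;

architecture rtl of simple_ram is
    type ram_t is array (0 to 63) of std_logic_vector(15 downto 0);
    signal ram : ram_t;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(unsigned(addr))) <= din;  -- synchronous write
            end if;
            dout <= ram(to_integer(unsigned(addr)));     -- synchronous read
        end if;
    end process;
end architecture;
```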

How to declare a global variable in Verilog?

I am writing to ask how to declare a global variable in Verilog. What is declared by the parameter and define keywords is essentially a constant, not a variable.
What I need is the following:
`define Glitch

module Cell ( Shift_In, Shift_Out_Screwed, Clk );
    input  Clk, Shift_In;
    output Shift_Out_Screwed;
    wire   Shift_Out_Screwed;
    wire   Shift_Out;

    Inverter INV1 ( Shift_In, Shift_Out, Clk );
    assign Shift_Out_Screwed = Glitch ? ~Shift_Out : Shift_Out;
endmodule
This is a very simple glitch insertion: when Glitch==1, the original output is inverted; when Glitch==0, the original output is kept unchanged. I want the signal Glitch to be defined in an external simulation testbench.v file even though it is declared and used here, and I don't want to add Glitch to the input port list of the module Cell. This is because my real circuit is very complicated, and adding an input port to a certain cell would affect many other cells.
Does anyone know how to declare a global variable in Verilog?
The problem you are wrestling with sounds like error injection. You want the ability to inject a bit error on an output port from within a testbench. You can do it like this:
module Cell ( Shift_In, Shift_Out_Screwed, Clk );
    input  Clk, Shift_In;
    output Shift_Out_Screwed;
    wire   Shift_Out_Screwed;
    wire   Shift_Out;

    Inverter INV1 ( Shift_In, Shift_Out, Clk );

`ifdef SIMULATION
    // This logic is used in simulation, but not synthesis. Use care.
    logic Glitch = 1'b0;
    assign Shift_Out_Screwed = Glitch ? ~Shift_Out : Shift_Out;
`else
    // This logic is used in synthesis, but not simulation. Use care.
    assign Shift_Out_Screwed = Shift_Out;
`endif
endmodule
Note that I use the "SIMULATION" preprocessor switch to hide the "Glitch" error injection from synthesis. Use this technique with care to avoid creating simulation/synthesis mismatches.
In your testbench, you can induce a glitch in a specific instance of your cell by referencing the "Glitch" signal in the design hierarchy, like this:
initial begin
    ...
    @(posedge Clk); #1;
    top.u_foo.u_bar.u_cell.Glitch = 1'b1;
    @(posedge Clk); #1;
    top.u_foo.u_bar.u_cell.Glitch = 1'b0;
    ...
end
The above code snippet will inject one cycle of "Glitch".
As an alternative: a more traditional way of injecting errors is to use the "force" statement in the testbench to override a driven signal in a device under test.
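The force/release approach might look like this (a hedged sketch: the hierarchical path and timing are illustrative, matching the style of the snippet above):

```verilog
// Inject an error with force/release instead of a dedicated Glitch signal.
initial begin
    @(posedge Clk); #1;
    force top.u_foo.u_bar.u_cell.Shift_Out_Screwed = 1'b1;  // override the driver
    @(posedge Clk); #1;
    release top.u_foo.u_bar.u_cell.Shift_Out_Screwed;       // restore normal driving
end
```

The advantage is that no extra logic is needed inside the design; the drawback is that the testbench now depends on the exact design hierarchy.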

Multiple behaviours for single entity

I wrote a VHDL Testbench which contains the following :
Lots of signal declarations
UUT instantiations / port maps
A huge number of one-line concurrent assignments
Various small processes
One main (big) process which actually stimulates the UUT.
Everything is fine except that I want two distinct types of stimulation (let's say a simple stimulus and a more complex one), so I created two testbenches that have everything in common except the main big process.
But I don't find it really convenient since I always need to update both when, for example, I make a change to the UUT port map. Not cool.
I don't really want to merge my two main processes, because it would look like hell, and I can't have the two processes declared concurrently in the same architecture (I might end up with a very long file, and I don't like that they could theoretically access the same signals).
So I would really like to keep a "distinct files" approach but only for that specific process. Is there a way out of this or am I doomed?
This seems like an example where using multiple architectures of the same entity would help. You have a file along the lines of:
entity TestBench is
end TestBench;

architecture SimpleTest of TestBench is
    -- You might have a component declaration for the UUT here
begin
    -- Test bench code here
end SimpleTest;
You can easily add another architecture. You can have architectures in separate files. You can also use direct entity instantiation to avoid the component declaration for the UUT (halving the work required if the UUT changes):
architecture AnotherTest of TestBench is
begin
    -- Test bench code here
    UUT : entity work.MyDesign (Behavioral)
        port map (
            -- Port map as usual
        );
end AnotherTest;
This doesn't save having duplicate code, but at least it removes one of the port map lists.
Another point if you have a lot of signals in your UUT port map, is that this can be easier if you try to make more of the signals into vectors. For example, you might have lots of serial outputs of the same type going to different chips on the board. I have seen lots of people will name these like SPI_CS_SENSORS, SPI_CS_CPU, SPI_CS_FRONT_PANEL, etc. I find it makes the VHDL a lot more manageable if these are combined to SPI_CS (2 downto 0), with the mapping of what signal goes to what device specified by the circuit diagram. I suppose this is just preference, but maybe this sort of approach could help if you have really huge port lists.
Using a TestControl entity
A more sophisticated approach involves using a test control entity to implement all your stimulus. At the simplest level, this would have as ports all of the signals from the UUT you are interested in. A more sophisticated test bench would have a test control entity with interfaces that can control bus functional models containing the actual pin wiggling required to exercise your design. You would have one file declaring this entity, say TestControl_Entity.vhd:
entity TestControl is
    port (
        clk       : out std_logic;
        UUTInput  : out std_logic;
        UUTOutput : in  std_logic);
end entity;
Then you have one or more architecture files, for example TestControl_SimpleTest.vhd:
architecture SimpleTest of TestControl is
begin
    -- Stimulus for simple test
end SimpleTest;
Your top level test bench would then look something like:
entity TestBench is
end TestBench;

architecture Behavioral of TestBench is
    signal clk : std_logic;
    signal a   : std_logic;
    signal b   : std_logic;
begin
    -- Common processes like clock generation could go here

    UUT : entity work.MyDesign (Behavioral)
        port map (
            clk => clk,
            a   => a,
            b   => b
        );

    TestControl_inst : entity work.TestControl (SimpleTest)
        port map (
            clk       => clk,
            UUTInput  => a,
            UUTOutput => b
        );
end Behavioral;
You can now change the test by changing the architecture selected for TestControl.
Using configurations
If you have a lot of different tests, you can use configurations to make it easier to select them. To do this, you first need to make the test control entity instantiation use a component declaration as opposed to direct instantiation. Then, at the end of each test control architecture file, create the configuration:
use work.all;

configuration Config_SimpleTest of TestBench is
    for Behavioral
        for TestControl_inst : TestControl
            use entity work.TestControl (SimpleTest);
        end for;
    end for;
end Config_SimpleTest;
Now when you want to simulate, you simulate a configuration, so instead of a command like sim TestBench, you would run something like sim work.Config_SimpleTest. This makes it easier to manage test benches with a large number of different tests, because you don't have to edit any files in order to run them.
A generic can be added to the test bench entity to control whether simple or complex testing is done, like:
entity tb is
    generic(
        test : positive := 1); -- 1: Simple, 2: Complex
end entity;
library ieee;
use ieee.std_logic_1164.all;

architecture syn of tb is
    -- Superset of declarations for simple and complex testing
begin
    simple_g : if test = 1 generate
        process is -- Simple test process
        begin
            -- ... Simple testing
            wait;
        end process;
    end generate;

    complex_g : if test = 2 generate
        process is -- Complex test process
        begin
            -- ... Complex testing
            wait;
        end process;
    end generate;
end architecture;
The drawback is that declarations can't be made conditional, so the declarations must be a superset of the signals and other controls needed for both simple and complex testing.
The simulator can control the generic value through options, for example -G for generic control in the ModelSim simulator (e.g. vsim -Gtest=2 work.tb). It is thereby possible to compile once and select simple or complex testing at runtime.

Combinational Logic Timing

I am currently trying to implement a data path that calculates the following in one clock cycle:
Take inputs A and B and add them.
Shift the result of the addition one bit to the right (dividing by 2).
Subtract the shifted result from another input C.
The behavioral architecture of the entity is simply shown below.
signal sum_out   : std_logic_vector (7 downto 0);
signal shift_out : std_logic_vector (7 downto 0);

process (clock, data_in_a, data_in_b, data_in_c)
begin
    if clock'event and clock = '1' then
        sum_out   <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
        shift_out <= '0' & sum_out(7 downto 1);
        data_out  <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
    end if;
end process;
When I simulate the above code, I do get the result I expect, but only after 3 clock cycles instead of the 1 I was hoping for. The simulation waveform is shown below.
I am not yet familiar with implementing designs with timing concerns. I was wondering, if there are ways to achieve above calculations, in one clock cycle. If there are, how can I implement them?
To do this with signals, simply register only the last element in the chain (data_out). The following analyzes; I didn't write a test bench to verify it in simulation.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity signal_single_clock is
    port (
        signal clock     : in  std_logic;
        signal data_in_a : in  std_logic_vector(7 downto 0);
        signal data_in_b : in  std_logic_vector(7 downto 0);
        signal data_in_c : in  std_logic_vector(7 downto 0);
        signal data_out  : out std_logic_vector(7 downto 0)
    );
end entity;

architecture behave of signal_single_clock is
    signal sum_out   : std_logic_vector (7 downto 0);
    signal shift_out : std_logic_vector (7 downto 0);
begin
    sum_out   <= std_logic_vector(unsigned(data_in_a) + unsigned(data_in_b));
    shift_out <= '0' & sum_out(7 downto 1);

    single_reg :
    process (clock)
    begin
        if clock'event and clock = '1' then
            data_out <= std_logic_vector(unsigned(data_in_c) - unsigned(shift_out));
        end if;
    end process;
end architecture;
When you assign a new value to a signal inside a process, the new value becomes available only after the process finishes execution. Therefore, any time you read the signal you get the value it had when the process started executing.
Assignments to variables, on the other hand, take effect immediately, and the new value can be used in the subsequent statements if you wish.
So, to solve your problem, simply implement sum_out, shift_out, and data_out as variables instead of signals. Then simply copy the value of data_out to an output port of your entity.
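The variable-based approach described above can be sketched like this (a sketch assuming the same entity ports as the question; variable assignments take effect immediately, so all three steps complete in one clock cycle):

```vhdl
-- All arithmetic happens in one clocked process using variables.
process (clock)
    variable sum_v   : unsigned(7 downto 0);
    variable shift_v : unsigned(7 downto 0);
begin
    if rising_edge(clock) then
        sum_v   := unsigned(data_in_a) + unsigned(data_in_b);
        shift_v := '0' & sum_v(7 downto 1);  -- divide by 2
        data_out <= std_logic_vector(unsigned(data_in_c) - shift_v);
    end if;
end process;
```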
Without using variables:
sum <= in_a + in_b;

process (clock)
begin
    if rising_edge(clock) then
        data_out <= in_c - ('0' & sum(7 downto 1));
    end if;
end process;
All declarations except clock are unsigned(7 downto 0); why make it more complicated than that?
The original, pipelined to 3 cycles, will probably work at higher clock rates.
EDIT following comment:
I wanted to demonstrate that VHDL doesn't really have to be that verbose.
However there seem to be a lot of people "teaching" VHDL who are focussing on trivial elements and missing the big picture entirely, so I'll say a little bit about that.
VHDL is a strongly typed language, to prevent mistakes that creep in when types are mistaken for each other and (e.g.) you add two large numbers and get a negative result.
It does NOT follow from that, that you need type conversions all over the place.
Indeed, if you need a lot of type conversions, it's a sign that your design is probably wrong, and it's time to rethink that instead of ploughing ahead down the wrong path.
Code - in ANY language - should be as clean and simple as possible.
Otherwise it's hard to read, and there are probably bugs in it.
The big difference between a C-like language and VHDL is this:
In C, using the correct data types you can write sum = in_a + in_b;
and it will work. Using the wrong data types you can also write sum = in_a + in_b;
and it will compile just fine; what it actually does is another matter! The bugs are hidden : it is up to you to determine the correct types, and if you get it wrong there is very little you can do except keep on testing.
in VHDL, using the right types you can write sum <= in_a + in_b;
and using the wrong types, the compiler forces you to write something like sum <= std_logic_vector(unsigned(in_a) + unsigned(in_b)); which is damn ugly, but will (probably: see note 1) still work correctly.
So to answer the question : how do I decide to use unsigned or std_logic_vector?
I see that I need three inputs and an output. I could just make them std_logic_vector, but I stop and ask: what do they represent?
Numbers.
Can they be negative? Not according to my reading of the specification (your question).
So, unsigned numbers... (Note 1)
Do I need non-arithmetic operations on them? Yes there's a shift.(Note 2)
So, numeric_std.unsigned, which is related to std_logic_vector, instead of natural, which is just an integer.
Now you can't avoid type conversions altogether. Coding standards may impose restrictions such as "all top level ports must be std_logic_vector" and you must implement the external entity specification you are asked to; intermediate signals for type conversions are sometimes cleaner than the alternatives, e.g. in_a <= unsigned(data_in_a);
Or if you are getting instructions, characters and the numbers above from the same memory, for example, you might decide the memory contents must be std_logic_vector because it doesn't just contain numbers. But pick the correct place to convert type and you will find maybe 90% of the type conversions disappear. Take that as a design guideline.
(Note 1 : but what happens if C < (A+B)/2 ? Should data_out be signed? Even thinking along these lines has surfaced a likely bug that std_logic_vector left hidden...
The right answer depends on unknowns including the purpose of data_out : if it is really supposed to be unsigned, e.g. a memory address, you may want to flag an error instead of making it signed)
(Note 2 : there isn't a synthesis tool left alive that won't translate
signal a : natural; ... x <= a/2 into a shift right, so natural would also work, unless there were other reasons to choose unsigned. A lot of people seem to still be taught that integers aren't synthesisable, and that's just wrong.)

"unsigned" type conversion demands input in sequential process sensitivity list

I have an address counter in a VHDL sequential process. Its idle value is set in a configuration register to a certain max value; afterwards, anytime it enters a certain state it should increment by one.
To get the maximum value, I declare a subset of an input std_logic_vector as an alias.
I declared address_int as an unsigned variable. I then defined a sequential process with a clk and a reset in the sensitivity list. When the reset is asserted, the address counter is set to the alias value. After reset is released, the counter is rolled over/incremented on rising edges when in a certain state.
The synthesis tool gives me this message:
*WARNING:Xst:819 line 134: The following signals are missing in the process sensitivity list: DL_CADU_SIZE*
And all the address lines have become asynchronous signals! What is going on here? Is there some strange behavior with unsigned that doesn't occur with integers? I usually use integers here, but the conversion seemed more straightforward from unsigned for purposes of code maintenance. I have tried ditching the alias and doing the straight conversion, but it didn't help.
library IEEE;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- entity declaration, ports, architecture, etc.

signal address_int : unsigned(8 downto 0);
alias aMaxWords : std_logic_vector(8 downto 0) is DL_CADU_SIZE(10 downto 2);

begin

WADDR <= std_logic_vector(address_int);

OUT_PROC : process (CLK_CORE, RST_N_CORE)
begin
    if RST_N_CORE = '0' then
        address_int <= unsigned(aMaxWords);
    elsif rising_edge(CLK_CORE) then
        if next_state = WRITE_WORD then
            if address_int = unsigned(aMaxWords) then
                address_int <= (others => '0');
            else
                address_int <= address_int + 1;
            end if;
        end if; -- WRITE_WORD
    end if; -- rising_edge
end process OUT_PROC;

end RTL;
This:
if RST_N_CORE = '0' then
    address_int <= unsigned(aMaxWords);
describes an async reset - therefore aMaxWords will be treated as asynchronous by the synthesiser irrespective of whether it is or not.
What the synthesiser interprets your code as is "while rst_n_core is low, copy the value of aMaxWords to address_int" so if aMaxWords changes during reset, the value must be copied across. The lack of that signal in your sensitivity list means that the synthesiser is making a circuit which behaves differently to what the language says it should, hence the warning.
It really shouldn't do this: without the signal in the sensitivity list, it ought to capture the signal on the falling edge of the reset line. But as that's not how most chips work, the synthesiser designers (in their infinite wisdom) decided many years ago to assume the designer intended to have that signal in the sensitivity list, and issue a warning, rather than saying "this can't work, fix it". So then you get code which works differently in simulation and synthesis. End rant.
Your reset code:
if RST_N_CORE = '0' then
    address_int <= unsigned(aMaxWords);
is wrong. The purpose of a reset is to put your circuit into a known state, but your code assigns it from a signal. You should reset to all '0's or all '1's, or else aMaxWords must be a constant (note that your synthesizer may not be clever enough to work that out, in which case you should declare it as a constant):
if RST_N_CORE = '0' then
    address_int <= (others => '0');
or
if RST_N_CORE = '0' then
    address_int <= (others => '1');