One problem I've seen again and again on different VHDL projects is that the top-level testbenches are always large and difficult to keep organized. There is basically one main test process from which EVERY test signal is controlled or validated, and it becomes HUGE over time. I know that you can make testbenches for the lower-level components, but this question mainly applies to top-level input/output tests.
I'd like to have some kind of hierarchy structure to keep things organized. I've tried implementing VHDL procedures, but the compiler was very unhappy because it thought I was trying to assign signals from different sections of code...
Is there anything available in VHDL to achieve the behavior of C's inline functions or #define preprocessor macros? If not, what can you suggest? It would make me happy to be able to have my top-level test bench look like this:
testClockSignals();
testDigitalIO();
testDACSignals();
...
Having the implementation of these functions in a separate file would be icing on the cake. Haha...I'd just like to write and simulate the test benches in C.
It is a VHDL requirement that either you write the procedures in the process (as #MortenZdk suggests) or you pass all the IO to them.
My preference is to put my procedures only in packages, so I use the pass-all-IO approach. To simplify what is passed, I use records. If you reduce it to one record, it must be of mode inout, and the elements of the record require resolution functions.
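As a concrete illustration, here is a minimal sketch of that approach (the names tb_pkg, dut_io_t, din and dout are invented for the example): the procedure lives in a package and receives all its IO through one record parameter. Because std_logic elements are resolved, several processes can drive the single inout record signal.
library ieee;
use ieee.std_logic_1164.all;
package tb_pkg is
  -- All test bench IO gathered in one record with resolved elements
  type dut_io_t is record
    rst  : std_logic;
    din  : std_logic_vector(7 downto 0);
    dout : std_logic_vector(7 downto 0);
  end record;
  -- All IO is passed through the record argument, so the procedure
  -- can live in a package instead of inside a process
  procedure procReset(signal io : inout dut_io_t; constant duration : in time);
end package;
package body tb_pkg is
  procedure procReset(signal io : inout dut_io_t; constant duration : in time) is
  begin
    io.rst <= '1';
    wait for duration;
    io.rst <= '0';
  end procedure;
end package body;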
For more ideas on this approach, go to: http://www.synthworks.com/papers/ and see the papers titled:
"Accelerating Verification Through Pre-Use ..." (near the bottom) and
" VHDL Testbench Techniques that Leapfrog SystemVerilog" (at the top)
Another key aspect is to use a separate process for each independent interface. This way stimulus can be generated concurrently for different interfaces. This is also illustrated in the papers.
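For example (a sketch with illustrative process names), each independent interface gets its own stimulus process, and the processes run concurrently:
uart_stim : process
begin
  -- Generate UART stimulus here, independently of the SPI traffic
  wait;
end process;
spi_stim : process
begin
  -- Generate SPI stimulus here, concurrently with the UART process
  wait;
end process;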
Separating test bench code into manageable procedures is possible, but maybe the compiler complained because a procedure tried to access signals that were not in scope? If a procedure is to control a signal that is not in scope, then the signal can be given as an argument to the procedure, as shown for the procReset example below.
A test bench structure, with multiple levels for easier maintenance, is shown
below:
--==========================================================
-- Reusable procedures
-- Reset generation
procedure procReset(signal rst : out std_logic; ...) is
...
--==========================================================
-- Main test control procedure in test bench
process is
  ------------------------------------------------------------
  -- General control and status
  -- Reset device under test and related test bench modules
  procedure genReset is
  begin
    procReset(rst, 100 ns); -- procReset declared elsewhere
    -- Other code as required for complete reset
  end procedure;
  ------------------------------------------------------------
  -- Test cases
  procedure testClockSignals is
  begin
    genReset; -- Apply reset to avoid test case interdependency
    -- Test code here, and call genErr if mismatch detected
  end procedure;
  procedure testDigitalIO is
  begin
    genReset; -- Apply reset to avoid test case interdependency
    -- Test code here, and call genErr if mismatch detected
  end procedure;
  procedure testDACSignals is
  begin
    genReset; -- Apply reset to avoid test case interdependency
    -- Test code here, and call genErr if mismatch detected
  end procedure;
begin
  ------------------------------------------------------------
  -- Run test cases
  testClockSignals;
  testDigitalIO;
  testDACSignals;
  -- End of simulation
  std.env.stop(0);
  wait;
end process;
There are several levels in the structure:
Run test cases: Where the procedure for each test case is called. It is thereby possible to comment out one or more of the test cases during development and debugging.
Test cases: The test case code itself, written as separate and independent procedures. Interdependence between runs of the different test cases is avoided by resetting the device under test and related test bench support modules (using the genReset procedure).
General control and status: Reusable test bench specific procedures, for example reset of the device under test and test bench support modules.
Reusable procedures: These do not control or use test bench signals directly, but only through procedure arguments. These procedures may be located in packages (other files) for reuse in other test benches.
The test bench file may still be quite a number of lines, since all the test case code still has to be in the same file with the above approach, if that code needs direct access to test bench signals in order to control or check signal values. If signal values can be passed to the test case procedures through arguments, as done for the procReset call, then it is possible to move the test case code to another package.
If you have lower level testbenches for each block, then you can make use of them at the top level.
By making the key lower level test elements entities in their own right, you can compose them into higher level test items which are often just a small shim to convert the pin-level data into the test-level data you were originally using.
For example, in an image processing FPGA, you would have some image-sourcing and data-checking code to check out the algorithmic parts. These could be used as is, or with some wrapping to provide the data to the top-level FPGA pins, and then decode the pin outputs back to the format that the original checking code requires.
The register setup code, which was no doubt tested at the lower level, can be wrapped in some more code which wiggles the FPGA pins appropriately and interprets the pin-wiggling results.
The combination of the two sets of code allows you to check the end-to-end function of the image processing pipeline and the register configuration of that pipeline.
Related
I am currently developing my own programming language. The codebase (in Lua) is composed of several modules, as follows:
The first, error.lua, has no dependencies;
lexer.lua depends only on error.lua;
prototypes.lua also has no dependencies;
parser.lua, instead, depends on all the modules above;
interpreter.lua is the fulcrum of the whole codebase. It depends on error.lua, parser.lua, and memory.lua;
memory.lua depends on functions.lua;
finally, functions.lua depends on memory.lua and interpreter.lua. It is required from inside memory.lua, so we can say that memory.lua also depends on interpreter.lua.
With "A depends on B" I mean that the functions declared in A need those declared in B.
The real problem, though, is when A depends on B which depends on A, which, as you can understand from the list above, happens quite frequently in my code.
To give a concrete example of my problem, here's what interpreter.lua looks like:
--first, I require the modules that DON'T depend on interpreter.lua
local parser, Error = table.unpack(require("parser"))
--(since error.lua is needed in the lexer, parser and interpreter modules,
--I only actually require it once in lexer.lua and then pass its result around)
--Then, I should require memory.lua. But since memory.lua and
--functions.lua need some functions from interpreter.lua to work, I just
--forward declare the variables needed from those modules and then the functions themselves:
--forward declaration
local globals, new_memory, my_nil, interpret_statement
--functions I need to declare before requiring memory.lua
local function interpret_block()
--uses interpret_statement and new_memory
end
local function interpret_expression()
--uses new_memory, Error and my_nil
end
--Now I can safely require memory.lua:
globals, new_memory, my_nil = require("memory")(interpret_block, interpret_expression)
--(I'll explain why it returns a function to call later)
--Then I have to fulfill the forward declaration of interpret_statement:
function interpret_statement()
--uses interpret_expression, new_memory and Error
end
--finally, the result is a function
return function()
--uses parser, new_function and globals
end
The memory.lua module returns a function so that it can receive interpret_block and interpret_expression as arguments, like this:
--memory.lua
return function(interpret_block, interpret_expression)
--declaration of globals, new_memory, my_nil
return globals, new_memory, my_nil
end
Now, I got the idea of the forward declarations here, and that of functions-as-modules (like in memory.lua, to pass some functions from the requiring module to the required module) here. They're all great ideas, and I must say that they work well. But you pay in readability.
In fact, breaking the code into smaller pieces this time made my work harder than it would have been had I coded everything in a single file, which is impossible for me because it's over 1000 lines of code and I'm coding from a smartphone.
The feeling I have is that of working with spaghetti code, only on a larger scale.
So how could I solve the problem of my code being hard to understand because some modules need each other to work (without making all the variables global, of course)? How would programmers in other languages solve this problem? How should I reorganize my modules? Are there any standard rules for using Lua modules that could also help me with this problem?
If we look at your Lua files as a directed graph, where an edge points from a dependency to its usage, the goal is to modify your graph to be a tree or forest, as you intend to get rid of the cycles.
A cycle is a set of nodes which, traversed in the direction of the edges, can reach the starting node.
Now, the question is how to get rid of cycles?
The answer looks like this:
Let's consider node N and let {D1, D2, ..., Dm} be its direct dependencies. If there is no Di in that set that depends on N either directly or indirectly, then you can leave N as it is. In that case, the set of problematic dependencies is empty: {}
However, what if you have a non-empty set, like this: {PD1, ..., PDk}?
You then need to analyze each PDi (for i between 1 and k) along with N, and see which subset of each PDi does not depend on N, and which subset of N does not depend on any PDi. This way you can split N into N_base and N, and each PDi into PDi_base and PDi: N depends on N_base, just like all the PDi do, and each PDi depends on PDi_base along with N_base.
This approach minimizes cycles in the dependency graph. However, it is quite possible that a set of functions {f1, ..., fl} exists which cannot be migrated into a _base module as discussed, due to its dependencies, and there are still cycles. In that case you need to give the group in question a name, create a module for it, and migrate all those functions into that module.
I wrote a VHDL Testbench which contains the following :
Lots of signal declarations
UUT instantiations / port maps
A huge number of one-line concurrent assignments
Various small processes
One main (big) process which actually stimulates the UUT.
Everything is fine except the fact that I want to have two distinct types of stimulation (let's say a simple stimulus and a more complex one), so what I did is create two testbenches which have everything in common except the main big process.
But I don't find that really convenient, since I always need to update both when, for example, I make a change to the UUT port map. Not cool.
I don't really want to merge my two main processes because it would look like hell, and I can't have the two processes declared concurrently in the same architecture (I might end up with a very long file, and I don't like that they could theoretically access the same signals).
So I would really like to keep a "distinct files" approach but only for that specific process. Is there a way out of this or am I doomed?
This seems like an example where using multiple architectures of the same entity would help. You have a file along the lines of:
entity TestBench is
end TestBench;
architecture SimpleTest of TestBench is
-- You might have a component declaration for the UUT here
begin
-- Test bench code here
end SimpleTest;
You can easily add another architecture. You can have architectures in separate files. You can also use direct entity instantiation to avoid the component declaration for the UUT (halving the work required if the UUT changes):
architecture AnotherTest of TestBench is
begin
-- Test bench code here
UUT : entity work.MyDesign (Behavioral)
port map (
-- Port map as usual
);
end AnotherTest;
This doesn't save having duplicate code, but at least it removes one of the port map lists.
Another point, if you have a lot of signals in your UUT port map, is that this can be easier if you make more of the signals into vectors. For example, you might have lots of serial outputs of the same type going to different chips on the board. I have seen lots of people name these like SPI_CS_SENSORS, SPI_CS_CPU, SPI_CS_FRONT_PANEL, etc. I find it makes the VHDL a lot more manageable if these are combined into SPI_CS (2 downto 0), with the mapping of which signal goes to which device specified by the circuit diagram. I suppose this is just preference, but maybe this sort of approach could help if you have really huge port lists.
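A sketch of the idea, using the port names from the example above:
-- Before: one scalar port per chip select
SPI_CS_SENSORS     : out std_logic;
SPI_CS_CPU         : out std_logic;
SPI_CS_FRONT_PANEL : out std_logic;
-- After: a single vector port; the bit-to-device mapping is
-- documented against the circuit diagram
SPI_CS : out std_logic_vector(2 downto 0);
-- SPI_CS(0) => sensors, SPI_CS(1) => CPU, SPI_CS(2) => front panel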
Using a TestControl entity
A more sophisticated approach would involve using a test control entity to implement all your stimulus. At the simplest level, this would have as ports all of the signals from the UUT you are interested in. A more sophisticated test bench would have a test control entity with interfaces that can control bus functional models that contain the actual pin wiggling required to exercise your design. You can have one file declaring this entity, say TestControl_Entity.vhd:
entity TestControl is
  port (
    clk       : out std_logic;
    UUTInput  : out std_logic;
    UUTOutput : in  std_logic
  );
end TestControl;
Then you have one or more architecture files, for example TestControl_SimpleTest.vhd:
architecture SimpleTest of TestControl is
begin
-- Stimulus for simple test
end SimpleTest;
Your top level test bench would then look something like:
entity TestBench is
end TestBench;
architecture Behavioral of TestBench is
  signal clk : std_logic;
  signal a   : std_logic;
  signal b   : std_logic;
begin
  -- Common processes like clock generation could go here
  UUT : entity work.MyDesign (Behavioral)
    port map (
      clk => clk,
      a   => a,
      b   => b
    );
  TestControl_inst : entity work.TestControl (SimpleTest)
    port map (
      clk       => clk,
      UUTInput  => a,
      UUTOutput => b
    );
end Behavioral;
You can now change the test by changing the architecture selected for TestControl.
Using configurations
If you have a lot of different tests, you can use configurations to make it easier to select them. To do this, you first need to make the test control entity instantiation use a component declaration as opposed to direct instantiation. Then, at the end of each test control architecture file, create the configuration:
use work.all;
configuration Config_SimpleTest of TestBench is
  for Behavioral
    for TestControl_inst : TestControl
      use entity work.TestControl (SimpleTest);
    end for;
  end for;
end Config_SimpleTest;
Now when you want to simulate, you simulate a configuration, so instead of a command like sim TestBench, you would run something like sim work.Config_SimpleTest. This makes it easier to manage test benches with a large number of different tests, because you don't have to edit any files in order to run them.
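For reference, the test bench side might then look something like this (a sketch consistent with the TestControl entity above; note that no architecture is named at the instantiation, since the configuration performs the binding):
architecture Behavioral of TestBench is
  -- Component declaration: the configuration decides which
  -- TestControl architecture binds to the instance below
  component TestControl
    port (
      clk       : out std_logic;
      UUTInput  : out std_logic;
      UUTOutput : in  std_logic
    );
  end component;
  signal clk, a, b : std_logic;
begin
  TestControl_inst : TestControl
    port map (
      clk       => clk,
      UUTInput  => a,
      UUTOutput => b
    );
end Behavioral;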
A generic can be added to the test bench entity, to control if simple or
complex testing is done, like:
entity tb is
  generic (
    test : positive := 1); -- 1: Simple, 2: Complex
end entity;
library ieee;
use ieee.std_logic_1164.all;
architecture syn of tb is
  -- Superset of declarations for simple and complex testing
begin
  simple_g : if test = 1 generate
    process is -- Simple test process
    begin
      -- ... Simple testing
      wait;
    end process;
  end generate;
  complex_g : if test = 2 generate
    process is -- Complex test process
    begin
      -- ... Complex testing
      wait;
    end process;
  end generate;
end architecture;
The drawback is that declarations can't be made conditional, so the
declarations must be a superset of the signals and other controls for both
simple and complex testing.
The simulator can control the generic value through options, for example -G
for generic control in ModelSim simulator. It is thereby possible to compile
once, and select simple or complex testing at runtime.
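For example, with ModelSim a command along the lines of vsim -Gtest=2 work.tb would select the complex test of the entity above at load time, without recompiling.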
I am trying to create a module for a carry select adder in Verilog. Everything works fine except the following portion, which causes a compilation error.
module csa(a,b,s,cout);
input[15:0] a,b;
output [15:0] s;
output cout;
wire zero_c1, zero_c2,zero_c3,zero_c4,zero_c5;
wire one_c1, one_c2,one_c3,one_c4,one_c5;
wire temp_c1,temp_c2,temp_c3,temp_c4,temp_c5;
wire [15:0] s_zero, s_one;
initial
begin
fork
fa(s[0], temp_c1,a[0],b[0],0);
fa_one(s_zero[1],s_one[1],zero_c1,one_c1,a[1],b[1]);
fa_two(s_zero[3:2],s_one[3:2],zero_c2,one_c2,a[3:2],b[3:2]);
fa_three(s_zero[6:4],s_one[6:4],zero_c3,one_c3,a[6:4],b[6:4]);
fa_four(s_zero[10:7],s_one[10:7],zero_c4,one_c4,a[10:7],b[10:7]);
fa_five(s_zero[15:11],s_one[15:11],zero_c5,one_c5,a[15:11],b[15:11]);
join
end
When I try to compile it, it says -
the module "fa", "fa_one" are not a task or void function
I deleted the "initial" statement and now it says -
Syntax error near "fork", expecting "endmodule"
I just want to run the code between fork and join in parallel. I have also confirmed that the modules fa and fa_one work fine.
I would appreciate it if anyone could help me by pointing out what I am doing wrong here. Thanks.
Verilog modules are not run or executed but instantiated; they represent physical blocks of hardware.
Everything is in parallel unless you have made an effort to time-share pieces of hardware. For example, you might write an ALU core which exists only once, but use a program ROM to tell it which instruction to process every clock cycle.
Inside your modules you can have combinatorial code and sequential code.
Combinatorial logic will simulate in 0 time, but values will actually take some time to propagate through it when placed on real devices.
If this propagation delay is not thought about and very large blocks of logic are created, you will struggle to close timing in synthesis, because the settling time through the logic is greater than the clock period of the registers on either side of the combinatorial logic.
Sequential logic implies that the results are held in flip-flops, which only update on clock edges. This means chains of sequential logic can take many clock cycles for data to propagate.
When pipelining a processor, you break individual sections up with flip-flops, giving each section a full clock cycle for combinatorial propagation, at the expense of taking several clock cycles to calculate a single result.
To correct your example you would just have:
module csa(
input [15:0] a,
input [15:0] b,
output [15:0] s,
output cout
);
wire zero_c1, zero_c2,zero_c3,zero_c4,zero_c5;
wire one_c1, one_c2,one_c3,one_c4,one_c5;
wire temp_c1,temp_c2,temp_c3,temp_c4,temp_c5;
wire [15:0] s_zero, s_one;
fa ufa(s[0], temp_c1,a[0],b[0],0);
fa_one ufa_one(s_zero[1],s_one[1],zero_c1,one_c1,a[1],b[1]);
fa_two ufa_two(s_zero[3:2],s_one[3:2],zero_c2,one_c2,a[3:2],b[3:2]);
fa_three ufa_three(s_zero[6:4],s_one[6:4],zero_c3,one_c3,a[6:4],b[6:4]);
fa_four ufa_four(s_zero[10:7],s_one[10:7],zero_c4,one_c4,a[10:7],b[10:7]);
fa_five ufa_five(s_zero[15:11],s_one[15:11],zero_c5,one_c5,a[15:11],b[15:11]);
endmodule
NB: it is module_name #(parameters) instance_name ( ports );
fork is used to run procedural statements within a module in parallel. Separate module instances always run in parallel.
Child modules are instantiated directly within their parent module, not within an initial, begin, or fork which are used for procedural statements. So you can remove the initial, begin, fork, join, and end, and add an endmodule at the end.
I'm having difficulty understanding how I could utilize a sequential logic entity in the process of another. This process is a state machine which, on each clock signal, either reads values from the input or performs calculations. These calculations take many iterations to complete. However, each iteration is supposed to utilize a sub-entity, which is defined using the same principles as the above one (two-state state machine, clock-based iterations), to obtain some results needed in the same iteration.
As I see it, I have two options:
implementing the subentity in a separate process within the main entity and finding a way to halt the main process and sync it with the subentity execution - this would mean using the clock signal of the main entity
implementing the subentity within the process of the main entity (basically something like a function call) and finding a way to halt the main process until subentity execution completes - this seems to me hardly doable using the main clock signal
Neither of them seems very appealing, and both seem rather complex, so I'm asking for some experienced insight and clarification. I really hope that there is a more conventional way that I'm missing.
"Entity" is an unfortunate choice of word here, as it suggests a VHDL Entity which may or may not be what you want.
You are thinking along roughly the right lines; however, it is a little unclear what you mean by "appealing", so your goals are unclear and that makes it difficult to help.
To take your two approaches separately :
(1) Separate processes are a valid approach to dividing up tasks. They will naturally operate in parallel. In a synchronous design (best practice, safest and simplest - not universal but you need a compelling reason to do anything else) they will normally both be clocked by the same system clock.
When you need to synchronise them, you can, using extra "handshaking" signals. Typically your main SM would start the subsystem, wait until the subsystem acknowledged, wait again until the subsystem was done, and use the result.
main_sm : process(clk)
begin
  if rising_edge(clk) then
    case state is
      ...
      when start_op =>
        subsystem_start <= '1';
        if subsystem_busy = '1' then
          state <= wait_subsystem;
        end if;
      when wait_subsystem =>
        subsystem_start <= '0';
        if subsystem_busy = '0' then
          state <= use_result;
        end if;
      when use_result => -- carry on processing
        ...
    end case;
  end if;
end process main_sm;
It should be clear how to write the subsystem to match...
This is most useful where the subsystem processing takes a large, variable or unknown time to complete - perhaps sending characters to a UART, or a serial divider. With care, it can also allow several top level processes to access the subsystem to save hardware (obviously the subsystem handshaking logic only responds to one process at a time!)
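For completeness, a minimal sketch of what the matching subsystem side might look like (the state names, work_done, and other identifiers are illustrative):
subsystem_sm : process(clk)
begin
  if rising_edge(clk) then
    case sub_state is
      when idle =>
        subsystem_busy <= '0';
        if subsystem_start = '1' then
          subsystem_busy <= '1'; -- acknowledge the start request
          sub_state      <= working;
        end if;
      when working =>
        -- Multi-cycle work happens here; when finished:
        if work_done = '1' then
          sub_state <= idle;     -- busy deasserts back in idle
        end if;
    end case;
  end if;
end process subsystem_sm;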
(2) If the sub-entity is to be implemented in the process, it should be written as a subprogram, i.e. as you speculate, a procedure or function. If it is declared local to the process it has access to that process's environment; otherwise you can pass it parameters. This is simplest when the subprogram can complete within the current clock cycle; often you can structure the code so that it can.
Try the following in your synthesis tool:
main_sm : process(clk)
  procedure wait_here (level : std_logic; nextstate : state_type) is
  begin
    subsystem_start <= level;
    if subsystem_busy = level then
      state <= nextstate;
    end if;
  end wait_here;
begin
  ...
  when start_op =>
    wait_here('1', wait_subsystem);
  when wait_subsystem =>
    wait_here('0', use_result);
This rewrite of the handshaking above ought to work and in some synth tools it will, but others may not provide good synthesis support for subprograms.
You can use subprograms spanning multiple clock cycles in processes in simulation; the trick is to eliminate the sensitivity list and use
wait until rising_edge(clk);
instead. This is also potentially synthesisable, and can be used e.g. in a loop in a procedure. However some synthesis tools reject it, and Xilinx XST for one is actually getting worse, rather than better, in support for it.
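As a minimal simulation-oriented sketch of that trick (clk and serial_out are assumed to be signals declared in the enclosing architecture):
stim : process
  -- Declared inside the process, so the procedure may drive signals
  -- the process can drive. It spans multiple clock cycles by waiting
  -- on the clock instead of relying on a sensitivity list.
  procedure send_byte(data : in std_logic_vector(7 downto 0)) is
  begin
    for i in data'range loop
      serial_out <= data(i);
      wait until rising_edge(clk); -- one bit per clock cycle
    end loop;
  end procedure;
begin
  send_byte(x"A5");
  send_byte(x"3C");
  wait; -- end of stimulus
end process;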
I have been told to use the 'when' statement to make a multiplexer, and not to use the 'if' statement, as it will cause timing errors...
I don't understand this...
So what is the difference between 'if' and 'when'? And do they map to the same thing in hardware?
OK, let's first discuss some points on the difference between the if and when statements:
Both are called Dataflow Design Elements.
when statement
concurrent statement
not used in a process; used only in an architecture, since a process is sequential execution
if statement
sequential statement
used in a process, as it is a sequential statement; not used outside a process
And as you know, a multiplexer is a component that doesn't need a process block, as its behavior doesn't change with changes to its input, so it is written outside a process; that is why you write it using the when statement, which is a concurrent statement. If you wrote it with an if statement, timing errors might occur. Also, all the references, and the Xilinx help too (if you are using Xilinx), write the multiplexer block using the when statement, not the if statement.
Reference: Digital Design Principles & Practices, John F. Wakerly, 3rd Edition
See these:
VHDL concurrent statements, which includes when.
VHDL sequential statements, which includes if.
Basically, if is sequential, and when is concurrent. They do not map to the same thing in hardware... This page describes, at the bottom, some of the special considerations needed to synthesize an if statement.
Both coding styles are totally valid.
Let's recall some basics. Starting from HDL, synthesis is done in two main steps:
first, the VHDL is analyzed in order to detect RTL templates (consisting of RTL elements: flip-flops, arithmetic expressions, multiplexers, control logic). We say that these elements are "inferred" (i.e. you must code using the right template to get what you originally intended; you must imagine how these elements are connected before coding).
The second step is the real logic synthesis, which takes the parameters of a particular target technology into account (types of gates available, timing, area, power).
These two steps clearly separate RTL functional needs (steering logic, computations) from technology contingencies (timing etc.).
Let's come back to the first step (RTL):
Concerning multiplexers, several coding styles are possible:
using a concurrent assignment:
y <= a1 when cond1 else a2 when cond2 else a3;
using an if statement within a process:
process(a1, a2, a3, cond1, cond2)
begin
  if cond1 then
    y <= a1;
  elsif cond2 then
    y <= a2;
  else
    y <= a3;
  end if;
end process;
using another concurrent assignment form, suitable for generic descriptions: if sel is an integer and muxin an array of signals, then:
muxout <= muxin(sel); --will infer a mux
Note that these three coding styles always work. Note also that the first two are "a bit more" than a simple multiplexer, as the coding style forces priority encoding (if/elsif, when/else), which is not the case for a simple equation-based multiplexer, which is truly symmetric.
using a case statement:
process(a1, a2, a3, cond1, cond2)
  variable cond : std_logic_vector(1 downto 0);
begin
  cond := cond2 & cond1;
  case cond is
    when "01"   => y <= a1;
    when "10"   => y <= a2;
    when others => y <= a3;
  end case;
end process;
using a select statement (in our example, two concurrent assignments are needed):
sel <= cond2 & cond1;
WITH sel SELECT
  y <= a1 WHEN "01",
       a2 WHEN "10",
       a3 WHEN OTHERS;
A final remark is about the rise of abstraction, even for RTL design: synthesizers are now really mature. Have a look at Jiri Gaisler's coding style for the LEON2 open-source processor, for example (see here). He advocates an approach that is very different from that of the classical books, yet perfectly valid.
You should always understand what the RTL synthesizer will infer.
By contrast, behavioral synthesis allows you to (partially) forget what the synthesizer will infer. But that's another story.