How to support architecture reuse with minor differences - hardware

I need some hints about VHDL. I'm pretty new to it so be kind.
I've written a module and I've registered the output (it's a Carry Save Adder - CSA). I've used this module in some part of my design.
Now I have to use the same module, but without the output register (I need to use it in a combinational way). I know that I could copy and paste the VHDL code into a different entity, but in my opinion that's a little bit inelegant.
I thought to use a generic parameter, but I don't know where to start. Can someone give me a hint?

Yes, add a boolean generic (REG_OUTPUT in the code below) to the entity. If it is true, implement the registered output. If it's false, connect the output directly to what would have been the input to the register. You can then instantiate the module in the two different locations with the generic set differently.
This code is completely untested (not compiled even), but hopefully you get the idea.
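-- REG_OUTPUT is assumed to be a boolean generic declared on the entity
GEN_REG : if REG_OUTPUT = true generate
  p_output : process is
  begin -- process p_output
    wait until clk'event and clk = '1';
    Q <= Q_internal;  -- registered copy of the internal result
  end process p_output;
end generate GEN_REG;
GEN_WIRE : if REG_OUTPUT = false generate
  Q <= Q_internal;    -- plain wire, no register
end generate GEN_WIRE;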

Lua Spaghetti Modules

I am currently developing my own programming language. The codebase (in Lua) is composed of several modules, as follows:
The first, error.lua, has no dependencies;
lexer.lua depends only on error.lua;
prototypes.lua also has no dependencies;
parser.lua, instead, depends on all the modules above;
interpreter.lua is the fulcrum of the whole codebase. It depends on error.lua, parser.lua, and memory.lua;
memory.lua depends on functions.lua;
finally, functions.lua depends on memory.lua and interpreter.lua. It is required from inside memory.lua, so we can say that memory.lua also depends on interpreter.lua.
By "A depends on B" I mean that the functions declared in A need those declared in B.
The real problem, though, is when A depends on B which depends on A, which, as you can see from the list above, happens quite frequently in my code.
To give a concrete example of my problem, here's what interpreter.lua looks like:
--first, I require the modules that DON'T depend on interpreter.lua
local parser, Error = table.unpack(require("parser"))
--(since error.lua is needed in the lexer, parser and interpreter modules,
--I only actually require it once in lexer.lua and then pass its result around)
--Then, I should require memory.lua. But since memory.lua and
--functions.lua need some functions from interpreter.lua to work, I just
--forward declare the variables needed by those functions and then those functions themselves:
--forward declaration
local globals, new_memory, my_nil, interpret_statement
--functions I need to declare before requiring memory.lua
local function interpret_block()
  --uses interpret_statement and new_memory
end
local function interpret_expression()
  --uses new_memory, Error and my_nil
end
--Now I can safely require memory.lua:
globals, new_memory, my_nil = require("memory")(interpret_block, interpret_expression)
--(I'll explain why it returns a function to call later)
--Then I have to fulfill the forward declaration of interpret_statement:
function interpret_statement()
  --uses interpret_expression, new_memory and Error
end
--finally, the result is a function
return function()
  --uses parser, new_function and globals
end
The memory.lua module returns a function so that it can receive interpret_block and interpret_expression as arguments, like this:
--memory.lua
return function(interpret_block, interpret_expression)
--declaration of globals, new_memory, my_nil
return globals, new_memory, my_nil
end
Now, I got the idea of the forward declarations here and that of the functions-as-modules (like in memory.lua, to pass some functions from the requiring module to the required module) here. They're all great ideas, and I must say that they work well. But you pay in readability.
In fact, breaking the code into smaller pieces made my work harder this time than it would have been if I had coded everything in a single file, which is impossible for me because it's over 1000 lines of code and I'm coding on a smartphone.
The feeling I have is that of working with spaghetti code, only on a larger scale.
So how could I solve the problem of my code being hard to understand because some modules need each other to work (without making all the variables global, of course)? How would programmers in other languages solve this problem? How should I reorganize my modules? Are there any standard rules for using Lua modules that could also help me with this problem?
If we look at your Lua files as a directed graph, where an edge points from a dependency to its usage, the goal is to modify your graph to be a tree or forest (more generally, any graph without cycles), as you intend to get rid of the cycles.
A cycle is a set of nodes from which, following the direction of the edges, you can get back to the starting node.
Now, the question is how to get rid of cycles?
The answer looks like this:
Let's consider node N and let's consider {D1, D2, ..., Dm} as its direct dependencies. If there is no Di in that set that depends on N either directly or indirectly, then you can leave N as it is. In that case, the set of problematic dependencies looks like this: {}
However, what if you have a non-empty set, like this: {PD1, ..., PDk} ?
You then need to analyze PDi for i between 1 and k along with N and see what is the subset in each PDi that does not depend on N and what is the subset of N which does not depend on any PDi. This way you can define N_base and N, PDi_base and PDi. N depends on N_base, just like all PDi elements and PDi depends on PDi_base along with N_base.
This approach minimizes cycles in the dependency graph. However, it is quite possible that a set of functions {f1, ..., fl} exists in this group which cannot be migrated into a _base module as discussed, due to their dependencies, so that cycles remain. In this case you need to give a name to the group in question, create a module for it and migrate all of those functions into that module.
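To find the groups of modules that need this treatment, the graph can be checked mechanically. Here is a minimal sketch in Python (used only because it is easy to run, the same logic ports directly to Lua). The dependency table is copied from the list in the question; the function names reachable and cyclic_groups are made up for this illustration.

```python
# Module graph taken from the question: module -> modules it requires.
DEPENDS_ON = {
    "error": [],
    "lexer": ["error"],
    "prototypes": [],
    "parser": ["error", "lexer", "prototypes"],
    "interpreter": ["error", "parser", "memory"],
    "memory": ["functions"],
    "functions": ["memory", "interpreter"],
}


def reachable(graph, start):
    """Return every module reachable from `start` by following dependencies."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def cyclic_groups(graph):
    """Group modules that depend on each other, directly or indirectly."""
    reach = {name: reachable(graph, name) for name in graph}
    groups = []
    assigned = set()
    for name in sorted(graph):
        if name in assigned or name not in reach[name]:
            continue  # already grouped, or not part of any cycle
        # Two modules share a cycle when each one can reach the other.
        group = sorted(m for m in graph if m in reach[name] and name in reach[m])
        groups.append(group)
        assigned.update(group)
    return groups


print(cyclic_groups(DEPENDS_ON))  # [['functions', 'interpreter', 'memory']]
```

For the graph in the question this reports a single problematic group, so only interpreter.lua, memory.lua and functions.lua need to be split into _base parts or merged; the lexer, parser, error and prototypes modules can stay as they are.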

vhdl how to use an entity within a process

I'm having difficulty understanding how I could use a sequential logic entity within the process of another. This process is a state machine which, on each clock signal, either reads values from the input or performs calculations. These calculations take many iterations to complete. However, each iteration is supposed to use a sub-entity, which is defined using the same principles as the one above (two-state state machine, clock-based iterations), to obtain some results needed in the same iteration.
As I see it, I have two options:
implementing the subentity in a separate process within the main entity and finding a way to halt the main process and sync it with the subentity execution - this would mean using the clock signal of the main entity
implementing the subentity within the process of the main entity (basically something like a function call) and finding a way to halt the main process until subentity execution completes - this seems to me hardly doable using the main clock signal
Neither of them seems very appealing, and both look rather complex, so I'm asking for some experienced insight and clarification. I really hope that there is a more conventional way that I'm missing.
"Entity" is an unfortunate choice of word here, as it suggests a VHDL Entity which may or may not be what you want.
You are thinking along roughly the right lines however, but it is a little unclear what you mean by "appealing"; so your goals are unclear and that makes it difficult to help.
To take your two approaches separately :
(1) Separate processes are a valid approach to dividing up tasks. They will naturally operate in parallel. In a synchronous design (best practice, safest and simplest - not universal but you need a compelling reason to do anything else) they will normally both be clocked by the same system clock.
When you need to synchronise them, you can, using extra "handshaking" signals. Typically your main SM would start the subsystem, wait until the subsystem acknowledged, wait again until the subsystem was done, and use the result.
main_sm : process(clk)
begin
  if rising_edge(clk) then
    case state is
      ...
      when start_op =>
        subsystem_start <= '1';
        if subsystem_busy = '1' then  -- subsystem has acknowledged
          state <= wait_subsystem;
        end if;
      when wait_subsystem =>
        subsystem_start <= '0';
        if subsystem_busy = '0' then  -- subsystem has finished
          state <= use_result;
        end if;
      when use_result => -- carry on processing
      ...
    end case;
  end if;
end process main_sm;
It should be clear how to write the subsystem to match...
This is most useful where the subsystem processing takes a large, variable or unknown time to complete - perhaps sending characters to a UART, or a serial divider. With care, it can also allow several top level processes to access the subsystem to save hardware (obviously the subsystem handshaking logic only responds to one process at a time!)
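To see the timing of this handshake, here is a rough cycle-by-cycle model in Python (a simulation sketch, not VHDL). The main state machine follows the case statement above. The subsystem side is purely an assumption for illustration: it raises busy one clock after seeing start, stays busy for work_cycles clocks, then drops busy. The names simulate, work_cycles and count are hypothetical. Every signal is treated as registered, so both processes read the old values and their assignments only become visible at the next clock.

```python
def simulate(work_cycles=3, max_cycles=20):
    """Clock both state machines until the main one reaches use_result."""
    # Current values of the registered signals shared by the two processes.
    sig = {"state": "start_op", "start": 0, "busy": 0, "count": 0}
    trace = []
    for cycle in range(max_cycles):
        trace.append((cycle, sig["state"], sig["start"], sig["busy"]))
        if sig["state"] == "use_result":
            break
        nxt = dict(sig)  # assignments are scheduled here, visible next edge

        # main_sm process: mirrors the case statement in the answer.
        if sig["state"] == "start_op":
            nxt["start"] = 1
            if sig["busy"] == 1:
                nxt["state"] = "wait_subsystem"
        elif sig["state"] == "wait_subsystem":
            nxt["start"] = 0
            if sig["busy"] == 0:
                nxt["state"] = "use_result"

        # Subsystem process (assumed behaviour): busy for `work_cycles` edges.
        # With fewer than 2 cycles of work it would see start still high and
        # retrigger, so this model assumes work_cycles >= 2.
        if sig["busy"] == 0:
            if sig["start"] == 1:
                nxt["busy"] = 1
                nxt["count"] = work_cycles
        elif sig["count"] > 1:
            nxt["count"] = sig["count"] - 1
        else:
            nxt["busy"] = 0

        sig = nxt
    return trace


for cycle, state, start, busy in simulate():
    print(cycle, state, start, busy)
```

With the default of three work cycles, the main machine sits in start_op for three clocks, in wait_subsystem for three more, and reaches use_result on cycle 6. Each direction of the handshake costs a clock of latency because both sides only see registered values.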
(2) If the sub-entity is to be implemented in the process, it should be written as a subprogram, i.e. as you speculate, a procedure or function. If it is declared local to the process it has access to that process's environment; otherwise you can pass it parameters. This is simplest when the subprogram can complete within the current clock cycle; often you can structure the code so that it can.
Try the following in your synthesis tool:
main_sm : process(clk)
  procedure wait_here (level : std_logic; nextstate : state_type) is
  begin
    subsystem_start <= level;
    if subsystem_busy = level then
      state <= nextstate;
    end if;
  end wait_here;
begin
  ...
  when start_op =>
    wait_here('1', wait_subsystem);
  when wait_subsystem =>
    wait_here('0', use_result);
This rewrite of the handshaking above ought to work and in some synth tools it will, but others may not provide good synthesis support for subprograms.
You can use subprograms spanning multiple clock cycles in processes in simulation; the trick is to eliminate the sensitivity list and use
wait until rising_edge(clk);
instead. This is also potentially synthesisable, and can be used e.g. in a loop in a procedure. However some synthesis tools reject it, and Xilinx XST for one is actually getting worse, rather than better, in support for it.

signal vs variable

VHDL provides two major object types to hold data, namely signal and variable, but I can't find anywhere that is clear on when to use one over the other. Can anyone shed some light on their strengths/limitations/scope/synthesis/situations in which using one would be better than the other?
Signals can be used to communicate values between processes. Variables cannot. There are shared variables which can in older compilers, but you really are asking for problems (with race conditions) if you do that - unless you use protected types which are a bit like classes. Then they are safe to use for communication, but not (as far as I know) synthesisable.
This fundamental restriction on communication comes from the way updates on signals and variables work.
The big distinction comes because variables update as soon as they are assigned to (with the := operator). Signals have an update scheduled when assigned to (with the <= operator) but the value that anyone sees when they read the signal will not change until some time passes.
(Aside: That amount of time could be as small as a delta cycle, which is the smallest amount of time in a VHDL simulator - no "real" time passes. Something like wait for 0 ps; causes the simulator to wait for the next delta cycle before continuing.)
If you need the same logic to feed into multiple flipflops a variable is a good way of factoring that logic into a single point, rather than copying/pasting code.
In terms of logic, within a clocked process, signals always infer a flipflop. Variables can be used for both combinatorial logic and inferring a flipflop. Sometimes both for the same variable. Some think this confusing, personally, I think it's fine:
process (clk)
  variable something : std_logic;
begin
  if rising_edge(clk) then
    if reset = '1' then
      something := '0';
    else
      output_b <= something or input_c; -- using the previous clock's value of 'something' infers a register
      something := input_a and input_b; -- comb. logic for a new value
      output_a <= something or input_c; -- which is used immediately, not registered here
    end if;
  end if;
end process;
One thing to watch when using variables: if a variable is read after it is written, no register output is used, so you can get long chains of logic which can lead to missing your fmax target.
One thing to watch when using signals (in clocked processes): they always infer a register, which adds latency.
As others have said signals get updated with their new value at the end of the time slice, but variables are updated immediately.
-- inside some process
-- varA = sigA = 0. sigB = 2
varA := sigB + 1; -- varA is now 3
sigC <= varA + 1; -- sigC will be 4
sigA <= sigB + 1; -- sigA will be 3
sigD <= sigA + 1; -- sigD will be 1 (original sigA + 1)
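If the update rules are hard to picture, the same four assignments can be mimicked in a few lines of Python (a toy model of one process run, not a real simulator). Variable assignments take effect on the spot; signal assignments are collected in a scheduled table, a name invented for this sketch, and are only applied once the process body has finished.

```python
def run_process(signals):
    """Execute the example process body once and return the updated signals."""
    scheduled = {}  # pending signal updates (the '<=' assignments)
    var_a = 0       # variable: ':=' takes effect immediately

    var_a = signals["sigB"] + 1              # varA := sigB + 1;
    scheduled["sigC"] = var_a + 1            # sigC <= varA + 1;  sees the new varA
    scheduled["sigA"] = signals["sigB"] + 1  # sigA <= sigB + 1;
    scheduled["sigD"] = signals["sigA"] + 1  # sigD <= sigA + 1;  still sees old sigA

    # The process suspends: scheduled values become visible only now.
    return {**signals, **scheduled}, var_a


after, var_a = run_process({"sigA": 0, "sigB": 2, "sigC": 0, "sigD": 0})
print(var_a, after)  # 3 {'sigA': 3, 'sigB': 2, 'sigC': 4, 'sigD': 1}
```

The last assignment reads signals, not scheduled, which is exactly why sigD ends up as 1 rather than 4.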
For hardware design, I use variables very infrequently. It's normally when I'm hacking in some feature that really needs the code to be re-factored, but I'm on a deadline. I avoid them because I find the mental model of working with signals and variables too different to live nicely in one piece of code. That's not to say it can't be done, but I think most RTL engineers avoid mixing... and you can't avoid signals.
Other points:
Signals have entity scoping. Variables are local to the process.
Both synthesize

What is the standard way to optimise mutual recursion in F#/Scala?

These languages do not support optimization of mutually recursive functions 'natively', so I guess it must be a trampoline or... heh... rewriting as a loop. Am I missing something?
UPDATE: It seems that I did lie about FSharp, but I just didn't see an example of mutual tail-calls while googling
First of all, F# supports mutually recursive functions natively, because it can benefit from the tailcall instruction that's available in the .NET IL (MSDN). However, this is a bit tricky and may not work on some alternative implementations of .NET (e.g. Compact Frameworks), so you may sometimes need to deal with this by hand.
In general, I think that there are a couple of ways to deal with it:
Trampoline - throw an exception when the recursion depth is too high and implement a top-level loop that handles the exception (the exception would carry information to resume the call). Instead of an exception you can also simply return a value specifying that the function should be called again.
Unwind using timer - when the recursion depth is too high, you create a timer and give it a callback that will be called by the timer after some very short time (the timer will continue the recursion, but the used stack will be dropped).
The same thing could be done using a global stack that stores the work that needs to be done. Instead of scheduling a timer, you would add a function to the stack. At the top-level, the program would pick functions from the stack and run them.
To give a specific example of the first technique, in F# you could write this:
type Result<'T> =
  | Done of 'T
  | Call of (unit -> Result<'T>)
let rec factorial acc n =
  if n = 0 then Done acc
  else Call(fun () -> factorial (acc * n) (n - 1))
This can be used for mutually recursive functions as well. The imperative loop would simply call the f function stored in Call(f) until it produces Done with the final result. I think this is probably the cleanest way to implement this.
I'm sure there are other sophisticated techniques for dealing with this problem, but those are the two I know about (and that I used).
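For mutual recursion specifically, the top-level loop can be sketched as follows. This is a Python illustration rather than F# or Scala, chosen because Python performs no tail-call optimisation at all, so the effect of the loop is easy to observe. The names is_even, is_odd and trampoline are made up for the example; a returned zero-argument function plays the role of Call, and any other value plays the role of Done.

```python
def is_even(n):
    # Return the answer, or a zero-argument function describing the next call.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


def trampoline(step):
    """Keep calling the returned thunks until a non-callable result appears."""
    while callable(step):
        step = step()
    return step


# 100000 alternating calls, far deeper than Python's default recursion limit.
print(trampoline(is_even(100000)))  # True
```

Each thunk returns to the loop before the next one runs, so the stack never grows beyond a couple of frames. This simple version assumes the final result is never itself callable; a tagged type like the Result above avoids that restriction.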
On Scala 2.8, scala.util.control.TailCalls:
import scala.util.control.TailCalls._
def isEven(xs: List[Int]): TailRec[Boolean] = if (xs.isEmpty)
done(true)
else
tailcall(isOdd(xs.tail))
def isOdd(xs: List[Int]): TailRec[Boolean] = if (xs.isEmpty)
done(false)
else
tailcall(isEven(xs.tail))
isEven((1 to 100000).toList).result
Just to have the code handy for when you Bing for F# mutual recursion:
let rec isOdd x =
  if x = 0 then false else isEven (x-1)
and isEven x =
  if x = 0 then true else isOdd (x-1)
printfn "%A" (isEven 10000000)
This will StackOverflow if you compile without tail calls (the default in "Debug" mode, which preserves stacks for easier debugging), but run just fine when compiled with tail calls (the default in "Release" mode). The compiler does tail calls by default (see the --tailcalls option), and .NET implementations on most platforms honor it.

'if' vs 'when' for making multiplexer

I have been told to use the 'when' statement to make a multiplexer, and not the 'if' statement, as it will cause timing errors...
I don't understand this...
So what is the difference between 'if' and 'when'? And do they map to the same thing in hardware?
OK, let's first discuss some points on the difference between the if and when statements:
Both are called Dataflow Design Elements.
when statement
concurrent statement
not used in process, used only in architecture as process is sequential execution
if statement
sequential statement
used in process as it is sequential statement, and not used outside the process
As you know, a multiplexer is a component that doesn't need a process block, as its behavior doesn't change with changing its input, so it will be outside a process, and you have to write it using the when statement, as that is a concurrent statement. If you wrote it with an if statement, timing errors may occur. Also, all the references, and the Xilinx help (if you are using Xilinx), write the multiplexer block using the when statement, not the if statement.
Reference: Digital Design Principles & Practices, John F. Wakerly, 3rd Edition
See these:
VHDL concurrent statements, which includes when.
VHDL sequential statements, which includes if.
Basically, if is sequential, and when is concurrent. They do not map to the same thing in hardware... This page describes, at the bottom, some of the special considerations needed to synthesize an if statement.
Both coding styles are totally valid.
Let's recall some elements. Starting from HDL, synthesis is done in two main steps :
first, the VHDL is analyzed in order to detect RTL templates (consisting of RTL elements: flip-flops, arithmetic expressions, multiplexers, control logic). We say that these elements are "inferred" (i.e. you must code using the right template to get what you wanted initially, and you must imagine how these elements are connected before coding).
The second step is real logic synthesis, which takes the parameters of a particular target technology into account (types of gates available, timing, area, power).
These two steps clearly separate RTL functional needs (steering logic, computations) from technology contingencies (timing etc).
Let's come back to the first step (RTL) :
Concerning multiplexers, several coding styles are possible :
using concurrent assignment (cond1 and cond2 are assumed to be std_logic here) :
y <= a1 when cond1 = '1' else a2 when cond2 = '1' else a3;
using if statement within a process :
process(a1,a2,a3,cond1,cond2)
begin
  if cond1 = '1' then      -- cond1 has priority over cond2
    y <= a1;
  elsif cond2 = '1' then
    y <= a2;
  else
    y <= a3;
  end if;
end process;
using another concurrent assignment form, suitable for generic descriptions : if sel is an integer and muxin an array of signals, then :
muxout <= muxin(sel); --will infer a mux
Note that these 3 coding styles always work. Note also that the first two are "a bit more" than a simple multiplexer, as the coding style forces the presence of priority encoding (if/elsif, when/else), which is not the case for a simple equation-based multiplexer, which is truly symmetric.
using a case statement
process(a1,a2,a3,cond1,cond2)
  variable cond : std_logic_vector(1 downto 0);
begin
  cond := cond2 & cond1;
  case cond is
    when "01" => y <= a1;
    when "10" => y <= a2;
    when others => y <= a3;
  end case;
end process;
using a select statement (in our example, two concurrent assignments needed) :
sel <= cond2 & cond1;
WITH sel SELECT
y <= a1 WHEN "01",
a2 WHEN "10",
a3 WHEN OTHERS;
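One consequence of the priority encoding is worth spelling out: as written, the if/elsif and when/else versions do not describe exactly the same function as the case and select versions. A quick truth-table check makes this visible. The sketch below is plain Python rather than VHDL, with mux_priority and mux_case as made-up names and 0/1 standing in for '0'/'1'.

```python
def mux_priority(a1, a2, a3, cond1, cond2):
    """Model of the when/else and if/elsif versions: cond1 wins over cond2."""
    if cond1:
        return a1
    elif cond2:
        return a2
    return a3


def mux_case(a1, a2, a3, cond1, cond2):
    """Model of the case and with/select versions, keyed on cond2 & cond1."""
    sel = (cond2, cond1)
    table = {(0, 1): a1, (1, 0): a2}
    return table.get(sel, a3)  # 'when others' covers "00" and "11"


# Print the full truth table: cond2, cond1, priority result, case result.
for cond2 in (0, 1):
    for cond1 in (0, 1):
        print(cond2, cond1,
              mux_priority("a1", "a2", "a3", cond1, cond2),
              mux_case("a1", "a2", "a3", cond1, cond2))
```

The two models agree for every input except cond1 = cond2 = 1, where the priority form selects a1 and the case form falls through to a3. If both conditions can be active at once, pick the style that matches the behaviour you actually want.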
A final remark concerns the rise of abstraction, even for RTL design: synthesizers are now really mature. Have a look at Jiri Gaisler's coding style for the LEON2 open source processor, for example (see here). He advocates a very different approach from the classical books, yet a perfectly valid one.
You should always understand what the RTL synthesizer will infer.
By contrast, behavioral synthesis allows you to forget (partially) what the synthesizer will infer. But that's another story.