what is the difference between synthesis and simulation (VHDL) - process

Im am working on a VHDL project that includes an fsm.
Some states change according to a counter. It dit not work until i put 'clk' in the sensitivity list, besides the current state and the input.
I know that during synthesis, the sensitivity not used, or discarded. But how can that have such an impact on the result in the simulation? if a leave this 'clk', would the fsm perform as i want op an FPGA?
thanks,
David

This is the simple explanation:
The simulator uses the sensitivity list to figure out when it needs to run the process. The reason why the simulator needs hints to figure out when to run the process is because computer processors can only do one (or only a few in multicore systems) thing at a time and the processor will have to take turns running each part of your design. The sensitivity list allows simulation to run in a reasonable time frame.
When you synthesize code into an ASIC or FPGA, the process is always "running" since it has dedicated hardware.
When you simulate a state machine without the clock in the sensitivity list, the process will never run on the clock edges, but only on changes to your input. If you have the state transition implemented as a flip flop (if clk'event and clk = '1') then your state transition will never be simulated unless you happen to change your input at the same time as the clock's rising edge.
You should probably leave the clock in the sensitivity list, assuming the FSM changes on clock edges.
Also, try to proofread your questions.

Synthesis tools focus on logic design (FPGA, ASIC) and ignore sensitivity lists because there are only three basic types of logic: Combinational logic, edge sensitive storage (flip-flops and some RAM), and level sensitive storage (latches and some RAM).
Combinational logic requires all input signals to be on the sensitivity list. From a synthesis tool perspective, if one is missing, they can either ignore the sensitivity list and treat it as if all inputs were on the sensitivity list, or produce some complicated combination of flip-flops and combinational logic that probably will not do what the user wanted anyway. Both of these have an implementation cost to the vendor, hence, why invest money (development time) to create something that is not useful. As a result, the only good investment is to simplify and ignore the sensitivity list.
Simulators on the other hand, have a bigger perspective than just logic design. The language defines sensitivity lists as to indicate when the code should run. So simulators implement that semantic with a high fidelity.
Long term it may make you happy to know that VHDL-2008 allows the keyword "all" to be used in a sensitivity list to replace the signal inputs there. This is intended to simplify the modeling of combinational logic. Syntax is as follows:
MyStateMachine : process(all)
begin
-- my statemachine logic
end process MyStateMachine ;
Turn on the VHDL-2008 switch and it out in your synthesis tool. If it does not work, be sure to submit a bug against the tool.

Related

VHDL optimization tips [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am quite new in VHDL, and by using different IP cores (by different providers) can see that sometimes they differ massively as per the space that they occupy or timing constraints.
I was wondering if there are rules of thumb for optimization in VHDL (like there are in C, for example; unroll your loops, etc.).
Is it related to the synthesis tools I am using (like the different compilers are using other methods of optimization in C, so you need to learn to read the feedback asm files they return), or is it dependent on my coding skills?
Is it related to the synthesis tools I am using (like the different compilers are using other methods of optimization in C, so you need to learn to read the feedback asm files they return), or is it dependent on my coding skills?
The answer is "yes." When you are coding in HDL, you are actually describing hardware (go figure). Instead of the code being converted into machine code (like it would with C) it is synthesized to logical functions (AND, NOT, OR, XOR, etc) and memory elements (RAM, ROM, FF...).
VHDL can be used in many different ways. You can use VHDL in a purely structural sense where at the base level you are calling our primitives of the underlying technology that you are targeting. For example, you literally instantiate every AND, OR, NOT, and Flip Flop in your design. While this can give you a lot of control, it is not an efficient use of time in 99% of cases.
You can also implement hardware using behavioral constructs with VHDL. Instead of explicitly calling out each logic element, you describe a function to be implemented. For example, if this, do this, otherwise, do something else. You can describe state machines, math operations, and memories all in a behavioral sense. There are huge advantages to describing hardware in a behavioral sense:
Easier for humans to understand
Easier for humans to maintain
More portable between synthesis tools and target hardware
When using behavioral constructs, knowing your synthesis tool and your target hardware can help in understanding how what you write will actually be implemented. For example, if you describe a memory element with a asynchronous reset the implementation in hardware will be different for architectures with a dedicated asynchronous reset input to the memory element and one without.
Synthesis tools will generally publish in their reference manual or user guide a list of suggested HDL constructs to use in order to obtain some desired implementation result. For basic cases, they will be what you would expect. For more elaborate behavior models (e.g. a dual port RAM) there may be some form that you need to follow for the tool to "recognize" what you are describing.
In summary, for best use of your target device:
Know the device you are targeting. How are the programmable elements laid out? How many inputs and outputs are there from lookup tables? Read the device user manual to find out.
Know your synthesis engine. What types of behavioral constructs will be recognized and how will they be implemented? Read the synthesis tool user guide or reference manual to find out. Additionally, experiment by synthesizing small constructs to see how it gets implemented (via RTL or technology viewer, if available).
Know VHDL. Understand the differences between signals and variables. Be able to recognize statements that will generate many levels of logic in your design.
I was wondering if there are rules of thumb for optimization in VHDL
Now that you know the hardware, synthesis tool, and VHDL... Assuming you want to design for maximum performance, the following concepts should be adhered to:
Pipeline, pipeline, pipeline. The more levels of logic you have between synchronous elements, the more difficulty you are going to have making your timing constraint/goal.
Pipeline some more. Having additional stages of registers can provide additional wiggle-room in the future if you need to add more processing steps to your algorithm without affecting the overall latency/timeline.
Be careful when operating on the boundaries of the normal fabric. For example, if interfacing with an IO pin, dedicated multiplies, or other special hardware, you will take more significant timing hits. Additional memory elements should be placed here to avoid critical paths forming.
Review your synthesis and implementation reports frequently. You can learn a lot from reviewing these frequently. For example, if you add a new feature, and your timing takes a hit, you just introduced a critical path. Why? How can you alleviate this issue?
Take care with your "global" structures -- such as resets. Logic that must be widely distributed in your design deserves special care, since it needs to reach across your whole device. You may need special pipeline stages, or timing constraints on this type of logic. If at all possible, avoid "global" structures, unless truly a requirement.
While synthesis tools have design goals to focus on area, speed or power, the designer's choices and skills is the major contributor for the quality of the output. A designer should have a goal to maximize speed or minimize area and it will greatly influence his choices. A design optimized for speed can be made smaller by asking the tool to reduce the area, but not nearly as much as the same design thought for area in the first place.
However, it is more complicated than that. IP cores often target several FPGA technologies as well as ASIC. This can only be achieved by using general VHDL constructs, (or re-writing the code for each target, which non-critical IP providers don't do). FPGA and ASIC vendor have primitives that will improve speed/area when used, but if you write code to use a primitive for a technology, it doesn't mean that the resulting code will be optimized if you change the technology. Both Xilinx and Altera have DSP blocks to speed multiplication and whatnot, but they don't work exactly the same and writing code that uses the full potential of both is very challenging.
Synthesis tools are notorious for doing exactly what you ask them to, even if a more optimized solution is simple, for example:
a <= (x + y) + z; -- This is a 2 cascaded 2-input adder
b <= x + y + z; -- This is a 3-input adder
Will likely lead a different path from xyz to b/c. In the end, the designer need to know what he wants, and he has to verify that the synthesis tool understands his intent.

Synopsys: Repeated compiles produce different results. How to automate iterated compile?

I'm new to using Design Compiler. In the past, I've done mostly FPGA work. Right now, I'm using Synopsys to determine the minimum are necessary to represent some circuits (using the Nangate 45nm library). I'm not doing P&R right now; I'm just trying to determine transistor area.
My only optimization constraint is to minimize area. I've noticed that if I tell DC to compile more than one time in a row, it produces different (and usually smaller) results each time.
I've looked and looked and failed to see if this is mentioned in a manual or anywhere in any discussion. Is it meant to work this way?
This suggests that optimization is stopping earlier than it could, so it's not REALLY minimizing area. Any idea why?
Is there a way I can tell it to increase the effort and/or tell it to automatically iterate compiles so that it will converge on the smallest design?
I'm guessing that DC is expecting to meet timing constraints, but I've given it a purely combinatorial block and no timing constraint. Did they never consider the usage scenario when all you want to do is work out the minimum gate area for a combinatorial circuit?
On a pure combinatorial circuit you can use a set_max_delay constraint and DC will attempt to meet that.
For reduced area you can use -map_effort high or -map_effort ultra to get it to work harder.
DC is a funny beast, and the algorithms it uses change as processes advance and make certain activities more or less useful. A lot of pre-layout optimization is less useful since the whole situation can change once the gates are actually placed and routed.
I filed a support ticket with Synopsys. I was using a 2010 version of design compiler. Apparently, area optimization has been improved since then, and the 2014 version will minimize area in one compiler pass.

disambiguating HPCT artificial intelligence architecture

I am trying to construct a small application that will run on a robot with very limited sensory capabilities (NXT with gyroscope/ultrasonic/touch) and the actual AI implementation will be based on hierarchical perceptual control theory. I'm just looking for some guidance regarding the implementation as I'm confused when it comes to moving from theory to implementation.
The scenario
My candidate scenario will have 2 behaviors, one is to avoid obstacles, second is to drive in circular motion based on given diameter.
The problem
I've read several papers but could not determine how I should classify my virtual machines (layers of behavior?) and how they should communicating to lower levels and solving internal conflicts.
These are the list of papers I've went through to find my answers but sadly could not
pct book
paper on multi-legged robot using hpct
pct alternative perspective
and the following ideas are the results of my brainstorming:
The avoidance layer would be part of my 'sensation layer' and that is because it only identifies certain values like close objects e.g. ultrasonic sensor specific range of values. The other second layer would be part of the 'configuration layer' as it would try to detect the pattern in which the robot is driving like straight line, random, circle, or even not moving at all, this is using the gyroscope and motor readings. 'Intensity layer' represents all sensor values so it's not something to consider as part of the design.
Second idea is to have both of the layers as 'configuration' because they would be responding to direct sensor values from 'intensity layer' and they would be represented in a mesh-like design where each layer can send it's reference values to the lower layer that interface with actuators.
My problem here is how conflicting behavior would be handled (maneuvering around objects and keep running in circles)? should be similar to Subsumption where certain layers get suppressed/inhibited and have some sort of priority system? forgive my short explanation as I did not want to make this a lengthy question.
/Y
Here is an example of a robot which implements HPCT and addresses some of the issues relevant to your project, http://www.youtube.com/watch?v=xtYu53dKz2Q.
It is interesting to see a comparison of these two paradigms, as they both approach the field of AI at a similar level, that of embodied agents exhibiting simple behaviors. However, there are some fundamental differences between the two which means that any comparison will be biased towards one or the other depending upon the criteria chosen.
The main difference is of biological plausibility. Subsumption architecture, although inspired by some aspects of biological systems, is not intended to theoretically represent such systems. PCT, on the hand, is exactly that; a theory of how living systems work.
As far as PCT is concerned then, the most important criterion is whether or not the paradigm is biologically plausible, and criteria such as accuracy and complexity are irrelevant.
The other main difference is that Subsumption concerns action selection whereas PCT concerns control of perceptions (control of output versus control of input), which makes any comparison on other criteria problematic.
I had a few specific comments about your dissertation on points that may need
clarification or may be typos.
"creatures will attempt to reach their ultimate goals through
alternating their behaviour" - do you mean altering?
"Each virtual machine's output or error signal is the reference signal of the machine below it" - A reference signal can be a function of one or more output signals from higher-level systems, so more strictly this would be, "Each virtual machine's output or error signal contributes to the reference signal of a machine at a lower level".
"The major difference here is that Subsumption does not incorporate the ideas of 'conflict' " - Well, it does as the purpose of prioritising the different layers, and sub-systems, is to avoid conflict. Conflict is implicit, as there is not a dedicated system to handle conflicts.
"'reorganization' which require considering the goals of other layers." This doesn't quite capture the meaning of reorganisation. Reorganisation happens when there is prolonged error in perceptual control systems, and is a process whereby the structure of the systems changes. So rather than just the reference signals changing the connections between systems or the gain of the systems will change.
"Design complexity: this is an essential property for both theories." Rather than an essential property, in the sense of being required, it is a characteristic, though it is an important property to consider with respect to the implementation or usability of a theory. Complexity, though, has no bearing on the validity of the theory. I would say that PCT is a very simple theory, though complexity arises in defining the transfer functions, but this applies to any theory of living systems.
"The following step was used to create avoidance behaviour:" Having multiple nodes for different speeds seem unnecessarily complex. With PCT it should only be necessary to have one such node, where the distance is controlled by varying the speed (which could be negative).
Section 4.2.1 "For example, the avoidance VM tries to respond directly to certain intensity values with specific error values." This doesn't sound like PCT at all. With PCT, systems never respond with specific error (or output) values, but change the output in order to bring the intensity (in this case) input in to line with the reference.
"Therefore, reorganisation is required to handle that conflicting behaviour. I". If there is conflict reorganisation may be necessary if the current systems are not able to resolve that conflict. However, the result of reorganisation may be a set of systems that are able to resolve conflict. So, it can be possible to design systems that resolve conflict but do not require reorganisation. That is usually done with a higher-level control system, or set of systems; and should be possible in this case.
In this section there is no description of what the controlled variables are, which is of concern. I would suggest being clear about what are goal (variables) of each of the systems.
"Therefore, the designed behaviour is based on controlling reference values." If it is only reference values that are altered then I don't think it is accurate to describe this as 'reorganisation'. Such a node would better be described as a "conflict resolution" node, which should be a higher-level control system.
Figure 4.1. The links annotated as "error signals" are actually output signals. The error signals are the links between the comparator and the output.
"the robot never managed to recover from that state of trying to reorganise the reference values back and forth." I'd suggest the way to resolve this would be to have a system at a level above the conflicted systems, and takes inputs from one or both of them. The variable that it controls could simply be something like, 'circular-motion-while-in-open-space', and the input a function of the avoidance system perception and then a function of the output used as the reference for the circular motion system, which may result in a low, or zero, reference value, essentially switching off the system, thus avoiding conflict, or interference. Remember that a reference signal may be a weighted function of a number of output signals. Those weights, or signals, could be negative so inhibiting the effect of a signal resulting in suppression in a similar way to the Subsumption architecture.
"In reality, HPCT cannot be implemented without the concept of reorganisation because conflict will occur regardless". As described above HPCT can be implemented without reorganisation.
"Looking back at the accuracy of this design, it is difficult to say that it can adapt." Provided the PCT system is designed with clear controlled variables in mind PCT is highly adaptive, or resistant to the effects of disturbances, which is the PCT way of describing adaption in the present context.
In general, it may just require clarification in the text, but as there is a lack of description of controlled variables in the model of the PCT implementation and that, it seems, some 'behavioural' modules used were common to both implementations it makes me wonder whether PCT feedback systems were actually used or whether it was just the concept of the hierarchical architecture that was being contrasted with that of the Subsumption paradigm.
I am happy to provide more detail of HPCT implementation though it looks like this response is somewhat overdue and you've gone beyond that stage.
Partial answer from RM of the CSGnet list:
https://listserv.illinois.edu/wa.cgi?A2=ind1312d&L=csgnet&T=0&P=1261
Forget about the levels. They are just suggestions and are of no use in building a working robot.
A far better reference for the kind of robot you want to develop is the CROWD program, which is documented at http://www.livingcontrolsystems.com/demos/tutor_pct.html.
The agents in the CROWD program do most of what you want your robot to do. So one way to approach the design is to try to implement the control systems in the CROWD programs using the sensors and outputs available for the NXT robot.
Approach the design of the robot by thinking about what perceptions should be controlled in order to produce the behavior you want to see the robot perform. So, for example, if one behavior you want to see is "avoidance" then think about what avoidance behavior is (I presume it is maintaining a goal distance from obstacles) and then think about what perception, if kept under control, would result in you seeing the robot maintain a fixed distance from objects. I suspect it would be the perception of the time delay between sending and receiving of the ultrasound pulses.Since the robot is moving in two-space (I presume) there might have to be two pulse sensors in order to sense the two D location of objects.
There are potential conflicts between the control systems that you will need to build; for example, I think there could be conflicts between the system controlling for moving in a circular path and the system controlling for avoiding obstacles. The agents in the CROWD program have the same problem and sometimes get into dead end conflicts. There are various ways to deal with conflicts of this kind;for example, you could have a higher level system monitoring the error in the two potentially conflicting systems and have it make reduce the the gain in one system or the other if the conflict (error) persists for some time.

How can I estimate if a feature is going to take up too many resources on an FPGA?

I'm starting on my first commercial sized application, and I often find myself making a design, but stopping myself from coding and implementing it, because it seems like a huge use of resources. This is especially true when it's on a piece that is peripheral (for example an enable for the output taps of a shift register). It gets even worse when I think about how large the generic implementation can get (4k bits for the taps example). The cleanest implementation would have these, but in my head it adds a great amount of overhead.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluation? In general I worry less about the number of flip-flops, and more when it comes to width of signals. This may just be coming from a CS background where all application boundarys should be as small as possibly feasable to prevent overhead.
Point 1. We learn by playing, so play! Try a couple of things. See what the tools do. Get a feel for the problem. You won't get past this is you don't try something. Often the problems aren't where you think they're going to be.
Point 2. You need to get some context for these decisions. How big is adding an enable to a shift register compared to the capacity of the FPGA / your design?
Point 3. There's two major types of 'resource' to consider :- Cells and Time.
Cells is relatively easy in broad terms. How many flops? How much logic in identifiable blocks (e.g. in an ALU: multipliers, adders, etc)? Often this is defined by the design you're trying to do. You can't build an ALU without registers, a multiplier, an adder, etc.
Time is more subtle, and is invariably traded off against cells. You'll be trying to hit some performance target and recognising the structures that will make that hard are where to experience from point 1 comes in.
Things to look out for include:
A single net driving a large number of things. Large fan-outs cause a heavy load on a single driver which slows it down. The tool will then have to use cells to buffer that signal. Classic time vs cells trade off.
Deep clumps of logic between register stages. Again the tool will have to spend more cells to make logic meet timing if it's close to the edge. Simple logic is fast and small. Sometimes introducing a pipeline stage can decrease the size of a design is it makes the logic either side far easier.
Don't worry so much about large buses, if each bit is low fanout and you've budgeted for the registers. Large buses are often inherent in fast designs because you need high bandwidth. It can be easier to go wide than to go to a higher clock speed. On the other hand, think about the control logic for a wide bus, because it's likely to have a large fan-out.
Different tools and target devices have different characteristics, so you have to play and learn the rules for your set-up. There's always a size vs speed (and these days 'vs power') compromise. You need to understand what moves you along that curve in each direction. That comes with experience.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluation?
Only rule I can come up with is 'Have I got time? or not?'
If I have, I'll explore. If not I better just make something work.
Ahhh, the life of doing design to a deadline!
It's something that comes with experience. Here's some pointers:
adding numbers is fairly cheap
choosing between them (multiplexing) gets big quite quickly if you have a lot of inputs to the multiplexer (the width of each input is a secondary issue also).
Multiplications are free if you have spare multipliers in your chip, they suddenly become expensive when you run out of hard DSP blocks.
memory is also cheap, until you run out. For example, your 4Kbit shift register easily fits within a single Xilinx block RAM, which is fine if you have one to spare. If not it'll take a large number of LUTs (depending on the device - an older Spartan 3 can fit 17 bits into a LUT (including the in-CLB register), so will require ~235 LUTS). And not all LUTs can be shift registers. If you are only worried about the enable for the register, don't. Unless you are pushing the performance of the device, routing that sort of signal to a few hundred LUTs is unlikely to cause major timing issues.

When must a signal be inserted into the sensitivity list of a process

I am confused about when a signal declared in an architecture must be inserted into the sensitivity list of a process.
Is there is a general law that can be followed in any situation?
I have real difficulties understanding when I have to include a signal in a process sensitivity list.
The "general law" is that
anything that your process needs to know about changes of needs to be in the sensitivity list.
For a typical synthesisable register with a synchronous reset:
process (clk) is
begin
if rising_edge(clk) then
if reset = '1' then
-- do reset things
else
-- read some signals, assign some outputs
end if;
end if;
end process;
Only the clock needs to be in the list, as everything else is only looked at when the clock changes (due to the if rising_edge(clk) statement.
If you need an asynchronous reset:
process (clk, reset) is
begin
if reset = '1' then
-- do reset things
elsif rising_edge(clk) then
-- read some signals, assign some outputs
end if;
end process;
then the reset signal must also be in the sensitivity list, as your design needs to check the value of it every time it changes, irrespective of what the clock is doing.
For combinatorial logic, I avoid using processes completely because of the problems keeping the sensitivity list up-to-date, and the potential for simulation then behaving differently to the synthesised code. This has been eased by the all keyword in VHDL-2008, but I still haven't found myself wanting to write long complicated combinatorial logic such that a process would help.
If a signal is in the sensitivity list of a process, the process will "wake up" and be evaluated whenever the value of that signal changes. If it is not in the sensitivity list, a signal can change, but a process will not be re-evaluated to determine what the new outputs should be.
For Combinatorial Logic: Likely you want all your input signals to be included in the sensitivity list. If they are not included in the sensitivity list, then that will result in your output not changing even when that input signal changes. This is a common error (due to carelessness). Note that in VHDL 2008 you can use "all" keyword to automatically include all necessary signals in your process and avoid creating latches.
For Synchronous Logic: Likely you only want your clock (and maybe your reset) signal in the sensitivity list. This is because you are only concerned with the value of your signals (other than the clock) when your system clock has changed. This is because you are typically describing registers (composed of flip flops) which only allow changing their output value on a clock edge.
All of this can be confusing in the case of using HDL for synthesis because only a subset of the circuits you describe in VHDL can actually be implemented within a FPGA. For example, you can't have a primitive memory element that is sensitive to two independent clock edges, even though you could describe such a circuit by including two clocks in a sensitivity list.
Also, the synthesis tools (talking about the Xilinx XST in this case) don't necessarily always respect the process sensitivity list. If you fail to list all the processes whose values are evaluated in the body of the process, the XST will emit a warning saying that it's going to assume that the signals whose values are evaluated are on the sensitivity list. That may lead to differences between behavioral simulations and actual hardware. Keep it in mind.
...also, be warned, the sensitivity list has no influence over the behaviour of your design once it is synthesised. It is only used during simulation. Hence it's quite easy to introduce a difference in behaviour between RTL and synthesised code by changes to the sensitivity list.
The rules Josh gives are good, but above all, read the warnings your tools give you and act on them. They normally check that the sensitivity list is correct and will flag any problems. Emacs VHDL mode also has a command to update the sensitivity list, and it's normally pretty good at it.
Hmmmm, Ninja'd