I'm exploring some of the output from g++'s cfg facilities. I think I understand what "bb" does - it is a label for goto statements, right? What does bb stand for? Does g++ have any other ways of labeling places to go to?
It stands for basic block. In GCC, this is either a sequence of GIMPLE statements or (in later compiler passes) RTL expressions. Basic blocks are elements of the control flow graph.
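To make the idea concrete, here is a small C sketch with comments marking where basic block boundaries would roughly fall. The function name clamp_sum and the block letters A to F are made up for illustration; the numbering and exact partitioning GCC prints (for example with -fdump-tree-cfg) will differ.

```c
/* Hypothetical example: each comment marks a straight-line region with one
   entry and one exit, which is roughly what a basic block is. The letters
   are illustrative and are not GCC's own <bb N> numbers. */
int clamp_sum(const int *v, int n, int limit)
{
    int s = 0;                      /* block A: function entry */
    for (int i = 0; i < n; i++) {   /* block B: loop test, branches to C or D */
        s += v[i];                  /* block C: loop body, jumps back to B */
    }
    if (s > limit)                  /* block D: comparison, branches to E or F */
        s = limit;                  /* block E: executed only when s > limit */
    return s;                       /* block F: function exit */
}
```

The edges of the control flow graph are the possible jumps between these regions, for instance from B to C and from B to D.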
The use of context is briefly mentioned in the K tutorial as a way to customize the order of evaluation. But I'm also seeing other context statements that contain rewrite arrows in them, like this one in the untyped simple language.
context ++(HOLE => lvalue(HOLE))
rule <k> ++loc(L) => I +Int 1 ...</k>
<store>... L |-> (I => I +Int 1) ...</store> [increment]
Could someone explain how exactly contexts work in K? In particular, I'm interested in:
Is there a more general usage of context in K than just stating the order of evaluation?
How does the order in which context statements are declared affect the semantics?
Thank you!
More detailed information about context declarations in K can be found in K's documentation here. In particular, contexts with rewrite arrows mean that heating and cooling will wrap the term to be heated or cooled in a particular symbol. In your example, that symbol is lvalue.
To answer your questions specifically:
Context declarations, like strictness attributes, are primarily used in order to specify the evaluation strategy. While in theory they can be used for other things, in practice this rarely happens. That said, evaluation strategies can be complex, which is part of why K has so many different features relating to evaluation strategy. In the example you mentioned, we use rewrites in a context declaration in order to provide a separate set of rules for evaluating lvalues (i.e., to avoid actually evaluating all the way to a value, and only evaluate to a location).
K's sentences are unordered. Within a single module, you can reorder any of its sentences (except import statements, which must appear first) and there will not be an effect on the intended semantics (although backends may result in slightly different behavior for concrete execution if your semantics is nondeterministic). This includes context declarations.
I am writing an expression evaluator using lex and yacc that supports the following operations:
/ , * , + , - , pow(a,b) , sqrt(a) , log(a)
also there can be brackets in the expression.
Input expression is in the file "calculator.input"
I have to compare the running time of my code with bc, and I am facing the following problems:
1) bc doesn't accept pow(a,b) and log(a); it accepts a^b and l(a) instead.
How do I change it?
2) How do I use bc from the main function in the yacc program? Or can't that be done?
I think it would be easier to change your code than to change bc, but if you want to try, you can find pointers to bc's source bundles on the GNU project page and in the FreeBSD source mirror. Of course, the end result would not strictly speaking be bc any more, so I don't know if it would still count, for the purposes of your assignment.
I don't know what the specifications are for the pow function you are supposed to implement, but note that bc's ^ operator only allows integer exponents, so it might not work on all your test cases (unless, of course, all your test cases have integer exponents.) You could compute a^b with e(l(a)*b), but it won't be as accurate for integer exponents:
e(l(10)*100)
99999999999999999920085453156357924020916787698393558126052191252537\
96016108317256511712576426623511.11829711443225035170
10^100
10000000000000000000000000000000000000000000000000000000000000000000\
000000000000000000000000000000000
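If your calculator is allowed to treat integer exponents specially, one way to avoid that loss of accuracy is repeated squaring. The following is only a sketch: the name ipow and the choice of double are assumptions, not part of your assignment or of bc.

```c
/* Hypothetical helper: raise base to a non-negative integer power by
   repeated squaring, instead of going through exp(log(base) * exp). */
double ipow(double base, unsigned exp)
{
    double result = 1.0;
    while (exp > 0) {
        if (exp & 1u)       /* lowest bit set: this power of base contributes */
            result *= base;
        base *= base;       /* base, base^2, base^4, ... */
        exp >>= 1;
    }
    return result;
}
```

A pow(a,b) rule in the grammar could call this when b is a non-negative integer and fall back to the exp/log formulation otherwise.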
You might want to consult with your tutor, professor, or teaching assistant.
If you don't want to (or are not allowed to) generate the bc equivalent test cases by hand, you might be able to automate the process with sed (if the exponential sub-expressions are not complicated), or by adapting your calculator to output the expression in bc's syntax. The latter would be a fairly easy project, and you'd probably learn something by implementing it.
If you are using a Unix-like system, you can easily run any command-line utility from a C program. (Indeed, you can do that on non-Unix-like systems, too, but the library functions will differ.) If you don't need to pass data to bc through its stdin, you can use the popen(3) library function, which is certainly the easiest solution.
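A minimal sketch of the popen(3) approach might look like this. The helper name capture_output and the fixed-size output buffer are my own choices for illustration; the command string is handed to the shell, so any redirection has to be written into it.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose under strict -std modes */
#include <stdio.h>

/* Hypothetical helper: run `cmd` through the shell, copy up to outsz-1 bytes
   of its standard output into `out`, and NUL-terminate the result.
   Returns the value of pclose() (the wait status), or -1 on failure. */
int capture_output(const char *cmd, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    FILE *p = popen(cmd, "r");       /* "r": we read the command's stdout */
    if (p == NULL)
        return -1;
    size_t used = fread(out, 1, outsz - 1, p);
    out[used] = '\0';
    return pclose(p);                /* waits for the command to finish */
}
```

For instance, something like capture_output("bc -l < calculator.input", buf, sizeof buf) would let the shell feed your input file to bc, assuming the file is already in bc's syntax.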
Otherwise, you will have to set up a pair of pipe(2)s (one for writing to bc's stdin and the other for reading from its stdout), fork(2) a child process, and use one of the exec* function calls, probably execlp(3) or execvp(3), to run bc in the child. (Watch out for pipe deadlock while you are writing to and reading from the child.) Once the child process finishes (which you'll notice because you'll get an EOF on the pipe you're using to read from its stdout), you should use wait(2) or waitpid(2) to get its status code.
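The steps above can be sketched roughly as follows. The helper name run_filter is hypothetical, and the sketch sidesteps the deadlock problem by assuming the input is small enough to fit in the pipe buffer, so it writes everything before reading anything. It also assumes the child reads its input; a child that exits early would trigger SIGPIPE in the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` (looked up in PATH, no arguments), send
   `input` to its stdin and collect up to outsz-1 bytes of its stdout.
   Returns the child's exit status, or -1 on error. */
int run_filter(const char *prog, const char *input, char *out, size_t outsz)
{
    int to_child[2], from_child[2];
    if (outsz == 0 || pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: wire up stdin/stdout */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp(prog, prog, (char *)NULL);
        _exit(127);                          /* only reached if exec failed */
    }
    close(to_child[0]);                      /* parent keeps the other ends */
    close(from_child[1]);

    size_t len = strlen(input), off = 0;
    while (off < len) {                      /* small input assumed, see above */
        ssize_t w = write(to_child[1], input + off, len - off);
        if (w <= 0)
            break;
        off += (size_t)w;
    }
    close(to_child[1]);                      /* child now sees EOF on stdin */

    size_t used = 0;
    ssize_t r;
    while (used < outsz - 1 &&
           (r = read(from_child[0], out + used, outsz - 1 - used)) > 0)
        used += (size_t)r;                   /* read until EOF or buffer full */
    out[used] = '\0';
    close(from_child[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With bc installed, run_filter("bc", "2^10\n", buf, sizeof buf) should leave bc's answer in buf. For large inputs you would need to interleave reads and writes (for example with poll(2)) to avoid the deadlock mentioned above.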
If all that seems too complicated, you could use the much simpler solution of running both your program and bc from your shell. (You can use the time shell built-in on Unix-like shells to get a measure of execution time, although it will not be microsecond resolution which might be necessary for such a simple program.)
I have written a finite volume model. The parameter n represents the number of volumes. After translating, the parameter can't be modified. Dymola gives this message:
Warning: Setting n has no effect in model.
After translation you can only set literal start-values and non-evaluated parameters.
I think the problem is that the parameter n is used in the equation section. There I use the following code:
equation
...
for i in 2:n-1 loop
T[i] = some equation;
end for;
I also use n for the calculation of the initial values of T.
The purpose is to make a script that repeatedly executes the model but with a different n.
How can I do this?
The issue here is that your parameter n affects the number of variables in the problem. Dymola (and all other Modelica compilers I know of) evaluate such parameters at compile time. In other words, they hard code the value at compile time into the model.
One potential workaround in your case is to perform the translation or simulation inside your loop. Note that in the translate and simulate commands in Dymola you can include modifications. Just add them after the model name. For example MyModel would become MyModel(n=10).
On a modern Pentium it is no longer possible to give branching hints to the processor it seems. Assuming that a profiling compiler such as gcc with profile-guided optimization gains information about likely branching behavior, what can it do to produce code that will execute more quickly?
The only option I know of is to move unlikely branches to the end of a function. Is there anything else?
Update.
http://download.intel.com/products/processor/manual/325462.pdf volume 2a, section 2.1.1 says
"Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for
a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes
and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable
behavior."
I don't know if these actually have any effect however.
On the other hand section 3.4.1. of http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf says
"
Compilers generate code that improves the efficiency of branch prediction in Intel processors. The Intel
C++ Compiler accomplishes this by:
keeping code and data on separate pages
using conditional move instructions to eliminate branches
generating code consistent with the static branch prediction algorithm
inlining where appropriate
unrolling if the number of iterations is predictable
With profile-guided optimization, the compiler can lay out basic blocks to eliminate branches for the most
frequently executed paths of a function or at least improve their predictability. Branch prediction need
not be a concern at the source level. For more information, see Intel C++ Compiler documentation.
"
http://cache-www.intel.com/cd/00/00/40/60/406096_406096.pdf says in "Performance Improvements with PGO "
"
PGO works best for code with many frequently executed branches that are difficult to
predict at compile time. An example is the code with intensive error-checking in which
the error conditions are false most of the time.
The infrequently executed (cold) error-handling code can be relocated so the branch is rarely predicted incorrectly. Minimizing
cold code interleaved into the frequently executed (hot) code improves instruction cache
behavior."
There are two possible sources for the information you want:
There's Intel 64 and IA-32 Architectures Software Developer's Manual (3 volumes). This is a huge work which has evolved for decades. It's the best reference I know on a lot of subjects, including floating-point. In this case, you want to check volume 2, the instruction set reference.
There's Intel 64 and IA-32 Architectures Optimization Reference Manual. This will tell you in somewhat brief terms what to expect from each microarchitecture.
Now, I don't know what you mean by a "modern Pentium" processor, this is 2013, right? There aren't any Pentiums anymore...
The instruction set does support telling the processor whether a branch is expected to be taken or not taken, by means of a prefix to the conditional branch instructions (such as JC, JZ, etc). See volume 2A of (1), section 2.1.1 (of the version I have), Instruction Prefixes. There are the 2E and 3E prefixes for not taken and taken respectively.
As to whether these prefixes actually have any effect, if we can get that information, it will be in the Optimization Reference Manual, in the section for the microarchitecture you want (and I'm sure it won't be the Pentium).
Apart from using those, there is an entire section in the Optimization Reference Manual on that subject: section 3.4.1 (of the version I have).
It makes no sense to reproduce that here, since you can download the manual for free.
Briefly:
Eliminate branches by using conditional instructions (CMOV, SETcc),
Consider the static prediction algorithm (3.4.1.3),
Inlining
Loop unrolling
Also, some compilers, GCC, for instance, even when CMOV is not possible, often perform bitwise arithmetic to select one of two distinct things computed, thus avoiding branches. It does this particularly with SSE instructions when vectorizing loops.
Basically, the static conditions are:
Unconditional branches are predicted to be taken (... kind of expectable...)
Indirect branches are predicted not to be taken (because of a data dependency)
Backward conditionals are predicted to be taken (good for loops)
Forward conditionals are predicted not to be taken
You probably want to read the entire section 3.4.1.
If it's clear that a loop is rarely entered, or that it normally iterates very few times, then the compiler might avoid unrolling the loop, as doing so can add a lot of harmful complexity to handle edge conditions (an odd number of iterations, etc.). Vectorisation, in particular, should be avoided in such cases.
The compiler might rearrange nested tests, so that the one that most frequently results in a short-cut can be used to avoid performing a test on something with a 50% pass rate.
Register allocation can be optimised to avoid having a rarely-used block force register spill in the common case.
These are just some examples. I'm sure there are others I haven't thought of.
Off the top of my head, you have two options.
Option #1: Inform the compiler of the hints and let the compiler organize the code appropriately. For example, GCC supports the following ...
__builtin_expect((long)!!(x), 1L) /* GNU C to indicate that <x> will likely be TRUE */
__builtin_expect((long)!!(x), 0L) /* GNU C to indicate that <x> will likely be FALSE */
If you put them in macro form such as ...
#if defined(__GNUC__) /* GCC and compatible compilers such as Clang */
#define LIKELY(x) __builtin_expect((long)!!(x), 1L)
#define UNLIKELY(x) __builtin_expect((long)!!(x), 0L)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif
... you can now use them as ...
if (LIKELY (x != 0)) {
/* DO SOMETHING */
} else {
/* DO SOMETHING ELSE */
}
This leaves the compiler free to organize the branches according to static branch prediction algorithms, and/or if the processor and compiler support it, to use instructions that indicate which branch is more likely to be taken.
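As a slightly fuller sketch, here is how such macros might be used for an error check that is expected to fail only rarely. The function checked_sum is invented for illustration, and the feature test simply assumes a GCC-compatible compiler.

```c
#include <stddef.h>

#if defined(__GNUC__)  /* assumption: GCC or a compatible compiler (e.g. Clang) */
#define LIKELY(x)   __builtin_expect((long)!!(x), 1L)
#define UNLIKELY(x) __builtin_expect((long)!!(x), 0L)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sum n ints into *out. The NULL check is marked as
   unlikely, so the compiler is free to move the error path out of the way. */
int checked_sum(const int *v, size_t n, long *out)
{
    if (UNLIKELY(v == NULL || out == NULL))
        return -1;                  /* rare error path */
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    *out = s;
    return 0;                       /* common path */
}
```

The hint does not change what the function computes, only how the compiler may lay out the generated code.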
Option #2: Use math to avoid branching.
if (a < b)
y = C;
else
y = D;
This could be re-written as ...
x = -(a < b); /* x = -1 if a < b, x = 0 if a >= b */
x &= (C - D); /* x = C - D if a < b, x = 0 if a >= b */
x += D; /* x = C if a < b, x = D if a >= b */
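Wrapped up as a function, the same trick could look like the sketch below. The name select_lt is made up, and I use unsigned arithmetic for C and D so that the subtraction and addition wrap around instead of risking signed overflow.

```c
#include <stdint.h>

/* Hypothetical helper: returns c if a < b, otherwise d, without a branch in
   the source. Unsigned arithmetic wraps modulo 2^32, so (c - d) + d == c
   holds even when c < d. */
uint32_t select_lt(int a, int b, uint32_t c, uint32_t d)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b); /* all ones if a < b, else 0 */
    return ((c - d) & mask) + d;
}
```

Whether this beats a plain if or a ternary depends on the compiler and the processor; compilers often generate a conditional move for the simple form on their own.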
Hope this helps.
It can make the fall-through (ie the case where a branch is not taken) the most used path. That has two big effects:
only 1 branch can be taken per clock, or on some processors even per 2 clocks, so if there are any other branches (there usually are, most code that matters is in a loop), a taken branch is bad news, a non-taken branch less so.
when the branch predictor is wrong, the code that it does have to execute is more likely to be in the code cache (or µop cache, where applicable). If it wasn't, that would have been a double-whammy of restarting the pipeline and waiting for a cache miss. This is less of an issue in most loops, since both sides of the branch are likely to be in the cache, but it comes into play in big loops and other code.
It can also decide whether to do if-conversion based on better data than a heuristic guess. If-conversions may seem like "always a good idea", but they're not, they're only "often a good idea". If the branch in the branching implementation is very well-predicted, the if-converted code can well be slower.
I have been told to use the 'when' statement to build a multiplexer, and not the 'if' statement, as it will cause timing errors...
I don't understand this...
So what is the difference between 'if' and 'when'? And do they map to the same thing in hardware?
OK, let's first discuss some points on the difference between if and when statements:
Both are called Dataflow Design Elements.
when statement
concurrent statement
not used inside a process; used only in the architecture body, since a process executes sequentially
if statement
sequential statement
used inside a process, as it is a sequential statement; not used outside a process
As you know, a multiplexer is a component that doesn't need a process block, since its behavior doesn't change when its inputs change, so it is written outside a process, and there you have to use the when statement because it is a concurrent statement. If you write it with an if statement, timing errors may occur. Also, the references, and the Xilinx help (if you are using Xilinx), write the multiplexer block using a when statement, not an if statement.
Reference: Digital Design Principles & Practices, John F. Wakerly, 3rd Edition
See these:
VHDL concurrent statements, which includes when.
VHDL sequential statements, which includes if.
Basically, if is sequential, and when is concurrent. They do not map to the same thing in hardware... This page describes, at the bottom, some of the special considerations needed to synthesize an if statement.
Both coding styles are totally valid.
Let's recall some elements. Starting from HDL, synthesis is done in two main steps :
First, the VHDL is analyzed in order to detect RTL templates (consisting of RTL elements: flip-flops, arithmetic expressions, multiplexers, control logic). We say that these elements are "inferred" (i.e., you must code using the right template to get what you wanted initially; you must imagine how these elements are connected before coding).
The second step is real logic synthesis, which takes the parameters of a particular target technology into account (types of gates available, timing, area, power).
These two steps clearly separate RTL functional needs (steering logic, computations) from technology contingencies (timing etc).
Let's come back to the first step (RTL) :
Concerning multiplexers, several coding styles are possible :
using concurrent assignment :
y<= a1 when cond1 else a2 when cond2 else a3;
using if statement within a process :
process(a1,a2,a3,cond1,cond2)
begin
if(cond1) then
y<=a1;
elsif(cond2) then
y<=a2;
else
y<=a3;
end if;
end process;
using another concurrent assignment form, suitable for generic descriptions : if sel is an integer and muxin an array of signals, then :
muxout <= muxin(sel); --will infer a mux
Note that the 3 coding styles always work. Note also that they are "a bit more" than a simple multiplexer, as the coding style forces the presence of priority encoding (if elsif, when else), which is not the case for a simple equation-based multiplexer, which is truly symmetric.
using a case statement
process(a1,a2,a3,cond1,cond2)
variable cond : std_logic_vector(1 downto 0);
begin
cond := cond2 & cond1;
case cond is
when "01" => y<= a1;
when "10" => y<= a2;
when others => y<=a3;
end case;
end process;
using a select statement (in our example, two concurrent assignments needed) :
sel <= cond2 & cond1;
WITH sel SELECT
y <= a1 WHEN "01",
a2 WHEN "10",
a3 WHEN OTHERS;
A final remark concerns the rise in abstraction, even for RTL design : synthesizers are now really mature. Have a look, for example, at Jiri Gaisler's coding style for the LEON2 open source processor (see here). He advocates a very different approach from the classical books, yet a perfectly valid one.
You should always understand what the RTL synthesizer will infer.
By contrast, behavioral synthesis allows you to forget (partially) what the synthesizer will infer. But that's another story.