Can I control register allocation in g++? - optimization

I have a highly optimized piece of C++, and even small changes in places far from the hot spots can hurt performance by as much as 20%. After deeper investigation, the cause turned out to be (probably) slightly different registers being used in the hot spots.
I can control inlining with the always_inline attribute, but can I control register allocation?

If you really want to mess with the register allocation, you can force GCC to allocate local and global variables in certain registers.
You do this with a special variable declaration like this:
register int test_integer asm ("ebx");
This works for other architectures as well; just replace ebx with a target-specific register name (GCC expects the name in lowercase).
For more info on this I suggest you take a look at the gcc documentation:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Local-Reg-Vars.html
My suggestion, however, is not to mess with the register allocation unless you have very good reasons for it. If you allocate some registers yourself, the allocator has fewer registers to work with, and you may end up with code that is worse than the code you started with.
If your function is so performance critical that you get 20% performance differences between compiles, it may be a good idea to write it in inline assembler.
EDIT: As strager pointed out, the compiler is not forced to use the register for the variable. It is only forced to use the register if the variable is used at all. E.g. if the variable does not survive an optimization pass, the register won't be used for it. The register can also be used for other variables.
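A minimal sketch of the declaration described above, assuming GCC or Clang on x86-64. The function name add_pinned is hypothetical, and any other compiler or target takes the plain C++ branch. GCC's documentation only promises that a local register variable sits in the named register when it is used as an operand of extended asm, which is what the sketch relies on.

```cpp
// Hypothetical helper: adds two ints while pinning the accumulator to ebx.
// Assumes GCC/Clang on x86-64; any other target takes the portable branch.
int add_pinned(int a, int b) {
#if defined(__GNUC__) && defined(__x86_64__)
    register int sum asm("ebx") = a;  // request ebx for this variable
    // The "+r" operand is the point where the register choice is honored.
    asm("addl %1, %0" : "+r"(sum) : "r"(b) : "cc");
    return sum;
#else
    return a + b;  // no explicit register control on this target
#endif
}
```

Compiling with -S and reading the assembly is the only way to confirm what the allocator did around the asm statement.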

In general the register keyword is simply ignored by all modern compilers. The only exception is the (relatively) recent addition of an error if you attempt to take the address of a variable you've marked with the register keyword.
I've experienced this sort of pain as well, and eventually found the only real way around it was to look at the output assembly to try to determine what is causing gcc to go off the deep end. There are other things you can do, but it depends on exactly what your code is trying to do. I was working in a very, very large function with a large amount of computed goto mayhem in which minor (seemingly innocuous) changes could cause catastrophic performance hits. If you're doing something similar, there are a few things you can do to try to mitigate the problem, but the details are somewhat icky, so I'll forgo discussing them here unless it's actually relevant.

It depends on the processor you are using. Or should I say: yes, you can with the register keyword, but this is frowned upon unless you are using a simple processor with no pipelining and a single core. These days GCC can do a far better job of register allocation than you can. Trust it.

Related

Are global variables as frowned upon in embedded systems programming (C)?

I learned years ago that in the application world global variables are "bad" or "frowned upon", so it became a habit to try to avoid them and use them very sparingly.
It seems that in the embedded world they are almost unavoidable when it comes to working with hardware interrupts. They also have to be made volatile so that the compiler does not optimize them out if it sees them never being touched in the running program.
Are both of these statements true? Is there a way to avoid those variables in the case I described without bending too far backward?
Neither of those statements is true.
First, let's clarify that by global variables we mean file scope variables that have external linkage. These are variables that could be called upon with the extern keyword or by mistake.
With regard to the first statement:
Seems like that in the embedded world they are almost unavoidable when
it comes to working with hardware interrupts.
Global variables are avoidable when working with hardware interrupts. As others have pointed out in the comments, global variables in an embedded environment are not uncommon, but they aren't encouraged either especially if you can afford to implement proper encapsulation. This article, which someone provided in the comments to your question, actually contains a reader response that provides a good example of where a proper implementation of encapsulation was not possible (you don't have to go far, it's the first one).
With regard to the second statement:
They also have to be made volatile so that the compiler does not optimize them out if it
sees them never being touched in the running program.
This statement is, let's say 'almost true'. The compiler knows when the memory location for a variable needs to be accessed (written to/read from memory) so when optimization is turned on it will avoid unnecessary memory access. The volatile keyword tells the compiler not to do that, which means access to that memory location will happen every time the variable is used.
Cases where using the volatile keyword is necessary:
Global variable(s) updated by an interrupt
Global variable(s) accessed by multiple threads in a multi-threaded application
Memory-mapped peripheral registers
In the case of a global variable that is updated by an interrupt, the volatile keyword is imperative because the interrupt can happen at any time and we do not want to miss that update. For global variables that are not updated by an interrupt and the application is single threaded, the volatile keyword is completely unnecessary and can actually slow your code down since you'll be accessing the memory location for that variable every time!
is there a way to avoid those variables in the case I described without bending too far backward ?
The answer to that really depends. Probably most important is what do you have to gain from making this design change? Also, is this a professional project, one for school, or one for fun?
From my experience as an engineer, time to market is often times the most important thing a company is worried about when developing a new product. There is usually going to be some legacy code that you get stuck with that was developed during the research and development phase, and it 'works' so why spend time to fix something that isn't broken? Seriously, it better be a very convincing argument otherwise don't waste your time.
From my educational experience, taking the time to go back and implement a proper design philosophy and document it is definitely worth it, but only if you actually have the time to do so! If you are close to a deadline, don't risk it. If you are ahead of the game, do it. It'll be worth it in more ways than one.
Lastly, to properly encapsulate the Interrupt Service Routine (ISR) for a hardware interrupt, you need to place it in an actual device driver (CAN, UART, SPI, etc). All communication with the ISR should be facilitated by the device driver and device driver only. Variables shared between the ISR and the device driver should be declared static and volatile. If you need access to any of those variables externally, you create setters and getters as part of your public API for the driver. Check this answer out for a general guideline to follow.
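The encapsulation described above might be sketched as follows. The uart namespace, rx_isr, and the getters are hypothetical names chosen for illustration, not a real driver API, and on real hardware the ISR would read the byte from a peripheral register instead of taking it as a parameter.

```cpp
#include <cstdint>

// Hypothetical UART driver; all names here are illustrative.
namespace uart {
namespace {
// Shared between the ISR and the driver only: internal linkage plus volatile.
volatile std::uint32_t rx_count = 0;
volatile std::uint8_t last_byte = 0;
}  // namespace

// Would be installed as the receive interrupt handler on real hardware.
void rx_isr(std::uint8_t data) {
    last_byte = data;
    rx_count = rx_count + 1;
}

// Public getters: the only way other modules see the shared state.
std::uint32_t bytes_received() { return rx_count; }
std::uint8_t last_received() { return last_byte; }
}  // namespace uart
```

On a CPU whose word is narrower than the counter, the getter would also need to mask the interrupt briefly so it cannot read a half-updated value.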

Choosing CPU architecture for LLVM/CLANG

I am designing a TTL serial computer, and I am struggling to choose the architecture most suitable for an LLVM compiler backend (I want to be able to run any C++ software there). There will be no MMU, no multiplication/division, no hardware stack, and no interrupts.
I have 2 main options:
1) 8-bit memory, 8-bit ALU, 8-bit registers (~12-16). Memory address width 24 bit. So I will need to use 3 registers as IP and 3 registers for any memory location.
Needless to say that any address calculations would be pure pain to implement in compiler.
2) 24-bit memory, 24-bit ALU, 24-bit registers (~6-8). Flat memory, nice. The drawback is that, due to the serial nature of the design, each operation would take 3 times as many clocks, even when operating on booleans. 24-bit memory data width is expensive, and it is harder to implement in hardware in general.
The question is: do you think implementing all C++ features on this 8-bit, stackless hardware is possible, or do I need more complex hardware to get generated code of reasonable quality and speed?
I second the suggestion to use LCC. I used it in this homebrew 16-bit RISC project: http://fpgacpu.org/xsoc/cc.html .
I don't think it should make much difference whether you build the 8-bit variant and use 3 add-with-carries to increment IP, or the 24-bit variant and do the whole thing in hardware. You can hide the difference in your assembler.
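To make the "3 add-with-carries" concrete, here is a small C++ model of incrementing a 24-bit IP held in three 8-bit registers. Ip24, adc8, and increment are hypothetical names; the real machine would do this with three instructions that an assembler macro could hide.

```cpp
#include <cstdint>

// Hypothetical model of a 24-bit instruction pointer held in three 8-bit registers.
struct Ip24 {
    std::uint8_t lo, mid, hi;
};

// 8-bit add with carry in and carry out, the only adder the 8-bit variant has.
std::uint8_t adc8(std::uint8_t a, std::uint8_t b, bool& carry) {
    unsigned sum = static_cast<unsigned>(a) + b + (carry ? 1u : 0u);
    carry = sum > 0xFFu;
    return static_cast<std::uint8_t>(sum & 0xFFu);
}

// Increment the IP with three chained add-with-carry steps.
void increment(Ip24& ip) {
    bool carry = true;  // a carry-in of 1 performs the "+1"
    ip.lo = adc8(ip.lo, 0, carry);
    ip.mid = adc8(ip.mid, 0, carry);
    ip.hi = adc8(ip.hi, 0, carry);
}
```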
If you look at my article above, or at an even simpler CPU here: http://fpgacpu.org/papers/soc-gr0040-paper.pdf you will see you really don't need that many operators / instructions to cover the integer C repertoire. In fact there is an lcc utility (ops) to print the min operator set for a given machine.
For more information see my article on porting lcc to a new machine here: http://www.fpgacpu.org/usenet/lcc.html
Once I had ported lcc, I wrote an assembler, and it synthesized a larger repertoire of instructions from the basic ones. For example, my machine had load-byte-unsigned but not load-byte-signed, so I emitted this sequence:
lbs rd,imm(rs) ->
lbu rd,imm(rs)
lea r1,0x80
xor rd,r1
sub rd,r1
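The same xor-then-subtract trick, written out in C++ as a sketch; sign_extend_byte is a hypothetical name, and the 32-bit result type merely stands in for the machine's register width.

```cpp
#include <cstdint>

// Sign-extend an unsigned byte using only xor and subtract,
// mirroring the lbu / lea 0x80 / xor / sub sequence.
std::int32_t sign_extend_byte(std::uint8_t byte) {
    std::int32_t rd = byte;  // lbu: zero-extended load, 0..255
    std::int32_t r1 = 0x80;  // lea r1,0x80
    rd ^= r1;                // flip the sign bit of the byte
    rd -= r1;                // remove the bias: 0x80..0xFF map to -128..-1
    return rd;
}
```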
So I think you can get by with this min cover of operations:
registers
load register with constant
load rd = *rs
store *rs1 = rs2
+ - (w/ w/o carry) // actually can do + with - and ^
>> 1 // << 1 is just +
& ^ // (synthesize ~ from ^, | from & and ^)
jump-and-link rd,rs // rd = pc, pc = rs
skip-z/nz/n/nn rs // skip next insn on rs==0, !=0, <0, >=0
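The synthesized operators mentioned in the comments of the list can be checked with a few identities. This is a sketch: the Word alias assumes a 32-bit word for convenience, the function names are hypothetical, and synth_add is only one possible reading of "+ with - and ^".

```cpp
#include <cstdint>

using Word = std::uint32_t;  // assumed word type; the real machine may be 8 or 24 bits

// ~a from xor with an all-ones constant.
Word synth_not(Word a) { return a ^ 0xFFFFFFFFu; }

// a | b from & and ^: the xor holds bits set in exactly one operand, the and
// holds bits set in both, and the two sets never overlap.
Word synth_or(Word a, Word b) { return (a ^ b) ^ (a & b); }

// a << 1 from addition.
Word synth_shl1(Word a) { return a + a; }

// a + b from - and ^, using ~b == -b - 1 in modular arithmetic.
Word synth_add(Word a, Word b) { return a - (b ^ 0xFFFFFFFFu) - 1u; }
```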
Even simpler is to have no registers (or equivalently blur registers with memory -- all registers have a memory address).
Set aside a register for SP, and write the function prolog/epilog handler in the compiler and you won't have to worry about stack instructions. There's just code to store each of the callee save registers, adjust the SP by the frame size, and so forth.
Interrupts (and return from interrupts) are straightforward. All you need to do is force a jump-and-link instruction into the instruction register. If you chose the bit pattern for that to be something like 0, and put the right addresses into the source register rs (especially if it is r0), it can be done with a flip-flop reset input or an extra force-to-0 and gate. I use a similar trick in the second paper above.
Interesting project. I see a TTL / 7400 contest is underway and I was thinking myself of how simple a machine could you get away with and would it be cheating to add a 32 KB or 128 KB async SRAM to the machine to hold the code and data.
Anyway, happy hacking!
p.s.
1) You will want to decide how large each integral type is. You can certainly make char, short, int, long, long long, etc. the same size, one 24b word, if you wish, although it won't be compliant in min representation ranges.
2) And although I focused on lcc here, you were asking about C++. I recommend pursuing C first. Once you have things figured out for C, including *, /, % operators in software, etc., it should be more tractable to move to full blown C++ whether in LLVM or GCC. The difference between C and C++ is "only" the extra vtables and RTTI tables and code sequences (entirely built up out of the primitive C integer operator repertoire) required to handle virtual function calls, pointer to member dereference, dynamic casts, static constructors, exception handling, etc.
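As an illustration of a software * operator for a machine without a multiplier, here is a shift-and-add sketch that uses only +, >> 1, and &. soft_mul is a hypothetical name, and a compiler runtime library would provide its own routine.

```cpp
#include <cstdint>

// Shift-and-add multiply using only +, >> 1, and &.
// The doubling step is written as an addition, since << 1 is just +.
std::uint32_t soft_mul(std::uint32_t a, std::uint32_t b) {
    std::uint32_t result = 0;
    while (b != 0) {
        if (b & 1u) result += a;  // add the current multiple when the low bit is set
        a += a;                   // a << 1 done as an addition
        b >>= 1;
    }
    return result;
}
```

The loop runs once per significant bit of b, so the cost grows with operand width, which matters on a bit-serial design.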
IMHO, it is possible for a C compiler. I am not sure about C++, though.
LLVM/Clang could be a hard choice for an 8-bit computer.
Instead, try lcc first and LLVM etc. second. HTH.
Bill Buzbee succeeded in retargeting the lcc compiler for his Magic-1 (known as homebrewcpu).
Although the hardware design and construction of Magic-1 usually gets the most attention, the largest part of the project (by far) has been developing/porting the software. To this end, I've had to write an assembler and linker from scratch, retarget a C compiler, write and port the standard C libraries, write a simplified operating system and then port a more sophisticated one. It's been a challenge, but a fun one. I suppose I'm somewhat twisted, but I happen to enjoy debugging difficult problems. And, when the bug you're trying to track down could involve one or more of: hardware design flaw, loose or broken wire, loose or bad TTL chip, assembler bug, linker bug, compiler bug, C runtime library bug, or finally a bug in the program in question there's lot of opportunity for fun. Oh, and I also don't have the luxury of blaming the bugs on anyone else.
I'm continually amazed that the damn thing runs at all, much less runs as well as it does.
In my opinion, stackless hardware is already poorly suited for C and C++ code. If you have nested function calls, you will need to emulate a stack in software anyway, which of course is much slower.
When going the stackless route, you will probably allocate most of your variables as 'static', and have no re-entrant functions. In this case, 6502-style addressing modes can be effective. You could for example have these addressing modes:
Immediate address (24bit) as part of opcode
Immediate address (24bit) plus index register (8bit)
Indirect access: immediate 24bit address to memory, which contains the actual address
Indirect access: 24 bit address to memory, 8 bit index register added to value from memory.
The address modes outlined above would allow efficient access to arrays, structures and objects allocated at a constant address (static allocation). They would be less efficient (but still usable) for dynamically and stack-allocated objects.
You would also get some benefit from your serial design: usually the 24 bit + 8 bit addition does not take 24 cycles, but you can instead short-circuit the addition when carry is 0.
Instead of mapping the IP as registers directly, you could allow changing it only through goto/branch instructions, using the same address modes as above. Jumps into dynamically computed addresses are quite rare so it makes more sense to give the whole 24-bit address directly in the opcode.
I think that if you design the CPU carefully, you can use many C++ features quite efficiently. However, do not expect that any random C++ code would run fast on such a limited CPU.
The implementation is certainly possible, but I doubt it will be usable (at least for C++ code). As already noted, the first problem is the lack of a stack. Next, a lot of C++ relies heavily on dynamic memory allocation, and C++ "internal" structures are quite big.
So, as it seems to me, it will be better, if you:
Get rid of C++ requirement (or at least, limit yourself to some subset)
Use 24 bits, not 8 bits for everything (for registers as well)
Add hardware stack
You are not going to be able to run "any" C++ code there. For example fork(), system(), etc. Anything that clearly relies on interrupts for example. You can get a long way there, sure.
Now do you mean any programs that can/have been written in C++ or are you limiting yourself to the language only and not the libraries that are commonly associated with C/C++? The language itself is a much easier rule to live with.
I think the easier question/answer is: why not just try? What have you tried so far? It could be argued that the x86 is an 8-bit machine: no regard for alignment and many 8-bit instructions. The MSP430, a 16-bit platform with no MMU, was ported to LLVM to show how easily and quickly it could be done; I would like to see that platform with better support (this is not where my strengths lie, otherwise I would be doing it). It does have a stack and interrupts, sure, but you don't have to use them, and if you remove the library rules, what is left that needs an interrupt?
I would look at LLVM, but note that the documentation that shows how easy it is to port is dated and wrong, and you basically have to figure it out on your own from the compiler sources. lcc has a book and is known for that; it is not optimized. Its sources don't compile well on modern computers, so you always have to go backwards in time to use it; any time I go near it, after an evening of just trying to build it as is, I give up. vbcc is simple, clean, documented, and not unfriendly to smaller processors. Is it C++? I don't remember. Of all of them, it is the easiest to get a compiler up and running with, though. Of all of them, LLVM is the most attractive and most useful when all is said and done. Don't go near GCC or even think of it: duct tape and baling wire inside holding it together.
Have you invented your instruction set yet? Do you have a simulator and assembler yet? Look up lsasim on GitHub to find my instruction set. You can write an LLVM backend for mine as practice for yours...grin...(my vbcc backend is horrible, I need to start over)...
You have to have some idea of how the high level will be implemented but you really have to start with an instruction set and an instruction set simulator and an assembler of some sort. Then start hand converting C/C++ code into assembly for your instruction set, that should pretty quickly get you through "can I do this without a stack", etc. In this process define your calling convention, implement more C/C++ code by hand using your calling convention. THEN dig into a compiler and make a back end. I think you should consider vbcc as a stepping stone, then head for LLVM if it appears like it (the isa) will work.

General question: Adding new test code to embedded system

This may be off topic, but I am preparing for an exam on real-time systems, and I have been browsing the book and the Internet for an answer to a problem.
Basically, I wonder whether adding test code may change the real-time behavior of an embedded system, and whether it may introduce new errors.
Does anyone know the answer, or can you refer me to some reading material on it?
Your question is too general, so I guess the default answer would be: it depends. But considering the possibilities as an exercise in logic and thought, yes, it surely can!
There are many schemes available to guarantee the 'real-timeness' of an embedded system. For example, one can have a pre-emptive, timer-based ISR service the real-time task. In such a case, your test code might not affect the 'real-timeness'. But if the testing takes too long, and the context switches are not pre-emptive, you could get into trouble.
But again, it depends on what you're testing and how you're testing. Your test code can possibly mess with the timers, interrupts, or memory of the system. The possibilities for messing things up if you're not careful are endless.
Having an OS underneath will prevent some errors, but again, depending on how it works, you may or may not be saved from bad 'test code'.
Yes, when you add code (test, diagnostic, statistic) it may change the real-time behavior. Whether it actually changes the behavior depends on the design, the implementation, and the CPU power. You also have more lines of code, so the probability of errors may increase. But I wouldn't say "it will introduce errors", only that it can introduce errors.
Yes it can. See How can adding data to a segment in flash memory screw up a program's timing? for an example of how even adding non-executable code can adjust timing enough to screw up a system.
Yea, changing your code base could totally change its timing. Consider if you dumped some debug output to a serial port, it takes time to call that function, format the data, and if the function is synchronous, then for it to wait for data to go out. This kinda stuff definitely changes system timing behavior.
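A rough way to put numbers on the serial-port example, assuming the common 8N1 framing (10 bits on the wire per byte) and a synchronous write. The function name and parameters are illustrative. Under those assumptions, a 32-byte debug line at 115200 baud blocks for roughly 2.8 ms, which is far from negligible next to a 1 ms control loop.

```cpp
// Time in microseconds to push `bytes` through a UART, assuming 8N1 framing
// (1 start + 8 data + 1 stop = 10 bits per byte). Both parameters are illustrative.
double uart_tx_micros(unsigned bytes, unsigned baud) {
    const double bits = bytes * 10.0;
    return bits * 1e6 / baud;
}
```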

Can compiler optimization introduce bugs?

Today I had a discussion with a friend of mine and we debated for a couple of hours about "compiler optimization".
I defended the point that sometimes, a compiler optimization might introduce bugs or at least, undesired behavior.
My friend totally disagreed, saying that "compilers are built by smart people and do smart things" and thus, can never go wrong.
He didn't convince me at all, but I have to admit I lack real-life examples to strengthen my point.
Who is right here? If I am, do you have any real-life example where a compiler optimization produced a bug in the resulting software? If I'm mistaken, should I stop programming and learn fishing instead?
Compiler optimizations can introduce bugs or undesirable behaviour. That's why you can turn them off.
One example: a compiler can optimize the read/write access to a memory location, doing things like eliminating duplicate reads or duplicate writes, or re-ordering certain operations. If the memory location in question is only used by a single thread and is actually memory, that may be ok. But if the memory location is a hardware device IO register, then re-ordering or eliminating writes may be completely wrong. In this situation you normally have to write code knowing that the compiler might "optimize" it, and thus knowing that the naive approach doesn't work.
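A sketch of the IO-register case, with an ordinary variable standing in for the device register so the code runs anywhere. The register name and the command values are hypothetical.

```cpp
#include <cstdint>

// Stand-in for a memory-mapped register; on real hardware this would be a fixed
// address from the datasheet. The name and the command values are hypothetical.
std::uint32_t fake_control_register = 0;

// Two back-to-back writes to the same location. Through a volatile pointer the
// compiler must emit both, in order; through a plain pointer it may keep only the last.
void send_reset_then_start(volatile std::uint32_t* reg) {
    *reg = 0x1u;  // hypothetical RESET command
    *reg = 0x2u;  // hypothetical START command
}
```

With a plain (non-volatile) pointer the first store is a dead store from the language's point of view, which is exactly the kind of "correct" optimization that breaks a device protocol.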
Update: As Adam Robinson pointed out in a comment, the scenario I describe above is more of a programming error than an optimizer error. But the point I was trying to illustrate is that some programs, which are otherwise correct, combined with some optimizations, which otherwise work properly, can introduce bugs in the program when they are combined together. In some cases the language specification says "You must do things this way because these kinds of optimizations may occur and your program will fail", in which case it's a bug in the code. But sometimes a compiler has a (usually optional) optimization feature that can generate incorrect code because the compiler is trying too hard to optimize the code or can't detect that the optimization is inappropriate. In this case the programmer must know when it is safe to turn on the optimization in question.
Another example:
The linux kernel had a bug where a potentially NULL pointer was being dereferenced before a test for that pointer being null. However, in some cases it was possible to map memory to address zero, thus allowing the dereferencing to succeed. The compiler, upon noticing that the pointer was dereferenced, assumed that it couldn't be NULL, then removed the NULL test later and all the code in that branch. This introduced a security vulnerability into the code, as the function would proceed to use an invalid pointer containing attacker-supplied data. For cases where the pointer was legitimately null and the memory wasn't mapped to address zero, the kernel would still OOPS as before. So prior to optimization the code contained one bug; after it contained two, and one of them allowed a local root exploit.
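The shape of that bug, reduced to a few lines. This is an illustrative sketch with a hypothetical Device type, not the actual kernel code.

```cpp
struct Device { int flags; };  // hypothetical type for illustration

// Shape of the bug: the dereference comes first, so the compiler may assume
// dev is non-null and delete the check below as dead code.
int read_flags_unsafe(Device* dev) {
    int flags = dev->flags;         // undefined behaviour if dev is null
    if (dev == nullptr) return -1;  // may be optimized away
    return flags;
}

// Fix: test before touching the pointer.
int read_flags_safe(Device* dev) {
    if (dev == nullptr) return -1;
    return dev->flags;
}
```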
CERT has a presentation called "Dangerous Optimizations and the Loss of Causality" by Robert C. Seacord which lists a lot of optimizations that introduce (or expose) bugs in programs. It discusses the various kinds of optimizations that are possible, from "doing what the hardware does" to "trap all possible undefined behaviour" to "do anything that's not disallowed".
Some examples of code that's perfectly fine until an aggressively-optimizing compiler gets its hands on it:
Checking for overflow
// fails because the overflow test gets removed
if (ptr + len < ptr || ptr + len > max) return EINVAL;
Using overflow arithmetic at all:
// The compiler optimizes this to an infinite loop
for (i = 1; i > 0; i += i) ++j;
Clearing memory of sensitive information:
// the compiler can remove these "useless writes"
memset(password_buffer, 0, sizeof(password_buffer));
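For two of the cases above there are formulations that leave the optimizer nothing to remove. This is a sketch with hypothetical helper names; the volatile-pointer clear is a common workaround, and platform calls such as memset_s or explicit_bzero, where available, are the sturdier option.

```cpp
#include <climits>
#include <cstddef>

// Overflow test done before the addition, so no signed overflow ever occurs
// and there is nothing for the optimizer to assume away.
bool checked_add(int a, int b, int& out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) return false;
    out = a + b;
    return true;
}

// Clear a buffer through a volatile pointer so that each store counts as an
// observable access the compiler has to keep.
void secure_clear(void* buffer, std::size_t size) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buffer);
    while (size--) *p++ = 0;
}
```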
The problem here is that compilers have, for decades, been less aggressive in optimization, and so generations of C programmers learn and understand things like fixed-size twos complement addition and how it overflows. Then the C language standard is amended by compiler developers, and the subtle rules change, despite the hardware not changing. The C language spec is a contract between the developers and compilers, but the terms of the agreement are subject to change over time and not everyone understands every detail, or agrees that the details are even sensible.
This is why most compilers offer flags to turn off (or turn on) optimizations. Is your program written with the understanding that integers might overflow? Then you should turn off overflow optimizations, because they can introduce bugs. Does your program strictly avoid aliasing pointers? Then you can turn on the optimizations that assume pointers are never aliased. Does your program try to clear memory to avoid leaking information? Oh, in that case you're out of luck: you either need to turn off dead-code-removal or you need to know, ahead of time, that your compiler is going to eliminate your "dead" code, and use some work-around for it.
When a bug goes away by disabling optimizations, most of the time it's still your fault
I am responsible for a commercial app, written mostly in C++ - started with VC5, ported to VC6 early, now successfully ported to VC2008. It grew to over 1 Million lines in the last 10 years.
In that time I could confirm a single code generation bug that occurred when aggressive optimizations were enabled.
So why am I complaining? Because in the same time, there were dozens of bugs that made me doubt the compiler - but it turned out to be my insufficient understanding of the C++ standard. The standard makes room for optimizations the compiler may or may not make use of.
Over the years on different forums, I've seen many posts blaming the compiler, ultimately turning out to be bugs in the original code. No doubt many of them obscure bugs that need a detailed understanding of concepts used in the standard, but source code bugs nonetheless.
Why I reply so late: stop blaming the compiler before you have confirmed it's actually the compiler's fault.
Compiler (and runtime) optimization can certainly introduce undesired behaviour - but it at least should only happen if you're relying on unspecified behaviour (or indeed making incorrect assumptions about well-specified behaviour).
Now beyond that, of course compilers can have bugs in them. Some of those may be around optimisations, and the implications could be very subtle - indeed they're likely to be, as obvious bugs are more likely to be fixed.
Assuming you include JITs as compilers, I've seen bugs in released versions of both the .NET JIT and the Hotspot JVM (I don't have details at the moment, unfortunately) which were reproducible in particularly odd situations. Whether they were due to particular optimisations or not, I don't know.
To combine the other posts:
Compilers do occasionally have bugs in their code, like most software. The "smart people" argument is completely irrelevant to this, as NASA satellites and other apps built by smart people also have bugs. The coding that does optimization is different coding from that which doesn't, so if the bug happens to be in the optimizer then indeed your optimized code may contain errors while your non-optimized code will not.
As Mr. Shiny and New pointed out, it's possible for code that is naive with regard to concurrency and/or timing issues to run satisfactorily without optimization yet fail with optimization as this may change the timing of execution. You could blame such a problem on the source code, but if it will only manifest when optimized, some people might blame optimization.
Just one example: a few days ago, someone discovered that gcc 4.5 with the option -foptimize-sibling-calls (which is implied by -O2) produces an Emacs executable that segfaults on startup.
This has apparently been fixed since.
I've never heard of or used a compiler whose directives could not alter the behaviour of a program. Generally this is a good thing, but it does require you to read the manual.
AND I had a recent situation where a compiler directive 'removed' a bug. Of course, the bug is really still there but I have a temporary workaround until I fix the program properly.
Yes. A good example is the double-checked locking pattern. In C++ before C++11 there was no portable way to implement double-checked locking safely, because the compiler can re-order instructions in ways that make sense in a single-threaded system but not in a multi-threaded one. A full discussion can be found at http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
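The linked paper dates from 2004. Since C++11 the language has a memory model, and the pattern can be written with std::atomic so that neither the compiler nor the CPU may reorder the critical steps. A minimal sketch, with Config, g_instance, and get_config as hypothetical names:

```cpp
#include <atomic>
#include <mutex>

struct Config { int value = 42; };  // hypothetical singleton payload

std::atomic<Config*> g_instance{nullptr};
std::mutex g_init_mutex;

// Double-checked locking with C++11 atomics: the acquire load pairs with the
// release store, so a thread that sees the pointer also sees the constructed object.
Config* get_config() {
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_init_mutex);
        p = g_instance.load(std::memory_order_relaxed);  // re-check under the lock
        if (p == nullptr) {
            p = new Config();  // intentionally never freed in this sketch
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

In most code a function-local static is simpler still, because C++11 guarantees that its initialization is thread-safe.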
Is it likely? Not in a major product, but it's certainly possible. Compiler optimizations are generated code; no matter where code comes from (you write it or something generates it), it can contain errors.
I encountered this a few times with a newer compiler building old code. The old code would work but relied on undefined behavior in some cases, like improperly defined / cast operator overload. It would work in VS2003 or VS2005 debug build, but in release it would crash.
Opening up the assembly generated it was clear that the compiler had just removed 80% of the functionality of the function in question. Rewriting the code to not use undefined behavior cleared it up.
More obvious example: VS2008 vs GCC
Declared:
Function foo( const type & tp );
Called:
foo( foo2() );
where foo2() returns an object of class type;
Tends to crash in GCC because the object isn't allocated on the stack in this case, but VS does some optimization to get around this and it will probably work.
Aliasing can cause problems with certain optimizations, which is why compilers have an option to disable those optimizations. From Wikipedia:
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", allows impressive increases in performance[citation needed], but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[1] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[2] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted or invalid optimizations that could produce incorrect code.
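A small illustration of staying inside the strict aliasing rule: copying the bytes with memcpy instead of reading a float through an integer pointer. float_bits is a hypothetical name, and the sketch assumes a 32-bit float (checked at compile time).

```cpp
#include <cstdint>
#include <cstring>

// Reads the bits of a float without violating strict aliasing.
// Assumes float is 32 bits wide, which the static_assert checks.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined byte copy
    return bits;
    // The aliasing-violating form would be: *reinterpret_cast<std::uint32_t*>(&f)
}
```

Optimizing compilers typically turn the memcpy into a single register move, so the well-defined version usually costs nothing.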
Yes, compiler optimizations can be dangerous. Hard real-time software projects usually forbid optimizations for this very reason. Anyway, do you know of any software with no bugs?
Aggressive optimizations may cache your variables or even make strange assumptions about them. The problem is not only the stability of your code; they can also fool your debugger. Several times I have seen a debugger fail to show the memory contents because some optimization kept a variable's value in the registers of the micro.
The very same thing can happen to your code. The optimization puts a variable into a register and does not write to the variable until it has finished. Now imagine how different things can be if your code has pointers to variables on your stack and it has several threads.
It's theoretically possible, sure. But if you don't trust the tools to do what they are supposed to do, why use them? But right away, anyone arguing from the position of
"compilers are built by smart people
and do smart things" and thus, can
never go wrong.
is making a foolish argument.
So, until you have reason to believe that a compiler is doing so, why posture about it?
I certainly agree that it's silly to say that because compilers are written by "smart people" they are therefore infallible. Smart people designed the Hindenburg and the Tacoma Narrows Bridge, too. Even if it's true that compiler-writers are among the smartest programmers out there, it's also true that compilers are among the most complex programs out there. Of course they have bugs.
On the other hand, experience tells us that the reliability of commercial compilers is very high. Many, many times someone has told me that the reason his program doesn't work MUST be a bug in the compiler, because he has checked it very carefully and he is sure that it is 100% correct ... and then we find that in fact the program has an error and not the compiler. I'm trying to think of times that I've personally run across something that I was truly sure was an error in the compiler, and I can only recall one example.
So in general: Trust your compiler. But are they ever wrong? Sure.
It can happen. It has even affected Linux.
As I recall, early Delphi 1 had a bug where the results of Min and Max were reversed. There was also an obscure bug with some floating point values only when the floating point value was used within a dll. Admittedly, it has been more than a decade, so my memory may be a bit fuzzy.
I have had a problem in .NET 3.5: if you build with optimization and add another variable to a method, named similarly to an existing variable of the same type in the same scope, then one of the two (new or old variable) will not be valid at runtime, and all references to the invalid variable are replaced with references to the other.
So, for example, if I have abcd of MyCustomClass type and I have abdc of MyCustomClass type and I set abcd.a=5 and abdc.a=7 then both variables will have property a=7. To fix the issue both variables should be removed, the program compiled (hopefully without errors) then they should be re-added.
I think I have run into this problem a few times with .NET 4.0 and C# when doing Silverlight applications also. At my last job we ran into the problem quite often in C++. It might have been because the compilations took 15 minutes so we would only build the libraries we needed, but sometimes the optimized code was exactly the same as the previous build even though new code had been added and no build errors had been reported.
Yes, code optimizers are built by smart people. They are also very complicated so having bugs is common. I suggest fully testing any optimized release of a large product. Usually limited use products are not worth a full release, but they should still be generally tested to make sure they perform their common tasks correctly.
Compiler optimization can reveal (or activate) dormant (or hidden) bugs in your code. There may be a bug in your C++ code that you don't know of, that you just don't see. In that case, it is a hidden or dormant bug, because that branch of the code is not executed [often enough].
The likelihood of a bug in your code is much bigger (thousands of times more) than a bug in the compiler's code, because compilers are tested extensively: by TDD, plus in practice by all the people who have used them since their release. So it is very unlikely that a bug is discovered by you and was not discovered in the literally hundreds of thousands of times the compiler has been used by other people.
A dormant or hidden bug is just a bug that has not revealed itself to the programmer yet. People who can claim that their C++ code does not have (hidden) bugs are very rare. It requires C++ knowledge (very few can claim that) and extensive testing of the code. It is not just about the programmer, but about the code itself (the style of development). Being bug-prone is in the character of the code (how rigorously it is tested) and/or the programmer (how disciplined they are in testing and how well they know C++ and programming).
Security+Concurrency bugs: This is even worse if we include concurrency and security problems as bugs. But after all, these 'are' bugs. Writing code that is bug-free in the first place in terms of concurrency and security is almost impossible. That's why there is always already a bug in the code, which can be revealed (or hidden) by compiler optimization.
More, and more aggressive, optimizations could be enabled if the program you compile has a good test suite. Then it is possible to run that suite and be somewhat more sure the program operates correctly. Also, you can prepare your own tests that closely match what you plan to do in production.
It is also true that any large program may have (and probably does have) some bugs regardless of which switches you use to compile it.
I work on a large engineering application, and every now and then we see release-only crashes and other problems reported by clients. Our code has 37 files (out of around 6000) where we have this at the top of the file, to turn off optimization to fix such crashes:
#pragma optimize( "", off)
(We use Microsoft Visual C++ native, 2015, but it is true for just about any compiler, except maybe Intel Fortran 2016 update 2 where we have not yet turned off any optimizations.)
If you search through the Microsoft Visual Studio feedback site you can find some optimization bugs there as well. We occasionally log some of ours (if we can reproduce it easily enough with a small section of code and are willing to take the time) and they do get fixed, but sadly others get introduced again.
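For what it's worth, the same idea can be narrowed from a whole file to a single function. The sketch below is only an illustration: `fragile_sum` is a made-up stand-in for whatever function misbehaves in release builds. It uses the MSVC `optimize` pragma mentioned above and, as an assumption about how one might do the equivalent elsewhere, GCC's `push_options` / `optimize` / `pop_options` pragmas. Other compilers simply compile the function normally.

```c
#include <assert.h>
#include <stddef.h>

/* Switch optimization off for the code that follows. */
#if defined(_MSC_VER)
#pragma optimize("", off)
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif

/* Hypothetical function that only fails when optimized. */
int fragile_sum(const int *values, int count)
{
    int total = 0;
    assert(values != NULL || count <= 0);
    for (int i = 0; i < count; ++i)
        total += values[i];
    return total;
}

/* Restore the optimization settings given on the command line. */
#if defined(_MSC_VER)
#pragma optimize("", on)
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Keeping the unoptimized region this small makes it easier to cut the problem down to a reproducible case for a bug report.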
Compilers are programs written by people, and any big program has bugs, trust me on that. The compiler optimization options most certainly have bugs, and turning on optimization can certainly introduce bugs in your program.
Everything that you can possibly imagine doing with or to a program will introduce bugs.
Because of exhaustive testing and the relative simplicity of actual C++ code (C++ has under 100 keywords / operators), compiler bugs are relatively rare. Bad programming style is often the only thing that encounters them, and usually the compiler will crash or produce an internal compiler error instead. The only exception to this rule is GCC. GCC, especially older versions, had a lot of experimental optimizations enabled in O3 and sometimes even the other O levels. GCC also targets so many backends that this leaves more room for bugs in its intermediate representation.
I had a problem with .net 4 yesterday with something that looks like...
double x=0.4;
if(x<0.5) { below5(); } else { above5(); }
And it would call above5(); But if I actually use x somewhere, it would call below5();
double x=0.4;
if(x<0.5) { below5(); } else { System.Console.Write(x); above5(); }
Not the exact same code but similar.

In vxworks, should every task be spawned with VX_FP_TASK option?

The VX_FP_TASK option is required if your task uses any floating point operations. But how does one predict the future? I mean, how can one know whether he/she will use float or not?
While fixing any bug or introducing new code, should the programmer find out which tasks will be affected by his/her code change, and whether each of those tasks is spawned with this option? This is very tedious. Am I missing something?
VX_FP_TASK forces the task context switch to include the FP registers, which increases context switch time. If your application's timing deadlines and performance targets can be met even with this overhead, then I suggest there is little problem in doing this. Not having VX_FP_TASK might be considered an optimisation to be applied with care only if and when necessary. So if the default case is to use VX_FP_TASK, you will probably have less checking to do in the few cases where you might need to optimise performance, since often optimisation is unnecessary to achieve the required results. If the context switch performance overhead this imposes makes or breaks your project, it may be marginal in any case.
On the other hand, although FPUs are becoming more common in embedded systems, it is also common for embedded systems designers to use FP as the exception rather than the rule because of the traditional lack of hardware FP support. One solution is therefore to have an in-house design rule that floating point shall not be used without formal justification and sign-off: i.e. use of floating point must be in the design, rather than a programmer decision. Checking is generally a simple case of scanning the source for float, double, and math.h (since it is probably difficult to use floating point without any of these occurring in the code). You might for example add a pre-build static analysis check that looks for these and flags a warning.
In many applications it is possible to design so that FP math operations are naturally confined to specific tasks. A problem occurs however when someone chooses to use an existing function intended for use in one of these tasks in another that is not FP safe. This may be difficult to spot; a solution to this is to have functions that use floating point, and which may be used in other tasks, include a debug ASSERT that tests the task options using taskOptionsGet().
So a combination of scanning for use of float, double, and math.h, and adding an ASSERT check to the functions that use these will probably protect you from introducing errors in code maintenance.
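The core of such a pre-build scan could be sketched as follows. This is only an illustration under stated assumptions: the function names are invented here, the source file is assumed to have been read into a string already, and no attempt is made to skip comments or string literals, so it may flag false positives. It reports whether the text mentions `float` or `double` as whole words, or `math.h` anywhere.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* True if `word` occurs in `src` as a whole identifier, so that
   "float" matches but "floaty" or "my_float" do not. */
bool contains_word(const char *src, const char *word)
{
    size_t len = strlen(word);
    for (const char *p = strstr(src, word); p != NULL; p = strstr(p + 1, word)) {
        bool start_ok = (p == src) || !is_ident_char(p[-1]);
        bool end_ok = !is_ident_char(p[len]);
        if (start_ok && end_ok)
            return true;
    }
    return false;
}

/* True if the source text appears to use floating point. */
bool source_mentions_fp(const char *src)
{
    assert(src != NULL);
    return contains_word(src, "float")
        || contains_word(src, "double")
        || strstr(src, "math.h") != NULL;
}
```

A build step could call `source_mentions_fp()` on each file and print a warning for any file that is not on an approved list.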
[added 2010Feb14]
As much as complex macros are generally a bad thing, I suggest that the following may be useful (as alluded to above):
#ifdef NDEBUG
#define ASSERT_FP_SAFE() ((void) 0)
#else
/* Assert that the calling task was spawned with VX_FP_TASK. */
#define ASSERT_FP_SAFE() do{ int opt; \
STATUS st = taskOptionsGet( taskIdSelf(), &opt ); \
assert( st == OK && (opt & VX_FP_TASK) != 0 ) ; \
}while(0)
#endif
This macro should be inserted in any function that uses float or double, or which includes <math.h> or any other FP dependent library you may use (which you can achieve by textual search). The assertion will then fail when such a function is called from a non-FP task.
Note that the check of the return value from taskOptionsGet() will catch use of floating point in interrupt contexts, although if the assert occurs in an interrupt, you may not get any output. A call to logMsg() may be safer perhaps; you could use that if st != OK and assert() otherwise.
Unfortunately it is a run-time assertion, so the code has to run for it to be checked. It would be better if it could be detected through static analysis, but I cannot think of a simple method. If however you also use code coverage analysis, then this may be sufficient. It may be a good habit even if you do choose to make all tasks VX_FP_TASK; that way if anyone forgets to do one or the other, you have a chance of catching it.
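The suggestion to log when the options cannot be read and to assert otherwise can be separated into a small decision function that is testable off-target. This is a sketch only: the enum and function names are hypothetical and not part of any VxWorks API, and the FP option bit is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>

/* Hypothetical outcome codes for illustration. */
typedef enum {
    FP_CHECK_PASS,      /* task has the FP option set */
    FP_CHECK_LOG_ONLY,  /* options could not be read: log instead of asserting */
    FP_CHECK_ASSERT     /* task lacks the FP option: fail the assertion */
} fp_check_action;

/* status_ok: nonzero if reading the task options succeeded.
   options:   the option word that was read (ignored when status_ok is 0).
   fp_flag:   the bit that marks an FP task (VX_FP_TASK on the target). */
fp_check_action fp_check_decide(int status_ok, int options, int fp_flag)
{
    assert(fp_flag != 0);
    if (!status_ok)
        return FP_CHECK_LOG_ONLY;
    return (options & fp_flag) != 0 ? FP_CHECK_PASS : FP_CHECK_ASSERT;
}
```

On the target, the inputs would come from `taskOptionsGet(taskIdSelf(), &opt)` and `VX_FP_TASK`; the caller would use `logMsg()` for `FP_CHECK_LOG_ONLY` and `assert()` for `FP_CHECK_ASSERT`.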
From experience I can give you a simple answer: Always spawn a task with VX_FP_TASK. Especially if your code could be used for different architectures.
Depending on the compiler (gnu, diab), the compiler flags that you use, and the architecture, the floating point registers can be used for more than just floating point operations. In most architectures FP registers are bigger than regular registers, so they become perfect candidates for optimizing code.
For example, in PPC603 processors, if you use C++ instead of plain C, the FP registers will be used for optimization, and if you don't have VX_FP_TASK enabled on that task it could corrupt the FP registers of another task, even though it's not making any calculations!
Correct execution is more important than performance, and most times the performance gain doesn't justify the risk introduced by not enabling it.
If you want to ensure that all tasks have the flag enabled, consider adding a hook with taskCreateHookAdd() that always enables the flag during task creation.
ALWAYS use VX_FP_TASK! The cost of not having it, and trying to track down the erratic errors which result is unbelievably expensive.