Equivalents to gcc/clang's march=native in other compilers? - cmake

I'd like to know if there are compilers other than gcc and clang that provide something like an -march=native option, and if so, what that option is. I already understand from another question (Automatically building for best available platform in visual c++ (equivalent to gcc's -march=native)) that Microsoft's compilers do not have such an option (unless it is implied by the option that activates the SSE2 instruction set, which covers everything up to but excluding AVX and higher, at least).
The use case is simple: provide a cmake set-up, and thus give the user an option, to activate and build with support for all the "intrinsics" his or her CPU supports. We currently have detection logic for the specific intrinsics we target (e.g. SSE4.2 and/or PCLMUL on x86), but that logic will probably become very complex once more platforms and compilers have to be taken into consideration. Simplifying it could lead to situations where the compiler starts to use unsupported instruction sets outside of the intended places protected by runtime checks.

Currently, the Microsoft Visual C++ compiler doesn't provide a flag equivalent to -march=native. You'll have to figure out the appropriate flags manually or with a script before building the code.
As for the Intel C++ compiler, the -xHost (Linux/macOS) and /QxHost (Windows) flags serve essentially the same purpose.
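For completeness, the runtime checks the question mentions usually boil down to dispatch like this minimal sketch (x86 with GCC or Clang, which provide __builtin_cpu_supports; the messages are placeholders):

    #include <cstdio>

    // Pick a code path at run time, so code built with extra instruction
    // sets is only reached on CPUs that actually report them.
    int main()
    {
        if (__builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("pclmul"))
            std::puts("using the SSE4.2 + PCLMUL code path");
        else
            std::puts("using the portable fallback");
        return 0;
    }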

Related

How to make a normal C library work in an embedded environment?

I was recently asked about how to use a C library (Cello in this case) in an embedded environment, but I'm not sure how to go about that.
Is it correct to say that if a library can be compiled in the embedded environment, it can be used?
Should I care about making the library more lightweight or something like that?
Any suggestions are appreciated.
Getting it to compile is the bare minimum. Notably, most embedded systems are freestanding systems, such as microcontroller and RTOS applications. Compilers for freestanding systems need not provide all standard library headers; the only mandatory ones are (C17 4/6):
<float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>,
<stddef.h>, <stdint.h>, <stdnoreturn.h>
In addition, the embedded system need not support floating point arithmetic. Some systems implement software floating point support, but using that is very bad practice. If your MCU does not have an FPU, you should not be using floating point arithmetic, or you picked the wrong MCU for the task, period.
"I need to represent this number with decimals internally or to the user" is not a valid reason for using floating point. Fixed point arithmetic should be used for that. You only need floating point if you are to use math libraries like math.h and more advanced math.
Traditionally, embedded system compilers have been slow to adopt the latest C standard. It's been quite a while since the C11 release now though, so at the moment all useful compilers have caught up with it (C17 only contains minor changes, so we can likely ignore that one). Historically, embedded compilers have been horribly bad at this though, so remain sceptical. There shouldn't be any reason to pick a compiler without C11 support for new product development.
Summary for getting the lib to compile (bare minimum):
Does the library use hosted system headers, and if so does the embedded compiler support them?
Does the library use floating point, and if so, does the target system have an FPU, or at least a software floating point lib?
Does the library rely on the latest C standards and if so does the embedded compiler support them?
With that out of the way, you have to consider whether the library is at all written to be portable. Did they take care with things like integer types, enums and alignment? Are they using stdint.h, or are they using "sloppy typing" int all over the place? Did they consider endianness? Is the lib using dynamic allocation, which is banned in most embedded systems? Is it compatible with industry standards like MISRA-C? And so on.
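As a small illustration of the stdint.h and endianness points, writing out multi-byte values explicitly keeps the wire format identical regardless of host byte order (a sketch, not code from any particular library):

    #include <stdint.h>

    /* Write a 32-bit value in little-endian order, independent of host endianness. */
    static void put_u32_le(uint8_t out[4], uint32_t v)
    {
        out[0] = (uint8_t)(v & 0xFFu);
        out[1] = (uint8_t)((v >> 8) & 0xFFu);
        out[2] = (uint8_t)((v >> 16) & 0xFFu);
        out[3] = (uint8_t)((v >> 24) & 0xFFu);
    }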
Then there are optimizations to consider on top of that. Optimizing code for microcontrollers is very different from optimizing code for PC CPUs.
A brief glance at the various "compiler switches" (#ifdef) present usually gives a clue as to how portable the code is. Looking (very briefly) at this Cello lib, they seem to have considered porting between mainstream x86 systems, but that's it. You would have to rewrite pretty much the whole lib if you were to port it to an embedded system. The work effort depends on how alien the target CPU is compared to x86. Porting to a high-end Cortex-A with little endian might not require much effort. Porting to some low-end crap MCU would require a monumental effort.
Code portability is a big topic and requires very competent C programmers. To make the very same code run on, for example, an x86-64 and a crappy 8-bit MCU is not a trivial task.
Professional libs like protocol stacks usually come with a system port for a specific MCU, where they have taken not just generic portability into account, but also the specific system.
Not all libraries that can be compiled can be used in embedded environments. Libraries that use malloc and free (or their C++ counterparts) are dangerous and should therefore be handled with care. These libraries can result in non-deterministic behaviour when memory allocations fail.
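A common alternative is a fixed, statically allocated pool, sketched here with made-up names, so that running out of memory becomes an explicit condition the caller handles rather than an allocation failing at an unpredictable moment:

    #include <stddef.h>

    #define POOL_BLOCKS 16
    #define BLOCK_SIZE  64

    static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
    static unsigned char pool_used[POOL_BLOCKS];

    static void *pool_alloc(void)
    {
        for (size_t i = 0; i < POOL_BLOCKS; ++i)
            if (!pool_used[i]) { pool_used[i] = 1; return pool[i]; }
        return NULL;   /* out of blocks: the caller must handle this explicitly */
    }

    static void pool_free(void *p)
    {
        for (size_t i = 0; i < POOL_BLOCKS; ++i)
            if (p == pool[i]) { pool_used[i] = 0; return; }
    }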
It is possible that the standard C library could be compiled in its entirety for embedded devices, but that doesn't mean you'll have much use for printf or scanf. So a better question than whether you can compile it is whether you should use it. Cello seems like a fun experiment but isn't a stable platform to develop something real on. It can be done though, and an example of that is the Espruino.
Most of the time it is a bad idea to rewrite a library to be 'lightweight' or, more importantly in an embedded environment, statically allocated. You are probably not as smart as those people, or you won't put in the time needed to create a complete, functional embedded fork that is as stable as the original or even better. Don't be dissuaded from a fun little side project, but don't depend on it for a real project.
Another problem could be that the library is too big for your microcontroller. The ATmega32A only has 32 KB of programmable flash. To take a C++ example off the top of my head: Boost won't fit in that space, for all the highly usable tools it provides.

Is C++/CLI optimized?

If I write a program in C++/CLI / managed C++, does the compiler perform any optimizations?
I know that for C#, there are some optimizations done at compile time, with most optimizations being done by the JIT. Is the same true for C++/CLI?
A similar question: can I do the equivalent of an -O2 flag for C++/CLI? I already know about the "-c Release" flag, but I'm unclear on what kind of optimizations it enables.
Thanks!
C++/CLI code is always optimized in the Release build, yes. By whom is the key question: you can freely mix native and managed code as you dare. This tends to go wrong when too much native C++ code gets compiled to MSIL. It's hard to notice, since the code generator can handle any compliant C++03 code and rarely squeals about C++1x incantations.
A good reminder that the jitter isn't that much different from a C++ compiler's back-end. MSIL compares pretty well to, say, the IR that LLVM needs. The IR that the MSVC++ compiler uses for native code isn't documented and isn't visible.
Which makes it good practice to isolate the native C++ you wrap in its own static library or DLL. But mixing at the function level is possible; you can switch back and forth with #pragma managed/unmanaged.
So it is much like you'd guess: #pragma unmanaged code gets the full optimizer love, and #pragma managed code gets optimized at runtime by the jitter. You'll find the jitter optimizations documented in this post.
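For illustration, a mixed-mode translation unit compiled with /clr might be structured like this minimal sketch (the function names are invented):

    // Compiled to native machine code: gets the full native optimizer.
    #pragma managed(push, off)
    double dot(const double* a, const double* b, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += a[i] * b[i];
        return s;
    }
    #pragma managed(pop)

    // Compiled to MSIL: optimized later, at run time, by the jitter.
    double DotManaged(array<double>^ a, array<double>^ b)
    {
        pin_ptr<double> pa = &a[0];
        pin_ptr<double> pb = &b[0];
        return dot(pa, pb, a->Length);
    }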
When generating native code, the C++/CLI compiler supports the same optimizations as Microsoft's native C++ compiler.
When generating MSIL, the C++/CLI compiler supports a smaller number of optimizations (but still more than C#), and then another optimization pass takes place during JIT (same JIT and same JIT-time optimizations as apply to C#).
For example, loop unrolling is possible when generating MSIL, but auto-vectorization is not, because MSIL doesn't have SIMD instructions. Vectorization may theoretically still be done by the JIT, but in practice the resource constraints of a JIT mean that optimization is less effective.
In addition, there are some optimizations possible for C++ but not C# due to language design. For example, C++ templates (including in C++/CLI) are compiled separately for each combination of template arguments, while .NET generics (including in C# and in C++/CLI) are compiled once and resolved only against their generic constraints.
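As a tiny sketch of that difference (the function is invented for illustration): each distinct set of template arguments gives the optimizer a separately compiled body, whereas a .NET generic has a single body that can rely only on its constraints.

    template <typename T>
    T square(T x) { return x * x; }   // a separate body is compiled per T

    int    a = square(3);             // instantiates and optimizes square<int>
    double b = square(3.0);           // instantiates and optimizes square<double>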

Is it possible to embed LLVM Interpreter in my software and does it make sense?

Suppose I have a piece of software and I want to make cross-platform plugins. You compile the plugin for a virtual machine, and any platform running my software would be able to run this code.
I am wondering if it is possible to use the LLVM interpreter and bitcode for this purpose. Also, I am wondering whether it makes sense to use LLVM for this purpose instead of something else, i.e. is this what LLVM was made for?
I'm not sure that LLVM was designed for it. However, I doubt there is anything that hasn't been done using LLVM¹.
Other virtual-machine-based script engines are created specifically for the job:
Lua is very popular
Wikipedia lists some other Extension/embeddable languages under the Scripting language entry
If you're looking for embeddable virtual machines:
IKVM supports embedding JVM and CLR in a bridged mode (interoperable)
Parrot supports embedding (and includes a Python interpreter; mind you, you can just run python bytecode images)
Perl has similar architecture and supports embedding
JavaScript supports embedding (not sure about the architecture of V8, but I guess it would use a virtual machine)
Mono's CLR engine supports embedding: http://www.mono-project.com/Embedding_Mono (see the sketch after this list)
¹ including compiling C++ code to JavaScript to run in your browser...
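Regarding the Mono option above, a minimal embedding sketch along the lines of the documented embedding API (mono/jit/jit.h; the assembly name is a placeholder and error handling is omitted):

    #include <mono/jit/jit.h>
    #include <mono/metadata/assembly.h>

    int main(int argc, char* argv[])
    {
        MonoDomain*   domain   = mono_jit_init("host");   // start the runtime
        MonoAssembly* assembly = mono_domain_assembly_open(domain, "plugin.dll");
        if (assembly)
            mono_jit_exec(domain, assembly, argc, argv);   // run its entry point
        mono_jit_cleanup(domain);
        return 0;
    }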
There is VMIR (https://github.com/andoma/vmir), which is an LLVM bitcode interpreter / JIT engine that's intended to be embedded into other apps.
Disclaimer: I'm the author of it, and it's still work-in-progress but works reasonably well.
In theory, there exists a limited subset of LLVM IR which can be portable across various platforms. You shall not specify alignments, you shall not bitcast pointers to integral types, you must avoid intrinsics, etc. Which means you can't immediately use code generated by a stock C compiler (llvm-gcc, Clang, whatever), unless you specify a limited target for it and implement sanitising LLVM passes. Another issue is that the bitcode format from different LLVM versions is not guaranteed to be compatible.
In practice, I would not go there. Mono is a reasonably small, embeddable, fast VM, and the whole .NET stack of tools is available for it. The VM itself is pretty low-level (as long as you do not care about verifiability).
LLVM includes an interpreter, so if you can build this interpreter for your target platforms, you can then evaluate LLVM bitcode on the fly.
It's apparently not so fast though.
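Roughly, using the interpreter from C++ looks like the following sketch (the exact headers and EngineBuilder API differ between LLVM versions, and the bitcode file and entry-point name here are invented):

    #include <llvm/ExecutionEngine/ExecutionEngine.h>
    #include <llvm/ExecutionEngine/GenericValue.h>
    #include <llvm/ExecutionEngine/Interpreter.h>
    #include <llvm/IR/LLVMContext.h>
    #include <llvm/IR/Module.h>
    #include <llvm/IRReader/IRReader.h>
    #include <llvm/Support/SourceMgr.h>
    #include <memory>
    #include <vector>

    int main()
    {
        llvm::LLVMContext ctx;
        llvm::SMDiagnostic err;
        std::unique_ptr<llvm::Module> mod = llvm::parseIRFile("plugin.bc", err, ctx);
        if (!mod)
            return 1;

        // Ask for the interpreter rather than the JIT, so no native code is generated.
        llvm::ExecutionEngine* ee = llvm::EngineBuilder(std::move(mod))
                                        .setEngineKind(llvm::EngineKind::Interpreter)
                                        .create();
        llvm::Function* fn = ee->FindFunctionNamed("plugin_entry");
        std::vector<llvm::GenericValue> args;
        ee->runFunction(fn, args);   // evaluate the bitcode on the fly
        delete ee;
        return 0;
    }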
In the classic discussion about LLVM vs libJIT (one that you do not want to miss if you're a fan of open source, LLVM and compilers), which happened long before LLVM became famous and established, the author of libJIT, Rhys Weatherley, raised this particular issue. He stated that LLVM is not suitable for embedding, while Chris Lattner, the author of LLVM, stated otherwise: it is modular and you can use it in any possible fashion, including embedding only the parts you need.

Can I use variadic templates (but none of the other c++0x features) in g++?

The thinking is that since variadic templates are a compile-time feature, there will be little ABI impact or runtime behaviour change. Is this possible?
I specifically want the benefit of faster compile times for boost::mpl::vector and boost::mpl::string.
Rephrasing the question...
Is it possible to mix c++03 and c++11 code by separating them into libraries? I.e. we use quite a few 3rd-party c++ libraries which are compatible with gcc 4.3, but we are moving on to gcc 4.7 and intend to use c++11 features where possible / where it makes sense. Or is it impossible to mix c++11 and c++03?
You should compile and link everything using the same tools running in compatible modes. You can't cherry-pick features like this.
The ABI impact comes in, for example, as increased virtual function tables for standard I/O classes. It is not safe to mix things around.
I can't give a qualified answer, but from what I understand, lots of people would be concerned if this kind of backward compatibility were broken. As far as I understand, there is nothing in the new C++11 that makes it necessary to rebuild everything. Thus, it could only be your specific compiler that would make that necessary. For GCC I don't expect it, although the different libstdc++ versions could create "issues".
My strong guess is that on a typical (Intel) Linux you should be able to create two independent libs with different, decently new versions of gcc (maybe >4.x) and use/link them into a final program. You may have some things in there twice, though. I had some minor, solvable issues with threads in 4.7.0 and <thread>. I don't know whether they would create a good or bad mix with other thread libs (e.g. Boost). However, you don't want to use gcc 4.7.0 for your production code yet. And before a final gcc compiler is out, only a statement from the responsible project's team can give you certainty.

Where is the VM in LLVM?

Where is the Low Level Virtual Machine in LLVM?
I see that we have llvm-g++ and clang, but to me, an LLVM is something almost like Valgrind, or a simulator, where instructions are executed on it, and I can write programs to instrument the running code / interrupt when certain conditions happen / etc ...
Where are the tools like this built on LLVM?
Thanks!
I think you're looking for QEMU, not LLVM.
The low-level virtual machine in LLVM is that, after converting the higher-level C and C++ language input into an internal low-level representation (as a stage in the normal compiling process), it can then save this low-level representation and execute it on a JIT compiler (which thus acts somewhat like a virtual machine). This JIT compiler does a substantial amount of optimization, and so I expect it would be difficult to instrument in quite the form that you're thinking of -- in particular, it does not do instruction-by-instruction stepping through the execution.
QEMU, by contrast, is an open-source emulator that does instruction-by-instruction stepping through of machine code. It already contains a certain amount of ability to instrument code to look for certain conditions, in that it can connect to GDB and set watchpoints and so forth, which are implemented in QEMU itself.
To use LLVM for running x86 code you should check out libCPU or the outdated llvm-qemu.
Look at "running x86 program _on_ llvm".