Are there modern compilers for high level languages on simple processors which produce self-modifying code?

Back in the days before caches and branch prediction, it was relatively common, if not encouraged, to write self-modifying code for certain kinds of optimizations. It was probably most common in games and demos written in assembler in the 8-bit through early 32-bit eras, such as on the Amiga.
I'm not sure if any compilers from those days emitted self-modifying assembler or machine code.
What I'm wondering is whether there are any current/modern compilers that do. Obviously it would be useless or too difficult on powerful processors with caches.
But what about the many simple processors used in embedded systems? Is self-modifying code considered a viable optimization strategy by any modern compiler for any simple / 8-bit / embedded processor?
There is a question with a similar title, "Is there any self-improving compiler around?", but note that it's not about the same subject:
Please note that I am talking about a compiler that improves itself - not a compiler that improves the code it compiles.

All embedded systems today use flash ROM. I believe the Amiga and similar machines were RAM-based systems. The only way "self-modification" exists in embedded systems is in boot loaders, which re-program certain parts of the flash depending on what program and/or functionality should be used.
Apart from that, it doesn't make sense for the program to modify itself at runtime. Running code from RAM is generally discouraged, for safety reasons (the potential for accidental modification caused by bugs) and for electrical reasons (RAM is volatile and far more sensitive to EMC than flash).
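For completeness, here is a minimal sketch of the boot-loader hand-off described above, assuming a Cortex-M-style part and GCC; the flash address and vector-table layout are illustrative assumptions, not taken from any particular product:

    /* Hedged sketch: how a boot loader hands control to an application image
       it has just (re)programmed in flash. Cortex-M/GCC assumed; APP_BASE is
       an assumption for illustration only. */
    #include <stdint.h>

    #define APP_BASE 0x08004000u            /* assumed start of the application image */

    typedef void (*app_entry_t)(void);

    static void jump_to_application(void)
    {
        uint32_t app_sp    = *(volatile uint32_t *)(APP_BASE + 0u); /* initial stack pointer */
        uint32_t app_reset = *(volatile uint32_t *)(APP_BASE + 4u); /* reset handler address */

        /* Load the application's main stack pointer, then jump to its reset
           handler. On Cortex-M this is often done via a CMSIS intrinsic such
           as __set_MSP(); plain inline assembly is shown here. */
        __asm volatile ("msr msp, %0" : : "r" (app_sp));
        ((app_entry_t)app_reset)();
    }

Note that nothing here modifies running code; the "self-modification" is confined to re-writing a separate flash region before the jump.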

What's the motivation in using Verilog or VHDL over C?

I come from a programming background and haven't messed around too much with hardware or firmware (at most a bit of electronics and Arduino).
What is the motivation in using hardware description languages (HDL) such as Verilog and VHDL over programming languages like C or some Assembly?
Is this issue at all a matter of choice?
I read that hardware whose firmware is written in an HDL has a clear advantage in running instructions in parallel. However, I was surprised to see discussions expressing doubts about whether to write firmware in C or assembly (how is assembly appropriate if you don't necessarily have a CPU?), but I concluded it's also an option.
Therefore, I have a few questions (don't hesitate to explain anything):
Can firmware really be written either in an HDL or in a software programming language, or are these just different ways to perform the same mission? I'd love real-world examples. What constraints result from each option?
I know that a common use of firmware over software is in hardware accelerators (such as GPUs, network adapters, SSL accelerators, etc.). As I understand it, this acceleration is not always necessary, but only recommended (for example, in the case of SSL and the acceleration of complex algorithms). Can one choose between firmware and software in all cases? If not, I'd be happy to see cases in which firmware is clearly and unequivocally appropriate.
I've read that firmware is mostly burned onto ROM or flash. How is it represented there? In bits, like software? If so, what's the fundamental difference? Is it the availability of adapted circuits in the case of firmware?
I guess I made a mistake here and there in some assumptions, please forgive me. Thank you!
The term "firmware" is at best ill defined, and I believe that is probably the cause of your confusion.
Historicallym - before the availability of programmable logic devices - the term "firmware" has been used to refer to code stored-in and executed-from read-only memory (ROM). At a time when the only available ROM technology was mask-ROM where the code was burned into the device at manufacture of the silicon and therefore unchangeable without replacing the chip - that was pretty "firm". Even with later programmable read-only memory (PROM), which could be programmed post-manufacture, because it was one-time programmable (OTP), the term still applied.
With the introduction of UV-erasable EPROM, firmware became perhaps less "firm", but the lack of in-circuit programmability and the need to expose the device to UV light to erase it still made replacement of the embedded software a chore - normally requiring removal of the chip, placing it in the eraser for an hour or so, then programming it in a dedicated programmer.
With the advent of NOR flash memory, where code can be stored in and executed directly from the device but also readily changed in-circuit, the term firmware in this context has become less common. However, it is still used (perhaps mainly by older practitioners) to refer to embedded software stored in and executed from a random-access, read-only memory device, as opposed to loaded into RAM from a file system.
The use of the term firmware to refer to programmable-logic configuration is newer, and has probably come about simply because it is hardware, yet the configuration is written much like software, using a high-level language.
The upshot of this is that you do not choose
"Verilog and VHDL over programming languages like C or some Assembly"
because in each context the term firmware simply refers to a different concept.
It would be best to avoid the term firmware altogether as it means different things to different people or in different contexts.
There is perhaps some further confusion from the fact that some hardware description languages are based on software development languages - such as Handel-C, which is a C-like hardware description language.
This question would not have arisen much some time ago, but with current platforms you can now translate between C and HDLs (instead of using the Handel-C extension of C from the '90s), mainly between C and behavioral VHDL, and there are many newer tools provided by vendors, like the Xilinx Electronic System Level Design ecosystem or Impulse-C (http://www.impulseaccelerated.com/products_universal.htm).
It is important to know, though, that C is a mid-level language while VHDL is, as stated, a hardware description language. C can only express sequential instructions, while VHDL allows both sequential and concurrent execution.
And even though a C program can be written successfully with purely logical or algorithmic thinking, a successful VHDL programmer needs a thorough working knowledge of hardware circuits and must be able to predict how given code will be implemented in hardware.
In both languages you care about resource usage, but in different ways (in C, often only if you are programming for resource-constrained devices). When it comes to VHDL, apart from memory, the other logic elements of the FPGA (which is where you normally put the VHDL code) are limited as well.
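As a rough illustration of what such C-to-HDL (high-level synthesis) flows typically consume, here is a small fixed-point FIR filter in plain C; the function name and tap count are made up for the example. An HLS tool may unroll or pipeline the multiply-accumulate loop into parallel hardware, whereas a CPU executes the same C sequentially:

    /* Hypothetical HLS-style C: a 4-tap fixed-point FIR filter. */
    #include <stdint.h>

    #define TAPS 4

    int32_t fir4(int16_t sample, const int16_t coeff[TAPS])
    {
        static int16_t delay[TAPS];               /* shift register of past samples */
        int32_t acc = 0;

        for (int i = TAPS - 1; i > 0; --i)
            delay[i] = delay[i - 1];              /* shift in the new sample */
        delay[0] = sample;

        for (int i = 0; i < TAPS; ++i)
            acc += (int32_t)delay[i] * coeff[i];  /* multiply-accumulate */

        return acc;
    }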

llvm/tools: lli REPL compared to LuaJIT

I was wondering if someone has had experience with the llvm/tools - lli interpreter/JIT-compiler (cf. http://llvm.org/docs/GettingStarted.html#tools). I am interested in any information that you can provide (speed, complexity, implementations, etc.).
Thanks.
UPDATE:
Okay, how would bitcode execution compare to LuaJIT VM execution, supposing that lli acts as an interpreter? What about when lli acts as a JIT compiler (same comparison)?
NOTE:
I am only asking in case anyone has experience and is willing to spare some time to share it.
LuaJIT is a tracing JIT, which means it can re-optimize itself to better suit the data passed through the execution environment; LLVM, however, is a static JIT and will just generate once-off, best-case machine code for the corresponding source, which may lead to it losing performance in tight loops or suffering from branch mispredictions.
The actual LuaJIT VM is also highly optimized, threaded, machine-specific assembly, whereas LLVM uses C++ for portability (and other reasons), so this obviously gives LuaJIT a huge advantage. LLVM also has a much higher overhead than LuaJIT, purely because LuaJIT was designed to work on much less powerful systems (such as those driven by ARM CPUs).
The LuaJIT bytecode was also specially designed for LuaJIT, whereas LLVM's bitcode is a lot more generic; this obviously makes LuaJIT's bytecode execute faster. LuaJIT's bytecode is also well designed for encoding optimization hints etc. for use by the JIT and the tracer.
Ignoring the fact that these are two different types of JITs, the whole comparison boils down to this: LLVM is focused on being a generic JIT/compiler backend, while LuaJIT is focused on executing Lua as fast as possible in the best way possible, so it gains from not being constrained by generality.

Points to be considered while designing or coding for lesser footprint deliverables

Please post the points one should keep in mind while designing or coding for lesser footprint deliverables for embedded systems.
I am not giving compiler or platform details, as I want generic information. But, any specific information on Linux based OS is also welcome.
Depends on how low you want to get. I'm currently coding for fiscal printers; there's no OS, and the main rule is no dynamic memory allocation. The funny thing is that I still convinced the crew to write fully modern C++ ;).
Actually there are a few rules we decided upon:
no dynamic allocation
hence, no STL
no exception handling (obvious reasons)
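A minimal sketch of the style those rules lead to (plain C here, just to show the idea of statically allocated, fixed-capacity containers; it is not from the poster's codebase):

    /* Fixed-capacity FIFO queue with no dynamic allocation.
       Capacity and element type are illustrative assumptions. */
    #include <stdbool.h>
    #include <stdint.h>

    #define QUEUE_CAPACITY 32u

    typedef struct {
        uint8_t  items[QUEUE_CAPACITY];  /* storage reserved at compile time */
        uint32_t head;
        uint32_t tail;
        uint32_t count;
    } byte_queue_t;

    static bool queue_push(byte_queue_t *q, uint8_t value)
    {
        if (q->count == QUEUE_CAPACITY)
            return false;                /* full: caller decides what to do */
        q->items[q->tail] = value;
        q->tail = (q->tail + 1u) % QUEUE_CAPACITY;
        q->count++;
        return true;
    }

    static bool queue_pop(byte_queue_t *q, uint8_t *value)
    {
        if (q->count == 0u)
            return false;                /* empty */
        *value = q->items[q->head];
        q->head = (q->head + 1u) % QUEUE_CAPACITY;
        q->count--;
        return true;
    }

Because the capacity is fixed at compile time, worst-case memory use is known exactly and there is nothing for the heap to fragment.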
There isn't a general answer, only ones specific to language/platform ... but
Small memory footprint ...
Don't use Java, C#/mono, PHP, Perl, Python or anything with garbage collection
Get as close to the metal as feasible; use C
Do a lot of profiling to see where memory is getting allocated, if you are using dynamic allocation
Ensure you prevent heap fragmentation by allocating sensibly sized chunks from the heap
Avoid recursive functions, especially those that use malloc(). It is better to allocate a chunk and pass a pointer around.
use free() ;)
Ensure your types are no bigger than required
Turn on compiler optimizations
There will be more.
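One common way to address the heap-fragmentation point above is a fixed-size block pool: every block has the same size, so freeing and re-allocating can never fragment the arena. A hedged sketch, with an illustrative block size and count:

    /* Fixed-size block pool: BLOCK_SIZE and BLOCK_COUNT are illustrative.
       Unused blocks are threaded onto a free list inside the blocks themselves. */
    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE  64u
    #define BLOCK_COUNT 16u

    typedef union block {
        union block *next;               /* free-list link while unused */
        uint8_t      data[BLOCK_SIZE];   /* payload while allocated */
    } block_t;

    static block_t  pool[BLOCK_COUNT];   /* the whole arena, allocated statically */
    static block_t *free_list   = NULL;
    static size_t   next_unused = 0;

    void *pool_alloc(void)
    {
        if (free_list != NULL) {         /* reuse a previously freed block */
            block_t *b = free_list;
            free_list = b->next;
            return b;
        }
        if (next_unused < BLOCK_COUNT)
            return &pool[next_unused++]; /* hand out a fresh block */
        return NULL;                     /* pool exhausted */
    }

    void pool_free(void *p)
    {
        block_t *b = (block_t *)p;
        b->next = free_list;             /* push back onto the free list */
        free_list = b;
    }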
For a really low footprint, consider writing assembly directly.
We all know that Hello World in C or C++ is 20 kB+ (because of all the default libraries which get linked). In assembly this overhead is gone. As pointed out in the comments, one can slim the standard libraries down quite a bit. However, the fact remains that the code density you can get when hand-coding assembly is much higher than what a compiler will generate from a higher-level language. So for code where every byte matters, use assembly.
Also, when programming devices with less capable processors, assembly language might be your only way to make the program fast enough to run in real time and (for instance) control machines.
When faced with such constraints, it is advisable to pre-allocate memory in order to guarantee that the system will work under load. A design pattern such as "object pooling" can be used to share resources within the system.
The C language enables tight resource (i.e. memory & compute cycles) control. It should be strongly considered.
Avoid recursion as it is easy to abuse and can result in stack overflow conditions.
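To make the recursion point concrete, here is a generic sketch (not from the answer above) of the same computation written recursively and iteratively; the iterative form uses constant stack space, so its worst-case stack usage can be bounded at design time:

    /* Recursive version: stack usage grows with n. */
    unsigned long factorial_recursive(unsigned int n)
    {
        return (n <= 1u) ? 1ul : n * factorial_recursive(n - 1u);
    }

    /* Iterative version: constant stack usage, easy to bound. */
    unsigned long factorial_iterative(unsigned int n)
    {
        unsigned long result = 1ul;
        while (n > 1u) {
            result *= n;
            n--;
        }
        return result;
    }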

Are there any FreeRTOS interpreted language libraries available?

I work for a company that creates firmware for several devices using FreeRTOS. Lately our requests for new features have surpassed what our firmware engineers can keep up with, but we can't afford to hire anyone new right now either. Making even tiny changes requires the firmware people to go in and modify things at a very low level.
I've been looking for some sort of interpreted-language project for FreeRTOS that would let us implement new features at a higher level. Ideally I would eventually like to get to the point where the devices become closer to generic computers, with us writing drivers rather than having to implement every feature ourselves.
Are there any FreeRTOS projects that interpret Java, Python or similar bytecode?
I've looked on google, but since I'm not a firmware engineer myself I'm not sure if I'm looking for the right keywords.
Thanks everyone
I don't think the RTOS, or even the OS, matters too much here if the code is portable. Depending on your input & output scheme, you'll probably need to do a little porting.
Regarding embeddable scripting languages, the two I'm familiar with are Lua and Pawn.
I think there are versions of Python & other such languages ported to embedded systems, but they tend to be the embedded Linux variety. Depending on your platform (no idea if it's a little MCU with 8K ROM or an embedded PC) that might be an option.
There are no interpreted languages out there that are "made" to use FreeRTOS, or any other microcontroller threading library (loosely called an 'RTOS' within the e2e community).
However, languages which I have first-hand experience using in embedded systems that are (a) written in C and (b) small enough to embed in a microcontroller include:
Lua (suitable for almost anything, even some PICs)
Python (suitable for most ARM architectures, anyway, with more than 1 MB of RAM)
I do not have first-hand experience with it, but Ruby may be as easy to embed as Python.
Instead of looking for FreeRTOS-specific interpreters, you might try looking for any interpreters for your particular microcontroller, or microcontroller in general. It might be possible to interface them with FreeRTOS or turn the interpreter into a task.
There seems to be someone trying to go for Lua on FreeRTOS (PIC32).
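As a rough sketch of the "turn the interpreter into a task" idea, assuming Lua has already been ported to your toolchain: the stack depth, priority and the embedded script below are placeholders, not values from any real project:

    /* Rough sketch: running a Lua chunk inside a FreeRTOS task.
       Assumes the standard Lua C API is available on the target and that the
       heap/stack sizes below are adjusted to the real platform. */
    #include "FreeRTOS.h"
    #include "task.h"

    #include "lua.h"
    #include "lauxlib.h"
    #include "lualib.h"

    static void vLuaTask(void *pvParameters)
    {
        (void)pvParameters;

        lua_State *L = luaL_newstate();   /* create an interpreter instance */
        luaL_openlibs(L);                 /* load the standard libraries */

        for (;;) {
            /* Placeholder script; in practice it would come from flash,
               a file system, or a host connection. */
            luaL_dostring(L, "print('hello from Lua under FreeRTOS')");
            vTaskDelay(pdMS_TO_TICKS(1000));
        }
        /* Never reached in this sketch: lua_close(L); */
    }

    void start_lua_task(void)
    {
        /* Stack depth (in words) and priority are illustrative only. */
        xTaskCreate(vLuaTask, "lua", 4096, NULL, tskIDLE_PRIORITY + 1, NULL);
    }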
I guess your question boils down ultimately to finding ways of increasing the level of abstraction above the low-level RTOS mechanisms. While it is perhaps true that interpreted languages work at somewhat higher level of abstraction than C, you can do much better than that by applying methods based on event-driven frameworks and state machines. Such event-driven frameworks have been around for decades and have been proven in countless embedded systems in all sorts of domains. Today, virtually every modeling tool for embedded systems capable of code-generation (e.g., Rational-Rose RT, Rhapsody, etc.) contains a variant of such a state-machine framework.
But event-driven, state-machine frameworks can be used also without big tools. The QP state machine frameworks (state-machine.com), for example, do everything that a conventional RTOS can do, only more efficiently, plus many things that an RTOS can't.
When you start using modern event-driven programming paradigm with state machines, your problems will change. You will no longer struggle with 15 levels of convoluted if-else statements, and you will stop worrying about semaphores or other such low-level RTOS mechanisms. Instead, you'll start thinking at a higher level of abstraction about state machines and events exchanged among them. After you experience this quantum leap, you will never want to go back to the raw RTOS and the spaghetti code.
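For a flavor of the event-driven approach, here is a minimal hand-rolled sketch (deliberately not the QP framework's actual API): a hypothetical button-controlled LED whose behaviour is dispatched from a single loop or task instead of ad-hoc if-else logic and semaphores:

    /* Minimal event-driven state machine sketch (hypothetical example). */
    typedef enum { EVT_BUTTON_PRESS, EVT_TIMEOUT } event_t;
    typedef enum { STATE_LED_OFF, STATE_LED_ON } state_t;

    static state_t state = STATE_LED_OFF;

    /* led_set() is a placeholder for the real board-support function. */
    extern void led_set(int on);

    void dispatch(event_t evt)
    {
        switch (state) {
        case STATE_LED_OFF:
            if (evt == EVT_BUTTON_PRESS) {
                led_set(1);
                state = STATE_LED_ON;
            }
            break;

        case STATE_LED_ON:
            if (evt == EVT_BUTTON_PRESS || evt == EVT_TIMEOUT) {
                led_set(0);
                state = STATE_LED_OFF;
            }
            break;
        }
    }

Events would typically be posted to a queue by ISRs or other tasks and handed to dispatch() one at a time, so the state variable never needs semaphore protection.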

Optimisation, Compilers and Their Effects

(i) If a program is optimised for one CPU class (e.g. multi-core Core i7) by compiling the code on that machine, will its performance be sub-optimal on CPUs from older generations (e.g. Pentium 4)? Could optimizing actually prove harmful for performance on other CPUs?
(ii) For optimization, compilers may use x86 extensions (like SSE4) which are not available in older CPUs. Is there a fallback to some non-extension-based routine on older CPUs?
(iii) Is the Intel C++ Compiler more optimizing than the Visual C++ Compiler or GCC?
(iv) Will a truly multi-core threaded application perform efficiently on older CPUs (like a Pentium III or Pentium 4)?
Compiling on a platform does not mean optimizing for that platform (maybe it's just bad wording in your question).
In all compilers I've used, optimizing for platform X does not affect the instruction set, only how it is used, e.g. optimizing for i7 does not enable SSE2 instructions.
Also, optimizers in most cases avoid "pessimizing" non-optimized platforms, e.g. when optimizing for i7, typically a small improvement on i7 will not be chosen if it means a major hit for another common platform.
It also depends on the performance differences between the instruction sets - my impression is that these have become much smaller in the last decade (but I haven't delved too deep lately - I might be wrong about the latest generations). Also consider that optimizations make a notable difference only in a few places.
To illustrate possible options for an optimizer, consider the following methods to implement a switch statement:
sequence if (x==c) goto label
range check and jump table
binary search
combination of the above
the "best" algorithm depends on the relative cost of comparisons, jumps by fixed offsets and jumps to an address read from memory. They don't differ much on modern platforms, but even small differences can create a preference for one or other implementation.
(i) It is probably true that optimising code for execution on CPU X will make that code less optimal on CPU Y than the same code optimised for execution on CPU Y. Probably.
(ii) Probably not.
(iii) Impossible to generalise. You have to test your code and come to your own conclusions.
(iv) Probably not.
For every argument about why X should be faster than Y under some set of conditions (choice of compiler, choice of CPU, choice of optimisation flags for compilation) some clever SOer will find a counter-argument, for every example a counter-example. When the rubber meets the road the only recourse you have is to test and measure. If you want to know whether compiler X is 'better' than compiler Y first define what you mean by better, then run a lot of experiments, then analyse the results.
I) If you did not tell the compiler which CPU type to favor, the odds are that it will be slightly sub-optimal on all CPUs. On the other hand, if you let the compiler know to optimize for your specific type of CPU, then it can definitely be sub-optimal on other CPU types.
II) No (for Intel and MS at least). If you tell the compiler to compile with SSE4, it will feel free to use SSE4 anywhere in the code without testing for it. It becomes your responsibility to ensure that your platform is capable of executing SSE4 instructions, otherwise your program will crash. You might want to compile two libraries and load the proper one (see the dispatch sketch after this answer). An alternative to compiling for SSE4 (or any other instruction set) is to use intrinsics, which check internally for the best-performing set of instructions (at the cost of a slight overhead). Note that I am not talking about instruction intrinsics here (those are specific to an instruction set), but intrinsic functions.
III) That is a whole other discussion in itself. It changes with every version, and may be different for different programs. So the only solution here is to test. Just a note, though: Intel compilers are known not to compile well for running on anything other than Intel (e.g. intrinsic functions may not recognize the instruction set of an AMD or VIA CPU).
IV) If we ignore the on-die efficiencies of newer CPUs and the obvious architecture differences, then yes, it may perform as well on an older CPU. Multi-core processing is not dependent per se on the CPU type. But the performance is VERY dependent on the machine architecture (e.g. memory bandwidth, NUMA, chip-to-chip bus) and on differences in multi-core communication (e.g. cache coherency, bus-locking mechanism, shared cache). All this makes it impossible to compare newer and older CPU efficiencies in MP, but that is not what you are asking, I believe. So on the whole, an MP program made for newer CPUs should not use the MP aspects of older CPUs less efficiently. Or in other words, just tweaking the MP aspects of a program specifically for an older CPU will not do much. Obviously you could rewrite your algorithm to use a specific CPU more efficiently (e.g. a shared cache may permit you to use an algorithm that exchanges more data between working threads, but that algorithm will die on a system with no shared cache, full bus locking and poor memory latency/bandwidth), but it involves a lot more than just MP-related tweaks.
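A hedged sketch of the "compile two versions and pick the proper one at runtime" idea, using GCC/Clang's __builtin_cpu_supports; the function names and the choice of SSE4.2 are illustrative only:

    /* Runtime CPU dispatch sketch (GCC/Clang): choose between an SSE4.2 build
       of a routine and a plain-C fallback. Function names are illustrative. */
    #include <stddef.h>

    int sum_sse42(const int *data, size_t n);    /* compiled with -msse4.2 */
    int sum_generic(const int *data, size_t n);  /* compiled without extensions */

    typedef int (*sum_fn)(const int *, size_t);

    sum_fn select_sum(void)
    {
        __builtin_cpu_init();                    /* required before the check */
        if (__builtin_cpu_supports("sse4.2"))
            return sum_sse42;                    /* fast path on capable CPUs */
        return sum_generic;                      /* safe fallback elsewhere */
    }

The two implementations would live in separate translation units, so that only the file containing sum_sse42 is built with -msse4.2; otherwise the compiler could emit SSE4.2 instructions in code that must also run on older CPUs.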
(1) Not only is it possible, but it has been documented on pretty much every generation of x86 processor. Go back to the 8088 and work your way forward, every generation. Clock for clock, the newer processor was slower for the current mainstream applications and operating systems (including Linux). The 32- to 64-bit transition is not helping; more cores and lower clock speeds are making it even worse. And this is true going backward as well, for the same reason.
(2) Bank on your binaries failing or crashing. Sometimes you get lucky; most of the time you don't. Yes, there are new instructions, and supporting them would probably mean trapping the undefined-instruction exception and emulating the instruction in software, which would be horribly slow, and the lack of demand means the emulation is probably not well done or simply not there. Optimization can use new instructions, but more than that, the bulk of the optimization I am guessing you are talking about has to do with reordering instructions so that the various pipelines do not stall. So if you arrange them to be fast on one generation of processor, they will be slower on another, because within the x86 family the cores change too much. AMD had a good run there for a while, as they would make the same code just run faster instead of trying to invent new processors that would eventually be faster once the software caught up. That is no longer true; both AMD and Intel are struggling just to keep chips running without crashing.
(3) Generally, yes. For example, gcc is a horrible compiler: one size fits all fits no one well, and it never has been and never will be any good at optimizing. For example, gcc 4.x code is slower than gcc 3.x code for the same processor (yes, all of this is subjective; it all depends on the specific application being compiled). The in-house compilers I have used were leaps and bounds ahead of the cheap or free ones (I am not limiting myself to x86 here). Are they worth the price, though? That is the question.
In general, because of the horrible new programming languages and gobs of memory, storage and layers of caching, software engineering skills are at an all-time low. Which means the pool of engineers capable of making a good compiler, much less a good optimizing compiler, decreases with time; this has been going on for at least 10 years. So even the in-house compilers are degrading with time, or companies just have their employees work on and contribute to the open-source tools instead of having an in-house tool. The tools the hardware engineers use are also degrading for the same reason, so we now have processors that we merely hope will run without crashing rather than ones we try to optimize for. There are so many bugs and chip variations that most of the compiler work goes into avoiding the bugs. Bottom line: gcc has single-handedly destroyed the compiler world.
(4) See (2) above. Don't bank on it. The operating system you want to run this on will likely not even install on the older processor, saving you the pain. This is for the same reason that binaries optimized for your Pentium III ran slower on your Pentium 4, and vice versa. Code written to work well on multi-core processors will run slower on single-core processors than if you had optimized the same application for a single-core processor.
The root of the problem is that the x86 instruction set is dreadful. Many far superior instruction sets have come along that do not require hardware tricks to make them faster every generation, but the Wintel machine created two monopolies and the others couldn't penetrate the market. My friends keep reminding me that these x86 machines are microcoded, so you don't really see the instruction set inside, which angers me even more: the horrible ISA is just an interpretation layer. It is kind of like using Java. The problems you have outlined in your questions will continue as long as Intel stays on top; if the replacement does not itself become a monopoly, then we will be stuck forever in the Java model, where you are on one side or the other of a common denominator - either you emulate the common platform on your specific hardware, or you write apps and compile to the common platform.