Understanding Red/System compilation optimizations compared to GCC - elf

While reading the Red site, I came across a statement that compiling a hello world Red/System program creates:
"...a 162 bytes ELF binary, while a similar C code would
produce a 5-6KB binary using Gcc"
That's amazing. Can someone explain/point me to the techniques making such optimizations possible?

It was achieved by having an almost empty runtime library (just a few syscall wrappers) and an ELF emitter that did not align sections on 4K page boundaries, which is what compilers normally do for optimal loading.
However, that was only true for Red/System 0.1.0, in its very early days; today the output size would be similar to that of other compilers. We still have a -r compilation option for compiling Red/System code without any runtime, but as nobody uses it, it might not work anymore. It should be easy to fix if someone needs it: just drop by Red's chat and ask. ;-)
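To make the page-alignment point concrete, here is a minimal C++ sketch of how padding each section to a 4K boundary inflates a tiny file. The section sizes (84, 60, and 18 bytes) and the helper names align_up and file_size are invented for illustration; they are not measured from Red/System or GCC output.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (a power of two).
std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical layout: headers, then code, then data. Code and data each
// start at an offset aligned to `alignment`. Returns the file size in bytes.
std::size_t file_size(std::size_t headers, std::size_t code, std::size_t data,
                      std::size_t alignment) {
    std::size_t offset = headers;
    offset = align_up(offset, alignment) + code;  // code section
    offset = align_up(offset, alignment) + data;  // data section
    return offset;
}
```

With these made-up sizes, file_size(84, 60, 18, 1) gives 162 bytes when nothing is padded, while file_size(84, 60, 18, 4096) gives 8210 bytes, almost all of it padding.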

Related

Compiling Pascal code for embedded system (AT89C51RC2)

I am working on making a pretty trivial change to an old existing pascal source file. I have the source code, but need to generate a new hex file with my changes.
First, I tried compiling with "Embedded Pascal", which is the program used by my predecessor. Unfortunately, it is an unregistered copy and gives the message that the file is too large for the unregistered version. Support for the project, and even its homepage, has disappeared (it is old), so I have no idea how I would register.
I tried a couple other compilers, "Free Pascal" and "Turbo51", and they are both giving similar errors:
Filename.pas (79): Error 36: BEGIN expected.
Linkcode $2E
^
The source code begins with
Linkcode $2E
LinkData $0A // normally 8 - make room for capacitance data
Program Main; Vector LongJmp Startup_Vector; //This inserts the start to the main routine.
uses IntLib;
I'm not well-versed in Pascal or embedded programming, but as I understand it, the Linkcode and LinkData lines are required to set up the RAM as needed. Following the "Const" and "var" declarations are subroutines that indeed start with procedure... begin... end.
I realize that Pascal is a bit out of date, but we are stuck with it and our old micro. Any ideas why previously working source code with trivial changes cannot be compiled? I am willing to consider other compilers, including paid options, if any are available with decent support. I am compiling on Windows 10 x64 and flashing to an Atmel 89C51RC2.
If more source code is needed for diagnosis, please let me know what in particular, as I'll need to change some proprietary information before posting. Thanks!
Statements like linkcode and linkdata are not general, but target and compiler specific. Unless you have the know-how to reengineer to a different compiler, getting the original one is best.
Thanks to all for the information. While I didn't find an exact solution here, your comments were helpful for me to understand just how compiler-specific the Pascal code was.
In the end, I was able to get into my predecessor's files and transfer the registration, solving the issue for now. As suggested, I think I will port to C in the future to avoid fighting all the unsupported compiler nonsense.

Ways to make a D program faster

I'm working on a very demanding project (actually an interpreter), written exclusively in D, and I'm wondering what type of optimizations would generally be recommended. The project makes heavy use of GC, classes, associative arrays, and pretty much anything.
Regarding compilation, I've already experimented both with DMD and LDC flags and LDC with -flto=full -O3 -Os -boundscheck=off seems to be making a difference.
However, as rudimentary as this may sound, I would like you to suggest anything that comes to your mind that could help speed up the performance, related or not to the D language. (I'm sure I'm missing several things).
Compiler flags: I would add -mcpu=native if the program will be running on your machine. Not sure what effect -Os has in addition to -O3.
Profiling has been mentioned in comments. Personally, under Linux I have a script which dumps a process's stack trace, and I run it a few times to get an idea of where the program is getting hung up.
Not sure what you mean by GS.
Since you mentioned classes: in D, methods are virtual by default; virtual methods add indirections and are not inlineable. Make sure only those methods that must be virtual are. See if you can rewrite your program using a form of polymorphism that doesn't involve indirections, such as using template metaprogramming.
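The difference between the two dispatch styles can be sketched as follows. This is a C++ analogue rather than D code, and the names Op, AddOne, AddOneStatic, run_dynamic, and run_static are hypothetical. In D, the corresponding tools would be marking methods final or passing the concrete type as a template parameter.

```cpp
#include <cstdint>

// Dynamic dispatch: each call to apply() goes through the vtable.
struct Op {
    virtual ~Op() = default;
    virtual std::int64_t apply(std::int64_t x) const = 0;
};

struct AddOne final : Op {
    std::int64_t apply(std::int64_t x) const override { return x + 1; }
};

std::int64_t run_dynamic(const Op& op, int n) {
    std::int64_t acc = 0;
    for (int i = 0; i < n; ++i) acc = op.apply(acc);  // indirect call
    return acc;
}

// Static dispatch: the concrete type is a template parameter, so the call
// target is known at compile time and the compiler is free to inline it.
struct AddOneStatic {
    std::int64_t apply(std::int64_t x) const { return x + 1; }
};

template <typename OpT>
std::int64_t run_static(const OpT& op, int n) {
    std::int64_t acc = 0;
    for (int i = 0; i < n; ++i) acc = op.apply(acc);  // direct call
    return acc;
}
```

Both versions compute the same result; whether the static one is measurably faster depends on the compiler and on how hot the call site is, so it is worth profiling before restructuring an interpreter around it.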
Since you mentioned associative arrays: these make heavy use of the GC; to speed them up, switch to a third-party library that works on top of std.allocator, such as https://github.com/dlang-community/containers
If some parts of your code are parallelizable, std.parallelism is a good tool for this.
Since you mentioned that the project is an interpreter: there are many avenues for optimizing them, up to JIT/AOT compilation. Perhaps you could link to an existing library such as LLVM or libjit.

gfortran optimization causes fortran do-variable loop error during runtime

I have written a Fortran routine that uses some legacy Fortran 77 code for finite elements. However, with a particular mesh, when the -O optimization flag is turned on, an important do-loop iterator is somehow being modified, even though Fortran supposedly prohibits this. I have compiled this code using gfortran 4.5 with the -fcheck=do run-time checking enabled, and it has verified what I've noted above. A runtime error occurs only when optimizations are turned on, and it points directly to the do-iterator.
Using gdb on the optimized code is erratic (lines bounce back and forth), but it seems to clearly indicate that the do-iterator somehow gets set back to zero, and essentially this causes a nice infinite loop.
Any suggestions as to how to hunt down and fix whatever is causing this bug would be greatly appreciated, as I'd like to make sure the whole project can be consistently compiled with the same flags.
You say that you use -fcheck=do; why not go all the way and use -fcheck=all? What you're seeing sounds like a typical case of memory corruption due to an array bounds violation, which -fcheck=all can in some cases catch. Where the array bounds checking doesn't work that well is with implicit interfaces and incorrect bounds being passed; a solution here is to put your procedures into modules, allowing the compiler to check interfaces.
And, like Jonathan Dursi said, consider using a tool like valgrind.

code running very slowly after importing ansi c into iphone project

I have an ANSI C code that is about 10,000 lines long that I am trying to use in an iPhone project. When I compile the code with gcc on the command line, I type the following:
gcc -o myprog -O3 myprog.c
This program reads in large jpeg files and does some fancy processing on them, so I call it with the following
./myprog mypic.jpg
and from the command line, this takes about 0.1 seconds.
I'm trying to import this code into an iPhone project but I'm not entirely sure how. I was able to get it to compile and run successfully by renaming myprog.c to myprog.h and then calling the functions in the C code from within a generic NSObject class. I added the O3 optimization to the project's Other C Flags. However, when I do this, the code on the simulator takes about 2 seconds to run and on the iPhone about 7 seconds to run which renders an unacceptable user experience.
Any tips on how to get this going would be much appreciated.
It's hard to say for sure where the slowness comes from, or if there is any way around it, but right off the bat you've done something wrong.
You shouldn't have renamed a .c file to a .h file and included it. You should have written a .h (header) file containing the function, variable, and type declarations:
myprog.h:
#ifndef MYPROG_H_
#define MYPROG_H_
struct thing {
    int a;
    int b;
};
extern int woof;
int foo(void * buf, int size);
#endif /* MYPROG_H_ */
Then you should compile the .c file to an object file (or library) and link the main program against that. If you include a .h file that is really just a renamed .c file in more than one source file, you may end up with multiple copies of some data and code in your program.
You'll probably also want to go through and separate out any code in myprog.c that you won't be using in your iPhone program. I'll bet that there is plenty.
As far as why the program is slowing down, this could have to do with myprog being written to make use of some resources that aren't available on the iPhone. The first thing that comes to mind is large amounts of RAM, since many desktop applications are written as though available RAM is infinite, and I could see how some .jpg manipulation code could be written this way. The way to get around this would be to try to rework the algorithm so that it did not load as much of the picture at one time while working on it.
The second thing that comes to mind is floating point code. Floating point operations are common in image manipulation code, but often either not available or severely limited in embedded systems. In the case of iPhones they are available, but according to something I heard, their performance is noticeably hampered if you compile your code to Thumb rather than regular ARM code. (I've never developed for an iPhone or its particular processor so I don't know for sure, but it is worth looking into.)
Another place where things could be slowing down would be if there were some sort of translation between Objective-C objects and C structures that you have somehow introduced and that is happening a lot more often than it should need to. There are probably other slowdowns that could happen because of this, but you might be able to test this theory by creating an Objective-C program for your desktop that uses the myprog.c code in a manner similar to the iPhone program's use of it.
Another thing you probably should look into is profiling your iPhone program. Profiling determines (or only helps to determine, in some cases) where the program is spending its time. Knowing this doesn't necessarily tell you that the code that runs the most is bad or that anything about it could be improved, but it does tell you where to look. And sometimes you may look at the results and immediately know that some function that you thought was only going to be called once at the beginning of the program is actually being called repeatedly, which highly suggests that some improvement can be made.
I'm sure that a little searching will turn up how to go about this.

Any Macro or Technique for Partial Optimization?

I am working on a lock-free structure with the g++ compiler. It seems that with the -O1 switch, g++ changes the execution order of my code. How can I forbid g++'s optimization on a certain part of my code while keeping it for the rest? I know I can split the code into two files and link them, but that looks ugly.
If you find that gcc changes the order of execution in your code, you should consider using a memory barrier. Just don't assume that volatile variables will protect you from that issue. They only ensure that, within a single thread, the behavior is what the language guarantees, and that variables are always read from their memory location to account for changes "invisible" to the executing code (e.g. changes to a variable made by a signal handler).
GCC supports OpenMP since version 4.2. You can use it to create a memory barrier with a special #pragma directive.
A very good insight into lock-free code is this PDF by Scott Meyers and Andrei Alexandrescu: C++ and the Perils of Double-Checked Locking
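The memory-barrier advice above can be sketched with the atomics library, assuming a C++11 (or later) compiler is available. The function name publish_and_read and the payload value are hypothetical. A release store paired with an acquire load keeps both the compiler and the CPU from moving the payload write past the flag.

```cpp
#include <atomic>
#include <thread>

// Publishes a plain int from one thread to another through an atomic flag.
int publish_and_read() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the payload

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    // Acquire: once the flag reads true, the payload write is visible here.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

Declaring ready as a volatile bool instead would not give this guarantee, which is the point made above about volatile.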
You can use the function attribute __attribute__((optimize("O0"))) to set the optimization level for a single function, or "#pragma GCC optimize" for a block of code. I think these are only available in GCC 4.4 and later, though; check your GCC manual. If they aren't supported, separating the source is your only option.
I would also say, though, that if your code fails with optimization turned on, it is most likely that your code is just wrong, especially as you're trying to do something that is fundamentally very difficult. The processor will potentially perform reordering on your code (within the limits of sequential consistency) so any re-ordering that you're getting with GCC could potentially occur anyway.
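As a rough sketch of the per-function and per-block controls mentioned above: the macro NO_OPT and both function names are hypothetical, and the guards assume the attribute and pragmas are GCC-specific, so on other compilers the code simply builds with the normal optimization level.

```cpp
// Only GCC proper is assumed to understand the optimize attribute and pragmas.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

// Per-function control: this one function is compiled as if with -O0.
NO_OPT int sum_unoptimized(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += values[i];
    return total;
}

// Per-block control: everything between push_options and pop_options
// is compiled as if with -O0.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("O0")
#endif

int sum_block_unoptimized(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += values[i];
    return total;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

This only changes what the compiler does. It does nothing about reordering performed by the processor, so it is not a substitute for proper barriers in lock-free code.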