How do modern optimizing compilers determine when to optimize? - optimization

How do modern optimizing compilers determine when to apply certain optimizations such as loop unrolling and code inlining?
Since both of these affect caching, naively inlining functions with fewer than X lines, or whatever other simple heuristic, is likely to generate worse-performing code. So, how do modern compilers deal with this?
I'm having a hard time finding information on this (especially information that's reasonably easy to understand); about the best I could find is the Wikipedia article. Any details, links to books/articles/papers are greatly appreciated!
EDIT: Since the answers mainly discuss the two optimizations I mentioned (inlining and loop unrolling), I just wanted to clarify that I'm interested in any and all compiler optimizations, not just those two. I'm also more interested in the optimizations that can be performed during ahead-of-time compilation, though JIT optimization is of interest too (though to a slightly lesser extent).
Thanks!

Usually by being that naive anyway and hoping it is an improvement.
This is why just-in-time compilation is such a winning strategy: collect statistics, then optimize for the common case.
References:
http://lambda-the-ultimate.org/node/768
GCC supports Profile-Guided Optimization (see the sketch below)
And of course the Sun HotSpot JVM
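For the GCC case, here is a minimal sketch of the PGO workflow. The program, file names, and workload are made up for illustration; the -fprofile-generate/-fprofile-use flags are the real ones.

```cpp
// pgo_demo.cpp -- a hypothetical program used only to illustrate GCC's PGO workflow.
//
// Step 1: build with instrumentation     g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo
// Step 2: run on representative input    ./pgo_demo typical_workload.txt
//         (this writes *.gcda profile files next to the objects)
// Step 3: rebuild using the profile      g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo
//
// With the profile available, GCC can base inlining, unrolling, and block-layout
// decisions on measured branch and call frequencies instead of static heuristics alone.
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    std::string line;
    std::size_t lines = 0;
    while (std::getline(in, line)) ++lines;  // the hot loop the profile will describe
    std::cout << lines << " lines\n";
    return 0;
}
```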

You can look at the Spiral project.
On top of that, optimizing is a tough thing to do generically. This is, in part, why there are so many options for the gcc compiler. If you know something about caches and pages you can do some things by hand and request that others be done through the compiler, but no two machines are the same, so the approach must be ad hoc.

In short: better than we do!
You can have a look at this: http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
Didier

Good question. You are asking about so-called speculative optimizations.
Dynamic compilers use both static heuristics and profile information. Static compilers employ heuristics and (offline) profile information. The latter is often referred to as PGO (Profile-Guided Optimization).
There are a lot of articles on inlining policies. The most comprehensive one is
An Empirical Study of Method Inlining for a Java Just-In-Time Compiler
It also contains references to related work and sharp (and justified) criticism of some of the articles it considers.
In general, state-of-the-art compilers try to use impact analysis to estimate the potential effect of speculative optimizations before applying them.
P.S. Loop unrolling is classic old stuff which helps only for some tight loops that perform nothing but number-crunching ops (no calls and so on). Method inlining is a much more important optimization in modern compilers.
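To make the unrolling remark concrete, here is a hedged before/after sketch; the factor of 4 is purely illustrative, and real compilers pick factors from their own heuristics or profile data.

```cpp
#include <cstddef>

// Original tight loop: no calls, pure number crunching -- a good unrolling candidate.
double sum(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Roughly what an unroll-by-4 transformation produces: fewer loop tests per
// element, at the cost of larger code (which is why cache effects matter).
double sum_unrolled(const double* a, std::size_t n) {
    double s = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)          // remainder loop for the leftover elements
        s += a[i];
    return s;
}
```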

Related

How is a compiled language better than an interpreted language at optimizing for the hardware?

Specifically, how is a compiled language able to optimize for the hardware better than an interpreted language? Other online sources that I have read only gave vague explanations, like "because it is written in the native code of the target machine", while some do not offer any explanation at all. I would appreciate it if the explanation could be as layman-friendly as possible, given that I've only just started to code.
One major reason is optimizing compilers. Compiling "in advance" makes it much easier to apply optimizations to code, especially if you're compiling to native assembly code (as you typically do in C, for example). The fact that you know some things about the machine the code is going to be deployed on allows you to do machine-specific optimizations. This is especially important for, say, Pentium-based processors, which have numerous complicated instructions that tend to require some knowledge of program structure in order to use (e.g. the MMX instruction set).
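As a hedged illustration (not tied to any particular compiler version), knowing the target machine is what lets an ahead-of-time compiler use the CPU's vector units for a plain loop like this one:

```cpp
#include <cstddef>

// A simple element-wise add. Compiled ahead of time with knowledge of the
// target (e.g. g++ -O3 -march=native), the compiler is free to emit SSE/AVX
// vector instructions for this loop; an interpreter executing the same logic
// one generic operation at a time cannot easily do the same.
void add_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```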
There are also some cases where the compiler can make structural changes to programs. For example, under special circumstances, some compilers can replace recursion with loops. (I once heard of someone writing a recursive factorial function in C to learn how to implement recursion in assembly language, only to realize to his horror that the compiler had recognized the optimization and replaced his recursion with a for loop.)
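For illustration, a hedged sketch of that kind of rewrite; the loop version is roughly what the optimizer produces, though the exact output varies by compiler and flags.

```cpp
// What the programmer wrote: a recursive factorial.
unsigned long long factorial(unsigned int n) {
    return (n <= 1) ? 1 : n * factorial(n - 1);
}

// Roughly what an optimizing compiler can turn it into: the recursion is
// recognized and replaced with a simple loop, so no call frames are created.
unsigned long long factorial_as_loop(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; ++i)
        result *= i;
    return result;
}
```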

Is optimization necessary when code generation is targeting a runtime with JIT?

I'm planning on writing a programming language targeting the .NET platform, which led me to start thinking about the code-generation aspect of targeting such a platform. I'm new to writing compilers, but I know that optimization is done as one of the phases of compiling (or can be). I started to wonder whether there is any benefit to spending time optimizing the output (in this case CIL, but this would apply to the JVM too), because the JIT compiler and things like the JVM's HotSpot can optimize at run time. Is there any benefit to optimizing the generated code (CIL or the JVM equivalent) when targeting .NET or the JVM, since the JIT will already optimize?
It depends. There are countless optimizations. Any given compiler (your compiler, the JIT compiler, or any other compiler) necessarily implements only a subset of those. This choice depends on available time, typical/expected input code, priorities, etc. and therefore the engineers who built the JIT compiler may have selected optimizations which work well for the programs they were expecting, but not so well for the kind of program you care about.
You will have to determine what optimizations the JIT compiler misses. The way to do this is, of course, empirical: Actually write programs, letting the JIT compiler optimize them (be sure to do this part properly - disable debugging, compile for release, choose realistic benchmarks, etc.), and then inspect the final machine code. Look for unexpected code (you will, of course, need assembly knowledge for this) and determine if it's a missed optimization or if the JIT was smarter than you thought.
If it is a missed optimization, you have another problem: you can't output the machine code you want, so you have to generate different IL instead.
A missed optimization is probably due to a language feature the VM doesn't know about (e.g. multimethods on the JVM). You lowered it into the VM's terms during compilation, but the translation you chose doesn't sit well with the JIT's order of passes, heuristics, etc.
As you can't just output machine code yourself, you must now find an alternative IL fragment for the same input language code. Ideally, one which the JIT compiler does handle well. Finding that may be an exercise in imagination, but it's not technically hard, just guesswork interleaved with benchmarking.
As another answer points out, JIT compilers work under time constraints. This may lead to optimizations being missed that could otherwise happen (e.g. constant propagation running out of time), but as the creators of the JIT compiler faced the same problem, this probably isn't too severe as long as you don't create much larger or more complicated code.
If you create such bad code that the JIT compiler can't fix it all, then you have to duplicate its optimizations in your AOT compiler. I'm not convinced that this is a likely scenario though, and even if it happens even very simple optimizations should mostly fix the problem.
So, in summary: Start with a straightforward translation, then seek out missed optimizations and either make it easier to optimize for the JIT compiler, or do it yourself (if possible - adaptive optimization is much harder in an AOT setting).
I think this question is hard to answer in general.
For example, the F# compiler performs tail-call optimization because tail-recursive functions are common in that language; the F# compiler can do a better job of optimizing them in some cases than the JIT compiler, and some versions of the JIT compiler don't perform the optimization at all.
So, your language might have some common operation whose straightforward implementation wouldn't perform well. In that case, it makes sense to emit IL code that's optimized.
What I think you should do is the same as when you're writing a normal program: first write your code in a way that is simple and readable. Only if something doesn't perform well, attempt to optimize that. It might be worth considering that you might need some optimizations in the future and make your code modular enough, so that you don't have to rewrite half of it because of some optimization. But for now, that should be enough.
Writing a compiler is hard enough job already (even if you're targeting an IL). Finish it first and think about optimizations later.
Generally, JIT compilers have some thresholds governing how much optimization they will attempt to perform. These may be based on the size of a method's IL and/or the amount of time already spent JIT compiling the method. So yes, IL which has already been optimized may benefit from further JIT optimization. As always, there is a trade-off: how much time do you want to spend adding AOT optimizations to your compiler (and testing/maintaining them) versus how quickly your code can be JIT compiled, and with what level of optimization.
The magnitude of the improvement depends largely on how much simpler (and smaller) the AOT-optimized IL is relative to the unoptimized IL, as well as the thresholds governing the JIT compiler (which, at least for the Microsoft CLR, are not widely known). The only way to find out is to do some testing yourself.

static and dynamic code analysis

I found several questions about this topic, all of them with lots of references, but I still don't have a clear idea, because most of the references talk about concrete tools and not about the concept of the analysis in general. Thus I have some questions:
About Static analysis:
1. I would like a reference, or a summary, of which techniques are successful and most relevant nowadays.
2. What can they really do about discovering bugs? Can we make a summary, or does it depend on the tool?
About symbolic execution:
1. Where does symbolic execution fit? I guess it depends on the approach;
I would like to know whether it counts as dynamic analysis, or as a mix of static and dynamic analysis, if that is possible to determine.
I have had trouble differentiating the two techniques in the tools, even though I think I know the theoretical difference.
I'm working with C.
Thanks in advance
I'm trying to give a short answer:
Static analysis looks at the syntactic structure of the code and draws conclusions about the program's behavior. These conclusions are not necessarily always correct.
A typical example of static analysis is data flow analysis, where you compute sets such as used, read, and written for every statement. This helps to find, for example, uninitialized values.
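A tiny hedged example (made up for illustration) of the kind of defect such a read/write analysis reports:

```cpp
#include <iostream>

int scale(bool apply) {
    int factor;             // 'factor' is written on only one path...
    if (apply)
        factor = 2;
    return factor * 10;     // ...so data flow analysis reports a possible read of
                            // an uninitialized value when 'apply' is false.
}

int main() {
    std::cout << scale(false) << '\n';   // the undefined behavior the tool would flag
    return 0;
}
```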
You can also analyze the code with respect to code patterns. This way, these tools can be used to check whether you are complying with a specific coding standard. A prominent coding-standard example is MISRA. This coding standard is used for safety-critical systems and avoids problematic constructs in C. This way you can already say a lot about the robustness of your application against memory leaks, dangling pointers, etc.
Dynamic analysis does not look only at the syntax but also takes state information into account. In symbolic execution, you add assumptions about the possible values of all variables to the statements.
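For instance, a hedged sketch not tied to any particular tool: a symbolic executor treats the inputs as symbols, collects the conditions along each path, and solves them to find inputs that reach a failing assertion.

```cpp
#include <cassert>

// A symbolic executor treats x and y as symbols. Along the path that enters
// both if-statements it collects the condition  x > 5 && y == 2 * x , solves
// it (e.g. x = 6, y = 12), and reports concrete inputs that make the
// assertion fail -- something a purely syntactic check cannot do.
void target(int x, int y) {
    if (x > 5) {
        if (y == 2 * x) {
            assert(y <= 10 && "reachable, e.g. with x = 6, y = 12");
        }
    }
}

int main() {
    target(6, 12);   // running this triggers the assertion the tool would report
    return 0;
}
```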
The most expensive and powerful method of dynamic analysis is model checking, where you really look at all possible execution states of the system. You can think of a model-checked system as a system that is tested with 100% coverage - but there are, of course, a lot of practical problems that prevent real systems from being checked that way.
These methods are very powerful, and you can gain a lot from the static code analysis tools especially when combined with a good coding standard.
For example, a feature my software team found really impressive is that it will tell you, in C++, when a class with virtual methods does not have a virtual destructor. Easy to check, in fact, but really helpful.
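For completeness, the flagged construct looks roughly like this (a made-up sketch):

```cpp
#include <memory>

class Shape {
public:
    virtual double area() const { return 0.0; }
    ~Shape() {}                     // non-virtual destructor: this is what the tool flags
};

class Circle : public Shape {
public:
    double area() const override { return 3.14159 * r_ * r_; }
    ~Circle() { /* release resources here */ }
private:
    double r_ = 1.0;
};

int main() {
    std::unique_ptr<Shape> s(new Circle());   // Circle deleted through Shape*:
    return 0;                                 // ~Circle() is skipped, undefined behavior
}
```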
The commercial tools are very expensive, but worth the money, once you learned how to use them. A typical problem in the beginning is that you will get a lot of false alarms, and don't know where to look for the real problem.
Note that nowadays g++ has some of this stuff already built-in, and that you can use something like pclint which is free.
Sorry - this is already getting quite long...hope it's interesting.
The term "static analysis" means that the analysis does not actually run the code. On the other hand, "dynamic analysis" runs the code and also requires some kind of real test inputs. That is the definition. Nothing more.
Static analysis employs various formal methods such as abstract interpretation, model checking, and symbolic execution. In general, abstract interpretation or model checking is suitable for software verification. Symbolic execution is more appropriate for the purpose of bug finding.
Symbolic execution is categorized as static analysis. However, there is a hybrid method called concolic execution, which uses both symbolic execution and dynamic testing.
Added for Zane's comment:
Maybe my explanation was a little confusing.
The difference between software verification and bug finding is whether the analysis is sound or not. For example, when we say a buffer-overrun analyzer is sound, it means that the analyzer must report all possible buffer overruns. If the analyzer reports nothing, it proves the absence of buffer overruns in the target program. Because model checking is a method that guarantees soundness, it is mostly used for software verification.
On the other hand, symbolic execution, which is actively used by most of today's commercial static analyzers, does not guarantee soundness, since sound analysis inherently issues lots and lots of false positives. For the purpose of bug finding, it is more important to reduce false positives, even if some true positives are also lost.
In summary,
soundness: there are no false negatives
completeness: there are no false positives
software verification: soundness is more important than completeness
bug finding: completeness is more important than soundness
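To make the false-positive trade-off above concrete, a hedged, made-up example: the access below is actually safe, but proving that needs information the analyzer may not have, so a sound tool must warn anyway, while a completeness-oriented bug finder would rather stay quiet.

```cpp
#include <cstring>

void copy_name(char* dst, const char* src) {
    // The caller guarantees strlen(src) < 16, but that fact lives outside this
    // function. A sound buffer-overrun analyzer that cannot see the guarantee
    // must report a possible overflow here (a false positive for this code);
    // a bug finder tuned for completeness would suppress the warning.
    char buf[16];
    std::strcpy(buf, src);
    std::strcpy(dst, buf);
}
```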

compiler optimization implementation

I am doing a major project on implementing compiler optimization techniques. I already know about the existing techniques, but I am confused about which technique to choose and how to implement it.
G'day,
What area of optimization are you talking about?
Compiler optimizations such as:
loop optimizations
dataflow optimizations
static single assignment based optimizations
code generator optimizations
etc.
etc.
Or optimization in the performance of the compiler itself, i.e. the speed with which it works?
Assuming that you have a compiler to optimize, and it wasn't written by you, look up its documentation to see what is missing. Otherwise, if it was written by you, start with the simplest optimization. The definition of "simplest" will depend on the language your compiler consumes. Or am I missing something?
I think you may have over-optimized your question. Are you trying to decide where to start, or trying to decide whether some optimizations are worth implementing and others are not? I would assume all of the existing techniques have a place and are useful depending on the code they come across. If you are deciding which one to do first, pick the one you can do and do it. Pick the low-hanging fruit. Get a few wins in your back pocket before you tackle a tough one, stumble, and get frustrated. I would assume the real trick is having all the optimizations there and working, but coming up with a way to decide which ones produce something better for a particular program and which ones get in the way and make things worse.
IMHO, the thing to do is implement the simple, obvious optimizations and then let it rest. Certainly it is very interesting to try to do weird and wonderful optimizations to rectify things that the user could simply have coded a little better, but if you really want to try to clean up after poor coding or poor design, the user can always outrun you. This is my favorite example.
My favorite example of compiler-optimizations-gone-nuts is Fortran compilers, where they go to such lengths to scramble code to shave a few hypothetical cycles that the code is almost impossible to debug, and typically the program counter is in there less than 1% of the time, so the effort is wasted.

Confused about three optimization techniques

How do you exactly perform "commoning"?
How does Kleene fixed-point theorem help in optimization?
How do you eliminate free variables from local function definitions in programs written in non-functional languages?
EDIT: These are NOT my homework questions. I am on my summer break.
EDIT2: Well, I am just beginning to study compiler optimizations and don't have a particular piece of code that I want to optimize. Could you just tell me the general ways in which the above three optimization techniques are used, or at least point me to resources that properly explain them?
Commoning is done by bottom-up hashing.
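A hedged before/after sketch of what commoning (common-subexpression elimination driven by bottom-up hashing of expression trees) does to straight-line code:

```cpp
// Before commoning: the subexpression (a + b) is built twice. Bottom-up
// hashing assigns both occurrences the same hash/value number, so the
// compiler recognizes them as the same value.
int before(int a, int b, int c) {
    int x = (a + b) * c;
    int y = (a + b) + c;
    return x + y;
}

// After commoning: the shared value is computed once and reused.
int after(int a, int b, int c) {
    int t = a + b;          // the common subexpression, hashed once
    int x = t * c;
    int y = t + c;
    return x + y;
}
```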
Kleene's theorem allows the compiler to implement an iterative solution to recursion equations that give facts about the program. A simple example of a fact is that at a certain point, variable i is always equal to 0.
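A hedged illustration of such a fact and of why iteration is needed: the analysis starts from an initial guess and keeps reapplying its transfer functions until, per Kleene, the solution stops changing.

```cpp
// The dataflow equations for "which constant does i hold here?" are solved
// iteratively: propagate through both branches and around the loop back-edge,
// and repeat until nothing changes (the Kleene fixed point).
int example(int n) {
    int i = 0;
    for (int k = 0; k < n; ++k) {
        if (k % 2 == 0)
            i = 1;      // i temporarily becomes 1 on some paths...
        i = 0;          // ...but every path re-establishes i == 0 before the back-edge
    }
    // Fixed point reached: the analysis concludes that i == 0 holds here on
    // all paths, so the use below can be constant-folded to 0.
    return i * 42;
}
```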
If you have a local function with free variables that are let-bound or lambda-bound in an enclosing function, then by definition you are dealing with a language that has first-class functions. The free variables are typically dealt with by closure conversion, although some compilers use lambda-lifting.
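A hedged sketch of closure conversion (all names made up for illustration): the free variable of the local function is captured in an explicit record, and the function is rewritten to take that record as an extra argument.

```cpp
// Source-level view: 'step' is free in the local function 'add_step'.
//
//   void scale_all(int* v, int n, int step) {
//       auto add_step = [step](int x) { return x + step; };   // 'step' is free
//       for (int i = 0; i < n; ++i) v[i] = add_step(v[i]);
//   }
//
// After closure conversion, the free variable lives in an explicit closure
// record and the function receives that record as an extra argument:

struct AddStepClosure {
    int step;                      // the captured free variable
};

static int add_step_code(const AddStepClosure* env, int x) {
    return x + env->step;          // free variable accessed through the environment
}

void scale_all(int* v, int n, int step) {
    AddStepClosure env{step};      // build the closure record once
    for (int i = 0; i < n; ++i)
        v[i] = add_step_code(&env, v[i]);
}
```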
Recommended search terms:
Bottom-up hashing
Common-subexpression elimination
Iterative dataflow analysis
Dataflow optimization made simple
Continuation-passing, closure-passing style
Closure conversion
Lambda lifting
This is what I found on the web; if somebody has access to further information, please reply.
William Clinger teaches two of the above techniques and looks into more interesting ones in his class:
http://www.ccis.neu.edu/home/will/csg262_fall2004/syllabus.html
These guys are using Kleene algebra for data flow analysis. I think we can use it in optimizing compilers:
http://ieeexplore.ieee.org/Xplore/login.jsp?url=http://ieeexplore.ieee.org/iel5/4159639/4159640/04159673.pdf%3Fisnumber%3D4159640%26prod%3DCNF%26arnumber%3D4159673%26arSt%3D201%26ared%3D210%26arAuthor%3DFernandes%252C%2BT.&authDecision=-203
Unfortunately the above paper requires login.
This is what I found about commoning (but it didn't help much):
http://www.patentsurf.net/7,516,448
http://groups.google.com/group/comp.lang.scheme/browse_thread/thread/ac55fd7d73a5fdb4#
Last Question's Answer:
http://en.wikipedia.org/wiki/Lambda_lifting
Good answer from Norman. (I just hope your prof doesn't confuse optimizations that a compiler might do with optimizations that the software programmer might do. The latter is less of a technical subject, so there is less to say about it, but in real applications it is orders of magnitude more significant.)