Confused about three optimization techniques

How exactly do you perform "commoning"?
How does Kleene fixed-point theorem help in optimization?
How do you eliminate free variables from local function definitions in programs written in non-functional languages?
EDIT: These are NOT my homework questions. I am on my summer break.
EDIT2: Well, I am just beginning to study compiler optimizations and don't have a particular piece of code that I want to optimize. Could you just tell me, in general, how the above three optimization techniques are applied, or at least point me to resources that explain them properly?

Commoning is done by bottom-up hashing.
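To make that concrete, here is a toy sketch of bottom-up hashing (essentially hash-based local value numbering); the data structures and names are my own invention, not taken from any particular compiler. Each expression node is keyed by its operator and the numbers of its children, so structurally identical subexpressions map to the same number, which is exactly the information common-subexpression elimination needs.

// Hypothetical sketch of commoning via bottom-up hashing (value numbering).
#include <cstdio>
#include <map>
#include <string>
#include <tuple>

// Key: (operator, value number of left child, value number of right child).
// Leaves (variables/constants) use op = name and children = -1.
using Key = std::tuple<std::string, int, int>;

static std::map<Key, int> table;   // lookup table of already-seen expressions
static int next_number = 0;

// Return the value number for a node, creating a new one only if it is new.
int number_of(const std::string& op, int left = -1, int right = -1) {
    Key k{op, left, right};
    auto it = table.find(k);
    if (it != table.end())
        return it->second;          // common subexpression: reuse the old number
    return table[k] = next_number++;
}

int main() {
    // x = a - b;  y = (a - b) * c;
    int a = number_of("a"), b = number_of("b"), c = number_of("c");
    int t1 = number_of("-", a, b);  // first occurrence of a - b
    int t2 = number_of("-", a, b);  // second occurrence gets the SAME number
    int y  = number_of("*", t2, c);
    std::printf("a-b occurrences: %d and %d, product: %d\n", t1, t2, y);
    return 0;
}

A real compiler applies the same idea per basic block (or over the dominator tree), with the numbers indexing a table of canonical expression nodes.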
Kleene's theorem allows the compiler to implement an iterative solution to recursion equations that give facts about the program. A simple example of a fact is that at a certain point, variable i is always equal to 0.
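As a small illustration of that iterative style (the program, lattice, and transfer functions below are invented for the example), the solver reapplies the dataflow equations over a tiny CFG until nothing changes; Kleene's fixed-point theorem is what guarantees the iteration terminates at the least fixed point when the functions are monotone over a finite lattice. The fact being computed is the one from the answer: "variable i is always 0 at this point".

// A minimal round-robin dataflow solver for one fact: "i is 0 here".
// Per-point lattice: UNKNOWN (not yet computed), YES, NO;
// meet(YES, NO) = NO, meet(x, UNKNOWN) = x.
#include <cstdio>
#include <vector>

enum Fact { UNKNOWN, YES, NO };

Fact meet(Fact a, Fact b) {
    if (a == UNKNOWN) return b;
    if (b == UNKNOWN) return a;
    return (a == b) ? a : NO;
}

int main() {
    // Invented 4-block CFG:
    //   B0: i = 0;        B1: while (cond)
    //   B2:   i = i * 2;  B3: use(i)
    // Edges: B0->B1, B1->B2, B2->B1, B1->B3.
    std::vector<std::vector<int>> preds = { {}, {0, 2}, {1}, {1} };

    // Transfer functions: B0 establishes the fact; B2 (i = i * 2) preserves
    // "i == 0" exactly when it already held; B1 and B3 pass facts through.
    auto transfer = [](int block, Fact in) -> Fact {
        if (block == 0) return YES;   // i = 0 establishes the fact
        return in;                    // the other blocks preserve it
    };

    std::vector<Fact> out(4, UNKNOWN);
    bool changed = true;
    int rounds = 0;
    while (changed) {                 // iterate until the least fixed point
        changed = false;
        ++rounds;
        for (int b = 0; b < 4; ++b) {
            Fact in = UNKNOWN;
            for (int p : preds[b]) in = meet(in, out[p]);
            Fact o = transfer(b, in);
            if (o != out[b]) { out[b] = o; changed = true; }
        }
    }
    std::printf("converged after %d rounds; i==0 at B3: %s\n",
                rounds, out[3] == YES ? "yes" : "no");
    return 0;
}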
If you have a local function with free variables that are let-bound or lambda-bound in an enclosing function, then by definition you are dealing with a language that has first-class functions. The free variables are typically dealt with by closure conversion, although some compilers use lambda-lifting.
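To make the two terms concrete, here is a hand-done closure conversion of a small example; the function and struct names are invented for illustration. The free variable moves into an explicit environment record, and the local function becomes a top-level function that receives that record as an extra argument; lambda lifting would instead pass the free variable as an ordinary extra parameter at every call site.

#include <cstdio>

// Before conversion (pseudocode):
//   int scale(int step, int x) {
//       int add_step(int y) { return y + step; }   // 'step' is free in add_step
//       return add_step(x) + add_step(2 * x);
//   }

// After closure conversion: the free variable lives in an environment record,
// and the local function becomes a top-level function taking the environment.
struct AddStepEnv { int step; };

int add_step(const AddStepEnv* env, int y) { return y + env->step; }

int scale(int step, int x) {
    AddStepEnv env{step};                       // build the closure's environment
    return add_step(&env, x) + add_step(&env, 2 * x);
}

// Lambda lifting would instead produce int add_step(int step, int y),
// with 'step' passed explicitly at every call site.

int main() {
    std::printf("%d\n", scale(10, 3));          // (3 + 10) + (6 + 10) = 29
    return 0;
}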
Recommended search terms:
Bottom-up hashing
Common-subexpression elimination
Iterative dataflow analysis
Dataflow optimization made simple
Continuation-passing, closure-passing style
Closure conversion
Lambda lifting

This is what I found on the web; if somebody has access to further information, please reply.
William Clinger teaches two of the above techniques and looks into more interesting ones in his class:
http://www.ccis.neu.edu/home/will/csg262_fall2004/syllabus.html
These guys are using the Kleene algebra for data flow analysis. I think we can use it in optimizing compilers:
http://ieeexplore.ieee.org/Xplore/login.jsp?url=http://ieeexplore.ieee.org/iel5/4159639/4159640/04159673.pdf%3Fisnumber%3D4159640%26prod%3DCNF%26arnumber%3D4159673%26arSt%3D201%26ared%3D210%26arAuthor%3DFernandes%252C%2BT.&authDecision=-203
Unfortunately the above paper requires login.
This is what I found about commoning (but it didn't help much):
http://www.patentsurf.net/7,516,448

http://groups.google.com/group/comp.lang.scheme/browse_thread/thread/ac55fd7d73a5fdb4#

Last Question's Answer:
http://en.wikipedia.org/wiki/Lambda_lifting

Good answer from Norman. (I just hope your prof. doesn't confuse optimizations that a compiler might do with optimizations that the software programmer might do. The latter is less of a technical subject, so there is less to say about it, but in real applications it is orders of magnitude more significant.)

Related

Where can I find good explanations of Computability and Complexity?

I have a repeat exam coming up in Computability and Complexity and I was wondering if anybody has good resources for this sort of study.
Things like regular languages, context-free and context-sensitive languages, and all that sort of stuff.
For example: (one of the past exam questions, posted as an image in the original question, omitted here).
As you can see, it is a horribly phrased question. The notes our lecturer gave us are equally bad. I really need to pass this module, so if anybody has a good resource for studying these topics it would be much appreciated.
I think the problem you're having is not the fault of the phrasing, but the fact that you're not yet comfortable dealing with the mathematical notation involved.
Wikipedia has a lot of articles on automata and other computer science theory topics. Also, a Google search on 'NFA to DFA' turns up many helpful results. Automata are used heavily in compilers, so you might find a more "practical" explanation of things in material from a compilers course.
Your class is going to be heavily mathematical, though, so you would do best for yourself by putting aside the attitude that the material you've been given is poor and spending the time learning to understand it. Mathematical formulations give you precise and concise descriptions, with less room for misinterpretation than informal language.
You may want to look at the class notes made available by Avi Kak at
https://engineering.purdue.edu/kak/courses-i-teach/ECE664/Index.html
See the handwritten notes on Lecture 17 that explain the notation in your question.

Compiler Optimization of Deterministic Functions

I was reading about Deterministic Execution, which means that for the same input you always get the same output. I was wondering whether any compiler writers have thought about optimizing calls to deterministic functions at runtime.
For example, take the factorial function. If at runtime, it is detected that it is continuously being called with the same input value, the compiler can cache the output value and instead of executing the factorial function, can directly use that output value. Seems like a nice research topic. Are there any papers or work on this topic?
This is usually called memoization, and is a fairly common optimization in functional languages.
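As a hand-written sketch of what that looks like (the cache and names here are mine; this is not compiler output), the factorial from the question can be wrapped in a lookup table so that repeated calls with the same argument skip the computation entirely:

#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Memoized factorial: repeated calls with the same argument hit the cache.
std::uint64_t factorial(unsigned n) {
    static std::unordered_map<unsigned, std::uint64_t> cache;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;   // cached: skip the recomputation
    std::uint64_t result = (n <= 1) ? 1 : n * factorial(n - 1);
    cache[n] = result;
    return result;
}

int main() {
    std::printf("%llu\n", (unsigned long long)factorial(20)); // computed
    std::printf("%llu\n", (unsigned long long)factorial(20)); // served from cache
    return 0;
}

Automating this safely is exactly the hard part the next answer describes: the compiler has to prove the function is pure and that its arguments have a cheap, well-defined notion of equality.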
It can be done, but as far as I know it's not common for compilers to do it. The trouble is that users can define as many types as they like, and define equality on them however they like, and with heap allocation and mutation it's very, very difficult to prove that a function really is deterministic. Basically, it could be done, but only if your function involves straight numerical computation, which is rare, and thus it's usually not of high value.
You're talking about referential transparency. And it's a big part of functional programming.
http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)
http://blogs.msdn.com/b/vcblog/archive/2008/11/12/pogo.aspx talks about profile guided optimization.
It doesn't answer your question per se, but it does talk in general about using runtime behavior to optimize the generated assembly.

Programming languages that define the problem instead of the solution?

Are there any programming languages designed to define the solution to a given problem instead of defining instructions to solve it? So, one would define what the solution or end result should look like and the language interpreter would determine how to arrive at that result. Looking at the list of programming languages, I'm not sure how to even begin to research this.
The best examples I can currently think of to help illustrate what I'm trying to ask are SQL and MapReduce, although those are both sort of mini-languages designed to retrieve data. But, when writing SQL or MapReduce statements, you're defining the end result, and the DB decides the best course of action to arrive at the end result set.
I could see these types of languages, if they exist, being used in crunching a lot of data or finding solutions to a set of equations. The dream language would be one that could interpret the defined problem, identify which parts are parallelizable, and execute the solution across multiple processes/cores/boxes.
What about declarative programming? An excerpt from the Wikipedia article:
"In computer science, declarative programming is a programming paradigm that expresses the logic of a computation without describing its control flow. Many languages applying this style attempt to minimize or eliminate side effects by describing what the program should accomplish, rather than describing how to go about accomplishing it. This is in contrast with imperative programming, which requires an explicitly provided algorithm."
The closest you can get to something like this is with a logic language such as Prolog. In these languages you model the problem's logic but again it's not magic.
This sounds like a description of a declarative language (specifically a logic programming language), the most well-known example of which is Prolog. I have no idea whether Prolog is parallelizable, though.
In my experience, Prolog is great for solving constraint-satisfaction problems (ones where there's a set of conditions that must be satisfied): you define your input set and the constraints (e.g., an ordering that must be imposed on the previously unordered inputs). Pathological cases are possible, though, and sometimes the logical deduction process takes a very long time to complete.
If you can define your problem in terms of a Boolean formula you could throw a SAT solver at it, but note that the 3SAT problem (Boolean variable assignment over three-variable clauses) is NP-complete, and its first-order-logic big brother, the Quantified Boolean formula problem (which uses the existential quantifier as well as the universal quantifier), is PSPACE-complete.
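As a toy illustration of that first point (the encoding and names below are invented; a real workflow would emit DIMACS and hand it to an off-the-shelf SAT solver), you state the problem only as clauses, and the "solver", here a brute-force search, finds a satisfying assignment:

#include <cstdio>
#include <vector>

// A CNF formula: each clause is a list of literals; literal +v means
// "variable v is true", -v means "variable v is false" (variables 1..n).
using Clause = std::vector<int>;

bool satisfies(const std::vector<Clause>& cnf, unsigned assignment) {
    for (const Clause& clause : cnf) {
        bool clause_ok = false;
        for (int lit : clause) {
            int v = lit > 0 ? lit : -lit;
            bool value = (assignment >> (v - 1)) & 1u;
            if ((lit > 0) == value) { clause_ok = true; break; }
        }
        if (!clause_ok) return false;
    }
    return true;
}

int main() {
    // (x1 or x2) and (not x1 or x3): we only state the constraints;
    // the "solver" (brute force over 2^3 assignments) finds a model.
    std::vector<Clause> cnf = { {1, 2}, {-1, 3} };
    for (unsigned a = 0; a < (1u << 3); ++a) {
        if (satisfies(cnf, a)) {
            std::printf("x1=%u x2=%u x3=%u\n", a & 1u, (a >> 1) & 1u, (a >> 2) & 1u);
            return 0;
        }
    }
    std::printf("unsatisfiable\n");
    return 0;
}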
There are some very good theorem provers written in OCaml and other FP languages; here are a whole bunch of them.
And of course there's always linear programming via the simplex method.
These languages are commonly referred to as 5th generation programming languages. There are a few examples on the Wikipedia entry I have linked to.
Let me try to answer ... may be Prolog could answer your needs.
I would say Objective Caml (OCaml) too...
This may seem flippant, but in a sense that is what Stack Overflow is. You declare a problem and/or an intended result, and the community provides the solution, usually in code.
It seems immensely difficult to model dynamic open systems down to a finite number of solutions. I think there is a reason most programming languages are imperative. Not to mention there are massive NP-hard problems lurking in the dark that would make such a system difficult to engineer.
Although what would be interesting is if there was a formal framework that could leverage human input to "crunch the numbers" and provide a solution, perhaps imperative code generation. The internet and google search engines are kind of that tool but very primitive.
Large problems and software are basically just a collection of smaller problems solved in code. So any system that generated code would require fairly delimited problem sets that can be mapped to more or less atomic solutions.
Lisp. There are so many Lisp systems out there defined in terms of rules not imperative commands. Google ahoy...
There are various Java-based rules engines which allow declarative programming - Drools is one that I've played with and it seems pretty interesting.
A lot of languages define more problems than solutions (don't take this one seriously).
On a serious note: one more vote for Prolog and different kinds of DSLs designed to be declarative.
I remember reading something about computation using DNA back when I was in college. You would put segments of DNA in a solution that represented segments of the problem, and define it in such a way that if the DNA fits together, it's a valid solution. Then you let the properties of chemicals solve the problem for you and look for finished strands that represent a solution. It sounds sort of like what you are referring to.
I don't recall if it was theoretical or had been done, though.
LINQ could also be considered another declarative DSL (setting aside the argument that it's too similar to SQL). Again, you declare what your solution looks like, and LINQ decides how to find it.
The beauty of these kinds of languages is that projects like PLINQ (which I just found) can spring up around them. Check out this video with the PLINQ developers (WMV direct link) on how they parallelize solution finding without modifying the LINQ language (much).
While mathematical proofs don't constitute a programming language, they do form a formal language where you simply define solutions (as long as you allow nonconstructive proofs). Of course, it's not algorithmic, so "math" might not be an acceptable answer.
Meta Discussion
What constitutes a problem or a solution is not absolute and depends on the level of abstraction that you are taking as a reference point.
Let's compare the following 3 languages: SQL, C++, and CPU instructions.
C++ vs CPU instructions
If you choose array manipulation as the desired level of abstraction, then C++ allows you to "define the problem" instead of the solution:
array[i * 2 + 3] = 5;
array[t] = array[k - m] - 1;
Note what this C++ snippet does not state: how the memory is laid out, how many bits are used by each array element, which CPU registers hold the data, and even in which order the arithmetic operations will be performed (as long as the result is the same).
The C++ compiler, however, will translate this code to lower-level CPU instructions that will contain all of these details.
At the abstraction level of array manipulation, C++ is declarative, and CPU instructions are imperative.
SQL vs C++
If you choose a sorting algorithm as the desired level of abstraction, then SQL allows you to "define the problem" instead of the solution:
select *
from table
order by key
This snippet of code is declarative with respect to the sorting algorithm's level of abstraction because it declares that the output is sorted without using lower-level concepts (like array manipulation).
If you had to sort an array in C++ (without using a library), the program would be expressed in terms of array manipulation steps of a particular sorting algorithm.
void sort(int *array, int size) {      // insertion sort
    int key, j;
    for (int i = 1; i < size; i++) {
        key = array[i];
        j = i;
        while (j > 0 && array[j-1] > key) {
            array[j] = array[j-1];     // shift larger elements one slot right
            j--;
        }
        array[j] = key;                // drop the element into its sorted position
    }
}
This snippet is not declarative with respect to the sorting algorithm's level of abstraction because it uses concepts (such as array manipulation) that are constituents of the sorting algorithm.
Summary
To summarize, whether a language defines problems or solutions depends on what problems and solutions you are referring to.
Many answers here have brought up examples: SQL, LINQ, Prolog, Lisp, OCaml. I am sure there are many useful levels of abstractions with respect to which these languages are declarative.
However, do not forget that you can build a language with an even higher level of abstraction on top of them.

What's the name of the problem that relates to optimizing closures on a stack-based system?

I remember hearing about a general optimization problem that relates to function closures, stating that in general it's difficult to optimize the creation of a closure using only stack-based memory management. Do any of you remember the name of this optimization problem, possibly with an example or link to relevant page?
It sounds like you're thinking of the upward funarg problem.
Perhaps you're thinking of escape analysis.
It concerns the distinction between what the Lisp community calls its two kinds of extent: dynamic extent and indefinite extent. Objects of the former can be stack-allocated, while objects of the latter cannot, as their lifetimes may exceed the scope in which they were allocated.
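A small sketch of why the upward-funarg case defeats pure stack allocation (the names are invented; C++ is only used here to illustrate the lifetime issue): the closure returned by make_counter outlives the frame that created its captured variable, so that variable has indefinite extent and cannot live in that frame. Deciding this automatically is what escape analysis does.

#include <cstdio>
#include <functional>

// The returned closure is an "upward funarg": it escapes the frame of
// make_counter, so the captured state must outlive that stack frame.
std::function<int()> make_counter(int start) {
    int count = start;
    // Capturing 'count' by value copies it into the closure object.
    // Capturing it by reference ([&count]) would leave a dangling pointer
    // to a dead stack slot once make_counter returns.
    return [count]() mutable { return ++count; };
}

int main() {
    auto next = make_counter(10);
    int a = next(), b = next(), c = next();
    std::printf("%d %d %d\n", a, b, c);   // 11 12 13
    return 0;
}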
Are you thinking of escape analysis?

How do modern optimizing compilers determine when to optimize?

How do modern optimizing compilers determine when to apply certain optimizations such as loop unrolling and code inlining?
Since both of these affect caching, naively inlining functions with fewer than X lines, or applying whatever other simple heuristic, is likely to generate worse-performing code. So, how do modern compilers deal with this?
I'm having a hard time finding information on this (especially information that's reasonably easy to understand); about the best I could find is the Wikipedia article. Any details, links to books/articles/papers are greatly appreciated!
EDIT: Since answers are talking mainly about the two optimizations I mentioned (inlining and loop unrolling) I just wanted to clarify that I'm interested in all and any compiler optimizations, not just those two. I'm also more interested in the optimizations which can be performed during ahead-of-time compilation, though JIT optimization is of interest too (though to a slightly lesser extent).
Thanks!
Usually by being that naive anyway and hoping it is an improvement.
This is why just-in-time compilation is such a winning strategy: collect statistics, then optimize for the common case.
References:
http://lambda-the-ultimate.org/node/768
GCC supports Profile Guided Optimization
And of course the Sun hotspot JVM
You can look at the Spiral project.
On top of that, optimizing is a tough thing to do generically. This is, in part, why there are so many options to the gcc compiler. If you know something about caches and paging you can do some things by hand and request that others be done through the compiler, but no two machines are the same, so the approach must be ad hoc.
In short: better than we do!
You can have a look at this: http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
Didier
Good question. You are asking about so-called speculative optimizations.
Dynamic compilers use both static heuristics and profile information. Static compilers employ heuristics and (off-line) profile information; the latter is often referred to as PGO (profile-guided optimization).
There are a lot of articles on inlining policies. The most comprehensive one is
An Empirical Study of Method Inlining for a Java Just-In-Time Compiler
It also contains references to related work and sharp (and justified) criticism of some of the articles it considers.
In general, state-of-the-art compilers try to use impact analysis to estimate potential effect of speculative optimizations before applying them.
P.S. Loop unrolling is classic old stuff which helps only for some tight loops that perform only number-crunching operations (no calls and so on). Method inlining is a much more important optimization in modern compilers.
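For reference, here is roughly what 4-way unrolling of such a tight, call-free loop looks like when done by hand; a compiler performs the same transformation automatically, typically only when the body is small and the trip count is known or large, because the bigger body costs instruction-cache space.

#include <cstddef>
#include <cstdio>

// Original tight loop.
long sum(const int* a, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// The same loop unrolled by 4: fewer branches and more instruction-level
// parallelism, at the cost of a bigger body and a cleanup loop.
long sum_unrolled(const int* a, std::size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];      // handle the leftover iterations
    return s0 + s1 + s2 + s3;
}

int main() {
    int data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::printf("%ld %ld\n", sum(data, 10), sum_unrolled(data, 10));  // 55 55
    return 0;
}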