How can a compiler that recognizes iterators be implemented?

I have been using iterators for a while and I love them.
But although I have thought hard about it, I could not figure out how a compiler that recognizes iterators could be implemented. I have also researched the topic, but could not find any resource explaining it in a compiler-design context.
To elaborate, most articles about iterators imply there is some sort of 'magic' implementing the desired behaviour. They suggest the compiler maintains a state machine in order to track where execution is (where the last 'yield return' was seen). I am especially interested in this property of iterators, which enables lazy evaluation.
By the way, I know what state machines are, have already taken a compiler design course, and have studied the Dragon Book. But apparently, I cannot relate what I have studied to the 'magic' of csc.
Any knowledge or insights are appreciated.

It's simpler than it seems. The compiler can decompose the iterator function into individual chunks; chunks are divided by yield statements.
The state machine just needs to keep track of which chunk we're currently in and, upon the next invocation of the iterator, jump directly to that chunk. We also need to keep track of all local variables (of course).
Then, we need to consider a few special cases, in particular loops containing yields. Fortunately, IL (but not C# itself) allows goto to jump into loops and resume them.
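To make this concrete, here is a minimal hand-written sketch of the kind of state machine the compiler might generate. This is a simplification, not the exact code csc emits (the real generated class also handles thread affinity and the IEnumerable/IEnumerator duality):

using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of what might be generated for:
//   IEnumerable<int> Numbers() { yield return 1; yield return 2; }
class NumbersIterator : IEnumerator<int>
{
    private int _state;               // which chunk we are in
    public int Current { get; private set; }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:                   // chunk up to the first yield
                Current = 1;
                _state = 1;
                return true;
            case 1:                   // chunk between the two yields
                Current = 2;
                _state = 2;
                return true;
            default:                  // past the last yield: finished
                return false;
        }
    }

    object IEnumerator.Current => Current;
    public void Dispose() { }
    public void Reset() => throw new System.NotSupportedException();
}

Local variables of the original iterator body become fields of the generated class, which is how their values survive between MoveNext calls.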
Notice that there are some very complicated edge cases, e.g. C# doesn't allow yield in finally blocks because it would be very difficult (impossible?) to leave the function upon yield, and later resume the function, perform clean-up, re-throw any exception and preserve the stack trace.
Eric Lippert has posted an in-depth description of the process. (Read the articles he has linked to, as well!)

One thing I would try is to write a short example in C#, compile it, and then use Reflector on it. I think this "yield return" thing is just syntactic sugar, so you should be able to see how the compiler handles it in the disassembler's output.
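For instance, a minimal (purely illustrative) candidate to compile and decompile might look like this:

using System.Collections.Generic;

static class Demo
{
    // Decompiling this should reveal a nested, compiler-generated
    // class implementing IEnumerator<int> with a state field.
    static IEnumerable<int> CountTo(int n)
    {
        for (int i = 1; i <= n; i++)
            yield return i;
    }

    static void Main()
    {
        foreach (int i in CountTo(3))
            System.Console.WriteLine(i);
    }
}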
But, well, I don't really know much about these things so maybe I'm completely wrong.

Related

Does a language describe things beyond itself?

I now have sufficient exposure to Objective-C that if I'm stuck with anything, I know how to think of the problem in terms of the likely tool I need and go look for it. Simple really. There's A Method For That. So nothing's a real problem anymore.
Now I'm looking deeper at the language in broader terms. We write stuff. The compiler hews out all the code to execute it. From a simple flashlight app that's an if/then decision to turn on, to a highly complex accelerometer-driven 3D shoot-'em-up with blood 'n' guts and body parts following all sorts of physics, the compiler prepares the code ready to be executed like a giant railway layout. No matter how random it appears on the screen, everything possible can be generically described and prepared for.
So here's the question:
Are there cases where something completely unexpected to the software designer can still be handled without an execution halt? Maybe I'd better re-frame the question a few different ways: Can an (Objective-C) program meta-compile within itself in response to an unplanned-for user request? Or, to restate my opening remark, are there tools or methods for unlikely descriptions of unlikely problems?
I think #kfb has the right comment about metaprogramming. Check out the Runtime docs in conjunction with metaprogramming tutorials.
Parts of your last question might be in the realm of this doc.
If you're looking for ways to reduce the size of your code base for the lesser-used features, one idea might be to make the features internet-based (assuming connectivity is not a problem).

Any Data Structures & Algorithms books w/ examples in Objective-C or other Keyword Message Language?

I've tried searching for data structures / algorithms books that provide examples in either Objective-C or another language supporting keyword message syntax, to no avail.
The reason I'm interested in this is because I really think the keyword syntax would help me understand the intent of code, which I find I have to think longer about in languages with typical function call syntax.
A good example is this snippet from a SplayTree implementation in C:
/* Continue down the tree. */
n = splay_tree_splay_helper (sp, key, next, node, parent);
The function name is pretty unhelpful, and even with the comment I have to thoroughly read the code to have any idea what's really happening there.
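By contrast, keyword-style naming spells out each argument's role at the call site. As a purely illustrative sketch (using C# named arguments as the nearest analogue I can show here; the types and names are hypothetical, not from a real splay tree library):

// Hypothetical stand-ins, just to illustrate call-site readability.
class Node { }

static class SplayDemo
{
    static Node SplayHelper(Node tree, int key, Node next,
                            Node node, Node parent) => tree;

    static void Example(Node sp, int key, Node next, Node node, Node parent)
    {
        // Each argument's role is explicit, much like an
        // Objective-C keyword message would make it.
        Node n = SplayHelper(tree: sp, key: key, next: next,
                             node: node, parent: parent);
    }
}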
I know that technically any piece of C code is valid Objective-C, but I'm looking for something that structures algorithm implementations using a good object model like Objective-C's, since I believe the resulting code is more maintainable. This may seem counter-intuitive in the performance-restricted space of algorithm design, but I've seen plenty of algorithms books with examples in idiomatic Ruby, Python, JavaScript, etc.
Basically I'm looking for anything with a good object model that allows for very descriptive keyword messages, whether it's Objective-C or even (though probably unlikely) anything else in the Smalltalk family.
Why would you want a book? Just download a Smalltalk environment and read the actual source. Open a system browser, select one of the Collections categories (a category is a collection of classes) and start browsing the code (the extra column is for message categories). Open a workspace, type Object cmd-B (or ctrl-B, for browse) and see for yourself why the single responsibility principle was invented. Navigate through the code with hierarchy, senders and implementors.
I think you are looking for the wrong thing.
A good algorithms and data structures book will try not to waste your time with hard-to-read source code. Most of the good books I know spend most of their pages explaining things at a high level, and only show actual code in small snippets that can be understood easily, independently of the language used and how proficient you are with it.
It doesn't matter how convoluted some guy's implementation of splay trees is. As long as you know what a splay tree is, you should be able to implement your own version without looking at it too much.
And finally, a good object model and nice syntax are not the be-all and end-all. Many data structures make use of union types, which are not very nicely expressed in OO style, and the naming patterns and syntax are things you should be able to get used to very quickly.

Using flow chart or diagram for routines across programs

I have a busy set of routines to validate or download the current client application. It starts with a Windows desktop shortcut that invokes a .WSF file. This calls on several .VBS files, an .INI for settings, and potentially a .BAT file. Some of these script documents have internal functions. The final phase opens a Microsoft Access database, which entails an AutoExec macro, which kicks off some VBA, including a form which has a load routine of its own in VBA.
None of this detail is specifically important (so please don't add a VBA tag, OR criticize my precious complexity). The point is I have a variety of tools and containers and they may be functionally nested.
I need better techniques for parsing that in a flow chart. Currently I rely on any or all of the following:
a distinct color
a big box that encloses a routine
the classic 'transfer of control' symbol
perhaps an explanatory call-out
Shouldn't I increase my flow-charting vocabulary? Tutorials explain the square, the diamond, the circle, and just about nothing more. Surely flow charting can help me deal with these sorts of things:
The plethora of script types lets me answer different needs, and I want to indicate tool/language.
A sub-routine could result in an abort of the overall task, or an error, and I want to show the handling of that by (or consequences for) higher-level "enclosing" routines.
I want to distinguish "internal" sub-routines from ones in a different script file.
Concurrent script processing could become critical, so I want to note that.
The .INI file lets me provide all routines with persistent values. How is that charted?
A function may have arguments and a return value/reference ... I don't know how to effectively notate even that.
Please provide guidance or point me to an extra-helpful resource. If you recommend an analysis tool set (like UML, which I haven't gotten the hang of yet), please also tell me where I can find a good introduction.
I am not interested in software. Please consider this a white board exercise.
Discussion of the question suggests flowcharts are not useful or accurate.
Accuracy depends on how the flow charts are constructed. If they are constructed manually, they are like any other manually built document and will be out of date almost instantly; that makes hand-constructed flowcharts really useless, which is why people tend to like looking at the code.
[The rest of this response violates the OP's requirement of "not interested in software (to produce flowcharts)", because I think that's the only way to get them in some kind of useful form.]
If the flowcharts are derived from the code by an appropriate language-accurate analysis tool, they will be accurate. See examples at http://www.semanticdesigns.com/Products/DMS/FlowAnalysis.html. These examples are semantically precise, although the pages there don't spell out the exact semantics; that's just a documentation detail.
It is hard to find such tools :-}, especially if you want flowcharts that span multiple languages and multiple "execution paradigms" (OP wants his INI files included; they are some kind of implied assignment statements, and I'm pretty sure he'd want to model SQL actions, which don't flowchart usefully because they tend to be pure computation over tables).
It is also unclear that such flowcharts are useful. The examples at the page I provided should be semi-convincing; if you take into account all the microscopic details (e.g., the possibility of an ABORT control-flow arc emanating from every subroutine call [because each call may throw an exception]), these diagrams get horrendously big, fast. The fact that the diagrams are space-consuming (boxes, diamonds, lines, lots of whitespace) aggravates this pretty badly. Once they get big, you literally get lost in space following the arcs. Again, a good reason for people to avoid flowcharts for entire systems. (The other reason people like text languages is that they can in fact be pretty dense; you can get a lot on a page with a succinct language, and wait'll you see APL :)
They might be of marginal help in individual functions, if the function has complex logic.
I think it unlikely that you are going to get language-accurate analyzers that produce flowcharts for all the languages you want, or that such analyzers could compose their flowcharts nicely (you want JavaScript invoking C# running SQL ...?).
What you might hope for is a compromise solution: display the code with various hyper links to the other artifacts referenced. You still need the ability to produce such hyperlinked code (see http://www.semanticdesigns.com/Products/Formatters/JavaBrowser.html for one way this might work), but you also need hyperlinks across the language boundaries.
I know of no tools that presently do that. And I doubt you have the interest or willpower to build such tools on your own.

Compiler Optimization of Deterministic Functions

I was reading about deterministic execution, which means that for the same input you get the same output. I was wondering whether any compiler writer has thought about optimizing deterministic functions at runtime.
For example, take the factorial function. If at runtime it is detected that the function is continuously being called with the same input value, the compiler can cache the output value and, instead of executing the factorial function, use that cached value directly. Seems like a nice research topic. Are there any papers or work on this topic?
This is usually called memoization, and is a fairly common optimization in functional languages.
It can be done, but as far as I know it's not common for compilers to do it. The trouble is that users can define as many types as they like, with equality defined any way they like, and with heap allocation and the like it's very, very difficult to prove a function is really deterministic. Basically, it could be done, but only if your function involves straight numerical computation, which is rare, so it's usually not of high value.
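To illustrate, here is a minimal sketch of doing it by hand in C# (a generic memoizing wrapper; note it is only safe for pure functions whose arguments have sensible equality, which is exactly what the compiler struggles to prove):

using System;
using System.Collections.Generic;

static class Memo
{
    // Wraps a (presumed pure) function with a cache keyed on its argument.
    static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> f)
    {
        var cache = new Dictionary<TIn, TOut>();
        return x =>
        {
            if (!cache.TryGetValue(x, out var result))
            {
                result = f(x);      // compute once...
                cache[x] = result;  // ...then reuse for repeated inputs
            }
            return result;
        };
    }

    static void Main()
    {
        Func<int, long> fact = null;
        fact = n => n <= 1 ? 1 : n * fact(n - 1);
        var fastFact = Memoize(fact);

        Console.WriteLine(fastFact(10)); // computed
        Console.WriteLine(fastFact(10)); // served from the cache
    }
}

Note that this only caches the top-level call; the inner recursion still runs normally, which hints at another reason blanket compiler-level memoization is tricky to get right.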
You're talking about referential transparency. And it's a big part of functional programming.
http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)
http://blogs.msdn.com/b/vcblog/archive/2008/11/12/pogo.aspx talks about profile-guided optimization. It doesn't answer your question per se, but in general it discusses using runtime behavior to optimize the generated assembly.

What issues does static call graph analysis decipher?

What issues would one hope to find using static call graph analysis on a program? FxCop uses static call graph analysis, what issues does it find using this technique?
http://msdn.microsoft.com/library/bb429476.aspx
http://en.wikipedia.org/wiki/Callgraph
Apologies for my lack of knowledge; I found some information via Google, but fear that it is vastly incomplete. Thanks!
By itself the call-graph is just that; there are no "wrong" call graphs (unless you have a style check prohibiting recursion).
The real issue is that to understand how code at a point in the program might be problematic, you typically need to understand the shape of the world (what data structures are live, what values they might contain, what relationships they might have) at the moment where that code point is active. The call graph shows how execution can get to the code point of interest, and all the code along that call graph path sets the code execution context. This enables the static analyzer to produce a "context-sensitive" analysis, which gives much more accurate answers.
This leads to a second problem: how does one get an accurate call graph? If you have a direct call of B from A, it is easy to write down "A calls B" and feel this is an accurate call-graph fact. But if A makes a call through an indirect pointer (can you say virtual method dispatch?), suddenly it isn't so clear exactly whom A calls; you end up with A-might-call-B1, A-might-call-B2, ... Which one A actually calls in fact depends on the context in which A executes... oops, you need the call graph to manufacture the call graph. The sort-of-good news is that you build the call graph up from the bottom: "I know this is surely true, so that must be surely true". At places where the analyzer can't figure it out, it generally makes a conservative guess ("All of these calls might be possible; I can't rule them out"). That conservatism is one of the key causes of inaccuracy in your static analyzer.
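A tiny C# illustration of that ambiguity (hypothetical types, just to show the shape of the problem):

// With virtual dispatch, "A calls B" becomes "A might call B1 or B2":
// which Run() executes depends on the runtime type reaching Work.
abstract class B { public abstract void Run(); }
class B1 : B { public override void Run() { /* ... */ } }
class B2 : B { public override void Run() { /* ... */ } }

static class A
{
    // A conservative call graph must include edges from this call
    // site to both B1.Run and B2.Run.
    static void Work(B b) => b.Run();

    static void Main()
    {
        Work(new B1()); // in this context it is actually B1.Run
        Work(new B2()); // and here B2.Run
    }
}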
This is what I've found:
Call graphs are used to detect issues with program execution, violations of recommended guidelines, and possible code-injection attacks.
By creating a graph of the calling relationships among various methods, it is easy to see where issues may arise when certain methods are called, or how they are called. It's easy to see when a procedure/function may be violating guidelines such as maintaining code modularity. It's easy to see where malicious code could be injected at certain points because of those calling relationships and how they are structured. In this way, call graphs provide context to static analysis, producing more accurate results.
Since FxCop uses static call-graphs, it is only able to speculate on the above to a degree.