What issues would one hope to find using static call graph analysis on a program? FxCop uses static call graph analysis, what issues does it find using this technique?
http://msdn.microsoft.com/library/bb429476.aspx
http://en.wikipedia.org/wiki/Callgraph
Apologies for my lack of knowledge, I found some information via google, but fear that it is vastly incomplete. Thanks!
By itself the call-graph is just that; there are no "wrong" call graphs (unless you have a style check prohibiting recursion).
The real issue is that to understand how code at a point in the program might be problematic, you typically need to understand the shape of the world (what data structures are live, what values they might contain, what relationships they might have) at the moment where that code point is active. The call graph shows how execution can get to the code point of interest, and all the code along that call graph path sets the code execution context. This enables the static analyzer to produce a "context-sensitive" analysis, which gives much more accurate answers.
This leads to a second problem: how does one get an accurate call graph? If you have a direct call of B from A, it is easy to write down, "A calls B" and feel this is an accurate call-graph fact. But if A makes an call through an indirect pointer (can you say virtual method dispatch?) suddenly it isn't so clear exactly who A calls; you end up with A-might-call-B1, A-might-call-B2, ... Which one A actually calls in fact depends on the context in which A executes... oops, you need the call graph to manufacture the call graph. The sort of good news is that you build the call graph up from the bottom: "I know this is surely true, so that must be surely true". At places where the analzyer can't figure it out, it generally makes a conservative guess: ("All of these calls might be possible, I can't rule them out"). That conservatism is one of the key causes of the accuracy of your static analyzer.
This is what I've found:
Call-graphs are used to detect issues in regards to program execution, violation of recommended guidelines, and possible code injection attacks.
By creating a graph of the calling relationships among various methods, it is easy to see where issues may arise at certain times when certain methods are called or how certain methods are called. It's easy to see when a procedure/function may be violating guidelines such as sustaining code modularity. It's easy to see where malicious code could possibly be injected at certain points because of those calling relationships, and how they are structured. In this way, call-graphs provide context to static analysis, producing more accurate results.
Since FxCop uses static call-graphs, it is only able to speculate on the above to a degree.
Related
i want to learn test automation using meta programming.i googled it could not find any thing.can anybody suggest me some resources where can i get info about "how to use Meta Programming for making test automation easy"?
That's a broad topic and not a lot has been written about it, because of the "dark corners" of metaprogramming.
What do you mean by "metaprogramming"?
As background, I consider metaprogramming to be any activity in which a tool (which we call a "metaprogramming tool") is used to inspect or modify the application software to achieve some effect.
Many people consider "reflection" to be a kind of metaprogramming; other consider (C++-style) templates to be metaprogramming; some suggest aspect-oriented programming.
I sort of agree but think these are weak versions of what you want, because each has severe limits on what it can see or do to source code. What you really want is a metaprogramming tool that has access to everything in your source program (yes, comments too!) Such tools are called Program Transformation Systems (PTS); they work by parsing the source code and operating on the parsed representation of the program. (I happen to build one of these, see my bio). PTSes can then analyze the code accurate, and/or make reliable changes to the code and regenerate valid source with the changes. PS: a PTS can implement all those other metaprogramming techniques as special cases, so it is strictly more general.
Where can you use metaprogramming for testing?
There are at least 2 areas in which metaprogramming might play a role:
1) Collection of information from tests
2) Generation of tests
3) Avoidance of tests
Collection.
Collection of test results depends on the nature of tests. Many tests are focused on "is this white/black box functioning correctly"? Assuming the tests are written somehow, they have to have access to the box under test,
be able to invoke that box in a realistic ways, determine if the result is correct, and often tabulate the results to that post-testing quality assessments can be made.
Access is the first problem. The black box to be tested may not be easily accessible to a testing framework: driven by a UI event, in a non-public routine, buried deep inside another function where it hard to get at.
You may need metaprogramming to "temporarily" modify the program to provide access to the box that needs testing (e.g., change a Private method to Public so it can be called from outside). Such changes exist only for the duration of the test project; you throw the modified program away because nobody wants it for anything but the test results. Yes, you have to ensure that the code transformations applied to make things visible don't change the program functionality.
The second problem is exercising the targeted black box in a realistic environment. Each code module runs in a world in which it assumes data and the environment are "properly" configured. The test program can set up that world explicitly by making calls on lots of the program elements or using its own custom code; this is usually the bulk of a test routine, and this code is hard to write and fragile (the application under test keeps changing; so do its assumptions about the world). One might use metaprogramming to instrument the application to collect the environment under which a test might need to run, thus avoiding the problem of writing all the setup code.
Finally, one might want to record more than just "test failed/passed". Often it is useful to know exactly what code got tested ("test coverage"). One can instrument the application to collect what-got-executed data; here's how to do it for code blocks: http://www.semdesigns.com/Company/Publications/TestCoverage.pdf using a PTS. More sophisticated instrumentation might be used to capture information about which paths through the code have been executed. Uncovered code, and/or uncovered paths, show where tests have not been applied and you arguably know nothing about what the program does, let alone whether it is buggy in a straightforward way.
Generation of tests
Someone/thing has to produce tests; we've already discussed how to produce the set-up-the-environment part. What about the functional part?
Under the assumption that the program has been debugged (e.g, already tested by hand and fixed), one could use metaprogramming to instrument the code to capture the results of execution of a black box (e.g., instance execution post-conditions). By exercising the program, one can then produce (by definition) "correctly produces" results which can be transformed into a test. In this way, one might construct a huge variety of regression tests for an existing program; these will be valuable in verifying the further enhancements to the program don't break most of its functionality.
Often a function has qualitatively different behaviors on different ranges of input (e.g., for x<10, produced x+1, else produces x*x). Ideally one would like to provide a test for each qualitively different results (e.g, x<10, x>=10) which means one would like to partition the input ranges. Metaprogrammning can help here, too, by enumerating all (partial) paths through module, and providing the predicate that controls each path.
The separate predicates each represent the input space partition of interest.
Avoidance of Tests
One only tests code one does not trust (surely you aren't testing the JDK?) Any code consructed by a reliable method doesn't need tests (the JDK was constructed this way, or at least Oracle is happy to have you beleive it).
Metaprogramming can be used to automatically generate code from specifications or DSLs, in relaible ways. Such generated code is correct-by-construction (we can argue about what degree of rigour), and doesn't need tests. You might need to test that DSL expression achieves the functionaly you desired, but you don't have to worry about whether the generated code is right.
I now have sufficent exposure to the Objective-C that if i'm stuck with anything, I know how to think of the problem in terms of a likely tool I need and go look for it. Simple really. There's A Method For That. So nothings a real problem anymore.
Now I'm looking deeper at the language in broader terms. We write stuff. The compiler hews out all the code to execute it. From a simple flashlight app thats a if/then decision to turn on, to a highly complex accelerometer driven 3D shoot 'em up with blood 'n guts and body parts following all sorts of physics, the compiler prepares the code ready to be executed like a giant railway layout. No matter how random it appears on the screen, everything possible can be generically described and prepared for.
So here's the question:
Are there cases where something completely unexpected to the software designer can still be handled without an execution halt? Maybe I'd better re-frame the question a few different ways: Can a ( objective-C ) program meta-compile within itself in response to an unplanned-for user request? or to re-put my opening remark, are there tools or methods for unlikely descriptions of unlikely problems?
I think #kfb has the right comment about metaprogramming. Check out the Runtime docs in conjunction with metaprogramming tutorials.
Parts of your last question might be in the realm of this doc.
If your looking for ways to reduce the size of your code base for the lesser used features, one idea might be to make the features internet based (assuming connectivity is not a problem).
I have a busy set of routines to validate or download the current client application. It starts with a Windows desktop shortcut that invokes a .WSF file. This calls on several .VBS files, an .INI for settings, and potentially a .BAT file. Some of these script documents have internal functions. The final phase opens a Microsoft Access database, which entails an AutoExec macro, which kicks off some VBA, including a form which has a load routine of its own in VBA.
None of this detail is specifically important (so please don't add a VBA tag, OR criticize my precious complexity). The point is I have a variety of tools and containers and they may be functionally nested.
I need better techniques for parsing that in a flow chart. Currently I rely on any or all of the following:
a distinct color
a big box that encloses a routine
the classic 'transfer of control' symbol
perhaps an explanatory call-out
Shouldn't I increase my flow charting vocabulary? Tutorials explain the square, the diamond, the circle, and just about nothing more. Surely FC can help me deal with these sorts of things:
The plethora of script types lets me answer different needs, and I want to indicate tool/language.
A sub-routine could result in an abort of the overall task, or an error, and I want to show the handling of that by (or consequences for) higher-level "enclosing" routines.
I want to distinguish "internal" sub-routines from ones in a different script file.
Concurrent script processing could become critical, so I want to note that.
The .INI file lets me provide all routines with persistent values. How is that charted?
A function may have an argument(s) and a return value/reference ... I don't know how to effectively cite even that.
Please provide guidance or point me to a extra-helpful resource. If you recommend an analysis tool set (like UML, which I haven't gotten the hang of yet), please also tell me where I can find a good introduction.
I am not interested in software. Please consider this a white board exercise.
Discussion of the question suggests flowcharts are not useful or accurate.
Accuracy depends on how the flow charts are constructed. If they are constructed manually, they are like any other manually built document and will be out of date almost instantly; that makes hand-constructed flowcharts really useless, which is why people tend to like looking at the code.
[The rest of this response violate's the OPs requirement of "not interested in software (to produce flowcharts)" because I think that's the only way to get them in some kind of useful form.]
If the flowcharts are derived from the code by an an appropriate language-accurate analysis tool, they will be accurate. See examples at http://www.semanticdesigns.com/Products/DMS/FlowAnalysis.html These examples are semantically precise although the pages there don't provide the exact semantics, but that's just a documetation detail.
It is hard to find such tools :-} especially if you want flowcharts that span multiple languages, and multiple "execution paradigms" (OP wants his INI files included; they are some kind of implied assignment statements, and I'm pretty sure he'd want to model SQL actions which don't flowchart usefully because they tend to be pure computation over tables).
It is also unclear that such flowcharts are useful. The examples at the page I provided should be semiconvincing; if you take into account all the microscopic details (e.g., the possiblity of an ABORT control flow arc emanating from every subroutine call [because each call may throw an exception]) these diagrams get horrendously big, fast. The fact that the diagrams are space-consuming (boxes, diamonds, lines, lots of whitespace) aggravates this pretty badly. Once they get big, you literally get lost in space following the arcs. Again, a good reason for people to avoid flowcharts for entire systems. (The other reason people like text languages is they can in fact be pretty dense; you can get a lot on a page with a succinct language, and wait'll you see APL :)
They might be of marginal help in individual functions, if the function has complex logic.
I think it unlikely that you are going to get language accurate analyzers that produce flowcharts for all the languages you want, that such anlayzers can compose their flowcharts nicely (you want JavaScript invoking C# running SQL ...?)
What you might hope for is a compromise solution: display the code with various hyper links to the other artifacts referenced. You still need the ability to produce such hyperlinked code (see http://www.semanticdesigns.com/Products/Formatters/JavaBrowser.html for one way this might work), but you also need hyperlinks across the language boundaries.
I know of no tools that presently do that. And I doubt you have the interest or willpower to build such tools on your own.
Greetings!
I inherited a C#.NET application I have been extending and improving for a while now. Overall it was obviously a rush-job (or whoever wrote it was seemingly less competent than myself). The app pulls some data from an embedded device & displays and manipulates it. At the core is a communications thread in the main application form which executes a 600+ lines of code method which calls functions all over the place, implementing a state machine - lots of if-state-then-do type code. Interaction with the device is done by setting the state/mode globally and letting the thread do it's thing. (This is just one example of the badness of the code - overall it is not very OO-like, it reminds of the style of embedded C code the device firmware is written in).
My problem is that this piece of code is central to the application. The software, communications protocol or device firmware are not documented at all. Obviously to carry on with my work I have to interact with this code.
What I would like some guidance on, is whether it is worth scrapping this code & trying to piece together something more reasonable from the information I can reverse engineer? I can't decide! The reason I don't want to refactor is because the code already works, and changing it will surely be a long, laborious and unpleasant task. On the flip side, not refactoring means I have to sometimes compromise the design of other modules so that I may call my code from this state machine!
I've heard of "If it ain't broke don't fix it!", so I am wondering if it should apply when "it" is influencing the design of future code! Any advice would be appreciated!
Thanks!
Also, the longer you wait, the worse the codebase will smell. My suggestion would be first create a testsuite that you can evaluate your refactoring against. This makes it a lot easier to see if you are refactoring or just plain breaking things :).
I would definitely recommend you to refactor the code if you feel its junky. Yes, during the process of refactoring you may have some inconsistencies/problems at the start. But that is why we have iterations and testing. Since you are going to build up on this core engine in future, why not make the basement as stable as possible.
However, be very sure on what you are going to do. Because at times long lines of code does not necessarily mean evil. On the other hand they may be very efficient in running time. If/else blocks are not bad if you ask me, as they are very intelligent in branching from a microprocessor's perspective. So, you will have to be judgmental and very clear before you touch this.
But once you refactor the code, you will definitely have fine control over it. And don't forget to document it!! Tomorrow, someone might very well come and say about you on whatever you've told about this guy who have written that core code.
This depends on the constraints you are facing, it's a decision to be based on practical basis, not on theoretical ones. You need three things to consider.
Time: you need to have enough time to learn it, implement it, and test it, without too many other tasks interrupting you
Boss #1: if you are working for someone, he needs to know and approve the time and effort you will spend immediately, required to rebuild your solution
Boss #2: your boss also needs to know that the advantage of having new and clean software will come at the price of possible regressions, and therefore at the beginning of the deployment there may be unexpected bugs
If you have those three, then go ahead and refactor it. It will be surely be worth it!
First and foremost, get all the business logic out of the Form. Second, locate all the parts where the code interacts with the global state (e.g. accessing the embedded system). Delegate all this access to methods. Then, move these methods into a new class and create an instance in the class's constructor. Finally, inject an instance for the class to use.
Following these steps, you can move your embedded system logic ("existing module") to a wrapper class you write, so the interface can be nice and clean and more manageable. Then you can better tackle refactoring the monster method because there is less global state to worry about (only local state).
If the code works and you can integrate your part with minimal changes to it then let the code as it is and do your integration.
If the code is simply a big barrier in your way to add new functionality then it is best for you to refactor it.
Talk with other people that are responsible for the project, explain the situation, give an estimation explaining the benefits gained after refactoring the code and I'm sure (I hope) that the best choice will be made. It is best to speak about what you think, don't keep anything inside, especially if this affects your productivity, motivation etc.
NOTE: Usually rewriting code is out of the question but depending on situation and amount of code needed to be rewritten the decision may vary.
You say that this is having an impact on the future design of the system. In this case I would say it is broken and does need fixing.
But you do have to take into account the business requirements. Often reality gets in the way!
Would it be possible to wrap this code up in another class whose interface better suits how you want to take the system forward? (See adapter pattern)
This would allow you to move forward with your requirements without the poor design having an impact.
It gives you an interface that you understand which you could write some unit tests for. These tests can be based on what your design requires from this code. It ensures that your assumptions about what it is doing is correct. If you say that this code works, then any failing tests may be that your assumptions are incorrect.
Once you have these tests you can safely refactor - one step at a time, and when you have some spare time or when it is needed - as per business requirements.
Quite often I find the best way to truly understand a piece of code is to refactor it.
EDIT
On reflection, as this is one big method with multiple calls to the outside world, you are going to need some kind of inverse Adapter class to wrap this method. If you can inject dependencies into the method (see Dependency Inversion such that the method calls methods in your classes then you can route these to the original calls.
I have been using iterators for a while and I love them.
But although I have thought hard about it, I could not figure out "how a compiler that recognizes the iterators" be implemented. I have also researched about it, but could not find any resource explaining the situation in the compiler-design context.
To elaborate, most of the articles about Iterators imply there is some sort of 'magic' implementing the desired behaviour. They suggest the compiler maintains a state machine in order to follow where the execution is (where the last 'yield return' is seen). I am especially interested in this property of Iterators that enables the lazy evaluation.
By the way, I know what state machines are, have already taken a compiler design course, studied the Dragon Book. But appearently, I cannot relate what I have studied to the 'magics' of csc.
Any knowledge or differential thoughts are appreciated.
It's simpler than it seems. The compiler can decompose the iterator function into individual chunks; chunks are divided by yield statements.
The state machine just needs to keep track of which chunk we're currently in, and upon next invocation of the iterator, jumps directly to this chunk. We also need to keep track of all local variables (of course).
Then, we need to consider a few special cases, in particular loops containing yields. Fortunately, IL (but not C# itself) allows goto to jump into loops and resume them.
Notice that there are some very complicated edge cases, e.g. C# doesn't allow yield in finally blocks because it would be very difficult (impossible?) to leave the function upon yield, and later resume the function, perform clean-up, re-throw any exception and preserve the stack trace.
Eric Lippert has posted an in-depth description of the process. (Read the articles he has linked to, as well!)
One thing I would try would be to write a short example in C#, compile it, and then use Reflector on it. I think that this "yield return" thing is just syntax sugar, so you should be able to see how the compiler handles it in the output of the disassembler.
But, well, I don't really know much about these things so maybe I'm completely wrong.