Using flow chart or diagram for routines across programs - scripting

I have a busy set of routines to validate or download the current client application. It starts with a Windows desktop shortcut that invokes a .WSF file. This calls on several .VBS files, an .INI for settings, and potentially a .BAT file. Some of these script documents have internal functions. The final phase opens a Microsoft Access database, which entails an AutoExec macro, which kicks off some VBA, including a form which has a load routine of its own in VBA.
None of this detail is specifically important (so please don't add a VBA tag, OR criticize my precious complexity). The point is I have a variety of tools and containers and they may be functionally nested.
I need better techniques for parsing that in a flow chart. Currently I rely on any or all of the following:
a distinct color
a big box that encloses a routine
the classic 'transfer of control' symbol
perhaps an explanatory call-out
Shouldn't I increase my flow charting vocabulary? Tutorials explain the square, the diamond, the circle, and just about nothing more. Surely FC can help me deal with these sorts of things:
The plethora of script types lets me answer different needs, and I want to indicate tool/language.
A sub-routine could result in an abort of the overall task, or an error, and I want to show the handling of that by (or consequences for) higher-level "enclosing" routines.
I want to distinguish "internal" sub-routines from ones in a different script file.
Concurrent script processing could become critical, so I want to note that.
The .INI file lets me provide all routines with persistent values. How is that charted?
A function may have an argument(s) and a return value/reference ... I don't know how to effectively cite even that.
Please provide guidance or point me to an extra-helpful resource. If you recommend an analysis tool set (like UML, which I haven't gotten the hang of yet), please also tell me where I can find a good introduction.
I am not interested in software. Please consider this a white board exercise.

Discussion of the question suggests flowcharts are not useful or accurate.
Accuracy depends on how the flow charts are constructed. If they are constructed manually, they are like any other manually built document and will be out of date almost instantly; that makes hand-constructed flowcharts really useless, which is why people tend to like looking at the code.
[The rest of this response violates the OP's requirement of "not interested in software (to produce flowcharts)" because I think that's the only way to get them in some kind of useful form.]
If the flowcharts are derived from the code by an appropriate language-accurate analysis tool, they will be accurate. See examples at These examples are semantically precise although the pages there don't provide the exact semantics, but that's just a documentation detail.
It is hard to find such tools :-} especially if you want flowcharts that span multiple languages, and multiple "execution paradigms" (OP wants his INI files included; they are some kind of implied assignment statements, and I'm pretty sure he'd want to model SQL actions which don't flowchart usefully because they tend to be pure computation over tables).
It is also unclear that such flowcharts are useful. The examples at the page I provided should be semiconvincing; if you take into account all the microscopic details (e.g., the possibility of an ABORT control flow arc emanating from every subroutine call [because each call may throw an exception]) these diagrams get horrendously big, fast. The fact that the diagrams are space-consuming (boxes, diamonds, lines, lots of whitespace) aggravates this pretty badly. Once they get big, you literally get lost in space following the arcs. Again, a good reason for people to avoid flowcharts for entire systems. (The other reason people like text languages is they can in fact be pretty dense; you can get a lot on a page with a succinct language, and wait'll you see APL :)
They might be of marginal help in individual functions, if the function has complex logic.
I think it unlikely that you are going to get language-accurate analyzers that produce flowcharts for all the languages you want, or that such analyzers can compose their flowcharts nicely (you want JavaScript invoking C# running SQL ...?)
What you might hope for is a compromise solution: display the code with various hyperlinks to the other artifacts referenced. You still need the ability to produce such hyperlinked code (see for one way this might work), but you also need hyperlinks across the language boundaries.
I know of no tools that presently do that. And I doubt you have the interest or willpower to build such tools on your own.


Test-Automation using MetaProgramming

I want to learn test automation using metaprogramming. I googled it but could not find anything. Can anybody suggest some resources where I can get info about "how to use Meta Programming for making test automation easy"?
That's a broad topic and not a lot has been written about it, because of the "dark corners" of metaprogramming.
What do you mean by "metaprogramming"?
As background, I consider metaprogramming to be any activity in which a tool (which we call a "metaprogramming tool") is used to inspect or modify the application software to achieve some effect.
Many people consider "reflection" to be a kind of metaprogramming; others consider (C++-style) templates to be metaprogramming; some suggest aspect-oriented programming.
I sort of agree but think these are weak versions of what you want, because each has severe limits on what it can see or do to source code. What you really want is a metaprogramming tool that has access to everything in your source program (yes, comments too!). Such tools are called Program Transformation Systems (PTS); they work by parsing the source code and operating on the parsed representation of the program. (I happen to build one of these, see my bio). PTSes can then analyze the code accurately, and/or make reliable changes to the code and regenerate valid source with the changes. PS: a PTS can implement all those other metaprogramming techniques as special cases, so it is strictly more general.
Where can you use metaprogramming for testing?
There are at least 3 areas in which metaprogramming might play a role:
1) Collection of information from tests
2) Generation of tests
3) Avoidance of tests
Collection of test results depends on the nature of tests. Many tests are focused on "is this white/black box functioning correctly?" Assuming the tests are written somehow, they have to have access to the box under test, be able to invoke that box in realistic ways, determine if the result is correct, and often tabulate the results so that post-testing quality assessments can be made.
Access is the first problem. The black box to be tested may not be easily accessible to a testing framework: driven by a UI event, in a non-public routine, buried deep inside another function where it is hard to get at.
You may need metaprogramming to "temporarily" modify the program to provide access to the box that needs testing (e.g., change a Private method to Public so it can be called from outside). Such changes exist only for the duration of the test project; you throw the modified program away because nobody wants it for anything but the test results. Yes, you have to ensure that the code transformations applied to make things visible don't change the program functionality.
The second problem is exercising the targeted black box in a realistic environment. Each code module runs in a world in which it assumes data and the environment are "properly" configured. The test program can set up that world explicitly by making calls on lots of the program elements or using its own custom code; this is usually the bulk of a test routine, and this code is hard to write and fragile (the application under test keeps changing; so do its assumptions about the world). One might use metaprogramming to instrument the application to collect the environment under which a test might need to run, thus avoiding the problem of writing all the setup code.
Finally, one might want to record more than just "test failed/passed". Often it is useful to know exactly what code got tested ("test coverage"). One can instrument the application to collect what-got-executed data; here's how to do it for code blocks: using a PTS. More sophisticated instrumentation might be used to capture information about which paths through the code have been executed. Uncovered code, and/or uncovered paths, show where tests have not been applied and you arguably know nothing about what the program does, let alone whether it is buggy in a straightforward way.
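A rough sketch of collecting "what got executed" data, written in Python purely for illustration. It uses the interpreter's `sys.settrace` hook instead of a PTS that rewrites the source, so it is a stand-in for the idea and not the tool described above. The names `collect_executed_lines` and `classify` are hypothetical.

```python
import sys

def collect_executed_lines(func, *args, **kwargs):
    """Run func and return (result, set of executed line offsets inside func)."""
    executed = set()
    target_code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under test.
        if frame.f_code is not target_code:
            return None
        if event == "line":
            # Store offsets relative to the "def" line so they are easy to read.
            executed.add(frame.f_lineno - target_code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, executed


def classify(x):          # offset 0
    if x < 10:            # offset 1
        return "small"    # offset 2
    return "large"        # offset 3


_, small_path = collect_executed_lines(classify, 3)
_, large_path = collect_executed_lines(classify, 50)
```

The two calls record different offsets. Any offset that no test in the suite ever records is code that the tests say nothing about.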
Generation of tests
Someone/thing has to produce tests; we've already discussed how to produce the set-up-the-environment part. What about the functional part?
Under the assumption that the program has been debugged (e.g., already tested by hand and fixed), one could use metaprogramming to instrument the code to capture the results of execution of a black box (e.g., instance execution post-conditions). By exercising the program, one can then produce (by definition) "correctly produced" results which can be transformed into a test. In this way, one might construct a huge variety of regression tests for an existing program; these will be valuable in verifying that further enhancements to the program don't break most of its functionality.
Often a function has qualitatively different behaviors on different ranges of input (e.g., for x<10, produces x+1, else produces x*x). Ideally one would like to provide a test for each qualitatively different result (e.g., x<10, x>=10), which means one would like to partition the input ranges. Metaprogramming can help here, too, by enumerating all (partial) paths through the module, and providing the predicate that controls each path.
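A minimal sketch of this capture-then-replay idea, assuming Python and a decorator as the instrumentation mechanism (a PTS would insert equivalent probes into the source instead). `record_calls`, `emit_regression_tests` and `shipping_cost` are hypothetical names used only for illustration.

```python
import functools

def record_calls(log):
    """Decorator factory: append (name, args, kwargs, result) to log on each call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            log.append((func.__name__, args, kwargs, result))
            return result
        return wrapper
    return decorate


def emit_regression_tests(log):
    """Turn recorded observations into assert statements (as source text)."""
    lines = []
    for name, args, kwargs, result in log:
        parts = [repr(a) for a in args]
        parts += [f"{k}={v!r}" for k, v in kwargs.items()]
        lines.append(f"assert {name}({', '.join(parts)}) == {result!r}")
    return lines


observations = []

@record_calls(observations)
def shipping_cost(weight_kg, express=False):
    # Hypothetical "black box" that we assume already behaves correctly.
    base = 5 + 2 * weight_kg
    return base * 2 if express else base

# Exercise the program as usual; the decorator captures what happened.
shipping_cost(1)
shipping_cost(3, express=True)

generated = emit_regression_tests(observations)
```

Here `generated` holds one assert per observed call. The tests are only as good as the assumption that the recorded behavior was correct in the first place.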
The separate predicates each represent the input space partition of interest.
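As a small illustration in Python, using the x<10 example from above: the two path predicates are written by hand here, whereas a metaprogramming tool would extract them from the code. `pick_representatives` is a hypothetical helper.

```python
def f(x):
    # The example function from the text: two qualitatively different behaviors.
    return x + 1 if x < 10 else x * x

# One predicate per path through f; each one describes an input-space partition.
partitions = {
    "x < 10": lambda x: x < 10,
    "x >= 10": lambda x: x >= 10,
}

def pick_representatives(candidates, partitions):
    """Return one candidate input per partition (the first that satisfies it)."""
    chosen = {}
    for label, predicate in partitions.items():
        for value in candidates:
            if predicate(value):
                chosen[label] = value
                break
    return chosen

representatives = pick_representatives(range(0, 20, 3), partitions)
# Pair each chosen input with the observed output to form a test case.
test_cases = {label: (x, f(x)) for label, x in representatives.items()}
```

Each partition gets at least one test input, so both behaviors of `f` are exercised instead of the same path being tested repeatedly.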
Avoidance of Tests
One only tests code one does not trust (surely you aren't testing the JDK?). Any code constructed by a reliable method doesn't need tests (the JDK was constructed this way, or at least Oracle is happy to have you believe it).
Metaprogramming can be used to automatically generate code from specifications or DSLs, in reliable ways. Such generated code is correct-by-construction (we can argue about what degree of rigour), and doesn't need tests. You might need to test that the DSL expression achieves the functionality you desired, but you don't have to worry about whether the generated code is right.

Automatisation&Piping of diverse tasks

I am looking for recommendations for a very generic automation/task execution tool. The scope is somewhat between a script, a build system like make and orchestration tools like Ansible or Puppet. The best I can do is describe my rather vague 'requirements' and hope for clues how others have solved these problems. Sorry for the long description, I guess I don't really know what exactly I want the solution to do. I profit from programming answers on SO all the time but I am not entirely sure if my open ended question is acceptable here.
We work as data analysts/system validators in a corporate setting. We perform a range of diverse tasks and interact with lots of ever changing systems. Each little step we do is arguably mundane/easy, but the bigger picture only forms if lots of iterations with slightly different inputs or combinations are repeated. It is a bit like looking for a needle in a hay stack, but the concrete problem is slightly different every time. This makes it hard to use a normal script or automation tool, which require more structure to work. But doing things semi-manual without a big team does not allow us to cover all the analysis/cases we want/need.
To give an applied example: a typical task could involve setting up a big calculation in a vendor system, extracting their ASCII output from a web server and parsing it. Then we would suck raw input data from a set of configuration files and databases. This is piped into some of our home grown replication tools/models living in C++. Then both the system's results and our replication are scanned for interesting outliers (e.g. regression tested) and only this subset is uploaded for human analysts to investigate, nicely presented in an Excel sheet.
We can do all these things easily by hand for a once-off or maybe using ad-hoc tools/scripts. We just can't do it repeatedly for ever so slightly different settings. We seem to need a library for 'common tasks' that are just specialized by some few inputs (e.g. a task is to download a time series and scan for outliers - parameters would be db access/login and maybe parameters defining what an outlier is in that context). And then I need to chain these tasks together to make complex tasks repeatable and simple to build up from atomic steps.
I have not found anything that really does something like this. There seems to be specialist scripting or tools for each niche available, but not something combining all the different tasks I need to perform.
I have been so far toying on and off with a minimalist sqlite database which controls a set of python 'scripts'/wrappers. These scripts take input parameters from the database, and they are chained/piped based on the database. The scripts write their results back to the database, mostly as plain text and floats/ints. This kind of db interface is very error prone and complicated for humans; the idea is to have (template) scripts writing (concrete/parametrised) scripts to the db for execution, like rolling itself out before executing. Not sure if this is a smart idea, but the db is driving the scripts, without much interaction among these building block scripts; rather than having the conventional bunch of scripts calling each other and dumping some data into the db as an afterthought. So far we have lots of separate wrappers (scripts) to talk to all the systems and do the work; what is really missing is something tying it all together and controlling it.
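To make the idea concrete, here is a minimal sketch of a database-driven chain, assuming an in-memory sqlite database, JSON-encoded parameters, and a single `steps` table. The table layout and the task names (`load_series`, `find_outliers`) are hypothetical, chosen only to illustrate the approach described above.

```python
import json
import sqlite3

# Hypothetical building blocks; real ones would wrap vendor systems, databases, etc.
def load_series(params, upstream):
    return params["values"]

def find_outliers(params, upstream):
    return [v for v in upstream if abs(v) > params["threshold"]]

TASKS = {"load_series": load_series, "find_outliers": find_outliers}

def run_pipeline(conn, pipeline):
    """Run the steps of one pipeline in order, piping each result to the next."""
    rows = conn.execute(
        "SELECT id, task, params FROM steps WHERE pipeline = ? ORDER BY position",
        (pipeline,),
    ).fetchall()
    upstream = None
    for step_id, task, params in rows:
        upstream = TASKS[task](json.loads(params), upstream)
        # Store every intermediate result so the data flow stays inspectable.
        conn.execute(
            "UPDATE steps SET result = ? WHERE id = ?",
            (json.dumps(upstream), step_id),
        )
    conn.commit()
    return upstream

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steps (id INTEGER PRIMARY KEY, pipeline TEXT, position INTEGER,"
    " task TEXT, params TEXT, result TEXT)"
)
conn.executemany(
    "INSERT INTO steps (pipeline, position, task, params) VALUES (?, ?, ?, ?)",
    [
        ("demo", 1, "load_series", json.dumps({"values": [1, 2, 50, -3, -70]})),
        ("demo", 2, "find_outliers", json.dumps({"threshold": 10})),
    ],
)
outliers = run_pipeline(conn, "demo")
```

A slightly different analysis then becomes a matter of inserting rows with other parameters instead of editing scripts, and every intermediate result can be read back from the `result` column.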
I am interested (obviously) more in data/flow transparency, repeatability and chaining mini-programs together to bigger units, rather than speed or scaling to larger data sets. All the heavier lifting is either done in the systems we interact with, or it is delegated to C++ called from these python scripts. This is not a production system with more stability and fixed goals but rather a flexible analysis/investigation helper.
I really hope someone here has previously run into exactly that problem severely limiting our productivity, and we can just piggy back off your solution or ideas.
I would suggest that you consider STAF (Software Test Automation Framework). It's open source, distributed, and cross-platform. It will run just about any task on just about any platform. It has a variety of plugin "Services" available for specific purposes, or you can create your own custom Service. You can also extend the functionality through scripting (Jython). It's also well documented and reasonably well supported through user forums by IBM.

Does a language describe things beyond itself?

I now have sufficient exposure to Objective-C that if I'm stuck with anything, I know how to think of the problem in terms of a likely tool I need and go look for it. Simple really. There's A Method For That. So nothing's a real problem anymore.
Now I'm looking deeper at the language in broader terms. We write stuff. The compiler hews out all the code to execute it. From a simple flashlight app that's an if/then decision to turn on, to a highly complex accelerometer driven 3D shoot 'em up with blood 'n guts and body parts following all sorts of physics, the compiler prepares the code ready to be executed like a giant railway layout. No matter how random it appears on the screen, everything possible can be generically described and prepared for.
So here's the question:
Are there cases where something completely unexpected to the software designer can still be handled without an execution halt? Maybe I'd better re-frame the question a few different ways: Can an (Objective-C) program meta-compile within itself in response to an unplanned-for user request? Or, to re-put my opening remark, are there tools or methods for unlikely descriptions of unlikely problems?
I think @kfb has the right comment about metaprogramming. Check out the Runtime docs in conjunction with metaprogramming tutorials.
Parts of your last question might be in the realm of this doc.
If you're looking for ways to reduce the size of your code base for the lesser used features, one idea might be to make the features internet based (assuming connectivity is not a problem).

What tool to use for finding duplicated Ada code due to copy&paste

I'm looking for a tool for finding duplicated code due to copy&paste programming to be run over a large Ada codebase. I suppose that Ada support in the tool is important for detecting more than the trivial text similarities, that is, ignore layout or identifier difference, etc.
The tools that I have found with Ada support are the following:
Clone Doctor, commercial product with support for several languages, including Ada.
ConQAT: commercially supported open source product that includes a CloneDetection tool with Ada support since September 2011
Have you tried these tools? Am I missing any other one of interest? Is the language support really significant or a general text tool would be enough? What is your experience with code duplication detection?
Thanks in advance.
I'm the author of CloneDR. Read the following with my bias in mind.
It is important to understand the differences in the detection methods of clone detection tools, and the quality of the results as a consequence.
ConQAT is a representative of what are called "token based" detectors. They match sequences of language tokens (operators, identifiers, brackets, keywords etc.) The good news is they are pretty fast (that isn't a big issue; you don't run clone detection every 30 seconds, once a week is enough). They will find some clones that are near-misses, in the sense that another identifier or constant is substituted for an identifier in a clone. The bad news is that they don't understand the structure of your code and consequently want to report things like
} void ID ( ID
as clones. This is defeated by making the detectors only hunt for very long sequences of tokens (typically 30 or more), which means token-based detectors cannot find small but interesting clones without also drowning you in false positives like the above.
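A toy illustration of the token-window idea, written in Python over C-like input. This is not how ConQAT or any real detector is implemented; the tokenizer, the keyword list and the window sizes are simplifying assumptions made for the sketch.

```python
import re
from collections import defaultdict

KEYWORDS = {"void", "int", "return", "if", "else", "for", "while"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(source):
    """Split source into tokens, replacing identifiers and numbers by placeholders."""
    tokens = []
    for text in TOKEN_RE.findall(source):
        if text in KEYWORDS:
            tokens.append(text)
        elif text[0].isalpha() or text[0] == "_":
            tokens.append("ID")
        elif text[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(text)
    return tokens

def find_clones(tokens, window):
    """Map each token window that occurs more than once to its start positions."""
    seen = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        seen[tuple(tokens[i:i + window])].append(i)
    return {seq: pos for seq, pos in seen.items() if len(pos) > 1}

source = """
void reset ( int a ) { a = 0 ; }
void clear ( int b ) { b = 1 ; }
void show ( int c ) { print ( c ) ; }
"""
tokens = tokenize(source)
short_clones = find_clones(tokens, 5)
long_clones = find_clones(tokens, 12)
```

With a window of 5 tokens the report includes `} void ID ( int`, a fragment that straddles two functions. With a window of 12 the whole-function match between `reset` and `clear` is still found, but any genuine clone shorter than 12 tokens can no longer be reported.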
CloneDR operates by parsing the code (even for Ada) just like a compiler, building abstract syntax trees, and matching the trees up to a point of difference. It cannot propose a clone that crosses structure boundaries in silly ways. It will find near misses of the same kind as the token based detectors, but it goes beyond this. CloneDR will find consistent substitutions ("anti unifiers") which means clones can be explained by a small number of parameters that have been used in many places in the clone, and it will find variations in the code in which the mismatches are larger than a single token, e.g., expressions, statements, declarations, even blocks. So it produces fewer false positives and better answers. Independent research reports that compare types of clone detectors, specifically including CloneDR, agree with this analysis.
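For contrast, a rough sketch of matching on syntax trees. It is not CloneDR's algorithm: it uses Python's own `ast` module on Python source, treats only whole statements as candidates, and does no anti-unification, so it finds only clones that are identical up to renamed identifiers and changed constants. `structural_clones` and the sample functions are hypothetical.

```python
import ast
import copy
from collections import defaultdict

def normalized_shape(node):
    """Dump a copy of the subtree with identifiers and constants blanked out."""
    clone = copy.deepcopy(node)
    for n in ast.walk(clone):
        if isinstance(n, ast.Name):
            n.id = "ID"
        elif isinstance(n, ast.arg):
            n.arg = "ID"
        elif isinstance(n, ast.Constant):
            n.value = "CONST"
        elif isinstance(n, ast.FunctionDef):
            n.name = "ID"
    return ast.dump(clone)

def structural_clones(source, min_nodes=5):
    """Group statements with the same normalized shape; return their line numbers."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        # Candidates are complete statements, so a match cannot straddle a boundary.
        if isinstance(node, ast.stmt):
            size = sum(1 for _ in ast.walk(node))
            if size >= min_nodes:
                groups[normalized_shape(node)].append(node.lineno)
    return sorted(lines for lines in groups.values() if len(lines) > 1)

source = '''
def area(w, h):
    total = w * h
    return total + 1

def volume(a, b):
    result = a * b
    return result + 2

def greet(name):
    print("hi", name)
'''
clones = structural_clones(source)
```

Because every candidate is a complete subtree, nothing like `} void ID ( ID` can be reported. The sketch also reports the matching statements inside `area` and `volume` separately; a real tool would fold those into the enclosing function-level clone.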
There is more detailed discussion at the Clone Doctor link you listed above. You can see examples of detected clones for many languages (but we don't have an Ada report on the web site).
EDIT March 19, 2012:
Now you can download an eval copy of an Ada95 CloneDR.
Ira Baxter has a good description.
Token-based clone detection tools tend to be good enough for our purpose, which is usually to get a quick overview of how bad code duplication is in a body of source code we haven't seen before, and how duplication is distributed across that code.
In particular, we are happy with CCFinderX, because it has a nice visualization frontend.
However, it's buggy, unmaintained, and the code has been released but without any license statement.
It has language specific preprocessors for some languages, but we often just disable them (they are buggy as well).
If you need better accuracy, you know exactly the language you need to parse (e.g. with C or C++, this is not always the case), and you can find a tool that parses exactly that language (which is also an issue with C and C++), a parsing-based approach may be better, as Ira writes.

How do you organize code in embedded projects?

Highly embedded (limited code and ram size) projects pose unique challenges for code organization.
I have seen quite a few projects with no organization at all. (Mostly by hardware engineers who, in my experience are not typically concerned with non-functional aspects of code.)
However, I have been trying to organize my code accordingly:
hardware specific (drivers, initialization)
application specific (not likely to be reused)
reusable, hardware independent
For each module I try to keep the purpose to one of these three types.
Due to the limited size of embedded projects and the emphasis on performance, it is often hard to keep this organization.
For some context, my current project is a limited DSP application on a MSP430 with 8k flash and 256 bytes ram.
I've written and maintained multiple embedded products (30+ and counting) on a variety of target micros, including MSP430's. The "rules of thumb" I have been most successful with are:
Try to modularize generic concepts as much as possible (e.g. separate driver code from application code). -- It makes for easier maintenance and reuse/porting of a project to another target micro in the future.
DO NOT start by worrying about optimized code at the very beginning. Try to solve the domain's problem first and optimize second. -- Your target micro can handle a lot more "stuff" than you might expect.
Work to ensure readability. Although most embedded projects seem to have short development-cycles, the projects often live longer than you might expect and another developer will undoubtedly have to work with your code.
I've worked on 8-bit PIC processors with similar limitations.
One restriction you don't have is how many comments you make or what you choose to name your methods, variables, etc. Take advantage. Speed and size constraints do sometimes trump organization, but you can always explain.
Another tip is to break up a logical source file into even more pieces than you need, then bind them by #include-ing them in a compilation unit. This allows you to have lots of reusable code (even one routine per file) but combine in whatever order you need. This is useful e.g. when trying to meet compilation unit size restrictions, or to pick and choose which common subroutines you need on the next project.
I try to organize it as if I had unlimited RAM and ROM, and it usually works out fine. As mentioned elsewhere, do not try to optimize it until you absolutely need to.
If you can get a pin-compatible processor that has more resources, it's better to get it working on that, concentrating on good structure and layout, then optimize for size later when you understand the code better.
Except under exceptional circumstances (see note), the organisation of your code will have no impact on the final product. (contents of the code are obviously a different matter)
So with that in mind you should organise your code as you would any other project.
With that said, the following are fairly typical:
If this is a processor that you've worked on before, or will be working on in the future, you will usually want to keep a dedicated hardware abstraction layer that can be shared between projects in the future. Typically this module would contain items like routines for managing any uarts, timers etc.
Usually it's reasonable to maintain a set of platform specific code for initialisation and setup that performs all of the configuration and initialisation up to the point where your executive takes over and runs your application. It will also include platform specific hal routines.
The executive/application is probably maintained as a separate module. All of the hardware specific code should be hidden in the hal (as mentioned above).
By splitting your code up like this you also have the option of compiling and running your application as a simulation, on a completely different platform, just by replacing the hardware specific code with routines that mimic the hardware.
This can be good for unit testing and for debugging any algorithmic problems you might have.
Note: exceptional circumstances are those that might be imposed by unusual compiler restrictions, e.g., I've come across some compilers that expect all interrupt service routines to be compiled within a single object file.
I've worked with some sensors like the Tmote Sky. I too have seen poor organization, and I have to admit I have contributed to it. Anyway, I'd say that some confusion is unavoidable, because loading too many modules or too many parts of a program will (imho) kill resources too, so try to be aware of a threshold between organization and usability on low resources.
Obviously this doesn't mean letting chaos begin, but, for example, take a look at the organization of the TinyOS source code and applications; it gives an idea of what I'm trying to say.
Although it is a bit painful, one organization technique that is somewhat common with embedded C libraries is to split every single function and variable into a separate C source file, and then aggregate the resulting collection of O files into a library file.
The motivation for doing this is that for most normal linkers the unit of linkage is an object; for every object you either get the whole object or none of it. Since there is a 1-1 relationship between C files and object files, putting each symbol in its own C file gives each one its own object. This in turn lets the linker pull in only that subset of functions and variables that are actually used.
This sort of game doesn't help at all for headers; they can happily be left as single files.