I am looking for recommendations for a very generic automation/task execution tool. The scope is somewhat between a script, a build system like make and orchestration tools like Ansible or Puppet. The best I can do is describe my rather vague 'requirements' and hope for clues how others have solved these problems. Sorry for the long description, I guess I don't really know what exactly I want he solution to do. I profit from programming answers on SO all the time but I am not entirely sure if my open ended question is acceptable here.
--
We work as data analysts/system validators in a corporate setting. We perform a range of diverse tasks and interact with lots of ever changing systems. Each little step we do is arguably mundane/easy, but the bigger picture only forms if lots of iterations with slightly different inputs or combinations are repeated. It is a bit like looking for a needle in a hay stack, but the concrete problem is slightly different every time. This makes it hard to use a normal script or automation tool, which require more structure to work. But doing things semi-manual without a big team does not allow us to cover all the analysis/cases we want/need.
To give an applied example: a typical tasks could involve setting up a big calculation in a vendor system, extracting their ASCII output from a web server and parsing it. Then we would suck raw input data from a set of configuration files and data bases. This is piped into some of our home grown replication tools/models living in C++. Then both the system's results and our replication is scanned for interesting outliers (e.g. regression tested) and only this subset is uploaded for human analysts to investigate, nicely presented in an Excel sheet.
We can do all these things easily by hand for a once-off or maybe using ad-hoc tools/scripts. We just can't do it repeatedly for ever so slightly different settings. We seem to need a library for 'common tasks' that are just specialized by some few inputs (e.g. task it to download a time series and scan for outliers - parameters would be db access/login and maybe parameters defining what an outlier is in that context). And then I need to chain these tasks together to make complex tasks repeatable and simple to build up from atomic steps.
I have not found anything really do something like this. There seems to be specialist scripting or tools for each niche available, but not something combining all the different tasks I need to perform.
I have been so far toying on and off with a minimalist sqlite database which controls a set of python 'scripts'/wrappers. These scripts take input parameters from the data base, and they are chained/piped based on the database. The scripts write their results back to the database, mostly as plain text and floats/ints. This kind of db interface is very error prone and complicated for humans; the idea is to have (template) scripts writing (concrete/parametrised) scripts to the db for execution, like rolling itself out before executing. Not sure if this is a smart idea, but the db is driving the scripts, without much interacting among these building block script; rather than having the conventional bunch of scripts calling each other and dumping some data into db as an after thought. So far we have lots of separate wrappers (scripts) to talk to all the systems and do the work, what is really missing is something tying it all together an controlling it.
I am interested (obviously) more in data/flow transparency, repeatability and chaining mini-programs together to bigger units, rather than speed or scaling to larger data sets. All the heavier lifting is either done in the systems we interact with, or it is delegated to C++ called from these python scripts. This is not a production system with more stability and fixed goals but rather a flexible analysis/investigation helper.
I really hope someone here has previously run into exactly that problem severely limiting our productivity, and we can just piggy back off your solution or ideas.
I would suggest that you consider staf (Software Test Automation Framework). It's open source, distributed, and cross-platform. It will run just about any task on just about any platform. It has a variety of plugin "Services" available for specific purposes, or you can create your own custom Service. You can also extend the functionality through scripting (jython) It's also well documented and reasonably well supported through user forums by IBM.
At the moment I am learning Objective-C 2. I'm aware that it's used heavily by Mac developers, but I'm more interested in learning the language at this point in time than the frameworks for developing on Mac OS X/iPhone (except for Foundation). In order to do this I want to write a few intermediate* console applications, but I'm stuck for ideas.
Most examples are something along the lines of "Write a Fraction class that has getters/setters and a print function", which isn't very challenging coming from a C++ background. I'd like some generic examples of programs, but I don't want them to include any Objective-C implementation details. I want to figure out the program structure/write my own interfaces and learn the language from there.
In summary: I am curious as to what example programs Objective-C programmers would recommend for exploring the language.
An example of an "intermediate" application would be something along the lines of "Write a program that takes a URL from the command line and returns the number of occurrences of a certain word in data returned:
example -url www.google.com -word search
"Project Euler" is a standard response for this kind of thing, but I get the feeling that you're less interested in being told to implement algorithmic stuff (since that knowledge is easier to port between languages) and more interested in miniprojects that will familiarize you with core libraries. Is this fair?
If so, IMO, you ought to know the basics of how to do the following with the standard libraries of language you hope to use for serious work:
Standard IO
Network IO
Disk IO and navigating the filesystem
Regexp utilities
Structured data (XML libraries and CSV libraries if they exist)
Programming problems I would recommend for those:
It sounds like you've already done this.
A very simple proxy - something like what you described in your post, but that listens on a port for a message containing a URL rather than taking it on the command line, and likewise returns the results to whatever contacted it over the network rather than outputting to stdio. [Obviously you need to have the machine behind an appropriate firewall for this!]
Something which takes a directory path and recursively tallies the number of lines its children contain. (So, get the directory's listing, open each child file and count the number of line breaks. Then open each of its child directories, get their listings, ...) Record any errors encountered (e.g., no read privileges) in a reasonable way. Write out the final results to file in the directory supplied.
Usually if I tool around in a language enough, I'll run across some problem which I just naturally find myself using regexps for. I'll assume the same is true for you and punt this element for now.
Fetch StackOverflow.com, and [by putting it into a DOM model and navigating that] determine whether this question is still on the front page.
I got the most out of Objective-C by exploring it with a testing framework. I have written a short blog post about it. You should also wrap your head around the memory management conventions employed by Objective-C, reference counting takes a little time to get used to but works very well if responsibilities are clearly segregated (I have written about that on my blog too).
By getting my hands dirty on a testing framework (GHUnit for that matter), I was able to learn far more about the language than I could have in a "traditional" way. Of course you'll need a little pet project, otherwise this approach doesn't make sense.
I don't think your example is a very good idea as it requires you to mess with http connections, resources etc. which is a little framework specific after all. Parsing a text file would be a little easier in this regard. Using a unit testing framework has the following advantages for you:
learn about platform specific build systems and deployment details
forced to develop components in a loosely coupled fashion from the ground up
thereby exploring unique mechanisms of the language, that might require new or make known patterns redundant (e.g. categories make dependency injection obsolete etc.)
fast compile-test cycle, less time spent in front of the debugger
combined with source control: painless experiments
You should also look into the testing framework implementation, as testing frameworks always require to work with metadata to some extend. Testing frameworks are often used together with isolation frameworks. They basically create objects at runtime that comply to certain interfaces and act as stand-ins for concrete objects. Looking at their implementation will teach you about the runtime manipulations that can be done in Objective-C (keyword: Method-Swizzling)
I use MyGeneration along with nHibernate to create the basic POCO objects and XML mapping files. I have heard some people say they think code generators are not a good idea. What is the current best thinking? Is it just that code generation is bad when it generates thousands of lines of not understandable code?
Code generated by a code-generator should not (as a generalisation) be used in a situation where it is subsequently edited by human intervention. Some systems such the wizards on various incarnations of Visual C++ generated code that the programmer was then expected to edit by hand. This was not popular as it required developers to pick apart the generated code, understand it and make modifications. It also meant that the generation process was one shot.
Generated code should live in separate files from other code in the system and only be generated from the generator. The generated code code should be clearly marked as such to indicate that people shouldn't modify it. I have had occasion to do quite a few code-generation systems of one sort or another and All of the code so generated has something like this in the preamble:
-- =============================================================
-- === Foobar Module ===========================================
-- =============================================================
--
-- === THIS IS GENERATED CODE. DO NOT EDIT. ===
--
-- =============================================================
Code Generation in Action is quite a good book on the subject.
Code generators are great, bad code is bad.
Most of the other responses on this page are along the lines of "No, because often the generated code is not very good."
This is a poor answer because:
1) Generators are tool like anything else - if you misuse them, dont blame the tool.
2) Developers tend to pride themselves on their ability to write great code one time, but you dont use code generators for one off projects.
We use a Code Generation system for persistence in all our Java projects and have thousands of generated classes in production.
As a manager I love them because:
1) Reliability: There are no significant remaining bugs in that code. It has been so exhaustively tested and refined over the years than when debugging I never worry about the persistence layer.
2) Standardisation: Every developers code is identical in this respect so there is much less for a guy to learn when picking up a new project from a coworker.
3) Evolution: If we find a better way to do things we can update the templates and update 1000's of classes quickly and consistently.
4) Revolution: If we switch to a different persistence system in the future then the fact that every single persistent class has an exactly identical API makes my job far easier.
5) Productivity: It is just a few clicks to build a persistent object system from metadata - this saves thousands of boring developer hours.
Code generation is like using a compiler - on an individual case basis you might be able to write better optimised assembly language, but over large numbers of projects you would rather have the compiler do it for you right?
We employ a simple trick to ensure that classes can always be regenerated without losing customisations: every generated class is abstract. Then the developer extends it with a concrete class, adds the custom business logic and overrides any base class methods he wants to differ from the standard. If there is a change in metadata he can regenerate the abstract class at any time, and if the new model breaks his concrete class the compiler will let him know.
The biggest problem I've had with code generators is during maintenance. If you modify the generated code and then make a change to your schema or template and try to regenerate you can have problems.
One problem is if the tool doesn't allow you to protect changes you've made to the modified code then your changes will be overwritten.
Another problem I've seen, particularly with code generators in RSA for web services, if you change the generated code too much the generator will complain that there is a mismatch and refuse to regenerate the code. This can happen for something as simple as changing the type of a variable. Then you are stuck generating the code to a different project and merging the results back into your original code.
Code generators can be a boon for productivity, but there are a few things to look for:
Let you work the way you want to work.
If you have to bend your non-generated code to fit around the generated code, then you should probably choose a different approach.
Run as part of your regular build.
The output should be generated to an intermediates directory, and not be checked in to source control. The input must be checked in to source control, however.
No install
Ideally, you check the tool in to source control, too. Making people install things when preparing a new build machine is bad news. For example, if you branch, you want to be able to version the tools with the code.
If you must, make a single script that will take a clean machine with a copy of the source tree, and configure the machine as required. Fully automated, please.
No editing output
You shouldn't have to edit the output. If the output isn't useful enough as-is, then the tool isn't working for you.
Also, the output should clearly state that it is a generated file & should not be edited.
Readable output
The output should be written & formatted well. You want to be able to open the output & read it without a lot of trouble.
#line
Many languages support something like a #line directive, which lets you map the contents of the output back to the input, for example when producing compiler error messages or when stepping in the debugger. This can be useful, but it can also be annoying unless done really well, so it's not a requirement.
My stance is that code generators are not bad, but MANY uses of them are.
If you are using a code generator for time savings that writes good code, then great, but often times it is not optimized, or adds a lot of overhead, in those cases I think it is bad.
Code generation might cause you some grief if you like to mix behaviour into your classes. An equally productive alternative might be attributes/annotations and runtime reflection.
Compilers are code generators, so they are not inherently bad unless you only like to program in raw machine code.
I believe however that code generators should always completely encapsulate the generated code. I.e. you should never have to modify the generated code by hand, any change should be done by modifying the input to the generator and regenerate the code.
If its a mainframe cobol code generator that Fran Tarkenton is trying to sell you then absolutely yes!
I've written a few code generators before - and to be honest they saved my butt more than once!
Once you have a clearly defined object - collection - user control design, you can use a code generator to build the basics for you, allowing your time as a developer to be used more effectively in building the complex stuff, after all, who really wants to write 300+ public property declarations and variable instatiations? I'd rather get stuck into the business logic than all the mindless repetitive tasks.
The mistake many people make when using code generation is to edit the generated code. If you keep in mind that if you feel like you need to edit the code, you actually need to be editing the code generation tool it's a boon to productivity. If you are constantly fighting the code that gets generated it's going to end up costing productivity.
The best code generators I've found are those that allow you to edit the templates that generate the code. I really like Codesmith for this reason, because it's template-based and the templates are easily editable. When you find there is a deficiency in the code that gets generated, you just edit the template and regenerate your code and you are forever good after that.
The other thing that I've found is that a lot of code generators aren't super easy to use with a source control system. The way we've gotten around this is to check in the templates rather than the code and the only thing we check into source control that is generated is a compiled version of the generated code (DLL files, mostly). This saves you a lot of grief because you only have to check in a few DLLs rather than possibly hundreds of generated files.
Our current project makes heavy use of a code generator. That means I've seen both the "obvious" benefits of generating code for the first time - no coder error, no typos, better adherence to a standard coding style - and, after a few months in maintenance mode, the unexpected downsides. Our code generator did, indeed, improve our codebase quality initially. We made sure that it was fully automated and integrated with our automated builds. However, I would say that:
(1) A code generator can be a crutch. We have several massive, ugly blobs of tough-to-maintain code in our system now, because at one point in the past it was easier to add twenty new classes to our code generation XML file, than it was to do proper analysis and class refactoring.
(2) Exceptions to the rule kill you. We use the code generator to create several hundred Screen and Business Object classes. Initially, we enforced a standard on what methods could appear in a class, but like all standards, we started making exceptions. Now, our code generation XML file is a massive monster, filled with special-case snippets of Java code that are inserted into select classes. It's nearly impossible to parse or understand.
(3) Since so much of our code is generated, using values from a database, it's proven difficult for developers to maintain a consistent code base on their individual workstations (since there can be multiple versions of the database). Debugging and tracing through the software is a lot harder, and newbies to the team take much longer to figure out the "flow" of the code, because of the extra abstraction and implicit relationships between classes. IDE's cannot pick up relationships between two classes that communicate via a code-generated class.
That's probably enough for now. I think Code Generators are great as part of a developer's individual toolkit; a set of scripts that write out your boilerplate code make starting a project a lot easier. But Code Generators do not make maintenance problems go away.
In certain (not many) cases they are useful. Such as if you want to generate classes based on lookup-type data in the database tables.
Code generation is bad when it makes programming more difficult (IE, poorly generated code, or a maintenance nightmare), but they are good when they make programming more efficient.
They probably don't always generate optimal code, but depending on your need, you might decide that developer manhours saved make up for a few minor issues.
All that said, my biggest gripe with ORM code generators is that maintenance the generated code can be a PITA if the schema changes.
Code generators are not bad, but sometimes they are used in situations when another solution exists (ie, instantiating a million objects when an array of objects would have been more suitable and accomplished in a few lines of code).
The other situation is when they are used incorrectly, or coded badly. Too many people swear off code generators because they've had bad experiences due to bugs, or their misunderstanding of how to correctly configure it.
But in and of themselves, code generators are not bad.
-Adam
They are like any other tool. Some give beter results than others, but it is up to the user to know when to use them or not. A hammer is a terrible tool if you are trying to screw in a screw.
This is one of those highly contentious issues. Personally, I think code generators are really bad due to the unoptimized crap code most of them put out.
However, the question is really one that only you can answer. In a lot of organizations, development time is more important than project execution speed or even maintainability.
We use code generators for generating data entity classes, database objects (like triggers, stored procs), service proxies etc. Anywhere you see lot of repititive code following a pattern and lot of manual work involved, code generators can help. But, you should not use it too much to the extend that maintainability is a pain. Some issues also arise if you want to regenerate them.
Tools like Visual Studio, Codesmith have their own templates for most of the common tasks and make this process easier. But, it is easy to roll out on your own.
It can really become an issue with maintainability when you have to come back and cant understand what is going on in the code. Therefore many times you have to weigh how important it is to get the project done fast compared to easy maintainability
maintainability <> easy or fast coding process
I use My Generation with Entity Spaces and I don't have any issues with it. If I have a schema change I just regenerate the classes and it all works out just fine.
They serve as a crutch that can disable your ability to maintain the program long-term.
The first C++ compilers were code generators that spit out C code (CFront).
I'm not sure if this is an argument for or against code generators.
I think that Mitchel has hit it on the head.
Code generation has its place. There are some circumstances where it's more effective to have the computer do the work for you!
It can give you the freedom to change your mind about the implementation of a particular component when the time cost of making the code changes is small. Of course, it is still probably important to understand the output the code generator, but not always.
We had an example on a project we just finished where a number of C++ apps needed to communicate with a C# app over named pipes. It was better for us to use small, simple, files that defined the messages and have all the classes and code generated for each side of the transaction. When a programmer was working on problem X, the last thing they needed was to worry about the implentation details of the messages and the inevitable cache hit that would entail.
This is a workflow question. ASP.NET is a code generator. The XAML parsing engine actually generates C# before it gets converted to MSIL. When a code generator becomes an external product like CodeSmith that is isolated from your development workflow, special care must be taken to keep your project in sync. For example, if the generated code is ORM output, and you make a change to the database schema, you will either have to either completely abandon the code generator or else take advantage of C#'s capacity to work with partial classes (which let you add members and functionality to an existing class without inheriting it).
I personally dislike the isolated / Alt-Tab nature of generator workflows; if the code generator is not part of my IDE then I feel like it's a kludge. Some code generators, such as Entity Spaces 2009 (not yet released), are more integrated than previous generations of generators.
I think the panacea to the purpose of code generators can be enjoyed in precompilation routines. C# and other .NET languages lack this, although ASP.NET enjoys it and that's why, say, SubSonic works so well for ASP.NET but not much else. SubSonic generates C# code at build-time just before the normal ASP.NET compilation kicks in.
Ask your tools vendor (i.e. Microsoft) to support pre-build routines more thoroughly, so that code generators can be integrated into the workflow of your solutions using metadata, rather than manually managed as externally outputted code files that have to be maintained in isolation.
Jon
The best application of a code generator is when the entire project is a model, and all the project's source code is generated from that model. I am not talking UML and related crap. In this case, the project model also contains custom code.
Then the only thing developers have to care about is the model. A simple architectural change may result in instant modification of thousands of source code lines. But everything remains in sync.
This is IMHO the best approach. Sound utopic? At least I know it's not ;) The near future will tell.
In a recent project we built our own code generator. We generated all the data base stuff, and all the base code for our view and view controller classes. Although the generator took several months to build (mostly because this was the first time we had done this, and we had a couple of false starts) it paid for itself the first time we ran it and generated the basic framework for the whole app in about ten minutes.
This was all in Java, but Ruby makes an excellent code-writing language particularly for small, one-off type projects.
The best thing was the consistency of the code and the project organization. In addition you kind of have to think the basic framework out ahead of time, which is always good.
Code generators are great assuming it is a good code generator. Especially working c++/java which is very verbose.