shorten coldfusion namespaces for components - oop

I am building an object-oriented app in ColdFusion, and I have broken the code down into many small pieces. As a result, I have really long namespaces for my components; for example:
folder1.folder2.plugin1.datatypes.Object
I seem to be repeating a lot of stuff, but at the same time, some of these things act like "modules". What I mean is that "folder2" in the example really contains, for lack of a better term, "stand-alone" components/applications (think of them as plugins). Aside from calling other plugins' resources, they act on their own. But, due to the folder structure, I still have to refer to them all as folder1.folder2.... and so on.
So, let us assume that the "folder1.folder2." prefix could change on a whim. (This will not happen, but since "plugin1" defines a stand-alone component, it does not care what "folder1" or "folder2" contains, or whether they even exist.)
When I am writing code within the plugin, is there any way I can shorten the namespace string? Is there such a thing as "relative" namespacing, just like relative href links?
Such a thing would save me a lot of time, and would also help ensure these things are more stand-alone, as they would not be tied to their enclosing folder structure...

You could use ColdFusion mappings, specifically per-application mappings defined in Application.cfc:
<cfset this.mappings["/com"] = expandPath("folder1/folder2/plugin1") />
Then you could reference components as com.datatypes.Object.
I cannot recall when per-app mappings were introduced, but they have been there for a few releases.

Sounds like you may want to consider a dependency injection framework such as WireBox. This would allow you to keep the full paths in a single configuration file and use an alias to obtain your models. In fact, you can even have WireBox scan locations so you don't have to list every object you create.
WireBox was extracted from the amazing ColdBox framework. It is available independently of ColdBox and should be fairly simple to introduce into your application.
There is a helpful Google Group for ColdBox (and the related boxes), recorded ColdBox Connection meetings, and other types of training available for WireBox.
I cannot imagine building sophisticated OO without dependency injection. It is well worth the effort to learn and implement.

Related

How to understand the big picture in a loose coupled application?

We have been developing code using loose coupling and dependency injection.
A lot of "service" style classes have a constructor and one method that implements an interface. Each individual class is very easy to understand in isolation.
However, because of the looseness of the coupling, looking at a class tells you nothing about the classes around it or where it fits in the larger picture.
It's not easy to jump to collaborators using Eclipse because you have to go via the interfaces. If the interface is Runnable, that is no help in finding which class is actually plugged in. Really it's necessary to go back to the DI container definition and try to figure things out from there.
Here's a line of code from a dependency-injected service class:
// myExpiryCutoffDateService was injected
Date cutoff = myExpiryCutoffDateService.get();
Coupling here is as loose as can be. The expiry date could be implemented in literally any manner.
Here's what it might look like in a more coupled application:
ExpiryDateService expiryDateService = new ExpiryDateService();
Date cutoff = expiryDateService.getCutoffDate( databaseConnection, paymentInstrument );
From the tightly coupled version, I can infer that the cutoff date is somehow determined from the payment instrument using a database connection.
I'm finding code of the first style harder to understand than code of the second style.
You might argue that when reading this class, I don't need to know how the cutoff date is figured out. That's true, but if I'm narrowing in on a bug or working out where an enhancement needs to slot in, that is useful information to know.
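To make the loosely coupled style concrete, here is a minimal Java sketch. Everything except the `myExpiryCutoffDateService.get()` call is an assumption made for illustration: the interface name, the two implementations and the `ExpiryChecker` consumer are hypothetical. It shows why the consumer's source alone does not reveal how the cutoff is computed.

```java
import java.util.Date;

// Hypothetical interface; the question only shows a get() call on an injected service.
interface ExpiryCutoffDateService {
    Date get();
}

// One possible implementation: a fixed cutoff, handy in tests.
class FixedCutoffDateService implements ExpiryCutoffDateService {
    private final Date cutoff;
    FixedCutoffDateService(Date cutoff) { this.cutoff = cutoff; }
    @Override public Date get() { return cutoff; }
}

// Another possible implementation: a number of days before a reference time.
class RollingCutoffDateService implements ExpiryCutoffDateService {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
    private final Date now;
    private final int days;
    RollingCutoffDateService(Date now, int days) { this.now = now; this.days = days; }
    @Override public Date get() { return new Date(now.getTime() - days * MILLIS_PER_DAY); }
}

// The consumer sees only the interface, as in the injected one-liner above.
class ExpiryChecker {
    private final ExpiryCutoffDateService myExpiryCutoffDateService;
    ExpiryChecker(ExpiryCutoffDateService service) { this.myExpiryCutoffDateService = service; }

    boolean isExpired(Date issued) {
        Date cutoff = myExpiryCutoffDateService.get();
        return issued.before(cutoff);
    }
}
```

Which implementation runs is decided wherever `ExpiryChecker` is constructed, which is exactly the place a reader has to find.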
Is anyone else experiencing this problem? What solutions have you? Is this just something to adjust to? Are there any tools to allow visualisation of the way classes are wired together? Should I make the classes bigger or more coupled?
(Have deliberately left this question container-agnostic as I'm interested in answers for any).
While I don't know how to answer this question in a single paragraph, I attempted to answer it in a blog post instead: http://blog.ploeh.dk/2012/02/02/LooseCouplingAndTheBigPicture.aspx
To summarize, I find that the most important points are:
Understanding a loosely coupled code base requires a different mindset. While it's harder to 'jump to collaborators' it should also be more or less irrelevant.
Loose coupling is all about understanding a part without understanding the whole. You should rarely need to understand it all at the same time.
When zeroing in on a bug, you should rely on stack traces rather than the static structure of the code in order to learn about collaborators.
It's the responsibility of the developers writing the code to make sure that it's easy to understand - it's not the responsibility of the developer reading the code.
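The point about relying on stack traces can be sketched in Java as follows. This is only an illustration under assumed names (`ReportJob`, `BrokenSource` and `CollaboratorTrace` are hypothetical): the consumer knows nothing but an interface, yet at runtime both `getClass()` and the stack trace name the concrete collaborator.

```java
import java.util.concurrent.Callable;

// Consumer that only knows the interface type of its collaborator.
class ReportJob {
    private final Callable<String> source;
    ReportJob(Callable<String> source) { this.source = source; }

    // Names the concrete class that was actually injected; useful in a log line.
    String describeCollaborator() { return source.getClass().getName(); }

    String run() throws Exception { return source.call(); }
}

// Hypothetical implementation that fails, so the stack trace points at it.
class BrokenSource implements Callable<String> {
    @Override public String call() { throw new IllegalStateException("no data"); }
}

class CollaboratorTrace {
    // Returns the class named in the top frame of the failure's stack trace.
    static String failingClass(ReportJob job) {
        try {
            job.run();
            return "none";
        } catch (Exception e) {
            return e.getStackTrace()[0].getClassName();
        }
    }
}
```

In a real debugging session the same information is visible directly in the printed trace or in the debugger's variables view; the helper above only makes it explicit.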
Some tools are aware of DI frameworks and know how to resolve dependencies, allowing you to navigate your code in a natural way. But when that isn't available, you just have to use whatever features your IDE provides as best you can.
I use Visual Studio and a custom-made framework, so the problem you describe is my life. In Visual Studio, SHIFT+F12 is my friend. It shows all references to the symbol under the cursor. After a while you get used to the necessarily non-linear navigation through your code, and it becomes second-nature to think in terms of "which class implements this interface" and "where is the injection/configuration site so I can see which class is being used to satisfy this interface dependency".
There are also extensions available for VS which provide UI enhancements to help with this, such as Productivity Power Tools. For instance, you can hover over an interface, an info box will pop up, and you can click "Implemented By" to see all the classes in your solution implementing that interface. You can double-click to jump to the definition of any of those classes. (I still usually just use SHIFT+F12 anyway.)
I just had an internal discussion about this, and ended up writing this piece, which I think is too good not to share. I'm copying it here (almost) unedited, but even though it's part of a bigger internal discussion, I think most of it can stand alone.
The discussion is about introduction of a custom interface called IPurchaseReceiptService, and whether or not it should be replaced with use of IObserver<T>.
Well, I can't say that I have strong data points about any of this - it's just some theories that I'm pursuing... However, my theory about cognitive overhead at the moment goes something like this: consider your special IPurchaseReceiptService:
public interface IPurchaseReceiptService
{
    void SendReceipt(string transactionId, string userGuid);
}
If we keep it as the Header Interface it currently is, it only has that single SendReceipt method. That's cool.
What's not so cool is that you had to come up with a name for the interface, and another name for the method. There's a bit of overlap between the two: the word Receipt appears twice. IME, sometimes that overlap can be even more pronounced.
Furthermore, the name of the interface is IPurchaseReceiptService, which isn't particularly helpful either. The Service suffix is essentially the new Manager, and is, IMO, a design smell.
Additionally, not only did you have to name the interface and the method, but you also have to name the variable when you use it:
public EvoNotifyController(
    ICreditCardService creditCardService,
    IPurchaseReceiptService purchaseReceiptService,
    EvoCipher cipher
)
At this point, you've essentially said the same thing thrice. This is, according to my theory, cognitive overhead, and a smell that the design could and should be simpler.
Now, contrast this to use of a well-known interface like IObserver<T>:
public EvoNotifyController(
    ICreditCardService creditCardService,
    IObserver<TransactionInfo> purchaseReceiptService,
    EvoCipher cipher
)
This enables you to get rid of the bureaucracy and reduce the design to the heart of the matter. You still have intention-revealing naming - you only shift the design from a Type Name Role Hint to an Argument Name Role Hint.
When it comes to the discussion about 'disconnectedness', I'm under no illusion that use of IObserver<T> will magically make this problem go away, but I have another theory about this.
My theory is that the reason many programmers find programming to interfaces so difficult is exactly because they are used to Visual Studio's Go to definition feature (incidentally, this is yet another example of how tooling rots the mind). These programmers are perpetually in a state of mind where they need to know what's 'on the other side of an interface'. Why is this? Could it be because the abstraction is poor?
This ties back to the Reused Abstractions Principle (RAP), because if you confirm programmers' belief that there's a single, particular implementation behind every interface, it's no wonder they think that interfaces are only in the way.
However, if you apply the RAP, I hope that slowly, programmers will learn that behind a particular interface, there may be any implementation of that interface, and their client code must be able to handle any implementation of that interface without changing the correctness of the system. If this theory holds, we've just introduced the Liskov Substitution Principle into a code base without scaring anyone with high-brow concepts they don't understand :)
"However, because of the looseness of the coupling, looking at a class tells you nothing about the classes around it or where it fits in the larger picture."
This is not accurate. For each class you know exactly what kind of objects the class depends on in order to provide its functionality at runtime.
You know them because you know what objects are expected to be injected.
What you don't know is the actual concrete class that will be injected at runtime to implement the interface or base class that your class depends on.
So if you want to see which class is actually injected, you just have to look at the configuration file for that class to see the concrete classes that are injected.
You could also use facilities provided by your IDE.
Since you mention Eclipse: Spring has a plugin for it, which also has a visual tab displaying the beans you configure. Did you check that? Isn't that what you are looking for?
Also check out the same discussion in Spring Forum
UPDATE:
Reading your question again, I don't think that this is a real question.
I mean this in the following manner.
Like all things, loose coupling is not a panacea and has its own disadvantages.
Most people tend to focus on the benefits, but like any solution it has its drawbacks.
What you do in your question is describe one of its main disadvantages: it is indeed not easy to see the big picture, since everything is configurable and anything can be plugged in.
There are other drawbacks one could complain about as well, e.g. that it is slower than tightly coupled applications, and the complaint would still be true.
In any case, to reiterate, what you describe in your question is not a problem that you stumbled upon and for which you can find a standard solution (or any, for that matter).
It is one of the drawbacks of loose coupling, and you have to decide whether this cost is higher than what you actually gain from it, as in any design trade-off.
It is like asking:
Hey, I am using this pattern named Singleton. It works great but I can't create new objects! How can I get around this problem guys????
Well, you can't; but if you need to, perhaps Singleton is not for you....
One thing that helped me is placing multiple closely related classes in the same file. I know this goes against the general advice (of having 1 class per file) and I generally agree with this, but in my application architecture it works very well. Below I will try to explain in which case this is.
The architecture of my business layer is designed around the concept of business commands. Command classes (simple DTO with only data and no behavior) are defined and for each command there is a 'command handler' that contains the business logic to execute this command. Each command handler implements the generic ICommandHandler<TCommand> interface, where TCommand is the actual business command.
Consumers take a dependency on the ICommandHandler<TCommand> and create new command instances and use the injected handler to execute those commands. This looks like this:
public class Consumer
{
    private ICommandHandler<CustomerMovedCommand> handler;

    public Consumer(ICommandHandler<CustomerMovedCommand> h)
    {
        this.handler = h;
    }

    public void MoveCustomer(int customerId, Address address)
    {
        var command = new CustomerMovedCommand();
        command.CustomerId = customerId;
        command.NewAddress = address;
        this.handler.Handle(command);
    }
}
Now consumers only depend on a specific ICommandHandler<TCommand> and have no notion of the actual implementation (as it should be). However, although the Consumer should know nothing about the implementation, during development I (as a developer) am very much interested in the actual business logic that is executed, simply because development is done in vertical slices; meaning that I'm often working on both the UI and business logic of a simple feature. This means I'm often switching between business logic and UI logic.
So what I did was put the command and its handler (in this example the CustomerMovedCommand and the implementation of ICommandHandler<CustomerMovedCommand>) in the same file, with the command first. Because the command itself is concrete (since it's a DTO there is no reason to abstract it), jumping to the class is easy (F12 in Visual Studio). By placing the handler next to the command, jumping to the command also means jumping to the business logic.
Of course this only works when it is okay for the command and handler to be living in the same assembly. When your commands need to be deployed separately (for instance when reusing them in a client/server scenario), this will not work.
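For readers working in Java, the same "command first, handler right below it" layout might look like the sketch below. This is an assumed port, not the author's code: `CommandHandler`, `CustomerMovedHandler` and the in-memory address map are hypothetical, and the address is simplified to a `String`.

```java
import java.util.Map;

// Generic handler abstraction, mirroring ICommandHandler<TCommand> from the answer.
interface CommandHandler<C> {
    void handle(C command);
}

// --- Contents of one source file, command first ---

// The command is a plain data holder with no behavior.
class CustomerMovedCommand {
    final int customerId;
    final String newAddress;

    CustomerMovedCommand(int customerId, String newAddress) {
        this.customerId = customerId;
        this.newAddress = newAddress;
    }
}

// The handler sits directly below the command it executes,
// so navigating to the command also lands on the business logic.
class CustomerMovedHandler implements CommandHandler<CustomerMovedCommand> {
    // Hypothetical in-memory store standing in for real persistence.
    private final Map<Integer, String> addresses;

    CustomerMovedHandler(Map<Integer, String> addresses) {
        this.addresses = addresses;
    }

    @Override
    public void handle(CustomerMovedCommand command) {
        addresses.put(command.customerId, command.newAddress);
    }
}
```

Note that Java allows at most one public top-level class per file, so in this layout the command and handler are package-private (or one of them gives the file its name).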
Of course this is just 45% of my business layer. Another big piece, however (say 45%), is the queries, and they are designed similarly, using a query class and a query handler. These two classes are also placed in the same file, which -again- allows me to navigate quickly to the business logic.
Because the commands and queries are about 90% of my business layer, I can in most cases move very quickly from presentation layer to business layer and even navigate easily within the business layer.
I must say these are the only two cases in which I place multiple classes in the same file, but it makes navigation a lot easier.
If you want to learn more about how I designed this, I've written two articles about this:
Meanwhile... on the command side of my architecture
Meanwhile... on the query side of my architecture
In my opinion, loosely coupled code can help you a lot, but I agree with you about its readability.
The real problem is that method names should also convey valuable information.
That is the Intention-Revealing Interface principle as stated by Domain-Driven Design ( http://domaindrivendesign.org/node/113 ).
You could rename the get method:
// intention revealing name
Date cutoff = myExpiryCutoffDateService.calculateFromPayment();
I suggest you read thoroughly about DDD principles; your code could become much more readable and thus more manageable.
I have found The Brain to be useful in development as a node mapping tool. If you write some scripts to parse your source into XML The Brain accepts, you could browse your system easily.
The secret sauce is to put guids in your code comments on each element you want to track, then the nodes in The Brain can be clicked to take you to that guid in your IDE.
Depending on how many developers are working on projects and whether you want to reuse some parts of it in different projects loose coupling can help you a lot. If your team is big and project needs to span several years, having loose coupling can help as work can be assigned to different groups of developers more easily. I use Spring/Java with lots of DI and Eclipse offers some graphs to display dependencies. Using F3 to open class under cursor helps a lot. As stated in previous posts, knowing shortcuts for your tool will help you.
One other thing to consider is creating custom classes or wrappers as they are more easily tracked than common classes that you already have (like Date).
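As a rough illustration of the wrapper idea, the sketch below wraps a plain `Date` in a dedicated type. The name `ExpiryCutoff` and its methods are assumptions made for this example. The benefit is that "find references" on `ExpiryCutoff` returns only cutoff-related code, whereas a search for `Date` returns everything.

```java
import java.util.Date;

// Hypothetical domain wrapper: a cutoff is no longer "just a Date".
final class ExpiryCutoff {
    private final Date value;

    ExpiryCutoff(Date value) {
        // Defensive copy, since java.util.Date is mutable.
        this.value = new Date(value.getTime());
    }

    // True when the given date falls strictly before the cutoff.
    boolean excludes(Date candidate) {
        return candidate.before(value);
    }

    Date toDate() {
        return new Date(value.getTime());
    }
}
```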
If your application uses several modules or layers, it can be a challenge to understand exactly what the project flow is, so you might need to create or use a custom tool to see how everything is related. I have created one for myself, and it helped me understand the project structure more easily.
Documentation !
Yes, you have named the major drawback of loosely coupled code. Even if you have probably already realized that it pays off in the end, it is true that it will always take longer to find "where" to make your modifications, and you might have to open a few files before finding "the right spot"...
But that's where something really important comes in: documentation. It's weird that no answer has explicitly mentioned it; it's a MAJOR requirement in any large development effort.
API Documentation
An APIDoc with a good search feature, in which each file and --almost-- each method has a clear description.
"Big picture" documentation
I think it's good to have a wiki that explains the big picture. Bob has made a proxy system? How does it work? Does it handle authentication? What kind of component will use it? Not a whole tutorial, just a place where you can read for 5 minutes and figure out which components are involved and how they are linked together.
I do agree with all the points in Mark Seemann's answer, but when you join a project for the first time, even if you understand the principles behind decoupling well, you'll need either a lot of guessing or some sort of help to figure out where to implement a specific feature you want to develop.
... Again: APIDoc and a little developer wiki.
I am astounded that nobody has written about the testability (in terms of unit testing, of course) of loosely coupled code and the non-testability (in the same terms) of tightly coupled design! It is a no-brainer which design you should choose. Today, with all the mocking and coverage frameworks, it is obvious, well, at least to me.
Unless you do not unit test your code, or you think you do but in fact you don't...
Testing in isolation can barely be achieved with tight coupling.
You think you have to navigate through all the dependencies from your IDE? Forget about it! It is the same situation as with compile time and runtime. Hardly any bug can be found during compilation; you cannot be sure whether it works unless you test it, which means executing it. Want to know what is behind the interface? Put a breakpoint and run the goddamn application.
Amen.
...updated after the comment...
Not sure if it is going to serve you but in Eclipse there is something called hierarchy view. It shows you all the implementations of an interface within your project (not sure if the workspace as well). You can just navigate to the interface and press F4. Then it will show you all the concrete and abstract classes implementing the interface.

Language-Portable Example Programs

At the moment I am learning Objective-C 2. I'm aware that it's used heavily by Mac developers, but I'm more interested in learning the language at this point in time than the frameworks for developing on Mac OS X/iPhone (except for Foundation). In order to do this I want to write a few intermediate* console applications, but I'm stuck for ideas.
Most examples are something along the lines of "Write a Fraction class that has getters/setters and a print function", which isn't very challenging coming from a C++ background. I'd like some generic examples of programs, but I don't want them to include any Objective-C implementation details. I want to figure out the program structure/write my own interfaces and learn the language from there.
In summary: I am curious as to what example programs Objective-C programmers would recommend for exploring the language.
* An example of an "intermediate" application would be something along the lines of "Write a program that takes a URL from the command line and returns the number of occurrences of a certain word in the data returned":
example -url www.google.com -word search
"Project Euler" is a standard response for this kind of thing, but I get the feeling that you're less interested in being told to implement algorithmic stuff (since that knowledge is easier to port between languages) and more interested in miniprojects that will familiarize you with core libraries. Is this fair?
If so, IMO, you ought to know the basics of how to do the following with the standard libraries of language you hope to use for serious work:
1) Standard IO
2) Network IO
3) Disk IO and navigating the filesystem
4) Regexp utilities
5) Structured data (XML libraries and CSV libraries if they exist)
Programming problems I would recommend for those:
1) It sounds like you've already done this.
2) A very simple proxy - something like what you described in your post, but that listens on a port for a message containing a URL rather than taking it on the command line, and likewise returns the results to whatever contacted it over the network rather than outputting to stdio. [Obviously you need to have the machine behind an appropriate firewall for this!]
3) Something which takes a directory path and recursively tallies the number of lines its children contain. (So, get the directory's listing, open each child file and count the number of line breaks. Then open each of its child directories, get their listings, ...) Record any errors encountered (e.g., no read privileges) in a reasonable way. Write out the final results to a file in the directory supplied.
4) Usually if I tool around in a language enough, I'll run across some problem which I just naturally find myself using regexps for. I'll assume the same is true for you and punt this element for now.
5) Fetch StackOverflow.com, and [by putting it into a DOM model and navigating that] determine whether this question is still on the front page.
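To show the scope of the directory-tally exercise (deliberately in Java rather than Objective-C, so it does not give away the target-language solution), here is one possible sketch. The class name `LineTally` and the report file name `line-tally.txt` are assumptions. It counts lines as reported by `Files.lines` rather than raw line-break characters, and it does not guard against symbolic-link cycles.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LineTally {
    // Errors are recorded instead of aborting the whole walk.
    final List<String> errors = new ArrayList<>();

    // Recursively sums the line counts of all files below dir.
    long countLines(Path dir) {
        long total = 0;
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                total += Files.isDirectory(child) ? countLines(child) : countFileLines(child);
            }
        } catch (IOException | UncheckedIOException e) {
            errors.add(dir + ": " + e);
        }
        return total;
    }

    private long countFileLines(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        } catch (IOException | UncheckedIOException e) {
            errors.add(file + ": " + e); // e.g. unreadable or non-text file
            return 0;
        }
    }

    // Writes the total and any recorded errors into the supplied directory.
    Path writeReport(Path dir, long total) throws IOException {
        List<String> report = new ArrayList<>();
        report.add("lines: " + total);
        report.addAll(errors);
        return Files.write(dir.resolve("line-tally.txt"), report);
    }
}
```

Typical usage is `long total = tally.countLines(root); tally.writeReport(root, total);`, after which `tally.errors` lists whatever could not be read.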
I got the most out of Objective-C by exploring it with a testing framework. I have written a short blog post about it. You should also wrap your head around the memory management conventions employed by Objective-C; reference counting takes a little time to get used to but works very well if responsibilities are clearly segregated (I have written about that on my blog too).
By getting my hands dirty on a testing framework (GHUnit for that matter), I was able to learn far more about the language than I could have in a "traditional" way. Of course you'll need a little pet project, otherwise this approach doesn't make sense.
I don't think your example is a very good idea as it requires you to mess with http connections, resources etc. which is a little framework specific after all. Parsing a text file would be a little easier in this regard. Using a unit testing framework has the following advantages for you:
learn about platform specific build systems and deployment details
forced to develop components in a loosely coupled fashion from the ground up
thereby exploring unique mechanisms of the language that might require new patterns or make known ones redundant (e.g. categories make dependency injection obsolete etc.)
fast compile-test cycle, less time spent in front of the debugger
combined with source control: painless experiments
You should also look into the testing framework's implementation, as testing frameworks always need to work with metadata to some extent. Testing frameworks are often used together with isolation frameworks. These basically create objects at runtime that conform to certain interfaces and act as stand-ins for concrete objects. Looking at their implementation will teach you about the runtime manipulations that can be done in Objective-C (keyword: method swizzling).

What is interface bloat?

Can someone explain to me what interface bloat in OOP is (preferably with an example)?
G'day,
Assuming you mean API and not GUI, for me I/F bloat can happen in several ways.
An API just keeps getting extended and extended with new functions without any form of segregation so you finish up with a monolithic header file that becomes hard to use.
The functions declared in an existing API keep getting new parameters added to their signatures so you have to keep upgrading and your existing applications are not backwards compatible.
Functions in an existing API keep getting overloaded with very similar variants which can lead to difficulty selecting the relevant function to be used.
To help with this you can:
Separate out the API into a series of headers and libraries so you can more easily control what parts you actually need. Any internal dependencies should be resolved automatically by the vendor so the user doesn't have to find out the dependencies by trial and error, e.g. that I need to include header file wibble.h when I only wanted to use the functions in the API declared in the shozbot.h header file.
Make upgrades to the API backwards compatible by introducing overloading where applicable. But you should group the overloaded functions into categories, e.g. if a new set of overloaded functions is added to an existing API, say our_api.h, to adapt it to a new technology, say SOA, then they are provided separately in their own header file our_api_soa.h in addition to the existing header our_api.h.
HTH
Think of an OO language where all methods are defined in Object, even though they are only meaningful for some subclasses. That would be the most extreme example.
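A smaller-scale version of that extreme can be sketched in Java. The names below (`BloatedDevice`, `Printer`, `DocumentScanner` and the implementing classes) are made up for illustration. The bloated interface forces a simple printer to carry operations that are meaningless for it, while the segregated interfaces let each class implement only what it supports.

```java
// Bloated: every implementer must supply all operations, even meaningless ones.
interface BloatedDevice {
    String print(String doc);
    String scan();
    String fax(String number);
}

class BasicPrinterBloated implements BloatedDevice {
    @Override public String print(String doc) { return "printed " + doc; }
    // The only honest implementation of an unsupported operation is to fail.
    @Override public String scan() { throw new UnsupportedOperationException("cannot scan"); }
    @Override public String fax(String number) { throw new UnsupportedOperationException("cannot fax"); }
}

// Segregated: small interfaces, each describing one capability.
interface Printer {
    String print(String doc);
}

interface DocumentScanner {
    String scan();
}

class BasicPrinter implements Printer {
    @Override public String print(String doc) { return "printed " + doc; }
}

class MultiFunctionDevice implements Printer, DocumentScanner {
    @Override public String print(String doc) { return "printed " + doc; }
    @Override public String scan() { return "scanned page"; }
}
```

With the segregated version, code that needs scanning asks for a `DocumentScanner`, so passing it a plain printer is a compile-time error instead of a runtime exception.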
Most Microsoft products?
Interface bloat is having too much on the screen at once, particularly elements that are little used, or are confusing in their function. Probably an easier way to describe interface bloat is to look at something that does not have it, try Basecamp from 37signals. There are only a few tabs, and a few links in the header.
Interface bloat can be remedied by collapsible panes (using JavaScript, for example), or drill-down menus that hide less-often used choices until they are needed.
Interface bloat is the gradual addition of elements that turn what may have been a simple, elegant interface into one littered with buttons, menus, options, etc. all over the place that ruin the original cohesive feel of the application. One example that comes to mind for me is iTunes. In its early renditions, it was quite simple, but it has, over time, added quite a lot of features that might qualify as bloat (iTunes DJ, Cover Flow, Genius).
Interface bloat is sometimes caused by trying to have every feature one click away, as in this humorous example:
[Image: too many toolbar buttons]
(Although funny, this example isn't fair to Firefox because in this example the user added all those toolbars)
A UI design technique called "progressive disclosure" is one way to reduce interface bloat. Only expose the most frequently-used features as a top-level click. If you have less-frequently-used features that are still valuable enough to include in your app, group them in a logical way, e.g. behind a dropdown menu or other navigation element.
Learning by example:
http://img46.imageshack.us/img46/5127/ofilematrix.png
An extreme example of interface bloat that most C++ programmers will be familiar with is std::basic_string. It has page after page of member functions with only small variations; most of these would not have had to be member functions at all but could have been free functions in a string utility library.

What is the best way to save my POJOs into Jackrabbit JCR?

In Jackrabbit I have experienced two ways to save my POJOs into repository nodes for storage in the Jackrabbit JCR:
writing my own layer
and
using Apache Graffito
Writing my own code has proven time consuming and labor intensive (had to write and run a lot of ugly automated tests) though quite flexible.
Using Graffito has been a disappointment because it seems to be a "dead" project stuck in 2006.
What are some better alternatives?
Another alternative is to completely skip an OCM framework and simply use javax.jcr.Node as a very flexible DAO itself. The fundamental reason why OCM frameworks exist is because with RDBMS you need a mapping from objects to the relational model. With JCR, which is already very object-oriented (node ~= object), this underlying reason is gone. What is left is that with DAOs you can restrict what your programmers can access in their code (incl. the help of autocompletion). But this approach does not really leverage the JCR concept, which means schema-free and flexible programming. Using the JCR API directly in your code is the best way to follow that concept.
Imagine you want to add a new property to an existing node/object later in the life of your application - with an OCM framework you have to modify the mapping as well and make sure it still works properly. With direct access to nodes it is simply a single point of change. I know, this is a good way to get problems with typos in e.g. property names; but this fear is not really backed by reality, since you will in most cases very quickly notice typos or non-matching names when you test your application. A good solution is to use string constants for the common node or property names, even as part of your APIs if you expose the JCR API across them. This still gives you the flexibility to quickly add new properties without having to adapt OCM layers.
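The string-constants suggestion can be sketched as follows. To keep the example runnable without a repository, a plain `Map` stands in for a JCR node here; that substitution, and the names `ArticleProps` and `ArticleStore`, are assumptions made for illustration and not part of the JCR API.

```java
import java.util.HashMap;
import java.util.Map;

// Property names live in one place, so a typo becomes a compile error
// instead of a silently missing property.
final class ArticleProps {
    static final String TITLE = "title";
    static final String AUTHOR = "author";

    private ArticleProps() {}
}

class ArticleStore {
    // A Map stands in for a repository node in this sketch.
    static Map<String, Object> newArticle(String title, String author) {
        Map<String, Object> node = new HashMap<>();
        node.put(ArticleProps.TITLE, title);
        node.put(ArticleProps.AUTHOR, author);
        return node;
    }

    static String titleOf(Map<String, Object> node) {
        return (String) node.get(ArticleProps.TITLE);
    }
}
```

Adding a new property later means adding one constant and using it where needed; there is no separate mapping layer to update.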
To have some constraints on what is allowed or what is mandatory (i.e. a "semi-schema") you can use node types and mixins (since JCR 2.0 you can also change the node type for existing content): thus you can handle this completely on the repository level and don't have to care about typing and constraints inside your application code - apart from catching the exceptions ;-)
But, of course, this choice depends on your requirements and personal preferences.
You might want to have a look at Jackrabbit OCM, which is alive and kicking. Of course another way is to manually serialize/deserialize the POJOs. For that there are many different options. The question is whether you need a fixed schema to query the objects in JCR. If you just want to serialize into XML then XStream is a very painless way to do so. If you need a more fixed schema there is also Betwixt from Apache Commons.
It depends on your needs. When you directly use javax.jcr.Node, your code is heavily coupled to the underlying mechanism. In medium and even some small sized projects, this is not a good idea. Obviously the question will be how to go from the Node to your own domain model. The problem is quite similar to going from a JDBC ResultSet to your own domain model. Mind you, I mean that from a technical point of view the problem is similar. From a functional point of view, there are huge differences between using JDBC and JCR.
Another deciding factor is whether you can impose a structure on your JCR content or not. Some application domains can (but still match better with JCR than JDBC); in other domains the content may be highly unstructured in nature. In such a case OCM is clearly overkill. I'd still advise writing your own wrapper layer around the javax.jcr.* classes.
There's also https://github.com/ilikeorangutans/omf, a very flexible object to JCR mapper. Unfortunately it doesn't have write support yet. However we're successfully using this framework in a large CMS installation.
There is also the JCROM project at http://code.google.com/p/jcrom/. That project went dormant for a couple of years, but there have been a few new releases as of summer 2013.

Are code generators bad?

I use MyGeneration along with nHibernate to create the basic POCO objects and XML mapping files. I have heard some people say they think code generators are not a good idea. What is the current best thinking? Is it just that code generation is bad when it generates thousands of lines of not understandable code?
Code generated by a code generator should not (as a generalisation) be used in a situation where it is subsequently edited by hand. Some systems, such as the wizards in various incarnations of Visual C++, generated code that the programmer was then expected to edit by hand. This was not popular, as it required developers to pick apart the generated code, understand it and make modifications. It also meant that the generation process was one-shot.
Generated code should live in separate files from other code in the system and should only ever be produced by the generator. The generated code should be clearly marked as such to indicate that people shouldn't modify it. I have had occasion to build quite a few code-generation systems of one sort or another, and all of the code so generated has something like this in the preamble:
-- =============================================================
-- === Foobar Module ===========================================
-- =============================================================
--
-- === THIS IS GENERATED CODE. DO NOT EDIT. ===
--
-- =============================================================
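A generator can stamp a preamble like the one above onto every file it writes. The following is only a minimal sketch in Python, not the answerer's actual tooling: the function names `banner` and `emit`, the default width, and the sample SQL body are all assumptions made for illustration.

```python
MARKER = "=== THIS IS GENERATED CODE. DO NOT EDIT. ==="


def banner(module_name: str, width: int = 61, comment: str = "--") -> str:
    """Build a 'generated code' preamble (hypothetical helper)."""
    rule = "=" * width
    title = f"=== {module_name} Module "
    # Pad the title line with '=' so it is as wide as the rules.
    title += "=" * max(0, width - len(title))
    lines = [rule, title, rule, "", MARKER, "", rule]
    # Prefix every line with the comment token; blank lines become a bare token.
    return "\n".join(f"{comment} {line}".rstrip() for line in lines)


def emit(module_name: str, body: str) -> str:
    """Return the full text of a generated file: preamble first, then the body."""
    return banner(module_name) + "\n" + body


def is_generated(text: str) -> bool:
    """Let other tooling detect generated files, e.g. to refuse hand edits."""
    return MARKER in text


generated = emit("Foobar", "CREATE TABLE foobar (id INTEGER PRIMARY KEY);\n")
```

Keeping the marker text in one constant means a pre-commit check or editor plugin could look for it and warn before anyone modifies a generated file.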
Code Generation in Action is quite a good book on the subject.
Code generators are great, bad code is bad.
Most of the other responses on this page are along the lines of "No, because often the generated code is not very good."
This is a poor answer because:
1) Generators are a tool like anything else - if you misuse them, don't blame the tool.
2) Developers tend to pride themselves on their ability to write great code one time, but you don't use code generators for one-off projects.
We use a Code Generation system for persistence in all our Java projects and have thousands of generated classes in production.
As a manager I love them because:
1) Reliability: There are no significant remaining bugs in that code. It has been so exhaustively tested and refined over the years that when debugging I never worry about the persistence layer.
2) Standardisation: Every developer's code is identical in this respect, so there is much less for a guy to learn when picking up a new project from a coworker.
3) Evolution: If we find a better way to do things we can update the templates and update 1000's of classes quickly and consistently.
4) Revolution: If we switch to a different persistence system in the future then the fact that every single persistent class has an exactly identical API makes my job far easier.
5) Productivity: It is just a few clicks to build a persistent object system from metadata - this saves thousands of boring developer hours.
Code generation is like using a compiler - in an individual case you might be able to write better-optimised assembly language by hand, but over large numbers of projects you would rather have the compiler do it for you, right?
We employ a simple trick to ensure that classes can always be regenerated without losing customisations: every generated class is abstract. Then the developer extends it with a concrete class, adds the custom business logic and overrides any base class methods he wants to differ from the standard. If there is a change in metadata he can regenerate the abstract class at any time, and if the new model breaks his concrete class the compiler will let him know.
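The abstract-base trick described above can be sketched roughly as follows. The answer is about Java; this is a Python adaptation, and the class names, fields, and the `validate` hook are hypothetical. Python needs at least one abstract method to make a class non-instantiable, so the hook stands in for Java's `abstract` keyword.

```python
from abc import ABC, abstractmethod


# --- generated file: rewritten whenever the metadata changes, never edited ---
class CustomerBase(ABC):
    """Generated abstract base holding the fields and standard persistence code."""

    def __init__(self, customer_id: int, name: str) -> None:
        self.customer_id = customer_id
        self.name = name

    def to_row(self) -> dict:
        """Standard mapping from object to a storage row."""
        return {"customer_id": self.customer_id, "name": self.name}

    @abstractmethod
    def validate(self) -> None:
        """Hook that every concrete class must supply."""


# --- hand-written file: survives regeneration of the base class ---
class Customer(CustomerBase):
    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")

    def to_row(self) -> dict:
        # Override the generated behaviour where this class should differ.
        row = super().to_row()
        row["name"] = row["name"].strip()
        return row
```

In Java a mismatch between a regenerated base and the concrete class shows up at compile time; in this Python version it would only surface through a type checker or tests.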
The biggest problem I've had with code generators is during maintenance. If you modify the generated code and then make a change to your schema or template and try to regenerate you can have problems.
One problem is if the tool doesn't allow you to protect changes you've made to the modified code then your changes will be overwritten.
Another problem I've seen, particularly with code generators in RSA for web services, if you change the generated code too much the generator will complain that there is a mismatch and refuse to regenerate the code. This can happen for something as simple as changing the type of a variable. Then you are stuck generating the code to a different project and merging the results back into your original code.
Code generators can be a boon for productivity, but there are a few things to look for:
Let you work the way you want to work.
If you have to bend your non-generated code to fit around the generated code, then you should probably choose a different approach.
Run as part of your regular build.
The output should be generated to an intermediates directory, and not be checked in to source control. The input must be checked in to source control, however.
No install
Ideally, you check the tool in to source control, too. Making people install things when preparing a new build machine is bad news. For example, if you branch, you want to be able to version the tools with the code.
If you must, make a single script that will take a clean machine with a copy of the source tree, and configure the machine as required. Fully automated, please.
No editing output
You shouldn't have to edit the output. If the output isn't useful enough as-is, then the tool isn't working for you.
Also, the output should clearly state that it is a generated file & should not be edited.
Readable output
The output should be written & formatted well. You want to be able to open the output & read it without a lot of trouble.
#line
Many languages support something like a #line directive, which lets you map the contents of the output back to the input, for example when producing compiler error messages or when stepping in the debugger. This can be useful, but it can also be annoying unless done really well, so it's not a requirement.
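As a rough illustration of the #line idea, here is a small Python sketch of a generator that emits C constants from a spec file and tags each one with the spec line it came from. The spec format (`NAME = value`, `#` for comments), the file name, and the function name are assumptions made for this example.

```python
def generate_constants(spec_path: str, spec_text: str) -> str:
    """Turn 'NAME = value' lines into C constants, each preceded by #line."""
    out = ["/* GENERATED FILE. DO NOT EDIT. */"]
    for lineno, raw in enumerate(spec_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and spec comments
        name, _, value = line.partition("=")
        # Tell the C compiler that the next line originates from the spec file,
        # so diagnostics point at the input rather than the generated output.
        out.append(f'#line {lineno} "{spec_path}"')
        out.append(f"static const int {name.strip()} = {value.strip()};")
    return "\n".join(out) + "\n"


spec = "# limits\nMAX_USERS = 100\n\nTIMEOUT = 30\n"
c_source = generate_constants("limits.spec", spec)
```

With this in place, a bad value on line 4 of the spec would be reported by the C compiler against `limits.spec`, line 4, instead of against the generated file.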
My stance is that code generators are not bad, but MANY uses of them are.
If you are using a code generator that writes good code in order to save time, then great. Often, though, the generated code is not optimized or adds a lot of overhead, and in those cases I think it is bad.
Code generation might cause you some grief if you like to mix behaviour into your classes. An equally productive alternative might be attributes/annotations and runtime reflection.
Compilers are code generators, so they are not inherently bad unless you only like to program in raw machine code.
I believe however that code generators should always completely encapsulate the generated code. I.e. you should never have to modify the generated code by hand, any change should be done by modifying the input to the generator and regenerate the code.
If its a mainframe cobol code generator that Fran Tarkenton is trying to sell you then absolutely yes!
I've written a few code generators before - and to be honest they saved my butt more than once!
Once you have a clearly defined object - collection - user control design, you can use a code generator to build the basics for you, allowing your time as a developer to be used more effectively in building the complex stuff. After all, who really wants to write 300+ public property declarations and variable instantiations? I'd rather get stuck into the business logic than all the mindless repetitive tasks.
The mistake many people make when using code generation is to edit the generated code. If you keep in mind that whenever you feel the need to edit the code, what you actually need to edit is the code generation tool, it's a boon to productivity. If you are constantly fighting the code that gets generated, it's going to end up costing productivity.
The best code generators I've found are those that allow you to edit the templates that generate the code. I really like Codesmith for this reason, because it's template-based and the templates are easily editable. When you find there is a deficiency in the code that gets generated, you just edit the template and regenerate your code and you are forever good after that.
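To make the "edit the template, not the output" workflow concrete, here is a minimal Python sketch using the standard library's `string.Template`. It is not how Codesmith works internally; the template text, `render_class`, and the sample fields are hypothetical.

```python
from string import Template

# The template is the thing you maintain; fix a deficiency here and regenerate.
CLASS_TEMPLATE = Template('''\
class $class_name:
    """Generated from a template. To change it, edit the template and regenerate."""

    def __init__(self, $args):
$assignments
''')


def render_class(class_name: str, fields: list) -> str:
    """Render Python source for a simple data-holder class."""
    args = ", ".join(fields)
    assignments = "\n".join(f"        self.{f} = {f}" for f in fields)
    return CLASS_TEMPLATE.substitute(
        class_name=class_name,
        args=args,
        assignments=assignments or "        pass",  # keep the body valid if empty
    )


source = render_class("Customer", ["customer_id", "name"])
```

Because every generated class comes from the same template, a single template edit followed by a regeneration updates all of them consistently.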
The other thing that I've found is that a lot of code generators aren't super easy to use with a source control system. The way we've gotten around this is to check in the templates rather than the code and the only thing we check into source control that is generated is a compiled version of the generated code (DLL files, mostly). This saves you a lot of grief because you only have to check in a few DLLs rather than possibly hundreds of generated files.
Our current project makes heavy use of a code generator. That means I've seen both the "obvious" benefits of generating code for the first time - no coder error, no typos, better adherence to a standard coding style - and, after a few months in maintenance mode, the unexpected downsides. Our code generator did, indeed, improve our codebase quality initially. We made sure that it was fully automated and integrated with our automated builds. However, I would say that:
(1) A code generator can be a crutch. We have several massive, ugly blobs of tough-to-maintain code in our system now, because at one point in the past it was easier to add twenty new classes to our code generation XML file, than it was to do proper analysis and class refactoring.
(2) Exceptions to the rule kill you. We use the code generator to create several hundred Screen and Business Object classes. Initially, we enforced a standard on what methods could appear in a class, but like all standards, we started making exceptions. Now, our code generation XML file is a massive monster, filled with special-case snippets of Java code that are inserted into select classes. It's nearly impossible to parse or understand.
(3) Since so much of our code is generated using values from a database, it's proven difficult for developers to maintain a consistent code base on their individual workstations (since there can be multiple versions of the database). Debugging and tracing through the software is a lot harder, and newbies to the team take much longer to figure out the "flow" of the code, because of the extra abstraction and implicit relationships between classes. IDEs cannot pick up relationships between two classes that communicate via a code-generated class.
That's probably enough for now. I think Code Generators are great as part of a developer's individual toolkit; a set of scripts that write out your boilerplate code make starting a project a lot easier. But Code Generators do not make maintenance problems go away.
In certain (not many) cases they are useful. Such as if you want to generate classes based on lookup-type data in the database tables.
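One way to picture generating classes from lookup-type data: read the rows of a lookup table and emit an enum from them. The sketch below is in Python and makes several assumptions: the rows are `(id, label)` pairs already fetched from the database, and the `OrderStatus` name and sample values are invented for the example.

```python
def generate_enum(enum_name: str, rows: list) -> str:
    """Emit Python source for an Enum built from (id, label) lookup rows."""
    lines = [
        "# GENERATED CODE. DO NOT EDIT.",
        "from enum import Enum",
        "",
        "",
        f"class {enum_name}(Enum):",
    ]
    for row_id, label in rows:
        # Derive an identifier from the label, e.g. "On hold" -> ON_HOLD.
        member = label.upper().replace(" ", "_")
        lines.append(f"    {member} = {row_id}")
    if not rows:
        lines.append("    pass")  # an empty lookup table still yields valid code
    return "\n".join(lines) + "\n"


# Rows as they might come back from a lookup table query (sample data).
rows = [(1, "Pending"), (2, "Shipped"), (3, "On hold")]
enum_source = generate_enum("OrderStatus", rows)
```

Application code can then refer to `OrderStatus.ON_HOLD` instead of a bare `3`, and regenerating after the table changes keeps the two in step.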
Code generation is bad when it makes programming more difficult (i.e., poorly generated code, or a maintenance nightmare), but it is good when it makes programming more efficient.
They probably don't always generate optimal code, but depending on your need, you might decide that developer manhours saved make up for a few minor issues.
All that said, my biggest gripe with ORM code generators is that maintaining the generated code can be a PITA if the schema changes.
Code generators are not bad, but sometimes they are used in situations where another solution exists (i.e., instantiating a million objects when an array of objects would have been more suitable and could be accomplished in a few lines of code).
The other situation is when they are used incorrectly, or coded badly. Too many people swear off code generators because they've had bad experiences due to bugs, or their misunderstanding of how to correctly configure it.
But in and of themselves, code generators are not bad.
-Adam
They are like any other tool. Some give better results than others, but it is up to the user to know when to use them or not. A hammer is a terrible tool if you are trying to screw in a screw.
This is one of those highly contentious issues. Personally, I think code generators are really bad due to the unoptimized crap code most of them put out.
However, the question is really one that only you can answer. In a lot of organizations, development time is more important than project execution speed or even maintainability.
We use code generators for generating data entity classes, database objects (like triggers and stored procs), service proxies etc. Anywhere you see a lot of repetitive code following a pattern and a lot of manual work involved, code generators can help. But you should not use them to the extent that maintainability becomes a pain. Some issues also arise if you want to regenerate the code.
Tools like Visual Studio and Codesmith have their own templates for most of the common tasks and make this process easier. But it is also easy to roll your own.
It can really become an issue with maintainability when you have to come back and can't understand what is going on in the code. Therefore, many times you have to weigh how important it is to get the project done fast compared to easy maintainability.
maintainability <> easy or fast coding process
I use My Generation with Entity Spaces and I don't have any issues with it. If I have a schema change I just regenerate the classes and it all works out just fine.
They serve as a crutch that can disable your ability to maintain the program long-term.
The first C++ compilers were code generators that spit out C code (CFront).
I'm not sure if this is an argument for or against code generators.
I think that Mitchel has hit it on the head.
Code generation has its place. There are some circumstances where it's more effective to have the computer do the work for you!
It can give you the freedom to change your mind about the implementation of a particular component when the time cost of making the code changes is small. Of course, it is still probably important to understand the output of the code generator, but not always.
We had an example on a project we just finished where a number of C++ apps needed to communicate with a C# app over named pipes. It was better for us to use small, simple files that defined the messages and have all the classes and code generated for each side of the transaction. When a programmer was working on problem X, the last thing they needed was to worry about the implementation details of the messages and the inevitable cache hit that would entail.
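The single-definition, two-targets approach described above might look something like this in miniature. This Python sketch is not the project's actual generator: the message definition format, the type maps, and the `Heartbeat` message are all assumptions, and the named-pipe serialization code is left out entirely.

```python
# One shared definition per message: a list of (field name, abstract type).
MESSAGES = {
    "Heartbeat": [("sequence", "int32"), ("timestamp_ms", "int64")],
}

# Map each abstract type to a concrete type on either side of the pipe.
CPP_TYPES = {"int32": "std::int32_t", "int64": "std::int64_t"}
CS_TYPES = {"int32": "int", "int64": "long"}


def to_cpp(name: str, fields: list) -> str:
    """Emit a C++ struct for the message (the output needs <cstdint>)."""
    body = "\n".join(f"    {CPP_TYPES[t]} {n};" for n, t in fields)
    return f"struct {name} {{\n{body}\n}};\n"


def to_csharp(name: str, fields: list) -> str:
    """Emit a C# class with matching public fields."""
    body = "\n".join(f"    public {CS_TYPES[t]} {n};" for n, t in fields)
    return f"public sealed class {name}\n{{\n{body}\n}}\n"


cpp_source = "".join(to_cpp(n, f) for n, f in MESSAGES.items())
cs_source = "".join(to_csharp(n, f) for n, f in MESSAGES.items())
```

Adding a field to a message means changing one entry in the definition and regenerating, so the two sides cannot drift apart through a hand-editing slip.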
This is a workflow question. ASP.NET is a code generator. The XAML parsing engine actually generates C# before it gets converted to MSIL. When a code generator becomes an external product like CodeSmith that is isolated from your development workflow, special care must be taken to keep your project in sync. For example, if the generated code is ORM output, and you make a change to the database schema, you will have to either completely abandon the code generator or else take advantage of C#'s capacity to work with partial classes (which let you add members and functionality to an existing class without inheriting from it).
I personally dislike the isolated / Alt-Tab nature of generator workflows; if the code generator is not part of my IDE then I feel like it's a kludge. Some code generators, such as Entity Spaces 2009 (not yet released), are more integrated than previous generations of generators.
I think the panacea to the purpose of code generators can be enjoyed in precompilation routines. C# and other .NET languages lack this, although ASP.NET enjoys it and that's why, say, SubSonic works so well for ASP.NET but not much else. SubSonic generates C# code at build-time just before the normal ASP.NET compilation kicks in.
Ask your tools vendor (i.e. Microsoft) to support pre-build routines more thoroughly, so that code generators can be integrated into the workflow of your solutions using metadata, rather than manually managed as externally outputted code files that have to be maintained in isolation.
Jon
The best application of a code generator is when the entire project is a model, and all the project's source code is generated from that model. I am not talking UML and related crap. In this case, the project model also contains custom code.
Then the only thing developers have to care about is the model. A simple architectural change may result in instant modification of thousands of source code lines. But everything remains in sync.
This is IMHO the best approach. Sounds utopian? At least I know it's not ;) The near future will tell.
In a recent project we built our own code generator. We generated all the database stuff, and all the base code for our view and view controller classes. Although the generator took several months to build (mostly because this was the first time we had done this, and we had a couple of false starts), it paid for itself the first time we ran it and generated the basic framework for the whole app in about ten minutes.
This was all in Java, but Ruby makes an excellent code-writing language particularly for small, one-off type projects.
The best thing was the consistency of the code and the project organization. In addition you kind of have to think the basic framework out ahead of time, which is always good.
Code generators are great, assuming it is a good code generator. This is especially true when working in C++ or Java, which are very verbose.