API -- CFC vs. cfinclude - oop

So the company I work for has quite an unorganized approach when it comes to our site. All of our scripts are procedural with cfincludes thrown within. I've been wanting to organize this into an internal API that the other web developers would use to do whatever (because making a change has me going thru and locating EVERY other instance that change needs to be updated).
I finally have a live example and showed the boss. It follows what I assumed is the normal method (from my googling). Service layer > Gateway & DAO > Beans, with some factories to help object creation. It works well and does exactly what I had wanted it to accomplish. He's impressed with it and agrees that we need to spruce up our code and better organize it but doesn't see the advantage of using this method of object oriented API calls to a large list of cfincludes to accomplish the same thing. In essence, from the way he explained the cfincludes, it would work the same way as a method call.
He's asked for the advantages of my approach vs this cfinclude and for the life of me I can't really find any obvious advantages other than grouping similar data all within one object. Is there anything else or rather perhaps it would be advantageous to go with the cfinclude approach?

Readability, maintenance, and adherence to proven object-oriented paradigms would be the most important aspects of building a ColdFusion application using a true service layer of CFCs / objects, rather than a multitude of cfincludes, which is amateurish at best, and can cause garbage collection nightmares at its worst.
Readability
Let's say you have a cfinclude called _queries.cfm, which include all the calls for your application. Then, at the top of your employee page, just before you output all the employees, you do this:
<cfinclude template="_queries.cfm" />
<cfoutput query="employeeQry">
Where did employeeQry come from? Is it one of the queries in that template? What does it do? Do I need to include that template when I only want just employees? What if it has all the queries in the site...do they all need to be included every single time?
Why not something a little more readable, like this:
<cfset employeeQry = request.model.queries.getEmployees() />
<cfoutput query="employeeQry">
Ahhh, there we go. At a glance, without knowing anything about the nuances of your system, I can identify right away:
Where the employeeQry variable came from
What cached CFC I am calling the query from
That I am calling one and only one query, and not mass including an array of queries, none of which are needed for the page.
Encapsulating business logic in a service layer (CFCs) increases the readability of your code, which is going to make the difference when you get into the next topic.
Maintenance
You get a hold of a new CF app that you're in charge of, and open up the employee page to find the <cfinclude template="_queries.cfm"> template above.
Inside that, the original developer leaves a comment saying something to the effect of: "Let's not run all the queries, let's just run a specific query based on a parameter", and then you see something like this:
<cfswitch case="#param#">
<cfcase value="employee">
<cfinclude template="_employeeQry.cfm">
</cfcase>
<cfcase value="employees">
<cfinclude template="_employeesQry.cfm">
</cfcase>
<cfcase value="employeesByDept">
<cfinclude template="_employeesByDept.cfm">
</cfcase>
</cfswitch>
...so you look at this and think, well...I need to modify the employeesByDept query, so you crack that template open and find:
<!--- employees by department --->
<cfif args.order_by is "ASC">
<cfinclude template="_employeeQryByDeptOnASCOrder.cfm">
<cfelse>
<cfinclude template="_employeeQryByDeptOnDESCOrder.cfm">
</cfif>
...and by this point, you want to shoot yourself in the face.
This is an exaggerated example, but is all too familiar in the ColdFusion world; a hobbyist mentality when architecting Enterprise-level applications. This "include within include within include" nightmare is something CF developers deal with more frequently than you might think.
The solution is simple!
A single CFC that encapsulates the business logic of producing queries for your Employees.
<cfcomponent>
<cffunction name="getEmployees" returntype="query">
<cfquery name="tmp">
select employeeID, name, age
from employees
</cfquery>
<cfreturn tmp />
</cffunction>
<cffunction name="getEmployeesByDept" returntype="query">
<cfargument name="deptID">
<cfargument name="order_by" required="false" default="ASC">
<cfquery name="tmp">
select employeeID, name, age
from employees e
inner join empToDept etd on (e.employeeID = etd.employeeID)
where etd.deptID = #arguments.deptID#
order by name #iif(arguments.order_by is 'asc',de('asc'),de('desc'))#
</cfquery>
<cfreturn tmp />
</cffunction>
</cfcomponent>
Now, you have a single point of reference for all the information you wish to produce when querying your employee database, and can parameterize/adjust it all at once, without having to dig through mountains of includes within includes within includes...which is cumbersome, and difficult to keep straight (even with adequate source control).
It elegantly allows you to write a single line:
<cfset empQry = request.model.queries.getEmployees() />
or
<cfset empQry = request.model.queries.getEmployeesByDept(14,'DESC') />
and makes your job maintaining the code that much easier.
Adherence to Proven Object-Oriented Paradigms
Your boss announces that a Java rockstar has joined the team. You're very eager and excited to sit down with him since you've primarily been stuck in CF for these last few years, and want an opportunity to show him some of your stuff, and possibly learn from him as well.
"So, how does the application get access to the data?" he asks you.
"Oh, we have a series of queries that we call on various pages, and based on the parameters, we'll pull different types of information."
"Nice", he says, "So...you have a service layer for data object model, that's great."
Not really, you think. It's just includes within includes...but he keeps going,
"That's excellent, because one of the new things we'll be adding is a Contractor object, which is basically a subset of Employee, he's going to have a few different bits of functionality, but overall will act very much like an Employee. We'll just go ahead and subclass Employee, and override some of those queries..."
...and now you are lost. Because there is no subclassing an include. There is no inheritance in an include. An include has no knowledge of a domain or a business object, or how it is supposed to interact with other objects.
A cfinclude is a convenience to reuse common elements, like a header or a footer. They aren't a mechanism to reflect the complexities of a business object.
When you design/construct/implement CFCs as objects that reflect the entities of your application, you're speaking a common langauge: OO. It means that it is not offers you the ability to design a system based upon a proven structure, it extends that language of "OO-ness" to programmers in other technologies. Java programmers, C++/C# programmers, etc...anyone that has reasonable knowledge of Object-Oriented development will automatically be speaking your language and be able to work with you and your system.
Take heed of this final note: Not every application needs to be object-oriented. If your boss wants you to whip up a quick XML dump of the employee table and slap it on the website--yeah, you can probably forego an entire oo model for that. But if you are building an application from the ground up, and it is going to feature employees, users, departments, queries, roles, rules, tickets...in short: entities in a domain, it will be time to put aside cfincludes as your primary tool for reusing code.
Oh, and PS: That little note I left at the top about garbage collection--is no joke. I've seen CF applications incorrectly built, so that the Application.cfc itself calls cfincludes, and after hooking CF up to a JVM that can monitor realtime creation/destruction of objects in the GC, I've seen memory look like an EKG monitor.
Not good.

Shawn's response definitely covers the main points. To summarise all that into the key points your boss will understand .. the CFC approach will save him a lot of money down the track in terms of maintenance, and will keep his developers a lot happier as their job will be so much easier, and retention rate of skilled/motivated developers will be a lot higher than if he stayed with spaghetti code as a standard.

I agree completely with Shawn's response... and if you want to take your code to an even higher level, use a framework! Then it will really save you and other developers a lot of time, since everybody will be adhering to its own standards.
My personal preference is Coldbox, but any of the popular MVC/OO frameworks will do -- Coldbox, Mach-II, Model-Glue, FW/1. I've heard good things about CFWheels too but haven't used it.

Related

Micro ORM - maintaining your SQL query strings

I will not go into the details why I am exploring the use of Micro ORMs at this stage - except to say that I feel powerless when I use a full blown ORM. There are too many things going on in the background that happens automatically, and not all of them are the best possible choices. I was quite ready to go back to raw database access, but I found out about the three new guys on the block: Dapper, PetaPoco and Massive. So I decided to give the low-level approach a go with a pet project. It is not relevant, but so far, I am using PetaPoco.
In any case, I am having trouble deciding how to go about maintaining the SQL strings that I will use from the higher levels. There are three main solutions that I can think of:
Sprinkle the SQL queries wherever I need them. This is the least infrastructure heavy method. However, it suffers in both maintainability and testability areas.
Limit the query usage to some service classes. This helps maintainability, is still low on infrastructure I need to implement. It may also be possible to build these service classes such that it would be easy to mock for testing purposes.
Prepare some classes to make the system somewhat flexible. I have started on this path. I implemented a Repository interface, and a database dependent Repository class. I have also build some tiny interfaces to capture SQL queries that can be passed to my Repository's GetMany() method. All the queries are implemented as individual classes right now, and I will probably need a little more interface around this to add some level of database independence - and maybe for some flexibility in decorating queries into paged and sorted queries (again, this would also make them a little bit more flexible in handling different databases).
What I am mainly worried about right now is that I have entered the slippery slope of writing all the functions needed for a full blown ORM, but badly. For example, it feels sensible right now that I write or find a library to convert linq calls into SQL statements so that I can massage my queries easily or write extenders that can decorate any query I pass to it, etc. But that is a large task, and is already done by the big guys, so I am resisting the urge to go there. I also want to retain control over what queries I send to the database - by explicitly writing them.
So what is the suggestion? Should I go #2 option, or try to stumble along on option #3? I am certain I cannot show any code written in the first option to anyone without blushing. Is there any other approach you can recommend?
EDIT: After I've asked the question, I realized there is another option, somewhat orthogonal to these three options: stored procedures. There seems to be a few advantages to putting all your queries inside the database as stored procedures. They are kept in a central location, and not spread through the code (though maintenance is an issue - the parameters may get out of sync). The reliance on database dialect is solved automatically: if you move databases, you port all your stored procedures, and you are done. And there is also the security benefits.
With the stored procedure option, the alternatives 1 and 2 seem a little bit more suitable. There seems to be not enough entities to warrant option 3 - but it is still possible to separate the procedure call commands from database accessing code.
I've implemented option 3 without stored procedures, and option 2 with stored procedures, and it seems like the latter is more suitable for me (in case anyone is interested with the outcome of the question).
I would say put the sql where you would have put the equivalent LINQ query, or the sql for DataContext.ExecuteQuery. As for where that is... well, that is up to you and depends on how much separation you want. - Marc Gravell, creator on Dapper
See Marc's opinion on the matter
I think the key point is, you shouldn't really be re-using the SQL. If your logic is re-used then it should be wrapped in a method called that can then be called from multiple places.
I know you've accepted your answer already but I still wanted to show you a nice alternative that may be helpful in your case as well. Now or in the future.
When using stored procedures it's wise to use T4
I tend to use stored procedures on my project even though it's not using PetaPoco, Dapper or Massive (project started before these were here). It uses BLToolkit instead. Anyway. Instead of writing my methods to run stored procedures and write code to provide stored procedure parameters, I've written a T4 template that generates the code for me.
Whenever stored procedures change (some may be added/removed, parameters added/removed/renamed/retyped), my code will break on compilation because method calls will not match their signature any more.
I keep my stored procedures in a file (so they get version controlled). If you work in a multi-developer team it may be sensible to have stored procedures each in its own file. It makes updates much less painful. I've experienced that on some project and it worked ok as long as number of SPs is not huge. You can restructure them into folders based on the entity they're related to.
Anyway. Maintenance is related to stored procedures, code change is just a simple click of a button in Visual Studio that converts all T4s at once. You don't have to search your methods that use those procedures. You'll be reported errors while compiling. One thing less to worry about.
So instead of writing
using (var db = new DbManager())
{
return db
.SetSpCommand(
"Person_SaveWithRelations",
db.Parameter("#Name", name),
db.Parameter("#Email", email),
db.Parameter("#Birth", birth),
db.Parameter("#ExternalID", exId))
.ExecuteObject<Person>();
}
and having a bunch of magic strings I can just simply write:
using (var db = new DataManager())
{
return db
.Person
.SaveWithRelations(name, email, birth, exId)
.ExecuteObject<Person>();
}
This is nicer, cleaner breaks on compile and provides intellisense so it's also faster to while developing.
The good thing is that stored procedures may become very complex and may do many things. In my upper example I check some data, insert person record and some related one as well and in the end return the newly inserted Person record. Inserts and updated should usually return data that was added/changed to reflect actual state.

There has to be a better way to do localized database fields

So far there've been several questions regarding this, and they've all come down to the same answer: one table for the language-neutral data, 1-* to a table with the translations and an indexed language ID field.
This has several problems:
Twice as much CRUD.
Need for Ajax CRUD if you want a decently friendly web UI.
More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
Collation differences between languages isn't accommodated.
Queries require joins.
If you want slugs in multiple languages, oh boy.
A lot of database people have worked on all sorts of theoretical and practical problems, but surprisingly few people work on this one.
I think what we need ultimately is:
A field type that'll store multiple versions of strings
Multiple indices for each such field, one for each language or variation, with the option to specify the correct collation mode
A standard ORM object for this crazy thing
UI elements
Overkill? Sure, maybe, but the whole problem is a real nightmare as it is. And it's not exactly an uncommon scenario.
We gotta try to convince server vendors to work on this.
Edit: By the way, this is my first time using the community wiki; hopefully I'm doing it right.
Edit 2: Something about my wording seems to have made people think that I'm attacking the very concept of DBMS. I'm not; I'm simply saying that built-in support for localization is a much-needed feature.
I probably shouldn't have mentioned performance; it's of course completely negligible most of the time. The focus of my concern is on the fact that this really stifles productivity.
I'll provide an example. Suppose I have a very trivial table for a decidedly trivial store:
Products (id, price, description, name, slug)
In EF/MVC, I'd throw this in the ORM designer, maybe encapsulate it in a repository, build a Products controller, and have actions for Index, Details, Create, Update, Edit and Delete. To identify a product in any of the items, I'd simply do a WHERE(slug = #slug). I'd make a view model for the create/edit actions, design the form control, and wire it up straight to the repository. Done and done. To access the details for a product, the user would go to /products/details/product-slug.
But then since the rest of the website is bilingual, I decide to change the products table accordingly.
Products (id, price)
ProductsText (productId, language, description, name, slug)
Hey, that's not so bad. Yeah, not yet. Then you write your relationships and your constraints, and then you write you write out all your properties in the view-model, and then you make a complete CRUD controller for the ProductsText data or use jQuery/Ajax to add create/update/edit buttons on your Products controller, and then you add validation logic to make sure the user enters at least the primary language, and then when you want to read data for the end-user pages you write another query to take join ProductsText.slug and ProductsText.language with Products... I probably missed something, but you get the idea.
The complexity of the program just explodes with boilerplate code once you have localization involved.
Of course, I don't expect the problem to be solved completely, and it's obviously just as much a UI problem as it is a database problem. But there's just so much that could be done to make all this easier. A "multistring" field type might be a really good start.
Edit 3: Anyone ever hear of SQL Server Modeling Services? It has some localization tools in it that could be a step in the right direction. Still CTP though.
-- Simulate the French locale with the SET LANGUAGE statement.
SET LANGUAGE French
select Id, CountryName,
[System.Globalization].[SessionsString](CountryName, 1) as CountryNameString
from [Location].[CountriesTable]
What is a localized database field?
Typically in applications we've worked in, the UI is localized. This is accomplished using a database, and we put all the translations (and potentially the master phrases) in the table with a locale-code and phraseid being the primary key. This is fairly straightforward, requires a single reusable set of stored procs and has good performance and the usage is well-understood. We often allow translation on the fly so that the app interface includes a translation feature where corrections can be made and other users will see them live - either rich forms applications or web forms applications (depending on caching - which is another key feature of UI localization)
As far as querying requiring joins - that's just a fact of life in a normalized relational database, and performance there is usually managed with a good normalized design and proper indexing.
In other "data", it has made little sense to localize except under direction of the application requirements. For instance, even though you may offer a product in multiple countries, the SKU and distributor may be different. This level of localization is very application specific and we often dealt with it as a separate database and there really isn't anything tying those individually country database together - many products were not available although there may have been equivalent products in the other countries.
If you are selling the same products around the world, then you kind of fall into the original scenario in a kind of multi-lingual CMS. This requires significant work besides the low-level database. For instance, if someone corrects the default product description, what flags the translators that the translations need to also be corrected? These questions are non-trivial. Although I can see where database vendors could assist with features, these are intrinsic difficulties of application requirements and design and not necessarily something the database can add features which will universally solve.
The collation issue is indeed a little awkward. Typically data is stored in nvarchar and you would not know the collation you wanted for retrieval at the time you wrote the stored proc, since the locale would be a parameter. This only affects collections retrieved which need to be ordered by content, not usually natural key and certainly not retrieval by key - it's not a large problem, but is one which cannot easily be handled without dynamic SQL (casting using the preferred collation from a table depending upon the location passed in, if you mix data from different locales, you would have to decide if you want to sort by locale first and then it may be difficult to pick a collation which might work properly within all locales in the same result set). You are probably going to want to use a Windows collation with such a wide variety of data.
Similarly with ORMs, we typically treated the composite unique key of locale/phraseid as the key to retrieve objects (we typically also had a surrogate identity primary key) - I know that traditional ORMs don't necessarily like this departure from retrieval by a meaningless surrogate key.
I've encountered all of these issues for localized CRM-style web sites. Not fun to design and optimize, but it can be done. My 2¢ worth:
1. Twice as much CRUD.
This depends on how your CRUD is designed. Any of my stored procedures or functions that can retrieve a possibly-localized field take a locale/culture code parameter. All of these fields are also NVARCHAR to avoid encoding issues.
2. Need for Ajax CRUD if you want a decently friendly web UI.
I suppose so, but this is application-dependent. Should defer to the "internal" CRUD (DRY principle).
3. More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
This also assumes that all content is required in all supported locales, instead of using a fallback mechanism. For example, Microsoft's MSDN content is available in multiple locales, but some is in only one (generally this is US English, the "neutral" locale for Microsoft).
For a CRM-style system, any locale can be used for the initial content as long as the fallback uses that if the neutral content is not available.
4. Collation differences between languages isn't accommodated.
I find that it is easier to put all collation support at the UI/reporting layer. Multilingual-aware tables with collation/locale specified on a row-by-row basis would be a very nice-to-have feature but I wouldn't like to wait for it to become available...
5. Queries require joins.
Yes, definitely makes the query a bit more complicated :-) but no real way around that. Can get even more complicated if locale fallback is included (a "locale specificity" ranking field helps here).
6. If you want slugs in multiple languages, oh boy.
This is the reason that the .NET replacement parameters in the format string were designed to be indexed, not positional (printf(), etc. are positional). An English format may need replacements in 1, 2, 3 order, while the German equivalent uses 3, 1, 2.
To make life easier for localizers, whenever I create a .NET resource bundle I document the parameters including index, data type (including minimum and/or maximum string lengths), and a contextual description - context is important for determining text gender in some locales.
Plurality may also require multiple related resources as some locales need more than just "single" and "plural" (e.g. "0 files", "1 file", "2 files").
The same rules must apply to any localizable column in the database.
Well the answers are not that helpfull so far. I had the same problem on various projects I was doing in the past. And there was never a shortcut nor a solution out of the box that helpped me to solve this problem in a easy way. But your approach is going into the right direction and with a little work on your Data Access Layer you can actualy abstract all the burden that is caused by this requirement.
So for Metadata like Types, Categories, Countries etc. performance is not an issue since the whole stuff can be cached. For freetext entries it is a different story. You most probably can't cache them and they tend to be quite long.
You might already know those pages:
http://www.codeproject.com/KB/aspnet/LocalizedSamplePart2.aspx
http://www.sisulizer.com/online-help/DatabaseLocalization.shtml
Best-practices for localizing a SQL Server (2005/2008) database
In my experience I haven't commonly run into the problem where the data stored in the database has many language-dependent versions of the same text. Typically a developed application will have many language files for all the text that's more or less statically built into the application. Then we see database data for text users enter. While an application may be used by users with many different languages, the situation where users type the same text in multiple languages is not so common. Typically uses of an application will show the UI in their language and then enter and view data in their language.
For example, users of our application in the US vs in Netherlands or Saudi Arabia would see the UI in the language of their choice, but for any given installation, the data they enter will consistently be in their native language.
Obviously this doesn't apply to all cases. CRMs are an example where you would have the same text with multiple translations, like Wikipedia, but I think what I described above is the more common scenario.
"A lot of database people have worked on all sorts of theoretical and practical problems, but surprisingly few people work on this one."
That's because there is nothing to work on, from a theoretical perspective, in your example. The so-called "problems" you mention are, all of them, nothing more than a direct consequence of the fact that you are managing more data.
"Twice as much CRUD."
And why is that a problem ? I know of at least a few systems I built that had a lot more of that than your example.
"Need for Ajax CRUD if you want a decently friendly web UI."
Is that really so ? I don't know, but at any rate how data is handled in the presentation layer, is no concern of the DBMS, and if the programmer thinks it is too difficult/cumbersome, then don't blame the DBMS for that.
More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
And why is that a problem ? If more business rules are stated, more validation is required.
"Collation differences between languages isn't accommodated."
How so ? What is the sense of collating English text with French ? Of English text with Ukrainian or Russian or Chinese ? Or did you mean something else ?
"Queries require joins."
And why is that a problem ?
"If you want slugs in multiple languages, oh boy."
In what context ? For what purpose ?
SELECT language,nllabel FROM ...
NATURAL JOIN (SELECT 'EN' as language UNION SELECT 'FR' as language)
Oh but wait, I forgot ... JOINs are also a problem.
"and it's obviously just as much a UI problem as it is a database problem."
I disagree that it is. When looking at your problem from a database angle, there are two things that might possibly be a small beginning of a solution :
the possibility to do full view updating (both through JOIN and through GROUP, for your case).
the possibility to have attributes of type 'table' inside database tables. You could then have the entire set of applicable localized names-stuff as a sinle attribute in a single row for your product/...
As for full view updating : don't hold your breath. You'll suffocate long before it has arrived.
As for nested tables : they might already exist, if anyone has them Oracle will, I don't really know, but I'm not really confident that this will really make life easier on the UI side of things.
Oh, and BTW : SQL is nowhere near "theoretically pure".

How do I control the definition, presentation, validation and storage of HTML form fields from one place?

I want to be able to define everything about a form field in one place, as opposed to having some info in the DB, some in HTML, some in JavaScript, some in ASP...
Why do I have to worry about possibly changing things in four separate places (or more) when I want to change something about one field?
I.e., I don't want to:
declare the field in the DB
and duplicate some of that info in HTML somewhere
and duplicate some more info in some JavaScript somewhere
and duplicate some more info in some ASP somewhere
Since I'm a developer, I'm ideally looking for a methodology, not a tool or S/W package. (I think!)
Currently, I'm doing this by putting all control information into SQL's extended property "Description" text area. E.g., a required phone number field would have the following SQL declaration:
[home_phone] [varchar](15) NOT NULL
and I put the following "controls" in the Description extended property:
["Home Phone"][phone_text][user_edit][required][allow_na][form_field_size_equals_size][default=""][group="home_address"][rollover="enter only: numbers, dash, parenthesis, space"][explanation="enter <strong>n/a</strong> if you don't have a home phone"]
With my current system, the following HTML is dynamically generated for the Home Phone field:
<div class="div-item" id="item-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<div class="div-item-description" id="item_description-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<span class="rollover-explanation" title="enter only: numbers, dash, parenthesis, space">
<label for="FORM:FIELD:TABLE_HOME:HOME_PHONE" id="item_label-FORM:FIELD:TABLE_HOME:HOME_PHONE">
Home Phone
</label>
</span>
</div>
<div class="div-item-stipulation" id="item_stipulation-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<span class="stipulation-required" id="item_stipulation_span-FORM:FIELD:TABLE_HOME:HOME_PHONE" title="required" >
*
</span>
</div>
<div class="div-item-value" id="item_value-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<div class="individual-forms">
<form class="individual-forms" id="FORM:TABLE_HOME:HOME_PHONE" name="FORM:TABLE_HOME:HOME_PHONE" action="" method="post" enctype="multipart/form-data" onsubmit="return(false);">
<div class="individual-forms-element">
<input
class=""
type="text"
id="FORM:FIELD:TABLE_HOME:HOME_PHONE" name="FORM:FIELD:TABLE_HOME:HOME_PHONE"
size="15" maxlength="15"
value=""
FORM_control="true"
FORM_control_name="Home Phone"
FORM_control_is_required="true"
FORM_control_is_phone_text="true"
>
</div>
</form>
</div>
</div>
<span class="spanExplanation">
enter <strong>n/a</strong> if you don't have a home phone
</span>
</div>
which looks like this (in IE 7):
Client-side JavaScript validation is controlled by the **FORM_control**... parameters, which on error produces explanations and field highlighting. (Unfortunately, custom parameters in HTML elements isn't exactly standards compliant.)
My primary problem is that this method using the Description field has always been cumbersome to use and maintain. The Description property can only be 255 chars, so I have lots of abbreviations. As the system has expanded, the number of controls has also greatly expanded past the original dozen or so. And my code for interpreting all these controls and their abbreviations is just not pretty or efficient. And as I said, custom parameters in HTML elements don't work in FireFox.
Things I'm currently controlling (and want to continue to control) include:
Field description (e.g. "Home Phone Number")
DB table name (e.g., "home_address")
DB field name (e.g., "home_phone")
DB field type/size
DB allow null
Grouping (e.g., this particular field is part of all "Home" fields)
Required/optional
Read-only (for system supplied data)
Size (presented form field size)
Type (e.g., text, numeric, alpha, select, zipcode, phone, street address, name, date, etc)
Accepted input (non-blank; numeric only; no spaces; phone number; reg exp; etc)
Extended explanation (e.g., for phone # "enter n/a if you don't have a home phone")
Roll-over explanation (e.g., for phone # "enter only: numbers, dash, parenthesis, space")
Rows (for select lists -- 1 = drop-down)
Rows/Columns (for textareas)
Error message text
Error indication (how to show which field contains an error, e.g., red background)
Etc...
And to be clear, I'm all for separation of logic and design elements. I do have a separate CSS file which is manually maintained (not part of the generation process).
My server environment is classic (non-.Net) ASP and SQL 2008. I'm pretty good with HTML, CSS, JavaScript, and ASP and I'm comfortable with SQL.
What I imagine that I want is some sort of JSON, XML, etc that is the single source used to generate everything, e.g.:
a SQL script that actually creates the SQL tables
the HTML (with CSS classes and JavaScript client-side validation/function calls)
the ASP (server-side validation)
My current method that does this is dynamic (not compiled) and pretty slow, so I'm probably looking for some sort of "compiler" that generates this stuff once. And I really only have classic ASP, JavaScript or SQL as the available languages for this "compiler".
And while I think I could create this system myself, I'm hoping that other, better developers have already come up with something similar.
Assume this should scale to at least scores of fields. (FYI, my current form has way too many fields on one page, but I'm solving that issue separately.)
Thanks for any help!
Javascript validation is overrated
I think javascript validation is overrated. It was good in the days when a server round-trip could take 10's of seconds but typically now it can take less than 3 seconds. In you factor in an AJAX submission process you can bring that time down to sub-second.
In return for all that effort to slice off a round-trip you have to deal with all the various complexities of cross-browser support, complex debugging, lack of server-side logging and dealing with the case where JS is disabled by the user. In a typical scenario we're talking about a lot of wasted hours and difficult debugging (try asking a typical idiot what browser they use, let alone what version they're using).
The database as a one-stop validator
You said the database isn't a complete validation environment but I think that's no longer true. A modern database like PostgreSQL will allow you to hook up complex validation functions as triggers in pretty much your language of choice and return appropriate error responses to the application.
So if you follow where I'm going here it IS possible to validate in one place, the database, without the historical drawbacks. The process is:
Create a basic HTML form, forget
HTML5 or Javascript validation.
When the form is complete, or as
required, submit it via AJAX (if
available) or standard POST if not.
Pass the UPDATE/INSERT more or less
straight to the DB where your
trigger functions normalise and
validate the data.
Immediately return the result and or
errors (probably via a transaction),
and perform any further server
processing at this stage. If you
decide not to keep the data you
could either delete the new row or
rollback a transaction.
On conclusion return any appropriate
redirection, messages or updates to
the browser via JSON/AJAX or a
reload with the cleaned data.
This may sound slow/inefficient but I think that's ignoring todays realities, namely:
Pretty much everything is broadband now, even wireless.
Processing power is cheaper than
developer time.
These sorts of
updates tend to be limited by the
speed users can fill out forms, you're not going to
hammer your DB in a typical
scenario.
You still have to do the validation somewhere, so why not the DB?
And the benefits are huge:
On a high volume server (like an exchange, twitter, feed, etc) this
process lends itself to API control
via SOAP/AJAX/RSS/whatever since
only a thin layer is need to
transfer data between the API client
and the DB.
No matter what client
language or protocols are used the
validation remains the same.
Even a raw SQL statement gets
validated which can prevent
programming errors, corrupted
imports or 3rd-party feeds from
destroying your data structures.
Triggers are easily toggled if
required. It can often be harder in normal code.
Validation is always consistent.
Validation functions live inside the
database, allowing then to access
indexes and other rows or tables without
connector overhead, data conversion
and network/socket lag.
Validation functions can run in
compiled code, even if your web
server language is dynamic.
The only real drawbacks are:
Difficult to upgrade or migrate to
other database software.
Difficult if your preferred language
isn't supported (However Postgres
supports functions written in C,
PL/pgSQL, Python, TCL, Perl, Java,
R, Ruby, Scheme, Bash and PHP so
unless you're stuck on C#/VB you
should find one you can handle).
Context sensitivity
There are some aspects of your question I wouldn't recommend at all. Primarily where you're trying to tie the presentation of HTML form objects to your data in a single location.
This idea is going to backfire very quickly because you will find that in a typical application the presentation of information is highly sensitive to context - specifically the target audience.
For example on an ordering system you may have data entered by a client that is then accessible to an admin. The clients view of the data may be more limited, it may need a different title and description, it may be preferable to display as checkboxes while a admin gets a more compact view. You may even be presenting the same data to different types of client (retail vs. wholesale).
In short, the presentation of data is typically required to be more fluid than its validation so at some point you should really draw the line - even if that means some repetition.
I've been working on exactly the same problem at my job. I can't stand repeating myself, particularly because I know that when I have to change something months later, I'll never remember all of the scattered redundant pieces. The answer must take into account the following truths:
The database should, as much as is reasonable possible, validate itself. This is basic data integrity; the DB should throw a fit if you try to put invalid data in it.
The database cannot validate itself. It's easy to add constraints for uniqueness, or format, or foreign keys, and technically SQL can go a lot further, but if you're enforcing, say, address/zip code correspondence at the database level, you're going to regret it. Some part of the validation logic must live in the server-side code. In your case and mine, this means the ASP.
If you want client-side validation, this means Javascript.
At this point, we're already talking about validation constraints in three languages, and the impedance mismatch between them may be significant. You can't always factor validation out to one of them. All you can do is keep the logic together as much as possible.
The solution you suggest has one giant advantage – that all of the logic is in one place, together. This advantage is balanced by several drawbacks:
You can't do any validation at all without talking to the database.
In order to get the metadata from the database to your ASP, you have to have special code to interpret your metadata minilanguage. This is far more complex than accepting some degree of redundancy.
Your metadata puts front-end display code in your database. This means that if you want to change the text, you have to edit your database model. This is a rather drastic requirement which ties your database model to your presentational logic. In addition, internationalization is virtually impossible.
Because there are so many translational layers between your metadata and the user, any extension to your metadata space will require the revision of several layers of tightly coupled code.
To try to find a middle ground between your solution and the redundancy it's designed to avoid, I suggest the following:
Put basic validation constraints in the database.
Create a system in ASP for specifying a behavioral data model with arbitrary contents and validation constraints. When you define your model using this syntax, you will duplicate only the bare-bones constraints in the database.
Create a system in ASP to display form fields on the page in HTML. A form field declaration will reference the appropriate data model and additionally include display code such as labels and descriptive text. The HTML generation code can use sensible defaults derived from the data model. The only duplicated data should be the name of the field, which is used as a key to bind a displayed field to the appropriate data model.
Create or find Javascript validation library. Have the aforementioned HTML generation code automatically insert hooks to this library in the generated markup based on the associated data model.
Thus, you have a system where information about a field may be stored in a handful of places, depending on where it is most appropriate, but almost never duplicated. Validation information is declared in the ASP data model. Display information is found only in the on-page field declaration. The field name is used throughout this stack as a key to link them together, and the hierarchy of concerns allows you to override assumptions made on lower levels as needed.
I'm still working on my implementation of this design, but if you're interested, I can post some sample code.
Seems to me that this gos against every principle of separation of logic and design elements. I know that on the bigger projects I worked on, there were actual SDLC requirements that dictated one type of engineer could touch one level of file, while a UI Engineer could touch another and a "code monkey" could only touch a subset of that. Could you imagine the chaos that would ensue in that scenario? The Code monkey would have to get permission from the UI Engineer who in turn would have to coordinate with the engineer who would have to join on a conference call with integration who would have to ping tech support who then would shelve the project until business asked legal....
All kidding aside, I don't think your method is bad.
I do believe in handling things as they were meant to be handled natively, i.e. building a form text field is likely more efficiently handled by html natively than by database calls which then build html via a series of scripts. Your "compiled" method makes me wonder if it would cancel out the benefits of caching of common javascript and css elements in their respective files.
There are frameworks such as Zend, CodeIgniter, and Symfony (on the PHP side) that are getting closer to what you mention via built-in functionality....although they're not there yet. Zend in particular uses programmic features to build, validate and style forms, and once you figure out its nuances it's quite powerful. Perhaps it could serve as a model for your ultimate quest. Although it seems like you're a classic asp guy....and that's not what you're looking for. I digress.
I think this question is out of my scope of knowledge, but I figure it doesn't hurt to try and help.
I'm working on my first PHP site and since the beginning, since I could not predict many factors of the site and only being one person, I decided from the beginning that every design element on every page will be maintainable through one page. This is a learning experience, so I'm not too concerned that there wasn't much planning involved, but some things just grind my gears, like naming conventions, but with my method, I always am able to make site wide changes with ease.
Every page I make is structured like this:
<?php require_once 'constants.php'; ?>
<?php $pageTitle = 'Home Page'; ?>
<?php require_once REF_LAYOUT_TOP; ?>
<h1>Hello!</h1>
<p>World</p>
<?php require_once REF_LAYOUT_BOTTOM; ?>
In the constants, I have constants for just about everything. CSS colors (for consistent layout), directory locations, database connections, links, constants just for certain pages (so I can modify file names and not have them damage anything), and all sorts of things.
The top part contains navigation, error handling JavaScript scripts, any kind of dynamically created content, the navigation, etc.
This way, if I ever want to implement something new, it's implemented every where. I gave jQuery a shot and it required only one link.
Possible Solution
If you are trying to adjust many things from one location, I highly suggest you invest in a little PHP knowledge. Since PHP is just a server script, it's only output is text. In other words, you can insert PHP in JavaScript, HTML, and just about anywhere. This is how you can set up the same text for all sorts of hover pop ups. I don't know if ASP will prevent you from doing this (I have zero knowledge of it).
I figure this is how most sites must be constructed. It has to be ... how else could they maintain hundreds of pages? I think this is the most logical and semantically correct.
I'm not familiar with ASP, so I'll be speaking more generally without knowing how they're implemented.
Generally, a form represents the information needed to create, edit, or delete an entity. So I'd start with an Entity class. In other architectures this is typically called a model (as in Model-View-Controller). The Entity class determines what information it needs, and it takes care of the database queries.
A form could be built from the Entity directly this way. The Entity gives you more direct control, for example, you may have a field in the database for an integer, though the value you really want is between 0 and 255. The Entity can know this more specific constraint, even if the database doesn't.
Next, you could create some sort of form class that would use an entity to generate its interface. It would take care of all the HTML, Javascript, and whatever else you needed.
The entity could have a good variety of types. The representation in the database can be effectively separated. Let's say a post can have many tags. In the database you'd probably keep two tables, one for posts and another for tags. But the entity would represent a post, along with a list of tags, so they're not separate.
The form class can take care of what this looks like, and you just worry about the semantics. For example, if the entity calls for a list of strings, the form could implement that by using Javascript to create an expanding list of text fields, then the form takes care of properly submitting this data to the Entity.
The form would also make a difference in multiple fields working together, or parsing. For example, if the form saw a type that could be null, it would offer an explanation saying "Type n/a if you don't have a phone number" and if it saw that string, correctly return null.
A Type class could be an interface to validate form data. If all the validate() methods on all the types return true, the form is submitted. Each type also takes care of parsing its values (like the "n/a" parsing) so the right thing is submitted.
One point of this is that forms are not analogous to tables. An id field in a table shouldn't show up in a form, and some data may be connected to it in another table, so think of the forms in terms of the "entity" it's modelling, not the table. It's just an adaptor.
I define my schema for each table in XML files. Then I wrote one set of CRUD methods that can operate on any XML schema (passed in as a request paramter). Beside CRUD it can create and drop the table, export the contents to CSV and import a CSV file as well. All I have to do is drop an new schema file in my schema directory and I have full CRUD for this new table. If a field is a FK, a link automatically appears next to the input box during an INSERT or UPDATE that when clicked, pops open a window to look up the foreign key. If the field is a DATE, a link for a pop up calendar appears automatically.
I did this using Java EE and jsp. But I'm sure it could be done with php as well.
<schema>
<tableName>xtblPersonnel</tableName>
<tableTitle>Personnel</tableTitle>
<tableConstraints></tableConstraints>
<!-- COLUMNS ====================================== -->
<column>
<name>PID</name>
<type>VARCHAR2</type>
<size>9</size>
<label>Badge ID</label>
</column>
<column>
<name>PCLASS</name>
<type>VARCHAR2</type>
<size>329</size>
<label>Classification</label>
</column>
<column>
<name>PFOREMAN</name>
<type>VARCHAR2</type>
<size>9</size>
<label>Foreman Badge</label>
</column>
<column>
<name>REGDATE</name>
<type>DATE</type>
<size>10</size>
<label>Registration Date</label>
</column>
<column>
<name>PISEDITOR</name>
<type>VARCHAR2</type>
<size>3</size>
<label>Is Editor?</label>
<help>0=No</help>
<help>1=Yes</help>
</column>
<column>
<name>PHOME</name>
<type>VARCHAR2</type>
<size>9</size>
<label>Home?</label>
</column>
<column>
<name>PNOTE</name>
<type>VARCHAR2</type>
<size>35</size>
<label>Employee Notes</label>
</column>
<!-- Primary Keys ====================================== -->
<!-- The Primary Key type can be timestamp, enter, or a sequence name) -->
<primaryKey>
<name>PID</name>
<type>enter</type>
</primaryKey>
<!-- FOREIGN KEYS ====================================== -->
<!-- The Foreign Key table is the lookup table using the Key to retreive the Label -->
<foreignKey>
<name>PID</name>
<table>phonebook</table>
<key>badge</key>
<label>lname</label>
</foreignKey>
</schema>
Bravo! Your idea is pretty good and the concept is in the right direction, and it already has been done by multiple companies. That was the original "RAD" (Rapid Application Development) concept.
The idea was to keep the attributes of each field in a database, aka "metadata repository" or "data dictionary". This is not only a good idea, but it is a best practice, so all the fields are consistent in type, length, description, etc. The data dictionary should be used not only with the user interface, but also with the database creation. Going a bit further, with this approach you can easily handle multiple locales.
Unfortunately, RAD tools are not that common these days. They are expensive, and in some cases inflexible and restrictive. Programmers love to program, and look with disdain those kind of tools. But, who knows? A new open source project seems to start every day!
Unfortunately your "tool set" is pretty limited, and creating a RAD tool is no trivial task: it involves an unexpected degree of complexity. You probably need to learn .NET, Java, or any other powerful language.
The best approach is to create a tool that, based in your data dictionary stored in a database, generates the ASP or whatever HTML required, so you improve performance. If the dictionary or a form change, you simply run your generator, and voila!, your new page is ready.
You also need to be able to allow "overriding" the dictionary if required. For example, in some cases the word "Telephone" will be too long for certain forms. Furthermore, you also need to have a code generator good enough, so you don't have to manually modify your generated code, and if you need to do it, your tool will be smart enough to remember those changes.
Unfortunately I can't help you more with this. My recommendations are: (1) improve your skills, (2) look for open source projects that do what you need, (3) if willing, help the project, and (4) leave everybody in the dust generating applications faster than anybody else. ;)

Where do ORMs fall through?

I often hear people bashing ORMs for being inflexible and a "leaky abstraction", but you really don't hear why they're problematic. When used properly, what exactly are the faults of ORMs? I'm asking this because I'm working on a PHP orm and I'd like for it to solve problems that a lot of other ORMs fail at, such as lazy loading and the lack of subqueries.
Please be specific with your answers. Show some code or describe a database schema where an ORM struggles. Doesn't matter the language or the ORM.
One of the bigger issues I have noticed with all the ORMs I have used is updating only a few fields without retrieving the object first.
For example, say I have a Project object mapped in my database with the following fields: Id, name, description, owning_user. Say, through ajax, I want to just update the description field. In most ORMs the only way for me to update the database table while only having an Id and description values is to either retrieve the project object from the database, set the description and then send the object back to the database (thus requiring two database operations just for one simple update) or to update it via stored procedures (which is the method I am currently using).
Objects and database records really aren't all that similar. They have typed slots that you can store stuff in, but that's about it. Databases have a completely different notion of identity than programming languages. They can't handle composite objects well, so you have to use additional tables and foreign keys instead. Most have no concept of type inheritance. And the natural way to navigate a network of objects (follow some of the pointers in one object, get another object, and dereference again) is much less efficient when mapped to the database world, because you have to make multiple round trips and retrieve lots of data that you didn't care about.
In other words: the abstraction cannot be made very good in the first place; it isn't the ORM tools that are bad, but the metaphor that they implement. Instead of a perfect isomorphism it is is only a superficial similarity, so the task itself isn't a very good abstraction. (It is still way more useful than having to understand databases intimately, though. The scorn for ORM tools come mostly from DBAs looking down on mere programmers.)
ORMs also can write code that is not efficient. Since database performance is critical to most systems, they can cause problems that could have been avoided if a human being wrote the code (but which might not have been any better if the human in question didn't understand database performance tuning). This is especially true when the querying gets complex.
I think my biggest problem with them though is that by abstracting away the details, junior programmers are getting less understanding of how to write queries which they need to be able to to handle the edge cases and the places where the ORM writes really bad code. It's really hard to learn the advanced stuff when you never had to understand the basics. An ORM in the hands of someone who understands joins and group by and advanced querying is a good thing. In the hands of someone who doesn't understand boolean algebra and joins and a bunch of other basic SQL concepts, it is a very bad thing resulting in very poor design of database and queries.
Relational databases are not objects and shouldn't be treated as such. Trying to make an eagle into a silk purse is generally not successful. Far better to learn what the eagle is good at and why and let the eagle fly than to have a bad purse and a dead eagle.
The way I see it is like this. To use an ORM, you have to usually stack several php functions, and then connect to a database and essentially still run a MySQL query or something similar.
Why all of the abstraction in between code and database? Why can't we just use what we already know? Typically a web dev knows their backend language, their db language (some sort of SQL), and some sort of frontend languages, such as html, css, js, etc...
In essence, we're trying to add a layer of abstraction that includes many functions (and we all know php functions can be slower than assigning a variable). Yes, this is a micro calculation, but still, it adds up.
Not only do we now have several functions to go through, but we also have to learn the way the ORM works, so there's some time wasted there. I thought the whole idea of separation of code was to keep your code separate at all levels. If you're in the LAMP world, just create your query (you should know MySQL) and use the already existing php functionality for prepared statements. DONE!
LAMP WAY:
create query (string);
use mysqli prepared statements and retrieve data into array.
ORM WAY:
run a function that gets the entity
which runs a MySQL query
run another function that adds a conditional
run another function that adds another conditional
run another function that joins
run another function that adds conditionals on the join
run another function that prepares
runs another MySQL query
run another function that fetches the data
runs another MySQL Query
Does anyone else have a problem with the ORM stack? Why are we becoming such lazy developers? Or so creative that we're harming our code? If it ain't broke don't fix it. In turn, fix your dev team to understand the basics of web dev.
ORMs are trying to solve a very complex problem. There are edge cases galore and major design tradeoffs with no clear or obvious solutions. When you optimize an ORM design for situation A, you inherently make it awkward for solving situation B.
There are ORMs that handle lazy loading and subqueries in a "good enough" manner, but it's almost impossible to get from "good enough" to "great".
When designing your ORM, you have to have a pretty good handle on all the possible awkward database designs your ORM will be expected to handle. You have to explicitly make tradeoffs around which situations you are willing to handle awkwardly.
I don't look at ORMs as inflexible or any more leaky than your average complex abstraction. That said, certain ORMs are better than others in those respects.
Good luck reinventing the wheel.

Avoid loading unnecessary data from db into objects (web pages)

Really newbie question coming up. Is there a standard (or good) way to deal with not needing all of the information that a database table contains loaded into every associated object. I'm thinking in the context of web pages where you're only going to use the objects to build a single page rather than an application with longer lived objects.
For example, lets say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need the fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand if you're displaying a specific article you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent articles sidebar).
Some techniques I can think of:
Don't worry about it, just load everything from the database every time.
Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
Use one class but set unused properties to null at creation if that field is not needed and be careful.
Give the objects access to the database so they can load some fields on demand.
Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)
(1) Loading the whole object is, unfortunately what ORMs do, by default. That is why hand tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). But by and large, the ORM projects I've seen resultin a lot of lazy approaches, pulling or updating way more data than needed.
2) Different Models (Entities), depending on operation. I prefer this one. May add more classes to the object domain, but to me, is cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client, and another for internal operations. If you use inheritance, you can do this well. For example CustomerBase -> Customer. CustomerBase might have an ID, name and address. Customer can extend it to add other info, even stuff like passwords. For list operations (list all customers) you can return CustomerBase with a custom query but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize. Most frameworks have whitelists of attributes they will and won't serialize. Use them.
3) Dangerous, special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not for each field (Except for BLOBs).
You have a number of methods to solve your issue.
Use Stored Procedures in your database to remove the rows or columns you don't want. This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or Subsonic. There are many other ORM tools for .NET. Ruby has it built in with Rails. Java uses Hibernate.
Write embedded queries in your website. Don't forget to parametrize them or you will open yourself up to hackers. This option is usually frowned upon because of the mingling of SQL and code. Also, it is the easiest to break.
From you list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Well, unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes would often be called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without actually declaring any new class. Eg, using Linq-2-Sql the expression data.Articles.Select(a => new { a .Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here would usaly be called "proxy objects" and/or their properties reffered to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. Eg. with NHibernate, you can have lazy properties, by simply throwing in lazy=true in your mapping, and proxies are automatically created.
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.