How do I control the definition, presentation, validation and storage of HTML form fields from one place?

I want to be able to define everything about a form field in one place, as opposed to having some info in the DB, some in HTML, some in JavaScript, some in ASP...
Why do I have to worry about possibly changing things in four separate places (or more) when I want to change something about one field?
I.e., I don't want to:
declare the field in the DB
and duplicate some of that info in HTML somewhere
and duplicate some more info in some JavaScript somewhere
and duplicate some more info in some ASP somewhere
Since I'm a developer, I'm ideally looking for a methodology, not a tool or S/W package. (I think!)
Currently, I'm doing this by putting all control information into SQL's extended property "Description" text area. E.g., a required phone number field would have the following SQL declaration:
[home_phone] [varchar](15) NOT NULL
and I put the following "controls" in the Description extended property:
["Home Phone"][phone_text][user_edit][required][allow_na][form_field_size_equals_size][default=""][group="home_address"][rollover="enter only: numbers, dash, parenthesis, space"][explanation="enter <strong>n/a</strong> if you don't have a home phone"]
With my current system, the following HTML is dynamically generated for the Home Phone field:
<div class="div-item" id="item-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<div class="div-item-description" id="item_description-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<span class="rollover-explanation" title="enter only: numbers, dash, parenthesis, space">
<label for="FORM:FIELD:TABLE_HOME:HOME_PHONE" id="item_label-FORM:FIELD:TABLE_HOME:HOME_PHONE">
Home Phone
</label>
</span>
</div>
<div class="div-item-stipulation" id="item_stipulation-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<span class="stipulation-required" id="item_stipulation_span-FORM:FIELD:TABLE_HOME:HOME_PHONE" title="required" >
*
</span>
</div>
<div class="div-item-value" id="item_value-FORM:FIELD:TABLE_HOME:HOME_PHONE">
<div class="individual-forms">
<form class="individual-forms" id="FORM:TABLE_HOME:HOME_PHONE" name="FORM:TABLE_HOME:HOME_PHONE" action="" method="post" enctype="multipart/form-data" onsubmit="return(false);">
<div class="individual-forms-element">
<input
class=""
type="text"
id="FORM:FIELD:TABLE_HOME:HOME_PHONE" name="FORM:FIELD:TABLE_HOME:HOME_PHONE"
size="15" maxlength="15"
value=""
FORM_control="true"
FORM_control_name="Home Phone"
FORM_control_is_required="true"
FORM_control_is_phone_text="true"
>
</div>
</form>
</div>
</div>
<span class="spanExplanation">
enter <strong>n/a</strong> if you don't have a home phone
</span>
</div>
which, rendered in IE 7, shows the label, a required-field asterisk, the text input, and the explanation text (original screenshot omitted).
Client-side JavaScript validation is controlled by the **FORM_control**... attributes, which on error produce explanations and field highlighting. (Unfortunately, custom attributes on HTML elements aren't exactly standards-compliant.)
My primary problem is that this method using the Description field has always been cumbersome to use and maintain. The Description property can only be 255 chars, so I use lots of abbreviations. As the system has expanded, the number of controls has also grown well past the original dozen or so. And my code for interpreting all these controls and their abbreviations is just not pretty or efficient. And as I said, custom attributes on HTML elements don't work in Firefox.
Things I'm currently controlling (and want to continue to control) include:
Field description (e.g. "Home Phone Number")
DB table name (e.g., "home_address")
DB field name (e.g., "home_phone")
DB field type/size
DB allow null
Grouping (e.g., this particular field is part of all "Home" fields)
Required/optional
Read-only (for system supplied data)
Size (presented form field size)
Type (e.g., text, numeric, alpha, select, zipcode, phone, street address, name, date, etc)
Accepted input (non-blank; numeric only; no spaces; phone number; reg exp; etc)
Extended explanation (e.g., for phone # "enter n/a if you don't have a home phone")
Roll-over explanation (e.g., for phone # "enter only: numbers, dash, parenthesis, space")
Rows (for select lists -- 1 = drop-down)
Rows/Columns (for textareas)
Error message text
Error indication (how to show which field contains an error, e.g., red background)
Etc...
And to be clear, I'm all for separation of logic and design elements. I do have a separate CSS file which is manually maintained (not part of the generation process).
My server environment is classic (non-.Net) ASP and SQL 2008. I'm pretty good with HTML, CSS, JavaScript, and ASP and I'm comfortable with SQL.
What I imagine I want is some sort of JSON, XML, etc. file that is the single source used to generate everything, e.g.:
a SQL script that actually creates the SQL tables
the HTML (with CSS classes and JavaScript client-side validation/function calls)
the ASP (server-side validation)
My current method that does this is dynamic (not compiled) and pretty slow, so I'm probably looking for some sort of "compiler" that generates this stuff once. And I really only have classic ASP, JavaScript or SQL as the available languages for this "compiler".
And while I think I could create this system myself, I'm hoping that other, better developers have already come up with something similar.
Assume this should scale to at least scores of fields. (FYI, my current form has way too many fields on one page, but I'm solving that issue separately.)
Thanks for any help!

Javascript validation is overrated
I think javascript validation is overrated. It was good in the days when a server round-trip could take tens of seconds, but typically now it takes less than 3 seconds. If you factor in an AJAX submission process you can bring that time down to sub-second.
In return for all that effort to slice off a round-trip you have to deal with all the various complexities of cross-browser support, complex debugging, lack of server-side logging and dealing with the case where JS is disabled by the user. In a typical scenario we're talking about a lot of wasted hours and difficult debugging (try asking a typical idiot what browser they use, let alone what version they're using).
The database as a one-stop validator
You said the database isn't a complete validation environment but I think that's no longer true. A modern database like PostgreSQL will allow you to hook up complex validation functions as triggers in pretty much your language of choice and return appropriate error responses to the application.
So if you follow where I'm going here it IS possible to validate in one place, the database, without the historical drawbacks. The process is:
Create a basic HTML form; forget HTML5 or JavaScript validation.
When the form is complete, or as required, submit it via AJAX (if available) or a standard POST if not.
Pass the UPDATE/INSERT more or less straight to the DB, where your trigger functions normalise and validate the data (see the sketch after this list).
Immediately return the result and/or errors (probably via a transaction), and perform any further server processing at this stage. If you decide not to keep the data you could either delete the new row or roll back the transaction.
On conclusion, return any appropriate redirection, messages or updates to the browser via JSON/AJAX or a reload with the cleaned data.
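For instance, a normalise-and-validate trigger for the question's home_phone field might look like this in PostgreSQL (a sketch only; the table name and the 7-15 digit rule are my assumptions):

-- Normalise and validate home_phone before it is written.
CREATE FUNCTION validate_home_phone() RETURNS trigger AS $$
BEGIN
    -- Normalise: strip spaces, parentheses and dashes.
    NEW.home_phone := regexp_replace(NEW.home_phone, '[ ()-]', '', 'g');
    -- Validate: allow 'n/a' or a 7-15 digit number; otherwise raise an
    -- error that the application relays to the browser.
    IF NEW.home_phone <> 'n/a' AND NEW.home_phone !~ '^[0-9]{7,15}$' THEN
        RAISE EXCEPTION 'home_phone must be digits or n/a, got: %', NEW.home_phone;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER home_phone_check
    BEFORE INSERT OR UPDATE ON home_address
    FOR EACH ROW EXECUTE PROCEDURE validate_home_phone();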
This may sound slow/inefficient but I think that's ignoring today's realities, namely:
Pretty much everything is broadband now, even wireless.
Processing power is cheaper than developer time.
These sorts of updates tend to be limited by the speed users can fill out forms; you're not going to hammer your DB in a typical scenario.
You still have to do the validation somewhere, so why not the DB?
And the benefits are huge:
On a high volume server (like an exchange, twitter, feed, etc.) this process lends itself to API control via SOAP/AJAX/RSS/whatever, since only a thin layer is needed to transfer data between the API client and the DB.
No matter what client language or protocols are used, the validation remains the same.
Even a raw SQL statement gets validated, which can prevent programming errors, corrupted imports or 3rd-party feeds from destroying your data structures.
Triggers are easily toggled if required; it can often be harder in normal code.
Validation is always consistent.
Validation functions live inside the database, allowing them to access indexes and other rows or tables without connector overhead, data conversion and network/socket lag.
Validation functions can run in compiled code, even if your web server language is dynamic.
The only real drawbacks are:
Difficult to upgrade or migrate to other database software.
Difficult if your preferred language isn't supported. (However, Postgres supports functions written in C, PL/pgSQL, Python, TCL, Perl, Java, R, Ruby, Scheme, Bash and PHP, so unless you're stuck on C#/VB you should find one you can handle.)
Context sensitivity
There are some aspects of your question I wouldn't recommend at all. Primarily where you're trying to tie the presentation of HTML form objects to your data in a single location.
This idea is going to backfire very quickly because you will find that in a typical application the presentation of information is highly sensitive to context - specifically the target audience.
For example, on an ordering system you may have data entered by a client that is then accessible to an admin. The client's view of the data may be more limited; it may need a different title and description; it may be preferable to display it as checkboxes while an admin gets a more compact view. You may even be presenting the same data to different types of client (retail vs. wholesale).
In short, the presentation of data typically needs to be more fluid than its validation, so at some point you should really draw the line, even if that means some repetition.

I've been working on exactly the same problem at my job. I can't stand repeating myself, particularly because I know that when I have to change something months later, I'll never remember all of the scattered redundant pieces. The answer must take into account the following truths:
The database should, as much as is reasonably possible, validate itself. This is basic data integrity; the DB should throw a fit if you try to put invalid data in it.
The database cannot validate itself. It's easy to add constraints for uniqueness, or format, or foreign keys, and technically SQL can go a lot further, but if you're enforcing, say, address/zip code correspondence at the database level, you're going to regret it. Some part of the validation logic must live in the server-side code. In your case and mine, this means the ASP.
If you want client-side validation, this means Javascript.
At this point, we're already talking about validation constraints in three languages, and the impedance mismatch between them may be significant. You can't always factor validation out to one of them. All you can do is keep the logic together as much as possible.
The solution you suggest has one giant advantage – that all of the logic is in one place, together. This advantage is balanced by several drawbacks:
You can't do any validation at all without talking to the database.
In order to get the metadata from the database to your ASP, you have to have special code to interpret your metadata minilanguage. This is far more complex than accepting some degree of redundancy.
Your metadata puts front-end display code in your database. This means that if you want to change the text, you have to edit your database model. This is a rather drastic requirement which ties your database model to your presentational logic. In addition, internationalization is virtually impossible.
Because there are so many translational layers between your metadata and the user, any extension to your metadata space will require the revision of several layers of tightly coupled code.
To try to find a middle ground between your solution and the redundancy it's designed to avoid, I suggest the following:
Put basic validation constraints in the database (see the sketch after this list).
Create a system in ASP for specifying a behavioral data model with arbitrary contents and validation constraints. When you define your model using this syntax, you will duplicate only the bare-bones constraints in the database.
Create a system in ASP to display form fields on the page in HTML. A form field declaration will reference the appropriate data model and additionally include display code such as labels and descriptive text. The HTML generation code can use sensible defaults derived from the data model. The only duplicated data should be the name of the field, which is used as a key to bind a displayed field to the appropriate data model.
Create or find a JavaScript validation library. Have the aforementioned HTML generation code automatically insert hooks to this library in the generated markup, based on the associated data model.
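For the first step, the duplicated bare-bones constraints might look like this (a T-SQL sketch against the question's home_phone field; the exact rule is an assumption):

-- Bare-bones, database-level integrity only; the richer rules live in the
-- ASP data model and the JavaScript library.
CREATE TABLE home_address (
    home_phone varchar(15) NOT NULL,
    -- Reject any character that is not a digit, dash, parenthesis or
    -- space, unless the whole value is 'n/a'.
    CONSTRAINT chk_home_phone CHECK (
        home_phone = 'n/a' OR home_phone NOT LIKE '%[^0-9() -]%'
    )
);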
Thus, you have a system where information about a field may be stored in a handful of places, depending on where it is most appropriate, but almost never duplicated. Validation information is declared in the ASP data model. Display information is found only in the on-page field declaration. The field name is used throughout this stack as a key to link them together, and the hierarchy of concerns allows you to override assumptions made on lower levels as needed.
I'm still working on my implementation of this design, but if you're interested, I can post some sample code.

Seems to me that this goes against every principle of separation of logic and design elements. I know that on the bigger projects I worked on, there were actual SDLC requirements dictating that one type of engineer could touch one level of file, a UI engineer could touch another, and a "code monkey" could only touch a subset of that. Can you imagine the chaos that would ensue in that scenario? The code monkey would have to get permission from the UI engineer, who in turn would have to coordinate with the engineer, who would have to join a conference call with integration, who would have to ping tech support, who would then shelve the project until business asked legal...
All kidding aside, I don't think your method is bad.
I do believe in handling things as they were meant to be handled natively; i.e., building a form text field is likely handled more efficiently by HTML natively than by database calls which then build the HTML via a series of scripts. Your "compiled" method makes me wonder if it would cancel out the benefits of caching common JavaScript and CSS elements in their respective files.
There are frameworks such as Zend, CodeIgniter, and Symfony (on the PHP side) that are getting closer to what you mention via built-in functionality... although they're not there yet. Zend in particular uses programmatic features to build, validate and style forms, and once you figure out its nuances it's quite powerful. Perhaps it could serve as a model for your ultimate quest. Although it seems like you're a classic ASP guy... and that's not what you're looking for. I digress.

I think this question is out of my scope of knowledge, but I figure it doesn't hurt to try and help.
I'm working on my first PHP site. Since I could not predict many factors of the site, and since I'm only one person, I decided from the beginning that every design element on every page would be maintainable from one place. This is a learning experience, so I'm not too concerned that there wasn't much planning involved; some things still grind my gears, like naming conventions, but with my method I'm always able to make site-wide changes with ease.
Every page I make is structured like this:
<?php require_once 'constants.php'; ?>
<?php $pageTitle = 'Home Page'; ?>
<?php require_once REF_LAYOUT_TOP; ?>
<h1>Hello!</h1>
<p>World</p>
<?php require_once REF_LAYOUT_BOTTOM; ?>
In the constants, I have constants for just about everything. CSS colors (for consistent layout), directory locations, database connections, links, constants just for certain pages (so I can modify file names and not have them damage anything), and all sorts of things.
The top include contains the navigation, error-handling JavaScript, any kind of dynamically created content, etc.
This way, if I ever want to implement something new, it's implemented everywhere. I gave jQuery a shot and it required only one link.
Possible Solution
If you are trying to adjust many things from one location, I highly suggest you invest in a little PHP knowledge. Since PHP is just a server-side script, its only output is text. In other words, you can insert PHP into JavaScript, HTML, and just about anywhere. This is how you can set up the same text for all sorts of hover pop-ups. I don't know if ASP will prevent you from doing this (I have zero knowledge of it).
I figure this is how most sites must be constructed. It has to be ... how else could they maintain hundreds of pages? I think this is the most logical and semantically correct.

I'm not familiar with ASP, so I'll be speaking more generally without knowing how they're implemented.
Generally, a form represents the information needed to create, edit, or delete an entity. So I'd start with an Entity class. In other architectures this is typically called a model (as in Model-View-Controller). The Entity class determines what information it needs, and it takes care of the database queries.
A form could be built from the Entity directly this way. The Entity gives you more direct control, for example, you may have a field in the database for an integer, though the value you really want is between 0 and 255. The Entity can know this more specific constraint, even if the database doesn't.
Next, you could create some sort of form class that would use an entity to generate its interface. It would take care of all the HTML, Javascript, and whatever else you needed.
The entity could have a good variety of types. The representation in the database can be effectively separated. Let's say a post can have many tags. In the database you'd probably keep two tables, one for posts and another for tags. But the entity would represent a post, along with a list of tags, so they're not separate.
The form class can take care of what this looks like, and you just worry about the semantics. For example, if the entity calls for a list of strings, the form could implement that by using Javascript to create an expanding list of text fields, then the form takes care of properly submitting this data to the Entity.
The form would also handle cases where multiple fields work together, or where input needs parsing. For example, if the form saw a type that could be null, it would offer an explanation saying "Type n/a if you don't have a phone number" and, if it saw that string, correctly return null.
A Type class could be an interface to validate form data. If all the validate() methods on all the types return true, the form is submitted. Each type also takes care of parsing its values (like the "n/a" parsing) so the right thing is submitted.
One point of this is that forms are not analogous to tables. An id field in a table shouldn't show up in a form, and some data may be connected to it in another table, so think of each form in terms of the "entity" it's modelling, not the table. It's just an adaptor.

I define my schema for each table in XML files. Then I wrote one set of CRUD methods that can operate on any XML schema (passed in as a request parameter). Besides CRUD, it can create and drop the table, export the contents to CSV and import a CSV file as well. All I have to do is drop a new schema file in my schema directory and I have full CRUD for the new table. If a field is a FK, a link automatically appears next to the input box during an INSERT or UPDATE that, when clicked, pops open a window to look up the foreign key. If the field is a DATE, a link for a pop-up calendar appears automatically.
I did this using Java EE and JSP. But I'm sure it could be done with PHP as well.
<schema>
  <tableName>xtblPersonnel</tableName>
  <tableTitle>Personnel</tableTitle>
  <tableConstraints></tableConstraints>

  <!-- COLUMNS ====================================== -->
  <column>
    <name>PID</name>
    <type>VARCHAR2</type>
    <size>9</size>
    <label>Badge ID</label>
  </column>
  <column>
    <name>PCLASS</name>
    <type>VARCHAR2</type>
    <size>329</size>
    <label>Classification</label>
  </column>
  <column>
    <name>PFOREMAN</name>
    <type>VARCHAR2</type>
    <size>9</size>
    <label>Foreman Badge</label>
  </column>
  <column>
    <name>REGDATE</name>
    <type>DATE</type>
    <size>10</size>
    <label>Registration Date</label>
  </column>
  <column>
    <name>PISEDITOR</name>
    <type>VARCHAR2</type>
    <size>3</size>
    <label>Is Editor?</label>
    <help>0=No</help>
    <help>1=Yes</help>
  </column>
  <column>
    <name>PHOME</name>
    <type>VARCHAR2</type>
    <size>9</size>
    <label>Home?</label>
  </column>
  <column>
    <name>PNOTE</name>
    <type>VARCHAR2</type>
    <size>35</size>
    <label>Employee Notes</label>
  </column>

  <!-- PRIMARY KEYS ====================================== -->
  <!-- The primary key type can be timestamp, enter, or a sequence name. -->
  <primaryKey>
    <name>PID</name>
    <type>enter</type>
  </primaryKey>

  <!-- FOREIGN KEYS ====================================== -->
  <!-- The foreign key table is the lookup table, using the key to retrieve the label. -->
  <foreignKey>
    <name>PID</name>
    <table>phonebook</table>
    <key>badge</key>
    <label>lname</label>
  </foreignKey>
</schema>

Bravo! Your idea is pretty good, the concept is headed in the right direction, and it has already been done by multiple companies. That was the original "RAD" (Rapid Application Development) concept.
The idea was to keep the attributes of each field in a database, aka "metadata repository" or "data dictionary". This is not only a good idea, but it is a best practice, so all the fields are consistent in type, length, description, etc. The data dictionary should be used not only with the user interface, but also with the database creation. Going a bit further, with this approach you can easily handle multiple locales.
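For illustration only (my own sketch, not any particular RAD product's layout), such a data dictionary could be an ordinary table, one row per field:

-- Hypothetical data-dictionary table; the generator reads it to emit
-- the CREATE TABLE scripts, the HTML and the validation hooks.
CREATE TABLE field_dictionary (
    table_name    varchar(128) NOT NULL,
    field_name    varchar(128) NOT NULL,
    label         varchar(100) NOT NULL,  -- e.g. 'Home Phone'
    data_type     varchar(30)  NOT NULL,  -- e.g. 'phone_text'
    display_size  int          NOT NULL,
    is_required   bit          NOT NULL,
    default_value varchar(255) NULL,
    field_group   varchar(50)  NULL,      -- e.g. 'home_address'
    rollover_text varchar(255) NULL,
    explanation   varchar(255) NULL,
    PRIMARY KEY (table_name, field_name)
);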
Unfortunately, RAD tools are not that common these days. They are expensive, and in some cases inflexible and restrictive. Programmers love to program, and look with disdain on those kinds of tools. But, who knows? A new open source project seems to start every day!
Unfortunately your "tool set" is pretty limited, and creating a RAD tool is no trivial task: it involves an unexpected degree of complexity. You probably need to learn .NET, Java, or any other powerful language.
The best approach is to create a tool that, based on your data dictionary stored in a database, generates the ASP or whatever HTML is required, so you improve performance. If the dictionary or a form changes, you simply run your generator, and voilà! Your new page is ready.
You also need to be able to allow "overriding" the dictionary if required. For example, in some cases the word "Telephone" will be too long for certain forms. Furthermore, the code generator needs to be good enough that you don't have to manually modify the generated code; and if you do need to, the tool should be smart enough to remember those changes.
Unfortunately I can't help you more with this. My recommendations are: (1) improve your skills, (2) look for open source projects that do what you need, (3) if willing, help the project, and (4) leave everybody in the dust generating applications faster than anybody else. ;)

Related

Automatic verification of database contents

Background
I have a software component that writes data to a postgres database (into several tables) and I want to write an automatic functional test for this component. I already have a host of unit tests in place that check the subcomponents, but I'd like a test that checks the whole system end-to-end.
For each test run, I use a clean database (actually a completely new, this-test-run-only database). The software component is stable in the sense that given the same input, it will always write the same user data to the database.
The database design is relational, such that most tables contain foreign keys. Obviously, I don't want to check the value of these keys, because I don't want to rely on these keys being generated in a predictable manner by postgres.
Assume that there are no issues regarding user rights on the database, connection issues etc. Also disregard development/production disparities.
I currently use a number of select statements to produce a textual "dump" of the database and compare it to a reference dump (ignoring whitespace and so on), but this seems rather clumsy. Also, this doesn't take into account the relationships between the tables. Extending the current approach to deal with this doesn't strike me as maintainable at all, should the database layout ever change.
My software as well as the testing framework is written in C++, the testing scripts are simple bash scripts. I'm open to use any language to achieve this.
Question
How can I automatically verify the database contents in "the database way"?
Even better would be an approach that doesn't rely on postgres as the backend.
pgTap is a testing framework for PostgreSQL. You can use it to test both the structure and the content of a PostgreSQL database. I've used it on projects that had to meet certain contractual standards for seeded data (data for "lookup" tables like state codes and abbreviations, delivery carriers, user roles, etc.). It has worked well for that purpose.
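A content test in pgTap looks roughly like this (a sketch; the table and seed values are invented for illustration):

-- Requires the pgtap extension to be installed in the test database.
BEGIN;
SELECT plan(1);

-- Compare the actual table contents to the expected seeded rows.
SELECT results_eq(
    'SELECT code, name FROM state_codes ORDER BY code',
    $$VALUES ('AK', 'Alaska'), ('AL', 'Alabama')$$,
    'state_codes contains the expected seed rows'
);

SELECT * FROM finish();
ROLLBACK;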
But I don't yet see a compelling reason to abandon your current method, which is already written and working. Text dumps of single tables are supported by all current SQL dbms, as far as I know. If you move to a different dbms, you'll have to change the name of the dump program and the arguments to it. I can't imagine why you'd need to change the reference file, but I suppose that could happen.
The "database way" is really just to select the data you expect to be in the database, and see if it's really there. That's pretty much what you're doing now, and what pgTap does with perhaps greater flexibility.
To increase maintainability (to reduce duplication), you could generate the INSERT statements from the reference data, or you could generate the reference data from the INSERT statements. I can imagine development environments where that would be a wise thing to do, but I don't know whether yours is one of them.

Is there a Rails convention to persisting lots of query data to the browser?

I have an application that allows the user to drill down through data from a single large table with many columns. It works like this:
There is a list of distinct top-level table values on the screen.
User clicks on it, then the list changes to the distinct next-level values for whatever was clicked on.
User clicks on one of those values, taken to 3rd level values, etc.
There are about 50 attributes they could go through, but it usually ends up only being 3 or 4. But since those 3 or 4 vary among the 50 possible attributes, I have to persist the selections to the browser. Right now I do it in a hideous and bulky hidden form. It works, but it is delicate and suboptimal. In order for it to work, the value of whatever level attribute is on the screen is populated in the appropriate place on the hidden form on the click event, and then a jQuery Ajax POST submits the form. Ugly.
I have also looked at Backbone.js, but I don't want to roll another toolkit into this project while there may be some other simple convention that I'm missing. Is there a standard Rails Way of doing something like this, or just some better way period?
Possible Approaches to Single-Table Drill-Down
If you want to perform column selections from a single table with a large set of columns, there are a few basic approaches you might consider.
Use a client-side JavaScript library to display/hide columns on demand. For example, you might use DataTables to dynamically adjust which columns are displayed based on what's relevant to the last value (or set of values) selected.
You can use a form in your views to pass the relevant column names into the session or the params hash, and inspect those values to decide which columns to render in the view when drilling down to the next level.
Your next server-side request could include a list of columns of interest, and your controller could use those column names to build a custom query using SELECT or #pluck. Such queries often involve tainted objects, so sanitize that input thoroughly and handle with care!
If your database supports views, users could select pre-defined or dynamic views from the next controller action (see the sketch after this list), which may or may not be more performant. It's at least an idea worth pursuing, but you'd have to benchmark it carefully, and make sure you don't end up with SQL injections or an unmanageable number of pre-defined views to maintain.
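For the view-based idea, a pre-defined view might look like this (a sketch; the table and column names are invented):

-- Expose only the columns one drill-down level needs,
-- instead of selecting from the 50-column table directly.
CREATE VIEW drilldown_level_one AS
SELECT DISTINCT region, country, city
FROM big_denormalized_table;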
Some Caveats
There are generally trade-offs between memory and latency when deciding whether to handle this sort of feature client-side or server-side. It's also generally worth revisiting the business logic behind having a huge denormalized table, and investigating whether the problem domain can't be broken down into a more manageable set of RESTful resources.
Another thing to consider is that Rails won't stop you from doing things that violate the basic resource-oriented MVC pattern. From your question, there is an implied assumption that you don't have a canonical representation for each data resource; approaching Rails this way often increases complexity. If that complexity is truly necessary to meet your application's requirements then that's fine, but I'd certainly recommend carefully assessing your fundamental design goals to see if the functional trade-offs and long-term maintenance burdens are worth it.
I've found questions similar to yours on Stack Overflow; there doesn't appear to be an API or style anyone mentions for persisting across requests. The best you can do seems to be storage in classes or some iteration on what you're already doing:
1) Persistence in memory between sessions/requests
2) Coping with request persistence design-wise
3) Using class caching

There has to be a better way to do localized database fields

So far there've been several questions regarding this, and they've all come down to the same answer: one table for the language-neutral data, 1-* to a table with the translations and an indexed language ID field.
This has several problems:
Twice as much CRUD.
Need for Ajax CRUD if you want a decently friendly web UI.
More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
Collation differences between languages aren't accommodated.
Queries require joins.
If you want slugs in multiple languages, oh boy.
A lot of database people have worked on all sorts of theoretical and practical problems, but surprisingly few people work on this one.
I think what we need ultimately is:
A field type that'll store multiple versions of strings
Multiple indices for each such field, one for each language or variation, with the option to specify the correct collation mode
A standard ORM object for this crazy thing
UI elements
Overkill? Sure, maybe, but the whole problem is a real nightmare as it is. And it's not exactly an uncommon scenario.
We gotta try to convince server vendors to work on this.
Edit: By the way, this is my first time using the community wiki; hopefully I'm doing it right.
Edit 2: Something about my wording seems to have made people think that I'm attacking the very concept of DBMS. I'm not; I'm simply saying that built-in support for localization is a much-needed feature.
I probably shouldn't have mentioned performance; it's of course completely negligible most of the time. The focus of my concern is on the fact that this really stifles productivity.
I'll provide an example. Suppose I have a very trivial table for a decidedly trivial store:
Products (id, price, description, name, slug)
In EF/MVC, I'd throw this in the ORM designer, maybe encapsulate it in a repository, build a Products controller, and have actions for Index, Details, Create, Update, Edit and Delete. To identify a product in any of the items, I'd simply do a WHERE(slug = #slug). I'd make a view model for the create/edit actions, design the form control, and wire it up straight to the repository. Done and done. To access the details for a product, the user would go to /products/details/product-slug.
But then since the rest of the website is bilingual, I decide to change the products table accordingly.
Products (id, price)
ProductsText (productId, language, description, name, slug)
Hey, that's not so bad. Yeah, not yet. Then you write your relationships and your constraints, and then you write out all your properties in the view-model, and then you make a complete CRUD controller for the ProductsText data or use jQuery/Ajax to add create/update/edit buttons on your Products controller, and then you add validation logic to make sure the user enters at least the primary language, and then when you want to read data for the end-user pages you write another query to join ProductsText.slug and ProductsText.language with Products (roughly like the sketch below)... I probably missed something, but you get the idea.
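That end-user read query might look roughly like this (a sketch against the Products/ProductsText tables above; parameter names are placeholders):

-- Fetch one product by slug in the requested language,
-- joining the translations back to the language-neutral row.
SELECT p.id, p.price, t.name, t.description
FROM Products p
JOIN ProductsText t ON t.productId = p.id
WHERE t.slug = @slug
  AND t.language = @language;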
The complexity of the program just explodes with boilerplate code once you have localization involved.
Of course, I don't expect the problem to be solved completely, and it's obviously just as much a UI problem as it is a database problem. But there's just so much that could be done to make all this easier. A "multistring" field type might be a really good start.
Edit 3: Anyone ever hear of SQL Server Modeling Services? It has some localization tools in it that could be a step in the right direction. Still CTP though.
-- Simulate the French locale with the SET LANGUAGE statement.
SET LANGUAGE French
SELECT Id, CountryName,
    [System.Globalization].[SessionsString](CountryName, 1) AS CountryNameString
FROM [Location].[CountriesTable]
What is a localized database field?
Typically in applications we've worked in, the UI is localized. This is accomplished using a database, and we put all the translations (and potentially the master phrases) in the table with a locale-code and phraseid being the primary key. This is fairly straightforward, requires a single reusable set of stored procs and has good performance and the usage is well-understood. We often allow translation on the fly so that the app interface includes a translation feature where corrections can be made and other users will see them live - either rich forms applications or web forms applications (depending on caching - which is another key feature of UI localization)
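Sketched as a table, that phrase store might look like this (names are my assumptions; the composite key matches the locale-code/phrase-id scheme described above):

-- UI-translation table keyed by locale + phrase.
CREATE TABLE ui_phrases (
    locale_code varchar(10)   NOT NULL,  -- e.g. 'en-US', 'fr-FR'
    phrase_id   int           NOT NULL,
    phrase_text nvarchar(500) NOT NULL,
    PRIMARY KEY (locale_code, phrase_id)
);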
As far as querying requiring joins - that's just a fact of life in a normalized relational database, and performance there is usually managed with a good normalized design and proper indexing.
In other "data", it has made little sense to localize except under direction of the application requirements. For instance, even though you may offer a product in multiple countries, the SKU and distributor may be different. This level of localization is very application specific and we often dealt with it as a separate database and there really isn't anything tying those individually country database together - many products were not available although there may have been equivalent products in the other countries.
If you are selling the same products around the world, then you kind of fall into the original scenario: a kind of multilingual CMS. This requires significant work beyond the low-level database. For instance, if someone corrects the default product description, what flags the translators that the translations also need to be corrected? These questions are non-trivial. Although I can see where database vendors could assist with features, these are intrinsic difficulties of application requirements and design, not necessarily something a database feature can universally solve.
The collation issue is indeed a little awkward. Typically data is stored in nvarchar, and you would not know the collation you want for retrieval at the time you write the stored proc, since the locale would be a parameter. This only affects result sets that need to be ordered by content, not usually by natural key and certainly not retrieval by key. So it's not a large problem, but it is one that cannot easily be handled without dynamic SQL: you would cast using the preferred collation depending on the locale passed in, and if you mix data from different locales you have to decide whether to sort by locale first, since it may be difficult to pick one collation that works properly across all locales in the same result set. You are probably going to want to use a Windows collation with such a wide variety of data.
Similarly with ORMs, we typically treated the composite unique key of locale/phraseid as the key to retrieve objects (we typically also had a surrogate identity primary key) - I know that traditional ORMs don't necessarily like this departure from retrieval by a meaningless surrogate key.
I've encountered all of these issues for localized CRM-style web sites. Not fun to design and optimize, but it can be done. My 2¢ worth:
1. Twice as much CRUD.
This depends on how your CRUD is designed. Any of my stored procedures or functions that can retrieve a possibly-localized field take a locale/culture code parameter. All of these fields are also NVARCHAR to avoid encoding issues.
2. Need for Ajax CRUD if you want a decently friendly web UI.
I suppose so, but this is application-dependent. Should defer to the "internal" CRUD (DRY principle).
3. More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
This also assumes that all content is required in all supported locales, instead of using a fallback mechanism. For example, Microsoft's MSDN content is available in multiple locales, but some is in only one (generally this is US English, the "neutral" locale for Microsoft).
For a CRM-style system, any locale can be used for the initial content as long as the fallback uses that if the neutral content is not available.
4. Collation differences between languages aren't accommodated.
I find that it is easier to put all collation support at the UI/reporting layer. Multilingual-aware tables with collation/locale specified on a row-by-row basis would be a very nice-to-have feature but I wouldn't like to wait for it to become available...
5. Queries require joins.
Yes, this definitely makes the query a bit more complicated :-) but there's no real way around that. It can get even more complicated if locale fallback is included (a "locale specificity" ranking field helps here; see the sketch below).
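A fallback query along those lines might look like this (a sketch reusing the Products/ProductsText example from the question; the two-level fallback is an assumption):

-- Pick the most locale-specific translation available, falling back
-- to the neutral locale when there is no exact match.
SELECT name, description
FROM (
    SELECT t.name, t.description,
           ROW_NUMBER() OVER (
               PARTITION BY t.productId
               ORDER BY CASE t.language
                            WHEN @locale  THEN 1  -- exact match, e.g. 'fr-CA'
                            WHEN @neutral THEN 2  -- fallback, e.g. 'en-US'
                        END
           ) AS rn
    FROM ProductsText t
    WHERE t.productId = @id
      AND t.language IN (@locale, @neutral)
) ranked
WHERE rn = 1;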
6. If you want slugs in multiple languages, oh boy.
This is the reason that the .NET replacement parameters in the format string were designed to be indexed, not positional (printf(), etc. are positional). An English format may need replacements in 1, 2, 3 order, while the German equivalent uses 3, 1, 2.
To make life easier for localizers, whenever I create a .NET resource bundle I document the parameters including index, data type (including minimum and/or maximum string lengths), and a contextual description - context is important for determining text gender in some locales.
Plurality may also require multiple related resources as some locales need more than just "single" and "plural" (e.g. "0 files", "1 file", "2 files").
The same rules must apply to any localizable column in the database.
Well, the answers are not that helpful so far. I had the same problem on various projects I did in the past, and there was never a shortcut or an out-of-the-box solution that helped me solve this problem in an easy way. But your approach is going in the right direction, and with a little work on your Data Access Layer you can actually abstract away all the burden caused by this requirement.
So for metadata like types, categories, countries, etc., performance is not an issue, since the whole lot can be cached. For free-text entries it is a different story: you most probably can't cache them, and they tend to be quite long.
You might already know those pages:
http://www.codeproject.com/KB/aspnet/LocalizedSamplePart2.aspx
http://www.sisulizer.com/online-help/DatabaseLocalization.shtml
Best-practices for localizing a SQL Server (2005/2008) database
In my experience I haven't commonly run into the problem where the data stored in the database has many language-dependent versions of the same text. Typically a developed application will have many language files for all the text that's more or less statically built into the application; the database then holds the text users enter. While an application may be used by users with many different languages, the situation where users type the same text in multiple languages is not so common. Typically users of an application will see the UI in their language and then enter and view data in their language.
For example, users of our application in the US vs in Netherlands or Saudi Arabia would see the UI in the language of their choice, but for any given installation, the data they enter will consistently be in their native language.
Obviously this doesn't apply to all cases. CRMs are an example where you would have the same text with multiple translations, like Wikipedia, but I think what I described above is the more common scenario.
"A lot of database people have worked on all sorts of theoretical and practical problems, but surprisingly few people work on this one."
That's because there is nothing to work on, from a theoretical perspective, in your example. The so-called "problems" you mention are, all of them, nothing more than a direct consequence of the fact that you are managing more data.
"Twice as much CRUD."
And why is that a problem ? I know of at least a few systems I built that had a lot more of that than your example.
"Need for Ajax CRUD if you want a decently friendly web UI."
Is that really so? I don't know, but at any rate, how data is handled in the presentation layer is no concern of the DBMS, and if the programmer thinks it is too difficult/cumbersome, then don't blame the DBMS for that.
"More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*."
And why is that a problem ? If more business rules are stated, more validation is required.
"Collation differences between languages isn't accommodated."
How so ? What is the sense of collating English text with French ? Of English text with Ukrainian or Russian or Chinese ? Or did you mean something else ?
"Queries require joins."
And why is that a problem ?
"If you want slugs in multiple languages, oh boy."
In what context ? For what purpose ?
SELECT language,nllabel FROM ...
NATURAL JOIN (SELECT 'EN' as language UNION SELECT 'FR' as language)
Oh but wait, I forgot ... JOINs are also a problem.
"and it's obviously just as much a UI problem as it is a database problem."
I disagree that it is. When looking at your problem from a database angle, there are two things that might possibly be a small beginning of a solution:
the possibility to do full view updating (both through JOIN and through GROUP, for your case).
the possibility to have attributes of type 'table' inside database tables. You could then have the entire set of applicable localized-names stuff as a single attribute in a single row for your product/...
As for full view updating: don't hold your breath. You'll suffocate long before it has arrived.
As for nested tables: they might already exist; if anyone has them, Oracle will. I don't really know, but I'm not really confident that this will really make life easier on the UI side of things.
Oh, and BTW : SQL is nowhere near "theoretically pure".

Avoid loading unnecessary data from db into objects (web pages)

Really newbie question coming up. Is there a standard (or good) way to deal with not needing all of the information that a database table contains loaded into every associated object? I'm thinking in the context of web pages where you're only going to use the objects to build a single page rather than an application with longer-lived objects.
For example, let's say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need the fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand, if you're displaying a specific article you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent-articles sidebar).
Some techniques I can think of:
Don't worry about it, just load everything from the database every time.
Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
Use one class but set unused properties to null at creation if that field is not needed and be careful.
Give the objects access to the database so they can load some fields on demand.
Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)
(1) Loading the whole object is, unfortunately, what ORMs do by default. That is why hand-tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). By and large, the ORM projects I've seen result in a lot of lazy approaches, pulling or updating way more data than needed.
2) Different Models (Entities), depending on operation. I prefer this one. May add more classes to the object domain, but to me, is cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client, and another for internal operations. If you use inheritance, you can do this well. For example CustomerBase -> Customer. CustomerBase might have an ID, name and address. Customer can extend it to add other info, even stuff like passwords. For list operations (list all customers) you can return CustomerBase with a custom query but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize. Most frameworks have whitelists of attributes they will and won't serialize. Use them.
3) Dangerous, special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not for each field (Except for BLOBs).
You have a number of methods to solve your issue.
Use stored procedures in your database to strip out the rows or columns you don't want (a sketch follows this list). This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or Subsonic. There are many other ORM tools for .NET. Ruby has it built in with Rails. Java uses Hibernate.
Write embedded queries in your website. Don't forget to parametrize them or you will open yourself up to hackers. This option is usually frowned upon because of the mingling of SQL and code. Also, it is the easiest to break.
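For the stored-procedure option, something like this (a T-SQL sketch using the Article table from the question; the procedure name is invented) keeps the list page from ever loading fullContents:

CREATE PROCEDURE GetArticleSummaries
AS
    -- Return only the columns the article-list page needs;
    -- fullContents never leaves the database.
    SELECT id, title, author, [date], summary
    FROM Article;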
From your list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Well, unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes would often be called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without actually declaring any new class. E.g., using Linq-2-Sql, the expression data.Articles.Select(a => new { a.Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here would usually be called "proxy objects" and/or their properties referred to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. E.g., with NHibernate you can have lazy properties by simply throwing in lazy=true in your mapping, and proxies are created automatically.
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.

How does Virtuemart do EAV without using EAV?

I understand the three basic failures in EAV, namely that it takes a lot of work to reassemble the data. However, I want a database where I can add custom fields. A lot of people say that Virtuemart allows custom fields but without using an EAV database structure. Can someone explain how this can be done or provide links?
I believe they store custom fields in a chunk of XML or YAML or other domain-specific language.
Basically, they use Martin Fowler's Serialized LOB pattern.
This makes it hard to use SQL expressions to query the custom attributes. You have to fetch the whole row back into your application and parse out the custom attributes. But this is no worse than the pain caused by EAV.
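In table form, the Serialized LOB pattern amounts to something like this (a sketch; names are invented):

-- One opaque column carries all the custom attributes, serialized as
-- XML/YAML; the database cannot index or constrain what is inside it.
CREATE TABLE products (
    id            int PRIMARY KEY,
    name          varchar(100) NOT NULL,
    custom_fields text NULL  -- e.g. '<fields><color>red</color></fields>'
);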
See http://web.archive.org/web/20110709125812/http://sankuru.biz/en/blog/8-joomla-configuration-issues/35-the-cck-buzz-content-creation-kit-and-the-eav-problem.html
Virtuemart and CCK
Virtuemart (VM) custom user fields are CCK-style, but do not rely on EAV. Therefore, they are very usable, and useful. I do recommend their use.
VM product types are also CCK-style, but unfortunately do rely on EAV. Therefore, I avoid VM product types like the plague. Instead, I just manually create additional fields in the product record.
The VM attribute system (simple, custom, advanced) is actually too underpowered to be considered CCK grade.
A good improvement to VM would consist in rephrasing the VM product types and attributes to non-EAV CCK-style custom fields (and therefore make them work more like the VM custom user fields).