I'm familiar with the definitions of DDL, DML, and DCL as applied to SQL. There are lots of web sites and books that define and explain them. But no one seems to give an authoritative reference.
I'm interested in the origin of these terms. Did SQL invent them? Were they already used historically for other databases? Did some other standard create them, and they were used by SQL? Or did SQL even use them at all in the ISO specifications?
One book indicates that SQL92 included these terms, but I can't find them in the draft available online. (Maybe I'll have to purchase the final SQL92 ISO specification to know for sure.) SQL:1999 switched to a different classification system.
(One reason I'm curious is that, if these were general industry terms not invented by SQL, then it wouldn't necessarily be incorrect to continue using them, complementary to the new SQL classifications.)
Can anyone provide more insight on the origin of these terms, along with authoritative references to any standards or specifications that might have originally defined them?
According to Wikipedia's "Data definition language" entry:
The concept of the data definition language and its name was first
introduced in relation to the Codasyl database model, where the schema
of the database was written in a language syntax describing the
records, fields, and sets of the user data model.[1] Later it was used
to refer to a subset of Structured Query Language (SQL) for declaring
tables, columns, data types and constraints. SQL-92 introduced a
schema manipulation language and schema information tables to query
schemas. These information tables were specified as SQL/Schemata in
SQL:2003. The term DDL is also used in a generic sense to refer to any
formal language for describing data or information structures.
A bit more detail is provided in Wikipedia's "Codasyl" entry:
In October 1969 the DBTG published its first language specifications
for the network database model which became generally known as the
CODASYL Data Model. This specification in fact defined several
separate languages: a data definition language (DDL) to define the
schema of the database, another DDL to create one or more subschemas
defining application views of the database; and a data manipulation
language (DML) defining verbs for embedding in the COBOL programming
language to request and update data in the database. Although the work
was focused on COBOL, the idea of a host-language independent database
was starting to emerge, prompted by IBM's advocacy of PL/I as a COBOL
replacement.
And the "Data Base Task Group" Wikipedia entry says it published a final report 2 years later:
In April 1971, the DBTG published a report containing specifications
of a Data Manipulation Language (DML) and a Data Definition Language
(DDL) for standardization of network database model. The first DBTG
proposals had already been published in 1969. The specification was
subsequently modified and developed in various committees and
published by other reports in 1973 and 1978. The specification is
often referred to as the DBTG database model or the CODASYL database
model. As well as the data model, many basic concepts of database
terminology were introduced by this group, notably the concepts of
schema and subschema.
The above covers DDL and DML. Unfortunately Wikipedia's "Data control language" entry doesn't have much detail at the time of writing and I was unable to find the origins of this term elsewhere. But given the above and the Google Ngram graph shown below, would suspect it came later - possibly some time in the mid-1970s.
And here is another graph showing all three terms that appears to back this up:
Related
For a Client of mine I'm documenting an existing database with a few tables and some queries.
For tables I'm using E-R Diagram to show tables and relationships. I'm doing that with DIA Diagram Editor.
How do I describe and visualize queries? There exist some sort of UML Object to do that?
So far, I've created a table with query name and a description of what it does and an example screenshot of data retrieved by the query itself. I'm doing that in Word.
I don't like the result of this work, there exist something more professional to do that?
I wouldn't install new software just know how database designers have to accomplish that task.
EDIT 1
As #Serg suggested I may use view and diagramm the view as an entity.
If I've understood something like:
What the client needs is to understand for each query where data come from.
EDIT 2
I'm doing something like the following:
Where
driver_tabella_utenti is a TABLE
driver_imp_causali_preparazione is a QUERY
driver_query_riepilogo_prsp is a QUERY
I think that isn't bast practice! How can I visual design sql queries as I do with entities and relationships?
Short summary
I see 2 options based on specification. This is only based on specification taken literally, not any additional research on the web.
Model class providing an operation representing what the SQL query does. Present a behavior describing this operation as a class stereotyped with <> and with linked object of a type OpaqueBehavior with provided values for attributes body (an SQL query) and language ('SQL').
Model class providing an operation representing what the SQL query does. Provide a note linked to the class containing description of respective OpaqueBehavior (body and language)
I've found another solution in the Specification
Model SQL query as action. The action can be depicted as usually in activity diagram (rectangle with rounded corners) and put SQL directly inside of the rectangle. The action needs to be a part of activity diagram (that is a description of a behaviour that utilizes this SQL).
Explanation
You can use a BehavioralFeature (e.g. Operation) to define that there is some SQL available (it sould be some class that exposes this operation) and then you can define a method with describing OpaqueBehavior (method) that contains body (SQL statement) and language ('SQL').
As specification does not provide any information about notation you can specify it either representing a method as an object (this is a specific instance of OpaqueBehavior) or using a note. In both cases it should be linked to a respective object describing the Behavior provided by the SQL statement as such (e.g. activity).
See 13.2.3.3 of UML Sepcification. Below is just the diagram describing this area.
Edit as a result of further research:
UML Specification describes literally your case in section 16.2 (as OpaqueAction). An example figure 16.8 in section 16.2.5.1 is exactly the case I've described as a 3rd possible answer.
Note however (as already mentioned in comments) that UML is not always the best suitable solution. While I strongly recommend modelling the system using UML, the SQL code itself should be a part of textual documentation that should be created together with the UML diagrams. It will benefit from more clarity, possibility to search, possibility to copy-paste the code etc. Also if your query is more than 2-3 short lines it might even hard to notice that it is still a part of an UML diagram.
Is there any recommended/established way to explain a working of SQL query?
We have quite a few complex queries in our project and we need to document it using UML or any other modelling language.
Please note, I am not asking about Data Modelling. It's more about documenting SQL logic i.e how tables are connected, how conditions are affecting the outcome, joins etc.
So, finally:
What are the recommended ways?
Any example?
Good question.
I don't have one answer, it depends on the context - how complex is the actual query, who is the audience (how proficient they are with different notations), if I will be able to present this face-to-face or is it for off-line docs, etc. Some suggestions are:
For various audience, use source-to-target mapping in a form of a spreadsheet table:
Source Column | Target Column | Transformation logic (rules + SQL)
Note that this also covers part of the data model, doesn't have to be very formal (but can be), and you can decide about the proper level of detail (capture only inputs and outputs of the whole data processing or document every step i.e. each subquery).
For other developers (who are proficient with SQL) - write well formatted and documented SQL code, with extensive use of indentation to indicate different nesting levels of sub-sub-sub-queries.
For other developers or DBAs - many mature RDBMS have a functionality of generating an explain plan (or even visual explain plan) - such output is often very helpful and carries additional information (actual execution plan with estimated cost of each step).
For academics ;-) - use relational algebra notation and draw a tree to picture query logic (example: https://people.ok.ubc.ca/rlawrenc/teaching/304/Labs/Lab1/, check Wikipedia for symbols used to represent specific operations).
Sometimes a specific tool is used to build / define data processing (for instance, Informatica or MS SSIS used to build parts of ETL logic or SAS Enterprise Guide which has a graphical interface to represent code executed) and such flow is visualized in the tool itself, however I'm not convinced that it is as expressive as SQL and can show some really complex queries well.
QBE - Query-By-Example invented by Moshé M. Zloof (available in MS Access for instance), is kind of a graphical tool / language used to define a query. But it has its limitations, too (not every query can be represented this way).
I'm having a problem with my students using multi-valued fields in access and getting confused about normalisation as a result.
Here is what I can make out. Given a 1-to-many relationship, e.g.
Articles Comments
-------- --------
artID{PK} commID{PK}
text text
artID{FK}
Access makes it possible to store this information into what appears to be one table, something like
Articles
--------
artID{PK}
text
comment
+ value
"value" referring to multiple comment values for the comment "column", which access actually stores as a separate table. The specifics of how the values are stored - table, its PK and FK - is completely hidden, but it is possible to query the multi-valued field, e.g. in the example above with the query
INSERT INTO article( [comment].Value )
VALUES ('thank you')
WHERE artID = 1;
But the query doesn't quite reveal the underlying structure of the hidden table implementing the multi-valued field.
Given this (disaster, in my view) - my problem is how to help newcomers to database design and normalisation understand what Access is offering them, why it may not be helpful, and that it is not a reason to ignore the basics of the relational model. More specifically:
Are there better ways, besides queries as above, to reveal the structure behind multi-valued fields?
Are there good examples of where the multi-valued field is not good enough, and shows the advantage of normalising explicitly?
Are there straightforward ways to obtain the multi-select visual output of Access multi-values, but based on separate, explicit tables?
Thanks!
I cannot give you advice in using this feature, because I never used it; however, I can give you reasons not to use it.
I want to have full control on what I'm doing. This is not the case for multi-valued fields, therefore I don't use them.
This feature is not expandable. What if you want to add a date field to your comments, for instance?
It is sometimes necessary to upsize an Access (backend) database to a "big" database (SQL Server, Oracle). These Databases don't offer such a feature. It is often the customer who decides which database has to be used. Recently I had to migrate an Access application (frontend) using an Oracle backend to a SQL-Server backend because my client decided to drop his Oracle server. Therefore it is a good idea to restrict yourself to use only common features.
For common tasks like editing lookup tables I created generic forms. My existing solutions will not work with multi-valued fields.
I have a (self-made) tool that synchronizes changes in the structure of the database on my developer’s site with the database on the client’s site. This tool cannot deal with multi-valued fields.
I have tools for the security management that can grant SELECT, INSERT, UPDATE and DELETE rights on tables or revoke them. Again, the management tool does not work with multi-valued fields.
Having a separate table for the comments allows you to quickly inspect all the comments (by opening the table). You cannot do this with multi-valued fields.
You will not see the 1 to n relation between the articles and the comments in a database diagram.
With a separate table you can choose whether you want to cascade deletes to the details table or not. If you don't, you will not be able to delete an article as long as there are comments attached to it. This can be desirable, if you want to protect the comments from being deleted inadvertently.
It is important to realize the difference between physical and logical relationships. Today the whole internet and web services (SOAP) quite much realizes on a data format that is multi-value in nature.
When you represent multi-value data with a relational database (such as Access), then behind the scenes you are using a traditional (and legitimate) relation. I cannot stress that as such, then the use of multi-value columns in Access is in fact a LEGITIMATE relational model.
The fact that table is not exposed does not negate this issue. In fact, if you represent an invoice (master record, and repeating details) as a XML data cube, then we see two things:
1) you can build and represent that invoice with a relational database like Access
2) such a relational data model that is normalized can ALSO be represented as a SINGLE xml string.
3) deleting the XML record (or string) means that cascade delete of the child rows (invoice details) MUST occur.
So while it is true that Multi-Value fields been added to Access to deal with SharePoint, it is MOST important to realize that such data can be mapped to a relational database (if you could not do this, then Access could not consume that XML data using relational database tables as ACCESS CURRENTLY DOES RIGHT NOW).
And with the web such as XML, and SharePoint then the need to consume and manage and utilize such data is not only widespread, but is in fact a basic staple of the internet.
As more and more data becomes of a complex nature, we find the requirement for multi-value data exploding in use. Anyone who used that so called "fad" the internet is thus relying and using data that is in fact VERY OFTEN XML and is multi-value (complex) in nature.
As long as the logical (not physical) relational data model is kept, then use of multi-value columns to represent such data is possible and this is exactly what Access is doing (it is mapping the relational data model to a complex model). Note that the complex (xml) data model does NOT necessary have to be relational in nature. However, if you ARE going to map such data to Access then the complex multi-value model MUST CONFORM TO A RELATIONAL data model.
This is EXACTLY what is occurring in Access.
The fact that such a correct and legitimate math relational model is not exposed is of little issue here. Are we to suggest that because Excel does not expose the binary codes used then users will never learn about computers? Or perhaps we all must program in assembler so we all correctly learn how computers works.
At the end of the day, who cares and why does this matter? The fact that people drive automatic cars today does not toss out the concept that they are using different gears to operate that car. The idea that we shut down all of society because someone is going to drive an automatic car or in this case use complex data would be galactic stupid on our part.
So keep in mind that extensions to SQL do exist in Access to query the multi-value data, but as well pointed out here those underlying tables are not exposed. However, as noted, exposing such tables would STILL REQUIRE one to not change or mess with cascade delete since that feature is required TO MAINTAIN A INTERSECTION OF FEATURES and a CORRECT MATH relational model between the complex data model (xml) and that of using two related tables to represent such data.
In other words, you can use related tables to represent the complex data model IF YOU REMOVE the ability of users to play with the referential integrity options. The RI options MUST remain as set in those hidden tables else such data will not be able to make the trip BACK to the XML or complex data model of which it was consumed from.
As noted, in regards to users being taught how gasoline reacts with oxygen for that of learning to drive a car, or using a word processor and being forced to learn a relational model and expose the underlying tables makes little sense here.
However, the points made here in regards to such tables being exposed are legitimate concerns.
The REAL problem is SQL server and Oracle etc. cannot consume or represent that complex data WHILE ACCESS CAN CONSUME such data.
As noted, the complex data ship has LONG ago sailed! XML, soap, and the basic technologies of the internet are based on this complex data model.
In effect, SQL server, Oracle and most databases cannot that consume this multi-value data represent it without users having to create and model such data in a relational fashion is a BIG shortcoming of SQL server etc.
Access stands alone in this ability to consume this data.
So, for anyone who used a smartphone, iPad or the web, you are using basic technologies that are built around using complex data, something that Access now allows.
It is likely that the rest of the industry will have to follow suit given that more and more data is complex in nature. If the database industry does not change, then the mainstream traditional relational database system will NOT be the resting place of such data.
A trend away from storing data in related tables is occurring at a rapid pace right now and products like SharePoint, or even Google docs is proof of this concept. So Access is only reacting to market pressures and it is likely that other database vendors will have to follow suit or simply give up on being part of the "fad" called the internet.
XML and complex data structures are STAPLE and fact of our industry right now – this is not an issue we all should run away from, but in fact embrace.
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
kallal#msn.com
The technical discussion is interesting. I think the real problem lies in student understanding. Because it is available in Access students will use it, and initially it will probably provide a simple solution to some design problems. The negatives will occur later when they try and use the data. Maybe a simple example demonstrating the problems would persuade some students to avoid using multi-valued fields ? Maybe an example of storing the data in another, more usable format would help ?
Good luck !
Peter Bullard
MS Access does a great job of simplifying database management and abstracting out a lot of complexity. This however makes the learning of dbms concepts a bit difficult. Have you tried using other 'standard' dbms tools like MySQL (or even sqlite). From a learning perspective they may be better.
I know this post is old. But, it's not quite the same as every other post I've seen on this topic. This one has someone making a good case for using Multi Valued Fields...
As someone who is trying who is still trying very hard to get their head around Access, I find the discussion for and against using the Multi Valued Fields incredibly frustrating.
I'm trying to sort through it all, but if everyone is so against them, what is an alternative method? It seems that in every search result I find everyone is either telling you how to use Multi Valued Fields and Controls or telling you how horrible and what a mistake they are. Many people refer to an alternative to them, but nobody says "Here's an example". I'm here to learn about these things. And while I know that this is a simpler concept for a lot of people in these forums, I could really use some examples to take a look at.
I'm at a point where I have to decide which way to go. It would be wonderful to compare examples of using Multi Valued Fields and alternatives and using a control to select multiple values.
Or am I wrong and the functionality of a combobox where you can select multiple items is only available through Access?
I want to address the last of your questions first. There is a way of providing a visual presentation of a parent child relationship. It's called subforms. If you get help about subforms in Access, it will explain the concept.
I have used subforms in a project where I wanted to display the transaction header in a form and the transaction details in a subform. There is nothing to hinder this construct even when the data is stored in two normalized tables.
Of course, this affects the screen, not the database. That's the whole point. Normalization is relevant to storage and retrieval, not to other uses of data.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
This post was edited and submitted for review 12 months ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
(Note: I realize this is close to How do you document your database structure? , but I don't think it's identical.)
I've started work at a place with a database with literally hundreds of tables and views, all with cryptic names with very few vowels, and no documentation. They also don't allow gratuitous changes to the database schema, nor can I touch any database except the test one on my own machine (which gets blown away and recreated regularly), so I can't add comments that would help anybody.
I tried using "Toad" to create an ER diagram, but after leaving it running for 48 hours straight it still hadn't produced anything visible and I needed my computer back. I was talking to some other recent hires and we all suggested that whenever we've puzzled out what a particular table or what some of its columns means, we should update it in the developers wiki.
So what's a good way to do this? Just list tables/views and their columns and fill them in as we go? The basic tools I've got to hand are Toad, Oracle's "SQL Developer", MS Office, and Visio.
In my experience, ER (or UML) diagrams aren't the most useful artifact - with a large number of tables, diagrams (especially reverse engineered ones) are often a big convoluted mess that nobody learns anything from.
For my money, some good human-readable documentation (perhaps supplemented with diagrams of smaller portions of the system) will give you the most mileage. This will include, for each table:
Descriptions of what the table means and how it's functionally used (in the UI, etc.)
Descriptions of what each attribute means, if it isn't obvious
Explanations of the relationships (foreign keys) from this table to others, and vice-versa
Explanations of additional constraints and / or triggers
Additional explanation of major views & procs that touch the table, if they're not well documented already
With all of the above, don't document for the sake of documenting - documentation that restates the obvious just gets in people's way. Instead, focus on the stuff that confused you at first, and spend a few minutes writing really clear, concise explanations. That'll help you think it through, and it'll massively help other developers who run into these tables for the first time.
As others have mentioned, there are a wide variety of tools to help you manage this, like Enterprise Architect, Red Gate SQL Doc, and the built-in tools from various vendors. But while tool support is helpful (and even critical, in bigger databases), doing the hard work of understanding and explaining the conceptual model of the database is the real win. From that perspective, you can even do it in a text file (though doing it in Wiki form would allow several people to collaborate on adding to that documentation incrementally - so, every time someone figures out something, they can add it to the growing body of documentation instantly).
One thing to consider is the COMMENT facility built into the DBMS. If you put comments on all of the tables and all of the columns in the DBMS itself, then your documentation will be inside the database system.
Using the COMMENT facility does not make any changes to the schema itself, it only adds data to the USER_TAB_COMMENTS catalog table.
In our team we came to useful approach to documenting legacy large Oracle and SQL Server databases. We use Dataedo for documenting database schema elements (data dictionary) and creating ERD diagrams. Dataedo comes with documentation repository so all your team can work on documenting and reading recent documentation online. And you don’t need to interfere with database (Oracle comments or SQL Server MS_Description).
First you import schema (all tables, views, stored procedures and functions – with triggers, foreign keys etc.). Then you define logical domains/modules and group all objects (drag & drop) into them to be able to analyze and work on smaller chunks of database. For each module you create an ERD diagram and write top level description. Then, as you discover meaning of tables and views write a short description for each. Do the same for each column. Dataedo enables you to add meaningful title for each object and column – it’s useful if object names are vague or invalid. Pro version enables you to describe foreign keys, unique keys/constraints and triggers – which is useful but not essential to understand a database.
You can access documentation through UI or you can export it to PDF or interactive HTML (the latter is available only in Pro version).
Described here is a continuous process rather than one time job. If your database changes (eg. new columns, views) you should sync your documentation on regular basis (couple clicks with Dataedo).
See sample documentation:
http://dataedo.com/download/Dataedo%20repository.pdf
Some guidelines on documentation process:
Diagrams:
Keep your diagrams small and readable – just include important tables, relations and columns – only the one that have any meaning to understand big picture – primary/business keys, important attributes and relations,
Use different color for key tables in a diagram,
You can have more than one diagram per module,
You can add diagram to description of most important tables/with most relations.
Descriptions:
Don’t document the obvious – don’t write description “Document date” for document.date column. If there’s nothing meaningful to add just leave it blank,
If objects stored in tables have types or statuses it’s good to list them in general description of a table,
Define format that is expected, eg. “mm/dd/yy” for a date that is stored in text field,
List all known/important values an it’s meaning, e.g. for status column could be something like this: “Document status: A – Active, C – Cancelled, D – Deleted”,
If there’s any API to a table – a view that should be used to read data and function/procedures to insert/update data – list it in the description of table,
Describe where does rows/columns’ values come from (procedure, form, interface etc.) ,
Use “[deprecated]” mark (or similar) for columns that should not be used (title column is useful for this, explain which field should be used instead in description field).
We use Enterprise Architect for our DB definitions. We include stored procedures, triggers, and all table definitions defined in UML. The three brilliant features of the program are:
Import UML Diagrams from an ODBC Connection.
Generate SQL Scripts (DDL) for the entire DB at once
Generate Custom Templated Documentation of your DB.
You can edit your class / table definitions within the UML tool, and generate a fully descriptive with pictures included document. The autogenerated document can be in multiple formats including MSWord. We have just less than 100 tables in our schema, and it's quite managable.
I've never been more impressed with any other tool in my 10+ years as a developer. EA supports Oracle, MySQL, SQL Server (multiple versions), PostGreSQL, Interbase, DB2, and Access in one fell swoop. Any time I've had problems, their forums have answered my problems promptly. Highly recommended!!
When DB changes come in, we make then in EA, generate the SQL, and check it into our version control (svn). We use Hudson for building, and it auto-builds the database from scripts when it sees you've modified the checked-in sql.
(Mostly stolen from another answer of mine)
This answer extends Kieveli's above, which I upvoted. If your version of EA supports Object Role Modeling (conceptual design, vs. logical design = ERD), reverse engineer to that and then fill out the model with the expressive richness it gives you.
The cheap and lighter-weight option is to download Visiomodeler for free from MS, and do the same with that.
The ORM (call it ORMDB) is the only tool I've ever found that supports and encourages database design conversations with non-IS stakeholders about BL objects and relationships.
Reality check - on the way to generating your DDL, it passes through a full-stop ERD phase where you can satisfy your questions about whether it does anything screwy. It doesn't. It will probably show you weaknesses in the ERD you designed yourself.
ORMDB is a classic case of the principle that the more conceptual the tool, the smaller the market. Girls just want to have fun, and programmers just want to code.
A wiki solution supports hyperlinks and collaborative editing, but a wiki is only as good as the people who keep it organized and up to date. You need someone to take ownership of the document project, regardless of what tool you use. That person may involve other knowledgeable people to fill in the details, but one person should be responsible for organizing the information.
If you can't use a tool to generate an ERD by reverse engineering, you'll have to design one by hand using TOAD or VISIO.
Any ERD with hundreds of objects is probably useless as a guide for developers, because it'll be unreadable with so many boxes and lines. In a database with so many objects, it's likely that there are "sub-systems" of a few dozen tables and views each. So you should make custom diagrams of these sub-systems, instead of expecting a tool to do it for you.
You can also design a pseudo-ERD, where groups of tables are represented by a single object in one diagram, and that group is expanded in another diagram.
A single ERD or set of ERD's are not sufficient to document a system of this complexity, any more than a class diagram would be adequate to document an OO system. You'll have to write a document, using the ERD's as illustrations. You need text descriptions of the meaning and use of each table, each column, and the relationships between tables (especially where such relationships are implicit instead of represented by referential integrity constraints).
All of this is a lot of work, but it will be worth it. If there's a clear and up-to-date place where the schema is documented, the whole team will benefit from it.
Since you have the luxury of working with fellow developers that are in the same boat, I would suggest asking them what they feel would convey the needed information, most easily. My company has over 100 tables, and my boss gave me an ERD for a specific set tables that all connect. So also, you might want to try breaking 1 massive ERD into a bunch of smaller, manageable, ERDs.
Well, a picture tells a thousand words so I would recommend creating ER diagrams where you can view the relationship between tables at a glance, something that is hard to do with a text-only description.
You don't have to do the whole database in one diagram, break it up into sections. We use Visual Paradigm at work but EA is a good alternative as is ERWIN, and no doubt there are lots of others that are just as good.
If you have the patience, then using html to document the tables and columns makes your documentation easier to access.
If describing your databases to your end users is your primary goal Ooluk Data Dictionary Manager can prove useful. It is a web-based multi-user software that allows you to attach descriptions to tables and columns and allows full text searches on those descriptions. It also allows you to logically group tables using labels and browse tables using those labels. Tables as well as columns can be tagged to find similar data items across your database/databases.
The software allows you to import metadata information such as table name, column name, column data type, foreign keys into its internal repository using an API. Support for JDBC data sources comes built-in and can be extended further as the API source is distributed under ASL 2.0. It is coded to read the COMMENTS/REMARKS from many RDBMSs.You can always manually override the imported information. The information you can store about tables and columns can be extended using custom fields.
The Data Dictionary Manager uses the "data object" and "attribute" terminology instead of table and column because it isn't designed specifically for relational databases.
Notes
If describing technical aspects of your database such as triggers,
indexes, statistics is important this software isn't the best option.
It is however possible to combine a technical solution with this
software using hyperlink custom fields.
The software doesn't produce an ERD
Disclosure: I work at the company that develops this product.
Martin Fowler defines an elegant object model for the scheduling of recurring tasks here, which maps to OO code very nicely. Mapping this to a relational database schema for persistence, however, is tricky.
Can anyone suggest a schema + SQL combination that encapsulates all the functionality he describes, particularly in the image on page 11. Intersects and Unions are fairly obvious - the complexity lies in representing the 'Temporal Expressions', which take variable parameters and must be interpreted differently, and then combining those into a 'Temporal Set'.
To be clear, there are many ways to represent the concept of recurring events in relational databases. What I'd like everyone's input on is how to map this particular model.
Some possible options:
'Meta' tables that define number of, and use of arguments. Ugly, but probably works. However, there is only likely to be a limited number of 'Temporal Expression' forms, so the extreme flexibility this offers is probably too much.
Some form of table inheritance, as supported by Postgres (and presumably, other) RBMS.
Serialising the parameter list and storing the result in a varchar() is not a solution as that method prevents set-based queries :)
I'm afraid this answer will be a lot of references and very little practical code, and it has been a while since I last messed with this, but...
I think the two technologies you want to mix here are 'active databases' and 'temporal databases'.
The first would be useful for evaluating the rules and so on, and the second is useful to store temporal data and evaluate at when a certain record is valid. Both of these are pretty large research areas, but you can do most of the temporal stuff in plain SQL (provided your database has good time support). The active part is harder in SQL, but PostgreSQL at least has rules to help slightly with this. I don't know about the others databases, but most of them has rule/trigger/constraint support that would be able to translate to what you are looking for.
Active databases are databases that can react to changes in the facts that it stores using rules. These rules are specified in implementation specific languages, but for every day discussion Event-Condition-Action rules (ECA Rules) are common. For an introduction to active database systems read the articles The Active Database Management System Manifesto and Active Database Systems. For some more information on ECA rules, check out Logical Events and ECA Rules (the pages are in reverse order o_0) and Events in an Active Object-Oriented Database System.
Events processing is a special case of the rule handling dealing with how to handle composite events and trigger their actions appropriately. An interesting read regarding this is Composite Events for Active Databases: Semantics, Contexts and Detection and Anatomy of a Composite Event Detector. Also see the Complex Event Processing site and the Event Stream Processing and Complex Event Processing wikipedia articles.
Temporal databases can be seen as a database that can understand time, and in particular two specific kinds of time; valid-time and transaction-time. The valid-time of a record is the time period during which that record is valid, and the transaction-time of a record is the time during which it is present in the database. As a good practical introduction I'd recommand the book on how to do temporal databases in SQL: Developing Time-Oriented Database Applications in SQL by Richard T. Snodgrass.
Oterhwise, everything you might possibly want to know about temporal databases can be read in Temporal Database Entries for the Springer Encyclopedia of Database Systems which is a pretty comprehensive document (I would start at the 'Temporal Database' entry), but to get started a bit quicker, check out the Temporal Database Glossary which is rather easier to browse and read, and the site Time Center whose publications part has (or did have...) links to most notable publications in the area.
So, now that you know all about this you see quickly that the image on page 11 can be expressed as a composite event, and can be detected/evaluated as such provided you have implemented the proper required subset of a composite event detector, and the rest could be expressed as a entries in tables with temporal aspects :)
Martin Fowler addresses much of this himself in his Patterns for things that change with time that summarizes many patterns that deals with time.
In the end, I would probably create a database schema for the temporal information and either use the DB rules for the active parts or implement that part in the application (there be dragons though). If you use PostgreSQL, the rule mechanisms are described in The Rule System part of the docs.
Much to read, but if you thoroughly understand all this your professional net worth can go up quite a bit :)
Also, good terms to google are 'temporal database', 'active database', 'ECA Rule'.
SQL is a language for querying sets of data. It doesn't easily support encoding of domain-specific logic operations. In other words, "rule to be evaluated" is not a data type in SQL. That's an object-oriented concept, that both data and logic are components of an object instance.
So I would say the most you could do within the SQL paradigm would be to store 365 rows, corresponding to the days of the year, and a true/false value for whether the respective day satisfies the criteria of the recurring schedule. So you have to use OO logic implementing Fowler's model to make the calculation, and store the resulting 365 rows.
Then when you need to know "is today (or any given date) part of the schedule?" it's very easy to look up the appropriate row and check the true/false column. Storing 365 rows per year is trivial for any database.
This may seem like cheating, but like I said, SQL is about sets of data, not logic.