Database Table Design Issues - sql

I am new to DB Design and I've recently inherited the responsibility of adding some new attributes to an existing design.
Below is a sample of the current table in question:
Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
The new requirements are:
A Submission can be marked as valid or invalid
A Reason must be provided when a Submission is marked as invalid. (So a submission may have an InvalidReason)
Submissions can be associated with one another such that: Multiple valid Submissions can be set as "replacements" for an invalid Submission.
So I've currently taken the easy solution and simply added new attributes directly to the Submission Table so that it looks like this:
NEW Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
IsValid (bit)
InvalidReason (text)
ReplacedSubmissionID (int)
Everything works fine this way, but it just seems a little strange:
Having InvalidReason as a column that will be NULL for the majority of submissions.
Having ReplacedSubmissionID as a column that will be NULL for the majority of submissions.
If I understand normalization right, InvalidReason might be transitively dependent on the IsValid bit.
It just seems like somehow some of these attributes should be extracted to a separate table, but I don't see how to create that design with these requirements.
Is this single table design okay? Anyone have better alternative ideas?

Whether or not you should have a single table design really depends on
1) How you will be querying the data
2) How much data would end up being potentially NULL in the resulting table.
In your case it's probably OK, but again it depends on #1. If you will be querying separately to get information on invalid submissions, you may want to create a separate table that references the Id of invalid submissions and the reason:
New table: InvalidSubmissionInfo
Id (int) (of invalid submissions; will have FK constraint on Submission table)
InvalidReason (string)
Additionally if you will be querying for replaced submissions separately you may want to have a table just for those:
New table: ReplacementSubmissions
Id (int) (of the replacement submissions; will have FK constraint on Submission table)
ReplacedSubmissionId (int) (of what got replaced; will have FK constraint on submission table)
To get the rest of the info you will still have to join with the Submissions table.
All this is to say that you do not need to separate this out into multiple tables. A NULL value costs very little to store (typically just a bit in the row's NULL bitmap). And if you need to query and return an entire Submission record each time, it makes more sense to keep this info in one table.
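A runnable sketch of the split-table layout this answer describes, using SQLite via Python purely for illustration (the table and column names follow the answer; the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Submission (
    ID INTEGER PRIMARY KEY,
    Subject TEXT,
    SubmissionDate TEXT
);
-- A row exists only for invalid submissions, so InvalidReason is never NULL.
CREATE TABLE InvalidSubmissionInfo (
    Id INTEGER PRIMARY KEY REFERENCES Submission(ID),
    InvalidReason TEXT NOT NULL
);
-- One invalid submission can have many replacements.
CREATE TABLE ReplacementSubmissions (
    Id INTEGER PRIMARY KEY REFERENCES Submission(ID),
    ReplacedSubmissionId INTEGER NOT NULL REFERENCES Submission(ID)
);
""")
conn.execute("INSERT INTO Submission VALUES (1, 'original', '2024-01-01')")
conn.execute("INSERT INTO Submission VALUES (2, 'replacement', '2024-01-02')")
conn.execute("INSERT INTO InvalidSubmissionInfo VALUES (1, 'wrong form')")
conn.execute("INSERT INTO ReplacementSubmissions VALUES (2, 1)")

# Join back to Submission to list the replacements of submission 1.
rows = conn.execute("""
SELECT s.Subject
FROM ReplacementSubmissions r
JOIN Submission s ON s.ID = r.Id
WHERE r.ReplacedSubmissionId = 1
""").fetchall()
print(rows)  # [('replacement',)]
```

As the answer notes, getting the rest of the submission's data still requires a join back to the main table.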

Single table design looks good to me and it should work in your case.
If you do not like NULLs, you can give InvalidReason a default value of an empty string and ReplacedSubmissionID a default of 0. Some designers prefer default values over NULLs in database design.
Having an empty string or another default value will make your data look cleaner.
Please remember if you add default values, you might need to change queries to get proper results.
For example, to get submissions which have not been replaced:
Select * from tblSubmission where ReplacedSubmissionID = 0

Don't fear joins. Looking for ways to place everything in a single table is at best a complete waste of time, at worst results in a convoluted, unmaintainable mess.
You are correct about InvalidReason and IsValid. However, you missed SubmissionDate and Submitted.
Whenever you model an entity that will be processed in some way, going through consecutive state changes, those states really should be placed in a separate table. Any information concerning the state change -- date, reason for the change, authorization, etc. -- has a functional dependency on the state rather than on the entity as a whole, so an attempt to make the state information part of the entity tuple will violate 2NF.
The problem this causes is shown in your very question. You already incorporated Submitted and SubmissionDate into the tuple. Now you have another state you want to add. If you had normalized the submission data, you could have simply added another state and moved on.
create table StateDefs(
ID int generated always as identity primary key,
Name varchar( 16 ) not null, -- 'Submitted', 'Processed', 'Rejected', etc.
... -- any other data concerning states
);
create table Submissions(
ID int generated always as identity primary key,
Subject varchar( 128 ) not null,
... -- other data
);
create table SubmissionStates(
SubID int not null references Submissions( ID ),
State int not null references StateDefs( ID ),
StateDate date not null, -- "When" is a reserved word, so use a safer column name
Description varchar( 128 )
);
This shows that a state consists of a date and an open text field to place any other information. That may suit your needs. If different states require different data, you may have to (gasp) create other state tables. Whatever your needs require.
You could insert the first state of a submission into the table and update that record at each state change. But then you lose the history of state changes, and that is useful information. So each state change should insert a new record. Reading the history of a submission is then easy. Reading the current state is a little more difficult.
But not too difficult:
select ss.*
from SubmissionStates ss
where ss.SubID = :SubID
and ss.StateDate = (
select Max( StateDate )
from SubmissionStates
where SubID = ss.SubID
and StateDate <= current_date );
This finds the current row, that is, the row with the most recent date. To find the state that was in effect on a particular date, replace current_date with a variable such as :AsOf and place the date of interest in that variable. Storing the current date in that variable returns the current state, so the same query serves for both current and past data.
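A quick way to sanity-check that "most recent state row" pattern, sketched here in Python with SQLite (the StateDate column name, the pared-down schema, and the sample rows are assumptions for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SubmissionStates (
    SubID     INTEGER NOT NULL,
    State     TEXT    NOT NULL,
    StateDate TEXT    NOT NULL   -- ISO dates compare correctly as text
);
""")
history = [
    (1, 'Submitted', '2024-01-01'),
    (1, 'Processed', '2024-02-01'),
    (1, 'Rejected',  '2024-03-01'),
]
conn.executemany("INSERT INTO SubmissionStates VALUES (?, ?, ?)", history)

# Current state = the row with the most recent date at or before the as-of date.
def state_as_of(sub_id, as_of):
    cur = conn.execute("""
        SELECT ss.State
        FROM SubmissionStates ss
        WHERE ss.SubID = ?
          AND ss.StateDate = (SELECT MAX(StateDate)
                              FROM SubmissionStates
                              WHERE SubID = ss.SubID AND StateDate <= ?)
    """, (sub_id, as_of))
    return cur.fetchone()[0]

print(state_as_of(1, '2024-02-15'))  # Processed
print(state_as_of(1, '2024-12-31'))  # Rejected
```

Passing an earlier as-of date returns the state in effect then, which is exactly the history-preserving benefit the answer describes.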

Related

How do you constrain a composite key that has a large number of non-unique combinations?

So given a table structure that looks something like this:
Order_date DATE
Order_id NUMBER
State VARCHAR2(16)
...
other properties/attributes
Keep in mind that I could use a sequence of integers here and generate a PK; however, that does not interest me because of how I use this table in the main application.
So the composite key is made of Order_date, Order_id and State. The problem with this combination is that it is not necessarily unique, but it is constrained in a way.
Ex:
Order_date | Order_id | State
21-09-2014 | 7218821  | Pending
22-09-2014 | 2771272  | Pending
20-09-2014 | 3277127  | Approved
13-08-2014 | 2218765  | Done
13-08-2014 | 2218765  | Cancelled
Constraints:
The same combination of order_date and order_id with state Done must never be duplicated in this table
There can be any number of rows with the same order_date and order_id in any state other than Done
You cannot add a record with state DONE or ERROR
You cannot skip from one state to another by bypassing their natural sequence (REGISTERED -> PENDING -> APPROVED -> DONE | CANCELLED | ERROR)
What would be the best way for me to implement these constraints in an Oracle database?
The first is handled by a primary key or unique key.
The second is tricky. It can be handled with a function-based unique index, because Oracle does not index rows in which all indexed expressions are NULL:
create unique index unq_order_date_id_done on
orders( case when state = 'DONE' then order_date end,
        case when state = 'DONE' then order_id end );
I think the third and fourth need a trigger to prevent the value from being added.
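SQLite has no function-based indexes, but its partial unique index achieves the same "unique only for DONE rows" effect, which makes the idea easy to verify. A sketch (the table is pared down to the three relevant columns, and the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_date TEXT    NOT NULL,
    order_id   INTEGER NOT NULL,
    state      TEXT    NOT NULL
);
-- Same effect as the Oracle function-based index above: uniqueness
-- is only enforced for rows whose state is DONE.
CREATE UNIQUE INDEX unq_order_done
    ON orders(order_date, order_id) WHERE state = 'DONE';
""")
conn.execute("INSERT INTO orders VALUES ('2014-08-13', 2218765, 'Pending')")
conn.execute("INSERT INTO orders VALUES ('2014-08-13', 2218765, 'Pending')")  # duplicates fine
conn.execute("INSERT INTO orders VALUES ('2014-08-13', 2218765, 'DONE')")
try:
    conn.execute("INSERT INTO orders VALUES ('2014-08-13', 2218765, 'DONE')")
    duplicate_done_allowed = True
except sqlite3.IntegrityError:
    duplicate_done_allowed = False
print(duplicate_done_allowed)  # False
```

PostgreSQL supports the same `WHERE` clause on `CREATE UNIQUE INDEX`; in Oracle the function-based form shown above is the equivalent.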
Bullet by bullet:
This is most likely true with no monitoring needed. Although you don't show it, the DATE field contains the time down to the second. In order to have a duplicate, the state for the same order will have to be changed twice within the same second.
Doubtful. Unless your processing allows for multiple state changes for the same order within a second of each other.
Your example data shows a state of DONE. How did that get there?
Your description states that after APPROVED, the only allowed states are DONE or CANCELED or ERROR. Your example data shows an order going from DONE to CANCELED. This does not seem to be allowed. Actually, your second bullet suggests that a status of ERROR is not allowed under any circumstances.
The only way you can have duplicated (order, date) values is if status changes occur very quickly -- within the same second. OR...you truncate the time values from the date fields. This doesn't seem likely as there is no reason to be discarding such valuable information as the time a status change was recorded. You get no benefit and processing becomes more difficult: lose/lose.

Is it better to have int joins instead of string columns?

Let's say I have a User which has a status and the user's status can be 'active', 'suspended' or 'inactive'.
Now, when creating the database, I was wondering... would it be better to have a column with the string value (with an enum type, or a rule applied) so it's easier both to query and to read the current user status, or are joins better, in which case I should join to a UserStatuses table which contains the possible user statuses?
Assuming, of course, that statuses cannot be created by the application user.
Edit: Some clarification
I would NOT use string joins; it would be an int join to the UserStatuses PK
My primary concern is performance wise
The possible status ARE STATIC and will NEVER change
On most systems it makes little or no difference to performance. Personally I'd use a short string for clarity and join that to a table with more detail as you suggest.
create table intLookup
(
pk integer primary key,
value varchar(20) not null
)
insert into intLookup (pk, value) values
(1,'value 1'),
(2,'value 2'),
(3,'value 3'),
(4,'value 4')
create table stringLookup
(
pk varchar(4) primary key,
value varchar(20) not null
)
insert into stringLookup (pk, value) values
('1','value 1'),
('2','value 2'),
('3','value 3'),
('4','value 4')
create table masterData
(
stuff varchar(50),
fkInt integer references intLookup(pk),
fkString varchar(4) references stringLookup(pk)
)
create index i on masterData(fkInt)
create index s on masterData(fkString)
insert into masterData
(stuff, fkInt, fkString)
select COLUMN_NAME, (ORDINAL_POSITION %4)+1,(ORDINAL_POSITION %4)+1 from INFORMATION_SCHEMA.COLUMNS
go 1000
This results in 300K rows.
select
*
from masterData m inner join intLookup i on m.fkInt=i.pk
select
*
from masterData m inner join stringLookup s on m.fkString=s.pk
On my system (SQL Server)
- the query plans, I/O and CPU are identical
- execution times are identical.
- The lookup table is read and processed once (in either query)
There is NO difference using an int or a string.
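A reduced-scale re-run of that experiment in SQLite via Python (a simple row generator replaces the INFORMATION_SCHEMA trick, and the row count is arbitrary) shows the two joins return identical results; timings, as the answer notes, were identical on SQL Server as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE intLookup    (pk INTEGER PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE stringLookup (pk TEXT    PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE masterData   (stuff TEXT, fkInt INTEGER, fkString TEXT);
""")
for i in range(1, 5):
    conn.execute("INSERT INTO intLookup VALUES (?, ?)", (i, f"value {i}"))
    conn.execute("INSERT INTO stringLookup VALUES (?, ?)", (str(i), f"value {i}"))
for n in range(300):
    k = n % 4 + 1
    conn.execute("INSERT INTO masterData VALUES (?, ?, ?)", (f"row {n}", k, str(k)))

by_int = conn.execute("""
    SELECT m.stuff, i.value FROM masterData m
    JOIN intLookup i ON m.fkInt = i.pk ORDER BY m.stuff""").fetchall()
by_string = conn.execute("""
    SELECT m.stuff, s.value FROM masterData m
    JOIN stringLookup s ON m.fkString = s.pk ORDER BY m.stuff""").fetchall()
print(by_int == by_string)  # True
```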
I think, as a whole, everyone has hit on important components of the answer to your question. However, they all have good points which should be taken together, rather than separately.
As logixologist mentioned, a healthy amount of normalization is generally considered to increase performance. In contrast to logixologist, however, I think your situation is a perfect case for normalization. In this case, using a numeric key as Santhosh suggested, which then points back to a code table containing the decodes for the statuses, will result in less data being stored per record. This difference wouldn't show in a small Access database, but it would likely show in a table with millions of records, each with a status.
As David Aldridge suggested, you might find that normalizing this particular data point will result in a more controlled end-user experience. Normalizing the status field will also allow you to edit the status flag at a later date in one location and have that change perpetuated throughout the database. If your boss is like mine, then you might have to change the Status of Inactive to Closed (and then back again next week!), which would be more work if the status field was not normalized. By normalizing, it's also easier to enforce referential integrity. If a status key is not in the Status code table, then it can't be added to your main table.
If you're concerned about the performance when querying in the future, then there are some different things to consider. To pull back status, if it's normalized, you'll be adding a join to your query. That join will probably not hurt you in any sized recordset but I believe it will help in larger recordsets by limiting the amount of raw text that must be handled. If your primary concern is performance when querying the data, here's a great resource on how to optimize queries: http://www.sql-server-performance.com/2007/t-sql-where/ and I think you'll find that a lot of the rules discussed here will also apply to any inclusion criteria you enforce in the join itself.
Hope this helps!
Christopher
The whole idea behind normalization is to keep the data from repeating (well, at least that is one of its goals).
In this case a user can have only one status at a time (I assume), so there is no reason to put it in its own table. You would simply complicate things. The only reason you would have a separate table is if for some reason these statuses were not static, meaning next month you may add "Sort of Active" and "Maybe Inactive". That would mean changing code to account for them if you didn't put them in their own table. If you wanted a maintenance page where users could add statuses, then that too would require a separate table.
An issue to consider is whether these status values have attributes of their own.
For example, perhaps you would want to have a default sort order that is different from the alphabetical order of the status text. You might also want to treat two of the statuses in a particular way that you do not treat the other, and that could be an attribute.
If you have a need for that, or suspect a future need for that, then move the status text to a different table and use an integer key value for them.
I would suggest using integer values like 0, 1, 2 if the set is fixed. When interpreting the results in reports we can map these statuses back to strings.

Multiple Wildcard Counts in Same Query

One of my job functions is being responsible for mining and marketing on a large newsletter subscription database. Each one of my newsletters has four columns (newsletter_status, newsletter_datejoined, newsletter_dateunsub, and newsletter_unsubmid).
In addition to these columns, I also have a master unsub column that our customer service dept. can update to accommodate irate subscribers who wish to be removed from all our mailings, and another column, called emailaddress_status, that gets updated if a hard bounce (or a set number of soft bounces) occurs.
When I pull a count for current valid subscribers for one list I use the following syntax:
select count (*) from subscriber_db
WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL)
AND newsletter_status = 'Y'
and unsub = 'N' and newsletter_datejoined >= '2013-01-01';
What I'd like to have is one query that looks for all columns with %_status, with the aforementioned criteria ordered by current count size.
I'd like for it to look like this:
etc.
I've searched around the web for months looking for something similar, but other than running them in a terminal and exporting the results I've not been able to successfully get them all in one query.
I'm running PostgreSQL 9.2.3.
A proper test case would be each aggregate total matching the counts I get when running the individual queries.
Here's my obfuscated table definition for ordinal placement, column_type, char_limit, and is_nullable.
Your schema is absolutely horrifying:
24 ***_status text YES
25 ***_status text YES
26 ***_status text YES
27 ***_status text YES
28 ***_status text YES
29 ***_status text YES
where I presume the masked *** is something like the name of a publication/newsletter/etc.
You need to read about data normalization or you're going to have a problem that keeps on growing until you hit PostgreSQL's row-size limit.
Since each item of interest is in a different column the only way to solve this with your existing schema is to write dynamic SQL using PL/PgSQL's EXECUTE format(...) USING .... You might consider this as an interim option only, but it's a bit like using a pile driver to jam the square peg into the round hole because a hammer wasn't big enough.
There are no column name wildcards in SQL, like *_status or %_status. Columns are a fixed component of the row, with different types and meanings. Whenever you find yourself wishing for something like this it's a sign that your design needs to be re-thought.
I'm not going to write an example since (a) this is an email marketing company and (b) the "obfuscated" schema is completely unusable for any kind of testing without lots of manual work re-writing it. (In future, please provide CREATE TABLE and INSERT statements for your dummy data, or better yet, a http://sqlfiddle.com/). You'll find lots of examples of dynamic SQL in PL/PgSQL - and warnings about how to avoid the resulting SQL injection risks by proper use of format - with a quick search of Stack Overflow. I've written a bunch in the past.
Please, for your sanity and the sanity of whoever else needs to work on this system, normalize your schema.
You can create a view over the normalized tables to present the old structure, giving you time to adapt your applications. With a bit more work you can even define a DO INSTEAD view trigger (newer Pg versions) or RULE (older Pg versions) to make the view updateable and insertable, so your app can't even tell that anything has changed - though this comes at a performance cost so it's better to adapt the app if possible.
Start with something like this:
CREATE TABLE subscriber (
id serial primary key,
email_address text not null,
-- please read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
-- for why I merged "fname" and "lname" into one field:
realname text,
-- Store birth month/year as a "date" with a "CHECK" constraint forcing it to be the 1st day
-- of the month. Much easier to work with.
birthmonth date,
CONSTRAINT birthmonth_must_be_day_1 CHECK ( extract(day from birthmonth) = 1),
postcode text,
-- Congratulations! You made "gender" a "text" field to start with, you avoided
-- one of the most common mistakes in schema design, the boolean/binary gender
-- field!
gender text,
-- What's MSO? Should have a COMMENT ON...
mso text,
source text,
-- Maintain these with a trigger. If you want modified to update when any child record
-- changes you can do that with triggers on subscription and reducedfreq_subscription.
created_on timestamp not null default current_timestamp,
last_modified timestamp not null,
-- Use the native PostgreSQL UUID type, after running CREATE EXTENSION "uuid-ossp";
uuid uuid not null,
uuid2 uuid not null,
brand text,
-- etc etc
);
CREATE TABLE reducedfreq_subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
-- Suspect this was just a boolean stored as text in your schema, in which case
-- delete it.
reducedfreqsub text,
reducedfreqpref text,
-- plural, might be a comma list? Should be in sub-table ("join table")
-- if so, but without sample data can only guess.
reducedfreqtopics text,
-- date can be NOT NULL since the row won't exist unless they joined
reducedfreq_datejoined date not null,
reducedfreq_dateunsub date
);
CREATE TABLE subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
sub_name text not null,
status text not null,
datejoined date not null,
dateunsub date
);
CREATE TABLE subscriber_activity (
subscriber_id integer not null references subscriber(id),
last_click timestamptz,
last_open timestamptz,
last_hardbounce timestamptz,
last_softbounce timestamptz,
last_successful_mailing timestamptz
);
To call it merely "horrifying" shows a great deal of tact and kindness on your part. Thank You. :) I inherited this schema only recently (which was originally created by the folks at StrongMail).
I have a full relational DB re-arch project on my roadmap this year - the sample normalization is very much in line with what I'd been working on. Very interesting insight on realname; I hadn't really thought about that. I suppose the only reason StrongMail had it broken out was for first-name email personalization.
MSO is multiple systems operator (cable company). We're a large lifestyle media company, and the newsletters we produce are on food, travel, homes and gardening.
I'm creating a Fiddle for this - I'm new here so going forward I'll be more mindful of what you guys need to be able to help. Thank you!

Flags in database - Sensible way of keeping track

I am trying to design my first database and I have found that I have quite a few different "flags" that I want to keep track of in the database:
Active # Shows whether the item submission has been completed
Indexed # Shows whether the item has been indexed
Reminded # Shows whether the “expiring email” has been sent to the user
Error # Shows whether there is an error with the submission
Confirmation # Shows whether the confirmation email has been sent
Other than just having a Boolean field for these, is there a clever way of storing these details? I was wondering about having these under a status group in the database with an ID for every combination (32) and just linking to that.
Unless there is some reason to do otherwise, I'd recommend simply adding those five boolean (or bit) columns to the item table.
It depends on how immutable the list of flags is.
If there are just the five you mentioned, then just add five flag columns. If the list of possible flags could change in the future, it might be safer to have a separate table listing the flags that currently apply to each row in the main table, with a one-to-many relationship.
Consider:
Table: Vehicle
ID
Type
Doors
Color
Table: Type_Categories
ID
Name
Table: Types
TypeID
CategoryID
Value
DataType
This way reuse of a type can occur in other places as needed. However, this assumes non-boolean "flags". If all flags are truly boolean, I'd stick with putting them in the table. But I have always disliked boolean values: I prefer timestamps, so I know when the flag was set, not just that it was set. If the timestamp is null, the flag has not been set.
In my experience the status columns frequently evolve to more than two states. So I would use a smallint for each status for convenience and simplicity.
But if your aim is to save space then you can save all the statuses in a single smallint using casts to and from bit to manipulate the statuses individually or as a whole.
create table t (status smallint);
To save 10010, cast it through int to smallint:
insert into t (status) values (b'10010'::int::smallint);
List all statuses:
select status::int::bit(5) from t;
status
--------
10010
To set the 3rd status use the bitwise or:
update t set status = (status::integer::bit(5) | b'00100')::integer::smallint;
select status::int::bit(5) from t;
status
--------
10110
To unset that status use the bitwise and:
update t set status = (status::integer::bit(5) & b'11011')::integer::smallint;
select status::int::bit(5) from t;
status
--------
10010
To retrieve the lines with the 3rd status set:
select status
from t
where substring(status::integer::bit(5) from 3 for 1) = '1'
You could write functions to simplify the conversions.
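The same bit arithmetic the SQL above performs can be mirrored in plain Python, which makes the masks easy to check (the helper names and the left-to-right position numbering, matching the answer's "3rd status" example, are my own):

```python
WIDTH = 5  # number of status bits, as in the answer's bit(5)

def set_flag(status: int, pos: int) -> int:
    """Set the pos-th bit, counted from the left of the WIDTH-bit string."""
    return status | (1 << (WIDTH - pos))

def unset_flag(status: int, pos: int) -> int:
    """Clear the pos-th bit (the bitwise-AND-with-complement trick)."""
    return status & ~(1 << (WIDTH - pos))

def has_flag(status: int, pos: int) -> bool:
    """Test the pos-th bit, like the SUBSTRING(...) = '1' query."""
    return bool(status & (1 << (WIDTH - pos)))

status = 0b10010                 # the answer's starting value
status = set_flag(status, 3)
print(format(status, '05b'))     # 10110
status = unset_flag(status, 3)
print(format(status, '05b'))     # 10010
print(has_flag(status, 1))       # True
```

Wrapping the PostgreSQL casts in SQL functions would give you the same three operations server-side.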
If they are "just" flags, store them as boolean-type columns on the table.
I'd recommend against Clodoaldo's solution unless space is REALLY tight - see this question.
It looks like the columns you mention have "business importance" - i.e. it may not be enough to store "Indexed", but also the date on which the item was indexed. It may be necessary to limit combinations of states, or impose rules on the sequencing (you can't go to complete whilst being in error state). In that case, you may want to implement an "item_status" table to store history etc.
In this case, your schema would be something like this:
ITEM
---------
item_id
....
STATUS
---------
status_id
description
ITEM_STATUS
--------------
item_id
status_id
date
Every time an item changes status, you insert a new row into the ITEM_STATUS table; the current status is the row with the latest date for that item.
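A minimal sketch of that "latest row wins" lookup, in Python with SQLite (the status names and dates are invented; this uses ORDER BY ... LIMIT 1 rather than a MAX() subquery, but either works):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STATUS      (status_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE ITEM_STATUS (item_id INTEGER, status_id INTEGER, date TEXT);
""")
conn.executemany("INSERT INTO STATUS VALUES (?, ?)",
                 [(1, 'Active'), (2, 'Indexed'), (3, 'Error')])
conn.executemany("INSERT INTO ITEM_STATUS VALUES (?, ?, ?)",
                 [(7, 1, '2024-01-01'), (7, 3, '2024-01-03'), (7, 2, '2024-01-05')])

# Current status = the row with the latest date for that item.
current = conn.execute("""
    SELECT s.description
    FROM ITEM_STATUS i JOIN STATUS s ON s.status_id = i.status_id
    WHERE i.item_id = 7
    ORDER BY i.date DESC
    LIMIT 1""").fetchone()[0]
print(current)  # Indexed
```

The full ITEM_STATUS table doubles as the audit history the answer mentions.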

Dynamically generate criteria in SQL

I have a Users table that contains dozens of columns like date of birth, year of vehicle owned, make and model of the vehicle, color, and many other personal fields unrelated to the vehicle.
There's also a 2nd table called Coupons that needs to be designed in a way to support a qualification like "user qualifies if younger than 30 yrs old", "user qualifies if vehicle is greater than 10 yrs old", "user qualifies if vehicle color is green".
When a user logs in, I need to present all coupons the user qualifies for. The problem that I'm having is that the coupon qualifications could be numerous, could have qualifiers like equal, greater than or less than and may have different combinations.
My only solution at this point is to store the actual sql string within one of the coupons table columns like
select * from Users where UserId = SOME_PLACEHOLDER and VehicleYear < 10
Then I could execute the sql for each coupon row and return true or false. Seems very inefficient as I would potentially have to execute 1000s of sql statements for each coupon code.
Any insight, help is appreciated. I do have server-side code where I could potentially be able to do looping.
Thank you.
Very difficult problem. It seems like users will be added at high volume, with coupons at a fairly regular frequency.
Adding SQL to a table to be used dynamically is workable - at least you'll get a fresh execution plan - BUT your plan cache may balloon up.
I have a feeling that running a single coupon for all users is probably going to be your highest-performing query: it is one set of criteria, which will be fairly selective on users first, and the total number of coupons is small. Running all coupons for a single user means separate criteria for each coupon for that user. Running all coupons for all users may still perform well, even though it's effectively a cross join first - it is just going to depend.
Anyway, the case for all coupons for all users (or sliced either way, really) will be something like this:
SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
CASE
WHEN <coupon.criteria> THEN <coupon.id> -- code generated from the coupon rules table
WHEN <coupon.criteria> THEN <coupon.id> -- etc.
ELSE NULL
END
) = coupon.id
To generate the coupon rules, you can relatively easily do the string concatenation in a single swipe (and you can combine an individual rule lines design for a coupon with AND with a further inner template):
DECLARE @outer_template AS varchar(max) = 'SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
CASE
{template}
ELSE NULL
END
) = coupon.id
';
DECLARE @template AS varchar(max) = 'WHEN {coupon.rule} THEN {coupon.id}{crlf}';
DECLARE @coupon AS TABLE (id INT, [rule] varchar(max));
INSERT INTO @coupon VALUES
(1, 'user.Age BETWEEN 20 AND 29')
,(2, 'user.Color = ''Yellow''');
DECLARE @sql AS varchar(MAX) = REPLACE(
@outer_template
,'{template}',
REPLACE((
SELECT REPLACE(REPLACE(
@template
,'{coupon.rule}', coupon.[rule])
, '{coupon.id}', CONVERT(varchar(12), coupon.id))
FROM @coupon AS coupon
FOR XML PATH('')
), '{crlf}', CHAR(13) + CHAR(10)));
PRINT @sql;
-- EXEC (@sql);
There's ways to pretty that up - play with it here: https://data.stackexchange.com/stackoverflow/q/115098/
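The same two-level templating idea can be sketched in Python, which makes the expansion easy to see (the rule text mirrors the answer's examples; the template strings are illustrative):

```python
# Outer query with a hole where the generated CASE arms go.
outer_template = """SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
  CASE
{template}  ELSE NULL
  END
) = coupon.id"""

# One CASE arm per coupon rule.
arm_template = "    WHEN {rule} THEN {coupon_id}\n"

coupons = [
    (1, "user.Age BETWEEN 20 AND 29"),
    (2, "user.Color = 'Yellow'"),
]

arms = "".join(arm_template.format(rule=rule, coupon_id=cid)
               for cid, rule in coupons)
sql = outer_template.format(template=arms)
print(sql)
```

As with the T-SQL version, the rule strings are executed as code, so they must come from a trusted table, never from user input.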
I would consider adding computed columns (possibly persisted and indexed) to assist. For instance, age: a non-persisted computed column will likely perform better than a scalar function.
I would consider batching this with a table which says whether a coupon is valid for a user and when it was last validated.
Seems like ages can change and a user can become valid or invalid for a coupon as their birthday passes.
When a user logs in you could spawn a background job to update their coupons. On subsequent logons, there won't be any need to update (since it's not likely to change until the next day or a triggering event).
Just a few ideas.
I would also add that you should have a way to test a coupon before it is approved, to ensure there are no syntax errors (since the SQL is ad hoc or arbitrary). This can be done relatively easily: perhaps a test user table (test_user aliased as user in the generated code template) containing known pass and fail rows, with the coupon rule pointed at those. Not only does the EXEC have to work; the rows it returns should be exactly the expected rows for that coupon, and only those.
This is not an easy problem. Here are some quick ideas that may help depending on your domain requirements:
Restrict the type of criteria you will be filtering on so that you can use dynamic or non-dynamic SQL to execute them efficiently. For example, if criteria are only going to be integers between a min and a max value, then the problem becomes simpler. (You only need to know the field name and the min/max values to describe a criterion, not the full WHERE clause.)
Create a number of views which expose the attributes in a helpful way. Then perform queries against those views -- or have those views pre-select in some way. For example, an age group view that has a field which can contain the values < 21, 21-30, 30-45, >45. Then your select just needs to return the rows from this view that match these strings.
Create a table which stores the results of running your criteria matching query (This can be run off line by a back ground process). Then for a given user check for membership by looking where in the table this user's ID exists.
Thinking about this some more I realize all my suggestions are based on one idea.
A query for an individual user will work faster overall if you first perform an SQL query against all users and cache that result in some way. If every user is reproducing queries against the whole dataset you will lose efficiency. You need some way to cache results and reuse them.
Hope this helps -- comment if these ideas are not clear.
My first thought on an approach (similar to Hogan's) would be to test for coupon applicability at the time the coupon is created. Store those results in a table (User_Coupons, for example). If any user data is changed, your system would then retest the changed users to see which coupons apply to them. At coupon creation (or change) time it would only check against that coupon. At user creation (or change) time it would only check against that user.
The coupon criteria should be from a known set of possible criteria and any time that you want to add a new type of criteria, it would possibly involve a code change. For example, let's say that you have a table set up similar to this:
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
age_minimum SMALLINT NULL,
age_maximum SMALLINT NULL,
vehicle_color VARCHAR(20) NULL,
...
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id)
)
If you wanted to add the ability to base a coupon on vehicle age then you would have to add a column to the table and likewise you would have to adjust your search code. You would use NULL values to indicate that the criteria is unused for that coupon.
An example query for the above table:
SELECT
CC.coupon_id
FROM
Users U
INNER JOIN Coupon_Criteria CC ON
(CC.age_maximum IS NULL OR dbo.f_GetAge(U.birthday) <= age_maximum) AND
(CC.age_minimum IS NULL OR dbo.f_GetAge(U.birthday) >= age_minimum) AND
(CC.vehicle_color IS NULL OR U.vehicle_color = CC.vehicle_color) AND
...
This can get unwieldy if the number of possible criteria gets to be very large.
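A runnable sketch of that NULL-means-unused pattern, in Python with SQLite (ages are stored directly instead of being computed from birthday via dbo.f_GetAge, and the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER, age INTEGER, vehicle_color TEXT);
CREATE TABLE Coupon_Criteria (
    coupon_id INTEGER PRIMARY KEY,
    age_minimum INTEGER,      -- NULL = criterion unused for this coupon
    age_maximum INTEGER,
    vehicle_color TEXT
);
""")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(1, 25, 'Yellow'), (2, 40, 'Green')])
conn.executemany("INSERT INTO Coupon_Criteria VALUES (?, ?, ?, ?)",
                 [(1, None, 29, 'Yellow'),   # young drivers of yellow cars
                  (2, 30, None, None)])      # anyone 30 or older

rows = conn.execute("""
    SELECT U.user_id, CC.coupon_id
    FROM Users U
    JOIN Coupon_Criteria CC ON
        (CC.age_maximum   IS NULL OR U.age <= CC.age_maximum) AND
        (CC.age_minimum   IS NULL OR U.age >= CC.age_minimum) AND
        (CC.vehicle_color IS NULL OR U.vehicle_color = CC.vehicle_color)
    ORDER BY U.user_id, CC.coupon_id""").fetchall()
print(rows)  # [(1, 1), (2, 2)]
```

Each NULL criterion drops out of the filter via the `IS NULL OR` test, which is exactly why adding a new criterion type requires a new column and a matching predicate.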
Another possibility would be to save the coupon criteria in XML and have a business object for your application use that to determine eligibility. It could use the XML to generate a proper query against the User table (and any other necessary tables).
Here's another possibility. Each criteria could be given a query template which you could append to your queries. This would just involve updates to the data instead of DDL and could have good performance. It would involve dynamic SQL.
CREATE TABLE Coupons (
coupon_id INT NOT NULL,
description VARCHAR(2000) NOT NULL,
...
CONSTRAINT PK_Coupons PRIMARY KEY CLUSTERED (coupon_id)
)
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
criteria_num SMALLINT NOT NULL,
description VARCHAR(50) NOT NULL,
code_template VARCHAR(500) NOT NULL,
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id, criteria_num),
CONSTRAINT FK_Coupon_Criteria_Coupon FOREIGN KEY (coupon_id) REFERENCES Coupons (coupon_id)
)
INSERT INTO Coupons (coupon_id, description)
VALUES (1, 'Young people save $200 on yellow vehicles!')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 1, 'Young people', 'dbo.Get_Age(U.birthday) <= 20')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 2, 'Yellow Vehicles', 'U.vehicle_color = ''Yellow''')
You could then build a query by simply concatenating all of the criteria for any given coupon. The big downside to this one is that it's only one-directional. Given a coupon you can easily find who is qualified for it, but given a user you cannot find all coupons for which they are eligible except by going through all of the coupons. My guess is that the second is what you'd probably be most interested in unfortunately. Maybe this will give you some other ideas though.
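The concatenation step described above can be sketched in a few lines of Python (the rule strings are the answer's examples; the surrounding query text is an assumption):

```python
# code_template rows fetched for one coupon, AND-ed together into a WHERE clause.
criteria = [
    "dbo.Get_Age(U.birthday) <= 20",
    "U.vehicle_color = 'Yellow'",
]

where_clause = " AND ".join(f"({c})" for c in criteria)
sql = f"SELECT U.user_id FROM Users U WHERE {where_clause}"
print(sql)
```

Parenthesizing each fragment keeps operator precedence safe if a rule ever contains its own OR.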
For example, you could potentially have it work the other way by having a set number of criteria in a table and having the coupon/criteria linking table indicate whether or not each criterion is active. When querying you could then include that in your query. In other words, the query would look something like:
WHERE
(CC.is_active = 0 OR <code from the code column>) AND
The querying gets very complex though, since you either need to join once for every possible criterion or you need to compare the number of active requirements for a coupon against the number that are fulfilled. That is possible in SQL, but it's similar to working with an EAV model - which is basically what this turns into: a variation on an EAV model (yuck).