What is the preferred way of saving dynamic lists in a database? - sql

In our application, users can create different lists (like SharePoint); for example, a user can create a list of cars (name, model, brand) and a list of students (name, dob, address, nationality), etc.
Our application should be able to query on different columns of a list, so we can't just serialize each row and save it in one row.
Should I create a new table at runtime for each newly created list? If that were the best solution, I suppose Microsoft SharePoint would have done it that way as well?
Should I use the following schema
Lists (Id, Name)
ListColumns (Id, ListId, Name)
ListRows (Id, ListId)
ListData(RowId, ColumnId, Value)
Though a single list row will create as many rows in the ListData table as there are columns in the list, this just doesn't feel right.
Have you dealt with this situation? How did you handle it in database?

What you described is called the EAV (Entity-Attribute-Value) model.
For a list with 3 columns and 1000 entries:
1 record in Lists
3 records in ListColumns
and 3,000 entries in ListData
This is fine. I'm not a fan of creating tables on the fly, because it could clutter your database and you would have to "generate" your SQL queries dynamically. I would get a strange feeling if users could CREATE/DROP/ALTER tables in my database!
Another nice feature of the EAV model is that you can merge two lists easily without dropping or altering a table.
Edit:
I think you need another table called ListRows that tells you which ListData records belong together in a row!
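For example, a rough sketch of a filtered query against the schema from the question (the 'Cars'/'Brand'/'Ford' values are just illustrations):
-- All rows in the "Cars" list whose "Brand" column equals 'Ford'
SELECT r.Id AS RowId
FROM Lists l
JOIN ListColumns c ON c.ListId = l.Id AND c.Name = 'Brand'
JOIN ListRows    r ON r.ListId = l.Id
JOIN ListData    d ON d.RowId  = r.Id AND d.ColumnId = c.Id
WHERE l.Name = 'Cars'
  AND d.Value = 'Ford'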

Well, I've experienced something like this before. I don't want to share the actual table schema, so let's do some thought exercises using some of the suggested table structures:
Let's have a lists table containing a list of all my lists
Let's also have a columns table containing the metadata (column names)
Now we need a values table which contains the column values
We also need a rows table which contains a list of all the rows, otherwise it gets very difficult to work out how many rows there actually are
To keep things simple, let's just make everything a string (VARCHAR) and have a go at coming up with some queries:
Counting all the rows in a table
SELECT COUNT(*) FROM [rows]
JOIN [lists]
ON [rows].list_id = [lists].id
WHERE [lists].name = 'Cars'
Hmm, not too bad, compared to:
SELECT * FROM [Cars]
Inserting a row into a table
BEGIN TRANSACTION
DECLARE @row_id INT
DECLARE @list_id INT
SELECT @list_id = id FROM [lists] WHERE name = 'Cars'
INSERT INTO [rows] (list_id) VALUES (@list_id)
SELECT @row_id = @@IDENTITY
DECLARE @column_id INT
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Make'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Rover')
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Model'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Metro')
COMMIT TRANSACTION
Um, starting to get a little bit hairy compared to:
INSERT INTO [Cars] ([Make], [Model]) VALUES ('Rover', 'Metro')
Simple queries
I'm now getting bored of constructing tediously complex SQL statements, so maybe you can have a go at coming up with equivalent queries for the following statements:
SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
WHERE [Owners].Age > 50
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
JOIN [Addresses] ON [Addresses].id = [Owners].address_id
WHERE [Addresses].City = 'London'
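For illustration only, here is a rough attempt at what just the first of those statements might expand to against the lists/columns/rows/values tables (self-joining [values] once per column; exact names are assumptions):
SELECT model_val.value AS [Model]
FROM [rows]
JOIN [lists] ON [rows].list_id = [lists].id
JOIN [values]  AS make_val  ON make_val.row_id   = [rows].id
JOIN [columns] AS make_col  ON make_col.id       = make_val.column_id
                           AND make_col.name     = 'Make'
                           AND make_col.list_id  = [lists].id
JOIN [values]  AS model_val ON model_val.row_id  = [rows].id
JOIN [columns] AS model_col ON model_col.id      = model_val.column_id
                           AND model_col.name    = 'Model'
                           AND model_col.list_id = [lists].id
WHERE [lists].name = 'Cars'
  AND make_val.value = 'Rover'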
I hope you are beginning to get the idea...
In short - I've experienced this before and I can assure you that creating a database inside a database in this way is definitely a Bad Thing.
If you need to do anything but the most basic querying on these lists (and literally I mean "Can I have all the items in this list please?"), you should try and find an alternative.
As long as each user pretty much has their own database, I'd definitely recommend the CREATE TABLE approach. Even if they don't, I'd still recommend that you at least consider it.

Perhaps a potential solution would be for the creation of lists to involve CREATE TABLE statements for those entities/lists?
It sounds like the DB structure or schema can change at runtime, at the user's command, so perhaps something like this might help:
The user wants to create a new list of an entity never seen before. Call it Computer.
The user defines the attributes (ScreenSize, CpuSpeed, AmountRAM, NumberOfCores).
The system allows the user to create it in the UI.
The system generally treats all columns as strings, unless it can tell that all supplied values are indeed dates or numbers.
It builds the CREATE scripts and executes them against the DB.
It then inserts the data that the user defined into that new table.
Properly coded, we're working with the requirements given: let users create new entities. There was no mention of scale here. Of course, this requires all input to be sanitized, queries parameterized, actions logged, etc.
The negative comment below doesn't actually give any good reasons, but creates a bit of FUD. I'd be interested in addressing any concerns with this potential solution. We haven't heard about scale, security, performance, or usage (internal LAN vs. internet).

You should absolutely not dynamically create tables when your users create lists. That isn't how databases are meant to work.
Your schema is correct, and the pluralization is, in my opinion, also correct, though I would remove the camel case and call them lists, list_columns, list_rows and list_data.
I would further improve upon your schema by skipping rows and columns tables, they serve no purpose. Simply have a row/column number attached to each cell, and keep things sparse: Don't bother holding empty cells in the database. You retain the ability to query/sort based on row/column, your queries will be (potentially very much) faster because the number of list_cells will be reduced, and you won't have to do any crazy joining to link your data back to its table.
Here is the complete schema:
create table lists (
    id   int primary key,
    name varchar(25) not null
);
create table list_cells (
    id      int primary key,
    list_id int not null references lists(id)
            on delete cascade on update cascade,
    row     int not null,
    col     int not null,
    data    varchar(25) not null
);
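A rough example of reading one list back from this schema (the pivot into named columns would still happen in application code; the 'Cars' name is just an example):
SELECT c.row, c.col, c.data
FROM list_cells c
JOIN lists l ON l.id = c.list_id
WHERE l.name = 'Cars'
ORDER BY c.row, c.col;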

It sounds like you might have SharePoint already deployed in your environment.
Consider integrating your application with SharePoint and having it be your datastore. There's no need to recreate all the things you like about SharePoint when you could leverage it.
It'd take a bit of configuring, but you could call SharePoint web services to CRUD your list data for you:
inserting list data into SharePoint via web services
reading SharePoint lists via web services
SharePoint 2010 can also expose lists via OData, which would be simple to consume from any application.

Related

SQL: Inserting into a (dynamic) lookup table

Most articles about lookup tables deal with their creation, initial population, and use (for looking up: id --> value).
My question is about dynamic updating (inserting new values) of the lookup table, as new data is stored in data tables.
For example, we have a table of persons, and one attribute (column) of it is city of residency. Many persons would have the same value, so it makes sense to use a lookup table for it. As the list of cities that would appear is not known beforehand, the lookup table is initially empty.
To clarify, the values of city are:
not known beforehand (we don't know what customer might contact us tomorrow)
there is no "list of all possible cities" (real-life cities come and go, get renamed, etc.)
many persons will share the same value
initially there will be a few different values (up to 10), later more (but not very many, a few hundred)
Also, the expected number of person objects will be thousands if not millions.
So the basic algorithm is (pseudocode):
procedure insertPerson(name,age,city)
{
cityId := lookup(city);
if cityId == null
cityId := insertIntoLookupTableAndReturnId(city);
INSERT INTO person_table VALUES (name,age,cityId);
}
What is a good lookup table organization for this problem? What exact code to use?
The goal is high performance of person insertion (whether the city is already in the lookup table or not).
General answers are welcome and Oracle 11g would be great.
Note: This is about an OLTP scenario. New persons are inserted in real time. There is no known list of persons that can be used for initialization of the lookup table.
Your basic approach appears to be OK, except for one small change I would make: the function lookup(city) should search for the city and return its ID and, if the city is not found, insert a new record and return that ID. This way, you further encapsulate the management of the lookup table (cities). As such, your code would become:
procedure insertPerson(name,age,city)
{
INSERT INTO person_table VALUES (name,age,lookup(city));
}
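A minimal PL/SQL sketch of such a lookup() function, assuming Oracle and borrowing the citylookup/lookupkeys names used in the test further down (the autonomous transaction is what allows the insert when the function is called from inside an INSERT statement):
CREATE OR REPLACE FUNCTION lookup (p_city IN VARCHAR2) RETURN NUMBER IS
  PRAGMA AUTONOMOUS_TRANSACTION;   -- lets the function do DML even when called from SQL
  v_key NUMBER;
BEGIN
  BEGIN
    SELECT key INTO v_key FROM citylookup WHERE city = p_city;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
      BEGIN
        INSERT INTO citylookup (key, city)
        VALUES (lookupkeys.NEXTVAL, p_city)
        RETURNING key INTO v_key;
      EXCEPTION
        WHEN DUP_VAL_ON_INDEX THEN  -- another session inserted the same city first
          SELECT key INTO v_key FROM citylookup WHERE city = p_city;
      END;
  END;
  COMMIT;  -- the autonomous transaction must finish its own work before returning
  RETURN v_key;
END lookup;
/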
One additional thing you may consider is to create a VIEW that would be used to query for persons' information, including the name of the city.
After some testing, the best performance (fewest block accesses) I could find was with an index-organized table as the lookup table and the SQL below for inserting data.
create table citylookup (key number primary key, city varchar2(100)) organization index;
create unique index cltx1 on citylookup(city);
create sequence lookupkeys;
create sequence datakeys;
create table data (x number primary key, k number references citylookup(key) not null);
-- "Rome" is the city we try to insert
insert all
when oldkey is null then -- if the city is not in the lookup yet
into citylookup values (lookupkeys.nextval, 'Rome') -- then insert it
-- finally, insert the data row with the correct lookup key
when 1=1 then into data values (datakeys.nextval,nvl(oldkey, lookupkeys.nextval))
select (select key from citylookup where city='Rome') as oldkey from dual;
Result: 6+2 blocks for the city-exists case, 10+2 for the city-doesn't-exist-yet case (as reported by SQL*Plus with SET AUTOTRACE ON: the first value is db block gets, the second consistent gets).
Alternatively, as suggested by Dudu Markovitz, the lookup table could be cached in the application, and in the hit case you just perform a simple INSERT into the DATA table, which then costs only 6+1 block accesses (for the above test case). The problem here is keeping the cached lookup table in sync with the database and with possible other instances of the server application.
PS: The above INSERT ALL command "wastes" a sequence value from the lookupkeys sequence on each run, even if no new city is inserted into the lookup table. It is an additional exercise to solve that.

Is it better to have int joins instead of string columns?

Let's say I have a User which has a status and the user's status can be 'active', 'suspended' or 'inactive'.
Now, when creating the database, I was wondering: would it be better to have a column with the string value (with an enum type or a rule applied), so it's easier both to query and to know the current user status, or are joins better, and should I join to a UserStatuses table which contains the possible user statuses?
Assuming, of course, that statuses cannot be created by the application user.
Edit: Some clarification
I would NOT use string joins; it would be an int join to the UserStatuses PK
My primary concern is performance
The possible statuses ARE STATIC and will NEVER change
On most systems it makes little or no difference to performance. Personally I'd use a short string for clarity and join that to a table with more detail as you suggest.
create table intLookup
(
    pk integer primary key,
    value varchar(20) not null
)
insert into intLookup (pk, value) values
    (1,'value 1'),
    (2,'value 2'),
    (3,'value 3'),
    (4,'value 4')

create table stringLookup
(
    pk varchar(4) primary key,
    value varchar(20) not null
)
insert into stringLookup (pk, value) values
    ('1','value 1'),
    ('2','value 2'),
    ('3','value 3'),
    ('4','value 4')

create table masterData
(
    stuff varchar(50),
    fkInt integer references intLookup(pk),
    fkString varchar(4) references stringLookup(pk)
)
create index i on masterData(fkInt)
create index s on masterData(fkString)

insert into masterData
    (stuff, fkInt, fkString)
select COLUMN_NAME, (ORDINAL_POSITION % 4) + 1, (ORDINAL_POSITION % 4) + 1
from INFORMATION_SCHEMA.COLUMNS
go 1000
This results in 300K rows.
select
*
from masterData m inner join intLookup i on m.fkInt=i.pk
select
*
from masterData m inner join stringLookup s on m.fkString=s.pk
On my system (SQL Server)
- the query plans, I/O and CPU are identical
- execution times are identical.
- The lookup table is read and processed once (in either query)
There is NO difference using an int or a string.
I think, as a whole, everyone has hit on important components of the answer to your question. However, they all have good points which should be taken together, rather than separately.
As logixologist mentioned, a healthy amount of Normalization is generally considered to increase performance. However, in contrast to logixologist, I think your situation is the perfect time for normalization. Your problem seems to be one of normalization. In this case, using a numeric key as Santhosh suggested which then leads back to a code table containing the decodes for the statuses will result in less data being stored per record. This difference wouldn't show in a small Access database, but it would likely show in a table with millions of records, each with a status.
As David Aldridge suggested, you might find that normalizing this particular data point will result in a more controlled end-user experience. Normalizing the status field will also allow you to edit the status flag at a later date in one location and have that change perpetuated throughout the database. If your boss is like mine, then you might have to change the Status of Inactive to Closed (and then back again next week!), which would be more work if the status field was not normalized. By normalizing, it's also easier to enforce referential integrity. If a status key is not in the Status code table, then it can't be added to your main table.
If you're concerned about the performance when querying in the future, then there are some different things to consider. To pull back status, if it's normalized, you'll be adding a join to your query. That join will probably not hurt you in any sized recordset but I believe it will help in larger recordsets by limiting the amount of raw text that must be handled. If your primary concern is performance when querying the data, here's a great resource on how to optimize queries: http://www.sql-server-performance.com/2007/t-sql-where/ and I think you'll find that a lot of the rules discussed here will also apply to any inclusion criteria you enforce in the join itself.
Hope this helps!
Christopher
The whole idea behind normalization is to keep the data from repeating (well, at least one of the concepts).
In this case there is only one status a user can have at a time (I assume), so there is no reason to put it in its own table. You would simply complicate things. The only reason you would have a separate table is if for some reason these statuses were not static, meaning next month you might add "Sort of Active" and "Maybe Inactive". That would mean changing code to make up for it if you didn't put them in their own table. You could create a maintenance page where users could add statuses, and then that would require you to create a separate table.
An issue to consider is whether these status values have attributes of their own.
For example, perhaps you would want to have a default sort order that is different from the alphabetical order of the status text. You might also want to treat two of the statuses in a particular way that you do not treat the other, and that could be an attribute.
If you have a need for that, or suspect a future need for that, then move the status text to a different table and use an integer key value for them.
I would suggest using integer values like 0, 1, 2 if the set is fixed. When interpreting the results in reports, we can map these statuses back to strings.
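A tiny sketch of that decoding at report time (table and column names are assumptions):
SELECT u.name,
       CASE u.status
            WHEN 0 THEN 'active'
            WHEN 1 THEN 'suspended'
            WHEN 2 THEN 'inactive'
       END AS status_text
FROM Users u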

Building a Relationship Between Attributes Or Columns Of Bits "Flatting it out"

I have the following SQL design issue. The code below might look like a lot, but basically I have a table of cars and another table of attributes a car could have. It makes complete sense to me to structure a table of attributes for an object using a linking table, #CarSupportedAttributes. Recently I've been tasked with doing something similar, but using one table that has each of the attributes as columns, making it "flat". Similar to below:
[CarId][Name][Manual Transmission][Sunroof][Automatic Transmission][AWD]
I am told doing so will boost the speed of my queries, but it's starting to turn into a nightmare. In C# I have enumerated values for each of the car's attributes (1 = Manual Transmission), so using the non-"flat" version I am able to pull off a query pretty quickly, as the SQL code below shows. Since I am being pushed to make the table flat for speed, the only way I can think of is to take the enumerated value and build it into the WHERE clause, using a CASE statement for every 1, 2, 3 and selecting off a column name.
To me it just makes more sense to organize the data as below. What if a new attribute about a car is needed, say "HEMI Engine"? Not all cars are going to have this; in fact it's going to be a rare case. But the way I am told to design it is to keep the table "flat", so now I would be adding a column called "HEMI Engine" to my table, instead of adding a row in my #CarAttributes table and rows only for the cars that have it.
Below is a snippet of the way I currently see approaching this problem, as opposed to doing a "flat" table (a table with mostly columns of bits).
Question: What design makes more sense? Which is more maintainable? Am I completely crazy for thinking below is a better approach, and why?
CREATE TABLE #Car
(
    CarId INT,
    Name  VARCHAR(250)
)
INSERT INTO #Car VALUES (1, 'Fusion')
INSERT INTO #Car VALUES (2, 'Focus')

CREATE TABLE #CarAttributes
(
    AttributeId INT,
    Name        VARCHAR(250)
)
INSERT INTO #CarAttributes VALUES (1, 'Manual Transmission')
INSERT INTO #CarAttributes VALUES (2, 'SunRoof')

SELECT * FROM #CarAttributes

CREATE TABLE #CarSupportedAttributes
(
    AttributeId INT,
    CarId       INT
)
INSERT INTO #CarSupportedAttributes VALUES (1, 2)
--Determine if A Focus has a manual transmission
SELECT * FROM #Car c
INNER JOIN #CarSupportedAttributes csa
ON csa.CarId = c.CarId
INNER JOIN #CarAttributes ca
ON ca.AttributeId = csa.AttributeId
WHERE c.Name = 'Focus'
AND ca.AttributeId = 1
Your approach is known as Entity-Attribute-Value, or EAV (yours is slightly modified, since in your model the presence of the attribute on the entity is the value, but the concept is the same).
EAV is usually considered an anti-pattern, but it can be appropriate in some cases. Basically, if either...
Your list of attributes is large and any given entity (car) will have only a small percentage of the total attributes
Your list of attributes is subject to frequent user change and they represent only data and not anything structural about the entity
Then EAV can be an appropriate choice. I can't answer either of those questions for you (though I have my suspicions), but it does seem like it might be appropriate in your case.
The other option, which is likely what most 6NF proponents would suggest, would be to have a table per attribute, like CarSunroof or CarManualTransmission. This would solve the first issue and the requirement of changing a table's definition whenever a new attribute is added, but would not address the issue of the user being able to change it.
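A minimal sketch of that table-per-attribute alternative, assuming a permanent Car table shaped like the question's #Car (names here are illustrative):
CREATE TABLE CarSunroof
(
    CarId INT NOT NULL PRIMARY KEY REFERENCES Car (CarId)
    -- a row here simply means "this car has a sunroof"; no bit column needed
)

-- Does the Focus have a sunroof?
SELECT c.CarId, c.Name
FROM Car c
INNER JOIN CarSunroof s ON s.CarId = c.CarId
WHERE c.Name = 'Focus'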

Dynamically generate criteria in SQL

I have a Users table that contains dozens of columns, like date of birth, year of vehicle owned, make and model of the vehicle, color, and many other personal fields unrelated to the vehicle.
There's also a 2nd table called Coupons that needs to be designed in a way to support a qualification like "user qualifies if younger than 30 yrs old", "user qualifies if vehicle is greater than 10 yrs old", "user qualifies if vehicle color is green".
When a user logs in, I need to present all coupons the user qualifies for. The problem that I'm having is that the coupon qualifications could be numerous, could have qualifiers like equal, greater than or less than and may have different combinations.
My only solution at this point is to store the actual sql string within one of the coupons table columns like
select * from Users where UserId = SOME_PLACEHOLDER and VehicleYear < 10
Then I could execute the SQL for each coupon row and return true or false. This seems very inefficient, as I would potentially have to execute thousands of SQL statements, one for each coupon code.
Any insight, help is appreciated. I do have server-side code where I could potentially be able to do looping.
Thank you.
Very difficult problem. It seems like users will be added at high volume, with coupons added at a fairly regular frequency.
Adding SQL to a table to be used dynamically is workable - at least you'll get a fresh execution plan - BUT your plan cache may balloon up.
I have a feeling that running a single coupon against all users is probably going to be your highest-performing query, because it's one single set of criteria which will be fairly selective on users first, and the total number of coupons is small; whereas running all coupons for a single user means separate criteria for each coupon for that user. Running all coupons for all users may still perform well, even though it's effectively a cross join first - I guess it is just going to depend.
Anyway, the case for all coupons for all users (or sliced either way, really) will be something like this:
SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
    ON (
        CASE WHEN <coupon.criteria> THEN <coupon.id> -- code generated from the coupon rules table
             WHEN <coupon.criteria> THEN <coupon.id> -- etc.
             ELSE NULL
        END
    ) = coupon.id
To generate the coupon rules, you can relatively easily do the string concatenation in a single swipe (and if you design coupons as individual rule lines, you can combine them with AND using a further inner template):
DECLARE @outer_template AS varchar(max) = 'SELECT user.id, coupon.id
FROM user
INNER JOIN coupon
ON (
CASE
{template}
ELSE NULL
END
) = coupon.id
';
DECLARE @template AS varchar(max) = 'WHEN {coupon.rule} THEN {coupon.id}{crlf}';
DECLARE @coupon AS TABLE (id INT, [rule] varchar(max));
INSERT INTO @coupon VALUES
    (1, 'user.Age BETWEEN 20 AND 29')
   ,(2, 'user.Color = ''Yellow''');
DECLARE @sql AS varchar(MAX) = REPLACE(
    @outer_template
    ,'{template}',
    REPLACE((
        SELECT REPLACE(REPLACE(
            @template
            ,'{coupon.rule}', coupon.[rule])
            , '{coupon.id}', coupon.id)
        FROM @coupon AS coupon
        FOR XML PATH('')
    ), '{crlf}', CHAR(13) + CHAR(10)));
PRINT @sql;
-- EXEC (@sql);
There's ways to pretty that up - play with it here: https://data.stackexchange.com/stackoverflow/q/115098/
I would consider adding computed columns (possibly persisted and indexed) to assist. For instance, age - non-persisted computed column will likely perform better than a scalar function.
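A rough sketch of the age computed column (names are assumptions; because GETDATE() is non-deterministic, this particular column cannot be persisted or indexed, but it still avoids calling a scalar UDF per row):
ALTER TABLE [user]
    ADD age AS DATEDIFF(YEAR, birthday, GETDATE())  -- only an approximate age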
I would consider batching this with a table which says whether a coupon is valid for a user and when it was last validated.
Seems like ages can change and a user can become valid or invalid for a coupon as their birthday passes.
When a user logs in you could spawn a background job to update their coupons. On subsequent logons, there won't be any need to update (since it's not likely to change until the next day or a triggering event).
Just a few ideas.
I would also add that you should have a way to test a coupon before it is approved, to ensure there are no syntax errors (since the SQL is ad hoc or arbitrary). This can be done relatively easily: perhaps a test user table (substituting test_user for user in the generated code template) containing known pass and fail rows that the coupon rule points to. Not only does the EXEC have to work; the rows it returns should be the expected rows, and only the expected rows, for that coupon.
This is not an easy problem. Here are some quick ideas that may help depending on your domain requirements:
Restrict the type of criteria you will be filtering on so that you can use dynamic or non-dynamic SQL to execute them efficiently. For example, if you are only going to have integers between a range of min and max values as criteria, then the problem becomes simpler. (You only need to know the field name and the min/max values to describe a criterion, not the full WHERE statement.)
Create a number of views which expose the attributes in a helpful way. Then perform queries against those views -- or have those views pre-select in some way. For example, an age group view that has a field which can contain the values < 21, 21-30, 30-45, >45. Then your select just needs to return the rows from this view that match these strings.
Create a table which stores the results of running your criteria matching query (This can be run off line by a back ground process). Then for a given user check for membership by looking where in the table this user's ID exists.
Thinking about this some more I realize all my suggestions are based on one idea.
A query for an individual user will work faster overall if you first perform an SQL query against all users and cache that result in some way. If every user is reproducing queries against the whole dataset you will lose efficiency. You need some way to cache results and reuse them.
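A rough sketch of that cache-and-reuse idea (all names here are assumptions, not from the question):
-- Populated by a background job that runs each coupon's criteria against all users
CREATE TABLE User_Coupon_Match
(
    user_id    INT      NOT NULL,
    coupon_id  INT      NOT NULL,
    matched_at DATETIME NOT NULL,
    PRIMARY KEY (user_id, coupon_id)
)

-- At login, finding the user's coupons becomes a simple seek (@user_id = the logged-in user)
SELECT coupon_id
FROM User_Coupon_Match
WHERE user_id = @user_id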
Hope this helps -- comment if these ideas are not clear.
My first thought on an approach (similar to Hogan's) would be to test for coupon applicability at the time the coupon is created. Store those results in a table (User_Coupons, for example). If any user data is changed, your system would then retest any changed users to see which coupons are applicable to them. At coupon creation (or change) time it would only check against that coupon. At user creation (or change) time it would only check against that user.
The coupon criteria should be from a known set of possible criteria and any time that you want to add a new type of criteria, it would possibly involve a code change. For example, let's say that you have a table set up similar to this:
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
age_minimum SMALLINT NULL,
age_maximum SMALLINT NULL,
vehicle_color VARCHAR(20) NULL,
...
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id)
)
If you wanted to add the ability to base a coupon on vehicle age then you would have to add a column to the table and likewise you would have to adjust your search code. You would use NULL values to indicate that the criteria is unused for that coupon.
An example query for the above table:
SELECT
CC.coupon_id
FROM
Users U
INNER JOIN Coupon_Criteria CC ON
(CC.age_maximum IS NULL OR dbo.f_GetAge(U.birthday) <= age_maximum) AND
(CC.age_minimum IS NULL OR dbo.f_GetAge(U.birthday) >= age_minimum) AND
(CC.vehicle_color IS NULL OR U.vehicle_color = CC.vehicle_color) AND
...
This can get unwieldy if the number of possible criteria gets to be very large.
Another possibility would be to save the coupon criteria in XML and have a business object for your application use that to determine eligibility. It could use the XML to generate a proper query against the User table (and any other necessary tables).
Here's another possibility. Each criteria could be given a query template which you could append to your queries. This would just involve updates to the data instead of DDL and could have good performance. It would involve dynamic SQL.
CREATE TABLE Coupons (
coupon_id INT NOT NULL,
description VARCHAR(2000) NOT NULL,
...
CONSTRAINT PK_Coupons PRIMARY KEY CLUSTERED (coupon_id)
)
CREATE TABLE Coupon_Criteria (
coupon_id INT NOT NULL,
criteria_num SMALLINT NOT NULL,
description VARCHAR(50) NOT NULL,
code_template VARCHAR(500) NOT NULL,
CONSTRAINT PK_Coupon_Criteria PRIMARY KEY CLUSTERED (coupon_id, criteria_num),
CONSTRAINT FK_Coupon_Criteria_Coupon FOREIGN KEY (coupon_id) REFERENCES Coupons (coupon_id)
)
INSERT INTO Coupons (coupon_id, description)
VALUES (1, 'Young people save $200 on yellow vehicles!')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 1, 'Young people', 'dbo.Get_Age(U.birthday) <= 20')
INSERT INTO Coupon_Criteria (coupon_id, criteria_num, description, code_template)
VALUES (1, 2, 'Yellow Vehicles', 'U.vehicle_color = ''Yellow''')
You could then build a query by simply concatenating all of the criteria for any given coupon. The big downside to this one is that it's only one-directional. Given a coupon you can easily find who is qualified for it, but given a user you cannot find all coupons for which they are eligible except by going through all of the coupons. My guess is that the second is what you'd probably be most interested in unfortunately. Maybe this will give you some other ideas though.
For example, you could potentially have it work the other way by having a set number of criteria in a table and for the coupon/criteria linking table indicate whether or not that criteria is active. When querying you could then include that in your query. In other words, the query would look something like:
WHERE
(CC.is_active = 0 OR <code from the code column>) AND
The querying gets very complex though since you either need to join once for every possible criteria or you need to query to compare the number of active requirements for a coupon versus the number that are fulfilled. That is possible in SQL, but it's similar to working with an EAV model - which is basically what this turns into: a variation on an EAV model (yuck)

Records linked to any table?

Hi, I'm struggling a bit with this and could use some ideas...
Say my database has the following tables:
Customers
Suppliers
SalesInvoices
PurchaseInvoices
Currencies
etc., etc.
I would like to be able to add a "Notes" record to ANY type of record
The Notes table would look like this:
NoteID Int (PK)
NoteFK Int
NoteFKType Varchar(3)
NoteText varchar(100)
NoteDate Datetime
Where NoteFK is the PK of a customer or supplier etc., and NoteFKType says what type of record the note is against.
Now I realise that I cannot add an FK which references multiple tables without NoteFK needing to be present in all tables.
So how would you design the above ?
The note FK needs to be able to reference any of the above tables
Cheers,
Daniel
You have to accept the limitation that you cannot teach the database about this foreign key constraint. So you will have to do without the integrity checking (and cascading deletes).
Your design is fine.
It is easily extensible to extra tables, you can have multiple notes per entity, and the target tables do not even need to be aware of the notes feature.
An advantage that this design has over using a separate notes table per entity table is that you can easily run queries across all notes, for example "most recent notes", or "all notes created by a given user".
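For example, against the Notes table from the question (SQL Server syntax):
-- Ten most recent notes across all entity types
SELECT TOP 10 NoteID, NoteFKType, NoteFK, NoteText, NoteDate
FROM Notes
ORDER BY NoteDate DESC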
As for the argument of that table growing too big, splitting it into say five table will shrink the table to about a fifth of its size, but this will not make any difference for index-based access. Databases are built to handle big tables (as long as they are properly indexed).
I think your design is OK, if you can accept the fact that the DB system will not check whether a note references an existing entity in another table. It's the only design I can think of that doesn't require duplication and is scalable to more tables.
The way you designed it, when you add another entity type that you'd like to have notes for, you won't have to change your model. Also, you don't have to include any additional columns in your existing model, or additional tables.
To ensure data integrity, you can create a set of triggers, or some software solution that will clean the notes table once in a while.
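A rough sketch of such a periodic cleanup, assuming illustrative NoteFKType codes and key column names (they are not from the question):
-- Remove notes whose parent record no longer exists
DELETE FROM Notes
WHERE (NoteFKType = 'CUS' AND NoteFK NOT IN (SELECT CustomerID FROM Customers))
   OR (NoteFKType = 'SUP' AND NoteFK NOT IN (SELECT SupplierID FROM Suppliers))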
I would think twice before doing what you suggest. It might seem simple and elegant in the short term, but if you are truly interested in data integrity and performance, then having separate notes tables for each parent table is the way to go. Over the years, I've approached this problem using the solutions found in the other answers (triggers, GUIDs, etc.). I've come to the conclusion that the added complexity and loss of performance isn't worth it. By having separate note tables for each parent table, with appropriate foreign key constraints, lookups and joins will be simple and fast. When combining the related items into one table, join syntax becomes ugly and your notes table will grow to be huge and slow.
I agree with Michael McLosky, to a degree.
The question in my mind is: What is the technical cost of having multiple notes tables?
In my mind, it is preferable to consolidate the same functionality into a single table. It also makes reporting and other further development simpler, not to mention keeping the list of tables smaller and easier to manage.
It's a balancing act; you need to try to predetermine both the benefits and the costs of doing something like this. My -personal- preference is database referential integrity. Application management of integrity should, in my opinion, be limited to business logic. The database should ensure the data is always consistent and valid...
To actually answer your question...
The option I would use is a check constraint using a User Defined Function to check the values. This works in M$ SQL Server...
CREATE TABLE Test_Table_1 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_2 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_3 (fk_id INT, table_name VARCHAR(64))
GO
CREATE FUNCTION id_exists (@id INT, @table_name VARCHAR(64))
RETURNS INT
AS
BEGIN
    IF (@table_name = 'Test_Table_1')
    BEGIN
        IF EXISTS (SELECT * FROM Test_Table_1 WHERE id = @id)
            RETURN 1
    END
    ELSE IF (@table_name = 'Test_Table_2')
    BEGIN
        IF EXISTS (SELECT * FROM Test_Table_2 WHERE id = @id)
            RETURN 1
    END
    RETURN 0
END
GO
ALTER TABLE Test_Table_3 WITH CHECK ADD CONSTRAINT
CK_Test_Table_3 CHECK ((dbo.id_exists(fk_id,table_name)=(1)))
GO
ALTER TABLE [dbo].[Test_Table_3] CHECK CONSTRAINT [CK_Test_Table_3]
GO
INSERT INTO Test_Table_1 SELECT 1
GO
INSERT INTO Test_Table_1 SELECT 2
GO
INSERT INTO Test_Table_1 SELECT 3
GO
INSERT INTO Test_Table_2 SELECT 1
GO
INSERT INTO Test_Table_2 SELECT 2
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_1'
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_2'
GO
In that example, the final insert statement would fail.
You can get FK referential integrity, at the cost of having one column in the notes table for each other table.
create table Notes (
id int PRIMARY KEY,
note varchar (whatever),
customer_id int NULL REFERENCES Customer (id),
product_id int NULL REFERENCES Product (id)
)
Then you'll need a constraint to make sure that you have only one of the columns set.
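For example, a sketch of that rule as a CHECK constraint (SQL Server flavored; adjust if a note may reference more than one parent):
ALTER TABLE Notes ADD CONSTRAINT CK_Notes_OneParent CHECK
(
    (CASE WHEN customer_id IS NOT NULL THEN 1 ELSE 0 END
   + CASE WHEN product_id  IS NOT NULL THEN 1 ELSE 0 END) = 1
)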
Or maybe not, maybe you might want a note to be able to be associated with both a customer and a product. Up to you.
This design would require adding a new column to Notes if you want to add another referencing table.
You could add a GUID field to the Customers, Suppliers, etc. tables. Then in the Notes table, change the foreign key to reference that GUID.
This does not help for data integrity. But it makes M-to-N relationships easily possible to any number of tables and it saves you from having to define a NoteFKType column in the Notes table.
You can easily implement a "multi" foreign key with triggers. Triggers will give you a very flexible mechanism, and you can do any integrity checks you wish.
Why don't you do it the other way around and have a foreign key in the other tables (Customer, Supplier, etc.) to NoteID? This way you have a one-to-one mapping.
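A minimal sketch of that reversed design; note that it limits each row to at most one note:
ALTER TABLE Customers ADD NoteID INT NULL REFERENCES Notes (NoteID)
ALTER TABLE Suppliers ADD NoteID INT NULL REFERENCES Notes (NoteID)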