Combining amendments to database table fields - sql

I am interested to understand the correct approach to solving a database problem I find myself faced with.
I have a table (with ~45 columns) that holds information about stock, its pricing and a lot of information relating to its packaging, discounting, etc.
The stock is accessed via a web application using ADO (VB6) by constructing TSQL queries in numerous places.
There is now a need to hold a new table with a very cut down list of the above columns to allow some users to override parts of the stock information from the source (mainly descriptions and such).
The problem I'm faced with is coming up with a way to (perhaps) construct a view of the two tables such that the software still thinks it is talking to the first table (changing the software is simply a no-go) when in fact it is the first table amended by the second, perhaps via some sort of UNION.
To present a simple example, suppose you have a stock table as:
CREATE TABLE Stock
(
id int NOT NULL,
ref varchar(20) NOT NULL,
short_description varchar(50) NOT NULL,
long_description varchar(255) NOT NULL,
...many other columns
)
and an amendments table as:
CREATE TABLE StockAmendments
(
id int NOT NULL,
ref varchar(20) NOT NULL,
short_description varchar(50) NOT NULL
)
The idea would be to rename Stock as StockSource and to build a view called Stock which amends StockSource with StockAmendments (in this case a potentially different short_description). This way the software does not need to know about the change.
Is this possible?

This should be doable. I haven't done t-sql in a very long time but something like this:
CREATE VIEW Stock
AS
SELECT
ss.id,
ss.ref,
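-- use the amended description when one exists, otherwise the source one
COALESCE(sa.short_description, ss.short_description) AS short_description,
ss.long_description
FROM StockSource ss
-- LEFT JOIN keeps stock rows that have no amendment
LEFT JOIN StockAmendments sa
ON ss.id = sa.id AND ss.ref = sa.ref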
One thing to worry about is query performance depending on indexes, etc. If this is a problem, you might be better off creating a real 'Stock' table based off of StockSource and putting a trigger on StockAmendments to update the 'Stock' table.
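A minimal sketch of that view, run against SQLite from Python purely so it is self-contained. SQLite is a stand-in for SQL Server here, and the sample rows are invented. It uses a LEFT JOIN so that stock rows with no amendment still appear, and COALESCE to prefer the amended description when there is one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StockSource (
    id INTEGER NOT NULL,
    ref TEXT NOT NULL,
    short_description TEXT NOT NULL,
    long_description TEXT NOT NULL
);
CREATE TABLE StockAmendments (
    id INTEGER NOT NULL,
    ref TEXT NOT NULL,
    short_description TEXT NOT NULL
);
-- The view takes over the old table name, so existing queries keep working.
CREATE VIEW Stock AS
SELECT ss.id,
       ss.ref,
       COALESCE(sa.short_description, ss.short_description) AS short_description,
       ss.long_description
FROM StockSource ss
LEFT JOIN StockAmendments sa ON sa.id = ss.id AND sa.ref = ss.ref;

-- Invented sample data: only stock item 2 has an amendment.
INSERT INTO StockSource VALUES (1, 'A1', 'Widget', 'A plain widget');
INSERT INTO StockSource VALUES (2, 'B2', 'Gadget', 'A plain gadget');
INSERT INTO StockAmendments VALUES (2, 'B2', 'Deluxe gadget');
""")

rows = conn.execute(
    "SELECT id, short_description FROM Stock ORDER BY id"
).fetchall()
print(rows)
```

Item 1 comes back with its source description and item 2 with the amended one, while the application still just selects from Stock.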

Yes it is possible using updatable views.
Rename Stock to StockSource and build an updatable view named Stock over StockSource and StockAmendments. This way you won't have to change any of the VB6 code.
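One way to make such a view accept writes is an INSTEAD OF trigger, which both SQL Server and SQLite support on views. The sketch below uses SQLite from Python so it can be run as-is; the trigger name and sample row are made up, and the ref column is left out to keep it short. Updates issued against Stock land in StockAmendments and leave StockSource alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StockSource (
    id INTEGER PRIMARY KEY,
    short_description TEXT NOT NULL
);
CREATE TABLE StockAmendments (
    id INTEGER PRIMARY KEY,
    short_description TEXT NOT NULL
);
CREATE VIEW Stock AS
SELECT ss.id,
       COALESCE(sa.short_description, ss.short_description) AS short_description
FROM StockSource ss
LEFT JOIN StockAmendments sa ON sa.id = ss.id;

-- Redirect updates on the view into the amendments table.
-- (Hypothetical trigger name; INSERT OR REPLACE is SQLite syntax.)
CREATE TRIGGER Stock_update INSTEAD OF UPDATE ON Stock
BEGIN
    INSERT OR REPLACE INTO StockAmendments (id, short_description)
    VALUES (NEW.id, NEW.short_description);
END;

INSERT INTO StockSource VALUES (1, 'Widget');
""")

# The application updates "Stock" exactly as it did before the rename.
conn.execute("UPDATE Stock SET short_description = 'Deluxe widget' WHERE id = 1")

view_row = conn.execute(
    "SELECT short_description FROM Stock WHERE id = 1").fetchone()
source_row = conn.execute(
    "SELECT short_description FROM StockSource WHERE id = 1").fetchone()
```

In SQL Server the trigger body would need a MERGE or an UPDATE followed by an INSERT instead of INSERT OR REPLACE.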

Related

Automatic Normalization of VARCHAR columns?

Imagine you have many tables which have a VARCHAR(50) column called Country, there is a small number of different country names which repeat millions of times, the wise thing to do here is to create a table called dbo.Country(CountryID, CountryName) and have all the tables hold CountryID with a foreign key reference.
The problem is that we have to JOIN all our queries with dbo.Country every time we want to do something with that column.
But all the joins seem to follow the same pattern, so my question is, can SQL Server do it automatically? For example I would specify a column called CountryName in some table which looks like a VARCHAR but is actually stored as a CountryID with foreign key, and SQL Server could implicitly add the JOIN whenever necessary.
Is there such a feature in SQL Server or any other SQL database?
You can't do this "automatically". However, you do have a couple of options.
One is to create a view on top of the table that automatically does the join:
create view v_table as
select t.*, c.CountryName
from table t join
country c
on t.countryId = c.countryId;
Alternatively, you could make Country an enumerated type. That would allow it to be accessed as a string but stored as an integer.
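To show the view option end to end, here is a small sketch using SQLite from Python as a stand-in for SQL Server. The customer table, the v_customer view name and the sample rows are assumptions made for the example. Queries filter on CountryName as if it were an ordinary column, while the table only stores countryId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    countryId INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL UNIQUE
);
-- Hypothetical table that stores only the country key.
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    countryId INTEGER NOT NULL REFERENCES country(countryId)
);
-- The view hides the join from everyday queries.
CREATE VIEW v_customer AS
SELECT t.id, t.name, c.CountryName
FROM customer t
JOIN country c ON t.countryId = c.countryId;

INSERT INTO country VALUES (1, 'France'), (2, 'Japan');
INSERT INTO customer VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Chloe', 1);
""")

french = conn.execute(
    "SELECT name FROM v_customer WHERE CountryName = ? ORDER BY name",
    ("France",),
).fetchall()
print(french)
```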

Multiple Wildcard Counts in Same Query

One of my job functions is being responsible for mining and marketing on a large newsletter subscription database. Each one of my newsletters has four columns (newsletter_status, newsletter_datejoined, newsletter_dateunsub, and newsletter_unsubmid).
In addition to these columns, I also have a master unsub column that our customer service dept. can update to accommodate irate subscribers who wish to be removed from all our mailings, and another column called emailaddress_status that gets updated if a hard bounce (or a set number of soft bounces) occurs.
When I pull a count for current valid subscribers for one list I use the following syntax:
select count (*) from subscriber_db
WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL)
AND newsletter_status = 'Y'
and unsub = 'N' and newsletter_datejoined >= '2013-01-01';
What I'd like to have is one query that looks for all columns with %_status, with the aforementioned criteria ordered by current count size.
I'd like for it to look like this:
etc.
I've searched around the web for months looking for something similar, but other than running them in a terminal and exporting the results I've not been able to successfully get them all in one query.
I'm running PostgreSQL 9.2.3.
A proper test case would be each aggregate total matching the counts I get when running the individual queries.
Here's my obfuscated table definition for ordinal placement, column_type, char_limit, and is_nullable.
Your schema is absolutely horrifying:
24 ***_status text YES
25 ***_status text YES
26 ***_status text YES
27 ***_status text YES
28 ***_status text YES
29 ***_status text YES
where I presume the masked *** is something like the name of a publication/newsletter/etc.
You need to read about data normalization or you're going to have a problem that keeps on growing until you hit PostgreSQL's row-size limit.
Since each item of interest is in a different column the only way to solve this with your existing schema is to write dynamic SQL using PL/PgSQL's EXECUTE format(...) USING .... You might consider this as an interim option only, but it's a bit like using a pile driver to jam the square peg into the round hole because a hammer wasn't big enough.
There are no column name wildcards in SQL, like *_status or %_status. Columns are a fixed component of the row, with different types and meanings. Whenever you find yourself wishing for something like this it's a sign that your design needs to be re-thought.
I'm not going to write an example since (a) this is an email marketing company and (b) the "obfuscated" schema is completely unusable for any kind of testing without lots of manual work re-writing it. (In future, please provide CREATE TABLE and INSERT statements for your dummy data, or better yet, a http://sqlfiddle.com/). You'll find lots of examples of dynamic SQL in PL/PgSQL - and warnings about how to avoid the resulting SQL injection risks by proper use of format - with a quick search of Stack Overflow. I've written a bunch in the past.
Please, for your sanity and the sanity of whoever else needs to work on this system, normalize your schema.
You can create a view over the normalized tables to present the old structure, giving you time to adapt your applications. With a bit more work you can even define an INSTEAD OF view trigger (newer Pg versions) or a DO INSTEAD rule (older Pg versions) to make the view updateable and insertable, so your app can't even tell that anything has changed. This comes at a performance cost, though, so it's better to adapt the app if possible.
Start with something like this:
CREATE TABLE subscriber (
id serial primary key,
email_address text not null,
-- please read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
-- for why I merged "fname" and "lname" into one field:
realname text,
-- Store birth month/year as a "date" with a "CHECK" constraint forcing it to be the 1st day
-- of the month. Much easier to work with.
birthmonth date,
CONSTRAINT birthmonth_must_be_day_1 CHECK ( extract(day from birthmonth) = 1),
postcode text,
-- Congratulations! You made "gender" a "text" field to start with, you avoided
-- one of the most common mistakes in schema design, the boolean/binary gender
-- field!
gender text,
-- What's MSO? Should have a COMMENT ON...
mso text,
source text,
-- Maintain these with a trigger. If you want modified to update when any child record
-- changes you can do that with triggers on subscription and reducedfreq_subscription.
created_on timestamp not null default current_timestamp,
last_modified timestamp not null,
-- Use the native PostgreSQL UUID type, after running CREATE EXTENSION "uuid-ossp";
uuid uuid not null,
uuid2 uuid not null,
brand text
-- etc etc
);
CREATE TABLE reducedfreq_subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
-- Suspect this was just a boolean stored as text in your schema, in which case
-- delete it.
reducedfreqsub text,
reducedfreqpref text,
-- plural, might be a comma list? Should be in sub-table ("join table")
-- if so, but without sample data can only guess.
reducedfreqtopics text,
-- date can be NOT NULL since the row won't exist unless they joined
reducedfreq_datejoined date not null,
reducedfreq_dateunsub date
);
CREATE TABLE subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
sub_name text not null,
status text not null,
datejoined date not null,
dateunsub date
);
CREATE TABLE subscriber_activity (
subscriber_id integer not null references subscriber(id),
last_click timestamptz,
last_open timestamptz,
last_hardbounce timestamptz,
last_softbounce timestamptz,
last_successful_mailing timestamptz
);
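With one row per subscription, the per-newsletter counts from the question become a single GROUP BY. The sketch below runs on SQLite from Python rather than PostgreSQL so that it is self-contained. The emailaddress_status and unsub columns on subscriber are assumptions carried over from the question's description (they would sit in the "etc etc" part of the table), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriber (
    id INTEGER PRIMARY KEY,
    email_address TEXT NOT NULL,
    emailaddress_status TEXT,          -- assumed: 'VALID', NULL, or a bounce status
    unsub TEXT NOT NULL DEFAULT 'N'    -- assumed: master unsubscribe flag
);
CREATE TABLE subscription (
    id INTEGER PRIMARY KEY,
    subscriber_id INTEGER NOT NULL REFERENCES subscriber(id),
    sub_name TEXT NOT NULL,
    status TEXT NOT NULL,
    datejoined TEXT NOT NULL,
    dateunsub TEXT
);
INSERT INTO subscriber VALUES
    (1, 'a@example.com', 'VALID',   'N'),
    (2, 'b@example.com', NULL,      'N'),
    (3, 'c@example.com', 'BOUNCED', 'N'),
    (4, 'd@example.com', 'VALID',   'Y');
INSERT INTO subscription (subscriber_id, sub_name, status, datejoined) VALUES
    (1, 'food',   'Y', '2013-02-01'),
    (2, 'food',   'Y', '2013-03-01'),
    (3, 'food',   'Y', '2013-02-01'),
    (1, 'travel', 'Y', '2013-02-01'),
    (2, 'travel', 'N', '2013-03-01'),
    (4, 'travel', 'Y', '2013-02-01'),
    (1, 'homes',  'Y', '2012-06-01');
""")

# Same criteria as the single-list query, applied to every newsletter at once
# and ordered by current count size.
counts = conn.execute("""
    SELECT s.sub_name, COUNT(*) AS current_count
    FROM subscription s
    JOIN subscriber p ON p.id = s.subscriber_id
    WHERE (p.emailaddress_status = 'VALID' OR p.emailaddress_status IS NULL)
      AND s.status = 'Y'
      AND p.unsub = 'N'
      AND s.datejoined >= '2013-01-01'
    GROUP BY s.sub_name
    ORDER BY current_count DESC
""").fetchall()
print(counts)
```

Newsletters with no qualifying subscribers simply do not appear in the result. A LEFT JOIN from a table of newsletter names would be needed to show them with a zero.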
To call it merely "horrifying" shows a great deal of tact and kindness on your part. Thank You. :) I inherited this schema only recently (which was originally created by the folks at StrongMail).
I have a full relational DB re-arch project on my roadmap this year, and the sample normalization is very much in line with what I'd been working on. Very interesting insight on realname; I hadn't really thought about that. I suppose the only reason StrongMail had it broken out was for first-name email personalization.
MSO is multiple systems operator (cable company). We're a large lifestyle media company, and the newsletters we produce are on food, travel, homes and gardening.
I'm creating a Fiddle for this - I'm new here so going forward I'll be more mindful of what you guys need to be able to help. Thank you!

Building a Relationship Between Attributes Or Columns Of Bits "Flatting it out"

I have the following SQL design issue. The code below might look a little much, but basically I have a table of cars and another table of attributes the car could have. It makes complete sense to me to structure the attributes for an object using a linking table, #CarSupportedAttributes. Recently I've been tasked with doing something similar, but using one table that has each of the attributes as columns, making it "flat". Similar to below:
[CarId][Name][Manual Transmission][Sunroof][Automatic Transmission][AWD]
I am told that doing so will boost the speed of my queries, but it's starting to turn into a nightmare. In C# I have enumerated values for each of the car's attributes (1 = Manual Transmission), so using the non-"flat" version I am able to pull off a query pretty quickly, as the SQL code below shows. Since I am being pushed to make the table flat for speed, the only way I can think of is to take the enumerated value and build it into the WHERE clause, using a CASE statement for every 1, 2, 3 and selecting off a column name.
To me it just makes more sense to organize the data like below. What if a new attribute about a car is needed, say "HEMI Engine"? Not all cars are going to have this; in fact it's going to be a rare case. But the way I am told to design is to keep the table "flat", so now I would be adding a column called "Hemi Engine" to my table, instead of adding a row in my CarAttributes and then only adding rows for the cars that have that as true.
Below is a snippet of the way I currently see approaching this problem, as opposed to doing a "flat" table (table with mostly columns of bits).
Question: What design makes more sense? Which is more maintainable? Am I completely crazy for thinking below is a better approach, and why?
CREATE TABLE #Car
(
CarId INT,
Name VARCHAR(250)
)
INSERT INTO #Car VALUES (1, 'Fusion')
INSERT INTO #Car VALUES (2, 'Focus')
CREATE TABLE #CarAttributes
(
AttributeId INT,
Name VARCHAR(250)
)
INSERT INTO #CarAttributes VALUES (1, 'Manual Transmission')
INSERT INTO #CarAttributes VALUES (2, 'SunRoof')
SELECT * FROM #CarAttributes
CREATE TABLE #CarSupportedAttributes
(
AttributeId INT,
CarId INT
)
INSERT INTO #CarSupportedAttributes VALUES (1,2)
--Determine if A Focus has a manual transmission
SELECT * FROM #Car c
INNER JOIN #CarSupportedAttributes csa
ON csa.CarId = c.CarId
INNER JOIN #CarAttributes ca
ON ca.AttributeId = csa.AttributeId
WHERE c.Name = 'Focus'
AND ca.AttributeId = 1
Your approach is known as Entity-Attribute-Value, or EAV (yours is slightly modified, since in your model the presence of the attribute on the entity is the value, but the concept is the same).
EAV is usually considered an anti-pattern, but it can be appropriate in some cases. Basically, if either...
Your list of attributes is large and any given entity (car) will have only a small percentage of the total attributes
Your list of attributes is subject to frequent user change and they represent only data and not anything structural about the entity
Then EAV can be an appropriate choice. I can't answer either of those questions for you (though I have my suspicions), but it does seem like it might be appropriate in your case.
The other option, which is likely what most 6NF proponents would suggest, would be to have a table per attribute, like CarSunroof or CarManualTransmission. This would solve the first issue and the requirement of changing a table's definition whenever a new attribute is added, but would not address the issue of the user being able to change it.
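To make the trade-off concrete, here is a sketch of the linking-table lookup with the attribute passed as a bound parameter. It uses Python and SQLite in place of C# and SQL Server; the CarAttribute enum and the has_attribute helper are hypothetical names for illustration. With the flat layout the same helper would have to map each enum value to a column name and build the SQL text, because column names cannot be bound as parameters.

```python
import sqlite3
from enum import IntEnum


class CarAttribute(IntEnum):
    """Mirrors the AttributeId values stored in CarAttributes."""
    MANUAL_TRANSMISSION = 1
    SUNROOF = 2


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Car (CarId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CarAttributes (AttributeId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CarSupportedAttributes (
    AttributeId INTEGER NOT NULL REFERENCES CarAttributes(AttributeId),
    CarId INTEGER NOT NULL REFERENCES Car(CarId),
    PRIMARY KEY (CarId, AttributeId)
);
INSERT INTO Car VALUES (1, 'Fusion'), (2, 'Focus');
INSERT INTO CarAttributes VALUES (1, 'Manual Transmission'), (2, 'SunRoof');
INSERT INTO CarSupportedAttributes VALUES (1, 2);
""")


def has_attribute(car_name: str, attribute: CarAttribute) -> bool:
    """True if a row links the named car to the given attribute."""
    row = conn.execute(
        """SELECT 1
           FROM Car c
           JOIN CarSupportedAttributes csa ON csa.CarId = c.CarId
           WHERE c.Name = ? AND csa.AttributeId = ?""",
        (car_name, int(attribute)),
    ).fetchone()
    return row is not None


print(has_attribute("Focus", CarAttribute.MANUAL_TRANSMISSION))
```

Adding "HEMI Engine" in this layout is one new enum member and one new CarAttributes row, with no schema change.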

What is the preferred way of saving dynamic lists in database?

In our application a user can create different lists (like SharePoint); for example, a user can create a list of cars (name, model, brand) and a list of students (name, dob, address, nationality), etc.
Our application should be able to query on different columns of the list so we can't just serialize each row and save it in one row.
Should I create a new table at runtime for each newly created list? If this was the best solution then probably Microsoft SharePoint would have done it as well I suppose?
Should I use the following schema
Lists (Id, Name)
ListColumns (Id, ListId, Name)
ListRows (Id, ListId)
ListData(RowId, ColumnId, Value)
Though a single row will create as many rows in list data table as there are columns in the list, this just doesn't feel right.
Have you dealt with this situation? How did you handle it in database?
What you did is called EAV (Entity-Attribute-Value Model).
For a list with 3 columns and 1000 entries:
1 record in Lists
3 records in ListColumns
and 3000 Entries in ListData
This is fine. I'm not a fan of creating tables on-the-fly because it could mess up your database and you would have to "generate" your SQL queries dynamically. I would get a strange feeling when users could CREATE/DROP/ALTER Tables in my database!
Another nice feature of the EAV model is that you could merge two lists easily without dropping and altering a table.
Edit:
I think you need another table called ListRows that tells you which ListData records belong together in a row!
Well, I've experienced something like this before. I don't want to share the actual table schema, so let's do some thought exercises using some of the suggested table structures:
Let's have a lists table containing a list of all my lists
Let's also have a columns table containing the metadata (column names)
Now we need a values table which contains the column values
We also need a rows table which contains a list of all the rows, otherwise it gets very difficult to work out how many rows there actually are
To keep things simple, let's just make everything a string (VARCHAR) and have a go at coming up with some queries:
Counting all the rows in a table
SELECT COUNT(*) FROM [rows]
JOIN [lists]
ON [rows].list_id = [Lists].id
WHERE [Lists].name = 'Cars'
Hmm, not too bad, compared to:
SELECT * FROM [Cars]
Inserting a row into a table
BEGIN TRANSACTION
DECLARE @row_id INT
DECLARE @list_id INT
SELECT @list_id = id FROM [lists] WHERE name = 'Cars'
INSERT INTO [rows] (list_id) VALUES (@list_id)
SELECT @row_id = @@IDENTITY
DECLARE @column_id INT
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Make'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Rover')
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Model'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Metro')
COMMIT TRANSACTION
Um, starting to get a little bit hairy compared to:
INSERT INTO [Cars] ([Make], [Model]) VALUES ('Rover', 'Metro')
Simple queries
I'm now getting bored of constructing tediously complex SQL statements, so maybe you can have a go at coming up with equivalent queries for the following statements:
SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
WHERE [Owners].Age > 50
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
JOIN [Addresses] ON [Addresses].id = [Owners].address_id
WHERE [Addresses].City = 'London'
I hope you are beginning to get the idea...
In short - I've experienced this before and I can assure you that creating a database inside a database in this way is definitely a Bad Thing.
If you need to do anything but the most basic querying on these lists (and literally I mean "Can I have all the items in this list please?"), you should try and find an alternative.
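For reference, the first of those queries (the Model of every Rover) might look like the sketch below against the lists/columns/rows/values layout. It runs on SQLite from Python so it is self-contained; the table names are quoted because several of them are SQL keywords, and the sample data is invented. Every column that is read or filtered costs two extra joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "lists"   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE "columns" (id INTEGER PRIMARY KEY, list_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE "rows"    (id INTEGER PRIMARY KEY, list_id INTEGER NOT NULL);
CREATE TABLE "values"  (column_id INTEGER NOT NULL, row_id INTEGER NOT NULL, value TEXT);

INSERT INTO "lists" VALUES (1, 'Cars');
INSERT INTO "columns" VALUES (1, 1, 'Make'), (2, 1, 'Model');
INSERT INTO "rows" VALUES (1, 1), (2, 1);
INSERT INTO "values" VALUES (1, 1, 'Rover'), (2, 1, 'Metro'),
                            (1, 2, 'Ford'),  (2, 2, 'Focus');
""")

# Equivalent of: SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'
models = conn.execute("""
    SELECT model_v.value
    FROM "lists" l
    JOIN "rows" r          ON r.list_id = l.id
    JOIN "columns" make_c  ON make_c.list_id = l.id AND make_c.name = 'Make'
    JOIN "values" make_v   ON make_v.column_id = make_c.id AND make_v.row_id = r.id
    JOIN "columns" model_c ON model_c.list_id = l.id AND model_c.name = 'Model'
    JOIN "values" model_v  ON model_v.column_id = model_c.id AND model_v.row_id = r.id
    WHERE l.name = 'Cars' AND make_v.value = 'Rover'
""").fetchall()
print(models)
```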
As long as each user pretty much has their own database I'll definitely recommend the CREATE TABLE approach. Even if they don't I'd still recommend that you at least consider it.
Perhaps creating a list could simply involve CREATE TABLE statements for those entities/lists?
It sounds like the db structure or schema can change at runtime, or at the user's command, so perhaps something like this might help?
User wants to create a new list of an entity never seen before. Call it Computer.
User defines the attributes (screensize, CpuSpeed, AmountRAM, NumberOfCores)
System lets the user create it in the UI
System generally treats all attributes as strings, unless it can tell that all supplied values are dates or numbers
System builds the CREATE scripts and executes them against the DB
System inserts the data that the user defined into that new table
Properly coded, we're working with the requirements given: let users create new entities. There was no mention of scale here. Of course, this requires all input to be sanitized, queries parameterized, actions logged, etc.
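A rough sketch of what "properly coded" could mean here, in Python with SQLite standing in for the real database. The helper names, the identifier pattern and the list of allowed types are all assumptions for illustration. The key point is that table and column names cannot be passed as bound parameters, so they are checked against a strict pattern, while the data values always go in as parameters.

```python
import re
import sqlite3

# Hypothetical whitelist rules; tighten or extend them for a real system.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def check_identifier(name):
    """Validate a user-supplied table or column name and return it quoted."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f'"{name}"'


def create_list_table(conn, table, columns):
    """Create a table for a user-defined list; columns maps name -> type."""
    parts = []
    for name, col_type in columns.items():
        if col_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {col_type!r}")
        parts.append(f"{check_identifier(name)} {col_type}")
    conn.execute(f"CREATE TABLE {check_identifier(table)} ({', '.join(parts)})")


def insert_row(conn, table, row):
    """Insert one row; the values themselves go in as bound parameters."""
    cols = ", ".join(check_identifier(c) for c in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {check_identifier(table)} ({cols}) VALUES ({marks})",
        tuple(row.values()),
    )


conn = sqlite3.connect(":memory:")
create_list_table(conn, "Computer", {
    "screensize": "REAL", "CpuSpeed": "REAL",
    "AmountRAM": "INTEGER", "NumberOfCores": "INTEGER",
})
insert_row(conn, "Computer", {
    "screensize": 15.6, "CpuSpeed": 2.4, "AmountRAM": 16, "NumberOfCores": 8,
})
computers = conn.execute(
    'SELECT screensize, NumberOfCores FROM "Computer"'
).fetchall()
print(computers)
```

A name such as "x; DROP TABLE Computer" is rejected by check_identifier before any SQL is built.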
The negative comment below doesn't actually give any good reasons, but creates a bit of FUD. I'd be interested in addressing any concerns with this potential solution. We haven't heard about scale, security, performance, or usage (internal LAN vs. internet).
You should absolutely not dynamically create tables when your users create lists. That isn't how databases are meant to work.
Your schema is correct, and the pluralization is, in my opinion, also correct, though I would remove the camel case and call them lists, list_columns, list_rows and list_data.
I would further improve upon your schema by skipping rows and columns tables, they serve no purpose. Simply have a row/column number attached to each cell, and keep things sparse: Don't bother holding empty cells in the database. You retain the ability to query/sort based on row/column, your queries will be (potentially very much) faster because the number of list_cells will be reduced, and you won't have to do any crazy joining to link your data back to its table.
Here is the complete schema:
create table lists (
id int primary key,
name varchar(25) not null
);
create table list_cells (
id int primary key,
list_id int not null references lists(id)
on delete cascade on update cascade,
row int not null,
col int not null,
data varchar(25) not null
);
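As a quick illustration of how the sparse cell layout reads back, here is a sketch in Python over SQLite (types adapted to SQLite, sample cells invented, load_list a hypothetical helper). Empty cells are simply absent from the table and from the result.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lists (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE list_cells (
    id INTEGER PRIMARY KEY,
    list_id INTEGER NOT NULL REFERENCES lists(id)
        ON DELETE CASCADE ON UPDATE CASCADE,
    "row" INTEGER NOT NULL,
    col INTEGER NOT NULL,
    data TEXT NOT NULL
);
INSERT INTO lists VALUES (1, 'Cars');
-- Row 1 has no value in column 1, so no cell is stored for it.
INSERT INTO list_cells (list_id, "row", col, data) VALUES
    (1, 0, 0, 'Rover'), (1, 0, 1, 'Metro'),
    (1, 1, 0, 'Ford');
""")


def load_list(list_id):
    """Rebuild a list as {row number: {column number: data}}."""
    grid = defaultdict(dict)
    cells = conn.execute(
        'SELECT "row", col, data FROM list_cells '
        'WHERE list_id = ? ORDER BY "row", col',
        (list_id,),
    )
    for row, col, data in cells:
        grid[row][col] = data
    return dict(grid)


cars = load_list(1)
print(cars)
```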
It sounds like you might have Sharepoint already deployed in your environment.
Consider integrating your application with Sharepoint, and have it be your datastore. No need to recreate all the things you like about Sharepoint, when you could leverage it.
It'd take a bit of configuring, but you could call SP web services to CRUD your list data for you.
inserting list data into Sharepoint via web services
reading SP lists via web services
Sharepoint 2010 can also expose lists via OData, which would be simple to consume from any application.

Records linked to any table?

Hi, I'm struggling a bit with this and could use some ideas...
Say my database has the following tables ;
Customers
Suppliers
SalesInvoices
PurchaseInvoices
Currencies
etc etc
I would like to be able to add a "Notes" record to ANY type of record
The Notes table would look like this
NoteID Int (PK)
NoteFK Int
NoteFKType Varchar(3)
NoteText varchar(100)
NoteDate Datetime
Where NoteFK is the PK of a customer or supplier etc and NoteFKType says what type of record the note is against
Now I realise that I cannot add a FK which references multiple tables without NoteFK needing to be present in all tables.
So how would you design the above ?
The note FK needs to be in any of the above tables
Cheers,
Daniel
You have to accept the limitation that you cannot teach the database about this foreign key constraint. So you will have to do without the integrity checking (and cascading deletes).
Your design is fine.
It is easily extensible to extra tables, you can have multiple notes per entity, and the target tables do not even need to be aware of the notes feature.
An advantage that this design has over using a separate notes table per entity table is that you can easily run queries across all notes, for example "most recent notes", or "all notes created by a given user".
As for the argument of that table growing too big, splitting it into say five tables will shrink each table to about a fifth of the size, but this will not make any difference for index-based access. Databases are built to handle big tables (as long as they are properly indexed).
I think your design is OK, if you can accept the fact that the db system will not check whether a note is referencing an existing entity in another table or not. It's the only design I can think of that doesn't require duplication and is scalable to more tables.
The way you designed it, when you add another entity type that you'd like to have notes for, you won't have to change your model. Also, you don't have to include any additional columns in your existing model, or additional tables.
To ensure data integrity, you can create set of triggers or some software solution that will clean notes table once in a while.
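A periodic cleanup of that kind could look like the sketch below, written in Python over SQLite so it can be run directly. The key column names, the 'CUS'/'SUP' type codes and the delete_orphan_notes helper are hypothetical; each additional parent table needs its own branch in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Notes (
    NoteID INTEGER PRIMARY KEY,
    NoteFK INTEGER NOT NULL,
    NoteFKType TEXT NOT NULL,   -- assumed codes: 'CUS' or 'SUP'
    NoteText TEXT,
    NoteDate TEXT
);
INSERT INTO Customers VALUES (1, 'Acme');
INSERT INTO Suppliers VALUES (7, 'Bolt Co');
INSERT INTO Notes VALUES
    (1, 1, 'CUS', 'Good payer',             '2010-01-01'),
    (2, 2, 'CUS', 'Customer since deleted', '2010-01-02'),
    (3, 7, 'SUP', 'Slow delivery',          '2010-01-03');
""")


def delete_orphan_notes():
    """Remove notes whose target row no longer exists; return how many."""
    cur = conn.execute("""
        DELETE FROM Notes
        WHERE (NoteFKType = 'CUS'
               AND NoteFK NOT IN (SELECT CustomerID FROM Customers))
           OR (NoteFKType = 'SUP'
               AND NoteFK NOT IN (SELECT SupplierID FROM Suppliers))
    """)
    return cur.rowcount


removed = delete_orphan_notes()
remaining = [r[0] for r in conn.execute("SELECT NoteID FROM Notes ORDER BY NoteID")]
print(removed, remaining)
```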
I would think twice before doing what you suggest. It might seem simple and elegant in the short term, but if you are truly interested in data integrity and performance, then having separate notes tables for each parent table is the way to go. Over the years, I've approached this problem using the solutions found in the other answers (triggers, GUIDs, etc.). I've come to the conclusion that the added complexity and loss of performance aren't worth it. By having separate note tables for each parent table, with appropriate foreign key constraints, lookups and joins will be simple and fast. When combining the related items into one table, join syntax becomes ugly and your notes table will grow to be huge and slow.
I agree with Michael McLosky, to a degree.
The question in my mind is: What is the technical cost of having multiple notes tables?
In my mind, it is preferable to consolidate the same functionality into a single table. It also makes reporting and other further development simpler. Not to mention keeping the list of tables smaller and easier to manage.
It's a balancing act; you need to try to predetermine both the benefits and the costs of doing something like this. My -personal- preference is database referential integrity. Application management of integrity should, in my opinion, be limited to business logic. The database should ensure the data is always consistent and valid...
To actually answer your question...
The option I would use is a check constraint using a User Defined Function to check the values. This works in M$ SQL Server...
CREATE TABLE Test_Table_1 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_2 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_3 (fk_id INT, table_name VARCHAR(64))
GO
CREATE FUNCTION id_exists (@id INT, @table_name VARCHAR(64))
RETURNS INT
AS
BEGIN
-- BEGIN/END blocks keep each ELSE attached to the table-name test
IF (@table_name = 'Test_Table_1')
BEGIN
IF EXISTS(SELECT * FROM Test_Table_1 WHERE id = @id)
RETURN 1
END
ELSE IF (@table_name = 'Test_Table_2')
BEGIN
IF EXISTS(SELECT * FROM Test_Table_2 WHERE id = @id)
RETURN 1
END
-- unknown table name, or no matching row
RETURN 0
END
GO
ALTER TABLE Test_Table_3 WITH CHECK ADD CONSTRAINT
CK_Test_Table_3 CHECK ((dbo.id_exists(fk_id,table_name)=(1)))
GO
ALTER TABLE [dbo].[Test_Table_3] CHECK CONSTRAINT [CK_Test_Table_3]
GO
INSERT INTO Test_Table_1 SELECT 1
GO
INSERT INTO Test_Table_1 SELECT 2
GO
INSERT INTO Test_Table_1 SELECT 3
GO
INSERT INTO Test_Table_2 SELECT 1
GO
INSERT INTO Test_Table_2 SELECT 2
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_1'
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_2'
GO
In that example, the final insert statement would fail.
You can get the FK referential integrity, at the cost of having one column in the notes table for each other table.
create table Notes (
id int PRIMARY KEY,
note varchar (whatever),
customer_id int NULL REFERENCES Customer (id),
product_id int NULL REFERENCES Product (id)
)
Then you'll need a constraint to make sure that you have only one of the columns set.
Or maybe not, maybe you might want a note to be able to be associated with both a customer and a product. Up to you.
This design would require adding a new column to Notes if you want to add another referencing table.
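A sketch of that "only one column set" constraint, run on SQLite from Python (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; the try_insert helper and sample rows are made up). The CHECK counts the non-NULL reference columns with CASE expressions, which should carry over to other databases with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY);
CREATE TABLE Product (id INTEGER PRIMARY KEY);
CREATE TABLE Notes (
    id INTEGER PRIMARY KEY,
    note TEXT NOT NULL,
    customer_id INTEGER NULL REFERENCES Customer (id),
    product_id INTEGER NULL REFERENCES Product (id),
    -- exactly one of the reference columns must be set
    CHECK (
        (CASE WHEN customer_id IS NOT NULL THEN 1 ELSE 0 END) +
        (CASE WHEN product_id  IS NOT NULL THEN 1 ELSE 0 END) = 1
    )
);
INSERT INTO Customer VALUES (1);
INSERT INTO Product VALUES (1);
""")


def try_insert(note, customer_id, product_id):
    """Return True if the database accepts the note, False if it rejects it."""
    try:
        conn.execute(
            "INSERT INTO Notes (note, customer_id, product_id) VALUES (?, ?, ?)",
            (note, customer_id, product_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("customer note", 1, None))     # one reference set
print(try_insert("both set", 1, 1))             # violates the CHECK
print(try_insert("unknown customer", 99, None)) # violates the foreign key
```

If a note should be allowed to belong to both a customer and a product, change the comparison to >= 1.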
You could add a GUID field to the Customers, Suppliers, etc. tables. Then in the Notes table, change the foreign key to reference that GUID.
This does not help for data integrity. But it makes M-to-N relationships easily possible to any number of tables and it saves you from having to define a NoteFKType column in the Notes table.
You can easily implement "multi"-foreign key with triggers. Triggers will give you very flexible mechanism and you can do any integrity checks you wish.
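As a sketch of the trigger approach, here it is in SQLite driven from Python. The trigger names, key columns and 'CUS'/'SUP' type codes are assumptions, and SQL Server trigger syntax differs (it works on the inserted/deleted pseudo-tables). One trigger rejects notes that point at a missing row, and another emulates a cascading delete for one parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY);
CREATE TABLE Notes (
    NoteID INTEGER PRIMARY KEY,
    NoteFK INTEGER NOT NULL,
    NoteFKType TEXT NOT NULL,
    NoteText TEXT
);
-- Reject notes whose target row does not exist (or whose type is unknown).
CREATE TRIGGER Notes_check_target BEFORE INSERT ON Notes
BEGIN
    SELECT RAISE(ABORT, 'note target does not exist')
    WHERE NOT (
        (NEW.NoteFKType = 'CUS' AND EXISTS
            (SELECT 1 FROM Customers WHERE CustomerID = NEW.NoteFK))
        OR
        (NEW.NoteFKType = 'SUP' AND EXISTS
            (SELECT 1 FROM Suppliers WHERE SupplierID = NEW.NoteFK))
    );
END;
-- Emulate ON DELETE CASCADE for one parent table.
CREATE TRIGGER Customers_delete_notes AFTER DELETE ON Customers
BEGIN
    DELETE FROM Notes WHERE NoteFKType = 'CUS' AND NoteFK = OLD.CustomerID;
END;
INSERT INTO Customers VALUES (1);
INSERT INTO Suppliers VALUES (7);
""")


def add_note(fk, fk_type, text="note"):
    """Return True if the note was accepted, False if a trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO Notes (NoteFK, NoteFKType, NoteText) VALUES (?, ?, ?)",
            (fk, fk_type, text),
        )
        return True
    except sqlite3.DatabaseError:
        return False


accepted = [add_note(1, "CUS"), add_note(2, "CUS"), add_note(7, "SUP")]
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
notes_left = conn.execute("SELECT NoteFK, NoteFKType FROM Notes").fetchall()
print(accepted, notes_left)
```

A complete version would also need triggers for updates to Notes and for deletes on every other parent table.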
Why don't you do it the other way around and have a foreign key in the other tables (Customer, Supplier, etc.) to NotesID? This way you have a one-to-one mapping.