Generating unique foreign key tuples - sql

I have a table like this:
CREATE Table tblClassProperty (
ClassPropertyID integer IDENTITY(1,1) PRIMARY KEY,
Active bit DEFAULT 0 NOT NULL,
DEL bit DEFAULT 0 NOT NULL,
FINAL bit DEFAULT 0 NOT NULL,
Locked bit DEFAULT 0 NOT NULL,
_Project integer FOREIGN KEY REFERENCES tblProject(ProjectID),
_Property integer FOREIGN KEY REFERENCES tblProperty(PropertyID),
_Class integer FOREIGN KEY REFERENCES tblClass(ClassID)
CONSTRAINT CClassProperty UNIQUE(_Project,_Property,_Class))
I am migrating data from an old table (in another DB) into this one. For most of the columns I am just copying the data from the other table. The foreign key columns don't exist in the old DB; they are just random values linked with the corresponding tables in the new DB. For your solution please take it for granted that the values for the columns _Project, _Property and _Class need to be generated the way they are now (with the SELECT TOP 1 ...).
INSERT INTO ClassDB..tblClassProperty (Active, DEL, FINAL, Locked,
_Project, _Property, _Class)
SELECT Active, DEL, FINAL, Locked,
(SELECT TOP 1 ProjectID FROM ClassDB..tblProject ORDER BY NEWID()),
(SELECT TOP 1 PropertyID FROM ClassDB..tblProperty ORDER BY NEWID()),
(SELECT TOP 1 ClassID FROM ClassDB..tblClass ORDER BY NEWID())
FROM ClassDB_Access..tblClassProperty
Now, the INSERT INTO script won't run because of the unique constraint, and the constraint must stay as it is.
My question: I need a script that copies the existing columns, generates the foreign keys as above, BUT the generated values for the tuple (_Project, _Property, _Class) will be unique as per constraint.
The script has to run in SQL Server 2008 R2, and in the future also in Oracle.

You might cross join all three tables, assign random row numbers to the resulting combinations, and join them back to tblClassProperty, itself augmented with a row_number:
; with randomized as (
select a.ProjectID,
b.PropertyID,
c.ClassID,
row_number() over(order by newid()) rn
from ClassDB..tblProject a
cross join ClassDB..tblProperty b
cross join ClassDB..tblClass c
),
ordered as (
select Active, DEL, FINAL, Locked,
row_number() over (order by newid()) rn
from ClassDB_Access..tblClassProperty
)
INSERT INTO ClassDB..tblClassProperty (Active, DEL, FINAL, Locked,
_Project, _Property, _Class)
select ordered.Active, ordered.DEL, ordered.FINAL, ordered.Locked,
randomized.projectID, randomized.PropertyID, randomized.ClassID
from ordered
inner join randomized
on ordered.rn = randomized.rn
This will not work in Oracle without reshuffling cte parts - or you might rewrite this using derived tables for instant compatibility.
Disclaimer: Other than syntax check this is not tested. Performance might be awful.
Edit: Forgot to mention that, in case of duplicate values in IDs, you might want to do distinct before cross joins.
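The idea can be tried out on a small scale. The following is only a sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (so RANDOM() replaces NEWID()), and with hypothetical miniature tables, including the made-up name oldClassProperty for the legacy table:

```python
import sqlite3

# Hypothetical miniature versions of the lookup tables and the legacy table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProject  (ProjectID  INTEGER PRIMARY KEY);
    CREATE TABLE tblProperty (PropertyID INTEGER PRIMARY KEY);
    CREATE TABLE tblClass    (ClassID    INTEGER PRIMARY KEY);
    CREATE TABLE oldClassProperty (Active INTEGER, DEL INTEGER);
    CREATE TABLE tblClassProperty (
        Active INTEGER, DEL INTEGER,
        _Project INTEGER, _Property INTEGER, _Class INTEGER,
        UNIQUE (_Project, _Property, _Class)
    );
    INSERT INTO tblProject  VALUES (1), (2);
    INSERT INTO tblProperty VALUES (10), (20);
    INSERT INTO tblClass    VALUES (100), (200);
    INSERT INTO oldClassProperty VALUES (1, 0), (0, 0), (1, 1), (0, 1), (1, 0);
""")

# Every combination gets a distinct random position, so joining on the
# position can never hand the same tuple to two source rows.
conn.execute("""
    WITH randomized AS (
        SELECT ProjectID, PropertyID, ClassID,
               ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn
        FROM tblProject CROSS JOIN tblProperty CROSS JOIN tblClass
    ),
    ordered AS (
        SELECT Active, DEL, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn
        FROM oldClassProperty
    )
    INSERT INTO tblClassProperty (Active, DEL, _Project, _Property, _Class)
    SELECT o.Active, o.DEL, r.ProjectID, r.PropertyID, r.ClassID
    FROM ordered AS o
    JOIN randomized AS r ON o.rn = r.rn
""")

rows = conn.execute(
    "SELECT _Project, _Property, _Class FROM tblClassProperty"
).fetchall()
print(len(rows), len(set(rows)))  # 5 rows inserted, 5 distinct tuples
```

Note that the inner join silently drops source rows once the combinations run out, so the approach only copies everything if there are at least as many combinations as rows to migrate.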

Related

Sql 2 primary keys remove rows where 1 primary key is duplicated

I have a SQL table (MariaDB) with 2 primary keys. I want to remove the rows where the first primary key is duplicated. (Yes, I know that primary keys can't be duplicated, but with 2 primary keys they work like a tuple, so it is possible, though in my case not wanted.) Example:
id(pk) | name(pk) | smth | smth else
-------+----------+------+-----------
1      | a        | 1234 | qwerty
1      | b        | 4567 | asdf
and I want to remove the 2nd row because the id key is duplicated.
tried:
almost any delete query with row count
the query i tried last:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS RN
FROM product_names
)
DELETE FROM CTE WHERE RN<>1
To clarify the definition: you cannot have two primary keys in a table. Rather, the primary key of your table is composed of two columns.
To improve your schema, you may want to alter your table so that the primary key is based only on the first column. However, depending on the database engine, it can be useful to keep your composite key, since it may speed up queries which retrieve the second column only from the primary key. In that case you may want to add a unique constraint on the first column of your primary key.
To clean up your table you can use the following, but beware: it has no filter on the second column, meaning any row with the same id can be deleted depending on its order.
WITH duplicated AS (
SELECT id, name, row_number() OVER (PARTITION BY id ORDER BY name) AS rn
FROM product_names
)
DELETE FROM product_names
WHERE (id, name) IN (SELECT id, name FROM duplicated WHERE rn > 1);
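To see which rows such a cleanup keeps, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than MariaDB, so the syntax accepted by your server may differ. The third row (id 2) is hypothetical and only shows that non-duplicated ids are left alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_names (
        id INTEGER, name TEXT, smth INTEGER, smth_else TEXT,
        PRIMARY KEY (id, name)
    );
    -- The first two rows are from the question; the third is made up.
    INSERT INTO product_names VALUES
        (1, 'a', 1234, 'qwerty'),
        (1, 'b', 4567, 'asdf'),
        (2, 'c', 8901, 'zxcv');
""")

# Number the rows within each id by name; every row after the first
# one in its group counts as a duplicate and is deleted.
conn.execute("""
    WITH duplicated AS (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS rn
        FROM product_names
    )
    DELETE FROM product_names
    WHERE (id, name) IN (SELECT id, name FROM duplicated WHERE rn > 1)
""")

remaining = conn.execute(
    "SELECT id, name FROM product_names ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'a'), (2, 'c')]
```

Ordering by name inside the window makes the choice deterministic: the row with the lowest name per id survives.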

query without using subquery or explicit join

I need to rewrite this query without using a subquery or an explicit join. Help please, I've been looking around for a while.
SELECT snum, pnum, shipdate
FROM supply as b
WHERE EXISTS (SELECT pname, pnum FROM parts as a WHERE b.pnum = a.pnum);
I believe you've been given a trick question. The answer is this.
SELECT snum, pnum, shipdate
FROM supply
The reason is that the condition you're checking for should be impossible in a well designed database.
Let's have a look at what the original query is doing.
SELECT snum, pnum, shipdate
FROM supply as b
WHERE EXISTS (
SELECT pname, pnum FROM parts as a WHERE b.pnum = a.pnum
);
It's getting every row in supply where there's a corresponding part in parts. How do you do this in a query without a join? You shouldn't have to do it in the first place. Instead you should rely on referential integrity.
Referential integrity is a property of good table design that says all references are valid. There should be no need to check that each part in supply exists in parts because such a condition should be impossible. You do this with a well designed schema with appropriate use of foreign key and not null constraints.
(My examples are done in Postgres. The syntax for your database may vary.)
create table parts(
pnum integer primary key,
pname text not null
);
create table supply(
snum integer primary key,
pnum integer references parts(pnum) not null,
shipdate date not null
);
By declaring supply.pnum as references parts(pnum) we have told the database this is a foreign key and there must be a corresponding row in parts. Adding not null guarantees each row in supply must supply a valid part. The database enforces these constraints automatically.
(Note that MySQL takes a little more convincing to enforce a foreign key constraint. Because MySQL is so non-standard one can pick up bad habits learning on it. Use Postgres or even SQLite instead.)
You can also add the constraints to an existing table using alter table.
test=> alter table supply alter pnum set not null;
ALTER TABLE
test=> alter table supply add constraint pnum_fk foreign key (pnum) references parts(pnum);
ALTER TABLE
For example, let's say we have these parts.
test=> select * from parts;
pnum | pname
------+---------
1 | flange
2 | thingy
3 | whatsit
We can insert a row into supply for one of those parts.
test=> insert into supply (pnum, shipdate) values (3, '2018-02-03');
INSERT 0 1
But if we try to insert a part that doesn't exist, we get an error.
test=> insert into supply (pnum, shipdate) values (99, '2018-02-03');
ERROR: insert or update on table "supply" violates foreign key constraint "supply_pnum_fkey"
DETAIL: Key (pnum)=(99) is not present in table "parts".
Or one with a null part number...
test=> insert into supply (pnum, shipdate) values (null, '2018-02-03');
ERROR: null value in column "pnum" violates not-null constraint
DETAIL: Failing row contains (1, null, 2018-02-03).
The condition you're testing for is now impossible. There's no need for it. So the answer is:
SELECT snum, pnum, shipdate
FROM supply
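The same argument can be checked mechanically. This is a sketch only, assuming SQLite via Python's sqlite3 module instead of Postgres (SQLite needs the foreign_keys pragma switched on before it enforces anything), with a second, hypothetical supply row added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE parts (
        pnum  INTEGER PRIMARY KEY,
        pname TEXT NOT NULL
    );
    CREATE TABLE supply (
        snum     INTEGER PRIMARY KEY,
        pnum     INTEGER NOT NULL REFERENCES parts(pnum),
        shipdate TEXT NOT NULL
    );
    INSERT INTO parts VALUES (1, 'flange'), (2, 'thingy'), (3, 'whatsit');
    INSERT INTO supply (pnum, shipdate)
        VALUES (3, '2018-02-03'), (1, '2018-02-04');
""")

# A supply row for a part that does not exist must be refused.
try:
    conn.execute("INSERT INTO supply (pnum, shipdate) VALUES (99, '2018-02-03')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

with_exists = conn.execute("""
    SELECT snum, pnum, shipdate FROM supply AS b
    WHERE EXISTS (SELECT 1 FROM parts AS a WHERE b.pnum = a.pnum)
    ORDER BY snum
""").fetchall()
plain = conn.execute(
    "SELECT snum, pnum, shipdate FROM supply ORDER BY snum"
).fetchall()
print(orphan_rejected, with_exists == plain)  # True True
```

With the constraint in place, the EXISTS filter cannot remove any row, which is why the plain SELECT returns the same result.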
One way is INTERSECT (the column list is limited to the columns common to both tables):
SELECT pnum
FROM supply
INTERSECT
SELECT pnum
FROM parts;
Using SEMIJOIN:
SELECT b.snum, b.pnum, b.shipdate
FROM supply as b
SEMIJOIN parts as a
ON b.pnum = a.pnum;
The subquery can be replaced by an INNER JOIN, as follows:
SELECT b.snum, b.pnum, b.shipdate
FROM
supply as b
INNER JOIN parts as a ON b.pnum = a.pnum
GROUP BY b.snum, b.pnum, b.shipdate
You could also go for an implicit join, but I would not recommend it, as it is less readable and has fallen out of favor:
SELECT b.snum, b.pnum, b.shipdate
FROM
supply as b,
parts as a
WHERE b.pnum = a.pnum
GROUP BY b.snum, b.pnum, b.shipdate
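The GROUP BY in the two join variants matters whenever parts.pnum is not unique. The sketch below, assuming SQLite through Python's sqlite3 module and entirely hypothetical data in which pnum 3 appears twice in parts, compares the EXISTS query with the implicit join, with and without grouping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: parts deliberately has no key, and pnum 3 appears twice.
    CREATE TABLE parts  (pnum INTEGER, pname TEXT);
    CREATE TABLE supply (snum INTEGER PRIMARY KEY, pnum INTEGER, shipdate TEXT);
    INSERT INTO parts  VALUES (1, 'flange'), (3, 'whatsit'), (3, 'whatsit v2');
    INSERT INTO supply VALUES (1, 3, '2018-02-03'), (2, 1, '2018-02-04'),
                              (3, 99, '2018-02-05');
""")

def run(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())

with_exists = run("""
    SELECT snum, pnum, shipdate FROM supply AS b
    WHERE EXISTS (SELECT 1 FROM parts AS a WHERE b.pnum = a.pnum)
""")
join_only = run("""
    SELECT b.snum, b.pnum, b.shipdate
    FROM supply AS b, parts AS a
    WHERE b.pnum = a.pnum
""")
join_grouped = run("""
    SELECT b.snum, b.pnum, b.shipdate
    FROM supply AS b, parts AS a
    WHERE b.pnum = a.pnum
    GROUP BY b.snum, b.pnum, b.shipdate
""")
print(len(with_exists), len(join_only), len(join_grouped))  # 2 3 2
```

Without the grouping, the supply row for pnum 3 comes back once per matching parts row; with it, the join returns the same rows as EXISTS.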

Alternative SQL Query for a Badly-Designed Oracle Table

I am pulling data from a legacy table (that I did not design) to convert that data for use in a different application. Here is the truncated table design:
-- Create table
create table order
(
insert_timestamp TIMESTAMP(6) default systimestamp not null,
numeric_identity NUMBER not null,
my_data VARCHAR2(100) not null
)
-- Create/Recreate primary, unique and foreign key constraints
alter table order
add constraint order_pk primary key (numeric_identity, insert_timestamp);
The idea behind this original structure was that the numeric_identity identified a particular customer. The most current order would be the one with the newest insert timestamp value for the given customer's numeric identity. In this particular case, there are no instances where more than one row has the same insert_timestamp value and numeric_identity value.
I'm tasked with retrieving this legacy data for conversion. I wrote the following query to pull back the latest, unique records, as older records need not be converted:
select * from order t where t.insert_timestamp =
(select max(w.insert_timestamp) from order w
where t.numeric_identity = w.numeric_identity);
This pulls back the expected dataset, but could fail if there somehow were more than one row with the same insert_timestamp and numeric_identity. Is there a better query than what I've written to pull back unique records in a table designed in this fashion?
Another way to write this query:
select *
from (select t.*, row_number() over (partition by numeric_identity order by insert_timestamp desc) rn
from order t)
where rn = 1
Also, you can't end up with more than one row having the same insert_timestamp and numeric_identity, because the primary key covers exactly these two columns.
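Both formulations can be compared side by side. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3 module rather than Oracle, stores timestamps as text, uses the hypothetical table name orders (ORDER is a reserved word), and the three rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "orders" is a hypothetical stand-in name; ORDER itself is a reserved word.
    CREATE TABLE orders (
        insert_timestamp TEXT    NOT NULL,
        numeric_identity INTEGER NOT NULL,
        my_data          TEXT    NOT NULL,
        PRIMARY KEY (numeric_identity, insert_timestamp)
    );
    INSERT INTO orders VALUES
        ('2020-01-01 10:00:00', 1, 'first order, customer 1'),
        ('2020-03-01 09:30:00', 1, 'latest order, customer 1'),
        ('2020-02-15 12:00:00', 2, 'only order, customer 2');
""")

# Correlated subquery: keep the row whose timestamp is the customer's maximum.
correlated = conn.execute("""
    SELECT numeric_identity, my_data FROM orders AS t
    WHERE t.insert_timestamp = (
        SELECT MAX(w.insert_timestamp) FROM orders AS w
        WHERE w.numeric_identity = t.numeric_identity
    )
    ORDER BY numeric_identity
""").fetchall()

# Window function: rank each customer's rows newest first and keep rank 1.
windowed = conn.execute("""
    SELECT numeric_identity, my_data
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (
                   PARTITION BY numeric_identity
                   ORDER BY insert_timestamp DESC) AS rn
        FROM orders AS t
    )
    WHERE rn = 1
    ORDER BY numeric_identity
""").fetchall()
print(correlated == windowed, windowed)
```

The row_number() version always returns exactly one row per numeric_identity, whereas the MAX version would return every row tied for the newest timestamp if the primary key did not rule such ties out.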

How do I create a composite foreign key going backwards down a relationship?

Sorry about the title, couldn't think of a better way to write it.
Here's my problem...
I have 2 tables in my database [Drawings] and [Revisions];
[Drawings] 1-----* [Revisions]
ProjectId(pk) ProjectId(pk)(fk)
DrawingNo(pk) DrawingNo(pk)(fk)
RevisionNo(pk)
LatestRevision
There is a foreign key in [revisions] referencing [drawings] on [ProjectId] and [DrawingNo].
I need to implement a way of enforcing that the drawings latest revision number equals a corresponding revision number in the revisions table:
... WHERE [Drawings].[LatestRevision] NOT IN (
SELECT [RevisionNo]
FROM [Revisions]
WHERE [Drawings].[ProjectId] = [Revisions].[ProjectId]
AND [Drawings].[DrawingNo] = [Revisions].[DrawingNo])
How would I put something like this into a foreign key?
I need this to work on sql server 2008 express onwards.
Thanks in advance for any help!
Schema:
TABLE Drawings
( ProjectId varchar,
DrawingNo varchar,
LatestRevision varchar,
...other columns
PRIMARY KEY(ProjectId, DrawingNo)
)
TABLE Revisions
( ProjectId varchar,
DrawingNo varchar,
RevisionNo varchar,
...other columns
PRIMARY KEY(ProjectId, DrawingNo, RevisionNo)
FOREIGN KEY(ProjectId, DrawingNo) REFERENCES (Drawings(ProjectId, DrawingNo))
)
Drawing 'A' can have revision '1', and Drawing 'B' can have a different revision '1', so the revision number by itself is not unique.
I will take the schema as follows:
TABLE drawings
( projectid integer,
drawingno integer,
latestRevision integer,
primary key (projectid, drawingno)
)
TABLE revisions
( revisionno integer primary key,
projectid integer,
drawingno integer,
foreign key (projectid, drawingno)
references (drawings (projectid, drawingno))
)
In this case, I would issue:
ALTER TABLE drawings
ADD FOREIGN KEY (latestRevision)
REFERENCES revisions (revisionNo)
This would mean that every revisions.revisionNo is unique and the column drawings.latestRevision is a foreign key that references the primary key of revisions table, that is, revisionNo.
Please let me know if there are any changes to the schema you have.
Also, a foreign key can only reference a primary key or a unique constraint of the referenced table. If revisions.revisionno is neither, or if the constraint on this column is disabled, then the ALTER TABLE .. ADD FOREIGN KEY statement is bound to return an error.
The following structure replaces your tables with views and looks similar to what you describe, except that the latest revision is maintained behind the scenes rather than being an explicit foreign key. I don't know what operations you'd want to support on Revisions; at the moment I only support INSERT:
create table dbo._Drawings (
ProjectId int not null,
DrawingId int not null,
constraint PK_Drawings PRIMARY KEY (ProjectID,DrawingID)
)
go
create table dbo._Revisions (
ProjectID int not null,
DrawingID int not null,
RevisionNo int not null,
_PreviousRevision as CASE WHEN RevisionNo > 1 THEN RevisionNo - 1 END persisted,
_NextRevision int null,
constraint PK_Revisions PRIMARY KEY (ProjectID,DrawingID,RevisionNo),
constraint FK_Revisions_Drawings FOREIGN KEY (ProjectID,DrawingID)
references _Drawings (ProjectID,DrawingID),
constraint CK_RevisionNos CHECK (RevisionNo >= 1),
constraint UK_Revisions_Previous UNIQUE (ProjectID,DrawingID,_PreviousRevision),
constraint UK_Revisions_Next UNIQUE (ProjectID,DrawingID,_NextRevision),
constraint FK_Revisions_Previous FOREIGN KEY (ProjectID,DrawingID,_PreviousRevision)
references _Revisions (ProjectID,DrawingID,RevisionNo),
constraint FK_Revisions_Next FOREIGN KEY (ProjectID,DrawingID,_NextRevision)
references _Revisions (ProjectID,DrawingID,RevisionNo)
)
The above two tables are the "backing store" for the data. The _Revisions table ensures that the revision sequences are strictly monotonically increasing from 1. Each row maintains a foreign key to its immediately preceding and succeeding revisions, except the first and last, for which NULLs substitute (but the unique constraints ensure only one of each exists for each ProjectID, DrawingID combination).
create view dbo.Drawings
with schemabinding
as
select
d.ProjectID,
d.DrawingID,
r.RevisionNo as LatestRevision
from
dbo._Drawings d
left join
dbo._Revisions r
on
d.ProjectId = r.ProjectID and
d.DrawingId = r.DrawingID and
r._NextRevision is null
The above view mimics the Drawings table you asked for and would be used for any actual data access. If you wanted to enforce an invariant that each drawing must have at least one revision, you could switch the left join to an inner join and make this an indexed view. You'd need to add a trigger to support INSERTs, in much the same way as the one below does for Revisions, which then populates both tables.
create view dbo.Revisions
with schemabinding
as
select
ProjectID,
DrawingID,
RevisionNo
from
dbo._Revisions
This view creates the impression that Revisions is as simple as in your query.
create trigger T_Revisions_I
on dbo.Revisions
instead of insert
as
;with SplitData as (
select ProjectID,DrawingID,RevisionNo,RevisionNo-1 as Prev, Seq
from inserted cross join (select 1 union all select 2) t(Seq)
)
merge into dbo._Revisions r
using SplitData s
on
r.ProjectID = s.ProjectID and
r.DrawingID = s.DrawingID and
(
(s.Seq = 1 and r.RevisionNo = s.Prev) or
(s.Seq = 2 and r.RevisionNo = s.RevisionNo)
)
when matched and s.Seq = 1
then update set _NextRevision = s.RevisionNo
when not matched and s.Seq = 2
then insert (ProjectID,DrawingID,RevisionNo) values (s.ProjectID,s.DrawingID,s.RevisionNo)
;
And finally, this trigger is responsible for maintaining the _Revisions structure in the way that the constraints I created above require. The trick is that we use a MERGE statement so that, at the same time as we insert the new row, we also update the previous row so that its _NextRevision column is no longer null and references the row that we're inserting.
More triggers can be added to support more advanced usage.
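The core idea, deriving LatestRevision in a view instead of storing it, can be illustrated with far less machinery. The sketch below is a deliberate simplification and not the structure above: it assumes SQLite via Python's sqlite3 module, takes MAX(RevisionNo) instead of following _NextRevision, and therefore does not enforce gap-free revision sequences. The sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE _Drawings (
        ProjectID INTEGER NOT NULL,
        DrawingID INTEGER NOT NULL,
        PRIMARY KEY (ProjectID, DrawingID)
    );
    CREATE TABLE _Revisions (
        ProjectID  INTEGER NOT NULL,
        DrawingID  INTEGER NOT NULL,
        RevisionNo INTEGER NOT NULL CHECK (RevisionNo >= 1),
        PRIMARY KEY (ProjectID, DrawingID, RevisionNo),
        FOREIGN KEY (ProjectID, DrawingID)
            REFERENCES _Drawings (ProjectID, DrawingID)
    );
    -- Simplification: take MAX(RevisionNo) instead of following _NextRevision.
    CREATE VIEW Drawings AS
        SELECT d.ProjectID, d.DrawingID, MAX(r.RevisionNo) AS LatestRevision
        FROM _Drawings AS d
        LEFT JOIN _Revisions AS r
               ON r.ProjectID = d.ProjectID AND r.DrawingID = d.DrawingID
        GROUP BY d.ProjectID, d.DrawingID;
    INSERT INTO _Drawings  VALUES (1, 1), (1, 2);
    INSERT INTO _Revisions VALUES (1, 1, 1), (1, 1, 2);
""")

latest = conn.execute(
    "SELECT ProjectID, DrawingID, LatestRevision FROM Drawings "
    "ORDER BY ProjectID, DrawingID"
).fetchall()
print(latest)  # [(1, 1, 2), (1, 2, None)]
```

Because the value is computed from _Revisions on every read, it can never point at a revision that does not exist, which is the property the question wanted a foreign key for.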

Should you use single table inheritance or multiple tables that are union-ed in a view?

Let's say you have a notes table. The note can be about a particular account, orderline or order.
Notes that are about the account do not apply to any specific orderline or order.
Notes that are about the orderline also apply to the parent order and the account that is attached to the order.
Notes that are on the order also apply to the attached account but not the orderline.
NOTES table
[Id] [int] IDENTITY(1,1) NOT NULL
[NoteTypeId] [smallint] NOT NULL
[AccountId] [int] NULL
[OrderId] [int] NULL
[OrderLineId] [int] NULL,
[Note] [varchar](300) NOT NULL
The idea is that if I view a client I can see all notes that are in some way related. Initially I created a notes table for each of the above and union-ed them in a view.
The problem here comes with editing/deleting a record. Notes can be edited/deleted on the particular item or in a generic notes view of the account or order. This method made that more difficult.
Then I switched to the Single Table Inheritance pattern. My notes table has nullable values for AccountId, OrderId and OrderLineId. I also added the NoteTypeId to identify the record explicitly. Much easier to manage update/delete scenarios.
I have some problems & questions still with this approach.
Integrity - Although complex constraints can be set in SQL and/or in code, most DBAs would not like the STI approach.
The idea of bunch of nulls is debated (although I believe performance in SQL 2008 has improved based on the storage of null values)
A table in a RDBMS does not have to represent an object in code. Normalization in a table doesn't say that the table has to be a unique object. I believe the two previous sentences to be true, what say you?
Discussed some here.
Is an overuse of nullable columns in a database a "code smell"? I'd have to say I agree with Ian but I'd like some opposite views as well.
Although complex constraints can be set in SQL and/or in code, most DBAs would not like the STI approach.
Because you need additional logic (CHECK constraint or trigger) to implement the business rule that a note refers to only one of the entities - account, order, orderline.
It's more scalable to implement a many-to-many table between each entity and the note table.
There's no need for an ALTER TABLE statement to add yet another nullable foreign key (there is a column limit, not that most are likely to reach it)
A single note record can be associated with multiple entities
No impact to existing records if a new entity & many-to-many table is added
It seems STI would work OK in your case. If I read your requirements correctly, the entity inheritance would be a chain:
Note <- AccountNote(AccountId) <- AccountAndOrderNote(OrderId) <-AccountAndOrderAndOrderLineNote (OrderLineId)
Integrity:
Surely not an issue? Each of AccountId, OrderId and OrderLineId can be FK'd to their respective tables (or be NULL)
If, on the other hand, you removed AccountId, OrderId and OrderLineId (I'm NOT recommending this, BTW!) and instead had just ObjectId and NoteTypeId, then you couldn't add RI and would have a really messy CASE WHEN type join.
Performance:
Since you say that AccountId must always be present, I guess it could be non-null, and since an OrderLine cannot exist without an Order, an index on (AccountId, OrderId) or (AccountId, OrderId, OrderLineId) seems to make sense (depending on the selectivity vs. narrowness tradeoff given the average number of OrderLines per Order).
But OMG Ponies is right about messy ALTER TABLEs to extend this to new note types, and the indexing will cause headaches if new notes aren't Account-derived.
HTH
Initially I [created a] separate notes table for each of the above and union-ed them in a view.
This makes me wonder if you've considered using multi-table structure without NULLable columns where each note gets a unique ID regardless of type. You could present the data in the 'single table inheritance' (or similar) in a query without using UNION.
Below is a suggested structure. I've changed NoteTypeId to a VARCHAR to make the different types clearer and easier to read (you didn't enumerate the INTEGER values anyhow):
CREATE TABLE Notes
(
Id INTEGER IDENTITY(1,1) NOT NULL UNIQUE,
NoteType VARCHAR(11) NOT NULL
CHECK (NoteType IN ('Account', 'Order', 'Order line')),
Note VARCHAR(300) NOT NULL,
UNIQUE (Id, NoteType)
);
CREATE TABLE AccountNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Account'
NOT NULL
CHECK (NoteType = 'Account'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
AccountId INTEGER NOT NULL
REFERENCES Accounts (AccountId)
);
CREATE TABLE OrderNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Order'
NOT NULL
CHECK (NoteType = 'Order'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
OrderId INTEGER NOT NULL
REFERENCES Orders (OrderId)
);
CREATE TABLE OrderLineNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Order line'
NOT NULL
CHECK (NoteType = 'Order line'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
OrderLineId INTEGER NOT NULL
REFERENCES OrderLines (OrderLineId)
);
To present the data in the 'single table inheritance' structure (i.e. all JOINs and no UNIONs):
SELECT N1.Id, N1.NoteType, N1.Note,
AN1.AccountId,
ON1.OrderId,
OLN1.OrderLineId
FROM Notes AS N1
LEFT OUTER JOIN AccountNotes AS AN1
ON N1.Id = AN1.Id
LEFT OUTER JOIN OrderNotes AS ON1
ON N1.Id = ON1.Id
LEFT OUTER JOIN OrderLineNotes AS OLN1
ON N1.Id = OLN1.Id;
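How the composite foreign key on (Id, NoteType) keeps a note from being attached to the wrong kind of entity can be shown with a reduced version of the structure. This is a sketch, assuming SQLite via Python's sqlite3 module in place of SQL Server, only the AccountNotes subtype, and hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Accounts (AccountId INTEGER PRIMARY KEY);
    CREATE TABLE Notes (
        Id       INTEGER PRIMARY KEY,
        NoteType TEXT NOT NULL
                 CHECK (NoteType IN ('Account', 'Order', 'Order line')),
        Note     TEXT NOT NULL,
        UNIQUE (Id, NoteType)
    );
    CREATE TABLE AccountNotes (
        Id        INTEGER NOT NULL UNIQUE,
        NoteType  TEXT NOT NULL DEFAULT 'Account'
                  CHECK (NoteType = 'Account'),
        AccountId INTEGER NOT NULL REFERENCES Accounts (AccountId),
        FOREIGN KEY (Id, NoteType)
            REFERENCES Notes (Id, NoteType) ON DELETE CASCADE
    );
    INSERT INTO Accounts VALUES (7);
    INSERT INTO Notes VALUES (1, 'Account', 'call back Monday'),
                             (2, 'Order',   'gift wrap');
    INSERT INTO AccountNotes (Id, AccountId) VALUES (1, 7);
""")

# Note 2 is an 'Order' note, so it must not be attachable to an account:
# the pair (2, 'Account') does not exist in Notes.
try:
    conn.execute("INSERT INTO AccountNotes (Id, AccountId) VALUES (2, 7)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

conn.execute("DELETE FROM Notes WHERE Id = 1")  # cascades to AccountNotes
left_over = conn.execute("SELECT COUNT(*) FROM AccountNotes").fetchone()[0]
print(mismatch_rejected, left_over)  # True 0
```

The CHECK pins each subtype table to one NoteType, and the foreign key then forces the parent row in Notes to carry that same type.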
Consider that the above structure has full data integrity constraints. To do the same using the 'single table inheritance' structure would require many more CHECK constraints with many, many conditions for nullable columns e.g.
CHECK (
(
AccountId IS NOT NULL
AND OrderId IS NULL
AND OrderLineId IS NULL
)
OR
(
AccountId IS NULL
AND OrderId IS NOT NULL
AND OrderLineId IS NULL
)
OR
(
AccountId IS NULL
AND OrderId IS NULL
AND OrderLineId IS NOT NULL
)
);
CHECK (
(
NoteType = 'Account'
AND AccountId IS NOT NULL
)
OR
(
NoteType = 'Order'
AND OrderId IS NOT NULL
)
OR
(
NoteType = 'Order line'
AND OrderLineId IS NOT NULL
)
);
etc etc
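For comparison, the two CHECK constraints can be attached to a single Notes table and exercised. This is a sketch, assuming SQLite via Python's sqlite3 module; the helper try_insert and the sample values are hypothetical, and the foreign keys to the referenced tables are omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Notes (
        Id          INTEGER PRIMARY KEY,
        NoteType    TEXT NOT NULL,
        AccountId   INTEGER,
        OrderId     INTEGER,
        OrderLineId INTEGER,
        Note        TEXT NOT NULL,
        -- exactly one of the three references may be filled in
        CHECK (
            (AccountId IS NOT NULL AND OrderId IS NULL AND OrderLineId IS NULL)
         OR (AccountId IS NULL AND OrderId IS NOT NULL AND OrderLineId IS NULL)
         OR (AccountId IS NULL AND OrderId IS NULL AND OrderLineId IS NOT NULL)
        ),
        -- the filled-in reference must agree with the declared note type
        CHECK (
            (NoteType = 'Account' AND AccountId IS NOT NULL)
         OR (NoteType = 'Order' AND OrderId IS NOT NULL)
         OR (NoteType = 'Order line' AND OrderLineId IS NOT NULL)
        )
    )
""")

def try_insert(note_type, account_id, order_id, order_line_id):
    """Return True if the row is accepted, False if a CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Notes (NoteType, AccountId, OrderId, OrderLineId, Note) "
            "VALUES (?, ?, ?, ?, 'example')",
            (note_type, account_id, order_id, order_line_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("Account", 1, None, None),  # consistent row
    try_insert("Order", 1, None, None),    # type disagrees with the filled column
    try_insert("Account", 1, 2, None),     # two references at once
]
print(results)  # [True, False, False]
```

Every additional note type adds a branch to both constraints, which is the maintenance cost being pointed out here.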
I'd wager that most application developers using 'single table inheritance' would not be bothered to create these data integrity constraints if it occurred to them to do so at all (that's not meant to sound rude, just a difference in priorities to us who care more about the 'back end' than the 'front end' :)