Best way to delete duplicate rows in an intersection table based on foreign key column

Best way to delete duplicate rows in an intersection table based on foreign key column - sql

I'm looking for a way to delete dupicate rows from an intersection table based on a foreign key from one of the tables from the opposite ends of the intersection table. For the sake of clarity, I'll say based on a foreign key value from the intersection table on the left 'tableA'.
CREATE TABLE tableA
(
link varchar(64) NOT NULL PRIMARY KEY,
name varchar(64) NOT NULL
)
CREATE TABLE tableB
(
link varchar(64) NOT NULL PRIMARY KEY,
name varchar(64) NOT NULL
)
CREATE TABLE tableA_AND_tableB--Intersection Table
(
link varchar(64) NOT NULL PRIMARY KEY,
LtableA varchar(64) references tableA(link),
LtableB varchar(64) references tableB(link)
)
Basically, I want to delete all duplicate rows in the intersection table based on the 'LtableA' foreign key field. For instance: Say I have 20 duplicates of 'LtableA = id20140722' in 'tableA_AND_tableB', how do I go about deleting all the rows matching the value 'id20140722' in 'tableA_AND_tableB' without affecting anything else?
Hope my question makes sense.

Delete from tableA_AND_tableB where LtableA = 'id20140722'
This will remove all rows from that table sepcifically with that ID. Alternatively you can see this question for something that will delete all duplicates. Though that answer will keep either the first or last duplicate.

If you want to delete duplicates but still keep one distinct copy of each row:
WITH t AS (
SELECT ROW_NUMBER() OVER(PARTITION BY LtableA, LtableB ORDER BY link) row_num
FROM tableA_AND_tableB
)
DELETE
FROM t
WHERE row_num > 1
AND LtableA = 'id20140722'

Related

postgres sql returns no row

I have 3 followings tables:
CREATE TABLE public.a
(
id bigint NOT NULL DEFAULT nextval('a_id_seq'::regclass),
CONSTRAINT a_pkey PRIMARY KEY (id)
)
CREATE TABLE public.b
(
id bigint NOT NULL DEFAULT nextval('b_id_seq'::regclass),
fkid bigint,
CONSTRAINT b_pkey PRIMARY KEY (id),
CONSTRAINT b_fkid_fkey FOREIGN KEY (fkid)
REFERENCES public.a (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE TABLE public.history
(
id bigint NOT NULL DEFAULT nextval('history_id_seq'::regclass),
CONSTRAINT history_pkey PRIMARY KEY (id)
)
As you can see, the table A has 1 to many relationship with table B. The history table keeps track of logical time inside the app. The following query:
WITH main_q AS (
SELECT
a.id
FROM a
LEFT JOIN b ON a.id = b.fkid
)
SELECT main_q.id,
max(history.id) as as_of
FROM main_q,history
GROUP BY
main_q.id
Returns no rows, even if tables A contains rows, because History table is empty. However, as soon as I add at least 1 row in history table, the query works.
Could someone please explain how I should modify my query so that it returns rows even if history is empty (as long as table A contains rows) ? I would expect to receive a NULL value in as_of column or 0. I tried COALESCE(max(history.id,0)) but this doesn't work either, well, because no rows in history
EDIT Example:
Table A content:
id
1
2
Table B content:
id, fkid
1, 2
2, 2
Table History content:
<empty>

The issue is the query below
SELECT main_q.id,
max(history.id) as as_of
FROM main_q,history
GROUP BY
main_q.id
Why not try a left join, you may have nothing to join, you can try something below
SELECT main_q.id,
max(history.id) as as_of
FROM main_q left join history on 1=1
GROUP BY
main_q.id

If I am not wrong, you are performing a cartesian product between main_q and history, because you haven't a predicate to join these 2 tables.
Likely, when you are performing this cartesian product, if the second table (history) is empty, the cartesian product returns nothing. As soon as you have 1 record in history, you will get data from Main_q. However, if you have more than 1 record in history, you will start to get duplicate data because of the cartesian product.

Deleting millions of record in bunch in postgresql

I have to delete rows from table that has 120 millions records.
The data that has highest(entry_date) and second highest(entry_date) should not be deleted.
Table has many constraints.
One PRIMARY key
Two FOREIGN keys
and two indexes other than index on primary key.
I have already successfully tried method to delete as creating temp table and moving required data into temp table.
Then dropping the present table and then again moving back filtered data from temp to main table.And it worked fine.
But I need a way to delete records in bunch .
CREATE TABLE values
(
value_id bigint NOT NULL,
content_definition_id bigint NOT NULL,
value_s text,
value_n double precision,
order integer,
scope_id integer NOT NULL,
answer boolean NOT NULL,
date timestamp without time zone NOT NULL,
entry_date timestamp without time zone NOT NULL,
CONSTRAINT "value_PK" PRIMARY KEY (value_id),
CONSTRAINT content_definition_id_fk FOREIGN KEY (content_definition_id)
REFERENCES content_definition (content_definition_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT scope_fk FOREIGN KEY (scope_id)
REFERENCES scopes (scope_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT
)
-- Index: fki_content_definition_id_fk
-- Index: fki_value_value_scope_id
How to delete record in bunch like first only 1 million data should be deleted and on.

This assumes you have no conflicting locks. Note that index page locks may slow things down as well.
Recent PostgreSQL allows you to use a CTE in a delete statement. I.e. you can:
WITH ids_to_delete (
SELECT value_id FROM values
where ...
limit ...
)
delete from values where value_id in (select value_id from ids_to_delete)

Can you try merge with conditions of your temp tables and use delete part of it. That should give you a good performance

How do I create a composite foreign key going backwards down a relationship?

Sorry about the title, couldn't think of a better way to write it.
Here's my problem...
I have 2 tables in my database [Drawings] and [Revisions];
[Drawings] 1-----* [Revisions]
ProjectId(pk) ProjectId(pk)(fk)
DrawingNo(pk) DrawingNo(pk)(fk)
RevisionNo(pk)
LatestRevision
There is a foreign key in [revisions] referencing [drawings] on [ProjectId] and [DrawingNo].
I need to implement a way of enforcing that the drawings latest revision number equals a corresponding revision number in the revisions table:
... WHERE [Drawings].[LatestRevision] NOT IN (
SELECT [RevisionNo]
FROM [Revisions]
WHERE [Drawings].[ProjectId] = [Revisions].[ProjectId]
AND [Drawings].[DrawingNo] = [Revisions].[DrawingNo])
How would I put something like this into a foreign key?
I need this to work on sql server 2008 express onwards.
Thanks in advance for any help!
Schema:
TABLE Drawings
( ProjectId varchar,
DrawingNo varchar,
LatestRevisions varchar,
...other columns
PRIMARY KEY(ProjectId, DrawingNo)
)
TABLE Revisions
( ProjectId varchar,
DrawingNo varchar,
RevisionNo varchar,
...other columns
PRIMARY KEY(ProjectId, DrawingNo, RevisionNo)
FOREIGN KEY(ProjectId, DrawingNo) REFERENCES (Drawings(ProjectId, DrawingNo))
)
Drawing 'A' can have revision '1', and Drawing 'B' can have a different revision '1',
Revision number by itself is not unique

I will take the schema as follows:
TABLE drawings
( projectid integer,
drawingno integer,
latestRevision integer,
primary key (projectid, drawingno)
)
TABLE revisions
( revisionno integer primary key,
projectid integer,
drawingno integer,
foreign key (projectid, drawingno)
references (drawings (projectid, drawingno))
)
In this case, I would issue:
ALTER TABLE drawings
ADD FOREIGN KEY (latestRevision)
REFERENCES (revisions(revisionNo))
This would mean that every revisions.revisionNo is unique and the column drawings.latestRevision is a foreign key that references the primary key of revisions table, that is, revisionNo.
Please let me know if there are any changes to the schema you have.
Also, the foreign key is enforced only if it is referencing a primary key of another table. If revisions.revisionno is not a primary key or if the primary key constraint is disabled on this column, then the ALTER TABLE .. ADD FOREIGN KEY statement is bound to return an error.

The following structures replaces your tables with views and looks similar to what you describe, except it's instead maintained behind the scenes rather than being an explicit foreign key. I don't know what operations you'd want to support on Revisions, at the moment I only support INSERT:
create table dbo._Drawings (
ProjectId int not null,
DrawingId int not null,
constraint PK_Drawings PRIMARY KEY (ProjectID,DrawingID)
)
go
create table dbo._Revisions (
ProjectID int not null,
DrawingID int not null,
RevisionNo int not null,
_PreviousRevision as CASE WHEN RevisionNo > 1 THEN RevisionNo - 1 END persisted,
_NextRevision int null,
constraint PK_Revisions PRIMARY KEY (ProjectID,DrawingID,RevisionNo),
constraint FK_Revisions_Drawings FOREIGN KEY (ProjectID,DrawingID)
references _Drawings (ProjectID,DrawingID),
constraint CK_RevisionNos CHECK (RevisionNo >= 1),
constraint UK_Revisions_Previous UNIQUE (ProjectID,DrawingID,_PreviousRevision),
constraint UK_Revisions_Next UNIQUE (ProjectID,DrawingID,_NextRevision),
constraint FK_Revisions_Previous FOREIGN KEY (ProjectID,DrawingID,_PreviousRevision)
references _Revisions (ProjectID,DrawingID,RevisionNo),
constraint FK_Revisions_Next FOREIGN KEY (ProjectID,DrawingID,_NextRevision)
references _Revisions (ProjectID,DrawingID,RevisionNo)
)
The above two tables are the "backing store" for the data. The _Revisions table ensures that the revision sequences are strictly monotonically increasing from 1. Each row maintains a foreign key to its immediate preceding and succeeding revisions, except the first and last, for which NULLs substitute (but the unique constraints ensure only one of each exists for each ProjectID,DrawingID combination.
create view dbo.Drawings
with schemabinding
as
select
d.ProjectID,
d.DrawingID,
r.RevisionNo as LatestRevision
from
dbo._Drawings d
left join
dbo._Revisions r
on
d.ProjectId = r.ProjectID and
d.DrawingId = r.DrawingID and
r._NextRevision is null
The above view mimics your asked for Drawings table and would be used for any actual data access. If you wanted to enforce an invariant that each drawing must have at least one revision, you could switch the left join to an inner join and make this an indexed view. You'd need to add a trigger to support INSERTs, in much the same way as the below does for Revisions, which then populates both tables.
create view dbo.Revisions
with schemabinding
as
select
ProjectID,
DrawingID,
RevisionNo
from
dbo._Revisions
This view creates the impression that Revisions is as simple as in your query
create trigger T_Revisions_I
on dbo.Revisions
instead of insert
as
;with SplitData as (
select ProjectID,DrawingID,RevisionNo,RevisionNo-1 as Prev, Seq
from inserted cross join (select 1 union all select 2) t(Seq)
)
merge into dbo._Revisions r
using SplitData s
on
r.ProjectID = s.ProjectID and
r.DrawingID = s.DrawingID and
(
(s.Seq = 1 and r.RevisionNo = s.Prev) or
(s.Seq = 2 and r.RevisionNo = s.RevisionNo)
)
when matched and s.Seq = 1
then update set _NextRevision = s.RevisionNo
when not matched and s.Seq = 2
then insert (ProjectID,DrawingID,RevisionNo) values (s.ProjectID,s.DrawingID,s.RevisionNo)
;
And finally, this trigger is responsible for maintaining the _Revisions structure in the way that the constraints I created above require. The trick is that we use a MERGE statement so that at the same time as we insert the new row, we also update the previous row so that it's _NextRevision column is no longer null and references the row that we're inserting.
More triggers can be added to support more advanced usage.

How to constraint one column with values from a column from another table?

This isn't a big deal, but my OCD is acting up with the following problem in the database I'm creating. I'm not used to working with databases, but the data has to be stored somewhere...
Problem
I have two tables A and B.
One of the datafields is common to both tables - segments. There's a finite number of segments, and I want to write queries that connect values from A to B through their segment values, very much asif the following table structure was used:
However, as you can see the table Segments is empty. There's nothing more I want to put into that table, rather than the ID to give other table as foreign keys. I want my tables to be as simple as possible, and therefore adding another one just seems wrong.
Note also that one of these tables (A, say) is actually master, in the sense that you should be able to put any value for segment into A, but B one should first check with A before inserting.
EDIT
I tried one of the answers below:
create table A(
id int primary key identity,
segment int not null
)
create table B(
id integer primary key identity,
segment int not null
)
--Andomar's suggestion
alter table B add constraint FK_B_SegmentID
foreign key (segment) references A(segment)
This produced the following error.
Maybe I was somehow unclear that segments is not-unique in A or B and can appear many times in both tables.
Msg 1776, Level 16, State 0, Line 11 There are no primary or candidate
keys in the referenced table 'A' that match the referencing column
list in the foreign key 'FK_B_SegmentID'. Msg 1750, Level 16, State 0,
Line 11 Could not create constraint. See previous errors.

You can create a foreign key relationship directly from B.SegmentID to A.SegmentID. There's no need for the extra table.
Update: If the SegmentIDs aren't unique in TableA, then you do need the extra table to store the segment IDs, and create foreign key relationships from both tables to this table. This however is not enough to enforce that all segment IDs in TableB also occur in TableA. You could instead use triggers.

You can ensure the segment exists in A with a foreign key:
alter table B add constraint FK_B_SegmentID
foreign key (SegmentID) references A(SegmentID)
To avoid rows in B without a segment at all, make B.SegmentID not nullable:
alter table B alter column SegmentID int not null
There is no need to create a Segments table unless you want to associate extra data with a SegmentID.

As Andomar and Mark Byers wrote, you don't have to create an extra table.
You can also CASCADE UPDATEs or DELETEs on the master. Be very carefull with ON DELETE CASCADE though!
For queries use a JOIN:
SELECT *
FROM A
JOIN B ON a.SegmentID = b.SegmentID
Edit:
You have to add a UNIQUE constraint on segment_id in the "master" table to avoid duplicates there, or else the foreign key is not possible. Like this:
ALTER TABLE A ADD CONSTRAINT UNQ_A_SegmentID UNIQUE (SegmentID);

If I've understood correctly, a given segment cannot be inserted into table B unless it has also been inserted into table A. In which case, table A should reference table Segments and table B should reference table A; it would be implicit that table B ultimately references table Segments (indirectly via table A) so an explicit reference is not required. This could be done using foreign keys (e.g. no triggers required).
Because table A has its own key I assume a given segment_ID can appear in table A more than once, therefore for B to be able to reference the segment_ID value in A then a superkey would need to be defined on the compound of A_ID and segment_ID. Here's a quick sketch:
CREATE TABLE Segments
(
segment_ID INTEGER NOT NULL UNIQUE
);
CREATE TABLE A
(
A_ID INTEGER NOT NULL UNIQUE,
segment_ID INTEGER NOT NULL
REFERENCES Segments (segment_ID),
A_data INTEGER NOT NULL,
UNIQUE (segment_ID, A_ID) -- superkey
);
CREATE TABLE B
(
B_ID INTEGER NOT NULL UNIQUE,
A_ID INTEGER NOT NULL,
segment_ID INTEGER NOT NULL,
FOREIGN KEY (segment_ID, A_ID)
REFERENCES A (segment_ID, A_ID),
B_data INTEGER NOT NULL
);

Set based insert into two tables with 1 to 0-1 relation

I have two tables, the first has a primary key that is an identity, the second has a primary key that is not, but that key has a foreign key constraint back to the first table's primary key.
If I am inserting one record at a time I can use the Scope_Identity to get the value for the pk just inserted in table 1 that I want to insert into the second table.
My problem is I have many records coming from selects I want to insert in both tables, I've not been able to think of a set based way to do these inserts.
My current solution is to use a cursor, insert in the first table, get key using scope_identity, insert into second table, repeat.
Am I missing a non-cursor solution?

Yes, Look up the output clause in Books online.

I had this problem just this week: someone had introduced a table with a meaningless surrogate key into the schema where naturally keys are used. No doubt I'll fix this soon :) until then, I'm working around it by creating a table of data to INSERT from: this could be a permanent or temporary base table or a derived table (see below), which should suit your desire for a set-based solution anyhow. Use a join between this table and the table with the IDENTITY column on the natural key to find out the auto-generated values. Here's a brief example:
CREATE TABLE Test1
(
surrogate_key INTEGER IDENTITY NOT NULL UNIQUE,
natural_key CHAR(10) NOT NULL CHECK (natural_key NOT LIKE '%[^0-9]%') UNIQUE
);
CREATE TABLE Test2
(
surrogate_key INTEGER NOT NULL UNIQUE
REFERENCES Test1 (surrogate_key),
data_col INTEGER NOT NULL
);
INSERT INTO Test1 (natural_key)
SELECT DT1.natural_key
FROM (
SELECT '0000000001', 22
UNION ALL
SELECT '0000000002', 55
UNION ALL
SELECT '0000000003', 99
) AS DT1 (natural_key, data_col);
INSERT INTO Test2 (surrogate_key, data_col)
SELECT T1.surrogate_key, DT1.natural_key
FROM (
SELECT '0000000001', 22
UNION ALL
SELECT '0000000002', 55
UNION ALL
SELECT '0000000003', 99
) AS DT1 (natural_key, data_col)
INNER JOIN Test1 AS T1
ON T1.natural_key = DT1.natural_key;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas