deleting multiple records from two tables - sql

I have two tables called TableA and TableB.
TableA has the following fields:
TableA_ID
FileName
TableB has the following fields:
TableB_ID
TableA_ID
CreationDate
There is a foreign key link between the two tables on the TableA_ID field.
I need to delete records from both tables. I need to look at the “CreationDate” on TableB and, if it’s after a certain date, delete that record. I will also need to delete the record in TableA with the same TableA_ID as the record in TableB.
There may be several records in TableB that use the TableA_ID (a one to many relationship). So I can’t delete the record in TableA if entries in TableB still use it.
I know this can’t be done in a single statement but am happy to do it in a transaction. The problem I have is I’m not sure how to do this. I’m using MS SQL server 2008. I don’t want to use triggers if possible.

Can there be records in TableA with no matching record in TableB? If not, then we know after we delete from TableB, we can delete any non-matching records in TableA:
begin transaction
delete from TableB
where CreationDate > @SomeDate
delete from TableA
where TableA_ID not in (select TableA_ID from TableB)
commit transaction
Otherwise:
begin transaction
-- Save the TableA_IDs being deleted:
select distinct TableA_ID
into #TableA_Delete
from TableB
where CreationDate > @SomeDate
-- Depending on the expected size of #TableA_Delete, you may want
-- to create an index here, to speed up the delete from TableA.
delete from TableB
where CreationDate > @SomeDate
delete from TableA
where TableA_ID in (select TableA_ID from #TableA_Delete)
and TableA_ID not in (select TableA_ID from TableB)
commit transaction
NOTE: Both of the above solutions need error handling added.
Also, see NYSystemsAnalyst's answer for another method of storing the IDs temporarily.
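As a rough illustration of the second approach with basic error handling, here is a minimal sketch that runs the same three steps inside one transaction and rolls back on any failure. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the temp table syntax differs from T-SQL; the function name delete_after, the TableA_Delete temp table, and the sample rows are assumptions made up for this example.

```python
import sqlite3

def delete_after(conn, cutoff):
    """Delete TableB rows created after `cutoff`, then the TableA rows they orphaned."""
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        # Save the parent IDs touched by the delete before the child rows disappear.
        cur.execute("CREATE TEMP TABLE TableA_Delete (TableA_ID INTEGER PRIMARY KEY)")
        cur.execute(
            "INSERT INTO TableA_Delete "
            "SELECT DISTINCT TableA_ID FROM TableB WHERE CreationDate > ?",
            (cutoff,),
        )
        cur.execute("DELETE FROM TableB WHERE CreationDate > ?", (cutoff,))
        # Remove only parents that were touched and have no remaining children.
        cur.execute(
            "DELETE FROM TableA "
            "WHERE TableA_ID IN (SELECT TableA_ID FROM TableA_Delete) "
            "AND TableA_ID NOT IN (SELECT TableA_ID FROM TableB)"
        )
        cur.execute("DROP TABLE TableA_Delete")
        cur.execute("COMMIT")
    except Exception:
        # Undo every step if anything failed part-way through.
        if conn.in_transaction:
            cur.execute("ROLLBACK")
        raise

# Hypothetical sample data; isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, FileName TEXT);
CREATE TABLE TableB (TableB_ID INTEGER PRIMARY KEY,
                     TableA_ID INTEGER NOT NULL REFERENCES TableA(TableA_ID),
                     CreationDate TEXT);
INSERT INTO TableA VALUES (1, 'a.txt'), (2, 'b.txt'), (3, 'c.txt');
INSERT INTO TableB VALUES (10, 1, '2010-01-01'), (11, 1, '2010-06-01'),
                          (12, 2, '2010-06-01');
""")

delete_after(conn, "2010-03-01")
remaining_a = [r[0] for r in conn.execute("SELECT TableA_ID FROM TableA ORDER BY TableA_ID")]
remaining_b = [r[0] for r in conn.execute("SELECT TableB_ID FROM TableB ORDER BY TableB_ID")]
```

In this sample, parent 1 survives because it still has an older child row, parent 2 is removed because all of its children were deleted, and parent 3 is left alone because it was never touched by the delete.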

Referential integrity should delete the child rows when the parent is deleted. Make sure you have cascade deletes enabled.
Otherwise you would have to put in two delete statements one for the child records and one for the parent record.
DELETE b FROM TableB AS b INNER JOIN TableA AS a ON a.TableA_ID = b.TableA_ID WHERE b.CreationDate >= @SomeDate
DELETE FROM TableA WHERE TableA_ID = @SomeID

Edit: Sorry, I didn't see the part of your question stating the need to keep the TableA entries if they have multiple records in TableB. Use this instead:
/* Table-valued variable to hold the TableA IDs to be deleted. */
DECLARE @IDs AS table
(
ID int
);
/* Get the TableA IDs that are subject to deletion. */
INSERT INTO @IDs (
ID
)
SELECT TableA_ID
FROM TableB
GROUP BY TableA_ID
HAVING (MIN(CreationDate) >= @MyCreationDate); /* Only get IDs whose TableB rows will all be deleted */
/* Delete the TableB records. */
DELETE b
FROM TableB AS b
WHERE CreationDate >= @MyCreationDate;
/* Delete the TableA records. */
DELETE a
FROM TableA AS a
INNER JOIN @IDs AS c ON (a.TableA_ID = c.ID);
You should wrap all of this in a transaction to ensure data integrity.

Related

How to loop through rows in two tables and create a new set based on the merged results in SQL

Here is my obstacle.
I have two tables. Table A contains more rows than Table B. I have to merge the results and if Table A does not contain a row from Table B then I insert it into the new set. If however, a row from Table A contains a row with the same primary key as Table B, the new set will take the row from Table B.
Would this best be done with a cursor, or is there an easier way to do this? I ask because there are 20 million rows and, while I am new to SQL, I've heard cursors are expensive.
Your phrasing is a little vague. It seems that you want everything from TableB and then rows from TableA that have no matching primary key in B. The following query solves this problem:
select *
from tableB
union all
select *
from tableA
where tableA.pk not in (select pk from tableB)
Yep, cursors are expensive.
There's a MERGE command in later versions of SQL that will do this in one shot, but it's sooo cumbersome. Better to do it in two pieces - first:
UPDATE A SET
field1 = B.field1
,field2 = B.field2
-- ...and so on for the remaining fields
FROM A JOIN B on B.id = A.id
Then:
INSERT A SELECT * FROM B --enumerate fields if different
WHERE B.id not in (select id FROM A)
An OUTER JOIN should do what you need and be more efficient than a cursor.
Try this query
--first get the rows that match between TableA and TableB
INSERT INTO [new set]
SELECT TableB.* --or columns of your choice
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NOT NULL
--then get the rows from TableA that don't have a match
INSERT INTO [new set]
SELECT TableA.* --you didn't say what was inserted if there was no matching row
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NULL
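To make the "rows from B win, plus rows from A that have no match in B" reading concrete, here is a small set-based sketch with no cursor. It uses Python's sqlite3 module purely so it can be run anywhere; the pk and val columns and the sample rows are assumptions for illustration, and NOT EXISTS is used instead of NOT IN only because it is not affected by NULL keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (pk INTEGER PRIMARY KEY, val TEXT);
CREATE TABLE tableB (pk INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tableA VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
INSERT INTO tableB VALUES (2, 'b2'), (4, 'b4');
""")

# Everything from tableB, plus the tableA rows whose key is absent from tableB.
merged = conn.execute("""
SELECT pk, val FROM tableB
UNION ALL
SELECT a.pk, a.val FROM tableA AS a
WHERE NOT EXISTS (SELECT 1 FROM tableB AS b WHERE b.pk = a.pk)
ORDER BY pk
""").fetchall()
```

Key 2 exists in both tables, so the merged set carries the tableB version of that row.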

Batch Delete from tableA based on data on tableB

I'm trying to implement a batch delete.
I found this code on the internet:
DECLARE @rowcount int = 1
WHILE (@rowcount != 0 ) BEGIN
DELETE T1
FROM (SELECT TOP (50) * FROM Orders WHERE OrderCity = @city) T1
SELECT @rowcount = @@ROWCOUNT
END
The idea is to delete all orders from @city.
It seems to work fine, but in my case I need to delete from Orders where OrderCity in (select ID from SomeOtherTable).
If I try to do the same, it works, but it takes a lot of time because SomeOtherTable will contain around 1.5 million rows and the data is being deleted from the main table, so SomeOtherTable doesn't get any smaller (it does not contain cities, it's another thing).
I also can't join both tables because it won't run, saying that more than one table will be affected.
So basically my question is: Is there any way to batch delete from tableA where tableA.ID IN (select ID from tableB)?
Yes you can do it without join as:
DELETE tableA
FROM tableB
WHERE tableA.ID = tableB.ID
DELETE Orders
FROM
Orders INNER JOIN SomeOtherTable ON Orders.OrderCity = SomeOtherTable.ID
This could solve your problem.
You should be able to delete based on a join. Try
DELETE A
FROM tableA A
JOIN tableB B ON A.ID = B.ID
Also, if tableB has ~ a million rows, it would really help if you have an index on the ID column.
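The answers above delete everything in one statement. If the batching itself matters, one possible shape is to loop, deleting a limited number of joined rows per pass until nothing is left. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module, so it selects each batch with LIMIT on rowid rather than T-SQL's TOP, and the delete_in_batches helper, the column layout, and the sample data are all hypothetical.

```python
import sqlite3

def delete_in_batches(conn, batch_size=50):
    """Delete Orders rows whose OrderCity appears in SomeOtherTable, one batch at a time."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM Orders WHERE rowid IN ("
            "  SELECT o.rowid FROM Orders AS o"
            "  JOIN SomeOtherTable AS s ON s.ID = o.OrderCity"
            "  LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit each batch so locks and log growth stay small
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

# Hypothetical sample data: 120 orders match SomeOtherTable, 30 do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderCity INTEGER);
CREATE TABLE SomeOtherTable (ID INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO Orders (OrderCity) VALUES (?)", [(1,)] * 120 + [(2,)] * 30)
conn.execute("INSERT INTO SomeOtherTable VALUES (1)")
conn.commit()

deleted = delete_in_batches(conn, 50)
left = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

Each pass only has to find the next handful of matching rows, which is where an index on the join columns is likely to help most.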

Delete many records from table A and B with one FK to table B

I have 2 tables: A and B
A contains the following columns:
Id [uniqueIdentifier] -PK
checkpointId [numeric(20,0)]
B contains the following:
Id [uniqueIdentifier] – PK
A_id (FK, uniqueIdentifier)
B has a reference to A from A_id column (FK)
The question:
I want to delete all records from table A whose checkpointId is at most X:
delete from CheckpointStorageObject where checkpointIdentifierIdentifier <= 1000
But I can't do it since "The primary key value cannot be deleted because references to this key still exist"
I tried to delete first from B table without a join:
DELETE FROM CheckpointToProtectionGroup
WHERE EXIST (SELECT * from CheckpointStorageObject
WHERE CheckpointStorageObject.CheckpointIdentifierIdentifier <= 1000)
But it didn't work.
How can I do it?
Is it possible to delete from both tables with a single command?
The resulted deleted records may be very big - more than 30K records in each table.
Try this:
First delete from tableB:
delete from tableB where A_id IN (Select Id from tableA where checkpointId <= 1000)
And then delete from tableA:
delete from tableA where checkpointId <= 1000
You will first have to delete the entries from table B
delete from tableB where A_id IN (Select Id from tableA where checkpointIdentifierIdentifier <= 1000)
Once that is done you can delete from table A, by checking the IDs that are no longer in table B
delete from tableA where Id not in (select A_id from tableB)
Your second query has some flaws:
it's EXISTS and not EXIST
you need to specify the join condition between the 2 tables. In correlated subqueries like this one, you add this condition in the WHERE clause
it's also useful to have aliases for the tables, to reduce code and make it more readable, especially with such long names
enclose the 2 statements in a transaction so you are sure it either succeeds, deleting from both tables, or fails and deletes nothing. If you don't use a transaction, the second delete may fail if, in the short time between the 2 deletes, a row is inserted into table B that refers to a row in table A that your second statement will try to delete.
So, delete first from table B (CheckpointToProtectionGroup):
BEGIN TRANSACTION
DELETE FROM CheckpointToProtectionGroup
WHERE EXISTS --- EXISTS
( SELECT *
FROM CheckpointStorageObject AS a
WHERE a.id = CheckpointToProtectionGroup.A_id --- join condition
AND a.CheckpointId <= 1000
) ;
and then from table A (CheckpointStorageObject):
DELETE FROM CheckpointStorageObject
WHERE CheckpointId <= 1000 ;
COMMIT TRANSACTION ;
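The ordering issue can be reproduced on a small scale. The sketch below, which uses Python's sqlite3 module with foreign key enforcement switched on as a stand-in for the real database, first shows the parent-first delete being rejected and then runs the child-first sequence in one transaction. The shortened table names A and B and the sample rows are assumptions for illustration only.

```python
import sqlite3

# isolation_level=None means we control BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE A (Id INTEGER PRIMARY KEY, checkpointId INTEGER);
CREATE TABLE B (Id INTEGER PRIMARY KEY, A_id INTEGER REFERENCES A(Id));
INSERT INTO A VALUES (1, 500), (2, 1000), (3, 2000);
INSERT INTO B VALUES (10, 1), (11, 2), (12, 3);
""")

# Deleting the parents first violates the foreign key and is rejected.
try:
    conn.execute("DELETE FROM A WHERE checkpointId <= 1000")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True

# Child rows first, then parents, as a single all-or-nothing unit.
conn.execute("BEGIN")
try:
    conn.execute(
        "DELETE FROM B WHERE EXISTS ("
        "  SELECT 1 FROM A WHERE A.Id = B.A_id AND A.checkpointId <= 1000)"
    )
    conn.execute("DELETE FROM A WHERE checkpointId <= 1000")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

a_ids = [r[0] for r in conn.execute("SELECT Id FROM A ORDER BY Id")]
b_ids = [r[0] for r in conn.execute("SELECT Id FROM B ORDER BY Id")]
```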

SQL Select Into Field

I want to accomplish something of the following:
Select DISTINCT(tableA.column) INTO tableB.column FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
SELECT column INTO tableB FROM tableA
SELECT INTO will create a table as it inserts new records into it. If that is not what you want (if tableB already exists), then you will need to do something like this:
INSERT INTO tableB (
column
)
SELECT DISTINCT
column
FROM tableA
Remember that if tableB has more columns than just the one, you will need to list the columns you are inserting into (as I have done in my example).
You're pretty much there.
SELECT DISTINCT column INTO tableB FROM tableA
It's going to insert into whatever column(s) are specified in the select list, so you would need to alias your select values if you need to insert into columns of tableB that aren't in tableA.
SELECT INTO
Try the following...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
I don't know what the schema of tableB is... if table B already exists and there is no unique constraint on the column you can do as any of the others suggest here....
INSERT INTO tableB (column) Select DISTINCT(tableA.column) FROM tableA
but if you have a unique constraint on table B and it already exists you'll have to exclude those values already in table B...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
WHERE tableA.column NOT IN (SELECT /* NOTE */ tableB.column FROM tableB)
-- NOTE: Remember if there is a unique constraint you don't need the more
-- costly form of a "SELECT DISTINCT" in this subquery against tableB
-- This could be done in a number of different ways - this is just
-- one version. Best version will depend on size of data in each table,
-- indexes available, etc. Always prototype different ways and measure perf.

Compare unique values from two mysql tables

I have two mysql tables: TableA has 10,000 records TableB has 2,000 records.
I want to copy the 8,000 unique records from TableA into TableB ignoring the 2,000 in TableB which have already been copied.
If uniqueness is determined by PRIMARY KEY constraint or UNIQUE constraint, then you can use INSERT IGNORE:
INSERT IGNORE INTO TableB SELECT * FROM TableA;
The rows that are duplicates and that conflict with rows already in TableB will be silently skipped and the other 8,000 rows should be inserted.
See the docs on INSERT for more details.
If you need to do this in PHP, read about the array_diff_key() function. Store your arrays with the primary key values as the key of the array elements. No guarantees for the performance of this PHP function on such large arrays, though!
Use the INSERT INTO syntax:
INSERT INTO TABLE_B
SELECT *
FROM TABLE_A a
WHERE NOT EXISTS(SELECT NULL
FROM TABLE_B b
WHERE b.column = a.column)
You'll need to update the WHERE b.column = a.column) to satisfy however you determine that a record already exists in TABLE_B.
What about something like this :
insert into TableB
select *
from TableA
where not exists (
select 1
from TableB
where TableB.id = TableA.id
)
Or, if the entries in table B are "not unique" because of their primary key, an insert ignore might do the trick, I suppose.
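For a quick check of the NOT EXISTS approach, here is a minimal runnable sketch. It uses Python's sqlite3 module rather than MySQL, and the id and name columns and the sample rows are assumptions; the INSERT statement itself has the same shape as the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE TableB (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO TableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO TableB VALUES (2, 'y');
""")

# Copy only the TableA rows whose id is not already present in TableB.
cur = conn.execute("""
INSERT INTO TableB
SELECT * FROM TableA AS a
WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.id = a.id)
""")
conn.commit()

copied = cur.rowcount
ids = [r[0] for r in conn.execute("SELECT id FROM TableB ORDER BY id")]
```

The row with id 2 is skipped because it is already in TableB, so only two rows are copied.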