Delete subset of a table based on temp table

I have a table, say myTable. I also have a temp table, say myTableTemp, that contains the exact values I want to eliminate from myTable (myTable has more values than I need).
I was initially thinking I could drop myTable and then rename myTableTemp to myTable. However, there are many FK constraints that I do not want to touch. In theory, my query would look like:
DELETE FROM myTable where in (myTableTemp);
At least logically, that is how I think about it.
EDIT: The temp table contains the data I want to DELETE from myTable

DELETE FROM myTable where in (myTableTemp);
Isn't the above backwards? Don't you want to keep all the values in myTableTemp?
I would do the following:
DELETE FROM myTable t1
WHERE NOT EXISTS ( SELECT 1 FROM myTableTemp t2
WHERE t2.primary_key = t1.primary_key );
Again, that's assuming that you want to keep everything in myTableTemp and delete everything in myTable that isn't in myTableTemp.
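A minimal sketch of the NOT EXISTS delete, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without touching a real server. The `val` column and the sample rows are made up for illustration, and the table alias on the DELETE target is dropped because not every engine accepts it there.

```python
import sqlite3

def keep_only_matching(conn):
    """Delete rows of myTable whose key does not appear in myTableTemp."""
    conn.execute(
        """
        DELETE FROM myTable
        WHERE NOT EXISTS (
            SELECT 1 FROM myTableTemp t2
            WHERE t2.primary_key = myTable.primary_key
        )
        """
    )

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only primary_key comes from the answer above.
conn.execute("CREATE TABLE myTable (primary_key INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE myTableTemp (primary_key INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany("INSERT INTO myTableTemp VALUES (?)", [(2,), (4,)])

keep_only_matching(conn)
remaining = [row[0] for row in conn.execute(
    "SELECT primary_key FROM myTable ORDER BY primary_key")]
print(remaining)
```

Only the keys that also exist in myTableTemp survive, which is the "keep everything in the temp table" reading of the question.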

As an alternate solution to eliminate from myTable items present in myTableTemp:
DELETE FROM myTable
WHERE primary_key IN ( SELECT primary_key FROM myTableTemp )
;
It is usually believed that [NOT] EXISTS queries perform better than those using [NOT] IN, but that is not always the case.
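As a rough illustration that the IN form and the EXISTS form remove the same rows, here is a small sketch using Python's sqlite3 module with an in-memory database. The `val` column, the helper names and the sample data are assumptions for the example; it says nothing about which form is faster on a given engine.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (primary_key INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE myTableTemp (primary_key INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
    conn.executemany("INSERT INTO myTableTemp VALUES (?)", [(2,), (4,)])
    return conn

def remaining_keys(conn):
    """Return the keys still present in myTable, in ascending order."""
    return [row[0] for row in conn.execute(
        "SELECT primary_key FROM myTable ORDER BY primary_key")]

# Variant 1: IN with a subquery, as in the answer above.
conn_in = make_db()
conn_in.execute(
    "DELETE FROM myTable WHERE primary_key IN (SELECT primary_key FROM myTableTemp)")
after_in = remaining_keys(conn_in)

# Variant 2: the equivalent correlated EXISTS form.
conn_exists = make_db()
conn_exists.execute("""
    DELETE FROM myTable
    WHERE EXISTS (SELECT 1 FROM myTableTemp t2
                  WHERE t2.primary_key = myTable.primary_key)
""")
after_exists = remaining_keys(conn_exists)

print(after_in, after_exists)
```

Comparing the query plans of both variants on the real database is the only reliable way to decide between them.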

Related

Removing duplicates and keeping one copy

I have been going through the threads about removing duplicates from a table while keeping one copy, but my case is a table with a composite key. Does anyone have an idea?
Table contr has the composite key checkno, salary_month, salary_year.
delete (select * from CONTR t1
INNER JOIN
(select CHECKNO, SALARY_YEAR,SALARY_MONTH FROM CONTR
group by CHECKNO, SALARY_YEAR,SALARY_MONTH HAVING COUNT(*) > 1) dupes
ON
t1.CHECKNO = dupes.CHECKNO AND
t1.SALARY_YEAR= dupes.SALARY_YEAR AND
t1.SALARY_MONTH=dupes.SALARY_MONTH);
I expected one duplicate to be removed and one maintained.

You can use the query below to remove duplicates, using rowid as a column with unique values:
delete contr t1
where rowid <
(
select max(rowid)
from contr t2
where t2.checkno = t1.checkno
and t2.salary_year = t1.salary_year
and t2.salary_month = t1.salary_month
);
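SQLite also exposes a rowid, so the same idea can be tried locally. The sketch below uses Python's sqlite3 module with invented sample rows; it only stands in for the original database, and the alias on the DELETE target is replaced by the plain table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contr (checkno INTEGER, salary_year INTEGER, salary_month INTEGER)")
# Made-up data: one pair of duplicates, one unique row, one triple.
conn.executemany("INSERT INTO contr VALUES (?, ?, ?)", [
    (100, 2019, 1), (100, 2019, 1),
    (100, 2019, 2),
    (200, 2019, 1), (200, 2019, 1), (200, 2019, 1),
])

# Keep the row with the highest rowid in each group, delete the rest.
conn.execute("""
    DELETE FROM contr
    WHERE rowid < (
        SELECT MAX(t2.rowid)
        FROM contr t2
        WHERE t2.checkno = contr.checkno
          AND t2.salary_year = contr.salary_year
          AND t2.salary_month = contr.salary_month
    )
""")

deduped = conn.execute(
    "SELECT checkno, salary_year, salary_month FROM contr ORDER BY 1, 2, 3"
).fetchall()
print(deduped)
```

Each (checkno, salary_year, salary_month) combination is left with exactly one row, regardless of how many copies it started with.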

Another way to achieve this, assuming the dupes are defined by the 3 columns you have mentioned, is:
Create a temp table with distinct values
Drop your table
Rename the temp table
Especially if you are dealing with a huge volume of data, this would be a lot faster than a delete.
If the dup data you are working on is a subset of your main table, the steps would be:
Create a temp table with distinct values
Delete all dup rows from the main table
Insert data from the temp table into the main table
The SQL for the first step would be:
create table tmp_CONTR AS
select distinct CHECKNO, SALARY_YEAR,SALARY_MONTH -- this part can be modified to match your needs
from CONTR t1;
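The create, drop and rename steps can be sketched end to end as follows. This uses Python's sqlite3 module and made-up rows purely as a stand-in; the RENAME syntax differs between databases, and a table built with CREATE TABLE ... AS SELECT does not carry over keys, indexes or constraints, so those would have to be recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contr (checkno INTEGER, salary_year INTEGER, salary_month INTEGER)")
conn.executemany("INSERT INTO contr VALUES (?, ?, ?)", [
    (100, 2019, 1), (100, 2019, 1), (100, 2019, 2), (200, 2019, 1),
])

# Step 1: temp table with distinct values.
conn.execute("""
    CREATE TABLE tmp_contr AS
    SELECT DISTINCT checkno, salary_year, salary_month
    FROM contr
""")
# Step 2: drop the original table.
conn.execute("DROP TABLE contr")
# Step 3: rename the temp table (SQLite syntax; other engines differ).
conn.execute("ALTER TABLE tmp_contr RENAME TO contr")

rebuilt = conn.execute(
    "SELECT checkno, salary_year, salary_month FROM contr ORDER BY 1, 2, 3"
).fetchall()
print(rebuilt)
```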

visual foxpro 9.0, how to find/get repeated records

I have a table with 2001233 records.
I can use 'Select distinct * from that_table' to get all the non-repeated records, maybe about 2001100 of them.
How can I get the other 133 records into another table, so I can check which records disappeared after 'distinct'?
Another question:
When appending new records from one table to another, how can I check that an appended record is not already in the target table?
Thanks for answering my question :)

It would be a hack and slow for 2+ million rows but you can do this:
Select Sys(2017,'',0,3) As crc, * ;
from myTable Into Cursor crsTemp ;
nofilter
Select * From crsTemp ;
where crc In ;
( Select crc From crsTemp;
having Count(*) > 1 ;
group By crc) ;
into Cursor crsDupes ;
nofilter
Select crsDupes
Browse
You should have used a primary key from the start.
For your second question, I think it is best to use "insert into" rather than append, i.e.:
Insert into tableA ;
select * from tableB t1 ;
where not exists ( ;
select * from tableA t2 ;
where t1.field1 = t2.field1 and t1.field2 = t2.field2)
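The same "insert only what is not there yet" pattern can be tried outside of Visual FoxPro. The sketch below uses Python's sqlite3 module; the table layout (two columns named field1 and field2) and the rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (field1 INTEGER, field2 TEXT)")
conn.execute("CREATE TABLE tableB (field1 INTEGER, field2 TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO tableB VALUES (?, ?)",
                 [(2, "y"), (3, "z"), (1, "q")])

# Copy only the rows of tableB that have no match on (field1, field2) in tableA.
conn.execute("""
    INSERT INTO tableA
    SELECT t1.field1, t1.field2
    FROM tableB t1
    WHERE NOT EXISTS (
        SELECT 1 FROM tableA t2
        WHERE t1.field1 = t2.field1 AND t1.field2 = t2.field2
    )
""")

merged = conn.execute(
    "SELECT field1, field2 FROM tableA ORDER BY field1, field2").fetchall()
print(merged)
```

The row (2, "y") is skipped because it already exists, while (1, "q") is inserted since only field1 matches an existing row.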

Here is another way to find the duplicate records:
Assuming that you don't keep deleted records hanging around...
select tableA
set deleted off
delete all
index on <key expression> to keyfield unique
set deleted on
recall all
browse for deleted
This process will delete all the records, and the recall statement will only apply to the indexed records, leaving the duplicates tagged as deleted.

Try this to check if there is a duplicate record.
SELECT colName, count(*) FROM tblName GROUP BY colName HAVING count(*) > 1
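A quick way to see what the GROUP BY ... HAVING check returns is to run it on a handful of rows. This is a sketch with Python's sqlite3 module and invented values; tblName and colName are the placeholders from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblName (colName TEXT)")
conn.executemany("INSERT INTO tblName VALUES (?)",
                 [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)])

# Values that occur more than once, with their number of occurrences.
duplicates = conn.execute("""
    SELECT colName, COUNT(*)
    FROM tblName
    GROUP BY colName
    HAVING COUNT(*) > 1
    ORDER BY colName
""").fetchall()
print(duplicates)
```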

Deleting at most one record for each unique tuple combination

I want to delete at most one record for each unique (columnA, columnB)-tuple in my following delete statement:
DELETE FROM tableA
WHERE columnA IN
(
--some subqueryA
)
AND columnB IN
(
--some subqueryB
)
How is this accomplished? Please only consider those statements that work when used against MSS 2000 (i.e., T-SQL 2000 syntax). I can do it with iterating through a temptable but I want to write it using only sets.
Example:
subqueryA returns 1
subqueryB returns 2,3
If the original table contained
(columnA, columnB, columnC)
5,2,5
1,2,34
1,2,45
1,3,86
Then
1,2,34
1,3,86
should be deleted. Each unique (columnA, columnB)-tuple will appear at most twice in tableA and each time I run my SQL statement I want to delete at most one of these unique combinations - never two.
If there is one record for a given unique (columnA, columnB)-tuple, delete it.
If there are two records for a given unique (columnA, columnB)-tuple, delete only one of them.
Delete tabA
from TableA tabA
Where tabA.columnC in (
select max(tabAA.columnC) from TableA tabAA
where tabAA.columnA in (1)
and tabAA.columnB in (2,3)
group by tabAA.columnA,tabAA.columnB
)

How often are you going to be running this that it matters whether you use temp tables or not? Maybe you should consider adding constraints to the table so you only have to do this once...
That said, in all honesty, the best way to do this for SQL Server 2000 is probably to use the #temp table as you're already doing. If you were trying to delete all but one of each dupe, then you could do something like:
insert the distinct rows into a separate table
delete all the rows from the old table
move the distinct rows back into the original table
I've also done things like copy the distinct rows into a new table, drop the old table, and rename the new table.
But this doesn't sound like the goal. Can you show the code you're currently using with the #temp table? I'm trying to envision how you're identifying the rows to keep, and maybe seeing your existing code will trigger something.
EDIT - now with better understood requirements, I can propose the following query. Please test it on a copy of the table first!
DELETE a
FROM dbo.TableA AS a
INNER JOIN
(
SELECT columnA, columnB, columnC = MIN(columnC)
FROM dbo.TableA
WHERE columnA IN
(
-- some subqueryA
SELECT 1
)
AND columnB IN
(
-- some subqueryB
SELECT 2 UNION SELECT 3
)
GROUP BY columnA, columnB
) AS x
ON a.columnA = x.columnA
AND a.columnB = x.columnB
AND a.columnC = x.columnC;
Note that this doesn't confirm that there are exactly one or two rows that match the grouping on columnA and columnB. Also note that if you run this twice it will delete the remaining row that still matches the subquery!
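The behaviour described above, including the warning about running it twice, can be checked on the example rows from the question. SQLite has no DELETE ... JOIN, so this sketch (Python's sqlite3 module) swaps in a row-value IN over the same grouped subquery, which requires SQLite 3.15 or later. It is an illustration of the logic, not T-SQL 2000 syntax.

```python
import sqlite3

DELETE_ONE_PER_TUPLE = """
    DELETE FROM tableA
    WHERE (columnA, columnB, columnC) IN (
        SELECT columnA, columnB, MIN(columnC)
        FROM tableA
        WHERE columnA IN (SELECT 1)                  -- stand-in for subqueryA
          AND columnB IN (SELECT 2 UNION SELECT 3)   -- stand-in for subqueryB
        GROUP BY columnA, columnB
    )
"""

def rows(conn):
    """Return the current contents of tableA in a stable order."""
    return conn.execute(
        "SELECT columnA, columnB, columnC FROM tableA ORDER BY 1, 2, 3").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (columnA INTEGER, columnB INTEGER, columnC INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)",
                 [(5, 2, 5), (1, 2, 34), (1, 2, 45), (1, 3, 86)])

conn.execute(DELETE_ONE_PER_TUPLE)
after_first_run = rows(conn)

# A second run removes the remaining matching row as well.
conn.execute(DELETE_ONE_PER_TUPLE)
after_second_run = rows(conn)

print(after_first_run, after_second_run)
```

The first run deletes (1, 2, 34) and (1, 3, 86), as the question asks; the second run also deletes (1, 2, 45).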

I want to leave always one record if table record count = 1 with SQL

I can delete records with this SQL clause,
DELETE FROM TABLE WHERE ID = 2
I need to always leave one record if table count = 1 even if "ID=2". How can I do this?

Add a WHERE clause to ensure there's more than one row:
DELETE FROM TABLE
WHERE ID = 2
AND (SELECT COUNT(*) FROM TABLE) > 1
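A small sketch of the COUNT(*) guard, using Python's sqlite3 module. Since TABLE is a reserved word, the example uses a hypothetical table named `items`, and the helper name is likewise made up.

```python
import sqlite3

def delete_unless_last(conn, row_id):
    """Delete the row with the given id unless it is the only row left.

    Returns the number of rows actually deleted.
    """
    cur = conn.execute(
        "DELETE FROM items WHERE id = ? AND (SELECT COUNT(*) FROM items) > 1",
        (row_id,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(2,), (3,)])

deleted_first = delete_unless_last(conn, 2)   # two rows present, delete goes through
deleted_second = delete_unless_last(conn, 3)  # only one row left, nothing is deleted
left = [row[0] for row in conn.execute("SELECT id FROM items")]
print(deleted_first, deleted_second, left)
```

Note that the guard only protects single-row deletes: a statement that matches every row while the table holds more than one row would still empty it.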

Untested, but something along the lines of this might work:
DELETE FROM TABLE WHERE ID = 2 LIMIT (SELECT COUNT(*)-1 FROM TABLE WHERE ID=2);
Maybe add an if-statement to ensure the count is above 1.

A simple way is to disallow any delete that empties the table:
CREATE TRIGGER TRG_MyTable_D ON MyTable FOR DELETE
AS
IF NOT EXISTS (SELECT * FROM MyTable)
ROLLBACK TRAN
GO
More complex, what if you do this multirow delete that empties the table?
DELETE FROM TABLE WHERE ID BETWEEN 2 AND 5
so, randomly repopulate from what you just deleted
CREATE TRIGGER TRG_MyTable_D ON MyTable FOR DELETE
AS
IF NOT EXISTS (SELECT * FROM MyTable)
INSERT mytable (col1, col2, ..., coln)
SELECT TOP 1 col1, col2, ..., coln FROM DELETED --ORDER BY ??
GO
However, the requirement is a bit dangerous and vague. In English, OK, "always have at least one row in the table", but in practice "which row?"
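For experimenting with the "disallow any delete that empties the table" idea, here is a rough SQLite analogue driven from Python's sqlite3 module. SQLite triggers fire per row and use RAISE(ABORT, ...) instead of ROLLBACK TRAN, so this is a sketch of the concept rather than a port of the T-SQL trigger; the trigger name and the sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(2,), (3,), (4,), (5,)])

# Abort the whole DELETE statement as soon as it would remove the last row.
conn.execute("""
    CREATE TRIGGER trg_mytable_d BEFORE DELETE ON MyTable
    WHEN (SELECT COUNT(*) FROM MyTable) <= 1
    BEGIN
        SELECT RAISE(ABORT, 'MyTable must keep at least one row');
    END
""")

def row_count():
    return conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]

# Multi-row delete that would empty the table: rejected, nothing is removed.
try:
    conn.execute("DELETE FROM MyTable WHERE id BETWEEN 2 AND 5")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True
count_after_blocked = row_count()

# Single-row delete that leaves rows behind: allowed.
conn.execute("DELETE FROM MyTable WHERE id = 2")
count_after_single = row_count()

print(blocked, count_after_blocked, count_after_single)
```

This variant sidesteps the "which row?" question by refusing the delete outright instead of picking a row to put back.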

How can I merge two MySQL tables?

How can I merge two MySQL tables that have the same structure?
The primary keys of the two tables will clash, so I have to take that into account.

You can also try:
INSERT IGNORE
INTO table_1
SELECT *
FROM table_2
;
which allows those rows in table_1 to supersede those in table_2 that have a matching primary key, while still inserting rows with new primary keys.
Alternatively,
REPLACE
INTO table_1
SELECT *
FROM table_2
;
will replace those rows already in table_1 with the corresponding row from table_2 (REPLACE deletes the old row and inserts the new one), while inserting rows with new primary keys.
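The difference between the two statements is easy to see on a pair of tiny tables. SQLite spells the first one INSERT OR IGNORE and also accepts REPLACE INTO, so the sketch below (Python's sqlite3 module, made-up `id` and `val` columns) is an analogue of the MySQL statements rather than the MySQL syntax itself.

```python
import sqlite3

def make_db():
    """Two tables with the same structure and one clashing primary key (2)."""
    conn = sqlite3.connect(":memory:")
    for name in ("table_1", "table_2"):
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", [(1, "old1"), (2, "old2")])
    conn.executemany("INSERT INTO table_2 VALUES (?, ?)", [(2, "new2"), (3, "new3")])
    return conn

def contents(conn):
    return conn.execute("SELECT id, val FROM table_1 ORDER BY id").fetchall()

# Existing rows in table_1 win on a key clash.
conn_ignore = make_db()
conn_ignore.execute("INSERT OR IGNORE INTO table_1 SELECT * FROM table_2")
after_ignore = contents(conn_ignore)

# Rows from table_2 win on a key clash.
conn_replace = make_db()
conn_replace.execute("REPLACE INTO table_1 SELECT * FROM table_2")
after_replace = contents(conn_replace)

print(after_ignore)
print(after_replace)
```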

It depends on the semantics of the primary key. If it's just an autoincrement, then use something like:
insert into table1 (all columns except pk)
select all_columns_except_pk
from table2;
If PK means something, you need to find a way to determine which record should have priority. You could create a select query to find duplicates first (see answer by cpitis). Then eliminate the ones you don't want to keep and use the above insert to add records that remain.
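For the autoincrement case, leaving the key column out of both the column list and the SELECT lets the target table hand out fresh keys. A minimal sketch with Python's sqlite3 module follows; the `id` and `name` columns are assumptions, and SQLite's AUTOINCREMENT stands in for MySQL's AUTO_INCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO table1 (name) VALUES (?)", [("a",), ("b",)])
conn.executemany("INSERT INTO table2 (name) VALUES (?)", [("c",), ("d",)])

# Both tables now use ids 1 and 2; copying without the id column avoids the clash.
conn.execute("INSERT INTO table1 (name) SELECT name FROM table2 ORDER BY id")

merged = conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall()
print(merged)
```

The copied rows get new ids, so any foreign keys that pointed at the old ids in table2 would need to be remapped separately.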

INSERT
INTO first_table f
SELECT *
FROM second_table s
ON DUPLICATE KEY
UPDATE
s.column1 = DO_WHAT_EVER_MUST_BE_DONE_ON_KEY_CLASH(f.column1)

If you need to do it manually, one time:
First, merge in a temporary table, with something like:
create table MERGED as select * from table_1 UNION select * from table_2
Then, identify the primary key conflicts with something like
SELECT COUNT(*), PK from MERGED GROUP BY PK HAVING COUNT(*) > 1
Where PK is the primary key field...
Solve the duplicates.
Rename the table.
[edited - removed brackets in the UNION query, which was causing the error in the comment below]

Not as complicated as it sounds....
Just leave the duplicate primary key out of your query....
this works for me !
INSERT INTO
Content(
`status`,
content_category,
content_type,
content_id,
user_id,
title,
description,
content_file,
content_url,
tags,
create_date,
edit_date,
runs
)
SELECT `status`,
content_category,
content_type,
content_id,
user_id,
title,
description,
content_file,
content_url,
tags,
create_date,
edit_date,
runs
FROM
Content_Images

You could write a script to update the FK's for you.. check out this blog: http://multunus.com/2011/03/how-to-easily-merge-two-identical-mysql-databases/
They have a clever script to use the information_schema tables to get the "id" columns:
SET @db:='id_new';
select @max_id:=max(AUTO_INCREMENT) from information_schema.tables;
select concat('update ',table_name,' set ', column_name,' = ',column_name,'+',@max_id,' ; ') from information_schema.columns where table_schema=@db and column_name like '%id' into outfile 'update_ids.sql';
use id_new
source update_ids.sql;
