I have 2 tables, I_786 (100k records) and B_786 (700k records). All records in I_786 must be deleted from B_786. I_786 was derived from B_786 by taking its first 100k records using rownum. Now I need to delete from B_786 the same records that are in I_786.
I wrote the query below:
DELETE FROM B_786
WHERE EXISTS (SELECT *
FROM I_786);
But it deletes all the data from the B_786 table. The WHERE condition is not working.
How do I fix and optimize the WHERE clause here?
Your subquery never refers to B_786, so EXISTS is true for every row as long as I_786 is not empty. You need to match on a column that identifies which records to delete from B_786. For example, suppose B_786 has 2 columns (id, name) and 700k records, and I_786 has the same columns (id, name) and 100k records. To delete the rows of B_786 that match a record in I_786:
Delete from B_786 where id in (select id from I_786);
Executing the above command deletes the rows of B_786 whose id matches an id in I_786.
DELETE FROM B_786
WHERE EXISTS (SELECT 1 FROM I_786 where I_786.id = B_786.id);
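A small runnable sketch of the difference between the uncorrelated and the correlated EXISTS. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the (id, name) schema and the tiny row counts are assumptions made only for illustration:

```python
import sqlite3

def remaining_after(delete_sql):
    """Build tiny stand-ins for B_786 and I_786, run delete_sql,
    and return the ids still present in B_786."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE B_786 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE I_786 (id INTEGER PRIMARY KEY, name TEXT);
    """)
    conn.executemany("INSERT INTO B_786 VALUES (?, ?)",
                     [(i, f"row{i}") for i in range(1, 8)])
    # I_786 holds copies of the first 3 rows of B_786 (illustration data only)
    conn.execute("INSERT INTO I_786 SELECT * FROM B_786 WHERE id <= 3")
    conn.execute(delete_sql)
    ids = [r[0] for r in conn.execute("SELECT id FROM B_786 ORDER BY id")]
    conn.close()
    return ids

# Subquery does not mention B_786: true for every row, so everything goes.
uncorrelated = remaining_after(
    "DELETE FROM B_786 WHERE EXISTS (SELECT * FROM I_786)")

# Subquery is tied to the current B_786 row: only matching rows go.
correlated = remaining_after(
    "DELETE FROM B_786 WHERE EXISTS "
    "(SELECT 1 FROM I_786 WHERE I_786.id = B_786.id)")

print(uncorrelated)
print(correlated)
```

On the real tables, an index on the id column of both tables will probably matter more for speed than the choice between IN and EXISTS.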
In your case, you need to use a reference key, a field in one table that refers to the PRIMARY KEY in another table.
Assuming pid is the key in table I_786 and fid is the column in B_786 that references it,
Your query will look like this:
DELETE FROM B_786 WHERE B_786.fid IN (SELECT I_786.pid FROM I_786);
I have a table (A) having 200,000 records. I have created a backup table (B) from the same table (A) which contains around 100,000 records. Now I have to delete all the records which are present in table B from the parent table A.
I am using the below query:
delete from A where id in (select id from B);
The query I am using takes a lot of time; the delete is very slow. Could someone please help me reduce the time taken to delete the records?
Any help will be appreciated.
How about creating a table with the records you want to keep, dropping the old A table, and renaming the new one back to A?
A query like:
Create table C as
Select A.*
From A Left Outer Join B ON A.id=B.id
Where B.id is null
should perform well if id is indexed.
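The keep-and-swap approach can be sketched end to end as follows. This is only an illustration run against SQLite through Python's sqlite3 module; the payload column and the ten sample rows are invented, and the rename syntax may differ on other databases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(i, f"p{i}") for i in range(1, 11)])
# B is a backup of part of A (here: the even ids, illustration data only)
conn.execute("INSERT INTO B SELECT * FROM A WHERE id % 2 = 0")

conn.executescript("""
    -- keep only the rows of A that have no partner in B (anti-join)
    CREATE TABLE C AS
        SELECT A.* FROM A LEFT OUTER JOIN B ON A.id = B.id
        WHERE B.id IS NULL;
    DROP TABLE A;
    ALTER TABLE C RENAME TO A;
""")

kept_ids = [r[0] for r in conn.execute("SELECT id FROM A ORDER BY id")]
print(kept_ids)
```

Note that a table created with CREATE TABLE ... AS SELECT does not inherit the indexes or constraints of the original, so those would need to be recreated after the swap.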
I have a table with duplicate data in one or two columns. I want to delete duplicate data and keep one record only.
I tried the following code, but it deleted all of the data from my table.
DELETE from test del
WHERE EXISTS (
SELECT *
FROM test ex
WHERE ex.name= del.name
);
If there is no primary key, the trick to discriminate duplicate rows is to use ctid, the pseudo-column that identifies the (non-durable) physical location of the row.
Two rows cannot have the same ctid, and ctid values can be compared with each other.
The following query adds the condition to discriminate the rows to delete against the one to keep, for each duplicate.
DELETE from test del
WHERE EXISTS (
SELECT *
FROM test ex
WHERE ex.name= del.name
AND ex.ctid > del.ctid
);
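To try the idea without a PostgreSQL server, here is a rough sketch on SQLite via Python's sqlite3 module. SQLite has no ctid, so its rowid is used as a stand-in; that substitution, and the sample names, are assumptions of this sketch and not part of the answer above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key, as in the question
conn.execute("CREATE TABLE test (name TEXT)")
conn.executemany("INSERT INTO test VALUES (?)",
                 [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)])

# rowid plays the role of ctid here (assumption for a runnable example).
# A row is deleted when another row with the same name has a larger rowid,
# so exactly one row per name survives.
conn.execute("""
    DELETE FROM test
    WHERE EXISTS (
        SELECT 1
        FROM test AS ex
        WHERE ex.name = test.name
          AND ex.rowid > test.rowid
    )
""")

names = sorted(r[0] for r in conn.execute("SELECT name FROM test"))
print(names)
```

Without the extra comparison, every duplicated row matches itself and all copies are deleted, which is what happened with the query in the question.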
If your data has a primary key, then you can do:
delete from test t
where t.pk not in (select min(t2.pk) from test t2 group by t2.name);
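A minimal sketch of the primary-key variant, again on SQLite through Python's sqlite3 module with invented sample rows. It uses NOT IN so that the row with the lowest pk for each name is the one that survives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (pk INTEGER PRIMARY KEY, name TEXT)")
# pk values 1..5 are assigned automatically in insertion order
conn.executemany("INSERT INTO test (name) VALUES (?)",
                 [("a",), ("b",), ("a",), ("c",), ("b",)])

# Keep MIN(pk) per name, delete every other row
conn.execute("""
    DELETE FROM test
    WHERE pk NOT IN (SELECT MIN(t2.pk) FROM test AS t2 GROUP BY t2.name)
""")

rows = conn.execute("SELECT pk, name FROM test ORDER BY pk").fetchall()
print(rows)
```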
I have this table, where every column is a VARCHAR (or equivalent):
field001 field002 field003 field004 field005 .... field500
500 VARCHAR columns. No primary keys. And no column is guaranteed to be unique. So the only way to know for sure if two rows are the same is to compare the values of all columns.
(Yes, this should be in TheDailyWTF. No, it's not my fault. Bear with me here).
I inserted a duplicate set of rows by mistake, and I need to find them and remove them.
There are 12 million rows in this table, so I'd rather not recreate it.
However, I do know what rows were mistakenly inserted (I have the .sql file).
So I figured I'd create another table and load it with those. And then I'd do some sort of join that would compare all columns on both tables and then delete the rows that are equal from the first table. I tried a NATURAL JOIN as that looked promising, but nothing was returned.
What are my options?
I'm using Amazon Redshift (so PostgreSQL 8.4 if I recall), but I think this is a general SQL question.
You can treat the whole row as a single record in Postgres (and thus I think in Redshift).
The following works in Postgres and will keep one of the duplicates:
delete from the_table
where ctid not in (select min(ctid)
from the_table
group by the_table); --<< Yes, the group by is correct!
This is going to be slow!
Grouping over so many columns and then deleting with a NOT IN will take quite some time, especially if a lot of rows are going to be deleted.
If you want to delete all duplicate rows (not keeping any of them), you can use the following:
delete from the_table
where the_table in (select the_table
from the_table
group by the_table
having count(*) > 1);
You should be able to identify all the mistakenly inserted rows using CREATEXID. If you group by CREATEXID on your table as below and get the counts, you should be able to see how many rows were inserted in your transaction and remove them using a DELETE command.
SELECT CREATEXID,COUNT(1)
FROM yourtable
GROUP BY 1;
One simplistic solution is to recreate the table, e.g.
CREATE TABLE my_temp_table (
-- add column definitions here, just like the original table
);
INSERT INTO my_temp_table SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
or even
CREATE TABLE my_temp_table AS SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
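The second variant can be tried out on a small scale like this. The sketch runs on SQLite through Python's sqlite3 module, and the three-column table and its rows are invented stand-ins for the 500-column one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_table (field001 TEXT, field002 TEXT, field003 TEXT)")
rows = [("x", "1", "a"), ("y", "2", "b"), ("z", "3", "c")]
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows)
# Simulate the mistaken second load of some of the rows
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows[:2])

conn.executescript("""
    CREATE TABLE my_temp_table AS SELECT DISTINCT * FROM original_table;
    DROP TABLE original_table;
    ALTER TABLE my_temp_table RENAME TO original_table;
""")

result = sorted(conn.execute("SELECT * FROM original_table").fetchall())
print(result)
```

Keep in mind that SELECT DISTINCT collapses every set of identical rows, including any that were legitimately duplicated before the mistaken insert.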
This is a trick, but it may help.
Each row in the table carries the ID of the transaction in which it was inserted or updated (see System Columns in the PostgreSQL documentation). This is the xmin column. Using it, you can find the transaction ID under which you inserted the wrong data. Then just delete the rows using
delete from my_table where xmin = <the_wrong_transaction_id>;
PS: Be careful and try it on a test table first.
I have an HBase table with 6 columns, created using Hive and loaded with data. I need to delete the rows which have the same data in one particular column. I need an HQL command for that.
Here are my table columns:
firstname lastname location id address description
I want to delete the rows with same description.
DELETE is available starting in Hive 0.14.
Deletes can only be performed on tables that support ACID. See Hive Transactions for details.
As for your query, try something like:
DELETE
FROM TABLE
WHERE TABLE.a IN (SELECT foo FROM B);
OR
DELETE
FROM T1
WHERE EXISTS (SELECT B FROM T2 WHERE T1.X = T2.Y)
How would I delete records from a table where they match a delete table? As in, I have a table of record keys that say what need to be deleted from my main table. How would I write a delete to say "delete anything from my main table where this field matches a field in my delete table?"
If it's a small delete table:
delete from TableA A
where a.key in ( select key from deleteTable );
If it's a bigger table, you can try an EXISTs:
delete from TableA A
where exists ( select * from deleteTable d
where d.key = A.key );
All this depends, of course, on your indexes, the sizes of the tables, and so on.
(And if it gets really big, you might want to consider another option, e.g. partitioning or rebuilding the table.)
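A quick way to convince yourself that the IN and EXISTS forms remove the same rows is to run both on a small copy of the data. The sketch below does that on SQLite via Python's sqlite3 module; the val column, the index names and the sample keys are made up for the example:

```python
import sqlite3

def run_delete(delete_sql):
    """Create small TableA / deleteTable stand-ins, run delete_sql,
    and return the keys left in TableA."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (key INTEGER, val TEXT);
        CREATE TABLE deleteTable (key INTEGER);
        -- indexes on the match column help both forms on larger tables
        CREATE INDEX idx_tablea_key ON TableA (key);
        CREATE INDEX idx_delete_key ON deleteTable (key);
    """)
    conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                     [(i, f"v{i}") for i in range(1, 7)])
    conn.executemany("INSERT INTO deleteTable VALUES (?)",
                     [(2,), (4,), (6,)])
    conn.execute(delete_sql)
    keys = [r[0] for r in conn.execute("SELECT key FROM TableA ORDER BY key")]
    conn.close()
    return keys

via_in = run_delete(
    "DELETE FROM TableA WHERE key IN (SELECT key FROM deleteTable)")
via_exists = run_delete(
    "DELETE FROM TableA WHERE EXISTS "
    "(SELECT 1 FROM deleteTable AS d WHERE d.key = TableA.key)")

print(via_in)
print(via_exists)
```

Which form is faster depends on the database and its optimizer, so it is worth checking the execution plan of each on the real tables.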