Delete duplicate rows in Teradata - SQL

I'm using this query to get all the duplicate rows:
SELECT count(*),col1, col2 from table GROUP BY col1, col2 having count(*)>1
I tried this query:
DELETE FROM TABLE WHERE (col1, col2) in (SELECT count(*),col1, col2 from table GROUP BY col1, col2
having count(*)>1 )
but it doesn't work because of the count(*) in the select statement.
How can I delete all the duplicate rows found by this query?
Thanks

You can do this with the query below, if rowid usage is enabled:
delete from table
where rowid not in (
    select max(rowid) from table
    group by col1, col2
)
OR
Alternatively, you can copy all the data to a new SET table (which gets rid of exact duplicates), remove all records from the main table, and re-insert all records from the newly created SET table into the main table.
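A minimal sketch of that approach, assuming hypothetical names mydb.mytable with columns col1 and col2 (a SET table silently discards rows that duplicate an existing row in every column):
-- Sketch only: database and table names are placeholders.
CREATE SET TABLE mydb.mytable_dedup AS mydb.mytable WITH NO DATA;
INSERT INTO mydb.mytable_dedup
SELECT * FROM mydb.mytable;          -- exact duplicates are silently dropped
DELETE FROM mydb.mytable;            -- empty the original table
INSERT INTO mydb.mytable
SELECT * FROM mydb.mytable_dedup;    -- copy the de-duplicated rows back
DROP TABLE mydb.mytable_dedup;
Note that a SET table only discards rows that are identical in every column, not just in col1 and col2.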

Related

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns based on 2 columns and load the unique rows with 139 columns into another table.
eg :
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
o/p:
col1 col2 col3 .....col139
a b .............
b c .............
I need a SQL query for DB2.
If the "other table" does not exist yet you can create it like this
CREATE TABLE othertable LIKE originaltable
And then insert the requested rows with this statement:
INSERT INTO othertable
SELECT col1, ..., coln
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
      FROM originaltable t) t
WHERE num = 1
There are numerous tools out there that generate queries and column lists, so if you do not want to write the column list by hand you can generate it with one of these tools, or use another SQL statement to select it from the Db2 catalog table (SYSCAT.COLUMNS).
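For example, a rough sketch of generating the column list from the Db2 catalog, assuming the source table is ORIGINALTABLE in schema MYSCHEMA (both names are placeholders) and LISTAGG is available (Db2 9.7+):
-- Builds a comma-separated column list to paste into the INSERT ... SELECT.
SELECT LISTAGG(COLNAME, ', ') WITHIN GROUP (ORDER BY COLNO) AS column_list
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'MYSCHEMA'
  AND TABNAME = 'ORIGINALTABLE';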
You might be better just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
      ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
  FROM t
)
WHERE DUP > 1
You can use row_number():
select t.*
from (select t.*,
             row_number() over (partition by col1, col2 order by col1) as seqnum
      from t
     ) t
where seqnum = 1;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
And if you want to delete those duplicate rows using the value of col1, you can run the following query:
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
You can use the same approach to delete duplicate rows using col2 values. Note, however, that this deletes every row whose col1 value is duplicated, not just the extra copies; to keep one row per group, use the ROW_NUMBER approach above.

Inserting unique rows into a table where they do not already exist

I am using postgres 8.4.
I am merging several tables into one. There are duplicates both within and across tables. The new table will have a unique constraint. I have inserted the first table into the new big table without trouble, but when trying to add the second table I get an error. I have tried:
INSERT INTO big_table(id, col1, col2)
SELECT DISTINCT ON (id)
       id,
       col1,
       col2
FROM table2
WHERE NOT EXISTS(
    SELECT id, col1, col2
    FROM big_table
    WHERE (big_table.id = table2.id))
I get the following error:
invalid reference to FROM-clause entry for table "big_table" LINE
13: ...big_table WHERE(table2.id = big_table.id))
HINT: There is an entry for table "big_tweets", but it cannot be
referenced from this part of the query.
I think it might have something to do with the fact that big_table changes, but I'm not sure how else to exclude rows that already exist in the table.
This is not directly related to your question, but you could instead UNION all the tables before creating the big table; UNION removes the duplicates.
CREATE TABLE big_table AS
SELECT id, col1, col2 FROM Table1
UNION
SELECT id, col1, col2 FROM Table2
....
UNION
SELECT id, col1, col2 FROM TableN
You can also use a CTE to solve the self-reference problem:
WITH cte AS (
    SELECT DISTINCT ON (id)
           id,
           col1,
           col2
    FROM table2
    WHERE NOT EXISTS(
        SELECT id, col1, col2
        FROM big_table
        WHERE (big_table.id = table2.id))
)
INSERT INTO big_table
SELECT *
FROM cte

Insert distinct values from one table into another table

So for each distinct value in a column of one table I want to insert that unique value into a row of another table.
list = select distinct(id) from table0
for distinct_id in list
insert into table1 (id) values (distinct_id)
end
Any ideas as to how to go about this?
Whenever you think about doing something in a loop, step back, and think again. SQL is optimized to work with sets. You can do this using a set-based query without the need to loop:
INSERT dbo.table1(id) SELECT DISTINCT id FROM dbo.table0;
There are some edge cases where looping can make more sense, but as SQL Server matures and more functionality is added, those edge cases get narrower and narrower...
insert into table1 (id)
select distinct id from table0
The following statement works for me:
insert into table1(col1, col2) select distinct on (col1) col1, col2 from table0
The query below will also check against the existing data in Table2:
INSERT INTO Table2(Id) SELECT DISTINCT Id FROM Table1 WHERE Id NOT IN(SELECT Id FROM Table2);
Another simple way to copy distinct data with multiple columns from one table to another:
INSERT INTO TBL2
SELECT * FROM (
    SELECT COL1, ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY COL1) AS COL2
    FROM TBL1
) T
WHERE T.COL2 = 1

Efficiently duplicate some rows in PostgreSQL table

I have PostgreSQL 9 database that uses auto-incrementing integers as primary keys. I want to duplicate some of the rows in a table (based on some filter criteria), while changing one or two values, i.e. copy all column values, except for the ID (which is auto-generated) and possibly another column.
However, I also want to get the mapping from old to new IDs. Is there a better way to do it then just querying for the rows to copy first and then inserting new rows one at a time?
Essentially I want to do something like this:
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table old
WHERE old.some_criteria = 'something'
RETURNING old.id, id;
However, this fails with ERROR: missing FROM-clause entry for table "old", and I can see why: Postgres must be doing the SELECT first and then inserting it, and the RETURNING clause only has access to the newly inserted rows.
RETURNING can only refer to the columns in the final, inserted row. You cannot refer to the "OLD" id this way unless there is a column in the table to hold both it and the new id.
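For instance, a rough sketch of that workaround, assuming you are willing to add a hypothetical old_id column to my_table:
-- Sketch only: old_id is an extra column added just to carry the source row's id.
ALTER TABLE my_table ADD COLUMN old_id integer;
INSERT INTO my_table (col1, col2, col3, old_id)
SELECT col1, 'new col2 value', col3, id
FROM my_table
WHERE some_criteria = 'something'
RETURNING old_id, id;   -- old_id = source row, id = newly generated id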
Try running this which should work and will show all the possible values that you can get via RETURNING:
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table AS old
WHERE old.some_criteria = 'something'
RETURNING *;
It won't get you the behavior you want, but should illustrate better how RETURNING is designed to work.
This can be done with the help of data-modifying CTEs (Postgres 9.1+):
WITH sel AS (
   SELECT id, col1, col3
        , row_number() OVER (ORDER BY id) AS rn  -- order any way you like
   FROM   my_table
   WHERE  some_criteria = 'something'
   ORDER  BY id  -- match order of row_number()
   )
, ins AS (
   INSERT INTO my_table (col1, col2, col3)
   SELECT col1, 'new col2 value', col3
   FROM   sel
   ORDER  BY id  -- redundant, to be sure
   RETURNING id
   )
SELECT s.id AS old_id, i.id AS new_id
FROM  (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM ins) i
JOIN   sel s USING (rn);
This relies on the undocumented implementation detail that rows from a SELECT are inserted in the order provided (and returned in the order provided). It works in all current versions of Postgres and is not going to break. Related:
Does Postgres preserve insertion order of records?
Window functions are not allowed in the RETURNING clause, so I apply row_number() in another subquery.
More explanation in this related later answer:
INSERT INTO ... FROM SELECT ... RETURNING id mappings
Good! I tested this code, but I changed
(FROM my_table AS old) to (FROM my_table), and
(WHERE old.some_criteria = 'something') to (WHERE some_criteria = 'something').
This is the final code that I use:
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table
WHERE some_criteria = 'something'
RETURNING *;
Thanks!
DROP TABLE IF EXISTS tmptable;
CREATE TEMPORARY TABLE tmptable as SELECT * FROM products WHERE id = 100;
UPDATE tmptable SET id = sbq.id from (select max(id)+1 as id from products) as sbq;
INSERT INTO products (SELECT * FROM tmptable);
DROP TABLE IF EXISTS tmptable;
Add another UPDATE before the INSERT to modify another field:
UPDATE tmptable SET another = 'data';
'old' is a reserved word, used by the rule rewrite system.
[ I presume this query fragment is not part of a rule; in that case you would have phrased the question differently ]

SQL delete records with same ID, leaving 1

Strange question, I know. I don't want to delete all the rows and start again, but we have a development database table where some of the rows have duplicate IDs, but different values.
I want to delete all records with duplicate IDs, so I can force data integrity on the table for the new version and build relationships. At the moment it's an ID that is inserted and generated by code (legacy).
From another question I got this:
delete t1
from tTable t1, tTable t2
where t1.locationName = t2.locationName
  and t1.id > t2.id
But this won't work as the IDs are the same!
How can I delete all but one record where IDs are the same? That is, delete where the count of records with the same ID > 1? If that's not possible, then deleting all records with duplicate IDs would be fine.
In SQL Server 2005 and above:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY locationName ORDER BY id) rn
FROM tTable
)
DELETE
FROM q
WHERE rn > 1
It depends on your DB server, but you can combine DELETE with LIMIT (MySQL) or TOP (SQL Server).
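For instance, a rough sketch of each, assuming a hypothetical duplicated id value of 42:
-- MySQL: delete a single arbitrary row with that id, leaving the other(s).
DELETE FROM tTable WHERE id = 42 LIMIT 1;
-- SQL Server: same idea with TOP.
DELETE TOP (1) FROM tTable WHERE id = 42;
You would repeat this per duplicated id until only one row per id remains.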
You could also move the first (non-duplicate) row of each group to a temp table, delete from the original table, and copy the temp table back into the original one.
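A rough sketch of that route in SQL Server, assuming the table is tTable with columns id and locationName (adjust the column list to your real table):
-- Keep one arbitrary row per id in a temp table, then swap the data back.
SELECT *
INTO #keep
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
    FROM tTable
) t
WHERE rn = 1;
DELETE FROM tTable;
INSERT INTO tTable (id, locationName)
SELECT id, locationName FROM #keep;
DROP TABLE #keep;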
I'm not sure about MySQL, but for a MS SQL Server database you could use the following:
SET IDENTITY_INSERT [tablename] ON
SELECT DISTINCT col1, col2, col3 INTO temp_[tablename] FROM [tablename]
ALTER TABLE temp_[tablename] ADD IDcol INT IDENTITY
TRUNCATE TABLE [tablename]
INSERT INTO [tablename](IDcol, col1, col2, col3) SELECT IDcol, col1, col2, col3 FROM temp_[tablename]
DROP TABLE temp_[tablename]
Hope this helps.