PostgreSQL how to delete duplicated values - sql

I have a table in my Postgres database where I forgot to add a unique index. Because that index is missing, I now have duplicated values. How do I remove the duplicates? I want to add a unique index on the fields translationset_id and key.

I think you are asking for this:
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
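To see the ROW_NUMBER() approach above end to end, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module, SQLite 3.25 or newer for window functions) as a stand-in for PostgreSQL; the table name translations, the value column, the sample rows, and the index name are all hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations ("
    "id INTEGER PRIMARY KEY, translationset_id INTEGER, key TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO translations (translationset_id, key, value) VALUES (?, ?, ?)",
    [
        (1, "hello", "Hallo"),
        (1, "hello", "Hallo"),     # duplicate of (1, 'hello')
        (1, "bye", "Tschuess"),
        (2, "hello", "Bonjour"),
        (2, "hello", "Salut"),     # duplicate of (2, 'hello')
    ],
)

# Number the rows inside each (translationset_id, key) group and delete
# every row except the first one (the lowest id) of each group.
conn.execute("""
    DELETE FROM translations
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY translationset_id, key ORDER BY id
                   ) AS rnum
            FROM translations
        ) t
        WHERE t.rnum > 1
    )
""")

# With the duplicates gone, the unique index can be created.
conn.execute(
    "CREATE UNIQUE INDEX translations_set_key_idx "
    "ON translations (translationset_id, key)"
)

remaining = conn.execute(
    "SELECT id, translationset_id, key FROM translations ORDER BY id"
).fetchall()
print(remaining)
```

Creating the index only after the delete matters: on a table that still contains duplicates, the CREATE UNIQUE INDEX statement would fail.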

It appears that you want to delete records that are duplicates with regard to the translationset_id and key columns. In this case, we can use Postgres' row number functionality to distinguish between duplicate rows and then delete all but the first row of each group. Postgres cannot delete from a CTE directly, so the rows are matched through the system column ctid:
WITH cte AS
(
SELECT ctid AS row_ctid, ROW_NUMBER() OVER (PARTITION BY translationset_id, key ORDER BY ctid) AS rnum
FROM yourTable
)
DELETE FROM yourTable
WHERE ctid IN (SELECT row_ctid FROM cte WHERE rnum > 1);

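I think the most efficient way is a self-join with DELETE ... USING. It removes every row that has a duplicate with a higher id, so the row with the highest id in each group is kept: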
DELETE FROM
table_name a
USING table_name b
WHERE
a.id < b.id and
a.same_column = b.same_column;

delete from mytable
where exists (select 1
from mytable t2
where t2.name = mytable.name and
t2.address = mytable.address and
t2.zip = mytable.zip and
t2.ctid > mytable.ctid
);
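When the table has no id column, the EXISTS pattern above relies on a physical row identifier. The following sketch illustrates the same idea under the assumption that SQLite's rowid plays the role of PostgreSQL's ctid; the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite's implicit rowid stands in for PostgreSQL's ctid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, address TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Ann", "1 Main St", "10001"),
        ("Ann", "1 Main St", "10001"),
        ("Bob", "2 Oak Ave", "20002"),
        ("Ann", "1 Main St", "10001"),
    ],
)

# Delete a row whenever an identical row with a larger rowid exists,
# which leaves only the last physical copy of each duplicate group.
deleted = conn.execute("""
    DELETE FROM mytable
    WHERE EXISTS (
        SELECT 1
        FROM mytable t2
        WHERE t2.name = mytable.name
          AND t2.address = mytable.address
          AND t2.zip = mytable.zip
          AND t2.rowid > mytable.rowid
    )
""").rowcount

rows = conn.execute("SELECT rowid, name FROM mytable ORDER BY rowid").fetchall()
print(deleted, rows)
```

Because the comparison uses =, rows whose compared columns contain NULL are never treated as duplicates of each other by this pattern.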

Related

Oracle: UPDATE with ORDER BY [duplicate]

I want to populate a table column with a running integer number, so I'm thinking of using ROWNUM. However, I need to populate it based on the order of other columns, something like ORDER BY column1, column2. That is, unfortunately, not possible since Oracle does not accept the following statement:
UPDATE table_a SET sequence_column = rownum ORDER BY column1, column2;
Nor the following statement (an attempt to use WITH clause):
WITH tmp AS (SELECT * FROM table_a ORDER BY column1, column2)
UPDATE tmp SET sequence_column = rownum;
So how do I do it using an SQL statement and without resorting to cursor iteration method in PL/SQL?
This should work (works for me)
update table_a outer
set sequence_column = (
select rnum from (
-- evaluate row_number() for all rows ordered by your columns
-- BEFORE updating those values into table_a
select id, row_number() over (order by column1, column2) rnum
from table_a) inner
-- join on the primary key to be sure you'll only get one value
-- for rnum
where inner.id = outer.id);
Or you can use the MERGE statement, something like this:
merge into table_a u
using (
select id, row_number() over (order by column1, column2) rnum
from table_a
) s
on (u.id = s.id)
when matched then update set u.sequence_column = s.rnum
UPDATE table_a
SET sequence_column = (select rn
from (
select rowid,
row_number() over (order by col1, col2)
from table_a
) x
where x.rowid = table_a.rowid)
But that won't be very fast and as Damien pointed out, you have to re-run this statement each time you change data in that table.
First, create a sequence:
CREATE SEQUENCE SEQ_SLNO
START WITH 1
MAXVALUE 999999999999999999999999999
MINVALUE 1
NOCYCLE
NOCACHE
NOORDER;
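After that, update the table using the sequence (note that this numbers the rows in whatever order the UPDATE happens to visit them, which is not guaranteed to follow column1, column2):
UPDATE table_name
SET column_name = SEQ_SLNO.NEXTVAL;
A small correction: just add AS RN: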
UPDATE table_a
SET sequence_column = (select rn
from (
select rowid,
row_number() over (order by col1, col2) AS RN
from table_a
) x
where x.rowid = table_a.rowid)
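The correlated-subquery UPDATE from the answers above can be tried out with a small sketch. It assumes SQLite (3.25 or newer) via Python's sqlite3 module as a stand-in for Oracle, joins on a hypothetical id primary key instead of ROWID, and uses made-up sample rows.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; id replaces ROWID as the join key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a ("
    "id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER, "
    "sequence_column INTEGER)"
)
conn.executemany(
    "INSERT INTO table_a (id, column1, column2) VALUES (?, ?, ?)",
    [(1, "b", 2), (2, "a", 9), (3, "b", 1), (4, "a", 3)],
)

# Compute ROW_NUMBER() over the desired ordering in a derived table,
# then copy each row's number back by matching on the primary key.
conn.execute("""
    UPDATE table_a
    SET sequence_column = (
        SELECT rnum
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (ORDER BY column1, column2) AS rnum
            FROM table_a
        ) numbered
        WHERE numbered.id = table_a.id
    )
""")

result = conn.execute(
    "SELECT id, sequence_column FROM table_a ORDER BY sequence_column"
).fetchall()
print(result)  # [(4, 1), (2, 2), (3, 3), (1, 4)]
```

The numbering reflects the data at the time of the UPDATE, so the statement has to be re-run whenever rows are added or the ordering columns change.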

How to get duplicate text values from SQL query

I need to get only the rows with duplicate text values using an SQL query. I have used HAVING COUNT(columnname) > 1, but instead of only the duplicate values I am getting all values.
Can anyone suggest whether I have to add anything to my query?
Thanks.
Use the query below. Put the column that contains the duplicates in the PARTITION BY clause. The window has no ORDER BY, so cnt is the total number of rows in each group rather than a running count.
WITH CTE_1
AS
(SELECT *, COUNT(1) OVER (PARTITION BY LTRIM(RTRIM(REPLACE(yourDuplicateColumn, ' ', '')))) cnt
FROM YourTable
)
SELECT *
FROM CTE_1
WHERE cnt>1
Assuming id is a primary key:
select *
from myTable t1
where exists (select 1
from myTable t2
where t2.text = t1.text and t2.id != t1.id)
You can use a query similar to the following:
SELECT
column1, COUNT(*)
FROM table
GROUP BY column1
HAVING COUNT(*) > 1
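The GROUP BY ... HAVING query and the EXISTS query above answer slightly different questions, which a small sketch can make visible. It assumes SQLite through Python's sqlite3 module; the table my_table, the column text_col, and the sample values are hypothetical.

```python
import sqlite3

# Assumption: generic SQL run on SQLite; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, text_col TEXT)")
conn.executemany(
    "INSERT INTO my_table (text_col) VALUES (?)",
    [("apple",), ("pear",), ("apple",), ("plum",), ("pear",), ("apple",)],
)

# GROUP BY / HAVING: one result row per duplicated value, with its count.
dup_values = conn.execute("""
    SELECT text_col, COUNT(*)
    FROM my_table
    GROUP BY text_col
    HAVING COUNT(*) > 1
    ORDER BY text_col
""").fetchall()

# EXISTS: every individual row whose value also occurs in another row.
dup_rows = conn.execute("""
    SELECT t1.id, t1.text_col
    FROM my_table t1
    WHERE EXISTS (
        SELECT 1
        FROM my_table t2
        WHERE t2.text_col = t1.text_col
          AND t2.id != t1.id
    )
    ORDER BY t1.id
""").fetchall()

print(dup_values)  # [('apple', 3), ('pear', 2)]
print(dup_rows)    # [(1, 'apple'), (2, 'pear'), (3, 'apple'), (5, 'pear'), (6, 'apple')]
```

The first form is convenient for a summary of which values repeat; the second returns the full rows, which is usually what is wanted when the duplicates have to be inspected or deleted.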

Oracle: Why I cannot rely on ROWNUM in a delete clause

I have a statement like this:
SELECT MIN(ROWNUM) FROM my_table
GROUP BY NAME
HAVING COUNT(NAME) > 1;
This statement gives me the rownum of the first duplicate, but when I transform it into a DELETE, it just deletes everything. Why does this happen?
This is because ROWNUM is a pseudocolumn: it is not stored with the row but assigned to rows as a query returns them, so a ROWNUM value from the subquery does not identify the same row in the DELETE. It is better to use ROWID to delete the records.
To remove the duplicates you can try like this:
DELETE FROM mytable a
WHERE EXISTS( SELECT 1 FROM mytable b
WHERE a.id = b.id
AND a.name = b.name
AND a.rowid > b.rowid )
Using rownum to delete duplicate records does not make much sense. If you need to delete duplicate rows, leaving only one row for each value of name, try the following:
DELETE FROM mytable
WHERE ROWID IN (SELECT ID
FROM (SELECT ROWID ID, ROW_NUMBER() OVER
(PARTITION BY name ORDER BY name) numRows FROM mytable
)
WHERE numRows > 1)
By adding further columns to the ORDER BY clause, you can choose to delete the record with the greatest or smallest ID, or order by some other field.

Delete duplicate rows not based on primary key

I have this table in my database:
tblAgencies
----------------------
AgencyID (PK)
VendorID
RegionID
Name
Zip
Long story short, I accidentally copied my entire table into itself - so every row in my table has a duplicate.
But with my AgencyID field being the identity, and automatically incrementing, I need to find duplicates based on all the other fields, since AgencyID is unique.
Does anyone know how I can do this?
This will keep the oldest AgencyID values, and delete any duplicates otherwise.
;WITH x AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY VendorID, RegionID, Name, Zip
ORDER BY AgencyID) FROM dbo.tblAgencies
)
DELETE x WHERE rn > 1;
Be careful, though; this may not work if other tables reference AgencyID and they've obtained any of your newer, erroneous values.
The simplest solution: use SELECT DISTINCT on the non-key columns into a temp table, then reload the original table.
This query will give you duplicates provided that the combination of all other columns is unique:
select * from mytable t1
where exists
(select * from mytable t2
where t1.VendorID = t2.VendorID
and t1.RegionID = t2.RegionID
and t1.Name = t2.Name
and t1.Zip = t2.Zip
and t1.AgencyID > t2.AgencyID)
This should give you all the rows that have duplicate values except for the minimum agencyid row.
select *
from tblAgencies
where AgencyID not in (select min(AgencyID)
from tblAgencies
group by VendorID, RegionID, Name, Zip)
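The NOT IN (SELECT MIN(AgencyID) ...) selection above can be turned into a DELETE. The sketch below tries that on a tiny copy of the table, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server (AUTOINCREMENT replaces the identity column); the agency rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; sample agencies are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAgencies ("
    "AgencyID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT)"
)
conn.executemany(
    "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES (?, ?, ?, ?)",
    [(10, 1, "North Agency", "11111"), (20, 2, "South Agency", "22222")],
)

# Reproduce the accident: copy the whole table into itself.
conn.execute(
    "INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) "
    "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies"
)
before = conn.execute("SELECT COUNT(*) FROM tblAgencies").fetchone()[0]

# Keep the lowest AgencyID of every group of identical rows, delete the rest.
conn.execute("""
    DELETE FROM tblAgencies
    WHERE AgencyID NOT IN (
        SELECT MIN(AgencyID)
        FROM tblAgencies
        GROUP BY VendorID, RegionID, Name, Zip
    )
""")

after = conn.execute(
    "SELECT AgencyID, Name FROM tblAgencies ORDER BY AgencyID"
).fetchall()
print(before, after)  # 4 [(1, 'North Agency'), (2, 'South Agency')]
```

Running the SELECT form first and checking its output before switching to DELETE is a cheap safeguard, especially if other tables reference AgencyID.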
;with CTE
AS
(
SELECT ID_Column, rn = ROW_NUMBER() OVER (PARTITION BY Column1, Column2, Column3... ORDER BY ID_Column ASC)
FROM T
)
DELETE FROM CTE
WHERE rn >= 2
;WITH CTE
AS
(
SELECT MAX(AgencyID) AgentID, VendorID, RegionID, Name, Zip
FROM tblAgencies
GROUP BY VendorID, RegionID, Name, Zip
HAVING COUNT(*) > 1
)
DELETE FROM tblAgencies
WHERE EXISTS (SELECT 1 FROM CTE
WHERE AgentID = tblAgencies.AgencyID)
Lots of answers here will give you what you want, but there's no need to use a CTE or do any grouping. The simplest way is just:
delete t1
from tblAgencies t1
join tblAgencies t2
on t1.VendorId = t2.VendorId
and t1.RegionId = t2.RegionId
and t1.Name = t2.Name
and t1.Zip = t2.Zip
and t1.AgencyId > t2.AgencyId
Maybe this will help: How to delete duplicates in the presence of a primary key?
