remove rows with some duplicate column value - sql

Suppose I have a table A with a column a, like the following:
a
--
x
y
m
x
n
y
I want to delete all rows that have a duplicate value in column a, keeping just one row per value.
After this operation, the column should look like the result of:
select distinct a from A;
I know how to select the rows with repeated values in column a, but I can't just replace SELECT with DELETE, because that would delete the unique values too.
Any help would be greatly appreciated.

In Oracle, you can do this by using the rowid pseudocolumn and a correlated subquery:
delete from a
where rowid > (select min(rowid)
from a a2
where a.a = a2.a
);
Alternatively, you can phrase this as a not in:
delete from a
where rowid not in (select min(rowid)
from a a2
group by a2.a
);
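For anyone who wants to try the "keep the smallest rowid" idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made for the demo (it happens to expose a rowid as well); the helper name dedupe_keep_min_rowid is hypothetical and not part of the answer above.

```python
import sqlite3

def dedupe_keep_min_rowid(values):
    """Load values into a one-column table, delete duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE A (a TEXT)")
    conn.executemany("INSERT INTO A (a) VALUES (?)", [(v,) for v in values])
    # Keep the row with the smallest rowid for each value; delete every other row.
    conn.execute(
        "DELETE FROM A WHERE rowid NOT IN (SELECT MIN(rowid) FROM A GROUP BY a)"
    )
    remaining = [row[0] for row in conn.execute("SELECT a FROM A ORDER BY rowid")]
    conn.close()
    return remaining

# Sample data taken from the question.
print(dedupe_keep_min_rowid(["x", "y", "m", "x", "n", "y"]))
```

The first occurrence of each value survives because it has the smallest rowid in its group.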

You can use a combination of a CTE and a ranking function:
;With cte As
(
Select ROW_NUMBER() OVER (PARTITION BY colA ORDER BY colA) as rNum
From yourTable
)
Delete From cte
Where rNum<>1

In SQL Server, you can use a CTE to delete the duplicated rows. See the query below.
WITH CTE AS(
SELECT a,
RN = ROW_NUMBER()OVER(PARTITION BY a ORDER BY a)
FROM A
)
DELETE FROM CTE WHERE RN > 1
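The ROW_NUMBER approach can be tried locally with the sketch below. It assumes SQLite (via Python's sqlite3, with window-function support, i.e. SQLite 3.25 or later), which cannot delete through a CTE, so the numbered rows are matched back to the table by rowid instead. The function name dedupe_with_row_number is hypothetical.

```python
import sqlite3

def dedupe_with_row_number(values):
    """Number rows within each group of equal values and delete all but row 1."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE A (a TEXT)")
    conn.executemany("INSERT INTO A (a) VALUES (?)", [(v,) for v in values])
    conn.execute(
        """
        DELETE FROM A
        WHERE rowid IN (
            SELECT rid
            FROM (
                -- rn restarts at 1 for every distinct value of a
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY a ORDER BY rowid) AS rn
                FROM A
            )
            WHERE rn > 1
        )
        """
    )
    remaining = [row[0] for row in conn.execute("SELECT a FROM A ORDER BY rowid")]
    conn.close()
    return remaining

print(dedupe_with_row_number(["x", "y", "m", "x", "n", "y"]))
```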

Related

Update column as Duplicate

I have a table with three columns: A, B, and status.
First, I filter the table to get only the duplicate values, using this query:
SELECT A
FROM Table_1
GROUP BY A
HAVING COUNT(A) >1
the output :
In the second step, I need to check whether column B has a duplicate value; if it does, I need to update the status to D.
I tried this query:
UPDATE Table_1
SET status = 'D'
WHERE exists
(SELECT B
FROM Table_1
GROUP BY B
HAVING COUNT(B) >1)
but it updates all the rows.
The following does what you need using row_number to identify any group with a duplicate and an updateable CTE to check for any row that's part of a group with a duplicate:
with d as (
select *, row_number() over(partition by a,b order by a,b) dn
from t
)
update d set d.status='D'
where exists (select * from d d2 where d2.a=d.a and d2.b=d.b and d2.dn>1)
You can do this with an updatable CTE, without any further joins, by using a windowed COUNT:
WITH d AS (
SELECT *,
cnt = COUNT(*) OVER (PARTITION BY a, b)
FROM t
)
UPDATE d
SET status = 'D'
WHERE cnt > 1;
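To see why the question's query touched every row while the windowed COUNT does not, here is a small sketch that runs both against the same data. It assumes SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later) and, since SQLite has no updatable CTEs, matches rows by rowid. The function name flag_duplicates and the sample rows are hypothetical.

```python
import sqlite3

def flag_duplicates(rows, use_window_count):
    """Return the status column after running one of the two UPDATE variants."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT, status TEXT)")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    if use_window_count:
        # Only rows whose (a, b) pair occurs more than once are selected.
        conn.execute(
            """
            UPDATE t SET status = 'D'
            WHERE rowid IN (
                SELECT rid
                FROM (SELECT rowid AS rid,
                             COUNT(*) OVER (PARTITION BY a, b) AS cnt
                      FROM t)
                WHERE cnt > 1
            )
            """
        )
    else:
        # The subquery never refers to the outer row, so EXISTS is either
        # true for every row or false for every row.
        conn.execute(
            """
            UPDATE t SET status = 'D'
            WHERE EXISTS (SELECT b FROM t GROUP BY b HAVING COUNT(b) > 1)
            """
        )
    statuses = [row[0] for row in conn.execute("SELECT status FROM t ORDER BY rowid")]
    conn.close()
    return statuses

sample = [(1, "p"), (1, "p"), (1, "q"), (2, "r")]
print(flag_duplicates(sample, use_window_count=False))
print(flag_duplicates(sample, use_window_count=True))
```

With the uncorrelated EXISTS, one duplicated value of b anywhere in the table is enough to flag everything; the windowed version evaluates the count per row.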

select N-1 records for update

I have a query where I want to update n-1 records from result set. Can this be done without loops?
If my query is like this:
with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
Now I want to update the rows in another table with the resulting id's but only any n-1 rows for each id value from the above query. Something like this:
update top( count-1 or n-1) from data2
inner join cte on data2.id = cte.id
set somecolumn = 'some value'
where id in (select id from cte)
The id column is not unique. There are multiple rows with the same id value in table data2.
This query will do what you want. It uses two CTEs; the first generates the list of eligible id values to update, and the second generates row numbers for id values in data2 which match those in the first CTE. The second CTE is then updated if the row number is greater than 1 (so only n-1 rows get updated):
with cte(id, count) as (
select id, count(*) as count
from data
where id in (2, 3, 4, 6, 7)
group by id
having count(*) >1
),
cte2 as (
select d.id, d.somecolumn,
row_number() over (partition by d.id order by newid()) as rn
from data2 d
join cte on cte.id = d.id
)
update cte2
set somecolumn = 'some value'
where rn > 1
Note I've chosen to order row numbers randomly, you might have some other scheme for deciding which n-1 values you want to update (e.g. ordered by id, or ...).
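The "update all but one row per id" idea can be checked with the sketch below. It assumes SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later) rather than the asker's database, orders by rowid instead of randomly so the result is repeatable, and uses made-up sample ids. The function name update_all_but_one is hypothetical.

```python
import sqlite3

def update_all_but_one(data_ids, data2_ids):
    """Update n-1 of the n rows in data2 for each id that is duplicated in data."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE data (id INTEGER)")
    conn.execute("CREATE TABLE data2 (id INTEGER, somecolumn TEXT)")
    conn.executemany("INSERT INTO data (id) VALUES (?)", [(i,) for i in data_ids])
    conn.executemany("INSERT INTO data2 (id) VALUES (?)", [(i,) for i in data2_ids])
    conn.execute(
        """
        UPDATE data2
        SET somecolumn = 'some value'
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT d.rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY d.rowid) AS rn
                FROM data2 d
                -- eligible ids: listed in the IN clause and duplicated in data
                WHERE d.id IN (SELECT id FROM data
                               WHERE id IN (2, 3, 4, 6, 7)
                               GROUP BY id
                               HAVING COUNT(*) > 1)
            )
            WHERE rn > 1  -- row 1 of each id is left untouched
        )
        """
    )
    result = conn.execute("SELECT id, somecolumn FROM data2 ORDER BY rowid").fetchall()
    conn.close()
    return result

for row in update_all_but_one([2, 2, 3, 4, 4, 4], [2, 2, 2, 3, 3, 4, 4, 5]):
    print(row)
```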
Is this what you're looking for? The CTE identifies ALL of the source rows, but the WHERE clause in the UPDATE statement limits the updates to n-1.
WITH cte AS
(
SELECT
id,
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS RowNum
FROM data
)
UPDATE t
SET t.<whatever> = <whateverElse>
FROM
otherTable AS t
JOIN
cte AS c
ON t.id = c.id
WHERE
c.RowNum > 1;
I believe this would work just fine
;with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
update data
set somecolumn = 'some value'
from data join cte on cte.id = data.id
;

Oracle: Why I cannot rely on ROWNUM in a delete clause

I have a statement like this:
SELECT MIN(ROWNUM) FROM my_table
GROUP BY NAME
HAVING COUNT(NAME) > 1;
This statement gives me the rownum of the first duplicate, but when I transform it into a DELETE, it just deletes everything. Why does that happen?
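This is because ROWNUM is a pseudocolumn: its values are not stored with the rows but are assigned to rows as each query returns them, so the same row can get a different ROWNUM in a different query. It is better to use rowid to delete the records.
To remove the duplicates, you can try this: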
DELETE FROM mytable a
WHERE EXISTS( SELECT 1 FROM mytable b
WHERE a.id = b.id
AND a.name = b.name
AND a.rowid > b.rowid )
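The difference between a number assigned while a query runs and an identifier stored with the row can be illustrated with a rough analogy. The sketch below does not use Oracle's ROWNUM; it assumes SQLite through Python's sqlite3 and imitates a per-query counter with enumerate. The helper name numbered and the sample names are hypothetical.

```python
import sqlite3

def numbered(conn, sql):
    """Number rows as they come back from the query, starting at 1 each time."""
    return [(n, row[0]) for n, row in enumerate(conn.execute(sql), start=1)]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute("CREATE TABLE my_table (name TEXT)")
conn.executemany("INSERT INTO my_table (name) VALUES (?)",
                 [("ann",), ("bob",), ("bob",)])

# The counter depends on the query: the first 'bob' row is number 2 here...
all_rows = numbered(conn, "SELECT name FROM my_table ORDER BY rowid")
# ...but number 1 once the query filters out 'ann'.
only_bob = numbered(conn, "SELECT name FROM my_table WHERE name = 'bob' ORDER BY rowid")
# The stored rowid of that row is the same no matter how it is queried.
stored = conn.execute(
    "SELECT rowid, name FROM my_table WHERE name = 'bob' ORDER BY rowid"
).fetchall()
conn.close()

print(all_rows)
print(only_bob)
print(stored)
```

A number that changes from query to query cannot reliably point a DELETE at a specific row, which is why the answers here match on rowid.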
Using rownum to delete duplicate records does not make much sense. If you need to delete duplicate rows, leaving only one row for each value of name, try the following:
DELETE FROM mytable
WHERE ROWID IN (SELECT ID
FROM (SELECT ROWID ID, ROW_NUMBER() OVER
(PARTITION BY name ORDER BY name) numRows FROM mytable
)
WHERE numRows > 1)
By adding further columns to the ORDER BY clause, you can choose whether to delete the record with the greatest or smallest ID, or order by some other field.
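As a sketch of how the ORDER BY inside ROW_NUMBER decides which row survives, the example below keeps either the smallest or the greatest id per name. It assumes SQLite via Python's sqlite3 (SQLite 3.25 or later for window functions) instead of Oracle, and an id column that the sample rows make up for the demo. The function name dedupe_by_name is hypothetical.

```python
import sqlite3

def dedupe_by_name(rows, keep_greatest_id=False):
    """Delete duplicate names, keeping the smallest or the greatest id per name."""
    # The direction comes from a fixed pair of literals, never from user input.
    direction = "DESC" if keep_greatest_id else "ASC"
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable (id, name) VALUES (?, ?)", rows)
    conn.execute(
        f"""
        DELETE FROM mytable
        WHERE rowid IN (
            SELECT rid
            FROM (
                -- the row ordered first within each name gets numRows = 1 and is kept
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY name
                                          ORDER BY id {direction}) AS numRows
                FROM mytable
            )
            WHERE numRows > 1
        )
        """
    )
    remaining = conn.execute("SELECT id, name FROM mytable ORDER BY id").fetchall()
    conn.close()
    return remaining

sample = [(1, "a"), (2, "a"), (3, "b"), (4, "a")]
print(dedupe_by_name(sample))
print(dedupe_by_name(sample, keep_greatest_id=True))
```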

How to select all columns for rows where I check if just 1 or 2 columns contain duplicate values

I'm having difficulty with what I figure should be an easy problem. I want to select all the columns in a table for which one particular column has duplicate values.
I've been trying to use aggregate functions, but that's constraining me as I want to just match on one column and display all values. Using aggregates seems to require that I 'group by' all columns I'm going to want to display.
If I understood you correctly, this should do:
SELECT *
FROM YourTable A
WHERE EXISTS(SELECT 1
FROM YourTable
WHERE Col1 = A.Col1
GROUP BY Col1
HAVING COUNT(*) > 1)
You can join on a derived table where you aggregate and determine "col" values which are duplicated:
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT col
FROM Table1
GROUP BY col
HAVING COUNT(1) > 1
) b ON a.col = b.col
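The derived-table join can be tried out with the sketch below, which returns every column of the rows whose col value occurs more than once. It assumes SQLite through Python's sqlite3; the second column, the sample rows and the function name rows_with_duplicate_col are made up for the demo.

```python
import sqlite3

def rows_with_duplicate_col(rows):
    """Return all columns of the rows whose col value is duplicated."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE Table1 (col TEXT, other INTEGER)")
    conn.executemany("INSERT INTO Table1 (col, other) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.col, a.other
        FROM Table1 a
        INNER JOIN (
            -- one row per col value that appears more than once
            SELECT col
            FROM Table1
            GROUP BY col
            HAVING COUNT(*) > 1
        ) b ON a.col = b.col
        ORDER BY a.rowid
        """
    ).fetchall()
    conn.close()
    return result

print(rows_with_duplicate_col([("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]))
```

Because the aggregation happens only inside the derived table, the outer query is free to return any columns without listing them in a GROUP BY.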
This query lets you ORDER BY cola in ascending or descending order, which changes which row in each colb group gets rn = 1 and is therefore left out of the result.
Here's a Demo on SqlFiddle.
with cl
as
(
select *, ROW_NUMBER() OVER(partition by colb order by cola ) as rn
from tbl)
select *
from cl
where rn > 1

how to find the most appears one in a table using sql?

I have a table A with two columns named B and C, as follows:
('W1','F2')
('W1','F7')
('W2','F1')
('W2','F6')
('W2','F8')
('W4','F7')
('W6','F2')
('W6','F15')
('W7','F1')
('W7','F4')
('W7','F17')
('W8','F13')
How can I find which value in the B column appears the most times, using SQL in Oracle? (In this case, it's W2 and W7.) Thank you!
Use a subquery to calculate the number of items in columnC for each value in columnB and rank() the results of the subquery based on that count. Then, in your main select, return just the values of columnB where the rank of the rows returned by the subquery is 1:
SELECT ColB
FROM (
SELECT ColB,
Count(ColC),
rank() over (ORDER BY Count(ColC) DESC) AS rnk
FROM yourTable
GROUP BY ColB)
WHERE rnk = 1
Here's a sql fiddle: http://sqlfiddle.com/#!4/fa6bd/2
/*
C2 REFERS TO THE COLUMN B
T1 Refers to an alias
*/
WITH T1 AS
(
SELECT C2,COUNT(*) AS COUNT
FROM YOURTABLE
GROUP BY C2
)
SELECT C2,COUNT FROM T1 WHERE COUNT=(SELECT MAX(COUNT) FROM T1 )
;
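The "count per value, then keep the rows whose count equals the maximum" approach can be checked against the question's sample data with the sketch below. It assumes SQLite through Python's sqlite3 rather than Oracle, and names the count column cnt; the function name most_frequent_b is hypothetical.

```python
import sqlite3

def most_frequent_b(pairs):
    """Return every B value (with its count) that ties for the highest count."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
    conn.execute("CREATE TABLE A (B TEXT, C TEXT)")
    conn.executemany("INSERT INTO A (B, C) VALUES (?, ?)", pairs)
    result = conn.execute(
        """
        WITH T1 AS (
            SELECT B, COUNT(*) AS cnt
            FROM A
            GROUP BY B
        )
        SELECT B, cnt
        FROM T1
        WHERE cnt = (SELECT MAX(cnt) FROM T1)  -- ties are all returned
        ORDER BY B
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows taken from the question.
sample = [
    ("W1", "F2"), ("W1", "F7"), ("W2", "F1"), ("W2", "F6"), ("W2", "F8"),
    ("W4", "F7"), ("W6", "F2"), ("W6", "F15"), ("W7", "F1"), ("W7", "F4"),
    ("W7", "F17"), ("W8", "F13"),
]
print(most_frequent_b(sample))
```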
Select ColB, Count(*)
FROM yourTable
GROUP BY ColB
ORDER BY count(*) desc