Update column as Duplicate - sql

I have a table with three columns: A, B, and status.
First, I filter the table to get only the duplicate values of A, using this query:
SELECT A
FROM Table_1
GROUP BY A
HAVING COUNT(A) >1
In the second step, I need to check whether column B also has a duplicate value; if it does, I need to update status to D.
I tried this query:
UPDATE Table_1
SET status = 'D'
WHERE exists
(SELECT B
FROM Table_1
GROUP BY B
HAVING COUNT(B) >1)
but it updated all the rows.

The following does what you need using row_number to identify any group with a duplicate and an updateable CTE to check for any row that's part of a group with a duplicate:
with d as (
select *, row_number() over(partition by a,b order by a,b) dn
from t
)
update d set d.status='D'
where exists (select * from d d2 where d2.a=d.a and d2.b=d.b and d2.dn>1)

You can do this with an updatable CTE, without any further joins, by using a windowed COUNT:
WITH d AS (
SELECT *,
cnt = COUNT(*) OVER (PARTITION BY a, b)
FROM t
)
UPDATE d
SET status = 'D'
WHERE cnt > 1;
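
On engines that do not support updating through a CTE, the same rule can be written with a correlated subquery. The original attempt updated every row because its EXISTS subquery was not correlated with the row being updated, so it was true for all rows as soon as any duplicate existed. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name t follows the answers above, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, status TEXT)")
# Illustrative data: the pair (1, 10) occurs twice, the other pairs once.
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 20), (2, 30)],
)

# The subquery is correlated on (a, b), so it is evaluated per row:
# only rows whose (a, b) pair occurs more than once are marked.
conn.execute("""
    UPDATE t
    SET status = 'D'
    WHERE (SELECT COUNT(*)
           FROM t AS t2
           WHERE t2.a = t.a AND t2.b = t.b) > 1
""")

rows = conn.execute("SELECT a, b, status FROM t ORDER BY a, b").fetchall()
print(rows)
```

Only the two (1, 10) rows end up with status 'D'; the remaining rows keep a NULL status.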

Related

select N-1 records for update

I have a query where I want to update n-1 records from result set. Can this be done without loops?
If my query is like this:
with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
Now I want to update the rows in another table with the resulting ids, but only n-1 rows (any of them) for each id value from the above query. Something like this:
update top( count-1 or n-1) from data2
inner join cte on data2.id = cte.id
set somecolumn = 'some value'
where id in (select id from cte)
The id column is not unique. There are multiple rows with the same id value in table data2.
This query will do what you want. It uses two CTEs; the first generates the list of eligible id values to update, and the second generates row numbers for id values in data2 which match those in the first CTE. The second CTE is then updated if the row number is greater than 1 (so only n-1 rows get updated):
with cte(id, count) as (
select id, count(*) as count
from data
where id in (2, 3, 4, 6, 7)
group by id
having count(*) >1
),
cte2 as (
select d.id, d.somecolumn,
-- newid() is evaluated per row; in SQL Server rand() returns a single value for the whole query
row_number() over (partition by d.id order by newid()) as rn
from data2 d
join cte on cte.id = d.id
)
update cte2
set somecolumn = 'some value'
where rn > 1
Note I've chosen to order row numbers randomly, you might have some other scheme for deciding which n-1 values you want to update (e.g. ordered by id, or ...).
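
For engines where a CTE cannot be the target of an UPDATE, the same idea can be sketched by numbering the rows in a CTE and updating the base table by key. This is only an illustration using Python's sqlite3 module (window functions need SQLite 3.25 or later). The pk column is a hypothetical row identifier added for the example, since id is not unique; the sample data is made up, and rows are ordered by pk rather than randomly so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data  (id INTEGER);
    -- pk is a hypothetical surrogate key, not part of the original question
    CREATE TABLE data2 (pk INTEGER PRIMARY KEY, id INTEGER, somecolumn TEXT);
    INSERT INTO data  (id) VALUES (2), (2), (3), (4), (4), (4);
    INSERT INTO data2 (id) VALUES (2), (2), (2), (3), (3), (4), (4), (5);
""")

conn.execute("""
    WITH cte AS (            -- ids that occur more than once in data
        SELECT id
        FROM data
        WHERE id IN (2, 3, 4, 6, 7)
        GROUP BY id
        HAVING COUNT(*) > 1
    ),
    cte2 AS (                -- number the matching data2 rows per id
        SELECT d.pk,
               ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY d.pk) AS rn
        FROM data2 AS d
        JOIN cte ON cte.id = d.id
    )
    UPDATE data2
    SET somecolumn = 'some value'
    WHERE pk IN (SELECT pk FROM cte2 WHERE rn > 1)   -- skip one row per id
""")

# COUNT(somecolumn) counts only the rows that were updated (non-NULL).
updated_per_id = conn.execute(
    "SELECT id, COUNT(somecolumn) FROM data2 GROUP BY id ORDER BY id"
).fetchall()
print(updated_per_id)
```

With this data, ids 2 and 4 qualify: two of the three rows for id 2 and one of the two rows for id 4 are updated, while ids 3 and 5 are left alone.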
Is this what you're looking for? The CTE identifies ALL of the source rows, but the WHERE clause in the UPDATE statement limits the updates to n-1.
WITH cte AS
(
SELECT
id,
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS RowNum
FROM data
)
UPDATE t
SET t.<whatever> = <whateverElse>
FROM
otherTable AS t
JOIN
cte AS c
ON t.id = c.id
WHERE
c.RowNum > 1;
I believe this would work just fine
;with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
update data
set somecolumn = 'some value'
from data join cte on cte.id = data.id
;

Keep Track of already summed tuples sql

If we have a table with values for a and b, is there a way to add up only the b's whose a is not a duplicate? For example:
a b
1 2
2 3
2 3
so we would get only 5 (instead of 8).
Something like:
select sum(b if unique a),
from table
where ...
The following query selects the lowest value of b for each group of a:
select min(b) min_b
from mytable
group by a
You can then sum those values by selecting the sum from a derived table:
select sum(min_b) from (
select min(b) min_b
from mytable
group by a
) t
http://sqlfiddle.com/#!9/d82c5/1
You haven't specified your RDBMS, but if you are using a database that supports window functions, like SQL Server, you can select the unique rows first using a WITH clause and the ROW_NUMBER() function, and then take the SUM of those.
;WITH C AS(
SELECT a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY a) AS Rn
FROM Table1
)
SELECT SUM(b) FROM C
WHERE Rn = 1
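
To check the arithmetic from the question (5 rather than 8), here is a small sketch of the derived-table approach run through Python's sqlite3 module. The table name mytable follows the first answer; the rows are the three from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 2), (2, 3), (2, 3)])

# Plain SUM counts the duplicated a = 2 row twice.
naive_total = conn.execute("SELECT SUM(b) FROM mytable").fetchone()[0]

# One b per distinct a (the minimum), then sum those.
dedup_total = conn.execute("""
    SELECT SUM(min_b)
    FROM (SELECT MIN(b) AS min_b FROM mytable GROUP BY a) AS t
""").fetchone()[0]

print(naive_total, dedup_total)
```

Note that taking MIN(b) is a choice: if rows sharing the same a could carry different b values, you would need to decide which one should count.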

remove rows with some duplicate column value

Suppose I have a table with a column a like the following:
a
--
x
y
m
x
n
y
I want to delete all rows that have a duplicate value in column a, keeping just one row per value.
After this operation, the column would look like the result of:
select distinct a from A;
I know how to select rows with repeated values in column a, but I can't just replace SELECT with DELETE because that would delete the unique values too.
Any help would be greatly appreciated.
In Oracle, you can do this by using the hidden column rowid and a correlated subquery:
delete from a
where rowid > (select min(rowid)
from a a2
where a.a = a2.a
);
Alternatively, you can phrase this as a not in:
delete from a
where rowid not in (select min(rowid)
from a a2
group by a2.a
);
You can use a combination of a CTE and a ranking function:
;With cte As
(
Select ROW_NUMBER() OVER (PARTITION BY colA ORDER BY colA) as rNum
From yourTable
)
Delete From cte
Where rNum<>1
In SQL Server, you can use a CTE to delete the duplicate rows. See the query below.
WITH CTE AS(
SELECT a,
RN = ROW_NUMBER()OVER(PARTITION BY a ORDER BY a)
FROM A
)
DELETE FROM CTE WHERE RN > 1
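
The "keep the row with the smallest rowid" idea from the Oracle answer can also be tried out in SQLite, which exposes its own implicit rowid on ordinary tables. A minimal sketch with Python's sqlite3 module, using the table A and column a from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?)",
    [("x",), ("y",), ("m",), ("x",), ("n",), ("y",)],
)

# Keep the first-inserted row of each value; delete every other copy.
conn.execute("""
    DELETE FROM A
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM A GROUP BY a)
""")

remaining = [row[0] for row in conn.execute("SELECT a FROM A ORDER BY a")]
print(remaining)
```

After the delete, each of m, n, x and y appears exactly once.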

How to get Original Rows filtered by a HAVING Condition?

What is the method in T-SQL to select the original values limited by a HAVING condition? For example, if I have
A|B
10|1
11|2
10|3
How would I get all the values of B (not an average or some other summary statistic), grouped by A, for groups where the count (occurrences of A) is greater than or equal to 2?
Actually, you have several options to choose from:
1. You could make a subquery out of your original HAVING statement and join it back to your table:
SELECT *
FROM YourTable yt
INNER JOIN (
SELECT A
FROM YourTable
GROUP BY
A
HAVING COUNT(*) >= 2
) cnt ON cnt.A = yt.A
2. Another equivalent solution would be to use a WITH clause:
;WITH cnt AS (
SELECT A
FROM YourTable
GROUP BY
A
HAVING COUNT(*) >= 2
)
SELECT *
FROM YourTable yt
INNER JOIN cnt ON cnt.A = yt.A
3. Or you could use an IN clause:
SELECT *
FROM YourTable yt
WHERE A IN (SELECT A FROM YourTable GROUP BY A HAVING COUNT(*) >= 2)
A self join will work:
select B
from table
join(
select A
from table
group by 1
having count(1)>1
)s
using(A);
You can use a window function (no joins, only one table scan):
select * from (
select *, cnt=count(*) over(partition by A) from table
) as a
where cnt >= 2
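
As a quick check of the IN variant against the sample data from the question, here is a small sketch using Python's sqlite3 module. The table name YourTable follows the first answer; only the rows with A = 10 should survive, since 10 is the only value of A that occurs at least twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(10, 1), (11, 2), (10, 3)])

# The subquery finds the qualifying groups; the outer query returns
# the original, unaggregated rows belonging to those groups.
rows = conn.execute("""
    SELECT A, B
    FROM YourTable
    WHERE A IN (SELECT A FROM YourTable GROUP BY A HAVING COUNT(*) >= 2)
    ORDER BY B
""").fetchall()
print(rows)
```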

In SQL in a "group by" expression: how to get the string that occurs most often in a group?

Assume we have the following table:
Id A B
1 10 ABC
2 10 ABC
3 10 FFF
4 20 HHH
As the result of a "group by A" expression, I want the value of column B that occurs most often:
select A, mostoften(B) from table group by A;
A mostoften(B)
10 ABC
20 HHH
How do I achieve this in Oracle 10g?
Remark: in the case of a tie (when more than one value occurs most often), it does not matter which value is selected.
select A, B
from (
select A, B, ROW_NUMBER() OVER (PARTITION BY A ORDER BY C_B DESC) as rn
from (
select A, COUNT (B) as C_B, B
from table
group by A, B
) count_table
) order_table
where rn = 1;
You want the Bs with the MAX of COUNT group by A, B.
Old school solution, it took me some time and some cursing :)
select a,b
from ta ta1
group by a,b
having count(*) = (select max(count(*))
from ta ta2
where ta1.a = ta2.a
group by b)
This problem can be clarified by creating a view for the count in each A & B group:
CREATE VIEW MyTableCounts AS
SELECT A, B, COUNT(*) C
FROM MyTable
GROUP BY A, B;
Now we can do a query that finds the row c1 where the count is greatest. That is, no other row that has the same A has a greater count. Therefore if we try to find a row c2 with a greater count, no match is found.
SELECT c1.A, c1.B
FROM MyTableCounts c1
LEFT OUTER JOIN MyTableCounts c2
ON (c1.A = c2.A AND (c1.C < c2.C OR (c1.C = c2.C AND c1.B < c2.B)))
WHERE c2.A IS NULL
ORDER BY c1.A;
To resolve tied counts (c1.C = c2.C), we use the value of B which we know is unique within a given group of A.
Try this (works on SQL Server 2005):
create table #yourtable (rowid int, a int, b char(3))
insert into #yourtable values (1,10,'ABC')
insert into #yourtable values (2,10,'ABC')
insert into #yourtable values (3,10,'FFF')
insert into #yourtable values (4,20,'HHH')
;WITH YourTableCTE AS
(
SELECT
*, ROW_NUMBER() OVER(partition by A ORDER BY A ASC,CountOfB DESC) AS RowRank
FROM (SELECT
A, B, COUNT(B) AS CountOfB
FROM #yourtable
GROUP BY A,B
) dt
)
SELECT
A,B
FROM YourTableCTE
WHERE RowRank=1
EDIT without CTE...
SELECT
A,B
FROM (SELECT
*, ROW_NUMBER() OVER(partition by A ORDER BY A ASC,CountOfB DESC) AS RowRank
FROM (SELECT
A, B, COUNT(B) AS CountOfB
FROM #yourtable
GROUP BY A,B
) dt
) dt2
WHERE RowRank=1
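
The count-then-rank pattern used in several of the answers above can be exercised outside Oracle or SQL Server as well. The following is a sketch only, using Python's sqlite3 module (window functions need SQLite 3.25 or later) with the sample rows from the question. The table name MyTable is borrowed from the view-based answer, and ties are broken by B purely to make the output deterministic, which the question says is not required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER, A INTEGER, B TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 10, "ABC"), (2, 10, "ABC"), (3, 10, "FFF"), (4, 20, "HHH")],
)

most_often = conn.execute("""
    SELECT A, B
    FROM (
        -- rank each (A, B) pair within its A group, highest count first
        SELECT A, B,
               ROW_NUMBER() OVER (PARTITION BY A ORDER BY cnt DESC, B) AS rn
        FROM (SELECT A, B, COUNT(*) AS cnt
              FROM MyTable
              GROUP BY A, B) AS counts
    ) AS ranked
    WHERE rn = 1
    ORDER BY A
""").fetchall()
print(most_often)
```

For the sample data this yields ABC for A = 10 and HHH for A = 20, matching the expected result in the question.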