Remove and Identify Duplicate Records in SQL

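I have the sample records below, and I need to use a CASE WHEN expression to identify and remove duplicate records in SQL.
| Quantity | Values | Desc  | event | ID  |
|----------|--------|-------|-------|-----|
| 1        | 5      | Blue  | 12550 | 577 |
| 1        | 5      | bluee | 12550 | 525 |
| 2        | 10     | blu   | 12550 | 535 |
I would like to use a CASE expression to show duplicate indicators such as:
| Dup_Quantity | Dup_Value | Dup_Desc | Quantity | Values | Desc  | event | ID  |
|--------------|-----------|----------|----------|--------|-------|-------|-----|
| Y            | Y         | N        | 1        | 5      | Blue  | 12550 | 577 |
| Y            | Y         | N        | 1        | 5      | bluee | 12550 | 525 |
However, after running the script below, the result still shows as:
| Dup_Quantity | Dup_Value | Dup_Desc | Quantity | Values | Desc  | event | ID  |
|--------------|-----------|----------|----------|--------|-------|-------|-----|
| Y            | Y         | N        | 1        | 5      | Blue  | 12550 | 577 |
| Y            | Y         | N        | 1        | 5      | bluee | 12550 | 525 |
| Y            | N         | N        | 2        | 10     | blu   | 12550 | 535 |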
SELECT DISTINCT
CASE WHEN a.Quantity = b.Quantity THEN 'Y' ELSE 'N' END AS "Dup_Quantity",
CASE WHEN a.Values = b.Values THEN 'Y' ELSE 'N' END AS "Dup_Value",
CASE WHEN a.Desc = b.Desc THEN 'Y' ELSE 'N' END AS "Dup_Desc"
FROM Table1 a
INNER JOIN Table1 b ON a.event = b.event
WHERE (a.Quantity = b.Quantity OR a.Values = b.Values OR a.Desc = b.Desc)
AND a.ID <> b.ID
Basically, the record with ID 535 still shows up in the result. Could someone please give me some pointers?

SQL Fiddle
MS SQL Server 2012 Schema Setup:
CREATE TABLE Table1
([Quantity] int, [Values] int, [Desc] varchar(5), [event] int, [ID] int)
;
INSERT INTO Table1
([Quantity], [Values], [Desc], [event], [ID])
VALUES
(1, 5, 'Blue', 12550, 577),
(1, 5, 'bluee', 12550, 525),
(2, 10, 'blu', 12550, 535)
;
Query 1:
SELECT
CASE WHEN (SELECT COUNT(*)
FROM Table1 t2
WHERE t1.Quantity = t2.Quantity AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
THEN 'Y' ELSE 'N' END AS Dup_Quantity,
CASE WHEN (SELECT COUNT(*)
FROM Table1 t2
WHERE t1."Values" = t2."Values" AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
THEN 'Y' ELSE 'N' END AS Dup_Value,
CASE WHEN (SELECT COUNT(*)
FROM Table1 t2
WHERE t1."Desc" = t2."Desc" AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
THEN 'Y' ELSE 'N' END AS Dup_Desc,
*
FROM Table1 t1
WHERE
(SELECT COUNT(*)
FROM Table1 t2
WHERE t1.Quantity = t2.Quantity AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
OR
(SELECT COUNT(*)
FROM Table1 t2
WHERE t1."Values" = t2."Values" AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
OR
(SELECT COUNT(*)
FROM Table1 t2
WHERE t1."Desc" = t2."Desc" AND
t1.ID <> t2.ID AND t1.event = t2.event) > 0
Results:
| DUP_QUANTITY | DUP_VALUE | DUP_DESC | QUANTITY | VALUES | DESC | EVENT | ID |
|--------------|-----------|----------|----------|--------|-------|-------|-----|
| Y | Y | N | 1 | 5 | Blue | 12550 | 577 |
| Y | Y | N | 1 | 5 | bluee | 12550 | 525 |

Your query returns:
Dup_Quantity Dup_Value Dup_Desc
Y Y N
However, I am not sure what you are trying to do here. A corrected version of your query is:
SELECT
CASE WHEN a."Quantity" = b."Quantity" THEN 'Y' ELSE 'N' END AS "Dup_Quantity",
CASE WHEN a."Values" = b."Values" THEN 'Y' ELSE 'N' END AS "Dup_Value",
CASE WHEN a."Desc" = b."Desc" THEN 'Y' ELSE 'N' END AS "Dup_Desc",
a.*
FROM Table1 a
INNER JOIN Table1 b ON b.event = a.event
WHERE (a."Quantity" = b."Quantity" OR a."Values" = b."Values" OR a."Desc" = b."Desc")
AND a.ID <> b.ID
If you want to get duplicated rows in terms of Quantity, Values and Desc:
SELECT
a.*
FROM Table1 a
INNER JOIN Table1 b ON b.event = a.event
WHERE (a."Quantity" = b."Quantity" AND a."Values" = b."Values" AND a."Desc" = b."Desc")
AND a.ID <> b.ID
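The per-column flags can also be computed with window functions instead of repeated correlated subqueries. The following is only a sketch under stated assumptions: it loads the asker's three sample rows into an in-memory SQLite database through Python's built-in sqlite3 module (SQLite 3.25 or later, standing in for SQL Server), and it assumes ID is unique per row, so a partition count above 1 means another row shares that value within the same event.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 ("Quantity" INT, "Values" INT, "Desc" TEXT, "event" INT, "ID" INT);
INSERT INTO Table1 VALUES
  (1, 5,  'Blue',  12550, 577),
  (1, 5,  'bluee', 12550, 525),
  (2, 10, 'blu',   12550, 535);
""")

# Flag each column as duplicated when more than one row in the same event
# shares its value, then keep only rows with at least one 'Y' flag.
dup_flags = conn.execute("""
SELECT *
FROM (
  SELECT
    CASE WHEN COUNT(*) OVER (PARTITION BY "event", "Quantity") > 1
         THEN 'Y' ELSE 'N' END AS Dup_Quantity,
    CASE WHEN COUNT(*) OVER (PARTITION BY "event", "Values") > 1
         THEN 'Y' ELSE 'N' END AS Dup_Value,
    CASE WHEN COUNT(*) OVER (PARTITION BY "event", "Desc") > 1
         THEN 'Y' ELSE 'N' END AS Dup_Desc,
    "ID"
  FROM Table1
) AS flagged
WHERE Dup_Quantity = 'Y' OR Dup_Value = 'Y' OR Dup_Desc = 'Y'
ORDER BY "ID"
""").fetchall()

for row in dup_flags:
    print(row)
```

With the sample data, only IDs 525 and 577 come back, each flagged Y, Y, N; ID 535 shares no value with another row and is filtered out.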

Related

Getting the wrong values after SUM() query

I'm executing the following query:
SELECT
MAX(table1.id) as id,
clients.Username as account,
table1.clientid,
SUM(table1.symbols) as symbols,
SUM(table1.tickets) as tickets,
SUM(table1.cash) as cash,
(SUM(CASE WHEN table2.memo = 'Withdraw' THEN amount ELSE 0 END)) AS withdraw,
(SUM(CASE WHEN table2.memo = 'Depos' THEN amount ELSE 0 END)) AS depos
FROM
table1
LEFT JOIN
(
clients
LEFT JOIN
table2
ON
clients.Fidx = table2.clientid
AND
table2.date >= '01-09-2016'
AND
table2.date <= '01-09-2017'
)
ON
clients.Fidx = table1.clientid
WHERE
table1.tradedate >= '01-09-2016'
AND
table1.tradedate <= '01-09-2017'
GROUP BY
table1.clientid, clients.Username, table2.clientid
ORDER BY
clients.Username;
I want to get a simple result table that combines the three tables:
+---------+--------+---------+
| account |withdraw| depos |
+---------+--------+---------+
| adaf | 300 | 0 |
| rich | 1000 | 355 |
| call | 0 | 45 |
| alen | 0 | 0 |
| courney| 0 | 106 |
| warren | 0 | 0 |
+---------+--------+---------+
What's the problem? I'm getting wrong values in the result table, specifically in withdraw and depos: they are 4 times larger than they should be. For example, for some client SUM(depos) should be 500, but my result table shows 2000. I suspect the problem is in the GROUP BY, because when I execute the following query, the result looks OK:
SELECT clientid, SUM(case when memo = 'Withdraw' then amount else 0 end) as withdraw, SUM(case when memo = 'Depos' then amount else 0 end) as depos
from clients
LEFT JOIN
table2
ON
clients.Fidx = table2.clientid
WHERE table2.date >= '01-09-2016' and table2.date<='01-09-2017' GROUP BY clientid ORDER BY clientid;
What could be the reason for such a wrong result? I'm stuck and would appreciate your help.

You pretty much answered this yourself with the bottom query. You need to do the summing before the join, in a derived table or subquery, just like you have in the bottom query. This will ensure you join on a many-to-one relationship, instead of a many-to-many, which must be causing your current duplication (or 'multiplied' sums).
SELECT
MAX(table1.id) as id,
clients.Username as account,
table1.clientid,
SUM(table1.symbols) as symbols,
SUM(table1.tickets) as tickets,
SUM(table1.cash) as cash,
clients.withdraw,
clients.depos
FROM
table1
LEFT JOIN
(SELECT clients.Fidx as clientid,
clients.Username,
SUM(case when memo = 'Withdraw' then amount else 0 end) as withdraw,
SUM(case when memo = 'Depos' then amount else 0 end) as depos
FROM clients
LEFT JOIN table2 ON clients.Fidx = table2.clientid
AND table2.date >= '01-09-2016'
AND table2.date <= '01-09-2017'
GROUP BY clients.Fidx, clients.Username) clients
ON
clients.clientid = table1.clientid
WHERE
table1.tradedate >= '01-09-2016'
AND
table1.tradedate <= '01-09-2017'
GROUP BY
table1.clientid, clients.Username, clients.depos, clients.withdraw
ORDER BY
clients.Username;
Further example:
Table1
id | someInfo
1 | a
1 | b
1 | c
Table2
id | value
1 | 5
1 | 10
This query:
SELECT t1.id, SUM(t2.Value)
FROM table1 t1
JOIN table2 t2 on t1.id = t2.id --This will be many-to-many
GROUP BY t1.id
Will result in this:
Results
id | value
1 | 45 --sum of 45 because the `table2` values are triplicated from the join
Where this query:
SELECT DISTINCT t1.id, Value
FROM table1 t1
JOIN (SELECT id, SUM(Value) value
FROM table2
GROUP BY id) t2 on t1.id = t2.id --This will be many-to-one
Will result in this:
Results
id | value
1 | 15 --sum of 15 because the `table2` values are not triplicated from the join
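The two results above can be reproduced end to end. The snippet below is a small sketch that assumes an in-memory SQLite database (via Python's built-in sqlite3 module) in place of the asker's server; the table and column names are taken from the example above.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INT, someInfo TEXT);
CREATE TABLE table2 (id INT, value INT);
INSERT INTO table1 VALUES (1, 'a'), (1, 'b'), (1, 'c');
INSERT INTO table2 VALUES (1, 5), (1, 10);
""")

# Many-to-many join: each table2 row is repeated once per matching table1 row,
# so the sum is inflated by the number of table1 rows (3 x 15).
inflated = conn.execute("""
SELECT t1.id, SUM(t2.value)
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
GROUP BY t1.id
""").fetchall()

# Aggregate table2 first, then join: the join is now many-to-one.
pre_aggregated = conn.execute("""
SELECT DISTINCT t1.id, t2.value
FROM table1 t1
JOIN (SELECT id, SUM(value) AS value
      FROM table2
      GROUP BY id) t2 ON t1.id = t2.id
""").fetchall()

print(inflated)        # [(1, 45)]
print(pre_aggregated)  # [(1, 15)]
```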

Aggregate before joining:
select
t1.id,
c.Username as account,
c.clientid,
t1.symbols,
t1.tickets,
t1.cash,
coalesce(t2.withdraw, 0) as withdraw,
coalesce(t2.depos, 0) as depos
from clients c
join
(
select
clientid,
max(id) as id,
sum(symbols) as symbols,
sum(tickets) as tickets,
sum(cash) as cash
from table1
where tradedate >= '20160901' and tradedate <= '20170901'
group by clientid
) t1 on t1.clientid = c.fidx
left join
(
select
clientid,
sum(case when memo = 'Withdraw' then amount end) as withdraw,
sum(case when memo = 'Depos' then amount end) as depos
from table2
where date >= '20160901' and date <= '20170901'
group by clientid
) t2 on t2.clientid = c.fidx;

SQL Server Select ID from table 1 where all ID in table2 match criteria

Table1
requestID | requestComplete
1 | 0
2 | 1
3 | 0
4 | 0
5 | 1
Table2
requestID | DocumentNum | DocumentComplete
1 | ABC | 1
1 | DEF | 1
1 | GHI | 1
2 | XXX | 1
3 | YYY | 0
My question is this: how do I find the requestIDs from Table1 where requestComplete = 0 and all the documents for that requestID in Table2 have DocumentComplete = 1?

select t1.requestID
from table1 t1
join table2 t2 on t1.requestID = t2.requestID
where t1.requestComplete = 0
group by t1.requestID
having sum(case when t2.DocumentComplete <> 1 then 1 else 0 end) = 0

SELECT t1.requestID
FROM Table1 t1
WHERE NOT EXISTS (
SELECT *
FROM Table2 t2
WHERE t1.requestID = t2.requestID
AND t2.DocumentComplete = 0
) AND t1.requestComplete = 0

Something like this should work:
SELECT T1.*
FROM Table1 T1
WHERE T1.requestComplete = 0
AND NOT EXISTS (SELECT 1
FROM Table2 T2
WHERE T2.requestID = T1.requestID
AND T2.DocumentComplete != 1)

If T2.DocumentComplete is of a numeric datatype you can do something like this.
SELECT t1.RequestID, t1.RequestComplete, t2.MinDoc
FROM Table1 t1
JOIN (SELECT t2.RequestID, MIN(t2.DocumentComplete) AS MinDoc
FROM Table2 t2
GROUP BY t2.RequestID
HAVING MIN(t2.DocumentComplete) = 1) t2 ON t1.RequestID = t2.Requestid
WHERE t1.RequestComplete = 0
If it is not then you could just convert the t2.documentcomplete to INT. It would look like this.
SELECT t1.RequestID, t1.RequestComplete, t2.MinDoc
FROM Table1 t1
JOIN (SELECT t2.RequestID, MIN(CONVERT(INT,t2.DocumentComplete)) AS MinDoc
FROM Table2 t2
GROUP BY t2.RequestID
HAVING MIN(CONVERT(INT,t2.DocumentComplete)) = 1) t2 ON t1.RequestID = t2.Requestid
WHERE t1.RequestComplete = 0
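One difference between the approaches above is worth checking against the sample data: the JOIN-based queries drop requests that have no documents at all, while NOT EXISTS keeps them, because "no incomplete document" is trivially true when there are no documents. The sketch below assumes an in-memory SQLite database (Python's built-in sqlite3 module) standing in for SQL Server, with the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database used for illustration only (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (requestID INT, requestComplete INT);
CREATE TABLE Table2 (requestID INT, DocumentNum TEXT, DocumentComplete INT);
INSERT INTO Table1 VALUES (1, 0), (2, 1), (3, 0), (4, 0), (5, 1);
INSERT INTO Table2 VALUES
  (1, 'ABC', 1), (1, 'DEF', 1), (1, 'GHI', 1), (2, 'XXX', 1), (3, 'YYY', 0);
""")

# JOIN + HAVING: a request needs at least one document to appear at all.
join_having = [r[0] for r in conn.execute("""
SELECT t1.requestID
FROM Table1 t1
JOIN Table2 t2 ON t1.requestID = t2.requestID
WHERE t1.requestComplete = 0
GROUP BY t1.requestID
HAVING SUM(CASE WHEN t2.DocumentComplete <> 1 THEN 1 ELSE 0 END) = 0
ORDER BY t1.requestID
""")]

# NOT EXISTS: also returns requests that have no documents (request 4).
not_exists = [r[0] for r in conn.execute("""
SELECT t1.requestID
FROM Table1 t1
WHERE t1.requestComplete = 0
  AND NOT EXISTS (SELECT 1
                  FROM Table2 t2
                  WHERE t2.requestID = t1.requestID
                    AND t2.DocumentComplete <> 1)
ORDER BY t1.requestID
""")]

print(join_having)  # [1]
print(not_exists)   # [1, 4]
```

Which behaviour is right depends on whether a request without any documents should count as "all documents complete"; the question does not say.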

Running multiple SET statements in CTE?

Using SQL Server 2012.
I have a CTE statement that gives me incorrect results. Multiple records for each record_id may exist with different types. It seems to skip records and does not update all of them correctly:
WITH cte as (
SELECT
o.sname, o.type, o.record_id,
p.data1, p.data2, p.data3
FROM
table1 p
JOIN table2 o ON o.record_id = p.record_id
WHERE
o.record_id IN(1,2,3)
--AND (o.type = 123 or o.type = 456 or o.type = 789)
)
UPDATE cte
set data1 = (case when type = 123 then 1 else data1 end),
data2 = (case when type = 456 then 1 else data2 end),
data3 = (case when type = 378 then 1 else data3 end)
where type in (123,456,789)
Not sure why this happens.
What I am after is to look at only certain records and if a specific TYPE value exists, change the DATA value to 1 every time it is encountered for specific TYPES.
If I run the UPDATE part of the CTE this way, it works correctly, just not when together:
UPDATE cte
set data1 = (case when type = 123 then 1 else data1 end)
where type in (123)
UPDATE cte
set data2 = (case when type = 456 then 1 else data2 end)
where type in (456)
UPDATE cte
set data3 = (case when type = 789 then 1 else data3 end)
where type in (789)
What's wrong?
Here are Tables and desired outputs:
TABLE1
record_id |type |sname
------------|-------|-----|
1 |123 |alpha
2 |123 |alpha
2 |456 |beta
3 |456 |beta
3 |789 |gamma
Table 2 is originally all zeros
Desired Output:
TABLE2
record_id| data1| data2| data3|
---------|-------|-------|-------|
1 |1 | 0 | 0
2 |1 | 1 | 0
3 |0 | 1 | 1
Actual Output:
TABLE2
record_id| data1| data2| data3|
---------|-------|-------|-------|
1 |1 | 0 | 0
2 |1 | 0 | 0
3 |0 | 1 | 0
Thanks,
MP

You can simply use aggregation in a subquery to check which type exists for a given record_id and then multitable update like this:
update t2
set t2.data1 = t1.data1,
t2.data2 = t1.data2,
t2.data3 = t1.data3
from table2 t2
join (
select record_id,
max(case when type = 123 then 1 else 0 end) as data1,
max(case when type = 456 then 1 else 0 end) as data2,
max(case when type = 789 then 1 else 0 end) as data3
from table1
group by record_id
) t1
on t1.record_id = t2.record_id;
Another way is using correlation with EXISTS:
update t2
set data1 = case when exists (select 1 from table1 t1 where t1.record_id = t2.record_id and t1.type = 123) then 1 else 0 end,
data2 = case when exists (select 1 from table1 t1 where t1.record_id = t2.record_id and t1.type = 456) then 1 else 0 end,
data3 = case when exists (select 1 from table1 t1 where t1.record_id = t2.record_id and t1.type = 789) then 1 else 0 end
from table2 t2;
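A likely reason the original joined update misbehaves is that each target row matches several type rows, and an update through a join applies only one of the matching rows per target row, so some flags are never set. The correlated EXISTS form avoids this because every target row is evaluated exactly once. The sketch below checks it against the desired output from the question; it assumes an in-memory SQLite database (Python's built-in sqlite3 module) rather than SQL Server, so the UPDATE is written without the FROM clause and alias.

```python
import sqlite3

# In-memory SQLite database used for illustration only (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (record_id INT, type INT, sname TEXT);
CREATE TABLE table2 (record_id INT, data1 INT, data2 INT, data3 INT);
INSERT INTO table1 VALUES
  (1, 123, 'alpha'), (2, 123, 'alpha'), (2, 456, 'beta'),
  (3, 456, 'beta'),  (3, 789, 'gamma');
INSERT INTO table2 VALUES (1, 0, 0, 0), (2, 0, 0, 0), (3, 0, 0, 0);
""")

# Each flag is set independently: 1 if any row of that type exists
# for the record, otherwise 0.
conn.execute("""
UPDATE table2
SET data1 = CASE WHEN EXISTS (SELECT 1 FROM table1 t1
                              WHERE t1.record_id = table2.record_id
                                AND t1.type = 123) THEN 1 ELSE 0 END,
    data2 = CASE WHEN EXISTS (SELECT 1 FROM table1 t1
                              WHERE t1.record_id = table2.record_id
                                AND t1.type = 456) THEN 1 ELSE 0 END,
    data3 = CASE WHEN EXISTS (SELECT 1 FROM table1 t1
                              WHERE t1.record_id = table2.record_id
                                AND t1.type = 789) THEN 1 ELSE 0 END
""")

after_update = conn.execute(
    "SELECT record_id, data1, data2, data3 FROM table2 ORDER BY record_id"
).fetchall()
print(after_update)  # [(1, 1, 0, 0), (2, 1, 1, 0), (3, 0, 1, 1)]
```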

SQL Query For Counting Sequence of Days Based on a Value

I have a table that has an id, a timestamp and a control value.
I need a query for the ids that have control value 0 for 4 or more days in sequence.
TimeS ID Kontrol
2012-06-18 5457554F-E9A5-4312-8BA3-424B2333D0B7 1
2012-06-14 3FC4AC80-7D94-496A-92D0-22350CA3CEA9 1
2012-06-14 FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D 0
2012-06-13 FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D 0
2012-06-12 FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D 0
2012-06-11 FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D 0
It should return FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9 for example.

Try this:
SELECT t1.id
FROM tableName t1
WHERE t1.kontrol = 0
AND EXISTS (SELECT 1
FROM tableName t2
WHERE t2.id = t1.id
AND t2.kontrol = 0
AND t2.timeS = t1.timeS + 1
AND EXISTS (SELECT 1
FROM tableName t3
WHERE t3.id = t2.id
AND t3.kontrol = 0
AND t3.timeS = t2.timeS + 1
AND EXISTS (SELECT 1
FROM tableName t4
WHERE t4.id = t3.id
AND t4.kontrol = 0
AND t4.timeS = t3.timeS + 1)))

SELECT DISTINCT t1.id
FROM tableName t1
INNER JOIN
tableName t2 ON t2.id = t1.id AND t2.TimeS + 1 days = t1.TimeS -- date arithmetic syntax varies by RDBMS
INNER JOIN
tableName t3 ON t3.id = t1.id AND t3.TimeS + 2 days = t1.TimeS
INNER JOIN
tableName t4 ON t4.id = t1.id AND t4.TimeS + 3 days = t1.TimeS
WHERE t1.Kontrol = 0 AND
t2.Kontrol = 0 AND
t3.Kontrol = 0 AND
t4.Kontrol = 0

Try this query
select
id,
count(distinct TimeS)
from
table1
where
kontrol=0
group by
id
having
count(distinct TimeS) >= 4;
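Note that counting distinct days does not check that the days are consecutive. One way to require an unbroken run is the "gaps and islands" technique: number each ID's zero days in date order and subtract that number from the day number, so consecutive days share the same difference. The following is a sketch under assumptions: it uses an in-memory SQLite database (Python's sqlite3 module, SQLite 3.25 or later for ROW_NUMBER), SQLite's julianday() for date arithmetic, and one hypothetical extra ID, 'GAPPY', that is not in the question and has four zero days with gaps between them.

```python
import sqlite3

# In-memory SQLite database used for illustration only (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (TimeS TEXT, ID TEXT, Kontrol INT);
INSERT INTO table1 VALUES
  ('2012-06-18', '5457554F-E9A5-4312-8BA3-424B2333D0B7', 1),
  ('2012-06-14', '3FC4AC80-7D94-496A-92D0-22350CA3CEA9', 1),
  ('2012-06-14', 'FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D', 0),
  ('2012-06-13', 'FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D', 0),
  ('2012-06-12', 'FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D', 0),
  ('2012-06-11', 'FE3C1872-0F13-48CC-A6C9-BBE0EAB07B9D', 0),
  -- hypothetical ID (not in the question): four zero days, none consecutive
  ('2012-06-01', 'GAPPY', 0), ('2012-06-03', 'GAPPY', 0),
  ('2012-06-05', 'GAPPY', 0), ('2012-06-07', 'GAPPY', 0);
""")

# Counting distinct days alone: GAPPY qualifies even though its days have gaps.
count_only = [r[0] for r in conn.execute("""
SELECT ID FROM table1
WHERE Kontrol = 0
GROUP BY ID
HAVING COUNT(DISTINCT TimeS) >= 4
ORDER BY ID
""")]

# Gaps and islands: day number minus row number is constant within a run
# of consecutive days, so each run forms its own group ("island").
consecutive = [r[0] for r in conn.execute("""
WITH zero_days AS (
  SELECT DISTINCT ID, TimeS FROM table1 WHERE Kontrol = 0
),
numbered AS (
  SELECT ID,
         CAST(julianday(TimeS) AS INTEGER)
           - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TimeS) AS island
  FROM zero_days
)
SELECT DISTINCT ID
FROM numbered
GROUP BY ID, island
HAVING COUNT(*) >= 4
ORDER BY ID
""")]

print(count_only)
print(consecutive)
```

Here the count-only query returns both the FE3C1872... ID and GAPPY, while the island query returns only the FE3C1872... ID, whose four zero days run from 2012-06-11 to 2012-06-14 without a gap.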

Sql Count by multiple Where condition

I'm having a really hard time with this SQL query.
Based on my tables, I need something like this result:
table1
Id | Type | Size |Count | OwnerId
___________________________________
1 A 1 12 1
2 A 2 12 1
3 B 1 14 1
4 B 1 20 1
5 A 1 12 2
6 A 1 17 2
table2
Id | name
_________
1 A
2 B
The Result
______________________
Name | Size1Type1 Count | Size2Type1 Count | Size1Type2 Count
Thanks indeed.

You did not specify what RDBMS you are using but you should be able to get the result by implementing an aggregate function with a CASE statement. This process is similar to a PIVOT:
select t2.name,
sum(case when t1.size = 1 and t1.type = 'a' then 1 else 0 end) Size1Type1Count,
sum(case when t1.size = 2 and t1.type = 'a' then 1 else 0 end) Size2Type1Count,
sum(case when t1.size = 1 and t1.type = 'b' then 1 else 0 end) Size1Type2Count,
sum(case when t1.size = 2 and t1.type = 'b' then 1 else 0 end) Size2Type2Count
from table1 t1
inner join table2 t2
on t1.ownerid = t2.id
group by t2.name
Result:
| NAME | SIZE1TYPE1COUNT | SIZE2TYPE1COUNT | SIZE1TYPE2COUNT | SIZE2TYPE2COUNT |
--------------------------------------------------------------------------------
| A | 1 | 1 | 2 | 0 |
| B | 2 | 0 | 0 | 0 |
If you want to include your count field, then you would use something like this:
select t2.name,
sum(case when t1.size = 1 and t1.type = 'a' then "Count" end) Size1Type1Count,
sum(case when t1.size = 2 and t1.type = 'a' then "Count" end) Size2Type1Count,
sum(case when t1.size = 1 and t1.type = 'b' then "Count" end) Size1Type2Count,
sum(case when t1.size = 2 and t1.type = 'b' then "Count" end) Size2Type2Count
from table1 t1
inner join table2 t2
on t1.ownerid = t2.id
group by t2.name;
Result:
| NAME | SIZE1TYPE1COUNT | SIZE2TYPE1COUNT | SIZE1TYPE2COUNT | SIZE2TYPE2COUNT |
--------------------------------------------------------------------------------
| A | 12 | 12 | 34 | (null) |
| B | 29 | (null) | (null) | (null) |
Or you could even perform multiple joins on the tables to get the result that you want:
select t2.name,
sum(t1_a1."count") Size1Type1Count,
sum(t1_a2."count") Size2Type1Count,
sum(t1_b1."count") Size1Type2Count,
sum(t1_b2."count") Size2Type2Count
from table2 t2
left join table1 t1_a1
on t1_a1.ownerid = t2.id
and t1_a1.size = 1
and t1_a1.type = 'a'
left join table1 t1_a2
on t1_a2.ownerid = t2.id
and t1_a2.size = 2
and t1_a2.type = 'a'
left join table1 t1_b1
on t1_b1.ownerid = t2.id
and t1_b1.size = 1
and t1_b1.type = 'b'
left join table1 t1_b2
on t1_b2.ownerid = t2.id
and t1_b2.size = 2
and t1_b2.type = 'b'
group by t2.name

SELECT Table2.Name, Table1.Type, Table1.Size, SUM(Table1."Count") AS "Count"
FROM Table1, Table2
WHERE Table1.OwnerID = Table2.Id
GROUP BY Table2.Name, Table1.Type, Table1.Size
