Duplicate Checks with Multiple Values - sql

I am doing some manual duplicate checks on my database, and have a complicated case.
I need to check for duplicate rows based on a value in Column A, which I have done. However, in this specific case, there might be multiple records that have the same value for Column A but a different value for Column E.
Here is my original query:
SELECT ColumnA, COUNT(*) TotalCount
FROM TableA
INNER JOIN TableA_1 on fID = hID
WHERE dateCreated > '2013-05-08 00:00:00'
GROUP BY ColumnA
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
I now need to filter out duplicates for ColumnA where ColumnE is different, or unique. I have added pseudocode to my original query:
SELECT ColumnA, COUNT(*) TotalCount
FROM TableA
INNER JOIN TableA_1 on fID = hID
WHERE dateCreated > '2013-05-08 00:00:00'
AND ColumnE is not unique
GROUP BY ColumnA
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
I hope this makes sense.

You need a GROUP BY on ColumnA and a HAVING clause on COUNT(DISTINCT ColumnE):
SELECT ColumnA, COUNT(*) TotalCount
FROM TableA INNER JOIN TableA_1 on fID = hID
WHERE dateCreated > '2013-05-08 00:00:00'
GROUP BY ColumnA
HAVING COUNT(DISTINCT ColumnE) > 1
ORDER BY COUNT(*) DESC

You could just add ColumnE into the grouping, as shown below:
SELECT ColumnA, ColumnE, COUNT(*) TotalCount
FROM TableA
INNER JOIN TableA_1 on fID = hID
WHERE dateCreated > '2013-05-08 00:00:00'
GROUP BY ColumnA, ColumnE
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
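The two answers above filter on different things, so here is a minimal sketch on made-up data to show the difference. The table demo_rows and its values are hypothetical, and the join and date filter from the question are left out to keep it short.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE demo_rows (ColumnA VARCHAR(10), ColumnE VARCHAR(10));
INSERT INTO demo_rows (ColumnA, ColumnE) VALUES ('a1', 'x');
INSERT INTO demo_rows (ColumnA, ColumnE) VALUES ('a1', 'x');
INSERT INTO demo_rows (ColumnA, ColumnE) VALUES ('a2', 'x');
INSERT INTO demo_rows (ColumnA, ColumnE) VALUES ('a2', 'y');

-- ColumnA values whose rows carry more than one distinct ColumnE.
-- Returns a2 only (a1 has two rows, but both have ColumnE = 'x').
SELECT ColumnA, COUNT(*) AS TotalCount
FROM demo_rows
GROUP BY ColumnA
HAVING COUNT(DISTINCT ColumnE) > 1;

-- Rows duplicated on both ColumnA and ColumnE.
-- Returns (a1, x) only; the two a2 rows differ in ColumnE.
SELECT ColumnA, ColumnE, COUNT(*) AS TotalCount
FROM demo_rows
GROUP BY ColumnA, ColumnE
HAVING COUNT(*) > 1;
```

Which one you want depends on whether a differing ColumnE should make the rows count as duplicates or stop them from counting.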

Related

Nested Select with Count Distinct and Group By

So I'm trying to get 2 sets of results from the same table, grouped by a 3rd column. It's best I let my example explain:
SELECT
(SELECT COUNT(DISTINCT id)
FROM Database
WHERE Status NOT LIKE 'closed') AS ColumnA,
(SELECT COUNT(DISTINCT id)
FROM Database
WHERE Status NOT LIKE 'closed' AND Datevalue <= getdate()) AS ColumnB
Group By ColumnC
Now I know this won't/doesn't work, but it explains what I want. If I leave the GROUP BY out then I get the figures as a whole, but I want them grouped by another column.
Mind is melting, ready to be enlightened.
Is this what you want?
select columnC,
count(distinct case when Status <> 'closed' then id end) as columnA,
count(distinct case when Status <> 'closed' and datevalue <= getdate() then id end) as columnb
from database -- a very curious name for a table
group by ColumnC;
Try this
SELECT ColumnC,
COUNT(DISTINCT CASE WHEN Status NOT LIKE 'closed' THEN id END) as ColumnA,
COUNT(DISTINCT CASE WHEN Status NOT LIKE 'closed' AND Datevalue <= getdate() THEN id END) as ColumnB
FROM mydatabase
GROUP BY ColumnC
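Both answers rely on the same thing: a CASE with no ELSE returns NULL when the condition fails, and COUNT ignores NULLs, so each COUNT(DISTINCT ...) only sees the ids that pass its own condition. A small sketch on invented data, where the table demo_tickets and its rows are hypothetical and CURRENT_TIMESTAMP stands in for getdate():

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE demo_tickets (
    id        INT,
    Status    VARCHAR(10),
    Datevalue DATE,
    ColumnC   VARCHAR(10)
);
INSERT INTO demo_tickets (id, Status, Datevalue, ColumnC) VALUES (1, 'open',   '2000-01-01', 'north');
INSERT INTO demo_tickets (id, Status, Datevalue, ColumnC) VALUES (1, 'open',   '2000-01-01', 'north'); -- repeated id
INSERT INTO demo_tickets (id, Status, Datevalue, ColumnC) VALUES (2, 'open',   '2999-01-01', 'north'); -- future date
INSERT INTO demo_tickets (id, Status, Datevalue, ColumnC) VALUES (3, 'closed', '2000-01-01', 'north');
INSERT INTO demo_tickets (id, Status, Datevalue, ColumnC) VALUES (4, 'open',   '2000-01-01', 'south');

-- Rows failing a CASE condition become NULL and are not counted.
SELECT ColumnC,
       COUNT(DISTINCT CASE WHEN Status <> 'closed' THEN id END) AS ColumnA,
       COUNT(DISTINCT CASE WHEN Status <> 'closed'
                            AND Datevalue <= CURRENT_TIMESTAMP THEN id END) AS ColumnB
FROM demo_tickets
GROUP BY ColumnC
ORDER BY ColumnC;
-- north: ColumnA = 2 (ids 1 and 2), ColumnB = 1 (id 1)
-- south: ColumnA = 1, ColumnB = 1
```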

SQL Case depending on previous status of record

I have a table containing the status of records. Something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to write a CASE where I take the newest version of each row, and every P whose ID has at some point been an I should be reported as 'G' instead of P.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2
Have a derived table which returns each id with its newest timestamp. Join with that result:
select t1.ID, t1.STATUS, t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp
Will return both rows in case of a tie (two rows with same newest timestamp.)
Note that ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP".
You can do this by using a common table expression to find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record) you can use the row_number() OLAP function and select only the "newest" record (this is shown in the ranked common table expression below):
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is null then t.status else 'G' end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;
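Here is a sketch of this common-table-expression approach run against the question's sample data. The table name status_log and the column name ts (used instead of TIMESTAMP to avoid the reserved word) are assumptions, and the dates are rewritten in ISO form since only their order matters. The extra check t.status = 'P' is a tightening added here on the assumption that only P rows should become G.

```sql
-- Hypothetical table; rows copied from the question.
CREATE TABLE status_log (id INT, status CHAR(1), ts DATE);
INSERT INTO status_log (id, status, ts) VALUES (1, 'I', '2016-01-01');
INSERT INTO status_log (id, status, ts) VALUES (1, 'A', '2016-01-03');
INSERT INTO status_log (id, status, ts) VALUES (1, 'P', '2016-01-04');
INSERT INTO status_log (id, status, ts) VALUES (2, 'I', '2016-01-01');
INSERT INTO status_log (id, status, ts) VALUES (2, 'P', '2016-01-02');
INSERT INTO status_log (id, status, ts) VALUES (3, 'P', '2016-01-01');

WITH irecs AS (
    -- every id that has ever had status 'I'
    SELECT DISTINCT id
    FROM status_log
    WHERE status = 'I'
),
ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.ts DESC) AS rn,
           t.id,
           -- only a 'P' with an 'I' in its history is relabelled
           CASE WHEN t.status = 'P' AND i.id IS NOT NULL
                THEN 'G' ELSE t.status END AS status,
           t.ts
    FROM status_log t
    LEFT OUTER JOIN irecs i ON t.id = i.id
)
SELECT id, status, ts
FROM ranked
WHERE rn = 1        -- newest row per id
ORDER BY id;
-- 1 G 2016-01-04
-- 2 G 2016-01-02
-- 3 P 2016-01-01
```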
other solution
with youtableranked as (
select f1.id,
case when (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from youtableranked f0
where f0.rang=1
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"
try this
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case when (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f1.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first rows only
) f4 on 1=1
The rrn(f2) ordering breaks ties when several rows share the same latest date.
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"

advice needed for SQL query

Can someone help with the SQL query needed to pull out the "columna" value that has the most rows where "columnb" is "Active"? In other words, columnb contains the value "Active", and I want the columna value with the highest count of "Active" in columnb.
The output I am looking for is columna = M1 and Count = 4
columna columnb
M1 Active
M1 Active
M1 Active
M1 Active
M2 failed
M2 failed
M2 failed
M3 pending
M3 pending
M3 pending
The results you request would be produced by:
SELECT columna,COUNT(*)
FROM Table
WHERE columnb = 'Active'
GROUP BY columna
SELECT top 1 columna,COUNT(*) as cnt
FROM Table1
WHERE columnb = 'Active'
GROUP BY columna
order by cnt desc
SELECT columna,count(*) FROM TABLE_NAME where columnb = 'Active' GROUP BY columna
Syntax is slightly different between RDBMSs, but the logic remains the same. Filter your rows based on columnb, group them by columna, order them by count(*) and select the top 1.
SQL Server:
SELECT TOP 1 columna, COUNT(*) AS Count
FROM YourTable
WHERE columnb = 'Active'
GROUP BY columna
ORDER BY COUNT(*) DESC
MySQL:
SELECT columna, COUNT(*) AS Count
FROM YourTable
WHERE columnb = 'Active'
GROUP BY columna
ORDER BY COUNT(*) DESC
LIMIT 1
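For engines that have neither TOP nor LIMIT, the SQL standard spelling of the same idea is FETCH FIRST 1 ROW ONLY. A sketch on the question's data, assuming a hypothetical table demo_status; the extra M2/Active row is not in the question and is added only so the ordering has something to decide. As written this should run on PostgreSQL and Db2; other engines may want different syntax.

```sql
-- Hypothetical table; rows from the question plus one invented row.
CREATE TABLE demo_status (columna VARCHAR(10), columnb VARCHAR(10));
INSERT INTO demo_status (columna, columnb) VALUES
    ('M1', 'Active'), ('M1', 'Active'), ('M1', 'Active'), ('M1', 'Active'),
    ('M2', 'failed'), ('M2', 'failed'), ('M2', 'failed'),
    ('M3', 'pending'), ('M3', 'pending'), ('M3', 'pending'),
    ('M2', 'Active');  -- invented row, not in the question

-- Filter, group, order by the count, keep the first row.
SELECT columna, COUNT(*) AS cnt
FROM demo_status
WHERE columnb = 'Active'
GROUP BY columna
ORDER BY cnt DESC
FETCH FIRST 1 ROW ONLY;
-- M1 4
```

Note that all of these forms return a single row, so if two columna values tie for the highest count, only one of them is shown.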

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?
select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
-- dense_rank numbers the distinct ID_name values within each ID
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its total number of distinct ID_Name values.
If you want only those IDs which have more than one name associated, just add a where clause
e.g.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
-- dense_rank numbers the distinct ID_name values within each ID
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID

Find duplicates in SQL

I have a large table with the following data on users.
social security number
name
address
I want to find all possible duplicates in the table
where the ssn is equal but the name is not
My attempt is:
SELECT * FROM Table t1
WHERE (SELECT count(*) from Table t2 where t1.name <> t2.name) > 1
A grouping on SSN should do it
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING COUNT(*) > 1
...or, if you have many rows per ssn and only want to find duplicate names:
...
HAVING COUNT(DISTINCT name) > 1
Edit, oops, misunderstood
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING MIN(name) <> MAX(name)
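To see why the MIN/MAX comparison fits the question better than COUNT(*) > 1, here is a small sketch on invented data. The table demo_users and its rows are hypothetical.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE demo_users (ssn VARCHAR(11), name VARCHAR(50));
INSERT INTO demo_users (ssn, name) VALUES ('111', 'Ann');
INSERT INTO demo_users (ssn, name) VALUES ('111', 'Ann');
INSERT INTO demo_users (ssn, name) VALUES ('222', 'Bob');
INSERT INTO demo_users (ssn, name) VALUES ('222', 'Rob');

-- Any ssn that appears more than once: returns 111 and 222.
SELECT ssn
FROM demo_users
GROUP BY ssn
HAVING COUNT(*) > 1;

-- Only ssn values with at least two different names: returns 222.
-- If MIN and MAX differ, the group must hold more than one name.
SELECT ssn
FROM demo_users
GROUP BY ssn
HAVING MIN(name) <> MAX(name);
```

HAVING COUNT(DISTINCT name) > 1 picks out the same groups; the MIN/MAX form just avoids the distinct count.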
This will handle more than two records with duplicate ssn's:
select count(*), name from table t1, (
select count(*) ssn_count, ssn
from table
group by ssn
having count(*) > 1
) t2
where t1.ssn = t2.ssn
group by t1.name, t2.ssn, t2.ssn_count
having count(*) <> t2.ssn_count