Finding duplicate differences between two tables in SQL

I am trying to find rows that differ between two tables, including rows that are duplicated in only one of them. This code works only if the records are not duplicated:
(select [Name], [Age] from PeopleA
except
select [Name], [Age] from PeopleB)
union all
(select [Name], [Age] from PeopleB
except
select [Name], [Age] from PeopleA)
How can I also find the missing duplicate records, such as the second Robert | 34 row in the PersonA table in the example below?
PersonA:
Name | Age
-------------
John | 45
Robert | 34
Adam | 26
Robert | 34
PersonB:
Name | Age
-------------
John | 45
Robert | 34
Adam | 26

You can use UNION ALL to concatenate both tables, then GROUP BY with a HAVING clause to find duplicates:
SELECT x.Name, x.Age, Cnt = Count(*)
FROM (
SELECT a.Name, a.Age
FROM PersonA a
UNION ALL
SELECT b.Name, b.Age
FROM PersonB b
) x
GROUP BY x.Name, x.Age
HAVING COUNT(*) > 1
According to your clarification in the comment, you could use the following query to find all name-age combinations in PersonA whose counts differ from those in PersonB:
WITH A AS(
SELECT a.Name, a.Age, cnt = count(*)
FROM PersonA a
GROUP BY a.Name, a.Age
),
B AS(
SELECT b.Name, b.Age, cnt = count(*)
FROM PersonB b
GROUP BY b.Name, b.Age
)
SELECT a.Name, a.Age
FROM A a LEFT OUTER JOIN B b
ON a.Name = b.Name AND a.Age = b.Age
WHERE a.cnt <> ISNULL(b.cnt, 0)
If you also want to find persons which are in PersonB but not in PersonA you should use a FULL OUTER JOIN as Gordon Linoff has commented:
WITH A AS(
SELECT a.Name, a.Age, cnt = count(*)
FROM PersonA a
GROUP BY a.Name, a.Age
),
B AS(
SELECT b.Name, b.Age, cnt = count(*)
FROM PersonB b
GROUP BY b.Name, b.Age
)
SELECT Name = ISNULL(a.Name, b.Name), Age = ISNULL(a.Age, b.Age)
FROM A a FULL OUTER JOIN B b
ON a.Name = b.Name AND a.Age = b.Age
WHERE ISNULL(a.cnt, 0) <> ISNULL(b.cnt, 0)
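The count comparison above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the answers target SQL Server), and it replaces the FULL OUTER JOIN with a tagged UNION ALL plus conditional sums, since older SQLite versions lack FULL OUTER JOIN. The function name count_differences and the cnt_a/cnt_b columns are illustrative, not from the original answers.

```python
import sqlite3

def count_differences(rows_a, rows_b):
    """Return (Name, Age, cnt_a, cnt_b) for each pair whose row counts differ."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PersonA (Name TEXT, Age INTEGER)")
    conn.execute("CREATE TABLE PersonB (Name TEXT, Age INTEGER)")
    conn.executemany("INSERT INTO PersonA VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO PersonB VALUES (?, ?)", rows_b)
    # Tag every row with its source table, stack both tables, then count
    # per source. A pair missing from one table simply gets a count of 0.
    query = """
        SELECT Name, Age,
               SUM(CASE WHEN src = 'A' THEN 1 ELSE 0 END) AS cnt_a,
               SUM(CASE WHEN src = 'B' THEN 1 ELSE 0 END) AS cnt_b
        FROM (SELECT Name, Age, 'A' AS src FROM PersonA
              UNION ALL
              SELECT Name, Age, 'B' AS src FROM PersonB) AS x
        GROUP BY Name, Age
        HAVING SUM(CASE WHEN src = 'A' THEN 1 ELSE 0 END)
            <> SUM(CASE WHEN src = 'B' THEN 1 ELSE 0 END)
        ORDER BY Name, Age
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question: Robert | 34 appears twice in PersonA.
person_a = [("John", 45), ("Robert", 34), ("Adam", 26), ("Robert", 34)]
person_b = [("John", 45), ("Robert", 34), ("Adam", 26)]
print(count_differences(person_a, person_b))  # [('Robert', 34, 2, 1)]
```

Returning both counts makes it easy to see which table is short and by how many rows, which is the same information the LEFT/RIGHT JOIN variant below reports as TimesMissing.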

I like Tim's answer but you need to check in both tables if the records are missing. He is only checking if the records are missing in table A. Try this to check if records are missing in either of the tables and how many times.
Select *, 'PersonB' MissingInTable, a.cnt - isnull(b.cnt,0) TimesMissing From
(
Select *, count(1) cnt from PersonA group by Name, Age) A Left join
(Select *, count(1) cnt from PersonB group by Name, Age) B
On a.age=b.age and a.name=b.name
where a.cnt>isnull(b.cnt,0)
Union All
Select *, 'PersonA' MissingInTable, b.cnt - isnull(a.cnt,0) TimesMissing From
(
Select *, count(1) cnt from PersonA group by Name, Age) A Right join
(Select *, count(1) cnt from PersonB group by Name, Age) B
On a.age=b.age and a.name=b.name
where b.cnt>isnull(a.cnt,0)
See the demo here: http://sqlfiddle.com/#!6/06020/13

Add another UNION ALL!
Code:
(SELECT [Name], [Age], 'Missing from B' AS [Type] from PeopleA
EXCEPT
SELECT [Name], [Age], 'Missing from B' AS [Type] from PeopleB)
UNION ALL
(SELECT [Name], [Age], 'Missing from A' as [Type] from PeopleB
EXCEPT
SELECT [Name], [Age], 'Missing from A' AS [Type] from PeopleA)
UNION ALL
SELECT PeopleA.[Name], PeopleA.[Age], 'Duplicate' AS [Type] FROM PeopleA INNER JOIN PeopleB ON PeopleA.Name = PeopleB.Name AND
PeopleA.Age = PeopleB.Age

Related

SQL - Finding Duplicate Records based certain criteria

I have these records in the table - employee_projects
id | employee_id | project_id | status
---------------------------------------
1 | emp1 | proj1 | VERIFIED
2 | emp2 | proj2 | REJECTED
3 | emp1 | proj1 | VERIFIED
4 | emp1 | proj3 | REJECTED
5 | emp2 | proj2 | REQUIRED
6 | emp3 | proj4 | SUBMITTED
7 | emp4 | proj5 | VERIFIED
8 | emp4 | proj6 | VERIFIED
9 | emp3 | proj4 | REQUIRED
Here are the criteria for determining duplicates:
Same employee ID, same project ID under the same status (Example: rows 1 and 3 are duplicates)
Same employee ID, same project ID but in different status (Example: rows 6 and 9 are duplicates).
An exception to duplication criteria#2 is if one project is REQUIRED and the same project is also REJECTED under the same employee, this is NOT considered a duplicate. For example, rows 2 and 5 are NOT duplicates.
I have a query for the first criterion:
select
emp_id,
proj_id,
status,
COUNT(*)
from
employee_projects
group by
emp_id,
proj_id,
status
having
COUNT(*) > 1
What I'm struggling to construct is the SQL for the second criterion.
Maybe a self join can help you:
with t (employee_id ,project_id,status)
as
(
select 'emp1', 'proj1' , 'VERIFIED'
Union all select 'emp2', 'proj2' , 'REJECTED'
Union all select 'emp1', 'proj1' , 'VERIFIED'
Union all select 'emp1', 'proj3' , 'REJECTED'
Union all select 'emp2', 'proj2' , 'REQUIRED'
Union all select 'emp3', 'proj4' , 'SUBMITTED'
Union all select 'emp4', 'proj5' , 'VERIFIED'
Union all select 'emp4', 'proj6' , 'VERIFIED'
Union all select 'emp3', 'proj4' , 'REQUIRED'
)
select
t.employee_id,
t.project_id,
t.status,
'' as status,
'criteria#1' as SQL
from
t
group by
t.employee_id,
t.project_id,
t.status
having
COUNT(*) > 1
union all
SELECT
t.employee_id,
t.project_id,
t.status,
a.status,
'criteria#2' as SQL
FROM
t
left join t as a on
t.employee_id = a.employee_id and
t.project_id = a.project_id
where
t.status != a.status and
concat(t.status,a.status) != 'REQUIREDREJECTED' and
concat(t.status,a.status) != 'REJECTEDREQUIRED'
Try the following:
select T.emp_id, T.proj_id, T.status, D.dup_cnt
from employee_projects T join
(
select emp_id, proj_id, count(*) as dup_cnt
from employee_projects
group by emp_id, proj_id
having count(*) > 1 and
count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2
) D
on T.emp_id = D.emp_id and T.proj_id = D.proj_id
order by T.emp_id, T.proj_id
If you also want to treat a project that has REQUIRED, REJECTED, and at least one other status for the same employee as a duplicate, modify the HAVING clause as follows:
select T.emp_id, T.proj_id, T.status, D.dup_cnt
from employee_projects T join
(
select emp_id, proj_id, count(*) as dup_cnt
from employee_projects
group by emp_id, proj_id
having count(*) > 1 and
(count(distinct case when status in ('REQUIRED', 'REJECTED') then status end) < 2 or count(distinct status) > 2)
) D
on T.emp_id = D.emp_id and T.proj_id = D.proj_id
order by T.emp_id, T.proj_id
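To check the HAVING condition against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption; the original targets a different database). It runs only the inner grouping query from the first variant above, and the function name find_duplicate_groups is illustrative.

```python
import sqlite3

# Sample rows from the question: (id, emp_id, proj_id, status).
SAMPLE_ROWS = [
    (1, "emp1", "proj1", "VERIFIED"),
    (2, "emp2", "proj2", "REJECTED"),
    (3, "emp1", "proj1", "VERIFIED"),
    (4, "emp1", "proj3", "REJECTED"),
    (5, "emp2", "proj2", "REQUIRED"),
    (6, "emp3", "proj4", "SUBMITTED"),
    (7, "emp4", "proj5", "VERIFIED"),
    (8, "emp4", "proj6", "VERIFIED"),
    (9, "emp3", "proj4", "REQUIRED"),
]

def find_duplicate_groups(rows):
    """Return (emp_id, proj_id, dup_cnt) for every group flagged as duplicated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_projects "
        "(id INTEGER, emp_id TEXT, proj_id TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO employee_projects VALUES (?, ?, ?, ?)", rows)
    # A group is a duplicate when it has more than one row, unless it
    # contains both REQUIRED and REJECTED (the stated exception).
    query = """
        SELECT emp_id, proj_id, COUNT(*) AS dup_cnt
        FROM employee_projects
        GROUP BY emp_id, proj_id
        HAVING COUNT(*) > 1
           AND COUNT(DISTINCT CASE WHEN status IN ('REQUIRED', 'REJECTED')
                                   THEN status END) < 2
        ORDER BY emp_id, proj_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(find_duplicate_groups(SAMPLE_ROWS))
# [('emp1', 'proj1', 2), ('emp3', 'proj4', 2)]
```

Rows 1 and 3 and rows 6 and 9 are reported, while rows 2 and 5 (REJECTED plus REQUIRED) are skipped, which matches the three criteria in the question.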

Tips for Creating Summary Count of Value in other Tables

I have multiple tables with a status column in each. I want to display a summary of the counts of each status per table. Something like this:
=============================================
Status | Table A | Table B | Table C |
Status A | 3 | 8 | 2 |
Status B | 5 | 7 | 4 |
==============================================
I need help getting started as I'm not sure how to approach this issue. I can do simple COUNT functions like:
SELECT status, count(status) from TABLE_A group by status
But I'm not sure how to populate the data in the form I want or how to, if possible, use the table names as the column headers. I'd appreciate a point in the right direction. Thanks!
Maybe try doing left joins after you have calculated the counts for each table separately. Something like:
select distinct t1.status,
count(t1.status) as [TableA],
t2.TableB,
t3.TableC from TableA t1
left join (
select distinct status,
count(status) as [TableB] from TableB
group by status
) t2 on t1.status=t2.status
left join (
select distinct status,
count(status) as [TableC] from TableC
group by status
) t3 on t1.status=t3.status
group by t1.status, t2.TableB, t3.TableC
I would use union all and aggregation:
select status, sum(a) as a, sum(b) as b, sum(c) as c
from ((select status, count(*) as a, 0 as b, 0 as c
from tablea
group by status
) union all
(select status, 0, count(*), 0
from tableb
group by status
) union all
(select status, 0, 0, count(*)
from tablec
group by status
)
) abc
group by status;
This ensures that all rows appear, even when one or more tables are missing some values of status.
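As a rough illustration of the UNION ALL plus aggregation approach, the sketch below builds three single-column tables in an in-memory SQLite database through Python's sqlite3 module (a stand-in engine, which is an assumption) and pivots the counts into one column per table. The function name status_summary and the sample status values are made up for the example.

```python
import sqlite3

def status_summary(statuses_a, statuses_b, statuses_c):
    """Return (status, count_in_a, count_in_b, count_in_c), one row per status."""
    conn = sqlite3.connect(":memory:")
    tables = (("tablea", statuses_a), ("tableb", statuses_b), ("tablec", statuses_c))
    for name, statuses in tables:
        # Table names come from the fixed tuple above, not from user input.
        conn.execute(f"CREATE TABLE {name} (status TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(s,) for s in statuses])
    # Each branch fills its own column and pads the others with 0, so the
    # outer SUM collapses the branches into one row per status.
    query = """
        SELECT status, SUM(a) AS a, SUM(b) AS b, SUM(c) AS c
        FROM (SELECT status, COUNT(*) AS a, 0 AS b, 0 AS c
              FROM tablea GROUP BY status
              UNION ALL
              SELECT status, 0, COUNT(*), 0
              FROM tableb GROUP BY status
              UNION ALL
              SELECT status, 0, 0, COUNT(*)
              FROM tablec GROUP BY status) AS abc
        GROUP BY status
        ORDER BY status
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(status_summary(["A", "A", "B"], ["A"], ["B", "B", "C"]))
# [('A', 2, 1, 0), ('B', 1, 0, 2), ('C', 0, 0, 1)]
```

Status C exists only in the third table and still gets a row, with zeros for the other two, which is the behavior the answer describes.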
This could also be done using left joins:
select t.status, a.cnt A, b.cnt B, c.cnt C
from(
select status
from tableA
union
select status
from tableB
union
select status
from tableC
) t
left join (
select status, count(*) cnt
from tableA
group by status
) a on t.status = a.status
left join (
select status, count(*) cnt
from tableB
group by status
) b on t.status = b.status
left join (
select status, count(*) cnt
from tableC
group by status
) c on t.status = c.status

How to display a ROW based on value in CASE STATEMENT

I have the query below and want to display a row only if the value is 1, using CASE. Can you please advise how I can do that?
SELECT DISTINCT
a.AccountID,
a.ForeName,
a.Surname,
a.Gender,
CASE
WHEN B.Value = '1145' THEN '1'
WHEN B.Value = '1007' THEN '2' ELSE '0'
END AS Value,
b.Address,
b.Town
FROM
Customer a
LEFT OUTER JOIN
AdditionalDetails b
ON
b.ID = a.AccountID
The result I am getting:
AccountID ForeName Surname Gender NoName Address Town
00012 Eric Manse Male 0 Porto Porto
00013 Peter Mark Male 0 Porto Porto
00014 Tom Jerry Male 0 Porto Porto
00014 Tom Jerry Male 1 Porto Porto
00015 Sarah Parker Female 0 Porto Porto
00015 Sarah Parker Female 1 Porto Porto
If an account has a row where the CASE expression yields 1, the row with 0 should not be displayed, only the row with the value 1.
I speculate that you want either MAX() or MIN():
SELECT c.AccountID, c.ForeName, c.Surname, c.Gender,
MAX(CASE WHEN ad.Value = '1145' THEN '1'
WHEN ad.Value = '1007' THEN '2'
ELSE '0'
END),
ad.Address, ad.Town
FROM Customer c LEFT OUTER JOIN
AdditionalDetails ad
ON c.ID = ad.ID
GROUP BY c.AccountID, c.ForeName, c.Surname, c.Gender, ad.Address, ad.Town;
EDIT:
You seem to want prioritization:
SELECT cad.*
FROM (SELECT c.AccountID, c.ForeName, c.Surname, c.Gender,
ad.Address, ad.Town,
ROW_NUMBER() OVER (PARTITION BY c.ACCOUNTID
ORDER BY (CASE WHEN ad.Value = '1145' THEN 1
WHEN ad.Value = '1007' THEN 2
ELSE 0
END) DESC
) as seqnum
FROM Customer c LEFT OUTER JOIN
AdditionalDetails ad
ON c.ID = ad.ID
) cad
WHERE seqnum = 1;
You have altered your question. You are not looking for distinct rows, but you want to rank the rows and only display best matches.
Depending on your exact requirements you'd use RANK or ROW_NUMBER with an appropriate ORDER BY and PARTITION BY clause for this.
For instance:
select c.*, ad.address, ad.town
from customer c
left join
(
select
address,
town,
customer_id,
rank() over (partition by customer_id
order by case value when 1145 then 1 when 1007 then 2 else 0 end desc) as rnk
from additionaldetails
) ad on ad.customer_id = c.id and ad.rnk = 1;
You could try something like the below:
with cte as (
SELECT a.AccountID, a.ForeName, a.Surname, a.Gender,
b.Address, b.Town,
row_number() over(partition by a.AccountID
order by
(CASE WHEN b.Value = '1145' THEN 1 WHEN b.Value = '1007' THEN 2 ELSE 0
END) desc) as val
FROM Customer a
LEFT OUTER JOIN AdditionalDetails b
ON b.ID = a.AccountID
) select * from cte where val=1
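The ROW_NUMBER prioritization used in these answers can be sketched on a reduced schema. The example below uses Python's sqlite3 module as a stand-in engine and assumes SQLite 3.25 or later for window functions. It keeps only the AdditionalDetails table with ID and Value columns, and the names best_row_per_account and flag are illustrative.

```python
import sqlite3

def best_row_per_account(details):
    """Return (ID, flag), keeping only the highest-ranked row for each ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AdditionalDetails (ID TEXT, Value TEXT)")
    conn.executemany("INSERT INTO AdditionalDetails VALUES (?, ?)", details)
    # Same CASE mapping as the answers: 1145 -> 1, 1007 -> 2, anything else -> 0.
    # Ordering DESC means a 0 row is kept only when the ID has nothing better.
    query = """
        SELECT ID, flag
        FROM (SELECT ID, flag,
                     ROW_NUMBER() OVER (PARTITION BY ID
                                        ORDER BY flag DESC) AS seqnum
              FROM (SELECT ID,
                           CASE WHEN Value = '1145' THEN 1
                                WHEN Value = '1007' THEN 2
                                ELSE 0
                           END AS flag
                    FROM AdditionalDetails) AS flagged) AS ranked
        WHERE seqnum = 1
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Rows loosely mirroring the question's output: accounts 00014 and 00015
# each have one matching and one non-matching detail row.
sample = [
    ("00012", "9999"),
    ("00014", "9999"), ("00014", "1145"),
    ("00015", "1145"), ("00015", "9999"),
]
print(best_row_per_account(sample))
# [('00012', 0), ('00014', 1), ('00015', 1)]
```

Note that with DESC ordering a 1007 row (flag 2) would win over a 1145 row (flag 1). If 1 should take priority over 2, the ORDER BY expression would need a different mapping.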

T-SQL using SUM for a running total

I have a simple table with some dummy data setup like:
|id|user|value|
---------------
1 John 2
2 Ted 1
3 John 4
4 Ted 2
I can select a running total by executing the following SQL statement (MSSQL 2008):
SELECT a.id, a.user, a.value, SUM(b.value) AS total
FROM table a INNER JOIN table b
ON a.id >= b.id
AND a.user = b.user
GROUP BY a.id, a.user, a.value
ORDER BY a.id
This will give me results like:
|id|user|value|total|
---------------------
1 John 2 2
3 John 4 6
2 Ted 1 1
4 Ted 2 3
Now is it possible to only retrieve the most recent rows for each user? So the result would be:
|id|user|value|total|
---------------------
3 John 4 6
4 Ted 2 3
Am I going about this the right way? any suggestions or a new path to follow would be great!
No join is needed, you can speed up the query this way:
select id, [user], value, total
from
(
select id, [user], value,
row_number() over (partition by [user] order by id desc) rn,
sum(value) over (partition by [user]) total
from users
) a
where rn = 1
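For comparison, here is a sketch that computes an actual per-row running total with an ordered window and then keeps the latest row per user. It runs on SQLite 3.25 or later through Python's sqlite3 module, which is an assumption; on SQL Server, SUM ... OVER with ORDER BY requires version 2012 or later, so it would not run on the 2008 instance mentioned in the question. The function name latest_running_totals is illustrative.

```python
import sqlite3

def latest_running_totals(rows):
    """Return (id, user, value, total) for the most recent row of each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE users (id INTEGER, "user" TEXT, value INTEGER)')
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    # total accumulates value in id order within each user;
    # rn = 1 marks the row with the highest id for that user.
    query = """
        SELECT id, "user", value, total
        FROM (SELECT id, "user", value,
                     SUM(value) OVER (PARTITION BY "user" ORDER BY id) AS total,
                     ROW_NUMBER() OVER (PARTITION BY "user"
                                        ORDER BY id DESC) AS rn
              FROM users) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
sample_rows = [(1, "John", 2), (2, "Ted", 1), (3, "John", 4), (4, "Ted", 2)]
print(latest_running_totals(sample_rows))
# [(3, 'John', 4, 6), (4, 'Ted', 2, 3)]
```

For the latest row the running total equals the user's overall sum, which is why the answer above can use SUM without ORDER BY and still return the requested result.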
Try this:
;with cte as
(SELECT a.id, a.[user], a.value, SUM(b.value) AS total
FROM users a INNER JOIN users b
ON a.id >= b.id
AND a.[user] = b.[user]
GROUP BY a.id, a.[user], a.value
),
cte1 as (select *,ROW_NUMBER() over (partition by [user]
order by total desc) as row_num
from cte)
select id,[user],value,total from cte1 where row_num=1
Add a WHERE clause:
select * from
(
your select statement
) t
where t.id in (select max(id) from table group by user)
You can also use this query:
SELECT a.id, a.user, a.value,
(select sum(b.value) from table b where b.user=a.user) AS total
FROM table a
where a.id in (select max(id) from table group by user)
ORDER BY a.id
Adding a right join would perform better than nested select.
Or even simpler:
SELECT MAX(id), [user], MAX(value), SUM(value)
FROM table
GROUP BY [user]
Compatible with SQL Server 2008 or later
DECLARE @AnotherTbl TABLE
(
id INT
, somedate DATE
, somevalue DECIMAL(18, 4)
, runningtotal DECIMAL(18, 4)
)
INSERT INTO @AnotherTbl
(
id
, somedate
, somevalue
, runningtotal
)
SELECT LEDGER_ID
, LL.LEDGER_DocDate
, LL.LEDGER_Amount
, NULL
FROM ACC_Ledger LL
ORDER BY LL.LEDGER_DocDate
DECLARE @RunningTotal DECIMAL(18, 4)
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal=runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
SELECT *
FROM @AnotherTbl

Group by SQL with count

Let's say we have rows like this:
MyTable
ID Name Product
------------------
1 Adam x
2 Adam y
3 Adam z
4 Peter a
5 Peter b
Using a query like:
Select Name, Count(Product) from MyTable
group by Name
the results will be:
Adam 3
Peter 2
But I would like results like:
1 Adam x 3
2 Adam y 3
3 Adam z 3
4 Peter a 2
5 Peter b 2
I hope you know what I mean.
Could you help me with that query?
Thanks for the help.
You can join the table with a subquery run on the table to select the counts:
SELECT a.ID as ID, a.Name as Name, a.Product as Product, ISNULL(b.cnt,0) as Cnt
FROM MyTable a
LEFT JOIN (SELECT Name, COUNT(*) as Cnt FROM MyTable GROUP BY Name) b
ON a.Name = b.Name
How about?
Select *, Count(*) OVER(PARTITION BY Name) As C
from MyTable
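The windowed count can be checked against the sample rows with a short sketch. It uses Python's sqlite3 module as a stand-in engine and assumes SQLite 3.25 or later for window functions. The function name rows_with_group_count is illustrative.

```python
import sqlite3

def rows_with_group_count(rows):
    """Return every row of MyTable with the number of rows sharing its Name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Product TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    # The window function attaches the per-Name count to each row without
    # collapsing the rows the way GROUP BY would.
    query = """
        SELECT ID, Name, Product,
               COUNT(*) OVER (PARTITION BY Name) AS C
        FROM MyTable
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
my_table = [
    (1, "Adam", "x"), (2, "Adam", "y"), (3, "Adam", "z"),
    (4, "Peter", "a"), (5, "Peter", "b"),
]
for row in rows_with_group_count(my_table):
    print(row)
```

Each Adam row comes back with a count of 3 and each Peter row with a count of 2, matching the desired output in the question. The join-based answers that follow return the same result and also work on engines without window functions.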
Select a.Id,
a.Name,
a.Product,
IsNull(b.CountOfUsers,0) as CountOfUsers
From MyTable a
Left Join (Select Name, Count(Product) as CountOfUsers from MyTable
group by Name)b on a.Name = b.Name
;WITH c AS (SELECT Name, COUNT(Product) CountOfProduct
FROM MyTable
GROUP BY Name)
SELECT t.Id, t.Name, t.Product, c.CountOfProduct
FROM MyTable t
INNER JOIN c ON c.Name = t.Name
Use:
SELECT x.id,
x.name,
x.product,
COALESCE(y.name_count, 0) AS num_instances
FROM MyTable x
LEFT JOIN (SELECT t.name,
COUNT(*) AS name_count
FROM MyTable t
GROUP BY t.name) y ON y.name = x.name
COALESCE is the ANSI standard means of handling NULL values, and is supported by MySQL, SQL Server, Oracle, PostgreSQL, etc.