Merge duplicate rows - sql

I have a Customer table which contains an ID and Email field. I've written the following query to return all duplicate Customers with the same Email:
SELECT ID, Email
FROM Customer a
WHERE EXISTS (SELECT 1
FROM Customer b
WHERE a.Email = b.Email
GROUP BY Email
HAVING COUNT(Email) = 2)
ORDER BY Email
This is returning records that look like the following:
ID  Email
1   a@hotmail.com
2   a@hotmail.com
3   b@gmail.com
4   b@gmail.com
While this works, I actually need the data in the following format:
ID1  Email1         ID2  Email2
1    a@hotmail.com  2    a@hotmail.com
3    b@gmail.com    4    b@gmail.com
What is the best way to achieve this?

One method is conditional aggregation, assuming each email appears at most twice:
select max(case when seqnum = 1 then id end) as id_1,
email as email_1,
max(case when seqnum = 2 then id end) as id_2,
email as email_2
from (select t.*, row_number() over (partition by email order by id) as seqnum
from t
) t
group by email;
Actually, why not just do:
select email, count(*) as num_dups, min(id) as id_1,
(case when count(*) > 1 then max(id) end) as id_2
from t
group by email;
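To see the conditional aggregation end to end, here is a minimal sketch that runs it against the question's sample rows. It assumes SQL Server; the inline VALUES table, the CTE names, and the extra row with ID 5 are illustrative additions, and the HAVING clause is added so that only duplicated emails are returned.

```sql
-- Sample rows from the question, plus one hypothetical non-duplicate (ID 5)
-- to show that the HAVING clause filters it out.
WITH Customer AS (
    SELECT *
    FROM (VALUES
        (1, 'a@hotmail.com'),
        (2, 'a@hotmail.com'),
        (3, 'b@gmail.com'),
        (4, 'b@gmail.com'),
        (5, 'c@example.com')
    ) AS v(ID, Email)
),
Numbered AS (
    -- Number the rows of each email group by ascending ID.
    SELECT ID, Email,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ID) AS seqnum
    FROM Customer
)
SELECT MAX(CASE WHEN seqnum = 1 THEN ID END) AS ID1,
       Email AS Email1,
       MAX(CASE WHEN seqnum = 2 THEN ID END) AS ID2,
       Email AS Email2
FROM Numbered
GROUP BY Email
HAVING COUNT(*) > 1      -- keep only emails that occur more than once
ORDER BY Email;
```

With these rows the query returns one line per duplicated email: (1, a@hotmail.com, 2, a@hotmail.com) and (3, b@gmail.com, 4, b@gmail.com). A third row for the same email would be ignored by this layout, since only seqnum 1 and 2 are pivoted.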

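Try grouping by Email and taking the lowest and highest ID. Without a filter this returns every email, so for an email that is not duplicated ID and ID2 are the same:
SELECT MIN(ID) AS ID, Email, MAX(ID) AS ID2, Email AS Email2
FROM Customer
GROUP BY Email
To keep only emails that occur exactly twice, add the HAVING clause:
SELECT MIN(ID) AS ID, Email, MAX(ID) AS ID2, Email AS Email2
FROM Customer
GROUP BY Email
HAVING COUNT(Email) = 2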

Your layout assumes that an email can be duplicated at most twice.
Maybe list the IDs per email instead, like below?
declare @Duplicates table (Email varchar(50), Customers varchar(100))
insert @Duplicates select Email, '' from Customer group by Email having count(*) > 1
UPDATE d
SET
Customers = STUFF((SELECT ',' + cast(ID as varchar(10))
FROM Customer c
WHERE c.Email = d.Email
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(max)'), 1, 1, '')
FROM @Duplicates AS d
select * from @Duplicates
order by Email
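On SQL Server 2017 or later, the same comma-separated list can probably be produced without a table variable by using STRING_AGG. The following is a sketch under that version assumption; the inline sample rows (including the hypothetical third duplicate with ID 5) stand in for the real Customer table.

```sql
-- Inline sample data; ID 5 is a hypothetical third duplicate for b@gmail.com.
WITH Customer AS (
    SELECT *
    FROM (VALUES
        (1, 'a@hotmail.com'),
        (2, 'a@hotmail.com'),
        (3, 'b@gmail.com'),
        (4, 'b@gmail.com'),
        (5, 'b@gmail.com')
    ) AS v(ID, Email)
)
SELECT Email,
       -- Concatenate the IDs of each email group in ascending order.
       STRING_AGG(CAST(ID AS varchar(10)), ',') WITHIN GROUP (ORDER BY ID) AS Customers
FROM Customer
GROUP BY Email
HAVING COUNT(*) > 1
ORDER BY Email;
```

For these rows the result is a@hotmail.com with '1,2' and b@gmail.com with '3,4,5', so the output does not depend on how many duplicates an email has.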

Related

Select the duplicate rows with specific values

How can I only get the data with the same ID, but not the same Name?
The following is the example to explain my thought. Thanks.
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022
456 Billy 08/03/2022
456 Cat 09/03/2022
789 Peter 10/03/2022
Expected Output:
ID Name Date
456 Billy 08/03/2022
456 Cat 09/03/2022
What I have tried:
select ID, Name, count(*)
from table
group by ID, Name
having count(*) > 1
But the result includes the following rows, which I do not want:
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022
One approach would be to use a subquery to identify IDs that have multiple names.
SELECT *
FROM YourTable
WHERE ID IN (SELECT ID FROM YourTable GROUP BY ID HAVING COUNT(DISTINCT Name) > 1)
I'd join the table to itself like this:
SELECT DISTINCT
a.Id as ID_A,
b.Id as ID_B,
a.[Name] as Name_A
FROM
Test as a
INNER JOIN Test as b
ON A.Id = B.Id
WHERE
A.[Name] <> B.[Name]
Do you want
SELECT * FROM table_name
WHERE ID = 456;
or
SELECT * FROM table_name
WHERE ID IN
(SELECT
ID
FROM table_name
GROUP BY ID
HAVING COUNT(DISTINCT name) > 1
);
?
Window functions are likely to be the most efficient here. They do not require self-joining the source table.
Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function, but we can simulate it with DENSE_RANK and MAX:
WITH DistinctRanks AS (
SELECT *,
rnk = DENSE_RANK() OVER (PARTITION BY ID ORDER BY Name)
FROM YourTable
),
MaxRanks AS (
SELECT *,
mr = MAX(rnk) OVER (PARTITION BY ID)
FROM DistinctRanks
)
SELECT
ID,
Name,
[Date]
FROM MaxRanks t
WHERE t.mr > 1;
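A shorter window-function variant, offered as an alternative sketch rather than as the answer's method: an ID has more than one distinct name exactly when the smallest and largest name in its partition differ. The example assumes SQL Server, uses the question's sample rows inline, keeps the dates as plain text, and ignores NULL names; the CTE and column names MinName and MaxName are illustrative.

```sql
-- Sample rows from the question; dates are kept as text to avoid format ambiguity.
WITH YourTable AS (
    SELECT *
    FROM (VALUES
        (123, 'Amy',   '08/03/2022'),
        (123, 'Amy',   '12/03/2022'),
        (456, 'Billy', '08/03/2022'),
        (456, 'Cat',   '09/03/2022'),
        (789, 'Peter', '10/03/2022')
    ) AS v(ID, Name, [Date])
),
NameRange AS (
    -- Smallest and largest name per ID; they differ only if the ID has 2+ names.
    SELECT ID, Name, [Date],
           MIN(Name) OVER (PARTITION BY ID) AS MinName,
           MAX(Name) OVER (PARTITION BY ID) AS MaxName
    FROM YourTable
)
SELECT ID, Name, [Date]
FROM NameRange
WHERE MinName <> MaxName
ORDER BY ID, Name;
```

For the sample rows this returns the two rows for ID 456 (Billy and Cat) and nothing for 123 or 789.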

Fetch the id associated with only the given list of values

I've a simple table say
customerid clientid
----------------------
4567 1
5678 1
1298 2
4567 2
5678 2
4567 3
I want to get the clientid whose customers are exactly the given list of customer IDs. For example, for
customerid in (4567,5678) clientid should be 1,
customerid in (4567) clientid should be 3
I tried:
SELECT clientid
FROM customer_client
WHERE customerid IN (4567,5678)
GROUP BY clientid
HAVING COUNT(customerid) = 2
This returns
clientid
---------
1
2
while all I want is
clientid
---------
1
Try this:
SELECT DISTINCT clientid
FROM customer_client
WHERE customerid IN (4567,5678)
EXCEPT
SELECT DISTINCT clientid
FROM customer_client
WHERE customerid NOT IN (4567,5678)
Move the conditions to the having clause. If you have no duplicates, you can do:
SELECT cc.clientid
FROM customer_client cc
GROUP BY cc.clientid
HAVING SUM(CASE WHEN cc.customerid IN (4567, 5678) THEN 1 ELSE 0 END) = COUNT(*) AND
COUNT(*) = 2;
with t (customerid, clientid) as (values
(4567, 1)
, (5678, 1)
, (1298, 2)
, (4567, 2)
, (5678, 2)
, (4567, 3)
)
, l (customerid) as (values
--4567, 5678
--4567
4567, 5678, 1298
)
select g1.clientid
from (
select t.clientid, count(1) cnt
from t
join l on l.customerid=t.customerid
group by t.clientid
) g1
join (
select clientid, count(1) cnt
from t
group by clientid
) g2 on g1.clientid=g2.clientid and g1.cnt=g2.cnt and g1.cnt=(select count(1) from l);
You don't need to specify the number of customer IDs (which may vary, as in the example) anywhere in the select statement alongside the list of IDs.
The list of IDs itself is enough; the count is derived from it.
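The same "exactly this list" check can also be sketched in SQL Server syntax with a single GROUP BY. This is an illustrative variant, assuming there are no duplicate (customerid, clientid) pairs and no duplicate values in the list; the CTE names t and l mirror the answer above, and the inline VALUES rows are the question's sample data.

```sql
WITH t AS (
    SELECT *
    FROM (VALUES
        (4567, 1), (5678, 1),
        (1298, 2), (4567, 2), (5678, 2),
        (4567, 3)
    ) AS v(customerid, clientid)
),
l AS (
    -- The list of customer IDs to match exactly; edit this list only.
    SELECT *
    FROM (VALUES (4567), (5678)) AS v(customerid)
)
SELECT t.clientid
FROM t
LEFT JOIN l ON l.customerid = t.customerid
GROUP BY t.clientid
HAVING COUNT(*) = COUNT(l.customerid)        -- every customer of the client is in the list
   AND COUNT(*) = (SELECT COUNT(*) FROM l);  -- and the client covers the whole list
```

With the list (4567, 5678) this returns clientid 1. Changing the list to the single value 4567 returns clientid 3, without touching the rest of the query.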

Query to get the list of Addresses that have AddressType one else AddressType two?

Consider the following table
Id PersonId Address AddressTypeId
--------------------------------------------------------------------
1 1 AI1P1T1 1
2 1 AI2P1T2 2
3 2 AI3P2T2 2
I want to write a query that lists one address per person, considering only AddressTypeId 1 or 2:
when the person has an address with AddressTypeId = 1, select it;
otherwise select the person's address with AddressTypeId = 2.
Expected result:
Address
--------------
AI1P1T1
AI3P2T2
Good day,
Please check if this solves your needs:
/***************************** DDL+DML */
drop table if exists T;
create table T(Id int,PersonId int, [Address] nvarchar(10), AddressTypeId int)
INSERT T(Id,PersonId, [Address], AddressTypeId)
values
(1,1,'AI1P1T1',1),
(2,1,'AI2P1T2',2),
(3,2,'AI3P2T2',2)
GO
select * from T
GO
/***************************** Solution */
With MyCTE as (
select *, ROW_NUMBER() OVER (partition by PersonId order by AddressTypeId) as RN
from T
)
select [Address]
from MyCTE
where
AddressTypeId in (1,2) -- if there can be only positive numbers then you can use "< 3"
and RN = 1
GO
You can try this also using joins:
select t1.PersonId,t1.Address from #T t1
inner join (select personid,min(AddressTypeId)atype from #T
group by PersonId )x
on x.atype=t1.AddressTypeId and x.PersonId=t1.PersonId
I would write a subquery to make ROW_NUMBER by window function, then use MAX in the main query.
SELECT
PersonId, MAX(Address) Address
FROM
(SELECT
PersonId,
(CASE
WHEN ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY AddressTypeId) = 1
THEN Address
END) Address
FROM
T
WHERE
AddressTypeId IN (1,2)
) t1
GROUP BY
PersonId
[Results]:
| PersonId | Address |
+----------+---------+
| 1 | AI1P1T1 |
| 2 | AI3P2T2 |
Here's the top 1 with ties trick:
select top 1 with ties *
from yourtable
order by row_number() over (partition by PersonId order by AddressTypeId)
This also works on SQL Server versions before 2012, and it can return every column.
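The WITH TIES trick can be tried on the question's rows as in the sketch below. It assumes SQL Server; the inline VALUES table replaces the real one, the WHERE filter restricts the candidates to types 1 and 2 as the question asks, and the row for PersonId 3 is a hypothetical addition to show that filter at work.

```sql
-- Every person's "first" row (lowest AddressTypeId) gets row number 1,
-- and TOP 1 WITH TIES returns all rows that tie on that value.
SELECT TOP 1 WITH TIES PersonId, [Address]
FROM (VALUES
        (1, 1, 'AI1P1T1', 1),
        (2, 1, 'AI2P1T2', 2),
        (3, 2, 'AI3P2T2', 2),
        (4, 3, 'AI4P3T3', 3)   -- hypothetical: a person with only type 3
     ) AS T(Id, PersonId, [Address], AddressTypeId)
WHERE AddressTypeId IN (1, 2)
ORDER BY ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY AddressTypeId);
```

This returns AI1P1T1 for person 1 and AI3P2T2 for person 2; person 3 is excluded by the WHERE clause. The order of the tied rows is not guaranteed, so wrap the query or sort in the client if a stable order matters.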
You could use a UNION of three cases: persons with both types (take type 1), persons with only type 2, and persons with only type 1:
select Address
from my_table m
inner join (
select PersonId, count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 2
) t on t.PersonId = m.PersonId and m.AddressTypeId = 1
UNION
select Address
from my_table m
inner join (
select PersonId, count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 1
) t on t.PersonId = m.PersonId and m.AddressTypeId = 2
UNION
select Address
from my_table m
inner join (
select PersonId, count(distinct AddressTypeId) as cnt
from my_table
where AddressTypeId in (1, 2)
group by PersonId
having count(distinct AddressTypeId) = 1
) t on t.PersonId = m.PersonId and m.AddressTypeId = 1
Try this one, using first_value ordered by AddressTypeId so that type 1 wins when it is present:
select distinct PersonId, first_value(Address) over (partition by PersonId order by AddressTypeId) as Address
from T
where AddressTypeId in (1, 2);

How to check duplicate column values?

I have created a stored procedure that selects data, and the result is:
ID NAME EMAIL
1 John asd@asd.com
2 Sam asd@asd.com
3 Tom asd@asd.com
4 Bob bob@asd.com
5 Tom asc@asd.com
and I would like to get a result like:
ID NAME EMAIL
1 John asd@asd.com
2 Sam asd@asd.com
3 Tom asd@asd.com, asc@asd.com
4 Bob bob@asd.com
So, how can I do it?
Thanks.
select
id,
name,
email
from (
select
rn = row_number() over(partition by name order by id asc),
id,
name,
email = stuff((select ', ' + convert(varchar(255), t2.email)
from @table_var t2
where t1.name = t2.name
for xml path(''))
,1,2,'')
from @table_var t1
group by t1.id, t1.name
)t
where rn = 1
order by id
GROUP BY is what you're after.
For example
SELECT name, email, count(email)
FROM table
GROUP BY name, email
will return something like
1 John asd#asd.com 1
2 Sam asd#asd.com 1
3 Tom asd#asd.com 2
4 Bob bob#asd.com 1
adding
HAVING count(email) > 1
to the end will result in
1 Tom asd#asd.com 2
Just another way, could help
;WITH cte
AS
(
SELECT Id
,Name
,Email
,ROW_NUMBER() OVER(PARTITION BY Name,Email ORDER BY Id) AS rowNum
FROM Table
)
SELECT Id,Name,Email
FROM cte
WHERE rowNum=1;
One solution (it assumes at most two distinct emails per name):
select distinct e1.Name,
(case when e2.Email is null then e1.Email else
( case when e1.Email > e2.Email then e1.Email + ','+ e2.Email else e2.Email + ','+ e1.Email end )
end ) as Email from MyTable e1
left join MyTable e2 on e1.Name = e2.Name and e1.Email <> e2.Email
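If SQL Server 2017 or later is available, the requested layout can likely be produced with STRING_AGG and a plain GROUP BY. This is a sketch under that assumption; the CTE name People and the inline rows are illustrative stand-ins for the stored procedure's result, and identical emails repeated under one name would appear more than once in the list.

```sql
WITH People AS (
    SELECT *
    FROM (VALUES
        (1, 'John', 'asd@asd.com'),
        (2, 'Sam',  'asd@asd.com'),
        (3, 'Tom',  'asd@asd.com'),
        (4, 'Bob',  'bob@asd.com'),
        (5, 'Tom',  'asc@asd.com')
    ) AS v(ID, Name, Email)
)
SELECT MIN(ID) AS ID,    -- keep the lowest ID of each name
       Name,
       STRING_AGG(Email, ', ') WITHIN GROUP (ORDER BY ID) AS Email
FROM People
GROUP BY Name
ORDER BY MIN(ID);
```

For the sample rows this yields four lines, with Tom collapsed to ID 3 and the email list 'asd@asd.com, asc@asd.com'.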

SQL Server query + joining results

I have a query like this:
SELECT recipientid AS ID,
COUNT(*) AS Recieved FROM Inbox
GROUP BY recipientid
UNION
SELECT SenderId,
COUNT(*) AS [Sent] FROM Inbox
GROUP BY SenderId
The output:
RecipientID Recieved
001 3
001 4
002 4
002 2
003 18
003 55
How can I rewrite it in such a way that it displays like this:
RecipientID Recieved Sent
001 3 4
002 4 2
003 18 55
Thanks.
Just join the subqueries:
select coalesce(a.ID, b.ID) as ID, Received, Sent
from (
SELECT recipientid AS ID,
COUNT(*) AS Received FROM Inbox
GROUP BY recipientid
) a
full outer join (
SELECT SenderId AS ID,
COUNT(*) AS [Sent] FROM Inbox
GROUP BY SenderId
) b
on (a.ID = b.ID)
order by coalesce(a.ID, b.ID);
Note that the full outer join returns every ID that appears as a recipient or as a sender; for an ID that appears on only one side, the other count is NULL. If you only want IDs that appear as both recipient and sender, use an inner join instead.
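To make the difference between the outer and inner join concrete, here is a small sketch assuming SQL Server. The Inbox rows are hypothetical (the question does not show the underlying table), and COALESCE is used to show 0 instead of NULL for an ID that only sent or only received.

```sql
-- Hypothetical messages: 003 only receives, 004 only sends.
WITH Inbox AS (
    SELECT *
    FROM (VALUES
        ('001', '002'),
        ('001', '003'),
        ('002', '001'),
        ('004', '001')
    ) AS v(SenderId, RecipientId)
),
R AS (SELECT RecipientId AS ID, COUNT(*) AS Received FROM Inbox GROUP BY RecipientId),
S AS (SELECT SenderId    AS ID, COUNT(*) AS [Sent]   FROM Inbox GROUP BY SenderId)
SELECT COALESCE(R.ID, S.ID)    AS ID,
       COALESCE(R.Received, 0) AS Received,
       COALESCE(S.[Sent], 0)   AS [Sent]
FROM R
FULL OUTER JOIN S ON S.ID = R.ID
ORDER BY COALESCE(R.ID, S.ID);
```

This returns 001 (2, 2), 002 (1, 1), 003 (1, 0) and 004 (0, 1). Replacing FULL OUTER JOIN with INNER JOIN would keep only 001 and 002.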
I would add a source column to your query and do a simple pivot
select ID,
max (case when source=1 then Cnt else 0 end) as Received,
max (case when source=2 then Cnt else 0 end) as Sent
from (
SELECT 1 as Source,
recipientid AS ID,
COUNT(*) AS Cnt
FROM Inbox
GROUP BY recipientid
UNION
SELECT 2 as Source,
SenderId,
COUNT(*)
FROM Inbox
GROUP BY SenderId
) x
GROUP BY ID
If your database supports CTEs (for example PostgreSQL or SQL Server; the square brackets below are SQL Server syntax):
With Both as
(
SELECT
recipientid AS ID,
Count(*) AS Received,
0 as [Sent]
FROM Inbox
GROUP BY recipientid
UNION
SELECT
SenderId as ID,
0 as Received,
Count(*) AS [Sent]
FROM Inbox
GROUP BY SenderId
)
SELECT
ID,
Sum(Received) as [Received],
Sum(Sent) as [Sent]
FROM Both
GROUP BY ID
ORDER BY 1
Assuming you have a users table with the IDs, you could do something like the following. Correlated subqueries are used because joining inbox twice would pair every sent row with every received row and inflate both counts:
SELECT
users.id,
(SELECT COUNT(*) FROM inbox WHERE inbox.senderid = users.id) AS sent,
(SELECT COUNT(*) FROM inbox WHERE inbox.recipientid = users.id) AS received
FROM
users
ORDER BY users.id;