SQL SELECT Full Row with Duplicated Data in One Column - sql

I am using Microsoft SQL Server 2014.
I am able to list emails which are duplicated.
But I am unable to list the entire row, which contains other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
How can I list all fields of the rows whose Email appears more than once in the table?
Thank you.

Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1

select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )

SQL Fiddle
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we're then selecting only those records with noDuplicates greater than 1; i.e. where there are multiple records with the same email address.
Finally I've sorted by Email and EmployeeId, so that duplicates are listed alongside one another and are presented in the sequence in which they were (presumably) created; just to make life easy for whoever then has to deal with these results.
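To make the meaning of noDuplicates concrete, here is a minimal sketch against a throwaway table variable. The @Demo table and its three rows are made up for illustration and are not the asker's data.

```sql
-- Hypothetical sample data, only to illustrate the windowed count.
DECLARE @Demo TABLE (EmployeeId int PRIMARY KEY, Email varchar(100));

INSERT INTO @Demo (EmployeeId, Email)
VALUES (1, 'a@example.com'),
       (2, 'b@example.com'),
       (3, 'a@example.com');

-- Every row is kept; the count is computed per email partition.
-- Rows 1 and 3 get noDuplicates = 2, row 2 gets noDuplicates = 1.
SELECT EmployeeId,
       Email,
       COUNT(*) OVER (PARTITION BY Email) AS noDuplicates
FROM @Demo
ORDER BY Email, EmployeeId;
```

Filtering this result on noDuplicates > 1 leaves rows 1 and 3, which is what the outer query does.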

If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);

Related

Query to update duplicates

I have the following query, which I am running on SQL Server, that returns duplicates. I need to write a query that updates the second instance of each email address found by that select query, for example imports#rohnis.com to imports#rohnis.com.duplicate, or info#eps.ws to info#eps.ws.duplicate. So basically, suffix the email address with the word duplicate. It could be any email address.
Query to search duplicates
SELECT ta.Id
,ta.Email
,ta.ClientCompanyId
FROM [IdentityDB_CSR].[dbo].[User] ta
WHERE (SELECT COUNT(*)
FROM [IdentityDB_CSR].[dbo].[User] ta2
WHERE ta.Email=ta2.Email
AND ta.ClientCompanyId=ta2.ClientCompanyId)>1
Query to update
update [IdentityDB_CSR].[dbo].[User]
set Email = 'info#eps.ws.duplicate'
where id = 87183
You could use an updatable cte:
with cte as (
select
Email,
row_number() over(partition by Email, ClientCompanyId order by id desc) rn
from [IdentityDB_CSR].[dbo].[User]
)
update cte
set Email = Email + '.duplicate'
where rn > 1
This identifies duplicates as records that share the same Email and ClientCompanyId. The record that has the greatest id is left untouched, while for the others we add '.duplicate' at the end of the Email.
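Before running the update, it may be worth previewing which rows it would touch. The sketch below reuses the same row_number() logic in a plain SELECT; the @User table variable and its rows are hypothetical stand-ins for [IdentityDB_CSR].[dbo].[User].

```sql
-- Hypothetical stand-in for the real User table.
DECLARE @User TABLE (Id int PRIMARY KEY, Email varchar(200), ClientCompanyId int);

INSERT INTO @User (Id, Email, ClientCompanyId)
VALUES (1, 'info#eps.ws', 10),
       (2, 'info#eps.ws', 10),
       (3, 'info#eps.ws', 20);   -- same email, different company: not a duplicate

WITH cte AS (
    SELECT Id,
           Email,
           ROW_NUMBER() OVER (PARTITION BY Email, ClientCompanyId
                              ORDER BY Id DESC) AS rn
    FROM @User
)
-- Rows with rn > 1 are the ones the update would rename.
-- With the sample data this returns only Id = 1.
SELECT Id, Email, Email + '.duplicate' AS NewEmail
FROM cte
WHERE rn > 1;
```

If the older row should be the one left untouched instead, ordering by Id ASC inside the window would flip which record keeps its address.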

DISTINCT AND COUNT(*)=1 not working on SQL

I need to show the ID (which is unique in every case) and the name, which is sometimes different. In my code I only want to show the names IF they are unique.
I tried with both distinct and count(*)=1, nothing solves my problem.
SELECT DISTINCT id, name
FROM person
GROUP BY id, name
HAVING count(name) = 1;
The result is still showing the names multiple times
By "unique", I assume you mean names that only appear once. That is not what "distinct" means in SQL; the use of distinct is to remove duplicates (either for counting or in a result set).
If so:
SELECT MAX(id), name
FROM person
GROUP BY name
HAVING COUNT(*) = 1;
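A small sketch of why the original query still returns every row, using a made-up @person table: because id is unique, grouping by id and name puts each row in its own group, so every group passes the HAVING test.

```sql
-- Hypothetical sample data.
DECLARE @person TABLE (id int PRIMARY KEY, name varchar(50));

INSERT INTO @person (id, name)
VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');

-- Original approach: each (id, name) group has exactly one row,
-- so all three rows come back, including both 'Ann' rows.
SELECT id, name
FROM @person
GROUP BY id, name
HAVING COUNT(name) = 1;

-- Grouping by name alone: only names that occur once survive.
-- With the sample data this returns the single row (2, 'Bob').
SELECT MAX(id) AS id, name
FROM @person
GROUP BY name
HAVING COUNT(*) = 1;
```

MAX(id) is safe here because each surviving group contains exactly one row, so the maximum is simply that row's id.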
If your DBMS supports it, you can use a window function:
SELECT id, name
FROM (
SELECT id, name, COUNT(*) OVER(PARTITION BY name) AS NameCount -- get count of each name
FROM person
) src
WHERE NameCount = 1
If not, you can do:
SELECT id, name
FROM person
WHERE name IN (
SELECT name
FROM person
GROUP BY name
HAVING COUNT(*) = 1 -- Only get names that occur once
)

GroupBy Query Which Shows Related Data In Addition to Grouping Clause

Ugh, I know it's a terrible title, but I can't think of a way to summarize my question in a simple statement. It's a fairly basic T-SQL query question but I haven't used T-SQL much in the last year or so and my brain simply doesn't want to work today.
Basically I have a table with usernames (email address) and a client id. There can't be multiple emails per client, but there can be multiple emails for different clients. I'm trying to do a group on email addresses to get a count of how many emails are associated with 1 or more clients - that's the easy part. Where I'm struggling is trying to also list which client ids the email address is associated to.
For example, I have this query which gives me 1/2 of what I'm looking for:
select UserName, COUNT(*)
from UserTable
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
But I would also like to have either a row-per client, or even just multiple new columns showing each client id the email address is associated with such as:
user1#test.com 3
user1#test.com 34
user1#test.com 9
OR
user1#test.com 3 34 9
Any assistance is appreciated.
If you're using SQL Server, you can use the COUNT window function:
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
Then to pick out only those with a count greater than 1:
SELECT * FROM (
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
) rows
WHERE rows.Counts > 1
To get them into the second format, you'd need to use some row concatenation strategy - FOR XML PATH is a popular one.
You can use FOR XML PATH:
Select UserName, COUNT(*),
substring(
(
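Select ',' + CAST(UTI.clientID AS varchar(20)) AS [text()] -- CAST so a numeric clientID concatenates instead of raising a conversion error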
From UserTable UTI
Where UTI.UserName = UTO.UserName
ORDER BY UTI.clientID
For XML PATH ('')
), 2, 1000)
From UserTable UTO
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
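On SQL Server 2017 or later, STRING_AGG can produce the same comma-separated list without the XML workaround. This is only a sketch under that version assumption; the @UserTable variable and its rows are invented for illustration.

```sql
-- Hypothetical sample data; requires SQL Server 2017+ for STRING_AGG.
DECLARE @UserTable TABLE (UserName varchar(100), clientID int);

INSERT INTO @UserTable (UserName, clientID)
VALUES ('user1#test.com', 3),
       ('user1#test.com', 34),
       ('user1#test.com', 9),
       ('user2#test.com', 5);

-- One row per user name that appears for more than one client.
-- With the sample data: user1#test.com, 3, '3,9,34'.
SELECT UserName,
       COUNT(*) AS ClientCount,
       STRING_AGG(CAST(clientID AS varchar(20)), ',')
           WITHIN GROUP (ORDER BY clientID) AS ClientIds
FROM @UserTable
GROUP BY UserName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
```

On older versions, FOR XML PATH as shown above remains the usual approach.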

How to display all columns associated with duplicate emails in SQL server 2008

I've done some research on looking for a way to filter duplicate emails so all columns display the data associated with these duplicate emails, but can't find an answer to help me with this.
I currently have data pulled using the following code:
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
HAVING
(COUNT(Email) > 1)
Order by Email
It then gives me xxxxxx amount of rows. I then want to be able to pull any data (columns) that are associated with these duplicate emails -and just the duplicates.
SELECT * FROM [marks_party_MasterInvite] .[dbo].[InviteList]
WHERE
Email in(Select Email FROM [marks_party_MasterInvite].[dbo].[InviteList] GROUP BY Email HAVING COUNT(Email)>1)
I know I am doing something wrong, because the row count doesn't match.
So any help would be greatly appreciated!
Thanks guys,
You want to use window functions. The following adds the count to each row. Then you can use a where filter to get all the columns:
SELECT il.*
FROM (select src.*, count(*) over (partition by email) as cnt
from [cem_farmers_masterinvitelist].[dbo].InviteList src
) il
where cnt > 1
Order by Email
The counts don't match because when you fetch every row, you are going to get duplicates. In the first query, you are getting distinct emails.
Join the source table to the list of duplicates you just generated:
SELECT *
FROM [cem_farmers_masterinvitelist].[dbo].InviteList src
INNER JOIN
(
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
) qDupes
ON qDupes.email = src.email AND qDupes.dup_count > 1
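The row counts will differ, because the first query returns one row per duplicated email while the second returns every row that carries one. If you want the count of all duplicate records involved, you can use WITH ROLLUP, which will total the number for you. Apply it after filtering down to the duplicated emails; rolled up directly on the original query, the total row would count every email in the table, not just the duplicates. The extra row with a NULL Email holds the total, and that number should match the row count of the second query.
SELECT
Email, SUM(dup_count) AS dup_count
FROM
(
SELECT Email, COUNT(Email) AS dup_count
FROM [cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY Email
HAVING COUNT(Email) > 1
) qDupes
GROUP BY
Email WITH ROLLUP
Order by Email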

SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?
For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).
Sample data:
1 01/01/2008 stuff#stuff.com
2 02/01/2008 stuff#stuff.com
3 01/12/2008 noone#stuff.com
Something like this:
select email ,max(receiveddate) as MaxDate
from YourTable
group by email
having count(email) > 1
Try something like:
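SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ReceivedDate DESC) AS RowNumber, -- 1 = most recent row for each email
COUNT(*) OVER (PARTITION BY Email) AS EmailCount -- number of rows sharing this email
FROM EmailTable
) a
WHERE RowNumber = 1 AND EmailCount > 1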
See http://www.technicaloverload.com/working-with-duplicates-in-sql-server/
Couldn't you join the list on the e-mail field and then see what nulls you get in your result?
Or better yet, count the instances of each e-mail address? And only return the ones with count > 1
Or even take the email and id fields. And return the entries where the e-mail is the same, and the IDs are different. (To avoid duplicates don't use != but rather either < or >.)
SELECT [id], [receivedate], [email]
FROM [mytable]
WHERE [email] IN ( SELECT [email]
FROM [myTable]
GROUP BY [email]
HAVING COUNT([email]) > 1 )
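The self-join idea described above (same e-mail, different IDs, compared with < rather than !=) can be sketched like this. The @mytable variable, its column types, and the sample rows are assumptions made for illustration.

```sql
-- Hypothetical sample data modelled on the question.
DECLARE @mytable TABLE (id int PRIMARY KEY, receivedate date, email varchar(100));

INSERT INTO @mytable (id, receivedate, email)
VALUES (1, '20080101', 'stuff#stuff.com'),
       (2, '20080102', 'stuff#stuff.com'),
       (3, '20080112', 'noone#stuff.com');

-- a.id < b.id reports each duplicate pair once; with <> the pair
-- (1, 2) would also come back as (2, 1).
-- With the sample data this returns the single row (1, 2, 'stuff#stuff.com').
SELECT a.id AS earlier_id,
       b.id AS later_id,
       a.email
FROM @mytable a
JOIN @mytable b
  ON a.email = b.email
 AND a.id < b.id;
```

Note that an email appearing n times yields n(n-1)/2 pairs, so for heavily duplicated addresses the IN or window-function variants give a more compact result.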
Do you want a list of the last items? If so you could use:
SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE tCheck.date > t.date)
If you want a list of all duplicate email addresses, use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:
SELECT [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1
If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":
SELECT TOP 1 [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1 ORDER BY MAX([date]) DESC
If you have a surrogate key, it is relatively easy to use the group by syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.
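Example to delete duplicate records (the column types are illustrative):
CREATE TABLE people (ID int PRIMARY KEY, Name varchar(100), Address varchar(200), DOB date);
-- Keep the lowest ID in each group of identical rows and delete the rest.
DELETE FROM people WHERE ID NOT IN (
SELECT MIN(ID) FROM people GROUP BY Name, Address, DOB
);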
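Try this (assuming the primary key column is named id; without the id comparison every row would match itself):
select * from [table] a, [table] b
where a.email = b.email
and a.id <> b.id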