How to display all columns associated with duplicate emails in SQL Server 2008

I've been looking for a way to filter on duplicate emails so that all columns for the rows with those emails are displayed, but I can't find an answer that helps.
I currently have data pulled using the following code:
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
HAVING
(COUNT(Email) > 1)
Order by Email
It gives me xxxxxx rows. I then want to pull all the data (columns) associated with these duplicate emails, and just the duplicates.
SELECT * FROM [marks_party_MasterInvite] .[dbo].[InviteList]
WHERE
Email in(Select Email FROM [marks_party_MasterInvite].[dbo].[InviteList] GROUP BY Email HAVING COUNT(Email)>1)
I know I am doing something wrong, because the row count doesn't match.
So any help would be greatly appreciated!
Thanks guys,

You want to use window functions. The following adds the count to each row; you can then filter on it in a where clause to get all the columns:
SELECT il.*
FROM (select il.*, count(*) over (partition by email) as cnt
from [cem_farmers_masterinvitelist].[dbo].InviteList il
) il
where cnt > 1
Order by Email
The counts don't match because when you fetch every row, you are going to get duplicates. In the first query, you are getting distinct emails.
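To make the difference between the two counts concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The invite_list table and its rows are made up for illustration. The grouped query returns one row per duplicated email, while the full-row query returns every row sharing one of those emails, so the second count should equal the sum of dup_count from the first.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invite_list (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO invite_list (name, email) VALUES (?, ?)",
    [("Ann", "a@example.com"), ("Al", "a@example.com"), ("Abe", "a@example.com"),
     ("Bea", "b@example.com"), ("Bo", "b@example.com"), ("Cy", "c@example.com")],
)

# One row per duplicated email (the GROUP BY / HAVING query).
grouped = conn.execute(
    "SELECT email, COUNT(email) AS dup_count FROM invite_list "
    "GROUP BY email HAVING COUNT(email) > 1 ORDER BY email"
).fetchall()

# Every row whose email is duplicated (the IN-subquery form of the full-row query).
full_rows = conn.execute(
    "SELECT * FROM invite_list WHERE email IN ("
    "  SELECT email FROM invite_list GROUP BY email HAVING COUNT(email) > 1)"
).fetchall()

total_from_groups = sum(cnt for _, cnt in grouped)
print(len(grouped), len(full_rows), total_from_groups)
```

Here the grouped query yields 2 rows and the full-row query yields 5, which is the sum of the two dup_count values.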

Join the source table to the result you just generated from it:
SELECT src.*, qDupes.dup_count
FROM [cem_farmers_masterinvitelist].[dbo].InviteList src
INNER JOIN
(
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
) qDupes
ON qDupes.email = src.email AND qDupes.dup_count > 1

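The row counts will differ. If you want the total number of rows involved in duplicates, you can use WITH ROLLUP, which adds a grand-total row (the one where Email is NULL). Apply it to the already-filtered duplicates: HAVING is evaluated after the rollup row is computed, so rolling up the unfiltered table would total every email, not just the duplicated ones. This total should match the row count of the second query.
SELECT
Email, SUM(dup_count) AS dup_count
FROM
(
SELECT Email, COUNT(Email) AS dup_count
FROM [cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY Email
HAVING (COUNT(Email) > 1)
) dupes
GROUP BY
Email WITH ROLLUP
Order by Email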

Related

SQL help - Summary with data from values listed

Trying to figure this out, but am stuck... I am looking to count distinct emails in a system that holds email addresses from multiple companies, and I want to summarize by company, where the company names are values within a column.
Current query:
select count(*), count(EMAIL), count(distinct EMAIL), count(company) from "email_db"
GROUP BY(company);
I cannot get the company values to show up, just the counts. Ideally the results would be:
Company XYZ 2
Company ABC 1
What statement should replace count(company) to show the actual values within the field to summarize it by?
Any help would be greatly appreciated to get me to the correct results... This is simple enough to do in Excel.
You seem to want to add company to the select:
select company, count(*), count(EMAIL), count(distinct EMAIL)
from "email_db"
group by company;
The three columns are:
count(*) -- number of matching rows for the company, even if email is null.
count(email) -- number of matching rows where email is not null.
count(distinct email) -- number of different emails.
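As a quick illustration of how the three counts differ, here is a small sketch using Python's sqlite3 module in place of the real database. The rows are invented, with one NULL email and one repeated email so that each count comes out different for Company XYZ.

```python
import sqlite3

# Hypothetical data: Company XYZ has 4 rows, 3 non-NULL emails, 2 distinct emails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_db (company TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO email_db VALUES (?, ?)",
    [("Company XYZ", "x1@example.com"),
     ("Company XYZ", "x1@example.com"),
     ("Company XYZ", "x2@example.com"),
     ("Company XYZ", None),
     ("Company ABC", "a1@example.com")],
)

# Selecting the grouping column is what makes the company names show up.
summary = conn.execute(
    "SELECT company, COUNT(*), COUNT(email), COUNT(DISTINCT email) "
    "FROM email_db GROUP BY company ORDER BY company"
).fetchall()

for row in summary:
    print(row)
```

The last column is the per-company distinct count the question asks for (2 for Company XYZ, 1 for Company ABC).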

SQL SELECT Full Row with Duplicated Data in One Column

I am using Microsoft SQL Server 2014.
I am able to list emails which are duplicated.
But I am unable to list the entire row, which contain other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
May I know how can I list all field in the rows that contains Email appearing more than once in the table?
Thank you.
Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1
select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )
SQL Fiddle
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we then select only those records with noDuplicates greater than 1; i.e. where there are multiple records with the same email address.
Finally I've sorted by Email and EmployeeId, so that duplicates are listed alongside one another and are presented in the sequence in which they were (presumably) created; this just makes life easier for whoever then deals with the results.
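The explanation above can be tried out with a small sketch. This one runs the same shape of query through Python's sqlite3 module, which is only a stand-in for SQL Server and needs an SQLite build with window-function support (3.25 or later). The table contents are made up.

```python
import sqlite3

# Hypothetical rows: two employees share ann@example.com.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeProfile (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO EmployeeProfile VALUES (?, ?, ?)",
    [(1, "Ann", "ann@example.com"),
     (2, "Bob", "bob@example.com"),
     (3, "Anne", "ann@example.com"),
     (4, "Cat", "cat@example.com")],
)

# The window count is attached to every row, then filtered in the outer query.
dupes = conn.execute("""
    WITH cte AS (
        SELECT *, COUNT(*) OVER (PARTITION BY Email) AS noDuplicates
        FROM EmployeeProfile
    )
    SELECT EmployeeId, FirstName, Email, noDuplicates
    FROM cte
    WHERE noDuplicates > 1
    ORDER BY Email, EmployeeId
""").fetchall()

for row in dupes:
    print(row)
```

Both rows for ann@example.com come back with noDuplicates = 2; the rows with unique emails are filtered out.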
If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);
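A small sketch of the EXISTS form, again using Python's sqlite3 module as a stand-in with invented data. One behaviour worth checking for yourself: because the subquery compares emails with =, rows whose Email is NULL never match each other, so they are not reported as duplicates here.

```python
import sqlite3

# Hypothetical rows: ids 1 and 3 share an email; ids 4 and 5 both have NULL emails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeProfile (EmployeeId INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO EmployeeProfile VALUES (?, ?)",
    [(1, "ann@example.com"), (2, "bob@example.com"), (3, "ann@example.com"),
     (4, None), (5, None)],
)

# A row is kept when some other row (different EmployeeId) has the same Email.
dupe_ids = [
    row[0]
    for row in conn.execute("""
        SELECT ep.EmployeeId
        FROM EmployeeProfile ep
        WHERE EXISTS (SELECT 1
                      FROM EmployeeProfile ep1
                      WHERE ep1.Email = ep.Email
                        AND ep1.EmployeeId <> ep.EmployeeId)
        ORDER BY ep.EmployeeId
    """)
]
print(dupe_ids)
```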

Sql query where column value is unique and don't include

I have table called users with columns
Logins first name last name and email addresses.
What I want to be appeared is
Email which is not unique and don’t show email addresses which starts from “geen” and “na”.
Can someone help me please?
You can use like in the where clause to exclude emails that start with or contain certain strings.
If you just want the email column and the number of duplicates you can use count() aggregation with group by and use having clause to only show results with a count(*) > 1:
select email, count(*) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
group by email
having count(*) > 1
If you want to see all of the row data for those emails that have duplicates you could use a common table expression along with the count(*) over() window aggregation function or row_number()
;with cte as (
select *
, row_number() over (partition by email order by lastname, firstname) as rn
, count(*) over (partition by email) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
)
select *
from cte
where cnt > 1
--or email like '%[0123456789]' /* uncomment to also show emails ending in a number */
order by email, rn
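Here is a minimal sketch of the filtered duplicate query, run through Python's sqlite3 module as a stand-in (it needs an SQLite build with window functions, 3.25 or later). The users rows are invented. Note one side effect of a prefix pattern: 'na%' also excludes addresses such as nadia@example.com, which may or may not be what you want.

```python
import sqlite3

# Hypothetical users; every email except sue@example.com appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, firstname TEXT, lastname TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("u1", "Joe", "Smith", "joe@example.com"),
     ("u2", "Joe", "Adams", "joe@example.com"),
     ("u3", "X", "Y", "geen@example.com"),
     ("u4", "X", "Z", "geen@example.com"),
     ("u5", "Nadia", "K", "nadia@example.com"),
     ("u6", "Nadia", "L", "nadia@example.com"),
     ("u7", "Sue", "M", "sue@example.com")],
)

# Filter out the unwanted prefixes first, then keep only emails seen more than once.
result = conn.execute("""
    WITH cte AS (
        SELECT login, email,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY lastname, firstname) AS rn,
               COUNT(*) OVER (PARTITION BY email) AS cnt
        FROM users
        WHERE email NOT LIKE 'geen%'
          AND email NOT LIKE 'na%'
    )
    SELECT login, email, rn
    FROM cte
    WHERE cnt > 1
    ORDER BY email, rn
""").fetchall()

for row in result:
    print(row)
```

Only the two joe@example.com rows survive, numbered by last name.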
You can use this:
SELECT email, COUNT(email) AS qty
FROM users
WHERE email NOT LIKE 'geen%'
AND email NOT LIKE 'na%'
GROUP BY email
HAVING ( COUNT(email) > 1 )
SELECT *
FROM USERS
WHERE EMAIL NOT LIKE 'geen%'
AND EMAIL NOT LIKE 'na%'

GroupBy Query Which Shows Related Data In Addition to Grouping Clause

Ugh, I know it's a terrible title, but I can't think of a way to summarize my question in a simple statement. It's a fairly basic T-SQL query question but I haven't used T-SQL much in the last year or so and my brain simply doesn't want to work today.
Basically I have a table with usernames (email address) and a client id. There can't be multiple emails per client, but there can be multiple emails for different clients. I'm trying to do a group on email addresses to get a count of how many emails are associated with 1 or more clients - that's the easy part. Where I'm struggling is trying to also list which client ids the email address is associated to.
For example, I have this query which gives me 1/2 of what I'm looking for:
select UserName, COUNT(*)
from UserTable
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
But I would also like to have either a row-per client, or even just multiple new columns showing each client id the email address is associated with such as:
user1@test.com 3
user1@test.com 34
user1@test.com 9
OR
user1@test.com 3 34 9
Any assistance is appreciated.
If you're using SQL-Server, you can use the COUNT window function:
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
Then to pick out only those with a count greater than 1:
SELECT * FROM (
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
) rows
WHERE rows.Counts > 1
To get them into the second format, you'd need to use some row concatenation strategy - FOR XML PATH is a popular one.
You can use FOR XML PATH:
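Select UserName, COUNT(*) AS Cnt,
substring(
(
-- CAST is needed when clientID is numeric; ',' + int would raise a conversion error
Select ','+CAST(UTI.clientID AS varchar(20)) AS [text()]
From UserTable UTI
Where UTI.UserName = UTO.UserName
ORDER BY UTI.clientID
For XML PATH ('')
), 2, 1000) AS ClientIds
From UserTable UTO
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc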
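To show the shape of the concatenated result, here is a sketch using Python's sqlite3 module. SQLite has no FOR XML PATH, so this swaps in SQLite's group_concat aggregate as a stand-in for the row-concatenation step. The table contents and the ClientId column name are assumptions based on the question's example.

```python
import sqlite3

# Hypothetical data matching the question's example: one email tied to clients 3, 34 and 9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserTable (UserName TEXT, ClientId INTEGER)")
conn.executemany(
    "INSERT INTO UserTable VALUES (?, ?)",
    [("user1@test.com", 3), ("user1@test.com", 34), ("user1@test.com", 9),
     ("user2@test.com", 5)],
)

# group_concat plays the role that FOR XML PATH('') plays in the T-SQL answer.
rows = conn.execute("""
    SELECT UserName, COUNT(*) AS cnt, group_concat(ClientId) AS client_ids
    FROM UserTable
    GROUP BY UserName
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
""").fetchall()

# The order inside group_concat is not guaranteed, so sort the ids after splitting.
user, cnt, client_ids = rows[0]
sorted_ids = sorted(int(x) for x in client_ids.split(","))
print(user, cnt, sorted_ids)
```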

SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?
For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).
Sample data:
1 01/01/2008 stuff@stuff.com
2 02/01/2008 stuff@stuff.com
3 01/12/2008 noone@stuff.com
something like this
select email ,max(receiveddate) as MaxDate
from YourTable
group by email
having count(email) > 1
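Here is a sketch of that query against the sample data, using Python's sqlite3 module as a stand-in. The dates are stored as ISO strings, which is an assumption about how to read the sample (the original format is ambiguous). The second query joins back to the table to recover the full row, including the primary key.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text (assumed reading).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, receiveddate TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "2008-01-01", "stuff@stuff.com"),
     (2, "2008-02-01", "stuff@stuff.com"),
     (3, "2008-01-12", "noone@stuff.com")],
)

# Latest received date for each email that occurs more than once.
latest = conn.execute("""
    SELECT email, MAX(receiveddate) AS MaxDate
    FROM YourTable
    GROUP BY email
    HAVING COUNT(email) > 1
""").fetchall()

# Join back to get the whole row for that latest duplicate.
latest_rows = conn.execute("""
    SELECT t.id, t.receiveddate, t.email
    FROM YourTable t
    JOIN (SELECT email, MAX(receiveddate) AS MaxDate
          FROM YourTable
          GROUP BY email
          HAVING COUNT(email) > 1) d
      ON d.email = t.email AND d.MaxDate = t.receiveddate
""").fetchall()

print(latest, latest_rows)
```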
Try something like:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ReceivedDate DESC) AS RowNumber
FROM EmailTable
) a
WHERE RowNumber = 1
See http://www.technicaloverload.com/working-with-duplicates-in-sql-server/
Couldn't you join the list on the e-mail field and then see what nulls you get in your result?
Or better yet, count the instances of each e-mail address? And only return the ones with count > 1
Or even take the email and id fields. And return the entries where the e-mail is the same, and the IDs are different. (To avoid duplicates don't use != but rather either < or >.)
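The point about < versus != can be seen in a short sketch. It uses Python's sqlite3 module with made-up rows, and the table and column names are placeholders. With three rows sharing an email, an inequality join reports each pair twice, while < reports each pair once.

```python
import sqlite3

# Hypothetical rows: ids 1, 2 and 4 share the same email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "stuff@stuff.com"), (2, "stuff@stuff.com"),
     (3, "noone@stuff.com"), (4, "stuff@stuff.com")],
)

def pairs(op):
    # op is one of the two hard-coded operators used below, never user input.
    sql = (
        "SELECT a.id, b.id FROM mytable a JOIN mytable b "
        f"ON a.email = b.email AND a.id {op} b.id ORDER BY a.id, b.id"
    )
    return conn.execute(sql).fetchall()

both_directions = pairs("<>")  # each pair appears as (x, y) and (y, x)
one_direction = pairs("<")     # each pair appears once

print(len(both_directions), one_direction)
```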
SELECT [id], [receiveddate], [email]
FROM [myTable]
WHERE [email] IN ( SELECT [email]
FROM [myTable]
GROUP BY [email]
HAVING COUNT([email]) > 1 )
Do you want a list of the last items? If so you could use:
SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE tCheck.date > t.date)
If you want a list of all duplicate email addresses, use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:
SELECT [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1
If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":
SELECT TOP 1 [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1 ORDER BY MAX([date]) DESC
If you have a surrogate key, it is relatively easy to use the group by syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.
Example code to delete duplicate records (the column types are illustrative):
CREATE TABLE people (ID int PRIMARY KEY, Name varchar(100), Address varchar(200), DOB date)
-- keep the lowest ID in each group of identical rows, delete the rest
DELETE FROM people WHERE ID NOT IN (
SELECT MIN(ID) FROM people GROUP BY Name, Address, DOB
)
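A runnable sketch of the same delete, using Python's sqlite3 module as a stand-in and invented rows. It keeps the row with the lowest id in each group of identical (name, address, dob) values and deletes the others.

```python
import sqlite3

# Hypothetical rows: ids 1, 2 and 4 are the same person entered three times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, "Ann", "1 Main St", "1990-01-01"),
     (2, "Ann", "1 Main St", "1990-01-01"),
     (3, "Bob", "2 Oak Ave", "1985-05-05"),
     (4, "Ann", "1 Main St", "1990-01-01")],
)

# Delete everything except the first (lowest id) row of each duplicate group.
deleted = conn.execute("""
    DELETE FROM people
    WHERE id NOT IN (SELECT MIN(id) FROM people GROUP BY name, address, dob)
""").rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
print(deleted, remaining)
```

On a real table it is worth running the inner SELECT on its own first, to check which rows would be kept before deleting anything.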
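Try this (assuming the table has an id column; [table] stands in for the real table name):
select * from [table] a, [table] b
where a.email = b.email
and a.id < b.id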