Finding duplicate email addresses - sql

I need to find duplicate emails in our database. I am looking in one table for this information. What I have so far:
SELECT name.email, name.ID
From Name
Group BY Name.ID, Name.EMAIL
Having Count(*) > 1
I know that it's wrong, but I'm not sure how to write it correctly.

Remove the ID:
SELECT Name.email
FROM Name
GROUP BY Name.email
HAVING COUNT(*) > 1
If you also want the number of times each email appears:
SELECT Name.email, COUNT(*) AS totalEmailCount
FROM Name
GROUP BY Name.email
HAVING COUNT(*) > 1
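To try the queries above without a database server, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. SQLite only stands in for your real database and the rows are invented; the table and column names come from the question. It also shows that grouping by ID as well finds nothing.

```python
import sqlite3

# In-memory table shaped like the question's Name table (rows are made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (ID INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Name (ID, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"),
     (4, "c@example.com"), (5, "a@example.com"), (6, "c@example.com")],
)

# Group by email only, then keep the groups that contain more than one row.
duplicates = conn.execute(
    """
    SELECT email, COUNT(*) AS totalEmailCount
    FROM Name
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
    """
).fetchall()
print(duplicates)  # [('a@example.com', 3), ('c@example.com', 2)]

# The original attempt: every (ID, email) group has exactly one row, so nothing qualifies.
by_id = conn.execute(
    "SELECT email, ID FROM Name GROUP BY ID, email HAVING COUNT(*) > 1"
).fetchall()
print(by_id)  # []
```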

The query would be
SELECT name.email, COUNT(*) FROM Name
GROUP BY Name.email HAVING COUNT(*) > 1
The important point is that if you also group by ID, each group contains exactly one row (IDs are unique), so the count is always 1; that's why your query didn't work.
If you need the IDs of the users whose emails are duplicated, you can do this:
select Name.ID, Name.Email from Name where Name.Email in (
SELECT name.email FROM Name
GROUP BY Name.email HAVING COUNT(*) > 1
)
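A sketch of this IN-subquery version, again using Python's sqlite3 module as a stand-in database with invented rows. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (ID INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Name (ID, Email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"),
     (3, "a@example.com"), (4, "c@example.com")],
)

# The inner query finds the duplicated emails; the outer query returns every row using one.
rows = conn.execute(
    """
    SELECT Name.ID, Name.Email
    FROM Name
    WHERE Name.Email IN (
        SELECT Email FROM Name
        GROUP BY Email HAVING COUNT(*) > 1
    )
    ORDER BY Name.ID
    """
).fetchall()
print(rows)  # [(1, 'a@example.com'), (3, 'a@example.com')]
```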

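The SQL query below returns each duplicated EMAIL together with the ID of its first (lowest-ID) row. ID has to be wrapped in an aggregate such as MIN(): a bare ID that is neither grouped nor aggregated is rejected by most databases, and where it is allowed (for example MySQL with ONLY_FULL_GROUP_BY disabled) the ID returned is arbitrary.
select MIN(ID) as ID, EMAIL from Name group by EMAIL having count(EMAIL) > 1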
If you only want the duplicated EMAIL values from the Name table, run the following SQL query:
select EMAIL from Name group by EMAIL having count(EMAIL) > 1
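Note that SQL keywords are case-insensitive (SELECT and select are equivalent). Whether identifiers and string comparisons, including the email values themselves, are case-sensitive depends on the database and its collation.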
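Whether Bob@Example.com and bob@example.com count as duplicates is decided by the collation, not by SQL keyword rules. A small sketch with Python's sqlite3 module, whose default BINARY collation is case-sensitive; SQL Server and MySQL often default to case-insensitive collations, so a plain GROUP BY may already merge the two there. Grouping on LOWER(email) is one portable option. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (ID INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Name (ID, email) VALUES (?, ?)",
    [(1, "Bob@Example.com"), (2, "bob@example.com"), (3, "ann@example.com")],
)

# SQLite's default BINARY collation is case-sensitive, so the two spellings form separate groups.
exact = conn.execute(
    "SELECT email FROM Name GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Grouping on LOWER(email) treats them as one address.
normalized = conn.execute(
    """
    SELECT LOWER(email) AS email_key, COUNT(*) AS cnt
    FROM Name
    GROUP BY LOWER(email)
    HAVING COUNT(*) > 1
    """
).fetchall()

print(exact)       # []
print(normalized)  # [('bob@example.com', 2)]
```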

Here you go:
SELECT name.email, COUNT(*)
FROM
Name
GROUP BY
Name.email
HAVING
COUNT(*) > 1

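select id, email from
(select id, email, count(email) over (partition by email order by id) cnt from name) t
where cnt > 1
Because of the ORDER BY inside OVER, cnt is a running count, so this returns every occurrence of a duplicated email except the first (lowest id) one. Drop the ORDER BY to return all rows that share a duplicated email.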
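A sketch of the window-function approach using Python's sqlite3 module (window functions need SQLite 3.25 or later; the rows are invented). It runs the query with and without ORDER BY inside OVER to show the difference between a running count and a whole-partition count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions require SQLite 3.25+
conn.execute("CREATE TABLE name (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO name (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"),
     (3, "a@example.com"), (4, "a@example.com")],
)

QUERY = """
    SELECT id, email FROM (
        SELECT id, email,
               COUNT(email) OVER (PARTITION BY email {order}) AS cnt
        FROM name
    ) AS t
    WHERE cnt > 1
    ORDER BY id
"""

# ORDER BY inside OVER makes cnt a running count: the first copy (cnt = 1) is skipped.
later_copies = conn.execute(QUERY.format(order="ORDER BY id")).fetchall()

# Without it, cnt is the size of the whole partition: every copy qualifies.
all_copies = conn.execute(QUERY.format(order="")).fetchall()

print(later_copies)  # [(3, 'a@example.com'), (4, 'a@example.com')]
print(all_copies)    # [(1, 'a@example.com'), (3, 'a@example.com'), (4, 'a@example.com')]
```

The running-count form is handy when you want to delete the extra copies and keep the first one.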

All the answers posted here use GROUP BY. The query can also be written as a self-join:
select distinct f.email
from Name f
join Name s
on f.email = s.email and f.id <> s.id
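A sketch of the self-join using Python's sqlite3 module with invented rows. It counts the raw pairs before DISTINCT to show why DISTINCT is needed: each email with n copies produces n × (n − 1) ordered pairs, which is also why GROUP BY is often the cheaper option for heavily duplicated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Name (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"),
     (4, "c@example.com"), (5, "c@example.com"), (6, "c@example.com")],
)

# Each row is paired with every other row (different id) that has the same email.
pairs = conn.execute(
    """
    SELECT f.id, s.id
    FROM Name f
    JOIN Name s ON f.email = s.email AND f.id <> s.id
    """
).fetchall()

# DISTINCT collapses those pairs back to one row per duplicated email.
emails = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT f.email
    FROM Name f
    JOIN Name s ON f.email = s.email AND f.id <> s.id
    ORDER BY f.email
    """
)]

print(len(pairs))  # 8: two ordered pairs for a@..., six for c@...
print(emails)      # ['a@example.com', 'c@example.com']
```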

There's no need to qualify the columns with the table name when the query reads from a single table:
SELECT email
From Name
Group BY EMAIL
Having Count(*) > 1

Related

sql duplicates showing all data

Given this data:
id  Name  group
1   Jhon  001
2   Paul  002
3   Mary  001
How can I get the duplicate values while showing all the fields? Only group is duplicated; id and name won't be duplicates.
The result should look like either of these:
group  count  values
001    2      1,3
or:
id  name  group
1   Jhon  001
3   Mary  001
I tried:
SELECT
group, COUNT(*)
FROM
people
GROUP BY
group
HAVING
COUNT(*) > 1
But if I add id and name to the GROUP BY, it doesn't find any duplicates.
Thanks in advance.
Try this.
SELECT Id, Name, [Group]
FROM people
WHERE [Group] IN(
SELECT [Group]
FROM people
GROUP BY [Group]
HAVING COUNT(*) > 1)
I would do an inner query to find the groups with more than one member, and then use that inner query to bring back a list of the names.
For example:
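-- group is a reserved word, so it is quoted here (standard SQL double quotes; MySQL uses backticks unless ANSI_QUOTES is enabled)
SELECT id, name, "group"
FROM people
WHERE "group" IN
(SELECT "group"
FROM people
GROUP BY "group"
HAVING COUNT(*) > 1);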
Avoid naming a column group, because GROUP is a reserved keyword in SQL. With the column renamed to groups:
SELECT *
FROM MyTable
WHERE groups IN(
SELECT groups
FROM MyTable
GROUP BY groups
HAVING COUNT(*) > 1)
Just use exists:
select p.*
from people p
where exists (select 1
from people p2
where p2."group" = p."group" and
p2.id <> p.id
);
This is likely the fastest of these solutions. With an index on people(group, id), it should perform very well.
Note: All the advice to avoid using group as a column name is good advice. You should change the name.
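A sketch of the EXISTS answer run against the question's sample data with Python's sqlite3 module. The column keeps the name group, so it is double-quoted throughout; the index name people_group_id is hypothetical and simply mirrors the suggested people(group, id) index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is double-quoted wherever it is used.
conn.execute('CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, "group" TEXT)')
conn.execute('CREATE INDEX people_group_id ON people ("group", id)')  # hypothetical index name
conn.executemany(
    'INSERT INTO people (id, name, "group") VALUES (?, ?, ?)',
    [(1, "Jhon", "001"), (2, "Paul", "002"), (3, "Mary", "001")],
)

# Keep a row when some other row (different id) shares its group.
rows = conn.execute(
    """
    SELECT p.*
    FROM people p
    WHERE EXISTS (SELECT 1
                  FROM people p2
                  WHERE p2."group" = p."group"
                    AND p2.id <> p.id)
    ORDER BY p.id
    """
).fetchall()
print(rows)  # [(1, 'Jhon', '001'), (3, 'Mary', '001')]
```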

Sql query where column value is unique and don't include

I have a table called users with the columns login, first name, last name, and email address.
What I want to show is:
emails that are not unique, excluding email addresses that start with “geen” or “na”.
Can someone help me, please?
You can use LIKE in the WHERE clause to filter out emails that start with or contain certain strings.
If you just want the email column and the number of duplicates, use COUNT() with GROUP BY and a HAVING clause so that only emails with COUNT(*) > 1 are shown:
select email, count(*) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
group by email
having count(*) > 1
If you want to see all of the row data for the emails that have duplicates, you could use a common table expression with the COUNT(*) OVER() window aggregate function or ROW_NUMBER():
;with cte as (
select *
, row_number() over (partition by email order by lastname, firstname) as rn
, count(*) over (partition by email) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
)
select *
from cte
where cnt > 1
--or email like '%[0123456789]' /* uncomment to also show emails ending in a number */
order by email, rn
You can use this:
SELECT email, COUNT(email) AS qty
FROM users
WHERE email NOT LIKE 'geen%'
AND email NOT LIKE 'na%'
GROUP BY email
HAVING ( COUNT(email) > 1 )
SELECT *
FROM USERS
WHERE EMAIL NOT LIKE 'geen%'
AND EMAIL NOT LIKE 'na%'
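A sketch combining the NOT LIKE filter with the duplicate check, using Python's sqlite3 module and an invented users table (the column names login, first_name, last_name, and email are assumptions based on the question). It returns both the counts and the full rows via COUNT(*) OVER (PARTITION BY email).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the window function below needs SQLite 3.25+
conn.execute(
    "CREATE TABLE users (login TEXT, first_name TEXT, last_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("u1", "Jan", "Jansen", "jan@example.com"),
        ("u2", "Jan", "de Vries", "jan@example.com"),
        ("u3", "Kees", "Bakker", "geen@example.com"),
        ("u4", "Piet", "Visser", "geen@example.com"),
        ("u5", "Anna", "Smit", "na@example.com"),
        ("u6", "Eva", "Mulder", "na@example.com"),
        ("u7", "Piet", "Pol", "piet@example.com"),
    ],
)

# Constant filter text (not user input), reused by both queries.
FILTER = "email NOT LIKE 'geen%' AND email NOT LIKE 'na%'"

# Emails that occur more than once after excluding the unwanted prefixes.
counts = conn.execute(
    f"SELECT email, COUNT(email) AS qty FROM users WHERE {FILTER} "
    "GROUP BY email HAVING COUNT(email) > 1"
).fetchall()

# Full rows for those emails.
logins = [row[0] for row in conn.execute(
    f"""
    WITH cte AS (
        SELECT *, COUNT(*) OVER (PARTITION BY email) AS cnt
        FROM users
        WHERE {FILTER}
    )
    SELECT login FROM cte WHERE cnt > 1 ORDER BY login
    """
)]

print(counts)  # [('jan@example.com', 2)]
print(logins)  # ['u1', 'u2']
```

Two things to keep in mind: LIKE is case-insensitive for ASCII in SQLite but follows the column collation in other databases, and 'na%' also excludes any address that merely starts with those letters (for example nadia@...).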

using group by operators in sql

I have two columns, email id and customer id, where an email id can be associated with multiple customer ids. I need to list only those email ids (along with their corresponding customer ids) that have more than one customer id. I tried using the GROUPING SETS, ROLLUP, and CUBE operators, but I'm not getting the desired result.
Any help or pointers would be appreciated.
SELECT emailid
FROM
( SELECT emailid, COUNT(custid) AS custcount
FROM table
GROUP BY emailid
HAVING COUNT(custid) > 1
) dup
If I understand your question correctly, I think this will get you what you want:
select emailid, customerid from tablename where emailid in
(
select emailid from tablename group by emailid having count(emailid) > 1
)
It sounds like you need HAVING, e.g.:
SELECT email_id, COUNT(customer_id)
From sometable
GROUP BY email_id
HAVING COUNT(customer_id) > 1
HAVING lets you filter the results after grouping, for example on an aggregate such as COUNT().
WITH email_ids AS (
SELECT email_id, COUNT(customer_id) customer_count
FROM Table
GROUP BY email_id
HAVING count(customer_id) > 1
)
SELECT t.email_id, t.customer_id
FROM Table t
INNER JOIN email_ids ei
ON ei.email_id = t.email_id
If you need a comma-separated list of all the customer ids returned alongside each email id, you could use MySQL's GROUP_CONCAT for that.
This finds every email_id with more than one customer_id and gives you a comma-separated list of all the customer_ids for that email_id:
SELECT email_id, GROUP_CONCAT(customer_id)
FROM your_table
GROUP BY email_id
HAVING count(customer_id) > 1;
Assuming email_id 1 was assigned to customer_ids 1, 2, and 3, your output would look like:
email_id | customer_id
1 | 1,2,3
I didn't realize you were using MS SQL. There's a thread about simulating GROUP_CONCAT in MS SQL: Simulating group_concat MySQL function in Microsoft SQL Server 2005?
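SQLite also provides GROUP_CONCAT (comma is the default separator), so the answer above can be sketched with Python's sqlite3 module; on SQL Server 2017 and later, STRING_AGG plays the same role. The table name cust_email and its rows are hypothetical. SQLite does not guarantee the order of values inside each list, so the sketch sorts them afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_email (email_id INTEGER, customer_id INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO cust_email VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4), (3, 5), (3, 6)],
)

rows = conn.execute(
    """
    SELECT email_id, GROUP_CONCAT(customer_id) AS customer_ids
    FROM cust_email
    GROUP BY email_id
    HAVING COUNT(customer_id) > 1
    ORDER BY email_id
    """
).fetchall()

# The order of values inside each list is not guaranteed, so sort them for display.
result = {email_id: sorted(int(c) for c in ids.split(",")) for email_id, ids in rows}
print(result)  # {1: [1, 2, 3], 3: [5, 6]}
```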
SELECT t1.email, t1.customer
FROM table t1
INNER JOIN (
SELECT email, COUNT(customer) AS customer_count
FROM table
GROUP BY email
HAVING COUNT(customer)>1
) t2 on t1.email = t2.email
This should get you what you're looking for.
As others have said, you can filter GROUP BY results with HAVING. But since you also want the customer ids, join the grouped result back to the original table. It could probably be written more elegantly, but this version is easy to understand.
SELECT
email_id,
STUFF((SELECT ',' + CONVERT(VARCHAR,customer_id) FROM cust_email_table T1 WHERE T1.email_id = T2.email_id
FOR
XML PATH('')
),1,1,'') AS customer_ids
FROM
cust_email_table T2
GROUP BY email_id
HAVING COUNT(*) > 1
This gives you a single row per email id with a comma-separated list of its customer ids.

Can we use join with in same table while using group by function?

For instance, I have a table with columns below:
pk_id,address,first_name,last_name
and I have a query like this to display the first and last names that are repeated (duplicates):
select first_name,last_name
from table
group by first_name,last_name
having count(*)>1;
But that query only returns the first and last names. I also want to display the pk_id and address tied to these duplicate names.
Can we use a join on the same table to do this? Please help!
A simple way to do this is to build an inline view with the pk_id and the count of duplicates. Once you have it, it is only a matter of joining it to the base table and filtering to keep only the rows that have a duplicate:
SELECT T.*
FROM T
JOIN (SELECT "pk_id",
COUNT(*) OVER(PARTITION BY "first_name", "last_name") cnt
FROM T) V
ON T."pk_id" = V."pk_id"
WHERE cnt > 1
See http://sqlfiddle.com/#!4/3ecd0/9
You have to call it from an outer query, like this:
select * from table
where first_name||last_name in
(select first_name||last_name from
(select first_name, last_name, count( * )
from table
group by first_name,last_name
having count( * ) > 1
)
)
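Note: you may not need to concatenate the two fields (I haven't tested that). Be aware that concatenation can give false matches, because different name pairs can produce the same string.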
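A sketch comparing the concatenation approach above with joining on both columns (as in the CTE answer that follows), using Python's sqlite3 module and deliberately artificial names: 'ab'||'c' and 'a'||'bc' both produce 'abc', so the concatenated key flags a row that is not a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (pk_id INTEGER PRIMARY KEY, address TEXT, "
    "first_name TEXT, last_name TEXT)"
)
# Artificial names: ('ab', 'c') is duplicated, ('a', 'bc') is not.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, "addr 1", "ab", "c"), (2, "addr 2", "ab", "c"),
     (3, "addr 3", "a", "bc"), (4, "addr 4", "x", "y")],
)

# Concatenated key: 'a' || 'bc' = 'abc' collides with the real duplicate 'ab' || 'c'.
concat_ids = [r[0] for r in conn.execute(
    """
    SELECT pk_id FROM my_table
    WHERE first_name || last_name IN (
        SELECT first_name || last_name FROM my_table
        GROUP BY first_name, last_name HAVING COUNT(*) > 1)
    ORDER BY pk_id
    """
)]

# Joining on both columns separately only matches true duplicates.
join_ids = [r[0] for r in conn.execute(
    """
    WITH my_duplicates AS (
        SELECT first_name, last_name FROM my_table
        GROUP BY first_name, last_name HAVING COUNT(*) > 1)
    SELECT bb.pk_id
    FROM my_duplicates aa
    JOIN my_table bb
      ON aa.first_name = bb.first_name AND aa.last_name = bb.last_name
    ORDER BY bb.pk_id
    """
)]

print(concat_ids)  # [1, 2, 3]  (row 3 is a false match)
print(join_ids)    # [1, 2]
```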
with
my_duplicates as
(
select
first_name,
last_name
from
my_table
group by
first_name,
last_name
having
count(*) > 1
)
select
bb.pk_id,
bb.address,
bb.first_name,
bb.last_name
from
my_duplicates aa
join my_table bb on
(
aa.first_name = bb.first_name
and
aa.last_name = bb.last_name
)
order by
bb.last_name,
bb.first_name,
bb.pk_id

SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?
For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).
Sample data:
1 01/01/2008 stuff#stuff.com
2 02/01/2008 stuff#stuff.com
3 01/12/2008 noone#stuff.com
Something like this:
select email ,max(receiveddate) as MaxDate
from YourTable
group by email
having count(email) > 1
Try something like:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ReceivedDate DESC) AS RowNumber,
COUNT(*) OVER (PARTITION BY Email) AS EmailCount
FROM EmailTable
) a
WHERE RowNumber = 1 AND EmailCount > 1
See http://www.technicaloverload.com/working-with-duplicates-in-sql-server/
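A sketch of the ROW_NUMBER() approach run on the question's sample rows with Python's sqlite3 module. Two assumptions: the sample dates are read as MM/DD/YYYY and stored as ISO text so they sort correctly, and the table is named EmailTable with an id column. Partitioning by Email alone and ordering by ReceivedDate DESC puts the latest row of each email first; the COUNT(*) window keeps only emails that occur more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute(
    "CREATE TABLE EmailTable (id INTEGER PRIMARY KEY, ReceivedDate TEXT, Email TEXT)"
)
# The question's sample rows, with MM/DD/YYYY dates rewritten as ISO text.
conn.executemany(
    "INSERT INTO EmailTable VALUES (?, ?, ?)",
    [(1, "2008-01-01", "stuff#stuff.com"),
     (2, "2008-02-01", "stuff#stuff.com"),
     (3, "2008-01-12", "noone#stuff.com")],
)

rows = conn.execute(
    """
    SELECT id, ReceivedDate, Email FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ReceivedDate DESC) AS RowNumber,
               COUNT(*) OVER (PARTITION BY Email) AS EmailCount
        FROM EmailTable
    ) AS a
    WHERE RowNumber = 1 AND EmailCount > 1
    """
).fetchall()
print(rows)  # [(2, '2008-02-01', 'stuff#stuff.com')]
```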
Couldn't you join the list on the e-mail field and then see what nulls you get in your result?
Or better yet, count the instances of each e-mail address and return only the ones with a count > 1.
Or take the email and id fields and return the pairs of entries where the e-mail is the same and the IDs differ. (To avoid listing each pair twice, compare the IDs with < or > rather than !=.)
SELECT [id], [receiveddate], [email]
FROM [mytable]
WHERE [email] IN ( SELECT [email]
FROM [myTable]
GROUP BY [email]
HAVING COUNT([email]) > 1 )
Do you want a list of the last items? If so you could use:
SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE tCheck.[email] = t.[email] AND tCheck.[date] > t.[date])
If you want a list of all duplicate email address use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:
SELECT [email] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1
If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":
SELECT TOP 1 [email], MAX([date]) AS [last_date] FROM [table] GROUP BY [email] HAVING COUNT(*) > 1 ORDER BY MAX([date]) DESC
If you have a surrogate key, it is relatively easy to use the GROUP BY syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.
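Example that deletes duplicate records, keeping the row with the lowest ID in each group of identical Name, Address, and DOB values (the column types are illustrative):
CREATE TABLE people (ID INT PRIMARY KEY, Name VARCHAR(100), Address VARCHAR(200), DOB DATE);
DELETE FROM people WHERE ID NOT IN (
SELECT MIN(ID) FROM people GROUP BY Name, Address, DOB
);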
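A sketch of the delete-duplicates statement run with Python's sqlite3 module on invented rows, checking which IDs survive. Note that MySQL rejects a DELETE whose subquery reads the same table (error 1093), so this exact form targets SQL Server, SQLite, and similar databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, DOB TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, "Ann", "1 Main St", "1980-01-01"),
     (2, "Ann", "1 Main St", "1980-01-01"),
     (3, "Bob", "2 Oak Ave", "1975-05-05"),
     (4, "Ann", "1 Main St", "1980-01-01"),
     (5, "Bob", "9 Elm Rd", "1975-05-05")],
)

# Keep the lowest ID of each (Name, Address, DOB) group and delete the rest.
conn.execute(
    """
    DELETE FROM people
    WHERE ID NOT IN (
        SELECT MIN(ID) FROM people GROUP BY Name, Address, DOB
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT ID FROM people ORDER BY ID")]
print(remaining)  # [1, 3, 5]
```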
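Try this (assuming the primary key column is named id; without the id check, every row matches itself):
select * from [table] a, [table] b
where a.email = b.email and a.id <> b.id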