Sql query where column value is unique and don't include - sql

I have a table called users with the columns login, first name, last name, and email address.
What I want to show is the emails that are not unique, excluding email addresses that start with "geen" or "na".
Can someone help me, please?

You can use LIKE in the WHERE clause to exclude emails that start with or contain certain strings.
If you just want the email column and the number of duplicates, you can use the count() aggregate with GROUP BY and a HAVING clause to show only results with count(*) > 1:
select email, count(*) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
group by email
having count(*) > 1
If you want to see all of the row data for those emails that have duplicates you could use a common table expression along with the count(*) over() window aggregation function or row_number()
;with cte as (
select *
, row_number() over (partition by email order by lastname, firstname) as rn
, count(*) over (partition by email) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
)
select *
from cte
where cnt > 1
--or email like '%[0123456789]' /* uncomment to also show emails ending in a number */
order by email, rn
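To try both approaches without a database server, here is a minimal sketch that runs equivalent queries against an in-memory SQLite database from Python. The sample rows and the column names (login, firstname, lastname, email) are assumptions made for illustration, and SQLite stands in for whatever DBMS the question actually uses. The window function needs SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data; column names are assumed from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (login TEXT, firstname TEXT, lastname TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("a1", "Ann", "Smith", "ann@example.com"),
        ("a2", "Ann", "Smith", "ann@example.com"),
        ("b1", "Bob", "Jones", "bob@example.com"),
        ("g1", "Guus", "Geen", "geen@example.com"),
        ("g2", "Gerd", "Geen", "geen@example.com"),
        ("n1", "Nora", "Na", "na@example.com"),
        ("n2", "Nick", "Na", "na@example.com"),
    ],
)

# Grouped version: one row per duplicated email, skipping 'geen...' and 'na...'.
duplicate_emails = conn.execute(
    """
    SELECT email, COUNT(*) AS cnt
    FROM users
    WHERE email NOT LIKE 'geen%'
      AND email NOT LIKE 'na%'
    GROUP BY email
    HAVING COUNT(*) > 1
    """
).fetchall()

# Windowed version: every row whose email occurs more than once.
duplicate_rows = conn.execute(
    """
    SELECT login, email
    FROM (
        SELECT login, email, COUNT(*) OVER (PARTITION BY email) AS cnt
        FROM users
        WHERE email NOT LIKE 'geen%'
          AND email NOT LIKE 'na%'
    ) AS t
    WHERE cnt > 1
    ORDER BY email, login
    """
).fetchall()

print(duplicate_emails)
print(duplicate_rows)
```

The grouped query returns one line per duplicated address, while the windowed query returns each underlying row.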

You can use this:
SELECT email, COUNT(email) AS qty
FROM users
WHERE email NOT LIKE 'geen%'
AND email NOT LIKE 'na%'
GROUP BY email
HAVING ( COUNT(email) > 1 )

SELECT *
FROM USERS
WHERE EMAIL NOT LIKE 'geen%'
AND EMAIL NOT LIKE 'na%'

Related

Postgresql query: update status of limit number of records based on group size

I have a PostgreSQL table that contains a list of email addresses. The table has three columns: Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, yahoo.com.ca, etc.), and Valid (boolean).
Now I want to group those emails by EmailServer and then set Valid = true on the first 3 records of each large group (count >= 6), while leaving the rest of each group as Valid = false.
The query below did not produce the output I wanted:
UPDATE public."EmailContacts"
SET "Valid"=true
WHERE "EmailServer" IN (
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
LIMIT 5)
Please help me modify it to get the expected results. Any help would be greatly appreciated!
WITH major_servers AS (
SELECT email_server
FROM email_address
GROUP by email_server
HAVING count(*) >=6
),
enumerated_emails AS (
SELECT email,
email_server,
row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
FROM email_address
WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
FROM enumerated_emails ee
WHERE ee.row_number <= 3);
The first CTE, major_servers, finds the major groups, i.e. the email servers that have at least 6 email addresses.
The second CTE, enumerated_emails, numbers the emails that belong to those major groups using the window function row_number(). The numbering follows ORDER BY email (see the TODO comment; you should probably choose a different ORDER BY criterion).
The last query updates the first 3 rows in each major server group.
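The three steps can be checked on a toy data set. The sketch below is an assumption-laden stand-in: it uses SQLite through Python instead of PostgreSQL, shortens the row number alias to rn, stores valid as 0/1, and invents the sample addresses. It needs SQLite 3.25 or newer for row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_address ("
    "email TEXT PRIMARY KEY, email_server TEXT, valid INTEGER DEFAULT 0)"
)
# Hypothetical data: gmail.com has 6 addresses (a major group), msn.com only 2.
rows = [(f"user{i}@gmail.com", "gmail.com") for i in range(1, 7)]
rows += [("x@msn.com", "msn.com"), ("y@msn.com", "msn.com")]
conn.executemany(
    "INSERT INTO email_address (email, email_server) VALUES (?, ?)", rows
)

conn.execute(
    """
    WITH major_servers AS (
        -- servers with at least 6 addresses
        SELECT email_server
        FROM email_address
        GROUP BY email_server
        HAVING COUNT(*) >= 6
    ),
    enumerated_emails AS (
        -- number the addresses inside each major server
        SELECT email,
               ROW_NUMBER() OVER (PARTITION BY email_server ORDER BY email) AS rn
        FROM email_address
        WHERE email_server IN (SELECT email_server FROM major_servers)
    )
    UPDATE email_address
    SET valid = 1
    WHERE email IN (SELECT email FROM enumerated_emails WHERE rn <= 3)
    """
)

valid_emails = [
    r[0]
    for r in conn.execute(
        "SELECT email FROM email_address WHERE valid = 1 ORDER BY email"
    )
]
print(valid_emails)
```

Only the first three gmail.com addresses end up flagged; the msn.com group is too small and stays untouched.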
You need to get the servers, then order the emails within each one, and then perform the update. Something like this:
WITH DataSourceServers AS
(
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
),DataSourceEmails AS
(
SELECT "Email", "EmailServer", row_number() OVER (PARTITION BY "EmailServer" ORDER BY "Email") AS rn
FROM public."EmailContacts"
WHERE "EmailServer" IN (SELECT "EmailServer" FROM DataSourceServers)
)
-- PostgreSQL: alias the target table and join it to the CTE via FROM ... WHERE
UPDATE public."EmailContacts" E
SET "Valid" = true
FROM DataSourceEmails SE
WHERE E."EmailServer" = SE."EmailServer"
AND E."Email" = SE."Email"
AND SE.rn <= 3;

SQL SELECT Full Row with Duplicated Data in One Column

I am using Microsoft SQL Server 2014.
I am able to list the emails that are duplicated.
But I am unable to list the entire row, which contains other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
How can I list all fields of the rows whose Email appears more than once in the table?
Thank you.
Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1
select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we then select only those records with noDuplicates greater than 1, i.e. where there are multiple records with the same email address.
Finally, I've sorted by Email and EmployeeId so that duplicates are listed alongside one another and are presented in the sequence in which they were (presumably) created, just to make life easy for whoever then deals with these results.
If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);
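As a rough check that the window-count approach and the EXISTS approach return the same rows, here is a small sketch using SQLite from Python in place of SQL Server. The table contents and the reduced column list are made up for illustration, and EmployeeId is assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeProfile ("
    "EmployeeId INTEGER PRIMARY KEY, Username TEXT, Email TEXT)"
)
# Hypothetical rows: two duplicated emails and one unique email.
conn.executemany(
    "INSERT INTO EmployeeProfile VALUES (?, ?, ?)",
    [
        (1, "amy", "amy@example.com"),
        (2, "amy2", "amy@example.com"),
        (3, "ben", "ben@example.com"),
        (4, "cat", "cat@example.com"),
        (5, "cat2", "cat@example.com"),
    ],
)

# Approach 1: count per email with a window function, then filter.
window_rows = conn.execute(
    """
    SELECT EmployeeId, Username, Email
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY Email) AS cnt
        FROM EmployeeProfile
    ) AS t
    WHERE cnt > 1
    ORDER BY Email, EmployeeId
    """
).fetchall()

# Approach 2: keep a row if another row shares its email.
exists_rows = conn.execute(
    """
    SELECT ep.EmployeeId, ep.Username, ep.Email
    FROM EmployeeProfile ep
    WHERE EXISTS (
        SELECT 1
        FROM EmployeeProfile ep1
        WHERE ep1.Email = ep.Email
          AND ep1.EmployeeId <> ep.EmployeeId
    )
    ORDER BY ep.Email, ep.EmployeeId
    """
).fetchall()

print(window_rows == exists_rows)
```

Both queries list employees 1, 2, 4 and 5 here; which one performs better on a real table depends on indexes and data volume.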

GroupBy Query Which Shows Related Data In Addition to Grouping Clause

Ugh, I know it's a terrible title, but I can't think of a way to summarize my question in a simple statement. It's a fairly basic T-SQL query question but I haven't used T-SQL much in the last year or so and my brain simply doesn't want to work today.
Basically I have a table with usernames (email address) and a client id. There can't be multiple emails per client, but there can be multiple emails for different clients. I'm trying to do a group on email addresses to get a count of how many emails are associated with 1 or more clients - that's the easy part. Where I'm struggling is trying to also list which client ids the email address is associated to.
For example, I have this query which gives me 1/2 of what I'm looking for:
select UserName, COUNT(*)
from UserTable
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
But I would also like to have either a row-per client, or even just multiple new columns showing each client id the email address is associated with such as:
user1@test.com 3
user1@test.com 34
user1@test.com 9
OR
user1@test.com 3 34 9
Any assistance is appreciated.
If you're using SQL-Server, you can use the COUNT window function:
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
Then to pick out only those with a count greater than 1:
SELECT * FROM (
SELECT UserName, UserId, COUNT(UserId) OVER (PARTITION BY UserName) AS Counts
FROM UserTable
) rows
WHERE rows.Counts > 1
To get them into the second format, you'd need to use some row concatenation strategy - FOR XML PATH is a popular one.
You can use FOR XML PATH:
Select UserName, COUNT(*),
substring(
(
Select ',' + CAST(clientID AS varchar(20)) AS [text()] -- CAST is needed if clientID is a numeric column
From UserTable UTI
Where UTI.UserName = UTO.UserName
ORDER BY UTI.clientID
For XML PATH ('')
), 2, 1000)
From UserTable UTO
group by UserName
having COUNT(*) > 1
order by COUNT(*) desc
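The "one row per email with a list of client ids" shape can be illustrated with a string aggregate. The sketch below is not T-SQL: it uses SQLite's GROUP_CONCAT from Python as a stand-in for the FOR XML PATH trick (SQL Server 2017 and later also offer STRING_AGG for the same purpose). The table contents are invented, and the order of values inside the concatenated string is not guaranteed, so the ids are sorted in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserTable (UserName TEXT, clientID INTEGER)")
# Hypothetical rows modelled on the example in the question.
conn.executemany(
    "INSERT INTO UserTable VALUES (?, ?)",
    [
        ("user1@test.com", 3),
        ("user1@test.com", 34),
        ("user1@test.com", 9),
        ("user2@test.com", 5),
    ],
)

rows = conn.execute(
    """
    SELECT UserName,
           COUNT(*) AS cnt,
           GROUP_CONCAT(clientID, ',') AS clients
    FROM UserTable
    GROUP BY UserName
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
    """
).fetchall()

# Parse the comma-separated list back into sorted integers for display.
summary = {
    name: (cnt, sorted(int(c) for c in clients.split(",")))
    for name, cnt, clients in rows
}
print(summary)
```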

How to display all columns associated with duplicate emails in SQL server 2008

I've done some research on looking for a way to filter duplicate emails so all columns display the data associated with these duplicate emails, but can't find an answer to help me with this.
I currently have data pulled using the following code:
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
HAVING
(COUNT(Email) > 1)
Order by Email
It then gives me xxxxxx amount of rows. I then want to be able to pull any data (columns) that are associated with these duplicate emails -and just the duplicates.
SELECT * FROM [marks_party_MasterInvite] .[dbo].[InviteList]
WHERE
Email in(Select Email FROM [marks_party_MasterInvite].[dbo].[InviteList] GROUP BY Email HAVING COUNT(Email)>1)
I know I am doing something wrong, because the row count doesn't match.
So any help would be greatly appreciated!
Thanks guys,
You want to use window functions. The following adds the count to each row. Then you can use a where filter to get all the columns:
SELECT il.*
FROM (select il.*, count(*) over (partition by email) as cnt
from [cem_farmers_masterinvitelist].[dbo].InviteList il
) il
where cnt > 1
Order by Email
The counts don't match because when you fetch every row, you are going to get duplicates. In the first query, you are getting distinct emails.
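The difference between the two counts is easy to see on a tiny table. This is a sketch with invented data, run on SQLite from Python rather than SQL Server 2008; only the Email column of InviteList is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InviteList (Id INTEGER PRIMARY KEY, Email TEXT)")
# Hypothetical data: a@x.com appears 3 times, b@x.com twice, c@x.com once.
conn.executemany(
    "INSERT INTO InviteList (Email) VALUES (?)",
    [("a@x.com",), ("a@x.com",), ("a@x.com",), ("b@x.com",), ("b@x.com",), ("c@x.com",)],
)

# Number of distinct emails that are duplicated (what GROUP BY ... HAVING returns).
distinct_duplicate_emails = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT Email FROM InviteList GROUP BY Email HAVING COUNT(Email) > 1
    ) AS d
    """
).fetchone()[0]

# Number of rows that carry one of those emails (what SELECT * ... IN (...) returns).
duplicate_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM InviteList
    WHERE Email IN (
        SELECT Email FROM InviteList GROUP BY Email HAVING COUNT(Email) > 1
    )
    """
).fetchone()[0]

print(distinct_duplicate_emails, duplicate_rows)
```

Two emails are duplicated, but five rows carry them, so the second number is expected to be larger.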
Join the source table, to the knowledge you just generated about your source:
SELECT *
FROM [cem_farmers_masterinvitelist].[dbo].InviteList src
INNER JOIN
(
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
) qDupes
ON qDupes.email = src.email AND qDupes.dup_count > 1
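The row counts will differ. If you want a total, you can add WITH ROLLUP, which appends a grand-total row to the grouped result. Note that the rollup row is aggregated over all input rows before HAVING is applied, so it counts every row with a non-null Email, not only the duplicated ones; it matches the row count of the second query only if every email is duplicated.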
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email WITH ROLLUP
HAVING
(COUNT(Email) > 1)
Order by Email

Finding duplicate email addresses

I need to find duplicate emails in our database. I am looking on one table for this information. What I have so far
SELECT name.email, name.ID
From Name
Group BY Name.ID, Name.EMAIL
Having Count(*) > 1
I know that it's wrong, but I'm not sure how to write it correctly.
Remove the ID:
SELECT name.email
From Name
Group BY Name.EMAIL
Having Count(*) > 1
If you also want the number of rows per email:
SELECT name.email, COUNT(*) totalEmailCount
From Name
Group BY Name.EMAIL
Having Count(*) > 1
The query would be
SELECT name.email, COUNT(*) FROM Name
GROUP BY Name.email HAVING COUNT(*) > 1
What you need to know is that if you also group by ID, the count of every group is 1 (assuming ID is unique); that's why your query didn't work.
If you need to know the IDs of the users with emails duplicated you can do this:
select Name.ID, Name.Email from Name where Name.Email in (
SELECT name.email FROM Name
GROUP BY Name.email HAVING COUNT(*) > 1
)
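To see why grouping by ID as well returns nothing, here is a minimal sketch on SQLite via Python. The rows are invented and ID is assumed to be unique; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (ID INTEGER PRIMARY KEY, Email TEXT)")
# Hypothetical rows: one duplicated email, one unique email.
conn.executemany(
    "INSERT INTO Name VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com"), (3, "b@example.com")],
)

# Grouping by the unique ID puts every row in its own group, so no count exceeds 1.
by_id_and_email = conn.execute(
    "SELECT Email, ID FROM Name GROUP BY ID, Email HAVING COUNT(*) > 1"
).fetchall()

# Grouping by Email alone exposes the duplicate.
by_email = conn.execute(
    "SELECT Email, COUNT(*) FROM Name GROUP BY Email HAVING COUNT(*) > 1"
).fetchall()

# The IDs can then be recovered with an IN subquery.
ids_of_duplicates = conn.execute(
    """
    SELECT ID, Email
    FROM Name
    WHERE Email IN (
        SELECT Email FROM Name GROUP BY Email HAVING COUNT(*) > 1
    )
    ORDER BY ID
    """
).fetchall()

print(by_id_and_email, by_email, ids_of_duplicates)
```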
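The SQL query below returns the ID and EMAIL of one row for each duplicated EMAIL. It relies on MySQL's non-standard GROUP BY handling (with ONLY_FULL_GROUP_BY disabled); SQL Server, PostgreSQL, and Oracle reject it because ID is neither grouped nor aggregated, and the ID that comes back is not guaranteed to be the first one:
select ID, EMAIL from Name group by EMAIL having count(EMAIL) > 1
If someone wants all the duplicated EMAIL values from the Name table, they can run the following SQL query:
select EMAIL from Name group by EMAIL having count(EMAIL) > 1
Note that SQL keywords are case-insensitive; whether identifiers and string comparisons are case-sensitive depends on the database and its collation.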
Here you go:
SELECT name.email, COUNT(*)
FROM
Name
GROUP BY
Name.email
HAVING
COUNT(*) > 1
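-- the ORDER BY makes this a running count, so the first row of each email is not returned
select id, email from
(select id, email, count(email) over (partition by email order by id) cnt from name) t where cnt > 1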
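One detail of the windowed count is worth checking: with an ORDER BY inside OVER, the default window frame turns the count into a running count. The sketch below compares both forms on invented data, using SQLite (3.25 or newer) from Python as a stand-in for the original DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (id INTEGER PRIMARY KEY, email TEXT)")
# Hypothetical rows: a@example.com three times, b@example.com once.
conn.executemany(
    "INSERT INTO Name VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com"), (3, "a@example.com"), (4, "b@example.com")],
)

# Running count: 1, 2, 3 within a@example.com, so the row with id 1 is filtered out.
running = conn.execute(
    """
    SELECT id, email
    FROM (
        SELECT id, email,
               COUNT(email) OVER (PARTITION BY email ORDER BY id) AS cnt
        FROM Name
    ) AS t
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()

# Full partition count: every a@example.com row gets 3, so all three are kept.
full = conn.execute(
    """
    SELECT id, email
    FROM (
        SELECT id, email,
               COUNT(email) OVER (PARTITION BY email) AS cnt
        FROM Name
    ) AS t
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()

print(running)
print(full)
```

Use the first form to list only the extra copies and the second to list every row involved in a duplicate.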
All the answers posted here use GROUP BY. This query can also be written as a self join:
select distinct f.email
from Name f
join Name s
on f.id <> s.id and f.email = s.email
There is no need to prefix the selected columns with the table name:
SELECT email
From Name
Group BY EMAIL
Having Count(*) > 1