Query to update duplicates - sql

I have the following query, which returns duplicates, and I am running it on SQL Server. I need to write a query that updates the second instance of each email address found by that select query, for example changing
imports#rohnis.com to imports#rohnis.com.duplicate. If it is info#eps.ws, then it becomes info#eps.ws.duplicate. So basically I want to suffix
the email address with the word duplicate. It could be any email address.
Query to search duplicates
SELECT ta.Id
,ta.Email
,ta.ClientCompanyId
FROM [IdentityDB_CSR].[dbo].[User] ta
WHERE (SELECT COUNT(*)
FROM [IdentityDB_CSR].[dbo].[User] ta2
WHERE ta.Email=ta2.Email
AND ta.ClientCompanyId=ta2.ClientCompanyId)>1
Output of the query
Query to update
update [IdentityDB_CSR].[dbo].[User]
set Email = 'info#eps.ws.duplicate'
where id = 87183

You could use an updatable CTE:
with cte as (
select
Email,
row_number() over(partition by Email, ClientCompanyId order by id desc) rn
from [IdentityDB_CSR].[dbo].[User]
)
update cte
set Email = Email + '.duplicate'
where rn > 1
This identifies duplicates as records that share the same Email and ClientCompanyId. Within each such group, the record with the greatest id is left untouched, while '.duplicate' is appended to the Email of every other record.
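To try the idea without a SQL Server instance, here is a minimal sketch that runs the same ranking logic against an in-memory SQLite database through Python's sqlite3 module. The users table and its rows are made up for illustration. SQLite cannot update through a CTE, so this version matches the ranked rows back by Id instead; it assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical stand-in for [IdentityDB_CSR].[dbo].[User]; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (Id INTEGER PRIMARY KEY, Email TEXT, ClientCompanyId INTEGER)"
)
conn.executemany(
    "INSERT INTO users (Id, Email, ClientCompanyId) VALUES (?, ?, ?)",
    [
        (1, "info#eps.ws", 10),
        (2, "info#eps.ws", 10),          # same email and company as Id 1
        (3, "info#eps.ws", 20),          # same email, other company: not a duplicate
        (4, "imports#rohnis.com", 10),
        (5, "imports#rohnis.com", 10),   # same email and company as Id 4
    ],
)

# Rank rows inside each (Email, ClientCompanyId) group, highest Id first,
# then suffix every row that is not ranked first.
conn.execute(
    """
    WITH ranked AS (
        SELECT Id,
               row_number() OVER (
                   PARTITION BY Email, ClientCompanyId ORDER BY Id DESC
               ) AS rn
        FROM users
    )
    UPDATE users
    SET Email = Email || '.duplicate'
    WHERE Id IN (SELECT Id FROM ranked WHERE rn > 1)
    """
)

rows = conn.execute("SELECT Id, Email FROM users ORDER BY Id").fetchall()
for row in rows:
    print(row)
```

With ORDER BY Id DESC the row with the greatest Id in each group keeps its address; switching to ORDER BY Id would keep the oldest row instead.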

Related

Postgresql query: update status of limit number of records based on group size

I have a PostgreSQL table that contains a list of email addresses. The table has three columns: Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, yahoo.com.ca), and Valid (boolean).
Now I want to group those emails by EmailServer and then set Valid = true for the first 3 records of each large group (count >= 6), while leaving the rest of each group as Valid = false.
The query below does not give the output I want:
UPDATE public."EmailContacts"
SET "Valid"=true
WHERE "EmailServer" IN (
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
LIMIT 5)
Please help me modify it to get the expected results. Any help would be greatly appreciated!
WITH major_servers AS (
SELECT email_server
FROM email_address
GROUP by email_server
HAVING count(*) >=6
),
enumerated_emails AS (
SELECT email,
email_server,
row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
FROM email_address
WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
FROM enumerated_emails ee
WHERE ee.row_number <= 3);
The first query, major_servers, finds the major groups, i.e. the email servers that have at least 6 email addresses.
The second query, enumerated_emails, uses the window function row_number() to number the emails that belong to those major groups, ordered by email (see the TODO comment; I think you should choose another ORDER BY criterion).
The last query updates the first 3 rows in each major server group.
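The three steps can be checked on a small data set. The sketch below is only an illustration: it uses an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL, the rows are invented, valid is stored as 0/1, and it assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_address ("
    "email TEXT PRIMARY KEY, email_server TEXT, valid INTEGER DEFAULT 0)"
)

# Invented data: gmail.com has 7 addresses (a large group), msn.com only 3.
sample = [(f"user{i}@gmail.com", "gmail.com") for i in range(1, 8)]
sample += [(f"user{i}@msn.com", "msn.com") for i in range(1, 4)]
conn.executemany(
    "INSERT INTO email_address (email, email_server) VALUES (?, ?)", sample
)

conn.execute(
    """
    WITH major_servers AS (
        -- step 1: servers with at least 6 addresses
        SELECT email_server
        FROM email_address
        GROUP BY email_server
        HAVING count(*) >= 6
    ),
    enumerated_emails AS (
        -- step 2: number the addresses inside each large group
        SELECT email,
               row_number() OVER (PARTITION BY email_server ORDER BY email) AS rn
        FROM email_address
        WHERE email_server IN (SELECT email_server FROM major_servers)
    )
    -- step 3: flag the first 3 addresses of each large group
    UPDATE email_address
    SET valid = 1
    WHERE email IN (SELECT email FROM enumerated_emails WHERE rn <= 3)
    """
)

valid = [
    r[0]
    for r in conn.execute(
        "SELECT email FROM email_address WHERE valid = 1 ORDER BY email"
    )
]
print(valid)
```

Only three gmail.com addresses end up flagged; the msn.com group is too small and stays untouched.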
You need to get the servers, then order the emails within each one, and then perform the update. Something like this:
WITH DataSourceServers AS
(
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
),DataSourceEmails AS
(
-- "EmailServer" must be selected here because the UPDATE joins on it
SELECT "Email", "EmailServer", row_number() OVER (PARTITION BY "EmailServer" ORDER BY "Email") AS rn
FROM public."EmailContacts"
WHERE "EmailServer" IN (SELECT "EmailServer" FROM DataSourceServers)
)
-- PostgreSQL: alias the target table and list only the other source in FROM
UPDATE public."EmailContacts" E
SET "Valid" = true
FROM DataSourceEmails SE
WHERE E."EmailServer" = SE."EmailServer"
AND E."Email" = SE."Email"
AND SE.rn <= 3;

SQL SELECT Full Row with Duplicated Data in One Column

I am using Microsoft SQL Server 2014.
I am able to list emails which are duplicated.
But I am unable to list the entire row, which contain other fields such as EmployeeId, Username, FirstName, LastName, etc.
SELECT Email,
COUNT(Email) AS NumOccurrences
FROM EmployeeProfile
GROUP BY Email
HAVING ( COUNT(Email) > 1 )
How can I list all fields of the rows whose Email appears more than once in the table?
Thank you.
Try this:
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY email) count_calc
FROM EmployeeProfile
)
SELECT *
FROM DataSource
WHERE count_calc > 1
select distinct * from EmployeeProfile where email in (SELECT
Email
FROM EmployeeProfile
GROUP BY Email
HAVING COUNT(*) > 1 )
SQL Fiddle
with cte as (
select *
, count(1) over (partition by email) noDuplicates
from Demo
)
select *
from cte
where noDuplicates > 1
order by Email, EmployeeId
Explanation:
I've used a common table expression (cte) here; but you could equally use a subquery; it makes no difference.
This cte/subquery fetches every row, and includes a new field called noDuplicates which says how many records have that same email address (including the record itself; so noDuplicates=1 actually means there are no duplicates; whilst noDuplicates=2 means the record itself and 1 duplicate, or 2 records with this email address). This field is calculated using an aggregate function over a window. You can read up on window functions here: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
In our outer query we're then selecting only those records with noDuplicates greater than 1; i.e. where there are multiple records with the same email address.
Finally I've sorted by Email and EmployeeId, so that duplicates are listed alongside one another and are presented in the sequence in which they were (presumably) created; that just makes life easier for whoever then deals with these results.
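For anyone who wants to see the windowed count in action without SQL Server, here is a small sketch using an in-memory SQLite database through Python's sqlite3 module. The EmployeeProfile rows are invented, a subquery is used in place of the CTE, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeProfile ("
    "EmployeeId INTEGER PRIMARY KEY, Username TEXT, Email TEXT)"
)
# Invented rows: only a@example.com appears more than once.
conn.executemany(
    "INSERT INTO EmployeeProfile VALUES (?, ?, ?)",
    [
        (1, "ann", "a@example.com"),
        (2, "bob", "b@example.com"),
        (3, "ann2", "a@example.com"),
        (4, "cid", "c@example.com"),
    ],
)

# Attach the per-email row count to every row, then keep rows whose count exceeds 1.
duplicates = conn.execute(
    """
    SELECT EmployeeId, Username, Email, noDuplicates
    FROM (
        SELECT *, count(*) OVER (PARTITION BY Email) AS noDuplicates
        FROM EmployeeProfile
    )
    WHERE noDuplicates > 1
    ORDER BY Email, EmployeeId
    """
).fetchall()

for row in duplicates:
    print(row)
```

Each returned row carries all of its original columns plus the count, so nothing has to be joined back afterwards.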
If EmployeeId is unique, then you can use EXISTS:
SELECT ep.*
FROM EmployeeProfile ep
WHERE EXISTS (SELECT 1
FROM EmployeeProfile ep1
WHERE ep1.Email = ep.Email AND ep1.EmployeeId <> ep.EmployeeId
);

Sql query where column value is unique and don't include

I have a table called users with the columns
login, first name, last name and email address.
What I want to show is
the emails that are not unique, excluding email addresses that start with “geen” or “na”.
Can someone help me please?
You can use NOT LIKE in the where clause to exclude emails that start with or contain certain strings.
If you just want the email column and the number of duplicates you can use count() aggregation with group by and use having clause to only show results with a count(*) > 1:
select email, count(*) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
group by email
having count(*) > 1
If you want to see all of the row data for those emails that have duplicates you could use a common table expression along with the count(*) over() window aggregation function or row_number()
;with cte as (
select *
, row_number() over (partition by email order by lastname, firstname) as rn
, count(*) over (partition by email) as cnt
from users
where email not like 'geen%'
and email not like 'na%'
)
select *
from cte
where cnt > 1
--or email like '%[0123456789]' /* uncomment to also show emails ending in a number */
order by email, rn
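One thing worth checking with prefix filters is what else they exclude. The sketch below, run against an in-memory SQLite database through Python's sqlite3 module with invented rows and hypothetical column names, shows that NOT LIKE 'na%' drops every email that starts with "na", including a real-looking address such as nadia@example.com.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (login TEXT, firstname TEXT, lastname TEXT, email TEXT)"
)
# Invented rows: four emails are duplicated, three of them start with an excluded prefix.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("u1", "A", "One", "x@example.com"),
        ("u2", "B", "Two", "x@example.com"),
        ("u3", "C", "Three", "geen"),
        ("u4", "D", "Four", "geen"),
        ("u5", "E", "Five", "na"),
        ("u6", "F", "Six", "na"),
        ("u7", "G", "Seven", "nadia@example.com"),
        ("u8", "H", "Eight", "nadia@example.com"),
        ("u9", "I", "Nine", "y@example.com"),
    ],
)

# Filter out the placeholder prefixes first, then count per email.
result = conn.execute(
    """
    SELECT email, count(*) AS cnt
    FROM users
    WHERE email NOT LIKE 'geen%'
      AND email NOT LIKE 'na%'
    GROUP BY email
    HAVING count(*) > 1
    """
).fetchall()

print(result)
```

If addresses like nadia@example.com should still be reported, a stricter condition such as email <> 'na' may be a better fit than the prefix pattern.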
You can use this:
SELECT email, COUNT(email) AS qty
FROM users
WHERE email NOT LIKE 'geen%'
AND email NOT LIKE 'na%'
GROUP BY email
HAVING ( COUNT(email) > 1 )
SELECT *
FROM USERS
WHERE EMAIL NOT LIKE 'geen%'
AND EMAIL NOT LIKE 'na%'

How to check if a person has duplicate date records?

I am looking to query my Access database from Excel (DAO) to determine if any name in the table has more than one record per date. E.g. If Bob has two records on 05/05/17 then I want to return both records as part of a recordset.
Seems like you are looking for something like:
SELECT *
FROM yourtable
INNER JOIN
(
SELECT count(*), name, date
FROM yourtable
GROUP BY name, date
HAVING COUNT(*) > 1
) multi
ON multi.name = yourtable.name
AND multi.date = yourtable.date
The inner select returns rows with more than 1 entry for the same name and date.
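As a quick check of the join approach, here is a sketch against an in-memory SQLite database through Python's sqlite3 module rather than Access. The records table, its person and rec_date columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, person TEXT, rec_date TEXT)"
)
# Invented rows: Bob has two records on 2017-05-05.
conn.executemany(
    "INSERT INTO records (person, rec_date) VALUES (?, ?)",
    [
        ("Bob", "2017-05-05"),
        ("Bob", "2017-05-05"),
        ("Bob", "2017-05-06"),
        ("Alice", "2017-05-05"),
    ],
)

# The inner query finds (person, date) pairs seen more than once;
# joining back to the table returns the full rows for those pairs.
repeated = conn.execute(
    """
    SELECT r.id, r.person, r.rec_date
    FROM records AS r
    INNER JOIN (
        SELECT person, rec_date
        FROM records
        GROUP BY person, rec_date
        HAVING count(*) > 1
    ) AS multi
        ON multi.person = r.person
       AND multi.rec_date = r.rec_date
    ORDER BY r.id
    """
).fetchall()

for row in repeated:
    print(row)
```

Both of Bob's records for 2017-05-05 come back, while his single record on 2017-05-06 and Alice's record do not.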
In Access you can do
select name, date
from your_table
group by name, date
having count(*) > 1

How to display all columns associated with duplicate emails in SQL server 2008

I've done some research on looking for a way to filter duplicate emails so all columns display the data associated with these duplicate emails, but can't find an answer to help me with this.
I currently have data pulled using the following code:
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
HAVING
(COUNT(Email) > 1)
Order by Email
It then gives me xxxxxx rows. I then want to be able to pull all the data (columns) associated with these duplicate emails, and just the duplicates.
SELECT * FROM [marks_party_MasterInvite].[dbo].[InviteList]
WHERE
Email in(Select Email FROM [marks_party_MasterInvite].[dbo].[InviteList] GROUP BY Email HAVING COUNT(Email)>1)
I know I am doing something wrong, because the row count doesn't match.
So any help would be greatly appreciated!
Thanks guys,
You want to use window functions. The following adds the count to each row. Then you can use a where filter to get all the columns:
SELECT il.*
FROM (select il.*, count(*) over (partition by email) as cnt
from [cem_farmers_masterinvitelist].[dbo].InviteList il
) il
where cnt > 1
Order by Email
The counts don't match because when you fetch every row, you are going to get duplicates. In the first query, you are getting distinct emails.
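The difference between the two counts is easy to reproduce. This sketch uses an in-memory SQLite database through Python's sqlite3 module with an invented InviteList table; it compares the number of duplicated emails, the number of rows carrying them, and the sum of the per-email counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InviteList (Id INTEGER PRIMARY KEY, Email TEXT)")
# Invented rows: a@ appears 3 times, b@ twice, c@ once.
conn.executemany(
    "INSERT INTO InviteList (Email) VALUES (?)",
    [("a@example.com",)] * 3 + [("b@example.com",)] * 2 + [("c@example.com",)],
)

# One row per duplicated email (what the GROUP BY query returns).
grouped = conn.execute(
    """
    SELECT Email, count(Email) AS dup_count
    FROM InviteList
    GROUP BY Email
    HAVING count(Email) > 1
    ORDER BY Email
    """
).fetchall()

# Every row whose email is duplicated (what the SELECT * query returns).
all_rows = conn.execute(
    """
    SELECT *
    FROM InviteList
    WHERE Email IN (
        SELECT Email FROM InviteList GROUP BY Email HAVING count(Email) > 1
    )
    """
).fetchall()

distinct_duplicated = len(grouped)
rows_involved = len(all_rows)
sum_of_counts = sum(count for _, count in grouped)
print(distinct_duplicated, rows_involved, sum_of_counts)
```

The grouped query yields one row per duplicated email, whereas the row-level query yields as many rows as the dup_count values add up to.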
Join the source table to the result you just generated about your source:
SELECT *
FROM [cem_farmers_masterinvitelist].[dbo].InviteList src
INNER JOIN
(
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email
) qDupes
ON qDupes.email = src.email AND qDupes.dup_count > 1
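The row count will be different because the first query returns one row per duplicated email. Adding WITH ROLLUP, as in the query that follows, appends a grand-total row, but note that this total is computed before HAVING filters the groups, so it counts every non-NULL Email rather than only the duplicated ones. To get the number of rows involved in duplicates (which should match the row count of the second query), sum dup_count over the groups that pass the HAVING filter.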
SELECT
Email, COUNT(Email) AS dup_count
FROM
[cem_farmers_masterinvitelist].[dbo].InviteList
GROUP BY
Email WITH ROLLUP
HAVING
(COUNT(Email) > 1)
Order by Email