update rows with duplicate entries

I have the same situation as this other question, but I don't want to select the rows, I want to update these rows.
I used the solution Scott Saunders made:
select * from table where email in (
select email from table group by email having count(*) > 1
)
That worked, but I wanted to change/update a row-value in these entries, so I tried:
UPDATE `members` SET `banned` = "1" WHERE `ip` IN (
SELECT `ip` FROM `members` GROUP BY `ip` HAVING COUNT(*) > 1
)
but I get this error:
You can't specify target table 'members' for update in FROM clause

Use an intermediate subquery to get around the 1093 error:
UPDATE `members`
SET `banned` = '1'
WHERE `ip` IN (SELECT x.ip
FROM (SELECT `ip`
FROM `members`
GROUP BY `ip`
HAVING COUNT(*) > 1) x)
Otherwise, use a JOIN on a derived table:
UPDATE `members`
JOIN (SELECT `ip`
FROM `members`
GROUP BY `ip`
HAVING COUNT(*) > 1) x ON x.ip = `members`.ip
SET `banned` = '1'
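To see what the intermediate-subquery form does to the data, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite is only a stand-in: it does not raise error 1093, so this illustrates the result of the statement, not the MySQL error itself. The `id` column and the sample IPs are assumptions made up for the demo.

```python
import sqlite3

def flag_duplicate_ips(ips):
    """Set banned = 1 on every row whose ip occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only `ip` and `banned` come from the question.
    conn.execute(
        "CREATE TABLE members (id INTEGER PRIMARY KEY, ip TEXT, banned INTEGER DEFAULT 0)"
    )
    conn.executemany("INSERT INTO members (ip) VALUES (?)", [(ip,) for ip in ips])
    # The inner query finds the duplicated ips; the extra SELECT around it
    # (alias x) is the derived-table layer that MySQL needs to avoid 1093.
    conn.execute("""
        UPDATE members
        SET banned = 1
        WHERE ip IN (SELECT x.ip
                     FROM (SELECT ip
                           FROM members
                           GROUP BY ip
                           HAVING COUNT(*) > 1) AS x)
    """)
    result = conn.execute("SELECT ip, banned FROM members ORDER BY id").fetchall()
    conn.close()
    return result

flagged = flag_duplicate_ips(["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"])
print(flagged)
```

Both rows with the repeated IP end up flagged, while the two unique IPs keep banned = 0.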

MySQL raises this error when a statement modifies a table and selects from that same table in a subquery. In your case, you are attempting to update the members table based on a subquery of the members table, so you would be changing the table while reading from it. Think of it as a chicken-and-egg problem.
You'll need to make a temporary reference table, or save and paste the list of IPs, in order to run that update statement.

Related

Alter a column to indicate there are duplicates in a separate column?

I have a table in which one of the columns, column_A, has duplicate values. I also have a blank column_indicator which I would like to populate with 1s for all cases where the value in column_A occurs more than once.
I know how to SELECT the duplicates and have used the following query:
SELECT [dbo].[myTable].[column_A], COUNT(*)
FROM [dbo].[myTable]
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
How do I update column_indicator? I have tried:
UPDATE [dbo].[myTable]
SET [dbo].[myTable].[column_indicator] = 1
WHERE
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
I know I am off base but cannot figure out how to proceed with this column update.

You can do a window count in a common table expression and then update it:
WITH cte AS (
SELECT
[column_indicator],
[column_A],
COUNT(*) OVER(PARTITION BY [column_A]) cnt
FROM [dbo].[myTable]
)
UPDATE cte SET [column_indicator] = 1 WHERE cnt > 1
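For anyone who wants to try the window count outside SQL Server, here is a rough sketch using Python's sqlite3 module. SQLite cannot update through a CTE, so this variant computes the same COUNT(*) OVER (PARTITION BY ...) in a subquery and applies it through a key. The `id` column is an assumption added for that purpose, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

def mark_duplicates(values):
    """Set column_indicator = 1 where column_A occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # `id` is a hypothetical key; the question only has the other two columns.
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, column_A TEXT, column_indicator INTEGER)"
    )
    conn.executemany("INSERT INTO myTable (column_A) VALUES (?)", [(v,) for v in values])
    # Same window count as the CTE answer, applied via the key instead of
    # updating the CTE directly (which SQLite does not allow).
    conn.execute("""
        UPDATE myTable
        SET column_indicator = 1
        WHERE id IN (SELECT counted.id
                     FROM (SELECT id,
                                  COUNT(*) OVER (PARTITION BY column_A) AS cnt
                           FROM myTable) AS counted
                     WHERE counted.cnt > 1)
    """)
    result = conn.execute(
        "SELECT column_A, column_indicator FROM myTable ORDER BY id"
    ).fetchall()
    conn.close()
    return result

marked = mark_duplicates(["a", "b", "a", "c", "b"])
print(marked)
```

Rows whose value is unique keep a NULL indicator (None in Python), matching the "blank" column described in the question.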

I think you can also use a nested update query to solve this.
UPDATE [dbo].[myTable] SET [column_indicator] = 1
FROM
(
SELECT a.[column_A], COUNT(*) AS cnt
FROM [dbo].[myTable] a
GROUP BY a.[column_A]
HAVING COUNT(*) > 1
) t
WHERE [dbo].[myTable].[column_A] = t.[column_A];

Select a NON-DISTINCT column in a query that return distincts rows

The following query returns the results that I need, but I have to add the ID of the row so that I can then update it. If I add the ID directly to the select statement, it returns more results than I need, because each ID is unique and so DISTINCT sees every line as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID to the SELECT statement while keeping DISTINCT values for MemberID, ProductID and UserID.
Any ideas?
Thank you

According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, so this is why I use DISTINCT to obtain unique combinations of a given employee with a given product and a given client.
You can use MIN() or MAX() and GROUP BY instead of DISTINCT:
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
UPDATE:
From your comments, I think the query below is what you need:
-- No HAVING filter in the subquery: every group has to contribute the ID to keep,
-- otherwise rows that have no duplicates would be deleted as well.
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
)

If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
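A quick way to check what the ROW_NUMBER() approach keeps is to run it on a few sample rows. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server. SQLite cannot delete through a CTE, so the numbered rows are selected in a subquery and deleted by ID instead. The sample data is made up, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

def delete_duplicates(rows):
    """Keep the lowest ID per (MemberID, ProductID, UserID) and delete the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE UserCustomerProductSalaryExceptions (
            ID INTEGER PRIMARY KEY,
            MemberID INTEGER,
            ProductID INTEGER,
            UserID INTEGER
        )
    """)
    conn.executemany(
        "INSERT INTO UserCustomerProductSalaryExceptions (MemberID, ProductID, UserID) "
        "VALUES (?, ?, ?)",
        rows,
    )
    # Number the rows inside each duplicate group; everything after the
    # first row of a group (DupRowNum > 1) is a duplicate to remove.
    conn.execute("""
        DELETE FROM UserCustomerProductSalaryExceptions
        WHERE ID IN (SELECT numbered.ID
                     FROM (SELECT ID,
                                  ROW_NUMBER() OVER (
                                      PARTITION BY MemberID, ProductID, UserID
                                      ORDER BY ID
                                  ) AS DupRowNum
                           FROM UserCustomerProductSalaryExceptions) AS numbered
                     WHERE numbered.DupRowNum > 1)
    """)
    remaining = conn.execute(
        "SELECT ID, MemberID, ProductID, UserID "
        "FROM UserCustomerProductSalaryExceptions ORDER BY ID"
    ).fetchall()
    conn.close()
    return remaining

remaining = delete_duplicates(
    [(1, 10, 100), (1, 10, 100), (2, 20, 200), (1, 10, 100), (2, 20, 200), (3, 30, 300)]
)
print(remaining)
```

Because the numbering is ordered by ID, the surviving row of each group is the one with the smallest ID. Ordering by ID DESC would keep the largest instead.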

ID's not necessary - try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId

From the sound of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. It sounds as if each duplicate is important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

SQL: Delete all NOT MAX Records in GroupBy

My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. So if I have 9 records with 3 sets of 3 casekeys, and each casekey has its 3 dates. I'd delete the 2 lower dates of each set and come up with 3 total records, only the MAX(recordDate) of each remaining.
I have the following SQL Query:
DELETE FROM table
WHERE tableID NOT IN (
SELECT tableID
FROM (
Select MAX(recordDate) As myDate, tableID From table
Group By CaseKey
) As foo
)
I receive the error:
Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.
Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)
DELETE zt FROM ztable zt
WHERE EXISTS (
SELECT * FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
In other words, for each record in zt, run a query to see whether the same CaseKey also has a record with a higher recordDate. If so, the WHERE EXISTS condition passes and the record is deleted; otherwise the condition fails, which means the record is itself the MAX recordDate for its CaseKey, and it is kept.
Thank you, WildPlasser, for that simplistic methodology that I was somehow blowing up.

There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.
In the case of ties, you'll need to add an extra field to the where clause; as a tie-breaker. Assuming that TableId can function as such, the query would become:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ( ex.recordDate > zt.recordDate
OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
)
);
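Here is a small sketch of the tie-breaker version, run through Python's sqlite3 module so the effect on tied dates is visible. Treat it as an illustration under assumptions: the sample rows are invented, dates are stored as ISO strings so that string comparison orders them correctly, and the target table is referenced by name rather than by an alias.

```python
import sqlite3

def keep_latest_per_case(rows):
    """Delete every row that is not the latest (recordDate, TableId) of its CaseKey."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ztable (TableId INTEGER PRIMARY KEY, CaseKey TEXT, recordDate TEXT)"
    )
    conn.executemany("INSERT INTO ztable (CaseKey, recordDate) VALUES (?, ?)", rows)
    # A row is deleted when some other row of the same CaseKey is newer,
    # or equally new but with a higher TableId (the tie-breaker).
    conn.execute("""
        DELETE FROM ztable
        WHERE EXISTS (
            SELECT 1
            FROM ztable AS ex
            WHERE ex.CaseKey = ztable.CaseKey
              AND (ex.recordDate > ztable.recordDate
                   OR (ex.recordDate = ztable.recordDate
                       AND ex.TableId > ztable.TableId))
        )
    """)
    remaining = conn.execute(
        "SELECT TableId, CaseKey, recordDate FROM ztable ORDER BY TableId"
    ).fetchall()
    conn.close()
    return remaining

latest = keep_latest_per_case([
    ("A", "2020-01-01"),
    ("A", "2020-03-01"),
    ("A", "2020-03-01"),  # ties with the previous row on recordDate
    ("B", "2021-05-05"),
    ("B", "2021-01-01"),
])
print(latest)
```

Without the tie-breaker branch, both of the tied "A" rows would survive, since neither has a strictly higher recordDate than the other.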

Just express
delete all records from my table that are NOT the MAX(recordDate) of a
grouped CaseKey
in sql as
DELETE FROM table t1
WHERE t1.recordDate <>
(SELECT MAX(recordDate)
FROM table t2
WHERE t2.CaseKey = t1.CaseKey)

You can rank the records within each caseKey by recordDate, newest first. Every record with a rank greater than 1 is then one of the lower dates, and you can delete those by their tableID.
DELETE FROM [table]
WHERE [tableID] IN
(SELECT
[sub].[tableID]
FROM
(
SELECT
[tableID],
Rank() OVER (PARTITION BY [caseKey] ORDER BY [recordDate] DESC, [tableID] DESC) AS [rank]
FROM [table]
) AS [sub]
WHERE [sub].[rank] > 1)

sql query - filtering duplicate values to create report

I am trying to list all the duplicate records in a table. This table does not have a primary key and has been created specifically for a report that lists duplicates. It contains both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works and returns every duplicated combination.
But I want my report to display the values of all the columns for those rows. How can I do that without querying for each record separately?
For example:
Each table has 10 fields, and [NewID] is the field that occurs multiple times. I need to create a report with all the data in all the fields where NewID has been duplicated.
Please help.
Thank you.

You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.

Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)

select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
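The join form can be tried on a handful of rows with Python's sqlite3 module. This is only a sketch with invented sample data and a four-column table (the question mentions ten fields), but it shows that every column of every row sharing a duplicated NewID comes back.

```python
import sqlite3

def rows_with_duplicate_newid(rows):
    """Return all columns of every row whose NewID occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # No primary key, as in the question; only four of the columns are modelled.
    conn.execute("CREATE TABLE Duplicates (OfficeCD TEXT, NewID INTEGER, Year INTEGER, Type TEXT)")
    conn.executemany("INSERT INTO Duplicates VALUES (?, ?, ?, ?)", rows)
    # The derived table dd lists the duplicated NewID values; joining back
    # to the full table pulls in the remaining columns for those rows.
    result = conn.execute("""
        SELECT d.*
        FROM Duplicates d
        INNER JOIN (SELECT NewID
                    FROM Duplicates
                    GROUP BY NewID
                    HAVING COUNT(*) > 1) dd ON d.NewID = dd.NewID
        ORDER BY d.NewID, d.OfficeCD
    """).fetchall()
    conn.close()
    return result

report = rows_with_duplicate_newid([
    ("NY", 1, 2010, "A"),
    ("LA", 1, 2011, "B"),
    ("SF", 2, 2012, "C"),
    ("NY", 3, 2013, "A"),
    ("TX", 3, 2013, "A"),
])
print(report)
```

The ORDER BY is an addition for readable output; it keeps the rows of each duplicated NewID next to each other in the report.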

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd
like to only keep one row for each.
The following lists the duplicates, but I don't know how to delete the
duplicates and just keep one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.

See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
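As a sanity check, the "keep the minimum ID per name" delete can be run on sample data with Python's sqlite3 module. This is a sketch, not a drop-in script: the names are made up, and the subquery is wrapped in an extra derived table (alias keepers) because MySQL rejects a DELETE whose subquery reads the target table directly with error 1093, as covered in the first question on this page.

```python
import sqlite3

def keep_first_row_per_name(names):
    """Delete every row except the one with the smallest ID for each name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (ID INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO members (name) VALUES (?)", [(n,) for n in names])
    # keep_id is the ID to preserve for each name; every other row goes.
    # The outer SELECT ... AS keepers is only needed on MySQL.
    conn.execute("""
        DELETE FROM members
        WHERE ID NOT IN (SELECT keepers.keep_id
                         FROM (SELECT MIN(ID) AS keep_id
                               FROM members
                               GROUP BY name) AS keepers)
    """)
    remaining = conn.execute("SELECT ID, name FROM members ORDER BY ID").fetchall()
    conn.close()
    return remaining

kept = keep_first_row_per_name(["amit", "akhil", "amit", "ravi", "akhil", "amit"])
print(kept)
```

Names that occur only once are untouched, because their single row is also the minimum ID of its group.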

It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;
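The copy-and-swap idea can be sketched with Python's sqlite3 module as well. Several things differ from the MySQL statements above and are assumptions of this demo: SQLite has no RENAME TABLE, so ALTER TABLE ... RENAME TO is used after dropping the original; the new table is built with CREATE TABLE ... AS, which does not copy keys or indexes; and MIN(id) replaces SELECT * so that the row kept for each name is well defined.

```python
import sqlite3

def rebuild_without_duplicates(names):
    """Copy one row per name into a new table, then swap it in for the original."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO members (name) VALUES (?)", [(n,) for n in names])
    # Build the de-duplicated copy: one row per name, keeping the lowest id.
    conn.execute("CREATE TABLE tmp AS SELECT MIN(id) AS id, name FROM members GROUP BY name")
    # Swap it in: drop the original, then give the copy the original name.
    conn.execute("DROP TABLE members")
    conn.execute("ALTER TABLE tmp RENAME TO members")
    rows = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
    conn.close()
    return rows

rebuilt = rebuild_without_duplicates(["amit", "akhil", "amit", "ravi", "akhil"])
print(rebuilt)
```

On a real table you would recreate the primary key and any indexes on the new table before swapping it in.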

We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records then write them into a TEMPORARY TABLE. After TRUNCATE we write back the TEMPORARY data into the TABLE.
That is one way of doing it and works as a STORED PROCEDURE.

If you want to see which rows you are about to delete before you delete them:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

You can join the table with itself on the matching field and delete every row that has a twin with a lower id, which keeps the row with the smallest id in each group:
DELETE t1 FROM table_name t1
JOIN table_name t2 ON t1.match_field = t2.match_field
WHERE t1.id > t2.id;

Delete duplicate rows and keep one.
The table may contain names that are duplicated and names that occur only once; in both cases exactly one row per name should remain.
Assuming the table has two columns, id and name, this query removes the duplicate names and keeps one row for each.
It works fine on my end:
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
Before the delete, the table contained duplicate rows for amit and akhil. After the delete, the duplicate amit and akhil rows are gone and one record is kept for each.

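If you want to remove duplicate records from a table:
-- MIN(sex) picks one value per group, so the query also runs with ONLY_FULL_GROUP_BY enabled
CREATE TABLE tmp SELECT lastname, firstname, MIN(sex) AS sex
FROM user_tbl
GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;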

Show the duplicate records:
SELECT `page_url`, COUNT(*) FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1
Delete the duplicate records:
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )

WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1
