SQL: Delete all NOT MAX Records in GroupBy

My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. For example, if I have 9 records made up of 3 CaseKeys with 3 dates each, I'd delete the 2 lower dates of each CaseKey and end up with 3 records in total: the MAX(recordDate) row of each CaseKey.
I have the following SQL Query:
DELETE FROM table
WHERE tableID NOT IN (
SELECT tableID
FROM (
Select MAX(recordDate) As myDate, tableID From table
Group By CaseKey
) As foo
)
I receive the error:
Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.
Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)
DELETE zt FROM ztable zt
WHERE EXISTS (
SELECT * FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
In other words, for each record in zt, run a query to see whether the same CaseKey also has a record with a higher recordDate. If so, the WHERE EXISTS condition passes and the record is deleted; otherwise the condition fails and the record is kept, because it is the MAX recordDate for its CaseKey.
Thank you, WildPlasser, for that simple approach that I was somehow overcomplicating.

There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.
In the case of ties, you'll need to add an extra field to the WHERE clause as a tie-breaker. Assuming that TableId can function as such, the query would become:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ( ex.recordDate > zt.recordDate
OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
)
);
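To see the tie-breaker version in action, here is a minimal sketch that runs the same EXISTS logic against an in-memory SQLite database from Python. SQLite, the snake_case column names and the sample rows are assumptions made for the demo, not part of the original question. The delete target is referenced by its table name instead of an alias so the statement does not depend on alias support in DELETE.

```python
import sqlite3

def delete_all_but_latest(rows):
    """Keep one row per case_key: latest record_date, highest table_id on ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ztable (table_id INTEGER PRIMARY KEY, case_key TEXT, record_date TEXT)"
    )
    conn.executemany("INSERT INTO ztable VALUES (?, ?, ?)", rows)
    # A row is deleted when a "better" row exists for the same case_key:
    # a later date, or the same date with a higher table_id.
    conn.execute("""
        DELETE FROM ztable
        WHERE EXISTS (
            SELECT 1 FROM ztable AS ex
            WHERE ex.case_key = ztable.case_key
              AND (ex.record_date > ztable.record_date
                   OR (ex.record_date = ztable.record_date
                       AND ex.table_id > ztable.table_id))
        )
    """)
    remaining = conn.execute(
        "SELECT table_id, case_key, record_date FROM ztable ORDER BY table_id"
    ).fetchall()
    conn.close()
    return remaining

# Hypothetical sample: 3 case keys with 3 rows each; case B has a tie on its latest date.
sample = [
    (1, "A", "2014-01-01"), (2, "A", "2014-01-02"), (3, "A", "2014-01-03"),
    (4, "B", "2014-02-01"), (5, "B", "2014-02-03"), (6, "B", "2014-02-03"),
    (7, "C", "2014-03-03"), (8, "C", "2014-03-02"), (9, "C", "2014-03-01"),
]
survivors = delete_all_but_latest(sample)
print(survivors)
```

With this sample, one row per case key should remain, and the tie in case B is resolved in favour of the higher table_id.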

Just express
delete all records from my table that are NOT the MAX(recordDate) of a
grouped CaseKey
in sql as
DELETE FROM table t1
WHERE t1.recordDate <>
(SELECT MAX(recordDate)
FROM table t2
WHERE t2.CaseKey = t1.CaseKey)
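One thing worth checking with the MAX() form is how it behaves when two rows share the latest date. The sketch below runs it on in-memory SQLite from Python; the table name cases (table is a reserved word), the column names and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (table_id INTEGER PRIMARY KEY, case_key TEXT, record_date TEXT)"
)
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", [
    (1, "A", "2014-01-01"),
    (2, "A", "2014-01-03"),  # tied with row 3 for the latest date of case A
    (3, "A", "2014-01-03"),
    (4, "B", "2014-02-01"),
])
# Delete every row whose date differs from the maximum date of its own case_key.
conn.execute("""
    DELETE FROM cases
    WHERE record_date <> (
        SELECT MAX(t2.record_date) FROM cases AS t2
        WHERE t2.case_key = cases.case_key
    )
""")
kept_ids = [row[0] for row in conn.execute("SELECT table_id FROM cases ORDER BY table_id")]
conn.close()
print(kept_ids)
```

Both tied rows of case A survive here, so this form only leaves exactly one row per case key when the maximum date is unique.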

You can rank the records within each caseKey by recordDate, newest first; every row with rank > 1 is then one of the lower dates. That way you can still use your tableID.
DELETE FROM [table]
WHERE [tableID] IN
(SELECT
[sub].[tableID]
FROM
(
SELECT
[tableID],
Rank() OVER (PARTITION BY [caseKey] ORDER BY [recordDate] DESC, [tableID] DESC) AS [rank]
FROM [table]
) AS [sub]
WHERE [sub].[rank] > 1)
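The ranking approach can be tried out with the sketch below, which uses in-memory SQLite from Python. It assumes a SQLite build with window-function support (3.25 or later); the table name cases, the column names and the rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (table_id INTEGER PRIMARY KEY, case_key TEXT, record_date TEXT)"
)
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", [
    (1, "A", "2014-01-01"),
    (2, "A", "2014-01-03"),  # tied with row 3 on date; table_id breaks the tie
    (3, "A", "2014-01-03"),
    (4, "B", "2014-02-01"),
])
# Rank rows inside each case_key, newest first; anything ranked below 1 is deleted.
conn.execute("""
    DELETE FROM cases
    WHERE table_id IN (
        SELECT sub.table_id
        FROM (
            SELECT table_id,
                   RANK() OVER (
                       PARTITION BY case_key
                       ORDER BY record_date DESC, table_id DESC
                   ) AS rnk
            FROM cases
        ) AS sub
        WHERE sub.rnk > 1
    )
""")
ranked_keep = [row[0] for row in conn.execute("SELECT table_id FROM cases ORDER BY table_id")]
conn.close()
print(ranked_keep)
```

Because table_id is part of the ORDER BY, no two rows in a partition share a rank, so exactly one row per case key is kept even when dates tie.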

Related

SQL remove a row based on condition

I have a stored procedure in Bigquery and a resulting table where 2 rows are not exactly duplicates but I want to filter one of the rows based on a condition.

WITH DupCodes AS (
SELECT AccCode
FROM Table
GROUP BY AccCode
HAVING COUNT(*) > 1
)
SELECT *
FROM table
WHERE (AccCode IN (SELECT AccCode FROM DupCodes) AND AccountName IS NOT NULL)
OR (AccCode NOT IN (SELECT AccCode FROM DupCodes))

One method uses not exists logic:
select t.*
from t
where t.accountname is not null or
not exists (select 1
from t t2
where t2.accCode = t.accCode and t2.accountname is not null
);
That is, show all rows where accountname is not empty. Then show empty rows only when there is no non-empty accountname for the same accCode.
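A small sketch of the NOT EXISTS filter, run on in-memory SQLite from Python rather than BigQuery. The column names follow the answer in snake_case, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (acc_code INTEGER, account_name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (100, "Cash"),
    (100, None),   # empty twin of acc_code 100: should be filtered out
    (200, None),   # only row for acc_code 200: should be kept
    (300, "Rent"),
])
filtered = conn.execute("""
    SELECT t.acc_code, t.account_name
    FROM t
    WHERE t.account_name IS NOT NULL
       OR NOT EXISTS (
            SELECT 1 FROM t AS t2
            WHERE t2.acc_code = t.acc_code
              AND t2.account_name IS NOT NULL
       )
    ORDER BY t.acc_code
""").fetchall()
conn.close()
print(filtered)
```

The empty row for acc_code 100 is dropped because a named row exists for the same code, while the lone empty row for acc_code 200 stays.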

Select a NON-DISTINCT column in a query that return distincts rows

The following query returns the results that I need, but I have to add the ID of each row so that I can then update it. If I add the ID directly to the SELECT list, the query returns more rows than I need: each ID is unique, so DISTINCT sees every row as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID in the Select statement while keeping DISTINCT values for MemberID,ProductID and UserID.
Any Ideas ?
Thank you

According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, so this is why I use DISTINCT to obtain unique combinations of a given employee with a given product and a given client.
You can use MIN() or MAX() and GROUP BY instead of DISTINCT
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
UPDATE:
From your comments I think the below query is what you need
-- No HAVING filter here: with HAVING COUNT(...) >= 2 the NOT IN would also delete rows that have no duplicate at all.
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
)
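As a sanity check for the "keep MAX(ID) per group" delete, here is a sketch on in-memory SQLite from Python. The table name exceptions, the snake_case columns and the rows are assumptions. The subquery groups every row without a HAVING filter, so that combinations with a single row keep that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exceptions (id INTEGER PRIMARY KEY, member_id INTEGER, "
    "product_id INTEGER, user_id INTEGER)"
)
conn.executemany("INSERT INTO exceptions VALUES (?, ?, ?, ?)", [
    (1, 10, 20, 30),
    (2, 10, 20, 30),
    (3, 10, 20, 30),  # highest id of the duplicated combination
    (4, 11, 20, 30),  # combination without duplicates
])
# Keep only the highest id of every (member_id, product_id, user_id) combination.
conn.execute("""
    DELETE FROM exceptions
    WHERE id NOT IN (
        SELECT MAX(e.id)
        FROM exceptions AS e
        GROUP BY e.member_id, e.product_id, e.user_id
    )
""")
dedup_ids = [row[0] for row in conn.execute("SELECT id FROM exceptions ORDER BY id")]
conn.close()
print(dedup_ids)
```

Row 4 survives even though its combination was never duplicated, which is the behaviour a cleanup of duplicates usually wants.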

If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1

ID's not necessary - try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId

From the sound of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. It sounds as if each duplicate is important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

delete records from the table sorted by two columns

I have a table with these columns:
tel_number,date,time
There might be several records for a single tel_number. For each tel_number, I want to keep only the record with the most recent date and delete all the others. Where several records share that most recent date, the one with the most recent time should be kept and the rest removed.
for example from records like these:
1 2223333,14/01/28,08:30
2 2223333,14/01/27,08:30
3 2223333,14/01/28,16:30
4 2225555,14/01/27,10:34
5 2225555,13/12/29,10:34
all records except these two should be deleted:
3 2223333,14/01/28,16:30
4 2225555,14/01/27,10:34
edit:
I have tried this so far, but it doesn't delete records which have the same date but different times:
delete from table where (tel_number,date) not in
(select tel_number,max(date) from table group by tel_number);

According to Deleting Duplicate Rows In Oracle:
1. Using MIN(rowid) : The most common method of removing duplicate rows.
DELETE FROM Table1
WHERE ROWID NOT IN (SELECT MIN (ROWID)
FROM Table1
GROUP BY tel_number, Date, Time);
2. Using MIN(rowid) & Join: More or less the same as first one
DELETE FROM Table1 t
WHERE t.ROWID NOT IN (SELECT MIN (b.ROWID)
FROM Table1 b
WHERE b.tel_number = t.tel_number
AND b.Date = t.Date
AND b.Time = t.Time);
3. Using Analytic functions
DELETE FROM Table1
WHERE ROWID IN (
SELECT rid
FROM (SELECT ROWID rid,
ROW_NUMBER () OVER (PARTITION BY tel_number,Date, Time ORDER BY ROWID) rn
FROM Table1 )
WHERE rn <> 1);

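I solved the problem with this query:
delete from table a where (a.tel_number,a.DATE,a.TIME) not in
(
select tel_number,date,max(time)
from table group by tel_number,date
-- compare (tel_number, date) pairs, so that one number's latest date cannot match another number's older row
having (tel_number,date) in
(
select tel_number,max(date) from table group by tel_number
)
);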
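The sample rows from the question can be used to check the "latest date, then latest time" rule. The sketch below is an alternative formulation, not the Oracle query above: it uses a correlated EXISTS on in-memory SQLite from Python. The table name calls, the column names and the id column are assumptions; the yy/mm/dd strings are compared as text, which works only because they are zero-padded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (id INTEGER PRIMARY KEY, tel_number TEXT, "
    "call_date TEXT, call_time TEXT)"
)
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", [
    (1, "2223333", "14/01/28", "08:30"),
    (2, "2223333", "14/01/27", "08:30"),
    (3, "2223333", "14/01/28", "16:30"),
    (4, "2225555", "14/01/27", "10:34"),
    (5, "2225555", "13/12/29", "10:34"),
])
# Delete a row when the same number has a later date, or the same date with a later time.
conn.execute("""
    DELETE FROM calls
    WHERE EXISTS (
        SELECT 1 FROM calls AS later
        WHERE later.tel_number = calls.tel_number
          AND (later.call_date > calls.call_date
               OR (later.call_date = calls.call_date
                   AND later.call_time > calls.call_time))
    )
""")
latest_calls = conn.execute(
    "SELECT id, tel_number, call_date, call_time FROM calls ORDER BY id"
).fetchall()
conn.close()
print(latest_calls)
```

Rows 3 and 4 remain, matching the expected result stated in the question.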

sql query - filtering duplicate values to create report

I am trying to list all the duplicate records in a table. This table does not have a primary key and was created only for producing a report that lists duplicates. It contains both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works and gives me one row for each duplicated combination.
But I want my report to display the values of all the columns for those rows. How can I do that without querying for each record separately?
For example:
Each table has 10 fields and [NewID] is the field which is occurring multiple times. I need to create a report with all the data in all the fields where NewID has been duplicated.
Please help.
Thank you.

You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.

Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)

select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
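The join against a grouped subquery can be tried with the sketch below, using in-memory SQLite from Python. The table layout is cut down to four columns with snake_case names, and the rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE duplicates (office_cd TEXT, new_id INTEGER, rec_year INTEGER, rec_type TEXT)"
)
conn.executemany("INSERT INTO duplicates VALUES (?, ?, ?, ?)", [
    ("NY", 1, 2013, "A"),
    ("NY", 1, 2014, "B"),  # same new_id as the row above
    ("LA", 2, 2013, "A"),  # new_id 2 appears once, so it is not reported
    ("SF", 3, 2013, "C"),
    ("SF", 3, 2013, "C"),  # exact duplicate
])
# Return every column of every row whose new_id occurs more than once.
report = conn.execute("""
    SELECT d.office_cd, d.new_id, d.rec_year, d.rec_type
    FROM duplicates AS d
    INNER JOIN (
        SELECT new_id
        FROM duplicates
        GROUP BY new_id
        HAVING COUNT(*) > 1
    ) AS dd ON d.new_id = dd.new_id
    ORDER BY d.new_id, d.rec_year
""").fetchall()
conn.close()
print(report)
```

All four rows with a repeated new_id come back with their full column values, and the single row for new_id 2 is left out.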

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd
like to only keep one row for each.
The following lists the duplicates, but I don't know how to delete the
duplicates and just keep one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.

See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
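Here is a minimal sketch of the MIN(ID) delete on in-memory SQLite from Python. The members table mirrors the question; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO members VALUES (?, ?)", [
    (1, "amit"), (2, "akhil"), (3, "amit"),
    (4, "amit"), (5, "akhil"), (6, "sara"),
])
# Keep the lowest id for every name and delete the rest.
conn.execute("""
    DELETE FROM members
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM members
        GROUP BY name
    )
""")
unique_members = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
conn.close()
print(unique_members)
```

Note that some engines (MySQL in particular) reject a subquery on the table being deleted from unless it is wrapped in a derived table, so the statement may need that extra layer there.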

It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;
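The copy-and-swap idea can be sketched on in-memory SQLite from Python as follows. This is an adaptation under assumptions: SQLite has no RENAME TABLE statement, so two ALTER TABLE ... RENAME TO statements are used instead, and MIN(id) picks which row of each name is copied. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO members VALUES (1, 'amit'), (2, 'akhil'), (3, 'amit');

    -- copy one row per name into a new table
    CREATE TABLE tmp AS
        SELECT MIN(id) AS id, name FROM members GROUP BY name;

    -- swap the new table in, then drop the old one
    ALTER TABLE members RENAME TO members_old;
    ALTER TABLE tmp RENAME TO members;
    DROP TABLE members_old;
""")
swapped = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
conn.close()
print(swapped)
```

A table created with CREATE TABLE ... AS SELECT does not carry over the primary key, indexes or constraints of the original, so those would have to be recreated after the swap.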

We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records then write them into a TEMPORARY TABLE. After TRUNCATE we write back the TEMPORARY data into the TABLE.
That is one way of doing it and works as a STORED PROCEDURE.

If you want to first see which rows you are about to delete, list them like this before deleting them:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

You can join the table with itself on the matching field and delete every row for which a row with the same value and a lower id exists, so the lowest id of each group is kept:
DELETE t1 FROM table_name t1
JOIN table_name t2 ON t1.match_field = t2.match_field
WHERE t1.id > t2.id;

Delete duplicate rows and keep one
The table may contain names that are duplicated as well as names that appear only once; in both cases exactly one row per name should remain.
Suppose the table has two columns, id and name, and we want to remove the duplicate names while keeping one row for each.
The following query works fine on my end:
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
In my test table, this query deleted the duplicate amit and akhil rows and kept one record for each of them.

If you want to remove duplicate records from a table, copy one row per group into a new table and swap it in:
CREATE TABLE tmp SELECT lastname, firstname, sex
FROM user_tbl
GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;

show record
SELECT `page_url`,count(*) FROM wl_meta_tags GROUP BY page_url HAVING count(*) > 1
delete record
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )

WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1
