Select rows based on two columns in SQL Server - sql

I have a table which stores data where accidentally data has been stored multiple times because of case sensivity for the username field on server side code. The username field should be regarded as case insensitive. The important columns and data for the table can be found below.
My requirements now is to delete all but the most recent saved data. I'm writing an sql script for this, and started out by identifying all rows that are duplicates. This selection returns a table like below.
For each row, the most recent save is LASTUPDATEDDATE if it exist, otherwise CREATEDDATE. For this example, the most recent save for 'username' would be row 3.
ID CREATEDDATE LASTUPDATEDDATE USERNAME
-- ----------- --------------- --------
1 11-NOV-11 USERNAME
2 01-NOV-11 02-NOV-11 username
3 8-JAN-12 USERname
My script (which selects all rows where a duplicated username appears) looks like:
SELECT
id, createddate, lastupdateddate, username
FROM
table
WHERE
LOWER(username)
IN
(
SELECT
LOWER(username)
FROM
table
GROUP BY
LOWER(username)
HAVING
COUNT(*) > 1
)
ORDER BY
LOWER(username)
My question now is: How do I select everything but row 3? I have searched Stack Overflow for a good match to this question, but found no match good enough. I know I probably have to make a join of some kind, but can't really get my head around it. Would be really thankful for a push in the right direction.
We are using SQL Server, probably a quite new version.

To delete duplicates, you can use:
with todelete as (
select t.*,
row_number() over (partition by lower(username) order by createddate desc) as seqnum
from table
)
delete from t
where seqnum > 1
This assigns a sequential number to each row, starting with 1 for the most recent. It then deletes all but the most recent.
For two dates, you can use:
with todelete as (
select t.*,
row_number() over (partition by lower(username) order by thedate desc) as seqnum
from (select t.*,
(case when createddate >= coalesdce(updateddate, createddate)
then createddate
else updateddate
end) as thedate
from table
) t
)
delete from t
where seqnum > 1

A couple of things to note -- there is no reason to use LOWER in your query. A = a in SQL Server.
Also, to get the correct date, you can use COALESCE to determine if LastUpdatedDate exists and if so, sort by it, else sort by CreatedDate.
Putting that together, this should work:
DELETE T
FROM YourTable T
JOIN (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username
ORDER BY COALESCE(lastupdateddate, createddate) DESC) as RN
FROM YourTable
) T2 ON T.Id = T2.Id
WHERE T2.RN > 1
Here is a sample fiddle: http://www.sqlfiddle.com/#!3/51f7c/1
As #Gordon correctly suggests, you could also use a CTE depending on the version of SQL Server you use (2005+):
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username
ORDER BY COALESCE(lastupdateddate, createddate) DESC) as RN
FROM YourTable
)
DELETE FROM CTE WHERE RN > 1

Related

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should only see three records and the last STATUS per that account_2.
Do you know a way to do this?
create table TBL 1 (
account_id int,
status string,
create_date date)
select account_id, max(create_date) from table group by account_id;
will give you the account_id and create_date at the closest past date to today (assuming create_date can never be in the future, which makes sense).
Now you can join with that data to get what you want, something along the lines for example:
select account_id, status, create_date from table where (account_id, create_date) in (<the select expression from above>);
If you use that frequently (account with the latest create date), then consider defining a view for that.
If you have many columns and want keep the row that is the last line, you can use QUALIFY to run the ranking logic, and keep the best, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern the Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution with give you both. you can also get via QUALIFY if you use RANK
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK for, which will show all equal rows is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;

SQL Server : return all rows in set if one row has value equal to target value

I'm currently working on a SQL query that searches an "archive" database and returns a row for each change that occurred on an order from the beginning of time to today.
What I would like to do with this query is only return the orders that are currently or have been associated with a specific order handler. The best way for me to explain it is that every order is currently grouped in a "set" with a row number for each change, but if one of the rows ever holds the value I'm looking for either "handler" columns, I want it to return all the rows, not just the one with that target value.
Here is what I have so far.
SELECT
ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY EventDateTime) AS RowNumber,
ace.[OrderId],
ace.[OrderHandler],
ace.[EventDateTime],
ace.[OrderStatus],
LAG(ace.[OrderHandler], 1) OVER (PARTITION BY [OrderId] ORDER BY ace.[EventDateTime]) AS PreviousOrderHandler,
LAG(ace.[EventDateTime], 1) OVER (PARTITION BY [OrderId] ORDER BY ace.[EventDateTime]) AS PreviousEventDateTime,
LAG(ace.[OrderStatus], 1) OVER (PARTITION BY [OrderId] ORDER BY ace.[EventDateTime]) AS PreviousOrderStatus
FROM
Archive AS ace
Here is the sample data I receive when running the above query:
So instead of just returning row number 9 where the OrderHandler = POOL, I want to query if the OrderId has an OrderHandler of POOL at ANY TIME in history, return all the rows.
I figured I could potentially use a WHERE EXISTS but I'm not sure how I could return the whole set of results instead of just the results that match.
Any help is extremely appreciated!
You can use exists like this:
select a.*
from ace a
where exists (select 1
from ace a2
where a2.orderid = a.orderid and
a2.orderhandler = #orderhandler
);
Script for solution:
SELECT ROW_NUMBER() OVER (PARTITION BY ace.OrderId ORDER BY ace.EventDateTime) AS RowNumber
,ace.[OrderId]
,ace.[OrderHandler]
,ace.[EventDateTime]
,ace.[OrderStatus]
,LAG(ace.[OrderHandler], 1) OVER ( PARTITION BY ace.[OrderId] ORDER BY ace.[EventDateTime] ) as PreviousOrderHandler
,LAG(ace.[EventDateTime], 1) OVER ( PARTITION BY ace.[OrderId] ORDER BY ace.[EventDateTime] ) as PreviousEventDateTime
,LAG(ace.[OrderId], 1) OVER (PARTITION BY ace.[OrderId] ORDER BY ace.[OrderId] ) as PreviousOrderId
,LAG(ace.[OrderStatus], 1) OVER ( PARTITION BY ace.[OrderId] ORDER BY ace.[EventDateTime] ) as PreviousOrderStatus
FROM Archive as ace
WHERE EXISTS
(SELECT * FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY EventDateTime) AS RowNumber
,ace.[OrderId]
,ace.[OrderHandler]
,ace.[EventDateTime]
,ace.[OrderStatus]
FROM Archive as ace
GROUP BY ace.[OrderId]
,ace.[OrderHandler]
,ace.[EventDateTime]
,ace.[OrderStatus]
HAVING ace.OrderHandler LIKE '%POOL%'
)x
WHERE ace.OrderId = x.OrderId)

Avoid duplicate records from a particular column of a table

I have a table as shown in the image.In Number column, the values are appeared more than once (for example 63 appeared twice). I would like to keep only one value. Please see my code:
delete from t1 where
(SELECT *,row_number() OVER (
PARTITION BY
Number
ORDER BY
Date) as rn from t1 where rn > 1)
It shows error. Can anyone please assist.
enter image description here
The column created by row_number() was not accessed by your main query, in order to enable that, you can create a quick sub query and use the desired filter
SELECT *
FROM
(
SELECT *,
row_number() OVER (PARTITION BY Number ORDER BY Date) as rn
FROM t1 ) T
where rn = 1;
The partition by determines how row numbers repeat. The row numbers are assigned per group of partition by keys. So, you can get duplicates.
If you want a unique row number over all rows, just leave out the partition by:
select t1.*
from (select t1.*,
row_number() over (order by date) as rn
from t1
) t1
where rn > 1
if you want to keep only one value, rn = 1 instead of "> 1"

Delete duplicates but keep 1 with multiple column key

I have the following SQL select. How can I convert it to a delete statement so it keeps 1 of the rows but deletes the duplicate?
select s.ForsNr, t.*
from [testDeleteDublicates] s
join (
select ForsNr, period, count(*) as qty
from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
) t on s.ForsNr = t.ForsNr and s.Period = t.Period
Try using following:
Method 1:
DELETE FROM Mytable WHERE RowID NOT IN (SELECT MIN(RowID) FROM Mytable GROUP BY Col1,Col2,Col3)
Method 2:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY ForsNr, period
ORDER BY ( SELECT 0)) RN
FROM testDeleteDublicates)
DELETE FROM cte
WHERE RN > 1
Hope this helps!
NOTE:
Please change the table & column names according to your need!
This is easy as long as you have a generated primary key column (which is a good idea). You can simply select the min(id) of each duplicate group and delete everything else - Note that I have removed the having clause so that the ids of non-duplicate rows are also excluded from the delete.
delete from [testDeleteDublicates]
where id not in (
select Min(Id) as Id
from [testDeleteDublicates]
group by ForsNr, period
)
If you don't have an artificial primary key you may have to achieve the same effect using row numbers, which will be a bit more fiddly as their implementation varies from vendor to vendor.
You can do with 2 option.
Add primary-key and delete accordingly
http://www.mssqltips.com/sqlservertip/1103/delete-duplicate-rows-with-no-primary-key-on-a-sql-server-table/
'2. Use row_number() with partition option, runtime add row to each row and then delete duplicate row.
Removing duplicates using partition by SQL Server
--give group by field in partition.
;with cte(
select ROW_NUMBER() over( order by ForsNr, period partition ForsNr, period) RowNo , * from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
)
select RowNo from cte
group by ForsNr, period

Delete field Duplicates from the Same table

I am writing this query to display a bunch of Names from a table filled automatically from an outside source:
select MAX(UN_ID) as [ID] , MAX(UN_Name) from UnavailableNames group by (UN_Name)
I have a lot of name duplicates, so I used "Group by"
I want to delete all the duplicates right after I do this select query..
(Delete where the field UN_Name is available twice, leave it once)
Any way to do this?
Something likes this should work:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY UN_Name
ORDER BY UN_ID ASC), *
FROM dbo.UnavailableNames
)
DELETE FROM cte
WHERE rn > 1
You basically assign an increasing "row number" within each group that shares the same "un_name".
Then you just delete all rows which have a "row number" higher than 1 and keep all the ones that appeared first.
With CTE As
(
Select uid,ROW_NUMBER() OVER( PARTITION BY uname order by uid) as rownum
From yourTable
)
Delete
From yourTable
where uid in (select uid from CTE where rownum> 1 )