Delete rows where date was least updated - sql

How can I delete rows where dateupdated was least updated?
My table is:
Name   Dateupdated  ID     status
john   1/02/17      JHN1   A
john   1/03/17      JHN2   A
sally  1/02/17      SLLY1  A
sally  1/03/17      SLLY2  A
Mike   1/03/17      MK1    A
Mike   1/04/17      MK2    A
I want to be left with the following after the data removal:
Name   Date     ID     status
john   1/03/17  JHN2   A
sally  1/03/17  SLLY2  A
Mike   1/04/17  MK2    A

If you really want to "delete rows where dateupdated was least updated" then a simple single-row subquery should do the trick.
DELETE MyTable
WHERE Date = (SELECT MIN(Date) From MyTable)
If on the other hand you just want to delete the row with the earliest Date per person (as identified by their ID) you could use:
DELETE a
FROM MyTable a
JOIN (SELECT ID, MIN(Date) MinDate FROM MyTable GROUP BY ID) b
ON a.ID = b.ID AND a.Date = b.MinDate
The idea here is you create an aggregate query that returns rows containing the columns that would match the rows you want deleted, then join to it. Because it's an inner join, rows that do not match the criteria will be excluded.
If people are uniquely identified by something else (e.g. Name), then you can just substitute that for the ID in my example above.
I am thinking though that you don't want either of these. I think you want to delete everything except for each person's latest row. If that is the case, try this:
DELETE MyTable
WHERE EXISTS (SELECT 0 FROM MyTable b WHERE b.ID = MyTable.ID AND b.Date > MyTable.Date)
The idea here is you check for existence of another data row with the same ID and a later date. If there is a later record, delete this one.
The nice thing about the last example is you can run it over and over and every person will still be left with exactly one row. The other two queries, if run over and over, will nibble away at the table until it is empty.
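The difference in repeatability can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so the dialect is an assumption. It also assumes ISO-formatted date strings (so that text comparison orders them correctly) and uses Name as the per-person key, because every ID in the sample data is unique. The helper names fresh_table and ids_after are made up for this illustration.

```python
import sqlite3

ROWS = [
    ("john", "2017-01-02", "JHN1", "A"),
    ("john", "2017-01-03", "JHN2", "A"),
    ("sally", "2017-01-02", "SLLY1", "A"),
    ("sally", "2017-01-03", "SLLY2", "A"),
    ("Mike", "2017-01-03", "MK1", "A"),
    ("Mike", "2017-01-04", "MK2", "A"),
]

def fresh_table():
    """Build an in-memory copy of the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (Name TEXT, Dateupdated TEXT, ID TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)
    return conn

# Delete every row for which a later row with the same Name exists.
KEEP_LATEST = """
DELETE FROM MyTable
WHERE EXISTS (SELECT 1 FROM MyTable b
              WHERE b.Name = MyTable.Name
                AND b.Dateupdated > MyTable.Dateupdated)
"""

# Delete the rows that carry the overall earliest date.
DELETE_EARLIEST = """
DELETE FROM MyTable
WHERE Dateupdated = (SELECT MIN(Dateupdated) FROM MyTable)
"""

def ids_after(statement, runs):
    """Run the statement `runs` times on a fresh table and return the surviving IDs."""
    conn = fresh_table()
    for _ in range(runs):
        conn.execute(statement)
    return sorted(row[0] for row in conn.execute("SELECT ID FROM MyTable"))

print(ids_after(KEEP_LATEST, 3))
print(ids_after(DELETE_EARLIEST, 3))
```

On this data, three runs of KEEP_LATEST leave the same three rows as a single run, while three runs of DELETE_EARLIEST empty the table.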
P.S. As these are significantly different solutions, I suggest you spend some effort learning how to articulate unambiguous requirements. This is an extremely important skill for any developer.

This deletes rows where the name is a duplicate, and deletes all but the latest row for each name. This is different from your stated question.
Using a common table expression (cte) and row_number():
;with cte as (
    select *
         , rn = row_number() over (
               partition by Name
               order by Dateupdated desc
           )
    from t
)
/* ------------------------------------------------
-- Remove duplicates by deleting rows
-- where the row number (rn) is greater than 1
-- leaving the first row for each partition
------------------------------------------------ */
delete
from cte
where cte.rn > 1;
select * from t;
rextester: http://rextester.com/HZBQ50469
returns:
+-------+-------------+-------+--------+
| Name | Dateupdated | ID | status |
+-------+-------------+-------+--------+
| john | 2017-01-03 | JHN2 | A |
| sally | 2017-01-03 | SLLY2 | A |
| Mike | 2017-01-04 | MK2 | A |
+-------+-------------+-------+--------+
Without using the cte it can be written as:
delete d
from (
    select *
         , rn = row_number() over (
               partition by Name
               order by Dateupdated desc
           )
    from t
) as d
where d.rn > 1
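To see what the row numbers look like before anything is deleted, here is a rough sketch using Python's sqlite3 module. This is an assumption-laden port, not the SQL Server code above: SQLite (3.25 or later for window functions) cannot delete through a CTE or derived table, so the sketch deletes by rowid instead, and the dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Name TEXT, Dateupdated TEXT, ID TEXT, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("john", "2017-01-02", "JHN1", "A"),
    ("john", "2017-01-03", "JHN2", "A"),
    ("sally", "2017-01-02", "SLLY1", "A"),
    ("sally", "2017-01-03", "SLLY2", "A"),
    ("Mike", "2017-01-03", "MK1", "A"),
    ("Mike", "2017-01-04", "MK2", "A"),
])

# Number the rows of each Name from newest (rn = 1) to oldest.
ranking = dict(conn.execute("""
SELECT ID,
       row_number() OVER (PARTITION BY Name ORDER BY Dateupdated DESC) AS rn
FROM t
"""))

# SQLite stand-in for "delete from cte where rn > 1": delete by rowid.
conn.execute("""
DELETE FROM t
WHERE rowid IN (
    SELECT rid
    FROM (SELECT rowid AS rid,
                 row_number() OVER (PARTITION BY Name ORDER BY Dateupdated DESC) AS rn
          FROM t)
    WHERE rn > 1
)
""")
remaining = sorted(row[0] for row in conn.execute("SELECT ID FROM t"))
print(ranking)
print(remaining)
```

Each name's newest row gets rn = 1 and survives; everything numbered 2 or higher is removed.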

This should do the trick:
delete a
from MyTable a
where not exists (
    select 1
    from MyTable b
    where b.name = a.name
    and b.DateUpdated < a.DateUpdated
)
i.e. delete every row for which no other row with the same name has an earlier date, which removes the earliest row for each name.
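One consequence of this condition is worth checking: a name that has only a single row has no earlier row either, so that row is deleted as well. The sketch below illustrates this with Python's sqlite3 module (an assumed stand-in for SQL Server, with ISO date strings); the person "ann" is a hypothetical addition that is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT, DateUpdated TEXT, ID TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("john", "2017-01-02", "JHN1"),
    ("john", "2017-01-03", "JHN2"),
    ("Mike", "2017-01-03", "MK1"),
    ("Mike", "2017-01-04", "MK2"),
    ("ann", "2017-01-05", "ANN1"),  # hypothetical person with a single row
])

# Delete each row that has no earlier row for the same name.
conn.execute("""
DELETE FROM MyTable
WHERE NOT EXISTS (SELECT 1 FROM MyTable b
                  WHERE b.Name = MyTable.Name
                    AND b.DateUpdated < MyTable.DateUpdated)
""")
remaining = sorted(row[0] for row in conn.execute("SELECT ID FROM MyTable"))
print(remaining)
```

The earliest row of john and Mike is gone, and so is ann's only row.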

Your Name column has Mike and Mik2, which are different from each other.
So, unless that is a mistake, the column to group by must be the ID column without its last digit.
I think the following is more accurate if it was not a mistake.
delete a
from MyTable a
inner join
(select substring(ID, 1, len(ID) - 1) as ID, min(Dateupdated) as MinDate
from MyTable
group by substring(ID, 1, len(ID) - 1)
) b
on substring(a.ID, 1, len(a.ID) - 1) = b.ID and a.Dateupdated = b.MinDate
You can test it at SQLFiddle: http://sqlfiddle.com/#!6/9c440/1

Related

deleting specific duplicate and original entries in a table based on date

I have a table called "main" which has 4 columns: ID, name, DateID and Sign.
I want to create a query that deletes entries from this table if the same ID appears twice within a certain DateID range.
I have my where clause that searches the previous 3 weeks:
where DateID =((SELECT MAX( DateID)
WHERE DateID < ( SELECT MAX( DateID )-3))
For example, this is the dataset I am working with:
id     name   DateID  sign
12345  Paul   1915    Up
23658  Danny  1915    Down
37868  Jake   1916    Up
37542  Elle   1917    Up
12345  Paul   1917    Down
87456  John   1918    Up
78563  Luke   1919    Up
23658  Danny  1920    Up
In the case above, both entries for ID 12345 would need to be removed.
However, the entries for ID 23658 would need to be kept, as the difference in DateID is greater than 3.
How would this be possible?
You can use window functions for this.
It's not quite clear, but it seems LAG and conditional COUNT should fit what you need.
DELETE t
FROM (
SELECT *,
CountWithinDate = COUNT(CASE WHEN t.PrevDate >= t.DateId - 3 THEN 1 END) OVER (PARTITION BY t.id)
FROM (
SELECT *,
PrevDate = LAG(t.DateID) OVER (PARTITION BY t.id ORDER BY t.DateID)
FROM YourTable t
) t
) t
WHERE CountWithinDate > 0;
Note that you do not need to re-join the table, you can delete directly from the t derived table.
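The LAG idea can be tried on the question's data with a sketch like the following. It is written against Python's sqlite3 module (SQLite 3.25 or later), which is an assumption; SQLite cannot delete from a derived table, so the conditional COUNT is replaced here by an IN list of the ids that have at least one row within 3 of the previous one. The table name YourTable follows the answer; CLOSE_IDS is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, name TEXT, DateID INTEGER, sign TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", [
    (12345, "Paul", 1915, "Up"),
    (23658, "Danny", 1915, "Down"),
    (37868, "Jake", 1916, "Up"),
    (37542, "Elle", 1917, "Up"),
    (12345, "Paul", 1917, "Down"),
    (87456, "John", 1918, "Up"),
    (78563, "Luke", 1919, "Up"),
    (23658, "Danny", 1920, "Up"),
])

# Ids with at least one row whose previous DateID lies within 3 of it.
# For the first row of an id PrevDate is NULL, so the comparison filters it out.
CLOSE_IDS = """
SELECT id
FROM (SELECT id, DateID,
             LAG(DateID) OVER (PARTITION BY id ORDER BY DateID) AS PrevDate
      FROM YourTable)
WHERE PrevDate >= DateID - 3
"""
close_ids = sorted({row[0] for row in conn.execute(CLOSE_IDS)})

# Remove every row belonging to one of those ids.
conn.execute("DELETE FROM YourTable WHERE id IN (" + CLOSE_IDS + ")")
remaining = sorted(conn.execute("SELECT id, DateID FROM YourTable").fetchall())
print(close_ids)
print(remaining)
```

Only id 12345 qualifies (1915 and 1917 are 2 apart), so both of its rows go; the two rows for 23658 are 5 apart and stay.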
Hope this works:
DELETE FROM test_tbl
WHERE id IN (
SELECT T1.id
FROM test_tbl T1
WHERE EXISTS (SELECT 1 FROM test_tbl T2 WHERE T1.id = T2.id AND ABS(T2.dateid - T1.dateid) < 3 AND T1.dateid <> T2.dateid)
)
In case you need more logic for data processing, I would suggest using a stored procedure.

update in oracle sql : multiple rows in 1 table

I am new to SQL and I am no good with more advanced queries and functions.
So, I have this 1 table with sales:
id date seller_name buyer_name
---- ------------ ------------- ------------
1 2015-02-02 null Adrian
1 2013-05-02 null John B
1 2007-11-15 null Chris F
2 2014-07-12 null Jane A
2 2011-06-05 null Ted D
2 2010-08-22 null Maryanne A
3 2015-12-02 null Don P
3 2012-11-07 null Chris T
3 2011-10-02 null James O
I would like to update the seller_name for each id by putting the buyer_name from the previous sale as the seller_name of the next, newer sale. For example, for id 1, John B would then be the seller on 2015-02-02 and the buyer on 2013-05-02. Does that make sense?
P.S. This is the perfect case; the table is big and the ids are not ordered so neatly.
merge into your_table a
using ( select rowid rid,
lead(buyer_name, 1) over (partition by id order by date desc) seller
from your_table
) b
on (a.rowid = b.rid )
when matched then update set a.seller_name= b.seller;
Explanation: the MERGE INTO statement performs different operations depending on whether a row matches or not. Here you merge into your own table; the USING subquery supplies the new values together with the rowid, which serves as the matching key. The LEAD function returns a value from n rows ahead, where n is the number you specify after the comma. The OVER clause then says which rows to work on: here they are partitioned by id and ordered by date descending, so the next row is the previous sale and its buyer becomes the seller. Hope this clears it up a bit.
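To make the LEAD/LAG reasoning concrete, here is a hedged sketch in Python's sqlite3 module rather than Oracle. SQLite has no MERGE, so a correlated UPDATE keyed on rowid stands in for it; the column name sale_date and the trimmed data set are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER, sale_date TEXT, seller_name TEXT, buyer_name TEXT)"
)
# Rows are inserted out of order on purpose.
conn.executemany("INSERT INTO sales VALUES (?, ?, NULL, ?)", [
    (2, "2011-06-05", "Ted D"),
    (1, "2015-02-02", "Adrian"),
    (1, "2007-11-15", "Chris F"),
    (2, "2014-07-12", "Jane A"),
    (1, "2013-05-02", "John B"),
    (2, "2010-08-22", "Maryanne A"),
])

# LEAD over descending dates and LAG over ascending dates pick the same row.
pairs = conn.execute("""
SELECT LEAD(buyer_name) OVER (PARTITION BY id ORDER BY sale_date DESC),
       LAG(buyer_name)  OVER (PARTITION BY id ORDER BY sale_date ASC)
FROM sales
""").fetchall()

# Give each sale the buyer of the previous sale with the same id as its seller.
conn.execute("""
UPDATE sales
SET seller_name = (
    SELECT prev_buyer
    FROM (SELECT s.rowid AS rid,
                 LAG(s.buyer_name) OVER (PARTITION BY s.id ORDER BY s.sale_date) AS prev_buyer
          FROM sales AS s)
    WHERE rid = sales.rowid
)
""")
sellers = {(i, d): s for i, d, s in
           conn.execute("SELECT id, sale_date, seller_name FROM sales")}
print(sellers[(1, "2015-02-02")])
```

The oldest sale of each id has no previous buyer, so its seller_name stays NULL.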
Either of the queries below can be used to perform the desired action:
merge into sandeep24nov16_2 table1
using(select rowid r, lag(buyer_name) over (partition by id order by "DATE" asc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
or
merge into sandeep24nov16_2 table1
using(select rowid r, lead(buyer_name) over (partition by id order by "DATE" desc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
select a.*,
lag(buyer_name, 1) over(partition by id order by sale_date) seller_name
from <your_table> a;

Getting last record with a certain status

I am having trouble figuring out how to get the results that I need for this query.
I am looking for the last record for a dog that has a status of adopted. If the last record is returned, I don't want that record - only the adopted records.
If my table contains these rows:
ID NAME DATE STATUS WANT THIS ONE?
14 Fido 7/1/2014 Adopted Yes - last record for Fido that is Adopted
13 Elle 6/15/2014 Returned No - last record for Elle but not Adopted
12 Elle 6/1/2014 Adopted No - not the last record for Elle
11 Spot 5/30/14 Adopted Yes - last record for Spot that is Adopted
10 Spot 5/15/2014 Returned No - not Adopted
9 Spot 5/1/2014 Adopted No - not the last record for Spot
select * from (
select * ,
row_number() over (partition by name order by date desc) rn
from tbl
) t1 where t1.rn = 1
and status = 'Adopted'
or
select * from tbl t1
where status = 'Adopted'
and not exists (
select 1 from tbl t2
where t2.Name = t1.Name
and t2.Date > t1.Date
)
If you want to return the latest record for each dog, only where the latest status is 'Adopted':
select *
from tbl t
where date = (select max(x.date) from tbl x where x.name = t.name)
and status = 'Adopted'
Fiddle: http://sqlfiddle.com/#!6/e2cae/1/0
In this query if the latest record for a dog is anything other than 'Adopted', the dog will not be returned. This matches your desired output, based on the comments you've placed beside the table.
If you want to return the latest 'Adopted' record for each dog (if any):
select *
from tbl t
where date = (select max(x.date)
from tbl x
where x.name = t.name
and x.status = 'Adopted')
However both queries are vulnerable to mixing up 2 dogs who have the same name. You should have another table to uniquely identify the dogs that you can join into, and a unique DOG_ID field on this table that references that table.
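The difference between the two queries shows up on the question's own rows. The following sketch runs both in Python's sqlite3 module (an assumed substitute for SQL Server) with the dates rewritten as ISO strings; the constant names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, date TEXT, status TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    (14, "Fido", "2014-07-01", "Adopted"),
    (13, "Elle", "2014-06-15", "Returned"),
    (12, "Elle", "2014-06-01", "Adopted"),
    (11, "Spot", "2014-05-30", "Adopted"),
    (10, "Spot", "2014-05-15", "Returned"),
    (9, "Spot", "2014-05-01", "Adopted"),
])

# Latest record per dog, kept only when that record is 'Adopted'.
LATEST_IF_ADOPTED = """
SELECT id FROM tbl t
WHERE date = (SELECT MAX(x.date) FROM tbl x WHERE x.name = t.name)
  AND status = 'Adopted'
ORDER BY id
"""

# Latest 'Adopted' record per dog, whatever happened afterwards.
LATEST_ADOPTED = """
SELECT id FROM tbl t
WHERE date = (SELECT MAX(x.date) FROM tbl x
              WHERE x.name = t.name AND x.status = 'Adopted')
ORDER BY id
"""

latest_if_adopted = [row[0] for row in conn.execute(LATEST_IF_ADOPTED)]
latest_adopted = [row[0] for row in conn.execute(LATEST_ADOPTED)]
print(latest_if_adopted)
print(latest_adopted)
```

The first query returns ids 11 and 14, matching the "Yes" rows in the question; the second also returns id 12, Elle's adoption that was later reversed.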
For the data you showed in the question, it is pretty tricky. Let's start with the assumption that no record is dated before 1900, so that date can stand in for "never returned". Something like this should work:
select dog, maxAdoptedDate
from (
    select adopted.name dog
         , isnull(max(returned.date), '19000101') maxReturnedDate
         , max(adopted.date) maxAdoptedDate
    from yourTable adopted left join yourTable returned
         on adopted.name = returned.name
         and returned.status = 'Returned'
    where adopted.status = 'Adopted'
    and whatever
    group by adopted.name) temp
where maxAdoptedDate > maxReturnedDate
and whatever
The two whatevers should be the same. As mentioned in another answer, if two dogs have the same name, you are in trouble.

SQL - Search a table for all instances where a value is repeated

I'm looking to find a way to search a table for duplicate values and return those duplicates (or even just one of the set of duplicates) as the result set.
For instance, let's say I have these data:
uid | semi-unique id
1 | 12345
2 | 21345
3 | 54321
4 | 41235
5 | 12345
6 | 21345
I need to return either:
12345
12345
21345
21345
Or:
12345
21345
I've tried googling around and keep coming up short. Any help please?
To get each row, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by [semi-unique id]) as totcnt
from t
) t
where totcnt > 1
To get just one instance, try this:
select t.*
from (select t.*, count(*) over (partition by [semi-unique id]) as totcnt,
row_number() over (partition by [semi-unique id] order by (select NULL)
) as seqnum
from t
) t
where totcnt > 1 and seqnum = 1
The advantage of this approach is that you get all the columns, instead of just the id (if that helps).
Sorry, I was short on time earlier so I couldn't explain my answer. The first query groups the semi_unique_ids that are the same and only returns the ones that have a duplicate.
SELECT semi_unique_id
FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(semi_unique_id) > 1
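If you wanted to get the uid in the query too, adding uid to the GROUP BY would not work, because every group would then contain a single row. Filter on the duplicated ids instead:
SELECT uid,
       semi_unique_id
FROM your_table
WHERE semi_unique_id IN (
    SELECT semi_unique_id
    FROM your_table
    GROUP BY semi_unique_id
    HAVING COUNT(semi_unique_id) > 1
)
Lastly, if you would like to get an idea of how many duplicates there are for each row returned, you would do the following.
SELECT t.uid,
       t.semi_unique_id,
       d.unique_id_count
FROM your_table t
JOIN (
    SELECT semi_unique_id,
           COUNT(semi_unique_id) AS unique_id_count
    FROM your_table
    GROUP BY semi_unique_id
    HAVING COUNT(semi_unique_id) > 1
) d
ON d.semi_unique_id = t.semi_unique_id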
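The grouping approach can be checked against the question's six rows. This is a sketch in Python's sqlite3 module, assumed here as a stand-in for SQL Server; the helper name column is made up. It also shows why a unique column such as uid should not be added to the GROUP BY: every group then holds exactly one row and the HAVING filter matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (uid INTEGER, semi_unique_id INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [
    (1, 12345), (2, 21345), (3, 54321), (4, 41235), (5, 12345), (6, 21345),
])

def column(sql):
    """Return the first column of the query result as a list (hypothetical helper)."""
    return [row[0] for row in conn.execute(sql)]

# One row per duplicated value.
duplicated = column("""
SELECT semi_unique_id FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(*) > 1
ORDER BY semi_unique_id
""")

# uid is unique, so grouping by it as well leaves one row per group.
grouped_with_uid = column("""
SELECT uid FROM your_table
GROUP BY semi_unique_id, uid
HAVING COUNT(*) > 1
""")

# Every row whose value is duplicated, identified by uid.
all_duplicate_rows = column("""
SELECT uid FROM your_table
WHERE semi_unique_id IN (SELECT semi_unique_id FROM your_table
                         GROUP BY semi_unique_id HAVING COUNT(*) > 1)
ORDER BY uid
""")
print(duplicated, grouped_with_uid, all_duplicate_rows)
```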
Try this for SQL Server:
SELECT t.semi_unique_id AS i
FROM your_table t
GROUP BY
    t.semi_unique_id
HAVING (COUNT(t.semi_unique_id) > 1)

SQL select value if no corresponding value exists in another table

I have a database which tries to achieve point-in-time information by having a master table and a history table which records when fields in the master table will change or did change, e.g.
Table: Employee
Id | Name | Department
-----------------------------
0 | Alice | 1
1 | Bob | 1
Table: History
ChangeDate | Field | RowId | NewValue
---------------------------------------------
05/05/2009 | Department | 0 | 2
That records that employee 0 (Alice) will move to department 2 on 05/05/2009.
I want to write a query to determine the employee's department on a particular date. So it needs to:
Find the first history record for that field and employee before given date
If none exists then default to the value currently in the master employee table.
How can I do this? My intuition is to select the first row of a result set which has all suitable history records reverse ordered by date and with the value in the master table last (so it's only the first result if there are no suitable history records), but I don't have the required SQL-fu to achieve this.
Note: I am conscious that this may not be the best way to implement this system - I am not able to change this in the short term - though if you can suggest a better way to implement this I'd be glad to hear it.
SELECT COALESCE (
    (
    SELECT newValue
    FROM history
    WHERE field = 'Department'
    AND rowID = ID
    AND changeDate =
        (
        SELECT MAX(changedate)
        FROM history
        WHERE field = 'Department'
        AND rowID = ID
        AND changeDate <= '01/01/2009'
        )
    ), department)
FROM employee
WHERE id = @id
In both Oracle and MS SQL, you can also use this:
SELECT COALESCE(newValue, department)
FROM (
    SELECT e.*, h.*,
           ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY changeDate DESC) AS rn
    FROM employee e
    LEFT OUTER JOIN
         history h
    ON field = 'Department'
    AND rowID = ID
    AND changeDate <= '01/01/2009'
    WHERE e.id = @id
) q
WHERE rn = 1
Note, though, that ROWID is a reserved word in Oracle, so you'll need to rename this column when porting.
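The fallback logic (latest history record on or before the date, otherwise the master value) can be sketched as below. It uses Python's sqlite3 module, which is an assumption, and picks the latest record with ORDER BY ... LIMIT 1 instead of a MAX subquery. The column names change_date, row_id and new_value, the ISO dates and the helper department_on are all made up for the illustration; row_id in particular avoids SQLite's own built-in rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT, department INTEGER);
CREATE TABLE history (change_date TEXT, field TEXT, row_id INTEGER, new_value INTEGER);
INSERT INTO employee VALUES (0, 'Alice', 1), (1, 'Bob', 1);
INSERT INTO history VALUES ('2009-05-05', 'Department', 0, 2);
""")

def department_on(emp_id, as_of):
    """Department of emp_id on the ISO date as_of (hypothetical helper)."""
    row = conn.execute("""
        SELECT COALESCE(
            -- latest Department change on or before the date, if any
            (SELECT h.new_value
             FROM history h
             WHERE h.field = 'Department'
               AND h.row_id = e.id
               AND h.change_date <= :as_of
             ORDER BY h.change_date DESC
             LIMIT 1),
            -- otherwise fall back to the master table
            e.department)
        FROM employee e
        WHERE e.id = :emp_id
    """, {"as_of": as_of, "emp_id": emp_id}).fetchone()
    return row[0]

print(department_on(0, "2009-01-01"), department_on(0, "2009-06-01"))
```

Before 2009-05-05 Alice falls back to department 1 from the master table; from that date on the history record wins. Bob has no history, so he always gets the master value.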
This should work:
select iif(history.newvalue is null, employee.department, history.newvalue)
as Department
from employee left outer join history on history.RowId = employee.Id
and history.field = 'Department'
and history.changedate = (select max(h1.changedate) from history h1
where h1.RowId = history.RowId and h1.field = 'Department'
and h1.changedate < '2008-05-20') -- (i.e. the given date)