delete duplicate rows from table row number partition by - sql

delete duplicates per com_id:
I have to write a generic DELETE that will remove records that are valid to the end of the world (vald_to = 9999-01-01)
if:
tar_id = -1
vald_from = 0001-01-01
there is another record for this com_id with vald_to = 9999-01-01
The query is:
delete from C
where (COM_ID, VALD_TO) in
(
select
COM_ID,
VALD_TO,
row_number()
over
(partition by COM_ID order by VALD_TO DESC) dup
from C
where
tar_id=-1
and
vald_from = 0001-01-01
and
dup > 1
);
The script removes all records for the com_id

You could use rowid, this delete worked for me:
delete from c
where rowid in (
select rwd
from (
select rowid rwd,
row_number() over (partition by com_id order by null) dup
from c
where tar_id = -1
and vald_from = date '0001-01-01'
and vald_to = date '9999-01-01' )
where dup > 1 );
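The rowid approach above can be tried out locally. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle (an assumption: SQLite 3.25 or later for window functions; dates are stored as ISO text, and the window is ordered by rowid instead of null to make the surviving row deterministic). The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c (com_id INTEGER, tar_id INTEGER, vald_from TEXT, vald_to TEXT)"
)
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?, ?)",
    [
        (1, -1, "0001-01-01", "9999-01-01"),
        (1, -1, "0001-01-01", "9999-01-01"),  # duplicate of the row above
        (2, -1, "0001-01-01", "9999-01-01"),  # only one such row, must survive
        (2, 5, "2016-01-01", "2017-01-01"),   # does not match the filter, must survive
    ],
)

# Number the matching rows per com_id and delete every row after the first.
conn.execute(
    """
    DELETE FROM c
    WHERE rowid IN (
        SELECT rwd
        FROM (
            SELECT rowid AS rwd,
                   row_number() OVER (PARTITION BY com_id ORDER BY rowid) AS dup
            FROM c
            WHERE tar_id = -1
              AND vald_from = '0001-01-01'
              AND vald_to = '9999-01-01'
        )
        WHERE dup > 1
    )
    """
)

remaining = conn.execute(
    "SELECT com_id, tar_id FROM c ORDER BY com_id, tar_id"
).fetchall()
print(remaining)
```

Note that this only removes duplicates among the rows that match all three filter conditions; rows outside the filter are never touched.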

I don't really understand what you consider a duplicate you want removed. So just guessing: You are considering a record a duplicate, when its range is covered by another record. E.g.:
COM_ID vald_from vald_to
123 0001-01-01 9999-01-01
123 0001-01-01 2017-01-01
The second record is superfluous, because its range is only part of the larger range of the first record. But the same would be true for:
COM_ID vald_from vald_to
123 2016-01-01 2017-01-01
123 2016-02-01 2016-03-01
Again, the second record's range is only part of the first one's.
A query to remove those superfluous records would be:
delete from c
where exists
(
select *
from c other
where other.com_id = c.com_id
and other.vald_from <= c.vald_from
and other.vald_to >= c.vald_to
and other.rowid <> c.rowid
)
and tar_id = -1;
I don't know whether you want tar_id = -1 inside the subquery, too. You may also still want to restrict this to the dates date '0001-01-01' and date '9999-01-01'. If so, adjust the statement accordingly.
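To see which rows the range-coverage delete removes, here is a small sketch run against the two examples above. It uses Python's sqlite3 module rather than Oracle (an assumption), stores dates as ISO text so that string comparison matches date order, and uses com_id 456 for the second example purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c (com_id INTEGER, tar_id INTEGER, vald_from TEXT, vald_to TEXT)"
)
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?, ?)",
    [
        (123, -1, "0001-01-01", "9999-01-01"),
        (123, -1, "0001-01-01", "2017-01-01"),  # covered by the row above
        (456, -1, "2016-01-01", "2017-01-01"),
        (456, -1, "2016-02-01", "2016-03-01"),  # covered by the row above
    ],
)

# Delete every row whose range lies inside the range of another row
# with the same com_id.
conn.execute(
    """
    DELETE FROM c
    WHERE EXISTS (
        SELECT 1
        FROM c other
        WHERE other.com_id = c.com_id
          AND other.vald_from <= c.vald_from
          AND other.vald_to >= c.vald_to
          AND other.rowid <> c.rowid
    )
    AND tar_id = -1
    """
)

survivors = conn.execute(
    "SELECT com_id, vald_from, vald_to FROM c ORDER BY com_id"
).fetchall()
print(survivors)
```

One thing to keep in mind with this condition: two rows with exactly the same range cover each other, so both would be deleted.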

Related

SQL conditional join with subselect

I do not understand why this code isn't working. I have a join which needs a different join condition depending on the result of a subselect.
When the term date is null, I want this to come after the and operator: and vs.term_date is null
When the term date is not null, I want the subquery below to come after the and operator. I need to join on the max effective date: vs.eff_date = (subquery)
I realize this code is messy, but I have to aggregate with a subselect since there are multiple rows coming back and I only want one. (but open to other solutions)
select *
From loc
inner join VendorSite vs
on loc.Record_Number = vs.SITE_NO
and --conditional join
case
when
(
case
when --if term date is null i want this after "and"
(select top 1 TERM_DATE
from VendorSite vs2
where vs2.SITE_NO = vs.SITE_NO
order by TERM_DATE asc) is null
then 1
else 0
end
) = 1
then vs.term_date is null
else --when term_date isn't null use max eff date
vs.eff_date =
(select max(eff_date)
from VendorSite vs2
where vs2.SITE_NO = loc.Record_Number)
end
Example: From the below dataset, I would only want to see the row where term_date is null come back.
site_no eff_date term_date
13588 2007-01-01 00:00:00.000 NULL
13588 2007-03-01 00:00:00.000 2007-11-09 00:00:00.000
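One possible reading of this requirement is: join the open row (term_date is null) if the site has one, otherwise join the row with the latest eff_date. A CASE expression cannot return a condition, but the same logic can be written with OR and NOT EXISTS. The sketch below is only an illustration under that reading, using Python's sqlite3 module instead of SQL Server; site 20001 and its rows are hypothetical data added to exercise the second branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (Record_Number INTEGER)")
conn.execute("CREATE TABLE VendorSite (SITE_NO INTEGER, eff_date TEXT, term_date TEXT)")
conn.executemany("INSERT INTO loc VALUES (?)", [(13588,), (20001,)])
conn.executemany(
    "INSERT INTO VendorSite VALUES (?, ?, ?)",
    [
        (13588, "2007-01-01", None),
        (13588, "2007-03-01", "2007-11-09"),
        (20001, "2005-01-01", "2005-12-31"),  # hypothetical: site with no open row
        (20001, "2006-01-01", "2006-12-31"),
    ],
)

rows = conn.execute(
    """
    SELECT vs.SITE_NO, vs.eff_date, vs.term_date
    FROM loc
    INNER JOIN VendorSite vs
        ON loc.Record_Number = vs.SITE_NO
       AND (
            -- branch 1: the site has an open row, keep only that row
            vs.term_date IS NULL
            -- branch 2: no open row exists, fall back to the latest eff_date
            OR (
                NOT EXISTS (
                    SELECT 1 FROM VendorSite v2
                    WHERE v2.SITE_NO = vs.SITE_NO AND v2.term_date IS NULL
                )
                AND vs.eff_date = (
                    SELECT max(v3.eff_date) FROM VendorSite v3
                    WHERE v3.SITE_NO = vs.SITE_NO
                )
            )
       )
    ORDER BY vs.SITE_NO
    """
).fetchall()
print(rows)
```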

Delete rows where date was least updated

How can I delete rows where dateupdated was least updated ?
My table is
Name Dateupdated ID status
john 1/02/17 JHN1 A
john 1/03/17 JHN2 A
sally 1/02/17 SLLY1 A
sally 1/03/17 SLLY2 A
Mike 1/03/17 MK1 A
Mike 1/04/17 MK2 A
I want to be left with the following after the data removal:
Name Date ID status
john 1/03/17 JHN2 A
sally 1/03/17 SLLY2 A
Mike 1/04/17 MK2 A
If you really want to "delete rows where dateupdated was least updated" then a simple single-row subquery should do the trick.
DELETE MyTable
WHERE Date = (SELECT MIN(Date) From MyTable)
If on the other hand you just want to delete the row with the earliest Date per person (as identified by their ID) you could use:
DELETE MyTable
FROM MyTable a
JOIN (SELECT ID, MIN(Date) MinDate FROM MyTable GROUP BY ID) b
ON a.ID = b.ID AND a.Date = b.MinDate
The idea here is you create an aggregate query that returns rows containing the columns that would match the rows you want deleted, then join to it. Because it's an inner join, rows that do not match the criteria will be excluded.
If people are uniquely identified by something else (e.g. Name), then you can just substitute that for the ID in my example above.
I am thinking though that you don't want either of these. I think you want to delete everything except for each person's latest row. If that is the case, try this:
DELETE MyTable
WHERE EXISTS (SELECT 0 FROM MyTable b WHERE b.ID = MyTable.ID AND b.Date > MyTable.Date)
The idea here is you check for existence of another data row with the same ID and a later date. If there is a later record, delete this one.
The nice thing about the last example is you can run it over and over and every person will still be left with exactly one row. The other two queries, if run over and over, will nibble away at the table until it is empty.
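The difference between the repeatable delete and the "nibbling" one can be checked with a small experiment. This is a sketch using Python's sqlite3 module rather than SQL Server (an assumption), with the sample rows from the question, dates rewritten as ISO text, and Name used as the person key because the IDs in the sample are unique per row.

```python
import sqlite3

ROWS = [
    ("john", "2017-01-02", "JHN1", "A"),
    ("john", "2017-01-03", "JHN2", "A"),
    ("sally", "2017-01-02", "SLLY1", "A"),
    ("sally", "2017-01-03", "SLLY2", "A"),
    ("Mike", "2017-01-03", "MK1", "A"),
    ("Mike", "2017-01-04", "MK2", "A"),
]

# Delete a row only if a later row exists for the same person.
KEEP_LATEST = """
    DELETE FROM MyTable
    WHERE EXISTS (
        SELECT 0 FROM MyTable b
        WHERE b.Name = MyTable.Name AND b.Dateupdated > MyTable.Dateupdated
    )
"""

# Delete whatever currently carries the earliest date in the whole table.
DELETE_EARLIEST = """
    DELETE FROM MyTable
    WHERE Dateupdated = (SELECT MIN(Dateupdated) FROM MyTable)
"""


def row_counts_after_each_run(sql, times):
    """Run the same DELETE several times on a fresh copy and record the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (Name TEXT, Dateupdated TEXT, ID TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)
    counts = []
    for _ in range(times):
        conn.execute(sql)
        counts.append(conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0])
    return counts


keep_latest_counts = row_counts_after_each_run(KEEP_LATEST, 3)
delete_earliest_counts = row_counts_after_each_run(DELETE_EARLIEST, 3)
print(keep_latest_counts, delete_earliest_counts)
```

The first list stays at three rows no matter how often the statement runs, while the second shrinks with every run until the table is empty.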
P.S. As these are significantly different solutions, I suggest you spend some effort learning how to articulate unambiguous requirements. This is an extremely important skill for any developer.
This deletes rows where the name is a duplicate, and deletes all but the latest row for each name. This is different from your stated question.
Using a common table expression (cte) and row_number():
;with cte as (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
)
/* ------------------------------------------------
-- Remove duplicates by deleting rows
-- where the row number (rn) is greater than 1
-- leaving the first row for each partition
------------------------------------------------ */
delete
from cte
where cte.rn > 1
select * from t
rextester: http://rextester.com/HZBQ50469
returns:
+-------+-------------+-------+--------+
| Name | Dateupdated | ID | status |
+-------+-------------+-------+--------+
| john | 2017-01-03 | JHN2 | A |
| sally | 2017-01-03 | SLLY2 | A |
| Mike | 2017-01-04 | MK2 | A |
+-------+-------------+-------+--------+
Without using the cte it can be written as:
delete d
from (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
) as d
where d.rn > 1
This should do the trick:
delete a
from MyTable a
where not exists (
select top 1 1
from MyTable b
where b.name = a.name
and b.DateUpdated < a.DateUpdated
)
That is, remove every row for which no other row with the same name has an earlier date, i.e. the earliest row for each name.
Your Name column contains both Mike and Mik2, which are different values.
So, unless that is a typo, the column to group by has to be the ID column without its last digit.
If it was not a mistake, I think the following is more accurate:
delete a
from MyTable a
inner join
(select substring(ID, 1, len(ID) - 1) as ID, min(Dateupdated) as MinDate
from MyTable
group by substring(ID, 1, len(ID) - 1)
) b
on substring(a.ID, 1, len(a.ID) - 1) = b.ID and a.Dateupdated = b.MinDate
You can test it at SQLFiddle: http://sqlfiddle.com/#!6/9c440/1

update in oracle sql : multiple rows in 1 table

I am new to SQL and I am no good with more advanced queries and functions.
So, I have this 1 table with sales:
id date seller_name buyer_name
---- ------------ ------------- ------------
1 2015-02-02 null Adrian
1 2013-05-02 null John B
1 2007-11-15 null Chris F
2 2014-07-12 null Jane A
2 2011-06-05 null Ted D
2 2010-08-22 null Maryanne A
3 2015-12-02 null Don P
3 2012-11-07 null Chris T
3 2011-10-02 null James O
I would like to update the seller_name for each id by putting the buyer_name from the previous sale in as the seller_name of the newer sale. For example, for id 1, John B would then be the seller on 2015-02-02 and the buyer on 2013-05-02. Does that make sense?
P.S. This is the ideal case; the real table is big and the ids are not ordered so neatly.
merge into your_table a
using ( select rowid rid,
lead(buyer_name, 1) over (partition by id order by date desc) seller
from your_table
) b
on (a.rowid = b.rid )
when matched then update set a.seller_name= b.seller;
Explanation: MERGE INTO performs different operations depending on whether a row matches or not. Here you merge into your table; the USING subquery supplies the new values together with the rowid, which serves as the matching key. The lead function returns the value from the row n rows ahead, where n is the number you specify after the comma. You also specify the window to work on, which in your case is partitioned by id and ordered by date descending, so the next row is the previous sale and its buyer becomes the seller. Hope this clears it up a bit.
Either of the queries below can be used to perform the desired action:
merge into sandeep24nov16_2 table1
using(select rowid r, lag(buyer_name) over (partition by id order by "DATE" asc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
or
merge into sandeep24nov16_2 table1
using(select rowid r, lead(buyer_name) over (partition by id order by "DATE" desc) update_value from sandeep24nov16_2 ) table2
on (table1.rowid=table2.r)
when matched then update set table1.seller_name=table2.update_value;
select a.*,
lag(buyer_name, 1) over(partition by id order by sale_date) seller_name
from <your_table> a;
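The lag()/lead() idea from the answers above can be checked on a subset of the sample data. The sketch below uses Python's sqlite3 module instead of Oracle (an assumption; SQLite has no MERGE, so the window query is read first and then applied with a plain UPDATE keyed on rowid). The column name sale_date is used instead of date, as in the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER, sale_date TEXT, seller_name TEXT, buyer_name TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "2015-02-02", None, "Adrian"),
        (1, "2013-05-02", None, "John B"),
        (1, "2007-11-15", None, "Chris F"),
        (2, "2014-07-12", None, "Jane A"),
        (2, "2011-06-05", None, "Ted D"),
        (2, "2010-08-22", None, "Maryanne A"),
    ],
)

# For every row, look up the buyer of the previous sale with the same id.
previous_buyers = conn.execute(
    """
    SELECT lag(buyer_name) OVER (PARTITION BY id ORDER BY sale_date) AS seller,
           rowid
    FROM sales
    """
).fetchall()

# Write the looked-up value back, using rowid as the matching key.
conn.executemany("UPDATE sales SET seller_name = ? WHERE rowid = ?", previous_buyers)

result = conn.execute(
    "SELECT id, sale_date, seller_name, buyer_name FROM sales "
    "ORDER BY id, sale_date DESC"
).fetchall()
for row in result:
    print(row)
```

The oldest sale of each id keeps a null seller_name, because there is no earlier buyer to copy from.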

Select only Contiguous Records in DB2 SQL

So I have a table of readings (heavily simplified version below). Sometimes there is a break in the reading history (see the records I have flagged as N). The 'From Read' should always match a previous 'To Read', or the 'To Read' should always match a later 'From Read', BUT I want to select records only as far back as the first 'break' in the reads.
How would I write a query in DB2 SQL to return only the rows flagged with a 'Y'?
EDIT: The contiguous flag is something I have added manually to represent the records I would like to select; it does not exist on the table.
ID From To Contiguous
ABC 01/01/2014 30/06/2014 Y
ABC 01/06/2013 01/01/2014 Y
ABC 01/05/2013 01/06/2013 Y
ABC 01/01/2013 01/02/2013 N
ABC 01/10/2012 01/01/2013 N
You will need a recursive select, something like this:
WITH RECURSIVE
contiguous_intervals(start, end) AS (
select start, end
from intervals
where end = (select max(end) from intervals)
UNION ALL
select i.start, i.end
from contiguous_intervals m, intervals i
where i.end = m.start
)
select * from contiguous_intervals;
You can do this with lead(), lag(). I'm not sure what the exact logic is for your case, but I think it is something like:
select r.*,
(case when (prev_to = from or prev_to is null) and
(next_from = to or next_from is null)
then 'Y'
else 'N'
end) as Contiguous
from (select r.*, lead(from) over (partition by id order by from) as next_from,
lag(to) over (partition by id order by to) as prev_to
from readings r
) r;
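The recursive approach can be tried on the sample data from the question. This is a sketch using Python's sqlite3 module rather than DB2 (an assumption), with hypothetical column names read_from and read_to to avoid reserved words, and dates rewritten as ISO text. It starts from the latest reading per ID and walks backwards while a row's read_to matches the read_from already reached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id TEXT, read_from TEXT, read_to TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("ABC", "2014-01-01", "2014-06-30"),
        ("ABC", "2013-06-01", "2014-01-01"),
        ("ABC", "2013-05-01", "2013-06-01"),
        ("ABC", "2013-01-01", "2013-02-01"),  # gap before 2013-05-01
        ("ABC", "2012-10-01", "2013-01-01"),
    ],
)

contiguous = conn.execute(
    """
    WITH RECURSIVE chain(id, read_from, read_to) AS (
        -- anchor: the most recent reading of each id
        SELECT r.id, r.read_from, r.read_to
        FROM readings r
        WHERE r.read_to = (SELECT max(read_to) FROM readings WHERE id = r.id)
        UNION ALL
        -- step: the reading that ends exactly where the chain currently starts
        SELECT r.id, r.read_from, r.read_to
        FROM chain c
        JOIN readings r ON r.id = c.id AND r.read_to = c.read_from
    )
    SELECT id, read_from, read_to FROM chain ORDER BY read_from DESC
    """
).fetchall()
for row in contiguous:
    print(row)
```

The walk stops at the first break, so the two oldest rows are never reached even though they are contiguous with each other.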

Retrieve/update rows with a minimal deviation in a certain column value

I have a database table with one column being dates. However, some of the rows should share the same date but due to lag on insertion there's a one second difference between them. The insert part has been fixed already but the current data in the table needs to be fixed as well.
As an example the following data is present:
2008-10-08 12:23:01 1 1 x
2008-10-08 12:23:01 1 2 y
2008-10-08 12:23:02 1 3 z
Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.
The best way I can think of is writing an external script to do that. It's tricky to determine which columns are correct and which should be updated without having more control over the grouping. Pseudo-code:
all_rows = SELECT * FROM table ORDER BY date
last_date = NULL
rows_to_update = []
for row in all_rows:
    if last_date is NULL or row.date - last_date > X seconds:
        set date to last_date for all rows from rows_to_update
        last_date = row.date
        rows_to_update = []
    else if row.date != last_date:
        rows_to_update += row
# flush the final group, which the loop above never writes back
set date to last_date for all rows from rows_to_update
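As a runnable illustration of the grouping idea in the pseudo-code, here is a small Python sketch. It only computes which timestamp each existing timestamp should be changed to; writing the result back to the database is left out. The function name and the one-second default tolerance are assumptions made for this example.

```python
from datetime import datetime, timedelta


def snap_to_group_start(timestamps, max_gap=timedelta(seconds=1)):
    """Map each timestamp to the first timestamp of the group it belongs to.

    A new group starts whenever a timestamp is more than max_gap later
    than the start of the current group.
    """
    mapping = {}
    group_start = None
    for ts in sorted(set(timestamps)):
        if group_start is None or ts - group_start > max_gap:
            group_start = ts  # this timestamp opens a new group
        mapping[ts] = group_start
    return mapping


stamps = [
    datetime(2008, 10, 8, 12, 23, 1),
    datetime(2008, 10, 8, 12, 23, 1),
    datetime(2008, 10, 8, 12, 23, 2),  # one second late, belongs to the group above
    datetime(2008, 10, 8, 12, 23, 5),  # far enough away to start a new group
]
corrections = snap_to_group_start(stamps)
for old, new in sorted(corrections.items()):
    print(old, "->", new)
```

Only the entries where the old and new value differ would need an UPDATE.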
Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.
UPDATE
tbl t,
(SELECT
t.date,
(SELECT min(date)
FROM tbl
WHERE timestampdiff(SECOND,date,t.date) BETWEEN 1 AND 3) AS new_date
FROM tbl t) t2
SET t.date=t2.new_date
WHERE t.date=t2.date AND t2.new_date IS NOT NULL
For all rows:
update yourtable set date_added=date_added-'01';
For a specific row, add a where clause.
"due to lag in insertion"
Why don't you get the date for insert before inserting/updating the first row and use that for all the other rows?
Assuming you have this structure:
create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')
This query will only show the last item of each set that's 1 second late:
select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
Using that in conjunction with an UPDATE statement, you get this:
update tbl set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
And that updates the last record of each set to the date above it, giving the results:
1 2009-10-08 12:23:01.000
2 2009-10-08 12:23:01.000
3 2009-10-08 12:23:01.000
4 2009-10-08 12:23:05.000
5 2009-10-08 12:23:05.000
6 2009-10-08 12:23:05.000
It's quick and dirty and unoptimized, but for a once-off data scrub it should work.
Remember to back up!