Retrieve/update rows with a minimal deviation in a certain column value - sql

I have a database table with one column being dates. However, some of the rows should share the same date but due to lag on insertion there's a one second difference between them. The insert part has been fixed already but the current data in the table needs to be fixed as well.
As an example the following data is present:
2008-10-08 12:23:01 1 1 x
2008-10-08 12:23:01 1 2 y
2008-10-08 12:23:02 1 3 z
Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.

The best way I can think of is writing an external script to do that. It's tricky to determine which rows are correct and which should be updated without having more control over the grouping. Pseudo-code:
all_rows = SELECT * FROM table ORDER BY date
last_date = NULL
rows_to_update = []
for row in all_rows:
    if last_date is NULL or row.date - last_date > X seconds:
        set date to last_date for all rows from rows_to_update
        last_date = row.date
        rows_to_update = []
    else if row.date != last_date:
        rows_to_update += row
set date to last_date for all rows from rows_to_update
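The pseudo-code above could be sketched in Python roughly as follows. This is only an illustration: the `(row_id, date)` tuple layout, the function name `normalize_dates` and the `max_gap` parameter are assumptions, not part of the original table.

```python
from datetime import datetime, timedelta

def normalize_dates(rows, max_gap=timedelta(seconds=1)):
    """Return (row_id, new_date) pairs for rows that should be snapped back
    to the first date of their group.

    rows: iterable of (row_id, date) tuples (hypothetical layout), any order.
    A row belongs to the current group while it is at most max_gap later
    than the group's first date; otherwise it opens a new group.
    """
    updates = []
    group_start = None
    for row_id, date in sorted(rows, key=lambda r: r[1]):
        if group_start is None or date - group_start > max_gap:
            group_start = date                      # this row opens a new group
        elif date != group_start:
            updates.append((row_id, group_start))   # late row: use the group start
    return updates

rows = [
    (1, datetime(2008, 10, 8, 12, 23, 1)),
    (2, datetime(2008, 10, 8, 12, 23, 1)),
    (3, datetime(2008, 10, 8, 12, 23, 2)),
]
print(normalize_dates(rows))
```

Each returned pair can then be turned into an `UPDATE ... WHERE` on whatever key identifies the row.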
Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.
UPDATE
tbl t,
(SELECT
t.date,
(SELECT min(date)
FROM tbl
WHERE timestampdiff(SECOND,date,t.date) BETWEEN 1 AND 3) AS new_date
FROM tbl t) t2
SET t.date=t2.new_date
WHERE t.date=t2.date AND t2.new_date IS NOT NULL

For all rows:
update yourtable set date_added = date_added - interval 1 second;
For a specific row, add a WHERE clause.

due to lag in insertion
Why don't you get the date for insert before inserting/updating the first row and use that for all the other rows?
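As a rough sketch of that idea, the timestamp can be read once and reused for every row in the batch. The example uses Python's `sqlite3` with an in-memory table; the table layout and the helper name `insert_batch` are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime

def insert_batch(conn, values):
    """Insert all values with one shared timestamp, read before the first insert."""
    batch_time = datetime.now().replace(microsecond=0).isoformat(sep=" ")
    conn.executemany(
        "INSERT INTO tbl (date, value) VALUES (?, ?)",
        [(batch_time, v) for v in values],
    )
    conn.commit()
    return batch_time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date TEXT, value TEXT)")  # hypothetical layout
stamp = insert_batch(conn, ["x", "y", "z"])

# Every row of the batch carries the same date, however long the inserts took.
distinct_dates = conn.execute("SELECT DISTINCT date FROM tbl").fetchall()
print(distinct_dates)
```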

Assuming you have this structure:
create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')
This query will only show the last item of each set that's 1 second late:
select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
Using that in conjunction with an UPDATE statement, you get this:
update tbl set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
And that updates the last record of each set to the date above it, giving the results:
1 2009-10-08 12:23:01.000
2 2009-10-08 12:23:01.000
3 2009-10-08 12:23:01.000
4 2009-10-08 12:23:05.000
5 2009-10-08 12:23:05.000
6 2009-10-08 12:23:05.000
It's quick and dirty and unoptimized, but for a one-off data scrub it should work.
Remember to back up!

Related

SQL - for each entry in a table - check for associated row

I have a log table which logs a start row, and a finish row for a particular event.
Each event should have a start row, and if everything goes ok it should have an end row.
But if something goes wrong then the end row may not be created.
I want to SELECT everything in the table that has a start row but not an associated end row.
For example, consider the table like this:
id event_id event_status
1 123 1
2 123 2
3 234 1
4 234 2
5 456 1
6 678 1
7 678 2
Notice that the row with id 5 has a start row but no end row. Start is an event_status of 1, end is an event_status of 2.
How can I pull back all the event_ids which have a start row but not an end row?
This is for mssql.
You could use a not exists subquery to demand that no other row exists that ends the event:
select *
from YourTable t1
where t1.event_status = 1
and not exists
(
select *
from YourTable t2
where t2.event_id = t1.event_id
and t2.event_status = 2
)
You can try a left self join, as below:
select y1.event_id from #yourevents y1 left join #yourevents y2
on y1.event_id = y2.event_id
and y1.event_status = 1
and y2.event_status = 2
where y2.event_id is null
and y1.event_status = 1
In this particular case you could use one of 3 solutions:
Solution 1. The classic
Check if there is no end status
SELECT *
FROM myTable t1
WHERE NOT EXISTS (
SELECT *
FROM myTable t2
WHERE t1.event_id = t2.event_id AND t2.event_status = 2
)
Solution 2. Make it pretty. Don't do subqueries with so many parentheses
The same check, but in a more concise and pretty manner
SELECT t1.*
FROM myTable t1
LEFT JOIN myTable t2 ON t1.event_id = t2.event_id AND t2.event_status = 2
-- Doesn't exist
WHERE t2.event_id IS NULL
Solution 3. Look for the last status for each event
More flexibility in case the status logic becomes more complicated
WITH last_status AS (
SELECT
id,
event_id,
event_status,
-- The explicit ROWS BETWEEN ... UNBOUNDED FOLLOWING frame is needed: with the default frame, last_value only sees rows up to the current one (and its peers).
last_value(event_status) OVER (PARTITION BY event_id ORDER BY event_status ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_status
FROM myTable
)
SELECT
id,
event_id,
event_status
FROM last_status
WHERE last_status<>2
There are more, with min/max queries and others. Pick what best suits your need for cleanliness, readability and versatility.
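To check that the NOT EXISTS and LEFT JOIN variants agree on the sample data from the question, here is a small sketch using Python's `sqlite3`. The table name `events` is an assumption, and SQLite stands in for MSSQL here, so treat it as an illustration of the logic rather than of T-SQL specifics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_id INTEGER, event_status INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 123, 1), (2, 123, 2), (3, 234, 1), (4, 234, 2),
     (5, 456, 1), (6, 678, 1), (7, 678, 2)],
)

# Variant 1: start rows for which no end row exists.
not_exists_sql = """
SELECT t1.event_id
FROM events t1
WHERE t1.event_status = 1
  AND NOT EXISTS (SELECT 1 FROM events t2
                  WHERE t2.event_id = t1.event_id AND t2.event_status = 2)
"""

# Variant 2: left join to the end rows and keep the start rows without a match.
left_join_sql = """
SELECT t1.event_id
FROM events t1
LEFT JOIN events t2
       ON t2.event_id = t1.event_id AND t2.event_status = 2
WHERE t1.event_status = 1 AND t2.event_id IS NULL
"""

unfinished_a = [row[0] for row in conn.execute(not_exists_sql)]
unfinished_b = [row[0] for row in conn.execute(left_join_sql)]
print(unfinished_a, unfinished_b)  # [456] [456]
```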

Update based on order

Is it possible to update data based on a priority defined in a column?
I have input data like this
id Start_date active_flag
1 21-03-2013 N
1 23-03-2013 N
1 22-02-2013 N
1 20-02-2013 N
We have to maintain SC2 in our data and keep the row with the latest date (i.e. 23-03-2013 here) as active in our database.
We will be getting files daily, but in some cases we can get files with combined data for 2 days. I have to make sure that all the history is maintained and that the row with the latest date is active.
My target data will look like
id Start_date active_flag
1 21-03-2013 N
1 23-03-2013 Y
1 22-02-2013 N
1 20-02-2013 N
How can I write an update that sets the flag for each id based on the order of Start_date?
Thanks in advance
CREATE TABLE #tst(id int,start_data datetime,active_flage varchar(2))
insert into #tst
SELECT 1,'2013-03-21','' UNION
SELECT 1,'2013-03-23','' UNION
SELECT 1,'2013-03-20','' UNION
SELECT 1,'2013-03-19','' UNION
SELECT 1,'2013-03-18',''
UPDATE #tst set active_flage=CASE when r_Id=1 then 'Y' ELSE 'N' END
FROM #tst a JOIN
(SELECT ROW_NUMBER() over(PARTITION by id order by start_data desc) as r_Id,* from #tst)b
ON a.Id=b.Id AND a.start_data=b.start_data
select * from #tst
This assumes there are no duplicate dates for the same id.
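For engines without ROW_NUMBER(), the same effect can be sketched with a correlated MAX() subquery. The example below runs on SQLite through Python's `sqlite3`; the table name `tst` and the ISO date strings (which sort correctly as text) are assumptions made for the illustration, and it relies on the same no-duplicate-dates assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tst (id INTEGER, start_date TEXT, active_flag TEXT)")
conn.executemany(
    "INSERT INTO tst VALUES (?, ?, 'N')",
    [(1, "2013-03-21"), (1, "2013-03-23"), (1, "2013-02-22"), (1, "2013-02-20")],
)

# Flag the row holding the latest start_date of its id; reset all others to 'N'.
conn.execute("""
UPDATE tst
SET active_flag = CASE
        WHEN start_date = (SELECT MAX(t2.start_date) FROM tst t2 WHERE t2.id = tst.id)
        THEN 'Y' ELSE 'N'
    END
""")

flags = conn.execute(
    "SELECT start_date, active_flag FROM tst ORDER BY start_date"
).fetchall()
print(flags)
```

Because every row is rewritten, re-running the statement after a new file is loaded also flips the previous 'Y' back to 'N'.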

Joining next Sequential Row

I am planning an SQL statement right now and need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average of the (signed) gaps can be computed by taking the difference between the last value and the first value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else - stat end) * 1.0 / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
count(*) over () as num
from tbstats
) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
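The first-minus-last shortcut works because signed gaps telescope, but it does not give the mean of the absolute gaps that the question describes. A small Python check on the sample values (the variable names are only illustrative) shows the difference:

```python
stats = [10, 25, 5, 15, 30, 9, 22, 29]  # stat values from the question, in period order

# Signed gap between each value and the next one.
gaps = [nxt - cur for cur, nxt in zip(stats, stats[1:])]

# Mean of the signed gaps: the intermediate values cancel out (telescoping sum),
# so it equals (last - first) / (n - 1).
mean_signed = sum(gaps) / len(gaps)
shortcut = (stats[-1] - stats[0]) / (len(stats) - 1)

# Mean of the absolute gaps: no cancellation, so every pair has to be visited.
mean_abs = sum(abs(g) for g in gaps) / len(gaps)

print(gaps)
print(mean_signed, shortcut, mean_abs)
```

If the ABS-based mean is what is wanted, the per-row LEAD() or self-join approach is needed rather than the shortcut.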
You can also achieve this with a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

How to subtract values between different dates in SQL?

Let's say that I'm using the following SQL table called TestTable:
Date Value1 Value2 Value3 ... Name
2013/01/01 1 4 7 Name1
2013/01/14 6 10 8 Name1
2013/02/23 10 32 9 Name1
And I'd like to get the increment of the values between two dates, like:
Value1Inc Value2Inc Value3Inc Name
4 22 1 Name1
between 2013/02/23 and 2013/01/14.
Please note that the values always increment. I'm trying the following approach found in StackOverflow:
select (
(select value1 from TestTable where date < '2013/01/14') -
(select value1 from TestTable where date < '2013/02/23')
) as Value1Inc,
(select value2 from TestTable where date < '2013/01/14') -
(select value2 from TestTable where date < '2013/02/23')
as Value2Inc
...
and so on, but this approach gives me a huge query.
I'd like to use the MAX and MIN SQL functions to simplify the query, but I don't know exactly how, as I'm not a SQL master (at least not yet :-).
Could you please give me a hand here?
Edit: Oops, I think I have found the solution myself by adding a "GROUP BY Name" at the end of the query, like this:
select name,max(value1) - min(value1) from TestTable where date < '2013-02-23' and date > '2013-01-01' GROUP BY Name
That was it!
You want to match the next record, using a join. Probably the easiest way is to enumerate and join:
with tt as (
select tt.*, row_number() over (partition by name order by date) as seqnum
from testtable tt
)
select tt.name, tt.date, ttnext.date as nextdate,
(ttnext.value1 - tt.value1) as Diff_Value1,
(ttnext.value2 - tt.value2) as Diff_Value2,
(ttnext.value3 - tt.value3) as Diff_Value3
from tt left outer join
tt ttnext
on tt.name = ttnext.name and tt.seqnum = ttnext.seqnum - 1;
If your database does not support row_number(), you can do something similar with correlated subqueries.
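One possible shape for the correlated-subquery variant is sketched below, run on SQLite through Python's `sqlite3`. The ISO date strings are an assumption (they compare correctly as text), and the inner join drops the last row of each name because it has no successor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testtable (date TEXT, value1 INTEGER, value2 INTEGER, "
    "value3 INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?, ?, ?, ?)",
    [("2013-01-01", 1, 4, 7, "Name1"),
     ("2013-01-14", 6, 10, 8, "Name1"),
     ("2013-02-23", 10, 32, 9, "Name1")],
)

# Pair each row with the row holding the smallest later date for the same name.
next_row_sql = """
SELECT tt.name, tt.date, ttnext.date,
       ttnext.value1 - tt.value1,
       ttnext.value2 - tt.value2,
       ttnext.value3 - tt.value3
FROM testtable tt
JOIN testtable ttnext
  ON ttnext.name = tt.name
 AND ttnext.date = (SELECT MIN(t.date) FROM testtable t
                    WHERE t.name = tt.name AND t.date > tt.date)
ORDER BY tt.name, tt.date
"""
diffs = conn.execute(next_row_sql).fetchall()
for row in diffs:
    print(row)
```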

Help in correcting the SQL

I am trying to solve this query. I have the following data:
Input
Date Id Value
25-May-2011 1 10
26-May-2011 1 10
26-May-2011 2 10
27-May-2011 1 20
27-May-2011 2 20
28-May-2011 1 10
I need to query and output as:
Output
FromDate ToDate Id Value
25-May-2011 26-May-2011 1 10
26-May-2011 26-May-2011 2 10
27-May-2011 27-May-2011 1 20
28-May-2011 28-May-2011 1 10
I tried this sql but I'm not getting the correct result:
SELECT START_DATE, END_DATE, A.KEY, B.VALUE FROM
(
SELECT MIN(DATE) START_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY,VALUE
) A INNER JOIN
(
SELECT MAX(DATE) END_DATE, KEY, VALUE
FROM
KEY_VALUE
GROUP
BY KEY, VALUE
) B ON A.KEY = B.KEY AND A.VALUE = B.VALUE;
I think that you are trying too hard. Should be more like this:
SELECT MIN(DATE) AS FromDate, MAX(DATE) AS ToDate, KEY, VALUE
FROM KEY_VALUE
GROUP BY KEY, VALUE
This query appears to produce the correct results, though it shows that your example output is missing the line '27-May-2011 ... 27-May-2011 ... 2 ... 20'.
select id, [value], date as fromdate, (
select top 1 date
from key_value kv2
where id = kv.id
and [value] = kv.[value]
and date >= kv.date
and datediff(d, kv.date, date) = (
select count(*)
from key_value
where id = kv.id
and [value] = kv.[value]
and date > kv.date
and date <= kv2.date
)
order by date desc
) as todate
from key_value kv
where not exists (
select *
from key_value
where id = kv.id
and [value] = kv.[value]
and date = dateadd(d, -1, kv.[date])
)
First it finds the min date records with the where clause, looking for records that do not have another record on the day before. Then the todate subquery gets the greatest date record by finding the number of days between it and min date then finding the number of records between the two and making sure they match. This of course assumes that the records in the table are distinct.
However if you are processing a massive table your best option may be to sort the records by key, id, date and then use a cursor to programmatically find the min and max dates as you loop over and look for values to change, then push them into a new table whether real or temp along with any other calculations you might need to do on other fields along the way.
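The loop-based approach could look roughly like the Python sketch below, which stands in for a cursor. The function name `collapse_ranges` and the `(day, id, value)` tuple layout are assumptions; it sorts by id and date and, like the query above, assumes at most one record per id and day.

```python
from datetime import date, timedelta

def collapse_ranges(records):
    """Collapse (day, id, value) records into (from_day, to_day, id, value) ranges.

    A record extends the current range only if it has the same id and value
    and falls exactly one day after the range's current end.
    """
    ranges = []
    for day, key, value in sorted(records, key=lambda r: (r[1], r[0])):
        last = ranges[-1] if ranges else None
        if (last and last[2] == key and last[3] == value
                and day - last[1] == timedelta(days=1)):
            ranges[-1] = (last[0], day, key, value)   # extend the open range
        else:
            ranges.append((day, day, key, value))     # start a new range
    return ranges

records = [
    (date(2011, 5, 25), 1, 10),
    (date(2011, 5, 26), 1, 10),
    (date(2011, 5, 26), 2, 10),
    (date(2011, 5, 27), 1, 20),
    (date(2011, 5, 27), 2, 20),
    (date(2011, 5, 28), 1, 10),
]
for r in collapse_ranges(records):
    print(r)
```

On the sample input this yields five ranges, including the 27-May row for id 2 that the example output in the question leaves out.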