Select only Contiguous Records in DB2 SQL

So I have a table of readings (heavily simplified version below). Sometimes there is a break in the reading history (see the records I have flagged as N). The 'From Read' should always match a previous 'To Read', or the 'To Read' should always match a later 'From Read', but I want to select records only as far back as the first 'break' in the reads.
How would I write a query in DB2 SQL to return only the rows flagged with a 'Y'?
EDIT: The contiguous flag is something I have added manually to represent the records I would like to select; it does not exist on the table.
ID From To Contiguous
ABC 01/01/2014 30/06/2014 Y
ABC 01/06/2013 01/01/2014 Y
ABC 01/05/2013 01/06/2013 Y
ABC 01/01/2013 01/02/2013 N
ABC 01/10/2012 01/01/2013 N
Thanks in advance!
J

You will need a recursive query, something like this (Db2 writes a recursive common table expression with plain WITH, without the RECURSIVE keyword that some other databases use):
WITH
contiguous_intervals(start, end) AS (
  select start, end
  from intervals
  where end = (select max(end) from intervals)
  UNION ALL
  select i.start, i.end
  from contiguous_intervals m, intervals i
  where i.end = m.start
)
select * from contiguous_intervals;

You can do this with lead(), lag(). I'm not sure what the exact logic is for your case, but I think it is something like:
select r.*,
(case when (prev_to = from or prev_to is null) and
(next_from = to or next_from is null)
then 'Y'
else 'N'
end) as Contiguous
from (select r.*, lead(from) over (partition by id order by from) as next_from,
lag(to) over (partition by id order by to) as prev_to
from readings r
) r;
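Neither answer above stops at the first break, so here is a minimal sketch of one way to do that with window functions. It runs the SQL through Python's built-in sqlite3 module only so that it can be executed as-is; the column names from_dt and to_dt, the ISO date strings, and the helper column names newer_from and breaks_so_far are assumptions for illustration, and the Db2 syntax may need adjusting. The idea: walk each id from newest to oldest, mark a row as a break when its to_dt does not equal the from_dt of the next-newer row, and keep rows while the running count of breaks is still zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id TEXT, from_dt TEXT, to_dt TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("ABC", "2014-01-01", "2014-06-30"),
        ("ABC", "2013-06-01", "2014-01-01"),
        ("ABC", "2013-05-01", "2013-06-01"),
        ("ABC", "2013-01-01", "2013-02-01"),  # break: nothing starts on 2013-02-01
        ("ABC", "2012-10-01", "2013-01-01"),
    ],
)

query = """
WITH flagged AS (
    SELECT id, from_dt, to_dt,
           -- from_dt of the next-newer reading (NULL for the newest one)
           LAG(from_dt) OVER (PARTITION BY id ORDER BY to_dt DESC) AS newer_from
    FROM readings
),
running AS (
    SELECT id, from_dt, to_dt,
           -- running count of breaks seen so far, newest to oldest
           SUM(CASE WHEN newer_from IS NULL OR newer_from = to_dt THEN 0 ELSE 1 END)
               OVER (PARTITION BY id ORDER BY to_dt DESC) AS breaks_so_far
    FROM flagged
)
SELECT id, from_dt, to_dt
FROM running
WHERE breaks_so_far = 0
ORDER BY id, to_dt DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this keeps the three newest readings and drops the two older ones, which matches the rows flagged Y in the question.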

Related

How do I group by a date range?

I have 3 fields: id, date, treatment. There are 3 types of treatment: Cold, fever, cholera. Assume there are 1000 patients and the first patient's data looks like this
pt treatment_date treatment
A 05-05-2017 Cold
A 05-07-2017 Cold
A 05-09-2017 Fever
A 05-13-2017 Fever
A 05-15-2017 Cholera
A 05-17-2017 Cholera
A 05-19-2017 Cold
A 05-21-2017 Cold
A 05-23-2017 Fever
I need my output to look like this-
pt start_date end_date treatment Number_of_days Conversion_date Days_before_cholera(start date of cholera- end date of treatment immediately before it)
A 05-05-2017 05-07-2017 Cold 2 0 0
A 05-09-2017 05-13-2017 Fever 4 0 0
A 05-15-2017 05-17-2017 Cholera 2 05-13-2017 2
A 05-19-2017 05-21-2017 Cold 2 0 0
A 05-23-2017 05-23-2017 Fever 1 0 0
So goes on for all patient_ids.
This is a "gaps-and-islands" problem. I'll show you how to handle the calculation of the rows. You can fill in the additional columns.
One way to solve it is using the difference of row numbers:
select pt, treatment, min(treatment_date), max(treatment_date), . . .
from (select t.*,
             row_number() over (partition by pt order by treatment_date) as seqnum_p,
             row_number() over (partition by pt, treatment order by treatment_date) as seqnum_ptt
      from t
     ) t
-- treatment must be part of the grouping key: the difference alone can repeat across treatments
group by pt, treatment, (seqnum_p - seqnum_ptt);
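To see the row-number difference at work on the sample patient, here is a small runnable sketch. It uses Python's sqlite3 module purely as a convenient way to execute the SQL; the table name treatments, the ISO date strings, and the aliases start_date, end_date and numbered are assumptions for illustration. Note that it groups by treatment as well as by the difference, because in this data the Cholera rows and the second run of Cold rows both produce a difference of 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatments (pt TEXT, treatment_date TEXT, treatment TEXT)")
conn.executemany(
    "INSERT INTO treatments VALUES (?, ?, ?)",
    [
        ("A", "2017-05-05", "Cold"),
        ("A", "2017-05-07", "Cold"),
        ("A", "2017-05-09", "Fever"),
        ("A", "2017-05-13", "Fever"),
        ("A", "2017-05-15", "Cholera"),
        ("A", "2017-05-17", "Cholera"),
        ("A", "2017-05-19", "Cold"),
        ("A", "2017-05-21", "Cold"),
        ("A", "2017-05-23", "Fever"),
    ],
)

query = """
SELECT pt,
       MIN(treatment_date) AS start_date,
       MAX(treatment_date) AS end_date,
       treatment
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY pt ORDER BY treatment_date) AS seqnum_p,
           ROW_NUMBER() OVER (PARTITION BY pt, treatment ORDER BY treatment_date) AS seqnum_ptt
    FROM treatments AS t
) AS numbered
-- consecutive rows of one treatment share the same (seqnum_p - seqnum_ptt)
GROUP BY pt, treatment, (seqnum_p - seqnum_ptt)
ORDER BY pt, start_date
"""

islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

This yields five islands for patient A, one per consecutive run of the same treatment; the remaining output columns from the question would be added on top of this result.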
You're going to need to join the table to itself for this one. I'd try something along these lines.
SELECT
a.pt
,a.treatment
,a.treatment_date AS start_date
,CASE /*this is for your last fever row with the same date*/
WHEN b.treatment_date IS NULL
THEN a.treatment_date
ELSE b.treatment_date
END AS end_date
/*other fields here*/
FROM
MyTable a
LEFT JOIN MyTable b
ON a.pt = b.pt
AND a.treatment = b.treatment
/*these filters belong in the ON clause; in a WHERE clause they would
discard the unmatched rows and turn the LEFT JOIN into an inner join*/
AND a.treatment_date < b.treatment_date
/*make sure there isn't any date in between,
this should stop you from joining rows you didn't intend on joining on*/
AND NOT EXISTS (
SELECT
x.treatment_date
FROM
MyTable x
WHERE
a.pt = x.pt
AND a.treatment = x.treatment
AND x.treatment_date < b.treatment_date
AND x.treatment_date > a.treatment_date
)

Joining next Sequential Row

I am planning an SQL statement right now and would like someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
-- multiply by 1.0 so the division is not truncated to an integer
select sum(case when seqnum = num then stat else - stat end) * 1.0 / (max(num) - 1)
from (select period, stat, row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
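One thing worth checking against the question: the first-minus-last shortcut gives the mean of the signed differences, whereas the question applies ABS to each gap first, and the two are generally different numbers. A minimal sketch comparing them on the sample data, run through Python's sqlite3 module only so it is executable here (the ISO date strings and the alias names are assumptions; the SQL Server syntax for LEAD is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbstats (id INTEGER PRIMARY KEY, stat INTEGER, period TEXT)")
conn.executemany(
    "INSERT INTO tbstats (stat, period) VALUES (?, ?)",
    [
        (10, "2008-01-01"), (25, "2008-01-02"), (5, "2008-01-03"), (15, "2008-01-04"),
        (30, "2008-01-05"), (9, "2008-01-06"), (22, "2008-01-07"), (29, "2008-01-08"),
    ],
)

query = """
SELECT AVG(gap)      AS mean_signed_gap,  -- telescopes to (first - last) / (n - 1)
       AVG(ABS(gap)) AS mean_abs_gap      -- what the question describes
FROM (
    -- gap is NULL on the last row, and AVG skips NULLs
    SELECT stat - LEAD(stat) OVER (ORDER BY period) AS gap
    FROM tbstats
)
"""

mean_signed, mean_abs = conn.execute(query).fetchone()
print(mean_signed, mean_abs)
```

Here the signed gaps sum to 10 - 29 = -19 over 7 gaps, while the absolute gaps sum to 101 over the same 7 gaps, so the shortcut is only appropriate if the signed mean is what you want.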
You can also achieve this with a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x

SQL query in ORACLE - select most recent date

I am a newbie in the world of SQL query.
I need to eliminate the duplicate Staff # and retrieve only the highlighted row.
Any help is highly appreciated.
Staff# Pay_DT Due_DT loan_flag housing
------------------------------------------------------------------------
123-45-6789 14-Feb-14 3-Jan-14 Y null
123-45-6789 14-Feb-14 3-Jan-14 Y Annual
123-45-6789 14-Feb-14 13-Jan-14 Y null
**123-45-6789 14-Feb-14 13-Jan-14 Y Annual**
123-45-6789 null null Y null
123-45-6789 null null Y Annual
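Perhaps you want the following. Oracle sorts NULLs first in a descending sort, so NULLS LAST keeps the rows with a null pay_dt from ranking first:
SELECT *
FROM (SELECT a.*,
rank() over (partition by staff# order by pay_dt desc nulls last) rnk
FROM table_name a
WHERE housing = 'Annual')
WHERE rnk = 1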
Alternately
SELECT *
FROM (SELECT a.*,
max(pay_dt) over (partition by staff#) max_pay_dt
FROM table_name a
WHERE housing = 'Annual')
WHERE pay_dt = max_pay_dt
If you can have ties (two rows with the same pay_dt where the housing column has a value of Annual for a particular Staff# value), both of these queries would return all the rows with that condition. If you want to break the tie arbitrarily, you could use row_number rather than rank. Otherwise, you'd need to tell us what logic you'd want to use to break the ties.
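As a rough illustration of the row_number variant, here is a sketch that breaks the tie by taking the latest due_dt. That tie-break rule is an assumption (it happens to pick the highlighted row in the sample data), as are the table name staff_loans, the column name staff_no, and the ISO date strings. It runs through Python's sqlite3 module just to be executable, and it filters out null pay_dt rows explicitly rather than relying on how a given database orders NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff_loans "
    "(staff_no TEXT, pay_dt TEXT, due_dt TEXT, loan_flag TEXT, housing TEXT)"
)
conn.executemany(
    "INSERT INTO staff_loans VALUES (?, ?, ?, ?, ?)",
    [
        ("123-45-6789", "2014-02-14", "2014-01-03", "Y", None),
        ("123-45-6789", "2014-02-14", "2014-01-03", "Y", "Annual"),
        ("123-45-6789", "2014-02-14", "2014-01-13", "Y", None),
        ("123-45-6789", "2014-02-14", "2014-01-13", "Y", "Annual"),
        ("123-45-6789", None, None, "Y", None),
        ("123-45-6789", None, None, "Y", "Annual"),
    ],
)

query = """
SELECT staff_no, pay_dt, due_dt, loan_flag, housing
FROM (
    SELECT a.*,
           -- assumed tie-break: latest due_dt wins among equal pay_dt values
           ROW_NUMBER() OVER (PARTITION BY staff_no
                              ORDER BY pay_dt DESC, due_dt DESC) AS rn
    FROM staff_loans AS a
    WHERE housing = 'Annual'
      AND pay_dt IS NOT NULL
) AS ranked
WHERE rn = 1
"""

latest = conn.execute(query).fetchall()
print(latest)
```

If a different rule should decide between tied rows, only the ORDER BY inside the window needs to change.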

GROUP values separated by specific records

I want to make a counter that increases by one each time a specific record is found in a row.
time event revenue counter
13.37 START 20 1
13.38 action A 10 1
13.40 action B 5 1
13.42 end 1
14.15 START 20 2
14.16 action B 5 2
14.18 end 2
15.10 START 20 3
15.12 end 3
I need to find out the total revenue for every visit (the actions between START and END). I was thinking the best way would be to set a counter like the one in the counter column above, so I could group events. But if you have a better solution, I would be grateful.
You can use a query similar to the following:
with StartTimes as
(
select time,
startRank = row_number() over (order by time)
from events
where event = 'START'
)
select e.*, counter = st.startRank
from events e
outer apply
(
select top 1 st.startRank
from StartTimes st
where e.time >= st.time
order by st.time desc
) st
SQL Fiddle with demo.
May need to be updated based on the particular characteristics of the actual data, things like duplicate times, missing events, etc. But it works for the sample data.
SQL Server 2012 supports an OVER clause for aggregates, so if you're up to date on version, this will give you the counter you want:
count(case when eventname='START' then 1 end) over (order by eventtime)
You could also use the latest START time instead of a counter to group by, like this:
with t as (
select
*,
max(case when eventname='START' then eventtime end)
over (order by eventtime) as timeStart
from YourTable
)
select
timeStart,
max(eventtime) as timeEnd,
sum(revenue) as totalRevenue
from t
group by timeStart;
Here's a SQL Fiddle demo using the schema Ian posted for his solution.
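For reference, the running-count idea can be tried end to end on the sample rows. The sketch below runs through Python's sqlite3 module only to make it executable; the column names event_time, event and revenue, storing the times as plain numbers, and the alias visit are assumptions for illustration. The running COUNT of START rows plays the role of the counter, and revenue is then summed per counter value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_time REAL, event TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (13.37, "START", 20),
        (13.38, "action A", 10),
        (13.40, "action B", 5),
        (13.42, "end", None),
        (14.15, "START", 20),
        (14.16, "action B", 5),
        (14.18, "end", None),
        (15.10, "START", 20),
        (15.12, "end", None),
    ],
)

query = """
SELECT visit, SUM(revenue) AS total_revenue
FROM (
    SELECT e.*,
           -- running number of START rows seen so far = visit counter
           COUNT(CASE WHEN event = 'START' THEN 1 END)
               OVER (ORDER BY event_time) AS visit
    FROM events AS e
) AS numbered
GROUP BY visit
ORDER BY visit
"""

totals = conn.execute(query).fetchall()
print(totals)
```

As with the answers above, this assumes event times are unique and every visit begins with a START row.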

Check previous records to update current record in Oracle

I'm having difficulty figuring out how to check previous records in order to see if the current record should be updated.
I don't want to use the lag function because I won't know how many records to go back.
I have a table that contains employee raise information. I want to put an X in the IND field if there has been a previous Merit increase PCT greater than the current Merit increase within the last 6 months. The current record is the 2012/05 record.
Emp Action Date Code proj PCT Ind
====================================================
123 raise 2012/01 COL acct 2
123 raise 2012/01 Merit soft 7
123 raise 2012/02 Merit Acct 4
123 Raise 2012/05 merit soft 6 ?
It's not particularly efficient but you can use a brute force approach
UPDATE <<table name>> a
SET ind = 'X'
WHERE <<date column>> = (SELECT MAX(<<date column>>)
FROM <<table name>> b
WHERE a.emp = b.emp)
AND EXISTS( SELECT 1
FROM <<table name>> c
WHERE a.emp = c.emp
AND c.code = 'Merit'
AND c.action = 'raise'
AND c.pct > a.pct
AND c.<<date column>> > sysdate - interval '6' month
AND c.rowid != a.rowid);
If you are just looking for a query rather than an update, this might work
Select
emp,
action,
date,
code,
project,
pct,
case when max(case when code='Merit' and action='raise' then pct end)
 over (partition by emp order by date
range between interval '6' month preceding and current row
) > pct then 'X' end as ind
From the_table
I don't have a database to test this against right now so I'm not entirely sure this will work but think it should. Edit: worked out how to get SQL Fiddle going on my iPad. This seems to work, however it will put an X in all the rows that meet the condition and not just the most recent.
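To flag only the most recent row per employee, one option is to combine the two ideas above: a "latest row" test plus a correlated EXISTS over the preceding six months. The following is only a sketch under several assumptions: the table name raises and column name raise_date are made up, the year/month values are stored as first-of-month ISO dates, the code values are normalised to 'Merit', and the six-month window is measured back from the current row's date rather than from today. It runs through Python's sqlite3 module so it can be executed; Oracle would use its own date arithmetic instead of SQLite's date() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raises "
    "(emp INTEGER, action TEXT, raise_date TEXT, code TEXT, proj TEXT, pct INTEGER)"
)
conn.executemany(
    "INSERT INTO raises VALUES (?, ?, ?, ?, ?, ?)",
    [
        (123, "raise", "2012-01-01", "COL", "acct", 2),
        (123, "raise", "2012-01-01", "Merit", "soft", 7),
        (123, "raise", "2012-02-01", "Merit", "acct", 4),
        (123, "raise", "2012-05-01", "Merit", "soft", 6),
    ],
)

query = """
SELECT a.emp, a.raise_date, a.code, a.pct,
       CASE WHEN a.code = 'Merit'
             -- only the employee's most recent row can be flagged
             AND a.raise_date = (SELECT MAX(b.raise_date)
                                 FROM raises b
                                 WHERE b.emp = a.emp)
             -- an earlier, larger Merit raise within the preceding six months
             AND EXISTS (SELECT 1
                         FROM raises c
                         WHERE c.emp = a.emp
                           AND c.code = 'Merit'
                           AND c.pct > a.pct
                           AND c.raise_date < a.raise_date
                           AND c.raise_date >= date(a.raise_date, '-6 months'))
            THEN 'X' END AS ind
FROM raises a
ORDER BY a.raise_date, a.code
"""

flags = conn.execute(query).fetchall()
for flag in flags:
    print(flag)
```

On the sample data only the 2012/05 row gets an X, because the 7 percent Merit raise from 2012/01 falls inside its six-month window and is larger than 6.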