How do I group by a date range? - sql

I have 3 fields: id, date, treatment. There are 3 types of treatment: Cold, fever, cholera. Assume there are 1000 patients and the first patient's data looks like this
pt treatment_date treatment
A 05-05-2017 Cold
A 05-07-2017 Cold
A 05-09-2017 Fever
A 05-13-2017 Fever
A 05-15-2017 Cholera
A 05-17-2017 Cholera
A 05-19-2017 Cold
A 05-21-2017 Cold
A 05-23-2017 Fever
I need my output to look like this-
pt start_date end_date treatment Number_of_days Conversion_date Days_before_cholera(start date of cholera- end date of treatment immediately before it)
A 05-05-2017 05-07-2017 Cold 2 0 0
A 05-09-2017 05-13-2017 Fever 4 0 0
A 05-15-2017 05-17-2017 Cholera 2 05-13-2017 2
A 05-19-2017 05-21-2017 Cold 2 0 0
A 05-23-2017 05-23-2017 Fever 1 0 0
So goes on for all patient_ids.

This is a "gaps-and-islands" problem. The query below shows how to collapse the rows into islands; you can fill in the additional columns.
One way to solve it is using the difference of row numbers. Note that treatment has to be part of the group by, because the difference alone can coincide for islands of different treatments:
select pt, treatment, min(treatment_date) as start_date, max(treatment_date) as end_date, . . .
from (select t.*,
row_number() over (partition by pt order by treatment_date) as seqnum_p,
row_number() over (partition by pt, treatment order by treatment_date) as seqnum_ptt
from t
) t
group by pt, treatment, (seqnum_p - seqnum_ptt);
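The row-number difference can be checked against the sample data from the question. The following is a minimal sketch, not the answerer's own code: it runs the query in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the table name t and the ISO date format are assumptions made for the example. It groups by treatment as well as the difference, since the difference alone is 4 for both the Cholera island and the second Cold island.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption)
# so that text ordering matches chronological ordering.
rows = [
    ("A", "2017-05-05", "Cold"),
    ("A", "2017-05-07", "Cold"),
    ("A", "2017-05-09", "Fever"),
    ("A", "2017-05-13", "Fever"),
    ("A", "2017-05-15", "Cholera"),
    ("A", "2017-05-17", "Cholera"),
    ("A", "2017-05-19", "Cold"),
    ("A", "2017-05-21", "Cold"),
    ("A", "2017-05-23", "Fever"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pt TEXT, treatment_date TEXT, treatment TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# seqnum_p counts rows per patient, seqnum_ptt counts rows per patient and
# treatment; their difference is constant within one uninterrupted run.
query = """
SELECT pt, treatment,
       MIN(treatment_date) AS start_date,
       MAX(treatment_date) AS end_date
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY pt ORDER BY treatment_date) AS seqnum_p,
             ROW_NUMBER() OVER (PARTITION BY pt, treatment ORDER BY treatment_date) AS seqnum_ptt
      FROM t) AS s
GROUP BY pt, treatment, (seqnum_p - seqnum_ptt)
ORDER BY pt, start_date
"""
islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

Each printed tuple is one island (pt, treatment, start_date, end_date); the remaining columns from the question can be derived from these rows.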

You're going to need to join the table to itself for this one. I'd try something along these lines.
SELECT
a.pt
,a.treatment
,a.treatment_date AS start_date
,CASE /*this is for your last fever row with the same date*/
WHEN b.treatment_date IS NULL
THEN a.treatment_date
ELSE b.treatment_date
END AS end_date
/*other fields here*/
FROM
MyTable a
LEFT JOIN MyTable b
ON a.pt = b.pt
AND a.treatment = b.treatment
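/*these filters belong in the ON clause; in a WHERE clause they would drop
the unmatched rows and turn the LEFT JOIN into an inner join*/
AND a.treatment_date < b.treatment_date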
/*make sure there isn't any date in between,
this should stop you from joining rows you didn't intend on joining on*/
AND NOT EXISTS (
SELECT
x.treatment_date
FROM
MyTable x
WHERE
a.pt = x.pt
AND a.treatment = x.treatment
AND x.treatment_date < b.treatment_date
AND x.treatment_date > a.treatment_date
)

Related

Separate columns for product counts using CTEs

Asking a question again as my post did not follow community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
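PRODUCT_NUM | CO_CD | PROD_CD | MASTER_ID | Date | ROW_NUM
---------------------------------------------------------------
1854 | MAWC | STATIONERY | 10003493039 | 1/1/2021 | 1
1567 | PREF | PRINTER | 10003493039 | 2/1/2021 | 2
2151 | MAWC | STATIONERY | 10003497290 | 3/2/2021 | 1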
I require the Count of each product for every Household from this data in separate columns, Printer_CT, Stationery_Ct
Each Master_ID represents a household. And a household can have multiple products.
So each household represents one row in my final output and I need the Product Counts in separate columns. There can be multiple products in each household, 4 or even more. But I have simplified this example.
I'm writing a query with CTEs to give me the output that I want. In my output, each row is grouped by Master ID
ORGL_CO_CD | ORGL_PROD_CD | STATIONERY_CT | PRINTER_CT
---------------------------------------------------------------
MAWC | STATIONERY | 1 | 1
MAWC | STATIONERY | 1 | 0
Here's my query. I'm not sure where to introduce Column 'Stationery_Ct'
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = 'PRINTER' THEN P1_CT = 1 END) PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!
I guess you can solve that using just GROUP BY and SUM:
-- Test data
DECLARE @ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT @ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT @ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT @ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
SELECT
MASTER_ID,
SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM @ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID | STATIONERY_CT | PRINTER_CT
---------------------------------------------------------------
10003493039 | 1 | 1
10003497290 | 1 | 0

datediff for row that meets my condition only once per row

I want to do a datediff between 2 dates on different rows only if the rows have a condition.
my table looks like the following, with additional columns (like guid)
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example I would like to have 2 rows in my result, representing the differences between the dates of ids 5 and 3 and of ids 2 and 1.
So far I have come up with a query that gives me the differences between the dates of ids 5-3, 5-1 and 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is I cannot set f.id - s.id to a value since other rows can be between the ones I want to make the diff on.
How can I make the datediff only on the first rows that meet my conditions?
EDIT: To make it clearer:
My condition is an event name, and I want to calculate the time between the occurrence of my event 1 and my event 2 and store it in a column named time, for example.
@Salman A's answer is really close to what I want, except that it does not work when my event 2 does not happen (which was not in my initial example).
For example, in a table like the following, it computes the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
the code I modified :
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this ')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state '
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id) END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression matches an 'I need to compare this state' row with the 'with this' row directly before it. If the pairs do not match up, it returns NULL.
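To see the NULL behaviour for an unmatched pair, here is a small sketch of the same LAG-plus-CASE idea, run in SQLite through Python's sqlite3 module rather than in SQL Server. The table name events, the single guid value and the whole-second timestamps are assumptions for the example, and since SQLite has no DATEDIFF, the difference is taken from strftime('%s', ...) epoch seconds instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, guid TEXT, created TEXT, condition TEXT)"
)
# Second sample from the question: the row with id 5 has no 'with this' partner.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "g1", "2018-12-11 12:07:55", "with this"),
        (2, "g1", "2018-12-11 12:07:53", "I need to compare this state"),
        (3, "g1", "2018-12-11 12:07:53", "state 3"),
        (4, "g1", "2018-12-11 12:06:40", "state 3"),
        (5, "g1", "2018-12-11 12:06:39", "I need to compare this state"),
    ],
)

query = """
WITH cte AS (
    SELECT id,
           created AS currdate,
           -- only keep the previous timestamp when the previous row is the partner event
           CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
                THEN LAG(created) OVER (PARTITION BY guid ORDER BY id)
           END AS prevdate,
           condition
    FROM events
    WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT id,
       CAST(strftime('%s', prevdate) AS INTEGER)
     - CAST(strftime('%s', currdate) AS INTEGER) AS seconds
FROM cte
WHERE condition = 'I need to compare this state'
ORDER BY id
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

The row with id 2 is paired with id 1, while id 5 comes back with None (SQL NULL) because the filtered row before it is not a 'with this' row.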
Try using the analytic function lead():
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
Ideally you want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF functions, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t1.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which gives:
ID1 Date1 ID2 Date2 Difference
-------------------------------------------------------------------------------
2 2018-12-11 12:07:53.000 1 2018-12-11 12:07:55.000 2
5 2018-12-11 12:06:39.000 3 2018-12-11 12:07:53.000 74
I would enumerate the values and then use window functions for the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows.

Select only Contiguous Records in DB2 SQL

So I have a table of readings (heavily simplified version below). Sometimes there is a break in the reading history (see the records I have flagged as N). The 'From Read' should always match a previous 'To Read', or the 'To Read' should always match a later 'From Read', but I want to select records only as far back as the first 'break' in the reads.
How would I write a query in DB2 SQL to only return the rows flagged with a 'Y'?
EDIT: The contiguous flag is something I have added manually to represent the records I would like to select; it does not exist on the table.
ID From To Contiguous
ABC 01/01/2014 30/06/2014 Y
ABC 01/06/2013 01/01/2014 Y
ABC 01/05/2013 01/06/2013 Y
ABC 01/01/2013 01/02/2013 N
ABC 01/10/2012 01/01/2013 N
Thanks in advance!
J
You will need a recursive select,
something like this:
WITH RECURSIVE
contiguous_intervals(start, end) AS (
select start, end
from intervals
where end = (select max(end) from intervals)
UNION ALL
select i.start, i.end
from contiguous_intervals m, intervals i
where i.end = m.start
)
select * from contiguous_intervals;
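As a rough check of the recursive idea, the sketch below walks back from the latest reading and stops at the first break. It is written for SQLite (driven from Python's sqlite3 module), not DB2, so the syntax is only indicative; the table name readings and the column names from_date and to_date are assumptions, and the dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id TEXT, from_date TEXT, to_date TEXT)")
# Sample rows from the question, with dd/mm/yyyy rewritten as ISO dates.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("ABC", "2014-01-01", "2014-06-30"),
        ("ABC", "2013-06-01", "2014-01-01"),
        ("ABC", "2013-05-01", "2013-06-01"),
        ("ABC", "2013-01-01", "2013-02-01"),
        ("ABC", "2012-10-01", "2013-01-01"),
    ],
)

query = """
WITH RECURSIVE chain(id, from_date, to_date) AS (
    -- anchor: the most recent reading of each id
    SELECT r.id, r.from_date, r.to_date
    FROM readings r
    WHERE r.to_date = (SELECT MAX(m.to_date) FROM readings m WHERE m.id = r.id)
    UNION ALL
    -- step: the reading whose end matches the start of the one already found
    SELECT r.id, r.from_date, r.to_date
    FROM chain c
    JOIN readings r ON r.id = c.id AND r.to_date = c.from_date
)
SELECT id, from_date, to_date FROM chain ORDER BY from_date DESC
"""
contiguous = conn.execute(query).fetchall()
for row in contiguous:
    print(row)
```

The recursion ends as soon as no reading has a to_date equal to the current from_date, which is what leaves out the two rows flagged N.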
You can do this with lead(), lag(). I'm not sure what the exact logic is for your case, but I think it is something like:
select r.*,
(case when (prev_to = from or prev_to is null) and
(next_from = to or next_from is null)
then 'Y'
else 'N'
end) as Contiguous
from (select r.*, lead(from) over (partition by id order by from) as next_from,
lag(to) over (partition by id order by to) as prev_to
from readings r
) r;

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited b. Given the following data structure. The table contains 300,000 rows and updates daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach, I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!
Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0
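The running-sum variant can be tried on the sample visits from the question. This is a minimal sketch run in SQLite through Python's sqlite3 module, not in Redshift; the integer ts values are made up for the example (the question only says a date column exists), and user 2's visit ids are incremented as in the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (userid INTEGER, visitid INTEGER, section TEXT, ts INTEGER)"
)
# ts is a hypothetical ordering column standing in for the real timestamp.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", 1),
        (1, 2, "a", 2),
        (2, 1, "b", 3),
        (2, 2, "b", 4),
        (2, 3, "b", 5),
        (1, 3, "b", 6),
    ],
)

query = """
SELECT t.userid, t.visitid, t.section,
       CASE WHEN v.first_b_after_a IS NOT NULL THEN 1 ELSE 0 END AS conversion
FROM tbl t
LEFT JOIN (SELECT userid, MIN(ts) AS first_b_after_a
           FROM (SELECT tbl.*,
                        -- running count of 'a' visits; stays NULL until the first 'a'
                        SUM(CASE WHEN section = 'a' THEN 1 END)
                            OVER (PARTITION BY userid ORDER BY ts) AS a_sum
                 FROM tbl) x
           WHERE section = 'b' AND a_sum IS NOT NULL
           GROUP BY userid) v
       ON t.userid = v.userid AND t.ts = v.first_b_after_a
ORDER BY t.ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

User 2 never visits section a, so the running sum stays NULL and none of that user's rows are flagged; only user 1's third visit gets conversion = 1.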

SQL gaps in dates

I am trying to find gaps in a table based on a state code. The tables look like this.
StateTable:
StateID (PK) | Code
--------------------
1 | AK
2 | AL
3 | AR
StateModel Table:
StateModelID | StateID | EffectiveDate | ExpirationDate
-------------------------------------------------------------------------
1 | 1 | 2012-06-28 00:00:00.000| 2012-08-02 23:59:59.000
2 | 1 | 2012-08-03 00:00:00.000| 2050-12-31 23:59:59.000
3 | 1 | 2055-01-01 00:00:00.000| 2075-12-31 23:59:59.000
The query I am using is the following:
Declare @gapMessage varchar(250)
SET @gapMessage = ''
select
@gapMessage = @gapMessage +
(Select StateTable.Code FROM StateTable where t1.StateID = StateTable.StateID)
+ ' Row ' +CAST(t1.StateModelID as varchar(6))+' has a gap with '+
CAST(t2.StateModelID as varchar(6))+ CHAR(10)
from StateModel t1
inner join StateModel t2
on
t1.StateID = t2.StateID
and DATEADD(ss, 1,t1.ExpirationDate) < t2.EffectiveDate
and t1.EffectiveDate < t2.EffectiveDate
if(@gapMessage != '')
begin
Print 'States with a gap problem'
PRINT @gapMessage
end
else
begin
PRINT 'No States with a gap problem'
end
But with the above table example I get the following output:
States with a gap problem
AK Row 1 has a gap with 3
AK Row 2 has a gap with 3
Is there any way to restructure my query so that the gap between 1 and 3 does not display, because there is no gap between 1 and 2?
I am using MS SQL Server 2008.
Thanks
WITH
sequenced AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate) AS SequenceID,
*
FROM
StateModel
)
SELECT
*
FROM
sequenced AS a
INNER JOIN
sequenced AS b
ON a.StateID = b.StateID
AND a.SequenceID = b.SequenceID - 1
WHERE
a.ExpirationDate < DATEADD(second, -1, b.EffectiveDate)
To make this as efficient as possible, also add an index on (StateID, EffectiveDate).
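The consecutive-row comparison can be tried on the sample rows from the question. The sketch below uses SQLite through Python's sqlite3 module instead of SQL Server 2008, so datetime(x, '-1 second') stands in for DATEADD(second, -1, x); the column name EffectiveDate follows the asker's query, and timestamps are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StateModel (StateModelID INTEGER, StateID INTEGER, "
    "EffectiveDate TEXT, ExpirationDate TEXT)"
)
conn.executemany(
    "INSERT INTO StateModel VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-06-28 00:00:00", "2012-08-02 23:59:59"),
        (2, 1, "2012-08-03 00:00:00", "2050-12-31 23:59:59"),
        (3, 1, "2055-01-01 00:00:00", "2075-12-31 23:59:59"),
    ],
)

query = """
WITH sequenced AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate) AS SequenceID,
           StateModelID, StateID, EffectiveDate, ExpirationDate
    FROM StateModel
)
SELECT a.StateModelID, b.StateModelID
FROM sequenced AS a
JOIN sequenced AS b
  ON a.StateID = b.StateID
 AND a.SequenceID = b.SequenceID - 1   -- compare each row only with its successor
WHERE a.ExpirationDate < datetime(b.EffectiveDate, '-1 second')
"""
gaps = conn.execute(query).fetchall()
print(gaps)
```

Because each row is joined only to the next row in sequence, row 1 is never compared with row 3, so only the pair (2, 3) is reported.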
I wanted to just give credit to MatBailie, but don't have the points to do it yet, so I thought I would help out anyone else looking for a similar solution that may want to take it a step further like I needed to. I have changed my application of his code (which involves member enrollment) to the same language as the example here.
In my case, I needed these things:
I have two similar tables that I need to develop into one total table. In this example, let's make the tables like this: SomeStates + OtherStates = UpdatedTable. These are UNIONED in the AS clause.
I didn't want to remove any rows due to gaps, but I wanted to flag them on the StateID level. This is added as an additional column 'StateID_GapFlag'.
I also wanted to add a column to hold the oldest or MIN(EffectiveDate). This would be used in later calculations of SUM(period) to get a total duration, excluding gaps. This is the column 'MIN_EffectiveDate'.
;WITH sequenced
( SequenceID
,StateID
,EffectiveDate
,ExpirationDate)
AS
(select
ROW_NUMBER() OVER (PARTITION BY StateID ORDER by EffectiveDate) as SequenceID,
* from (select StateID, EffectiveDate, ExpirationDate from SomeStates
UNION ALL
(select StateID, EffectiveDate, ExpirationDate from OtherStates)
) StateModel
where
EffectiveDate > 'filter'
)
Select DISTINCT
IJ1.[MIN_EffectiveDate]
,coalesce(LJ2.StateID_GapFlag,'') as [StateID_GapFlag]
,seq.EffectiveDate
,seq.ExpirationDate
into UpdatedTable
from sequenced seq
inner join
(select StateID, min(EffectiveDate) as 'MIN_EffectiveDate'
from sequenced
group by StateID
) IJ1
on seq.StateID = IJ1.StateID
left join
(select a.StateID, 'GAP' as 'StateID_GapFlag'
from sequenced a
inner join
sequenced b
on a.StateID = b.StateID
and a.SequenceID = (b.sequenceID - 1)
where a.ExpirationDate < DATEADD(day, -1, b.EffectiveDate)
) LJ2
on seq.StateID = LJ2.StateID
You could use ROW_NUMBER to provide an ordering of the state models for each state, then check that the difference in seconds between consecutive rows doesn't exceed 1. Something like:
;WITH Models (StateModelID, StateID, Effective, Expiration, RowOrder) AS (
SELECT StateModelID, StateID, EffectiveDate, ExpirationDate,
ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate)
FROM StateModel
)
SELECT F.StateModelId, S.StateModelId
FROM Models F
CROSS APPLY (
SELECT M.StateModelId
FROM Models M
WHERE M.RowOrder = F.RowOrder + 1
AND M.StateId = F.StateId
AND DATEDIFF(SECOND, F.Expiration, M.Effective) > 1
) S
This will get you the state model IDs of the rows with gaps, which you can format how you wish.