How to create a start and end date with no gaps from one date column and to sum a value within the dates - sql

I am new SQL coding using in SQL developer.
I have a table that has 4 columns: Patient ID (ptid), service date (dt), insurance payment amount (insr_amt), out of pocket payment amount (op_amt). (see table 1 below)
What I would like to do is (1) create two columns "start_dt" and "end_dt" using the "dt" column where if there are no gaps in the date by the patient ID then populate the start and end date with the first and last date by patient ID, however if there is a gap in service date within the patient ID then to create the separate start and end date rows per patient ID, along with (2) summing the two payment amounts by patient ID with in the one set of start and end date visits (see table 2 below).
What would be the way to run this using SQL code in SQL developer?
Thank you!
Table 1:
Ptid
dt
insr_amt
op_amt
A
1/1/2021
30
20
A
1/2/2021
30
10
A
1/3/2021
30
10
A
1/4/2021
30
30
B
1/6/2021
10
10
B
1/7/2021
20
10
C
2/1/2021
15
30
C
2/2/2021
15
30
C
2/6/2021
60
30
Table 2:
Ptid
start_dt
end_dt
total_insr_amt
total_op_amt
A
1/1/2021
1/4/2021
120
70
B
1/6/2021
1/7/2021
30
20
C
2/1/2021
2/2/2021
30
60
C
2/6/2021
2/6/2021
60
30

You didn't mention the specific database so this solution works in PostgreSQL. You can do:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select *,
sum(inc) over(partition by ptid order by dt) as grp
from (
select *,
case when dt - interval '1 day' = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
Result:
ptid start_dt end_dt total_insr_amt total_op_amt
----- ---------- ---------- -------------- -----------
A 2021-01-01 2021-01-04 120 70
B 2021-01-06 2021-01-07 30 20
C 2021-02-01 2021-02-02 30 60
C 2021-02-06 2021-02-06 60 30
See running example at DB Fiddle 1.
EDIT for Oracle
As requested, the modified query that works in Oracle is:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select x.*,
sum(inc) over(partition by ptid order by dt) as grp
from (
select t.*,
case when dt - 1 = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
See running example at db<>fiddle 2.

Related

SQL: Deduce Effective Pricing from overlapping Dates

I have pricing record with overlapping dates. On few dates there are more than one overlapping prices. Please follow the example below:
Example on 2022/02/15 there are 2 prices 10 and 8 .
article
price
startdate
enddate
123
10
2022/02/02
2049/12/31
123
8
2022/02/14
2022/09/14
123
5
2022/03/14
2022/04/06
123
4
2022/04/11
2022/04/27
I want to apply the effective price for date ranges like below and avoid conflicting prices in the output.
article
price
startdate
enddate
123
10
2022/02/02
2022/02/13
123
8
2022/02/14
2022/03/13
123
5
2022/03/14
2022/04/06
123
8
2022/04/07
2022/04/10
123
4
2022/04/11
2022/04/27
123
8
2022/04/28
2022/09/14
123
10
2022/09/15
2049/12/31
I can think of window functions to adjust the end dates and prices, but I cannot wrap my head around the problem completely to get the complete solution. Any suggestion/solution is appreciated.
Database: Snowflake
Thank you
Using the logic of new starting price window wins for overlaps.
Discreate Date version:
with data(article,price,startdate,enddate) as (
select * FROM VALUES
(123, 10, '2022-02-02'::date, '2049-12-31'::date),
(123, 8, '2022-02-14'::date, '2022-09-14'::date),
(123, 5, '2022-03-14'::date, '2022-04-06'::date),
(123, 4, '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
select article,
date as startdate,
lead(date) over(partition by article order by date)-1 as enddate
from (
select distinct article, startdate as date from data
union
select distinct article, enddate+1 as date from data
)
qualify enddate is not null
)
select
d1.article,
d1.price,
d2.startdate,
d2.enddate
from data as d1
join dis_times as d2
on d1.article = d2.article
and d2.startdate between d1.startdate and d1.enddate qualify row_number() over (partition by d1.article, s_startdate order by d1.startdate desc) = 1
order by 1,3;
gives:
ARTICLE
PRICE
S_STARTDATE
S_ENDDATE
123
10
2022-02-02
2022-02-13
123
8
2022-02-14
2022-03-13
123
5
2022-03-14
2022-04-06
123
8
2022-04-07
2022-04-10
123
4
2022-04-11
2022-04-27
123
8
2022-04-28
2022-09-14
123
10
2022-09-15
2049-12-31
Continuous Timestamp version:
with data(article,price,startdate,enddate) as (
select * FROM VALUES
(123, 10, '2022-02-02'::date, '2049-12-31'::date),
(123, 8, '2022-02-14'::date, '2022-09-14'::date),
(123, 5, '2022-03-14'::date, '2022-04-06'::date),
(123, 4, '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
select article,
date as startdate,
lead(date) over(partition by article order by date) as enddate
from (
select distinct article, startdate as date from data
union
select distinct article, enddate as date from data
)
qualify enddate is not null
)
select
d1.article,
d1.price,
d2.startdate,
d2.enddate
from data as d1
join dis_times as d2
on d1.article = d2.article
and d2.startdate >= d1.startdate and d2.startdate < d1.enddate
qualify row_number() over (partition by d1.article, s_startdate order by d1.startdate desc) = 1
order by 1,3;
which gives:
ARTICLE
PRICE
S_STARTDATE
S_ENDDATE
123
10
2022-02-02
2022-02-14
123
8
2022-02-14
2022-03-14
123
5
2022-03-14
2022-04-06
123
8
2022-04-06
2022-04-11
123
4
2022-04-11
2022-04-27
123
8
2022-04-27
2022-09-14
123
10
2022-09-14
2049-12-31
Thanks to MatBailie for the tighter join suggestion.
join dis_times as d2
on d1.article = d2.article
and d2.startdate between d1.startdate and d1.enddate
the continuous range I would normally do in this for
and d2.startdate between d1.startdate and d1.enddate and d2.startdate < d1.enddate
instead of this form
and d2.startdate >= d1.startdate and d2.startdate < d1.enddate
because I in experience it performed better. always test your complexities.
First thing I did was --I turned your price-per-date range data into a price-per-date lookup table.
create or replace temporary table price_date_lookup as
select distinct
article,
dateadd('day',b.index-1,start_date) as dates,
first_value(price) over (partition by article, dates order by end_date) as price
from my_table,
lateral split_to_table(repeat('.',datediff(day,start_date,end_date)), '.') b;
Notes:
first_value handles overlaps by overriding prices based on their end dates.
lateral... basically helps create a date column with all the days in the range
As soon as I created that table, I figured the rest could be approached like a gaps and island problem.
with cte1 as
(select *, case when lag(price) over (partition by article order by dates)=price then 0 else 1 end as price_start --flag start of a new price island
from price_date_lookup),
cte2 as
(select *, sum(price_start) over (partition by article order by dates) as price_id --assign id to all the price islands
from cte1)
select article,
price,
min(dates) as start_date,
max(dates) as end_date
from cte2
group by article,price,price_id;

Group Records based on predefined date range in SQL (Oracle)

Is it possible to group records based on a predefined date range differences (e.g. 30 days) based on the start_date of a row and the end_date of the previous row for non-consecutive dates? I want to take the min(start_date) and max(end_date) of each group. I tried the lead and lag function with partition by in Oracle but couldn't come up with a proper solution. A related but unanswered post related to my question can be found here.
E.g.
ROW_NUM PROJECT_ID START_DATE END_DATE
1 1 2016-01-14 2016-08-15
2 1 2016-08-16 2016-09-10 --- Date diff Row 1&2 = 1 Day
3 1 2016-11-15 2017-01-10 --- Date diff Row 2&3 = 66 Days
4 1 2016-01-17 2017-04-10 --- Date diff Row 3&4 = 7 Days
5 2 2018-04-28 2018-06-01 --- Other Project
6 2 2019-02-01 2019-04-05 --- Diff > 30 Days
7 2 2019-04-08 2019-07-28 --- Diff 3 Days
Expected Result:
ROW_NUM PROJECT_ID START_DATE END_DATE
1 1 2016-01-14 2016-09-10
3 1 2016-11-15 2017-04-10
5 2 2018-04-28 2018-06-01
6 2 2019-02-01 2019-07-28
Use lag() and a cumulative sum to define where the groups begin. Then aggregate:
select project_id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date > start_date - interval '30' day then 0 else 1 end) over
(partition by project_id order by start_date) as grp
from (select t.*,
lag(end_date) over (partition by project_id order by start_date) as prev_end_date
from t
) t
) t
group by project_id, grp;

cumlative sum missing values of the month in sql

i have input data below
date amount
01-01-2020 10
01-02-2020 15
01-03-2020 10
01-05-2020 20
01-06-2020 30
01-08-2020 5
01-09-2020 6
01-10-2020 10
select sum(date),over(partition date) from table;
after add the missing month values i need output
output
Date amount cum_sum
01-01-2020 10 10
01-02-2020 15 25
01-03-2020 10 35
01-04-2020 0 35
01-05-2020 20 55
01-06-2020 30 85
01-07-2020 0 85
01-08-2020 5 90
01-09-2020 6 96
01-10-2020 10 106
You would typically generate the dates with a recursive query, then use window functions.
You don't tell which database you use. The exact syntax of recursive queries and date artithmetics varies across vendors, but here is what it would look like:
with recursive all_dates (dt, max_dt) as (
select min(date) dt, max(date) max_dt from mytable
union all
select dt + interval '1' day, max_dt from all_dates where dt < max_dt
)
select d.dt, sum(t.amount) over(order by c.dt) amount
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt
You simply want a window function:
select t.*, sum(amount) over (order by date)
from table t

Insert the table data based on grouping of two columns

I have a oracle table with the following format,
For eg:
JLID Dcode SID TDT QTY
8295783 3119255 9842 3/5/2018 14
8269771 3119255 9842 3/6/2018 11
8302211 3119255 1126 3/1/2018 19
Here I have different SID for the same Dcode, now I need to get the SID with the maximum Qty. (i.e) for SID 9842 - (14+11)=25, for SID 1126 it is 19, then the results should be on SID 9842. So, our query should returns the following results
JLID Dcode START_DT END_DT SID
111 3119255 3/1/2018 3/31/2018 12:00 9842
Startdate and enddate should be calculated from TDT (i.e) start date is the first date of the month and the end date is the last date of the month
Can anyone please suggest me some ideas to do it.
It might be as simple as this:
SELECT Dcode, start_date, end_date, SID FROM (
SELECT Dcode, SID, TRUNC(start_date, 'MONTH') AS start_date
, LAST_DAY(end_date) AS end_date
, ROW_NUMBER() OVER ( PARTITION BY Dcode ORDER BY total_qty DESC ) AS rn
FROM (
SELECT Dcode, SID, MIN(TDT) AS start_date, MAX(TDT) AS end_date
, SUM(QTY) AS total_qty
FROM mytable
GROUP BY Dcode, SID
)
) WHERE rn = 1
In the inner most subquery I aggregation to get the range of dates and total quantity for particular values of Dcode and SID. Then I use an anaylitic (window) function to get the row for which total quantity is the greatest. (You would want to use RANK() in place of ROW_NUMBER() in the event you want to return more than one value of SID with the same quantity.)
Here's one option which doesn't contain JLID = 111 in the final result as I have no idea where you took it from.
SQL> with test (jlid, dcode, sid, tdt, qty) as
2 (select 8295783, 3119255, 9842, date '2018-03-05', 14 from dual union
3 select 8269771, 3119255, 9842, date '2018-08-22', 11 from dual union
4 select 8302211, 3119255, 1126, date '2018-03-01', 19 from dual union
5 --
6 select 1234567, 1112223, 1000, date '2018-06-16', 88 from dual
7 )
8 select dcode,
9 min (trunc (tdt, 'mm')) start_dt, --> MIN
10 max (last_day (tdt)) end_dt, --> MAX
11 sid
12 from (select dcode,
13 sid,
14 tdt,
15 sqty,
16 rank () over (partition by dcode order by sqty desc) rnk
17 from (select dcode,
18 sid,
19 tdt,
20 sum (qty) over (partition by dcode, sid) sqty
21 from test))
22 where rnk = 1
23 group by dcode, sid; --> GROUP BY
DCODE START_DT END_DT SID
---------- ---------------- ---------------- ----------
1112223 01.06.2018 00:00 30.06.2018 00:00 1000
3119255 01.03.2018 00:00 31.08.2018 00:00 9842
SQL>

start date end date combine rows

In Redshift, through SQL script want to consolidate monthly records as long as gap between the end date of first and the start date of the next record is 32 days or less (<=32) into single record with minimum startdate of continuous month as output startdate and maximum of end date of continuous month as output enddate.
The below input data refers to the table's data and also listed the expected output. The input data is listed ORDER BY ID,STARTDT,ENDDT in ASC.
For example, in below table, consider ID 100, the gab between the end of the first record and start of the next record <=32, however gap between the second record end date and third records start date falls more than 32 days, hence the first two records to be consolidate into one record i.e. (ID),MIN(STARTSDT),MAX(ENDDT) which corresponds to first record in the expected output. Similarly gab between 3 and 4 record in the input data falls within the 32 days and thus these 2 records to be consolidated into single records which corresponds to the second record in the expected output.
INPUT DATA:
ID STARTDT ENDDT
100 2000-01-01 2000-01-31
100 2000-02-01 2000-02-29
100 2000-05-01 2000-05-31
100 2000-06-01 2000-06-30
100 2000-09-01 2000-09-30
100 2000-10-01 2000-10-31
101 2012-06-01 2012-06-30
101 2012-07-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
EXPECTED OUTPUT:
ID MIN_STARTDT MAX_END_DT
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
You can do this in steps:
Use a join to identify where two adjacent records should be combined.
Then do a cumulative sum to assign all such adjacent records a grouping identifier.
Aggregate.
It looks like:
select id, min(startdt), max(enddte)
from (select t.*,
count(case when tprev.id is null then 1 else 0 end) over
(partition by t.idid
order by t.startdt
rows between unbounded preceding and current row
) as grp
from t left join
t tprev
on t.id = tprev.id and
t.startdt = tprev.enddt + interval '1 day'
) t
group by id, grp;
The question is very similar to this one and my answer is also similar: Fetch rows based on condition
The gist of the idea is to use Window Functions to identify transitions between period (events which are less than 33 days apart), and then do some filtering to remove the rows within the period, and then Window Functions again.
Complete solution:
SELECT
id,
startdt AS period_start,
period_end
FROM (
SELECT
id,
startdt,
enddt,
lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt) AS period_end,
period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN period_switch = 0 AND reverse_period_switch = 1
THEN 'start'
ELSE 'end' END AS period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN datediff(days, enddt, lead(startdt, 1)
OVER (PARTITION BY id
ORDER BY enddt ASC)) > 32
THEN 1
ELSE 0 END AS period_switch,
CASE WHEN datediff(days, lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt DESC), startdt) > 32
THEN 1
ELSE 0 END AS reverse_period_switch
FROM date_test
)
AS sessioned
WHERE period_switch != 0 OR reverse_period_switch != 0
UNION
SELECT -- adding start rows without transition
id,
startdt,
enddt,
'start'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt ASC) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
UNION
SELECT -- adding end rows without transition
id,
startdt,
enddt,
'end'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt desc) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
) AS with_boundary -- data set containing start/end boundaries
) AS with_end -- data set where end date is propagated into the start row of the period
WHERE period_boundary = 'start'
ORDER BY id, startdt ASC;
Note that in your expected output, you had a row for 103 2013-05-01 2013-05-31, however its start date is 31 days apart from end date of the previous row, so this row should instead be merged with the previous row for id 103 according to your requirements.
So the output that I get looks like this:
id start end
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-05-31