Grouping based on start date matching the previous row's end date in SQL

Hoping someone can help me out with this problem.
I have the following sample dataset:
MEM_ID  CLM_ID  ADM_DT      DCHG_DT
1       111     01-01-2020  02-01-2020
1       112     03-01-2020  04-01-2020
1       113     04-01-2020  05-01-2020
1       114     06-01-2020  07-01-2020
2       211     01-01-2020  02-01-2020
2       212     05-01-2020  08-01-2020
3       311     02-01-2020  03-01-2020
3       312     03-01-2020  05-01-2020
3       313     05-01-2020  06-01-2020
3       314     07-01-2020  08-01-2020
I am trying to create groupings within each MEM_ID. If a row's ADM_DT is equal to the previous row's DCHG_DT, then the two records should be placed in the same group.
Below is the expected output:
MEM_ID  CLM_ID  ADM_DT      DCHG_DT     GROUP_ID
1       111     01-01-2020  02-01-2020  1
1       112     03-01-2020  04-01-2020  2
1       113     04-01-2020  05-01-2020  2
1       114     06-01-2020  07-01-2020  3
2       211     01-01-2020  02-01-2020  1
2       212     05-01-2020  08-01-2020  2
3       311     02-01-2020  03-01-2020  1
3       312     03-01-2020  05-01-2020  1
3       313     05-01-2020  06-01-2020  1
3       314     07-01-2020  08-01-2020  2
I have attempted the following:
select DISTINCT MEM_ID
,CLM_ID
,ADM_DT
,DCHG_DT
,CASE WHEN ADM_DT = LAG(DCHG_DT) OVER(PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT) THEN 0 ELSE 1 END AS ISSTART
FROM
table
Which produces something like this:
MEM_ID  CLM_ID  ADM_DT      DCHG_DT     ISSTART
1       111     01-01-2020  02-01-2020  1
1       112     03-01-2020  04-01-2020  1
1       113     04-01-2020  05-01-2020  0
1       114     06-01-2020  07-01-2020  1
2       211     01-01-2020  02-01-2020  1
2       212     05-01-2020  08-01-2020  1
3       311     02-01-2020  03-01-2020  1
3       312     03-01-2020  05-01-2020  0
3       313     05-01-2020  06-01-2020  0
3       314     07-01-2020  08-01-2020  1
I have also looked into other external sources such as https://www.kodyaz.com/t-sql/sql-query-for-overlapping-time-periods-on-sql-server.aspx
This got me pretty close, but I realized that the author was using a recursive CTE, and Netezza does not support recursive CTEs.
Ultimately I would like to create these groupings so that I can then join them back to the original table that I am using and sum values based on the assigned group for each MEM_ID.
Thank you in advance for any help provided.

Try this:
select MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
       sum(ISSTART) over (partition by MEM_ID order by ADM_DT, DCHG_DT rows unbounded preceding) as GROUP_ID
from (
    select MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
           case when ADM_DT = lag(DCHG_DT) over (partition by MEM_ID order by ADM_DT, DCHG_DT) then 0 else 1 end as ISSTART
    from table_name
) t
Fiddle
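Basically, this uses your ISSTART flag in a running sum: every row that starts a new group adds 1 to the total, and every continuation row adds 0, so the running total within each MEM_ID is the GROUP_ID.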
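If you want to check the logic without a Netezza instance, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. This is an assumption made only to keep the example self-contained (it needs SQLite 3.25 or later for window functions). The table name claims is hypothetical, and the dates are rewritten as ISO strings assuming the question uses DD-MM-YYYY.

```python
import sqlite3

# Sample rows from the question: (MEM_ID, CLM_ID, ADM_DT, DCHG_DT).
# Dates are stored as ISO text (assumed DD-MM-YYYY in the question).
rows = [
    (1, 111, "2020-01-01", "2020-01-02"),
    (1, 112, "2020-01-03", "2020-01-04"),
    (1, 113, "2020-01-04", "2020-01-05"),
    (1, 114, "2020-01-06", "2020-01-07"),
    (2, 211, "2020-01-01", "2020-01-02"),
    (2, 212, "2020-01-05", "2020-01-08"),
    (3, 311, "2020-01-02", "2020-01-03"),
    (3, 312, "2020-01-03", "2020-01-05"),
    (3, 313, "2020-01-05", "2020-01-06"),
    (3, 314, "2020-01-07", "2020-01-08"),
]

conn = sqlite3.connect(":memory:")
# "claims" is a hypothetical table name used only for this sketch.
conn.execute(
    "CREATE TABLE claims (MEM_ID INTEGER, CLM_ID INTEGER, ADM_DT TEXT, DCHG_DT TEXT)"
)
conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)

query = """
SELECT MEM_ID, CLM_ID,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID
                          ORDER BY ADM_DT, DCHG_DT
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GROUP_ID
FROM (
    -- Flag a row as a group start unless it begins where the previous row ended.
    SELECT MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
           CASE WHEN ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID
                                                 ORDER BY ADM_DT, DCHG_DT)
                THEN 0 ELSE 1 END AS ISSTART
    FROM claims
) AS t
ORDER BY MEM_ID, ADM_DT, DCHG_DT
"""

# Map each claim to its group number within the member.
group_ids = {clm_id: group_id for _, clm_id, group_id in conn.execute(query)}
print(group_ids)
```

The first row of each MEM_ID has no previous row, so LAG returns NULL, the comparison is not true, and the row is flagged as a start.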

Related

Getting the first and the last id grouped by the date and the user

I have two tables, jobs and users
The example structure from jobs is
id   created_at
444  2022-12-12 08:00:00
333  2022-12-12 09:00:00
222  2022-12-12 10:00:00
555  2022-12-12 07:00:00
111  2022-12-12 12:00:00
888  2022-12-12 08:00:00
and users
id  user_id  job_id
1   2        111
2   1        222
3   1        333
4   1        444
5   2        555
6   2        888
I need to get the first and last job id for each day for each user in the same row. So the result should look something like this.
user_id  date        first_id  last_id
1        2022-12-12  444       222
2        2022-12-12  555       111
select distinct u.user_id
,date(created_at) as date
,first_value(j.id) over(partition by user_id, date(created_at) order by created_at) as first_id
,first_value(j.id) over(partition by user_id, date(created_at) order by created_at desc) as last_id
from jobs j join users u on u.job_id = j.id
user_id  date        first_id  last_id
2        2022-12-12  555       111
1        2022-12-12  444       222
Fiddle

Finding most recent startdate, and endDate from consecutive dates

I have a table like below:
user_id  store_id  stock  date
116      2         0      2021-10-18
116      2         0      2021-10-19
116      2         0      2021-10-20
116      2         0      2021-08-16
116      2         0      2021-08-15
116      2         0      2021-07-04
116      2         0      2021-07-03
389      2         0      2021-07-02
389      2         0      2021-07-01
389      2         0      2021-10-27
52       6         0      2021-10-28
52       6         0      2021-10-29
52       6         0      2021-10-30
116      38        0      2021-05-02
116      38        0      2021-05-03
116      38        0      2021-05-04
116      38        0      2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_Id  store_id  startDate   endDate
116      2         2021-10-18  2021-10-20
116      38        2021-05-02  2021-05-04
389      2         2021-07-01  2021-07-02
52       6         2021-10-28  2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?
Here is how you can do it:
select user_id, store_id, min(date) startdate, max(date) enddate
from (
    select *, rank() over (partition by user_id, store_id order by grp desc) rn
    from (
        select *, date - row_number() over (partition by user_id, store_id order by date) * interval '1 day' grp
        from tablename
    ) t
) t
where rn = 1
group by user_id, store_id, grp
db<>fiddle here
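The idea behind the query is that subtracting a row number from a date gives the same value for every day in an unbroken run, so that value can serve as a group key, and the largest key marks the most recent run. Below is a minimal sketch of that idea, run in SQLite through Python's sqlite3 module rather than PostgreSQL (an assumption made only to keep it self-contained). It uses julianday() in place of the interval arithmetic, a hypothetical table name stock_log, and only the user 116 rows from the question.

```python
import sqlite3

# Subset of the question's data: (user_id, store_id, date) for user 116.
rows = [
    (116, 2, "2021-10-18"), (116, 2, "2021-10-19"), (116, 2, "2021-10-20"),
    (116, 2, "2021-08-16"), (116, 2, "2021-08-15"),
    (116, 2, "2021-07-04"), (116, 2, "2021-07-03"),
    (116, 38, "2021-05-02"), (116, 38, "2021-05-03"), (116, 38, "2021-05-04"),
    (116, 38, "2021-04-06"),
]

conn = sqlite3.connect(":memory:")
# "stock_log" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE stock_log (user_id INTEGER, store_id INTEGER, date TEXT)")
conn.executemany("INSERT INTO stock_log VALUES (?, ?, ?)", rows)

query = """
SELECT user_id, store_id, MIN(date) AS startdate, MAX(date) AS enddate
FROM (
    -- Rank the runs so that the most recent one (largest grp) gets rn = 1.
    SELECT *, RANK() OVER (PARTITION BY user_id, store_id ORDER BY grp DESC) AS rn
    FROM (
        -- Day number minus row number is constant within a run of consecutive days.
        SELECT *,
               julianday(date) - ROW_NUMBER() OVER (PARTITION BY user_id, store_id
                                                    ORDER BY date) AS grp
        FROM stock_log
    ) AS numbered
) AS ranked
WHERE rn = 1
GROUP BY user_id, store_id, grp
ORDER BY user_id, store_id
"""

latest_runs = conn.execute(query).fetchall()
print(latest_runs)
```

Because rank() gives every row of the latest run the same rank, filtering on rn = 1 keeps the whole run and not just its last day.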

Selecting first element in Group by object Postgres

I have the following table, and for each Loan_ID and month I want to get the specific Amount that corresponds to the earliest observation with dpd greater than or equal to 10.
Loan_ID date dpd Amount
1 1/1/2017 1 55
1 1/2/2017 2 100
1 1/3/2017 3 5000
1 1/4/2017 5 6000
1 1/5/2017 10 50000
1 1/6/2017 15 50001
1 1/9/2017 31 50004
1 1/10/2017 55 50005
1 1/11/2017 59 50006
1 1/12/2017 65 50007
1 1/13/2017 70 80000
1 1/20/2017 85 900000
1 1/29/2017 92 100000
1 1/30/2017 93 10000
2 1/1/2017 0 522
2 1/2/2017 8 5444
2 1/3/2017 12 8784
2 1/6/2017 15 6221
2 1/12/2017 18 2220
2 1/13/2017 20 177
2 1/29/2017 35 5151
2 1/30/2017 60 40000
2 1/31/2017 61 5500
The expected output:
Loan_ID Month Amount
1 1 50000
2 1 8784
SELECT DISTINCT ON ("Loan_ID", date_trunc('month', "date"))
"Loan_ID",
date_trunc('month', "date")::date as month,
"Amount"
FROM
loans
WHERE
dpd >= 10
ORDER BY
"Loan_ID",
date_trunc('month', "date"),
"date"
;
Returns:
Loan_ID  month       Amount
1        2017-01-01  50000
2        2017-01-01  8784
You can find a test case in db<>fiddle.
Hmmm . . . if you want the amount per month and the first date that matches the condition, then you want conditional aggregation:
select loan_id, date_trunc('month', date) as mon,
sum(dpd),
min(case when dpd >= 10 then dpd end) as first_dpd_10
from t
group by loan_id, mon;
Edit: Based on your comment, you can use distinct on:
select distinct on (loan_id, date_trunc('month', date)) t.*
from t
where dpd >= 10
order by loan_id, date_trunc('month', date), date;

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
db<>fiddle
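For anyone who wants to reproduce the numbers without Oracle, here is a rough sketch of the same two steps using SQLite through Python's sqlite3 module. This swaps in a different technique for the last step: SQLite has no built-in median(), so the median is taken in Python with statistics.median. The table name balances is hypothetical, the AMOUNT column is left out because only the running amount is needed, and the dates are rewritten as ISO strings.

```python
import sqlite3
from statistics import median

# (id, date_, running_amount) taken from the question's sample data.
rows = [
    (10, "2019-06-27 14:30:00", 100.0), (10, "2019-06-29 15:26:00", 0.0),
    (10, "2019-07-03 01:56:00", 83.0), (10, "2019-07-04 17:53:00", 98.0),
    (10, "2019-07-05 15:09:00", 0.0), (10, "2019-07-05 15:53:00", 98.98),
    (10, "2019-07-05 19:54:00", 0.0), (10, "2019-07-07 01:36:00", 90.97),
    (10, "2019-07-07 13:02:00", 0.0), (10, "2019-07-07 16:32:00", 39.88),
    (10, "2019-07-08 13:41:00", 89.88),
    (20, "2019-01-08 09:03:00", 890.97), (20, "2019-01-09 14:47:00", 799.88),
    (20, "2019-01-09 14:53:00", 899.88), (20, "2019-01-09 14:59:00", 500.88),
    (20, "2019-01-09 18:24:00", 811.88), (20, "2019-01-09 23:25:00", 861.88),
    (20, "2019-01-10 16:18:00", 0.0), (20, "2019-01-12 16:46:00", 894.49),
    (20, "2019-01-25 05:40:00", 23.44),
]

conn = sqlite3.connect(":memory:")
# "balances" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE balances (id INTEGER, date_ TEXT, running_amount REAL)")
conn.executemany("INSERT INTO balances VALUES (?, ?, ?)", rows)

query = """
SELECT id, diff
FROM (
    -- Days from each row to the next row of the same id.
    SELECT id, running_amount,
           julianday(LEAD(date_) OVER (PARTITION BY id ORDER BY date_))
               - julianday(date_) AS diff
    FROM balances
) AS t
WHERE running_amount = 0 AND diff IS NOT NULL
"""

# Collect the gaps that follow a zero balance, then take the median per id.
diffs = {}
for id_, diff in conn.execute(query):
    diffs.setdefault(id_, []).append(diff)
median_age = {id_: median(values) for id_, values in diffs.items()}
print(median_age)
```

The gaps are filtered to zero-balance rows in the outer query so that lead() still sees every row when it looks for the next transaction.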

Grouping into series based on days since

I need to create a new grouping every time I have a period of more than 60 days since my previous record.
Basically, I need to take the data I have here:
RowNo StartDate StopDate DaysBetween
1 3/21/2017 3/21/2017 14
2 4/4/2017 4/4/2017 14
3 4/18/2017 4/18/2017 14
4 6/23/2017 6/23/2017 66
5 7/5/2017 7/5/2017 12
6 7/19/2017 7/19/2017 14
7 9/27/2017 9/27/2017 70
8 10/24/2017 10/24/2017 27
9 10/31/2017 10/31/2017 7
10 11/14/2017 11/14/2017 14
And turn it into this:
RowNo StartDate StopDate DaysBetween Series
1 3/21/2017 3/21/2017 14 1
2 4/4/2017 4/4/2017 14 1
3 4/18/2017 4/18/2017 14 1
4 6/23/2017 6/23/2017 66 2
5 7/5/2017 7/5/2017 12 2
6 7/19/2017 7/19/2017 14 2
7 9/27/2017 9/27/2017 70 3
8 10/24/2017 10/24/2017 27 3
9 10/31/2017 10/31/2017 7 3
10 11/14/2017 11/14/2017 14 3
Once I have that I'll group by Series and get the min(StartDate) and max(StopDate) for individual durations.
I could do this using a cursor but I'm sure someone much smarter than me has figured out a more elegant solution. Thanks in advance!
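You can use the window function sum() over() with a conditional flag: each row with more than 60 days since the previous one adds 1 to a running total, and that total (plus 1) is the series number.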
Example
Select *
,Series= 1+sum(case when [DaysBetween]>60 then 1 else 0 end) over (Order by RowNo)
From YourTable
Returns
RowNo StartDate StopDate DaysBetween Series
1 2017-03-21 2017-03-21 14 1
2 2017-04-04 2017-04-04 14 1
3 2017-04-18 2017-04-18 14 1
4 2017-06-23 2017-06-23 66 2
5 2017-07-05 2017-07-05 12 2
6 2017-07-19 2017-07-19 14 2
7 2017-09-27 2017-09-27 70 3
8 2017-10-24 2017-10-24 27 3
9 2017-10-31 2017-10-31 7 3
10 2017-11-14 2017-11-14 14 3
EDIT - 2008 Version
Select A.*
,B.*
From YourTable A
Cross Apply (
Select Series=1+sum( case when [DaysBetween]>60 then 1 else 0 end)
From YourTable
Where RowNo <= A.RowNo
) B
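To see that the two versions agree, here is a small sketch using SQLite through Python's sqlite3 module instead of SQL Server (an assumption made only to keep it self-contained). SQLite has no CROSS APPLY, so the second query uses a correlated subquery as a stand-in for it. The table name visits is hypothetical, and only RowNo and DaysBetween are loaded because the dates are not needed for the series number.

```python
import sqlite3

# (RowNo, DaysBetween) from the question's sample data.
rows = [
    (1, 14), (2, 14), (3, 14), (4, 66), (5, 12),
    (6, 14), (7, 70), (8, 27), (9, 7), (10, 14),
]

conn = sqlite3.connect(":memory:")
# "visits" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE visits (RowNo INTEGER, DaysBetween INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)

# Window version: running count of rows that follow a gap of more than 60 days.
window_query = """
SELECT RowNo,
       1 + SUM(CASE WHEN DaysBetween > 60 THEN 1 ELSE 0 END)
               OVER (ORDER BY RowNo) AS Series
FROM visits
ORDER BY RowNo
"""

# Pre-window version: recount the qualifying rows up to the current row.
subquery_version = """
SELECT a.RowNo,
       (SELECT 1 + SUM(CASE WHEN b.DaysBetween > 60 THEN 1 ELSE 0 END)
        FROM visits AS b
        WHERE b.RowNo <= a.RowNo) AS Series
FROM visits AS a
ORDER BY a.RowNo
"""

window_series = [series for _, series in conn.execute(window_query)]
subquery_series = [series for _, series in conn.execute(subquery_version)]
print(window_series)
print(subquery_series)
```

The subquery form rescans the table for every row, so on large tables the window version is usually the better choice when it is available.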