Conditional counting of late payments monthly, including reset to zero - sql

I have a table with the following structure:
EOM ID Principal Pay_plan cum_Payments
2019-12-31 AY4525 25000.000000 796.000000 936.000000
2020-01-31 AY4525 25000.000000 1592.000000 936.000000
2020-02-29 AY4525 25000.000000 2388.000000 936.000000
2020-03-31 AY4525 25000.000000 3184.000000 3184.00000
2020-04-30 AY4525 25000.000000 3980.000000 3980.00000
2020-05-31 AY4525 25000.000000 4776.000000 3980.00000
2020-06-30 AY4525 25000.000000 5572.000000 3980.00000
2020-04-30 KD4525 35000.000000 500.000000 500.000000
2020-05-31 KD4525 35000.000000 1000.000000 1000.00000
2020-06-30 KD4525 35000.000000 1500.000000 1000.00000
2020-07-31 KD4525 35000.000000 2000.000000 2500.00000
So I have a cumulative payment plan and cumulative payments per month for each unique client ID. Now I want to add a column that counts the consecutive months in which the client is late with payments, that is, months where pay_plan > cum_payments:
EOM ID Principal Pay_plan cum_Payments months_Late
2019-12-31 AY4525 25000.000000 796.000000 936.000000 0
2020-01-31 AY4525 25000.000000 1592.000000 936.000000 1
2020-02-29 AY4525 25000.000000 2388.000000 936.000000 2
2020-03-31 AY4525 25000.000000 3184.000000 3184.00000 0
2020-04-30 AY4525 25000.000000 3980.000000 3980.00000 0
2020-05-31 AY4525 25000.000000 4776.000000 3980.00000 1
2020-06-30 AY4525 25000.000000 5572.000000 3980.00000 2
2020-04-30 KD4525 35000.000000 500.000000 500.000000 0
2020-05-31 KD4525 35000.000000 1000.000000 1000.00000 0
2020-06-30 KD4525 35000.000000 1500.000000 1000.00000 1
2020-07-31 KD4525 35000.000000 2000.000000 2500.00000 0
The counter must reset to zero as soon as cum_payments catches up with pay_plan again (cum_payments >= pay_plan). I have tried numerous ways of doing this with OVER(), but have not found a solid solution. Does anyone have an idea how to solve this?

This is a gaps-and-islands problem. The islands are when the cumulative amount is less than the planned amount. So, you can use a cumulative sum to define the islands and then row_number():
select t.*,
(case when cum_payments >= pay_plan then 0
else row_number() over (partition by id, grp order by eom) - 1
end) as months_late
from (select t.*,
sum(case when cum_payments >= pay_plan then 1 else 0 end) over (partition by id order by eom) as grp
from t
) t;
You can handle the situation where the first payment is late by using:
select t.*,
(case when cum_payments >= pay_plan then 0
else row_number() over (partition by id, grp order by eom) - 1 +
(case when min(eom) over (partition by id) = min(eom) over (partition by id, grp) and
first_value(cum_payments) over (partition by id, grp order by eom) < first_value(pay_plan) over (partition by id, grp order by eom)
then 1 else 0
end)
end) as months_late
from (select t.*,
SUM(case when cum_payments >= pay_plan then 1 else 0 end) over (partition by id order by eom) as grp
from t
) t
I actually left this logic out of the first query because it seems inelegant. There might be a better solution, but one does not readily occur to me.
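To sanity-check the first query against the sample data, its two steps can be replayed outside the database. The Python below is only an illustrative sketch: the `months_late` helper and the `(eom, pay_plan, cum_payments)` tuple layout are made up for this example, and it mirrors the first query, so a client whose very first month is already late would still get 0 for that month.

```python
from itertools import groupby

# Sample rows for client AY4525: (eom, pay_plan, cum_payments)
rows = [
    ("2019-12-31", 796, 936),
    ("2020-01-31", 1592, 936),
    ("2020-02-29", 2388, 936),
    ("2020-03-31", 3184, 3184),
    ("2020-04-30", 3980, 3980),
    ("2020-05-31", 4776, 3980),
    ("2020-06-30", 5572, 3980),
]

def months_late(rows):
    """Mirror the SQL: grp = running count of on-time rows, then number the rows per grp."""
    grp, tagged = 0, []
    for eom, plan, paid in sorted(rows):  # ISO date strings sort chronologically
        if paid >= plan:
            grp += 1  # every on-time row opens a new group
        tagged.append((grp, eom, plan, paid))
    result = []
    for _, island in groupby(tagged, key=lambda r: r[0]):
        for n, (_, eom, plan, paid) in enumerate(island):  # n plays the role of row_number() - 1
            result.append((eom, 0 if paid >= plan else n))
    return result

print([late for _, late in months_late(rows)])  # [0, 1, 2, 0, 0, 1, 2]
```

The printed counters match the months_Late column the question asks for on this client.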

If I understand your logic correctly, the Months_late value for your last row should be 0. If so, you can use the logic below.
This assumes cum_Payments never decreases over time for a given ID, so a repeated cum_Payments value means no payment was made that month:
SELECT *,
(
SELECT COUNT(*)
FROM your_table B
WHERE B.cum_Payments = A.cum_Payments
AND B.EOM < A.EOM
AND B.ID = A.ID
) Months_late
FROM your_table A
ORDER BY ID,EOM
The following query returns exactly the result you are looking for. It is admittedly a bit heavy for a table with a lot of data, but that is acceptable for one-time use. If you need to run it more often, you can consider creating a view to improve query execution performance.
SELECT *,
       (
           SELECT COUNT(1)
           FROM your_table b
           WHERE b.id = a.id
             AND b.eom <= a.eom
             AND b.eom > ISNULL(
                 -- latest month up to this row in which the client was not behind plan
                 (SELECT MAX(c.eom) FROM your_table c
                  WHERE c.id = a.id AND c.cum_payments >= c.pay_plan AND c.eom <= a.eom),
                 -- otherwise fall back to the client's first month
                 (SELECT MIN(d.eom) FROM your_table d WHERE d.id = a.id)
             )
       ) AS Months_late
FROM your_table a
ORDER BY id, eom

Related

Select the first record after the last but one X

I'm trying to get the first BEG_PERIOD date immediately after the last but one record of X (DEF_ENDING) of each user (USER_ID).
So I have this:
USER_ID BEG_PERIOD END_PERIOD DEF_ENDING
159     01-07-2022 31-07-2022 X
159     25-09-2022 15-10-2022 X
159     01-11-2022 13-11-2022
159     14-11-2022 21-12-2022 X
159     01-01-2023 30-01-2023 X
414     01-04-2022 31-05-2022 X
414     01-07-2022 30-09-2022
414     01-10-2022 01-12-2022 X
480     01-07-2022 30-06-2022
480     01-07-2022 30-08-2022 X
480     02-09-2022 01-11-2022 X
503     15-03-2022 16-06-2022 X
503     19-07-2022 23-07-2022
503     24-07-2022 31-10-2022
503     01-11-2022 21-12-2022 X
The dates I need are the ones in bold
Can you help me?
I tried this but I only get the latest dates :(
SELECT
p.USER_ID,
p.BEG_PERIOD
FROM
PERIODS p
INNER JOIN PERIODS p2 ON
p.USER_ID = p2.USER_ID
AND
p.BEG_PERIOD = (
SELECT
MAX( BEG_PERIOD )
FROM
PERIODS
WHERE
PERIODS.USER_ID = p.USER_ID
)
WHERE
p.USER_ID > 10
This should work based on the sample data:
with data as (
select *,
sum(case when DEF_ENDING = 'X' then 1 end)
over (partition by USER_ID order by BEG_PERIOD desc) as grp
from PERIODS
)
select
USER_ID,
min(BEG_PERIOD) as BEG_PERIOD,
min(END_PERIOD) as END_PERIOD,
min(DEF_ENDING) as DEF_ENDING
from data
where grp = 1
group by USER_ID;
If you can't rely on the two dates being minimums then:
with data as (
select *,
sum(case when DEF_ENDING = 'X' then 1 end)
over (partition by USER_ID order by BEG_PERIOD desc) as grp
from PERIODS
), data2 as (
select *,
row_number() over (partition by USER_ID order by BEG_PERIOD) as rn
from data
where grp = 1
)
select *
from data2
where rn = 1;
This can also be done entirely via subqueries if that's more appropriate at the level of your class:
select USER_ID, min(BEG_PERIOD), min(END_PERIOD), min(DEF_ENDING)
from periods p1
where p1.BEG_PERIOD > (
select max(BEG_PERIOD)
from periods p2
where p2.USER_ID = p1.USER_ID and p2.DEF_ENDING = 'X'
and exists (
select 1
from periods p3
where p3.USER_ID = p2.USER_ID and p3.DEF_ENDING = 'X'
and p3.BEG_PERIOD > p2.BEG_PERIOD
)
)
group by USER_ID;
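For readers who want to check the expected dates by hand, the subquery version can be traced in a few lines of Python. This is a rough sketch under assumptions: the helper name `first_after_last_but_one` is invented here, only two users from the sample data are included, and users with fewer than two 'X' rows are simply skipped.

```python
from datetime import date

# (user_id, beg_period, end_period, def_ending) for two users from the sample data
periods = [
    (159, date(2022, 7, 1), date(2022, 7, 31), "X"),
    (159, date(2022, 9, 25), date(2022, 10, 15), "X"),
    (159, date(2022, 11, 1), date(2022, 11, 13), None),
    (159, date(2022, 11, 14), date(2022, 12, 21), "X"),
    (159, date(2023, 1, 1), date(2023, 1, 30), "X"),
    (503, date(2022, 3, 15), date(2022, 6, 16), "X"),
    (503, date(2022, 7, 19), date(2022, 7, 23), None),
    (503, date(2022, 7, 24), date(2022, 10, 31), None),
    (503, date(2022, 11, 1), date(2022, 12, 21), "X"),
]

def first_after_last_but_one(periods):
    """Per user, return the earliest BEG_PERIOD after the last-but-one 'X' row."""
    by_user = {}
    for user, beg, _end, flag in periods:
        by_user.setdefault(user, []).append((beg, flag))
    result = {}
    for user, rows in by_user.items():
        x_begs = sorted(beg for beg, flag in rows if flag == "X")
        if len(x_begs) < 2:
            continue  # this user has no last-but-one 'X' row
        cutoff = x_begs[-2]  # BEG_PERIOD of the last-but-one 'X'
        later = [beg for beg, _ in rows if beg > cutoff]
        if later:
            result[user] = min(later)
    return result

print(first_after_last_but_one(periods))
```

For user 159 this picks 2023-01-01 and for user 503 it picks 2022-07-19.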
Try the following, using the ROW_NUMBER and LAG window functions:
/* this to assign row numbers only for rows where def_ending = 'X' */
with order_def_ending as
(
select *,
case def_ending when 'X' then
row_number() over (partition by user_id order by
case def_ending when 'X' then 1 else 2 end,
end_period desc)
else null end rn,
lag(def_ending, 1, def_ending) over (partition by user_id order by end_period) pde /* previous def_ending value */
from yourTbl
),
lag_rn as
(
select *,
lag(rn) over (partition by user_id order by end_period) prn /* previous row_number value */
from order_def_ending
)
select user_id, beg_period, end_period, def_ending
from lag_rn
where (
prn = 2 or /* when there are multiple rows with def_ending = 'X' */
(prn = 1 and rn is null) /* when there is only one row with def_ending = 'X' */
) and pde = 'X' /* ensure that the previous value of def_ending is = 'X' */
order by user_id, end_period
I think this works on SQL Server 2008:
with periods as(
select USER_ID, cast(BEG_PERIOD as date)BEG_PERIOD,cast(END_PERIOD as date)END_PERIOD,DEF_ENDING
from (values
(159,'01-07-2022','31-07-2022','X')
,(159,'25-09-2022','15-10-2022','X')
,(159,'01-11-2022','13-11-2022',null)
,(159,'14-11-2022','21-12-2022','X')
,(159,'01-01-2023','30-01-2023','X')
,(414,'01-04-2022','31-05-2022','X')
,(414,'01-07-2022','30-09-2022',null)
,(414,'01-10-2022','01-12-2022','X')
,(480,'01-07-2022','30-06-2022',null)
,(480,'01-07-2022','30-08-2022','X')
,(480,'02-09-2022','01-11-2022','X')
,(503,'15-03-2022','16-06-2022','X')
,(503,'19-07-2022','23-07-2022',null)
,(503,'24-07-2022','31-10-2022',null)
,(503,'01-11-2022','21-12-2022','X')
)t(USER_ID, BEG_PERIOD, END_PERIOD, DEF_ENDING)
)
,cte as (
select *
,(select sum(case when def_ending='X' then 1 else 0 end)
from periods t2 where t2.user_id=t1.USER_ID and t2.BEG_PERIOD>=t1.BEG_PERIOD
) N -- last but one has N=2, all next N=1 (reverse order of counts)
from periods t1
)
select *
,(select min(t2.BEG_PERIOD)
from cte t2 where t2.user_id=t1.USER_ID and t2.N=1
) LastButOne -- first after last but one with N=1
from cte t1
Result
USER_ID BEG_PERIOD END_PERIOD DEF_ENDING N LastButOne
159     2022-07-01 2022-07-31 X          4 2023-01-01
159     2022-09-25 2022-10-15 X          3 2023-01-01
159     2022-11-01 2022-11-13 NULL       2 2023-01-01
159     2022-11-14 2022-12-21 X          2 2023-01-01
159     2023-01-01 2023-01-30 X          1 2023-01-01
414     2022-04-01 2022-05-31 X          2 2022-07-01
414     2022-07-01 2022-09-30 NULL       1 2022-07-01
414     2022-10-01 2022-12-01 X          1 2022-07-01
480     2022-07-01 2022-06-30 NULL       2 2022-09-02
480     2022-07-01 2022-08-30 X          2 2022-09-02
480     2022-09-02 2022-11-01 X          1 2022-09-02
503     2022-03-15 2022-06-16 X          2 2022-07-19
503     2022-07-19 2022-07-23 NULL       1 2022-07-19
503     2022-07-24 2022-10-31 NULL       1 2022-07-19
503     2022-11-01 2022-12-21 X          1 2022-07-19
About Parallel Data Warehouse: non-PDW versions of SQL Server before 2012 do not support the ORDER BY clause with aggregate functions like MIN.
Windowing function support was considerably extended in 2012, compared with the basic implementation available starting with SQL Server 2005. The extensions were made available in Parallel Data Warehouse before being incorporated in the box product.

Propagate missing dates in teradata - select query

I have a table that looks like this:
my_date    item_id   sales
2020-03-01 GMZS72429 2
2020-03-07 GMZS72429 2
2020-03-09 GMZS72429 1
2020-03-04 GMZS72425 1
And I want it to look like this
my_date    item_id   sales
2020-03-01 GMZS72429 2
2020-03-02 GMZS72429 0
...        ...       ...
2020-03-05 GMZS72429 0
2020-03-06 GMZS72429 0
2020-03-07 GMZS72429 2
2020-03-08 GMZS72429 0
2020-03-09 GMZS72429 1
2020-03-01 GMZS72425 0
2020-03-02 GMZS72425 0
2020-03-03 GMZS72425 0
2020-03-04 GMZS72425 1
...        ...       ...
2020-03-09 GMZS72425 0
Since I was struggling with the documentation from Teradata, I have tried generating the pair item_id - my_date using another table, followed by a left join:
with a1 as(
select distinct my_date, item_id from some_table_with_the_item_ids_and_all_dates
)
select a1.my_date, a1.item_id, coalesce(sales, 0) as sales
from a1 left join my_table on a1.item_id=my_table.item_id and a1.my_date=my_table.my_date;
This worked but it is terribly slow, and ugly. I was wondering if there is a better built-in (or alternative) method to do this. Thanks
This is a use case for Teradata's EXPAND ON syntax:
select
new_date
,item_id
,case when my_date = new_date then sales else 0 end
from
(
select dt.*, begin(p2) as new_date
from
(
select t.*
-- create a period for expansion in the next step
,period(my_date, lead(my_date, 1, my_date+1)
over (partition by item_id
order by my_date)) as pd
from vt as t
) as dt
-- now create the missing dates
expand on pd as p2
) as dt
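One simple option is to use Teradata's built-in calendar view as your driver, cross-joined with the distinct item IDs so that every item gets a row for every date:
select
    c.calendar_date as my_date,
    i.item_id,
    coalesce(v.sales, 0) as sales
from
    sys_calendar.calendar c
    cross join (select distinct item_id from your_table) i
    left join your_table v
        on v.my_date = c.calendar_date
        and v.item_id = i.item_id
where
    c.calendar_date between (select min(my_date) from your_table) and (select max(my_date) from your_table)
order by 2, 1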
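The calendar-driven idea boils down to building every (date, item) pair in the overall date range and defaulting missing sales to 0. A minimal Python sketch of that logic follows, useful for checking what the result set should contain; the `fill_missing_dates` helper and the dictionary layout are assumptions made for this illustration, not part of any Teradata API.

```python
from datetime import date, timedelta

# Known sales keyed by (my_date, item_id), taken from the question's sample
sales = {
    (date(2020, 3, 1), "GMZS72429"): 2,
    (date(2020, 3, 7), "GMZS72429"): 2,
    (date(2020, 3, 9), "GMZS72429"): 1,
    (date(2020, 3, 4), "GMZS72425"): 1,
}

def fill_missing_dates(sales):
    """Return one (date, item, sales) row per item for each day in the overall range."""
    days = [d for d, _ in sales]
    first, last = min(days), max(days)
    items = sorted({item for _, item in sales})
    # every calendar day between the first and last known date, inclusive
    calendar = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    # missing (date, item) pairs default to 0 sales, like COALESCE after a left join
    return [(d, item, sales.get((d, item), 0)) for item in items for d in calendar]

filled = fill_missing_dates(sales)
print(len(filled))  # 18 rows: 9 days for each of the 2 items
```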

Add first and last date of a sequence

I am working on a database which has a huge collection of rows. I want to update it so that repeated records are deleted. The table has a date column, which I want to convert into startDate and endDate. Please check:
id | date | price | minutes | prefixId | sellerId | routeTypeId
1234 2020-01-01 0.123 0 1 1 1
1235 2020-01-04 0.123 0 1 1 1
1236 2020-01-05 0.123 123 1 1 1
1237 2020-01-06 0.123 31 1 1 1
1238 2020-01-07 0.123 23 1 1 1
1239 2020-01-08 0.130 41 1 2 1
1240 2020-01-09 0.130 0 1 1 1
What I am looking for is:
id | startDate | endDate | price | minutes | prefixId | sellerId | routeTypeId
1234 2020-01-01 2020-01-01 0.123 0 1 1 1
1235 2020-01-04 2020-01-07 0.123 0 1 1 1
1239 2020-01-08 2020-01-08 0.130 41 1 2 1
1240 2020-01-09 2020-01-09 0.130 0 1 2 2
Dates are considered part of one series if price, prefixId, sellerId and routeTypeId stay the same as in the previous row and the date column forms a sequence without any gap between dates. For example, 2020-01-01, 2020-01-02, 2020-01-10 make up two different series.
This is a gaps-and-islands problem. You can use lag() and a cumulative sum:
select price, prefixId, sellerId, routeTypeId,
min(minutes),
min(date), max(date)
from (select t.*,
sum(case when prev_date = date - interval '1 day' then 0 else 1 end) over (partition by price, prefixId, sellerId, routeTypeId order by date) as grp
from (select t.*,
lag(date) over (partition by price, prefixId, sellerId, routeTypeId order by date) as prev_date
from t
) t
) t
group by grp, price, prefixId, sellerId, routeTypeId
This is a "Gaps & Islands" problem. You can do it using:
select
min(id) as id,
min(date) as start_date,
max(date) as end_date,
min(price) as price,
...
from (
select *,
sum(inc) over(order by id) as grp
from (
select *,
case when price = lag(price) over(order by id)
and date = lag(date) over(
partition by price, prefixId, sellerId, routeTypeId
order by id)
+ interval '1 day'
then 0 else 1 end as inc
from t
) x
) y
group by grp
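The island-building step shared by both answers can be illustrated procedurally. The sketch below is an assumption-laden example rather than a translation of either query: the `collapse` helper is hypothetical, rows are keyed on (price, prefixId, sellerId, routeTypeId), and only the first id, start date and end date of each island are kept.

```python
from datetime import date, timedelta

# (id, date, price, minutes, prefixId, sellerId, routeTypeId) from the question
rows = [
    (1234, date(2020, 1, 1), 0.123, 0, 1, 1, 1),
    (1235, date(2020, 1, 4), 0.123, 0, 1, 1, 1),
    (1236, date(2020, 1, 5), 0.123, 123, 1, 1, 1),
    (1237, date(2020, 1, 6), 0.123, 31, 1, 1, 1),
    (1238, date(2020, 1, 7), 0.123, 23, 1, 1, 1),
    (1239, date(2020, 1, 8), 0.130, 41, 1, 2, 1),
    (1240, date(2020, 1, 9), 0.130, 0, 1, 1, 1),
]

def collapse(rows):
    """Merge rows with equal attributes and consecutive dates into (id, start, end) islands."""
    islands = {}  # key -> list of [first_id, start, end]; the last entry is the open island
    for rid, day, price, _minutes, prefix, seller, route in sorted(rows, key=lambda r: r[1]):
        key = (price, prefix, seller, route)
        runs = islands.setdefault(key, [])
        if runs and runs[-1][2] == day - timedelta(days=1):
            runs[-1][2] = day  # previous date is exactly one day earlier: extend the island
        else:
            runs.append([rid, day, day])  # first row for this key, or a gap: new island
    return sorted(tuple(run) for runs in islands.values() for run in runs)

for island in collapse(rows):
    print(island)
```

On the sample rows this yields four islands starting at ids 1234, 1235, 1239 and 1240, with 1235 spanning 2020-01-04 to 2020-01-07.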

Matching previous and current records WITH and WITHOUT LAG and LEAD

I have a table like the one below. The records do not have a primary key, and I want to achieve the result both WITH and WITHOUT the LAG and LEAD functions.
ID ENTID INOUTDATE YEAR MONTH STATUS
1923 1923 [NULL] 2099 12 Out
1923 10690 [NULL] 2099 12 Out
1923 9670 2012-08-24 00:00:00 2012 8 In
1923 1923 2013-06-01 00:00:00 2013 6 In
1923 9670 2018-04-19 00:00:00 2018 4 Out
1923 10690 2019-02-01 00:00:00 2019 2 In
And I want to get the records as per below.
ID ENTID INOUTDATE YEAR MONTH STATUS
1923 10690 [NULL] 2099 12 Out
1923 9670 2012-08-24 00:00:00 2012 8 In
1923 9670 2018-04-19 00:00:00 2018 4 Out
1923 10690 2019-02-01 00:00:00 2019 2 In
lag() is the simplest method:
select t.*
from (select t.*,
lag(status) over (partition by id, (case when inoutdate is null then 1 else 2 end)
order by inoutdate
) as prev_status
from t
) t
where prev_status is null or prev_status <> status;
You can also treat this as a gaps-and-islands problem, identifying the islands using row_number(). The logic is more complicated:
select t.*
from (select t.*,
row_number() over (partition by id, (case when inoutdate is null then 1 else 2 end), status, (seqnum - seqnum_s)
order by inoutdate
) as seqnum_g
from (select t.*,
row_number() over (partition by id, (case when inoutdate is null then 1 else 2 end) order by inoutdate) as seqnum,
row_number() over (partition by id, (case when inoutdate is null then 1 else 2 end), status order by inoutdate) as seqnum_s
from t
) t
) t
where seqnum_g = 1;
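Both queries keep a row only when its status differs from the status of the previous row in date order. A small Python sketch of that rule, limited to the dated rows of the sample (the NULL-dated rows form their own partition in the SQL), might look like this; the `status_changes` helper is a name invented for the illustration.

```python
# (entid, inoutdate, status) for ID 1923, dated rows only
rows = [
    (9670, "2012-08-24", "In"),
    (1923, "2013-06-01", "In"),
    (9670, "2018-04-19", "Out"),
    (10690, "2019-02-01", "In"),
]

def status_changes(rows):
    """Keep a row only when its status differs from the previous row's status."""
    kept, prev_status = [], None
    for row in sorted(rows, key=lambda r: r[1]):  # ISO date strings sort chronologically
        if row[2] != prev_status:  # the first row always passes, like prev_status IS NULL
            kept.append(row)
        prev_status = row[2]
    return kept

for row in status_changes(rows):
    print(row)
```

The second "In" row (ENTID 1923, 2013-06-01) is dropped, which agrees with the desired output in the question.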

calculate date difference between two dates in the same column

Hi, I have an input table in the following format:
ID1 ID2 date
1002 9648 2011-01-02
1003 9648 2011-06-06
1004 9648 2012-08-08
1005 9648 2016-01-06
1006 9648 2016-09-12
1007 9648 2018-01-22
1009 9744 2009-10-03
1010 9744 2012-01-10
1011 9744 2016-09-23
1012 9744 2017-10-25
1013 9923 2006-10-10
1014 10124 2017-10-11
1015 10124 2018-01-24
I am looking for an output table as shown below.
Could you please help me with a SQL query, or with how this can be achieved in Talend?
If the number of days between two dates exceeds 1096 (approximately 3 years), I want to treat it as zero and set the type to new.
ID1 ID2 date daysdifference type
1002 9648 2011-01-02 0 new
1003 9648 2011-06-06 156 old
1004 9648 2012-08-08 429 old
1005 9648 2016-01-06 0 new
1006 9648 2016-09-12 250 old
1007 9648 2018-01-22 497 old
1009 9744 2009-10-03 0 new
1010 9744 2012-01-10 829 old
1011 9744 2016-09-23 0 new
1012 9744 2017-10-25 397 old
1013 9923 2006-10-10 0 new
1014 10124 2017-10-11 0 new
1015 10124 2018-01-24 91 old
Thanks,
Ankush Reddy.
You can use the LAG function to get the desired result. LAG accesses data from the previous row, so you can calculate the difference between the previous date and the current date with the DATEDIFF function. Note, however, that LAG is only available starting with SQL Server 2012.
SELECT ID1, ID2, Date,
       DaysDifference = CASE WHEN PreviousDate IS NULL THEN 0 ELSE
                        CASE WHEN Date_Diff <= 1096 THEN Date_Diff ELSE 0 END END,
       [Type] = CASE WHEN PreviousDate IS NULL THEN 'NEW' ELSE
                CASE WHEN Date_Diff <= 1096 THEN 'OLD' ELSE 'NEW' END END
FROM
(
    SELECT *,
           LAG(date) OVER (PARTITION BY ID2 ORDER BY ID1) AS PreviousDate,
           -- NULL for the first row of each ID2, which the outer CASE treats as 'NEW'
           DATEDIFF(DAY, LAG(date) OVER (PARTITION BY ID2 ORDER BY ID1), date) AS Date_Diff
    FROM TableA
) a
However, if you are using a version older than 2012, you can still get the same result using ROW_NUMBER():
;WITH rows AS
(
SELECT *,
RN = ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID1)
FROM TableA
)
SELECT ID1, ID2, Date,
       DaysDifference = CASE WHEN PreviousDate IS NULL THEN 0 ELSE
                        CASE WHEN Date_Diff <= 1096 THEN Date_Diff ELSE 0 END END,
       [Type] = CASE WHEN PreviousDate IS NULL THEN 'NEW' ELSE
                CASE WHEN Date_Diff <= 1096 THEN 'OLD' ELSE 'NEW' END END
FROM
(
SELECT a.ID1, a.ID2, a.Date, b.Date as PreviousDate,
DATEDIFF(DAY, b.date, a.date) Date_Diff
FROM rows a
LEFT JOIN rows b
ON a.RN = b.RN + 1
AND a.ID2 = b.ID2
) a
ORDER BY ID1, ID2
Try this:
select [id1], [id2], [date],
       -- a NULL difference marks the first row of an id2 group, which counts as 'new'
       case when [daysdifference] is null or [daysdifference] > 1096 then 0 else [daysdifference] end [daysdifference],
       case when [daysdifference] is null or [daysdifference] > 1096 then 'new' else 'old' end [type]
from (
    select *, datediff(day, lag([date]) over (partition by id2 order by id1), [date]) [daysdifference]
    from #x
) [a]
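The classification rule itself (first row per ID2 is new, a gap of more than 1096 days is new with a difference of 0, anything else is old) can be checked with a short Python sketch. The `classify` helper and its `limit` parameter are assumptions for this example, and only a handful of the sample rows are included.

```python
from datetime import date

# (ID1, ID2, date) for a few rows of the sample input
rows = [
    (1002, 9648, date(2011, 1, 2)),
    (1003, 9648, date(2011, 6, 6)),
    (1004, 9648, date(2012, 8, 8)),
    (1005, 9648, date(2016, 1, 6)),
    (1013, 9923, date(2006, 10, 10)),
]

def classify(rows, limit=1096):
    """Return (ID1, daysdifference, type) using the previous date within the same ID2."""
    prev_by_id2 = {}
    out = []
    for id1, id2, day in sorted(rows, key=lambda r: (r[1], r[0])):
        prev = prev_by_id2.get(id2)  # plays the role of LAG(date) partitioned by ID2
        diff = (day - prev).days if prev is not None else None
        if diff is None or diff > limit:
            out.append((id1, 0, "new"))  # first row of the group or gap above the limit
        else:
            out.append((id1, diff, "old"))
        prev_by_id2[id2] = day
    return out

for row in classify(rows):
    print(row)
```

Row 1004 comes out as 429 days and old, while 1005 resets to 0 and new because its gap exceeds the limit.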