Finding most recent startdate, and endDate from consecutive dates - sql

I have a table like below:
user_id  store_id  stock  date
116      2         0      2021-10-18
116      2         0      2021-10-19
116      2         0      2021-10-20
116      2         0      2021-08-16
116      2         0      2021-08-15
116      2         0      2021-07-04
116      2         0      2021-07-03
389      2         0      2021-07-02
389      2         0      2021-07-01
389      2         0      2021-10-27
52       6         0      2021-10-28
52       6         0      2021-10-29
52       6         0      2021-10-30
116      38        0      2021-05-02
116      38        0      2021-05-03
116      38        0      2021-05-04
116      38        0      2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_id  store_id  startDate   endDate
116      2         2021-10-18  2021-10-20
116      38        2021-05-02  2021-05-04
389      2         2021-07-01  2021-07-02
52       6         2021-10-28  2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?

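Here is how you can do it. Subtracting row_number() days from each date gives the same value (grp) for every date in a run of consecutive days, so the highest grp per user and store identifies the most recent run:
select user_id, store_id, min(date) as startdate, max(date) as enddate
from (
    select *, rank() over (partition by user_id, store_id order by grp desc) as rn
    from (
        select *,
               date - row_number() over (partition by user_id, store_id order by date) * interval '1 day' as grp
        from tablename
    ) t
) t
where rn = 1
group by user_id, store_id, grp;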
db<>fiddle here
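To make the grouping trick easier to follow, here is a minimal Python sketch of the same idea outside the database. It is only an illustration, not part of the answer: the function name latest_streaks is hypothetical, the sample rows are a subset of the question's table, and it assumes there are no duplicate dates per user and store.

```python
from datetime import date, timedelta
from itertools import groupby

# (user_id, store_id, date): a subset of the rows from the question
rows = [
    (116, 2, date(2021, 10, 18)),
    (116, 2, date(2021, 10, 19)),
    (116, 2, date(2021, 10, 20)),
    (116, 2, date(2021, 8, 16)),
    (116, 2, date(2021, 8, 15)),
    (116, 38, date(2021, 5, 2)),
    (116, 38, date(2021, 5, 3)),
    (116, 38, date(2021, 5, 4)),
    (116, 38, date(2021, 4, 6)),
]

def latest_streaks(rows):
    """Map (user_id, store_id) to (start, end) of the most recent consecutive run."""
    result = {}
    # Sorting the tuples orders by user, store, then date (like PARTITION BY ... ORDER BY date)
    for key, group in groupby(sorted(rows), key=lambda r: (r[0], r[1])):
        islands = {}
        for position, row in enumerate(group, start=1):
            # date minus its 1-based position is constant within a consecutive run
            anchor = row[2] - timedelta(days=position)
            islands.setdefault(anchor, []).append(row[2])
        latest = islands[max(islands)]  # the highest anchor belongs to the latest run
        result[key] = (min(latest), max(latest))
    return result

streaks = latest_streaks(rows)
for key, (start, end) in streaks.items():
    print(key, start, end)
```

The anchor value plays the role of grp in the query, and picking max(islands) corresponds to keeping rn = 1.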

Related

Cumulative sum by month with missing months

I need a cumulative sum of a quantity by month, but some months have no quantity, and SQL does not return rows for those months.
I have tried several other solutions I found here, but none of them worked, or at least I couldn't get them working. Currently, my code is as follows:
SELECT DISTINCT
     A.FromDate
    ,A.ToDate
    ,A.OperationType
    ,A.[ItemCode]
    ,SUM(A.[Quantity]) OVER (PARTITION BY [ItemCode],OperationType,YEAR ORDER BY MONTH) [Quantity]
FROM (
    SELECT
         CONVERT(DATE,DATEADD(yy, DATEDIFF(yy, 0, T.OrderDate), 0)) AS FromDate
        ,EOMONTH(T.OrderDate) ToDate
        ,DATEPART(MONTH, t.OrderDate) AS [Month]
        ,DATEPART(YEAR, t.OrderDate) AS [Year]
        ,SUM(T.[Quantity]) [Quantity]
        ,OperationType
        ,[ItemCode]
    FROM TEST T
    WHERE [ItemCode] != ''
    GROUP BY T.OrderDate,[ItemCode],OperationType
) A
With these results:
FromDate    ToDate      OType  ItemCode  Quantity
2021-01-01  2021-01-31  Type1  1         19
2021-01-01  2021-02-28  Type1  1         96
2021-01-01  2021-03-31  Type1  1         116
2021-01-01  2021-04-30  Type1  1         138
2021-01-01  2021-06-30  Type1  1         178
2021-01-01  2021-07-31  Type1  1         203
2021-01-01  2021-08-31  Type1  1         228
2021-01-01  2021-09-30  Type1  1         253
2021-01-01  2021-11-30  Type1  1         330
2021-01-01  2021-12-31  Type1  1         364
2022-01-01  2022-02-28  Type1  1         18
2022-01-01  2022-03-31  Type1  1         42
2022-01-01  2022-04-30  Type1  1         53
And I was expecting these results:
FromDate    ToDate      OType  ItemCode  Quantity
2021-01-01  2021-01-31  Type1  1         19
2021-01-01  2021-02-28  Type1  1         96
2021-01-01  2021-03-31  Type1  1         116
2021-01-01  2021-04-30  Type1  1         138
2021-01-01  2021-05-31  Type1  1         138
2021-01-01  2021-06-30  Type1  1         178
2021-01-01  2021-07-31  Type1  1         203
2021-01-01  2021-08-31  Type1  1         228
2021-01-01  2021-09-30  Type1  1         253
2021-01-01  2021-10-31  Type1  1         253
2021-01-01  2021-11-30  Type1  1         330
2021-01-01  2021-12-31  Type1  1         364
2022-01-01  2022-02-28  Type1  1         18
2022-01-01  2022-03-31  Type1  1         42
2022-01-01  2022-04-30  Type1  1         53
SQL Fiddle link: http://www.sqlfiddle.com/#!18/04a997/1
I would really appreciate some help. Thank you
Here is one way:
WITH m(Earliest,Latest) AS
(
  SELECT DATEADD(DAY,1,MIN(EOMONTH(OrderDate,-1))),
         MAX(EOMONTH(OrderDate)) FROM dbo.TEST
), TypeCodes AS
(
  SELECT DISTINCT ItemCode, OperationType
  FROM dbo.TEST
), Months AS
(
  SELECT Month = DATEADD(MONTH, ROW_NUMBER()
           OVER (ORDER BY @@SPID)-1, Earliest)
  FROM m CROSS APPLY STRING_SPLIT(REPLICATE(',',
           DATEDIFF(MONTH,Earliest,Latest)),',')
), raw AS
(
  SELECT m.Month, i.OperationType, i.ItemCode,
         Q = COALESCE(SUM(Quantity),0)
  FROM Months AS m
  CROSS JOIN TypeCodes AS i
  LEFT OUTER JOIN dbo.TEST AS t
    ON t.OrderDate >= m.Month
   AND t.OrderDate < DATEADD(MONTH, 1, m.Month)
   AND i.ItemCode = t.ItemCode
   AND i.OperationType = t.OperationType
  GROUP BY m.Month, i.OperationType, i.ItemCode
)
SELECT FromDate = Month,
       ToDate = EOMONTH(Month),
       OperationType,
       ItemCode,
       Quantity = SUM(Q) OVER (ORDER BY Month)
FROM raw;
Working example in this fiddle.
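The core of the approach is to build a complete month calendar first, zero-fill the months that have no rows, and only then accumulate. A rough Python sketch of that idea follows. It is an illustration rather than the answer's method: month_range and running_totals are hypothetical names, and the monthly quantities are the ones implied by the cumulative figures in the question (May has no entry).

```python
from datetime import date
from itertools import accumulate

# Monthly quantities keyed by the first day of the month; May 2021 is missing.
monthly = {
    date(2021, 1, 1): 19,
    date(2021, 2, 1): 77,
    date(2021, 3, 1): 20,
    date(2021, 4, 1): 22,
    date(2021, 6, 1): 40,
}

def month_range(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def running_totals(monthly):
    """Zero-fill the missing months, then return (month, cumulative quantity) pairs."""
    months = list(month_range(min(monthly), max(monthly)))
    filled = [monthly.get(m, 0) for m in months]  # plays the role of COALESCE(SUM(...), 0)
    return list(zip(months, accumulate(filled)))

totals = running_totals(monthly)
for month, quantity in totals:
    print(month, quantity)
```

The missing month shows up with the previous cumulative value, which is the behaviour the question asks for. This sketch accumulates over the whole range, like SUM(Q) OVER (ORDER BY Month), and does not restart per year.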
If you can't use STRING_SPLIT() because your database is stuck on an older compatibility level, you could put this function in a database that isn't:
USE ModernDatabase;
GO
CREATE FUNCTION dbo.StringSplit(@list nvarchar(max), @delim nchar(1))
RETURNS TABLE
AS
RETURN (SELECT value FROM STRING_SPLIT(@list, @delim));
Then you change:
FROM m CROSS APPLY STRING_SPLIT(...
To:
FROM m CROSS APPLY ModernDatabase.dbo.StringSplit(...

Selecting first element in Group by object Postgres

I have the following table and I want to get, per Loan_ID and per month, the specific Amount that corresponds to the earliest observation with a dpd greater than or equal to 10.
Loan_ID date dpd Amount
1 1/1/2017 1 55
1 1/2/2017 2 100
1 1/3/2017 3 5000
1 1/4/2017 5 6000
1 1/5/2017 10 50000
1 1/6/2017 15 50001
1 1/9/2017 31 50004
1 1/10/2017 55 50005
1 1/11/2017 59 50006
1 1/12/2017 65 50007
1 1/13/2017 70 80000
1 1/20/2017 85 900000
1 1/29/2017 92 100000
1 1/30/2017 93 10000
2 1/1/2017 0 522
2 1/2/2017 8 5444
2 1/3/2017 12 8784
2 1/6/2017 15 6221
2 1/12/2017 18 2220
2 1/13/2017 20 177
2 1/29/2017 35 5151
2 1/30/2017 60 40000
2 1/31/2017 61 5500
The expected output:
Loan_ID Month Amount
1 1 50000
2 1 8784
SELECT DISTINCT ON ("Loan_ID", date_trunc('month', "date"))
    "Loan_ID",
    date_trunc('month', "date")::date as month,
    "Amount"
FROM
    loans
WHERE
    dpd >= 10
ORDER BY
    "Loan_ID",
    date_trunc('month', "date"),
    "date"
;
Returns:
Loan_ID  month       Amount
1        2017-01-01  50000
2        2017-01-01  8784
You can find test case in db<>fiddle
Hmmm... if you want the amount per month and the first date that matches the condition, then you want conditional aggregation:
select loan_id, date_trunc('month', date) as mon,
       sum(dpd),
       min(case when dpd >= 10 then dpd end) as first_dpd_10
from t
group by loan_id, mon;
Edit: Based on your comment, you can use distinct on:
select distinct on (loan_id, date_trunc('month', date)) t.*
from t
where dpd >= 10
order by loan_id, date_trunc('month', date), date;
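For readers less familiar with DISTINCT ON, the following Python sketch mimics what it does here: filter, sort by loan and date, and keep the first row seen for each loan and month. It is only an illustration. The function name first_delinquent_amount is hypothetical, and the rows are a subset of the question's data.

```python
from datetime import date

# (loan_id, date, dpd, amount): a subset of the rows from the question
loans = [
    (1, date(2017, 1, 4), 5, 6000),
    (1, date(2017, 1, 5), 10, 50000),
    (1, date(2017, 1, 6), 15, 50001),
    (2, date(2017, 1, 2), 8, 5444),
    (2, date(2017, 1, 3), 12, 8784),
    (2, date(2017, 1, 6), 15, 6221),
]

def first_delinquent_amount(loans, threshold=10):
    """Map (loan_id, first day of month) to the amount of the earliest row with dpd >= threshold."""
    first = {}
    for loan_id, when, dpd, amount in sorted(loans, key=lambda r: (r[0], r[1])):
        if dpd < threshold:  # the WHERE dpd >= 10 filter
            continue
        key = (loan_id, when.replace(day=1))  # like date_trunc('month', date)
        first.setdefault(key, amount)  # keep only the first row per key
    return first

result = first_delinquent_amount(loans)
for (loan_id, month), amount in result.items():
    print(loan_id, month, amount)
```

Because the rows are visited in date order, setdefault keeps the earliest qualifying row per group, which is what the ORDER BY clause guarantees for DISTINCT ON.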

Get the max month from a query that returns several years

I have a table with one date per month (some months are missing, which is expected), and the data spans several years. For each calendar month I need only the latest row. So if I have data for months 8, 7, 6, etc. in 2020, the query should return those StartDate values, and for months 10, 11, and 12 it should return the StartDate from 2019, or from whichever earlier year is the latest one that has that month. id and courseLength are part of the table but irrelevant for this task. StartDate is of type date.
This is the top 15 rows of the table
id StartDate courseLength
153 2020-08-31 63
153 2020-07-31 35
153 2020-06-30 60
153 2020-05-31 17
153 2020-03-31 51
153 2020-01-31 59
153 2019-12-31 30
153 2019-10-31 51
153 2019-08-31 59
153 2019-06-30 54
153 2019-05-31 17
153 2019-03-31 56
153 2019-01-31 55
153 2018-12-31 27
153 2018-10-31 54
And this is what I am expecting
id StartDate courseLength
153 2020-08-31 63
153 2020-07-31 35
153 2020-06-30 60
153 2020-05-31 17
153 2020-03-31 51
153 2020-01-31 59
153 2019-12-31 30
153 2019-10-31 51
153 2018-11-30 65
153 2018-09-31 53
153 2019-05-31 17
153 2018-04-30 13
You can use window functions:
select *
from (
    select t.*,
           row_number() over(partition by id, month(startdate) order by startdate desc) rn
    from mytable t
) t
where rn = 1
Try this:
SELECT
    R.id, R.StartDate, R.courseLength
FROM (
    SELECT
        id, StartDate, courseLength,
        RANK() OVER(PARTITION BY MONTH(StartDate) ORDER BY StartDate DESC) as rank
    FROM
        #t
) R
WHERE
    R.rank = 1
Or you can join the table to the latest date per id and month:
select t.*
from mytable t
join (
    select id, max(StartDate) as maxdate
    from mytable
    group by id, month(StartDate)
) m
  on m.id = t.id
 and m.maxdate = t.StartDate
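All three queries above do the same thing: group by id and month number, ignoring the year, and keep the row with the greatest StartDate. A small Python sketch of that logic, using a few rows from the question, may make this clearer. The function name latest_per_month is hypothetical and the code is only an illustration.

```python
from datetime import date

# (id, StartDate, courseLength): a few rows from the question
rows = [
    (153, date(2020, 8, 31), 63),
    (153, date(2020, 1, 31), 59),
    (153, date(2019, 12, 31), 30),
    (153, date(2019, 8, 31), 59),
    (153, date(2019, 1, 31), 55),
    (153, date(2018, 12, 31), 27),
]

def latest_per_month(rows):
    """Keep, for each (id, month number), the row with the greatest StartDate."""
    best = {}
    for row in rows:
        key = (row[0], row[1].month)  # like PARTITION BY id, MONTH(StartDate)
        if key not in best or row[1] > best[key][1]:
            best[key] = row
    # Newest first, as in the question's listing
    return sorted(best.values(), key=lambda r: r[1], reverse=True)

latest = latest_per_month(rows)
for row in latest:
    print(row)
```

Only one row survives for each of months 8, 1, and 12, and each comes from the most recent year that has that month.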

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have tried using lag(), but I don't yet understand how to use it properly.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
       case when running_amount = 0 then
         lead(date_) over (partition by id order by date_) - date_
       end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
  select id, date_, amount, running_amount,
         case when running_amount = 0 then
           lead(date_) over (partition by id order by date_) - date_
         end as diff
  from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
db<>fiddle
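As a cross-check of the figures above, here is a small Python sketch of the same lead-and-median logic. It is an illustration only: median_zero_gap is a hypothetical name, the rows are ID 10 in full plus the last four rows of ID 20 from the question, and gaps are expressed in fractional days as in the Oracle output.

```python
from datetime import datetime
from statistics import median

# (id, timestamp, running_amount) taken from the question's table
rows = [
    (10, datetime(2019, 6, 27, 14, 30), 100),
    (10, datetime(2019, 6, 29, 15, 26), 0),
    (10, datetime(2019, 7, 3, 1, 56), 83),
    (10, datetime(2019, 7, 4, 17, 53), 98),
    (10, datetime(2019, 7, 5, 15, 9), 0),
    (10, datetime(2019, 7, 5, 15, 53), 98.98),
    (10, datetime(2019, 7, 5, 19, 54), 0),
    (10, datetime(2019, 7, 7, 1, 36), 90.97),
    (10, datetime(2019, 7, 7, 13, 2), 0),
    (10, datetime(2019, 7, 7, 16, 32), 39.88),
    (10, datetime(2019, 7, 8, 13, 41), 89.88),
    (20, datetime(2019, 1, 9, 23, 25), 861.88),
    (20, datetime(2019, 1, 10, 16, 18), 0),
    (20, datetime(2019, 1, 12, 16, 46), 894.49),
    (20, datetime(2019, 1, 25, 5, 40), 23.44),
]

def median_zero_gap(rows):
    """Median number of days between a zero balance and the next transaction, per id."""
    by_id = {}
    for cid, ts, running in sorted(rows, key=lambda r: (r[0], r[1])):
        by_id.setdefault(cid, []).append((ts, running))
    result = {}
    for cid, events in by_id.items():
        # Pair each row with the next one for the same id, like lead() partitioned by id
        gaps = [
            (next_ts - ts).total_seconds() / 86400
            for (ts, running), (next_ts, _) in zip(events, events[1:])
            if running == 0
        ]
        if gaps:
            result[cid] = median(gaps)
    return result

medians = median_zero_gap(rows)
for cid, value in medians.items():
    print(cid, value, round(value))
```

A zero balance on the last row of an id has no following transaction and is skipped here, which matches lead() returning null for that row.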

Grouping into series based on days since

I need to create a new grouping every time I have a period of more than 60 days since my previous record.
Basically, I need to take the data I have here:
RowNo StartDate StopDate DaysBetween
1 3/21/2017 3/21/2017 14
2 4/4/2017 4/4/2017 14
3 4/18/2017 4/18/2017 14
4 6/23/2017 6/23/2017 66
5 7/5/2017 7/5/2017 12
6 7/19/2017 7/19/2017 14
7 9/27/2017 9/27/2017 70
8 10/24/2017 10/24/2017 27
9 10/31/2017 10/31/2017 7
10 11/14/2017 11/14/2017 14
And turn it into this:
RowNo StartDate StopDate DaysBetween Series
1 3/21/2017 3/21/2017 14 1
2 4/4/2017 4/4/2017 14 1
3 4/18/2017 4/18/2017 14 1
4 6/23/2017 6/23/2017 66 2
5 7/5/2017 7/5/2017 12 2
6 7/19/2017 7/19/2017 14 2
7 9/27/2017 9/27/2017 70 3
8 10/24/2017 10/24/2017 27 3
9 10/31/2017 10/31/2017 7 3
10 11/14/2017 11/14/2017 14 3
Once I have that I'll group by Series and get the min(StartDate) and max(StopDate) for individual durations.
I could do this using a cursor but I'm sure someone much smarter than me has figured out a more elegant solution. Thanks in advance!
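You can use the window function sum() over() with a conditional flag: each row with more than 60 days since the previous record adds 1 to a running total, and that running total is the series number.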
Example
Select *
,Series= 1+sum(case when [DaysBetween]>60 then 1 else 0 end) over (Order by RowNo)
From YourTable
Returns
RowNo StartDate StopDate DaysBetween Series
1 2017-03-21 2017-03-21 14 1
2 2017-04-04 2017-04-04 14 1
3 2017-04-18 2017-04-18 14 1
4 2017-06-23 2017-06-23 66 2
5 2017-07-05 2017-07-05 12 2
6 2017-07-19 2017-07-19 14 2
7 2017-09-27 2017-09-27 70 3
8 2017-10-24 2017-10-24 27 3
9 2017-10-31 2017-10-31 7 3
10 2017-11-14 2017-11-14 14 3
EDIT - 2008 Version
Select A.*
,B.*
From YourTable A
Cross Apply (
Select Series=1+sum( case when [DaysBetween]>60 then 1 else 0 end)
From YourTable
Where RowNo <= A.RowNo
) B
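The running-flag idea is not specific to SQL. As a rough illustration, the Python sketch below computes the same Series column from the DaysBetween values in the question. The function name assign_series and the gap parameter are hypothetical.

```python
from itertools import accumulate

# DaysBetween values for rows 1 to 10 of the question's table
days_between = [14, 14, 14, 66, 12, 14, 70, 27, 7, 14]

def assign_series(days_between, gap=60):
    """Start a new series whenever more than `gap` days separate a row from the previous one."""
    flags = (1 if d > gap else 0 for d in days_between)  # the CASE expression
    return [1 + total for total in accumulate(flags)]    # the running SUM() OVER (ORDER BY RowNo)

series = assign_series(days_between)
print(series)
```

Each value above the threshold bumps the running total by one, so rows 4 and 7 open series 2 and 3.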