Cumulative sum by month with missing months - SQL

I need a cumulative sum of a quantity by month, but some months have no quantity at all and SQL simply doesn't return rows for them.
I have tried several other solutions I found here, but none of them worked, or at least I couldn't get them working. Currently, my code is as follows:
SELECT DISTINCT
     A.FromDate
    ,A.ToDate
    ,A.OperationType
    ,A.[ItemCode]
    ,SUM(A.[Quantity]) OVER (PARTITION BY [ItemCode], OperationType, [Year] ORDER BY [Month]) AS [Quantity]
FROM (
    SELECT
         CONVERT(DATE, DATEADD(yy, DATEDIFF(yy, 0, T.OrderDate), 0)) AS FromDate
        ,EOMONTH(T.OrderDate) AS ToDate
        ,DATEPART(MONTH, T.OrderDate) AS [Month]
        ,DATEPART(YEAR, T.OrderDate) AS [Year]
        ,SUM(T.[Quantity]) AS [Quantity]
        ,OperationType
        ,[ItemCode]
    FROM TEST T
    WHERE [ItemCode] != ''
    GROUP BY T.OrderDate, [ItemCode], OperationType
) A
With these results:
FromDate    ToDate      OType  ItemCode  Quantity
2021-01-01  2021-01-31  Type1  1         19
2021-01-01  2021-02-28  Type1  1         96
2021-01-01  2021-03-31  Type1  1         116
2021-01-01  2021-04-30  Type1  1         138
2021-01-01  2021-06-30  Type1  1         178
2021-01-01  2021-07-31  Type1  1         203
2021-01-01  2021-08-31  Type1  1         228
2021-01-01  2021-09-30  Type1  1         253
2021-01-01  2021-11-30  Type1  1         330
2021-01-01  2021-12-31  Type1  1         364
2022-01-01  2022-02-28  Type1  1         18
2022-01-01  2022-03-31  Type1  1         42
2022-01-01  2022-04-30  Type1  1         53
And I was expecting these results:
FromDate    ToDate      OType  ItemCode  Quantity
2021-01-01  2021-01-31  Type1  1         19
2021-01-01  2021-02-28  Type1  1         96
2021-01-01  2021-03-31  Type1  1         116
2021-01-01  2021-04-30  Type1  1         138
2021-01-01  2021-05-31  Type1  1         138
2021-01-01  2021-06-30  Type1  1         178
2021-01-01  2021-07-31  Type1  1         203
2021-01-01  2021-08-31  Type1  1         228
2021-01-01  2021-09-30  Type1  1         253
2021-01-01  2021-10-31  Type1  1         253
2021-01-01  2021-11-30  Type1  1         330
2021-01-01  2021-12-31  Type1  1         364
2022-01-01  2022-02-28  Type1  1         18
2022-01-01  2022-03-31  Type1  1         42
2022-01-01  2022-04-30  Type1  1         53
SQL Fiddle link: http://www.sqlfiddle.com/#!18/04a997/1
I would really appreciate some help. Thank you.

Here is one way:
WITH m(Earliest, Latest) AS
(
  SELECT DATEADD(DAY, 1, MIN(EOMONTH(OrderDate, -1))),
         MAX(EOMONTH(OrderDate))
  FROM dbo.TEST
), TypeCodes AS
(
  SELECT DISTINCT ItemCode, OperationType
  FROM dbo.TEST
), Months AS
(
  SELECT Month = DATEADD(MONTH, ROW_NUMBER()
                 OVER (ORDER BY @@SPID) - 1, Earliest)
  FROM m CROSS APPLY STRING_SPLIT(REPLICATE(',',
       DATEDIFF(MONTH, Earliest, Latest)), ',')
), raw AS
(
  SELECT m.Month, i.OperationType, i.ItemCode,
         Q = COALESCE(SUM(Quantity), 0)
  FROM Months AS m
  CROSS JOIN TypeCodes AS i
  LEFT OUTER JOIN dbo.TEST AS t
    ON t.OrderDate >= m.Month
   AND t.OrderDate < DATEADD(MONTH, 1, m.Month)
   AND i.ItemCode = t.ItemCode
   AND i.OperationType = t.OperationType
  GROUP BY m.Month, i.OperationType, i.ItemCode
)
SELECT FromDate = Month,
       ToDate = EOMONTH(Month),
       OperationType,
       ItemCode,
       Quantity = SUM(Q) OVER (PARTITION BY OperationType, ItemCode, YEAR(Month) ORDER BY Month)
FROM raw;
Working example in this fiddle.
If you can't use STRING_SPLIT() because your database is stuck on an older compatibility level, you could put this function in a database that isn't:
USE ModernDatabase;
GO
CREATE FUNCTION dbo.StringSplit(@list nvarchar(max), @delim nchar(1))
RETURNS TABLE
AS
RETURN (SELECT value FROM STRING_SPLIT(@list, @delim));
Then you change:
FROM m CROSS APPLY STRING_SPLIT(...
To:
FROM m CROSS APPLY ModernDatabase.dbo.StringSplit(...
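Alternatively, a recursive CTE can build the same month series without STRING_SPLIT() at all. This is not part of the original answer, just a minimal sketch of a drop-in replacement for the m/Months CTEs above; the extra Latest column only carries the stop condition, and OPTION (MAXRECURSION 0) lifts the default 100-level recursion limit:
-- Sketch: one row per month between the earliest and latest OrderDate,
-- generated recursively instead of via STRING_SPLIT()/REPLICATE().
WITH Months AS
(
    SELECT Month  = DATEADD(DAY, 1, MIN(EOMONTH(OrderDate, -1))),
           Latest = MAX(EOMONTH(OrderDate))
    FROM dbo.TEST
    UNION ALL
    SELECT DATEADD(MONTH, 1, Month), Latest
    FROM Months
    WHERE DATEADD(MONTH, 1, Month) <= Latest
)
SELECT Month
FROM Months
OPTION (MAXRECURSION 0); -- allow more than 100 months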

Related

Finding most recent startDate and endDate from consecutive dates

I have a table like below:
user_id  store_id  stock  date
116      2         0      2021-10-18
116      2         0      2021-10-19
116      2         0      2021-10-20
116      2         0      2021-08-16
116      2         0      2021-08-15
116      2         0      2021-07-04
116      2         0      2021-07-03
389      2         0      2021-07-02
389      2         0      2021-07-01
389      2         0      2021-10-27
52       6         0      2021-10-28
52       6         0      2021-10-29
52       6         0      2021-10-30
116      38        0      2021-05-02
116      38        0      2021-05-03
116      38        0      2021-05-04
116      38        0      2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_id  store_id  startDate   endDate
116      2         2021-10-18  2021-10-20
116      38        2021-05-02  2021-05-04
389      2         2021-07-01  2021-07-02
52       6         2021-10-28  2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?
Here is how you can do it:
select user_id, store_id, min(date) as startdate, max(date) as enddate
from (
  select *, rank() over (partition by user_id, store_id order by grp desc) as rn
  from (
    select *, date - row_number() over (partition by user_id, store_id order by date) * interval '1 day' as grp
    from tablename
  ) t
) t
where rn = 1
group by user_id, store_id, grp
db<>fiddle here
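To see why this works: subtracting row_number() days from the date produces the same value for every row in an unbroken run of consecutive dates, so grp acts as an island identifier. A tiny self-contained illustration (the dates here are made up, not from the question's table):
-- Consecutive dates get the same grp, because the date and the row number
-- both advance by one per row; a gap shifts grp to a new value.
select d,
       d - row_number() over (order by d) * interval '1 day' as grp
from (values (date '2021-10-18'),
             (date '2021-10-19'),
             (date '2021-10-20'),
             (date '2021-10-27')) as t(d);
-- 2021-10-18 | 2021-10-17 00:00:00
-- 2021-10-19 | 2021-10-17 00:00:00
-- 2021-10-20 | 2021-10-17 00:00:00
-- 2021-10-27 | 2021-10-23 00:00:00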

Get all rows from one table stream and the row before in time from another table

Suppose I have one table (table_1) and one table stream (stream_1) that captures changes made to table_1, in my case only inserts of new rows. Once I have acted on these changes, the rows are removed from stream_1 but remain in table_1.
From that I would like to calculate delta values for var1 (var1 - lag(var1) as delta_var1) partitioned per customer, and just leave var2 as it is. So the data in table_1 could look something like this:
timemessage          customerid  var1  var2
2021-04-01 06:00:00  1           10    5
2021-04-01 07:00:00  2           100   7
2021-04-01 08:00:00  1           20    10
2021-04-01 09:00:00  1           40    3
2021-04-01 15:00:00  2           150   5
2021-04-01 23:00:00  1           50    6
2021-04-02 06:00:00  2           180   2
2021-04-02 07:00:00  1           55    9
2021-04-02 08:00:00  2           200   4
And the data in stream_1 that I want to act on could look like this:
timemessage          customerid  var1  var2
2021-04-01 23:00:00  1           50    6
2021-04-02 06:00:00  2           180   2
2021-04-02 07:00:00  1           55    9
2021-04-02 08:00:00  2           200   4
But to be able to calculate delta_var1 for all customers I would need the previous row in time for each customer before the ones in stream_1.
For example: To be able to calculate how much var1 has increased for customerid = 1 between 2021-04-01 09:00:00 and 2021-04-01 23:00:00 I want to include the 2021-04-01 09:00:00 row for customerid = 1 in my output.
So I would like to create a select containing all rows in stream_1 plus the previous row in time for each customerid from table_1. Given the table_1 and stream_1 above, the wanted output is the following:
timemessage          customerid  var1  var2
2021-04-01 09:00:00  1           40    3
2021-04-01 15:00:00  2           150   5
2021-04-01 23:00:00  1           50    6
2021-04-02 06:00:00  2           180   2
2021-04-02 07:00:00  1           55    9
2021-04-02 08:00:00  2           200   4
So given you have the "last value per day" in your wanted output, you want a QUALIFY to keep only the wanted rows, using ROW_NUMBER partitioned by customerid and timemessage. Assuming the accumulator is positive only, you can order by accumulatedvalue thus:
WITH data(timemessage, customerid, accumulatedvalue) AS (
    SELECT * FROM VALUES
         ('2021-04-01', 1, 10)
        ,('2021-04-01', 2, 100)
        ,('2021-04-02', 1, 20)
        ,('2021-04-03', 1, 40)
        ,('2021-04-03', 2, 150)
        ,('2021-04-04', 1, 50)
        ,('2021-04-04', 2, 180)
        ,('2021-04-05', 1, 55)
        ,('2021-04-05', 2, 200)
)
SELECT * FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY customerid, timemessage ORDER BY accumulatedvalue DESC) = 1
ORDER BY 1, 2;
gives:
TIMEMESSAGE CUSTOMERID ACCUMULATEDVALUE
2021-04-01 1 10
2021-04-01 2 100
2021-04-02 1 20
2021-04-03 1 40
2021-04-03 2 150
2021-04-04 1 50
2021-04-04 2 180
2021-04-05 1 55
2021-04-05 2 200
If you can trust your data and the data in table2 starts right after the data in table1, then you can just get the last record for each customer from table1 and union it with table2:
select * from table1
qualify row_number() over (partition by customerid order by timemessage desc) = 1
union all
select * from table2
If not:
select a.* from table1 a
join table2 b
  on a.customerid = b.customerid
 and a.timemessage < b.timemessage
qualify row_number() over (partition by a.customerid order by a.timemessage desc) = 1
union all
select * from table2
You can also add a condition to avoid looking at data more than 1 day back (or 1 hour, or whatever interval is safe) for better performance.
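From there, the delta_var1 values the question is ultimately after could be computed with LAG over the combined result. A rough sketch only (building on the second query above, with the table and column names used in the question); the anchor rows pulled from table1 will naturally have a NULL delta:
-- Sketch: previous row per customer from table1, plus everything in table2,
-- then delta_var1 via LAG partitioned by customer.
with combined as (
    select a.* from table1 a
    join table2 b
      on a.customerid = b.customerid
     and a.timemessage < b.timemessage
    qualify row_number() over (partition by a.customerid order by a.timemessage desc) = 1
    union all
    select * from table2
)
select timemessage,
       customerid,
       var1 - lag(var1) over (partition by customerid order by timemessage) as delta_var1,
       var2
from combined
order by customerid, timemessage;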

How to do sum with where clause group by with date?

I want to sum with a WHERE clause, but I don't know how to show off_duration in the next column.
SELECT CAST([RecordDateTime] AS DATE) AS DATE
    ,SUM(CAST([units] AS FLOAT)) AS Units
    ,SUM(CAST(duration AS INT) / 60) AS on_duration
FROM [Energies]
WHERE duration_mode = 'ON'
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) DESC
Output:
Date        units  on_duration
-------------------------------
2020-01-17  3.53   758
2020-01-16  7.66   973
2020-01-15  15.12  1806
2020-01-13  10.4   500
Expected output:
date        units  on_duration  off_duration
-----------------------------------------------
2020-01-17  3.53   758          28
2020-01-16  7.66   973          9
2020-01-15  15.12  1806         96
2020-01-13  10.4   500          95
Sample data:
duration_mode duration RecordDateTime units
-------------------------------------------------------------
ON 187 2020-01-07 20:18:33.9744232 0.19
ON 187 2020-01-07 20:19:03.1554359 0.19
OFF 10 2020-01-07 20:22:13.5283932 0.00
ON 187 2020-01-07 20:24:39.0510166 0.19
I think that you are looking for conditional aggregation:
SELECT
CAST([RecordDateTime] AS DATE) as Date,
SUM(CAST([units] as float)) as Units,
SUM(CASE WHEN duration_mode = 'ON' THEN CAST(duration as int)/60 END) on_duration,
SUM(CASE WHEN duration_mode = 'OFF' THEN CAST(duration as int)/60 END) off_duration
FROM [Energies]
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) desc
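One caveat worth noting: SUM(CASE WHEN ... THEN ... END) returns NULL for a day that has no matching rows. If you would rather see 0, a small variant (not from the original answer) adds ELSE 0 to each conditional sum:
-- Variant returning 0 instead of NULL when a day has no OFF (or no ON) rows.
SELECT
    CAST([RecordDateTime] AS DATE) AS [Date],
    SUM(CAST([units] AS float)) AS Units,
    SUM(CASE WHEN duration_mode = 'ON'  THEN CAST(duration AS int) / 60 ELSE 0 END) AS on_duration,
    SUM(CASE WHEN duration_mode = 'OFF' THEN CAST(duration AS int) / 60 ELSE 0 END) AS off_duration
FROM [Energies]
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) DESC;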

PostgreSQL group by with interval but without window functions

This is a follow-up to my previous question:
PostgreSQL group by with interval
There was a very good answer, but unfortunately it does not work with PostgreSQL 8.0 - some clients still use this old version.
So I need to find another solution without using window functions.
Here is what I have as a table:
id quantity price1 price2 date
1 100 1 0 2018-01-01 10:00:00
2 200 1 0 2018-01-02 10:00:00
3 50 5 0 2018-01-02 11:00:00
4 100 1 1 2018-01-03 10:00:00
5 100 1 1 2018-01-03 11:00:00
6 300 1 0 2018-01-03 12:00:00
I need to sum "quantity" grouped by "price1" and "price2", but starting a new group every time they change.
So the end result should look like this:
quantity price1 price2 dateStart dateEnd
300 1 0 2018-01-01 10:00:00 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00 2018-01-02 11:00:00
200 1 1 2018-01-03 10:00:00 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00 2018-01-03 12:00:00
It is not efficient, but you can implement the same logic with subqueries:
select sum(quantity), price1, price2,
min(date) as dateStart, max(date) as dateend
from (select d.*,
(select count(*)
from data d2
where d2.date <= d.date
) as seqnum,
(select count(*)
from data d2
where d2.price1 = d.price1 and d2.price2 = d.price2 and d2.date <= d.date
) as seqnum_pp
from data d
) t
group by price1, price2, (seqnum - seqnum_pp)
order by dateStart
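Since every row runs two correlated COUNT(*) subqueries over the whole table, this gets slow quickly on larger tables. If it has to run repeatedly, indexes on the compared columns should help; a sketch only, with made-up index names and the table/column names taken from the query above (PostgreSQL 8.0 requires naming the index explicitly):
-- Hypothetical supporting indexes for the two correlated subqueries.
CREATE INDEX data_date_idx ON data (date);
CREATE INDEX data_price1_price2_date_idx ON data (price1, price2, date);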

Transposing SQLite rows and columns with average per hour

I have a table in SQLite called param_vals_breaches that looks like the following:
id param queue date_time param_val breach_count
1 c a 2013-01-01 00:00:00 188 7
2 c b 2013-01-01 00:00:00 156 8
3 c c 2013-01-01 00:00:00 100 2
4 d a 2013-01-01 00:00:00 657 0
5 d b 2013-01-01 00:00:00 23 6
6 d c 2013-01-01 00:00:00 230 12
7 c a 2013-01-01 01:00:00 100 0
8 c b 2013-01-01 01:00:00 143 9
9 c c 2013-01-01 01:00:00 12 2
10 d a 2013-01-01 01:00:00 0 1
11 d b 2013-01-01 01:00:00 29 5
12 d c 2013-01-01 01:00:00 22 14
13 c a 2013-01-01 02:00:00 188 7
14 c b 2013-01-01 02:00:00 156 8
15 c c 2013-01-01 02:00:00 100 2
16 d a 2013-01-01 02:00:00 657 0
17 d b 2013-01-01 02:00:00 23 6
18 d c 2013-01-01 02:00:00 230 12
I want to write a query that will show me a particular queue (e.g. "a") with the average param_val and breach_count for each param on an hour-by-hour basis, transposing the data to get something that looks like this:
Results for Queue A
Hour 0 Hour 0 Hour 1 Hour 1 Hour 2 Hour 2
param avg_param_val avg_breach_count avg_param_val avg_breach_count avg_param_val avg_breach_count
c xxx xxx xxx xxx xxx xxx
d xxx xxx xxx xxx xxx xxx
Is this possible? I'm not sure how to go about it. Thanks!
SQLite does not have a PIVOT function but you can use an aggregate function with a CASE expression to turn the rows into columns:
select param,
avg(case when time = '00' then param_val end) AvgHour0Val,
avg(case when time = '00' then breach_count end) AvgHour0Count,
avg(case when time = '01' then param_val end) AvgHour1Val,
avg(case when time = '01' then breach_count end) AvgHour1Count,
avg(case when time = '02' then param_val end) AvgHour2Val,
avg(case when time = '02' then breach_count end) AvgHour2Count
from
(
select param,
strftime('%H', date_time) time,
param_val,
breach_count
from param_vals_breaches
where queue = 'a'
) src
group by param;
See SQL Fiddle with Demo
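As a usage note, the averages can be rounded and the result limited to a particular day by filtering in the inner query; a small sketch of the first two pivot columns (the specific date value here is hypothetical):
-- Same pivot, rounded to 2 decimals and limited to a single day.
select param,
       round(avg(case when time = '00' then param_val end), 2) AvgHour0Val,
       round(avg(case when time = '00' then breach_count end), 2) AvgHour0Count
from
(
  select param,
         strftime('%H', date_time) time,
         param_val,
         breach_count
  from param_vals_breaches
  where queue = 'a'
    and date(date_time) = '2013-01-01'
) src
group by param;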