PostgreSQL group by with interval but without window functions

This is a follow-up to my previous question:
PostgreSQL group by with interval
There was a very good answer, but unfortunately it does not work with PostgreSQL 8.0 (window functions were only added in PostgreSQL 8.4), and some clients still use this old version.
So I need to find another solution that does not use window functions.
Here is what I have as a table:
id quantity price1 price2 date
1 100 1 0 2018-01-01 10:00:00
2 200 1 0 2018-01-02 10:00:00
3 50 5 0 2018-01-02 11:00:00
4 100 1 1 2018-01-03 10:00:00
5 100 1 1 2018-01-03 11:00:00
6 300 1 0 2018-01-03 12:00:00
I need to sum "quantity" grouped by "price1" and "price2", but starting a new group each time the (price1, price2) pair changes in date order.
So the end result should look like this:
quantity price1 price2 dateStart dateEnd
300 1 0 2018-01-01 10:00:00 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00 2018-01-02 11:00:00
200 1 1 2018-01-03 10:00:00 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00 2018-01-03 12:00:00

It is not efficient, but you can implement the same logic with correlated subqueries, using two COUNT(*) subqueries in place of ROW_NUMBER():
select sum(quantity), price1, price2,
       min(date) as dateStart, max(date) as dateEnd
from (select d.*,
             (select count(*)
              from data d2
              where d2.date <= d.date
             ) as seqnum,
             (select count(*)
              from data d2
              where d2.price1 = d.price1 and d2.price2 = d.price2 and d2.date <= d.date
             ) as seqnum_pp
      from data d
     ) t
group by price1, price2, (seqnum - seqnum_pp)
order by dateStart;
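To check the correlated-subquery version without a PostgreSQL 8.0 server at hand, the sketch below runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module (SQLite is only a stand-in here, not the questioner's engine) and loads the sample rows from the question. One assumption worth stating: the technique expects date values to be unique, because rows sharing a timestamp receive the same COUNT(*) and the numbering then no longer behaves like ROW_NUMBER().

```python
import sqlite3

# Sample rows from the question: (id, quantity, price1, price2, date).
ROWS = [
    (1, 100, 1, 0, "2018-01-01 10:00:00"),
    (2, 200, 1, 0, "2018-01-02 10:00:00"),
    (3, 50, 5, 0, "2018-01-02 11:00:00"),
    (4, 100, 1, 1, "2018-01-03 10:00:00"),
    (5, 100, 1, 1, "2018-01-03 11:00:00"),
    (6, 300, 1, 0, "2018-01-03 12:00:00"),
]

# Same query as above: two correlated COUNT(*) subqueries replace ROW_NUMBER().
QUERY = """
select sum(quantity), price1, price2,
       min(date) as dateStart, max(date) as dateEnd
from (select d.*,
             (select count(*) from data d2
              where d2.date <= d.date) as seqnum,
             (select count(*) from data d2
              where d2.price1 = d.price1 and d2.price2 = d.price2
                and d2.date <= d.date) as seqnum_pp
      from data d) t
group by price1, price2, (seqnum - seqnum_pp)
order by dateStart
"""


def collapse_runs(rows):
    """Return one row per run of identical (price1, price2), in date order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table data (id integer, quantity integer,"
        " price1 integer, price2 integer, date text)"
    )
    con.executemany("insert into data values (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in collapse_runs(ROWS):
    print(row)
```

The difference seqnum - seqnum_pp stays constant while the price pair is unchanged and jumps when another pair intervenes, which is why it works as the extra GROUP BY key.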

Related

Cumulative sum by month with missing months

I need a cumulative (running) sum of a quantity by month, but some months have no quantity and SQL does not return rows for those months.
I have tried several other solutions I found here, but none of them worked, or at least I couldn't get them working. My current code is as follows:
SELECT DISTINCT
    A.FromDate
   ,A.ToDate
   ,A.OperationType
   ,A.[ItemCode]
   ,SUM(A.[Quantity]) OVER (PARTITION BY [ItemCode],OperationType,YEAR ORDER BY MONTH) [Quantity]
FROM (
    SELECT
        CONVERT(DATE,DATEADD(yy, DATEDIFF(yy, 0, T.OrderDate), 0)) AS FromDate
       ,EOMONTH(T.OrderDate) ToDate
       ,DATEPART(MONTH, t.OrderDate) AS [Month]
       ,DATEPART(YEAR, t.OrderDate) AS [Year]
       ,SUM(T.[Quantity]) [Quantity]
       ,OperationType
       ,[ItemCode]
    FROM TEST T
    WHERE [ItemCode] != ''
    GROUP BY T.OrderDate,[ItemCode],OperationType
) A
With these results:
FromDate ToDate OType ItemCode Quantity
2021-01-01 2021-01-31 Type1 1 19
2021-01-01 2021-02-28 Type1 1 96
2021-01-01 2021-03-31 Type1 1 116
2021-01-01 2021-04-30 Type1 1 138
2021-01-01 2021-06-30 Type1 1 178
2021-01-01 2021-07-31 Type1 1 203
2021-01-01 2021-08-31 Type1 1 228
2021-01-01 2021-09-30 Type1 1 253
2021-01-01 2021-11-30 Type1 1 330
2021-01-01 2021-12-31 Type1 1 364
2022-01-01 2022-02-28 Type1 1 18
2022-01-01 2022-03-31 Type1 1 42
2022-01-01 2022-04-30 Type1 1 53
And I was expecting these results:
FromDate ToDate OType ItemCode Quantity
2021-01-01 2021-01-31 Type1 1 19
2021-01-01 2021-02-28 Type1 1 96
2021-01-01 2021-03-31 Type1 1 116
2021-01-01 2021-04-30 Type1 1 138
2021-01-01 2021-05-31 Type1 1 138
2021-01-01 2021-06-30 Type1 1 178
2021-01-01 2021-07-31 Type1 1 203
2021-01-01 2021-08-31 Type1 1 228
2021-01-01 2021-09-30 Type1 1 253
2021-01-01 2021-10-31 Type1 1 253
2021-01-01 2021-11-30 Type1 1 330
2021-01-01 2021-12-31 Type1 1 364
2022-01-01 2022-02-28 Type1 1 18
2022-01-01 2022-03-31 Type1 1 42
2022-01-01 2022-04-30 Type1 1 53
SQL Fiddle link: http://www.sqlfiddle.com/#!18/04a997/1
I would really appreciate some help. Thank you.
Here is one way:
WITH m(Earliest,Latest) AS
(
  SELECT DATEADD(DAY,1,MIN(EOMONTH(OrderDate,-1))),
         MAX(EOMONTH(OrderDate)) FROM dbo.TEST
), TypeCodes AS
(
  SELECT DISTINCT ItemCode, OperationType
  FROM dbo.TEST
), Months AS
(
  -- one row per month between Earliest and Latest
  SELECT Month = DATEADD(MONTH, ROW_NUMBER()
           OVER (ORDER BY @@SPID)-1, Earliest)
  FROM m CROSS APPLY STRING_SPLIT(REPLICATE(',',
           DATEDIFF(MONTH,Earliest,Latest)),',')
), raw AS
(
  -- zero-filled monthly totals per ItemCode/OperationType
  SELECT m.Month, i.OperationType, i.ItemCode,
         Q = COALESCE(SUM(Quantity),0)
  FROM Months AS m
  CROSS JOIN TypeCodes AS i
  LEFT OUTER JOIN dbo.TEST AS t
    ON t.OrderDate >= m.Month
   AND t.OrderDate < DATEADD(MONTH, 1, m.Month)
   AND i.ItemCode = t.ItemCode
   AND i.OperationType = t.OperationType
  GROUP BY m.Month, i.OperationType, i.ItemCode
)
SELECT FromDate = DATEFROMPARTS(YEAR(Month), 1, 1),
       ToDate = EOMONTH(Month),
       OperationType,
       ItemCode,
       -- running total restarts for each item, operation type and year
       Quantity = SUM(Q) OVER (PARTITION BY ItemCode, OperationType, YEAR(Month)
                               ORDER BY Month ROWS UNBOUNDED PRECEDING)
FROM raw;
Working example in this fiddle.
If you can't use STRING_SPLIT() because your database is stuck on an older compatibility level (STRING_SPLIT requires compatibility level 130 or higher), you could put this function in a database that isn't:
USE ModernDatabase;
GO
CREATE FUNCTION dbo.StringSplit(@list nvarchar(max), @delim nchar(1))
RETURNS TABLE
AS
RETURN (SELECT value FROM STRING_SPLIT(@list, @delim));
Then you change:
FROM m CROSS APPLY STRING_SPLIT(...
To:
FROM m CROSS APPLY ModernDatabase.dbo.StringSplit(...
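Here is a portable sketch of the same idea (generate the missing months, LEFT JOIN the orders, then take a running total that restarts each year), using Python's sqlite3 module as a self-contained harness. SQLite has no STRING_SPLIT, so a recursive CTE generates the months instead. Two assumptions: the month range is taken per year from the first to the last order month, chosen so the output matches the expected rows (including 2022 starting in February), and the order rows are hypothetical, one order per month with quantities derived from the running totals in the question. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical orders (OrderDate, Quantity): quantities chosen so the running
# totals reproduce the question's numbers; May and October 2021 are empty.
ORDERS = [
    ("2021-01-15", 19), ("2021-02-15", 77), ("2021-03-15", 20),
    ("2021-04-15", 22), ("2021-06-15", 40), ("2021-07-15", 25),
    ("2021-08-15", 25), ("2021-09-15", 25), ("2021-11-15", 77),
    ("2021-12-15", 34), ("2022-02-15", 18), ("2022-03-15", 24),
    ("2022-04-15", 11),
]

QUERY = """
WITH RECURSIVE
bounds(yr, first_m, last_m) AS (      -- first/last order month per year
    SELECT strftime('%Y', OrderDate),
           date(MIN(OrderDate), 'start of month'),
           date(MAX(OrderDate), 'start of month')
    FROM test GROUP BY strftime('%Y', OrderDate)
),
months(yr, m, last_m) AS (            -- every month in that span
    SELECT yr, first_m, last_m FROM bounds
    UNION ALL
    SELECT yr, date(m, '+1 month'), last_m FROM months WHERE m < last_m
),
codes AS (SELECT DISTINCT ItemCode, OperationType FROM test),
monthly AS (                          -- zero-filled monthly totals
    SELECT c.ItemCode, c.OperationType, mo.yr, mo.m,
           COALESCE(SUM(t.Quantity), 0) AS q
    FROM months AS mo
    CROSS JOIN codes AS c
    LEFT JOIN test AS t
      ON t.ItemCode = c.ItemCode
     AND t.OperationType = c.OperationType
     AND t.OrderDate >= mo.m
     AND t.OrderDate < date(mo.m, '+1 month')
    GROUP BY c.ItemCode, c.OperationType, mo.yr, mo.m
)
SELECT yr || '-01-01' AS FromDate,
       date(m, '+1 month', '-1 day') AS ToDate,
       OperationType, ItemCode,
       SUM(q) OVER (PARTITION BY ItemCode, OperationType, yr
                    ORDER BY m
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Quantity
FROM monthly
ORDER BY ItemCode, OperationType, m
"""


def running_totals(orders):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (OrderDate TEXT, Quantity INTEGER,"
                " ItemCode TEXT, OperationType TEXT)")
    # Every hypothetical order uses ItemCode '1' and OperationType 'Type1'.
    con.executemany("INSERT INTO test VALUES (?, ?, '1', 'Type1')", orders)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in running_totals(ORDERS):
    print(row)
```

May and October 2021 come back with a monthly total of 0, so their running totals repeat the previous month's value.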

Get all rows from one table stream and the row before in time from an other table

Suppose I have one table (table_1) and one table stream (stream_1) that captures changes made to table_1; in my case, only inserts of new rows. Once I have acted on these changes, the rows are removed from stream_1 but remain in table_1.
From that I would like to calculate delta values for var1 (var1 - lag(var1) as delta_var1) partitioned by customer, and leave var2 as it is. The data in table_1 could look something like this:
timemessage customerid var1 var2
2021-04-01 06:00:00 1 10 5
2021-04-01 07:00:00 2 100 7
2021-04-01 08:00:00 1 20 10
2021-04-01 09:00:00 1 40 3
2021-04-01 15:00:00 2 150 5
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4
And the data in stream_1 that I want to act on could look like this:
timemessage customerid var1 var2
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4
But to calculate delta_var1 for every customer, I also need, for each customer, the row that immediately precedes (in time) that customer's rows in stream_1.
For example, to calculate how much var1 increased for customerid = 1 between 2021-04-01 09:00:00 and 2021-04-01 23:00:00, I want to include the 2021-04-01 09:00:00 row for customerid = 1 in my output.
So I would like a SELECT that returns all rows in stream_1 plus, for each customerid, the previous row in time from table_1. Given the table_1 and stream_1 data above, the wanted output is:
timemessage customerid var1 var2
2021-04-01 09:00:00 1 40 3
2021-04-01 15:00:00 2 150 5
2021-04-01 23:00:00 1 50 6
2021-04-02 06:00:00 2 180 2
2021-04-02 07:00:00 1 55 9
2021-04-02 08:00:00 2 200 4
So, given that you want the "last value per day" in your output, you want a QUALIFY clause to keep only the wanted rows, using ROW_NUMBER partitioned by customerid and timemessage. Assuming the accumulated value only ever increases, you can order by accumulatedvalue descending:
WITH data(timemessage, customerid, accumulatedvalue) AS (
SELECT * FROM VALUES
('2021-04-01', 1, 10)
,('2021-04-01', 2, 100)
,('2021-04-02', 1, 20)
,('2021-04-03', 1, 40)
,('2021-04-03', 2, 150)
,('2021-04-04', 1, 50)
,('2021-04-04', 2, 180)
,('2021-04-05', 1, 55)
,('2021-04-05', 2, 200)
)
SELECT * FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY customerid,timemessage ORDER BY accumulatedvalue DESC) = 1
ORDER BY 1,2;
gives:
TIMEMESSAGE CUSTOMERID ACCUMULATEDVALUE
2021-04-01 1 10
2021-04-01 2 100
2021-04-02 1 20
2021-04-03 1 40
2021-04-03 2 150
2021-04-04 1 50
2021-04-04 2 180
2021-04-05 1 55
2021-04-05 2 200
If you can trust your data, and the data in table2 starts right after the data in table1, you can just take the last record for each customer from table1 and UNION it with table2:
select * from table1
qualify row_number() over (partition by customerid order by timemessage desc) = 1
union all
select * from table2
If not:
select a.* from table1 a
join table2 b
  on a.customerid = b.customerid
 and a.timemessage < b.timemessage
qualify row_number() over (partition by a.customerid order by a.timemessage desc) = 1
union all
select * from table2
You can also add a condition so that the join does not look back more than 1 day (or 1 hour, or whatever interval is safe for your data) for better performance.
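In the question, the stream rows also remain in table_1, so "the last row per customer in table_1" would pick a stream row rather than its predecessor. A minimal sketch that handles this overlap swaps in correlated MIN/MAX subqueries instead of QUALIFY: for each customer it keeps the latest table_1 row strictly before that customer's earliest stream_1 row, then appends the stream rows. It runs on SQLite through Python's sqlite3 as a stand-in for Snowflake, uses the table and column names from the question, and assumes timemessage is stored as ISO text so string comparison orders correctly.

```python
import sqlite3

# table_1 from the question: (timemessage, customerid, var1, var2).
TABLE_1 = [
    ("2021-04-01 06:00:00", 1, 10, 5),
    ("2021-04-01 07:00:00", 2, 100, 7),
    ("2021-04-01 08:00:00", 1, 20, 10),
    ("2021-04-01 09:00:00", 1, 40, 3),
    ("2021-04-01 15:00:00", 2, 150, 5),
    ("2021-04-01 23:00:00", 1, 50, 6),
    ("2021-04-02 06:00:00", 2, 180, 2),
    ("2021-04-02 07:00:00", 1, 55, 9),
    ("2021-04-02 08:00:00", 2, 200, 4),
]
# stream_1 holds the last four rows, which also remain in table_1.
STREAM_1 = TABLE_1[5:]

QUERY = """
SELECT t.timemessage, t.customerid, t.var1, t.var2
FROM table_1 AS t
WHERE t.timemessage = (
    -- latest table_1 row strictly before this customer's first stream_1 row
    SELECT MAX(t2.timemessage)
    FROM table_1 AS t2
    WHERE t2.customerid = t.customerid
      AND t2.timemessage < (SELECT MIN(s.timemessage)
                            FROM stream_1 AS s
                            WHERE s.customerid = t.customerid)
)
UNION ALL
SELECT timemessage, customerid, var1, var2 FROM stream_1
ORDER BY timemessage, customerid
"""


def rows_with_predecessor():
    con = sqlite3.connect(":memory:")
    for name, rows in (("table_1", TABLE_1), ("stream_1", STREAM_1)):
        con.execute(f"CREATE TABLE {name} (timemessage TEXT, customerid INTEGER,"
                    " var1 INTEGER, var2 INTEGER)")
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in rows_with_predecessor():
    print(row)
```

Customers with no rows in stream_1 get no predecessor row, because MIN over an empty set is NULL and the comparison filters them out.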

Generate 15 minute date intervals and join matching rows

What I would like to do is generate the 15-minute intervals for the date range in a row and insert them into another table.
The following code generates the intervals for a single date range, which is part of my goal:
DECLARE @Table1 TABLE (ID INT IDENTITY(0,1), TIMEVALUE DATETIME, TIMEVALUE2 DATETIME);
DECLARE @start DATETIME2(7) = '2018-01-04 10:55:00';
DECLARE @end DATETIME2(7) = '2018-01-05 03:55:00';
-- round @start down to the previous quarter hour
SELECT @start = dateadd(minute, datediff(minute,0,@start) / 15 * 15, 0);
WITH CTE_DT AS
(
    SELECT @start AS DT
    UNION ALL
    SELECT DATEADD(MINUTE,15,DT) FROM CTE_DT
    WHERE DT < @end
)
INSERT INTO @Table1
SELECT DT, DATEADD(minute,14,DT) FROM CTE_DT
OPTION (MAXRECURSION 0);
SELECT * FROM @Table1;
result:
ID TIMEVALUE TIMEVALUE2
0 2018-01-04 10:45:00.000 2018-01-04 10:59:00.000
1 2018-01-04 11:00:00.000 2018-01-04 11:14:00.000
2 2018-01-04 11:15:00.000 2018-01-04 11:29:00.000
3 2018-01-04 11:30:00.000 2018-01-04 11:44:00.000
4 2018-01-04 11:45:00.000 2018-01-04 11:59:00.000
5 2018-01-04 12:00:00.000 2018-01-04 12:14:00.000
6 2018-01-04 12:15:00.000 2018-01-04 12:29:00.000
7 2018-01-04 12:30:00.000 2018-01-04 12:44:00.000
8 2018-01-04 12:45:00.000 2018-01-04 12:59:00.000
..
..
What I want to accomplish is to apply the same logic to each row of a record source.
So if my SourceData is:
Col1 Col2 StartDate EndDate
AA AA 2018-01-01 13:25 2018-01-02 13:00
AA BB 2018-01-02 13:25 2018-01-03 13:00
I want a single query that uses StartDate and EndDate to produce this result:
Col1 Col2 TIMEVALUE TIMEVALUE2
AA AA 2018-01-01 13:15:00 2018-01-01 13:29:00
AA AA 2018-01-01 13:30:00 2018-01-01 13:44:00
AA AA 2018-01-01 13:45:00 2018-01-01 13:59:00
...
...
AA AA 2018-01-02 12:30:00 2018-01-02 12:44:00
AA AA 2018-01-02 12:45:00 2018-01-02 12:59:00
AA AA 2018-01-02 13:00:00 2018-01-02 13:14:00
AA BB 2018-01-02 13:15:00 2018-01-02 13:29:00
AA BB 2018-01-02 13:30:00 2018-01-02 13:44:00
AA BB 2018-01-02 13:45:00 2018-01-02 13:59:00
...
...
AA BB 2018-01-03 12:30:00 2018-01-03 12:44:00
AA BB 2018-01-03 12:45:00 2018-01-03 12:59:00
AA BB 2018-01-03 13:00:00 2018-01-03 13:14:00
I would like to avoid using a cursor if I can. I have managed to make this work with a user-defined function by passing the required columns in the SELECT statement, but I am hoping to avoid that as well.
Change the end of the first query so that instead of
SELECT * FROM @Table1
it says:
SELECT * FROM @Table1 d
INNER JOIN SourceData sd
  ON NOT(d.timevalue2 < sd.startdate OR d.timevalue > sd.enddate)
Consider running your first query once to generate intervals up to the year 2030 and inserting them into a permanent table. Keep the query around so it can be reused in ~11 years to add more rows to this calendar table.
Rather than using an rCTE (recursive CTE, which is a form of RBAR, "row by agonizing row"), I would use a virtual tally table to generate your dates:
--; is a statement terminator, not a "beginninator". It goes at the end, not the start.
WITH N AS(
    SELECT NULL AS N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
    FROM N N1 --10
    CROSS JOIN N N2 --100
    CROSS JOIN N N3 --1000
    CROSS JOIN N N4 --10000
)
SELECT DATEADD(MINUTE,15*I,@Start)
FROM Tally
WHERE DATEADD(MINUTE,15*I,@Start) < @End;
If you want to generate 15-minute intervals for each combination of columns, you can then add a further CROSS JOIN. For example:
--Assume CTEs are already declared
SELECT V.Col1, V.Col2,
       DATEADD(MINUTE,15*T.I,@Start)
FROM Tally T
CROSS JOIN (VALUES('AA','AA'),('AA','BB')) V(Col1, Col2) --This could be a CROSS APPLY to a DISTINCT, or similar if you wish
WHERE DATEADD(MINUTE,15*T.I,@Start) < @End;
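The tally idea can also be driven directly by each SourceData row, so every row gets its own intervals without a cursor or a UDF. The sketch below is an assumption-laden stand-in run on SQLite through Python's sqlite3: the tally is a cross join of four ten-row digit sets (0 to 9999, so at most about 104 days of quarter hours per row, the same limit as the T-SQL tally above), StartDate is rounded down to a quarter hour via epoch seconds, and TIMEVALUE2 is TIMEVALUE plus 14 minutes as in the question. Column names come from the question; the digits and src CTEs are hypothetical helpers.

```python
import sqlite3

# SourceData rows from the question: (Col1, Col2, StartDate, EndDate).
SOURCE = [
    ("AA", "AA", "2018-01-01 13:25", "2018-01-02 13:00"),
    ("AA", "BB", "2018-01-02 13:25", "2018-01-03 13:00"),
]

QUERY = """
WITH digits(d) AS (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)),
tally(i) AS (                 -- 0..9999 from a cross join, no recursion
    SELECT d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d
    FROM digits AS d1, digits AS d2, digits AS d3, digits AS d4
),
src AS (                      -- epoch seconds, start rounded down to 15 minutes
    SELECT Col1, Col2,
           CAST(strftime('%s', StartDate) AS INTEGER) / 900 * 900 AS start_s,
           CAST(strftime('%s', EndDate) AS INTEGER) AS end_s
    FROM SourceData
)
SELECT src.Col1, src.Col2,
       datetime(src.start_s + 900 * tally.i, 'unixepoch') AS TIMEVALUE,
       datetime(src.start_s + 900 * tally.i + 840, 'unixepoch') AS TIMEVALUE2
FROM src
JOIN tally ON src.start_s + 900 * tally.i <= src.end_s
ORDER BY src.Col1, src.Col2, TIMEVALUE
"""


def quarter_hours(source):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SourceData (Col1 TEXT, Col2 TEXT,"
                " StartDate TEXT, EndDate TEXT)")
    con.executemany("INSERT INTO SourceData VALUES (?, ?, ?, ?)", source)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = quarter_hours(SOURCE)
print(len(rows), rows[0], rows[-1])
```

The last interval for each row is the one starting at or before EndDate, which reproduces the 13:00:00 to 13:14:00 rows in the expected output.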

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date, while grouping on the person and the type.
The data is stored in SQL Server and looks like the following:
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this is:
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I get the data in the first format by running the following query:
select
    ROW_NUMBER() OVER (ORDER BY date) AS seq,
    person,
    date,
    type
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem, so you can take the difference of two row_number() values and use it in the grouping:
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
Note that type needs to be included in the GROUP BY.
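To see the difference-of-row-numbers query produce the expected islands, the sketch below runs it on the sample rows from the question, using SQLite (3.25 or later, for window functions) through Python's sqlite3 as a stand-in for SQL Server. The table name t follows the second answer; the rest of the names come from the question.

```python
import sqlite3

# Sample rows from the question: (seq, person, date, type).
ROWS = [
    (1, 1, "2018-02-10 08:00:00", 1), (2, 1, "2018-02-11 08:00:00", 1),
    (3, 1, "2018-02-12 08:00:00", 1), (4, 1, "2018-02-14 16:00:00", 1),
    (5, 1, "2018-02-15 16:00:00", 1), (6, 1, "2018-02-16 16:00:00", 1),
    (7, 1, "2018-02-20 08:00:00", 2), (8, 1, "2018-02-21 08:00:00", 2),
    (9, 1, "2018-02-22 08:00:00", 2), (10, 1, "2018-02-23 08:00:00", 1),
    (11, 1, "2018-02-24 08:00:00", 1), (12, 1, "2018-02-25 08:00:00", 2),
    (13, 2, "2018-02-10 08:00:00", 1), (14, 2, "2018-02-11 08:00:00", 1),
    (15, 2, "2018-02-12 08:00:00", 1), (16, 2, "2018-02-14 16:00:00", 3),
    (17, 2, "2018-02-15 16:00:00", 3), (18, 2, "2018-02-16 16:00:00", 3),
]

# Inside one island both row numbers advance together, so their difference
# is constant; a different type in between makes the next island's
# difference larger, which separates islands of the same type.
QUERY = """
select person, type, min(date) as start
from (select t.*,
             row_number() over (partition by person order by seq) as seqnum_p,
             row_number() over (partition by person, type order by seq) as seqnum_pt
      from t) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start
"""


def collapse(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (seq integer, person integer, date text, type integer)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in collapse(ROWS):
    print(row)
```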

Select value on next date to be calculated on current date SQL

I have the following table:
ID GroupID oDate oTime oValue
1 A 2014-06-01 00:00:00 100
2 A 2014-06-01 01:00:00 200
3 A 2014-06-01 02:00:00 300
4 A 2014-06-02 00:00:00 400
5 A 2014-06-02 01:00:00 425
6 A 2014-06-02 02:00:00 475
7 B 2014-06-01 00:00:00 1000
8 B 2014-06-01 01:00:00 1500
9 B 2014-06-01 02:00:00 2000
10 B 2014-06-02 00:00:00 3000
11 B 2014-06-02 01:00:00 3100
12 A 2014-06-03 00:00:00 525
13 A 2014-06-03 01:00:00 600
14 A 2014-06-03 02:00:00 625
I want to have the following result:
GroupID oDate oResult
A 2014-06-01 300
A 2014-06-02 125
B 2014-06-01 2000
oResult is computed as:
the value on the next date at 00:00:00 minus the value on the selected date at 00:00:00.
For example, for 2014-06-01:
400 (2014-06-02 00:00:00) minus 100 (2014-06-01 00:00:00)
oResult = 400 - 100 = 300
How can I achieve this in SQL syntax?
Thank you.
You can write a query using a common table expression (CTE):
;with CTE as
( select row_number() over ( partition by GroupID, oDate order by oTime Asc) as rownum,
GroupID, oDate, oValue,oTime
from Test
)
select CTE.GroupID,CTE1.oDate, (CTE.oValue - CTE1.oValue) as oResult
from CTE
inner join CTE as CTE1 on datediff (day,CTE1.oDate, CTE.oDate) = 1
and CTE1.rownum= CTE.rownum
and CTE1.GroupID= CTE.GroupID
where CTE.rownum = 1
Check Demo here ...
You can use the CROSS APPLY operator here.
Please check this:
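select a.GroupID, a.oDate, (ab.oValue - a.oValue) as oResult
from T as a
cross apply
(
    -- first 00:00:00 reading on a later date for the same group
    select top 1 b.oValue
    from T as b
    where b.GroupID = a.GroupID
      and b.oDate > a.oDate
      and b.oTime = '00:00:00.0000000'
    order by b.oDate
) as ab
where a.oTime = '00:00:00.0000000'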
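A plain self-join also works and needs neither window functions nor CROSS APPLY. The sketch below runs it on SQLite through Python's sqlite3 (a stand-in for SQL Server) with the question's sample rows, and it interprets "next date" as the next calendar day, matching the DATEDIFF(day, ...) = 1 condition in the CTE answer; that interpretation, and storing oDate/oTime as ISO text, are assumptions.

```python
import sqlite3

# Sample rows from the question: (ID, GroupID, oDate, oTime, oValue).
ROWS = [
    (1, "A", "2014-06-01", "00:00:00", 100),
    (2, "A", "2014-06-01", "01:00:00", 200),
    (3, "A", "2014-06-01", "02:00:00", 300),
    (4, "A", "2014-06-02", "00:00:00", 400),
    (5, "A", "2014-06-02", "01:00:00", 425),
    (6, "A", "2014-06-02", "02:00:00", 475),
    (7, "B", "2014-06-01", "00:00:00", 1000),
    (8, "B", "2014-06-01", "01:00:00", 1500),
    (9, "B", "2014-06-01", "02:00:00", 2000),
    (10, "B", "2014-06-02", "00:00:00", 3000),
    (11, "B", "2014-06-02", "01:00:00", 3100),
    (12, "A", "2014-06-03", "00:00:00", 525),
    (13, "A", "2014-06-03", "01:00:00", 600),
    (14, "A", "2014-06-03", "02:00:00", 625),
]

# Pair each midnight reading with the same group's midnight reading
# on the following calendar day; dates without a next day drop out.
QUERY = """
SELECT cur.GroupID, cur.oDate, nxt.oValue - cur.oValue AS oResult
FROM Test AS cur
JOIN Test AS nxt
  ON nxt.GroupID = cur.GroupID
 AND nxt.oDate = date(cur.oDate, '+1 day')
 AND nxt.oTime = '00:00:00'
WHERE cur.oTime = '00:00:00'
ORDER BY cur.GroupID, cur.oDate
"""


def next_day_deltas(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Test (ID INTEGER, GroupID TEXT, oDate TEXT,"
                " oTime TEXT, oValue INTEGER)")
    con.executemany("INSERT INTO Test VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in next_day_deltas(ROWS):
    print(row)
```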
Demo