Running 3 Month Count - sql

I get a list of new accounts twice a month. The majority of my accounts have either gone through the funnel or can be categorized as bad leads by the end of the third month. So I am working on a SQL query that will tell me the number of accounts in my funnel on each date when I received a new list.
Here is what I have been working with (Yes I joined table1 on itself):
select t.receiveddate, count(*)
from table1 t
join table1 t2 on t2.number = t.number
and (t2.receiveddate > Dateadd(month, -3, t.receiveddate) AND t2.receiveddate <= t.receiveddate)
group by t.receiveddate
What I am hoping to end up with is a list of the dates that I receive new business, with a count of how many accounts are in the funnel (accounts that I received no more than 3 months ago). The count should include the new accounts received on that date as well.
Here is an example. Let's assume the business started on 1/1/2000, so there is no one in the funnel before the first count. Let's also assume that I get 100 new accounts every time, just to keep this example simple.
receiveddate Count
1/1/2000 100
1/15/2000 200
2/1/2000 300
2/15/2000 300
3/1/2000 300
3/15/2000 300

Welcome to Stack Overflow!
I see where you were going with that, but I would use a correlated sub-query instead. I've included my sample data, which I randomized to make it similar to the real world (you can use an Update statement to change all the counts to 100 to validate the query):
create table Pipeline
(
ID int identity
, Received_Date Date
, Received_Count Int
)
Insert Into Pipeline
select '01/01/2018', 75 + (RAND() * 50)
union all select '01/15/2018', 75 + (RAND() * 50)
union all select '02/01/2018', 75 + (RAND() * 50)
union all select '02/15/2018', 75 + (RAND() * 50)
union all select '03/01/2018', 75 + (RAND() * 50)
union all select '03/15/2018', 75 + (RAND() * 50)
union all select '04/01/2018', 75 + (RAND() * 50)
union all select '04/15/2018', 75 + (RAND() * 50)
union all select '05/01/2018', 75 + (RAND() * 50)
union all select '05/15/2018', 75 + (RAND() * 50)
union all select '06/01/2018', 75 + (RAND() * 50)
union all select '06/15/2018', 75 + (RAND() * 50)
select * from Pipeline -- so you can see the values you got
select
P1.Received_Date
, (Select Sum(P2.Received_Count)
from Pipeline as P2
Where P2.Received_Date > DateAdd(MONTH, -3, P1.Received_Date)
and P2.Received_Date <= P1.Received_Date
) As Pipeline_Total
from Pipeline as P1

Your question is not entirely clear. I assume that you have a table of accounts, and for each account you have the received-date.
-- One record per account, each with a received-date
create table Account ( AccountID int identity(1,1),
ReceivedDate date )
-- Populate with 1000 random received-dates.
-- DateAdd(month,... gives a range of 60 months
-- DateAdd(day,... forces either 1st or 15th of the month
-- Of course, the query below will work regardless of the
-- distribution of dates
declare @k int
set @k = 0
while @k < 1000
begin
insert into Account ( ReceivedDate ) values
( DateAdd ( day, 14*cast( 2*rand()as int),
DateAdd ( month, cast(60*rand()as int), '2000-01-01' ) ) )
set @k = @k + 1
end
-- Let's see the list of dates
select * from Account
-- T1 is the list of DISTINCT received-dates
-- Other than that this query almost matches your own attempt
select T1.ReceivedDate, count(*) as InFunnel
from
(select distinct ReceivedDate from Account) T1,
Account T2
where T2.ReceivedDate > DateAdd ( month, -3, T1.ReceivedDate )
and T2.ReceivedDate <= T1.ReceivedDate
group by T1.ReceivedDate
order by T1.ReceivedDate
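The window logic shared by these answers can be checked outside the database. The following is a minimal Python sketch, not a replacement for the queries: `add_months` and `funnel_counts` are hypothetical helper names, the sample dates are invented, and `add_months` clamps the day to the end of the target month, which is assumed to match how DATEADD(month, ...) behaves.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift d by whole months, clamping the day to the target month's last day."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def funnel_counts(received):
    """For each distinct received date, count the accounts received
    after (date - 3 months) and up to and including that date."""
    result = {}
    for d in sorted(set(received)):
        start = add_months(d, -3)
        result[d] = sum(1 for r in received if start < r <= d)
    return result

# Invented sample: two accounts on each of four dates.
accounts = (
    [date(2000, 1, 1)] * 2
    + [date(2000, 1, 15)] * 2
    + [date(2000, 4, 1)] * 2
    + [date(2000, 4, 20)] * 2
)
counts = funnel_counts(accounts)
for received_date, in_funnel in counts.items():
    print(received_date, in_funnel)
```

Because the lower bound is exclusive, the accounts received on 1/1/2000 drop out of the count for 4/1/2000, exactly three months later.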

Related

How to spread month to day with amount value divided by total days per month

I have data with an amount of 1 month and want to change it to 30 days.
if 1 month the amount is 20000 then per day is 666.67
The following are sample data and results:
Account Project Date Segment Amount
Acc1 1 September 2022 Actual 20000
Result :
I need a query using sql server
You may try a set-based approach using an appropriate number table and a calculation with windowed COUNT().
Data:
SELECT *
INTO Data
FROM (VALUES
('Acc1', 1, CONVERT(date, '20220901'), 'Actual', 20000.00)
) v (Account, Project, [Date], Segment, Amount)
Statement for all versions, starting from SQL Server 2016 (the number table is generated using JSON-based approach with OPENJSON()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, CONVERT(int, [key]), d.[Date])
FROM OPENJSON('[1' + REPLICATE(',1', DATEDIFF(day, d.[Date], EOMONTH(d.[Date]))) + ']')
) a (Amount, Date)
Statement for SQL Server 2022 (the number table is generated with GENERATE_SERIES()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, [value], d.[Date])
FROM GENERATE_SERIES(0, DATEDIFF(day, d.[Date], EOMONTH(d.[Date])))
) a (Amount, Date)
Notes:
Both approaches calculate the days for each month. If you always want 30 days per month, replace DATEDIFF(day, d.[Date], EOMONTH(d.[Date])) with 29.
There is a rounding issue with this calculation. You may need to implement an additional calculation for the last day of the month.
You can use a recursive CTE to generate each day of the month and then divide the amount by the number of days in the month to achieve the required output. The last day receives whatever amount remains, so the daily values add up to the original total.
DECLARE @Amount NUMERIC(18,2) = 20000,
@MonthStart DATE = '2022-09-01'
;WITH CTE
AS
(
SELECT
CurrentDate = @MonthStart,
DayAmount = CAST(@Amount/DAY(EOMONTH(@MonthStart)) AS NUMERIC(18,2)),
RemainingAmount = CAST(@Amount - (@Amount/DAY(EOMONTH(@MonthStart))) AS NUMERIC(18,2))
UNION ALL
SELECT
CurrentDate = DATEADD(DAY,1,CurrentDate),
DayAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(@MonthStart)
THEN RemainingAmount
ELSE DayAmount END,
RemainingAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(@MonthStart)
THEN 0
ELSE CAST(RemainingAmount-DayAmount AS NUMERIC(18,2)) END
FROM CTE
WHERE CurrentDate < EOMONTH(@MonthStart)
)
SELECT
CurrentDate,
DayAmount
FROM CTE
In case you want an equal split without rounding errors and without loops you can use this calculation. It spreads the rounding error across all entries, so they are all as equal as possible.
DECLARE @Amount NUMERIC(18,2) = 20000,
@MonthStart DATE = '20220901'
SELECT DATEADD(DAY,Numbers.i - 1,@MonthStart)
, ShareSplit.Calculated_Share
, SUM(ShareSplit.Calculated_Share) OVER (ORDER BY (SELECT NULL)) AS Calculated_Total
FROM (SELECT DISTINCT number FROM master..spt_values WHERE number BETWEEN 1 AND DAY(EOMONTH(@MonthStart)))Numbers(i)
CROSS APPLY(SELECT CAST(ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) * 0.01
+ CASE
WHEN Numbers.i
<= ABS((@Amount - (ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) / 100.0 * DAY(EOMONTH(@MonthStart)))) * 100)
THEN 0.01 * SIGN(@Amount - (ROUND(@Amount * 100 / DAY(EOMONTH(@MonthStart)),0) / 100.0 * DAY(EOMONTH(@MonthStart))))
ELSE 0
END AS DEC(18,2)) AS Calculated_Share
)ShareSplit
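The idea of spreading the rounding error across entries can be illustrated with integer cents. This is a minimal Python sketch of that idea, not a translation of the query above: `split_evenly` is a hypothetical name, and it assumes a non-negative amount with at most two decimal places.

```python
from decimal import Decimal

def split_evenly(amount, parts):
    """Split amount into `parts` shares that differ by at most one cent
    and sum exactly to the original amount."""
    cents = int(amount * 100)            # work in whole cents
    base, extra = divmod(cents, parts)   # `extra` shares get one more cent
    return [Decimal(base + (1 if i < extra else 0)) / 100 for i in range(parts)]

# 20000 spread over the 30 days of September 2022.
shares = split_evenly(Decimal("20000.00"), 30)
print(shares[0], shares[-1], sum(shares))
```

For this input the first 20 days get 666.67 and the last 10 days get 666.66, which adds back up to 20000.00.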

Proportional distribution of a given value between two dates in SQL Server

There's a table with three columns: start date, end date and task duration in hours. For example, something like that:
Id StartDate EndDate Duration
1 07-11-2022 15-11-2022 40
2 02-09-2022 02-11-2022 122
3 10-10-2022 05-11-2022 52
And I want to get a table like that:
Id Month HoursPerMonth
1 11 40
2 09 56
2 10 62
2 11 4
3 10 42
3 11 10
Briefly, I want to know how many working hours fall in each month between the start and end dates, distributed proportionally. How can I achieve that with an MS SQL query? The data set is quite big, so query speed matters. Thanks in advance!
I've tried DATEDIFF and EOMONTH, but that solution doesn't work for tasks longer than 2 months, and I'm sure it is a poor approach anyway. I hope it can be done in a more elegant way.
Here is an option using an ad-hoc tally/calendar table
I'm not sure I agree with your desired results
Select ID
,Month = month(D)
,HoursPerMonth = (sum(1.0) / (1+max(datediff(DAY,StartDate,EndDate)))) * max(Duration)
From YourTable A
Join (
Select Top 75000 D=dateadd(day,Row_Number() Over (Order By (Select NULL)),0)
From master..spt_values n1, master..spt_values n2
) B on D between StartDate and EndDate
Group By ID,month(D)
Order by ID,Month
This answer uses CTE recursion.
This part just sets up a table variable with the OP's example data.
DECLARE @source
TABLE (
SOURCE_ID INT
,STARTDATE DATE
,ENDDATE DATE
,DURATION INT
)
;
INSERT
INTO
@source
VALUES
(1, '20221107', '20221115', 40 )
,(2, '20220902', '20221102', 122 )
,(3, '20221010', '20221105', 52 )
;
This part is the query based on the above data. The recursive CTE breaks the time period into months. The second CTE does the math. The final selection does some more math and presents the results the way you want to see them.
WITH CTE AS (
SELECT
SRC.SOURCE_ID
,SRC.STARTDATE
,SRC.ENDDATE
,SRC.STARTDATE AS 'INTERIM_START_DATE'
,CASE WHEN EOMONTH(SRC.STARTDATE) < SRC.ENDDATE
THEN EOMONTH(SRC.STARTDATE)
ELSE SRC.ENDDATE
END AS 'INTERIM_END_DATE'
,SRC.DURATION
FROM
@source SRC
UNION ALL
SELECT
CTE.SOURCE_ID
,CTE.STARTDATE
,CTE.ENDDATE
,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE) < CTE.ENDDATE
THEN DATEADD( DAY, 1, EOMONTH(CTE.INTERIM_START_DATE) )
ELSE CTE.STARTDATE
END
,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE, 1) < CTE.ENDDATE
THEN EOMONTH(CTE.INTERIM_START_DATE, 1)
ELSE CTE.ENDDATE
END
,CTE.DURATION
FROM
CTE
WHERE
CTE.INTERIM_END_DATE < CTE.ENDDATE
)
, CTE2 AS (
SELECT
CTE.SOURCE_ID
,CTE.STARTDATE
,CTE.ENDDATE
,CTE.INTERIM_START_DATE
,CTE.INTERIM_END_DATE
,CAST( DATEDIFF( DAY, CTE.INTERIM_START_DATE, CTE.INTERIM_END_DATE ) + 1 AS FLOAT ) AS 'MNTH_DAYS'
,CAST( DATEDIFF( DAY, CTE.STARTDATE, CTE.ENDDATE ) + 1 AS FLOAT ) AS 'TTL_DAYS'
,CAST( CTE.DURATION AS FLOAT ) AS 'DURATION'
FROM
CTE
)
SELECT
CTE2.SOURCE_ID AS 'Id'
,MONTH( CTE2.INTERIM_START_DATE ) AS 'Month'
,ROUND( CTE2.MNTH_DAYS/CTE2.TTL_DAYS * CTE2.DURATION, 0 ) AS 'HoursPerMonth'
FROM
CTE2
ORDER BY
CTE2.SOURCE_ID
,CTE2.INTERIM_END_DATE
;
My results agree with Mr. Cappelletti's, not the OP's. Perhaps some tweaking regarding the definition of a "Day" is needed. I don't know.
If time between start and end date is large (more than 100 months) you may want to specify OPTION (MAXRECURSION 0) at the end.
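The proportional split that both answers compute can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: `hours_per_month` is a hypothetical name, and it assumes every calendar day between the two dates counts equally, with both end dates included.

```python
from collections import Counter
from datetime import date, timedelta

def hours_per_month(start, end, duration):
    """Distribute `duration` over (year, month) buckets in proportion
    to the number of calendar days of [start, end] in each month."""
    total_days = (end - start).days + 1
    days_in_month = Counter()
    day = start
    while day <= end:
        days_in_month[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return {ym: duration * n / total_days for ym, n in days_in_month.items()}

# Task 2 from the question: 02-09-2022 to 02-11-2022, 122 hours.
split = hours_per_month(date(2022, 9, 2), date(2022, 11, 2), 122)
for (year, month), hours in split.items():
    print(year, month, round(hours, 2))
```

Under that day-counting assumption, task 2 covers 29 + 31 + 2 = 62 days, so the months receive roughly 57.06, 61 and 3.94 hours rather than the 56, 62 and 4 listed in the question.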

Grouping the result set based on conditions

I am calculating Age of a user based on his date of birth.
select UserId, (Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000 AS [Age] FROM dbo.[User]
This gives me the UserId and his age.
Now I want to group this result.
I need the count of users in each age group: how many users are in their 30's, how many in their 40's, and how many in their 50's.
If the user's age is > 0 and less than 30, he should be grouped into the 20's.
If the user's age is >= 30 and < 40, he should be added to the 30's group, and the same applies to the 40's and 50's.
Can this be achieved without creating any temp table?
I believe this will get you what you want.
Anything < 30 will be placed in the '20' group.
Anything >= 50 will be placed in the '50' group.
If they are 30-39 or 40-49, they will be placed in their appropriate decade group.
SELECT y.AgeDecade, COUNT(*)
FROM dbo.[User] u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
GROUP BY y.AgeDecade
Placing the logic into CROSS APPLY makes it easier to reuse the logic within the same query, this way you can use it in SELECT, GROUP BY, ORDER BY, WHERE, etc, without having to duplicate it. This could also be done using a cte, but in this scenario, this is my preference.
Update:
You asked in the comments how it would be possible to show a count of 0 when no people exist for an age group. In most cases the answer is simple, LEFT JOIN. As with everything, there's always more than one way to bake a cake.
Here are a couple ways you can do it:
The simple left join, take the query from my original answer, and just do a left join to a table. You could do this in a couple ways, CTE, temp table, table variable, sub-query, etc. The takeaway is, you need to isolate your User table somehow.
Simple Sub-query method, nothing fancy. Just stuck the whole query into a sub-query, then left joined it to our new lookup table.
DECLARE @AgeGroup TABLE (AgeGroupID tinyint NOT NULL);
INSERT INTO @AgeGroup (AgeGroupID) VALUES (20),(30),(40),(50);
SELECT ag.AgeGroupID, TotalCount = COUNT(a.AgeDecade)
FROM @AgeGroup ag
LEFT JOIN (
SELECT y.AgeDecade
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
) a ON a.AgeDecade = ag.AgeGroupID
GROUP BY ag.AgeGroupID;
This would be the exact same thing as using a cte:
DECLARE @AgeGroup TABLE (AgeGroupID tinyint NOT NULL);
INSERT INTO @AgeGroup (AgeGroupID) VALUES (20),(30),(40),(50);
WITH cte_Users AS (
SELECT y.AgeDecade
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
CROSS APPLY (SELECT AgeDecade = CASE
WHEN x.Age <= 29 THEN 20
WHEN x.Age BETWEEN 30 AND 39 THEN 30
WHEN x.Age BETWEEN 40 AND 49 THEN 40
WHEN x.Age >= 50 THEN 50
ELSE NULL
END
) y
)
SELECT ag.AgeGroupID, TotalCount = COUNT(a.AgeDecade)
FROM @AgeGroup ag
LEFT JOIN cte_Users a ON a.AgeDecade = ag.AgeGroupID
GROUP BY ag.AgeGroupID;
Choosing between the two is purely preference. There's no performance gain to using a CTE here.
Bonus:
If you wanted to table drive your groups and also have 0 counts, you could do something like this...though I will warn you to be careful using APPLY operators because they can be tricky with performance sometimes.
IF OBJECT_ID('tempdb..#User','U') IS NOT NULL DROP TABLE #User; --SELECT * FROM #User
WITH c1 AS (SELECT x.x FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(x)) -- 10
, c2(x) AS (SELECT 1 FROM c1 x CROSS JOIN c1 y) -- 10 * 10
SELECT UserID = IDENTITY(int,1,1)
, DateOfBirth = CONVERT(date, GETDATE()-(RAND(CHECKSUM(NEWID()))*18500))
INTO #User
FROM c2 u;
IF OBJECT_ID('tempdb..#AgeRange','U') IS NOT NULL DROP TABLE #AgeRange; --SELECT * FROM #AgeRange
CREATE TABLE #AgeRange (
AgeRangeID tinyint NOT NULL IDENTITY(1,1),
RangeStart tinyint NOT NULL,
RangeEnd tinyint NOT NULL,
RangeLabel varchar(100) NOT NULL,
);
INSERT INTO #AgeRange (RangeStart, RangeEnd, RangeLabel)
VALUES ( 0, 29, '< 30')
, (30, 39, '30 - 39')
, (40, 49, '40 - 49')
, (50, 255, '50+');
-- Using an OUTER APPLY
SELECT ar.RangeLabel, COUNT(y.UserID)
FROM #AgeRange ar
OUTER APPLY (
SELECT u.UserID
FROM #User u
CROSS APPLY (SELECT Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000) x
WHERE x.Age BETWEEN ar.RangeStart AND ar.RangeEnd
) y
GROUP BY ar.RangeLabel, ar.RangeStart
ORDER BY ar.RangeStart;
-- Using a CTE
WITH cte_users AS (
SELECT u.UserID
, Age = (CONVERT(int, CONVERT(char(8), GETDATE(), 112)) - CONVERT(int, CONVERT(char(8), u.DateOfBirth, 112))) / 10000
FROM #User u
)
SELECT ar.RangeLabel, COUNT(u.UserID)
FROM #AgeRange ar
LEFT JOIN cte_users u ON u.Age BETWEEN ar.RangeStart AND ar.RangeEnd
GROUP BY ar.RangeStart, ar.RangeLabel
ORDER BY ar.RangeStart;
I would start by putting the age computation in a lateral join, so it can easily be referred to. Then, if you want the age groups as rows, you can join a derived table that describes the intervals:
select v.age_group, count(*) as cnt_users
from dbo.[User] u
cross apply (values
((convert(int, convert(char(8), getdate(),112)) - convert(char(8), u.[DateOfBirth], 112))/10000)
) a(age)
inner join (values
( 0, 30, '0-30'),
(30, 40, '30-40'),
(40, 50, '40-50'),
(50, null, '50+')
) v(min_age, max_age, age_group)
on a.age >= v.min_age
and (a.age < v.max_age or v.max_age is null)
group by v.age_group
On the other hand, if you want the counts in columns, use conditional aggregation:
select
sum(case when a.age < 30 then 1 else 0 end) as age_0_30,
sum(case when a.age >= 30 and a.age < 40 then 1 else 0 end) as age_30_40,
sum(case when a.age >= 40 and a.age < 50 then 1 else 0 end) as age_40_50,
sum(case when a.age >= 50 then 1 else 0 end) as age_50
from dbo.[User] u
cross apply (values
((convert(int, convert(char(8), getdate(),112)) - convert(char(8), [DateOfBirth], 112))/10000)
) a(age)
Yes, you can.
This query should work for you:
-- ROUND(x, -1, 1) truncates to the decade (34 -> 30); plain ROUND(x, -1) would round 35 up to 40
SELECT STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1)) + 's' AS [Age Group], COUNT(UserId) AS Count
FROM dbo.[User]
GROUP BY STR(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1)) + 's'
For your updated question:
SELECT CASE
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) < 30 THEN '20s'
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) >= 50 THEN '50s'
ELSE str(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1)) + 's'
END AS [Age Group], COUNT(UserId) AS Count
FROM dbo.[User]
GROUP BY CASE
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) < 30 THEN '20s'
WHEN ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1) >= 50 THEN '50s'
ELSE str(ROUND(DATEDIFF(year, DateOfBirth, GETDATE()), -1, 1)) + 's'
END
You could use round with a length argument of -1 and a non-zero function argument to truncate the value to "tens", and group by it:
SELECT Round((Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000, -1, 1) AS [Rounded Age],
Count(*) AS [Users]
FROM dbo.[User]
GROUP BY Round((Convert(int,Convert(char(8),GETDATE(),112))-Convert(char(8),[DateOfBirth],112))/10000, -1, 1)
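The yyyymmdd subtraction trick used throughout this thread, and the bucketing rules from the question, can be sanity-checked in Python. This is a rough sketch with hypothetical names (`age_on`, `age_group`) and invented birth dates; it assumes every date of birth is on or before the reference date.

```python
from datetime import date

def age_on(dob, today):
    """Age in completed years: subtract the yyyymmdd integers, drop the last four digits."""
    as_int = lambda d: d.year * 10000 + d.month * 100 + d.day
    return (as_int(today) - as_int(dob)) // 10000

def age_group(age):
    """Bucket per the question: under 30 -> 20, 50 and over -> 50, else the decade."""
    if age < 30:
        return 20
    if age >= 50:
        return 50
    return age // 10 * 10

today = date(2024, 6, 14)  # fixed reference date so the result is reproducible
births = [date(2000, 1, 1), date(1990, 6, 15), date(1985, 3, 2), date(1960, 12, 31)]

counts = {20: 0, 30: 0, 40: 0, 50: 0}  # pre-seeded so empty groups still show 0
for dob in births:
    counts[age_group(age_on(dob, today))] += 1
print(counts)
```

Pre-seeding the dictionary plays the same role as the LEFT JOIN against a lookup table: a group with no users still appears with a count of 0.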

SQL - '1' IF hour in month EXISTS, '0' IF NOT EXISTS

I have a table that has aggregations down to the hour level YYYYMMDDHH. The data is aggregated and loaded by an external process (I don't have control over). I want to test the data on a monthly basis.
The question I am looking to answer is: Does every hour in the month exist?
I'm looking to produce output that will return a 1 if the hour exists or 0 if the hour does not exist.
The aggregation table looks something like this...
YYYYMM YYYYMMDD YYYYMMDDHH DATA_AGG
201911 20191101 2019110100 100
201911 20191101 2019110101 125
201911 20191101 2019110103 135
201911 20191101 2019110105 95
… … … …
201911 20191130 2019113020 100
201911 20191130 2019113021 110
201911 20191130 2019113022 125
201911 20191130 2019113023 135
And defined as...
CREATE TABLE YYYYMMDDHH_DATA_AGG (
YYYYMM VARCHAR,
YYYYMMDD VARCHAR,
YYYYMMDDHH VARCHAR,
DATA_AGG INT
);
I'm looking to produce the following below...
YYYYMMDDHH HOUR_EXISTS
2019110100 1
2019110101 1
2019110102 0
2019110103 1
2019110104 0
2019110105 1
... ...
In the example above, two hours do not exist, 2019110102 and 2019110104.
I assume I'd have to join the aggregation table against a computed table that contains all the YYYYMMDDHH combos???
The database is Snowflake, but assume most generic ANSI SQL queries will work.
You can get what you want with a recursive CTE
The recursive CTE generates the list of possible Hours. And then a simple left outer join gets you the flag for if you have any records that match that hour.
WITH RECURSIVE CTE (YYYYMMDDHH) as
(
SELECT YYYYMMDDHH
FROM YYYYMMDDHH_DATA_AGG
WHERE YYYYMMDDHH = (SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
UNION ALL
SELECT TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') YYYYMMDDHH
FROM CTE C
WHERE TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
)
SELECT
C.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM CTE C
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON C.YYYYMMDDHH = A.YYYYMMDDHH;
If your timerange is too long you'll have issues with the cte recursing too much. You can create a table or temp table with all of the possible hours instead. For example:
CREATE OR REPLACE TEMPORARY TABLE HOURS (YYYYMMDDHH VARCHAR) AS
SELECT TO_VARCHAR(DATEADD(HOUR, SEQ4(), TO_TIMESTAMP((SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG), 'YYYYMMDDHH')), 'YYYYMMDDHH')
FROM TABLE(GENERATOR(ROWCOUNT => 10000)) V
ORDER BY 1;
SELECT
H.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM HOURS H
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON H.YYYYMMDDHH = A.YYYYMMDDHH
WHERE H.YYYYMMDDHH <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG);
You can then fiddle with the generator count to make sure you have enough hours.
You can generate a table with every hour of the month and LEFT OUTER JOIN your aggregation to it:
WITH EVERY_HOUR AS (
SELECT TO_CHAR(DATEADD(HOUR, HH, TO_DATE(YYYYMM::TEXT, 'YYYYMM')),
'YYYYMMDDHH')::NUMBER YYYYMMDDHH
FROM (SELECT DISTINCT YYYYMM FROM YYYYMMDDHH_DATA_AGG) t
CROSS JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) - 1 HH
FROM TABLE(GENERATOR(ROWCOUNT => 745))
) h
QUALIFY YYYYMMDDHH < (YYYYMM + 1) * 10000
)
SELECT h.YYYYMMDDHH, NVL2(a.YYYYMM, 1, 0) HOUR_EXISTS
FROM EVERY_HOUR h
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG a ON a.YYYYMMDDHH = h.YYYYMMDDHH
Here's something that might help get you started. I'm guessing you want to have 'synthetic' [YYYYMMDD] values? Otherwise, if the values aren't there, then they shouldn't appear in the list.
DROP TABLE IF EXISTS #_hours
DROP TABLE IF EXISTS #_temp
--Populate a table with hours ranging from 00 to 23
CREATE TABLE #_hours ([hour_value] VARCHAR(2))
DECLARE @_i INT = 0
WHILE (@_i < 24)
BEGIN
INSERT INTO #_hours
SELECT FORMAT(@_i, '0#')
SET @_i += 1
END
-- Replicate OP's sample data set
CREATE TABLE #_temp (
[YYYYMM] INTEGER
, [YYYYMMDD] INTEGER
, [YYYYMMDDHH] INTEGER
, [DATA_AGG] INTEGER
)
INSERT INTO #_temp
VALUES
(201911, 20191101, 2019110100, 100),
(201911, 20191101, 2019110101, 125),
(201911, 20191101, 2019110103, 135),
(201911, 20191101, 2019110105, 95),
(201911, 20191130, 2019113020, 100),
(201911, 20191130, 2019113021, 110),
(201911, 20191130, 2019113022, 125),
(201911, 20191130, 2019113023, 135)
SELECT X.YYYYMM, X.YYYYMMDD, X.YYYYMMDDHH
-- Case: If 'target_hours' doesn't exist, then 0, else 1
, CASE WHEN X.target_hours IS NULL THEN '0' ELSE '1' END AS [HOUR_EXISTS]
FROM (
-- Select right 2 characters from converted [YYYYMMDDHH] to act as 'target values'
SELECT T.*
, RIGHT(CAST(T.[YYYYMMDDHH] AS VARCHAR(10)), 2) AS [target_hours]
FROM #_temp AS T
) AS X
-- Right join to keep all of our hours and only the target hours that match.
RIGHT JOIN #_hours AS H ON H.hour_value = X.target_hours
Sample output:
YYYYMM YYYYMMDD YYYYMMDDHH HOUR_EXISTS
201911 20191101 2019110100 1
201911 20191101 2019110101 1
NULL NULL NULL 0
201911 20191101 2019110103 1
NULL NULL NULL 0
201911 20191101 2019110105 1
NULL NULL NULL 0
With (almost) standard sql, you can do a cross join of the distinct values of YYYYMMDD to a list of all possible hours and then left join to the table:
select concat(d.YYYYMMDD, h.hour) as YYYYMMDDHH,
case when t.YYYYMMDDHH is null then 0 else 1 end as hour_exists
from (select distinct YYYYMMDD from tablename) as d
cross join (
select '00' as hour union all select '01' union all
select '02' union all select '03' union all
select '04' union all select '05' union all
select '06' union all select '07' union all
select '08' union all select '09' union all
select '10' union all select '11' union all
select '12' union all select '13' union all
select '14' union all select '15' union all
select '16' union all select '17' union all
select '18' union all select '19' union all
select '20' union all select '21' union all
select '22' union all select '23'
) as h
left join tablename as t
on concat(d.YYYYMMDD, h.hour) = t.YYYYMMDDHH
order by concat(d.YYYYMMDD, h.hour)
Maybe in Snowflake you can construct the list of hours with a sequence much easier instead of all those UNION ALLs.
This version accounts for the full range of days, across months and years. It's a simple cross join of the set of possible days with the set of possible hours of the day -- left joined to actual dates.
set first = (select min(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
set last = (select max(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
with
hours as (select row_number() over (order by null) - 1 h from table(generator(rowcount=>24))),
days as (
select
row_number() over (order by null) - 1 as n,
to_date($first::text, 'YYYYMMDD')::date + n as d,
to_char(d, 'YYYYMMDD') as yyyymmdd
from table(generator(rowcount=>($last-$first+1)))
)
select days.yyyymmdd || lpad(hours.h,2,0) as YYYYMMDDHH, nvl2(t.yyyymmddhh,1,0) as HOUR_EXISTS
from days cross join hours
left join YYYYMMDDHH_DATA_AGG t on t.yyyymmddhh = days.yyyymmdd || lpad(hours.h,2,0)
order by 1
;
$first and $last can be packed in as sub-queries if you prefer.
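All of these answers follow the same pattern: build the complete list of hours, then flag the ones that are present. A minimal Python sketch of that pattern, using a hypothetical `hour_flags` helper and the four sample keys from the question:

```python
from datetime import datetime, timedelta

def hour_flags(yyyymm, present):
    """Map every YYYYMMDDHH key of the given month to 1 if it is in `present`, else 0."""
    start = datetime(int(yyyymm[:4]), int(yyyymm[4:]), 1)
    flags = {}
    hour = start
    while hour.month == start.month:      # stop once we roll into the next month
        key = hour.strftime("%Y%m%d%H")
        flags[key] = 1 if key in present else 0
        hour += timedelta(hours=1)
    return flags

present = {"2019110100", "2019110101", "2019110103", "2019110105"}
flags = hour_flags("201911", present)
for key in list(flags)[:6]:
    print(key, flags[key])
```

November 2019 has 30 days, so the result holds 720 keys, and 2019110102 and 2019110104 come back as 0.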

SQL need to apply a variable rate if values occur over consecutive time periods

I have a table that looks like this:
Within the query I need to find the Maximum Import value that occurs over two time periods (rows) where the value is greater than a defined Threshold and apply a rate. If it happens over more than two time periods, a different rate will be used.
Threshold = 1000
Rate 1 (2 consecutive) = 100
Rate 2 (> 2 consecutive) = 200
Id DateTime Import Export Total
1 2016-01-13 00:00 1000 500 1500
2 2016-01-13 00:15 2500 100 3000
3 2016-01-13 00:30 1900 200 2100
4 2016-01-13 01:00 900 100 1200
Ids 2 and 3 are > Threshold, so the query should take the MIN value of those (2500, 1900) = 1900, minus the Threshold (1000) = 900, and apply the rate: Rate1 * 900 = 90000
If we change the Import value of Id 4 to 1200, then three consecutive rows exceed the Threshold and the MIN value would be 1200. Less the Threshold = 200. 200 * Rate2 = 40000
Any help would be greatly appreciated!
Update after feedback. My challenge appears to be that I'm not grabbing the 2nd highest value. Here is an example of the dataset:
Dataset example
I added another var to shrink the list down to test gap and island portion. Here is a smaller subset:
Subset
Here is the code:
WITH CTE AS (
SELECT LogTable.[LocalTimestamp] as thetime,LogTable.[SystemImport] as import, LogTable.[Id] - ROW_NUMBER() OVER (ORDER BY LogTable.[Id]) AS grp
FROM {System_KWLogRaw} LogTable
WHERE LogTable.[SystemImport] between @DemandThreshold and @In1 and
DATEPART(year,@inDate) = DATEPART(year, LogTable.[LocalTimestamp]) and
DATEPART(month,@inDate) = DATEPART(month, LogTable.[LocalTimestamp]) and
DATEPART(day,@inDate) = DATEPART(day, LogTable.[LocalTimestamp])
),
counted AS (
SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
FROM CTE
)
SELECT MAX(counted.import) as again1
FROM counted
WHERE cnt > 3 and counted.import < (SELECT MAX(counted.import) FROM counted)
This returns 3555.53 instead of 3543.2 which is the 2nd highest value
This will do what you're asking for:
with x as (
select
t1.Id,
t1.DateTime,
t1.Import,
t1.Export,
t1.Total,
count(t2.Import) over (partition by 1) as [QualifyingImports],
min(t2.Import) over (partition by 1) as [MinQualifyingImport]
from
myTable t1
left join myTable t2 on t2.Import > 1000 and t2.Id = t1.Id
where
t1.DateTime >= '2016-01-13'
and t1.DateTime < dateadd(d, 1,'2016-01-13')
)
select
x.Id,
x.DateTime,
x.Import,
x.Export,
x.Total,
case when x.[QualifyingImports] > 2 then (x.MinQualifyingImport - 1000) * 200 else (x.MinQualifyingImport - 1000) * 100 end as [Rate]
from x
I've put together a Fiddle so you can play around with different values for Id # 4.
I really wanted to make the values of things like threshold and period into @variables, but it doesn't appear to be supported inside CTEs so I just had to hard code them.
EDIT
Turns out the CTE is overkill, you can shrink it down to this and use @variables, yay!
declare @period smalldatetime = '2016-01-13'
declare @threshold float = 1000
declare @rate1 float = 100
declare @rate2 float = 200
select
t1.Id,
t1.DateTime,
t1.Import,
t1.Export,
t1.Total,
case
when count(t2.Import) over (partition by 1) > 2 then (min(t2.Import) over (partition by 1) - @threshold) * @rate2
else (min(t2.Import) over (partition by 1) - @threshold) * @rate1
end as [Rate]
from
myTable t1
left join myTable t2 on t2.Import > @threshold and t2.Id = t1.Id
where
t1.DateTime >= @period
and t1.DateTime < dateadd(d, 1, @period)
New Fiddle
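The rule described in the question depends on runs of consecutive rows above the threshold, which is easier to see in procedural form. The following Python sketch is one possible reading of that rule, not the method used in the answer above: `demand_charges` is a hypothetical name, and it assumes the readings are already sorted by time and that adjacent rows count as consecutive periods.

```python
def demand_charges(imports, threshold, rate1, rate2):
    """Return one charge per run of two or more consecutive readings above threshold.
    A run of exactly two uses rate1; a longer run uses rate2."""
    charges = []
    run = []
    for value in list(imports) + [None]:   # trailing None flushes the final run
        if value is not None and value > threshold:
            run.append(value)
            continue
        if len(run) >= 2:
            rate = rate1 if len(run) == 2 else rate2
            charges.append((min(run) - threshold) * rate)
        run = []
    return charges

# Import column from the question, then the variant with Id 4 set to 1200.
original = demand_charges([1000, 2500, 1900, 900], 1000, 100, 200)
modified = demand_charges([1000, 2500, 1900, 1200], 1000, 100, 200)
print(original, modified)
```

With the question's numbers this yields (1900 - 1000) * 100 = 90000 for the original rows and (1200 - 1000) * 200 = 40000 once Id 4 is raised to 1200.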