Loop over query in bigquery - sql

I'm trying to loop over a query result and combine the result.
I want to loop over the variable called rolling date, which gives out an array of dates with 30 day difference.
DECLARE rollingdate ARRAY<DATE>;
SET rollingdate = ( GENERATE_DATE_ARRAY(CURRENT_DATE(), DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY), INTERVAL -30 DAY) );
My table is partitioned by DATE, and I'd like to loop over two consecutive dates from the rolling date and union all the results
select *, rollingdate[0]
from table
where date > rollingdate[1] and date < rollingdate[0]
union all
select *, rollingdate[1]
from table
where date > rollingdate[2] and date < rollingdate[1]
How do I achieve this in bigquery? i tried with bigquery scripts, but they don't take subqueries..

You can try using EXECUTE IMMEDIATE:
DECLARE i INT64 DEFAULT 1;
DECLARE dsql STRING DEFAULT '';
DECLARE rollingdate ARRAY<DATE>;
SET rollingdate = ( GENERATE_DATE_ARRAY(CURRENT_DATE(), DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY), INTERVAL -30 DAY) );
WHILE i <= 2
DO
SET dsql = dsql || " select *, '" || rollingdate[ORDINAL(i)] || "' from table where date > '" || rollingdate[ORDINAL(i+1)] || "' and date < '" || rollingdate[ORDINAL(i)] || "' union all";
SET i = i + 1;
END WHILE;
SET dsql = SUBSTR(dsql, 1, LENGTH(dsql) - LENGTH(' union all'));
EXECUTE IMMEDIATE dsql;

An approach like this should be ok:
with
dates as (
select * from unnest(generate_date_array(current_date, date_sub(current_date(), interval 365 DAY), interval -30 day)) as date
),
date_start_end as (
select lag(date,1) over(order by date asc) as begin_date, date as end_date
from dates
)
select table.*, end_date
from table
inner join date_start_end where date between begin_date and end_date
You might need to make adjustments depending if your date ranges are mean to be inclusive or exclusive.
As mentioned in my comment, any date in table will only be in 1 of your rollingdate intervals. SQL is much more performant when you can do operations on a set and avoid loops.

Related

How to generate with scripting INTERVAL 1 <day|week|month>?

We are trying to find a syntax to generate the DAY|WEEK|MONTH options from the 3rd param of date functions.
DECLARE var_date_option STRING DEFAULT 'DAY';
select GENERATE_DATE_ARRAY('2019-01-01','2020-01-01',INTERVAL 1 WEEK)
dynamic param here -^^^
Do you know what's the proper syntax to use in DECLARE and that should be converted to valid SQL.
Below is for BigQuery Standard SQL
Those DAY|WEEK|MONTH are LITERALs and cannot be parametrized
And, as you know - dynamic SQL is also not available yet
So, unfortunately below is the only solution I can think of as of today
#standardSQL
DECLARE var_date_option STRING DEFAULT 'DAY';
DECLARE start_date, end_date DATE;
DECLARE date_array ARRAY<DATE>;
SET (start_date, end_date, var_date_option) = ('2019-01-01','2020-01-01', 'MONTH');
SET date_array = (
SELECT CASE var_date_option
WHEN 'DAY' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 DAY)
WHEN 'WEEK' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 WEEK)
WHEN 'MONTH' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 MONTH)
END
);
SELECT * FROM UNNEST(date_array) AS date_dt;

How to get the next business date in postgres?

Business days are Monday through Friday.
Given I have a datetime field scheduled_for, how can I find the next business date and return that in a column alias?
I've tried something from another SO answer but it doesn't work as intended.
EXTRACT(ISODOW FROM v.scheduled_for)::integer) % 7 as next_business_day,
Error:
Query 1 ERROR: ERROR: syntax error at or near ")"
LINE 3: EXTRACT(ISODOW FROM v.scheduled_for)::integer % 7) as next...
^
Edit:
Thanks for the suggestions, I've attempted this:
SELECT
v.id AS visit_id,
(IF extract(''dow'' from v.scheduled_for) = 0 THEN
return v.scheduled_for + 1::integer;
ELSIF extract(''dow'' from v.scheduled_for) = 6 THEN
return v.scheduled_for - 1::integer;
ELSE
return v.scheduled_for;
) as next_business_day,
'' as invoice_ref_code,
The error I get is:
Query 1 ERROR: ERROR: syntax error at or near ")"
LINE 1: ) as next_business_day,
^
To generalize you need to create a function to calculate the next business day from a given date.
create or replace function utl_next_business_day(date_in date default current_date)
returns date
language sql immutable leakproof strict
as $$
with cd as (select extract(isodow from date_in)::integer d)
select case when d between 1 and 4
then date_in + 1
else date_in + 1 + (7-d)
end
from cd;
$$;
--- any single date
select current_date, utl_next_business_day();
-- over time span (short)
select gdate::date for_date, utl_next_business_day(gdate::date) next_business_day
from generate_series( current_date, current_date + 14, interval '1 day') gdate;
-- around year end over a time span
with test_date (dt) as
( values (date '2019-12-31')
, (date '2020-12-31'), (date '2021-12-31'),(date '2022-12-31')
, (date '2021-01-01'), (date '2022-01-01'),(date '2023-01-01')
)
select dt, utl_next_business_day(dt) from test_date
order by dt;
Alternatively with the calendar table suggestion from #Eric we get.
-- create and populate work table
create table bus_day_calendar ( bus_day date);
insert into bus_day_calendar (bus_day)
select utl_next_business_day(gdate::date)
from generate_series( date '2018-12-31', date '2023-01-01', interval '1 day') gdate
where extract(isodow from gdate)::integer not in (6,7) ;
--- Function to return next business day
create or replace function utl_next_cal_business_day(date_in date default current_date)
returns date
language sql stable leakproof strict
as $$
select min(bus_day)
from bus_day_calendar
where bus_day > date_in;
$$;
--- any single date
select current_date, utl_next_cal_business_day();
-- over time span (short)
select gdate::date for_date, utl_next_cal_business_day(gdate::date) next_business_day
from generate_series( current_date, current_date + 14, interval '1 day') gdate;
-- around year end over a time span
with test_date (dt) as
( values (date '2019-12-31')
, (date '2020-12-31'), (date '2021-12-31'),(date '2022-12-31')
, (date '2021-01-01'), (date '2022-01-01'),(date '2023-01-01')
)
select dt, utl_next_cal_business_day(dt) from test_date
order by dt;
Neither of these as they currently stand handle a non-business day that falls on Mon-Fri, but both can be modified to do so. Since the calendar table requires only deleting roes I think that becomes the superior method if this is necessary.

Next workday that not holiday in sql

I have Holiday table that contains all holidays in it.
How can I get next work day from given date in one SQL query and not use FOR Loop like below?
DECLARE
givendate DATE := TO_DATE('2019-07-01', 'YYYY-MM-DD');
out NUMBER := 1;
BEGIN
WHILE out != 0 LOOP
SELECT
COUNT(*)
INTO out
FROM
holiday h
WHERE
trunc(h.holiday_date) = trunc(givendate);
IF out != 0 THEN
givendate := givendate + 1;
END IF;
END LOOP;
DBMS_OUTPUT.put_line(givendate);
END;
SELECT MIN(hd)
FROM (
select TO_DATE('2019-07-01', 'YYYY-MM-DD') as hd from dual
union
select trunc(h.holiday_date + 1)
FROM
holiday h
WHERE
trunc(h.holiday_date) >= TO_DATE('2019-07-01', 'YYYY-MM-DD')
) t left join holiday h2 on trunc(h2.holiday_date) = t.hd
WHERE h2.holiday_date IS NULL;
You may want to use a hierarchical query:
select nvl(max(holiday_date)+1, trunc(sysdate))
from holiday
connect by holiday_date = prior holiday_date + 1
start with holiday_date = trunc(sysdate)
It works like that:
if sysdate is a holiday, it builds a chain of continuous holidays and then returns max holiday_date + 1 as thenext business date
if sysdate is not a holiday, it returns sysdate (nvl() does that when max returns null)
So just replace trunc(sysdate) with your given_date and it should work just as your piece of code.

How to get week start date from week number in postgresql?

I have all week numbers from 20161 to 201640. I want to know what is the start date and end date of week 31.
How can I write query in postgresql to get that?
To get the start date, simply convert the week number to a date using to_date()
If you are using ISO week numbers use:
select to_date('201643', 'iyyyiw');
Otherwise use:
select to_date('201643', 'yyyyww');
To get the end date, just add 7 to resulting date: to_date('201643', 'iyyyiw') + 7
SELECT date '2016-01-01' + interval '1 day' * ((7 - EXTRACT(DOW FROM DATE '2016-01-01')) + 29*7) AS start_date,
date '2016-01-01' + interval '1 day' * ((7 - EXTRACT(DOW FROM DATE '2016-01-01')) + 29*7 + 6) AS end_date,
Dealing with weeks may be tricky if you allow the first and last weeks of the year to be less than 7 days long.
This is my two cents using ISO weeks (first day is monday, with dow = 1). They return the first and last date of a week out of the year and the week index.
Notice a year has 365/7 = 52,142857142857143 weeks, or 366/7 = 52,285714285714286 weeks, depending on its length. So, weeks always range in [0, 52]. I had to use an if for week 52 as it cannot be calculated as 7-d.
create or replace function first_date_of_isoweek(y int, w int)
returns date
language sql as $$
select to_date(concat(y, to_char(w, 'fm00')), 'iyyyiw');
$$;
create or replace function last_date_of_isoweek(y int, w int)
returns date
language plpgsql as $$
declare
d1 date;
d2 date;
d smallint;
begin
-- Calculate first date of an isoweek
d1 = to_date(concat(y, to_char(w, 'fm00')), 'iyyyiw');
-- Year's last week needs an speacial treatment
if w = 52 then
return to_date(concat(y, '1231'), 'yyyyMMdd');
else
-- Calculate the dow of the first date
d = extract(isodow from d1);
-- Calculate the last date out of the first date
return d1 + interval '1 day' * (7-d);
end if;
end $$;
Test 2021 as follows:
select first_date_of_isoweek(2021, 0);
first_date_of_isoweek|
---------------------+
2021-01-01|
select last_date_of_isoweek(2021, 0);
last_date_of_isoweek|
--------------------+
2021-01-03|
select first_date_of_isoweek(2021, 1);
first_date_of_isoweek|
---------------------+
2021-01-04|
select last_date_of_isoweek(2021, 1);
last_date_of_isoweek|
--------------------+
2021-01-10|
select first_date_of_isoweek(2021, 52);
first_date_of_isoweek|
---------------------+
2021-12-27|
select last_date_of_isoweek(2021, 52);
last_date_of_isoweek|
--------------------+
2021-12-31|

Possible recursive CTE query using date ranges

Not sure how to even phrase the title on this one!
I have the following data:
IF OBJECT_ID ('tempdb..#data') IS NOT NULL DROP TABLE #data
CREATE TABLE #data
(
id UNIQUEIDENTIFIER
,reference NVARCHAR(30)
,start_date DATETIME
,end_date DATETIME
,lapse_date DATETIME
,value_received DECIMAL(18,3)
)
INSERT INTO #data VALUES ('BE91B9C1-C02F-46F7-9B63-4D0B25D9BA2F','168780','2006-05-01 00:00:00.000',NULL,'2011-09-27 00:00:00.000',537.42)
INSERT INTO #data VALUES ('B538F123-C839-447A-B300-5D16EACF4560','320858','2011-08-08 00:00:00.000',NULL,NULL,0)
INSERT INTO #data VALUES ('1922465D-2A55-434D-BAAA-8E15D681CF12','306597','2011-04-08 00:00:00.000','2011-06-22 13:14:40.083','2011-08-07 00:00:00.000',12)
INSERT INTO #data VALUES ('7DF8FBCC-B490-4892-BDC5-8FD2D73B0323','321461','2011-07-01 00:00:00.000',NULL,'2011-09-25 00:00:00.000',8.44)
INSERT INTO #data VALUES ('1EC2E754-F325-4313-BDFC-9010E255F6FE','74215','2000-10-31 00:00:00.000',NULL,'2011-08-30 00:00:00.000',258)
INSERT INTO #data VALUES ('9E59B09C-0198-48AC-8EEC-A0D76CEA9385','169194','2008-06-25 00:00:00.000',NULL,'2011-09-25 00:00:00.000',1766.4)
INSERT INTO #data VALUES ('97CF6C0F-324A-49A6-B9D8-AC848A1F821A','288039','2010-09-01 00:00:00.000','2011-07-29 00:00:00.000','2011-08-21 00:00:00.000',55)
INSERT INTO #data VALUES ('97CF6C0F-324A-49A6-B9D8-AC848A1F821A','324423','2011-08-01 00:00:00.000',NULL,'2011-09-25 00:00:00.000',5)
INSERT INTO #data VALUES ('D5E5197A-E8E1-468C-9991-C8712224C2BF','323395','2011-08-25 00:00:00.000',NULL,NULL,0)
INSERT INTO #data VALUES ('0EC4976C-16B9-4C99-BD07-D0CBDF014D32','323741','2011-08-25 00:00:00.000',NULL,NULL,0)
And I want to be able to group all references into a category of 'active', 'lapsed' or 'new' based upon the following criteria:
Active has a start date that is less than the last date of the reference month, a lapse date after the last day of the prior month and a value_received > 0;
New has a start date which falls within the reference month;
Lapsed has a lapse date which falls within the reference month.
And to then apply these definitions for each reference for a rolling 13 months (so from Now going back as far as July 2010) so that for each month I can see how many references fall into each group.
I am able to use the following to define this for the current month:
select
id
,reference
,start_date
,end_date
,lapse_date
,value_received
,CASE WHEN start_date < DATEADD(month,DATEPART(Month,GETDATE()) + 1,DATEADD(year,DATEPART(year,GETDATE())-1900,0)) --next month start date
AND lapse_date > DATEADD(ms,-3,DATEADD(mm,DATEDIFF(mm,0,GETDATE())+1,0)) --last day of current month
AND value_received > 0
THEN 'Active'
WHEN lapse_date < DATEADD(month,DATEPART(Month,GETDATE()) + 1,DATEADD(year,DATEPART(year,GETDATE())-1900,0)) --next month start
AND lapse_date > DATEADD(ms,-3,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) --last day of prior month
THEN 'lapse'
WHEN start_date < DATEADD(month,DATEPART(Month,GETDATE()) + 1,DATEADD(year,DATEPART(year,GETDATE())-1900,0)) --next month start date
AND start_date > DATEADD(ms,-3,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) --last day of prior month
THEN 'New'
ELSE 'Not applicable'
END AS [type]
from #data
But I can't see a nice / efficient way of doing this (other than to repeat this query 13 times and union the results, which I know is just awful)
Would this be a case for using the current month as an anchor and using recursion (if so, some pointers would be most appreciated)?
Any help most appreciated as always :)
* Edited to include actual solution *
In case it's of interest to anyone, this is the final query I used:
;WITH Months as
(
SELECT DATEADD(ms,-3,DATEADD(mm,DATEDIFF(mm,0,GETDATE())+1,0)) as month_end
,0 AS level
UNION ALL
SELECT DATEADD(month, -1, month_end)as month_end
,level + 1 FROM Months
WHERE level < 13
)
SELECT
DATENAME(Month,month_end) + ' ' + DATENAME(YEAR,month_end) as date
,SUM(CASE WHEN start_date <= month_end
AND Month(start_date) <> MONTH(Month_end)
AND lapse_date > Month_end
THEN 1 ELSE 0 END) AS Active
,SUM(CASE WHEN start_date <= Month_end
AND DATENAME(MONTH,start_date) + ' ' + DATENAME(YEAR,start_date) =
DATENAME(MONTH,month_end) + ' ' + DATENAME(YEAR,month_end)
THEN 1 ELSE 0 END) AS New
,SUM(CASE WHEN lapse_date <= Month_end
AND Month(lapse_date) = MONTH(Month_end)
THEN 1 ELSE 0 END) AS lapse
FROM #data
CROSS JOIN Months
WHERE id IS NOT NULL
AND start_date IS NOT NULL
GROUP BY DATENAME(Month,month_end) + ' ' + DATENAME(YEAR,month_end)
ORDER by MAX(level) ASC
You don't need a "real" recursive CTE here. You can use one for the month references though:
;WITH Months
as
(
SELECT DATEADD(day, -DATEPART(day, GETDATE())+1, GETDATE()) as 'MonthStart'
UNION ALL
SELECT DATEADD(month, -1, MonthStart) as 'MonthStart'
FROM Months
)
Then you can JOIN to SELECT TOP 13 * FROM Months in your above query.
I'm not going to try to parse all your CASE statements, but essentially you can use a GROUP BY on the date and the MonthStart fields, like:
GROUP BY Datepart(year, monthstart), Datepart(month, monthstart)
and aggregate by month. It will probably be easiest to have all your options (active, lapsed, etc) as columns and calculate each with a SUM(CASE WHEN ... THEN 1 ELSE 0 END) as it will be easier with a GROUP BY.
You can cross join your request with a recursive CTE, this is a good idea.
WITH thirteenMonthBack(myDate, level) as
(
SELECT GETDATE() as myDate, 0 as level
UNION ALL
SELECT DATEADD(month, -1, myDate), level + 1
FROM thirteenMonthBack
WHERE level < 13
)
SELECT xxx
FROM youQuery
CROSS JOIN thirteenMonthBack
DECLARE #date DATE = GETDATE()
;WITH MonthsCTE AS (
SELECT 1 [Month], DATEADD(DAY, -DATEPART(DAY, #date)+1, #date) as 'MonthStart'
UNION ALL
SELECT [Month] + 1, DATEADD(MONTH, 1, MonthStart)
FROM MonthsCTE
WHERE [Month] < 12 )
SELECT * FROM MonthsCTE
/*
| The below SELECT statements show TWO examples of how this can be useful.
| Example 1 SELECT: Simple example of showing how to generate 12 days ahead based on date entered
| Example 2 SELECT: This example shows how to generate 12 months ahead based on date entered
| This example tries to mimic as best it can Oracles use of LEVEL and CONNECT BY LEVEL
*/
WITH dynamicRecords(myDate, level) AS
(
SELECT GETDATE() AS myDate, 1 AS level
UNION ALL
SELECT myDate + 1, level + 1 /* 12 Days - WHERE level < 12 */
--SELECT DATEADD(month, 1, myDate), level + 1 /* 12 Months - WHERE level < 12 */
FROM dynamicRecords
WHERE level < 12
)
SELECT *
FROM dynamicRecords
Option (MaxRecursion 0) /* The default MaxRecursion setting is 100. Generating more than 100 dates using this method will require the Option (MaxRecursion N) segment of the query, where N is the desired MaxRecursion setting. Setting this to 0 will remove the MaxRecursion limitation altogether */
Screenshots:
/* Original T-SQL Solution I found here: https://riptutorial.com/sql-server/example/11098/generating-date-range-with-recursive-cte
| The below provides an example of how to generate the days within a date range of the dates entered.
| The below SELECT statements show TWO examples of how this can be useful.
| Example 1 SELECT: Uses static dates to display ALL of the dates within the range for the dates entered
| Example 2 SELECT: This example uses GETDATE() and then obtains the FOM day and the EOM day of the dates
| beging entered to then show all days in the month of the dates entered.
*/
With DateCte AS
(
SELECT CAST('2021-04-21' AS DATE) AS BeginDate, CAST('2022-05-02' AS DATE) AS EndDate
--SELECT CAST( GETDATE() - Day(GETDATE()) + 1 AS DATE ) AS BeginDate, CAST(EOMONTH(GETDATE()) AS DATE) AS EndDate
UNION ALL
SELECT DateAdd(Day, 1, BeginDate), EndDate
FROM DateCte
WHERE BeginDate < EndDate
)
Select BeginDate AS Dates
From DateCte
Option (MaxRecursion 0) /* The default MaxRecursion setting is 100. Generating more than 100 dates using this method will require the Option (MaxRecursion N) segment of the query, where N is the desired MaxRecursion setting. Setting this to 0 will remove the MaxRecursion limitation altogether */
;
Screenshot: