How to only add business days to a date in BigQuery?

How to only add business days to a date in BigQuery? - sql

For a given date I want to add business days to it. For example, if today is 10-17-2022 and I have a field that is 8 business days. How can I add 8 business days to 10-17-2022 which would be 10-27-2022.
Current Data:
BUSINESS_DAYS
Date
8
10-11-2022
10
10-13-2022
9
10-12-2022
Desired Output Data
BUSINESS_DAYS
Date
FINAL_DATE
8
10-11-2022
10-21-2022
10
10-13-2022
10-27-2022
9
10-12-2022
10-25-2022
As you can see we are skipping all weekends. We can ignore holidays for now.
Update:
Using
The suggest logic I got the following answer. I changed the names up.
I used:
DATE_ADD(A.PO_SENT_DATE , INTERVAL
(CAST(PREDICTED_LEAD_TIME AS INT64)
+ (date_diff(A.PO_SENT_DATE , DATE_ADD(A.PO_SENT_DATE , INTERVAL CAST(PREDICTED_LEAD_TIME AS INT64) DAY), week)* 2))
DAY) as FINAL_DATE
Update2: Using the following:
DATE_ADD(`Date`, INTERVAL
(BUSINESS_DAYS
+ (date_diff( DATE_ADD(`Date`, INTERVAL BUSINESS_DAYS DAY),`Date`, week) * 2))
DAY) as FINAL_DATE
There are instances where the result falls on the weekend. See screenshot below. 10-22-2022 falls on a Saturday.

Consider below simple solution
select *,
( select day
from unnest(generate_date_array(date, date + (div(business_days, 5) + 1) * 7)) day
where not extract(dayofweek from day) in (1, 7)
qualify row_number() over(order by day) = business_days + 1
) final_date
from your_table
if applied to sample data in your question
with your_table as (
select 8 business_days, date '2022-10-11' date union all
select 10, '2022-10-13' union all
select 9, '2022-10-12'
)
output is

The solution from #mikhailberlyant is really really cool, and very innovative. However if you have a lot of rows in your table and value of "business_days" column varies a lot, query will be less efficient especially for larger "business_days" values as implementation needs to generate entire range of array for each row, unnest it, and then do manipulation in that array.
This might help you do calculation without any array business:
select day, add_days as add_business_days,
DATE_ADD(day, INTERVAL cast(add_days +2*ceil((add_days -(5-(
(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else EXTRACT(DAYOFWEEK FROM day) end)
-1)))/5)+(case when EXTRACT(DAYOFWEEK FROM day) = 7 then 1 else 0 end) as int64) DAY) as final_day
from
(select parse_date('%Y-%m-%d', "2022-10-11") as day, 8 as add_days)

Related

How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1 hour , 1 day, 7 day ) without using date_trunc()?

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day interval or 7 days at 1h interval etc. date_trunc() partly solves this, but rounds the weeks to the nearest monday e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
****EDIT: the following will return the correct weekly intervals by checking the start date relative to the nearest monday / start of week, and adjusts the results accordingly:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns..
week count gas coal
"2021-01-07 00:00:00" 336 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00" 336 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00" 336 15278.854166666666 1592.7916666666667
And then for any daily or hourly (swap out day with hour) intervals you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;

In order to select the complete week, you should change the WHERe-clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This will effectively get the records from January 4,2021 until (and including ) January 9,2022
Note: I changed <= to < to stop the end-date being included!
EDIT:
when you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
see: dbfiddle
EDIT:
This will get the list from a given date, and a specified interval.
see DBFIFFLE
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars to do the variable stuff on top and no need to mess with the rest of the query. (Idea borrowed here)
I only tested with qint=7,qinterval='1 DAY' and qint=14,qinterval='1 DAY' (but others values should work too...)

Using the function EXTRACT you may calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
Is the difference in day divided by 7 and truncated
trunc(extract (day from ts - start_date)/7)
Difference in Hours
Is the difference in day times 24 + the difference in hours of the day
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used in GROUP BY directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.
Sample Example
I'm using a CTE for the start date to avoid multpile copies of the paramater
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.

You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC(<whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC

The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the previous Friday's date.

if by any chance you might have gaps in data (maybe more granular breakdowns vs just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;

How to get DATE_DIFF in decimal

My friends are migrating from Netezza to BigQuery. In Netezza "month_between" function gives them back a decimal result. Meanwhile in BQ date_diff is always an integer. Is there a way to get fractional output in BQ?
(their logic)

You could write an UDF:
CREATE TEMP FUNCTION months_between_impl(date_1 DATE, date_2 DATE) AS (
CASE
WHEN date_1 = date_2
THEN 0
WHEN EXTRACT(DAY FROM DATE_ADD(date_1, INTERVAL 1 DAY)) = 1
AND EXTRACT(DAY FROM DATE_ADD(date_2, INTERVAL 1 DAY)) = 1
THEN DATE_DIFF(date_1,date_2, MONTH)
WHEN EXTRACT(DAY FROM date_1) = 1
AND EXTRACT(DAY FROM DATE_ADD(date_2, INTERVAL 1 DAY)) = 1
THEN DATE_DIFF(DATE_ADD(date_1, INTERVAL -1 DAY), date_2, MONTH) + 1/31
ELSE DATE_DIFF(DATE_ADD(date_1, INTERVAL -1 DAY), date_2, MONTH) - 1 + EXTRACT(DAY FROM DATE_ADD(date_1, INTERVAL -1 DAY)) / 31 + (31 - EXTRACT(DAY FROM date_2) + 1) / 31
END
);
CREATE TEMP FUNCTION months_between(date_1 DATE, date_2 DATE) AS (
TRUNC(months_between_impl(date_1, date_2),9)
);
WITH
t AS (
SELECT DATE("2005-02-02") AS from_date, DATE("2005-01-01") AS to_date, "1.032258064516129" AS Expected
UNION ALL
SELECT DATE("2007-03-15"), DATE("2007-02-20"), "0.838709677419354"
UNION ALL
SELECT DATE("2008-03-29"), DATE("2008-02-29"), "1.0"
UNION ALL
SELECT DATE("2008-03-31"), DATE("2008-02-29"), "1.0"
UNION ALL
SELECT DATE("2005-11-29"), DATE("2006-03-01"), "-3.096774194"
UNION ALL
SELECT DATE("1993-07-01"), DATE("1993-03-31"), "3.03225806"
UNION ALL
SELECT DATE("2005-03-31"), DATE("2005-01-01"), "2.967741935"
UNION ALL
SELECT DATE("2008-03-30"), DATE("2008-02-29"), "1.032258064516129"
)
SELECT
from_date, to_date, expected, months_between(from_date, to_date) months_Between
FROM t;
added by Mikhail
Below is real run on Netezza showing that above UDF actually returns totally correct result (as for some reason the numbers in expected column are not what really Netezza returns - rather correct numbers are under result column - which as I mentioned exactly what Felipe's UDF produces)

Change timestamp to select from previous week starting now

I am running a query in Bigquery that selects all of the data entires made within the last week. However, I just noticed that the code I have is selecting the data from the previous week (Sunday to Sunday) and not the immediate week starting from whenever I run the query. How can I change this query to accomplish that. Thanks for the help!
#standardsql
SELECT count(distinct DeviceID)
FROM `dataworks-356fa.FirebaseArchive.test2`
Where PeripheralType = 5
AND EXTRACT(WEEK FROM createdAt) = EXTRACT(WEEK FROM CURRENT_TIMESTAMP()) - 1
AND serial != 'null'
I tried to run the query below but this would not select any data entries that were made today.
AND EXTRACT(day FROM createdAt) = EXTRACT(day FROM CURRENT_TIMESTAMP()) - 7

assuming that createdAt is a timestamp - you should use:
WHERE DATE(createdAt) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
So, your example would be like below
#standardsql
SELECT count(distinct DeviceID)
FROM `dataworks-356fa.FirebaseArchive.test2`
Where PeripheralType = 5
AND DATE(createdAt) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
AND serial != 'null'

BigQuery Where Date is Less Than or Equal to 3 Days Minus Current Date

I'm trying to create a query to only return data where date is minus 3 days from the current date. I've tried:
date <= DATE_ADD(CURRENT_DATE(), -3, 'DAY')
But this returns Error: Expected INTERVAL expression

See WHERE clause in below example
#standardSQL
WITH yourTable AS (
SELECT i, date
FROM UNNEST(GENERATE_DATE_ARRAY('2017-04-15', '2017-04-28')) AS date WITH OFFSET AS i
)
SELECT *
FROM yourTable
WHERE date <= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
-- ORDER BY date
Btw, in case if you are still with Legacy SQL - see below example
#legacySQL
SELECT *
FROM -- yourTable
(SELECT 1 AS id, DATE('2017-04-20') AS date),
(SELECT 2 AS id, DATE('2017-04-21') AS date),
(SELECT 3 AS id, DATE('2017-04-22') AS date),
(SELECT 4 AS id, DATE('2017-04-23') AS date),
(SELECT 5 AS id, DATE('2017-04-24') AS date),
(SELECT 6 AS id, DATE('2017-04-25') AS date)
WHERE TIMESTAMP(date) <= DATE_ADD(TIMESTAMP(CURRENT_DATE()), -3, 'DAY')
-- ORDER BY date

This works with a string formatted date.
DATE(TIMESTAMP(date)) <= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)

Just tested this and seems to work.
I added this :
and DATE(TIMESTAMP(datevalue)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 21 DAY)
and managed to get all records greater than last 21 days worth. Only thing I changed from #ericbrownaustin 's code was changed the 'date' in the first piece of code in the second set of parenthesis.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to only add business days to a date in BigQuery? - sql

Related

How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1 hour , 1 day, 7 day ) without using date_trunc()?

Customizing the range of a week with date_trunc

How to get DATE_DIFF in decimal

Change timestamp to select from previous week starting now

BigQuery Where Date is Less Than or Equal to 3 Days Minus Current Date

Categories

Resources