Dynamic scheduled query based on timestamp - google-bigquery

I am trying to set up a scheduled query to run on the 1st of each month and capture one month of data. However, it should not be the previous month but the month two months prior, due to delays in data being loaded into the source table. The source table is partitioned by day on session_timestamp, so refining this as much as possible will help reduce query cost.
So far I have this:
WHERE
  EXTRACT(YEAR FROM session_timestamp) =
    EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE, INTERVAL 2 MONTH))
  AND EXTRACT(MONTH FROM session_timestamp) =
    EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE, INTERVAL 2 MONTH))
This seems a highly inelegant solution, but it was intended to address cases where a year boundary is crossed. However, I can see from the "This script will process * when run." estimate that this is going to scan everything in 2020 and not just May 2020.

As you have pointed out, your query's filter doesn't prune the partitions down to the single month of data (two months back) that you want to query. You also don't need the year trick, because DATE_TRUNC(..., MONTH) already carries the year. Please try the filter below:
-- Last day of the month
DATE(session_timestamp) <= DATE_SUB(DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
AND
-- First day of the month
DATE(session_timestamp) >= DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 2 MONTH), MONTH)
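If you prefer to avoid computing the last day explicitly, an equivalent half-open range should prune to the same month of partitions. This is a sketch under the same assumptions (session_timestamp is the daily-partitioning timestamp column):
-- Sketch: the same target month expressed as a half-open range
WHERE DATE(session_timestamp) >= DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 2 MONTH), MONTH)
  AND DATE(session_timestamp) <  DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH)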

Amazon Athena Date Functions [duplicate]

I understand Athena uses Presto; however, the last_day_of_month(x) function in the documentation doesn't seem to work in AWS Athena.
Is there a function I can use to get the last day of the previous month based on the current date (30 September 2021), the last day of the previous year (31 December 2020), the last day of the half year (30 June 2021), etc.?
I used the scripts below to do this, but it would be good to know if there's a built-in function or a simpler way to compute these dates.
-- Last day of the previous month
SELECT date_trunc('month', current_date) - interval '1' day
-- Last day of the previous year
SELECT date_trunc('year', (date_trunc('month', current_date) - interval '1' day)) - interval '1' day
-- Last day of the first half of that year
SELECT date_add('month', 6, date_trunc('year', (date_trunc('month', current_date) - interval '1' day)) - interval '1' day)
First, you need to upgrade your workgroup to Athena engine version 3, which already supports the last_day_of_month(x) function.
Athena is based on Presto/Trino, and each version of the Athena engine is based on a different version of the open-source project. You can control the version from the Workgroups menu and even let Athena upgrade the engine automatically for you.
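Once the workgroup is on engine version 3, the call is direct. For example, a small sketch for the last day of the previous month (not tied to your data):
-- Assumes Athena engine v3 (Trino-based)
SELECT last_day_of_month(date_add('month', -1, current_date))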
Second, if you want to get the last day of the previous month, the easiest way is to construct the first day of the following month and subtract one day from it.
SELECT date '2012-08-01' - interval '1' day
Therefore, if you want the last day of the previous month, and as suggested in the comment, use date_trunc:
SELECT date_trunc('month', current_date ) - interval '1' day
--- half year back
SELECT date_trunc('month', current_date - interval '6' month) - interval '1' day
--- one year back
SELECT date_trunc('month', current_date - interval '1' year) - interval '1' day

How to get data for the last calendar week in Redshift

I have the below query that I run to extract material movements from the last 7 days.
The purpose is to get the data for the last calendar week for certain reports.
select *
from redshift
where posting_date between CURRENT_DATE - 7 and CURRENT_DATE - 1
That means I need to run the query every Monday to get the data for the previous week.
Sometimes I am too busy on Monday, or it's a vacation/bank holiday. In that case I would need to change the query or pull the data via SAP.
Question:
Is there a function in Redshift that pulls the data for the last calendar week regardless of when I run the query?
I already found the following solution:
SELECT id FROM table1
WHERE YEARWEEK(date) = YEARWEEK(NOW() - INTERVAL 1 WEEK)
But this doesn't seem to work in Redshift SQL.
Thanks a lot for your help.
Redshift offers a DATE_TRUNC('week', datestamp) function. Given any datestamp value, either a date or a datetime, it gives back the date of the Monday at the start of that week.
So this might work for you. It filters rows from the Monday before last, up until but not including the most recent Monday, and so covers one full calendar week.
SELECT id
FROM table1
WHERE date >= DATE_TRUNC('week', CURRENT_DATE) - INTERVAL '1 week'
  AND date < DATE_TRUNC('week', CURRENT_DATE)
Pro tip: Every minute you spend learning your DBMS's date/time functions will save you an hour in programming.
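To sanity-check the boundaries, a quick one-off like this (a sketch, assuming Redshift's default Monday week start) shows what the filter evaluates to:
SELECT DATE_TRUNC('week', CURRENT_DATE) - INTERVAL '1 week' AS week_start,      -- previous Monday
       DATE_TRUNC('week', CURRENT_DATE)                     AS week_end_exclusive -- most recent Monday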

Finding the WEEK number for 1st January - Big Query

I am calculating the first week of every month for the past 12 months from the current date. The query logic that I am using is as follows:
SELECT
FORMAT_DATE('%Y%m%d', DATE_TRUNC(DATE_SUB(CURRENT_DATE(),interval 10 month), MONTH)) AS YYMMDD,
FORMAT_DATE('%Y%m', DATE_TRUNC(DATE_SUB(CURRENT_DATE(), interval 10 month), MONTH)) AS YYMM,
FORMAT_DATE('%Y%W', DATE_TRUNC(DATE_SUB(CURRENT_DATE(), interval 10 month), MONTH)) AS YYWW
OUTPUT:
Row | YYMMDD   | YYMM   | YYWW
1   | 20210101 | 202101 | 202100
The YYWW format returns the week as 00, which causes my logic to fail. Is there any way to handle this? My logic will run a 12-month calculation to find the first week of every month.
At a very basic level, you can accomplish it with something like this:
with calendar as (
select date, extract(day from date) as day_of_month
from unnest(generate_date_array('2021-01-01',current_date(), interval 1 day)) date
)
select
date,
extract(month from date) as month_of_year,
case
when day_of_month < 8 then 1
when day_of_month < 15 then 2
when day_of_month < 22 then 3
when day_of_month < 29 then 4
else 5
end as week_of_month
from calendar
order by date
This approach is very simplistic, but you gave no criteria for your week-of-month definition in the query, so this is a reasonable answer. There is potential for a ton of variation in how you define week-of-month. The logic for week-of-year is built into BigQuery and provides options to handle items such as the starting day of the week, carryover at the end/beginning of consecutive years, etc. There is no corresponding week-of-month logic out of the box, so an "easy" built-in function like FORMAT_DATE() is unlikely to solve the problem.
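For instance, if you wanted the week of month to follow calendar weeks rather than fixed 7-day buckets, one possible variant (a sketch, assuming BigQuery's default Sunday-start weeks) derives it from the week-of-year numbers:
-- week of month aligned to calendar weeks: the date's week-of-year minus the
-- week-of-year of the first day of its month, plus 1
select
  date,
  extract(week from date) - extract(week from date_trunc(date, month)) + 1 as week_of_month
from unnest(generate_date_array('2021-01-01', '2021-01-31')) date
order by date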

How to retrieve first and last day of the previous month in Google BigQuery?

I am looking to get the first and last date of the previous month so I can use a WHERE clause with a BETWEEN statement. It'll look something like this:
WHERE
FirstSold_Date BETWEEN first_day_previous_month AND last_day_previous_month
Try this:
WHERE FirstSold_Date BETWEEN date_trunc(date_sub(current_date(), interval 1 month), month)
                         AND last_day(date_sub(current_date(), interval 1 month), month)
I would not recommend between for this. Instead:
WHERE FirstSold_Date >= date_add(date_trunc(current_date, month), interval -1 month) and
FirstSold_Date < date_trunc(current_date, month)
The advantage of this approach is that the same logic works for timestamps and datetimes as well. Looking at the last date causes problems when times are involved.
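For example, with a TIMESTAMP column the same half-open pattern might look like this (a sketch; ts_column is a hypothetical column name):
-- previous calendar month on a TIMESTAMP column (hypothetical ts_column)
WHERE ts_column >= TIMESTAMP(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH))
  AND ts_column < TIMESTAMP(DATE_TRUNC(CURRENT_DATE(), MONTH))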
Consider below
where date_trunc(FirstSold_Date, month) = date_trunc(date_sub(current_date, interval 1 month), month)

Simulate query over a range of dates

I have a fairly long query that looks over the past 13 weeks and determines if the current day's performance is an anomaly compared to the last 13 weeks. It just returns a single row that has the date, the performance of the current day and a flag saying if it is an anomaly or not. To make matters a little more complicated: The performance isn't just a single day but rather a running 24 hour window. This query is then run every hour to monitor the KPI over the last 24 hours. i.e. If it is 2pm on Tuesday, it will look from 2pm the previous day (Monday) to now, and compare it to every other 2pm-to-2pm for the last 13 weeks.
To test if this code is working I would like simulate it running over the past month.
The code goes as follows:
WITH performance AS (
  -- truncate to 24-hour windows anchored at the current hour
  SELECT TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24) AS startdate,
         KPI_a,
         KPI_b,
         KPI_c
  FROM table
  WHERE someConditions
  GROUP BY TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24)
),
compare_t AS (
  -- looks at relationships of the KPIs
),
variables AS (
  -- calculates the variables required for the anomaly detection
),
... I don't know how much of the query needs to be given, but basically I need to simulate 'sysdate'. Instead of inputting the current date, I want to input each hour of the last month, so this query would run approximately 720 times and return the result 720 times, once for each hour of each day.
I'm thinking a FOR loop, but I'm not sure.
You can use a recursive subquery:
with times(time) as
(
  select sysdate - interval '1' month as time from dual
  union all
  select time + interval '1' hour from times
  where time < sysdate
)
, performance as ( ... )
, compare_t as ( ... )
, variables as ( ... )
select *
from times
join ...
order by time;
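On its own, the hourly generator can be tested with something like this (a minimal sketch; add_months avoids errors on month-end dates and trunc(..., 'HH') aligns the series to whole hours):
-- generates one row per hour for the last month (roughly 720 rows)
with times(time) as
(
  select trunc(add_months(sysdate, -1), 'HH') as time from dual
  union all
  select time + interval '1' hour from times
  where time + interval '1' hour <= sysdate
)
select time from times order by time;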
I don't understand your specific requirements, but I had to solve similar problems. To give you an idea, here are two proposals:
Calculate the average and standard deviation of the KPI value from the past 13 weeks up to yesterday. If the current value from today is lower than "AVG - 10*STDDEV", then select the record, i.e. mark it as an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
AVG(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_AVG,
STDDEV(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_STDDEV
FROM TABLE
WHERE someConditions)
SELECT dateColumn, REF_AVG, KPI_A, REF_STDDEV
FROM t
WHERE TRUNC(dateColumn, 'HH') = TRUNC(LOCALTIMESTAMP, 'HH')
AND KPI_A < REF_AVG - 10 * REF_STDDEV;
Take hourly values from last week (i.e. the same weekday as yesterday) and correlate them with the hourly values from yesterday. If the correlation is less than a certain value (I use 95%), then consider this day an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
FIRST_VALUE(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS KPI_A_LAST_WEEK,
dateColumn - FIRST_VALUE(dateColumn) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS RANGE_INT
FROM table
WHERE ...)
SELECT 100*ROUND(CORR(KPI_A, KPI_A_LAST_WEEK), 2) AS CORR_VAL
FROM t
WHERE KPI_A_LAST_WEEK IS NOT NULL
AND RANGE_INT = INTERVAL '7' DAY
AND TRUNC(dateColumn) = TRUNC(LOCALTIMESTAMP - INTERVAL '1' DAY)
GROUP BY TRUNC(dateColumn);