Average on 12 preceding months interval and ISO Weeks - sql

I have the following query to get an average of units over the preceding 12 months in an interval, but my problem is that the 12 preceding months do not take ISO week 1 of the year into account. Take this example:
SELECT
*,
avg(units) OVER (
ORDER BY to_date(year::text || '-' || week::text, 'IYYY-IW')
RANGE between interval '12 months' preceding and current row)
FROM
rolling_year_table
order by year,week;
Basically, ISO week 1 of 2020 (which actually starts on '2019-12-30') is not taken into account in the calculations.
Is there a way to say 12 months preceding and current row but using ISO weeks?
Thanks,

This is too long for a comment.
I don't think there is an easy way to do this. The problem is the discrepancy between "week" and "year". There are about 52.2 weeks in a 12-month period. So what you are asking is that sometimes the "12 month" period has 52 weeks and sometimes it has 53 weeks.
I think you could do a cumulative calculation based on the past 52 weeks and then use conditional logic to include the 53rd preceding week. The problem is . . . I don't know what the exact rules are for going back 53 weeks.
If the only concern is that in the 53rd week of a year then the entire year should be included, then that would be pretty easy to include. The pseudo code for that would be:
(case when isoweek = 53
then avg() over (. . . range between '53 week' preceding and current row)
else avg() over (. . . range between '52 week' preceding and current row)
end)
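A hedged sketch of that idea against the question's own table and columns (untested, and only one interpretation of the rule):
select *,
(case when week = 53
then avg(units) over (order by to_date(year::text || '-' || week::text, 'IYYY-IW') range between interval '53 weeks' preceding and current row)
else avg(units) over (order by to_date(year::text || '-' || week::text, 'IYYY-IW') range between interval '52 weeks' preceding and current row)
end) as avg_units
from rolling_year_table
order by year, week;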
EDIT:
I'm not 100% sure if this will work for your use-case. But I have an idea that might do what you want. That is to enumerate the weeks of the year as fractions of the year. So years with 52 weeks would have one enumeration and years with 53 weeks would have another.
This would look like:
select . . .,
avg(units) over (order by isoyear + (isoweek - 1) / weeks_in_year
range between 1 preceding and current row
)
from (select t.*,
extract(isoyear from dte) as isoyear,
extract(week from dte) as isoweek,
greatest(extract(week from date_trunc('year', dte) + interval '1 year - 1 day'), 52) as weeks_in_year
from t
) t;
You would need to test this to see if it does what you really want. As I say at the beginning of this answer, "12 months ago" is not clearly defined for ISO weeks, but this may be a reasonable interpretation.

Doesn't this do what you want?
select ry.*,
avg(units) over (
order by year * 100 + week
range between 100 preceding and current row
)
from rolling_year_table ry
order by year, week;
Say the current row is year 2020 and week 45; this will take rows from the same week in 2019 up to the current row.

Related

Get the end date for an ISO week number in Google Data studio/Looker Studio

In Google Data Studio, I have a field called week that contains the week number; the value is a string, 'Week 2' for example. How can I extract the last day of the week based on the week number? In this case I want to get 2023-01-14, which is the last day of week 2.
Have you tried the last_day function?
I would build it from the date using last_day, because starting from the week number would require a bit more code.
SELECT *,
LAST_DAY(today, WEEK) AS last_day_of_week,
EXTRACT(WEEK FROM today) AS week_number
FROM (SELECT CURRENT_DATE() AS today)
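For example, if the run date were hypothetically 2023-01-10, the same functions return the date the question asks for:
SELECT LAST_DAY(DATE '2023-01-10', WEEK) AS last_day_of_week,  -- 2023-01-14
EXTRACT(WEEK FROM DATE '2023-01-10') AS week_number  -- 2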
But if you need to go with just the week number, I recommend taking a look into this other question How to create date based on year, week number and day in bigquery .
The following code works:
DATETIME_ADD(DATETIME_TRUNC(DATETIME_ADD(DATETIME_TRUNC(date, WEEK), INTERVAL (week - 1) * 7 DAY), WEEK), INTERVAL 6 DAY)
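For instance, plugging in date = DATETIME '2023-01-01' and week = 2 (the values from the question) as a quick, untested sketch:
SELECT DATETIME_ADD(DATETIME_TRUNC(DATETIME_ADD(DATETIME_TRUNC(DATETIME '2023-01-01', WEEK), INTERVAL (2 - 1) * 7 DAY), WEEK), INTERVAL 6 DAY) AS last_day_of_week  -- 2023-01-14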

Write a SQL Query in Google Big Query which pulls all values from last week, all values from 2 weeks ago, and calculate percent change between them

I'm trying to query a table comparing order numbers from last week (Sunday to Saturday) vs 2 weeks ago, and calculate percent change between the two. My thought process so far has been to group my date column by week, then use a lag function to pull last week and the previous week in to the same row. From there use basic arithmetic functions to calculate percent change. In practice, I haven't been able to get a working query, but I picture the table to look as follows:
Week        Orders  Orders - Previous Week  % Change
2023-02-05  5       10                      -0.5
2023-01-29  10      2                       +5.0
2023-01-22  2
Important to note: the days in last week should not change regardless of what day it is today (i.e. not use today - 7 days to calculate last week and today - 14 days to calculate 2 weeks ago).
My query so far:
SELECT
min(date) as date,
orders,
coalesce(lag(order) over (order by (date), 0)) as Orders - Previous Week
FROM `table`
WHERE date BETWEEN '2023-01-01' AND current_date()
group by date_trunc(date, WEEK)
ORDER BY date desc
I realize I'm not using coalesce and my lag function correctly, but a bit lost on how to correct it
To calculate the percent change, you can use the following query:
SELECT
DATE_TRUNC(date, WEEK) AS Week,
SUM(orders) AS Orders,
COALESCE(LAG(SUM(orders)) OVER (ORDER BY DATE_TRUNC(date, WEEK)), 0) AS Orders_Previous_Week,
SAFE_DIVIDE(SUM(orders) - LAG(SUM(orders)) OVER (ORDER BY DATE_TRUNC(date, WEEK)), LAG(SUM(orders)) OVER (ORDER BY DATE_TRUNC(date, WEEK))) AS Pct_Change
FROM `table`
WHERE date BETWEEN '2023-01-01' AND current_date()
GROUP BY DATE_TRUNC(date, WEEK)
ORDER BY Week DESC
In this query, the sum function aggregates the orders by week, and lag is applied to that weekly sum to pull the previous week's total onto the same row. The coalesce function handles the case where there is no previous week of data and defaults to 0, while SAFE_DIVIDE avoids a division-by-zero error on that first week. The percent change calculation uses the same formula you described.
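As a quick check against the sample figures in the question, the week of 2023-02-05 would come out as Orders = 5, Orders_Previous_Week = 10 and Pct_Change = (5 - 10) / 10 = -0.5.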

Finding the WEEK number for 1st January - Big Query

I am calculating the first week of every month for past 12 months from current date. The query logic that I am using is as follows:
SELECT
FORMAT_DATE('%Y%m%d', DATE_TRUNC(DATE_SUB(CURRENT_DATE(),interval 10 month), MONTH)) AS YYMMDD,
FORMAT_DATE('%Y%m', DATE_TRUNC(DATE_SUB(CURRENT_DATE(), interval 10 month), MONTH)) AS YYMM,
FORMAT_DATE('%Y%W', DATE_TRUNC(DATE_SUB(CURRENT_DATE(), interval 10 month), MONTH)) AS YYWW
OUTPUT:
Row  YYMMDD    YYMM    YYWW
1    20210101  202101  202100
The YYWW format returns the week as 00 and is causing my logic to fail. Is there any way to handle this? My logic will run a 12-month calculation to find the first week of every month.
At a very basic level, you can accomplish it with something like this:
with calendar as (
select date, extract(day from date) as day_of_month
from unnest(generate_date_array('2021-01-01',current_date(), interval 1 day)) date
)
select
date,
extract(month from date) as month_of_year,
case
when day_of_month < 8 then 1
when day_of_month < 15 then 2
when day_of_month < 22 then 3
when day_of_month < 29 then 4
else 5
end as week_of_month
from calendar
order by date
This approach is very simplistic, but you gave no criteria for your week-of-month definition in the query, so this is a reasonable answer. There is potential for a ton of variation in how you define week-of-month. The logic for week-of-year is built in to BQ, and provides options to handle items such as the starting day of the week, carryover at the end/beginning of consecutive years, etc. There is no corresponding week-of-month logic out of the box, so any "easy" built-in function like FORMAT_DATE() is unlikely to solve the problem.
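If an ISO-style week number would satisfy the original week-of-year logic (ISO weeks run from 1 to 53, so a 00 value never appears), BigQuery exposes that directly; for example:
SELECT FORMAT_DATE('%G%V', DATE '2021-01-01') AS iso_yyww,  -- 202053: 2021-01-01 falls in ISO week 53 of 2020
EXTRACT(ISOWEEK FROM DATE '2021-01-01') AS iso_week  -- 53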

Calculate the end day of a week with DATE_TRUNC in PostgreSQL

Is there a way to change the displayed date to the end of each week instead of the beginning of the week?
Here's my code:
SELECT date_trunc('week', day + '1 day'::interval)::date - '1 day'::interval AS anchor, AVG(value) AS average
FROM daily_metrics
WHERE metric = 'daily-active-users'
GROUP BY anchor
ORDER BY anchor
The result currently shows the beginning of each self-defined week. What I want to achieve is to show 2018-03-03 (the end of the self-defined week) instead of 2018-02-25 (the beginning of the self-defined week), 2018-03-10 instead of 2018-03-04, and so on.
Your trick of shifting back and forth by one day works just fine to get the start of your custom week. You get the end of your custom week (Saturday) by adding 5 instead of subtracting 1:
SELECT date_trunc('week', day + interval '1 day')::date + 5 AS anchor ...
Adding an integer to a date adds that many days.
Related:
PostgreSQL custom week number - first week containing Feb 1st
Simply try
SELECT date_trunc('week', day::DATE + 1)::date + 5 AS anchor, AVG(value) AS average
FROM daily_metrics
WHERE metric = 'daily-active-users'
GROUP BY anchor
ORDER BY anchor
When a date is the start date of a week, adding 6 (1 + 5) days will move it to the last date of the week. The addition of one moves Sundays to the following week, and the 5 gets from the start of the week to its end.
Note, PostgreSQL allows the addition of integers (= days) to dates.
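For example, taking 2018-02-25 (a Sunday, one of the week starts shown in the question):
SELECT date_trunc('week', DATE '2018-02-25' + 1)::date + 5 AS anchor;  -- 2018-03-03, the end of that self-defined week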

Simulate query over a range of dates

I have a fairly long query that looks over the past 13 weeks and determines if the current day's performance is an anomaly compared to the last 13 weeks. It just returns a single row that has the date, the performance of the current day and a flag saying if it is an anomaly or not. To make matters a little more complicated: The performance isn't just a single day but rather a running 24 hour window. This query is then run every hour to monitor the KPI over the last 24 hours. i.e. If it is 2pm on Tuesday, it will look from 2pm the previous day (Monday) to now, and compare it to every other 2pm-to-2pm for the last 13 weeks.
To test if this code is working I would like simulate it running over the past month.
The code goes as follows:
WITH performance AS(
SELECT TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24) as startdate,
KPI_a,
KPI_b,
KPI_c
FROM table
WHERE someConditions
GROUP BY TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24)),
compare_t AS(
-- looks at relationships of the KPIs
),
variables AS(
-- calculates the variables required for the anomaly detection
),
... ok, I don't know how much of the query needs to be given, but basically I need to simulate 'sysdate'. Instead of inputting the current date, I want to input each hour of the last month, so this query would run approximately 720 times and return a result for each hour of each day.
I'm thinking a FOR loop, but I'm not sure.
You can use a recursive subquery:
with times(time) as
(
select sysdate - interval '1' month as time from dual
union all
select time + interval '1' hour from times
where time < sysdate
)
, performance as ()
, compare_t as ()
, variables as ()
select *
from times
join ...
order by time;
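A minimal sketch of wiring those generated hours into the first CTE, assuming a cross join against the question's table and an avg aggregation (the table name someTable and the aggregate are assumptions, so adjust to the real query; every sysdate in the original becomes t.time):
with times(time) as
(
select trunc(sysdate, 'HH') - interval '1' month as time from dual
union all
select time + interval '1' hour from times
where time < sysdate
),
performance as
(
select t.time,
trunc(x.dateColumn - to_number(to_char(t.time, 'hh24'))/24) as startdate,
avg(x.KPI_a) as kpi_a
from times t
cross join someTable x
group by t.time, trunc(x.dateColumn - to_number(to_char(t.time, 'hh24'))/24)
)
select *
from performance
order by time, startdate;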
I don't understand your specific requirements, but I had to solve similar problems. To give you an idea, here are two proposals:
Calculate the average and standard deviation of the KPI value from the past 13 weeks up to yesterday. If the current value from today is lower than "AVG - 10*STDDEV", then select the record, i.e. mark it as an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
AVG(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_AVG,
STDDEV(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_STDDEV
FROM TABLE
WHERE someConditions)
SELECT dateColumn, REF_AVG, KPI_A, REF_STDDEV
FROM t
WHERE TRUNC(dateColumn, 'HH') = TRUNC(LOCALTIMESTAMP, 'HH')
AND KPI_A < REF_AVG - 10 * REF_STDDEV;
Take hourly values from last week (i.e. the same weekday as yesterday) and correlate them with hourly values from yesterday. If the correlation is less than a certain value (I use 95%), then consider this day an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
FIRST_VALUE(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS KPI_A_LAST_WEEK,
dateColumn - FIRST_VALUE(dateColumn) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS RANGE_INT
FROM table
WHERE ...)
SELECT 100*ROUND(CORR(KPI_A, KPI_A_LAST_WEEK), 2) AS CORR_VAL
FROM t
WHERE KPI_A_LAST_WEEK IS NOT NULL
AND RANGE_INT = INTERVAL '7' DAY
AND TRUNC(dateColumn) = TRUNC(LOCALTIMESTAMP - INTERVAL '1' DAY)
GROUP BY TRUNC(dateColumn);