SQL moving average by weekday with moving range for last 4 weeks, excluding current week - sql

I'm trying to calculate a 7-day moving average by weekday over the last 4 weeks, not including the current week. The code below calculates the average, but it includes the current week's data. How can I exclude the current week and average only the last 4 Sundays, Mondays, Tuesdays, etc.? Please note that the average will differ by day. For example, each Sunday will have a different average based on the preceding 4 weeks.
Attached are the sample data and desired results.
Select a.Date, a.WeekDay,
avg(a.count) Over(partition by a.WeekDay order by a.Date rows between 3 preceding and current row) as rolling_avg
from
(Select Date, WeekDay, Count from Sales) a
Where a.Date >= current_date- 7*7

You almost have it; you only need to change the ROWS frame. Use 1 FOLLOWING as the start of the frame, so that it begins at the next row, and 4 FOLLOWING as its end. I also changed the ordering to date descending, so that the next row is the previous week.
Select
a.Date_V,
a.WeekDay,
--AVG(CAST(a.count_v AS DECIMAL(8,2))) OVER /*cast optional*/
AVG(a.count_v) Over
(partition by a.WeekDay order by a.date_v desc
ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) as rolling_avg
FROM
(Select Date_V, WeekDay, Count_v from historic_data) a
Where a.Date_v >= GETDATE()- 7*7
ORDER BY a.date_v desc
I created the following SQL Fiddle for testing, if someone wants to play with it.
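The effect of the frame can be checked on a small data set. The sketch below is only an illustration: it runs the same window definition through Python's built-in sqlite3 module rather than SQL Server, and the six Sundays and their counts are made-up values.

```python
import sqlite3


def rolling_avg_excluding_current():
    """Return (date_v, count_v, rolling_avg) rows, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE historic_data (date_v TEXT, weekday TEXT, count_v REAL)"
    )
    # Hypothetical data: six consecutive Sundays with counts 10, 20, ..., 60.
    conn.executemany(
        "INSERT INTO historic_data VALUES (?, ?, ?)",
        [
            ("2023-01-01", "Sun", 10),
            ("2023-01-08", "Sun", 20),
            ("2023-01-15", "Sun", 30),
            ("2023-01-22", "Sun", 40),
            ("2023-01-29", "Sun", 50),
            ("2023-02-05", "Sun", 60),
        ],
    )
    rows = conn.execute(
        """
        SELECT date_v, count_v,
               AVG(count_v) OVER (
                   PARTITION BY weekday
                   ORDER BY date_v DESC
                   -- the frame starts one row after the current row,
                   -- so the current week is never part of the average
                   ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING
               ) AS rolling_avg
        FROM historic_data
        ORDER BY date_v DESC
        """
    ).fetchall()
    conn.close()
    return rows


for row in rolling_avg_excluding_current():
    print(row)
```

The newest Sunday (count 60) gets the average of the four earlier Sundays, 50, 40, 30 and 20. The oldest Sunday has no earlier rows, so its frame is empty and the average is NULL.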

Related

SQL week number for the whole table

How to create a new column which calculates week number but for the whole table ignoring year?
Desired output is as follows:
Appreciate any help :)
You can do this by calculating the first day of the week for the oldest row, then taking the difference in days between the first day of the week of the current row and that of the oldest row. Dividing that difference by 7 and adding 1 gives the desired week number across the full table.
Assuming you are using MySQL and the first day of the week is Sunday:
WITH min_week_start AS (
SELECT
SUBDATE(MIN(record_date), dayofweek(MIN(record_date)) - 1) as week_start_date
FROM
record_table
),
record_week_start AS (
SELECT
record_date,
SUBDATE(record_date, dayofweek(record_date) - 1) as week_start_date
FROM
record_table
)
SELECT
record_week_start.record_date,
-- DIV is integer division, so week_num is an integer rather than a DECIMAL
DATEDIFF(record_week_start.week_start_date, min_week_start.week_start_date) DIV 7 + 1 as week_num
FROM
record_week_start
CROSS JOIN
min_week_start
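The arithmetic can be tried on dates that cross a year boundary. This is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL, so SUBDATE, DAYOFWEEK and DATEDIFF are replaced by SQLite's date(), strftime('%w', ...) and julianday(). The table contents are hypothetical.

```python
import sqlite3


def week_numbers():
    """Return (record_date, week_num) with weeks starting on Sunday."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record_table (record_date TEXT)")
    # Hypothetical dates: a Wednesday, a Sunday, a Saturday and a Sunday.
    conn.executemany(
        "INSERT INTO record_table VALUES (?)",
        [("2022-12-28",), ("2023-01-01",), ("2023-01-07",), ("2023-01-08",)],
    )
    rows = conn.execute(
        """
        WITH starts AS (
            -- strftime('%w') is 0 for Sunday, so subtracting that many days
            -- lands on the Sunday that starts the week
            SELECT record_date,
                   date(record_date,
                        '-' || strftime('%w', record_date) || ' days') AS week_start
            FROM record_table
        ),
        first_week AS (
            SELECT MIN(week_start) AS first_start FROM starts
        )
        SELECT record_date,
               CAST((julianday(week_start) - julianday(first_start)) / 7
                    AS INTEGER) + 1 AS week_num
        FROM starts
        CROSS JOIN first_week
        ORDER BY record_date
        """
    ).fetchall()
    conn.close()
    return rows


for row in week_numbers():
    print(row)
```

Because the week number is derived from the distance to the earliest week start, it keeps increasing across the change of year instead of resetting to 1.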

Write a SQL Query in Google Big Query which pulls all values from last week, all values from 2 weeks ago, and calculate percent change between them

I'm trying to query a table to compare order numbers from last week (Sunday to Saturday) with those from 2 weeks ago, and calculate the percent change between the two. My thought process so far has been to group my date column by week, then use a lag function to pull last week and the previous week into the same row, and from there use basic arithmetic to calculate the percent change. In practice, I haven't been able to get a working query, but I picture the table looking as follows:
Week | Orders | Orders - Previous Week | % Change
2023-02-05 | 5 | 10 | -0.5
2023-01-29 | 10 | 2 | +5.0
2023-01-29 | 2 | |
It is important to note that the days in last week should not change regardless of what day it is today (i.e., the query should not use today minus 7 days to calculate last week and today minus 14 days to calculate 2 weeks ago).
My query so far:
SELECT
min(date) as date,
orders,
coalesce(lag(order) over (order by (date), 0)) as Orders - Previous Week
FROM `table`
WHERE date BETWEEN '2023-01-01' AND current_date()
group by date_trunc(date, WEEK)
ORDER BY date desc
I realize I'm not using coalesce and my lag function correctly, but a bit lost on how to correct it
To calculate the percent change, aggregate the orders by week and apply lag to the weekly totals:
SELECT
min(date) as Week,
sum(orders) as Orders,
coalesce(lag(sum(orders)) over (order by date_trunc(date, WEEK)), 0) as orders_previous_week,
safe_divide(sum(orders) - lag(sum(orders)) over (order by date_trunc(date, WEEK)), lag(sum(orders)) over (order by date_trunc(date, WEEK))) as pct_change
FROM `table`
WHERE date BETWEEN '2023-01-01' AND current_date()
group by date_trunc(date, WEEK)
ORDER BY Week desc
In this query, the sum function aggregates the orders by week, and lag is applied to the weekly sums; an aggregate cannot wrap a window function, so it has to be lag(sum(orders)) rather than sum(lag(orders)). The coalesce function handles the case where there is no previous week's data and defaults to 0. The percent change uses the same formula you described, wrapped in safe_divide so that the first week returns NULL instead of raising a division-by-zero error. The aliases are plain identifiers because BigQuery treats double-quoted text as a string literal, not as a column name.
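The week-over-week comparison can be checked end to end on a few rows. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than BigQuery, computes the Sunday week start with SQLite date functions instead of date_trunc(date, WEEK), and the table orders_by_day and its values are hypothetical.

```python
import sqlite3


def weekly_percent_change():
    """Return (week_start, orders, orders_previous_week, pct_change)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_by_day (order_date TEXT, orders INTEGER)")
    # Hypothetical daily rows falling into three Sunday-based weeks.
    conn.executemany(
        "INSERT INTO orders_by_day VALUES (?, ?)",
        [
            ("2023-01-23", 2),
            ("2023-01-30", 4),
            ("2023-02-02", 6),
            ("2023-02-06", 5),
        ],
    )
    rows = conn.execute(
        """
        WITH weekly AS (
            -- step 1: one row per week (week starts on Sunday)
            SELECT date(order_date,
                        '-' || strftime('%w', order_date) || ' days') AS week_start,
                   SUM(orders) AS total_orders
            FROM orders_by_day
            GROUP BY week_start
        )
        -- step 2: LAG pulls the previous week's total onto the same row
        SELECT week_start,
               total_orders,
               LAG(total_orders) OVER (ORDER BY week_start) AS orders_previous_week,
               (total_orders - LAG(total_orders) OVER (ORDER BY week_start)) * 1.0
                   / LAG(total_orders) OVER (ORDER BY week_start) AS pct_change
        FROM weekly
        ORDER BY week_start DESC
        """
    ).fetchall()
    conn.close()
    return rows


for row in weekly_percent_change():
    print(row)
```

Splitting the work into a weekly aggregate followed by LAG keeps the two steps easy to verify separately. The earliest week has no predecessor, so both its previous-week value and its percent change come back as NULL.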

How to do a rolling sum in BigQuery for groups of weeks?

I have the following table, which contains two columns: date and total_visits (website visits). I need to compute two new variables (rolling sums).
Column_A: A rolling sum for each day. For each day, I have to present the sum of the total visits from the last 14 days (without considering the current day). Of course, the first 14 days in this table can not have this value due to the fact there are not enough previous days to compute this value.
Column_B: A rolling sum for each day. For each day, I have to present the sum of the total visits for the days between 4 weeks and 2 weeks before the current day. This means, for example, for 2019-01-29, the value we should be seeing is the sum of the total visits between 2021-01-01 and 2021-01-14. Of course, the first 28 days in the table won't have values for this column because there is not enough data to compute the value.
The next table is an example:
I currently have a solution in SQL (Workbench), but I need to apply this to a database stored in GCP, and there are syntax differences that I have not been able to understand. Any hints? Thanks in advance.
Consider below approach
select *,
if(dense_rank() over win <= 14, null, sum(total_visits) over rolling_last_14_day) as total_last_14_day,
if(dense_rank() over win <= 28, null, sum(total_visits) over rolling_between_4_and_2_weeks_ago) as total_between_4_and_2_weeks_ago
from `project.dataset.table`
window win as (order by unix_date(date)),
rolling_last_14_day as (win range between 14 preceding and 1 preceding),
rolling_between_4_and_2_weeks_ago as (win range between 28 preceding and 15 preceding)
If applied to sample data in your question - output is
If you have data on every day, just use a window frame:
select t.*,
(case when row_number() over (order by date) >= 14
then sum(total_visits) over (order by date rows between 13 preceding and current row)
end) as total_14_day,
(case when row_number() over (order by date) >= 28
then sum(total_visits) over (order by date rows between 27 preceding and 14 preceding)
end) as total_between_4_and_2_weeks_ago
from t;
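The two answers differ in how they treat missing days: a ROWS frame counts rows, while a RANGE frame over a day number counts calendar days. The sketch below illustrates the difference with Python's built-in sqlite3 module (SQLite 3.28 or later is assumed for RANGE with an offset). It uses a 3-day window instead of 14 to keep the data short, and the visits table is hypothetical.

```python
import sqlite3


def rows_vs_range():
    """Return (visit_date, sum_by_rows, sum_by_range) for a table with a gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visit_date TEXT, total_visits INTEGER)")
    # Hypothetical data: 2021-01-04 and 2021-01-05 are missing.
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [
            ("2021-01-01", 1),
            ("2021-01-02", 2),
            ("2021-01-03", 4),
            ("2021-01-06", 8),
            ("2021-01-07", 16),
        ],
    )
    rows = conn.execute(
        """
        SELECT visit_date,
               -- previous 3 rows, whatever their dates are
               SUM(total_visits) OVER (
                   ORDER BY visit_date
                   ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
               ) AS sum_by_rows,
               -- previous 3 calendar days, based on the julian day number
               SUM(total_visits) OVER (
                   ORDER BY julianday(visit_date)
                   RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING
               ) AS sum_by_range
        FROM visits
        ORDER BY visit_date
        """
    ).fetchall()
    conn.close()
    return rows


for row in rows_vs_range():
    print(row)
```

For 2021-01-06 the ROWS frame reaches back to 2021-01-01 because it skips over the gap, whereas the RANGE frame only sees 2021-01-03. With data on every day the two frames give the same result.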

Sum of shifting range in SQL Query

I am trying to write an efficient query to get the sum of the previous 7 days' worth of values from a relational DB table, and record each total against the final date in the 7-day period (e.g. the 'WeeklyTotals Table' in the example below). For example, in my WeeklyTotals query, I would like the value for February 15th to be 333, since that is the total sum of users from Feb 9th - Feb 15th, and so on:
I have a base query which gets me the previous week's users for today's date (simplified for the sake of the example):
SELECT Date, Sum("Total Users")
FROM "UserRecords"
WHERE dateadd(hour, -8, "UserRecords"."Date") BETWEEN
dateadd(hour, -8, sysdate) - INTERVAL '7 DAY' AND dateadd(hour, -8, sysdate);
The problem is, this only gets me the total for today's date. I need a query which will get me this information for each of the previous seven days.
I know I can make a view for each date (since I only need the previous seven entries) and join them all together, but that seems really inefficient (I'll have to create/update 7 views, and then do all the inner join operations). I am wondering if there's a more efficient way to achieve this.
Provided there are no gaps, you can use a running total with SUM OVER including the six previous rows. Use ROW_NUMBER to exclude the first six records, as their totals don't represent complete weeks.
select log_date, week_total
from
(
select
log_date,
sum(total_users) over (order by log_date rows 6 preceding) as week_total,
row_number() over (order by log_date) as rn
from mytable
where log_date > 0
)
where rn >= 7
order by log_date;
UPDATE: In case there are gaps, it should be
sum(total_users) over (order by log_date range interval '6' day preceding)
but I don't know whether PostgreSQL supports this already. (Moreover the ROW_NUMBER exclusion wouldn't work then and would have to be replaced by something else.)
Here's a query that self joins to the previous 6 days and sums the values to get the weekly totals:
select u1.date, sum(u2.total_users) as weekly_users
from UserRecords u1
join UserRecords u2
on u1.date - u2.date < 7
and u1.date >= u2.date
group by u1.date
order by u1.date
You can also use SUM as a window function, with an expression based on the week date part.
Self joins are generally much slower than window functions.

Data for specific date

My report gets data for the 1st of the current month. Let's say the 1st has still not come; how would I then make the report show the data for the 1st of the previous month?
Thanks.
Simply use a select top 1 from your table, filtering by extract(day from yourDateColumn) = 1 to get only the rows with the data for the 1st day of any month, and order them in descending order by your date column (order by yourDateColumn desc), so that you always get the 1st day of the last available month in your table.
Docs for Oracle EXTRACT function
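The "latest available 1st of the month" idea can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module, so strftime('%d', ...) and LIMIT 1 stand in for EXTRACT(day FROM ...) and TOP 1, and the report_data table and its values are hypothetical.

```python
import sqlite3


def latest_first_of_month():
    """Return the newest row whose date falls on the 1st of a month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report_data (report_date TEXT, amount INTEGER)")
    # Hypothetical data: nothing has been loaded for the 1st of March yet.
    conn.executemany(
        "INSERT INTO report_data VALUES (?, ?)",
        [
            ("2023-01-01", 100),
            ("2023-01-15", 120),
            ("2023-02-01", 140),
            ("2023-02-20", 160),
        ],
    )
    row = conn.execute(
        """
        SELECT report_date, amount
        FROM report_data
        WHERE strftime('%d', report_date) = '01'  -- keep only 1st-of-month rows
        ORDER BY report_date DESC                 -- newest month first
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return row


print(latest_first_of_month())
```

As soon as a row for the 1st of a newer month is loaded, the same query picks it up without any change, because the fallback comes from the ordering and not from date arithmetic on today's date.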