Write a SQL Query in Google Big Query which pulls all values from last week, all values from 2 weeks ago, and calculate percent change between them - sql

I'm trying to query a table comparing order numbers from last week (Sunday to Saturday) vs. two weeks ago, and calculate the percent change between the two. My thought process so far has been to group my date column by week, then use a lag function to pull last week and the previous week into the same row, and from there use basic arithmetic to calculate the percent change. In practice, I haven't been able to get a working query, but I picture the table looking as follows:
Week        Orders  Orders - Previous Week  % Change
2023-02-05  5       10                      -0.5
2023-01-29  10      2                       +5.0
2023-01-29  2
Important to note: the days in last week should not change regardless of what day it is today (i.e., not using today - 7 days to calculate last week and today - 14 days to calculate two weeks ago).
My query so far:
SELECT
min(date) as date,
orders,
coalesce(lag(order) over (order by (date), 0)) as Orders - Previous Week
FROM `table`
WHERE date BETWEEN '2023-01-01' AND current_date()
group by date_trunc(date, WEEK)
ORDER BY date desc
I realize I'm not using coalesce and my lag function correctly, but a bit lost on how to correct it

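To calculate the percent change, aggregate by week first and then apply LAG to the weekly totals:
WITH weekly AS (
  SELECT
    DATE_TRUNC(date, WEEK) AS week_start,  -- weeks start on Sunday by default
    SUM(orders) AS orders
  FROM `table`
  WHERE date BETWEEN '2023-01-01' AND CURRENT_DATE()
  GROUP BY week_start
)
SELECT
  week_start,
  orders,
  COALESCE(LAG(orders) OVER (ORDER BY week_start), 0) AS orders_previous_week,
  SAFE_DIVIDE(orders - LAG(orders) OVER (ORDER BY week_start), LAG(orders) OVER (ORDER BY week_start)) AS pct_change
FROM weekly
ORDER BY week_start DESC
In this query, the CTE uses SUM to aggregate the orders by week; DATE_TRUNC(date, WEEK) returns the Sunday that starts each week. LAG then reads the previous week's total. It has to be applied to the weekly totals and cannot be nested inside SUM. COALESCE defaults the previous-week column to 0 when there is no earlier week, and SAFE_DIVIDE returns NULL for the percent change in that case instead of raising a division-by-zero error. The percent change calculation uses the same formula you described. Because the filter runs up to CURRENT_DATE(), the newest row is the current, partial week; to compare only complete weeks, filter on date < DATE_TRUNC(CURRENT_DATE(), WEEK) instead.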
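If you want to sanity-check the week bucketing and the percent-change arithmetic outside BigQuery, here is a minimal Python sketch of the same logic. It assumes Sunday-start weeks (the DATE_TRUNC(date, WEEK) default) and drops the week containing today; the function names and the sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Sunday on or before d (mirrors DATE_TRUNC(d, WEEK))."""
    # date.weekday() is Monday=0 .. Sunday=6, so a Sunday gets an offset of 0.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_change(rows, today):
    """rows: iterable of (date, orders).

    Returns [(week, orders, previous_orders, pct_change)], newest week first.
    Only complete weeks are kept: the week containing `today` is dropped.
    """
    current_week = week_start(today)
    totals = defaultdict(int)
    for d, orders in rows:
        w = week_start(d)
        if w < current_week:
            totals[w] += orders

    result = []
    prev = None
    for w in sorted(totals):
        cur = totals[w]
        # No previous week (or a previous total of 0) gives None, like SAFE_DIVIDE.
        pct = None if not prev else (cur - prev) / prev
        result.append((w, cur, prev, pct))
        prev = cur
    return result[::-1]


# Hypothetical sample data.
rows = [
    (date(2023, 1, 23), 2),
    (date(2023, 1, 30), 4),
    (date(2023, 2, 3), 6),
    (date(2023, 2, 6), 5),
    (date(2023, 2, 14), 99),  # falls in the current (partial) week, so it is ignored
]
table = weekly_change(rows, today=date(2023, 2, 15))
```

Because the buckets depend only on the calendar week, the result for last week stays the same no matter which day of the current week the code runs on.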

Related

SQL moving average by weekday with moving range for last 4 weeks, excluding current week

I'm trying to calculate 7 day moving average by weekday, for last 4 weeks but not including the current week. The code below calculates the average however that includes the current week's data. How can I exclude the current week and only calculate last 4 Sundays, Mondays, Tuesdays, etc. Please note, the average will differ by day. For example, each Sunday will have a different average based on the last 4 weeks.
Attached is the sample data and desired results.
Select a.Date, a.WeekDay,
avg(a.count) Over(partition by a.WeekDay order by a.Date rows between 3 preceding and current row) as rolling_avg
from
(Select Date, WeekDay, Count from Sales) a
Where a.Date >= current_date- 7*7
You almost have it; you only need to change the ROWS frame. Use 1 FOLLOWING as the start, so the frame begins at the next row, and 4 FOLLOWING as the end. I also changed the ORDER BY to date DESC, so that the next row is the previous week.
Select
a.Date_V,
a.WeekDay,
--AVG(CAST(a.count_v AS DECIMAL(8,2))) OVER /*cast optional*/
AVG(a.count_v) Over
(partition by a.WeekDay order by a.date_v desc
ROWS BETWEEN 1 FOLLOWING AND 4 FOLLOWING) as rolling_avg
FROM
(Select Date_V, WeekDay, Count_v from historic_data) a
Where a.Date_v >= GETDATE()- 7*7
ORDER BY a.date_v desc
I created following SQL Fiddle for testing if someone wants to play with it.
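To see what that frame selects, here is a rough Python equivalent, assuming one row per day. For each date it averages up to four earlier rows that fall on the same weekday and never includes the row itself. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def prior_weeks_avg(rows, weeks=4):
    """rows: iterable of (date, count), one row per day.

    Returns {date: average of up to `weeks` earlier rows on the same weekday},
    excluding the row itself. The value is None when no earlier row exists,
    like AVG over an empty frame.
    """
    by_weekday = defaultdict(list)
    for d, count in sorted(rows):
        by_weekday[d.weekday()].append((d, count))

    averages = {}
    for series in by_weekday.values():
        for i, (d, _) in enumerate(series):
            window = [c for _, c in series[max(0, i - weeks):i]]
            averages[d] = sum(window) / len(window) if window else None
    return averages


# Five consecutive Sundays with made-up counts.
sundays = [(date(2023, 1, 1) + timedelta(weeks=i), v)
           for i, v in enumerate([10, 20, 30, 40, 50])]
avgs = prior_weeks_avg(sundays)
```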

In Bigquery SQL: How to fetch previous week, specified week and next week data?

Scenario: From bigquery, have to fetch the specified date's week data + its previous week data + its next future week data. Week starts is Wednesday.
Tried Query:
Select * from table
and extract(week(wednesday) from Calendar_Day) >= (extract(week(wednesday) from PARSE_DATE('%d/%m/%Y','21/10/2020')) - 1)
and extract(week(wednesday) from Calendar_Day) >= (extract(week(wednesday) from PARSE_DATE('%d/%m/%Y','21/10/2020') ))
and extract(week(wednesday) from Calendar_Day) <= (extract(week(wednesday) from PARSE_DATE('%d/%m/%Y','21/10/2020')) + 1)
But this is not working for me.
Need help in resolving this. Thanks in Advance!
EXTRACT the week, as your code already does, and also the year, because week numbers repeat every year.
GROUP BY the week and year. At this point I find it handy to build a STRUCT from the remaining fields, as it simplifies the rest of the code.
Write another query on top of the one that does the GROUP BY (I used a WITH clause). In this last query, use LAG and LEAD over a WINDOW ordered by year and week.
Here's an example from a public dataset.
WITH
data_by_week AS (
SELECT
EXTRACT(year FROM date) AS year,
EXTRACT(week(wednesday) FROM date) AS week,
struct(
SUM(new_tested) as total_new_tested,
sum(new_recovered) as total_new_recovered
) as week_data
FROM
`bigquery-public-data.covid19_open_data.covid19_open_data`
GROUP BY
year,
week )
SELECT
year,
week,
LAG(week_data) OVER window_by_week AS previous_week,
week_data AS current_week,
LEAD(week_data) OVER window_by_week AS following_week
FROM
data_by_week
WINDOW
window_by_week AS ( ORDER BY year, week)
ORDER BY
year,
week
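If all you need is the rows for the previous, current and next week of a given date, a plain date range may be simpler than comparing week numbers, since week numbers restart every year. A small Python sketch of that range calculation with Wednesday-start weeks (the function name is made up; in BigQuery the same week start should be obtainable with DATE_TRUNC(d, WEEK(WEDNESDAY))):

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday=0 .. Sunday=6


def three_week_range(d: date, first_weekday: int = WEDNESDAY):
    """Return (start, end) covering the previous, current and next week of d.

    start is inclusive and end is exclusive; weeks begin on first_weekday.
    """
    this_week = d - timedelta(days=(d.weekday() - first_weekday) % 7)
    return this_week - timedelta(weeks=1), this_week + timedelta(weeks=2)


# The date from the question, 21/10/2020, is itself a Wednesday.
start, end = three_week_range(date(2020, 10, 21))
```

The filter would then be start <= Calendar_Day < end.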

Bigquery: How can I aggregate the data of several columns according to a specific time range?

I am new to BigQuery and I am trying to aggregate transaction data, revenue data, and visitor data across a series of client accounts. I need the output to be grouped by clientname and by 8 month period, so each client account has 12 months of data that are aggregated (each day of the month added together in one month entry). I can only manage to get an output for the first day of each month, not everything in between added together:
SELECT
clientname,
DATE_TRUNC(PARSE_DATE('%Y%m%d',date), MONTH) as MonthStart,
SUM (totals.visits) AS visits,
SUM (totals.transactions) AS transactions,
SUM (totals.campaigns) AS campaigns,
sum (totals.totalTransactionRevenue) AS Transactionsrevenue,
FROM `prod.mar.auto` as automotive
GROUP BY
clientname,monthstart
ORDER BY
clientname,monthstart ASC
Limit 1000
The output only shows a value for the first of the month, not the sum across the whole month. Can someone help point me in the right direction?
Thanks
If you want data for the last 8 or 12 months, then use a WHERE clause:
SELECT a.clientname,
SUM(a.totals.visits) AS visits,
SUM(a.totals.transactions) AS transactions,
SUM(a.totals.campaigns) AS campaigns,
SUM(a.totals.totalTransactionRevenue) AS Transactionsrevenue
FROM `prod.mar.auto` AS a
WHERE PARSE_DATE('%Y%m%d', a.date) >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
GROUP BY clientname
ORDER BY clientname ASC
LIMIT 1000;
Part of the question is somewhat unclear, so I'm doing my best with the information I have.
From my understanding of your question, it seems like there is an issue with aggregating across the entire month. It is only returning one day of the month, rather than summing across the month.
If you change
DATE_TRUNC(PARSE_DATE('%Y%m%d',date), MONTH) as MonthStart
to
EXTRACT(MONTH FROM PARSE_DATE('%Y%m%d', date)) AS MonthStart
the query returns the number of the month (1 to 12), which is the same for every date in that month, so you can aggregate on it. Note that the month number alone also merges the same month from different years, so if the data spans more than one year, group by the year as well.
Here is the final query:
SELECT
clientname,
EXTRACT(MONTH FROM PARSE_DATE('%Y%m%d', date)) AS MonthStart,
SUM (totals.visits) AS visits,
SUM (totals.transactions) AS transactions,
SUM (totals.campaigns) AS campaigns,
sum (totals.totalTransactionRevenue) AS Transactionsrevenue,
FROM `prod.mar.auto` as automotive
GROUP BY
clientname, MonthStart
ORDER BY
clientname, MonthStart ASC
Limit 1000
Documentation for the date functions:
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions
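For comparison, here is a small Python sketch of grouping by the first day of the month, which is what DATE_TRUNC(..., MONTH) produces. It keeps the same month of different years apart, and every day of a month still lands in one bucket. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def monthly_totals(rows):
    """rows: iterable of (client, 'YYYYMMDD' string, visits).

    Returns {(client, first day of month): summed visits}.
    """
    totals = defaultdict(int)
    for client, raw_date, visits in rows:
        parsed = datetime.strptime(raw_date, "%Y%m%d")
        month_start = date(parsed.year, parsed.month, 1)
        totals[(client, month_start)] += visits
    return dict(totals)


# Made-up rows: two days in January 2022 and one day in January 2023.
sample = [
    ("acme", "20220105", 3),
    ("acme", "20220120", 4),
    ("acme", "20230110", 5),
]
by_month = monthly_totals(sample)
```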

BigQuery, Sum by week

I am using standard SQL and am trying to add the weekly sum for product usage by week.
Using code below, I was able to add to each row the respective week and year it falls into. How would I go about summing the totals for an item by week and outputting it in columns, say up to the last 8 weeks.
extract(week from Metrics_Date) as week, EXTRACT(YEAR FROM Metrics_Date) AS year
Image is my raw data with the week and year next to an item:
This image is of above raw data being analyzed further(grouping them together). Here is where I would want to add columns, current_week & firstday of week date, and a sum of that weeks totals.
Any help would be appreciated.
By the way, you don't need EXTRACT(); you can use DATE_TRUNC(your_date, WEEK), which truncates the date to the start of its week and is usually easier.
Also, because the result of the truncation is a date, you already have the first day of the week.
I believe you have the rest figured out already, but just in case:
SELECT DATE_TRUNC(your_date_field, WEEK) AS week, SUM(message_count) AS total_messages FROM your_table GROUP BY 1

Bigquery SQL for sliding window aggregate

Hi I have a table that looks like this
Date Customer Pageviews
2014/03/01 abc 5
2014/03/02 xyz 8
2014/03/03 abc 6
I want to get page view aggregates grouped by week but showing aggregates for past 30 days - (sliding window aggregates with window-size of 30 days for every week)
I am using google bigquery
EDIT: Gordon, regarding your comment about "Customer": what I actually need is slightly more complicated, which is why I included Customer in the table above. I am looking to get the number of customers who had >n pageviews in a 30-day window every week, something like this:
Date Customers>10 pageviews in 30day window
2014/02/01 10
2014/02/08 5
2014/02/15 6
2014/02/22 15
However to keep it simple, I will work my way if I could just get a sliding window aggregate of pageviews ignoring customers altogether. something like this
Date count of pageviews in 30day window
2014/02/01 50
2014/02/08 55
2014/02/15 65
2014/02/22 75
How about this:
SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
SELECT changes,
LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
login,
week
FROM (
SELECT SUM(payload_pull_request_changed_files) changes,
UTC_USEC_TO_WEEK(created_at, 1) week,
actor_attributes_login login,
FROM [publicdata:samples.github_timeline]
WHERE payload_pull_request_changed_files > 0
GROUP BY week, login
))
HAVING changes28days > 0
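For each user, the inner query counts how many changes they submitted per week. Then LAG() lets us look back at the previous rows to see how many changes they submitted 1, 2, and 3 weeks earlier. Adding those 4 weeks gives the number of changes submitted in the last 28 days. Note that LAG() counts rows, not calendar weeks, so a user with no activity in some week will pull in an older week instead.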
Now you can wrap everything in a new query to filter users with changes>X, and count them.
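The LAG-based sum can be pictured with a short Python sketch. It assumes one row per login and week, and it skips rows without three earlier rows, mirroring how the NULL sums are dropped by the HAVING clause. The names and data are hypothetical.

```python
from collections import defaultdict


def trailing_sum(weekly, periods=4):
    """weekly: iterable of (login, week, changes), one row per login and week.

    Returns {(login, week): sum of that row and the periods-1 rows before it}.
    Rows without enough history are skipped, because NULL + n is NULL in SQL.
    """
    per_login = defaultdict(list)
    for login, week, changes in sorted(weekly):
        per_login[login].append((week, changes))

    sums = {}
    for login, series in per_login.items():
        for i in range(periods - 1, len(series)):
            week = series[i][0]
            sums[(login, week)] = sum(c for _, c in series[i - periods + 1:i + 1])
    return sums


# Login "a" has five weeks of data, login "b" only one.
weekly = [("a", w, w) for w in range(1, 6)] + [("b", 1, 7)]
sums = trailing_sum(weekly)
```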
I have created the following "Times" table:
Table Details: Dim_Periods
Schema
Date TIMESTAMP
Year INTEGER
Month INTEGER
day INTEGER
QUARTER INTEGER
DAYOFWEEK INTEGER
MonthStart TIMESTAMP
MonthEnd TIMESTAMP
WeekStart TIMESTAMP
WeekEnd TIMESTAMP
Back30Days TIMESTAMP -- the date 30 days before "Date"
Back7Days TIMESTAMP -- the date 7 days before "Date"
and I use a query like this to handle "running sums":
SELECT Date,Count(*) as MovingCNT
FROM
(SELECT Date,
Back7Days
FROM DWH.Dim_Periods
where Date < timestamp(current_date()) AND
Date >= (DATE_ADD (CURRENT_TIMESTAMP(), -5, 'month'))
)P
CROSS JOIN EACH
(SELECT repository_url,repository_created_at
FROM [publicdata:samples.github_timeline]
) L
WHERE timestamp(repository_created_at)>= Back7Days
AND timestamp(repository_created_at)<= Date
GROUP EACH BY Date
Note that it can be used for "month to date", "week to date", "30 days back", etc. aggregations as well.
However, performance is not the best, and the query can take a while on larger data sets because of the Cartesian join.
Hope this helps
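As a rough illustration of the range-join idea and of why it gets slow, here is a Python sketch that compares every date with every event. The function name, the 7-day default and the sample dates are assumptions for the example.

```python
from datetime import date, timedelta


def moving_count(days, events, back=7):
    """For each day, count events with day - back <= event <= day.

    Every day is compared with every event, which is the same quadratic
    cost as the CROSS JOIN in the SQL version.
    """
    counts = {}
    for d in days:
        start = d - timedelta(days=back)
        counts[d] = sum(1 for e in events if start <= e <= d)
    return counts


# Made-up dates for illustration.
days = [date(2014, 3, 8), date(2014, 3, 9)]
events = [date(2014, 3, 1), date(2014, 3, 2), date(2014, 3, 9)]
counts = moving_count(days, events)
```

Changing back to 30 gives the "30 days back" variant.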