Percent change with grouping variables - sql

I have four grouping variables: Month, State, County, City. In addition I have the metric column sales, which can be null. I would like to calculate the percent change of sales per month for each City.
My solution would have the same grouping, but with the sales column replaced by the percent change for each month in calendar year 2019. Any help with a solution is appreciated.

You can use window functions:
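select month, state, county, city, sales,
lag(sales) over (partition by state, county, city order by month) as prev_month,
-- 1.0 * avoids integer division; nullif avoids a divide-by-zero error
-- the ratio is null when either month's sales is null
(-1 + 1.0 * sales / nullif(lag(sales) over (partition by state, county, city order by month), 0)) as change_ratio
from t;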
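To try the window-function approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made-up sample rows, SQLite is only a stand-in for whatever engine you use (its window functions need SQLite 3.25 or later), and the column pct_change is a name chosen for this example.

```python
import sqlite3

# In-memory database with made-up sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (month text, state text, county text, city text, sales real)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        ("2019-01", "TX", "Travis", "Austin", 100.0),
        ("2019-02", "TX", "Travis", "Austin", 150.0),
        ("2019-03", "TX", "Travis", "Austin", None),  # null sales
        ("2019-04", "TX", "Travis", "Austin", 120.0),
        ("2019-01", "TX", "Harris", "Houston", 200.0),
        ("2019-02", "TX", "Harris", "Houston", 100.0),
    ],
)

# lag() fetches the previous month's sales within each city;
# nullif() turns a zero denominator into null instead of raising an error.
query = """
select month, city, sales,
       100.0 * (sales - prev_sales) / nullif(prev_sales, 0) as pct_change
from (
    select month, state, county, city, sales,
           lag(sales) over (partition by state, county, city order by month) as prev_sales
    from t
) as s
order by city, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that a null month produces a null percent change both for that month and for the month after it, because lag returns the null value as the previous month's sales.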

Related

How can I simplify this query to calculate a rolling sum?

I would like to calculate a rolling average per person and city for a year. My data from database 'db1' first needs to be aggregated by city, person and month_year; I then calculate the rolling yearly average. My question is: how can I simplify the query below, or calculate the rolling average differently?
select
city,
month_year,
person,
sum(total) over (partition by person,city order by month_year rows between 10 preceding and current row) rolling_one_year
from
(select
city,
month_year,
person,
sum(amount_dollar) as total
from db1 d
group by 1,2,3) ;
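One detail worth checking in the query above: a frame of "rows between 10 preceding and current row" spans 11 rows, so a 12-month window would need 11 preceding. The sketch below runs that variant in Python's built-in sqlite3 module (SQLite 3.25+ assumed, used only as a stand-in engine) on made-up data. It also assumes one row per person, city and month with no gaps, since a ROWS frame counts rows, not calendar months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table db1 (city text, month_year text, person text, amount_dollar real)")

# Made-up data: 14 consecutive months, 10 dollars each, plus one extra row in 2019-01.
months = [f"{2019 + i // 12}-{i % 12 + 1:02d}" for i in range(14)]
data = [("Oslo", m, "ann", 10.0) for m in months]
data.append(("Oslo", "2019-01", "ann", 5.0))
conn.executemany("insert into db1 values (?, ?, ?, ?)", data)

# Aggregate to one row per month first, then sum over the current row
# and the 11 rows before it (12 rows in total).
query = """
select city, month_year, person,
       sum(total) over (
           partition by person, city
           order by month_year
           rows between 11 preceding and current row
       ) as rolling_one_year
from (
    select city, month_year, person, sum(amount_dollar) as total
    from db1
    group by city, month_year, person
) as monthly
order by person, city, month_year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For a rolling average rather than a rolling sum, avg(total) can be used over the same frame.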

LEAD difference returning no negative value

When I run the query below, I expect some lead_difference values to be negative (2016 value < 1990 value). However, I don't see that in the output. Could you please help check whether I missed anything? Thanks!
WITH a AS
(SELECT
region,
year,
SUM(forest_area_sqkm)/(SUM(total_area_sq_mi*2.59)) as percentage_forest_region
FROM forestation2
GROUP BY region, year)
SELECT
region,
year,
percentage_forest_region,
LEAD(percentage_forest_region) OVER (ORDER BY percentage_forest_region) - percentage_forest_region AS lead_difference
FROM a
WHERE year = '2016' OR year = '1990'
GROUP BY region, year, percentage_forest_region
ORDER BY region, year, percentage_forest_region DESC;
You are leading by the percentage itself. Presumably, you want the next value for the region based on the year. That would be:
LEAD(percentage_forest_region) OVER (PARTITION BY region ORDER BY year) - percentage_forest_region AS lead_difference
If you order by percentage_forest_region then all years and regions are combined and the next value is the next larger value.
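The difference between the two window definitions can be reproduced on a tiny table. This is a sketch in Python's built-in sqlite3 module (SQLite 3.25+ assumed); the four percentages are made-up illustration values, not real forestation data, and lead_differences is a helper name invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a (region text, year integer, percentage_forest_region real)")
conn.executemany(
    "insert into a values (?, ?, ?)",
    [
        ("Europe", 1990, 0.30),
        ("Europe", 2016, 0.35),
        ("Latin America", 1990, 0.51),  # made-up values: this region declines
        ("Latin America", 2016, 0.46),
    ],
)


def lead_differences(window):
    """Return (region, year, lead_difference) rows for the given window definition."""
    query = f"""
    select region, year,
           lead(percentage_forest_region) over ({window})
               - percentage_forest_region as lead_difference
    from a
    order by region, year
    """
    return conn.execute(query).fetchall()


# Ordering by the value itself: the "next" row is always the next larger value,
# so the difference can never be negative.
global_rows = lead_differences("order by percentage_forest_region")

# Partitioning by region and ordering by year: the "next" row is the same
# region's later year, so a decline shows up as a negative difference.
per_region_rows = lead_differences("partition by region order by year")

print(global_rows)
print(per_region_rows)
```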

SQL - lag variable creation using window function

I have daily city-level data with some counts. I have to aggregate this data at the monthly level (1st day of each month) and then create lag variables based on the last week before the 1st day of the month.
I have used the following code to create lag variables for the last month (after aggregating the data at the monthly level, with the 1st date of the month):
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate data at the monthly level and create lag variables based on the last 7, 14, 21, 28 days using a window function?
You can use this:
select
CITY
, month(Date)
, year(date)
, sum(count)
from table1
where date < dateadd(day, -7, getdate()) -- rows older than 7 days; flip the comparison for the last 7 days
group by
City
, month(Date)
, year(date)
I think you're looking for something like this. The first CTE summarizes city counts by day, week, month, and year. The second summarizes the counts by week, month, and year. To group sales by weeks starting from the 1st day, it uses the DAY function along with YEAR and MONTH. Since DAY returns an integer, groups of distinct weeks can be created by dividing by 7, i.e. DAY(day_dt)/7.
One way to get the prior week's sales would be to join the week sales summary CTE to itself where the week is offset by -1. Since the prior week might possibly have 0 sales, it seems safer to LEFT JOIN than to use LAG, in my opinion.
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7, sum([count])
from city_level_data
group by city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7),
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_count)
from day_sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_count prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws2.wk=ws.wk-1; -- match the previous week bucket

Running Count Distinct using Over Partition By

I have a data set with user ids that have made purchases over time. I would like to show a YTD distinct count of users that have made a purchase, partitioned by State and Country. The output would have 5 columns: Country, State, Year, Month, YTD Count of Distinct Users with purchase activity.
Is there a way to do this? The following code works when I exclude the month from the view and do a distinct count:
Select Year, Country, State,
COUNT(DISTINCT (CASE WHEN ActiveUserFlag > 0 THEN MBR_ID END)) AS YTD_Active_Member_Count
From MemberActivity
Where Month <= 5
Group By 1,2,3;
The issue occurs when a user has purchases across multiple months: I can't aggregate at a monthly level and then sum, because that counts the same user more than once.
I need to see the YTD count for each month of the year, for trending purposes.
Return each member only once for the first month they make a purchase, count by month and then apply a Cumulative Sum:
select Year, Country, State, month,
sum(cnt)
over (partition by Year, Country, State
order by month
rows unbounded preceding) AS YTD_Active_Member_Count
from
(
Select Year, Country, State, month,
COUNT(*) as cnt -- 1st purchases per month
From
( -- this assumes there's at least one new active member per year/month/country
-- otherwise there would be missing rows
Select *
from MemberActivity
where ActiveUserFlag > 0 -- only active members
and Month <= 5
-- and year = 2019 -- seems to be for this year only
qualify row_number() -- only first purchase per member/year
over (partition by MBR_ID, year
order by month) = 1 -- ? probably there's a purchase_date to order by instead
) as dt
group by 1,2,3,4
) as dt
;
Count users in the first month they appear, then accumulate the monthly counts:
select Country, State, year, month,
sum(sum(case when seqnum = 1 then 1 else 0 end))
over (partition by Country, State, year order by month) as YTD_Active_Member_Count
from (select ma.*,
row_number() over (partition by MBR_ID, year order by month) as seqnum
from MemberActivity ma
where ActiveUserFlag > 0 -- rank only months with purchase activity
and Month <= 5
) ma
group by Country, State, year, month;
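The "count each member once, in their first active month, then take a cumulative sum" idea can be checked on a small example. This is a sketch in Python's built-in sqlite3 module (SQLite 3.25+ assumed). SQLite has no QUALIFY clause, so the sketch swaps in min(month) per member to find the first active month; the sample rows and the CTE names first_purchase and new_per_month are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table MemberActivity "
    "(year integer, country text, state text, month integer, mbr_id text, ActiveUserFlag integer)"
)
# Made-up sample rows: m1 buys in months 1 and 2, m2 in months 2 and 3,
# m3 is inactive in month 3 and first buys in month 4.
activity = [
    (2019, "US", "CA", 1, "m1", 1),
    (2019, "US", "CA", 2, "m1", 1),
    (2019, "US", "CA", 2, "m2", 1),
    (2019, "US", "CA", 3, "m3", 0),
    (2019, "US", "CA", 3, "m2", 1),
    (2019, "US", "CA", 4, "m3", 1),
]
conn.executemany("insert into MemberActivity values (?, ?, ?, ?, ?, ?)", activity)

query = """
with first_purchase as (      -- first active month per member and year
    select year, country, state, mbr_id, min(month) as first_month
    from MemberActivity
    where ActiveUserFlag > 0
    group by year, country, state, mbr_id
),
new_per_month as (            -- members counted once, in their first month
    select year, country, state, first_month as month, count(*) as cnt
    from first_purchase
    group by year, country, state, first_month
)
select year, country, state, month,
       sum(cnt) over (
           partition by year, country, state
           order by month
           rows between unbounded preceding and current row
       ) as ytd_active_member_count
from new_per_month
order by year, country, state, month
"""
rows = conn.execute(query).fetchall()
ytd_by_month = {month: ytd for (_, _, _, month, ytd) in rows}

# Cross-check against a brute-force running set of distinct active members.
seen = set()
expected = {}
for month in sorted({r[3] for r in activity}):
    seen |= {r[4] for r in activity if r[3] == month and r[5] > 0}
    expected[month] = len(seen)

print(ytd_by_month)
print(expected)
```

In this sample, month 3 has no new members, so it is absent from the query result while the brute-force check still reports it. That is the missing-rows caveat mentioned in the first answer's comments; joining against a list of all months would fill the gap.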

SQL BigQuery : Calculate Value per time period

I'm new to SQL on BigQuery and I'm stuck on a project I have to complete.
I'm being asked to find the year-over-year growth of sales, as a percentage, on a database that doesn't even sum the revenues... I know I have to combine several queries, but I can't figure out how to calculate the growth of sales.
Here is where I am at:
Does anybody have any insight on how to do so?
Thanks a lot!
(1) Starting from what you have, group by product line to get this year and last year's revenue in each row:
#standardsql
with yearly_sales AS (
select year, product_line, sum(revenue) as revenue
from `dataset.sales`
group by product_line, year
),
year_on_year AS (
select array_agg(struct(year, revenue))
OVER(partition by product_line ORDER BY year
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS data
from yearly_sales
)
(2) Compute year-on-year growth from the two values you now have in each row
Below is for BigQuery Standard SQL
#standardSQL
SELECT product_line, year, revenue, prev_year_revenue,
ROUND(100 * (revenue - prev_year_revenue)/prev_year_revenue) year_over_year_growth_percent
FROM (
SELECT product_line, year, revenue,
LAG(revenue) OVER(PARTITION BY product_line ORDER BY year) prev_year_revenue
FROM (
SELECT product_line, year, SUM(revenue) revenue
FROM `project.dataset.table`
GROUP BY product_line, year
)
)
-- ORDER BY product_line, year
I tried with your information (plus my own made-up data for 2007) and arrived here:
SELECT
year,
sum(revenue) as year_sum
FROM
YearlyRevenue.SportCompany
GROUP BY
year
ORDER BY
year_sum
Whose result is:
R year year_sum
1 2005 1.159E9
2 2006 1.4953E9
3 2007 1.5708E9
Now the % growth should be added. Have a look here for inspiration.
Let me know if you don't succeed and I will try the hard part, with no guarantees.
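For the remaining growth step, here is a minimal sketch that applies LAG to the three yearly totals shown above (the 2007 figure being the made-up one). It uses Python's built-in sqlite3 module (SQLite 3.25+ assumed) rather than BigQuery, and the table name yearly and column growth_percent are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yearly (year integer, year_sum real)")
conn.executemany(
    "insert into yearly values (?, ?)",
    [(2005, 1.159e9), (2006, 1.4953e9), (2007, 1.5708e9)],
)

# lag() pulls the previous year's total onto each row;
# the first year has no predecessor, so its growth is null.
query = """
select year, year_sum,
       round(100.0 * (year_sum - prev_sum) / prev_sum, 1) as growth_percent
from (
    select year, year_sum,
           lag(year_sum) over (order by year) as prev_sum
    from yearly
) as y
order by year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these totals the growth comes out at roughly 29% for 2006 and roughly 5% for 2007.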