BigQuery - Year over Year Comparison with Month to Date - sql

I am having trouble accurately doing a year over year comparison by month but at any point during the month. For example for August 2022 vs 2021, I want to compare August 1 to today, rather than full month of August 2021.
My data has a date field.
I want the final result to basically be:
Product_ID, Year, Month, PY_Sales, CY_Sales
I have daily totals. Some products do not have sales on certain days, though. Here's an example:
product_id  sale_date   units
1           2021-01-01  5
2           2021-01-02  4
...         ...         ...
1           2021-06-05  2
2           2021-08-01  1
2           2021-08-31  6
2           2022-01-06  1
2           2022-08-15  9
The final result for August should be:
product_id  Year  Month  PY_Sales  CY_Sales
2           2022  8      1         9
Right now my code will show 7 for August for product_id = 2 because 6 sales happened on August 31st but that day hasn't happened yet in 2022.
This is the code I have, but it doesn't do MTD. Right now, PY_Sales for August 2022 is showing the entire August of 2021, but I want it to show the MTD of August 2021. I used this code because some products do not have sales on certain months.
WITH cte AS
(
SELECT
PRODUCT_ID,
EXTRACT(YEAR FROM SALE_DATE) AS Year,
EXTRACT(MONTH FROM SALE_DATE) AS Month,
CONCAT(EXTRACT(YEAR FROM SALE_DATE), '-',EXTRACT(MONTH FROM SALE_DATE)) AS Year_Month,
SUM(Units) AS Units
FROM data
WHERE Product_ID = 1
AND DATE(SALE_DATE) >= '2019-01-01'
GROUP BY 1, 2, 3
),
diff AS
(
SELECT
COALESCE(c.PRODUCT_ID, p.PRODUCT_ID) AS Product_ID,
COALESCE(c.Year, p.Year + 1) AS Year,
COALESCE(c.Month, p.Month) AS Month,
IFNULL(c.Units, 0) AS Current_Units,
IFNULL(p.Units, 0) AS Previous_Units,
NULLIF(((IFNULL(c.Units, 0) - IFNULL(p.Units,0)) / p.Units),0) * 100 AS Percent_Change
FROM CTE c
FULL OUTER JOIN CTE p ON c.PRODUCT_ID = p.PRODUCT_ID AND c.Year = p.Year + 1 AND c.Month = p.Month
WHERE c.Year <= EXTRACT(YEAR FROM CURRENT_DATE())
ORDER BY 2, c.Year, c.Month
)
SELECT *
FROM diff
--This is to avoid dividing by 0
WHERE diff.Previous_Units > 0
--AND Percent_Change <= -.5
I'm being a little repetitive but I hope this is clear! Thank you so much!

In the cte table you summarize the sold units by month and year.
Your question can be solved by adding a column units_last_year there. It contains only the units sold before the date one year ago. Today is the 27th of August 2022, so the units from the 31st of August 2021 are counted as zero.
SUM(Units) AS Units,
SUM(IF(SALE_DATE< date_sub(current_Date(),interval 1 year), Units, 0 )) as units_last_year
Please use the SAFE_DIVIDE function if there is any chance of dividing by zero.
Here is the full query with example data.
You gave an example with fixed dates, which are compared to the current date. Therefore, the query would no longer show the desired effect after the 30th of August 2022.
The rows for product_id 3 are made-up values defined relative to the current date, so the following query still yields meaningful results after August 2022.
with data as (
select *,date(sale_date_) as sale_date
from (
Select 1 product_id, "2021-01-01" sale_date_, 5 units
union all select 2,"2021-01-02", 4
union all select 1,"2021-06-05", 2
union all select 2,"2021-08-01", 1
union all select 2,"2021-08-31", 6
union all select 2,"2022-01-06", 1
union all select 2,"2022-08-15", 9
union all select 3, current_date(), 10
union all select 3, date_sub(current_date(),interval 1 year), 9
union all select 3, date_sub( date_trunc(current_date(),month),interval 1 year), 1
)
),
cte AS
(
SELECT
PRODUCT_ID,
EXTRACT(YEAR FROM SALE_DATE) AS Year,
EXTRACT(MONTH FROM SALE_DATE) AS Month,
CONCAT(EXTRACT(YEAR FROM SALE_DATE), '-',EXTRACT(MONTH FROM SALE_DATE)) AS Year_Month,
SUM(Units) AS Units,
sum(if(SALE_DATE< date_sub(current_Date(),interval 1 year), units, 0 )) as units_last_year
FROM data
WHERE # Product_ID = 1 AND
DATE(SALE_DATE) >= '2019-01-01'
GROUP BY 1, 2, 3, 4
),
diff AS
(
SELECT
COALESCE(c.PRODUCT_ID, p.PRODUCT_ID) AS Product_ID,
COALESCE(c.Year, p.Year + 1) AS Year,
COALESCE(c.Month, p.Month) AS Month,
IFNULL(c.Units, 0) AS Current_Units,
IFNULL(p.Units, 0) AS Previous_Units,
IFNULL(p.Units_last_Year, 0) AS Previous_Units_ok,
NULLIF(((IFNULL(c.Units, 0) - IFNULL(p.Units,0)) / p.Units),0) * 100 AS Percent_Change,
NULLIF(safe_divide((IFNULL(c.Units, 0) - IFNULL(p.Units_last_Year,0)) , p.Units_last_Year),0) * 100 AS Percent_Change_ok,
FROM CTE c
FULL OUTER JOIN CTE p ON c.PRODUCT_ID = p.PRODUCT_ID AND c.Year = p.Year + 1 AND c.Month = p.Month
WHERE c.Year <= EXTRACT(YEAR FROM CURRENT_DATE())
ORDER BY 2, c.Year, c.Month
)
SELECT *
FROM diff
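The cutoff logic can also be checked outside BigQuery. Below is a minimal Python sketch, not the answerer's code, that applies the same idea to the sample rows from the question. The function name mtd_year_over_year, the fixed "today" of 2022-08-27, and the inclusive cutoff (<=) are assumptions made for illustration; the query above uses a strict < instead.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (product_id, sale_date, units)
ROWS = [
    (1, date(2021, 1, 1), 5),
    (2, date(2021, 1, 2), 4),
    (1, date(2021, 6, 5), 2),
    (2, date(2021, 8, 1), 1),
    (2, date(2021, 8, 31), 6),
    (2, date(2022, 1, 6), 1),
    (2, date(2022, 8, 15), 9),
]


def mtd_year_over_year(rows, today):
    """Return {product_id: (py_sales, cy_sales)} for the month of `today`.

    Prior-year sales are counted only up to the same day of the month.
    Assumption: `today` is not Feb 29 (no leap-day handling in this sketch).
    """
    cutoff = today.replace(year=today.year - 1)
    totals = defaultdict(lambda: [0, 0])
    for product_id, sale_date, units in rows:
        if sale_date.month != today.month:
            continue  # only the month being reported
        if sale_date.year == today.year and sale_date <= today:
            totals[product_id][1] += units  # current year, month to date
        elif sale_date.year == today.year - 1 and sale_date <= cutoff:
            totals[product_id][0] += units  # prior year, same day range
    return {pid: tuple(pair) for pid, pair in totals.items()}


result = mtd_year_over_year(ROWS, today=date(2022, 8, 27))
print(result)  # {2: (1, 9)}
```

With the assumed date, the 6 units sold on 2021-08-31 fall after the cutoff, which reproduces the PY_Sales = 1, CY_Sales = 9 row the question asks for.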

Related

Year over Year by Month Comparison and Month to Date in BigQuery

Edit: @shawnt00 has the correct answer. Thank you very much!
I am having trouble accurately doing a year over year comparison by month but at any point during the month. For example for August 2022 vs 2021, I want to compare August 1 - August 25, rather than full month of August 2021.
I am also using a daily date field.
I want the final result to basically be:
Product_ID, Year, Month, PY_Sales, CY_Sales
Edit: I have daily totals. Some products do not have sales on certain days, though:
product_id  sale_date   units
1           2021-01-01  5
2           2021-01-02  4
...         ...         ...
1           2021-06-05  2
2           2022-01-06  1
2           2022-08-15  9
This is the code I have, but it doesn't do MTD: August 2021 covers the entire month, and I want it limited to the same dates as in 2022. I used this code because some products do not have sales in certain months.
WITH cte AS
(
SELECT
PRODUCT_ID,
EXTRACT(YEAR FROM SALE_DATE) AS Year,
EXTRACT(MONTH FROM SALE_DATE) AS Month,
CONCAT(EXTRACT(YEAR FROM SALE_DATE), '-',EXTRACT(MONTH FROM SALE_DATE)) AS Year_Month,
SUM(Units) AS Units
FROM data
WHERE Product_ID = 1
AND DATE(SALE_DATE) >= '2019-01-01'
GROUP BY 1, 2, 3
),
diff AS
(
SELECT
COALESCE(c.PRODUCT_ID, p.PRODUCT_ID) AS Product_ID,
COALESCE(c.Year, p.Year + 1) AS Year,
COALESCE(c.Month, p.Month) AS Month,
IFNULL(c.Units, 0) AS Current_Units,
IFNULL(p.Units, 0) AS Previous_Units,
NULLIF(((IFNULL(c.Units, 0) - IFNULL(p.Units,0)) / p.Units),0) * 100 AS Percent_Change
FROM CTE c
FULL OUTER JOIN CTE p ON c.PRODUCT_ID = p.PRODUCT_ID AND c.Year = p.Year + 1 AND c.Month = p.Month
WHERE c.Year <= EXTRACT(YEAR FROM CURRENT_DATE())
ORDER BY 2, c.Year, c.Month
)
SELECT *
FROM diff
--This is to avoid dividing by 0
WHERE diff.Previous_Units > 0
--AND Percent_Change <= -.5
You could just roll up two different monthly totals and then switch for the current month comparison:
with agg as (
select
PRODUCT_ID,
extract(year from SALE_DATE) as yr,
extract(month from SALE_DATE) as mth,
sum(Units) as Units,
sum(case when extract(day from SALE_DATE) <= extract(day from current_date())
then Units end) as UnitsMTD
from data
where date(SALE_DATE) >= '2019-01-01' -- one year before report output
group by 1, 2, 3
)
select c.Yr, c.Mth, c.PRODUCT_ID,
case when c.Yr = extract(year from current_date())
and c.Mth = extract(month from current_date())
then (c.UnitsMTD - p.UnitsMTD) / p.UnitsMTD
else (c.Units - p.Units ) / p.Units
end as Percent_Change
from agg c left outer join agg p
on p.Product_ID = c.Product_ID and p.Yr = c.Yr - 1 and p.Mth = c.Mth
order by c.Yr, c.Mth, c.PRODUCT_ID;
Note my earlier comment about leap years. This will treat February 28 of the year following a leap year as an "MTD" month. You might need to handle that differently inside the case expression.
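One possible way to handle the leap-year case is to decide the prior-year cutoff date explicitly instead of comparing day numbers. The following Python sketch shows one such policy; the function name prior_year_mtd_end and the rule "if the current month is complete, compare against the full prior-year month" are assumptions, not part of the answer above.

```python
import calendar
from datetime import date


def prior_year_mtd_end(today):
    """Last prior-year date to include when comparing month to date.

    Assumed policy: if `today` is the last day of its month, the current
    month is complete, so the whole prior-year month is used. Otherwise
    the same day number is used, clamped to the prior-year month length.
    """
    prev_year = today.year - 1
    last_day_prev = calendar.monthrange(prev_year, today.month)[1]
    last_day_cur = calendar.monthrange(today.year, today.month)[1]
    if today.day == last_day_cur:
        return date(prev_year, today.month, last_day_prev)
    return date(prev_year, today.month, min(today.day, last_day_prev))


print(prior_year_mtd_end(date(2021, 2, 28)))  # 2020-02-29
print(prior_year_mtd_end(date(2020, 2, 29)))  # 2019-02-28
print(prior_year_mtd_end(date(2022, 8, 27)))  # 2021-08-27
```

Under this policy, February 28 of the year after a leap year is compared against all 29 days of the previous February rather than being treated as a partial month.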

What to use in place of UNION in the query I wrote, or a more optimized query without UNION and UNION ALL

I am counting the birthdays, sales, and orders in all 12 months from the customers table in SQL Server, like this.
In the customers table, birth_date, sale_date, and order_date are columns of the table.
select 1 as ranking,'Birthdays' as Type,[MONTH],TOTAL
from ( select DATENAME(month, birth_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, birth_date)
)x
union
select 2 as ranking,'sales' as Type,[MONTH],TOTAL
from ( select DATENAME(month, sale_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, sale_date)
)x
union
select 3 as ranking,'Orders' as Type,[MONTH],TOTAL
from ( select DATENAME(month, order_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, order_date)
)x
And the output looks like this (just dummy data):
ranking  Type       MONTH     TOTAL
1        Birthdays  January   12
1        Birthdays  April     6
1        Birthdays  May       10
2        Sales      February  8
2        Sales      April     14
2        Sales      May       10
3        Orders     June      4
3        Orders     July      3
3        Orders     October   6
3        Orders     December  17
I want to find the counts of all three types without using UNION or UNION ALL; that is, I want this data from a single query statement (or a more optimized version of this query).
Another approach is to create a CTE with all available ranking values and use CROSS APPLY for it, as shown below.
WITH ranks(ranking) AS (
SELECT * FROM (VALUES (1), (2), (3)) v(r)
)
SELECT
r.ranking,
CASE WHEN r.ranking = 1 THEN 'Birthdays'
WHEN r.ranking = 2 THEN 'Sales'
WHEN r.ranking = 3 THEN 'Orders'
END AS Type,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END) AS MONTH,
COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY ranks r
GROUP BY r.ranking,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END)
ORDER BY r.ranking, MONTH
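The idea behind the CROSS APPLY query is that each customer row is visited once and contributes one count per date column. As a rough illustration only, here is the same single-pass counting in Python. The dictionaries standing in for customers rows are hypothetical, and unlike COUNT(*) in the query above, this sketch skips missing dates.

```python
import calendar
from collections import Counter
from datetime import date

# (ranking, label, column) triples mirroring the ranks CTE and CASE branches
TYPES = [
    (1, "Birthdays", "birth_date"),
    (2, "Sales", "sale_date"),
    (3, "Orders", "order_date"),
]


def count_events_by_month(customers):
    """Count rows per (ranking, type, month name) in one pass over the rows."""
    counts = Counter()
    for row in customers:
        for ranking, label, column in TYPES:
            value = row.get(column)
            if value is not None:  # assumption: rows without a date are ignored
                counts[(ranking, label, calendar.month_name[value.month])] += 1
    return counts


# Hypothetical sample rows
customers = [
    {"birth_date": date(1990, 1, 5), "sale_date": date(2022, 4, 2), "order_date": date(2022, 6, 9)},
    {"birth_date": date(1985, 1, 20), "sale_date": date(2022, 4, 30), "order_date": None},
]

counts = count_events_by_month(customers)
for key in sorted(counts):
    print(*key, counts[key])
```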

Count active users if they have made a buy in the last three months (Historical) SQL

This is my query
with months (Date,Familia) as (
select cast(eomonth(datefromparts(year(date_var),
month(date_var),01)) as datetime) as Fecha, p.family as 'Familia'
from sales v
left join products p on p.id_product=v.id_product
where date_var >= '2016-08-01'
group by date_var, p.family
)
select m.Date, m.Family, (
select count(distinct v.user_id)
from sales v
where datediff(month, m.Date, v.date_var) between -2 and 0 and
v.date_var >= '2016-08-01'
) as 'Active Users'
from months m
group by m.family, m.Date
order by m.Date
I want to obtain the number of active users, taking into account that a user counts as active if they have made a purchase in the last three months.
For instance
family    year  month  #
Nubrenza  2017  1      2500
Keppra    2017  1      350
Nubrenza  2017  2      2400
Keppra    2017  2      357
Active users of January 2017 would be count( DISTINCT users) who have made a transaction in January 2017, Dec 2016 and / or Nov 2016 and so on...
Update: my query now shows the distinct count of users grouped by month, but it returns the same value for all my families. How can I fix that?
You can generate the months and use a subquery:
with months as (
select convert(date, '2017-01-01') as month
union all
select dateadd(month, 1, month)
from months
where month < '2018-01-01'
)
select m.month,
(select count(*)
from mytable t
where datediff(month, date_var, m.month) between 0 and 2
)
from months m;
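The question's update mentions that every family gets the same value, which is what happens when the inner count is not restricted to the family of the outer row. A small Python sketch of the intended calculation, with the family filter made explicit, may help to check expected numbers. The tuple layout (user_id, family, purchase date), the function names, and the sample rows are assumptions for illustration.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month number so month differences are simple."""
    return d.year * 12 + d.month - 1


def active_users(sales, family, year, month, window=3):
    """Distinct users of `family` with a purchase in the `window` months
    ending with (year, month), inclusive."""
    end = year * 12 + month - 1
    users = {
        user_id
        for user_id, fam, purchased in sales
        if fam == family and 0 <= end - month_index(purchased) < window
    }
    return len(users)


# Hypothetical sample rows: (user_id, family, purchase date)
sales = [
    ("u1", "Keppra", date(2016, 11, 3)),
    ("u1", "Keppra", date(2017, 1, 10)),
    ("u2", "Keppra", date(2016, 12, 24)),
    ("u3", "Keppra", date(2016, 10, 30)),
    ("u4", "Nubrenza", date(2017, 1, 2)),
]

print(active_users(sales, "Keppra", 2017, 1))    # 2 (u1 and u2; u3 is too old)
print(active_users(sales, "Nubrenza", 2017, 1))  # 1
```

In SQL terms, the fam == family condition corresponds to correlating the subquery on the family as well as on the month.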

SQL: add missing months from different years

SQL SERVER
[CreatedOn] - DATETIME
I get this table:
Year Month Count
2009 7 1
2009 9 1
2010 1 2
2010 3 13
From query:
SELECT
YEAR ([CreatedOn]) AS 'Year',
MONTH ([CreatedOn]) AS 'Month',
COUNT ([CreatedOn]) AS 'Count'
FROM xxx
GROUP BY YEAR ([CreatedOn]), MONTH ([CreatedOn])
How can I get table like this (with missed months and Count 0):
Year Month Count
2009 7 1
2009 8 0
2009 9 1
2009 10 0
2009 11 0
2009 12 0
2010 1 2
2010 2 0
2010 3 13
The syntax suggests you are using SQL Server. Use a recursive CTE to generate the calendar table, then do a LEFT OUTER JOIN with the xxx table.
DECLARE @maxdate DATE = (SELECT Max([CreatedOn])
FROM xxx);
WITH calender
AS (SELECT Min([CreatedOn]) dates
FROM xxx
UNION ALL
SELECT Dateadd(mm, 1, dates)
FROM calender
WHERE dates < @maxdate)
SELECT Year(dates) [YEAR],
Month(dates) [month],
Count ([CreatedOn]) AS 'Count'
FROM calender a
LEFT OUTER JOIN xxx b
ON Year(dates) = Year ([CreatedOn])
AND Month(dates) = Month ([CreatedOn])
GROUP BY Year(dates),
Month(dates)
Note: instead of a recursive CTE, consider creating a physical calendar table.
This will use a built-in table to create the calendar:
;WITH limits as
(
SELECT min([CreatedOn]) mi, max([CreatedOn]) ma
FROM xxx
), months as(
SELECT
dateadd(mm, datediff(mm, 0, mi) + number, 0) m -- first day of each month, so the join below can match
FROM
master..spt_values v
JOIN
limits l
ON
number between 0 and datediff(mm, l.mi, l.ma)
WHERE
v.type = 'P'
)
SELECT
year(months.m) year,
month(months.m) month,
count(qry.[CreatedOn]) cnt
FROM
xxx qry
RIGHT JOIN
months
ON
months.m = dateadd(mm, datediff(mm, 0, qry.[CreatedOn]), 0)
GROUP BY
year(months.m),
month(months.m)
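Both answers follow the same pattern: build the complete list of months between the first and last date, then attach the counts, defaulting to 0. A minimal Python sketch of that pattern is shown below. The function name monthly_counts and the generated sample timestamps are assumptions chosen to reproduce the counts from the question.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(created_on):
    """Return (year, month, count) for every month between the first and
    last timestamp, including months with no rows (count 0)."""
    counts = Counter((d.year, d.month) for d in created_on)
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    result = []
    while (year, month) <= last:
        result.append((year, month, counts.get((year, month), 0)))
        # step to the next calendar month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


# Hypothetical timestamps matching the counts in the question
rows = (
    [datetime(2009, 7, 4), datetime(2009, 9, 1)]
    + [datetime(2010, 1, day) for day in (3, 17)]
    + [datetime(2010, 3, day) for day in range(1, 14)]
)

for year, month, count in monthly_counts(rows):
    print(year, month, count)
```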

SQL spread month value into weeks

I have a table with values by month, and I want to spread these values by week. Weeks that span two months need to take part of the value of each month, weighted by the number of days that fall in each month.
For example I have the table with a different price of steel by month
Product Month Price
------------------------------------
Steel 1/Jan/2014 100
Steel 1/Feb/2014 200
Steel 1/Mar/2014 300
I need to convert it into weeks as follows
Product Week Price
-------------------------------------------
Steel 06-Jan-14 100
Steel 13-Jan-14 100
Steel 20-Jan-14 100
Steel 27-Jan-14 128.57
Steel 03-Feb-14 200
Steel 10-Feb-14 200
Steel 17-Feb-14 200
As you see above, the week that overlaps between Jan and Feb needs to be calculated as follows
(100*5/7)+(200*2/7)
This takes into account that the week of the 27th has 5 days that fall in Jan and 2 in Feb.
Is there any possible way to create a query in SQL that would achieve this?
I tried the following
First attempt:
select
WD.week,
PM.PRICE,
DATEADD(m,1,PM.Month),
SUM(PM.PRICE/7) * COUNT(*)
from
( select '2014-1-1' as Month, 100 as PRICE
union
select '2014-2-1' as Month, 200 as PRICE
)PM
join
( select '2014-1-20' as week
union
select '2014-1-27' as week
union
select '2014-2-3' as week
)WD
ON WD.week>=PM.Month
AND WD.week < DATEADD(m,1,PM.Month)
group by
WD.week,PM.PRICE, DATEADD(m,1,PM.Month)
This gives me the following
week PRICE
2014-1-20 100 2014-02-01 00:00:00.000 14
2014-1-27 100 2014-02-01 00:00:00.000 14
2014-2-3 200 2014-03-01 00:00:00.000 28
I tried also the following
;with x as (
select price,
datepart(week,dateadd(day, n.n-2, t1.month)) wk,
dateadd(day, n.n-1, t1.month) dt
from
(select '2014-1-1' as Month, 100 as PRICE
union
select '2014-2-1' as Month, 200 as PRICE) t1
cross apply (
select datediff(day, t.month, dateadd(month, 1, t.month)) nd
from
(select '2014-1-1' as Month, 100 as PRICE
union
select '2014-2-1' as Month, 200 as PRICE)
t
where t1.month = t.month) ndm
inner join
(SELECT (a.Number * 256) + b.Number AS N FROM
(SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) a (Number),
(SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) b (Number)) n --numbers
on n.n <= ndm.nd
)
select min(dt) as week, cast(sum(price)/count(*) as decimal(9,2)) as price
from x
group by wk
having count(*) = 7
order by wk
This gives me the following
week price
2014-01-07 00:00:00.000 100.00
2014-01-14 00:00:00.000 100.00
2014-01-21 00:00:00.000 100.00
2014-02-04 00:00:00.000 200.00
2014-02-11 00:00:00.000 200.00
2014-02-18 00:00:00.000 200.00
Thanks
If you have a calendar table it's a simple join:
SELECT
product,
calendar_date - (day_of_week-1) AS week,
SUM(price / 7.0) AS price -- each day contributes 1/7 of its month's price
FROM prices AS p
JOIN calendar AS c
ON c.calendar_date >= month
AND c.calendar_date < DATEADD(m,1,month)
GROUP BY product,
calendar_date - (day_of_week-1)
This could be further simplified to join only to mondays and then do some more date arithmetic in a CASE to get 7 or less days.
Edit:
Your last query returned Jan 31st twice; you need to remove the = from the join condition so that it reads on n.n < ndm.nd. And as you seem to be working with ISO weeks, you should change the DATEPART to isowk to avoid problems with different DATEFIRST settings.
Based on your last query I created a fiddle.
;with x as (
select price,
datepart(isowk,dateadd(day, n.n, t1.month)) wk,
dateadd(day, n.n-1, t1.month) dt
from
(select '2014-1-1' as Month, 100.00 as PRICE
union
select '2014-2-1' as Month, 200.00 as PRICE) t1
cross apply (
select datediff(day, t.month, dateadd(month, 1, t.month)) nd
from
(select '2014-1-1' as Month, 100.00 as PRICE
union
select '2014-2-1' as Month, 200.00 as PRICE)
t
where t1.month = t.month) ndm
inner join
(SELECT (a.Number * 256) + b.Number AS N FROM
(SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) a (Number),
(SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) b (Number)) n --numbers
on n.n < ndm.nd
) select min(dt) as week, cast(sum(price)/count(*) as decimal(9,2)) as price
from x
group by wk
having count(*) = 7
order by wk
Of course the dates might span multiple years, so you need to group by the year, too.
Actually, you need to spread it over days, and then get the averages by week. To get the days we'll use the Numbers table.
;with x as (
select product, price,
datepart(week,dateadd(day, n.n-2, t1.month)) wk,
dateadd(day, n.n-1, t1.month) dt
from #t t1
cross apply (
select datediff(day, t.month, dateadd(month, 1, t.month)) nd
from #t t
where t1.month = t.month and t1.product = t.product) ndm
inner join numbers n on n.n <= ndm.nd
)
select product, min(dt) as week, cast(sum(price)/count(*) as decimal(9,2)) as price
from x
group by product, wk
having count(*) = 7
order by product, wk
The result of datepart(week,dateadd(day, n.n-2, t1.month)) expression depends on SET DATEFIRST so you might need to adjust accordingly.
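The day-spreading approach can be verified with a few lines of Python: assign each day of a week its month's price and average over the seven days. This is only a sketch of the arithmetic from the question, not a translation of the queries above; the names MONTHLY and weekly_price are assumptions, and weeks are assumed to start on Monday.

```python
from datetime import date, timedelta

# Monthly prices from the question, keyed by (year, month)
MONTHLY = {(2014, 1): 100.0, (2014, 2): 200.0, (2014, 3): 300.0}


def weekly_price(monday, monthly):
    """Average the monthly price over the 7 days starting at `monday`.

    Each day takes the price of the month it falls in, so a week that
    spans two months is weighted by the number of days in each month.
    Raises KeyError if a day falls in a month without a price.
    """
    days = [monday + timedelta(days=offset) for offset in range(7)]
    return round(sum(monthly[(d.year, d.month)] for d in days) / 7, 2)


print(weekly_price(date(2014, 1, 20), MONTHLY))  # 100.0
print(weekly_price(date(2014, 1, 27), MONTHLY))  # 128.57 = (100*5 + 200*2) / 7
print(weekly_price(date(2014, 2, 3), MONTHLY))   # 200.0
```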