Last three months average for each month in PostgreSQL query - sql

I'm trying to build a query in PostgreSQL that will be used for a budget.
I currently have a list of data that is grouped by month.
For each month of the year I need to retrieve the average monthly sales from the previous three months. For example, in January I would need the average monthly sales from October through December of the previous year. So the result will be something like:
1 12345.67
2 54321.56
3 242412.45
This is grouped by month number.
Here is a snippet of code from my query that will get me the current month's sales:
LEFT JOIN (SELECT SUM((sti.cost + sti.freight) * sti.case_qty * sti.release_qty)
AS trsf_cost,
DATE_PART('month', st.invoice_dt) as month
FROM stransitem sti,
stocktrans st
WHERE sti.invoice_no = st.invoice_no
AND st.invoice_dt >= date_trunc('year', current_date)
AND st.location_cd = 'SLC'
AND st.order_st != 'DEL'
GROUP BY month) as trsf_cogs ON trsf_cogs.month = totals.month
I need another join that will get me the same thing, only averaged from the previous 3 months, but I'm not sure how.
This will ALWAYS be a January-December (1-12) list, starting with January and ending with December.

This is a classic problem for a window function. Here is how to solve this:
SELECT month_nr
,(COALESCE(m1, 0)
+ COALESCE(m2, 0)
+ COALESCE(m3, 0))
/
NULLIF ( CASE WHEN m1 IS NULL THEN 0 ELSE 1 END
+ CASE WHEN m2 IS NULL THEN 0 ELSE 1 END
+ CASE WHEN m3 IS NULL THEN 0 ELSE 1 END, 0) AS avg_prev_3_months
-- or divide by 3 if 3 previous months are guaranteed or you don't care
FROM (
SELECT date_part('month', month) as month_nr
,lag(trsf_cost, 1) OVER w AS m1
,lag(trsf_cost, 2) OVER w AS m2
,lag(trsf_cost, 3) OVER w AS m3
FROM (
SELECT date_part( 'month', month) as trsf_cost -- some dummy nr. for demo
,month
FROM generate_series('2010-01-01 0:0'::timestamp
,'2012-01-01 0:0'::timestamp, '1 month') month
) x
WINDOW w AS (ORDER BY month)
) y;
This requires that no month is ever missing! Otherwise, have a look at this related answer:
How to compare the current row with next and previous row in PostgreSQL?
The query calculates the correct average for every month: if only two previous months exist, it divides by 2, and so on. If there are no previous months, the result is NULL.
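To make the NULL handling concrete, here is a minimal sketch in plain Python that mirrors the arithmetic of the lag()-based query: look back up to three rows, ignore missing values, and return nothing when there are no previous months. The function name and the sample totals are made up for illustration; this is not a replacement for the SQL.

```python
from typing import List, Optional, Sequence


def avg_prev_months(totals: Sequence[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """For each row, average the non-missing values among the `window` preceding rows."""
    result: List[Optional[float]] = []
    for i in range(len(totals)):
        # Same rows that lag(..., 1), lag(..., 2) and lag(..., 3) would return.
        previous = [totals[j] for j in range(max(0, i - window), i)]
        present = [v for v in previous if v is not None]  # skip NULLs, like the CASE terms
        # Like NULLIF(count, 0): with no previous months the result is None, not an error.
        result.append(sum(present) / len(present) if present else None)
    return result


# Hypothetical monthly totals in chronological order (January to May).
monthly = [100.0, 200.0, 300.0, 400.0, 500.0]
averages = avg_prev_months(monthly)
print(averages)
```

The first month has no history and yields None, the second divides by 1, the third by 2, and from the fourth month on the full three-month window is used.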
In your subquery, use
date_trunc('month', st.invoice_dt)::date AS month
instead of
DATE_PART('month', st.invoice_dt) as month
so you can sort months over the years easily!
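A small sketch of why this matters, using Python dates as a stand-in for the SQL functions (the invoice dates are invented for the demo): sorting by month number alone puts January 2012 before November 2011, while sorting by the first day of the month keeps the true chronological order.

```python
from datetime import date

# Hypothetical invoice dates spanning a year boundary, already in chronological order.
invoice_dates = [date(2011, 11, 15), date(2011, 12, 3), date(2012, 1, 20), date(2012, 2, 7)]

# Analogue of DATE_PART('month', ...): only the month number survives.
by_month_number = sorted(invoice_dates, key=lambda d: d.month)

# Analogue of date_trunc('month', ...)::date: first day of the month, year retained.
by_month_start = sorted(invoice_dates, key=lambda d: d.replace(day=1))

print([d.isoformat() for d in by_month_number])
print([d.isoformat() for d in by_month_start])
```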
More info
Window function lag()
date_trunc()

Related

Calculate average and standard deviation for pre defined number of values substituting missing rows with zeros

I have a simple table that contains a record of products and their total sales per day over a year (just 3 columns - Product, Date, Sales). So, for example, if product A is sold every single day, it'll have 365 records. Similarly, if product B is sold for only 50 days, the table will have just 50 rows for that product - one for each day of sale.
I need to calculate the daily average sales and standard deviation for the entire year, which means that, for product B, I need to have additional 365-50=315 entries with zero sales to be able to calculate the daily average and standard deviation for the year correctly.
Is there a way to do this efficiently and dynamically in SQL?
Thanks
We can generate 366 rows and join the sales data to it:
WITH rg(rn) AS (
SELECT 1 AS rn
UNION ALL
SELECT a.rn + 1 AS rn
FROM rg a
WHERE a.rn < 366 -- stop at 366; with <= 366 a 367th row would be generated
)
SELECT
*
FROM
rg
LEFT JOIN (
SELECT YEAR(saledate) as yr, DATEPART(dayofyear, saledate) as doy, count(*) as numsales
FROM sales
GROUP BY YEAR(saledate), DATEPART(dayofyear, saledate)
) s ON rg.rn = s.doy
OPTION (MAXRECURSION 370);
You can replace the nulls (where there is no sale data for that day) with zeros, e.g. AVG(COALESCE(numsales, 0)). You'll probably also need a WHERE clause to eliminate the 366th day in non-leap years (for example, take the year modulo 4 and keep the 366th row only if the result is 0; this simple test is correct for 1901 through 2099).
If you're only doing a single year, you can use a WHERE clause in the sales subquery to return only the relevant records. The most efficient form is a range such as WHERE saledate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1) AND saledate < DATEFROMPARTS(YEAR(GETDATE()) + 1, 1, 1), rather than calling a function on every sale date to extract the year and compare it to a constant. You can also drop YEAR(saledate) from the SELECT and GROUP BY if there is only a single year.
If you're doing multiple years, you could make rg generate more rows or (perhaps simpler) cross join it to a list of years, e.g. VALUES (2015),(2016),(2017),(2018),(2019),(2020), so you get 366 rows per year (and make the year from the sales subquery part of the join condition too).
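As a rough illustration of the zero-filling idea outside the database, the sketch below builds one value per calendar day, substituting 0 for days without a sale, and then takes the mean and standard deviation. The function name and the sample data are assumptions, and so is the choice of the population standard deviation (pstdev); use statistics.stdev if you want the sample version.

```python
import calendar
import statistics
from datetime import date, timedelta
from typing import Dict, Tuple


def daily_stats(sales_by_day: Dict[date, float], year: int) -> Tuple[int, float, float]:
    """Return (days in year, daily mean, population std dev), counting missing days as 0."""
    days = 366 if calendar.isleap(year) else 365  # full leap-year rule, not just year % 4
    start = date(year, 1, 1)
    # One entry per calendar day, the equivalent of the LEFT JOIN plus COALESCE(..., 0).
    filled = [sales_by_day.get(start + timedelta(days=i), 0) for i in range(days)]
    return days, statistics.fmean(filled), statistics.pstdev(filled)


# Hypothetical product that sold on only three days of 2019.
sales_b = {date(2019, 1, 1): 10, date(2019, 3, 15): 20, date(2019, 12, 31): 30}
days, mean, stdev = daily_stats(sales_b, 2019)
print(days, mean, stdev)
```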
Find the first and last day of the year, then use datediff() to get the number of days in that year.
After that, don't use AVG on sales; use SUM(Sales) / days_in_year instead:
select *,
days_in_year = datediff(day, first_of_year, last_of_year) + 1
from (values (2019), (2020)) v(year)
cross apply
(
select first_of_year = dateadd(year, year - 1900, 0),
last_of_year = dateadd(year, year - 1900 + 1, -1)
) d
There's a different way to look at it - don't try to add additional empty rows, just divide by the number of days in a year. While the number of days a year isn't constant (a leap year will have 366 days), it can be calculated easily since the first day of the year is always January 1st and the last is always December 31st:
SELECT YEAR(date),
product,
SUM(sales) / DATEPART(dy, DATEFROMPARTS(YEAR(date), 12, 31))
FROM sales_table
GROUP BY YEAR(date), product
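The same "divide by the number of days" idea can be extended to the standard deviation, since days with zero sales add nothing to either the sum or the sum of squares. The sketch below is an assumption-laden illustration in Python (function names and sample values are invented, and it uses the population variance); it cross-checks the shortcut against an explicitly zero-padded list.

```python
import math
import statistics
from datetime import date
from typing import Sequence, Tuple


def days_in_year(year: int) -> int:
    # Day-of-year of December 31st, the same idea as DATEPART(dy, ...) in the query above.
    return date(year, 12, 31).timetuple().tm_yday


def mean_and_stdev(sales: Sequence[float], year: int) -> Tuple[float, float]:
    """Daily mean and population std dev over the whole year, without adding zero rows."""
    n = days_in_year(year)
    mean = sum(sales) / n
    # Population variance = E[x^2] - (E[x])^2; missing days contribute 0 to both sums.
    variance = sum(s * s for s in sales) / n - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


sold = [10, 20, 30]  # hypothetical: the only three days with sales in 2019
mean, stdev = mean_and_stdev(sold, 2019)

# Cross-check against an explicitly zero-padded list.
padded = sold + [0] * (days_in_year(2019) - len(sold))
reference = statistics.pstdev(padded)
print(mean, stdev, reference)
```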

SQL concat case select when trying to add new column of previous month data

I have a data set in which I need to concatenate the month and year of the previous month. The problem is that it spans two years. I need a statement in which, when month - 1 = 0, the month becomes 12 and the year becomes year - 1.
Select concat(case
when month > 1 then select(month-1, year)
when month = 1 then select(12, year -1)
else select(month-1, year)
end
as monthyear
from table
You need two CASE expressions inside concat(), one for the month and one for the year:
select
concat(
case when month > 1 then month - 1 else 12 end,
case when month > 1 then year else year - 1 end
)
If you have month and year as separate columns and want to subtract one month, then you can use:
select (case when month > 1 then year else year - 1 end) as year,
(case when month > 1 then month - 1 else 12 end) as month
Here's a different approach (tried with SQLite3, adjust accordingly to your SQL syntax):
-- some test data first
create table t(yy,mm);
insert into t values
(2018,1),(2018,2),(2018,3),(2018,4),(2018,5),(2018,6),
(2018,7),(2018,8),(2018,9),(2018,10),(2018,11),(2018,12),
(2019,1),(2019,2),(2019,3),(2019,4),(2019,5),(2019,6),
(2019,7),(2019,8),(2019,9),(2019,10),(2019,11),(2019,12),
(2020,1),(2020,2),(2020,3),(2020,4),(2020,5),(2020,6),
(2020,7),(2020,8),(2020,9),(2020,10),(2020,11),(2020,12);
-- current year and month, year of previous month and previous month
select yy,mm,yy-((mm+10)%12+1)/12 as yy_1,(mm+10)%12+1 mm_1 from t;
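The modulo expression is compact but not obvious, so here is a quick check in Python that it agrees with the CASE logic for every month. The function names are made up for the demo, and // is used because the SQLite expression relies on integer division.

```python
from typing import Tuple


def prev_month_arith(yy: int, mm: int) -> Tuple[int, int]:
    """Previous month via the modulo trick: (mm + 10) % 12 + 1."""
    mm_1 = (mm + 10) % 12 + 1  # 1 -> 12, 2 -> 1, ..., 12 -> 11
    yy_1 = yy - mm_1 // 12     # integer division: 1 only when mm_1 is 12
    return yy_1, mm_1


def prev_month_case(yy: int, mm: int) -> Tuple[int, int]:
    """Previous month via the CASE logic from the other answers."""
    return (yy, mm - 1) if mm > 1 else (yy - 1, 12)


# Compare both versions over the same year/month combinations as the test data.
pairs = [(yy, mm) for yy in (2018, 2019, 2020) for mm in range(1, 13)]
mismatches = [p for p in pairs if prev_month_arith(*p) != prev_month_case(*p)]
print(mismatches)                   # []
print(prev_month_arith(2019, 1))    # (2018, 12)
```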

SQL Server / SSRS: Calculating monthly average based on grouping and historical values

I need to calculate an average based on historical data for a graph in SSRS:
Current Month
Previous Month
2 Months ago
6 Months ago
This query returns the average for each month:
SELECT
avg_val1, month, year
FROM
(SELECT
(sum_val1 / count) as avg_val1, month, year
FROM
(SELECT
SUM(val1) AS sum_val1, SUM(count) AS count, month, year
FROM
(SELECT
COUNT(val1) AS count, SUM(val1) AS val1,
MONTH([SnapshotDate]) AS month,
YEAR([SnapshotDate]) AS year
FROM
[DC].[dbo].[KPI_Values]
WHERE
[SnapshotKey] = 'Some text here'
AND No = '001'
AND Channel = '999'
GROUP BY
[SnapshotDate]) AS sub3
GROUP BY
month, year, count) AS sub2
GROUP BY sum_val1, count, month, year) AS sub1
ORDER BY
year, month ASC
When I add the following WHERE clause I get the average for March (2 months ago):
WHERE month = MONTH(GETDATE())-2
AND year = YEAR(GETDATE())
Now the problem is when I want to retrieve data from 6 months ago; MONTH(GETDATE()) - 6 will output -1 instead of 12. I also have an issue with the fact that the year changes to 2016 and I am a bit unsure of how to implement the logic in my query.
I think I might be going about this wrong... Any suggestions?
Subtract the months from the date using the DATEADD function before you do your comparison. Ex:
WHERE SnapshotDate BETWEEN DATEADD(month, -6, GETDATE()) AND GETDATE()
MONTH(GETDATE()) returns an int, so subtracting from it can yield 0 or negative values. You need a scalar user-defined function to manage this, adding 12 when the result is <= 0 (and the year must then be decremented by one).
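One way to sketch the wrap-around logic such a function would need (shown in Python, with an invented function name and an assumed "today" of May 2017) is to put year and month on a single month counter, subtract, and split the result back into year and month. This handles the year change without special cases.

```python
from datetime import date
from typing import Tuple


def months_ago(today: date, n: int) -> Tuple[int, int]:
    """Return (year, month) of the month that lies n months before `today`."""
    # Put year and month on one axis: year * 12 + zero-based month.
    index = today.year * 12 + (today.month - 1) - n
    year, month0 = divmod(index, 12)
    return year, month0 + 1


today = date(2017, 5, 10)  # hypothetical current date
print(months_ago(today, 0))  # (2017, 5)
print(months_ago(today, 2))  # (2017, 3)
print(months_ago(today, 6))  # (2016, 11)
```

Filtering on the (year, month) pair returned here avoids comparing against a bare month number such as -1.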

Running Total for Current & Previous Year on weekly basis

This query provides year-to-date numbers for both Price and Square Feet for the current year and the previous year to date. It is effectively a running total for the current year and the previous year by week, in this case weeks 1 through 7 and so on (week 7 of 2017 ended on 02/19/2017; week 7 of 2016 ended on 02/22/2016). I am using subqueries because this is the only way I know to get around this situation. If you think there is a shorter, viable alternative for this query, please advise.
Actual_Sale_Date holds data for all seven days of the week, but we cut off on Sunday, which is why the dates are 2/22/2016 (Sunday ending the 7th week of 2016) and 2/19/2017 (Sunday ending the 7th week of 2017).
I tried "Actual_Sale_Date" = date_trunc('week', now())::date - 1, but this only returns the previous week's data ending on the past Sunday. I took a look at interval, since dateadd does not exist in PostgreSQL, but could not find my way around it.
My query:
select (money(Sum("Price") / COUNT("Price"))) as "Avg_Value YTD",
Round(Avg("Price"/"Sq_Ft"),+2) as "Avg_PPSF YTD",
(select
(money(Sum("Price") / COUNT("Price"))) from allsalesdata
where "Actual_Sale_Date" >= '01/01/2016' AND "Actual_Sale_Date" < '02/22/2016'
and "Work_ID" = 'SO') AS "Last Year at this time Avg_Value",
(select Round(Avg("Price"/"Sq_Ft"),+2)
from allsalesdata
where "Actual_Sale_Date" >= '01/01/2016' AND "Actual_Sale_Date" < '02/22/2016'
and "Work_ID" = 'SO') AS "Last Year at this time Avg_PPSF"
from allsalesdata
where "Actual_Sale_Date" >= '01/01/2017' AND "Actual_Sale_Date" <'02/20/2017'
and "Work_ID" = 'SO'
Sample Data:
Price Sq_Ft Actual_Sale_Date Work_ID
45871 3583 01/15/2016 SO
55874 4457 02/05/2016 SO
88745 4788 02/20/2016 SO
58745 1459 01/10/2016 SO
88749 2145 01/25/2017 SO
74856 1478 01/25/2017 SO
74586 4587 01/31/2017 ABC
74745 1142 02/10/2017 SO
74589 2214 02/19/2017 SO
This should be what you need (assuming you have a recent version of PG):
SELECT DISTINCT wk AS "Week",
-- cast the whole window expression; NULLIF avoids division by zero in weeks with no sales yet
(sum("Price") FILTER (WHERE yr = 2017) OVER w)::money /
NULLIF(count("Price") FILTER (WHERE yr = 2017) OVER w, 0) AS "Avg_Value YTD",
(sum("Price") FILTER (WHERE yr = 2017) OVER w)::money /
sum("Sq_Ft") FILTER (WHERE yr = 2017) OVER w AS "Avg_PPSF YTD",
(sum("Price") FILTER (WHERE yr = 2016) OVER w)::money /
NULLIF(count("Price") FILTER (WHERE yr = 2016) OVER w, 0) AS "Last Year this time Avg_Value",
(sum("Price") FILTER (WHERE yr = 2016) OVER w)::money /
sum("Sq_Ft") FILTER (WHERE yr = 2016) OVER w AS "Last Year this time Avg_PPSF"
FROM (
SELECT extract(isoyear from "Actual_Sale_Date")::integer AS yr,
extract(week from "Actual_Sale_Date")::integer AS wk,
"Price", "Sq_Ft"
FROM allsalesdata
WHERE "Work_ID" = 'SO') sub
-- optional, show only completed weeks in this year:
WHERE wk <= extract(week from CURRENT_DATE)::integer - 1
WINDOW w AS (ORDER BY wk)
ORDER BY wk;
In the inner query the year and week of the sale date are extracted for every sale. The week starts on Monday, as per your requirement.
In the main query these rows are processed as a single partition frame, i.e. from the start of the partition (= first row) to the last peer of the current row. Since the window definition orders the rows by wk, all rows from the start (week = 1) to the current week are included in the summarization. This will give you the running total. The sum() and count() functions filter by the year in question and the DISTINCT clause ensures that you get only a single row per week.
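To check the running-total logic against the sample data from the question, here is a small Python sketch (not the PostgreSQL query itself; the function name is invented). It keeps only the 'SO' rows, derives the ISO year and ISO week of each sale, and averages every price from week 1 up to the requested week, which is what the window frame does for each row.

```python
from datetime import date
from typing import List, Optional, Tuple

# Rows from the sample data above: (Price, Sq_Ft, Actual_Sale_Date, Work_ID).
rows: List[Tuple[int, int, date, str]] = [
    (45871, 3583, date(2016, 1, 15), "SO"),
    (55874, 4457, date(2016, 2, 5), "SO"),
    (88745, 4788, date(2016, 2, 20), "SO"),
    (58745, 1459, date(2016, 1, 10), "SO"),
    (88749, 2145, date(2017, 1, 25), "SO"),
    (74856, 1478, date(2017, 1, 25), "SO"),
    (74586, 4587, date(2017, 1, 31), "ABC"),
    (74745, 1142, date(2017, 2, 10), "SO"),
    (74589, 2214, date(2017, 2, 19), "SO"),
]


def ytd_avg_price(data, year: int, through_week: int) -> Optional[float]:
    """Average price of 'SO' sales from ISO week 1 through `through_week` of `year`."""
    prices = [
        price
        for price, _sq_ft, sale_date, work_id in data
        if work_id == "SO"
        and sale_date.isocalendar()[0] == year          # like extract(isoyear from ...)
        and sale_date.isocalendar()[1] <= through_week  # frame: first week .. current week
    ]
    return sum(prices) / len(prices) if prices else None


print(ytd_avg_price(rows, 2017, 7))  # 78234.75
print(ytd_avg_price(rows, 2016, 7))  # 62308.75
```

Weeks before the first sale of a year return None here, the counterpart of a NULL in the SQL result.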

MySQL AVG function for recent 15 records by date (order date desc) in every symbol

I am trying to create a statement in SQL (for a table which holds stock symbols and price on specified date) with avg of 5 day price and avg of 15 days price for each symbol.
Table columns:
symbol
open
high
close
date
The average price is calculated from last 5 days and last 15 days. I tried this for getting 1 symbol:
SELECT avg(close),
avg(`trd_qty`)
FROM (SELECT *
FROM cashmarket
WHERE symbol = 'hdil'
ORDER BY `M_day` desc
LIMIT 0,15 ) s
but I couldn't get the desired list for showing avg values for all symbols.
You can either do it with row numbers as suggested by astander, or you can do it with dates.
This solution will also take the last 15 days if you don't have rows for every day while the row number solution takes the last 15 rows. You have to decide which one works better for you.
EDIT: Replaced AVG, use CASE to avoid division by 0 in case no records are found within the period.
SELECT
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.close * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS close_5,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.trd_qty * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS trd_qty_5,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.close * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS close_15,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.trd_qty * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS trd_qty_15
FROM
(
SELECT
cashmarket.*,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 15, 1, 0) AS is_15,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 5, 1, 0) AS is_5
FROM cashmarket
) c
The query returns the averages of close and trd_qty for the last 5 and the last 15 days. Current date is included, so it's actually today plus the last 4 days (replace < by <= to get current day plus 5 days).
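The flag trick can be hard to read in SQL, so here is the same arithmetic as a minimal Python sketch. The function name, the sample rows, and the fixed "today" are assumptions made for the demo. Each row gets a 0/1 flag for "within the last N days", and the average is the flagged sum divided by the number of flagged rows, with 0 returned when nothing qualifies.

```python
from datetime import date
from typing import List, Tuple


def flagged_avg(rows: List[Tuple[date, float]], today: date, days: int) -> float:
    """Average of `close` over rows whose date is less than `days` days before `today`."""
    # Same test as TO_DAYS(NOW()) - TO_DAYS(m_day) < days, producing a 0/1 flag per row.
    flags = [1 if (today - day).days < days else 0 for day, _close in rows]
    hits = sum(flags)
    if hits == 0:
        return 0  # mirrors the CASE ... ELSE 0 guard against division by zero
    return sum(close * flag for (_day, close), flag in zip(rows, flags)) / hits


today = date(2010, 1, 15)  # hypothetical "now"
rows = [
    (date(2010, 1, 15), 10.0),
    (date(2010, 1, 11), 20.0),
    (date(2010, 1, 10), 30.0),
    (date(2010, 1, 1), 40.0),
    (date(2009, 12, 31), 50.0),
]
close_5 = flagged_avg(rows, today, 5)
close_15 = flagged_avg(rows, today, 15)
print(close_5, close_15)
```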
Use:
SELECT DISTINCT
t.symbol,
x.avg_5_close,
y.avg_15_close
FROM CASHMARKET t
LEFT JOIN (SELECT cm_5.symbol,
AVG(cm_5.close) 'avg_5_close',
AVG(cm_5.trd_qty) 'avg_5_qty'
FROM CASHMARKET cm_5
WHERE cm_5.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
GROUP BY cm_5.symbol) x ON x.symbol = t.symbol
LEFT JOIN (SELECT cm_15.symbol,
AVG(cm_15.close) 'avg_15_close',
AVG(cm_15.trd_qty) 'avg_15_qty'
FROM CASHMARKET cm_15
WHERE cm_15.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 15 DAY) AND NOW()
GROUP BY cm_15.symbol) y ON y.symbol = t.symbol
I'm unclear on what trd_qty is, or how it factors into your equation considering it isn't in your list of columns.
If you want to be able to specify a date rather than the current time, replace NOW() with @your_date, an applicable variable. You can also change the interval values to suit, in case they should really be 7 and 21.
Have a look at How to number rows in MySQL
You can create the row number per item for the date desc.
What you can do is to retrieve the Rows where the rownumber is between 1 and 15 and then apply the group by avg for the selected data you wish.
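As an illustration of the row-number approach (in Python rather than MySQL, with an invented function name and invented prices), the sketch below groups rows by symbol, orders each group by date descending, keeps the first n rows, and averages them. Because it counts rows rather than calendar days, gaps for weekends and holidays do not shrink the window.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def avg_last_n_rows(rows: List[Tuple[str, date, float]], n: int) -> Dict[str, float]:
    """Average `close` over the n most recent rows of each symbol."""
    by_symbol = defaultdict(list)
    for symbol, day, close in rows:
        by_symbol[symbol].append((day, close))

    result = {}
    for symbol, items in by_symbol.items():
        items.sort(key=lambda item: item[0], reverse=True)  # like ORDER BY date DESC
        latest = [close for _day, close in items[:n]]       # row numbers 1..n
        result[symbol] = sum(latest) / len(latest)
    return result


# Hypothetical rows: (symbol, date, close); dates are not consecutive.
rows = [
    ("hdil", date(2010, 1, 4), 10.0),
    ("hdil", date(2010, 1, 5), 20.0),
    ("hdil", date(2010, 1, 8), 30.0),
    ("abc", date(2010, 1, 4), 5.0),
    ("abc", date(2010, 1, 11), 7.0),
]
avg_2 = avg_last_n_rows(rows, 2)
print(avg_2)
```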
trd_qty is the quantity traded on a particular day.
The dates are not consecutive because the market operates only on weekdays and there are holidays too, so the dates may not be continuous.