Include Non-Existent Rows in Aggregate (Partition with Preceding and Following Rows) - sql

I'm trying to get an average of the previous six months' counts. However, I noticed that if there are only 4 previous months, it averages over those 4 months instead of 6. Is there a way to force the average to be taken over all 6 months?
select
ST.AccountNumber
, st.PrevMonth
, st.[Transaction Effective Date]
, st.[Transaction Amt]
, st.CurrentMonthTransCnt
, mt.EndOfMonth
, AvgMonthlyTransCntLast6Months = AVG(isnull(cnt, 0)) OVER (PARTITION BY MT.AccountNumber order by rowid ROWS BETWEEN 1 following and 6 following)
--into #AvgCntAndStdDev
from EDWAnalytics.ML.TEMP_SymitarTransactionsFinal as ST
left join MonthlyTransCnt as MT on MT.EndOfMonth = ST.PrevMonth and ST.AccountNumber = MT.AccountNumber
where ST.AccountNumber = '0000709510'

If you want the average of the previous six months for each month, then the order should be inverted:
order by rowid desc
Another option is to keep the current order, but use ROWS BETWEEN 6 preceding and 1 preceding.
If you want the average to always be computed over six months, replace avg with sum/6 (use 6.0 if cnt is an integer, to avoid integer division):
isnull(sum(cnt) OVER (PARTITION BY MT.AccountNumber order by rowid ROWS BETWEEN 6 preceding and 1 preceding),0)/6.0
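To see the difference between AVG and SUM/6 on a short history, here is a minimal sketch. It runs the same window frame in SQLite through Python's sqlite3 module rather than in SQL Server, so it assumes an SQLite build with window functions (3.25 or later) and uses COALESCE in place of isnull. The table monthly and its columns month_no and cnt are hypothetical stand-ins for the real data.

```python
import sqlite3

# Hypothetical data: five months of counts for one account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month_no INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 6), (2, 12), (3, 18), (4, 24), (5, 30)],
)

rows = conn.execute("""
    SELECT month_no,
           -- AVG divides by however many rows fall in the frame
           AVG(cnt) OVER (ORDER BY month_no
                          ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) AS avg_available,
           -- SUM / 6.0 always divides by six, treating missing months as 0
           COALESCE(SUM(cnt) OVER (ORDER BY month_no
                                   ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING), 0) / 6.0
               AS avg_fixed_6
    FROM monthly
    ORDER BY month_no
""").fetchall()

for row in rows:
    print(row)
```

For month 5 only four earlier months exist, so the AVG column divides their sum (60) by 4 while the SUM/6.0 column divides it by 6.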

Related

calculate window average without partition or order by column in SQL

If I have a data frame with only number of clicks at certain fixed time interval looking like this:
1
3
4
2
6
1
And I want to calculate their rolling average with the 5 rows above, would this be legit:
SELECT AVG(value) OVER (ORDER BY 1 ASC ROWS 4 PRECEDING ) AS avg_value
FROM df GROUP BY 1
Or should it be
SELECT AVG(value) OVER (PARTITION BY 1 ASC ROWS 4 PRECEDING) AS avg_value
FROM df GROUP BY 1
You seem to want:
SELECT df.*,
AVG(value) OVER (ORDER BY datetimecol ASC
ROWS 4 PRECEDING
) AS avg_rolling_5
FROM df;
Notes:
A rolling average implies an ordering on the data. The datetimecol stands for the column that represents that ordering.
A rolling average is for the original data, not the aggregated data, so no GROUP BY is needed.
SQL databases have tables, not dataframes.
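As a quick check of the ROWS 4 PRECEDING frame on the six values from the question, here is a small sketch run in SQLite via Python's sqlite3 module (assuming SQLite 3.25 or later). The column seq is a hypothetical ordering column standing in for datetimecol, since the question does not say what the real ordering column is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (seq INTEGER, value INTEGER)")
# seq is an assumed ordering column; the values are the ones from the question.
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    list(enumerate([1, 3, 4, 2, 6, 1], start=1)),
)

rolling = conn.execute("""
    SELECT seq,
           value,
           -- current row plus up to 4 rows before it: a 5-row rolling average
           AVG(value) OVER (ORDER BY seq ASC ROWS 4 PRECEDING) AS avg_rolling_5
    FROM df
    ORDER BY seq
""").fetchall()

for row in rolling:
    print(row)
```

The first four rows average over fewer than five values, because the frame simply contains whatever rows exist before the current one.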

SQL Server : PRECEDING with another condition

I have a query that is working fine: The query is to find the sum & Avg for the last 3 months and last year. It is working fine, till I got a new request to break the query down to more details by AwardCode.
So how to include that?
I mean for this section
SUM(1.0 * InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
I want to find the last 3 months based on AwardCode.
My original query that is working is
SELECT
Calendar_Date, Mth, NoOfEmp, MaleCount, FemaleCount,
SUM(1.0*InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
SUM(1.0*TotalTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth
FROM #X
The result is
But now I need to add another group AwardCode
SELECT
Mth, AwardCode, NoOfEmp, MaleCount, FemaleCount,
SUM(1.0 * InvolTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS InvolMov3Mth,
SUM(1.0 * TotalTerm) OVER (ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth
FROM #X
The result will be like this
You can notice that the sum of InvolMov3Mth & TermSum12Mth for the whole period does not match the query above
I think I found the answer for my question.
I used PARTITION BY AwardCode before ORDER BY
seems to be working.
SUM(1.0*TotalTerm) OVER (PARTITION BY AwardCode ORDER BY Calendar_Date ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TermSum12Mth,
Yes. "Partition by" will make it work for your requirement.
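The effect of PARTITION BY on the frame can be sketched on a toy table. This is an illustration in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later), not the original #X table: the table x and its six rows are made up, and only the column names Calendar_Date, AwardCode and InvolTerm come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (Calendar_Date TEXT, AwardCode TEXT, InvolTerm INTEGER)")
# Made-up data: two award codes, three months each.
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [
        ("2020-01-01", "A", 1), ("2020-01-01", "B", 10),
        ("2020-02-01", "A", 2), ("2020-02-01", "B", 20),
        ("2020-03-01", "A", 3), ("2020-03-01", "B", 30),
    ],
)

result = conn.execute("""
    SELECT Calendar_Date, AwardCode,
           -- no partition: the 3-row frame mixes rows from both award codes
           SUM(InvolTerm) OVER (ORDER BY Calendar_Date, AwardCode
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mixed,
           -- partitioned: the frame only sees rows with the same AwardCode
           SUM(InvolTerm) OVER (PARTITION BY AwardCode ORDER BY Calendar_Date
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS per_award
    FROM x
    ORDER BY Calendar_Date, AwardCode
""").fetchall()

for row in result:
    print(row)
```

Without the partition, "2 PRECEDING" counts physical rows across all award codes, so a three-row frame no longer corresponds to three months of one award code.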

Get moving average over time frame in PostgreSQL with inconsistent data

I have a table called answers with columns created_at and response, response being an integer 0 (for 'no'), 1 (for 'yes'), or 2 (for 'don't know'). I want to get a moving average of the response values for each day, filtering out 2s and only taking into account the previous 30 days. I know you can do ROWS BETWEEN 29 PRECEDING AND CURRENT ROW, but that only works if you have data for each day, and in my case there might be no data for a week or more.
My current query is this:
SELECT answers.created_at, answers.response,
AVG(answers.response)
OVER(ORDER BY answers.created_at::date ROWS
BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
FROM answers
WHERE answers.user_id = 'insert_user_id'
AND (answers.response = 0 OR answers.response = 1)
GROUP BY answers.created_at, answers.response
ORDER BY answers.created_at::date
But this returns an average based on the previous rows: if a user responded with a 1 on 2018-3-30 and a 0 on 2018-5-15, the rolling average on 2018-5-15 would be 0.5 instead of the 0 I want. How can I create a query that only takes into account the responses created within the last 30 days for the rolling average?
Since Postgres 11 you can do this:
SELECT created_at,
response,
AVG(response) OVER (ORDER BY created_at
RANGE BETWEEN '29 day' PRECEDING AND current row) AS rolling_average
FROM answers
WHERE user_id = 1
AND response in (0,1)
ORDER BY created_at;
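The difference between a ROWS frame and a RANGE frame on gappy data can be reproduced with the two dates from the question. This sketch uses SQLite through Python's sqlite3 module as a stand-in for Postgres, so it is only an approximation of the query above: SQLite (3.28 or later is assumed) has no interval type, so the RANGE offset is a plain number of days applied to julianday(created_at).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (created_at TEXT, response INTEGER)")
# The two responses described in the question, 46 days apart.
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [("2018-03-30", 1), ("2018-05-15", 0)],
)

averages = conn.execute("""
    SELECT created_at,
           response,
           -- ROWS counts physical rows, regardless of how old they are
           AVG(response) OVER (ORDER BY julianday(created_at)
                               ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_rows,
           -- RANGE compares the ordering value itself: only rows within 29 days
           AVG(response) OVER (ORDER BY julianday(created_at)
                               RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS by_range
    FROM answers
    ORDER BY created_at
""").fetchall()

for row in averages:
    print(row)
```

On 2018-05-15 the ROWS frame still includes the March response, while the RANGE frame excludes it because it is more than 29 days old.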
Try something like this:
SELECT * FROM (
SELECT
d.created_at, d.response,
Avg(d.response) OVER(ORDER BY d.created_at::date rows BETWEEN 29 PRECEDING AND CURRENT row) AS rolling_average
FROM (
SELECT
COALESCE(a.created_at, d.dates) AS created_at, response, a.user_id
FROM
(SELECT generate_series('2018-01-01'::date, '2018-05-31'::date, '1day'::interval)::date AS dates) d
LEFT JOIN
(SELECT * FROM answers WHERE answers.user_id = 'insert_user_id' AND ( answers.response = 0 OR answers.response = 1)) a
ON d.dates = a.created_at::date
) d
GROUP BY d.created_at, d.response
) agg WHERE agg.response IS NOT NULL
ORDER BY agg.created_at::date
generate_series creates a list of days (you have to set reasonable boundaries);
this list of days is LEFT JOINed with the preselected answers;
this result is used for the rolling average calculation.
After that, I select only the records that have a response, and I get:
created_at | response | rolling_average
2018-03-30 | 1 | 1.00000000000000000000
2018-05-15 | 0 | 0.00000000000000000000

Cumulative Additions in SQL Server 2008

I am trying to add the previous row value to the current row and keep this adding until I reach the total.
I have tried the below query and I have set row number to the rows as per my needs, the additions should take place in the same manner.
select *
from #Finalle
order by rownum ;
Output:
Type                    Count_DB   Rownum
------------------------------------------
Within 30 days          399480     1
Within 60 days          30536      2
Within 90 days          10432      3
Within 120 days         11777      4
Greater than 120 days   13091      5
Blank                   29297      6
Total                   494613     7
When I try the below query, it works fine until the 6th row, but fails for the last row:
select
f1.[type],
(select
Sum(f.[Count of ED_DB_category]) as [Cummumative]
from
#Finalle f
where
f1.rownum >= f.rownum)
from
#Finalle f1
order by
rownum
Output:
Type                    No column name
--------------------------------------
Within 30 days          399480
Within 60 days          430016
Within 90 days          440448
Within 120 days         452225
Greater than 120 days   465316
Blank                   494613   <---Add only until here
Total                   989226
Here the total should return the same value as in the first table.
How do I achieve this?
Try this:
select f1.[type],
CASE
WHEN type = 'Total' THEN f1.[Count_DB]
ELSE SUM(f1.[Count_DB]) OVER (ORDER BY rownum)
END
from #Finalle f1
order by rownum
You may be looking for this:
select f1.[type],
(select Sum(f.[Count of ED_DB_category])
from #Finalle f where f1.rownum >= f.rownum AND
f.Type <> 'Total'
)as [Cummumative]
from #Finalle f1
order by rownum
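To check the correlated-subquery approach against the numbers in the question, here is a small sketch run in SQLite through Python's sqlite3 module instead of SQL Server 2008. The table finalle and the simplified column names type, count_db and rownum are assumptions made for the illustration; the seven rows are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finalle (type TEXT, count_db INTEGER, rownum INTEGER)")
conn.executemany(
    "INSERT INTO finalle VALUES (?, ?, ?)",
    [
        ("Within 30 days", 399480, 1),
        ("Within 60 days", 30536, 2),
        ("Within 90 days", 10432, 3),
        ("Within 120 days", 11777, 4),
        ("Greater than 120 days", 13091, 5),
        ("Blank", 29297, 6),
        ("Total", 494613, 7),
    ],
)

cumulative = conn.execute("""
    SELECT f1.type,
           -- running sum over earlier rows, skipping the pre-computed Total row
           (SELECT SUM(f.count_db)
              FROM finalle f
             WHERE f1.rownum >= f.rownum
               AND f.type <> 'Total') AS cumulative
    FROM finalle f1
    ORDER BY f1.rownum
""").fetchall()

for row in cumulative:
    print(row)
```

Because the Total row is excluded from the inner sum, the Total line ends up with the sum of the six detail rows, which matches the stored total of 494613.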
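Maybe you could use a windowed function.
Here's a SWAG, but I am not sure of your table structures or what to order by:
-- Ordered by rownum (taken from the question) so the running sum follows the intended row order.
-- ORDER BY inside OVER for SUM requires SQL Server 2012 or later.
SELECT
f1.[type],
SUM(f1.[Count of ED_DB_category]) OVER(ORDER BY f1.rownum ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Cummumative]
FROM #Finalle f1
WHERE f1.[type] <> 'Total'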

Moving trailing 13-week average in Postgres

I am trying to build a view that generates a movable 13-week average over the past year.
My source data includes a date, customer ID, and integer, and basically I want to average the 13 prior values (including the current one), over the previous 52 weeks. When I'm finished, I'd like to have a table with a date, each customer ID, and trailing 13-week average for that customer.
After upgrading Postgres to 9.1, the window functions worked great for this:
SELECT vs.weekending,
cs.slinkcust AS customer,
cs.slinkid AS id,
round(avg(vs.maxattached) OVER (PARTITION BY cs.slinkid ORDER BY vs.weekending DESC ROWS BETWEEN 0 PRECEDING AND 12 FOLLOWING), 2) AS rolling_conc_avg,
round(avg(vs.totsessions) OVER (PARTITION BY cs.slinkid ORDER BY vs.weekending DESC ROWS BETWEEN 0 PRECEDING AND 12 FOLLOWING), 2) AS rolling_sess_avg,
dense_rank() OVER (ORDER BY vs.weekending) AS week_number
FROM cfg_slink cs
JOIN view_statslink vs ON cs.slinkid = vs.id
WHERE vs.weekending >= (now() - '364 days'::interval) AND cs.disabled = 0
GROUP BY vs.weekending, cs.slinkid, vs.maxattached, vs.totsessions
ORDER BY vs.weekending DESC, cs.slinkcust;