Rolling Average SQL

Hi, I have a dataset with Year, Month and Output columns, with values as follows:
Year | Month | Output
2015 | 1 | 12
2015 | 2 | 24
2015 | 3 | 2
2015 | 4 | 3
2015 | 5 | 7
2015 | 6 | 3
2015 | 7 | 7
2015 | 8 | 6
2015 | 9 | 7
2015 | 10 | 8
2015 | 11 | 3
2015 | 12 | 6
2016 | 1 | 3
2016 | 2 | 6
2016 | 3 | 8
2016 | 4 | 9
2016 | 5 | 4
......... and so on...
I want to add a new column to the dataset called Rolling_Average:
Rolling_Average = (sum of Output over the previous 12 months) / (Output of this month)
For example:
Rolling_Average (for 2015-7) = (output (2015-01) + output (2015-02) + output (2015-03) + output (2015-04) + output (2015-05) + output (2015-06)) / output (2015-07)
I tried a couple of queries I found online, but they didn't work for me. Can someone please help me?
The required output is as follows:
Year | Month | Output | Rolling Average
2015 | 1 | 12 | 12
2015 | 2 | 24 | 0.5
2015 | 3 | 2 | 18
2015 | 4 | 3 | 38/3
2015 | 5 | 7 | 45/7
2015 | 6 | 3 | 48/3
2015 | 7 | 7 | 55/7
2015 | 8 | 6 | 61/6
2015 | 9 | 7 | 68/7
2015 | 10 | 8 | 74/8
2015 | 11 | 3 | 77/3
2015 | 12 | 6 | 83/6
2016 | 1 | 3 | 86/3
2016 | 2 | 6 | 92/6
2016 | 3 | 8 | 100/8
2016 | 4 | 9 | 109/9
2016 | 5 | 4 | 113/4
The Query I tried is :
SELECT DISTINCT
//CALCULATIONS
Year,
Month,
Output,
(sum(CAST(Output) AS DOUBLE)))
over(order by year,month rows between 12 preceding and 1 preceding )
as Rolling_Average
from my_table
group by Year,Month
order by Year,Month
It gives me this error:
Syntax error: OVER keyword must follow a function call
I have also tried other things.
Can someone please show me an easy way to do this? I am using SQL Plx, which is similar to SQL.
Thank you!

You might have misplaced some parentheses. Compare your expression:
(sum( CAST(Output) AS DOUBLE ))) over (order by year, month rows between 12 preceding and 1 preceding ) as Rolling_Average
Versus:
SUM( CAST(Output AS DOUBLE) ) OVER (order by year, month rows between 12 preceding and 1 preceding) as Rolling_Average
You can also ROUND that result.
And those records already seem to be unique by Year and Month.
So there's not really a need to group on those.
SELECT
t.Year, t.Month, t.Output,
ROUND(SUM(CAST(t.Output AS INT)) OVER (ORDER BY t.Year, t.Month ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)*1.0 / CAST(t.Output AS INT), 1) as Rolling_Average
FROM my_table t
ORDER BY t.Year, t.Month;
And if the window functions aren't supported, then this will work:
SELECT
t1.Year, t1.Month, t1.Output,
ROUND(SUM(CAST(t2.Output AS INT))*1.0 / CAST(t1.Output AS INT), 1) as Rolling_Average
FROM my_table t1
LEFT JOIN my_table t2 ON ((t2.Year = t1.Year AND t2.Month < t1.Month) OR
(t2.Year = t1.Year - 1 AND t2.Month >= t1.Month))
GROUP BY t1.Year, t1.Month, t1.Output
ORDER BY t1.Year, t1.Month;
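To check that the window version and the self-join fallback agree, here is a minimal sketch that runs both against the sample rows from the question. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions), not SQL Plx, and it stores Output as an integer so the CASTs are dropped. The first row has no preceding months, so both queries return NULL there.

```python
import sqlite3

# Sample rows (year, month, output) copied from the question.
ROWS = [
    (2015, 1, 12), (2015, 2, 24), (2015, 3, 2), (2015, 4, 3),
    (2015, 5, 7), (2015, 6, 3), (2015, 7, 7), (2015, 8, 6),
    (2015, 9, 7), (2015, 10, 8), (2015, 11, 3), (2015, 12, 6),
    (2016, 1, 3), (2016, 2, 6), (2016, 3, 8), (2016, 4, 9),
    (2016, 5, 4),
]

# Window version: sum of the up-to-12 preceding rows divided by the current row.
WINDOW_SQL = """
SELECT t.year, t.month, t.output,
       ROUND(SUM(t.output) OVER (ORDER BY t.year, t.month
                                 ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)
             * 1.0 / t.output, 1) AS rolling_average
FROM my_table t
ORDER BY t.year, t.month
"""

# Fallback without window functions: self-join on the previous 12 months.
JOIN_SQL = """
SELECT t1.year, t1.month, t1.output,
       ROUND(SUM(t2.output) * 1.0 / t1.output, 1) AS rolling_average
FROM my_table t1
LEFT JOIN my_table t2
  ON (t2.year = t1.year AND t2.month < t1.month)
  OR (t2.year = t1.year - 1 AND t2.month >= t1.month)
GROUP BY t1.year, t1.month, t1.output
ORDER BY t1.year, t1.month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (year INTEGER, month INTEGER, output INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)

window_rows = conn.execute(WINDOW_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()

for row in window_rows:
    print(row)
```

The two queries only match when there is exactly one row per month and no months are missing: the window counts rows, while the join counts calendar months.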

Try this (if you use SQL Server):
Select *
from tableName T
outer apply (
select sum(output) Rolling_Average
from tableName T_in
where T_in.year = T.year and T_in.Month <= T.Month
)x

Related

How to get last value for each user_id (postgreSQL)

A user's current ratio is the last ratio inserted for that user in the table "Ratio History":
user_id | year | month | ratio
For example if user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a sample from the development table:
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query that returns one row per user_id with that user's last ratio.
The query should return the following rows:
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried this query:
select rh1.user_id, ratio, rh1.year, rh1.month from ratio_history rh1
join (
select user_id, max(year) as maxYear, max(month) as maxMonth
from ratio_history group by user_id
) rh2 on rh1.user_id = rh2.user_id and rh1.year = rh2.maxYear and rh1.month = rh2.maxMonth
but I got only one row.
Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row for each combination of the key values in parentheses. Which row? The first row based on the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
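The attempt in the question returns a single row because max(year) and max(month) are computed independently, so the pair often does not correspond to any real row. A minimal sketch of that, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer) rather than Postgres: since SQLite has no distinct on, ROW_NUMBER() is swapped in as a portable stand-in that keeps the same ordering.

```python
import sqlite3

# Sample rows (user_id, year, month, ratio) copied from the question.
ROWS = [
    (1, 2018, 7, 10), (2, 2018, 8, 20), (3, 2018, 8, 30),
    (1, 2019, 1, 40), (2, 2019, 2, 50), (3, 2018, 10, 60),
    (2, 2019, 3, 70),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratio_history (user_id INTEGER, year INTEGER, month INTEGER, ratio INTEGER)"
)
conn.executemany("INSERT INTO ratio_history VALUES (?, ?, ?, ?)", ROWS)

# Independent maxima: user 1 gets (2019, 7), a year/month pair with no matching row.
independent_max = conn.execute(
    """
    SELECT user_id, MAX(year), MAX(month)
    FROM ratio_history
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

# Stand-in for DISTINCT ON: number each user's rows from newest to oldest, keep the first.
latest = conn.execute(
    """
    SELECT user_id, year, month, ratio
    FROM (SELECT rh.*,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY year DESC, month DESC) AS rn
          FROM ratio_history rh) ranked
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(independent_max)
print(latest)
```

Only user 3 has a row whose year and month both equal the independent maxima, which is why the join in the question produced one row.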

Select records using max values for two columns

I have a table laid out similar to this. For each distinct vendor number, I need to select the row that has the highest year value and, within that year, the highest month value.
VENDORMONTHLY:
id | Vendor | Year | month | More stuff (more columns) |
---|---------|-------|-------|---------|
1 | 93000 | 2017 | 3 | sadf |
2 | 93000 | 2017 | 2 | asdf |
5 | 93000 | 2017 | 1 | asdf |
3 | 93000 | 2016 | 12 | fff |
4 | 93000 | 2016 | 11 | ffff |
6 | 40000 | 2017 | 2 | fff |
7 | 40000 | 2017 | 1 | fff |
8 | 40000 | 2016 | 12 | fff |
The result would look like this. I can not for the life of me come up with a query that will give me what I need.
id | Vendor | Year | month | More stuff (more columns) |
---|---------|-------|-------|---------|
1 | 93000 | 2017 | 3 | sadf |
6 | 40000 | 2017 | 2 | fff |
Any help would be greatly appreciated!
Quick answer: use NOT EXISTS to verify that the same vendor has no other row with a later year, or with the same year but a later month:
select v1.*
from VENDORMONTHLY v1
where not exists (select 1 from VENDORMONTHLY v2
where v2.Vendor = v1.Vendor
and (v2.Year > v1.year
or (v2.Year = v1.Year and v2.Month > v1.Month)))
This returns both rows if a vendor has a tie for its latest row.
Core ANSI SQL-99. Will run on any dbms!
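A minimal sketch of the NOT EXISTS query run against the sample rows, assuming Python's built-in sqlite3 module as the engine. The column name more_stuff is a stand-in for the unnamed extra columns in the question.

```python
import sqlite3

# Sample rows (id, vendor, year, month, more_stuff) copied from the question.
ROWS = [
    (1, 93000, 2017, 3, "sadf"), (2, 93000, 2017, 2, "asdf"),
    (5, 93000, 2017, 1, "asdf"), (3, 93000, 2016, 12, "fff"),
    (4, 93000, 2016, 11, "ffff"), (6, 40000, 2017, 2, "fff"),
    (7, 40000, 2017, 1, "fff"), (8, 40000, 2016, 12, "fff"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendormonthly "
    "(id INTEGER, vendor INTEGER, year INTEGER, month INTEGER, more_stuff TEXT)"
)
conn.executemany("INSERT INTO vendormonthly VALUES (?, ?, ?, ?, ?)", ROWS)

# Keep a row only if the same vendor has no row with a later (year, month).
latest_per_vendor = conn.execute(
    """
    SELECT v1.*
    FROM vendormonthly v1
    WHERE NOT EXISTS (SELECT 1
                      FROM vendormonthly v2
                      WHERE v2.vendor = v1.vendor
                        AND (v2.year > v1.year
                             OR (v2.year = v1.year AND v2.month > v1.month)))
    ORDER BY v1.id
    """
).fetchall()

print(latest_per_vendor)
```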
If you are using a database that supports window functions (SQL Server, Oracle, Postgres, etc.), you can use rank() (or row_number() if you need exactly one row per vendor even when the latest year-month combination is tied):
select *
from (
select v.*,
rank() over (
partition by vendor order by year desc,
month desc
) rn
from vendormonthly v
) v
where rn = 1;
In SQL Server, the same can be done more concisely using top 1 with ties:
Select top 1 with ties *
From vendormonthly
Order by rank() over (
partition by vendor
order by year desc, month desc
)

How to subtract previous value in a column with calculation of other column on SQL server

I have a requirement for the table shown below, which has the columns mgt_year, tot_dflt_mgt and tot_accum_mgt. For year 2016, tot_dflt_mgt is 20 and the accumulated value is 600. What I want is to compute
(tot_accum_mgt - tot_dflt_mgt)
and store the result in the previous year's row, as shown in the required output below. That result, i.e. 580, is then reduced by the 2015 value 9 (580 - 9), and so on for all earlier years. I have done this in Excel and also in Oracle thanks to @mathguy, but how can I achieve this result in SQL Server? I tried to use the Oracle query in SQL Server, but it does not work.
Please forgive my bad English and noob formatting.
My table t:
line_seg MGT_YEAR TOT_DFLT_MGT TOT_ACCUM_MGT
--------- -------- ------------ ------------
A 2013 10
A 2014 15
A 2015 9
A 2016 20 600
B 2013 10
B 2014 15
B 2015 8
B 2016 20 500
Oracle Solution:
select mgt_year, tot_dflt_mgt,
max(tot_accum_mgt) over () -
nvl( sum(tot_dflt_mgt) over
(order by mgt_year
rows between 1 following and unbounded following)
, 0 ) as tot_accum_mgt
from t;
but I am unable to use this in SQL Server.
Required output:
line_seg MGT_YEAR TOT_DFLT_MGT TOT_ACCUM_MGT
--------- -------- ------------ ------------
A 2013 10 556
A 2014 15 571
A 2015 9 580
A 2016 20 600
B 2013 12 457
B 2014 15 472
B 2015 8 480
B 2016 20 500
select *,
(sum(TOT_ACCUM_MGT) over()) -
(sum(TOT_DFLT_MGT ) over (order by TOT_DFLT_MGT )) as somecolname
from
table
Put Row_number() and self join it with the previous row on (a.ID = b.ID) and (a.row_num = b.row_num - 1)
OR
You can use lag() function
Please try the following query. I assumed that you are using SQL Server 2012 or later. If not, please change FIRST_VALUE to SUM:
SELECT t1.line_seg, t1.mgt_year, t1.[tot_dflt_mgt]
, FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)
- ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
FROM [dbo].[t] AS t1
LEFT JOIN [dbo].[t] AS t2 ON (t2.line_seg = t1.line_seg AND t2.mgt_year = t1.mgt_year + 1)
ORDER BY t1.line_seg, t1.mgt_year ASC;
To do this, I first imagine the table sorted by year in descending order:
+------------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+------------+----------+--------------+---------------+
| A | 2016 | 20 | 600 |
| A | 2015 | 9 | NULL |
| A | 2014 | 15 | NULL |
| A | 2013 | 10 | NULL |
| B | 2016 | 20 | 500 |
| B | 2015 | 8 | NULL |
| B | 2014 | 15 | NULL |
| B | 2013 | 12 | NULL |
+------------+----------+--------------+---------------+
Then all I have to do is subtract the running total of tot_dflt_mgt over the PREVIOUS rows (in this descending order) from the latest year's tot_accum_mgt. This is equivalent to subtracting the previous row's tot_dflt_mgt from the value of tot_accum_mgt computed so far. To reach the previous row's fields (that is, the following year's), a LEFT JOIN self-joins the table, resulting in the following table:
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
| line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt | line_seg | mgt_year | tot_dflt_mgt | tot_accum_mgt |
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
| A | 2013 | 10 | NULL | A | 2014 | 15 | NULL |
| A | 2014 | 15 | NULL | A | 2015 | 9 | NULL |
| A | 2015 | 9 | NULL | A | 2016 | 20 | 600 |
| A | 2016 | 20 | 600 | NULL | NULL | NULL | NULL |
| B | 2013 | 12 | NULL | B | 2014 | 15 | NULL |
| B | 2014 | 15 | NULL | B | 2015 | 8 | NULL |
| B | 2015 | 8 | NULL | B | 2016 | 20 | 500 |
| B | 2016 | 20 | 500 | NULL | NULL | NULL | NULL |
+------------+----------+--------------+---------------+------------+----------+--------------+---------------+
The AND t2.mgt_year = t1.mgt_year + 1 condition in the LEFT JOIN clause does the trick of fetching the previous row's values. Now all I have to do is calculate the running total over these previous rows (t2). Also, subtracting NULL from anything results in NULL, so ISNULL replaces any NULL with zero.
ISNULL(SUM(t2.[tot_dflt_mgt]) OVER(PARTITION BY t2.[line_seg] ORDER BY t2.mgt_year DESC), 0) AS tot_accum_mgt
Now that we have the previous running total of tot_dflt_mgt, all we have to do is subtract it from the latest (largest mgt_year) tot_accum_mgt. We get that value with the FIRST_VALUE function. SUM could also be used instead, I guess.
FIRST_VALUE(t1.tot_accum_mgt) OVER(PARTITION BY t1.[line_seg] ORDER BY t1.mgt_year DESC)
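The same arithmetic can be checked without the self-join. The sketch below is a variant of the Oracle query from the question with PARTITION BY line_seg added, run on Python's built-in sqlite3 module (SQLite 3.25 or newer). It is an assumption-laden illustration, not the answer's SQL Server query. MAX and COALESCE stand in for FIRST_VALUE and ISNULL, and B/2013 uses the value 12 from the answer's tables.

```python
import sqlite3

# Rows (line_seg, mgt_year, tot_dflt_mgt, tot_accum_mgt); only 2016 has a total.
ROWS = [
    ("A", 2013, 10, None), ("A", 2014, 15, None),
    ("A", 2015, 9, None), ("A", 2016, 20, 600),
    ("B", 2013, 12, None), ("B", 2014, 15, None),
    ("B", 2015, 8, None), ("B", 2016, 20, 500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (line_seg TEXT, mgt_year INTEGER, "
    "tot_dflt_mgt INTEGER, tot_accum_mgt INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Latest total per line_seg minus the sum of tot_dflt_mgt over all later years.
# The frame is empty for the latest year, so COALESCE turns its NULL sum into 0.
back_filled = conn.execute(
    """
    SELECT line_seg, mgt_year, tot_dflt_mgt,
           MAX(tot_accum_mgt) OVER (PARTITION BY line_seg)
           - COALESCE(SUM(tot_dflt_mgt) OVER (
                 PARTITION BY line_seg
                 ORDER BY mgt_year
                 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0)
             AS tot_accum_mgt
    FROM t
    ORDER BY line_seg, mgt_year
    """
).fetchall()

for row in back_filled:
    print(row)
```

For line_seg A this gives 600, then 600 - 20 = 580, then 600 - 29 = 571, then 600 - 44 = 556, matching the step-by-step subtraction described in the question.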

Select by increasing order SQL

Table:
id | year | score
-----+------+-----------
12 | 2011 | 0.929
12 | 2014 | 0.933
12 | 2010 | 0.937
12 | 2013 | 0.938
12 | 2009 | 0.97
13 | 2010 | 0.851
13 | 2014 | 0.881
13 | 2011 | 0.885
13 | 2013 | 0.895
13 | 2009 | 0.955
16 | 2009 | 0.867
16 | 2011 | 0.881
16 | 2012 | 0.886
16 | 2013 | 0.897
16 | 2014 | 0.953
Desired Output:
id | year | score
-----+------+-----------
16 | 2009 | 0.867
16 | 2011 | 0.881
16 | 2012 | 0.886
16 | 2013 | 0.897
16 | 2014 | 0.953
I'm having difficulties in trying to output scores that are increasing in respect to the year.
Any help would be greatly appreciated.
So you want to select id = 16 because it is the only one that has steadily increasing values.
Many versions of SQL support lag(), which can help solve this problem. You can determine, for a given id, if all the values are increasing or decreasing by doing:
select id,
(case when min(score - prev_score) < 0 then 'nonincreasing' else 'increasing' end) as grp
from (select t.*, lag(score) over (partition by id order by year) as prev_score
from table t
) t
group by id;
You can then select all "increasing" ids using a join:
select t.*
from table t join
(select id
from (select t.*, lag(score) over (partition by id order by year) as prev_score
from table t
) t
group by id
having min(score - prev_score) > 0
) inc
on t.id = inc.id;
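A minimal sketch of the lag() approach on the sample data, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer). The table name scores is hypothetical, since the answer uses the placeholder name table. The first row of each id has no previous score, so its difference is NULL and MIN ignores it.

```python
import sqlite3

# Sample rows (id, year, score) copied from the question.
ROWS = [
    (12, 2011, 0.929), (12, 2014, 0.933), (12, 2010, 0.937),
    (12, 2013, 0.938), (12, 2009, 0.97),
    (13, 2010, 0.851), (13, 2014, 0.881), (13, 2011, 0.885),
    (13, 2013, 0.895), (13, 2009, 0.955),
    (16, 2009, 0.867), (16, 2011, 0.881), (16, 2012, 0.886),
    (16, 2013, 0.897), (16, 2014, 0.953),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, year INTEGER, score REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)

# Keep only ids whose smallest year-over-year change is strictly positive.
increasing = conn.execute(
    """
    SELECT s.id, s.year, s.score
    FROM scores s
    JOIN (SELECT id
          FROM (SELECT id,
                       score - LAG(score) OVER (PARTITION BY id ORDER BY year) AS delta
                FROM scores) d
          GROUP BY id
          HAVING MIN(delta) > 0) inc
      ON s.id = inc.id
    ORDER BY s.id, s.year
    """
).fetchall()

for row in increasing:
    print(row)
```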

How to add 0-data rows to a query result for rows that don't exist?

I have a table with the columns "Month" and "Year", plus other data.
Every row in the table has a different combination of "Month" and "Year".
But for some months and years no row exists.
I want to write an SQL query (... where year in (2010, 2011, 2012) ...) whose result contains every month of each selected year; if a month does not exist in the table, it should be added to the result with 0 in the other data columns.
Example:
Input: Table
data / month / year
+-----+---+------+
| 3.0 | 1 | 2011 |
| 4.3 | 3 | 2011 |
| 5.7 | 4 | 2011 |
| 2.2 | 5 | 2011 |
| 5.4 | 7 | 2011 |
+-----+---+------+
Output: SELECT ... WHERE year IN (2011)
+-----+----+------+
| 3.0 | 1 | 2011 |
| 0 | 2 | 2011 |
| 4.3 | 3 | 2011 |
| 5.7 | 4 | 2011 |
| 2.2 | 5 | 2011 |
| 0 | 6 | 2011 |
| 5.4 | 7 | 2011 |
| 0 | 8 | 2011 |
| 0 | 9 | 2011 |
| 0 | 10 | 2011 |
| 0 | 11 | 2011 |
| 0 | 12 | 2011 |
+-----+----+------+
Try Partition Outer Join:
SELECT
NVL(T.DATA, 0) DATA,
F.MONTH,
T.YEAR
FROM <your_table> T
PARTITION BY(T.YEAR)
RIGHT JOIN (SELECT LEVEL MONTH FROM DUAL CONNECT BY LEVEL <= 12) F ON T.MONTH = F.MONTH
Add your WHERE clause at the end or create a view with that definition and query against it.
select d.date_col,
nvl(s.val,0) as val,
to_char(d.date_col,'MM') month,
to_char(d.date_col,'yyyy') year
from(
-- one row per month of 2011; the column is named date_col everywhere it is referenced
select add_months(date '2011-01-01',level-1) as date_col
from dual connect by level <= 12
) d
left join(
select sum(val) as val, month, year
from your_table
group by month, year
) s
on (to_char(d.date_col,'MM') = s.month and to_char(d.date_col,'yyyy') = s.year)
select nvl(t.data, 0), x.month, nvl(t.year, <your_year>) as year
from <your_table> t,
(select rownum as month from dual connect by level < 13) x
where (t.year is null or t.year = <your_year>)
and t.month(+) = x.month
order by x.month
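All three answers share one idea: generate the 12 months, outer join the data onto them, and replace the missing values with 0. A minimal sketch of that idea, assuming Python's built-in sqlite3 module instead of Oracle: a recursive CTE is swapped in for CONNECT BY and COALESCE for NVL. The table name readings is hypothetical.

```python
import sqlite3

# Sample rows (data, month, year) copied from the question.
ROWS = [(3.0, 1, 2011), (4.3, 3, 2011), (5.7, 4, 2011), (2.2, 5, 2011), (5.4, 7, 2011)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (data REAL, month INTEGER, year INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

# Generate months 1..12, then left join the data for one year onto them.
# The year filter sits in the ON clause so that missing months are kept.
FILL_SQL = """
WITH RECURSIVE months(month) AS (
    SELECT 1
    UNION ALL
    SELECT month + 1 FROM months WHERE month < 12
)
SELECT COALESCE(t.data, 0) AS data, m.month, ? AS year
FROM months m
LEFT JOIN readings t
  ON t.month = m.month AND t.year = ?
ORDER BY m.month
"""

target_year = 2011
filled = conn.execute(FILL_SQL, (target_year, target_year)).fetchall()

for row in filled:
    print(row)
```

Putting the year condition in a WHERE clause instead would drop a month whenever it exists only in some other year.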