If I have data for week 1 through week 52 and I want a 4-week moving average that advances one week at a time, how can I write a SQL query for this? For example, for week 5 I want the week 1-week 4 average, for week 6 the week 2-week 5 average, and so on.
I have the columns week and target_value in table A.
Sample data is like this:
Week target_value
1 20
2 10
3 10
4 20
5 60
6 20
The output I want starts at week 5, because weeks 1-4 are the earliest data available and no full four-week window exists before that.
Output data will look like:
Week Output
5 15 (20+10+10+20)/4=15 Moving Average week1-week4
6 25 (10+10+20+60)/4=25 Moving Average week2-week5
The data is in Hive, but I can move it to Oracle if it is simpler to do this there.
SELECT
Week,
(SELECT ISNULL(AVG(B.target_value), A.target_value)
FROM tblA B
WHERE (B.Week < A.Week)
AND B.Week >= (A.Week - 4)
) AS Moving_Average
FROM tblA A
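The ISNULL keeps you from getting a null for your first week, since there is no week 0 to average. If you want a null there, just leave the ISNULL function out. Note that ISNULL is SQL Server syntax; Oracle's equivalent is NVL, and COALESCE should work in Hive, Oracle, and SQL Server alike.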
If you want it to start at week 5 only, then add the following line to the end of the SQL that I wrote:
WHERE A.Week > 4
Results:
Week Moving_Average
1 20
2 20
3 15
4 13
5 15
6 25
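If the engine supports window functions (Hive and Oracle both offer them), the same trailing average can probably be written without a correlated subquery. The sketch below runs such a query through Python's built-in sqlite3 module, used here only as a stand-in engine so the example is runnable; the table name tblA comes from the answer above, while the in-memory database and the variable names are assumptions for illustration. The ROWS frame counts rows rather than week numbers, so it assumes there are no missing weeks.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (Week INTEGER, target_value REAL)")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(1, 20), (2, 10), (3, 10), (4, 20), (5, 60), (6, 20)],
)

# Average of the (up to) four rows before the current week; the current
# week itself is excluded by ending the frame at 1 PRECEDING.
query = """
SELECT Week, Moving_Average
FROM (
    SELECT Week,
           AVG(target_value) OVER (
               ORDER BY Week
               ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
           ) AS Moving_Average
    FROM tblA
) AS w
WHERE Week > 4      -- keep only weeks that have a full four-week history
ORDER BY Week
"""
moving_avg = dict(conn.execute(query).fetchall())
print(moving_avg)
```

The filter on Week sits in the outer query on purpose: placing it in the inner query would remove weeks 1-4 before the window is evaluated, leaving week 5 with nothing to average.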
I have the following DataFrame, organized as panel data. It contains daily returns of many companies on different days following the IPO date. day_diff is the number of days that have passed since the IPO, and return_1 is the individual daily return for that day and company, to which I have already added 1. Each company has its own company_tic, and I have about 300 companies. My goal is to calculate the first component of the right-hand side of the equation below, that is, a result for each day_diff and company_tic, always starting at day 0 and running to the last day of data (from day 0 to day 1, then from day 0 to day 2, from day 0 to day 3, and so on until my last day, which is day 730). I have tried df.groupby(['company_tic', 'day_diff'])['return_1'].expanding().prod() but it doesn't work. Any alternatives?
Index day_diff company_tic return_1
0 0 xyz 1.8914
1 1 xyz 1.0542
2 2 xyz 1.0016
3 0 abc 1.4398
4 1 abc 1.1023
5 2 abc 1.0233
... ... ... ...
[159236 rows x 3 columns]
I'm not sure I fully understand what you want, but you might want to use cumprod instead of expanding().prod().
Here's what I tried:
df['return_1_prod'] = df.groupby('company_tic')['return_1'].cumprod()
Output:
day_diff company_tic return_1 return_1_prod
0 0 xyz 1.8914 1.891400
1 1 xyz 1.0542 1.993914
2 2 xyz 1.0016 1.997104
3 0 abc 1.4398 1.439800
4 1 abc 1.1023 1.587092
5 2 abc 1.0233 1.624071
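To make explicit what the grouped cumprod computes, here is a minimal pandas-free sketch of the same running product per company. It uses only the six sample rows shown above; the names rows, rows_sorted, and cumulative are assumptions for illustration, not part of the original DataFrame.

```python
from itertools import accumulate, groupby
from operator import itemgetter, mul

# (day_diff, company_tic, return_1) rows taken from the sample above.
rows = [
    (0, "xyz", 1.8914),
    (1, "xyz", 1.0542),
    (2, "xyz", 1.0016),
    (0, "abc", 1.4398),
    (1, "abc", 1.1023),
    (2, "abc", 1.0233),
]

# Sort by company, then by day, so each company's rows are contiguous
# and in chronological order before taking the running product.
rows_sorted = sorted(rows, key=itemgetter(1, 0))

cumulative = {}  # (company_tic, day_diff) -> product of return_1 from day 0
for company, group in groupby(rows_sorted, key=itemgetter(1)):
    group = list(group)
    running = accumulate((r[2] for r in group), mul)
    for (day, _, _), value in zip(group, running):
        cumulative[(company, day)] = value

for key in sorted(cumulative):
    print(key, round(cumulative[key], 6))
```

The explicit sort matters here: a running product only matches "day 0 through day n" if each company's rows are ordered by day_diff, so with pandas it may be worth sorting by company_tic and day_diff before calling cumprod.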
I have a view that converts fiscal year periods to calendar periods, creating a new column called "NewPeriod". I would then like to create a date from this "NewPeriod" column using the Date() function, Date(Year, NewPeriod, "1"). I am unable to use NewPeriod in the Date function. Is there a way I can accomplish this in the same view?
SELECT distinct
company_code,
Period,
Year,
CASE COMPANY_CODE
WHEN 1 THEN CASE Period
WHEN 4 THEN 1
WHEN 5 THEN 2
WHEN 6 THEN 3
WHEN 7 THEN 4
WHEN 8 THEN 5
WHEN 9 THEN 6
WHEN 10 THEN 7
WHEN 11 THEN 8
WHEN 12 THEN 9
WHEN 1 THEN 10
WHEN 2 THEN 11
WHEN 3 THEN 12
ELSE
Period
END
Else Period
END AS NewPeriod
FROM
`table`
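A column alias generally cannot be referenced by another expression in the same SELECT list, which is one likely reason NewPeriod is not usable inside Date(). A common workaround is to compute the alias in a subquery and build the date in the outer query. The sketch below illustrates that shape using Python's built-in sqlite3 module as a stand-in engine; the table name periods, the sample rows, the PeriodStart column, and the use of printf in place of Date() are all assumptions for illustration. The modulo expression is only a compact stand-in for the CASE mapping above (4 to 1, ..., 12 to 9, 1 to 10, 2 to 11, 3 to 12).

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE periods (company_code INTEGER, Period INTEGER, Year INTEGER)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?)",
    [(1, 4, 2020), (1, 3, 2020), (2, 7, 2020)],
)

# The inner query defines NewPeriod; the outer query can then reference it.
query = """
SELECT company_code, Period, Year, NewPeriod,
       printf('%04d-%02d-01', Year, NewPeriod) AS PeriodStart
FROM (
    SELECT DISTINCT company_code, Period, Year,
           CASE WHEN company_code = 1 AND Period BETWEEN 1 AND 12
                THEN (Period + 8) % 12 + 1   -- same mapping as the CASE list
                ELSE Period
           END AS NewPeriod
    FROM periods
) AS p
ORDER BY company_code, Period
"""
converted = conn.execute(query).fetchall()
for row in converted:
    print(row)
```

In the original dialect the outer query would presumably call Date(Year, NewPeriod, 1) where this sketch uses printf; the subquery structure is the part that carries over.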
I am using SQL Server Management Studio. My data set is below.
ID Days Value Threshold
A 1 10 30
A 2 20 30
A 3 34 30
A 4 25 30
A 5 20 30
B 1 5 15
B 2 10 15
B 3 12 15
B 4 17 15
B 5 20 15
I want to run a query so that, for each ID, only the rows from the point where the threshold is first crossed onward are selected. I also want to create a new days column that starts at 1 on the first selected row. The expected output for the above data set looks like this:
ID Days Value Threshold NewDayColumn
A 3 34 30 1
A 4 25 30 2
A 5 20 30 3
B 4 17 15 1
B 5 20 15 2
It doesn't matter if the data goes below the threshold for the latter rows, I want to take the first row when threshold is crossed as 1 and continue counting rows for the ID.
Thank you!
You can use window functions for this. Here is one method:
select t.*, row_number() over (partition by id order by days) as newDayColumn
from (select t.*,
min(case when value > threshold then days end) over (partition by id) as threshold_days
from t
) t
where days >= threshold_days;
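As a way to check the query against the sample data, the sketch below runs it through Python's built-in sqlite3 module, used only as a stand-in for SQL Server. The table name t and the column names follow the answer; the in-memory database, the derived-table alias s, and the variable names are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id TEXT, days INTEGER, value INTEGER, threshold INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 10, 30), ("A", 2, 20, 30), ("A", 3, 34, 30),
        ("A", 4, 25, 30), ("A", 5, 20, 30),
        ("B", 1, 5, 15), ("B", 2, 10, 15), ("B", 3, 12, 15),
        ("B", 4, 17, 15), ("B", 5, 20, 15),
    ],
)

# Inner query: first day on which value exceeds threshold, per id.
# Outer query: keep rows from that day onward and renumber them from 1.
query = """
SELECT id, days, value, threshold,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY days) AS newDayColumn
FROM (
    SELECT t.*,
           MIN(CASE WHEN value > threshold THEN days END)
               OVER (PARTITION BY id) AS threshold_days
    FROM t
) AS s
WHERE days >= threshold_days
ORDER BY id, days
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the WHERE clause is applied before ROW_NUMBER is evaluated, the numbering restarts at 1 on the first row at or after the crossing. An ID that never crosses its threshold gets a NULL threshold_days and therefore drops out entirely.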
For example, if I have the data below:
Week 1 Week 2 Week 3
S M T W T F S S M T W T F S S M T W T F S
2 5 6 7 5 5 3 4 5 7 2 4 3 2 4 5 2 1 2 7 8
If today is Monday, my average will be (5+5+5)/3, which is 5. Tomorrow it will be (6+7+2)/3, which is 5 again, and the day after it will be (7+2+1)/3, which is 3.33.
How do I get this in Tableau?
First, you can use "Weekday" as a column or row (by right-clicking on the date).
Then you can simply add a Table Calculation "Moving Average" with a specific computing dimension "Week of [Date]".
=> Table Calculation Specifics <=
=> Result <=
Data source used: Tableau Sample Superstore.
You can do the following:
Columns: Week(Order Date)
Rows: Weekday(Order Date)
Put Sales in Text.
Right-click Sales > Quick Table Calculation > Moving Average
Right-click Sales > Edit Quick Table Calculation
Set the following:
Moving along: "Table across"
Previous values: 4
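The arithmetic the question describes, averaging the same weekday across the three weeks, can be checked outside Tableau with a short sketch. It uses only the numbers from the question; the names weeks, weekdays, and weekday_avg are assumptions for illustration.

```python
# Daily values from the question, one list per week, Sunday through Saturday.
weeks = [
    [2, 5, 6, 7, 5, 5, 3],  # Week 1
    [4, 5, 7, 2, 4, 3, 2],  # Week 2
    [4, 5, 2, 1, 2, 7, 8],  # Week 3
]
weekdays = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# zip(*weeks) regroups the values by weekday: all Sundays, all Mondays, ...
weekday_avg = {
    day: sum(values) / len(values)
    for day, values in zip(weekdays, zip(*weeks))
}

for day in weekdays:
    print(day, round(weekday_avg[day], 2))
```

This averages over every week in the data. A Tableau moving average computed along the week axis behaves the same way only when the number of previous values covers exactly the weeks you want included.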
I have a table that looks like this:
Name Post Like Share Comment Date
--------------------------------------------
Sita test data 1 5 2 4 28/4/2015
Munni test data 2 5 2 5 27/4/2015
Shila test data 3 1 3 1 22/4/2015
Ram Test data 4 5 0 5 1/4/2015
Sam Test data 5 4 0 2 2/4/2015
Jadu Test data 6 1 5 2 30/3/2015
Madhu Test data 7 5 0 4 10/4/2015
Now I want my result set like this:
Type Name Post Like Share Comment Date
-------------------------------------------------------------------------
Today Sita test data 1 5 2 4 28/4/2015
Last 7 Days Sita test data 1 5 2 4 28/4/2015
Last 7 Days Munni test data 2 5 2 5 27/4/2015
Last 7 Days Shila test data 3 1 3 1 22/4/2015
Last 30 Days Sita test data 1 5 2 4 28/4/2015
Last 30 Days Munni test data 2 5 2 5 27/4/2015
Last 30 Days Shila test data 3 1 3 1 22/4/2015
Last 30 Days Ram Test data 4 5 0 5 1/4/2015
Last 30 Days Sam Test data 5 4 0 2 2/4/2015
Last 30 Days Jadu Test data 6 1 5 2 30/3/2015
Last 30 Days Madhu Test data 7 5 0 4 10/4/2015
"Today" must contain only today's posts. "Last 7 Days" must contain today's posts plus the posts from the previous 7 days. "Last 30 Days" must contain all posts from the last 30 days.
A couple of unions with different case statements to get the date range would work.
Use union all and dateadd:
-- getdate() includes the time of day, so cast to date before comparing
select 'Today' as Type, Name, Post, [Like], Share, Comment, [Date]
from yourtable
where cast([Date] as date) = cast(getdate() as date)
union all
select 'Last 7 Days' as Type, Name, Post, [Like], Share, Comment, [Date]
from yourtable
where [Date] >= DateAdd(day,-7,cast(getdate() as date))
union all
select 'Last 30 Days' as Type, Name, Post, [Like], Share, Comment, [Date]
from yourtable
where [Date] >= DateAdd(day,-30,cast(getdate() as date))
BTW, terrible choice for column names (don't use reserved words).
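To see the union all approach produce the three buckets from the question, here is a minimal sketch run through Python's built-in sqlite3 module as a stand-in for SQL Server. Several things are assumptions for illustration: the table name posts, the renamed PostDate column (chosen to avoid a reserved word), dates stored as ISO text, the omission of the Like, Share, and Comment columns for brevity, and a fixed "today" of 28/4/2015 so the result is reproducible. SQLite's date() modifier replaces DateAdd here.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database, used only as a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (Name TEXT, Post TEXT, PostDate TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        ("Sita", "test data 1", "2015-04-28"),
        ("Munni", "test data 2", "2015-04-27"),
        ("Shila", "test data 3", "2015-04-22"),
        ("Ram", "Test data 4", "2015-04-01"),
        ("Sam", "Test data 5", "2015-04-02"),
        ("Jadu", "Test data 6", "2015-03-30"),
        ("Madhu", "Test data 7", "2015-04-10"),
    ],
)

today = "2015-04-28"  # fixed date (assumption) instead of the current date

# One SELECT per bucket; a row may appear in more than one bucket.
query = """
SELECT 'Today' AS Type, Name, Post, PostDate FROM posts
WHERE PostDate = :today
UNION ALL
SELECT 'Last 7 Days', Name, Post, PostDate FROM posts
WHERE PostDate >= date(:today, '-7 days')
UNION ALL
SELECT 'Last 30 Days', Name, Post, PostDate FROM posts
WHERE PostDate >= date(:today, '-30 days')
"""
rows = conn.execute(query, {"today": today}).fetchall()
counts = Counter(row[0] for row in rows)
print(counts)
```

UNION ALL is used instead of UNION because the Type label already makes every row distinct, so there are no duplicates to remove.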