How to build auto_arima with seasonality using historical data

I have 5 years of temperature data at 15-minute intervals, and I want to use it to forecast the next 2 days. I am confused about the p, d, q and P, D, Q values in ARIMA.

Use auto.arima(); it automatically searches for the best values of p, d, q, and of the seasonal P, D, Q when the series has a seasonal frequency set.
There is therefore no need to specify them manually.
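For the seasonal part of the search, the series needs a seasonal period. A minimal sketch of the arithmetic for the data in the question, assuming (this is not stated in the question) that the dominant cycle in the temperature data is daily:

```python
# Assumption: the dominant seasonal cycle in the temperature data is daily.
MINUTES_PER_DAY = 24 * 60
SLOT_MINUTES = 15  # one observation every 15 minutes, as in the question

# Observations per daily cycle: the seasonal period of the series.
seasonal_period = MINUTES_PER_DAY // SLOT_MINUTES

# Number of steps needed to cover the next 2 days.
horizon = 2 * seasonal_period

print(seasonal_period, horizon)
```

In R these two numbers would typically be passed as ts(x, frequency = seasonal_period) before calling auto.arima(), and as h = horizon when forecasting. A seasonal search with a period this long can be slow, so fitting on a recent subset of the 5 years may be worth trying first.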

Related

Price rate of change over 12 periods using pandas dataframe

Dear all,
I want to calculate the price rate of change by dividing today's price by the price 12 periods ago.
"df.close" is the column for which I want to calculate the rate of change.
Please guide me.
I have not been able to try anything, because I think this is a lookup 12 days back rather than a continuous (rolling) window.
I have seen some similar answers, but they involve long code that is not relevant to me. I simply need one line that fetches the price from 12 days ago and divides today's price by it.
Thank you.
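One possible one-liner, sketched on made-up prices (only the column name close comes from the question; the data and the new column names are illustrative). It assumes the rows are sorted in time and that one row is one period, since shift(12) moves by 12 rows, not by 12 calendar days:

```python
import pandas as pd

# Hypothetical closing prices; in the question this is the existing df.close column.
df = pd.DataFrame({"close": [float(p) for p in range(100, 115)]})

# Current price divided by the price 12 rows earlier; the first 12 rows are NaN.
df["roc_ratio"] = df["close"] / df["close"].shift(12)

# The same idea expressed as a percentage change over 12 periods.
df["roc_pct"] = df["close"].pct_change(periods=12) * 100

print(df.tail(3))
```

No rolling window is needed here: shift only looks up a single earlier row for each position.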

Need to divide a DataFrame into various tables using multiple categories and datetime

This is my first time asking a question here, so if I'm doing something wrong, please guide me to the right place. I have a big, clean dataset with shape (29000+, 24). I have to calculate the churn rate based on 4 different categorical columns, and I'm given just 1 column that contains the subs for a given period. I have a date column too. My idea for calculating the churn is:
churn_rate = (Sub_start_period - Sub_end_period) / Sub_start_period * 100
The Problem
I don't know how to group the data using these 4 different categorical variables. Also, if I manage to do so, I would end up with more than 200 different tables, so I don't believe this would be a good approach.
My goal is to be able to predict the churn rate using the information in the table, but first I need to be able to determine the churn rate based on these variables. The churn is not given; it has to be calculated, and I can't think of a way of working through this.
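A grouped calculation avoids building hundreds of separate tables: one result table holds one row per category combination. The sketch below is only an illustration; the column names (date, region, plan, subs) are placeholders, only two of the four categorical columns are shown, and it assumes the subs value at the earliest and latest date in each group are the period start and end figures:

```python
import pandas as pd

# Hypothetical data: column names are placeholders, not from the original dataset.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01"]),
    "region": ["north", "north", "south", "south"],
    "plan": ["basic", "basic", "pro", "pro"],
    "subs": [200, 150, 100, 90],
})
group_cols = ["region", "plan"]  # extend this list to all 4 categorical columns

# One row per category combination: subs at the first and last date of the period.
summary = (
    df.sort_values("date")
      .groupby(group_cols)["subs"]
      .agg(sub_start="first", sub_end="last")
      .reset_index()
)

# Same formula as in the question, applied to every group at once.
summary["churn_rate"] = (
    (summary["sub_start"] - summary["sub_end"]) / summary["sub_start"] * 100
)
print(summary)
```

The resulting summary table can then serve as the target for a prediction model, with the categorical columns as features.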

Select Data between current time - 15 mins and current time in SQL

I am looking to pull data from a time window only 15 to 30 minutes wide. I want to be able to rerun the code repeatedly to keep updating the data I have already pulled. I know there is a function for the current system time, but I am unable to use it effectively in SQL Developer.
I have tried using the function CURRENT_TIMESTAMP but could not get it to work effectively.
Currently I am using the following code and pulling over a broad time frame, but I would like to shrink that down to 15 to 30 minute intervals that could be used to keep pulling updated data.
I expect to be able to pull current data within 15 to 30 minute segments of time.
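The general pattern is a predicate bounded by "now minus 15 minutes" and "now". The sketch below illustrates it in Python against an in-memory SQLite database, so that it is runnable anywhere; the table readings and column event_time are hypothetical. In Oracle the same window is usually written directly in SQL, for example WHERE event_time BETWEEN SYSTIMESTAMP - INTERVAL '15' MINUTE AND SYSTIMESTAMP, assuming event_time is a timestamp column.

```python
import sqlite3
from datetime import datetime, timedelta

def recent_rows(conn, now, minutes=15):
    """Return rows whose event_time lies in [now - minutes, now]."""
    start = now - timedelta(minutes=minutes)
    cur = conn.execute(
        "SELECT id, event_time FROM readings "
        "WHERE event_time BETWEEN ? AND ? ORDER BY event_time",
        (start.isoformat(sep=" "), now.isoformat(sep=" ")),
    )
    return cur.fetchall()

# Hypothetical table and rows, used only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, event_time TEXT)")
now = datetime(2024, 1, 1, 12, 0, 0)  # fixed "current time" for a repeatable demo
sample = [
    (1, now - timedelta(minutes=40)),  # outside the window
    (2, now - timedelta(minutes=10)),  # inside the window
    (3, now - timedelta(minutes=1)),   # inside the window
]
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, t.isoformat(sep=" ")) for i, t in sample],
)

result = recent_rows(conn, now)
print(result)
```

Rerunning recent_rows with the real current time (datetime.now()) returns whatever fell into the latest 15 minutes; passing minutes=30 widens the window.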

Pentaho data integration issue with loading a kettle based on some condition

I have a Pentaho Data Integration job which has the following steps:
A Generate rows step, which has an initial date (e.g. 2010-01-01) and a limit of 10*366 = 3660 rows for 10 years.
The next step has an incrementer to increment the number of days.
The next step uses this information, i.e. the initial date, the limit, and the incrementer, to generate a date for each day for 10 years starting 2010-01-01 using JavaScript functions.
Final step loads a table with the generated dates.
All this works fine.
Now, I have a requirement where I do not want this table to be static, with dates for 10 years only. If the max date in the date table is within 2 years of today, I want to load dates for 10 more years into the table.
For the above example, with the 1st load loading dates for 10 years from 2010, I should be able to load 10 more years in 2018, the next 10 years in 2028 and so on and so forth.
What will be the best way to achieve this?
How can I:
1) Read the max date from my date table? - I know how to do this.
2) Compare the date I read against today and, if the max date is within 2 years of today, populate the table with the next 10 years.
I don't know how to do 2) above in Pentaho Data Integration. I would really appreciate any pointers on a way to resolve this issue.

You need to read the current date (today) into a field, for example with a Get System Info step.
Then you can compare the two fields, max date and today, with a Filter Rows step.
As the previous step may give you more than one row, you need to use either a Unique rows step (no field to provide) or a Group by step (no group-by field).
If a row passes the filter, you launch your generate-10-years process. As you cannot have a hop from a step into this second Generate rows step, you must use a Transformation executor to launch your existing transformation.
Now, if your requirement gets even slightly more complex than that, I strongly suggest using jobs to orchestrate your transformations.
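The condition that the Filter Rows step has to express can be sketched outside Pentaho as follows. This is plain Python for illustration only, not PDI code; the function names are hypothetical, and a year is approximated as 365 days:

```python
from datetime import date, timedelta

def needs_extension(max_date, today, threshold_years=2):
    """True when max_date is within threshold_years of today (or already past)."""
    # Approximation: a year is counted as 365 days.
    cutoff = today + timedelta(days=365 * threshold_years)
    return max_date <= cutoff

def next_dates(max_date, years=10):
    """Daily dates following max_date, 366 rows per year as in the question."""
    return [max_date + timedelta(days=i) for i in range(1, years * 366 + 1)]

# Hypothetical values: the table currently ends on 2020-01-07.
max_date = date(2020, 1, 7)
today = date(2018, 6, 1)

new_dates = next_dates(max_date) if needs_extension(max_date, today) else []
print(len(new_dates), new_dates[:1])
```

In the transformation, needs_extension corresponds to the Filter Rows comparison and next_dates to the existing generate-dates transformation, started from max date + 1 day instead of a fixed initial date.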

BigQuery - Table decorators changed weirdly

I used to have a number of queries running on the past 40 days of data using a decorator with [dataset.table#-4123456789-].
However, since September 15, all the decorators return a maximum of 10 days of data.
By the way, [dataset.table#0] returns the whole table and not the past 7 days as stated in the documentation.
Does anyone know what is going on? Do I have to move my table to a partitioned table in order to retrieve data for a limited period longer than a week?
Thanks