LSTM for time series classification - tensorflow2.0

I have records for 2 dates, with 56 features and 5 classes. How can I use an LSTM for time series classification?
I have used a time series generator for each date's records.
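A minimal sketch of the windowing step, assuming the records for one date form an array of shape (time steps, 56) with one integer class label (0 to 4) per time step. The function name `make_windows`, the window length and the random data are hypothetical and only illustrate the shapes involved.

```python
import numpy as np

def make_windows(features, labels, window):
    """Cut a (time, n_features) array into overlapping windows.

    Each window is labelled with the class of its last time step.
    """
    features = np.asarray(features, dtype=np.float32)
    labels = np.asarray(labels)
    n_windows = len(features) - window + 1
    if n_windows <= 0:
        raise ValueError("window is longer than the series")
    # Result shape: (n_windows, window, n_features)
    X = np.stack([features[i:i + window] for i in range(n_windows)])
    y = labels[window - 1:]
    return X, y

# Hypothetical data: 100 time steps, 56 features, 5 classes
rng = np.random.default_rng(0)
series = rng.normal(size=(100, 56))
classes = rng.integers(0, 5, size=100)
X, y = make_windows(series, classes, window=10)
```

The resulting `X` has the (samples, time steps, features) layout that Keras LSTM layers take as input, and `y` holds one integer class per window.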

Related

Different Correlation Coefficient for Different Time Ranges

I built a DataFrame containing the following data:
Daily Price of Gas Future of N Day;
Daily Price of Petroil Future of N Day;
Daily Price of Day-Ahead Electricity Market in Italy;
The data cover the 2010 to 2022 time range, so 12 years of historical data.
The DataFrame head looks like this:
     PETROIL       GAS  ELECTRICITY
0  64.138395  2.496172    68.608696
1  65.196161  2.482612   113.739130
2  64.982403  2.505938   112.086957
3  64.272606  2.500000   110.043478
4  65.993436  2.521739    95.260870
On this DataFrame I tried to build the correlation matrix with the pandas method .corr() and ran into one big issue:
If I take all 12 years of data I get:
almost zero correlation between the Electricity and Petroil prices;
low correlation (0.12) between the Electricity and Gas prices.
If instead I split the data into three time ranges (2010-2014; 2014-2018; 2018-2022), I get much higher correlations (between 0.40 and 0.60 in both cases).
So I have two questions:
Why do I get such a large difference when I split the time ranges?
Considering that I am doing this analysis in order to use the Petroil and Gas prices to predict the electricity price, which of the two analyses should I rely on? The first one (with low correlation), which considers the entire time range, or the second one (with higher correlation), which is split into different time ranges?
Thank you for your answers.
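One possible mechanism, sketched on synthetic data rather than on the real prices: if the price levels shift between periods, the correlation over the pooled data can differ sharply from the correlation inside each period, even in sign. The three regimes, their levels and the noise scales below are invented for illustration, and the per-period comparison is only a suggestion for checking whether something similar happens in the real DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300  # observations per hypothetical period

parts = []
# Invented regimes: inside each one electricity follows gas,
# but the average levels move in opposite directions across regimes.
for name, gas_level, elec_level in [("p1", 2.0, 100.0),
                                    ("p2", 4.0, 70.0),
                                    ("p3", 6.0, 40.0)]:
    gas = gas_level + rng.normal(0, 0.3, n)
    elec = elec_level + 20 * (gas - gas_level) + rng.normal(0, 5, n)
    parts.append(pd.DataFrame({"period": name, "GAS": gas, "ELECTRICITY": elec}))

df = pd.concat(parts, ignore_index=True)

# Correlation over the whole sample
pooled = df["GAS"].corr(df["ELECTRICITY"])

# Correlation inside each period
by_period = pd.Series({
    name: group["GAS"].corr(group["ELECTRICITY"])
    for name, group in df.groupby("period")
})

print("pooled:", round(pooled, 2))
print(by_period.round(2))
```

In this synthetic case the within-period correlations are clearly positive while the pooled one is negative, because the shifts in level between periods dominate the pooled covariance.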

Sampling from a DataFrame on a daily basis

My data frame holds 3 months of daily data, with a different number of samples for each day. For example, 1st January has 20K rows and 2nd January has 15K rows.
I want to take the mean number of rows per day and sample that many rows from every day.
For example, if the mean is 8K, I want 8K random rows from the 1st January data, 8K random rows from the 2nd January data, and so on.
As far as I know, rand() draws random values from the whole data frame, but I need to sample per day. The date is stored in a column of the data frame.
Thanks
You can use groupby(...).sample after computing the mean number of records per date:
# Suppose 'date' is the name of your column
sample = df.groupby('date').sample(n=int(df['date'].value_counts().mean()))
# Or
g = df.groupby('date')
sample = g.sample(n=int(g.size().mean()))
Update
Is there any solution for the dates whose number of rows is lower than the mean? For those dates I get this error: Cannot take a larger sample than population when 'replace=False'
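import numpy as np
# Sample with replacement so that small days do not raise, then drop repeated rows;
# a day may therefore end up with fewer than n rows (assumes a unique index)
n = np.floor(df['date'].value_counts().mean()).astype(int)
sample = (df.groupby('date').sample(n, replace=True)
            .loc[lambda x: ~x.index.duplicated()])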
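As an alternative to sampling with replacement, here is a minimal sketch that caps the sample size at each day's own row count, so large days give exactly n rows and small days keep all of theirs. The helper name `sample_per_day` and the toy data are hypothetical, and the date column is assumed to be called 'date' as in the answer above.

```python
import pandas as pd

def sample_per_day(df, date_col="date", random_state=None):
    """Sample the mean daily row count from each day, without replacement.

    Days with fewer rows than the mean keep all of their rows.
    """
    n = int(df[date_col].value_counts().mean())
    parts = [
        group.sample(n=min(len(group), n), random_state=random_state)
        for _, group in df.groupby(date_col)
    ]
    return pd.concat(parts)

# Toy data: 10, 4 and 1 rows per day, so the mean is 5
toy = pd.DataFrame({
    "date": ["d1"] * 10 + ["d2"] * 4 + ["d3"] * 1,
    "value": range(15),
})
result = sample_per_day(toy, random_state=0)
sizes = result.groupby("date").size().to_dict()
```

With the toy data, `sizes` shows 5 rows for the large day and the full 4 and 1 rows for the two small days.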

Calculating RMSE / R-squared / forecast error metrics through SQL BigQuery

I'm having trouble figuring out how to evaluate different sets of forecast values using GoogleSQL.
I have a table as follows:
date      | country | actual_revenue | forecast_rev_1 | forecast_rev_2 |
-----------------------------------------------------------------------
x/x/xxxx  | ABC     | 134644.64      | 153557.44      | 103224.35      |
.
.
.
The table is partitioned by date and country, and consists of 60 days' worth of actual revenue and forecast revenue from different forecast modelling configurations.
I have 3 questions:
1. Is calculating R-squared / Root Mean Square Error (RMSE) the best way to evaluate model accuracy in SQL?
2. If yes to #1, do I calculate the R-squared / RMSE for each row and take the average of those values?
3. Is there some GoogleSQL functionality or a better optimized way of doing this?
I'm not sure about the best way to do this, as I'm not familiar with statistics.
I'd appreciate any guidance here. Thank you!
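On the per-row question: both metrics are aggregates over many rows, not averages of per-row values. RMSE is the square root of the mean squared error, and R-squared needs the variance of the actuals, which a single row does not have. A minimal sketch in Python (not GoogleSQL), using invented numbers and the column names from the table above:

```python
import numpy as np
import pandas as pd

def rmse(actual, forecast):
    """Root mean square error over all rows."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def r_squared(actual, forecast):
    """1 - SS_res / SS_tot, computed over all rows."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ss_res = np.sum((actual - forecast) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Invented numbers, not taken from the real table
table = pd.DataFrame({
    "actual_revenue": [100.0, 200.0, 300.0],
    "forecast_rev_1": [110.0, 190.0, 310.0],
    "forecast_rev_2": [200.0, 300.0, 400.0],
})

scores = {
    col: (rmse(table["actual_revenue"], table[col]),
          r_squared(table["actual_revenue"], table[col]))
    for col in ["forecast_rev_1", "forecast_rev_2"]
}
```

The same aggregates can probably be written in SQL as one grouped query per forecast column, for example with SQRT and AVG over the squared differences, optionally grouped by country.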

How can I divide data into train and test sets for churn prediction?

I have 7 months of telecom data. The definition of churn for telecom is "a customer who does not pay the monthly bill for 3 consecutive months is a churner", and I have to predict who will churn in the future. My question is: how can I divide my data into train and test sets to get better results? So far I dropped one month and divided the remaining data into two parts of 3 months each, then built the churn flag from the test part and attached it to the train part, but the results are not good. Can you help me out?
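One possible layout, sketched under assumptions: use the first 4 months as the observation window for features and the last 3 months as the outcome window, flagging a customer as churned when all 3 outcome months are unpaid. The table layout (`customer_id`, `month`, `paid`), the helper name `split_windows` and the toy data are hypothetical, and this is not the only reasonable way to split 7 months.

```python
import pandas as pd

def split_windows(payments, feature_months, label_months):
    """Build features from early months and a churn flag from later months."""
    # One row per customer, one boolean column per month
    wide = payments.pivot(index="customer_id", columns="month",
                          values="paid").astype(bool)
    X = wide[feature_months]
    # Churn = no payment in every month of the outcome window
    y = (~wide[label_months]).all(axis=1).rename("churn")
    return X, y

# Toy data: customer A always pays, customer B stops paying in months 5-7
payments = pd.DataFrame({
    "customer_id": ["A"] * 7 + ["B"] * 7,
    "month": list(range(1, 8)) * 2,
    "paid": [True] * 7 + [True, True, False, True, False, False, False],
})
X, y = split_windows(payments, feature_months=[1, 2, 3, 4],
                     label_months=[5, 6, 7])
```

The customers in `X` and `y` can then be divided into train and test groups, so that no feature is computed from the months that define the label.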

Test for change in month from list of dates

I want to code something in VBA that identifies each change in month, for example from January to February.
In the example below, I want to take the input from column B and write the output to column C. The output should be:
Test when a month change occurs.
Train when the month is the same as before.
Example data:
A    B           C
1    29/12/2006  Train
2    01/01/2007  Test
3    02/01/2007  Train
4    03/01/2007  Train
5    04/01/2007  Train
6    05/01/2007  Train
..
100  01/07/2007  Test
Here's another way. A1-Day(A1) will always return the last day of the preceding month. So:
B2: =IF((A1-DAY(A1))=(A2-DAY(A2)),"Train","Test")
Something like this would probably do the job:
=IF(And(Month(A4)=Month(A3);Year(A4)=Year(A3));"Train";"Test")
It compares both the months and the years of A4 and A3.
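For comparison, the same month-change logic can be sketched in Python with pandas (this is not VBA). The function name `label_month_changes` is hypothetical, dates are assumed to be day/month/year strings as in the example, and the first row is labelled Train because it has no previous row to compare with.

```python
import pandas as pd

def label_month_changes(dates):
    """Return 'Test' where the month differs from the previous row, else 'Train'."""
    parsed = pd.to_datetime(pd.Series(dates), format="%d/%m/%Y")
    month = parsed.dt.to_period("M")   # year and month together
    previous = month.shift()           # month of the row above
    changed = previous.notna() & (month != previous)
    return changed.map({True: "Test", False: "Train"}).tolist()

labels = label_month_changes(
    ["29/12/2006", "01/01/2007", "02/01/2007", "03/01/2007"]
)
```

Comparing year-month periods covers the year check as well, so December 2006 followed by December 2007 would also count as a change.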