Tensorflow LSTM NN with broken data feed - tensorflow

I am looking to create a TensorFlow LSTM NN using time series data that covers only a portion of each day. For instance, the data is only recorded into a CSV for training between 7am and 7pm, as that is the only time relevant to the NN's predictions. Is it possible to train on this with a standard LSTM setup, or do I need to implement other processes to handle a 'broken' or 'misaligned' data feed like this?

Related

LSTM multivariate predicting multiple features

I am new to neural networks and LSTMs. I hope to get some guidance from you, and I will be thankful for it.
I have 2 years of historical bitcoin price data and a bitcoin sentiment dataset, both at one-hour intervals. My goal is to predict the next 60 hours of the chart using an LSTM.
I have seen some articles on multivariate time series prediction, but all of them predict only one feature: the price for one upcoming day. To predict the next 2 months of data, I have to predict all of the features, so that I can feed the predicted data back in as input for the next prediction, and so on for the next 60 days.
Can someone help me figure out how to do this kind of prediction?
Edit:
The dataset looks like this:
timestamp,close,sentiment
2020-05-01_00,8842.85,0.21
2020-05-01_01,8824.43,0.2
2020-05-01_02,8745.91,0.2
2020-05-01_03,8639.12,0.19
2020-05-01_04,8625.69,0.2
I would like to use TensorFlow as the backend. So far I have not written any code for building the model, as I need to know what to do before I start coding.
The idea is to give 100 or 150 rows of data as input to the model and then forecast for the next 60 hours by seeding the prediction of the model as the input for the next prediction.
It would help if you shared some code showing how you are constructing your model and what your data looks like. How is your sentiment data encoded, and what framework are you using (TensorFlow, PyTorch, etc.)? I am mostly familiar with TensorFlow, so I'll point you in that direction.
In general it can be helpful to use an Input layer. Keep in mind that LSTMs expect a 3D tensor of shape [batch, timesteps, features].
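To illustrate the 3D shape, here is a minimal sketch that slices a two-feature series into sliding windows. The helper name `make_windows`, the window length, and the toy values are assumptions for illustration and are not part of the original answer.

```python
import numpy as np

def make_windows(series, window):
    """Slice a (time, features) array into overlapping windows.

    Returns X with shape (batch, timesteps, features) and y holding
    the row that follows each window.
    """
    series = np.asarray(series, dtype=np.float32)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window:]
    return X, y

# Toy data: 10 hourly rows of (close, sentiment); the values are made up.
rows = np.column_stack([np.linspace(8800.0, 8600.0, 10), np.full(10, 0.2)])
X, y = make_windows(rows, window=4)
print(X.shape)  # (6, 4, 2)
print(y.shape)  # (6, 2)
```

With the real dataset, `window` would be the 100 or 150 rows mentioned in the question, and the last axis would hold the `close` and `sentiment` columns.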
You might want to consider a non-sequential model architecture built with the functional API. If you went that route, you could have 2 separate inputs, one being the price time series and the other being the sentiment time series.
Pass each to its own LSTM, then concatenate/combine the outputs and pass them to Dense layers or even convolutional layers.
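Lastly, you could also look into ConvLSTM2D, which takes a 5D tensor: [samples, time, channels, rows, cols] with data_format='channels_first', or [samples, time, rows, cols, channels] with the default 'channels_last'.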
#------------------ Response (👇) post update: -------------------------
View the notebook here
import tensorflow as tf

#===========Design Model Architecture:
#==== Create Input Layers:
Price_Input = tf.keras.layers.Input(shape=(60,), name='Price_Input')  # Price as input
Sent_Input = tf.keras.layers.Input(shape=(60,), name='Sentiment_Input')  # Sentiment as input
#=== Handle Reshaping as part of the Model Architecture:
# Each branch goes from (60,) to (60, 1) so the LSTM sees [timesteps, features].
P_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Price_Reshape')(Price_Input)  # Pass price to reshape layer
S_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Sentiment_Reshape')(Sent_Input)  # Pass sentiment to reshape layer
#=== Use LSTM layers for timeseries:
P_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Price_LSTM')(P_Input_rshp)  # Price-focused LSTM
S_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Sentiment_LSTM')(S_Input_rshp)  # Sentiment-focused LSTM
C_x = tf.keras.layers.Concatenate(name='Concat')([P_x, S_x])  # Concatenate (join) the outputs of each branch
Output = tf.keras.layers.Dense(units=1, name='Dense')(C_x)  # Dense layer as model output to synthesize results
#============== Create Model Graph:
model = tf.keras.Model(inputs=[Price_Input, Sent_Input],
                       outputs=Output,
                       name='Double_LSTM_Model')
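The question also asks how to forecast 60 hours ahead by feeding each prediction back in as input. A minimal sketch of that loop follows, assuming a model that returns the next (close, sentiment) row. `recursive_forecast` and `naive_predict_next` are hypothetical names. The naive predictor is only a placeholder so the loop runs without a trained network, and a real setup would call the trained model there instead.

```python
from collections import deque

def recursive_forecast(history, predict_next, steps, window):
    """Forecast `steps` rows by feeding each prediction back as input."""
    if len(history) < window:
        raise ValueError("need at least `window` rows of history")
    buffer = deque(history[-window:], maxlen=window)
    forecast = []
    for _ in range(steps):
        next_row = predict_next(list(buffer))
        forecast.append(next_row)
        buffer.append(next_row)  # the oldest row drops out automatically
    return forecast

def naive_predict_next(window_rows):
    # Hypothetical placeholder for a trained model: repeat the last
    # observed change in close and keep the last sentiment value.
    (prev_close, _), (last_close, last_sent) = window_rows[-2], window_rows[-1]
    return (last_close + (last_close - prev_close), last_sent)

history = [(8842.85, 0.21), (8824.43, 0.20), (8745.91, 0.20),
           (8639.12, 0.19), (8625.69, 0.20)]
forecast = recursive_forecast(history, naive_predict_next, steps=60, window=3)
print(len(forecast))  # 60
```

Because every step consumes earlier predictions, errors accumulate over the horizon, so the later hours of such a forecast are usually less reliable than the first few.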

Training with tensorflow and colab

I'm working on a project that involves temperature readings and LEDs. I need to train the connection between a certain temperature and the LED state, but only with TensorFlow and code, not the real LED. For example, if the temperature is 37 degrees, the LED is ON, and if the temperature is 39, the LED is OFF. What can I do to train this kind of connection between these variables?
The first step is to generate a csv dataset with the true value labels for the temperatures.
For example:
21,1
37,1
39,0
50,0
Then split this dataset into training and testing sets. A common split is 80% (training) to 20% (testing). Then use the training data to train your TensorFlow model, which will have a single output that is either a 1 or a 0.
Once you have trained your model to fit this data, you can use the predict function to determine whether the LED should be on or off.
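The dataset and split steps can be sketched with the standard library alone. The 38-degree threshold, the temperature range, and the helper names below are assumptions chosen to match the example rows above. The TensorFlow training step itself is left out of this sketch.

```python
import csv
import io
import random

THRESHOLD = 38  # assumed cut-off: LED on below 38 degrees, off otherwise

def make_rows(temperatures):
    """Label each temperature: 1 = LED on, 0 = LED off."""
    return [(t, 1 if t < THRESHOLD else 0) for t in temperatures]

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = make_rows(range(20, 60))   # 40 labelled examples
buffer = io.StringIO()            # in-memory stand-in for a .csv file
csv.writer(buffer).writerows(rows)

train, test = train_test_split(rows)
print(len(train), len(test))  # 32 8
```

Shuffling before the split keeps both ON and OFF examples in each part, which matters here because the generated rows are ordered by temperature.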

What is the best machine Learning model to train time series data? [ Not forecasting ]

I have a set of time series data belonging to 5 different classes. [EEG data (1 data point per second). The data has been divided into 30-40 second epochs, and each epoch is classified into one of the classes A, B, C, D, E.] So basically I have around 13500 labelled epochs.
[10,5,48,75,1,...,22,45,8] = A
[26,47,8,77,4,...,56,88,96] = B, and likewise for the other classes
What I did was feed this data directly to a neural network and train the model, but the accuracy was very low, around 40%. What I want to know is: rather than just using a plain neural network, what is the best model to train on time series data?
For time series data, some architectures perform quite well:
Recurrent neural networks (with LSTM or GRU cells, for example), designed to train on sequences of data
This could be an example : https://arxiv.org/pdf/1812.04818.pdf
How this works inside : link
Example implementation in keras : link
From there, you should find or design your own architecture
TCN (temporal convolutional network), which uses causal and dilated convolutions to capture time series data
Example : https://arxiv.org/pdf/1905.03806.pdf
How this works : link
Implementation in keras : link
I would personally go for one of these types of architecture, as they are well suited to time series data.
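To make "causal and dilated convolution" concrete, here is a minimal plain-Python sketch of a single filter. The function name and the toy inputs are assumptions for illustration. A real TCN layer would add learned weights, multiple channels, and nonlinearities.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """1D convolution where output[t] only uses x[t], x[t-d], x[t-2d], ...

    Positions before the start of the series are treated as zeros, so the
    output has the same length as the input and never looks at the future.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # indices before the start act as zero padding
                total += w * x[idx]
        out.append(total)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]

# Stacking dilations 1, 2, 4 with kernel size 2 spreads an impulse over 8 steps.
h = [1.0] + [0.0] * 9
for d in (1, 2, 4):
    h = causal_dilated_conv1d(h, kernel=[1.0, 1.0], dilation=d)
print(sum(1 for v in h if v != 0.0))  # 8
```

The second part shows why dilation is used: doubling it at each layer lets a few layers cover a long history, which suits epochs of 30 to 40 samples.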

How can I use multiple datasets with one model in Keras?

I am trying Forex prediction with Keras and TensorFlow using an LSTM network.
Of course, I want it to train on many days of trading, but to do that I would have to give it sequential data with big jumps and phases without movement when the market is closed. This isn't ideal, as the model gets "confused" by these jumps and phases of no movement. Alternatively, I can use one day of minute-by-minute data, but that gives me very little training data and the model won't be very good.
Do you have ideas on how to fix this?
here is my current code:
CODE
Thanks
If you plan on fitting multiple datasets as data slices, sequentially, something like this would work:
for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice ......)
Successive calls to fit continue training the same model from its current weights, so the single model is trained incrementally.
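One way to cut the data into slices is to split wherever the gap between timestamps is larger than the normal sampling interval, then build training windows inside each slice only. The sketch below assumes hourly data. The names `split_sessions` and `windows_within` are hypothetical, and the values are made up.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, values, max_gap):
    """Cut a series into contiguous sessions wherever the gap exceeds max_gap."""
    sessions, current = [], [values[0]]
    for prev, now, value in zip(timestamps, timestamps[1:], values[1:]):
        if now - prev > max_gap:
            sessions.append(current)
            current = []
        current.append(value)
    sessions.append(current)
    return sessions

def windows_within(session, window):
    """Build (input window, next value) pairs that never cross a session boundary."""
    return [(session[i:i + window], session[i + window])
            for i in range(len(session) - window)]

# Toy data: two days, hourly readings from 07:00 to 11:00.
timestamps = [datetime(2020, 5, day, hour) for day in (1, 2) for hour in range(7, 12)]
values = list(range(10))

sessions = split_sessions(timestamps, values, max_gap=timedelta(hours=1))
pairs = [p for s in sessions for p in windows_within(s, window=3)]
print(sessions)    # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(len(pairs))  # 4
```

Each session's windows could be passed to `model.fit` in turn, or all pairs could be pooled into one training set. Either way, no window mixes the end of one trading day with the start of the next.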

Time series classification using LSTM - How to approach?

I am working on an experiment with LSTM for time series classification and I have been going through several HOWTOs, but still, I am struggling with some very basic questions:
Is the main idea when training the LSTM to take the same sample from every time series?
E.g. if I have time series A (with samples a1,a2,a3,a4), B (b1,b2,b3,b4) and C (c1,c2,c3,c4), then do I feed the LSTM with batches of (a1,b1,c1), then (a2,b2,c2), etc.? Does that mean all time series need to be of the same size/number of samples?
If so, could anyone more experienced be so kind as to describe very simply how to approach the whole process of training the LSTM and creating the classifier?
My intention is to use TensorFlow, but I am still new to this.
If your goal is classification, then each training example should be a time series and a label. During training, you feed each series into the LSTM, look only at the last output, compare it to the label, and backpropagate as necessary.
Judging from your question, you are probably confused about batching -- you can train multiple items at once. However, each item in the batch would get its own hidden state, and only the parameters of the layers are updated.
The time series in a single batch should be of the same length. You should terminate each sequence with an END token and pad items that are too short with a special PAD token -- the LSTM should learn that PADs after an END are useless.
There is no need for different batches to have the same number of items, nor to have items of the same length.
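The per-batch padding can be sketched as follows. This version is an assumption-laden simplification: it pads numeric sequences with a fixed value and returns a mask marking the real steps, instead of using learned END and PAD tokens as described above. The name `pad_batch` and the pad value are hypothetical.

```python
PAD = 0.0  # assumed padding value; pick one that cannot occur in real data

def pad_batch(sequences, pad_value=PAD):
    """Right-pad every sequence to the longest length in this batch.

    Returns the padded batch and a mask with 1 for real steps, 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask

batch = [[0.5, 0.1, 0.9, 0.3], [0.7, 0.2], [0.4, 0.8, 0.6]]
padded, mask = pad_batch(batch)
print(padded[1])  # [0.7, 0.2, 0.0, 0.0]
print(mask[1])    # [1, 1, 0, 0]
```

Since padding is computed per batch, a different batch with shorter sequences would simply be padded to a shorter length, which matches the point that batches do not need to agree with each other.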