How should the input shape of Keras LSTM layer looks like - tensorflow

I've been reading for a while about training LSTM models using tf.keras, where i did use the same framework for regression problems using simple feedforward NN architectures and i highly understand how should i prepare the input data for such models, however when it comes for training LSTM, i feel so confused about the shape of the input.
There is a lot to take care about: time steps, number of samples, batch size, number of units, etc. In addition to many parameters for the LSTM keras layer that aren't clear for me yet as what is shown in the below example:
model.add(keras.layers.LSTM(units=3, batch_input_shape=(8,2,10), return_sequence=True, stateful= True))
So I have the following forex data structure where i don't know how to reshape it properly for an LSTM model.
open | close | high | low | volume | i1 | i2 | i3 | ... | i30 | nextClose
These features represents the open/close/high/low prices of a certain currency pair, in addition to the volume and 30 different indicators' values from i1 -> i30, all of these features corresponds to one minuet tick. Besides the nextClose feature represents the next minuet tick close price value that I'm trying hopefully to predict.
Q1: Could anyone please explain the general concept of how the data should be shaped for input, and what are all of these required parameters (time step, batch size...)?
Q2: Where I think a simple example could be great, How should my data structure above be reshaped to look like a valid input for LSTM?

After a lot of search i found this excellent summary for keras LSTM in a diagram, that was so helpful. check it in the Keras_LSTM_Diagram git repository.

Related

LSTM multivariate predicting multiple features

I am new to this neural networks and LSTM. I hope I will get a guidance from you and I will be thankful to you.
I have 2 years of bitcoin historical dataset and bitcoin sentiment dataset which is of one hour interval. My goal is to predict next 60 hours future chart using LSTM.
I have seen some of the articles regarding multivariate time series prediction. But in all of them they are taking only one feature for prediction. They predict only the price of one upcoming day and . So in order to predict next 2 months data, I have to predict all of the features. So that I can seed the predicted data as input for the next prediction and so on to predict for next 60 days.
Can someone help me to figure out how can I do this kind of prediction?
Edit:
The dataset looks like this:
timestamp,close,sentiment
2020-05-01_00,8842.85,0.21
2020-05-01_01,8824.43,0.2
2020-05-01_02,8745.91,0.2
2020-05-01_03,8639.12,0.19
2020-05-01_04,8625.69,0.2
And I would like to use tenserflow as backend. As of now i have not written code for building the model as I have to know what to do before i start coding.
The idea is to give 100 or 150 rows of data as input to the model and then forecast for the next 60 hours by seeding the prediction of the model as the input for the next prediction.
It would help if you shared some code for how you are constructing your model, and what your data looks like. How is your sentiment data encoded, and what framework you are using (tensorflow, pytorch, etc)? I am mostly familiar with Tensorflow, so I'll point you in that direction.
In general it can be helpful to use an Input Layer, but LSTMs expect a 3D tensor [batch, timestamps, feature].
You might want to consider a non-sequential model architecture, using functional APIs. If you went that route, you could have 2 separate inputs. One being the price time series, the other being the sentiment time series
pass each to an LSTM then you can concatenate/combine them and pass them to Dense layers or even convolutional layers.
Lastly you could also look into ConvLSTM2D which takes a 5D tensor:[samples, time, channels, rows, cols]
#------------------ Response (👇) post update: -------------------------
View the notebook here
#===========Design Model Architecture:
#==== Create Input Layers:
Price_Input = tf.keras.layers.Input(shape=(60,),name='Price_Input') #Price as Input
Sent_Input = tf.keras.layers.Input(shape=(60,),name='Sentiment_Input') #Sentiment as Input
#=== Handle Reshaping as part of the Model Architecture:
P_Input_rshp = tf.keras.layers.Reshape(target_shape=(60,1),
input_shape=(60,),
name='Price_Reshape')(Price_Input) #Pass price to reshape layer
S_Input_rshp = tf.keras.layers.Reshape(target_shape=(60,1),
input_shape=(60,),
name='Sentiment_Reshape')(Sent_Input) #Pass sentiment to rehape layer
#=== Use LSTM layers for timeseries:
P_x = tf.keras.layers.LSTM(units=1,activation='tanh',name='Price_LSTM')(P_Input_rshp) #Price Focused LSTM
S_x = tf.keras.layers.LSTM(units=1,activation='tanh',name='Sentiment_LSTM')(S_Input_rshp) #Sentiment Focused LSTM
C_x = tf.keras.layers.Concatenate(name='Concat')([P_x,S_x]) #Concatinate(join) inputs from each branch
Output = tf.keras.layers.Dense(units=1,name='Dense')(C_x) #Dense layer as model output to synthesize results
#============== Greate Model Graph:
model = tf.keras.Model(inputs=[Price_Input,Sent_Input],
outputs=Output,
name='Double_LSTM_Model')

Binary classification of every time series step based on past and future values

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.
I have various time series of positional (x, y, z) data tracked by sensors. I've developed some more features. For example, I rasterized the whole 3D space and calculated a cell_x, cell_y and cell_z for every time step. The time series itself have variable lengths.
My goal is to build a model which classifies every time step with the labels 0 or 1 (binary classification based on past and future values). Therefore I have a lot of training time series where the labels are already set.
One thing which could be very problematic is that there are very few 1's labels in the data (for example only 3 of 800 samples are labeled with 1).
It would be great if someone can help me in the right direction because there are too many possible problems:
Wrong hyperparameters
Incorrect model
Too few 1's labels, but I think that's not a big problem because I only need the model to suggests the right time steps. So I would only use the peaks of the output.
Bad or too less training data
Bad features
I appreciate any help and tips.
Your model seems very strange. Why only use 2 units in lstm layer? Also your problem is a binary classification. In this case you should choose only one neuron in your output layer (try to insert one additional dense layer between and lstm layer and try dropout layers between them).
Binary crossentropy does not make much sense with 2 output neurons, if you don't have a multi label problem. But if you're switching to one output neuron it's the right one. You also need sigmoid then as activation function.
As last advice: Try class weights.
http://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
This can make a huge difference, if you're label are unbalanced.
You can create the model using tensorflow BasicLSTMCell, the shape of your data fits for BasicLSTMCell in TensorFlow you can find Documentation for BasicLSTMCell here and for creating the model this Documentation contain code that will help to build BasicLstmCell model . Hope this will help you, Cheers.

Convolutional Neural Network Training

I have a question regarding convolutional neural network (CNN) training.
I have managed to train a network using tensorflow that takes an input image (1600 pixels) and output one of three classes that matches it.
Testing the network with variations of the trained classes is giving good results. However; when I give it a different -fourth- image (does not contain any of the trained 3 image), it always returns a random match to one of the classes.
My question is, how can I train a network to classify that the image does not belong to either of the three trained images? A similar example, if i trained a network against the mnist database and then a gave it the character "A" or "B". Is there a way to discriminate that the input does not belong to either of the classes?
Thank you
Your model will always make predictions like your labels, so for example if you train your model with MNIST data, when you will make predictions, prediction will always be 0-9 just like MNIST labels.
What you can do is train a different model first with 2 classes in which you will predict if an image belongs to data set A or B. E.x. for MNIST data you label all data as 1 and add data from other sources that are different (not 0-9) and label them as 0. Then train a model to find if image belongs to MNIST or not.
Convolutional Neural Network (CNN) predicts the result from the defined classes after training. CNN always return from one of the classes regardless of accuracy. I have faced similar problem, what you can do is to check for accuracy value. If the accuracy is below some threshold value then it's belong to none category. Hope this helps.
You probably have three output nodes, and choose the maximum value (one-hot encoding). That's a bit unfortunate as it's a low number of outputs. Non-recognized inputs tend to cause pretty random outputs.
Now, with 3 outputs, roughly speaking you can get 7 outcomes. You might get a single high value (3 possibilities) but non-recognized input can also cause 2 high outputs (also 3 possibilities) or approximately equal output (also 3 possibilities). So there's a decent chance (~ 3/7) of random inputs producing a pattern on the output nodes which you'd only expect for a recognized input.
Now, if you had 15 classes and thus 15 output nodes, you'd be looking at roughly 32767 possible outcomes for unrecognized inputs, only 15 of which correspond to expected one-hot outcomes.
Underlying this is a lack of training data. If your training set has examples outside the 3 classes, you can just dump this in a 4th "other" category and train with that. This by itself isn't a reliable indication, as usually the theoretical "other" set is huge, but you now have 2 complementary ways of detecting other inputs: either by the "other" output node or by one of the 11 ambiguous outputs.
Another solution would be to check what outcome your CNN usually gives when given something else. I believe the last layer must be softmax and your CNN should return probabilities of the three given classes. If none of these probabilities is close to 1 this might be a sign that this is something else assuming your CNN is well trained (it must be fined for overconfidence when predicting wrong labels).

How can I build a recurrent neural network to deal with time series to get a single continuous value?

I want to build a RNN with tensorflow, and it can convert a time series to a single continuous value. For example, the input time series x is [x1,x2,x3,x4,...,xt]=[1,2,3,4,...,100], and the corresponding output y is 98.5, just like score the time series. I found this figure in Yoshua Bengio's deep learning book, and this RNN model is what I want. Is there any useful reading material can help me to solve the problem?
see the rnn section in book << TensorFlow for machine intelligence >> which descibes different functionalities of rnn models with sample code

Can I add sample weights to the training dataset?

Is it possible to add to the configuration (eg, SGD block) sample weights? Where each training example is assigned a weight?
In many cases, a simple 1/N (where N is the total number of samples in the training dataset) is good enough, but in others, I would like the network to put extra "emphasis" in certain examples. Can this be done in CNTK?
Thanks,
Pedro
Yes, it can be done. If your task is binary classification you can use WeightedLogistic (label, probability, weight).
Typically weight will be an Input() which you can hook up to a stream in your data (assuming you are using CNTKTextFormatReader you can just have a |weight stream).
It can also be done for tasks other than binary classification with a little more effort.