Keras LSTM: how to predict beyond validation vs predictions? - tensorflow

When dealing with time series forecasting, I've seen most people follow these steps when using an LSTM model:
1. Obtain, clean, and pre-process the data
2. Hold out a validation dataset for future comparison with the model's predictions
3. Initialise and train the LSTM model
4. Pre-process a copy of the validation dataset exactly like the training data
5. Use the trained model to make predictions on the transformed validation data
6. Evaluate the results: predictions vs validation
However, if the model is accurate, how do you make predictions that go beyond the end of the validation period?
The following only accepts data that have been transformed in the same way as the training data, but for predictions that go beyond the validation period, you don't have any input data to feed to the model. So, how do people do this?
# Predictions vs validation
predictions = model.predict(transformed_validation)
# Future predictions
future_predictions = model.predict(?)

To predict the i-th value, your LSTM model needs the last N values.
So if you want to forecast beyond your data, you have to use each prediction to predict the next one.
In other words, you have to loop over something like
prediction = model.predict(X[-N:])
X.append(prediction)
As you can guess, you are feeding the model's output back into its input, which is why the predictions can diverge and amplify uncertainty.
Other models are more stable when predicting far into the future.
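As a concrete illustration, here is a minimal sketch of that recursive loop, assuming a model trained on windows of the last N already-scaled values (the window size, the 60-step horizon, and the variable names are assumptions for illustration):
import numpy as np

N = 30  # assumed window size the model was trained on
history = list(transformed_validation[-N:])  # last N known, already-scaled values

future_predictions = []
for _ in range(60):  # assumed horizon: forecast 60 steps ahead
    # reshape the last N values into the (batch, timesteps, features) input LSTMs expect
    window = np.array(history[-N:]).reshape(1, N, 1)
    next_value = model.predict(window, verbose=0)[0, 0]
    future_predictions.append(next_value)
    history.append(next_value)  # feed the prediction back in as the newest input
# Remember to invert the pre-processing afterwards, e.g. with your scaler's inverse_transform.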

You have to break your data into training and testing sets and then fit your model. Finally, you make a prediction like this.
future_predictions = model.predict(X_test)
Check out the link below for all details.
Time-Series Forecasting: Predicting Stock Prices Using An LSTM Model
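One caveat worth adding to this: for time series, the train/test split should preserve chronological order rather than shuffle rows. A minimal sketch with assumed names:
split = int(len(data) * 0.8)  # hold out the most recent 20% as the test period
train, test = data[:split], data[split:]
# ...build input windows from each part, fit the model on train...
future_predictions = model.predict(X_test)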

Related

LSTM multivariate predicting multiple features

I am new to neural networks and LSTMs; I hope I can get some guidance here, and I will be thankful for any help.
I have two years of bitcoin historical data and a bitcoin sentiment dataset, both at a one-hour interval. My goal is to predict the next 60 hours of the chart using an LSTM.
I have seen some articles on multivariate time series prediction, but in all of them only one feature is predicted: they forecast only the price for a single upcoming step. So in order to predict the next 60 hours of data, I have to predict all of the features, so that I can feed the predicted data back in as input for the next prediction, and so on.
Can someone help me figure out how I can do this kind of prediction?
Edit:
The dataset looks like this:
timestamp,close,sentiment
2020-05-01_00,8842.85,0.21
2020-05-01_01,8824.43,0.2
2020-05-01_02,8745.91,0.2
2020-05-01_03,8639.12,0.19
2020-05-01_04,8625.69,0.2
And I would like to use TensorFlow as the backend. As of now I have not written code for building the model, as I want to know what to do before I start coding.
The idea is to give 100 or 150 rows of data as input to the model and then forecast the next 60 hours by feeding each of the model's predictions back in as input for the next prediction.
It would help if you shared some code for how you are constructing your model, and what your data looks like. How is your sentiment data encoded, and which framework are you using (tensorflow, pytorch, etc.)? I am mostly familiar with Tensorflow, so I'll point you in that direction.
In general it can be helpful to use an Input layer, and keep in mind that LSTMs expect a 3D tensor of shape [batch, timesteps, features].
You might want to consider a non-sequential model architecture, using the functional API. If you went that route, you could have 2 separate inputs, one being the price time series and the other being the sentiment time series.
Pass each to an LSTM, then you can concatenate/combine them and pass the result to Dense layers or even convolutional layers.
Lastly, you could also look into ConvLSTM2D, which takes a 5D tensor: [samples, time, channels, rows, cols].
#------------------ Response (👇) post update: -------------------------
View the notebook here
#=========== Design Model Architecture:
import tensorflow as tf

#==== Create Input Layers:
Price_Input = tf.keras.layers.Input(shape=(60,), name='Price_Input')  # Price as input
Sent_Input = tf.keras.layers.Input(shape=(60,), name='Sentiment_Input')  # Sentiment as input

#==== Handle reshaping as part of the model architecture:
P_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Price_Reshape')(Price_Input)  # Pass price to reshape layer
S_Input_rshp = tf.keras.layers.Reshape(target_shape=(60, 1),
                                       name='Sentiment_Reshape')(Sent_Input)  # Pass sentiment to reshape layer

#==== Use LSTM layers for the time series:
P_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Price_LSTM')(P_Input_rshp)  # Price-focused LSTM
S_x = tf.keras.layers.LSTM(units=1, activation='tanh', name='Sentiment_LSTM')(S_Input_rshp)  # Sentiment-focused LSTM

C_x = tf.keras.layers.Concatenate(name='Concat')([P_x, S_x])  # Concatenate (join) the two branches
Output = tf.keras.layers.Dense(units=1, name='Dense')(C_x)  # Dense layer as model output to synthesize results

#=========== Create Model Graph:
model = tf.keras.Model(inputs=[Price_Input, Sent_Input],
                       outputs=Output,
                       name='Double_LSTM_Model')
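A minimal sketch of how you might then train and use this model, assuming X_price and X_sent are arrays of shape (samples, 60) and y holds the next-hour target price (these names, the loss, and the optimizer are illustrative assumptions, not from the notebook):
model.compile(optimizer='adam', loss='mse')
model.fit([X_price, X_sent], y, epochs=10, batch_size=32)
one_step = model.predict([X_price[-1:], X_sent[-1:]])  # one-step-ahead forecast
For the 60-hour horizon, you would append each one-step forecast (for both features) to the input windows and repeat, as described in the question.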

Simultaneous multi-model predictions in Keras

Here's the situation I'm working with. I've got ONE model but a bunch of different pre-trained sets of weights for it. I need to iterate through all these sets of weights and make a prediction for the same input, once for each set of weights. I am currently doing this basically as follows:
def ModelIterator(model, pastModelWeights, modelInput):
    elapsedIters = len(pastModelWeights)
    outputs = []
    for t in range(elapsedIters):
        iterModel = model  # This is just a Keras model object with no pre-set weights
        iterModel.set_weights(pastModelWeights[t])
        iterOutput = iterModel.predict(x=modelInput)
        outputs.append(iterOutput)
    return outputs
As you can see, this is really just a single model whose weights I'm changing for each iteration t in order to make predictions on the same input each time. Each prediction is very fast, but I need to do this with many (thousands) sets of weights, and as elapsedIters increases, this loop becomes quite slow.
So the question is, can I parallelize this process? Instead of setting the model's weights for each t and generating predictions in series, is there a relatively simple (heh...) way any of you know of to make these predictions simultaneously?
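One partial speedup worth noting (not true parallelism): model.predict() carries per-call overhead for batching and callbacks, so when the input is small it is often much faster to call the model directly. A minimal sketch under that assumption:
import tensorflow as tf

def fast_model_iterator(model, pastModelWeights, modelInput):
    x = tf.convert_to_tensor(modelInput)
    outputs = []
    for weights in pastModelWeights:
        model.set_weights(weights)
        # Calling the model directly skips predict()'s batching/callback machinery,
        # which can dominate the cost of thousands of small predictions.
        outputs.append(model(x, training=False).numpy())
    return outputs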

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
    # Example code.
    # Arguments
    #     model: the training model
    regression = model.outputs[0]
    classification = model.outputs[1]
    predictions = ...  # TODO: perform several stochastic forward passes (dropout during train and test time) here
    avg_predictions = ...  # TODO: combine predictions here, e.g. by computing the mean
    outputs = ...  # TODO: do some processing on avg_predictions
    return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand it, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout mask is different for each sample in a batch. Since backpropagation averages the weight updates over a batch anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch size of one. You could update a global counter inside your custom loss function and return a non-zero loss only when you've averaged the passes the way you want. I don't know if this would work; it's just an idea.
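For the inference-time case the question actually asks about (several stochastic forward passes with dropout kept active, then averaging), a minimal tf.keras sketch might look like this, assuming a single-output model containing Dropout layers (the pass count T and the variable names are assumptions):
import numpy as np
import tensorflow as tf

T = 20  # assumed number of stochastic forward passes
x_batch = np.repeat(x_sample[np.newaxis, ...], T, axis=0)  # one sample duplicated T times
# training=True keeps Dropout active at inference, so each copy gets a different mask
preds = model(x_batch, training=True)
avg_prediction = tf.reduce_mean(preds, axis=0)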

What is the difference between model.fit() and model.evaluate() in Keras?

I am using Keras with TensorFlow backend to train CNN models.
What is the difference between model.fit() and model.evaluate()? Which one should I ideally use? (I am using model.fit() as of now).
I know the utility of model.fit() and model.predict(). But I am unable to understand the utility of model.evaluate(). Keras documentation just says:
It is used to evaluate the model.
I feel this is a very vague definition.
fit() is for training the model with the given inputs (and corresponding training labels).
evaluate() is for evaluating the already trained model using the validation (or test) data and the corresponding labels. Returns the loss value and metrics values for the model.
predict() is for the actual prediction. It generates output predictions for the input samples.
Let us consider a simple regression example:
import numpy as np
# input and output
x = np.random.uniform(0.0, 1.0, (200,))
y = 0.3 + 0.6*x + np.random.normal(0.0, 0.05, len(x))
Now let's apply a regression model in Keras:
# A simple regression model
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.compile(loss='mse', optimizer='rmsprop')
# The fit() method - trains the model
model.fit(x, y, epochs=1000, batch_size=100)
Epoch 1000/1000
200/200 [==============================] - 0s - loss: 0.0023
# The evaluate() method - gets the loss statistics
model.evaluate(x, y, batch_size=200)
# returns: loss: 0.0022612824104726315
# The predict() method - predict the outputs for the given inputs
model.predict(np.expand_dims(x[:3],1))
# returns: [ 0.65680361],[ 0.70067143],[ 0.70482892]
In Deep learning you first want to train your model. You take your data and split it into two sets: the training set, and the test set. It seems pretty common that 80% of your data goes into your training set and 20% goes into your test set.
Your training set gets passed into your call to fit() and your test set gets passed into your call to evaluate(). During the fit operation a number of rows of your training data are fed into your neural net (based on your batch size). After every batch is sent, the fit algorithm does backpropagation to adjust the weights in your neural net.
After this is done your neural net is trained. The problem is that sometimes your neural net becomes overfit, a condition where it performs well on the training set but poorly on other data. To guard against this situation you run the evaluate() function to send new data (your test set) through your neural net to see how it performs with data it has never seen. There is no training occurring; this is purely a test. If all goes well, the score from training is similar to the score from testing.
fit(): Trains the model for a given number of epochs (this is for training time, with the training dataset).
predict(): Generates output predictions for the input samples (this is for somewhere between training and testing time).
evaluate(): Returns the loss value & metrics values for the model in test mode (this is for testing time, with the testing dataset).
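To make the split described above concrete, a minimal sketch of the workflow (the 80/20 split and the variable names are illustrative, not from the answers above):
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
model.fit(x_train, y_train, epochs=1000, batch_size=100)  # training set only
test_loss = model.evaluate(x_test, y_test)  # held-out data the model has never seen
preds = model.predict(x_test[:3])  # quick predictions for a few samples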
While all the above answers explain what fit(), evaluate(), and predict() do, a more important point to keep in mind, in my opinion, is what data you should use for fit() and evaluate().
The clearest guideline I came across is on Machine Learning Mastery; the particular quote in there:
Training set: A set of examples used for learning, that is to fit the parameters of the classifier.
Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.
Test set: A set of examples used only to assess the performance of a fully-specified classifier.
By Brian Ripley, Pattern Recognition and Neural Networks, 1996, page 354
You should not use the same data that you used to train (tune) the model (the validation data) for evaluating the performance (generalization) of your fully trained model (with evaluate()).
The test data used for evaluate() should be unseen/not used during training (fit()) in order to be a reliable indicator of model evaluation (for generalization).
For predict() you can use just one or a few examples that you choose (from anywhere) to get a quick check or answer from your model. I don't believe it can be used as a sole indicator of generalization.
One thing which was not mentioned here, I believe, needs to be specified: model.evaluate() returns a list which contains a loss figure and an accuracy figure. What has not been said in the answers above is that the "loss" figure is the sum of ALL the losses calculated for each item in the x_test array, where x_test contains your test data and y_test contains your labels. It should be clear that the loss figure is the sum of ALL the losses, not just the loss from one item in the x_test array.
I would say it is the mean of the losses incurred over all batches, not the sum. But sure, that's the most important information here; otherwise the modeler would be slightly confused.

DeepLearning Anomaly Detection for images

I am still relatively new to the world of Deep Learning. I wanted to create a Deep Learning model (preferably using Tensorflow/Keras) for image anomaly detection. By anomaly detection I mean, essentially, something like a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the images. I was wondering if there is some example of how I can do this in deep learning. I looked around but couldn't find a single code example that handles this case.
The way of doing this in Keras is with the KerasRegressor wrapper module (it wraps scikit-learn's regressor interface). Useful information can also be found in the source code of that module. Basically you first have to define your network model, for example:
from keras.layers import Input, Dense
from keras.models import Model

def simple_model():
    # Input layer
    data_in = Input(shape=(13,))
    # First layer, fully connected, ReLU activation
    layer_1 = Dense(13, activation='relu', kernel_initializer='normal')(data_in)
    # Second layer...etc
    layer_2 = Dense(6, activation='relu', kernel_initializer='normal')(layer_1)
    # Output, single node without activation
    data_out = Dense(1, kernel_initializer='normal')(layer_2)
    # Build and compile model
    model = Model(inputs=data_in, outputs=data_out)
    # You may choose any loss or optimizer function, be careful which you choose
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor
# choose your epochs and batch size
regressor = KerasRegressor(build_fn=simple_model, epochs=100, batch_size=64)
# fit with your data
regressor.fit(data, labels)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, as you need to detect anomalous images among the ones that are ok, one approach you can take is to train your regressor by passing anomalous images labeled 1 and images that are ok labeled 0.
This will make your model return a value closer to 1 when the input is an anomalous image, enabling you to threshold for the desired results. You can loosely think of this output as a score of how closely the input matches the "anomalous" class you trained with label 1 (a perfect match).
Also, as you mentioned, Autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras Blog post Building Autoencoders in Keras, where they explain in detail about the implementation of them with the Keras library.
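As a rough illustration of that autoencoder approach (the layer sizes, input dimension, and threshold rule below are assumptions for the sketch, not taken from the blog post): train it on normal images only, then flag inputs whose reconstruction error is unusually high.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))  # assumed flattened image size
encoded = Dense(32, activation='relu')(inp)  # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal images only, so anomalies reconstruct poorly
autoencoder.fit(x_normal, x_normal, epochs=50, batch_size=256)

errors = np.mean((autoencoder.predict(x_test) - x_test) ** 2, axis=1)
is_anomaly = errors > np.percentile(errors, 95)  # assumed threshold rule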
It is worth noticing that single-class classification can be seen as another way of saying regression.
Classification tries to find a probability distribution over the N possible classes, and you usually pick the most probable class as the output (that is why most classification networks use a sigmoid activation on their outputs, as it has range [0, 1]). Its output is discrete/categorical.
Similarly, regression tries to find the model that best represents your data by minimizing the error or some other metric (like the well-known R^2 metric, or coefficient of determination). Its output is a real number/continuous value (which is also the reason why most regression networks don't use activations on their outputs). I hope this helps; good luck with your coding.