How can I get the output of intermediate layers for each epoch in Keras model? - tensorflow

I have a simple sequential model
Following the instruction of keras documentation, the output of intermediate layer can be obtained by:
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
I can get the output of a trained model now. However, I want to get their output during (for each epoch)
I try to use a Callback function to achieve this goal, but it whill only return the output for last trained model.
Another naive way I can think of is saving models for each epoch, but it wastes lots of storage space..
I try to use a Callback function to achieve this goal, but it whill only return the output for last trained model.
Another naive way I can think of is saving models for each epoch, but it wastes lots of storage space..

Related

Keras LSTM: how to predict beyond validation vs predictions?

When dealing with time series forecasting, I've seen most people follow these steps when using an LSTM model:
Obtain, clean, and pre-process data
Take out validation dataset for future comparison with model predictions
Initialise and train LSTM model
Use a copy of validation dataset to be pre-processed exactly like the training data
Use trained model to make predictions on the transformed validation data
Evaluate results: predictions vs validation
However, if the model is accurate, how do you make predictions that go beyond the end of the validation period?
The following only accepts data that have been transformed in the same way as the training data, but for predictions that go beyond the validation period, you don't have any input data to feed to the model. So, how do people do this?
# Predictions vs validation
predictions = model.predict(transformed_validation)
# Future predictions
future_predictions = model.predict(?)
To predict the ith value, your LSTM model need last N values.
So if you want to forecast, you should use each prediction to predict the next one.
In other terms you have to loop over something like
prediction = model.predict(X[-N:])
X.append(prediction)
As you can guess, you add your output in your input that's why your predictions can diverge and amplify uncertainty.
Other model are more stable to predict far future.
You have to break your data into training and testing and then fit your mode. Finally, you make a prediction like this.
future_predictions = model.predict(X_test)
Check out the link below for all details.
Time-Series Forecasting: Predicting Stock Prices Using An LSTM Model

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the keras api in tensorflow (I am using the keras api provided by TF 1.x), and am having issue writing a custom loss function, to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it sucesfully with tensorflow estimators.
My current implementations builds a tf Dataset from a large database of tf Records, where the features is a dictionary containing the images and arrays of matching points. Before I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use that is to calculate the loss within the model, by means of Lambda layers. (When the loss is independent from the true data, for instance, and the model doesn't really have an output to be compared)
In a functional API model:
def loss_calc(x):
loss_input_1, loss_input_2 = x #arbirtray inputs, you choose
#according to what you gave to the Lambda layer
#here you use some external data that doesn't relate to the samples
externalData = K.constant(external_numpy_data)
#calculate the loss
return the loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
return y_pred #where y_pred is the loss itself, the output of the model above
model.compile(loss = dummy_loss, ....)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it, is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exaclty the same way I did in the first answer. And pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
# Example code.
# Arguments
# model: the training model
regression = model.outputs[0]
classification = model.outputs[1]
predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
outputs = # TODO: do some processing on avg_predictions
return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout is different for each sample in a batch. Since, backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch-size of one. You could update a global counter inside your custom loss function and return non-zero loss only when you've averaged them the way you want it. I don't know if this would work, it's just an idea.

What is the difference between model.fit() an model.evaluate() in Keras?

I am using Keras with TensorFlow backend to train CNN models.
What is the between model.fit() and model.evaluate()? Which one should I ideally use? (I am using model.fit() as of now).
I know the utility of model.fit() and model.predict(). But I am unable to understand the utility of model.evaluate(). Keras documentation just says:
It is used to evaluate the model.
I feel this is a very vague definition.
fit() is for training the model with the given inputs (and corresponding training labels).
evaluate() is for evaluating the already trained model using the validation (or test) data and the corresponding labels. Returns the loss value and metrics values for the model.
predict() is for the actual prediction. It generates output predictions for the input samples.
Let us consider a simple regression example:
# input and output
x = np.random.uniform(0.0, 1.0, (200))
y = 0.3 + 0.6*x + np.random.normal(0.0, 0.05, len(y))
Now lets apply a regression model in keras:
# A simple regression model
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.compile(loss='mse', optimizer='rmsprop')
# The fit() method - trains the model
model.fit(x, y, nb_epoch=1000, batch_size=100)
Epoch 1000/1000
200/200 [==============================] - 0s - loss: 0.0023
# The evaluate() method - gets the loss statistics
model.evaluate(x, y, batch_size=200)
# returns: loss: 0.0022612824104726315
# The predict() method - predict the outputs for the given inputs
model.predict(np.expand_dims(x[:3],1))
# returns: [ 0.65680361],[ 0.70067143],[ 0.70482892]
In Deep learning you first want to train your model. You take your data and split it into two sets: the training set, and the test set. It seems pretty common that 80% of your data goes into your training set and 20% goes into your test set.
Your training set gets passed into your call to fit() and your test set gets passed into your call to evaluate(). During the fit operation a number of rows of your training data are fed into your neural net (based on your batch size). After every batch is sent the fit algorithm does back propagation to adjust the weights in your neural net.
After this is done your neural net is trained. The problem is sometimes your neural net gets overfit which is a condition where it performs well for the training set but poorly for other data. To guard against this situation you run the evaluate() function to send new data (your test set) through your neural net to see how it performs with data it has never seen. There is no training occurring, this is purely a test. If all goes well then the score from training is similar to the score from testing.
fit(): Trains the model for a given number of epochs (this is for training time, with the training dataset).
predict(): Generates output predictions for the input samples (this is for somewhere between training and testing time).
evaluate(): Returns the loss value & metrics values for the model in test mode (this is for testing time, with the testing dataset).
While all the above answers explain what these functions : fit(), evaluate() or predict() do however more important point to keep in mind in my opinion is what data you should use for fit() and evaluate().
The most clear guideline that I came across in Machine Learning Mastery and particular quote in there:
Training set: A set of examples used for learning, that is to fit the parameters of the classifier.
Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.
Test set: A set of examples used only to assess the performance of a fully-specified classifier.
: By Brian Ripley, page 354, Pattern Recognition and Neural Networks, 1996
You should not use the same data that you used to train(tune) the model (validation data) for evaluating the performance (generalization) of your fully trained model (evaluate).
The test data used for evaluate() should be unseen/not used for training(fit()) in order to be any reliable indicator of model evaluation (for generlization).
For Predict() you can use just one or few example(s) that you choose (from anywhere) to get quick check or answer from your model. I don't believe it can be used as sole parameter for generalization.
One thing which was not mentioned here, I believe needs to be specified. model.evaluate() returns a list which contains a loss figure and an accuracy figure. What has not been said in the answers above, is that the "loss" figure is the sum of ALL the losses calculated for each item in the x_test array. x_test would contain your test data and y_test would contain your labels. It should be clear that the loss figure is the sum of ALL the losses, not just one loss from one item in the x_test array.
I would say the mean of losses incurred from all iterations, not the sum. But sure, that's the most important information here, otherwise the modeler would be slightly confused.

Feeding individual examples into TensorFlow graph trained on files?

I'm new to TensorFlow and am getting a bit tripped up on the mechanics of reading data. I set up a TensorFlow graph on the mnist data, but I'd like to modify it so that I can run one program to train it + save the model out, and run another to load said graph, make predictions, and compute test accuracy.
Where I'm getting confused is how to bypass the original I/O system in the training graph and "inject" an image to predict or an (image, label) tuple of test data for accuracy testing. To read the training data, I'm using this code:
_, input_data = util.read_examples(
paths_to_files,
batch_size,
shuffle=shuffle,
num_epochs=None)
feature_map = {
'label': tf.FixedLenFeature(
shape=[], dtype=tf.int64, default_value=[-1]),
'image': tf.FixedLenFeature(
shape=[NUM_PIXELS * NUM_PIXELS], dtype=tf.int64),
}
example = tf.parse_example(input_data, features=feature_map)
I then feed example to a convolution layer, etc. and generate the output.
Now imagine that I train my graph with that code specifying the input, save out the graph and weights, and then restore the graph and weights in another script for prediction -- I'd like to take (say) 10 images and feed them to the graph to generate predictions. How do I "inject" those 10 images so that the predictions come out the other end?
I played around with feed dictionaries and placeholders, but I'm not sure if they're the right things for me to use... it seems like they rely on having data in memory, as opposed to reading from a queue of test data, for example.
Thanks!
A feed dictionary with placeholders would make sense if you wanted to perform a small number of inferences/evaluations (i.e. enough to fit in memory) - e.g. if you were serving a simple model or running small eval loops.
If you specifically want to infer or evaluate large batches then you should use the same approach you've used for training, but with a different path to your test/eval/live data. e.g.
_, eval_data = util.read_examples(
paths_to_files, # CHANGE THIS BIT
batch_size,
shuffle=shuffle,
num_epochs=None)
You can use this as a normal python variable and set up successive, dependent steps to use this as a provided variable. e.g.
def get_example(data):
return tf.parse_example(data, features=feature_map)
sess.run([get_example(path_to_your_data)])