Simultaneous multi-model predictions in Keras - tensorflow

Here's the situation I'm working with. I've got ONE model but of a bunch of different pre-trained sets of weights for this model. I need to iterate through all these sets of weights and make a prediction for the same input once for each set of weights. I am currently doing this basically as follows:
def ModelIterator(model,pastModelWeights,modelInput):
elapsedIters = len(pastModelWeights)
outputs = []
for t in range(elapsedIters):
iterModel = model #This is just a Keras model object with no pre-set weights
iterModel.set_weights(pastModelWeights[t])
iterOutput = iterModel.predict(x=modelInput)
outputs.append(iterOutput)
return outputs
As you can see, this is really just a single model whose weights I'm changing for each iteration t in order to make predictions on the same input each time. Each prediction is very fast, but I need to do this with many (thousands) sets of weights, and as elapsedIters increases, this loop becomes quite slow.
So the question is, can I parallelize this process? Instead of setting the model's weights for each t and generating predictions in series, is there a relatively simple (heh...) way any of you know of to make these predictions simultaneously?

Related

Keras LSTM: how to predict beyond validation vs predictions?

When dealing with time series forecasting, I've seen most people follow these steps when using an LSTM model:
Obtain, clean, and pre-process data
Take out validation dataset for future comparison with model predictions
Initialise and train LSTM model
Use a copy of validation dataset to be pre-processed exactly like the training data
Use trained model to make predictions on the transformed validation data
Evaluate results: predictions vs validation
However, if the model is accurate, how do you make predictions that go beyond the end of the validation period?
The following only accepts data that have been transformed in the same way as the training data, but for predictions that go beyond the validation period, you don't have any input data to feed to the model. So, how do people do this?
# Predictions vs validation
predictions = model.predict(transformed_validation)
# Future predictions
future_predictions = model.predict(?)
To predict the ith value, your LSTM model need last N values.
So if you want to forecast, you should use each prediction to predict the next one.
In other terms you have to loop over something like
prediction = model.predict(X[-N:])
X.append(prediction)
As you can guess, you add your output in your input that's why your predictions can diverge and amplify uncertainty.
Other model are more stable to predict far future.
You have to break your data into training and testing and then fit your mode. Finally, you make a prediction like this.
future_predictions = model.predict(X_test)
Check out the link below for all details.
Time-Series Forecasting: Predicting Stock Prices Using An LSTM Model

CNN + LSTM model for images performs poorly on validation data set

My training and loss curves look like below and yes, similar graphs have received comments like "Classic overfitting" and I get it.
My model looks like below,
input_shape_0 = keras.Input(shape=(3,100, 100, 1), name="img3")
model = tf.keras.layers.TimeDistributed(Conv2D(8, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(16, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(Flatten())(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.4))(model)
model = LSTM(16, kernel_regularizer=tf.keras.regularizers.l2(0.007))(model)
# model = Dense(100, activation="relu")(model)
# model = Dense(200, activation="relu",kernel_regularizer=tf.keras.regularizers.l2(0.001))(model)
model = Dense(60, activation="relu")(model)
# model = Flatten()(model)
model = Dropout(0.15)(model)
out = Dense(30, activation='softmax')(model)
model = keras.Model(inputs=input_shape_0, outputs = out, name="mergedModel")
def get_lr_metric(optimizer):
def lr(y_true, y_pred):
return optimizer.lr
return lr
opt = tf.keras.optimizers.RMSprop()
lr_metric = get_lr_metric(opt)
# merged.compile(loss='sparse_categorical_crossentropy',
optimizer='adam', metrics=['accuracy'])
model.compile(loss='sparse_categorical_crossentropy',
optimizer=opt, metrics=['accuracy',lr_metric])
model.summary()
In the above model building code, please consider the commented lines as some of the approaches I have tried so far.
I have followed the suggestions given as answers and comments to this kind of question and none seems to be working for me. Maybe I am missing something really important?
Things that I have tried:
Dropouts at different places and different amounts.
Played with inclusion and expulsion of dense layers and their number of units.
Number of units on the LSTM layer was tried with different values (started from as low as 1 and now at 16, I have the best performance.)
Came across weight regularization techniques and tried to implement them as shown in the code above and so tried to put it at different layers ( I need to know what is the technique in which I need to use it instead of simple trial and error - this is what I did and it seems wrong)
Implemented learning rate scheduler using which I reduce the learning rate as the epochs progress after a certain number of epochs.
Tried two LSTM layers with the first one having return_sequences = true.
After all these, I still cannot overcome the overfitting problem.
My data set is properly shuffled and divided in a train/val ratio of 80/20.
Data augmentation is one more thing that I found commonly suggested which I am yet to try, but I want to see if I am making some mistake so far which I can correct it and avoid diving into data augmentation steps for now. My data set has the below sizes:
Training images: 6780
Validation images: 1484
The numbers shown are samples and each sample will have 3 images. So basically, I input 3 mages at once as one sample to my time-distributed CNN which is then followed by other layers as shown in the model description. Following that, my training images are 6780 * 3 and my Validation images are 1484 * 3. Each image is 100 * 100 and is on channel 1.
I am using RMS prop as the optimizer which performed better than adam as per my testing
UPDATE
I tried some different architectures and some reularizations and dropouts at different places and I am now able to achieve a val_acc of 59% below is the new model.
# kernel_regularizer=tf.keras.regularizers.l2(0.004)
# kernel_constraint=max_norm(3)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(64, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(128, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(GlobalAveragePooling2D())(model)
model = LSTM(128, return_sequences=True,kernel_regularizer=tf.keras.regularizers.l2(0.040))(model)
model = Dropout(0.60)(model)
model = LSTM(128, return_sequences=False)(model)
model = Dropout(0.50)(model)
out = Dense(30, activation='softmax')(model)
Try to perform Data Augmentation as a preprocessing step. Lack of data samples can lead to such curves. You can also try using k-fold Cross Validation.
There are many ways to prevent overfitting, according to the papers below:
Dropout layers (Disabling randomly neurons). https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Input Noise (e.g. Random Gaussian Noise on the imges). https://arxiv.org/pdf/2010.07532.pdf
Random Data Augmentations (e.g. Rotating, Shifting, Scaling, etc.).
https://arxiv.org/pdf/1906.11052.pdf
Adjusting Number of Layers & Units.
https://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Regularization Functions (e.g. L1, L2, etc)
https://www.researchgate.net/publication/329150256_A_Comparison_of_Regularization_Techniques_in_Deep_Neural_Networks
Early Stopping: If you notice that for N successive epochs that your model's training loss is decreasing, but the model performs poorly on validaiton data set, then It is a good sign to stop the training.
Shuffling the training data or K-Fold cross validation is also common way way of dealing with Overfitting.
I found this great repository, which contains examples of how to implement data augmentations:
https://github.com/kochlisGit/random-data-augmentations
Also, this repository here seems to have examples of CNNs that implement most of the above methods:
https://github.com/kochlisGit/Tensorflow-State-of-the-Art-Neural-Networks
The goal should be to get the model predict correctly irrespective of
the order in which the 3 images in the sample are arranged.
If the order of the images of each sample is not important for the training, I think your model does the inverse, the Timedistributed layers succeded by LSTM take into account the order of the three images. As a solution, primarily, you can add images by reordering the images of each sample (= Augmented data). Secondly, try to consider the three images as one image with three-channel and remove the Timedistributed layers (I'm not sure that the three-channels are more efficient but you can give it a try)

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
# Example code.
# Arguments
# model: the training model
regression = model.outputs[0]
classification = model.outputs[1]
predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
outputs = # TODO: do some processing on avg_predictions
return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout is different for each sample in a batch. Since, backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch-size of one. You could update a global counter inside your custom loss function and return non-zero loss only when you've averaged them the way you want it. I don't know if this would work, it's just an idea.

Feeding individual examples into TensorFlow graph trained on files?

I'm new to TensorFlow and am getting a bit tripped up on the mechanics of reading data. I set up a TensorFlow graph on the mnist data, but I'd like to modify it so that I can run one program to train it + save the model out, and run another to load said graph, make predictions, and compute test accuracy.
Where I'm getting confused is how to bypass the original I/O system in the training graph and "inject" an image to predict or an (image, label) tuple of test data for accuracy testing. To read the training data, I'm using this code:
_, input_data = util.read_examples(
paths_to_files,
batch_size,
shuffle=shuffle,
num_epochs=None)
feature_map = {
'label': tf.FixedLenFeature(
shape=[], dtype=tf.int64, default_value=[-1]),
'image': tf.FixedLenFeature(
shape=[NUM_PIXELS * NUM_PIXELS], dtype=tf.int64),
}
example = tf.parse_example(input_data, features=feature_map)
I then feed example to a convolution layer, etc. and generate the output.
Now imagine that I train my graph with that code specifying the input, save out the graph and weights, and then restore the graph and weights in another script for prediction -- I'd like to take (say) 10 images and feed them to the graph to generate predictions. How do I "inject" those 10 images so that the predictions come out the other end?
I played around with feed dictionaries and placeholders, but I'm not sure if they're the right things for me to use... it seems like they rely on having data in memory, as opposed to reading from a queue of test data, for example.
Thanks!
A feed dictionary with placeholders would make sense if you wanted to perform a small number of inferences/evaluations (i.e. enough to fit in memory) - e.g. if you were serving a simple model or running small eval loops.
If you specifically want to infer or evaluate large batches then you should use the same approach you've used for training, but with a different path to your test/eval/live data. e.g.
_, eval_data = util.read_examples(
paths_to_files, # CHANGE THIS BIT
batch_size,
shuffle=shuffle,
num_epochs=None)
You can use this as a normal python variable and set up successive, dependent steps to use this as a provided variable. e.g.
def get_example(data):
return tf.parse_example(data, features=feature_map)
sess.run([get_example(path_to_your_data)])

Tensorflow RNN sequence training

I'm making my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two layers LSTM + dense layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
so the network should 1 to 1 map float(audio data sequence) to float(pre-chosen frequency volume)
I've got this to work on Keras and seen a similar TFLearn solution but would like to implement this on bare Tensorflow in a relatively efficient way.
what i've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE,state_is_tuple=True,forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2,state_is_tuple=True)
outputs, states = rnn.dynamic_rnn(stacked_lstm, in, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network= tf.matmul(last, W) + b
# cost function, optimizer etc...
during training I fed this with (BATCH_SIZE, SEQUENCE_LEN,1) batches and it seems like the loss converged correctly but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
how do i make this network return a sequence right from Tensorflow without going back to python for each sample(feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
after some research:
how do i make this network return a sequence right from Tensorflow
without going back to python for each sample(feed a sequence and
predict a sequence of the same size)?
you can use state_saving_rnn
class Saver():
def __init__(self):
self.d = {}
def state(self, name):
if not name in self.d:
return tf.zeros([1,LSTM_SIZE],tf.float32)
return self.d[name]
def save_state(self, name, val):
self.d[name] = val
return tf.identity('save_state_name') #<-important for control_dependencies
outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),sequence_length=[EVAL_SEQ_LEN])
#4 states are for two layers of lstm each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
one problem is that state_saving_rnn is using rnn() and not dynamic_rnn() therefore unroll at compile time EVAL_SEQ_LEN steps you might want to re-implement state_saving_rnn with dynamic_rnn if you want to input long sequences
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
you can use dynamic_rnn and supply initial_state. this is probably just as efficient as state_saving_rnn. look at state_saving_rnn implementations for reference
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
dynamic_rnn does do unrolling at runtime similarly to compile time rnn(). I guess it returns all the steps for you to branch the graph in some other places - after less time steps. in a network that use [one time step input * current state -> one output, new state] like the one described above it's not needed in testing but could be used for training truncated time back propagation