Tensorflow (tf-slim) Model with is_training True and False - tensorflow

I would like to run a given model both on the train set (is_training=True) and on the validation set (is_training=False), specifically with how dropout is applied. Right now the prebuilt models expose a parameter is_training that is passed it the dropout layer when building the network. The issue is that If I call the method twice with different values of is_training, I will get two different networks that do no share weights (I think?). How do I go about getting the two networks to share the same weights such that I can run the network that I have trained on the validation set?

I wrote a solution with your comment to use Overfeat in train and test mode. (I couldn't test it so you can check if it works?)
First some imports and parameters:
import tensorflow as tf
slim = tf.contrib.slim
overfeat = tf.contrib.slim.nets.overfeat
batch_size = 32
inputs = tf.placeholder(tf.float32, [batch_size, 231, 231, 3])
dropout_keep_prob = 0.5
num_classes = 1000
In train mode, we pass a normal scope to the function overfeat:
scope = 'overfeat'
is_training = True
output = overfeat.overfeat(inputs, num_classes, is_training,
dropout_keep_prob, scope=scope)
Then in test mode, we create the same scope but with reuse=True.
scope = tf.VariableScope(reuse=True, name='overfeat')
is_training = False
output = overfeat.overfeat(inputs, num_classes, is_training,
dropout_keep_prob, scope=scope)

you can just use a placeholder for is_training:
isTraining = tf.placeholder(tf.bool)
# create nn
net = ...
net = slim.dropout(net,
keep_prob=0.5,
is_training=isTraining)
net = ...
# training
sess.run([net], feed_dict={isTraining: True})
# testing
sess.run([net], feed_dict={isTraining: False})

It depends on the case, the solutions are different.
My first option would be to use a different process to do the evaluation. You only need to check that there is a new checkpoint and load that weights into the evaluation network (with is_training=False):
checkpoint = tf.train.latest_checkpoint(self.checkpoints_path)
# wait until a new check point is available
while self.lastest_checkpoint == checkpoint:
time.sleep(30) # sleep 30 seconds waiting for a new checkpoint
checkpoint = tf.train.latest_checkpoint(self.checkpoints_path)
logging.info('Restoring model from {}'.format(checkpoint))
self.saver.restore(session, checkpoint)
self.lastest_checkpoint = checkpoint
The second option is after every epoch you unload the graph and create a new evaluation graph. This solution waste a lot of time loading and unloading graphs.
The third option is to share the weights. But feeding these networks with queues or dataset can lead to issues, so you have to be very careful. I only use this for Siamese networks.
with tf.variable_scope('the_scope') as scope:
your_model(is_training=True)
scope.reuse_variables()
your_model(is_training=False)

Related

Better understanding of training parameter for Keras-Model call method needed

I'd like to get a better understanding of the parameter training, when calling a Keras model.
In all tutorials (like here) it is explained, that when you are doing a custom train step, you should call the model like this (because some layers may behave differently depending if you want to do training or inference):
pred = model(x, training=True)
and when you want to do inference, you should set training to false:
pred = model(x, training=False)
What I am wondering now is, how this is affected by the creation of a functional model. Assume I have 2 models: model_base and model_head, and I want to create a new model out of those two, where I want the model_base allways to be called with training=False (because I plan on freezing it like in this tutorial here):
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
What will in such a case happen, when I later on call new_model(x_new, training=True)? Will the usage of training=False for the base_model be overruled? Or will training now allways be set to True for the base_model, regardless of what I pass to the new_model? If the latter is the case, does that also mean, that if I set e.g. outputs = head_model(inputs, training=True), that this part of the new model would always run in training mode? And how would it work out if I don't give any specific value for training, when I run the new_model like this new_model(x_new)?
Thanks in advance!
training is a boolean argument that determines whether this call function runs in training mode or inference mode. For example, the Dropout layer is primarily used to as regularize in model training, randomly dropping weights but in inference time or prediction time we don't want it to happen.
y = Dropout(0.5)(x, training=True)
By this, we're setting training=True for the Dropout layer for training time. When we call .fit(), it set sets a flag to True and when we use evaluate or predict, in behind it sets a flag to False. And same goes for the custom training loop. When we pass input tensor to the model within the GradientTape scope, we can set this parameter; though it does not have manually set, the program will figure out itself. And same goes to inference time. So, this training argument is set as True or False if we want layers to operate either training mode or inference mode respectively.
# training mode
with tf.GradientTape() as tape:
logits = model(x, training=True) # forward pass
# inference mode
al_logits = model(x, training=False)
Now coming to your question. After defining the model
# Freeze the base_model
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
Now if your run this new model whether .fit() or custom training loop, the base_model will always run in inference mode as it's sets training=False.

Change training and testing behavior of custom Layer Keras

I am trying to change my Keras model's behavior during training and testing.
To be more precise, I want to simply evaluate the predictions during training, and add another Lambda layer (as a kind of postprocessing) for testing.
I've found a solution Here, where K.functionreceives the specified K.learning_phase() and returns an output. As I understand, using K.in_test_phase() or K.in_training_phase() will return either the first or the second parameter (as described in the docs), based on the training parameter passed.
I'm running on TF 2.0 as Backend, so eager execution is enabled by default. That being said,
passing K.learning_phase() results in an error, which has been described see here. Hence, I'm using
tensorflow.python.keras.symbolic_learning_phase(), which seems to work.
I'm currently able to get the output with K.function(), but my aim is to perform model.fit() in order to train my model, and later call model.evaluate() (I'm using the Keras' functional API).
How would the proper way to train and test my model, based on the learning flag be?
Currently my MWE is:
def build_model(images, training=None):
input_layer = Input(shape=(256,256,3), dtype="float32", batch_size=80)
...
#performing some factor disentanglement here
angle_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(angle)
radius_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(radius)
angle_radius_stack = tf.stack([angle_pred, radius_pred], 0)
hough_voting = Lambda(hough_vote)((angle_radius_stack, images))
train_test = K.in_test_phase(hough_voting, angle_radius_stack, training = training)
angle_v, radius_v = tf.unstack(train_test)
model = Model(inputs=input_layer, outputs=[angle_v, radius_v])
adam = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam, loss='mean_absolute_error', metrics=['accuracy'], run_eagerly=True)
return model
And then using training:
def train_model(model, patches, radii, angles):
fun = K.function([model.layers[0].input, B.symbolic_learning_phase()], [model.layers[-1].output])
print(fun([patches[0:80,:,:,:], True]))
model.fit(patches, [radii, angles], batch_size=80, epochs=1)

Using shared variables across sessions in tensorflow

I want to train a model and at the same time use the results of the model for further actions. The training can be done in the background, but I need the prediction model to be available all the time.
I've got an idea to how to do this but not sure if that is possible to do in tensorflow. So I'm thinking of creating separate threads/processes for prediction and training. There will be two different sessions running in each process and they will share the same variables. So, the training model can update the variables in it's own time and the prediction model can use the latest weights for better prediction.
Is there any way to share variable across sessions or some better way to do this? I've heard that it is dicouraged to run multiple sessions in tensorflow.
On same machine can you share session between "predict" and "train" threads? tf.Session().run() calls are thread safe. Here is a working example:
import tensorflow as tf
import numpy as np
import time
import threading
N = 128
input = tf.placeholder(tf.float32, shape=(None, N))
labels = tf.greater_equal(tf.reduce_sum(input, axis=-1, keepdims=True), 0)
l1size = 1024
fc1 = tf.contrib.layers.fully_connected(input, l1size)
l2size=128
fc2 = tf.contrib.layers.fully_connected(fc1, l2size)
predictions = tf.contrib.layers.fully_connected(fc2, 1,
activation_fn=tf.nn.sigmoid)
loss = tf.losses.mean_squared_error(labels, predictions)
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
session = tf.Session()
session.run(tf.global_variables_initializer())
keep_going = True
def predict_thread(session):
test_data = np.random.randn(10, N)
while keep_going:
current_loss = session.run(loss, feed_dict={input:test_data})
print("Current loss: %f" % current_loss)
time.sleep(1.)
def train_thread(session):
train_data = np.random.randn(1024, N)
while keep_going:
session.run(train_op, feed_dict={input:train_data})
t1 = threading.Thread(target=train_thread, args=(session,))
t2 = threading.Thread(target=predict_thread, args=(session,))
t1.start()
t2.start()
time.sleep(10)
keep_going = False
t1.join()
t2.join()
You can also save/restore your model from time to time if training and prediction are on different machines. This question might be related.

Tensorflow forward pass with dropout

I am trying to use dropout to get error estimates for a neural network. This involves running several forward passes of my network during not only training but also testing, with dropout activated. Dropout layers seem only activated while training but not testing. Can this be done in Tensorflow by just calling some functions or modifying some parameters?
Yes, the easiest way is to use tf.layers.dropout that has a training argument, which can be tensor that you can define by true or false in a any particular session run:
mode = tf.placeholder(tf.string, name='mode')
training = tf.equal(mode, 'train')
...
layer = tf.layers.dropout(layer, rate=0.5, training=training)
...
with tf.Session() as sess:
sess.run(..., feed_dict={mode: 'train'}) # This turns on the dropout
sess.run(..., feed_dict={mode: 'test'}) # This turns off the dropout

load multiple models in Tensorflow

I have written the following convolutional neural network (CNN) class in Tensorflow [I have tried to omit some lines of code for clarity.]
class CNN:
def __init__(self,
num_filters=16, # initial number of convolution filters
num_layers=5, # number of convolution layers
num_input=2, # number of channels in input
num_output=5, # number of channels in output
learning_rate=1e-4, # learning rate for the optimizer
display_step = 5000, # displays training results every display_step epochs
num_epoch = 10000, # number of epochs for training
batch_size= 64, # batch size for mini-batch processing
restore_file=None, # restore file (default: None)
):
# define placeholders
self.image = tf.placeholder(tf.float32, shape = (None, None, None,self.num_input))
self.groundtruth = tf.placeholder(tf.float32, shape = (None, None, None,self.num_output))
# builds CNN and compute prediction
self.pred = self._build()
# I have already created a tensorflow session and saver objects
self.sess = tf.Session()
self.saver = tf.train.Saver()
# also, I have defined the loss function and optimizer as
self.loss = self._loss_function()
self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.loss)
if restore_file is not None:
print("model exists...loading from the model")
self.saver.restore(self.sess,restore_file)
else:
print("model does not exist...initializing")
self.sess.run(tf.initialize_all_variables())
def _build(self):
#builds CNN
def _loss_function(self):
# computes loss
#
def train(self, train_x, train_y, val_x, val_y):
# uses mini batch to minimize the loss
self.sess.run(self.optimizer, feed_dict = {self.image:sample, self.groundtruth:gt})
# I save the session after n=10 epochs as:
if epoch%n==0:
self.saver.save(sess,'snapshot',global_step = epoch)
# finally my predict function is
def predict(self, X):
return self.sess.run(self.pred, feed_dict={self.image:X})
I have trained two CNNs for two separate tasks independently. Each took around 1 day. Say, model1 and model2 are saved as 'snapshot-model1-10000' and 'snapshot-model2-10000' (with their corresponding meta files) respectively. I can test each model and compute its performance separately.
Now, I want to load these two models in a single script. I would naturally try to do as below:
cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........)
cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)
I encounter the error [The error message is long. I just copied/pasted a snippet of it.]
NotFoundError: Tensor name "Variable_26/Adam_1" not found in checkpoint files /home/amitkrkc/codes/A549_models/snapshot-hela-95000
[[Node: save_1/restore_slice_85 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/restore_slice_85/tensor_name, save_1/restore_slice_85/shape_and_slice)]]
Is there a way to load from these two files two separate CNNs? Any suggestion/comment/feedback is welcome.
Thank you,
Yes there is. Use separate graphs.
g1 = tf.Graph()
g2 = tf.Graph()
with g1.as_default():
cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........)
with g2.as_default():
cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)
EDIT:
If you want them into same graph. You'll have to rename some variables. One idea is have each CNN in separate scope and let saver handle variables in that scope e.g.:
saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES), scope='model1')
and in cnn wrap all your construction in scope:
with tf.variable_scope('model1'):
...
EDIT2:
Other idea is renaming variables which saver manages (since I assume you want to use your saved checkpoints without retraining everything. Saving allows different variable names in graph and in checkpoint, have a look at documentation for initialization.
This should be a comment to the most up-voted answer. But I do not have enough reputation to do that.
Anyway.
If you(anyone searched and got to this point) still having trouble with the solution provided by lpp AND you are using Keras, check following quote from github.
This is because the keras share a global session if no default tf session provided
When the model1 created, it is on graph1
When the model1 loads weight, the weight is on a keras global session which is associated with graph1
When the model2 created, it is on graph2
When the model2 loads weight, the global session does not know the graph2
A solution below may help,
graph1 = Graph()
with graph1.as_default():
session1 = Session()
with session1.as_default():
with open('model1_arch.json') as arch_file:
model1 = model_from_json(arch_file.read())
model1.load_weights('model1_weights.h5')
# K.get_session() is session1
# do the same for graph2, session2, model2
You need to create 2 sessions and restore the 2 models separately. In order for this to work you need to do the following:
1a. When you're saving the models you need to add scopes to the variable names. That way you will know which variables belong to which model:
# The first model
tf.Variable(tf.zeros([self.batch_size]), name="model_1/Weights")
...
# The second model
tf.Variable(tf.zeros([self.batch_size]), name="model_2/Weights")
...
1b. Alternatively, if you already saved the models you can rename the variables by adding scope with this script.
2.. When you restore the different models you need to filter by variable name like this:
# The first model
sess_1 = tf.Session()
sess_1.run(tf.initialize_all_variables())
saver_1 = tf.train.Saver([v for v in tf.all_variables() if 'model_1' in v.name])
saver_1.restore(sess_1, weights_1_file)
sess_1.run(pred, feed_dict={image: X})
# The second model
sess_2 = tf.Session()
sess_2.run(tf.initialize_all_variables())
saver_2 = tf.train.Saver([v for v in tf.all_variables() if 'model_2' in v.name])
saver_2.restore(sess_2, weights_2_file)
sess_2.run(pred, feed_dict={image: X})
I encountered the same problem and could not solve the problem (without retraining) with any solution i found on the internet. So what I did is load each model in two separate threads which communicate with the main thread. It is simple enough to write the code, you just have to be careful when you synchronize the threads.
In my case each thread received the input for its problem and returned to the main thread the output. It works without any observable overhead.
One way is to clear your session if you want to train or load multiple models in succession. You can easily do this using
from keras import backend as K
# load and use model 1
K.clear_session()
# load and use model 2
K.clear_session()`
K.clear_session() destroys the current TF graph and creates a new one.
Useful to avoid clutter from old models / layers.