Number of layers in the NN model keeps increasing every time I call a new instance of the model - tensorflow

I have been working with TensorFlow and Jupyter for a while, but this is the first time I have encountered this problem. I have a NN model with 6 layers, and I get an instance of it by calling the function "classifier":
import tensorflow as tf

def classifier(input_repr, prob, reuse=None):
    # Five hidden dense layers, each followed by dropout, then a single-unit output layer.
    e_l1 = tf.layers.dense(inputs=input_repr, units=512, activation=tf.nn.leaky_relu)
    e_l1 = tf.nn.dropout(e_l1, prob)
    e_l2 = tf.layers.dense(inputs=e_l1, units=256, activation=tf.nn.leaky_relu)
    e_l2 = tf.nn.dropout(e_l2, prob)
    e_l3 = tf.layers.dense(inputs=e_l2, units=128, activation=tf.nn.leaky_relu)
    e_l3 = tf.nn.dropout(e_l3, prob)
    e_l4 = tf.layers.dense(inputs=e_l3, units=64, activation=tf.nn.leaky_relu)
    e_l4 = tf.nn.dropout(e_l4, prob)
    e_l5 = tf.layers.dense(inputs=e_l4, units=32, activation=tf.nn.leaky_relu)
    e_l5 = tf.nn.dropout(e_l5, prob)
    d_l3 = tf.layers.dense(inputs=e_l5, units=1, activation=tf.nn.leaky_relu)
    return d_l3
I also have a function that prints the model summary:
import tensorflow.contrib.slim as slim

def model_summary():
    model_vars = tf.trainable_variables()
    # analyze_vars prints the summary itself when print_info=True
    slim.model_analyzer.analyze_vars(model_vars, print_info=True)

model_summary()
And I get the model instance with:
model_output=classifier(input_repr,prob)
The problem is that whenever I call this and then call model_summary(), the new layers are stacked on top of the previous model. For example, when I call classifier() for the first time, model_summary() shows 5 layers, but when I call it again it shows 10 layers, and so on. I always re-initialize before calling classifier(), but it keeps happening. I don't know whether this is an issue with Jupyter. The only way I know to solve it is to restart the kernel completely, which loses all my variables.

Don't forget to reset the default graph with tf.reset_default_graph() before creating your model. The notebook kernel is a single long-lived process, so the default graph persists between cell executions, and TensorFlow adds new nodes to it every time you rerun the graph-building code. That's why, when prototyping in a Jupyter notebook, you should always reset the default graph before you start building a new one.

Every time you call the function classifier() you create additional layers. Once you have created and compiled your model, use only that model object for model.fit and model.predict.

Related

Loss tensor being pruned out of graph in PopART

I’ve written a very simple PopART program using the C++ interface, but every time I try to compile it to run on an IPU device I get the following error:
terminate called after throwing an instance of 'popart::error'
what(): Could not find loss tensor 'L1:0' in main graph tensors
I’m defining the loss in my program like so:
auto loss = builder->aiGraphcoreOpset1().l1loss({outputs[0]}, 0.1f, popart::ReductionType::Sum, "l1LossVal");
Is there something wrong with my loss definition that’s resulting in it being pruned out of the graph? I’ve followed the same structure as one of the Graphcore examples here.
This error usually happens when the model protobuf you pass to the TrainingSession or InferenceSession objects doesn’t contain the loss tensor. A common cause is calling builder->getModelProto() before adding the loss tensor to the graph. To ensure your loss tensor is part of the protobuf, your calls should be in the following order:
...
auto loss = builder->aiGraphcoreOpset1().l1loss(...);
auto proto = builder->getModelProto();
auto session = popart::TrainingSession::createFromOnnxModel(...);
...
The key point is that the getModelProto() call should be the last call from the builder interface before setting up the PopART session.
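The ordering issue can be pictured with a small plain-Python toy. This is not the PopART API: `ToyBuilder`, `get_model_proto`, and `create_session` are hypothetical stand-ins, written under the assumption (taken from the explanation above) that the proto only contains what the builder held at the moment of the call.

```python
# Toy model only: a builder whose "proto" is a snapshot of its current state.
# None of these names belong to PopART.

class ToyBuilder:
    def __init__(self):
        self._tensors = []

    def add_tensor(self, name):
        self._tensors.append(name)
        return name

    def get_model_proto(self):
        # Snapshot: tensors added after this call are not part of the result.
        return tuple(self._tensors)


def create_session(proto, loss):
    if loss not in proto:
        raise RuntimeError(f"Could not find loss tensor '{loss}' in main graph tensors")
    return {"proto": proto, "loss": loss}


builder = ToyBuilder()
builder.add_tensor("output")

early_proto = builder.get_model_proto()   # taken before the loss exists
loss = builder.add_tensor("L1:0")
late_proto = builder.get_model_proto()    # taken after the loss was added

try:
    create_session(early_proto, loss)
    early_ok = True
except RuntimeError:
    early_ok = False                      # the early snapshot lacks the loss

session = create_session(late_proto, loss)
```

Only the snapshot taken after the loss was added can be used to set up the session, which mirrors the call order recommended above.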

Outputting multiple loss components to tensorboard from tensorflow estimators

I am pretty new to tensorflow and I am struggling to get tensorboard to display some of my custom metrics. The model I am working with is a tf.estimator.Estimator, with an associated EstimatorSpec. The first new metric I am trying to log is from my loss function, which is composed of two components: a loss for an age prediction (tf.float32) and a loss for a class prediction (one-hot/multiclass), which I add together to determine a total loss (my model is predicting both a class and an age). The total loss is output just fine during training and shows up on tensorboard, but I would like to track the individual age and the class prediction loss components as well.
I think a solution that is supposed to work is to add an eval_metric_ops argument to the EstimatorSpec as described here (Custom eval_metric_ops in Estimator in Tensorflow). I have not been able to make this approach work, however. I defined a custom metric function that looks like this:
def age_loss_function(labels, ages_pred, ages_true):
    per_sample_age_loss = get_age_loss_per_sample(ages_pred, ages_true)  ### works fine
    #### The error happens on this line:
    mean_abs_age_diff, age_loss_update_fn = tf.metrics.Mean(per_sample_age_loss)
    ######
    return mean_abs_age_diff, age_loss_update_fn

eval_metric_ops = {"age_loss": age_loss_function}  #### Want to use this in EstimatorSpec
The instructions seem to say that I need both the error metric and the update function which should both be returned from the tf.metrics command as in examples like the one I linked. But this command fails for me with the error message:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I am probably just misusing the APIs. If someone can guide me on the proper usage I would really appreciate it. Thanks!
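It looks like the problem came from a version change. I had updated to TensorFlow 2.0, while the instructions I was following were written for 1.X. In 2.x, tf.metrics.Mean is a Keras metric class rather than the 1.X function that returns a (value, update op) pair. Using tf.compat.v1.metrics.mean() instead gets past this problem.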
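For context, the 1.X-style metric functions hand back two things: a value that reads the running result and an update op that folds in one more batch. A plain-Python sketch of that contract follows; the `StreamingMean` class and the sample batches are hypothetical illustrations, not TensorFlow code.

```python
# Toy model only: the "value" / "update" pair behind a streaming mean.

class StreamingMean:
    def __init__(self):
        self.total = 0.0   # running sum of all values seen
        self.count = 0     # number of values seen

    def update(self, batch):
        """Fold one batch into the running state and return the new mean."""
        self.total += sum(batch)
        self.count += len(batch)
        return self.value()

    def value(self):
        """Read the current mean without changing the state."""
        return self.total / self.count if self.count else 0.0


metric = StreamingMean()
metric.update([1.0, 3.0])   # first evaluation batch
metric.update([5.0])        # second evaluation batch
running_mean = metric.value()
```

The mean is taken over every value seen so far, not over per-batch means, which is why the evaluation loop needs the update step as well as the value.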

Keras reset layer numbers

Keras assigns incrementing ID numbers to layers of the same type, e.g. max_pooling1d_7, max_pooling1d_8, max_pooling1d_9, etc. Each iteration of my code constructs a new model, starting with model = Sequential() and then adding layers via model.add(). Even though each cycle creates a new Sequential object, the layer ID numbers continue incrementing from the previous cycle. Since my process is long-running, these ID numbers can grow very large, and I am concerned that this could cause problems. Why are the IDs not reset by model = Sequential()? Is there a way to reset them? After each cycle I have no use for the layer ID numbers and can discard them, but how? I am using the TensorFlow backend.
The solution, from Attempting to reset tensorflow graph when using keras, failing:
from keras import backend as K
K.clear_session()
Do you use Jupyter notebooks? It seems that your TensorFlow session is not restarted while you rebuild your model.
Because Keras refers to TensorFlow graph elements by name, the counting has to continue.
So if you don't want to restart the session, you're fine. However, this also means that the TensorFlow session gets bigger and bigger, so restarting the session might be the desired approach.
To do that, restart the whole program/kernel.
Each iteration shouldn't construct a new model. Training should run on the same model. Maybe post your code so we can see what might be going wrong.
Add
tf.keras.backend.clear_session()
as in https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session
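Why a fresh Sequential() does not restart the numbering can be sketched in plain Python. The code below is a toy, not Keras internals: `_name_uids`, `unique_layer_name`, `build_model`, and this `clear_session` are hypothetical stand-ins, assuming the counters are kept per name prefix in global state rather than on the model object.

```python
# Toy model only: per-prefix name counters held in global state.
from collections import defaultdict

_name_uids = defaultdict(int)   # survives across model constructions


def unique_layer_name(prefix):
    """Return prefix, prefix_1, prefix_2, ... on successive calls."""
    _name_uids[prefix] += 1
    uid = _name_uids[prefix]
    return prefix if uid == 1 else f"{prefix}_{uid - 1}"


def clear_session():
    """Forget all counters (stand-in for the real clear_session call)."""
    _name_uids.clear()


def build_model():
    # A "model" here is just the list of names its three layers receive.
    return [unique_layer_name("max_pooling1d") for _ in range(3)]


first = build_model()    # numbering starts from the bare prefix
second = build_model()   # new model object, but the counter keeps going
clear_session()
third = build_model()    # numbering starts over

print(first, second, third, sep="\n")
```

Calling the reset once at the top of each cycle keeps the suffixes bounded in this toy, which is the effect the answers above describe for the real call.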

Cannot run Tensorflow code multiple times in Jupyter Notebook

I'm struggling to run TensorFlow (v1.1) code multiple times in a Jupyter Notebook.
For example, I execute this simple code snippet that creates an encoding layer for a seq2seq model:
# Construct encoder layer (LSTM)
encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)
encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_inputs_embedded,
    dtype=tf.float32, time_major=False
)
First time is totally fine, my encoder is created.
However, if I rerun it (no matter the changes I've applied), I get this error:
Attempt to have a second RNNCell use the weights of a variable scope that already has weights
It's very annoying as it forces me to restart the kernel every time I want to change a layer.
Can someone explain why this happens and how I can fix it?
Thanks!
You are trying to build the exact same graph twice and therefore TensorFlow complains because the variables already exist in the default graph.
What you could do is to call tf.reset_default_graph() before trying to call the method a second time to ensure you create a new graph when required.
Just in case, I would also suggest using an interactive session as described here in the Start TensorFlow InteractiveSession section:
import tensorflow as tf
sess = tf.InteractiveSession()

How to structure the model for training and evaluation on the test set

I want to train a model. Every 1000 steps, I want to evaluate it on the test set and write the result to the TensorBoard log. I have code like this:
image_b_train, label_b_train = tf.train.shuffle_batch(...)
out_train = model.inference(image_b_train)
accuracy_train = tf.reduce_mean(...)
image_b_test, label_b_test = tf.train.shuffle_batch(...)
out_test = model.inference(image_b_test)
accuracy_test = tf.reduce_mean(...)
where model.inference declares the variables in the model. However, there's a problem: the test set has its own separate queue, and I can't swap one queue for another in TensorFlow.
Currently I have solved the problem by creating 2 graphs, one for training and the other for testing, and copying the variables from one graph to the other with tf.train.Saver. Another solution might be to use tf.get_variable, but that relies on global variables, and I don't like it because my code becomes less reusable.
Yes, you need two graphs. These graphs can share variables. This can be done by:
Using Keras layers (from tf.contrib.keras) which let you define the model once and use it to compute two inference graphs
Using slim-style layers (from tf.layers) with tf.get_variable and reuse
Using tf.make_template to make your own model-like object which can be called once to build the training graph and once to build the inference graph
Using tf.estimator.Estimator which lets you define a model function once and runs it automatically for training and evaluation for you
There are other options, but any of these is well-supported and should unblock you.
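What these options have in common is that the model owns its variables once and is then applied to two different inputs. A plain-Python sketch of that idea follows; the `LinearModel` class, its weights, and the sample batches are hypothetical, and lists stand in for tensors.

```python
# Toy model only: one model object, two inference paths, shared weights.

class LinearModel:
    def __init__(self, weights, bias):
        self.weights = list(weights)   # created once, owned by the model
        self.bias = bias

    def inference(self, batch):
        """Apply the same weights to whatever batch is passed in."""
        return [
            sum(w * x for w, x in zip(self.weights, row)) + self.bias
            for row in batch
        ]


model = LinearModel(weights=[0.5, -1.0], bias=0.25)

train_batch = [[1.0, 2.0], [3.0, 4.0]]   # stands in for the training queue
test_batch = [[2.0, 2.0]]                # stands in for the test queue

out_train = model.inference(train_batch)
out_test = model.inference(test_batch)

# A "training step" that changes the weights is visible to both paths,
# because there is only one copy of them.
model.weights[0] = 1.0
out_test_after = model.inference(test_batch)
```

Because the test path reads the same weights the training path updates, no copying between graphs is needed in this sketch.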