TensorFlow warning: two cells provided to MultiRNNCell are the same object

I have been consistently receiving the following warning while executing TensorFlow scripts:
WARNING:tensorflow:At least two cells provided to MultiRNNCell are the same object and will share weights.
lstm_layer=rnn.LSTMBlockCell(num_units,forget_bias=1)
lstm_layer=rnn.DropoutWrapper(lstm_layer, output_keep_prob=output_keep_prob)
stacked_lstm = rnn.MultiRNNCell([lstm_layer] * num_layers)
outputs,_=rnn.static_rnn(stacked_lstm,input,dtype="float32")
However, the RNNs in question appear to be running fine, and are making accurate predictions.
What are the implications of this warning message? Can it be safely ignored? If it is potentially serious, how might its impact be evaluated?

[lstm_layer] * num_layers builds a list whose entries all refer to the same Python object, so it does not create several distinct RNN layers. Some versions of TensorFlow accept this and only warn, while others raise an error.
As the warning says, because every layer is the same object, all layers share one set of weights. The gradients from every position in the stack are accumulated into that single set, which amounts to reducing the number of parameters and hence the capacity of the model.
If you want separate layers, each with its own weights, build a new cell for every layer as shown below. Which approach works better depends on the application and on the results you get; if the shared-weight model is already good enough, a more complex model may not be worth it.
rnn_layers = []
for _ in range(num_layers):
    lstm_layer = rnn.LSTMBlockCell(num_units, forget_bias=1)
    lstm_layer = rnn.DropoutWrapper(lstm_layer, output_keep_prob=output_keep_prob)
    rnn_layers.append(lstm_layer)
stacked_lstm = rnn.MultiRNNCell(rnn_layers)
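The difference between the two constructions can be seen without TensorFlow. The following is a minimal sketch in plain Python; `Cell` is a hypothetical stand-in for an RNN cell and is not a TensorFlow class.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell; holds one weight list."""

    def __init__(self, num_units):
        self.weights = [0.0] * num_units


num_layers = 3

# List multiplication repeats a reference to ONE object.
shared = [Cell(4)] * num_layers
# A comprehension calls the constructor once per layer.
separate = [Cell(4) for _ in range(num_layers)]

shared_ids = {id(cell) for cell in shared}
separate_ids = {id(cell) for cell in separate}

# Updating "layer 0" of the shared stack changes every layer,
# while the separately built layers stay independent.
shared[0].weights[0] = 1.0
separate[0].weights[0] = 1.0

print(len(shared_ids), len(separate_ids))            # 1 3
print(shared[2].weights[0], separate[2].weights[0])  # 1.0 0.0
```

In the shared list every "layer" is the same object, which is the situation the MultiRNNCell warning describes.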

Related

Function CNN model written in Keras 2.2.4 not learning in TensorFlow or Keras 2.4

I am dealing with an object detection problem and using a model which is actually functioning (its results have been published on a paper and I have the original code). Originally, the code was written with Keras 2.2.4 without importing TensorFlow and trained and tested on the same dataset that I am using at the moment. However, when I try to run the same model with TensorFlow 2.x it just won't learn a thing.
I have tried importing everything from TensorFlow 2.4, but I have the same problem if I import everything (layers, models, optimizers...) from Keras 2.4. I have also tried this on two different devices, both using a GPU. What happens is that the loss function decreases ridiculously fast, but the accuracy won't increase a bit (or, if it does, it gets stuck around 10% or something). Also, every now and then this happens from one epoch to the next:
Loss undergoes HUGE jumps between consecutive epochs, and all this without any changes in accuracy
I have tried to train the network on another dataset (had to change the last layers in order to match the required dimensions) and the model seemed to be learning in a normal way, i.e. the accuracy actually increases and the loss doesn't reach 0.0x in one epoch.
I can't post the script, but the model is an Encoder-Decoder network: consecutive Convolutions with increasing number of filters reduce the dimensions of the image, and a specular path of Transposed Convolutions restores the original dimensions. So basically the network only contains:
1. Conv2D
2. Conv2DTranspose
3. BatchNormalization
4. Activation("relu")
5. Activation("sigmoid")
6. concatenate
6 is used to put together outputs from parallel paths or distant layers; 3 and 4 are used after every Conv or ConvTranspose; 5 is only used as final activation function, i.e. as output layer.
I think the problem is pretty generic and I am honestly surprised that I couldn't find a single question about it. What could be happening here? The problem must have something to do with TF/Keras versions, but I can't find any documentation about it, and I have tried changing many things without any effect. It's crazy because, if I didn't know that the model works, I would try to rewrite it from scratch. I am afraid that this problem may occur with a new network and I won't be able to tell whether it's the libraries or the model itself.
Thank you in advance! :)
EDIT
Code snippets:
Convolutional block:
encoder1 = Conv2D(filters=first_layer_channels, kernel_size=2, strides=2)(input)
encoder1 = BatchNormalization()(encoder1)
encoder1 = Activation('relu')(encoder1)
Decoder
decoder1 = Conv2DTranspose(filters=first_layer_channels, kernel_size=2, strides=2)(encoder4)
decoder1 = BatchNormalization()(decoder1)
decoder1 = Activation('relu')(decoder1)
Final layers:
final = Conv2D(filters=total, kernel_size=1)(decoder4)
final = BatchNormalization()(final)
Last_Conv = Activation('sigmoid')(final)
The task is human pose estimation: the network (which, I recall, works on this specific task with Keras 2.2.4) has to predict twenty binary maps containing the positions of specific keypoints.

How to initialize the model with certain weights?

I am using the example "stateful_clients" in tensorflow-federated examples. I want to use my pretrained model weights to initialize the model. I use the function model.load_weights(init_weight). But it seems that it doesn't work. The validation accuracy in the first round is still low. How can I solve the problem?
def tff_model_fn():
    """Constructs a fully initialized model for use in federated averaging."""
    keras_model = get_five_layers_cnn([28, 28, 1])
    keras_model.load_weights(init_weight)
    loss = tf.keras.losses.SparseCategoricalCrossentropy()
    return stateful_fedavg_tf.KerasModelWrapper(keras_model,
                                                test_data.element_spec, loss)
A quick primer on state and model weights in TFF
TFF takes a distinct perspective on state in machine learning, generally a consequence of its desire to be purely functional.
Usually in machine learning, a model is conceptually a function which takes data and produces a prediction. However, this notion is a little overloaded at times; does 'model' refer to a trained model (fitting the specification above), or an architecture which is parameterized by its parameters, and therefore needs to accept these parameters as an argument to be considered truly a 'function'? A conception somewhat in the middle is that of a 'stateful function', which I think tends to be what people intend to refer to when they use the term 'model'.
TFF standardizes on the second of these understandings: the architecture that takes its parameters as an argument. For TFF, a 'model' is a function which accepts parameters along with data as arguments and produces a prediction. This is generally to avoid the notion of a stateful function, which is disallowed by a purely functional perspective (f(x) == f(x) should always be true, so f cannot have any state which affects its output).
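To make the distinction concrete, here is a minimal sketch in plain Python. It does not use TFF; `predict` and `StatefulModel` are hypothetical names chosen for illustration only.

```python
def predict(params, x):
    """Pure 'model': the output depends only on the arguments."""
    w, b = params
    return w * x + b


class StatefulModel:
    """Stateful variant: the parameters live inside the object."""

    def __init__(self, w, b):
        self.w, self.b = w, b

    def __call__(self, x):
        return self.w * x + self.b


params = (2.0, 1.0)
pure_first = predict(params, 3.0)
pure_second = predict(params, 3.0)   # same arguments, same result

model = StatefulModel(2.0, 1.0)
stateful_first = model(3.0)
model.w = 5.0                        # hidden state changes (e.g. a training step)
stateful_second = model(3.0)         # same argument, different result

print(pure_first, pure_second)          # 7.0 7.0
print(stateful_first, stateful_second)  # 7.0 16.0
```

In the functional style the weights are ordinary values that are passed in and returned, which is why initial weights in TFF end up living in the state produced by initialize rather than inside a model object.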
On the code in question
I'm not super familiar with this portion of the TFF codebase; in particular I'm a little surprised at the behavior of the keras model wrapper, as usually TFF wants to serialize all logic into TFF-defined data structures as soon as possible (at least, this is how I think about it). Glancing at the code, it looks to me like it could work--but there have been exciting interactions between TFF and Keras in the past.
Briefly, here is how this path should be working:
1. The model function you define above is invoked while building the initialize computation, in a graph context; the logic to load weights (or assignment of the weights themselves, baked into the graph as a constant) would hopefully be serialized into the graph that TFF generates to represent initialize.
2. Upon calling iterative_process.initialize, you would find your desired weights populated in the appropriate attributes of the returned data structure. This would serve as your initial starting point for your iterative process, and you would be off to the races.
What I am suspicious of in the above is step 1. TFF will silently invoke your model_fn in a TensorFlow graph context, resulting in non-program-order semantics; if there is no control dependency between the assignment and the return value of your function (which there isn't in the code above, and in fact it is not obvious how to force this), the assignment may be skipped at initialize time. In that case the state returned from initialize won't have your specified weights.
If this suspicion is true, the appropriate solution is to run the weight loading logic directly in Python. TFF provides some utilities to help with this kind of thing, like tff.learning.state_with_new_model_weights. This would be used like:
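state = iterative_process.initialize()
# Load the pretrained weights into a plain Keras model, outside of TFF.
keras_model = get_five_layers_cnn([28, 28, 1])
keras_model.load_weights(init_weight)
# Argument names as in the TFF releases that ship this helper; check your version.
state_with_loaded_weights = tff.learning.state_with_new_model_weights(
    state,
    trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable_weights=[v.numpy() for v in keras_model.non_trainable_weights])
# continue on using state_with_loaded_weights in the iterative process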

Using different optimizers to train the same layer in tensorflow

I have a model which consists of convolutional layers followed by fully connected layers. I trained this model on the fer dataset. This is a classification problem where the number of outputs is 8.
After training this model, I kept the fully connected layers and replaced only the last layer with a new one that has 3 outputs. The purpose was to fine-tune the fully connected layers while training the new output layer.
I used one optimizer at the beginning to train the whole model, then created a second optimizer to fine-tune the fully connected layers along with training the last layer.
As a result, I got the following error:
ValueError: Variable Dense/dense/bias/Adam/ already exists,
I know the reason for the error: the second optimizer tries to create a variable for updating the weights under a name that the first optimizer has already used.
Hence, I would like to know how to fix this problem. Is there a way to delete the variables associated with the first optimizer?
Any help is much appreciated!!
This is probably caused by both optimizers using the (same) default name 'Adam'. To avoid this clash, you can give the second optimizer a different name, e.g.
opt_finetune = tf.train.AdamOptimizer(name='Adam_finetune')
This should make opt_finetune create its variables under different names. Please let us know whether this works!

Purpose of batch channel in tensorflow model on forward pass of 1 input

So far I have trained a couple different models in TensorFlow (with Keras) and I see that getting the batch_size right seems to be important not just for speed of training but also the resultant accuracy of the model.
What confuses me is the case where a model has an actual batch channel as the first dimension of the input (and of the output as well). If my batch size is 32 but I always feed a single sample at run time, where does the batch channel apply? How can I make use of it if I am inherently only using 1/batch_size of it in the forward pass?
If you are curious the model I am researching, it is this one:
https://github.com/pierluigiferrari/ssd_keras/blob/master/models/keras_ssd300.py
see:
Output shape of predictions: (batch, n_boxes_total, n_classes + 4 + 8)
predictions = Concatenate(axis=2, name='predictions')([mbox_conf_softmax, mbox_loc, mbox_priorbox])
The tensors had run through numerous other layers that had constants and such pretrained with [batch_size] as well. To me it just seems like inputs at various batch index would have to yield different results. Maybe I just need something incredibly obvious pointed out to me.
It would seem that after training you must recompile the model with a batch size of 1, then transfer the weights from the training model to the new model for evaluation. The alternative is performing 'batch_size' count of predictions at once (which of course is not always feasible per application). If there are alternatives (or if I read wrong) please feel free to add an answer.
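As a side illustration of the question above, here is a minimal sketch in plain Python of a dense layer applied to a batch. It assumes, as is usual for dense and convolutional layers, that the weights carry no batch axis; whether that holds for every layer of the linked SSD model is not checked here, and the sketch does not settle whether a given Keras model has its batch size fixed in the graph. The function `dense` and all values are hypothetical.

```python
def dense(batch, weights, bias):
    """Apply y = x @ W + b to every row of `batch` independently."""
    outputs = []
    for x in batch:
        row = [
            sum(x_i * w_ij for x_i, w_ij in zip(x, column)) + b_j
            for column, b_j in zip(zip(*weights), bias)
        ]
        outputs.append(row)
    return outputs


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape (3, 2): no batch axis
bias = [0.5, -0.5]

batch_of_32 = [[float(i), float(i + 1), float(i + 2)] for i in range(32)]
single = [batch_of_32[7]]                        # the same sample as a batch of 1

full_output = dense(batch_of_32, weights, bias)
single_output = dense(single, weights, bias)

# The sample gives the same result at index 7 of a batch of 32
# and at index 0 of a batch of 1.
same = full_output[7] == single_output[0]
print(same, single_output[0])  # True [76.5, 99.5]
```

Under that assumption the batch axis only counts how many samples are processed at once, so the position of a sample within the batch does not change its output.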

Tensorflow: jointly training CNN + LSTM

There are quite a few examples on how to use LSTMs alone in TF, but I couldn't find any good examples on how to train CNN + LSTM jointly.
From what I see, it is not quite straightforward how to do such training, and I can think of a few options here:
First, I believe the simplest solution (or the most primitive one) would be to train CNN independently to learn features and then to train LSTM on CNN features without updating the CNN part, since one would probably have to extract and save these features in numpy and then feed them to LSTM in TF. But in that scenario, one would probably have to use a differently labeled dataset for pretraining of CNN, which eliminates the advantage of end to end training, i.e. learning of features for final objective targeted by LSTM (besides the fact that one has to have these additional labels in the first place).
The second option would be to concatenate all time slices in the batch dimension (4-d Tensor), feed it to the CNN, then somehow repack those features into the 5-d Tensor needed for training the LSTM, and then apply a cost function. My main concern is whether this is possible at all. Also, handling variable-length sequences becomes a little tricky. For example, in a prediction scenario you would only feed a single frame at a time. I would be really happy to see some examples if that is the right way of doing joint training. Besides that, this solution looks more like a hack, so if there is a better way to do it, it would be great if someone could share it.
Thank you in advance !
For joint training, you can consider using tf.map_fn, as described in the documentation: https://www.tensorflow.org/api_docs/python/tf/map_fn.
Let's assume that the CNN is built along similar lines as described here: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py.
import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.x

def joint_inference(sequence):
    # Run the CNN (`inference`, from the cifar10 tutorial) on each element along the first axis.
    inference_fn = lambda image: inference(image)
    logit_sequence = tf.map_fn(inference_fn, sequence, dtype=tf.float32, swap_memory=True)
    lstm_cell = tf.contrib.rnn.LSTMCell(128)
    # dtype must be given because no initial_state is passed.
    output_state, intermediate_state = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=logit_sequence, dtype=tf.float32)
    projection_function = lambda state: tf.contrib.layers.linear(state, num_outputs=num_classes, activation_fn=tf.nn.sigmoid)
    projection_logits = tf.map_fn(projection_function, output_state)
    return projection_logits
Warning: You might have to look into device placement, as described here https://www.tensorflow.org/tutorials/using_gpu, if your model is larger than the memory the GPU can allocate.
An alternative would be to flatten the video batch into an image batch, do a forward pass through the CNN, and reshape the features for the LSTM.
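The flatten-and-reshape alternative can be sketched in plain Python, without TensorFlow. This is only an illustration of the index bookkeeping, assuming every video has the same number of frames; `cnn_features`, `flatten_time` and `restore_time` are hypothetical helpers, and `cnn_features` is a toy stand-in for a real CNN.

```python
def cnn_features(frame):
    """Hypothetical per-frame feature extractor (stand-in for a CNN)."""
    return [sum(frame), max(frame)]


def flatten_time(videos):
    """[batch][time] frames -> one flat list of batch * time frames."""
    return [frame for video in videos for frame in video]


def restore_time(flat, batch_size, num_steps):
    """Flat list of batch * time items -> [batch][time] items."""
    return [flat[b * num_steps:(b + 1) * num_steps] for b in range(batch_size)]


# 2 videos, 3 frames each, 4 "pixels" per frame.
videos = [
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
]
batch_size, num_steps = len(videos), len(videos[0])

frames = flatten_time(videos)                 # 6 frames: a plain image batch
features = [cnn_features(f) for f in frames]  # one pass over all frames
sequences = restore_time(features, batch_size, num_steps)  # [batch][time][feature]

print(sequences[0][1], sequences[1][2])  # [22, 7] [12, 3]
```

With tensors, the same idea is usually a reshape from (batch, time, height, width, channels) to (batch * time, height, width, channels) before the CNN and back to (batch, time, features) before the LSTM. Variable-length sequences would additionally need padding and sequence lengths, which this sketch does not cover.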