apply_gradients may have been removed in newer versions of the TensorFlow or Keras optimizers. I do not know why, but I am getting this:
AttributeError: 'Adam' object has no attribute 'apply_gradients'
Any other way to achieve the same thing?
apply_gradients is only available in tensorflow.keras, because there you can write manual training loops with eager execution on.
Pure Keras must use a symbolic graph and can only apply gradients through fit or train_on_batch.
I had the same problem. In the end, this initializer worked:
optimizer = tf.keras.optimizers.Adam()
But these lead to the error:
optimizer = keras.optimizers.Adam()
optimizer = tf.python.keras.optimizers.Adam()
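As a rough illustration of the manual training loop mentioned above, here is a minimal sketch using only tf.keras objects. The toy data, the one-layer model and the learning rate are assumptions made up for this example; only the tf.keras.optimizers.Adam initializer comes from the answer.

```python
import numpy as np
import tensorflow as tf

# Toy regression data (hypothetical): y = 3x + 1.
x = np.linspace(-1.0, 1.0, 64, dtype=np.float32).reshape(-1, 1)
y = 3.0 * x + 1.0

# Everything is built from tf.keras, not from the standalone keras package.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

losses = []
for step in range(200):
    # Record the forward pass so gradients can be computed afterwards.
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    # apply_gradients takes (gradient, variable) pairs.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    losses.append(float(loss))
```

If the optimizer is created from a different Keras package than the model, this loop is where the AttributeError from the question would typically surface.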
I'm using the following code to load an imagenet pre-trained VGG19 model and fit to my custom dataset.
from keras.applications.vgg19 import VGG19
optim = tf.keras.optimizers.RMSprop(momentum=0.9)
vgg19 = VGG19(include_top=False, weights='imagenet', input_tensor=tf.keras.layers.Input(shape=(224, 224, 3)))
vgg19.trainable = False
# x = keras.layers.GlobalAveragePooling2D()(model_vgg19_pt.output)
x = keras.layers.Flatten()(vgg19.output)
output = keras.layers.Dense(n_classes, activation='softmax')(x)
model_vgg19_pt = keras.models.Model(inputs=[vgg19.input], outputs=[output])
model_vgg19_pt.compile(optimizer=optim,
                       loss='categorical_crossentropy', metrics=['categorical_accuracy'])
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model_vgg19_pt.fit(x_train, y_train, batch_size=20,
                   epochs=50, callbacks=[callback])
On the model.fit() line, I get the following error:
KeyError: 'The optimizer cannot recognize variable dense_1/kernel:0. This usually means you are trying to call the optimizer to update different parts of the model separately. Please call optimizer.build(variables) with the full list of trainable variables before the training loop or use legacy optimizer `tf.keras.optimizers.legacy.{self.class.name}.'
What does it mean and how can I fix it?
I get the same error with keras.applications.inception_v3 when using the same implementation method.
Additionally, this worked in a Jupyter notebook on CPU-only TensorFlow, but when running on a remote machine with tensorflow-gpu installed, I get these errors.
This works fine with the SGD optimizer, but not with RMSprop. Why?
Additional
Using this:
model_vgg19_pt.compile(optimizer=tf.keras.optimizers.RMSprop(momentum=0.9),
                       loss='categorical_crossentropy', metrics=['categorical_accuracy'])
instead of the compile call used above works. Can somebody explain why?
Which version of Tensorflow GPU have you installed? TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Please check the link to install TensorFlow by following all the Hardware/Software requirements for the GPU support.
The scheduler function passed to the LearningRateScheduler callback is not defined in the code you posted.
I was able to train the model after removing the callback from model.fit(). (Attaching the gist here for your reference)
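For reference, a schedule function of the kind LearningRateScheduler expects could look like the sketch below. The decay rule (constant for the first 10 epochs, then exponential decay) is an assumption chosen for illustration, not the asker's actual schedule.

```python
import math


def scheduler(epoch, lr):
    """Return the learning rate to use for the given epoch index.

    Hypothetical rule: keep the current rate for the first 10 epochs,
    then shrink it by a constant factor every epoch.
    """
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)


# It would then be passed to the callback as in the question:
# callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
```

The callback calls the function at the start of each epoch with the epoch index and the current learning rate, and uses the returned float as the new rate.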
I have been struggling for days just to view the layers' gradients in debug mode in Keras 2. Needless to say, I have already tried code such as:
import tensorflow as tf
import keras.backend as K
gradients = K.gradients(model.output, model.input)
sess = tf.compat.v1.keras.backend.get_session()
evaluated_gradients = sess.run(gradients, feed_dict={model.input: images})
or
evaluated_gradients = sess.run(gradients, feed_dict={model.input.experimental_ref(): images})
or
with tf.compat.v1.Session(graph=tf.compat.v1.keras.backend.get_default_graph())
or similar approaches using tf.compat.v1, which all lead to the following error:
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().
I assume this should be the most basic tool any deep learning package provides, so it is strange that there seems to be no easy way to do it in Keras 2. Any ideas?
You can try to do this on TF 2 with eager mode on.
Please note that you need to use tf.keras for everything, including your model, layers, etc. For this to work you can never use keras alone; it must be tf.keras. This means, for instance, using tf.keras.layers.Dense, tf.keras.models.Sequential, etc.
input_images_tensor = tf.constant(input_images_numpy)
with tf.GradientTape() as g:
    g.watch(input_images_tensor)
    output_tensor = model(input_images_tensor)
gradients = g.gradient(output_tensor, input_images_tensor)
If you are going to calculate gradients more than once with the same tape, create the tape with persistent=True and delete it manually once you have the gradients. (See details on the link below)
You can get the gradients with respect to any trainable weight without calling watch. To get gradients with respect to non-trainable tensors (such as the input images), you must call g.watch, as above, on each of them.
More details on GradientTape: https://www.tensorflow.org/api_docs/python/tf/GradientTape
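A minimal sketch of the two points above (a persistent tape, and watch for a non-trainable tensor). The tensors x and w are made-up stand-ins for the input images and a model weight, so the example runs without a model.

```python
import tensorflow as tf

# x plays the role of the input images (a constant, not trainable).
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
# w plays the role of a trainable weight.
w = tf.Variable([[0.5], [-1.0]])

with tf.GradientTape(persistent=True) as g:
    g.watch(x)              # constants must be watched explicitly
    y = tf.matmul(x, w)     # trainable variables are watched automatically
    out = tf.reduce_sum(y)

# A persistent tape can be queried more than once.
grad_x = g.gradient(out, x)
grad_w = g.gradient(out, w)

# Release the resources held by the persistent tape.
del g
```

Here grad_x has the shape of x and grad_w has the shape of w; without persistent=True the second call to g.gradient would raise an error.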
I am trying to run inference with a MobileNetV2 model.
I trained the model using tensorflow/models/slim.
The model gives proper accuracy with is_training=true.
But when I set is_training=false and save the model, inference gives very low accuracy.
I can see the following difference in the graph between these two cases.
With is_training=true, moving_mean and moving_variance become Const and Const_1, respectively. This is the only difference I could see.
And during inference, the output of the FusedBatchNorm node is different in these two cases.
Could someone please help me understand why this is happening and how to resolve it?
Having the same issue.
With the suggestion from the TF documentation, given below:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
I am getting better accuracy at test time with is_training: False
I have my workload partitioned across two GPUs (i.e., model partitioning). By default, TF/Keras allocates all the gradients on GPU0, but I want to use colocate_gradients_with_ops to spread the allocation across the two GPUs.
I'm looking for a simple way to do that in Keras. My approach was to create a new optimizer subclassed from tf.train.AdamOptimizer just to flip the default value of colocate_gradients_with_ops (from False to True). I also have to flip it in two methods!
I'm looking for a shorter, more direct way in Keras than the one below.
class MyAdamOptimizer(tf.train.AdamOptimizer):
    # Same as AdamOptimizer, but colocate_gradients_with_ops defaults to True.
    def compute_gradients(self,
                          loss,
                          var_list=None,
                          gate_gradients=tf.train.Optimizer.GATE_OP,
                          aggregation_method=None,
                          colocate_gradients_with_ops=True,
                          grad_loss=None):
        # Forward the caller's arguments instead of hard-coded defaults.
        return super(MyAdamOptimizer, self).compute_gradients(
            loss,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=colocate_gradients_with_ops,
            grad_loss=grad_loss)

    def minimize(self,
                 loss,
                 global_step=None,
                 var_list=None,
                 gate_gradients=tf.train.Optimizer.GATE_OP,
                 aggregation_method=None,
                 colocate_gradients_with_ops=True,
                 name=None,
                 grad_loss=None):
        # self was missing from the signature in the original version.
        return super(MyAdamOptimizer, self).minimize(
            loss,
            global_step=global_step,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=colocate_gradients_with_ops,
            name=name,
            grad_loss=grad_loss)
Then I call
model.compile(optimizer=MyAdamOptimizer(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
There is no simpler way. The Keras Adam optimizer uses its own implementation built from basic operators, so you have to use a custom optimizer for colocate_gradients_with_ops. If the purpose is to improve multi-GPU performance, you can try Keras-MXNet's Adam optimizer: we overloaded Keras' Optimizer class and get better efficiency on multiple GPUs. You don't have to change your training code.
How do I update moving mean and moving variance in keras BatchNormalization?
I found this in tensorflow documentation, but I don't know where to put train_op or how to work it with keras models:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
No posts I found say what to do with train_op and whether you can use it in model.compile.
You do not need to manually update the moving mean and variance if you are using the BatchNormalization layer. Keras takes care of updating these parameters during training and of keeping them fixed during testing (when you use the model.predict and model.evaluate functions; the same applies to model.fit_generator and friends).
Keras also keeps track of the learning phase so different codepaths run during training and validation/testing.
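To make the two code paths concrete, here is a simplified NumPy sketch of what a batch normalization layer does in each mode. It is not Keras' actual implementation: the function names are hypothetical, gamma and beta are left out, and the momentum, epsilon and data distribution are illustrative values.

```python
import numpy as np


def batchnorm_train_step(x, moving_mean, moving_var, momentum=0.99, eps=1e-3):
    """Training mode: normalize with batch statistics and update the moving ones."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    y = (x - batch_mean) / np.sqrt(batch_var + eps)
    # Exponential moving average of the batch statistics.
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return y, new_mean, new_var


def batchnorm_inference(x, moving_mean, moving_var, eps=1e-3):
    """Inference mode: normalize with the frozen moving statistics."""
    return (x - moving_mean) / np.sqrt(moving_var + eps)


# Simulated training on data with mean 5 and standard deviation 2.
rng = np.random.default_rng(0)
moving_mean, moving_var = np.zeros(1), np.ones(1)
for _ in range(200):
    batch = rng.normal(loc=5.0, scale=2.0, size=(256, 1))
    _, moving_mean, moving_var = batchnorm_train_step(
        batch, moving_mean, moving_var, momentum=0.9)

# At test time a single sample is normalized with the learned statistics.
test_out = batchnorm_inference(np.array([[5.0]]), moving_mean, moving_var)
```

If the moving statistics are never updated during training, the inference path normalizes with their initial values, which is one plausible reason for a large accuracy gap between the two modes.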
If you just need to update the weights of an existing model with new values, you can do the following:
w = model.get_layer('batchnorm_layer_name').get_weights()
# Order (with the default center=True, scale=True): [gamma, beta, moving_mean, moving_variance]
for j in range(len(w[0])):
    gamma = w[0][j]
    beta = w[1][j]
    run_mean = w[2][j]
    run_var = w[3][j]  # a variance, not a standard deviation
    w[2][j] = new_run_mean_value1
    w[3][j] = new_run_std_value2  # placeholder name kept as posted; the value must be a variance
model.get_layer('batchnorm_layer_name').set_weights(w)
There are two interpretations of the question. The first assumes that the goal is to use the high-level training API; that question was answered by Matias Valdenegro.
The second, as discussed in the comments, is whether it is possible to use batch normalization with a standard TensorFlow optimizer, as discussed in the tutorial "Keras as a simplified interface to TensorFlow", in the section "Collecting trainable weights and state updates". As mentioned there, the update ops are accessible in layer.updates and not in tf.GraphKeys.UPDATE_OPS. In fact, if you have a Keras model in TensorFlow, you can optimize it with a standard TensorFlow optimizer and batch normalization like this:
update_ops = model.updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
and then use a TensorFlow session to fetch the train_op. To distinguish the training and evaluation modes of the batch normalization layer, you need to feed the learning phase state of the Keras engine (see "Different behaviors during training and testing" on the same tutorial page as above). For example:
...
# train (learning phase 1); train_op is the op defined above
lo, _ = tf_sess.run(fetches=[loss, train_op],
                    feed_dict={tf_batch_data: bd,
                               tf_batch_labels: bl,
                               tensorflow.keras.backend.learning_phase(): 1})
...
# eval (learning phase 0)
lo = tf_sess.run(fetches=[loss],
                 feed_dict={tf_batch_data: bd,
                            tf_batch_labels: bl,
                            tensorflow.keras.backend.learning_phase(): 0})
I tried this in TensorFlow 1.12 and it works with models containing batch normalization. Given my existing TensorFlow code, and with TensorFlow 2.0 approaching, I was tempted to use this approach myself. However, since it is not mentioned in the TensorFlow documentation, I am not sure it will be supported in the long term, so I finally decided not to use it and to invest a little more effort in changing the code to use the high-level API.