Discriminative Learning Rates for Keras - tensorflow

I want to apply a different learning rate to each layer in Keras (as is done in Fastai). The closest I have come is modifying the line self.optimizer.apply_gradients(zip(gradients, trainable_vars)) in the Keras code block found here, multiplying each gradient by its corresponding learning rate and setting the global learning rate to 1.
However, this method would only work with SGD and other simple optimizers, as things such as momentum would distort this simple multiplication of the gradient.
Here is how I would ideally like to implement it:
import tensorflow as tf
# example data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
lrs = [1e-2, 1e-1]
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(20, input_dim=784, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lrs),  # DOESN'T WORK
    loss="mean_squared_error",
    metrics=["mean_absolute_error"],
)
Any thoughts on how I could implement this? Another idea is to have as many optimizers as learning rates and to update them in the train_step function of a custom model.

This is now possible using TensorFlow Addons.
pip install tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
optimizers = [tf.keras.optimizers.Adam(learning_rate=1e-2),
              tf.keras.optimizers.Adam(learning_rate=1e-1)]
# Specify the optimizers and the layers on which each one operates
optimizers_and_layers = [
    (optimizers[0], model.layers[0]),
    (optimizers[1], model.layers[1]),
]
# Use MultiOptimizer from TensorFlow Addons
opt = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
model.compile(
    optimizer=opt,
    loss="mean_squared_error",
    metrics=["mean_absolute_error"],
)
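To illustrate why the gradient-scaling workaround from the question breaks down with an adaptive optimizer, here is a minimal pure-Python sketch. It assumes the textbook Adam update on a single scalar parameter at its first step. The helper names adam_first_step and sgd_step are hypothetical and not part of Keras. With plain SGD, scaling the gradient by 0.1 scales the update by 0.1. With Adam, the update is almost unchanged, because the gradient is normalised by its own magnitude.

```python
def adam_first_step(grad, lr=1.0, beta1=0.9, beta2=0.999, eps=1e-7):
    """Parameter update of textbook Adam at step t = 1 (hypothetical helper)."""
    m = (1 - beta1) * grad        # first-moment estimate
    v = (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1)       # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (v_hat ** 0.5 + eps)


def sgd_step(grad, lr=1.0):
    """Parameter update of plain SGD (hypothetical helper)."""
    return lr * grad


grad = 2.0
layer_lr = 0.1  # per-layer factor applied to the gradient, global lr kept at 1

sgd_full, sgd_scaled = sgd_step(grad), sgd_step(layer_lr * grad)
adam_full, adam_scaled = adam_first_step(grad), adam_first_step(layer_lr * grad)

print("SGD :", sgd_full, sgd_scaled)    # scaled update is 10x smaller
print("Adam:", adam_full, adam_scaled)  # both updates are close to 1.0
```

This is why a separate optimizer (or a separate learning rate) per layer is needed, rather than scaled gradients fed into one Adam instance.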

Related

How to visualize graph without training the model using Tensorboard?

I'm trying to visualize the model in Tensorboard without training.
I checked this and that, but this still doesn't work even for the simplest model.
import tensorflow as tf
import tensorflow.keras as keras
# Both tf.__version__ and tensorboard.__version__ are 2.5.0
s_model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])
logdir = '.../logs'
_callbacks = keras.callbacks.TensorBoard(log_dir=logdir)
_callbacks.set_model(s_model)  # This is exactly what is suggested in the link
When I do the above, I get the error message:
Graph visualization failed.
Error: Malformed GraphDef. This can sometimes be caused by a bad
network connection or difficulty reconciling multiple GraphDefs; for
the latter case, please refer to
https://github.com/tensorflow/tensorboard/issues/1929.
I don't think this is a reconciliation problem, because there is no custom function involved, and if I compile and train the model, I get the graph visualization I want.
s_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
(train_images, train_labels), _ = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
logdir = '.../logs'
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
s_model.fit(
    train_images,
    train_labels,
    batch_size=64,
    epochs=5,
    callbacks=[tensorboard_callback])
This gives the graph visualization I want. But is there any other way to get the graph visualization in TensorBoard without training?
Of course, I'm aware that a workaround, i.e. training on tf.random.normal() data for a while, would do the trick, but I'm looking for a neat way like _callbacks.set_model(s_model)...

One back-propagation pass in keras [duplicate]

I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which the gradient is taken.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of Tensorflow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And that's it!
The answer below uses the cross-entropy loss; feel free to change it to your own function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(labels, outputTensor)  # Keras losses take (y_true, y_pred); labels holds the true targets
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)
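The answers above rely on TF 1.x sessions. On TensorFlow 2.x with eager execution, a comparable result can be sketched with tf.GradientTape. This is a minimal sketch that assumes TF 2.x is installed. The layer sizes mirror the example above, and the random input is only a placeholder for real data.

```python
import numpy as np
import tensorflow as tf

# Same architecture as the example above, written against the TF 2.x Keras API.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Placeholder input: one example with 8 features.
training_example = tf.constant(np.random.random((1, 8)), dtype=tf.float32)

# Record the forward pass so the output can be differentiated.
with tf.GradientTape() as tape:
    output = model(training_example)

# One gradient tensor per trainable weight (kernel and bias of each layer).
gradients = tape.gradient(output, model.trainable_weights)

for weight, grad in zip(model.trainable_weights, gradients):
    print(weight.shape, grad.shape)
```

To differentiate a loss instead of the raw output, compute the loss inside the with block and pass it to tape.gradient.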

Is it possible to override the progress bar of TensorFlow's keras?

Over the last few days, I have been observing weird behaviour in the loss printed in the progress bar. It turned out that this was because the default Keras progress bar displays a moving average of the losses (rather than the actual loss at every epoch).
So, is it possible to override the progress bar of TensorFlow's keras? I don't think so.
There's the class tf.keras.utils.Progbar that contains the parameter stateful_metrics, which is probably what I need, but fit doesn't seem to provide an option to override the progress bar or to change the behaviour from moving average to actual loss of the epoch/step. What alternative do you suggest? Feel free to write an answer below with some reproducible code.
It sounds like what you want should be done through tf.keras.callbacks.ProgbarLogger. In theory it should work as outlined in the following example; however, there is currently an issue with tf.keras.callbacks.ProgbarLogger.
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255
model = tf.keras.Sequential([
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10)
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
progbar_callback = tf.keras.callbacks.ProgbarLogger(stateful_metrics=["accuracy"])  # expects an iterable of metric names
model.fit(x_train, y_train, callbacks=[progbar_callback])
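The averaging effect described in the question can be reproduced without Keras. The sketch below assumes the bar shows the running mean of the per-batch losses seen so far in the epoch. The helper running_means and the loss values are hypothetical and only illustrate the arithmetic.

```python
def running_means(values):
    """Return the mean of values[:i + 1] for every position i."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


# Hypothetical per-batch losses within one epoch.
batch_losses = [4.0, 2.0, 1.0, 1.0]
displayed = running_means(batch_losses)

for actual, shown in zip(batch_losses, displayed):
    print(f"actual batch loss: {actual:.3f}   running mean: {shown:.3f}")
```

The last batch loss is 1.0, while the running mean still reads 2.0. This is the gap between the "actual" and the displayed loss that the question observes.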

Resnet-50 adversarial training with cleverhans FGSM accuracy stuck at 5%

I am facing a strange problem when adversarially training a ResNet-50, and I am not sure whether it's a logical error or a bug somewhere in the code/libraries.
I am adversarially training a ResNet-50 that's loaded from Keras, using the FastGradientMethod from cleverhans, and expecting the adversarial accuracy to rise at least above 90% (probably 99.x%). The training algorithm, training params and attack params should be visible in the code.
The problem, as already stated in the title, is that the accuracy is stuck at 5% after training on ~3000 of the 39002 training inputs in the first epoch (German Traffic Sign Recognition Benchmark, GTSRB).
When training without an adversarial loss function, the accuracy does not get stuck after 3000 samples, but continues to rise above 0.95 in the first epoch.
When substituting the network with a LeNet-5, AlexNet or VGG19, the code works as expected, and an accuracy fully comparable to that of the non-adversarial categorical_crossentropy loss function is achieved. I've also tried running the procedure using solely tf-cpu and different versions of TensorFlow; the result is always the same.
Code for obtaining ResNet-50:
def build_resnet50(num_classes, img_size):
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Dense, Flatten
    resnet = ResNet50(weights='imagenet', include_top=False, input_shape=img_size)
    x = Flatten(input_shape=resnet.output.shape)(resnet.output)
    x = Dense(1024, activation='sigmoid')(x)
    predictions = Dense(num_classes, activation='softmax', name='pred')(x)
    model = Model(inputs=[resnet.input], outputs=[predictions])
    return model
Training:
def lr_schedule(epoch):
    # decreasing learning rate depending on epoch
    return 0.001 * (0.1 ** int(epoch / 10))
def train_model(model, xtrain, ytrain, xtest, ytest, lr=0.001, batch_size=32,
                epochs=10, result_folder=""):
    from cleverhans.attacks import FastGradientMethod
    from cleverhans.utils_keras import KerasModelWrapper
    import tensorflow as tf
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.callbacks import LearningRateScheduler, ModelCheckpoint
    sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
    model(model.input)
    wrap = KerasModelWrapper(model)
    sess = tf.compat.v1.keras.backend.get_session()
    fgsm = FastGradientMethod(wrap, sess=sess)
    fgsm_params = {'eps': 0.01,
                   'clip_min': 0.,
                   'clip_max': 1.}
    loss = get_adversarial_loss(model, fgsm, fgsm_params)
    model.compile(loss=loss, optimizer=sgd, metrics=['accuracy'])
    model.fit(xtrain, ytrain,
              batch_size=batch_size,
              validation_data=(xtest, ytest),
              epochs=epochs,
              callbacks=[LearningRateScheduler(lr_schedule)])
Loss-function:
def get_adversarial_loss(model, fgsm, fgsm_params):
    def adv_loss(y, preds):
        import tensorflow as tf
        tf.keras.backend.set_learning_phase(False)  # turn off dropout during input gradient calculation, to avoid unconnected gradients
        # Cross-entropy on the legitimate examples
        cross_ent = tf.keras.losses.categorical_crossentropy(y, preds)
        # Generate adversarial examples
        x_adv = fgsm.generate(model.input, **fgsm_params)
        # Consider the attack to be constant
        x_adv = tf.stop_gradient(x_adv)
        # Cross-entropy on the adversarial examples
        preds_adv = model(x_adv)
        cross_ent_adv = tf.keras.losses.categorical_crossentropy(y, preds_adv)
        tf.keras.backend.set_learning_phase(True)  # turn back on
        return 0.5 * cross_ent + 0.5 * cross_ent_adv
    return adv_loss
Versions used:
tf+tf-gpu: 1.14.0
keras: 2.3.1
cleverhans: > 3.0.1 - latest version pulled from github
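For readers unfamiliar with the attack, the perturbation that the fast gradient sign method applies can be sketched in pure Python as follows. This is not the cleverhans implementation. The helper fgsm_perturb and the gradient values are hypothetical, while eps, clip_min and clip_max follow the fgsm_params used above.

```python
def fgsm_perturb(x, grad, eps=0.01, clip_min=0.0, clip_max=1.0):
    """Move each input value by eps in the direction of the loss gradient's sign,
    then clip the result to the valid input range (hypothetical helper)."""
    def sign(g):
        return (g > 0) - (g < 0)

    return [min(max(xi + eps * sign(gi), clip_min), clip_max)
            for xi, gi in zip(x, grad)]


x = [0.0, 0.5, 1.0]        # pixel values scaled to [0, 1]
grad = [-3.0, 0.2, 4.0]    # made-up gradient of the loss w.r.t. each pixel
x_adv = fgsm_perturb(x, grad)
print(x_adv)
```

Only the sign of the gradient matters, so every pixel moves by at most eps, and values already at the edge of the range stay there after clipping.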
It is a side-effect of the way we estimate the moving averages on BatchNormalization.
The mean and variance of the training data that you used are different from the ones of the dataset used to train the ResNet50. Because the momentum on the BatchNormalization has a default value of 0.99, with only 10 iterations it does not converge quickly enough to the correct values for the moving mean and variance. This is not obvious during training when the learning_phase is 1 because BN uses the mean/variance of the batch. Nevertheless when we set learning_phase to 0, the incorrect mean/variance values which are learned during training significantly affect the accuracy.
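The slow convergence can be checked with a few lines of arithmetic. The sketch below assumes the usual update rule moving = moving * momentum + batch_value * (1 - momentum) and a constant batch statistic. The helper moving_stat_after and the start and target values are hypothetical.

```python
def moving_stat_after(steps, momentum, initial=0.0, batch_value=1.0):
    """Apply the exponential moving-average update `steps` times."""
    moving = initial
    for _ in range(steps):
        moving = moving * momentum + batch_value * (1.0 - momentum)
    return moving


# The stored statistic starts at 0.0 while the new dataset's statistic is 1.0.
slow = moving_stat_after(steps=10, momentum=0.99)
fast = moving_stat_after(steps=10, momentum=0.5)

print(f"momentum 0.99 after 10 updates: {slow:.4f}")
print(f"momentum 0.50 after 10 updates: {fast:.4f}")
```

With momentum 0.99, the moving value covers less than 10% of the distance to the new statistic after 10 updates. With momentum 0.5 it has essentially arrived.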
You can fix this problem with either of the approaches below:
More iterations
Reduce the batch size from 32 to 16 (to perform more updates per epoch) and increase the number of epochs from 10 to 250. This way the moving mean and variance will converge to the correct values.
Change the momentum of BatchNormalization
Keep the number of iterations fixed but change the momentum of the BatchNormalization layer so that it updates the rolling mean and variance more aggressively (not recommended for production models).
On the original snippet, add the following code between reading the base_model and defining the new layers:
# ....
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape)
# PATCH MOMENTUM - START
import json
conf = json.loads(base_model.to_json())
for l in conf['config']['layers']:
    if l['class_name'] == 'BatchNormalization':
        l['config']['momentum'] = 0.5
m = Model.from_config(conf['config'])
for l in base_model.layers:
    m.get_layer(l.name).set_weights(l.get_weights())
base_model = m
# PATCH MOMENTUM - END
x = base_model.output
# ....
I would also recommend trying another hack that we provide here.

Sequential models without `input_shape` passed to the 1st layer cannot reload optimizer state

WARNING:tensorflow:Sequential models without an input_shape passed to the first layer cannot reload their optimizer state. As a result, your model is starting with a freshly initialized optimizer.
I encountered the above warning from TensorFlow while trying to load a saved model:
import tensorflow.keras as keras
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
model.save('epic_num_reader.model')
new_model = tf.keras.models.load_model('epic_num_reader.model')
predictions = new_model.predict(x_test)
I had the same problem after upgrading to TF 1.14. I fixed it by changing the definition of the first layer from this:
model.add(tf.keras.layers.Flatten())
to this:
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
where 28 is the height and width of the input map to be flattened (the MNIST images, in our case).
As the warning suggests, your first layer needs the argument input_shape. In your case this is the Flatten layer.
The Keras documentation has a separate section on the Sequential API. See here for further information.
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
From TF 1.14 onward, the first layer requires input_shape, which gives the dimensions of each input image.
Otherwise you may get this warning when loading the model, and the optimizer will start freshly initialized instead of restoring its saved state.