Reusable block in Keras' functional API - tensorflow

The goal is to create a block of layers, using Keras' functional API, which is usable (also syntax-wise) like a 'normal' Keras layer.
Here is a toy example:

from tensorflow.keras import layers as kl

def layer_block(prev_layer, args):
    # some code using 'args'
    layer = kl.Dense(units=prev_layer.shape[1])(prev_layer)
    layer = kl.Dense(units=5)(layer)
    layer = kl.Dense(units=prev_layer.shape[1])(layer)
    return layer
This block is called as layer_block(prev_layer, args), which contradicts the syntax of Keras' functional API. It should rather look like layer_block(args)(prev_layer).
The approach so far is to wrap this block in another block:
def outer_block(args):
    def layer_block(prev_layer, args):
        # some code using 'args'
        layer = kl.Dense(units=prev_layer.shape[1])(prev_layer)
        layer = kl.Dense(units=5)(layer)
        layer = kl.Dense(units=prev_layer.shape[1])(layer)
        return layer
    return lambda prev_layer: layer_block(prev_layer, args)
Now two questions arise:
Is there an easier way to achieve this?
Is it effective this way or does it have negative impact on performance?
Thank you in advance!

What you're doing doesn't affect performance; you're creating layers perfectly fine.
There is no problem with either of your two approaches, but if you do want the block to work like an actual layer, turn it into a model.
This may not work in every keras version:
class LayerBlock(tensorflow.keras.Model):  # not sure if it works in normal keras (without tf)
    def __init__(self, outer_units):
        super(LayerBlock, self).__init__()
        self.layer1 = kl.Dense(units=outer_units)
        self.layer2 = kl.Dense(units=5)
        self.layer3 = kl.Dense(units=outer_units)

    def call(self, inputs):
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        return x
This tutorial seems to suggest that you can use tf.keras.layers.Layer instead of tf.keras.Model, but that sounds strange to me. It may work with eager mode on, but it lacks a build method that sets self.built = True.
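For completeness, here is how such a block could then be used with the usual layer-call syntax inside the functional API. This is a minimal hedged sketch; the outer_units value and the input shape are made up for illustration:

import tensorflow as tf
from tensorflow.keras import layers as kl

inputs = tf.keras.Input(shape=(8,))
block = LayerBlock(outer_units=8)  # instantiate once, like any layer
x = block(inputs)                  # called as block(prev_layer)
outputs = kl.Dense(units=1)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)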

Related

Is there a way to get the current learning rate or current epoch/step from within a custom tensorflow layer?

I know that it is possible to get the current learning rate simply by doing self.optimizer.lr when you are in your custom model, but I need to do something similar when implementing my own layer.
For now I have solved the issue by creating a function in my custom layer that accepts the learning rate as a parameter and is called by my custom model, but I was wondering if there is another way, since there are many layers in my architecture and this approach is pretty ugly. I leave some code below to be clearer.
For my purpose, it would be enough even just to get the current epoch or current step from inside the layer.
I am working in tensorflow 2.8.2. Thank you.
My layer structure is as follows:

class my_layer(tf.keras.layers.Layer):
    # constructor, build, call methods etc.
    def function_to_get_lr(self, lr):
        # do sth with the lr
        ...
and in my custom model I do something like this
class my_model(tf.keras.Model):
    # other functions, constructor etc.
    def call(self, inputs, training=False):
        if training:
            for layer in self.layers:
                if "my_layer" in layer.name:
                    layer.function_to_get_lr(self.optimizer.lr)
I'm pretty sure that there is no "nice way" to do this, but you can do something like this:
class CustomLayer(Layer):
    def __init__(self, optimizer, ...):
        self.optimizer = optimizer
    def call(self, ...):
        lr = self.optimizer.lr

class M(Model):
    def build(self, ...):
        self.custom_layer = CustomLayer(self.optimizer, ...)
        ...
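Fleshing that idea out a little, here is a hedged sketch of how it could look in practice. The layer name LrAwareDense and the tf.print call are my own illustration; the key point is that the layer keeps a reference to the optimizer and reads its learning rate inside call:

import tensorflow as tf

class LrAwareDense(tf.keras.layers.Layer):  # hypothetical layer name
    def __init__(self, units, optimizer, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units)
        self.optimizer = optimizer  # keep a reference, not a copy

    def call(self, inputs, training=False):
        if training:
            lr = self.optimizer.learning_rate  # current value of the lr variable
            tf.print("current lr:", lr)
        return self.dense(inputs)

optimizer = tf.keras.optimizers.Adam(1e-3)
inputs = tf.keras.Input(shape=(4,))
outputs = LrAwareDense(2, optimizer)(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=optimizer, loss="mse")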

Is it possible to add different behavior for training and testing in keras Functional API

I want to use different behavior in the functional API for training and testing. Is it possible?
E.g.,
a = Input
b = CONV1(a)
if testing:
    return b
c = CONV2(b)
Yes, this can be achieved by defining custom keras layers.
Example code:
import tensorflow as tf
from tensorflow.keras.layers import Dense

class diff_behavior_layer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense_1 = Dense(64)

    def call(self, inputs, training=None):
        if training:
            return self.dense_1(inputs)
        else:
            return inputs

inputs = tf.keras.Input(shape=(2,))
x = Dense(64)(inputs)
x = Dense(64)(x)
x = diff_behavior_layer()(x)
outputs = Dense(64)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model(data, training=True)   # flows through 4 Dense layers
model(data, training=False)  # flows through 3 Dense layers
Remark: training must be used as the keyword argument here; you cannot define your own keyword argument such as testing.
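As a side note (a hedged sketch; the dummy data below is made up purely for illustration): the built-in training loops set this flag for you, so you only need to pass training= explicitly when calling the model directly as above:

import numpy as np

data = np.random.rand(8, 2).astype("float32")     # dummy inputs
labels = np.random.rand(8, 64).astype("float32")  # dummy targets matching the last Dense(64)

model.compile(optimizer="adam", loss="mse")
model.fit(data, labels, epochs=1)  # fit() calls the model with training=True
model.predict(data)                # predict()/evaluate() call it with training=False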

Custom optimizer with multiple loss function evaluations

I want to implement a custom optimization algorithm for TF models.
I have read the following sources
tf documentation on custom optimizers
tf SGD implementation
keras documentation on custom models
towardsdatascience guide on custom optimizers
However, a lot of questions remain.
It seems like it is not possible to evaluate the loss function multiple times (for different weight settings) before applying a gradient step when using the custom optimizer API. This is necessary, for example, in a line-search type of algorithm.
I tried to do all steps manually.
Assume I have setup my model and my optimization problem like this
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Dense(15, input_dim=10))
model.add(layers.Dense(20))
model.add(layers.Dense(1))

x_train, y_train = get_train_data()
loss = losses.MeanSquaredError()

def val_and_grads(weights):
    model.set_weights(weights)
    with tf.GradientTape() as tape:
        val = loss(y_train, model(x_train))
    grads = tape.gradient(val, model.trainable_variables)
    return val, grads

initial_weights = model.get_weights()
optimal_weights = my_fancy_optimization_algorithm(val_and_grads, initial_weights)
However, my function val_and_grads needs a list of weights and returns a list of gradients; from my_fancy_optimization_algorithm's point of view that seems unnatural.
I could wrap val_and_grads to "stack" the returned gradients and "split" the passed weights like this:
def wrapped_val_and_grad(weights):
    val, grads = val_and_grads(split_weights(weights))
    return val, stack_grads(grads)

However, that seems very inefficient.
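For reference, one hedged sketch of what those two helpers could look like; the flatten/unflatten logic below is my own assumption, not something from the question:

import numpy as np

def stack_grads(grads):
    # flatten every gradient tensor and concatenate into one long vector
    return np.concatenate([g.numpy().ravel() for g in grads])

def split_weights(flat_weights):
    # cut the flat vector back into arrays shaped like the model's weight arrays
    pieces, offset = [], 0
    for w in model.get_weights():
        size = int(np.prod(w.shape))
        pieces.append(flat_weights[offset:offset + size].reshape(w.shape))
        offset += size
    return pieces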
Anyway, I do not like this approach, since it seems that I would lose out on a lot of the surrounding tensorflow infrastructure (printing of current loss function values and metrics during learning, tensorboard stuff, ...).
I could also pack the above in a custom model with a tailored train_step like this:

class CustomModel(keras.Model):
    def train_step(self, data):
        x_train, y_train = data

        def val_and_grads(weights):
            self.set_weights(weights)
            with tf.GradientTape() as tape:
                val = loss(y_train, self(x_train))
            grads = tape.gradient(val, self.trainable_variables)
            return val, grads

        trainable_vars = self.trainable_variables
        old_weights = self.get_weights()

        update = my_fancy_update_finding_algorithm(val_and_grads, self.get_weights())  # this can do multiple evaluations of the model
        self.set_weights(old_weights)  # restore the weights
        self.optimizer.apply_gradients(zip(update, trainable_vars))
Here I would need an accompanying custom optimizer that does nothing else than adding the update to the current weights (new_weights = current_weights + update).
I am still unsure if this is the best way to go.
If someone can comment on the snippets and ideas above, guide me to any other resource that I should consider or provide new approaches and other feedback I would be very glad.
Thanks all.
Franz
EDIT:
Sadly I did not get any response here so far. Maybe my question is not concrete enough. As a first smaller question:
Given the model and val_and_grads in the first listing, how would I efficiently calculate the norm of the WHOLE gradient? What I do so far is:
import numpy as np

_, grads = val_and_grads(model.get_weights())
norm_grads = np.linalg.norm(np.concatenate([grad.numpy().flatten() for grad in grads]))
This surely cannot be the "right" way.
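For what it's worth, TensorFlow has a helper for exactly this; a minimal hedged sketch using the same model and val_and_grads as above:

import tensorflow as tf

_, grads = val_and_grads(model.get_weights())
norm_grads = tf.linalg.global_norm(grads)  # square root of the sum of squares over all gradient tensors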

Manipulating nn.Dense() layer parameters manually in MxNet

I'm trying to implement my own optimization algorithm for MxNet (Imperative / Gluon) that does not use gradients. My question is pretty simple: is there a simple way to create a new nn.Dense(...) layer initialized with given parameters (i.e. biases and weights) represented by two nd.array() instances?
Thank you in advance!
You can create a custom block with parameters that have differentiable=False, and provide the data for initialization through the init argument. See the scales parameter in the example below, taken from this tutorial. You can also see an example of FullyConnected, which you'll want to use for your dense layer too. F is used to denote a generic backend; typically this would be mx.ndarray, but after hybridization it is set to mx.symbol.
class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self, hidden_units, scales):
        super(NormalizationHybridLayer, self).__init__()
        with self.name_scope():
            self.weights = self.params.get('weights',
                                           shape=(hidden_units, 0),
                                           allow_deferred_init=True)
            self.scales = self.params.get('scales',
                                          shape=scales.shape,
                                          init=mx.init.Constant(scales.asnumpy().tolist()),  # convert to a regular list to make this object serializable
                                          differentiable=False)

    def hybrid_forward(self, F, x, weights, scales):
        normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)),
                                          F.broadcast_sub(F.max(x), F.min(x)))
        weighted_data = F.FullyConnected(normalized_data, weights,
                                         num_hidden=self.weights.shape[0], no_bias=True)
        scaled_data = F.broadcast_mul(scales, weighted_data)
        return scaled_data
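If all you need is a plain nn.Dense preloaded with given values rather than a custom block, here is a hedged alternative sketch (it assumes the standard Gluon Parameter.set_data API; the arrays w and b are made up for illustration):

import mxnet as mx
from mxnet.gluon import nn

w = mx.nd.random.uniform(shape=(5, 10))  # example weight matrix with shape (units, in_units)
b = mx.nd.zeros(shape=(5,))              # example bias vector

layer = nn.Dense(5, in_units=10)  # fix in_units so the parameter shapes are known up front
layer.initialize()                # allocate the parameters
layer.weight.set_data(w)          # overwrite the weights with your own nd.array
layer.bias.set_data(b)            # overwrite the biases with your own nd.array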

How to use evaluation_loop with train_loop in tf-slim

I'm trying to implement a few different models and train them on CIFAR-10, and I want to use TF-slim to do this. It looks like TF-slim has two main loops that are useful during training: train_loop and evaluation_loop.
My question is: what is the canonical way to use these loops?
As a followup: is it possible to use early stopping with train_loop?
Currently I have a model and my training file train.py looks like this
import ...

train_log_dir = ...

with tf.device("/cpu:0"):
    images, labels, dataset = set_up_input_pipeline_with_fancy_prefetching(
        subset='train', ...)

logits, end_points = set_up_model(images)  # possibly using many GPUs
total_loss = set_up_loss(logits, labels, dataset)
optimizer, global_step = set_up_optimizer(dataset)

train_tensor = slim.learning.create_train_op(
    total_loss,
    optimizer,
    global_step=global_step,
    clip_gradient_norm=FLAGS.clip_gradient_norm,
    summarize_gradients=True)

slim.learning.train(train_tensor,
                    logdir=train_log_dir,
                    local_init_op=tf.initialize_local_variables(),
                    save_summaries_secs=FLAGS.save_summaries_secs,
                    save_interval_secs=FLAGS.save_interval_secs)
Which is awesome so far - my models all train and converge nicely. I can see this from the events in train_log_dir where all the metrics are going in the right direction. And going in the right direction makes me happy.
But I'd like to check that the metrics are improving on the validation set, too. I don't know of any way to do this with TF-slim that plays nicely with the training loop, so I created a second file called eval.py which contains my evaluation loop.
import ...

train_log_dir = ...

with tf.device("/cpu:0"):
    images, labels, dataset = set_up_input_pipeline_with_fancy_prefetching(
        subset='validation', ...)

logits, end_points = set_up_model(images)

summary_ops, names_to_values, names_to_updates = create_metrics_and_summary_ops(
    logits,
    labels,
    dataset.num_classes())

slim.get_or_create_global_step()

slim.evaluation.evaluation_loop(
    '',
    checkpoint_dir=train_log_dir,
    logdir=train_log_dir,
    num_evals=FLAGS.num_eval_batches,
    eval_op=names_to_updates.values(),
    summary_op=tf.merge_summary(summary_ops),
    eval_interval_secs=FLAGS.eval_interval_secs,
    session_config=config)
Questions:
1) I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used. I assume there's a better way to allocate resources. It would be pretty nice if I could use the same evaluation_loop to monitor the progress of multiple different models (checkpoints in multiple directories). Is something like this possible?
2) There's no feedback between the evaluation and training. I'm training a ton of models and would love to use early stopping to halt the models which aren't learning or are not converging. Is there a way to do this? Ideally using information from the validation set, but if it has to be just based on the training data that's okay, too.
3) Is my workflow all wrong, and should I be structuring it differently? It's not clear from the documentation how to use evaluation in conjunction with training.
Update
~~It seems that as of TF r0.11 I'm also getting a segfault when calling slim.evaluation.evaluation_loop. It only happens sometimes (for me, when I dispatch my jobs to a cluster). It happens in sv.managed_session, specifically prepare_or_wait_for_session.~~
This was just due to the evaluation loop (a second instance of tensorflow) trying to use the GPU, which was already requisitioned by the first instance.
evaluation_loop is meant to be used (as you are currently using it) with a single directory. If you want to be more efficient, you could use slim.evaluation.evaluate_once and add logic for swapping directories as you see fit.
You can do this by overriding the slim.learning.train(..., train_step_fn) argument. This argument replaces the 'train_step' function with a custom function. Here, you can supply a custom training function which returns the 'total_loss' and 'should_stop' values as you see fit.
Your workflow looks great, this is probably the most common workflow for learning/eval using TF-Slim.
Thanks to @kmalakoff, this TensorFlow issue gave a brilliant way to solve the problem of how to validate or test a model during tf.slim training. The main idea is overriding the train_step_fn function:
import …
from tensorflow.contrib.slim.python.slim.learning import train_step
...

accuracy_validation = ...
accuracy_test = ...

def train_step_fn(session, *args, **kwargs):
    total_loss, should_stop = train_step(session, *args, **kwargs)

    if train_step_fn.step % FLAGS.validation_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_validation)
        print('your validation info')

    if train_step_fn.step % FLAGS.test_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_test)
        print('your test info')

    train_step_fn.step += 1
    return [total_loss, should_stop]

train_step_fn.step = 0
train_step_fn.accuracy_validation = accuracy_validation
train_step_fn.accuracy_test = accuracy_test

# run training
slim.learning.train(
    train_op,
    FLAGS.logs_dir,
    train_step_fn=train_step_fn,
    graph=graph,
    number_of_steps=FLAGS.max_steps)
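Building on the snippet above, the same hook can also provide the early stopping asked about in question 2, because slim.learning.train stops once should_stop is returned as True. A hedged sketch; best_accuracy, bad_steps and FLAGS.patience are my own made-up additions:

def train_step_fn(session, *args, **kwargs):
    total_loss, should_stop = train_step(session, *args, **kwargs)

    accuracy = session.run(train_step_fn.accuracy_validation)
    if accuracy > train_step_fn.best_accuracy:
        train_step_fn.best_accuracy = accuracy  # new best, reset the counter
        train_step_fn.bad_steps = 0
    else:
        train_step_fn.bad_steps += 1

    if train_step_fn.bad_steps > FLAGS.patience:  # no improvement for too long
        should_stop = True

    return [total_loss, should_stop]

train_step_fn.best_accuracy = 0.0
train_step_fn.bad_steps = 0
train_step_fn.accuracy_validation = accuracy_validation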
Adding my 2 cents:

I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used

Usually an evaluation model takes less GPU memory. You could prevent TF from hogging the whole GPU memory by setting allow_growth to True in the session config. This way you can use the same GPU for both training and evaluation.
Example # Training
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.learning.train(train_tensor,
                    logdir=train_log_dir,
                    local_init_op=tf.initialize_local_variables(),
                    save_summaries_secs=FLAGS.save_summaries_secs,
                    save_interval_secs=FLAGS.save_interval_secs,
                    session_config=session_config)
Example # validation
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.evaluation.evaluation_loop(
    '',
    checkpoint_dir=train_log_dir,
    logdir=train_log_dir,
    num_evals=FLAGS.num_eval_batches,
    eval_op=names_to_updates.values(),
    summary_op=tf.merge_summary(summary_ops),
    eval_interval_secs=FLAGS.eval_interval_secs,
    session_config=session_config)