Convert CudnnGRU params to normal weights and bias - tensorflow

I am using the CudnnGRU class from tensorflow.contrib.cudnn_rnn, and training is much faster with it. However, after training I need to move the model to a system that is not CUDA-based. How can I convert the CudnnGRU params to normal weights and biases, then load them into tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell?

In TensorFlow 2, CuDNNGRU and the plain TensorFlow GRU have been merged into a single layer, tf.keras.layers.GRU.
Based on the available runtime hardware and constraints, the layer chooses either the cuDNN or the plain TensorFlow implementation.
If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (listed below), the layer uses the fast cuDNN implementation.
The requirements to use the cuDNN implementation are:
activation == tanh
recurrent_activation == sigmoid
recurrent_dropout == 0
unroll is False
use_bias is True
reset_after is True
Inputs, if masking is used, are strictly right-padded.
Eager execution is enabled in the outermost context.
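As a rough illustration, the argument-level requirements listed above can be written as plain checks. The helper name cudnn_blockers is hypothetical and is not part of TensorFlow; the layer makes this decision internally. The masking and eager-execution conditions depend on runtime state, so this sketch leaves them out.

```python
# Hypothetical helper (not a TensorFlow API): mirrors the argument-level
# requirements from the list above. Default values are chosen so that a
# call with no arguments satisfies every requirement.
def cudnn_blockers(activation="tanh",
                   recurrent_activation="sigmoid",
                   recurrent_dropout=0.0,
                   unroll=False,
                   use_bias=True,
                   reset_after=True):
    """Return the names of arguments that would rule out the cuDNN kernel."""
    blockers = []
    if activation != "tanh":
        blockers.append("activation")
    if recurrent_activation != "sigmoid":
        blockers.append("recurrent_activation")
    if recurrent_dropout != 0:
        blockers.append("recurrent_dropout")
    if unroll:
        blockers.append("unroll")
    if not use_bias:
        blockers.append("use_bias")
    if not reset_after:
        blockers.append("reset_after")
    return blockers


default_blockers = cudnn_blockers()
custom_blockers = cudnn_blockers(activation="relu", recurrent_dropout=0.2)
print(default_blockers)  # []
print(custom_blockers)   # ['activation', 'recurrent_dropout']
```

An empty list means none of the listed arguments prevents the fast path; it does not by itself guarantee that a GPU is available.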

Related

TensorFlow 2 GRU Layer with multiple hidden layers

I am attempting to port some TensorFlow 1 code to TensorFlow 2. The old code used the now deprecated MultiRNNCell to create a GRU layer with multiple hidden layers. In TensorFlow 2 I want to use the in-built GRU Layer, but there doesn't seem to be an option which allows for multiple hidden layers with that class. The PyTorch equivalent has such an option exposed as an initialization parameter, num_layers.
My workaround has been to use the TensorFlow RNN layer and pass a GRU cell for each hidden layer I want - this is the way recommended in the docs:
import tensorflow as tf
dim = 1024
num_layers = 4
# One GRUCell per hidden layer; RNN stacks them internally
cells = [tf.keras.layers.GRUCell(dim) for _ in range(num_layers)]
gru_layer = tf.keras.layers.RNN(
    cells,
    return_sequences=True,
    stateful=True
)
But the in-built GRU layer has support for CuDNN, which the plain RNN seems to lack, to quote the docs:
Mathematically, RNN(LSTMCell(10)) produces the same result as
LSTM(10). In fact, the implementation of this layer in TF v1.x was
just creating the corresponding RNN cell and wrapping it in a RNN
layer. However using the built-in GRU and LSTM layers enables the use
of CuDNN and you may see better performance.
So how can I achieve this? How do I get a GRU layer that supports multiple hidden layers and also has CuDNN support? Given that the in-built GRU layer in TensorFlow lacks such an option, is it in fact necessary? Or is the only way to get a deep GRU network to stack multiple GRU layers in a sequence?
EDIT: It seems, according to this answer to a similar question, that there is indeed no in-built way to create a GRU Layer with multiple hidden layers, and that they have to be stacked manually.
OK, so it seems the only way to achieve this is to define a stack of GRU Layer instances. This is what I came up with (note that I only need stateful GRU layers that return sequences, and don't need the last layer's return state):
class RNN(tf.keras.layers.Layer):
    def __init__(self, dim, num_layers=1):
        super(RNN, self).__init__()
        self.dim = dim
        self.num_layers = num_layers
        def layer():
            return tf.keras.layers.GRU(
                self.dim,
                return_sequences=True,
                return_state=True,
                stateful=True)
        self._layer_names = ['layer_' + str(i) for i in range(self.num_layers)]
        for name in self._layer_names:
            self.__setattr__(name, layer())
    def call(self, inputs):
        seqs = inputs
        state = None
        for name in self._layer_names:
            rnn = self.__getattribute__(name)
            (seqs, state) = rnn(seqs, initial_state=state)
        return seqs
It's necessary to manually add the internal rnn layers to the parent layer using __setattr__. It seems adding the rnns to a list and setting that as a layer attribute won't allow the internal layers to be tracked by the parent layer (see this answer to this issue).
I hoped that this would speed up my network. Tests on Colab have shown no difference so far; if anything, it's slightly slower than using a straight RNN initialized with a list of GRU cells. I thought that increasing the batch size from 10 to 64 might make a difference, but no, they still seem to perform at around the same speed.
UPDATE: In fact there does seem to be a noticeable speed-up, but only if I don't decorate my training step function with tf.function (I have a custom training loop, I don't use Model.fit). Not a huge increase in speed: maybe about 33% faster with a batch size of 96. A much smaller batch size (between 10 and 20) gives an even bigger speed-up, about 70%.

What does `training=True` mean when calling a TensorFlow Keras model?

In TensorFlow's official documentation, they always pass training=True when calling a Keras model in a training loop, for example, logits = mnist_model(images, training=True).
I tried help(tf.keras.Model.call) and it shows that
Help on function call in module tensorflow.python.keras.engine.network:
call(self, inputs, training=None, mask=None)
    Calls the model on new inputs.
    In this case `call` just reapplies
    all ops in the graph to the new inputs
    (e.g. build a new computational graph from the provided inputs).
    Arguments:
        inputs: A tensor or list of tensors.
        training: Boolean or boolean scalar tensor, indicating whether to run
          the `Network` in training mode or inference mode.
        mask: A mask or list of masks. A mask can be
            either a tensor or None (no mask).
    Returns:
        A tensor if there is a single output, or
        a list of tensors if there are more than one outputs.
It says that training is a Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode. But I didn't find any information about these two modes.
In a nutshell, I don't know what effect this argument has. And what happens if I omit it when training?
Some neural network layers behave differently during training and inference, for example Dropout and BatchNormalization layers. Taking dropout as an example:
During training, dropout will randomly drop out units and correspondingly scale up activations of the remaining units.
During inference, it does nothing (since you usually don't want the randomness of dropping out units here).
The training argument lets the layer know which of the two "paths" it should take. If you set this incorrectly, your network might not behave as expected.
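The two "paths" for dropout can be sketched in a few lines of plain Python. This is an illustrative sketch of inverted dropout on a list of floats, not the Keras implementation; the function name and its rng parameter are assumptions made for the example.

```python
import random


def dropout(values, rate, training, rng=None):
    """Illustrative inverted dropout on a list of floats (not Keras code)."""
    if not training or rate == 0.0:
        # Inference path: the layer does nothing.
        return list(values)
    if rng is None:
        rng = random.Random()
    # Training path: drop each unit with probability `rate` and scale the
    # survivors so the expected value of each unit stays the same.
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
inference_out = dropout(activations, rate=0.5, training=False)
training_out = dropout(activations, rate=0.5, training=True, rng=random.Random(0))
print(inference_out)  # identical to the input
print(training_out)   # each entry is either 0.0 or twice the input value
```

Passing the wrong flag here would either leave randomness switched on at inference time or switch regularization off during training.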
The training argument indicates whether the layer should behave in training mode or in inference mode. For a BatchNormalization layer, for example:
training=True: The layer will normalize its inputs using the mean and variance of the current batch of inputs.
training=False: The layer will normalize its inputs using the mean and variance of its moving statistics, learned during training.
Usually training=False is used for inference, but some networks, such as the pix2pix cGAN, use training=True at both training and inference time.
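The difference between batch statistics and moving statistics can be sketched as follows. TinyBatchNorm is a hypothetical, simplified stand-in for a single feature: it omits the learned scale and offset, and its momentum and epsilon values are assumptions chosen for illustration.

```python
import math


class TinyBatchNorm:
    """Simplified single-feature batch normalization (illustrative only)."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: use the statistics of the current batch ...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ... and fold them into the moving statistics.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the moving statistics accumulated so far.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]


bn = TinyBatchNorm()
batch = [10.0, 12.0, 14.0]
inference_out = bn(batch, training=False)  # uses initial moving stats (0, 1)
training_out = bn(batch, training=True)    # uses this batch's mean and variance
print(inference_out)
print(training_out)
```

With untrained moving statistics the inference output is barely normalized at all, which is one reason the same input can give very different results under the two settings.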

TensorFlow Graph to Keras Model?

Is it possible to define a graph in native TensorFlow and then convert this graph to a Keras model?
My intention is simply combining (for me) the best of the two worlds.
I really like the Keras model API for prototyping and new experiments, i.e. using the awesome multi_gpu_model(model, gpus=4) for training with multiple GPUs, saving/loading weights or whole models with oneliners, all the convenience functions like .fit(), .predict(), and others.
However, I prefer to define my model in native TensorFlow. Context managers in TF are awesome and, in my opinion, it is much easier to implement stuff like GANs with them:
with tf.variable_scope("Generator"):
    # define some layers
with tf.variable_scope("Discriminator"):
    # define some layers
# model losses
G_train_op = ...AdamOptimizer(...).minimize(
    gloss,
    var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="Generator"))
D_train_op = ...AdamOptimizer(...).minimize(
    dloss,
    var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="Discriminator"))
Another bonus is structuring the graph this way. Debugging complicated native Keras models in TensorBoard is hell, since they are not structured at all. With heavy use of variable scopes in native TF you can "disentangle" the graph and look at a very structured version of a complicated model for debugging.
By utilizing this I can directly set up custom loss functions and do not have to freeze anything in every training iteration, since TF will only update the weights in the correct scope. This is (at least in my opinion) far easier than the Keras solution of looping over all the existing layers and setting .trainable = False.
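The scope-based selection can be pictured in plain Python. As far as I know, the scope argument of tf.get_collection filters items by matching the start of each item's name, so this sketch imitates that on a list of strings. The variable names and the helper variables_in_scope are made up for illustration.

```python
import re


def variables_in_scope(names, scope):
    """Keep only names that match `scope` at the start (illustrative)."""
    return [name for name in names if re.match(scope, name)]


# Hypothetical variable names as they might appear under two variable scopes.
all_variables = [
    "Generator/dense/kernel:0",
    "Generator/dense/bias:0",
    "Discriminator/dense/kernel:0",
    "Discriminator/dense/bias:0",
]

generator_vars = variables_in_scope(all_variables, "Generator")
discriminator_vars = variables_in_scope(all_variables, "Discriminator")
print(generator_vars)
print(discriminator_vars)
```

Each optimizer then receives only its own list, so an update step for one network leaves the other network's weights untouched.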
TL;DR:
Long story short: I like the direct access to everything in TF, but most of the time a simple Keras model is sufficient for training, inference, ... later on. The model API is much easier and more convenient in Keras.
Hence, I would prefer to set up a graph in native TF and convert it to Keras for training, evaluation, and so on. Is there any way to do this?
I don't think it is possible to create a generic automated converter for any TF graph that will come up with a meaningful set of layers, with proper naming etc., simply because graphs are more flexible than a sequence of Keras layers.
However, you can wrap your model with the Lambda layer. Build your model inside a function, wrap it with Lambda and you have it in Keras:
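import tensorflow as tf
from keras.layers import Lambda
def model_fn(x):
    # num_classes is assumed to be defined elsewhere
    layer_1 = tf.layers.dense(x, 100)
    layer_2 = tf.layers.dense(layer_1, 100)
    out_layer = tf.layers.dense(layer_2, num_classes)
    return out_layer
# model is assumed to be an existing Keras Sequential model
model.add(Lambda(model_fn))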
That is what sometimes happens when you use multi_gpu_model: you end up with three layers: Input, model, and Output.
Keras Apologetics
However, integration between TensorFlow and Keras can be much tighter and more meaningful. See this tutorial for use cases.
For instance, variable scopes can be used pretty much like in TensorFlow:
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
with tf.name_scope('block1'):
    y = LSTM(32, name='mylstm')(x)
The same for manual device placement:
with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)  # all ops / variables in the LSTM layer will live on GPU:0
Custom losses are discussed here: Keras: clean implementation for multiple outputs and custom loss functions?
This is how my model defined in Keras looks in Tensorboard:
So, Keras is indeed only a simplified frontend to TensorFlow, and you can mix them quite flexibly. I would recommend inspecting the source code of the Keras model zoo for clever solutions and patterns that allow you to build complex models using the clean Keras API.
You can insert TensorFlow code directly into your Keras model or training pipeline! Since mid-2017, Keras has been fully adopted by and integrated into TensorFlow. This article goes into more detail.
This means that your TensorFlow model is already a Keras model and vice versa. You can develop in Keras and switch to TensorFlow whenever you need to. TensorFlow code will work with Keras APIs, including Keras APIs for training, inference and saving your model.

Assign Torch and Tensorflow models two separate GPUs

I am comparing two pre-trained models, one is in Tensorflow and one is in Pytorch, on a machine that has multiple GPUs. Each model fits on one GPU. They are both loaded in the same Python script. How can I assign one GPU to the Tensorflow model and another GPU to the Pytorch model?
Setting CUDA_VISIBLE_DEVICES=0,1 only tells both models that these GPUs are available - how can I (within Python I guess), make sure that Tensorflow takes GPU 0 and Pytorch takes GPU 1?
You can refer to torch.device: https://pytorch.org/docs/stable/tensor_attributes.html?highlight=device#torch.torch.device
In particular, do
device = torch.device("cuda:0")
tensor = tensor.to(device)
or, for a pretrained model,
device = torch.device("cuda:0")
model = model.to(device)
to put the tensor/model on GPU 0. Use "cuda:1" instead to place the PyTorch model on the second GPU, as the question asks.
Similarly, TensorFlow has tf.device: https://www.tensorflow.org/api_docs/python/tf/device. Its usage is described here: https://www.tensorflow.org/guide/using_gpu
To load a model on GPU 0 in TensorFlow, do
with tf.device("/GPU:0"):
    load_model_function(model_path)
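The two frameworks spell device names differently, which is easy to mix up. A small sketch, assuming TensorFlow's "/GPU:N" and PyTorch's "cuda:N" device strings, that builds one name per framework for the split the question asks for; the helper device_names is hypothetical.

```python
def device_names(tf_gpu_index, torch_gpu_index):
    """Build per-framework device strings (hypothetical helper)."""
    if tf_gpu_index == torch_gpu_index:
        raise ValueError("expected two different GPU indices")
    return {
        "tensorflow": "/GPU:%d" % tf_gpu_index,   # for tf.device(...)
        "pytorch": "cuda:%d" % torch_gpu_index,   # for torch.device(...)
    }


names = device_names(tf_gpu_index=0, torch_gpu_index=1)
print(names)

# Intended use (requires both frameworks, so it is left as comments):
#   with tf.device(names["tensorflow"]):
#       load_model_function(model_path)
#   model = model.to(torch.device(names["pytorch"]))
```

Both indices refer to the GPUs that remain visible after CUDA_VISIBLE_DEVICES is applied.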

Accessing neural network weights and neuron activations

After training a network using Keras:
I want to access the final trained weights of the network in some order.
I want to know the neuron activation values for every input passed. For example, after training, if I pass X as my input to the network, I want to know the neuron activation values for that X for every neuron in the network.
Does Keras provide API access to these things? I want to do further analysis based on the neuron activation values.
Update : I know I can do this using Theano purely, but Theano requires more low-level coding. And, since Keras is built on top of Theano, I think there could be a way to do this?
If Keras can't do this, then which of TensorFlow and Caffe can? Keras is the easiest to use, followed by TensorFlow/Caffe, but I don't know which of these provides the network access I need. The last option for me would be to drop down to Theano, but I think it would be more time-consuming to build a deep CNN with Theano.
This is covered in the Keras FAQ. You basically want to compute the activations for each layer, which you can do with this code:
from keras import backend as K
# The layer number
n = 3
# with a Sequential model
get_nth_layer_output = K.function([model.layers[0].input],
                                  [model.layers[n].output])
layer_output = get_nth_layer_output([X])[0]
Unfortunately you would need to compile and run a function for each layer, but this should be straightforward.
To get the weights, you can call get_weights() on any layer.
nth_weights = model.layers[n].get_weights()
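To make concrete what "the activation of every neuron for a given input" means, here is a framework-free sketch of a tiny dense network that records each layer's output during a forward pass. The weights, the ReLU choice, and the function names are made up for illustration; with Keras you would read the real values through the calls shown above.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # weights[j][i] is the weight from input i to output unit j.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward_with_activations(x, layers):
    """Run a forward pass and return the activations of every layer."""
    activations = []
    out = x
    for weights, biases in layers:
        out = relu(dense(out, weights, biases))
        activations.append(out)
    return activations


# Two made-up layers: 2 inputs -> 2 units -> 1 unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 1.0]], [-1.0]),
]
acts = forward_with_activations([2.0, 1.0], layers)
print(acts)  # [[1.0, 2.5], [2.5]]
```

Each inner list holds one value per neuron in that layer, which is the shape of data you would collect per input for later analysis.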