How to initialize mean and variance of PyTorch BatchNorm2d? - tensorflow

I’m converting a TensorFlow model to PyTorch, and I’d like to initialize the mean and variance of BatchNorm2d from the TensorFlow model.
I’m doing it this way:
bn.running_mean = torch.nn.Parameter(torch.Tensor(TF_param))
And I get this error:
RuntimeError: the derivative for 'running_mean' is not implemented
But it works for bn.weight and bn.bias. Is there any way to initialize the mean and variance from my pre-trained TensorFlow model? Is there anything like moving_mean_initializer and moving_variance_initializer in PyTorch?
Thanks!

The running mean and variance of a batch norm layer are not nn.Parameters; they are registered as buffers of the layer, so no gradient is defined for them.
I think you can simply assign a plain tensor (for example torch.tensor(TF_param)); there is no need to wrap it in nn.Parameter.
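A minimal sketch of that suggestion, assuming the TensorFlow moving statistics have already been exported as plain Python lists or NumPy arrays. The names tf_moving_mean and tf_moving_var and their values are hypothetical stand-ins, not part of either library:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)

# Hypothetical stand-ins for values exported from the TensorFlow model.
tf_moving_mean = [0.1, 0.2, 0.3]
tf_moving_var = [1.0, 2.0, 3.0]

# running_mean / running_var are buffers, so copy plain tensors into them
# in place instead of wrapping the values in nn.Parameter.
with torch.no_grad():
    bn.running_mean.copy_(torch.tensor(tf_moving_mean))
    bn.running_var.copy_(torch.tensor(tf_moving_var))

# In eval mode the layer normalizes with the stored statistics.
bn.eval()
```

Copying in place keeps the buffers registered on the module, so they are still saved in state_dict. It is probably also worth checking that eps matches between the two frameworks, since their default values differ.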

Related

What does initial_weights mean in TensorFlow?

Why do we have to initialize the weights when calling model predict? I can't understand it.
For reference, see: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#checkpoint_the_initial_weights
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)
This tutorial appears to be referring to unbalanced data. You do not need to provide initial weights if you don't want to in Tensorflow's predict command. See this link describing potential inputs to the command.
Deep learning uses gradient descent and its variants to find optimal weights. If the weights are not initialized well, training may take a long time to converge, or may not converge at all.

How does TensorFlow calculate the gradients of an FFT layer?

If I insert the function, e.g., tf.fft(input, name=None), into a neural network, how does TensorFlow calculate the gradients in backpropagation?
I didn't find any documentation about this.
I am using TensorFlow 1.0.
If you're just inserting the tf.fft(...) function in the middle of a model, I'm not certain TensorFlow will even be able to handle a forward pass. Both the docs for tf.signal.fft (https://www.tensorflow.org/api_docs/python/tf/signal/fft) and the tf.fft function header require inputs with dtype=tf.complex64 or dtype=tf.complex128.
Perhaps TensorFlow will cast float32 inputs to complex and back again, which would let you complete a forward pass; I'm not sure. From what I can gather from the TensorFlow gradient documentation, though, casting values causes a disconnect between the error gradient and the model parameters, meaning a backward pass won't work. You could try implementing a custom FFT function that doesn't cast values and see if that works, but it's not so easy.
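Independently of what TensorFlow does internally, the discrete Fourier transform is a linear map, so a gradient through it is mathematically well defined. The following is a NumPy-only sketch (not TensorFlow code) that compares an analytic gradient with a finite-difference estimate for the loss sum(|FFT(x)|^2). The function names are made up for illustration:

```python
import numpy as np

def loss(x):
    # Energy of the spectrum of a real-valued signal.
    return float(np.sum(np.abs(np.fft.fft(x)) ** 2))

def analytic_grad(x):
    # For the unnormalized DFT, sum |X_k|^2 = N * sum x_n^2 (Parseval),
    # so the gradient with respect to x is 2 * N * x.
    return 2.0 * len(x) * x

def numeric_grad(x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (loss(x + step) - loss(x - step)) / (2 * eps)
    return grad

x = np.array([1.0, 2.0, 3.0, 4.0])
print(analytic_grad(x))
print(numeric_grad(x))
```

The two printed vectors should agree closely, which is the kind of check one could also run against a framework's automatic gradient.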

TensorFlow Graph to Keras Model?

Is it possible to define a graph in native TensorFlow and then convert this graph to a Keras model?
My intention is simply combining (for me) the best of the two worlds.
I really like the Keras model API for prototyping and new experiments, e.g. using the awesome multi_gpu_model(model, gpus=4) for training with multiple GPUs, saving/loading weights or whole models with one-liners, and all the convenience functions like .fit(), .predict(), and others.
However, I prefer to define my model in native TensorFlow. Context managers in TF are awesome and, in my opinion, it is much easier to implement stuff like GANs with them:
with tf.variable_scope("Generator"):
    # define some layers
with tf.variable_scope("Discriminator"):
    # define some layers
# model losses
G_train_op = ...AdamOptimizer(...).minimize(
    gloss,
    var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="Generator"))
D_train_op = ...AdamOptimizer(...).minimize(
    dloss,
    var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="Discriminator"))
Another bonus is that the graph is structured this way. Debugging complicated native Keras models in TensorBoard is hell, since they are not structured at all. With heavy use of variable scopes in native TF you can "disentangle" the graph and look at a very structured version of a complicated model for debugging.
With this approach I can directly set up a custom loss function and do not have to freeze anything in every training iteration, since TF will only update the weights in the correct scope. That is (at least in my opinion) far easier than the Keras solution of looping over all the existing layers and setting .trainable = False.
TL;DR:
Long story short: I like the direct access to everything in TF, but most of the time a simple Keras model is sufficient for training, inference, ... later on. The model API is much easier and more convenient in Keras.
Hence, I would prefer to set up a graph in native TF and convert it to Keras for training, evaluation, and so on. Is there any way to do this?
I don't think it is possible to create a generic automated converter for any TF graph that will come up with a meaningful set of layers, with proper naming and so on, simply because graphs are more flexible than a sequence of Keras layers.
However, you can wrap your model with the Lambda layer. Build your model inside a function, wrap it with Lambda and you have it in Keras:
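def model_fn(x):
    # Plain TensorFlow layers; num_classes is assumed to be defined elsewhere
    layer_1 = tf.layers.dense(x, 100)
    layer_2 = tf.layers.dense(layer_1, 100)
    out_layer = tf.layers.dense(layer_2, num_classes)
    return out_layer

# model is assumed to be an existing Keras Sequential model
model.add(Lambda(model_fn))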
That is what sometimes happens when you use multi_gpu_model: You come up with three layers: Input, model, and Output.
Keras Apologetics
However, integration between TensorFlow and Keras can be much tighter and more meaningful. See this tutorial for use cases.
For instance, variable scopes can be used pretty much like in TensorFlow:
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
with tf.name_scope('block1'):
    y = LSTM(32, name='mylstm')(x)
The same for manual device placement:
with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)  # all ops / variables in the LSTM layer will live on GPU:0
Custom losses are discussed here: Keras: clean implementation for multiple outputs and custom loss functions?
This is how my model defined in Keras looks in Tensorboard:
So Keras is indeed only a simplified frontend to TensorFlow, and you can mix them quite flexibly. I would recommend inspecting the source code of the Keras model zoo for clever solutions and patterns that allow you to build complex models using the clean Keras API.
You can insert TensorFlow code directly into your Keras model or training pipeline! Since mid-2017, Keras has been fully adopted by and integrated into TensorFlow. This article goes into more detail.
This means that your TensorFlow model is already a Keras model and vice versa. You can develop in Keras and switch to TensorFlow whenever you need to. TensorFlow code will work with Keras APIs, including Keras APIs for training, inference and saving your model.

How to convert the PyTorch adaptive_avg_pool2d method to Keras or TensorFlow

I don't know how to convert the PyTorch method adaptive_avg_pool2d to Keras or TensorFlow. Can anyone help?
The PyTorch method is
adaptive_avg_pool2d(14,[14])
I tried to use average pooling and then reshape the tensor in Keras, but got the error:
ValueError: total size of new array must be unchanged
I'm not sure if I understood your question, but in PyTorch you pass the desired output spatial dimensions to AdaptiveAvgPool2d. For instance, if you want an output of size 5x7, you can use nn.AdaptiveAvgPool2d((5,7)).
If you want a global average pooling layer, you can use nn.AdaptiveAvgPool2d(1). In Keras you can just use GlobalAveragePooling2D.
For other output sizes in Keras, you need to use AveragePooling2D, but you can't specify the output shape directly. You need to calculate the pool_size, strides, and padding parameters that produce the output shape you want. If you need help with the calculations, check this page of the CS231n course.
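As a rough illustration of that calculation, here is a NumPy sketch. The helper names pool_params and adaptive_avg_pool2d are made up for this example, and the floor/ceil bin-edge rule is, to my understanding, how PyTorch chooses its pooling windows. When the input size is an exact multiple of the output size, the windows reduce to a fixed kernel and stride that could be passed to AveragePooling2D:

```python
import numpy as np

def pool_params(in_size, out_size):
    """Kernel and stride for one axis; exact only if in_size % out_size == 0."""
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool an (H, W) array to (out_h, out_w) with floor/ceil bin edges."""
    in_h, in_w = x.shape
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        h0 = (i * in_h) // out_h                      # floor of bin start
        h1 = ((i + 1) * in_h + out_h - 1) // out_h    # ceil of bin end
        for j in range(out_w):
            w0 = (j * in_w) // out_w
            w1 = ((j + 1) * in_w + out_w - 1) // out_w
            out[i, j] = x[h0:h1, w0:w1].mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = adaptive_avg_pool2d(x, 2, 2)
print(pooled)
print(pool_params(28, 14))
```

When the input size is not a multiple of the output size, the windows have varying widths and may overlap, so a single fixed pool_size and stride can only approximate the adaptive result.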

Convert ResNet implementation from Caffe to TensorFlow

I want to implement ResNet-50 from scratch.
It was implemented in Caffe by the authors of the original paper, but I want a TensorFlow implementation,
based on this repository: https://github.com/KaimingHe/deep-residual-networks
and this visualization: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
I know every equivalent in TensorFlow, but I don't know the meaning of the in-place scale layer after batch normalization. Can you explain what it means, and also the "use global stats" parameter in BatchNorm?
An "in-place" layer in Caffe is simply a hint to save memory: instead of allocating memory for both the input and the output of the layer, an "in-place" layer overwrites its input with its output.
Using global stats (use_global_stats) in a "BatchNorm" layer means using the mean/variance statistics accumulated during training and not updating these values any further. This is the "deployment" state of the BN layer.
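To make the deployment behaviour concrete, here is a small NumPy sketch, assuming NCHW inputs and per-channel statistics. The function names and the eps value are illustrative choices, not Caffe API. In Caffe the BatchNorm layer only normalizes, and the learned per-channel scale and shift are applied by the separate Scale layer that follows it:

```python
import numpy as np

def batchnorm_global_stats(x, mean, var, eps=1e-5):
    """Normalize NCHW input with fixed per-channel statistics (no updates)."""
    mean = mean.reshape(1, -1, 1, 1)
    var = var.reshape(1, -1, 1, 1)
    return (x - mean) / np.sqrt(var + eps)

def scale(x, gamma, beta):
    """Per-channel affine transform, the role of the Scale layer after BatchNorm."""
    return gamma.reshape(1, -1, 1, 1) * x + beta.reshape(1, -1, 1, 1)

# Illustrative values only; in practice these come from the trained model.
x = np.random.RandomState(0).randn(2, 3, 4, 4)
mean = np.array([0.1, -0.2, 0.3])
var = np.array([1.0, 2.0, 0.5])
gamma = np.array([1.5, 1.0, 0.5])
beta = np.array([0.0, 0.1, -0.1])

y = scale(batchnorm_global_stats(x, mean, var), gamma, beta)
print(y.shape)
```

In TensorFlow, a single batch normalization layer run in inference mode covers both steps, since it holds the moving statistics together with the scale (gamma) and offset (beta).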