Calculate gradient of trained network - numpy

Assume I have a trained model, and I am trying to calculate its Jacobian (trying to understand some of its mathematical properties after training). I am trying to use autograd as follows:
from autograd import jacobian
jacobian_pred = jacobian(model.predict)
jacobian_pred(x)
where x is from my training set. It raises an error:
TypeError: object of type 'numpy.float32' has no len()
What should I do?
Thanks!
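Note that autograd differentiates functions written with autograd.numpy operations. For reference, a minimal sketch of the intended usage, with a toy forward pass standing in for the network (f, W, and b are made-up stand-ins):
import autograd.numpy as np
from autograd import jacobian

def f(x):
    # toy stand-in for a network's forward pass, built from autograd.numpy ops
    return np.tanh(np.dot(W, x) + b)

W = np.random.randn(3, 4)
b = np.random.randn(3)
x = np.random.randn(4)

print(jacobian(f)(x).shape)  # (3, 4): one row per output, one column per input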

Related

Pytorch Autograd: what does runtime error "grad can be implicitly created only for scalar outputs" mean

I am trying to understand Pytorch autograd in depth; I would like to observe the gradient of a simple tensor after going through a sigmoid function as below:
import torch
from torch import autograd
D = torch.arange(-8, 8, 0.1, requires_grad=True)
with autograd.set_grad_enabled(True):
    S = D.sigmoid()
S.backward()
My goal is to get D.grad, but even before getting there I hit the runtime error:
RuntimeError: grad can be implicitly created only for scalar outputs
I see another post with a similar question, but the answer there does not apply to my question. Thanks
The error means you can only call .backward (with no arguments) on a unitary/scalar tensor, i.e. a tensor with a single element.
For example, you could do
T = torch.sum(S)
T.backward()
since T would be a scalar output.
I posted some more information on using pytorch to compute derivatives of tensors in this answer.
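Applied to the snippet above, a minimal sketch that reduces S to a scalar before calling backward:
import torch

D = torch.arange(-8, 8, 0.1, requires_grad=True)
S = D.sigmoid()
S.sum().backward()  # reduce to a scalar so backward needs no explicit gradient argument
print(D.grad)       # elementwise derivative: sigmoid(D) * (1 - sigmoid(D))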

How to initialize mean and variance of Pytorch BatchNorm2d?

I'm converting a TensorFlow model to PyTorch, and I'd like to initialize the mean and variance of BatchNorm2d from the TensorFlow model.
I’m doing it in this way:
bn.running_mean = torch.nn.Parameter(torch.Tensor(TF_param))
And I get this error:
RuntimeError: the derivative for 'running_mean' is not implemented
But it works for bn.weight and bn.bias. Is there any way to initialize the mean and variance using my pre-trained TensorFlow model? Is there anything like moving_mean_initializer and moving_variance_initializer in PyTorch?
Thanks!
The running mean and variance of a batch norm layer are not nn.Parameters, but rather buffers of the layer.
I think you can simply assign a torch.tensor to them; there is no need to wrap it in an nn.Parameter.
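A minimal sketch, assuming the TensorFlow statistics have already been exported to NumPy arrays (tf_mean and tf_var are hypothetical names):
import numpy as np
import torch
import torch.nn as nn

# hypothetical stand-ins for the arrays exported from the TensorFlow model
tf_mean = np.zeros(64, dtype=np.float32)
tf_var = np.ones(64, dtype=np.float32)

bn = nn.BatchNorm2d(64)

# running_mean/running_var are buffers, so plain tensors can be assigned directly;
# wrapping them in nn.Parameter is what triggers the derivative error
bn.running_mean = torch.from_numpy(tf_mean)
bn.running_var = torch.from_numpy(tf_var)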

How does Keras compute its loss function for matrix-valued outputs?

I am trying to compute the next several video frames given a collection of previous frames, i.e. I have a deep neural network that directly outputs a small video clip of dimension (samples, frames, m, n, channels). I train my neural network using Keras' mean squared error loss function.
Keras' implementation of the mean squared error loss function is
K.mean(K.square(y_pred - y_true), axis=-1)
In my case the computed loss value will therefore still be a rank-4 tensor (which I checked is indeed true).
Since the loss should be a scalar, I had expected this to cause a problem, but surprisingly Keras issues no warning and I do get meaningful results.
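A quick shape check (with toy dimensions assumed) confirms the rank-4 result:
import numpy as np
from tensorflow.keras import backend as K

# toy (samples, frames, m, n, channels) tensors
y_true = K.constant(np.random.rand(2, 4, 8, 8, 3))
y_pred = K.constant(np.random.rand(2, 4, 8, 8, 3))

loss = K.mean(K.square(y_pred - y_true), axis=-1)  # Keras' MSE averages only the last axis
print(K.int_shape(loss))  # (2, 4, 8, 8) -- still rank 4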
Any clue as to how Keras is doing its back-propagation in this case? Is there an internal conversion to a scalar loss function that Keras is doing that I am not aware of?
Thank you!

Quantization aware (re)training a keras model

I have a (trained) tf.keras model which I would like to convert to a quantized model and retrain with tensorflow's fake quant strategy (using python as frontend).
Could I somehow apply tf.contrib.quantize.create_training_graph directly to the keras model (graph) then retrain?
Seems like there's some problem with the fact that the session is already created when taking the graph from K.get_session().graph.
For example, the following approach:
import tensorflow as tf
import tensorflow.contrib.lite as tflite
from tensorflow.contrib.quantize import create_training_graph

keras_graph = tf.keras.backend.get_session().graph
create_training_graph(input_graph=keras_graph,
                      quant_delay=int(0 * (len(X_train) / batch_size)))
...
model.compile(...)
model.fit_generator(...)
results in the message:
"Operation '{name:'act_softmax/sub' id:2294 op device:{} def:{{{node act_softmax/sub}} = Sub[T=DT_FLOAT](conv_preds/act_quant/FakeQuantWithMinMaxVars:0, act_softmax/Max)}}' was changed by updating input tensor after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session."
And sure enough, the error follows:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value conv_preds/act_quant/conv_preds/act_quant/max/biased
(i.e. create_training_graph needs the graph before the session was created? is it possible to get the graph from a keras model before the session was instantiated?)
Alternatively, if this doesn't work, could I convert the (h5) model to a checkpoint, then somehow load the model from this checkpoint to a tensorflow graph and continue working with pure tensorflow?
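For reference, a minimal sketch of that checkpoint route in TF 1.x (the path is hypothetical): the variables of the Keras session can be saved with a plain tf.train.Saver and later restored into a pure-TensorFlow graph:
import tensorflow as tf
from tensorflow.keras import backend as K

# save the variables of the current Keras session as a TF checkpoint
saver = tf.train.Saver()
saver.save(K.get_session(), "/tmp/keras_model.ckpt")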
Would appreciate any help or pointers.
Thank you!

How to convert PyTorch adaptive_avg_pool2d method to Keras or TensorFlow

I don't know how to convert the PyTorch method adaptive_avg_pool2d to Keras or TensorFlow. Can anyone help?
The PyTorch method is
adaptive_avg_pool2d(14,[14])
I tried to use average pooling, then reshape the tensor in Keras, but got the error:
ValueError: total size of new array must be unchanged
I'm not sure if I understood your question, but in PyTorch, you pass the spatial dimensions to AdaptiveAvgPool2d. For instance, if you want to have an output sized 5x7, you can use nn.AdaptiveAvgPool2d((5,7)).
If you want a global average pooling layer, you can use nn.AdaptiveAvgPool2d(1). In Keras you can just use GlobalAveragePooling2D.
For other output sizes in Keras, you need to use AveragePooling2D, but you can't specify the output shape directly. You need to calculate/define the pool_size, stride, and padding parameters depending on the output shape you want. If you need help with the calculations, check this page of the CS231n course.
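For concreteness, a small PyTorch sketch of the two cases described above:
import torch
import torch.nn as nn

x = torch.randn(1, 64, 14, 14)  # (N, C, H, W)

# fixed spatial output size, independent of the input size
print(nn.AdaptiveAvgPool2d((5, 7))(x).shape)  # torch.Size([1, 64, 5, 7])

# global average pooling: adaptive pooling to 1x1
# (the Keras counterpart is GlobalAveragePooling2D, which also drops the spatial dims)
print(nn.AdaptiveAvgPool2d(1)(x).shape)       # torch.Size([1, 64, 1, 1])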