Custom op backward - mxnet

I am writing a custom op, and I got stuck writing the backward part.
When I call out_grad[0].asnumpy(), or do any other manipulation of out_grad, the program crashes without any error message.
If I fill in_grad with zeros instead, the program runs smoothly, but I need the gradient to flow backward.
def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
    self.assign(in_grad[0], req[0], 0)
    self.assign(in_grad[1], req[1], 0)
What's going wrong here?

Custom Operator in MXNet shows how to define a loss function using a custom op. A loss op is a special case because no gradient needs to flow into it.
In my situation, however, gradient does need to flow into my op. So the function below should return the dependency instead of an empty list as in the loss op:
def declare_backward_dependency(self, out_grad, in_data, out_data):
    return [out_grad[0]]
As I understand it, the dependency is the variable that the gradient should be delivered to.
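For context, here is a minimal sketch of a pass-through custom op whose backward propagates out_grad instead of zeros (an identity op with made-up names, not the actual op from the question). The key detail is need_top_grad=True, which tells MXNet that this op expects a gradient flowing in from above; the loss op in the tutorial sets it to False:

import mxnet as mx

class IdentityOp(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        self.assign(out_data[0], req[0], in_data[0])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Propagate the incoming gradient instead of writing zeros.
        self.assign(in_grad[0], req[0], out_grad[0])

@mx.operator.register("identity")
class IdentityProp(mx.operator.CustomOpProp):
    def __init__(self):
        # Tell MXNet this op needs the gradient from the layer above.
        super(IdentityProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return [in_shape[0]], [in_shape[0]], []

    def declare_backward_dependency(self, out_grad, in_data, out_data):
        return [out_grad[0]]

    def create_operator(self, ctx, shapes, dtypes):
        return IdentityOp()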

Have you tried following the tutorial here for developing a
Custom Operator in MXNet?
If that does not help, provide the full code of your custom operator, along with some sample data and a simple model with which the issue can easily be reproduced.

Related

Calculation operations with the parameters of a TFLite quantized model

I am trying to implement image classification in hardware using the quantized MobileNetV2 model taken from here. To do that, I first need to reproduce the inference process from beginning to end to make sure I understand the calculations/operations that are performed on the data.
The first target is the Conv function. I can see how it is calculated, but several arguments passed to this function are ones whose origin I would like to know: output_offset, output_multiplier, output_shift, output_activation_min, output_activation_max. I cannot find the upstream function that calls Conv() with these parameters, which would hopefully give me an insight into how these arguments are generated. Could someone point me to the right line of the source code?
Another gap in the source code is the interpreter.invoke() function. I wish to trace what happens next, but cannot find the source code that implements invoke(). Any help would be greatly appreciated!
If you want to know how the conv reference code is used, you can read the code for the conv operator.
The Python interpreter uses SWIG to call the C++ interpreter.
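As for output_multiplier and output_shift specifically: they are the fixed-point decomposition of the combined rescaling factor input_scale * filter_scale / output_scale. Here is a rough Python sketch of that decomposition (a simplified rendering of QuantizeMultiplier from quantization_util.cc; the scale values are made-up examples, not taken from the model):

import math

def quantize_multiplier(real_multiplier):
    # Decompose a positive real multiplier into a Q31 fixed-point
    # multiplier and a power-of-two shift, so that
    # real_multiplier ~= quantized * 2**(shift - 31).
    if real_multiplier == 0.0:
        return 0, 0
    mantissa, shift = math.frexp(real_multiplier)  # mantissa in [0.5, 1)
    quantized = int(round(mantissa * (1 << 31)))
    if quantized == (1 << 31):  # rounding pushed the mantissa up to 1.0
        quantized //= 2
        shift += 1
    return quantized, shift

# The effective rescaling factor of a quantized conv combines the
# input, filter, and output scales (example values):
output_multiplier, output_shift = quantize_multiplier(0.5 * 0.1 / 0.25)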
Hope this helps.

How to implement the tensor product of two layers in Keras/Tf

I'm trying to set up a DNN for classification and at one point I want to take the tensor product of a vector with itself. I'm using the Keras functional API at the moment but it isn't immediately clear that there is a layer that does this already.
I've been attempting to use a Lambda layer and numpy in order to try this, but it's not working.
Doing a bit of googling reveals tf.linalg.LinearOperatorKronecker, which does not seem to work either.
Here's what I've tried:
I have a layer called part_layer whose output is a single vector (rank one tensor).
keras.layers.Lambda(lambda x_array: np.outer(x_array, x_array))(part_layer)
Ideally I would want this to take a vector of the form [1,2] and give me [[1,2],[2,4]].
But the error I'm getting suggests that the np.outer function is not recognizing its arguments:
AttributeError: 'numpy.ndarray' object has no attribute '_keras_history'
Any ideas on what to try next, or if there is a simple function to use?
You can use one of two operations:
If you want to take the batch size into account, you can use the Dot layer.
Otherwise, you can use the dot function.
In both cases the code should look like this:
dot_lambda = lambda x_array: tf.keras.layers.dot([x_array, x_array], axes=1)
# dot_lambda = lambda x_array: tf.keras.layers.Dot(axes=1)([x_array, x_array])
keras.layers.Lambda(dot_lambda)(part_layer)
Hope this helps.
Use tf.tensordot(x_array, x_array, axes=0) to achieve what you want. For example, the expression print(tf.tensordot([1,2], [1,2], axes=0)) gives the desired result: [[1,2],[2,4]].
Keras/TensorFlow needs to keep a history of the operations applied to tensors in order to perform the optimization. NumPy has no notion of history, so using it in the middle of a layer is not allowed. tf.tensordot performs the same operation, but keeps the history.
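Putting that together, here is a minimal self-contained sketch of the per-example outer product inside a Lambda layer (the layer names are made up); einsum is used here so the batch dimension is preserved:

import tensorflow as tf

# Outer product of each example's vector with itself:
# input shape (batch, n) -> output shape (batch, n, n).
inp = tf.keras.Input(shape=(2,))
outer = tf.keras.layers.Lambda(
    lambda v: tf.einsum('bi,bj->bij', v, v))(inp)
model = tf.keras.Model(inp, outer)

print(model(tf.constant([[1., 2.]])))  # [[[1. 2.] [2. 4.]]]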

RETURNN Custom Layer Search Mode Assertion Error

I've implemented a custom RETURNN layer (HMM Factorization), which works as intended during training, but throws an assertion error when used in search mode. The output of the layer is identical to that of a softmax layer.
Here's the config that was used : transformer + HMM Factorization
This was tested using the latest version of RETURNN.
The exact line that fails is (code link):
assert fixed_seq_len is not None
Here's the full error log (too large to paste here)
Here's the training initialisation
Does anybody have any ideas what the error could be?
Thanks!
This is actually a bug in RETURNN. I created a pull request here which should fix it, and it has now been merged.
The problem was not with your custom layer, but rather with a layer inside your RecLayer, which was actually totally independent, i.e. this one:
'encoder_int': {'activation': None,
                'class': 'linear',
                'from': ['base:encoder'],
                'n_out': 1000,
                'with_bias': False}
It depends on just one base layer ("base:encoder") and nothing else, so RETURNN (correctly) optimizes this layer out of the recurrent loop, because it is independent of the loop.
However, it then sees that you are accessing this layer inside the loop, and as this is a loop over time, it assumes that the loop runs over the time dimension of "base:encoder". It then tries to unroll "base:encoder" (TensorArray.unroll) given the seq len of the rec layer, but fails because at that point it does not know the seq len of the rec layer.
My fix adds a more advanced check of whether this assumption is correct, i.e. whether the loop really runs over the same time dimension. The check is a bit fragile, though, and I am not sure it works correctly in all cases. However, I created a test case which reproduces your problem, and it is fixed now.

Pytorch register_hook to Keras implementation

I'm trying to implement the following project in TensorFlow/Keras:
https://github.com/jacobgil/pytorch-pruning
I'm having a hard time understanding what register_hook does. It can be found in finetune.py, line 66:
x.register_hook(self.compute_rank)
I've searched for clear explanations of this function and tried to find Keras equivalents, without any luck. Do you have any answers?
First things first, here's the documentation:
http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.register_hook
This allows you to register a method on a Variable that is called whenever the Variable's .grad is updated, i.e. during a backward pass, and it takes the grad as input. The method can return a Variable that will replace the original .grad, or None if you just want to read the gradients to do something else.
If you update the gradients this way, the nodes further down in the compute graph see the new updated gradient in the backward pass and will have their respective gradients calculated with the updated value.
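To make that concrete, here is a small standalone sketch (not taken from the pruning project) of a hook that rescales a gradient during the backward pass:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (3 * x).sum()

def double_grad(grad):
    # Returning a tensor replaces the original gradient;
    # returning None would leave it unchanged.
    return grad * 2

handle = x.register_hook(double_grad)
y.backward()
print(x.grad)    # tensor([6., 6.]) instead of tensor([3., 3.])
handle.remove()  # detach the hook when it is no longer needed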
I'm not a TensorFlow expert, but the RegisterGradient decorator (documentation) seems to be able to do the same; for an example, see this answer.

Gradient for Each Example Using map_fn

I want to get the gradient of a layer with respect to a parameter matrix for each example. Normally I would need a Jacobian, but following this idea, I decided to use map_fn so I could feed data forward in a batch rather than one example at a time. This gives me a problem I do not understand, unfortunately. With the code
get_grads = tf.map_fn(lambda x: tf.gradients(x, W['1'])[0], softmax_probs)
sess.run(get_grads, feed_dict={x: images[0:100]})
I get this error
InvalidArgumentError: TensorArray map_21/TensorArray_36#map_21/while/gradients: Could not write to TensorArray index 0 because it has already been read.
W['1'] is a variable in the graph. Any ideas?
It seems like your issue may be connected with this bug:
https://github.com/tensorflow/tensorflow/issues/7643
One commenter posts a possible fix at the end; you could try that out.
Alternatively, if what you want is the Jacobian, you can check out this solution:
https://github.com/tensorflow/tensorflow/issues/675#issuecomment-362853672
although it appears that it will not work when nested.
I don't think this will work because x in this case is a loop variable which TensorFlow does not know how to connect to softmax_probs.
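For reference, here is a sketch of the unstack-based workaround along the lines of that thread, reusing the names from the question (W, softmax_probs, x, images) and assuming a fixed batch size; note that this builds one gradient subgraph per example, so it gets slow for large batches:

import tensorflow as tf

# Per-example gradients without map_fn: split the batch into
# individual outputs and differentiate each one separately.
batch_size = 100
per_example = tf.unstack(softmax_probs, num=batch_size, axis=0)
get_grads = tf.stack([tf.gradients(p, W['1'])[0] for p in per_example])

sess.run(get_grads, feed_dict={x: images[0:batch_size]})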