Pytorch register_hook to Keras implementation - tensorflow

Im trying to implement the following project into Tensorflow/Keras.
https://github.com/jacobgil/pytorch-pruning
Im having a hard time understanding what register_hook does? It can be found in finetune.py, row 66.
x.register_hook(self.compute_rank)
I've searched for clear explanations regarding this function and tried to find Keras-equivalents, without any luck. Do you have answers to these questions?

First things first, here's the documentation:
http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.register_hook
This allows you to register a method to a Variable that is called whenever the Variable's .grad is updated, i.e. in a backward pass, and takes the grad as input. The method can return a Variable that would replace the original .grad or None if you just want to read the gradients to do something else.
If you update the gradients this way, the nodes further down in the compute graph see the new updated gradient in the backward pass and will have their respective gradients calculated with the updated value.
I'm not a Tensorflow expert, but the RegisterGradient decorators (documentation) seem to be able to do the same, for an example see this answer.

Related

How to control reduction strategy for stateful metric in keras mirrored strategy

I use keras fit() method with custom metrics passed to model.
The metrics are stateful - i.e. are a subclass of a Metric, as described in https://keras.io/api/metrics/#as-subclasses-of-metric-stateful
When I run the code in a multi-gpu environment using a tf.distribute.MirroredStrategy() my metric code is called on every GPU separately with batch_size/no_of_gpus examples passed, which is reasonable to expect.
What happens next is that multiple scalars (one from every GPU) of the metric value need to be reduced to a single scalar, and what I get all the time is a sum reduction, while I would like to control that.
Keep in mind, that reduction parameter is the one of Loss in keras, and there is no such thing in the Metric class: https://github.com/tensorflow/tensorflow/blob/acbc065f8eb2ed05c7ab5c42b5c5bd6abdd2f91f/tensorflow/python/keras/metrics.py#L87
(the only crazy thing I tried was to inherit from a Mean class that is a subclass of a Metric but that didn't change anything)
reduction is mentioned in the metrics code, however this is a reduction over multiple accumulated values in a single metric object, and in multi-gpu setting - this is not the case, as every metric works in its own GPU and is somehow aggregated at the end.
The way I debugged it to understand this behaviour was - I was printing the shapes and the results inside update_state method of the metric. And then I looked at value of the metric in logs object in on_batch_end callback.
I tried looking at TF code, but couldn't find the place this is happening.
I would like to be able to control this behaviour - so either pick 'mean' or 'sum' for the metric, or at least know where it is being done in the code.
Edited: I guess this https://github.com/tensorflow/tensorflow/issues/39268 sheds some more light on this issue
I am facing the same problem as you (and that's why I found your question).
Seeing that it's been 15 days since you asked the question and there are no answers/comments yet, I thought I might share my temporary workaround.
Like you, I also think that a SUM reduction has been performed when combining progress over multiple GPUs. What I did is to pass the number of GPUs (e.g. given by the num_replicas_in_sync attribute of your tf.distribute strategy object) into the __init__(...) constructor of your sub-classed metric object, and use it to divide the return value in the results() method.
Potentially, you could also use tf.distribute.get_strategy() from within the metric object to make it "strategy aware", and use the information to decide how to modify the values in an ad hoc manner so that the SUM reduction will produce what you want.
I hope this helps for now, whether as a suggestion or as a confirmation that you're not alone on this.
When implementing the subclass of the Keras Metric class, you have to override the merge_state() function correctly. If you do not override this function, the default implementation will be used - which is a simple sum.
See: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric

Calculation operations with the parameters of a TFLite quantized model

I am trying to implement image classification in hardware using the quantized Mobilenetv2 model taken from here. To do that, I first need to reproduce the inference process from the beginning to the end to make sure I understand the calculations/operations that are performed on the data.
The first target is the Conv fuction. I can see how it is being calculated, but there are several arguments that are passed to this function which I would like to know how they are produced: output_offset, output_multiplier,output_shift, output_activation_min, output_activation_max. I cannot find the previous function that calls the Conv() function with these parameters. This would hopefully give me an insight of how these arguments are generated. Could someone point me to the right line of the source code?
Another gap in the sourcecode is at the interpreter.invoke() function. I wish to track and see what happens next, but can not find the soursecode that implements the invoke() function. The help would be greatly appreciated!
If you want to know how the conv reference code is used you can read the code for the conv operator.
The python interpreter uses swig to call the C++ intepreter.
Hope this helps.

Does `tf.data.Dataset.take()` return random sample?

Different calls of tf.data.Dataset.take() return different batches from the given dataset. Are those samples chosen randomly or is there another mechanism at play?
This is all the more confusing that the documentation makes no reference as to the randomness of the sampling.
Most probably, you might be using data.shuffle() before tf.data.Dataset.take().
Commenting that out should make the iterator behave as intended: take the same results over and over for each iterator run.
-- Or if you used an api that automatically shuffles without asking like image_dataset_from_directory
shuffle: Whether to shuffle the data. Default: True.
If set to False, sorts the data in alphanumeric order.
You would have to explicitly set shuffle=False when creating the dataset
I am a newbie to this domain. But from what I have seen in my notebook is that the take() does pick random samples. For instance, in the image shown here, I had just called image_dataset_from_directory() before calling take(), so no shuffling preceded the take op, still I see different samples on every run. Pls correct me if I am wrong, will help my understanding as well.

How is tf.summary.tensor_summary meant to be used?

TensorFlow provides a tf.summary.tensor_summary() function that appears to be a multidimensional variant of tf.summary.scalar():
tf.summary.tensor_summary(name, tensor, summary_description=None, collections=None)
I thought it could be useful for summarizing inferred probabilities per class ... somewhat like
op_summary = tf.summary.tensor_summary('classes', some_tensor)
# ...
summary = sess.run(op_summary)
writer.add_summary(summary)
However it appears that TensorBoard doesn't provide a way to display these summaries at all. How are they meant to be used?
I cannot get it to work either. It seems like that feature is still under development. See this video from the TensorFlow Dev Summit that states that the tensor_summary is still under development (starting at 9:17): https://youtu.be/eBbEDRsCmv4?t=9m17s. It will probably be better defined and examples should be provided in the future.

How to delete existed tensorflow variable?

I am using windows GPU tensorflow 1.0. I want to delete useless or wrong tf variable.
For example I run code and generate a bad model, then I copy code and just modify existed variable W1(shape=[3,3],name="W1") as W1(shape=[5,5],name="W1") and run.
However tensorflow generate W1(shape=[5,5],name="W1_1") rather than replace old W1. So in the end Tensorflow will save both wrong trained W1(name='w1') and trained w1(name='w1_1'). When I restore W1, tensorflow give me wrong trained W1(name='w1').
Could you tell me how to delete old variable W1 and add new W1?
(the newline edit function is useless)
First of all, I don't know exactly what you are trying to achieve by changing the shape of your variable. If you would have asked the question in such a way we knew your intent, maybe we could help you better.
Now there are multiple options to solve your problem:
- ignore the name of your variable (what bugs does it cause??)
- remove all variables: https://www.tensorflow.org/versions/master/api_docs/python/framework/utility_functions#reset_default_graph
- Change the shape of the variable instead of defining a new one. Check the answer of mrry here: How can I change the shape of a variable in TensorFlow?