Backpropagation issues with a custom layer (TF/Keras)

I've been working on a prototype and I am having issues with backpropagation. I am currently using the latest Keras and TensorFlow builds (with TensorFlow as the backend; I have also looked into CNTK, MXNet and Chainer, and so far only Chainer would let me do it, but its training time is quite slow).
My current layer is similar to a convolutional layer, but with more operations than a simple multiplication.
I know that TensorFlow should use automatic differentiation, provided all the operations support it, to calculate the gradients and perform gradient descent.
Currently my layer uses the following operators: reduce_sum, sum, subtraction, multiplication and division.
It also relies on the following methods: extract_image_patches, reshape and transpose.
I doubt any of these would cause an issue with automatic differentiation. As a test I built two layers: one inherits from the base Keras Layer class, while the other inherits directly from _Conv. In both cases, whenever I use such a layer anywhere in a model, no weights are updated during training.
How could I solve this problem and fix backpropagation?
Edit:
(Here is the layer implementation: https://github.com/roya0045/cvar2/blob/master/tfvar.py;
for the testing itself, see https://github.com/roya0045/cvar2/blob/master/test2.py.)
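Since all of the ops listed above have registered gradients, one way to narrow this down is to check whether the layer's output is actually connected to its weight variable in the graph. The snippet below is not the author's layer, just a minimal TF 1.x sketch using the same kinds of ops (extract_image_patches, reshape, multiplication, reduce_sum); if tf.gradients returns [None], the weights are disconnected from the output and no update can happen.

```python
import tensorflow as tf

# Hypothetical connectivity check (not the layer from the repo): build a tiny
# graph with the same ops and ask TensorFlow for the gradient of the output
# with respect to the trainable variable.
x = tf.placeholder(tf.float32, shape=(1, 8, 8, 3))
w = tf.Variable(tf.random_normal((27, 4)), name="kernel")  # 3*3*3 patch -> 4 filters

patches = tf.extract_image_patches(
    x, ksizes=[1, 3, 3, 1], strides=[1, 1, 1, 1],
    rates=[1, 1, 1, 1], padding="SAME")            # (1, 8, 8, 27)
patches = tf.reshape(patches, (1, 8, 8, 27, 1))
kernel = tf.reshape(w, (1, 1, 1, 27, 4))

out = tf.reduce_sum(patches * kernel, axis=3)      # a plain convolution via patches

grads = tf.gradients(tf.reduce_sum(out), [w])
print(grads)  # [None] here would mean the output is not connected to w
```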

Related

Using multiple losses and multiple training steps in a TF2 model using subclassing?

I am implementing a generative adversarial autoencoder in TF2. I have it working, but not optimally, and could use some high-level advice on how to improve it.
The model is inspired by the paper “Adversarial Factorization Autoencoder for Look-alike Modeling” https://dmkd.cs.vt.edu/papers/CIKM19.pdf
The model consists of three parts: an encoder/generator, a decoder and a discriminator.
I have implemented these three parts as custom classes, each subclassing tf.keras.Model (which is the primary source of my issues). I have three different loss functions (autoencoder loss, generator loss, discriminator loss) and two custom training step functions.
The first training function trains the autoencoder (using the autoencoder loss) and then the generator (using the generator loss). The generator is just the encoder part of the autoencoder, but it has the double purpose of also fooling the discriminator.
The second training function trains the discriminator using the discriminator loss function.
This approach works OK, but subclassing all three parts from tf.keras.Model has limitations: I can't use the Keras compile and fit functionality, and callbacks are a nightmare, while I really do need early stopping, keeping the best model, TensorBoard integration and so on. It seems the best approach would be to subclass each part from tf.keras.layers.Layer and then combine them in a single custom model. But I am not sure whether it is possible at all to wire up multiple loss functions and multiple training steps to different layer blocks of a custom model.
Any hints and insights are greatly appreciated.
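One pattern that seems to fit what you describe (untested against your code, and all names below are made up) is to keep the three parts as sub-models or layers inside a single tf.keras.Model and override train_step with the three loss computations, one optimizer each; compile, fit, callbacks, early stopping and TensorBoard then work as usual. A rough sketch, assuming fit() is called with inputs only and a Gaussian prior for the adversarial part (the usual adversarial-autoencoder convention, not taken from your code):

```python
import tensorflow as tf

class AAE(tf.keras.Model):
    """Hypothetical wrapper: encoder/decoder/discriminator as sub-parts,
    three losses and three optimizers handled in a custom train_step."""

    def __init__(self, encoder, decoder, discriminator):
        super().__init__()
        self.encoder, self.decoder, self.discriminator = encoder, decoder, discriminator

    def compile(self, ae_opt, gen_opt, disc_opt):
        super().compile()
        self.ae_opt, self.gen_opt, self.disc_opt = ae_opt, gen_opt, disc_opt
        self.bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
        self.mse = tf.keras.losses.MeanSquaredError()

    def train_step(self, x):
        # 1) Autoencoder step: update encoder + decoder on the reconstruction loss.
        with tf.GradientTape() as tape:
            z = self.encoder(x, training=True)
            recon = self.decoder(z, training=True)
            ae_loss = self.mse(x, recon)
        ae_vars = self.encoder.trainable_variables + self.decoder.trainable_variables
        self.ae_opt.apply_gradients(zip(tape.gradient(ae_loss, ae_vars), ae_vars))

        # 2) Generator step: the encoder tries to fool the discriminator.
        with tf.GradientTape() as tape:
            fake_logits = self.discriminator(self.encoder(x, training=True), training=True)
            gen_loss = self.bce(tf.ones_like(fake_logits), fake_logits)
        gen_vars = self.encoder.trainable_variables
        self.gen_opt.apply_gradients(zip(tape.gradient(gen_loss, gen_vars), gen_vars))

        # 3) Discriminator step: prior samples are "real", encoded samples are "fake".
        prior = tf.random.normal(tf.shape(z))
        with tf.GradientTape() as tape:
            real_logits = self.discriminator(prior, training=True)
            fake_logits = self.discriminator(self.encoder(x, training=False), training=True)
            disc_loss = (self.bce(tf.ones_like(real_logits), real_logits)
                         + self.bce(tf.zeros_like(fake_logits), fake_logits))
        disc_vars = self.discriminator.trainable_variables
        self.disc_opt.apply_gradients(zip(tape.gradient(disc_loss, disc_vars), disc_vars))

        return {"ae_loss": ae_loss, "gen_loss": gen_loss, "disc_loss": disc_loss}
```

With this arrangement the three building blocks can subclass tf.keras.layers.Layer or stay as small models; only the outer class needs the custom train_step, and model.fit(dataset, callbacks=[...]) gives you early stopping, checkpointing and TensorBoard integration without further work.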

Trouble with implementing local response normalization in TensorFlow

I'm trying to implement a local response normalization (LRN) layer in TensorFlow, to be used in a Keras model.
Here is an image of the operation I am trying to implement, and here is the paper link; please refer to section 3.3 for the description of this layer.
I have a working NumPy implementation; however, it uses for loops and Python's built-in min and max operators to compute the summation. These Python-level operations cause errors when defining a custom Keras layer, so I can't use that implementation.
The issue is that I need to iterate over every element in the feature map and generate a normalized value for each of them, and the upper and lower bounds of the summation change depending on which value I am currently normalizing. I can't really think of a way to handle this without nested for loops, but those won't work in a custom Keras layer since they aren't native TensorFlow operations.
Could anyone point me towards TensorFlow/Keras backend functions that would help me implement this layer?
EDIT: I know this layer is already implemented as a Keras layer, but I want to build intuition about custom layers, so I want to implement it using tensor ops.
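For reference, here is one way the usual LRN formula (a windowed sum of squares over neighbouring channels, raised to a power) can be written with plain tensor ops. The varying summation bounds are handled by zero-padding the channel axis, so every position sees a fixed-size window. This is a sketch assuming channels-last input with a static channel count; the hyperparameter defaults are the commonly used AlexNet values, and the built-in tf.nn.local_response_normalization can be used to sanity-check the result.

```python
import tensorflow as tf

def local_response_norm(x, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75):
    # x: (batch, height, width, channels) with a statically known channel count.
    channels = int(x.shape[-1])
    squared = tf.square(x)
    # Zero-pad the channel axis so the window is well defined at the edges;
    # this replaces the varying min/max bounds from the paper's formula.
    padded = tf.pad(squared, [[0, 0], [0, 0], [0, 0], [depth_radius, depth_radius]])
    # Windowed sum over channels: add up shifted slices of the padded tensor.
    window_sum = tf.add_n(
        [padded[..., i:i + channels] for i in range(2 * depth_radius + 1)])
    return x / tf.pow(bias + alpha * window_sum, beta)

# Usage inside a Keras model, e.g. via a Lambda layer:
# tf.keras.layers.Lambda(local_response_norm)
```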

Strategies for pre-training models for use in tfjs

This is a more general version of a question I've already asked: Significant difference between outputs of deep tensorflow keras model in Python and tensorflowjs conversion
As far as I can tell, the layers of a tfjs model run in the browser (so far only tested in Chrome and Firefox) will show small numerical differences in their output values compared to the same model run in Python or Node. The cumulative effect of these small differences across all the layers of the model can cause fairly significant differences in the final output. See here for an example of this.
This means a model trained in Python or Node will not perform as well in terms of accuracy when run in the browser. And the deeper your model, the worse it will get.
Therefore my question is: what is the best way to train a model for use with tfjs in the browser? Is there a way to ensure the output will be identical? Or do you just have to accept that there will be small numerical differences and, if so, are there any methods that can be used to train a model to be more resilient to them?
This answer is based on my personal observations; as such, it is debatable and not backed by much evidence. Some things that I follow to get the accuracy of 16-bit models close to that of 32-bit models are:
Avoid using activations that have small upper and lower bounds, such as sigmoid or tanh, for hidden layers. These activations make the weights of the next layer very sensitive to small values, and hence to small changes. I prefer using ReLU for such models; since it is now the standard activation for hidden layers in most models, you should be using it in any case.
Avoid weight decay and L1/L2 regularization on the weights while training (the kernel_regularizer parameter in Keras), since these increase the sensitivity of the weights. Use Dropout instead; I didn't observe a major drop in performance on TFLite when using it in place of numerical regularizers.
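As an illustration of the two points above (the layer sizes and dropout rates here are arbitrary), a model following these suggestions simply uses ReLU hidden activations and Dropout in place of kernel_regularizer:

```python
import tensorflow as tf

# Arbitrary architecture, only meant to show the two choices discussed above:
# ReLU in the hidden layers and Dropout instead of kernel_regularizer=l2(...).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```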

Why is L2 regularization not added back into original loss function?

I'm aware that when using a kernel regularizer, in particular L2 loss, I should be adding it back into the loss function, and this is what is done in other posts. However, in Keras this does not seem to happen. Why is that?
For instance, consider this and this notebook. They use L2 loss as a kernel regularizer in some layers but never add it back into the original loss. Is this because of the particular loss being used, is it a behaviour specific to Keras, or am I completely misunderstanding everything?
Keras hides a lot of complexity (and this is not always a good thing).
You're using the Model abstraction: this model contains all the required information about the architecture and the training procedure.
When you invoke compile and then fit or train_on_batch, you specify the loss function, but under the hood what happens is:
Instantiate the loss function specified (e.g. categorical cross entropy)
Fetch from the model the regularizations applied and add all of them to the loss term previously instantiated
You can see the operations that are going to be added to the loss term by accessing the .losses property of the model instance (that's a list of TensorFlow operations, usually multiplication operations, since the regularizations take the form regularization_strength * norm_p(variable)).
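A small illustration of that point (the layer sizes and regularization strength are arbitrary): the penalty terms show up in model.losses, and Keras adds them to the compiled loss for you, so you only have to add them yourself in a custom training loop.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        8, activation="relu", input_shape=(4,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dense(1),
])

print(model.losses)  # one regularization tensor per regularized variable

# Only in a custom training loop would you add them back by hand, e.g.:
# total_loss = loss_fn(y_true, y_pred) + tf.add_n(model.losses)
```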
The L2 regularization (or any weight regularization) in Keras is still added to the loss function in the same way you would expect. It just happens behind the scenes, so the user doesn't need to worry about it.
The notebooks you linked are the right way to use weight regularization in Keras.

Behaviour of Alpha Dropout in Training and Inference time

I am in the process of implementing a self-normalizing neural network using TensorFlow. There are currently TensorFlow "primitives" in the form of tf.nn.selu and tf.contrib.nn.alpha_dropout that should make this an easy process.
My problem is with tf.contrib.nn.alpha_dropout. I was expecting it to have a boolean switch for whether you are in training or in inference, as the usual dropout function used with other activation functions does.
In the original implementation by the authors, we can indeed see that they have this boolean switch (training) in their SELU dropout function (dropout_selu).
Is there something I am missing?
tf.contrib.nn.alpha_dropout should be seen as an analogue to tf.nn.dropout. The latter function also does not have an argument for a training switch. It is not to be confused with tf.layers.dropout, which wraps tf.nn.dropout and has a training argument. As we can see in the implementation, the layers version returns either the result of nn.dropout or the identity depending on the training switch. It should be relatively easy to define your own wrapper around alpha_dropout in a similar manner.
To avoid any confusion: layers.dropout eventually calls the "keras layers" version of dropout which is the implementation linked above.
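Following that suggestion, a minimal wrapper could look like this (a sketch assuming TF 1.x, where tf.contrib is still available; the function name and default rate are made up):

```python
import tensorflow as tf

def alpha_dropout_with_switch(x, rate=0.1, training=False):
    # Mirror what tf.layers.dropout does for tf.nn.dropout: apply alpha dropout
    # only when training, otherwise pass the input through unchanged.
    return tf.cond(
        tf.cast(training, tf.bool),
        lambda: tf.contrib.nn.alpha_dropout(x, keep_prob=1.0 - rate),
        lambda: tf.identity(x),
    )
```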