Do you need to define derivative function for custom activation function in tensorflow 2 keras? - tensorflow

I have seen it said in some places that you need to define the derivative function for a custom activation. Is this true? Or is it enough to pass a TensorFlow-compatible function to the wrapper and let tensorflow.keras take care of the rest?
I.e.
def my_actv(x):
    return x * x

model.add(Activation(my_actv))

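If your function is built from differentiable TensorFlow operations, automatic differentiation derives the gradient for you and no derivative has to be supplied. The derivative needs to be defined by hand only if the function uses operations without a registered gradient, or if it is not differentiable at every point and you want to choose the value used there. For example, relu is not differentiable at zero.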
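As a quick check of the question's own example, the sketch below differentiates my_actv with tf.GradientTape without defining any derivative. It assumes TensorFlow 2 in eager mode; the input values are made up for illustration.

```python
import tensorflow as tf

def my_actv(x):
    # Built only from TensorFlow ops, so autodiff can differentiate it.
    return x * x

x = tf.constant([1.0, -2.0, 3.0])  # illustrative input values
with tf.GradientTape() as tape:
    tape.watch(x)  # constants are not tracked unless explicitly watched
    y = my_actv(x)

# Gradient of the summed output with respect to x; analytically this is 2 * x.
grad = tape.gradient(y, x)
print(grad.numpy())
```

The same function can then be passed to Activation(my_actv) as in the question, since Keras relies on the same automatic differentiation during training.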

Related

Using Guided Backpropagation with alternative activation functions

Explanations of the guided backpropagation method always use the ReLU function in their implementation. This is the most common activation function, but there are several others, such as Tanh or GELU. As I understand it, guided backpropagation should also be applicable if these functions are used in the NN. However, I wonder what the corresponding custom gradients would look like in a TensorFlow implementation.
According to this website, the custom gradient for the ReLU variant is as follows:
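import tensorflow as tf

@tf.custom_gradient
def guidedRelu(x):
    def grad(dy):
        # Pass the gradient only where both the upstream gradient and the input are positive.
        return tf.cast(dy > 0, tf.float32) * tf.cast(x > 0, tf.float32) * dy
    return tf.nn.relu(x), grad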
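To see what this ReLU variant does, here is a minimal sketch that feeds a hand-picked upstream gradient through it using the output_gradients argument of tape.gradient. It assumes TensorFlow 2 in eager mode; the name guided_relu and the input values are illustrative, and the block only covers the ReLU case, not Tanh or GELU.

```python
import tensorflow as tf

@tf.custom_gradient
def guided_relu(x):
    def grad(dy):
        # Keep the gradient only where upstream gradient AND input are positive.
        return tf.cast(dy > 0, tf.float32) * tf.cast(x > 0, tf.float32) * dy
    return tf.nn.relu(x), grad

x = tf.constant([-1.0, 2.0, 3.0])         # illustrative inputs
upstream = tf.constant([1.0, -1.0, 1.0])  # illustrative upstream gradient

with tf.GradientTape() as tape:
    tape.watch(x)
    y = guided_relu(x)

# Only the last entry has a positive input and a positive upstream gradient.
guided = tape.gradient(y, x, output_gradients=upstream)
print(guided.numpy())
```

The first entry is blocked by the negative input and the second by the negative upstream gradient, which is the filtering that distinguishes guided backpropagation from the plain ReLU gradient.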

Is there a PyTorch equivalent of tf.custom_gradient()?

I am new to PyTorch but have a lot of experience with TensorFlow.
I would like to modify the gradient of just a tiny piece of the graph: the derivative of the activation function of a single layer. This can be done easily in TensorFlow using tf.custom_gradient, which allows you to supply a customized gradient for any function.
I would like to do the same thing in PyTorch and I know that you can modify the backward() method, but that requires you to rewrite the derivative for the whole network defined in the forward() method, when I would just like to modify the gradient of a tiny piece of the graph. Is there something like tf.custom_gradient() in PyTorch? Thanks!
You can do this in two ways:
1. Modifying the backward() function:
As you already said in your question, pytorch also allows you to provide a custom backward implementation. However, in contrast to what you wrote, you do not need to re-write the backward() of the entire model - only the backward() of the specific layer you want to change.
Here's a simple and nice tutorial that shows how this can be done.
For example, here is a custom clip activation that instead of killing the gradients outside the [0, 1] domain, simply passes the gradients as-is:
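import torch

class MyClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward pass: ordinary clipping to [0, 1].
        return torch.clip(x, 0., 1.)

    @staticmethod
    def backward(ctx, grad):
        # Backward pass: pass the incoming gradient through unchanged.
        return grad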
Now you can use MyClip (invoked as MyClip.apply(x)) wherever you like in your model, and you do not need to worry about the overall backward function.
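A short usage sketch, assuming a recent PyTorch version: the class is repeated so the block runs on its own, and the input values are made up. It compares the pass-through gradient with the gradient of the built-in torch.clamp.

```python
import torch

class MyClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.clamp(x, 0.0, 1.0)

    @staticmethod
    def backward(ctx, grad):
        # Pass the incoming gradient through unchanged.
        return grad

# Custom functions are invoked through .apply, not by instantiating the class.
x = torch.tensor([-0.5, 0.5, 1.5], requires_grad=True)
MyClip.apply(x).sum().backward()
pass_through_grad = x.grad.clone()

# Reference: the built-in clamp zeroes the gradient outside [0, 1].
x_ref = torch.tensor([-0.5, 0.5, 1.5], requires_grad=True)
torch.clamp(x_ref, 0.0, 1.0).sum().backward()
clamp_grad = x_ref.grad.clone()

print(pass_through_grad, clamp_grad)
```

Only the backward of this one operation changes; the rest of the network keeps its automatically derived gradients.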
2. Using a backward hook
PyTorch allows you to attach hooks to the different layers (i.e. sub-nn.Modules) of your network. You can attach a hook to your layer with register_full_backward_hook. That hook function can modify the gradients:
The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations.
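A minimal sketch of the hook approach, assuming a PyTorch version that provides register_full_backward_hook. The hook name double_grad_hook, the choice of nn.Tanh, and the scaling factor are illustrative only.

```python
import torch
import torch.nn as nn

def double_grad_hook(module, grad_input, grad_output):
    # Return new input gradients instead of modifying the arguments in place.
    return tuple(g * 2.0 if g is not None else None for g in grad_input)

layer = nn.Tanh()
handle = layer.register_full_backward_hook(double_grad_hook)

x = torch.tensor([0.5, -1.0], requires_grad=True)
layer(x).sum().backward()
hooked_grad = x.grad.clone()

# Remove the hook and recompute to obtain the unmodified gradient.
handle.remove()
x.grad = None
layer(x).sum().backward()
plain_grad = x.grad.clone()

print(hooked_grad, plain_grad)
```

The handle returned at registration is what lets you detach the hook again, which is useful when the modified gradient is only wanted for part of training or for a visualisation pass.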

Keras' equivalent to Tensorflow's matmul function

I'm new to Keras and I can't seem to find an equivalent to PyTorch's bmm function or TensorFlow's matmul function.
What would be the closest equivalent to this in Keras?
keras.backend.dot
From the documentation:
Multiplies 2 tensors (and/or variables) and returns a tensor.
Technically, I think the direct equivalent to tf.matmul is K.batch_dot, since it leaves the batch dimension out of the product.
Backend functions simply point back to their tensorflow/theano sources and cannot be used as is. To use them you need to wrap them into a Lambda layer:
from keras.layers import Lambda
from keras import backend as K

# this is simply defining the function
matmul = Lambda(lambda x: K.dot(x[0], x[1]))

# this is applying the function
tensor_product = matmul([tensor1, tensor2])
Not wrapping backend functions in Lambda layers will result in a TypeError. Alternatively, you can use the Dot layer, which computes the dot product across an axis of your choice:
from keras.layers import Dot
tensor_product = Dot(axes=-1)([tensor1, tensor2])
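To relate the Dot layer to a batched matrix product, the sketch below compares it with tf.matmul on random tensors. It assumes TensorFlow 2 with the bundled tf.keras, and the shapes are made up; with axes=-1 both inputs are contracted over their last axis, so the second operand corresponds to a transposed matrix in tf.matmul.

```python
import numpy as np
import tensorflow as tf

# Illustrative shapes: batch of 2, matrices of 3x4 and 5x4.
a = tf.random.normal((2, 3, 4))
b = tf.random.normal((2, 5, 4))

# Dot contracts the last axis of both inputs, keeping the batch axis.
via_dot_layer = tf.keras.layers.Dot(axes=-1)([a, b])

# Same contraction written as a batched matrix product.
via_matmul = tf.matmul(a, b, transpose_b=True)

print(tuple(via_dot_layer.shape), tuple(via_matmul.shape))
```

If the second tensor is already laid out as (batch, 4, 5), the matching call would contract axis 2 of the first input with axis 1 of the second, i.e. Dot(axes=(2, 1)).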

How to create a tensor from a float value?

I'm using Keras to build a NN.
I need to write a custom evaluation function (which I did, but not completely).
Here is part of the code:
from keras import backend as K
...
def customized_loss(y_true, y_pred):
    loss = K.square(y_true - y_pred)
    return K.sum(loss)
...
model.compile(loss=customized_loss, optimizer='adam')
This part works perfectly but that's not really what I need.
I'd like to be able to return e.g. 0.123
Currently, what is being returned is K.sum(loss), which is also just a value (wrapped in a tensor).
It does not matter whether the value is written as 0.123, [0.123], [[0.123]], and so on.
So the question is: how do I turn a float into a tensor that will be accepted as the return value of this custom loss function?
E.g. something like return K.new_tensor(0.123)
(also fyi I'm using tensorflow as backend)
EDIT:
I just realized that my idea makes no sense conceptually. I wanted to apply loss (amount) without providing any info on what the loss function is (i.e. what's its derivative) which is crucial for back propagation so it can know how to update weights.
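The point of the edit can be checked directly. In the sketch below (TensorFlow 2, eager mode, with a made-up weight variable w), a loss that is just a constant wrapped in a tensor has no path back to the weights, so the tape has nothing to differentiate.

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0])  # hypothetical model weights

with tf.GradientTape(persistent=True) as tape:
    connected_loss = tf.reduce_sum(tf.square(w))  # computed from w
    constant_loss = tf.constant(0.123)            # bare float, no link to w

grad_connected = tape.gradient(connected_loss, w)  # analytically 2 * w
grad_constant = tape.gradient(constant_loss, w)    # no dependency on w

print(grad_connected.numpy(), grad_constant)
```

Creating the tensor is therefore not the obstacle: the optimizer needs the loss to be computed from y_pred through differentiable operations.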

Custom external loss metric for Gradient Optimizer?

I have an external function which takes y and y_prediction (in matrix format), and computes a metric which depicts how good or bad the prediction actually is.
Unfortunately, the metric is not a simple y - ypred or a confusion matrix, but it is still very useful and important. How can I use this computed number as the loss, or as an argument to optimizer.minimize?
If I understood correctly, I think there are two ways to do this:
Either the loss you want to compute can be written with TensorFlow ops whose gradients are defined (for example, SVD sadly has no gradient defined in the TensorFlow library), in which case the optimisation is direct.
Or you can always write your loss function with NumPy operators and use tf.py_func() https://www.tensorflow.org/api_docs/python/tf/py_func, and then you have to specify the gradient explicitly by hand, as described here: How to make a custom activation function with only Python in Tensorflow?
But you have to know an explicit formula for your gradient ...
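A sketch of the second option under TensorFlow 2, where tf.numpy_function and tf.py_function take the place of the older tf.py_func. The function external_metric is a hypothetical stand-in for the external metric (a plain squared error here, chosen only because its gradient formula is known), and external_loss is an illustrative name.

```python
import numpy as np
import tensorflow as tf

def external_metric(y_true, y_pred):
    # Hypothetical external metric, computed entirely in NumPy.
    return np.float32(np.sum((y_true - y_pred) ** 2))

@tf.custom_gradient
def external_loss(y_true, y_pred):
    # The forward value comes from NumPy, so TensorFlow cannot differentiate it.
    value = tf.numpy_function(external_metric, [y_true, y_pred], tf.float32)

    def grad(upstream):
        # Hand-written gradient of sum((y_true - y_pred)^2).
        d_pred = 2.0 * (y_pred - y_true)
        return -upstream * d_pred, upstream * d_pred

    return value, grad

y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 2.0, 2.0])

with tf.GradientTape() as tape:
    tape.watch(y_pred)
    loss = external_loss(y_true, y_pred)

grad_pred = tape.gradient(loss, y_pred)
print(loss.numpy(), grad_pred.numpy())
```

For a metric without a known closed-form gradient this approach does not apply as written, which is the limitation the answer points out.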