Embedding Custom Functions into NN - tensorflow

I'm currently asking myself how to build a model with a couple of extra functions.
I have a set of custom functions, and I want to embed them as layers in my model (NN).
I'm using TF 2.0 for that, but I'm currently struggling to do it.
All I find is answers about activation functions, but that's not what I'm looking for.
A custom function returns something like a+b or any other algorithm (matrix multiplication etc.)
What we can say is, I have one layer to another one, and want to embed my custom function in between those two layers like so:
I'm going to say that the activation function from one layer to another is the custom function. But what if my custom function takes two inputs? Or I have two functions I want to process my input in before I pass it to the next function?
Another way to solve that problem:
Let's say I got my custom functions cm*, and my layers l*;
what I do is build a model for each layer I want to put in between two custom functions
cm1 -> model(l1) -> cm2 -> model(l2,l3) -> cm3 -> cm4 -> model(l4) -> ....
but wouldn't it be stupid to build a model for each of those trajectories?
And what about the loss? The back propagation of residual connected layers is something else than having a lot of models and functions layered together.
Or am I wrong?

I'm not sure about TF 2.0, but in Keras you can build your own custom layers that can receive multiple inputs by overriding the Layer class. See https://keras.io/guides/making_new_layers_and_models_via_subclassing/ for more details. The link doesn't explain how to pass in multiple inputs to a layer, but all you have to do is to call the layer with a list of inputs and unpack them inside the call function, something like this:
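import tensorflow as tf

class MyCustomLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()  # must run before any attributes are set
        # your code here

    def call(self, inputs):  # example call: MyCustomLayer()([x, y])
        x, y = inputs  # unpack the list of input tensors
        # your code here
        output = x + y  # placeholder
        return output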
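As a minimal sketch of how such layers could sit between ordinary layers in a single model (rather than one model per segment), the example below chains two custom functions with two Dense layers through the functional API. The names `AddInputs` and `Scale`, the shapes, and the squared-output loss are assumptions made for illustration, not part of the question.

```python
import tensorflow as tf

class AddInputs(tf.keras.layers.Layer):
    """Hypothetical custom function with two inputs: returns a + b."""
    def call(self, inputs):
        a, b = inputs
        return a + b

class Scale(tf.keras.layers.Layer):
    """Hypothetical custom function with one input: multiplies by a constant."""
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

in_a = tf.keras.Input(shape=(4,))
in_b = tf.keras.Input(shape=(4,))
h = AddInputs()([in_a, in_b])       # cm1: custom function taking two inputs
h = tf.keras.layers.Dense(8)(h)     # l1
h = Scale(2.0)(h)                   # cm2: custom function between two layers
out = tf.keras.layers.Dense(1)(h)   # l2
model = tf.keras.Model(inputs=[in_a, in_b], outputs=out)

# Gradients flow through the custom functions like through any other op.
a = tf.ones((2, 4))
b = tf.ones((2, 4))
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model([a, b]) ** 2)
grads = tape.gradient(loss, model.trainable_variables)
```

Because everything lives in one model, a single loss is backpropagated through the layers and the custom functions together, provided the functions are built from differentiable TensorFlow ops.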

Related

Is there way to get the current learning rate or current epoch/step from within a custom tensorflow layer?

I know that it is possible to get the current learning rate simply by doing self.optimizer.lr when you are in your custom model, but I need to do something similar when implementing my own layer.
For now I have solved the issue by creating a function in my custom layer that accepts it as a parameter and is called by my custom model, but I was wondering if there is another way since there are many layers in my architecture and this way is pretty awful to see. I leave some code to be clearer.
For my purpose, it would be enough even just to get the current epoch or current step from inside the layer.
I am working in tensorflow 2.8.2. Thank you.
My layer structure is as follows
class my_layer(tf.keras.layers.Layer):
    # constructor, build, call methods etc..
    def function_to_get_lr(self, lr):
        # do sth with the lr
        pass
and in my custom model I do something like this
class my_model(tf.keras.Model):
    # other functions, constructor etc..
    def call(self, inputs, training=False):
        if training:
            for layer in self.layers:
                if "my_layer" in layer.name:
                    layer.function_to_get_lr(self.optimizer.lr)
I'm pretty sure that there is no "nice way" to do this, but you can do something like this:
class CustomLayer(Layer):
    def __init__(self, optimizer, ...):
        super().__init__()
        self.optimizer = optimizer
    def call(self, ...):
        lr = self.optimizer.lr
class M(Model):
    def build(self, ...):
        self.custom_layer = CustomLayer(self.optimizer, ...)
        ...
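Since the question says the current step would be enough, here is a minimal sketch of the same idea using the optimizer's `iterations` counter instead of the learning rate. The layer name `StepScaled` and its behaviour (dividing by 1 + step) are hypothetical and chosen only to make the effect visible; it assumes the optimizer exists before the layer is constructed.

```python
import tensorflow as tf

class StepScaled(tf.keras.layers.Layer):
    """Hypothetical layer: divides its input by (1 + current optimizer step)."""
    def __init__(self, optimizer):
        super().__init__()
        self.optimizer = optimizer  # keep a reference, as in the answer above

    def call(self, inputs):
        # optimizer.iterations counts how many times gradients were applied
        step = tf.cast(self.optimizer.iterations, inputs.dtype)
        return inputs / (1.0 + step)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
dense = tf.keras.layers.Dense(1)
scaled = StepScaled(optimizer)

x = tf.ones((2, 3))
before = scaled(x)  # step is 0, so the input passes through unchanged

# One manual training step; apply_gradients increments optimizer.iterations.
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(scaled(dense(x)) ** 2)
grads = tape.gradient(loss, dense.trainable_variables)
optimizer.apply_gradients(zip(grads, dense.trainable_variables))

after = scaled(x)  # step is now 1, so the input is halved
```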

Is there a PyTorch equivalent of tf.custom_gradient()?

I am new to PyTorch but have a lot of experience with TensorFlow.
I would like to modify the gradient of just a tiny piece of the graph: just the derivative of activation function of a single layer. This can be easily done in Tensorflow using tf.custom_gradient, which allows you to supply customized gradient for any functions.
I would like to do the same thing in PyTorch and I know that you can modify the backward() method, but that requires you to rewrite the derivative for the whole network defined in the forward() method, when I would just like to modify the gradient of a tiny piece of the graph. Is there something like tf.custom_gradient() in PyTorch? Thanks!
You can do this in two ways:
1. Modifying the backward() function:
As you already said in your question, pytorch also allows you to provide a custom backward implementation. However, in contrast to what you wrote, you do not need to re-write the backward() of the entire model - only the backward() of the specific layer you want to change.
Here's a simple and nice tutorial that shows how this can be done.
For example, here is a custom clip activation that instead of killing the gradients outside the [0, 1] domain, simply passes the gradients as-is:
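import torch

class MyClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # forward pass: ordinary clipping to [0, 1]
        return torch.clip(x, 0., 1.)

    @staticmethod
    def backward(ctx, grad):
        # backward pass: pass the incoming gradient through unchanged
        return grad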
Now you can use MyClip (called as MyClip.apply(x)) wherever you like in your model, and you do not need to worry about the overall backward function.
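A short usage sketch, with input values chosen only for illustration, compares the gradient of the custom function with that of the built-in torch.clip:

```python
import torch

class MyClip(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.clip(x, 0., 1.)

    @staticmethod
    def backward(ctx, grad):
        return grad  # straight-through: ignore the clipping in the backward pass

# Custom function: autograd.Function subclasses are invoked through .apply
x = torch.tensor([-0.5, 0.5, 1.5], requires_grad=True)
MyClip.apply(x).sum().backward()

# Built-in clip for comparison: zero gradient outside [0, 1]
x_ref = torch.tensor([-0.5, 0.5, 1.5], requires_grad=True)
torch.clip(x_ref, 0., 1.).sum().backward()

print(x.grad)      # gradient passed through everywhere
print(x_ref.grad)  # gradient killed for the out-of-range entries
```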
2. Using a backward hook
PyTorch allows you to attach hooks to individual layers (i.e. sub nn.Modules) of your network. You can call register_full_backward_hook on your layer. That hook function can modify the gradients:
The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations.
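A minimal sketch of such a hook follows. The function name `scale_grad_input` and the factor 0.5 are arbitrary choices for illustration; the hook returns a new tuple instead of modifying its arguments in place.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)

def scale_grad_input(module, grad_input, grad_output):
    # Return replacement gradients w.r.t. the layer's inputs (halved here).
    return tuple(g * 0.5 if g is not None else None for g in grad_input)

handle = layer.register_full_backward_hook(scale_grad_input)

x = torch.ones(1, 3, requires_grad=True)
layer(x).sum().backward()

# Without the hook, x.grad would equal the column sums of layer.weight.
expected = 0.5 * layer.weight.detach().sum(dim=0, keepdim=True)

handle.remove()  # detach the hook when it is no longer needed
```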

LSTM with Keras to optimize a black box function

I'm trying to implement the recurrent neural network architecture proposed in this paper (https://arxiv.org/abs/1611.03824), where the authors use an LSTM to minimize a black-box function (which, however, is assumed to be differentiable). Here is a diagram of the proposed architecture: RNN. Briefly, the idea is to use an LSTM as an optimizer that has to learn a good heuristic for proposing new parameters for the unknown function y=f(parameters), so that it moves towards a minimum. Here's how the proposed procedure works:
Select an initial value for the parameters, p0, and evaluate the function there: y0 = f(p0)
Call the LSTM cell with input=[p0,y0]; its output is a new value for the parameters, output=p1
Evaluate y1 = f(p1)
Call the LSTM cell with input=[p1,y1], and obtain output=p2
Evaluate y2 = f(p2)
Repeat a few times, for example stopping at the fifth iteration: y5 = f(p5).
I'm trying to implement a similar model in Tensorflow/Keras but I'm having some trouble. In particular, this case differs from "standard" ones because we don't have a predefined time sequence to analyze; instead, the sequence is generated online, after each iteration of the LSTM cell. Thus, in this case, our input would consist of just the starting guess [p0,y0=f(p0)] at time t=0. If I understood it correctly, this model is similar to the one-to-many LSTM, but with the difference that the input to the next time step comes not just from the previous cell, but also from the output of an additional function (in our case f).
I managed to create a custom tf.keras.layers.Layer which performs the calculation for a single time step (that is, it runs the LSTM cell and then uses its output as input to the function f):
class my_layer(tf.keras.layers.Layer):
    def __init__(self, units = 4):
        super(my_layer, self).__init__()
        self.cell = tf.keras.layers.LSTMCell(units)
    def call(self, inputs):
        prev_cost = inputs[0]
        prev_params = inputs[1]
        prev_h = inputs[2]
        prev_c = inputs[3]
        # Concatenate the previous parameters and previous cost to create new input
        new_input = tf.keras.layers.concatenate([prev_cost, prev_params])
        # New parameters obtained by the LSTM cell, along with new internal states: h and c
        new_params, [new_h, new_c] = self.cell(new_input, states = [prev_h, prev_c])
        # Function evaluation
        new_cost = f(new_params)
        return [new_cost, new_params, new_h, new_c]
but I do not know how to build the recurrent part. I tried to do it manually, that is doing something like:
my_cell = my_layer(units = 4)
outputs = my_cell(inputs)
outputs1 = my_cell(outputs)
outputs2 = my_cell(outputs1)
Is that correct? Is there some other way to do it more appropriately?
Bonus question: I would like to train the LSTM to optimize not just a single function f, but a class of different functions [f1, f2, ...] that share enough common structure to be optimized by the same LSTM. How could I implement a training loop that takes a list of these functions [f1, f2, ...] as input and tries to minimize them all? My first thought was to do it the "brute force" way: use a for loop over the functions and a tf.GradientTape which evaluates and applies the gradients for each function.
Any help is much appreciated!
Thank you very much in advance! :)
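One way to sketch the procedure described in the steps above is to unroll the cell in a plain Python loop inside a tf.GradientTape and sum the costs of all steps into a single loss. This is only an illustration under assumptions: the quadratic `f`, the batch size, the zero initial states, and the choice of the summed cost as the training loss are all made up here and are not taken from the paper or the question.

```python
import tensorflow as tf

def f(params):
    # Hypothetical differentiable "black box": quadratic with minimum at 0.5.
    return tf.reduce_sum((params - 0.5) ** 2, axis=-1, keepdims=True)

n_params = 4
batch_size = 8
cell = tf.keras.layers.LSTMCell(n_params)  # cell output doubles as new params
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)

def unroll(p0, steps=5):
    """Run the cell `steps` times, feeding [params, cost] back in each time."""
    h = tf.zeros((batch_size, n_params))
    c = tf.zeros((batch_size, n_params))
    params, cost = p0, f(p0)
    costs = []
    for _ in range(steps):
        x = tf.concat([params, cost], axis=-1)
        params, [h, c] = cell(x, [h, c])  # same cell, so weights are shared
        cost = f(params)
        costs.append(cost)
    return tf.add_n(costs)  # summed cost over all steps, shape (batch, 1)

p0 = tf.zeros((batch_size, n_params))
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(unroll(p0))
grads = tape.gradient(loss, cell.trainable_variables)
optimizer.apply_gradients(zip(grads, cell.trainable_variables))
```

Reusing the same cell object at every step is what makes the loop recurrent. For several functions, the same tape block could be wrapped in a for loop over the functions, which matches the "brute force" idea from the bonus question.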

Can you feed a custom Keras layer with another layer instead of a Tensor?

Let's say I have the custom layer Node which inherits from keras.layers.Layer and should represent a single node in a neural network.
As far as I know, in order to feed a layer in keras you need to pass a tensor into it, but my desired syntax is something along the lines of:
n1 = Node()
n2 = Node()
n2(n1) # Instead of n2(n1.output) where n1.output is a Tensor
Is it considered bad practice to do something like that?
The Keras Functional API is a way to create models that are more flexible than the tf.keras.Sequential API. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.
The functional API can be used to create complex graphs of layers.
Let's look at a very simple example:
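from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))  # example input shape; use your own
x = layers.Dense(64)(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs)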
Here you have 3 layers: Dense(64) -> Dense(64) -> Dense(10). The code first creates the 3-layer pipeline and then builds the Model by linking the inputs and outputs.
This is similar to your desired syntax.
Refer to the TensorFlow Keras Functional API guide.

does TensorFlow automatically use sparse_softmax_cross_entropy_with_logits when possible?

Let's say that I have some code such as:
out = tf.nn.softmax(x) # shape (batch,time,n)
labels = .... # reference labels of type (batch,time)->int
And then I define my loss as the Cross Entropy:
loss = -tf.log(tf.gather_nd(out, labels))
Will TensorFlow automatically replace the loss in the computation graph by this?
loss = sparse_softmax_cross_entropy_with_logits(x, labels)
What type of optimizations can I expect that TensorFlow will apply?
Follow-up question: If TensorFlow doesn't do this optimization, how can I do it manually? Consider that I have a modular framework where I get some out tensor which could possibly be the output of a softmax operation, and I want to calculate Cross Entropy, and I want to use sparse_softmax_cross_entropy_with_logits if possible. How could I accomplish this? Can I do something like the following?
if out.op == "softmax": # how to check this?
    x = out.op.sources[0] # how to get this?
    loss = sparse_softmax_cross_entropy_with_logits(x, labels)
else:
    loss = -tf.log(tf.gather_nd(out, labels))
TensorFlow generally doesn't merge nodes together in the way you're hoping. This is because other code (e.g. fetching outputs when running) may depend on intermediate nodes like the softmax, so removing them behind the user's back would be confusing.
If you do want to do this optimization yourself as part of a higher-level framework, you can analyze the current graphdef, but there's no annotation in TF to tell you what the outputs are, since that can vary at runtime depending on how session.run is called.
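As a rough sketch of the manual check from the follow-up question: in graph mode a tensor exposes the op that produced it, so the op type and its first input can be inspected. The helper name `sparse_cross_entropy` and the example values are assumptions for illustration. The sketch only works on graph tensors (eager tensors have no usable `.op`), assumes `out` has shape (batch, time, n), and only recognises the case where `out` comes directly from a Softmax op.

```python
import tensorflow as tf

def sparse_cross_entropy(out, labels):
    """Use the fused op if `out` comes straight from a Softmax op."""
    if out.op.type == "Softmax":
        logits = out.op.inputs[0]  # the tensor that was fed into the softmax
        return tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits)
    # Fallback: pick the probability of each label along the last axis.
    picked = tf.gather(out, labels, axis=2, batch_dims=2)
    return -tf.math.log(picked)

graph = tf.Graph()
with graph.as_default():
    x = tf.constant([[[1.0, 2.0, 3.0], [0.5, 0.1, 0.2]]])  # (batch, time, n)
    labels = tf.constant([[2, 0]], dtype=tf.int32)          # (batch, time)
    out = tf.nn.softmax(x)
    fused = sparse_cross_entropy(out, labels)                # takes the Softmax branch
    manual = sparse_cross_entropy(tf.identity(out), labels)  # Identity op: fallback

with tf.compat.v1.Session(graph=graph) as sess:
    fused_val, manual_val = sess.run([fused, manual])
```

Both branches should give the same values up to floating-point error; the fused op is the numerically safer of the two.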