Writing own convolutional layer in Keras from scratch - numpy

I would like to create my own layer in Keras. To be more precise, I would like to create a simple convolutional layer using only the NumPy library (without the TensorFlow part). I have two reasons for doing this: first, to learn something new, and second, I have an idea for how to modify that layer, so I have to write it from scratch. To make the problem easier, we can assume that I only need a convolutional layer with a 3x3 kernel size and defaults for the other parameters.
I know I have to base it on: https://keras.io/layers/writing-your-own-keras-layers/
In the def build(self, input_shape): section I have to add the weights. A convolutional layer needs as many 3x3 kernel matrices as it has filters.
In the def call(self, x): section I can use those weights. But I have some problems with that.
Problems:
I need something like sliding over the input, the typical convolutional layer task (moving a 3x3 matrix across the image). But I can't do that because x in def call(self, x): has ? or None as the first value in its shape. I know it is the batch_size, but I can't loop over that tensor because of it. So how can I get all the data (numbers) from x to perform operations on them?
Maybe you have some general tips on how I can make my own convolutional layer from scratch in Keras?
The problem for me is not writing a convolutional layer in NumPy (there are materials about that, for example: https://github.com/Eyyub/numpy-convnet ) but merging it with Keras without using the TensorFlow backend.
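For reference, the sliding-window part of the question can be sketched in plain NumPy. This is only a minimal illustration under stated assumptions: the function name conv2d_3x3 is hypothetical, padding is "valid", stride is 1, and it runs on concrete arrays rather than on the symbolic tensor that call receives, so it does not by itself solve the Keras integration. It does show that no loop over the batch is needed, because slicing keeps the batch axis intact.

```python
import numpy as np

def conv2d_3x3(x, kernels):
    """Naive 'valid' 3x3 convolution (cross-correlation, no kernel flip).

    x:       (batch, height, width, in_channels)
    kernels: (3, 3, in_channels, filters)
    returns: (batch, height - 2, width - 2, filters)
    """
    batch, h, w, _ = x.shape
    filters = kernels.shape[-1]
    out = np.zeros((batch, h - 2, w - 2, filters), dtype=x.dtype)
    for i in range(h - 2):
        for j in range(w - 2):
            # The slice keeps the batch axis: (batch, 3, 3, in_channels).
            patch = x[:, i:i + 3, j:j + 3, :]
            # Contract each 3x3xC patch against every filter at once -> (batch, filters).
            out[:, i, j, :] = np.tensordot(patch, kernels, axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.arange(2 * 5 * 5 * 1, dtype=np.float64).reshape(2, 5, 5, 1)
kernels = np.ones((3, 3, 1, 4))
y = conv2d_3x3(x, kernels)
print(y.shape)  # (2, 3, 3, 4)
```

The batch size never appears in the loops, so the same function works for any number of input images.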

Related

Convert 2D Convolutional Neural Networks to 1D Convolutional Neural Networks in TensorFlow

Say I have some extracted features in the form of 10x10 data (maybe an image or a cepstrogram).
Usually I would feed this into my 2D conv layer and be on my way.
My question is: if I had to convert this into a 1D input of 100 values, what disadvantages would I get, besides the obvious one that my filter would no longer detect patterns among the surrounding neighbors but only among the previous and next values, which might lead to worse performance?
And if I had to do this, would I just reshape the data, use a Reshape layer, or use a Permute layer?
Thanks
Yes, you are correct regarding the GNA: our Intel GNA hardware natively supports only 1D convolution, and 2D convolution support is experimental.
This article (GNA Plugin - OpenVINO™ Toolkit) specifies the steps to add Permute layers before or after convolutions.
You could try both methods and see which one works for you.
Generally, a 1D convolution in TensorFlow is created as a 2D convolution wrapped in reshape layers, which add an H dimension before the 2D convolution and remove it afterwards.
At the same time, MO (the Model Optimizer) inserts permutes before and after the reshape layers, since they change the interpretation of the data.
For the advantages and disadvantages of 2D versus 1D CNNs, you may refer to this detailed thread.
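The point about wrapping a 2D convolution in reshapes can be checked with a small NumPy sketch. The helper names conv1d_valid and conv2d_valid are hypothetical, and this is plain NumPy rather than TensorFlow or OpenVINO code; it only illustrates that a 1D convolution gives the same result as a 2D convolution over an input with a height of 1.

```python
import numpy as np

def conv1d_valid(x, k):
    """x: (length,), k: (k_len,) -> (length - k_len + 1,)"""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def conv2d_valid(x, k):
    """x: (H, W), k: (kh, kw) -> (H - kh + 1, W - kw + 1)"""
    kh, kw = k.shape
    h = x.shape[0] - kh + 1
    w = x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

signal = np.arange(100, dtype=np.float64)  # e.g. a flattened 10x10 feature map
kernel = np.array([1.0, 0.0, -1.0])

out_1d = conv1d_valid(signal, kernel)
# Add a height axis of size 1, run the 2D version, then drop the axis again.
out_2d = conv2d_valid(signal[np.newaxis, :], kernel[np.newaxis, :])[0]
print(np.allclose(out_1d, out_2d))  # True
```

Note that after flattening a 10x10 input, values that were vertical neighbors end up 10 positions apart, so a short 1D kernel no longer sees them together.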
In TensorFlow, these are the steps to build a CNN architecture:
1. Reshape the input if necessary using tf.reshape() to match the convolutional layer you intend to build (for example, if using a 2D convolution, reshape each example into a three-dimensional format).
2. Create a convolutional layer using tf.nn.conv1d(), tf.nn.conv2d(), or tf.nn.conv3d(), depending on the dimensionality of the input.
3. Create a pooling layer using tf.nn.max_pool().
4. Repeat steps 2 and 3 for additional convolution and pooling layers.
5. Reshape the output of the convolution and pooling layers, flattening it to prepare for the fully connected layer.
6. Create a fully connected layer using the tf.matmul() function, add an activation using, for example, tf.nn.relu(), and apply dropout using tf.nn.dropout().
7. Create a final layer for class prediction, again using tf.matmul().
8. Store weights and biases using TensorFlow variables.
These are just the basic steps to create the CNN model; there are additional steps to define training and evaluation, execute the model, and tune it.
In step 2 of CNN development you create a 2D convolutional layer using tf.nn.conv2d(); this function computes a 2-D convolution given 4-D input and filter tensors.
So if you have a 1D vector per example, as in the MNIST dataset examples with 784 features, you can convert it to the 4D input required by conv2d() using TensorFlow's reshape method. The reshape matches the picture format [Height x Width x Channel], so the input tensor becomes 4-D: [Batch Size, Height, Width, Channel]:
x = tf.reshape(x, shape=[-1, 28, 28, 1])
where x is a placeholder vector:
x = tf.placeholder(tf.float32, [None, num_input])
You may refer to the official Tensorflow documentation
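The effect of that reshape can be previewed with NumPy, whose reshape follows the same row-major element order. This is a small sketch with made-up data (the batch of 3 is an assumption for illustration), not TensorFlow code.

```python
import numpy as np

num_input = 784  # 28 * 28 MNIST features per example
batch = np.arange(3 * num_input, dtype=np.float32).reshape(3, num_input)

# -1 lets the batch dimension be inferred, as in tf.reshape(x, shape=[-1, 28, 28, 1]).
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)  # (3, 28, 28, 1)

# Row-major order: pixel (row r, column c) comes from flat index r * 28 + c.
r, c = 2, 5
print(images[1, r, c, 0] == batch[1, r * 28 + c])  # True
```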

gaussian projection versus gaussian noise

I am facing difficulties with the following layer in keras:
import tensorflow as tf

gaussian_projection = 64
gaussian_scale = 20
initializer = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=gaussian_scale)
proj_kernel = tf.keras.layers.Dense(gaussian_projection, use_bias=False, trainable=False,
                                    kernel_initializer=initializer)
What is the layer above intended to do? Is it a layer that adds Gaussian noise, or something different?
I hope someone knows about it.
Second version of the layer:
input_dim = 3
new_layer = tf.keras.layers.Dense(input_dim, use_bias=False, trainable=False,
                                  kernel_initializer='identity')
tf.keras.layers.GaussianNoise(stddev=gaussian_scale)
Are both versions of the layer (1st and 2nd) intended to do the same thing, i.e., add Gaussian noise?
I think the above 2 are different as follows:
The first block of code basically creates a Dense layer, in which the gaussian_projection variable is the number of units and the initializer determines how the layer's weights are initialized. Such initialization is normally done to improve the convergence of the layer and the network; overall, though, the first block of code is a typical Dense layer. I think no noise is added in this first block of code.
On the other hand, the second block of code creates a GaussianNoise layer after the Dense layer, which is normally done to regularize the network and reduce overfitting. Based on the official documentation, this GaussianNoise layer is only active during training.
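To make the difference concrete, here is a rough NumPy sketch of what each version computes. It is an approximation under stated assumptions: a plain normal distribution stands in for TruncatedNormal (which re-draws samples far from the mean), the helper gaussian_noise is hypothetical, and the batch of 5 inputs is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
gaussian_projection = 64
gaussian_scale = 20.0
input_dim = 3

x = rng.normal(size=(5, input_dim))

# Version 1: a frozen linear map with Gaussian weights (no bias, never trained).
W = rng.normal(0.0, gaussian_scale, size=(input_dim, gaussian_projection))
projected = x @ W  # shape (5, 64); deterministic in x once W has been drawn

# Version 2: an identity kernel followed by additive Gaussian noise.
def gaussian_noise(inputs, stddev, training):
    # Noise is only added in training mode; at inference the input passes through.
    if not training:
        return inputs
    return inputs + rng.normal(0.0, stddev, size=inputs.shape)

identity_out = x @ np.eye(input_dim)
noisy_train = gaussian_noise(identity_out, gaussian_scale, training=True)
noisy_infer = gaussian_noise(identity_out, gaussian_scale, training=False)

print(projected.shape)    # (5, 64)
print(noisy_train.shape)  # (5, 3)
```

In the first version the randomness lives in the weights and is drawn once; in the second it is added to the activations anew on every training call.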

Custom loss function in Keras that penalizes output from intermediate layer

Imagine I have a convolutional neural network to classify MNIST digits, such as this Keras example. This is purely for experimentation so I don't have a clear reason or justification as to why I'm doing this, but let's say I would like to regularize or penalize the output of an intermediate layer. I realize that the visualization below does not correspond to the MNIST CNN example and instead just has several fully connected layers. However, to help visualize what I mean let's say I want to impose a penalty on the node values in layer 4 (either pre or post activation is fine with me).
In addition to having a categorical cross entropy loss term which is typical for multi-class classification, I would like to add another term to the loss function that minimizes the squared sum of the output at a given layer. This is somewhat similar in concept to l2 regularization, except that l2 regularization is penalizing the squared sum of all weights in the network. Instead, I am purely interested in the values of a given layer (e.g. layer 4) and not all the weights in the network.
I realize that this requires writing a custom loss function using keras backend to combine categorical crossentropy and the penalty term, but I am not sure how to use an intermediate layer for the penalty term in the loss function. I would greatly appreciate help on how to do this. Thanks!
Actually, what you are interested in is regularization, and in Keras there are two different kinds of built-in regularization approaches available for most of the layers (e.g. Dense, Conv1D, Conv2D, etc.):
Weight regularization, which penalizes the weights of a layer. Usually, you can use kernel_regularizer and bias_regularizer arguments when constructing a layer to enable it. For example:
l1_l2 = tf.keras.regularizers.l1_l2(l1=1.0, l2=0.01)
x = tf.keras.layers.Dense(..., kernel_regularizer=l1_l2, bias_regularizer=l1_l2)
Activity regularization, which penalizes the output (i.e. activation) of a layer. To enable this, you can use activity_regularizer argument when constructing a layer:
l1_l2 = tf.keras.regularizers.l1_l2(l1=1.0, l2=0.01)
x = tf.keras.layers.Dense(..., activity_regularizer=l1_l2)
Note that you can set activity regularization through activity_regularizer argument for all the layers, even custom layers.
In both cases, the penalties are summed into the model's loss function, and the result would be the final loss value which would be optimized by the optimizer during training.
Further, besides the built-in regularization methods (i.e. L1 and L2), you can define your own custom regularizer method (see Developing new regularizers). As always, the documentation provides additional information which might be helpful as well.
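As a rough illustration of the two kinds of penalty, the following NumPy sketch applies the same L1/L2 formula once to a weight matrix and once to a layer's output. The helper l1_l2_penalty and the sample values are assumptions for illustration; any extra scaling Keras may apply to the activity term (for example by batch size) is left out.

```python
import numpy as np

def l1_l2_penalty(values, l1=1.0, l2=0.01):
    """L1 term on absolute values plus L2 term on squared values."""
    return l1 * np.sum(np.abs(values)) + l2 * np.sum(np.square(values))

weights = np.array([[3.0], [-1.0]])                # what kernel_regularizer would see
activations = np.array([[1.0, -2.0], [0.5, 0.0]])  # what activity_regularizer would see

weight_term = l1_l2_penalty(weights)
activity_term = l1_l2_penalty(activations)

# Both terms are added on top of the task loss (a made-up value here).
task_loss = 0.7
total_loss = task_loss + weight_term + activity_term
print(total_loss)
```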
Just specify the hidden layer as an additional output. As tf.keras.Models can have multiple outputs, this is totally allowed. Then define your custom loss using both values.
Extending your example:
input = tf.keras.Input(...)
x1 = tf.keras.layers.Dense(10)(input)
x2 = tf.keras.layers.Dense(10)(x1)
x3 = tf.keras.layers.Dense(10)(x2)
model = tf.keras.Model(inputs=[input], outputs=[x3, x2])
For the custom loss function, I think it's something like this:
def custom_loss(y_true, y_pred):
    x3, x2 = y_pred  # same order as outputs=[x3, x2] above
    label = y_true  # you might need to provide a dummy var for x2
    return f1(x2) + f2(label, x3)  # whatever you want to do with f1, f2
Another way to add loss based on input or calculations at a given layer is to use the add_loss() API. If you are already creating a custom layer, the custom loss can be added directly to the layer. Or a custom layer can be created that simply takes the input, calculates and adds the loss, and then passes the unchanged input along to the next layer.
Here is the code taken directly from the documentation (in case the link is ever broken):
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(MyActivityRegularizer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs

TensorFlow 2 GRU Layer with multiple hidden layers

I am attempting to port some TensorFlow 1 code to TensorFlow 2. The old code used the now deprecated MultiRNNCell to create a GRU layer with multiple hidden layers. In TensorFlow 2 I want to use the in-built GRU Layer, but there doesn't seem to be an option which allows for multiple hidden layers with that class. The PyTorch equivalent has such an option exposed as an initialization parameter, num_layers.
My workaround has been to use the TensorFlow RNN layer and pass a GRU cell for each hidden layer I want - this is the way recommended in the docs:
dim = 1024
num_layers = 4
cells = [tf.keras.layers.GRUCell(dim) for _ in range(num_layers)]
gru_layer = tf.keras.layers.RNN(
    cells,
    return_sequences=True,
    stateful=True
)
But the in-built GRU layer has support for CuDNN, which the plain RNN seems to lack, to quote the docs:
Mathematically, RNN(LSTMCell(10)) produces the same result as
LSTM(10). In fact, the implementation of this layer in TF v1.x was
just creating the corresponding RNN cell and wrapping it in a RNN
layer. However using the built-in GRU and LSTM layers enables the use
of CuDNN and you may see better performance.
So how can I achieve this? How do I get a GRU layer that supports multiple hidden layers and also has support for CuDNN? Given that the in-built GRU layer in TensorFlow lacks such an option, is it in fact necessary? Or is the only way to get a deep GRU network to stack multiple GRU layers in a sequence?
EDIT: It seems, according to this answer to a similar question, that there is indeed no in-built way to create a GRU Layer with multiple hidden layers, and that they have to be stacked manually.
OK, so it seems the only way to achieve this is to define a stack of GRU Layer instances. This is what I came up with (note that I only need stateful GRU layers that return sequences, and don't need the last layer's return state):
class RNN(tf.keras.layers.Layer):
    def __init__(self, dim, num_layers=1):
        super(RNN, self).__init__()
        self.dim = dim
        self.num_layers = num_layers
        def layer():
            return tf.keras.layers.GRU(
                self.dim,
                return_sequences=True,
                return_state=True,
                stateful=True)
        self._layer_names = ['layer_' + str(i) for i in range(self.num_layers)]
        for name in self._layer_names:
            self.__setattr__(name, layer())
    def call(self, inputs):
        seqs = inputs
        state = None
        for name in self._layer_names:
            rnn = self.__getattribute__(name)
            (seqs, state) = rnn(seqs, initial_state=state)
        return seqs
It's necessary to manually add the internal rnn layers to the parent layer using __setattr__. It seems adding the rnns to a list and setting that as a layer attribute won't allow the internal layers to be tracked by the parent layer (see this answer to this issue).
I hoped that this would speed up my network. Tests on Colab have shown no difference so far; if anything, it's actually slightly slower than using a straight RNN initialized with a list of GRU cells. I thought that increasing the batch size from 10 to 64 might make a difference, but no, they still seem to perform at around the same speed.
UPDATE: In fact there does seem to be a noticeable speed-up, but only if I don't decorate my training step function with tf.function (I have a custom training loop, I don't use Model.fit). It is not a huge increase in speed: maybe about 33% faster with a batch size of 96. A much smaller batch size (between 10 and 20) gives an even bigger speed-up, about 70%.
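For readers who want to see what "stacking" means mechanically, here is a minimal NumPy sketch in which each layer consumes the full output sequence of the previous one. Everything in it is an assumption for illustration: TinyGRU is a hypothetical class, it uses one common form of the GRU equations, the sizes are made up, and unlike the class above it always starts each layer from a zero state. It says nothing about CuDNN or speed.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TinyGRU:
    """Hypothetical minimal GRU layer that returns the full output sequence."""

    def __init__(self, input_dim, dim, rng):
        # One input matrix, one recurrent matrix and one bias per gate
        # (index 0: update gate, 1: reset gate, 2: candidate state).
        self.W = rng.normal(0.0, 0.1, size=(3, input_dim, dim))
        self.U = rng.normal(0.0, 0.1, size=(3, dim, dim))
        self.b = np.zeros((3, dim))
        self.dim = dim

    def __call__(self, seqs):
        batch, steps, _ = seqs.shape
        h = np.zeros((batch, self.dim))
        outputs = []
        for t in range(steps):
            x = seqs[:, t, :]
            z = sigmoid(x @ self.W[0] + h @ self.U[0] + self.b[0])  # update gate
            r = sigmoid(x @ self.W[1] + h @ self.U[1] + self.b[1])  # reset gate
            cand = np.tanh(x @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = z * h + (1.0 - z) * cand
            outputs.append(h)
        return np.stack(outputs, axis=1)  # (batch, steps, dim)

rng = np.random.default_rng(0)
dim, num_layers, features = 8, 4, 5
layers = [TinyGRU(features if i == 0 else dim, dim, rng) for i in range(num_layers)]

seqs = rng.normal(size=(2, 7, features))
for layer in layers:
    seqs = layer(seqs)  # each layer reads the previous layer's sequence
print(seqs.shape)  # (2, 7, 8)
```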

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with TensorFlow layers to use the Keras API in TensorFlow (I am using the Keras API provided by TF 1.x), and am having issues writing a custom loss function to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
    return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do I implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with TensorFlow layers and train it successfully with TensorFlow estimators.
My current implementation builds a tf Dataset from a large database of tf Records, where the features are a dictionary containing the images and arrays of matching points. Before, I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use, which is to calculate the loss within the model by means of Lambda layers (for instance, when the loss is independent of the true data and the model doesn't really have an output to be compared).
In a functional API model:
def loss_calc(x):
    loss_input_1, loss_input_2 = x  # arbitrary inputs, you choose
                                    # according to what you gave to the Lambda layer
    # here you use some external data that doesn't relate to the samples
    externalData = K.constant(external_numpy_data)
    # calculate the loss
    loss = ...
    return loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
    return y_pred  # where y_pred is the loss itself, the output of the model above
model.compile(loss=dummy_loss, ...)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it is to use a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exactly the same way I did in the first answer, and pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.