Using dilated convolution in Keras - TensorFlow

In WaveNet, dilated convolution is used to increase the receptive field of the layers above.
From the illustration, you can see that layers of dilated convolution with kernel size 2 and dilation rates that are powers of 2 create a tree-like structure of receptive fields. I tried to (very simply) replicate the above in Keras.
import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()
And the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 200, 2)] 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 200, 5) 55
_________________________________________________________________
conv1d_6 (Conv1D) (None, 200, 5) 130
_________________________________________________________________
dense_2 (Dense) (None, 200, 1) 6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________
I was expecting axis=1 to shrink after each conv1d layer, similar to the gif. Why is this not the case?

The model summary is as expected. As you note, dilated convolutions increase the receptive field. However, a dilated convolution preserves the output shape of the input image/activation, because we are only changing the convolutional kernel. A regular kernel could be the following:
0 1 0
1 1 1
0 1 0
A kernel with a dilation rate of 2 inserts zeros between the entries of the original kernel, as below:
0 0 1 0 0
0 0 0 0 0
1 0 1 0 1
0 0 0 0 0
0 0 1 0 0
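For concreteness, here is a minimal NumPy sketch of my own (not part of the original answer) showing this zero-insertion:
import numpy as np

# Build the 5x5 dilated kernel shown above by spreading the 3x3 kernel
# over a strided grid and filling the gaps with zeros.
kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]])
rate = 2
size = (kernel.shape[0] - 1) * rate + 1   # 5 for a 3x3 kernel at rate 2
dilated = np.zeros((size, size), dtype=kernel.dtype)
dilated[::rate, ::rate] = kernel          # original weights land every `rate` steps
print(dilated)                            # matches the matrix above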
In fact, you can see that our original kernel is just a dilated kernel with a dilation rate of 1. Alternative ways to increase the receptive field, such as max pooling and strided convolution, downsize the input image instead.
For example, if you want to increase the receptive field by shrinking your output shape, you could use strided convolution as below. I replaced the dilated convolutions with strided convolutions; you will see that the output shape reduces at every layer.
import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', strides=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', strides=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 200, 2)] 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 100, 5) 55
_________________________________________________________________
conv1d_4 (Conv1D) (None, 25, 5) 130
_________________________________________________________________
dense_1 (Dense) (None, 25, 1) 6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________
To summarize, dilated convolution is just another way to increase the receptive field of your model. It has the benefit of preserving the output shape of your input image.
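As a rough sanity check on the receptive-field claim, here is a small sketch of my own (not part of the original answer): for a stack of stride-1 convolutions, the receptive field grows as 1 + sum((kernel_size - 1) * dilation_rate) over the layers.
def receptive_field(kernel_size, dilation_rates):
    """Receptive field of stacked stride-1 (dilated) convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

# The two Conv1D layers from the question: kernel size 5, dilation rates 2 and 4.
print(receptive_field(5, [2, 4]))            # 25 timesteps
# A WaveNet-style stack: kernel size 2, dilation rates 1, 2, 4, 8, 16.
print(receptive_field(2, [1, 2, 4, 8, 16]))  # 32 timesteps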

Here's an example of this dilation with 1D convolutional layers (each layer outputs 14 channels):
https://github.com/jwallbridge/translob/blob/master/python/LobFeatures.py
from tensorflow.keras import layers

def lob_dilated(x):
    """TransLOB dilated 1-D convolution module."""
    x = layers.Conv1D(14, kernel_size=2, strides=1, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=2, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=4, activation='relu', padding='causal')(x)
    x = layers.Conv1D(14, kernel_size=2, dilation_rate=8, activation='relu', padding='causal')(x)
    y = layers.Conv1D(14, kernel_size=2, dilation_rate=16, activation='relu', padding='causal')(x)
    return y
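A hedged usage sketch (the input shape here is my own assumption, not taken from the repo): wrapping lob_dilated in a model shows that the causal dilated convolutions keep the sequence length unchanged.
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(100, 40))  # assumed: 100 timesteps, 40 order-book features
out = lob_dilated(inp)
model = Model(inp, out)
model.summary()  # every Conv1D layer keeps the sequence length at 100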

Related

tell Keras to merge and train layers that do not descend from input?

Consider
import tensorflow as tf
units=11
entrada=tf.keras.Input(name="entrada", shape=(units,))
unidad= tf.Variable([[1.0]]) # + 0.0* entrada[:,:1]
denseSoftmax=tf.keras.layers.Dense(units,name="denseSoftmax",activation="softmax")
softMaxOutput=denseSoftmax(unidad)
finalproduct=tf.keras.layers.Multiply()([entrada,softMaxOutput])
modelo=tf.keras.Model(entrada,finalproduct)
modelo.summary()
This example produces a model without trainable parameters, because the denseSoftmax layer does not act on the input. If I fake it by uncommenting + 0.0 * entrada[:,:1], then it produces the expected graph:
Layer (type) Output Shape Param # Connected to
==================================================================================================
entrada (InputLayer) [(None, 11)] 0 []
tf.__operators__.getitem (Slic (None, 1) 0 ['entrada[0][0]']
ingOpLambda)
tf.math.multiply (TFOpLambda) (None, 1) 0 ['tf.__operators__.getitem[0][0]'
tf.__operators__.add (TFOpLamb (None, 1) 0 ['tf.math.multiply[0][0]']
denseSoftmax (Dense) (None, 11) 22 ['tf.__operators__.add[0][0]']
multiply (Multiply) (None, 11) 0 ['entrada[0][0]',
'denseSoftmax[0][0]']
But faking a zero-valued link to an input seems as bad as adding a constant branch in the set of input layers.
Is there a way to tell Keras that it should follow the subgraph for a series of layers that are going to be merged with the resulting output, but do not depend on the input?
Is the following what you want?
class CustomModel(tf.keras.Model):
    def __init__(self, units) -> None:
        super().__init__()
        self.entrada = tf.keras.layers.InputLayer(input_shape=(units,))
        self.unidad = tf.Variable([[1.0]])
        self.denseSoftmax = tf.keras.layers.Dense(units, name="denseSoftmax", activation="softmax")
        self.finalproduct = tf.keras.layers.Multiply()

    def call(self, inputs):
        x = self.entrada(inputs)
        softMaxOutput = self.denseSoftmax(self.unidad)
        y = self.finalproduct([x, softMaxOutput])
        return y

units = 11
modelo = CustomModel(units=units)
modelo.build(input_shape=(None, units))
modelo.summary()
Model: "custom_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 11)] 0
denseSoftmax (Dense) multiple 22
multiply (Multiply) multiple 0
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
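A quick hedged check of my own (not part of the original answer): the subclassed model can be called on a batch even though the softmax branch depends only on the trainable variable unidad, and that variable is tracked and trained with the rest of the model.
import tensorflow as tf

y = modelo(tf.ones((2, units)))         # dummy batch of 2 samples
print(y.shape)                          # expected: (2, 11)
print(len(modelo.trainable_variables))  # expected: 3 (unidad + Dense kernel and bias)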

Why does TensorFlow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2-dimensional matrix. But when I print the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1). I cannot reshape the array with -1 in the first dimension.
In your first layer, input_shape=(10, 10) tells Keras that each sample is a 10x10 matrix; the leading None dimension in the summary is the batch size, which Keras adds automatically, and Dense then acts on the last axis, giving (None, 10, 1). Note you only need input_shape for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.
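Here is a hedged sketch of my own (assuming the model exactly as written above, with two Dense(1) layers): the leading None is just the batch axis, so you pass arrays of shape (batch_size, 10, 10) and get (batch_size, 10, 1) back.
import numpy as np

X = np.ones((4, 10, 10), dtype=np.float16)  # 4 samples, each a 10x10 matrix
print(model.predict(X).shape)               # (4, 10, 1)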

TimeDistributed layer in Keras

I am trying to understand the TimeDistributed layer in Keras/TensorFlow.
As far as I understand, it is a kind of wrapper that makes it possible, for example, to process a sequence of images.
Now I am wondering how one would design a time-distributed network without using the TimeDistributed layer.
For example, if I had a sequence of 3 images, each having 1 channel and a pixel dimension of 256x256 px, that should first be processed by a CNN and then by LSTM cells.
My input to the time distributed layer would then be (N,3,256,256,1), where N is the batch size.
The CNN would then have 3 outputs, which are fed to the LSTM cell.
Now, without using the TimeDistributed layer, would it be possible to accomplish the same thing by setting up a network with 3 different inputs and 3 similar CNNs? The outputs of the 3 CNNs could then be flattened and concatenated.
Is that any different from the time distributed approach?
Thanks in advance,
M
I created a prototype for you. I used the smallest number of layers and arbitrary units/kernels/filters; change them as you like. It first creates a CNN model that takes inputs of size (256, 256, 1). It uses the same CNN model 3 times (once for each image in the sequence) to extract features, and stacks all the features with a Lambda layer to put them back into a sequence. The sequence then goes through an LSTM layer. I have chosen for the LSTM to return a single feature vector per example, but if you want the output to be a sequence as well, you could change it to return_sequences=True. You could also add final additional layers to adapt it to your needs.
from tensorflow.keras.layers import Input, LSTM, Conv2D, Flatten, Lambda
from tensorflow.keras import Model
import tensorflow.keras.backend as K

def create_cnn_model():
    inp = Input(shape=(256, 256, 1))
    x = Conv2D(filters=16, kernel_size=5, strides=2)(inp)
    x = Flatten()(x)
    model = Model(inputs=inp, outputs=x, name='cnn_Model')
    return model

def combined_model():
    cnn_model = create_cnn_model()
    inp_1 = Input(shape=(256, 256, 1))
    inp_2 = Input(shape=(256, 256, 1))
    inp_3 = Input(shape=(256, 256, 1))
    out_1 = cnn_model(inp_1)
    out_2 = cnn_model(inp_2)
    out_3 = cnn_model(inp_3)
    lstm_inp = [out_1, out_2, out_3]
    lstm_inp = Lambda(lambda x: K.stack(x, axis=-2))(lstm_inp)
    x = LSTM(units=32, return_sequences=False)(lstm_inp)
    model = Model(inputs=[inp_1, inp_2, inp_3], outputs=x)
    return model
Now create the model as such:
model = combined_model()
Check the summary:
model.summary()
which will print:
Model: "model_14"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_53 (InputLayer) [(None, 256, 256, 1) 0
__________________________________________________________________________________________________
input_54 (InputLayer) [(None, 256, 256, 1) 0
__________________________________________________________________________________________________
input_55 (InputLayer) [(None, 256, 256, 1) 0
__________________________________________________________________________________________________
cnn_Model (Model) (None, 254016) 416 input_53[0][0]
input_54[0][0]
input_55[0][0]
__________________________________________________________________________________________________
lambda_3 (Lambda) (None, 3, 254016) 0 cnn_Model[1][0]
cnn_Model[2][0]
cnn_Model[3][0]
__________________________________________________________________________________________________
lstm_13 (LSTM) (None, 32) 32518272 lambda_3[0][0]
==================================================================================================
Total params: 32,518,688
Trainable params: 32,518,688
Non-trainable params: 0
The inner CNN model's summary can be printed with:
model.get_layer('cnn_Model').summary()
which currently prints:
Model: "cnn_Model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_52 (InputLayer) [(None, 256, 256, 1)] 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 126, 126, 16) 416
_________________________________________________________________
flatten_6 (Flatten) (None, 254016) 0
=================================================================
Total params: 416
Trainable params: 416
Non-trainable params: 0
_________________________________________________________________
Your model expects a list as input. The list should have a length of 3 (since there are 3 images in the sequence). Each element of the list should be a numpy array of shape (batch_size, 256, 256, 1). I have worked out a dummy example below with a batch size of 1:
import numpy as np
a = np.zeros((256,256,1)) # first image filled with zeros
b = np.zeros((256,256,1)) # second image filled with zeros
c = np.zeros((256,256,1)) # third image filled with zeros
a = np.expand_dims(a, 0) # adding batch dimension to make it (1, 256, 256, 1)
b = np.expand_dims(b, 0) # same here
c = np.expand_dims(c, 0) # same here
model.compile(loss='mse', optimizer='adam')
# train your model with model.fit(....)
e = model.predict([a,b,c]) # a,b and c have shape of (1, 256, 256, 1) where the first 1 is the batch size
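For comparison, here is a hedged sketch of my own (not from the original answer) of the TimeDistributed approach the question asks about, assuming a single 5-D input of shape (batch, 3, 256, 256, 1). The same CNN weights are applied to every timestep, which is essentially what the shared cnn_Model above does with three separate inputs.
from tensorflow.keras.layers import Input, TimeDistributed, Conv2D, Flatten, LSTM
from tensorflow.keras import Model

seq_inp = Input(shape=(3, 256, 256, 1))                              # (batch, time, H, W, C)
x = TimeDistributed(Conv2D(filters=16, kernel_size=5, strides=2))(seq_inp)
x = TimeDistributed(Flatten())(x)                                    # (batch, 3, 254016)
x = LSTM(units=32)(x)                                                # one feature vector per example
td_model = Model(inputs=seq_inp, outputs=x)
td_model.summary()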

How to feed only half of the RNN output to next RNN output layer in tensorflow?

I want to feed only the RNN outputs at odd positions to the next RNN layer. How can I achieve that in TensorFlow?
I basically want to build the top layer in the following diagram, which halves the sequence length. The bottom layer is just a simple RNN.
Is this what you need?
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(10, 5))
out = layers.LSTM(50, return_sequences=True)(inp)
# keep every other timestep of the LSTM output
out = layers.Lambda(lambda x: tf.stack(tf.unstack(x, axis=1)[::2], axis=1))(out)
out = layers.LSTM(50)(out)
out = layers.Dense(20)(out)
m = models.Model(inputs=inp, outputs=out)
m.summary()
You get the following model. You can see the second LSTM only gets 5 timesteps out of the total 10 (i.e. every other output of the previous layer):
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 10, 5)] 0
_________________________________________________________________
lstm_2 (LSTM) (None, 10, 50) 11200
_________________________________________________________________
lambda_1 (Lambda) (None, 5, 50) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 20200
_________________________________________________________________
dense_1 (Dense) (None, 20) 1020
=================================================================
Total params: 32,420
Trainable params: 32,420
Non-trainable params: 0
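As a side note, a possibly simpler variant of my own (not from the original answer) is to keep every other timestep by slicing the tensor directly inside the Lambda layer; Keras tensors support strided slicing.
from tensorflow.keras import layers, models

inp = layers.Input(shape=(10, 5))
out = layers.LSTM(50, return_sequences=True)(inp)
out = layers.Lambda(lambda x: x[:, ::2, :])(out)  # keep timesteps 0, 2, 4, 6, 8
out = layers.LSTM(50)(out)
out = layers.Dense(20)(out)
m = models.Model(inputs=inp, outputs=out)
m.summary()  # the Lambda output is (None, 5, 50), same as before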