I would like to apply MobileNetV2 in Keras to images of size 39 x 39 to classify 3 classes. My images represent heat maps (e.g. which keys have been pressed on the keyboard). I think MobileNet was designed to work on images of size 224 x 224. I will not use transfer learning but will train the model from scratch.
To make MobileNet work on my images, I would like to replace the first three stride-2 convolutions with stride 1. I have the following code:
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base_model = MobileNetV2(weights=None, include_top=False,
                         input_shape=[39, 39, 3])
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
output_tensor = Dense(3, activation='softmax')(x)
cnn_model = Model(inputs=base_model.input, outputs=output_tensor)

opt = Adam(learning_rate=learning_rate)
cnn_model.compile(loss='categorical_crossentropy',
                  optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
How can I replace the first three stride 2 convolutions with stride 1 without building MobileNet myself?
Here is one workaround for your need, though I think a more general approach is probably possible. Note that in MobileNetV2 there is only one standalone conv layer with strides 2; the remaining downsampling happens inside the inverted residual blocks. If you follow the source code, here:
x = layers.Conv2D(
    first_block_filters,
    kernel_size=3,
    strides=(2, 2),
    padding='same',
    use_bias=False,
    name='Conv1')(img_input)
x = layers.BatchNormalization(
    axis=channel_axis, epsilon=1e-3, momentum=0.999, name='bn_Conv1')(x)
x = layers.ReLU(6., name='Conv1_relu')(x)
And the rest of the blocks are defined as follows
x = _inverted_res_block(
    x, filters=16, alpha=alpha, stride=1, expansion=1, block_id=0)
x = _inverted_res_block(
    x, filters=24, alpha=alpha, stride=2, expansion=6, block_id=1)
x = _inverted_res_block(
    x, filters=24, alpha=alpha, stride=1, expansion=6, block_id=2)
So here I will deal with the first conv layer, which has strides=(2, 2). The idea is simple: we will add a new layer at the right place in the built-in model and then remove the unwanted layer.
def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
alpha = 1.0
first_block_filters = _make_divisible(32 * alpha, 8)

input_layer = tf.keras.Input(shape=(39, 39, 3), name="inputLayer")
input_conv = tf.keras.layers.Conv2D(
    first_block_filters,
    kernel_size=3,
    strides=(1, 1),
    padding='same',
    use_bias=False,
    name='Conv1_')(input_layer)
The _make_divisible function above is taken directly from the source code. Now we inject this layer into MobileNetV2 right before its first conv layer, as follows:
base_model = tf.keras.applications.MobileNetV2(weights=None,
                                               include_top=False,
                                               input_tensor=input_conv)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
output_tensor = Dense(3, activation='softmax')(x)
cnn_model = Model(inputs=base_model.input, outputs=output_tensor)
Now, if we observe:
for i, l in enumerate(cnn_model.layers):
    print(l.name, l.output_shape)
    if i == 8: break
inputLayer [(None, 39, 39, 3)]
Conv1_ (None, 39, 39, 32)
Conv1 (None, 20, 20, 32)
bn_Conv1 (None, 20, 20, 32)
Conv1_relu (None, 20, 20, 32)
expanded_conv_depthwise (None, 20, 20, 32)
expanded_conv_depthwise_BN (None, 20, 20, 32)
expanded_conv_depthwise_relu (None, 20, 20, 32)
expanded_conv_project (None, 20, 20, 16)
The layers named Conv1_ and Conv1 are the new layer (with strides=1) and the old layer (with strides=2), respectively. Now, as we wanted, we remove the layer Conv1 with strides=2 as follows:
cnn_model._layers.pop(2)  # remove Conv1

for i, l in enumerate(cnn_model.layers):
    print(l.name, l.output_shape)
    if i == 8: break
inputLayer [(None, 39, 39, 3)]
Conv1_ (None, 39, 39, 32)
bn_Conv1 (None, 20, 20, 32)
Conv1_relu (None, 20, 20, 32)
expanded_conv_depthwise (None, 20, 20, 32)
expanded_conv_depthwise_BN (None, 20, 20, 32)
expanded_conv_depthwise_relu (None, 20, 20, 32)
expanded_conv_project (None, 20, 20, 16)
expanded_conv_project_BN (None, 20, 20, 16)
Now you have a cnn_model with strides=1 on its first conv layer. However, in case you're wondering about this approach and its possible issues, please see my other answer related to this one: Remove first N layers from a Keras Model?
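As a possibly more general alternative (a sketch only, untested across Keras versions, and separate from the workaround above): functional models can be rebuilt from their config, so you can flip the strides in the config and let the output shapes propagate automatically when the graph is rebuilt. The depthwise layer names below are assumptions based on the standard MobileNetV2 naming and may differ between versions.

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    weights=None, include_top=False, input_shape=(39, 39, 3))

config = base_model.get_config()
for layer_cfg in config['layers']:
    # 'Conv1' is the stem convolution; the block names are assumed and may
    # differ between Keras versions.
    if layer_cfg['name'] in ('Conv1', 'block_1_depthwise', 'block_3_depthwise'):
        if tuple(layer_cfg['config'].get('strides', ())) == (2, 2):
            layer_cfg['config']['strides'] = (1, 1)

stride1_model = tf.keras.Model.from_config(config)
stride1_model.summary()  # the first feature maps should now stay near 39 x 39

One caveat: in some versions, strided convolutions are preceded by ZeroPadding2D layers with 'valid' padding, so with strides=(1, 1) the feature maps may shrink by a pixel or two rather than staying exactly 39 x 39.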
I am running an Involution model (based on this example), and I am constantly running into errors during the training stage. This is my error:
ValueError: `logits` and `labels` must have the same shape, received ((None, 10) vs (None, 1)).
Below is the relevant code for dataset loading:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_ds = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=128,
    class_mode='binary')
test_ds = test_datagen.flow_from_directory(
    'data/test',
    target_size=(150, 150),
    batch_size=64,
    class_mode='binary')
And this is the code for training:
print("building the involution model...")
inputs = keras.Input(shape=(224, 224, 3))
x, _ = Involution(channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_1")(inputs)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_2")(x)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_3")(x)
x = keras.layers.ReLU()(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(10)(x)
inv_model = keras.Model(inputs=[inputs], outputs=[outputs], name="inv_model")
print("compiling the involution model...")
inv_model.compile(
optimizer="adam",
loss=keras.losses.BinaryCrossentropy(from_logits=True),
metrics=["accuracy"],
)
print("inv model training...")
inv_hist = inv_model.fit(train_ds, epochs=20, validation_data=test_ds)`
The model itself is the same as the one used by Keras, and I have not changed anything except to use my own dataset instead of the CIFAR dataset (the model works for me with that dataset). So I am sure there is an error in my data loading, but I am unable to identify what it is.
Model Summary:
Model: "inv_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_14 (InputLayer) [(None, 224, 224, 3)] 0
inv_1 (Involution) ((None, 224, 224, 3), 26
(None, 224, 224, 9, 1,
1))
re_lu_39 (ReLU) (None, 224, 224, 3) 0
max_pooling2d_26 (MaxPoolin (None, 112, 112, 3) 0
g2D)
inv_2 (Involution) ((None, 112, 112, 3), 26
(None, 112, 112, 9, 1,
1))
re_lu_40 (ReLU) (None, 112, 112, 3) 0
max_pooling2d_27 (MaxPoolin (None, 56, 56, 3) 0
g2D)
inv_3 (Involution) ((None, 56, 56, 3), 26
(None, 56, 56, 9, 1, 1)
)
re_lu_41 (ReLU) (None, 56, 56, 3) 0
flatten_15 (Flatten) (None, 9408) 0
dense_26 (Dense) (None, 64) 602176
dense_27 (Dense) (None, 10) 650
=================================================================
When you called the train_datagen.flow_from_directory() function, you used class_mode='binary', which means the labels of your images will be 0 and 1 only, whereas you have a total of 10 predictions, i.e. 10 neurons in your final output layer. Hence the labels and logits don't match.
Solution: use class_mode='categorical', which means there will be as many labels as the number of classes. Do the same in test_datagen as well.
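For illustration, here is a minimal sketch of the corrected pieces (using the asker's paths). Two further mismatches in the posted code would surface next: the generator's target_size is (150, 150) while the model input is (224, 224, 3), and with 10 one-hot labels a categorical loss is the appropriate counterpart to the 10-unit output:

train_ds = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),    # must match keras.Input(shape=(224, 224, 3))
    batch_size=128,
    class_mode='categorical')  # labels become one-hot vectors of length 10

inv_model.compile(
    optimizer="adam",
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)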
As you know, in a CNN only the Convolution and BatchNormalization layers have weights, and usually they are constructed this way: Conv - BN - ReLU - Conv - BN - ReLU.
But as you can see below, the structure is unusual.
conv2_block1_0_conv/kernel:0
conv2_block1_0_conv/bias:0
conv2_block1_3_conv/kernel:0
conv2_block1_3_conv/bias:0
conv2_block1_1_bn/gamma:0
conv2_block1_1_bn/beta:0
conv2_block1_1_bn/moving_mean:0
conv2_block1_1_bn/moving_variance:0
conv2_block1_3_bn/gamma:0
conv2_block1_3_bn/beta:0
conv2_block1_3_bn/moving_mean:0
conv2_block1_3_bn/moving_variance:0
You can find this result by:
model = tf.keras.applications.ResNet50()
# The unusual phenomenon begins at index 18.
model.weights[18]
I recommend that you use debugging mode in your IDE. Then you'll find it easier.
In the lines below, ResNet50 has a stack_fn function for creating the layers:
def ResNet50():
    ...
    def stack_fn(x):
        x = stack1(x, 64, 3, stride1=1, name='conv2')
        x = stack1(x, 128, 4, name='conv3')
        x = stack1(x, 256, 6, name='conv4')
        return stack1(x, 512, 3, name='conv5')
    ...
In the code below, stack1 simplifies the repeated residual blocks:
def stack1(x, filters, blocks, stride1=2, name=None):
    x = block1(x, filters, stride=stride1, name=name + '_block1')
    for i in range(2, blocks + 1):
        x = block1(x, filters, conv_shortcut=False, name=name + '_block' + str(i))
    return x
In the structure below, block1 is the residual block of ResNet50:
def block1(x, filters, kernel_size=3, stride=1, conv_shortcut=True, name=None):
    bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1

    if conv_shortcut:
        shortcut = layers.Conv2D(
            4 * filters, 1, strides=stride, name=name + '_0_conv')(x)
        shortcut = layers.BatchNormalization(
            axis=bn_axis, epsilon=1.001e-5, name=name + '_0_bn')(shortcut)
    else:
        shortcut = x

    x = layers.Conv2D(filters, 1, strides=stride, name=name + '_1_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_1_bn')(x)
    x = layers.Activation('relu', name=name + '_1_relu')(x)

    x = layers.Conv2D(
        filters, kernel_size, padding='SAME', name=name + '_2_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_2_bn')(x)
    x = layers.Activation('relu', name=name + '_2_relu')(x)

    x = layers.Conv2D(4 * filters, 1, name=name + '_3_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_3_bn')(x)

    x = layers.Add(name=name + '_add')([shortcut, x])
    x = layers.Activation('relu', name=name + '_out')(x)
    return x
My question is: why is the ordering of the model instance's weights different from the actual structure?
Update:
Sorry, I might have misunderstood your question previously.
In the weight listing there seem to be two contiguous conv layers, and I assume this is what you meant. However, they are in fact not contiguous.
ResNet has a branching (residual) structure, which means it is not sequential. But model.summary() prints its layers sequentially; note the last column, which shows what each layer is connected to. That is how TensorFlow represents parallel structures.
For example, conv2_block1_0_conv is connected to pool1_pool,
while conv2_block1_3_conv is connected to conv2_block1_2_relu.
So although they are printed side by side, they are not contiguous; they are parallel branches!
conv2_block1_0_conv and conv2_block1_0_bn are on the shortcut path
while conv2_block1_3_conv and conv2_block1_3_bn are on the residual path
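If it helps, here is a small sketch of my own (not part of the original answer) that prints each layer together with the layers feeding into it. The inbound-node format assumed below matches the TF 2.x functional-model config and may differ in other versions:

import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)
config = model.get_config()

# For each layer, list the layers that feed into it; both conv2_block1_0_conv
# (shortcut path) and conv2_block1_1_conv (residual path) hang off pool1_pool,
# which makes the branching explicit.
for layer_cfg in config['layers'][:15]:
    nodes = layer_cfg['inbound_nodes']
    inbound = [node[0] for node in nodes[0]] if nodes else []
    print(layer_cfg['name'], '<-', inbound)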
Please feel free to comment if you have more questions on this part, or open a new post if you have other questions.
model.weights returns the weights of a model (which is self-explanatory by name).
Conv - BN - ReLU - Conv - BN - ReLU are layers:
Conv stands for a Convolutional layer, BN stands for Batch Normalization, and ReLU is the activation.
To get a list of layers, you can use model.layers (which returns a list of Layer objects). If you simply want to see a summary of the model structure, use model.summary() to print it.
For example, ResNet50().summary() gives (partial output)
odel: "resnet50"
__________________________________________________________________________________________________ Layer (type) Output Shape Param #
Connected to
================================================================================================== input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________ conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0
input_1[0][0]
__________________________________________________________________________________________________ conv1_conv (Conv2D) (None, 112, 112, 64) 9472
conv1_pad[0][0]
__________________________________________________________________________________________________ conv1_bn (BatchNormalization) (None, 112, 112, 64) 256
conv1_conv[0][0]
__________________________________________________________________________________________________ conv1_relu (Activation) (None, 112, 112, 64) 0
conv1_bn[0][0]
__________________________________________________________________________________________________ pool1_pad (ZeroPadding2D) (None, 114, 114, 64) 0
conv1_relu[0][0]
__________________________________________________________________________________________________ pool1_pool (MaxPooling2D) (None, 56, 56, 64) 0
pool1_pad[0][0]
__________________________________________________________________________________________________ conv2_block1_1_conv (Conv2D) (None, 56, 56, 64) 4160
pool1_pool[0][0]
__________________________________________________________________________________________________ conv2_block1_1_bn (BatchNormali (None, 56, 56, 64) 256
conv2_block1_1_conv[0][0]
__________________________________________________________________________________________________ conv2_block1_1_relu (Activation (None, 56, 56, 64) 0
conv2_block1_1_bn[0][0]
I am trying to define a model happyModel()
# GRADED FUNCTION: happyModel

def happyModel():
    """
    Implements the forward propagation for the binary classification model:
    ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE

    Note that for simplicity and grading purposes, you'll hard-code all the values
    such as the stride and kernel (filter) sizes.
    Normally, functions should take these values as function parameters.

    Arguments:
    None

    Returns:
    model -- TF Keras model (object containing the information for the entire training process)
    """
    model = tf.keras.Sequential(
        [
            ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3
            tf.keras.layers.ZeroPadding2D(padding=(3, 3), data_format=(64, 64, 3)),
            ## Conv2D with 32 7x7 filters and stride of 1
            tf.keras.layers.Conv2D(32, (7, 7), strides=(1, 1), name='conv0'),
            ## BatchNormalization for axis 3
            tf.keras.layers.BatchNormalization(axis=3, name='bn0'),
            ## ReLU
            tf.keras.layers.Activation('relu'),
            ## Max Pooling 2D with default parameters
            tf.keras.layers.MaxPooling2D((2, 2), name='max_pool0'),
            ## Flatten layer
            tf.keras.layers.Flatten(),
            ## Dense layer with 1 unit for output & 'sigmoid' activation
            tf.keras.layers.Dense(1, activation='sigmoid', name='fc'),
            # YOUR CODE STARTS HERE
            # YOUR CODE ENDS HERE
        ]
    )

    return model
and the following code creates an object of the model defined above:
happy_model = happyModel()
# Print a summary for each layer
for layer in summary(happy_model):
    print(layer)

output = [['ZeroPadding2D', (None, 70, 70, 3), 0, ((3, 3), (3, 3))],
          ['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform'],
          ['BatchNormalization', (None, 64, 64, 32), 128],
          ['ReLU', (None, 64, 64, 32), 0],
          ['MaxPooling2D', (None, 32, 32, 32), 0, (2, 2), (2, 2), 'valid'],
          ['Flatten', (None, 32768), 0],
          ['Dense', (None, 1), 32769, 'sigmoid']]

comparator(summary(happy_model), output)
I got the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-67-f33284fd82fe> in <module>
1 happy_model = happyModel()
2 # Print a summary for each layer
----> 3 for layer in summary(happy_model):
4 print(layer)
5
~/work/release/W1A2/test_utils.py in summary(model)
30 result = []
31 for layer in model.layers:
---> 32 descriptors = [layer.__class__.__name__, layer.output_shape, layer.count_params()]
33 if (type(layer) == Conv2D):
34 descriptors.append(layer.padding)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in output_shape(self)
2177 """
2178 if not self._inbound_nodes:
-> 2179 raise AttributeError('The layer has never been called '
2180 'and thus has no defined output shape.')
2181 all_output_shapes = set(
AttributeError: The layer has never been called and thus has no defined output shape.
I suspect my call to ZeroPadding2D() is not right. The project seems to require the input shape of ZeroPadding2D() to be 64 x 64 x 3. I tried many formats but could not fix the problem. Can anyone give a pointer? Thanks a lot.
In your model definition, there's an issue with the following layer:
tf.keras.layers.ZeroPadding2D(padding=(3,3), data_format=(64,64,3)),
First, you didn't define any input shape; also, the data_format argument is a string, one of channels_last (default) or channels_first (see the source). The correct way to define the above model is as follows:
def happyModel():
    model = tf.keras.Sequential(
        [
            ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3
            tf.keras.layers.ZeroPadding2D(padding=(3, 3),
                                          input_shape=(64, 64, 3),
                                          data_format="channels_last"),
            ....
            ....
happy_model = happyModel()
happy_model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
zero_padding2d_4 (ZeroPaddin (None, 70, 70, 3) 0
_________________________________________________________________
conv0 (Conv2D) (None, 64, 64, 32) 4736
_________________________________________________________________
bn0 (BatchNormalization) (None, 64, 64, 32) 128
_________________________________________________________________
activation_2 (Activation) (None, 64, 64, 32) 0
_________________________________________________________________
max_pool0 (MaxPooling2D) (None, 32, 32, 32) 0
_________________________________________________________________
flatten_16 (Flatten) (None, 32768) 0
_________________________________________________________________
fc (Dense) (None, 1) 32769
=================================================================
Total params: 37,633
Trainable params: 37,569
Non-trainable params: 64
Per the documentation for tf.keras.Sequential() (https://www.tensorflow.org/api_docs/python/tf/keras/Sequential):
"Optionally, the first layer can receive an input_shape argument"
So instead of
tf.keras.layers.ZeroPadding2D(padding=(3,3), data_format=(64,64,3))
if you want to specify the input shape, it should be
tf.keras.layers.ZeroPadding2D(padding=(3,3), input_shape=(64,64,3))
model = tf.keras.Sequential([
    # YOUR CODE STARTS HERE
    tf.keras.layers.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3), data_format="channels_last"),
    tf.keras.layers.Conv2D(32, (7, 7), strides=(1, 1)),
    tf.keras.layers.BatchNormalization(axis=3),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
    # YOUR CODE ENDS HERE
])
return model
Try it; it works perfectly:
# tfl is assumed to be an alias: import tensorflow.keras.layers as tfl
model = tf.keras.Sequential(
    [
        ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3
        ## Conv2D with 32 7x7 filters and stride of 1
        ## BatchNormalization for axis 3
        ## ReLU
        ## Max Pooling 2D with default parameters
        ## Flatten layer
        ## Dense layer with 1 unit for output & 'sigmoid' activation
        # YOUR CODE STARTS HERE
        tfl.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3), data_format="channels_last"),
        tfl.Conv2D(32, (7, 7), strides=(1, 1), name='conv0'),
        tfl.BatchNormalization(axis=3, name='bn0'),
        tfl.ReLU(),
        tfl.MaxPooling2D((2, 2), name='max_pool0'),
        tfl.Flatten(),
        tfl.Dense(1, activation='sigmoid', name='fc'),
        # YOUR CODE ENDS HERE
    ])
It works; you can try it.
My input shape is (150, 10, 1) and my output has the same shape, (150, 10, 1). My problem is multi-class classification (3 classes). After using np_utils.to_categorical(Ytrain) the output shape becomes (150, 10, 3), which is perfect. However, during modelling with GlobalAvgPool1D(), it gives the error:
"A target array with shape (150, 10, 3) was passed for an output of shape (None, 3) while using as loss categorical_crossentropy. This loss expects targets to have the same shape as the output."
How should I fix it?
My code:
input_size = (150, 10, 1)
Xtrain = np.random.randint(0, 100, size=(150, 10, 1))
Ytrain = np.random.choice([0, 1, 2], size=(150, 10, 1))
Ytrain = np_utils.to_categorical(Ytrain)

input_shape = (10, 1)
input_layer = tf.keras.layers.Input(input_shape)
conv_x = tf.keras.layers.Conv1D(filters=32, kernel_size=10, strides=1, padding='same')(input_layer)
conv_x = tf.keras.layers.BatchNormalization()(conv_x)
conv_x = tf.keras.layers.Activation('relu')(conv_x)
g_pool = tf.keras.layers.GlobalAvgPool1D()(conv_x)
output_layer = tf.keras.layers.Dense(3, activation='softmax')(g_pool)

model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
model.summary()
model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])
hist = model.fit(Xtrain, Ytrain, batch_size=5, epochs=10, verbose=0)
When I ran your code with TensorFlow version 2.2.0 in Google Colab, I got the following error: ValueError: Shapes (5, 10, 3) and (5, 3) are incompatible.
You are getting this error because the labels Ytrain have shape (150, 10, 3) instead of (150, 3).
Since your model output has shape (None, 3), your labels should match, i.e. (number of records, 3). I was able to run your code successfully after modifying
Ytrain = np.random.choice([0, 1, 2], size=(150, 10, 1))
to
Ytrain = np.random.choice([0, 1, 2], size=(150, 1))
np_utils.to_categorical then adds the 3 columns for the labels, giving the shape (150, 3) that the model expects.
Fixed Code -
import tensorflow as tf
print(tf.__version__)
import numpy as np
from tensorflow.keras import utils as np_utils
Xtrain = np.random.randint(0, 100, size=(150, 10, 1))
Ytrain = np.random.choice([0,1, 2], size=(150, 1))
Ytrain = np_utils.to_categorical(Ytrain)
print(Ytrain.shape)
input_shape = (10, 1)
input_layer = tf.keras.layers.Input(input_shape)
conv_x = tf.keras.layers.Conv1D(filters=32, kernel_size=10, strides = 1, padding='same')(input_layer)
conv_x = tf.keras.layers.BatchNormalization()(conv_x)
conv_x = tf.keras.layers.Activation('relu')(conv_x)
g_pool = tf.keras.layers.GlobalAvgPool1D()(conv_x)
output_layer = tf.keras.layers.Dense(3, activation='softmax')(g_pool)
model = tf.keras.models.Model(inputs= input_layer, outputs = output_layer)
model.summary()
model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])
hist = model.fit(Xtrain, Ytrain, batch_size=5, epochs=10, verbose=0)
print("Ran Successfully")
Output -
2.2.0
(150, 3)
Model: "model_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_21 (InputLayer) [(None, 10, 1)] 0
_________________________________________________________________
conv1d_9 (Conv1D) (None, 10, 32) 352
_________________________________________________________________
batch_normalization_15 (Batc (None, 10, 32) 128
_________________________________________________________________
activation_9 (Activation) (None, 10, 32) 0
_________________________________________________________________
global_average_pooling1d_9 ( (None, 32) 0
_________________________________________________________________
dense_14 (Dense) (None, 3) 99
=================================================================
Total params: 579
Trainable params: 515
Non-trainable params: 64
_________________________________________________________________
Ran Successfully
Hope this answers your question. Happy Learning.
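If, on the other hand, you really do want one prediction per timestep (targets of shape (150, 10, 3), as in your original setup), here is a hedged alternative sketch: drop the global pooling so the Dense head is applied at every timestep. Dense acts on the last axis, so the output becomes (None, 10, 3) and categorical_crossentropy is then computed per timestep.

input_layer = tf.keras.layers.Input((10, 1))
x = tf.keras.layers.Conv1D(filters=32, kernel_size=10, strides=1, padding='same')(input_layer)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Activation('relu')(x)
# No GlobalAvgPool1D here: Dense maps (None, 10, 32) -> (None, 10, 3)
output_layer = tf.keras.layers.Dense(3, activation='softmax')(x)

seq_model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
seq_model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])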
After the Permute layer, the dimensions become (None, None, 12, 16).
I want to summarize the last two dimensions with an LSTM (48 units) with input_shape (12, 16), so that the overall dimension becomes (None, None, 48).
Currently I have a workaround with a custom LSTM & LSTMCell; however, it's very slow, since I have used another LSTM within the cell, etc.
What I would want to have is this:
(None, None, 12, 16)
(None, None, 48)
(None, None, 60)
The last two steps are done in the custom LSTM (currently); is there a way to separate them?
What's the proper way of doing this?
Can we create different (or more than one) LSTMs for the cells, which have the same weights but different cell states?
Would you give me some direction?
inputs (InputLayer) (None, 36, None, 1) 0
convlayer (Conv2D) (None, 36, None, 16) 160 inputs[0][0]
mp (MaxPooling2D) (None, 12, None, 16) 0 convlayer[0][0]
permute_1 (Permute) (None, None, 12, 16) 0 mp[0][0]
reshape_1 (Reshape) (None, None, 192) 0 permute_1[0][0]
custom_lstm_extended_1 (CustomL (None, None, 60) 26160 reshape_1[0][0]
Custom LSTM is called like this:
CustomLSTMExtended(units=60, summarizeUnits=48, return_sequences=True, return_state=False, input_shape=(None, 192))(inner)
LSTM class:
self.summarizeUnits = summarizeUnits
self.summarizeLSTM = CuDNNLSTM(summarizeUnits, input_shape=(None, 16),
                               return_sequences=False, return_state=True)

cell = SummarizeLSTMCellExtended(self.summarizeLSTM, units,
                                 activation=activation,
                                 recurrent_activation=recurrent_activation,
                                 use_bias=use_bias,
                                 kernel_initializer=kernel_initializer,
                                 recurrent_initializer=recurrent_initializer,
                                 unit_forget_bias=unit_forget_bias,
                                 bias_initializer=bias_initializer,
                                 kernel_regularizer=kernel_regularizer,
                                 recurrent_regularizer=recurrent_regularizer,
                                 bias_regularizer=bias_regularizer,
                                 kernel_constraint=kernel_constraint,
                                 recurrent_constraint=recurrent_constraint,
                                 bias_constraint=bias_constraint,
                                 dropout=dropout,
                                 recurrent_dropout=recurrent_dropout,
                                 implementation=implementation)

RNN.__init__(self, cell,
             return_sequences=return_sequences,
             return_state=return_state,
             go_backwards=go_backwards,
             stateful=stateful,
             unroll=unroll,
             **kwargs)
Cell class:
def call(self, inputs, states, training=None):
    # cell
    reshaped = Reshape([12, 16])(inputs)
    state_h = self.summarizeLayer(reshaped)
    inputsx = state_h[0]
    return super(SummarizeLSTMCellExtended, self).call(inputsx, states, training)
I have done this using tf.reshape rather than the Keras Reshape layer.
The Keras Reshape layer doesn't want you to interfere with the "batch_size" dimension.
shape = Lambda(lambda x: tf.shape(x), output_shape=(4,))(inner)
..
..
inner = Lambda(lambda x: customreshape(x), output_shape=(None, 48))([inner, shape])
..

def customreshape(inputs):
    inner = inputs[0]
    shape = inputs[1]
    import tensorflow as tf2
    reshaped = tf2.reshape(inner, [shape[0], shape[1], 48])
    return reshaped
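As a possibly simpler alternative (my own sketch, not the approach above): tf.keras.layers.TimeDistributed applies an inner layer independently at each outer timestep, which collapses (None, None, 12, 16) to (None, None, 48) without a custom cell. Note that this does not carry the inner LSTM's cell state across outer timesteps, unlike the custom-cell approach, but it avoids the slow nested-LSTM call.

import tensorflow as tf

# (batch, outer_time, 12, 16) -> (batch, outer_time, 48) -> (batch, outer_time, 60)
x_in = tf.keras.layers.Input(shape=(None, 12, 16))
summarized = tf.keras.layers.TimeDistributed(tf.keras.layers.LSTM(48))(x_in)
outer = tf.keras.layers.LSTM(60, return_sequences=True)(summarized)

model = tf.keras.Model(x_in, outer)
model.summary()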