As you know, in a CNN only the convolution and batch-normalization layers have weights, and they are usually arranged like this: Conv - BN - ReLU - Conv - BN - ReLU.
But, as you can see below, the structure looks unusual.
conv2_block1_0_conv/kernel:0
conv2_block1_0_conv/bias:0
conv2_block1_3_conv/kernel:0
conv2_block1_3_conv/bias:0
conv2_block1_1_bn/gamma:0
conv2_block1_1_bn/beta:0
conv2_block1_1_bn/moving_mean:0
conv2_block1_1_bn/moving_variance:0
conv2_block1_3_bn/gamma:0
conv2_block1_3_bn/beta:0
conv2_block1_3_bn/moving_mean:0
conv2_block1_3_bn/moving_variance:0
You can reproduce this result with:
model = tf.keras.applications.ResNet50()
# The unusual ordering begins at index 18.
model.weights[18]
I recommend stepping through this in your IDE's debugger; it makes the structure easier to inspect.
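If you just want to reproduce the full listing above, here is a minimal sketch (assuming TensorFlow 2.x; the exact index where the ordering changes may shift slightly between versions):
import tensorflow as tf

model = tf.keras.applications.ResNet50()
# Print the variable names around index 18, where the conv/bn order looks "out of order".
for w in model.weights[18:30]:
    print(w.name)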
In the code below, ResNet50 uses a stack_fn function to create its layers:
def ResNet50():
    ...
    def stack_fn(x):
        x = stack1(x, 64, 3, stride1=1, name='conv2')
        x = stack1(x, 128, 4, name='conv3')
        x = stack1(x, 256, 6, name='conv4')
        return stack1(x, 512, 3, name='conv5')
    ...
In the code below, stack1 simplifies the construction of the repeated residual blocks:
def stack1(x, filters, blocks, stride1=2, name=None):
    x = block1(x, filters, stride=stride1, name=name + '_block1')
    for i in range(2, blocks + 1):
        x = block1(x, filters, conv_shortcut=False, name=name + '_block' + str(i))
    return x
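To make the naming concrete, here is a tiny, purely illustrative loop (not part of the Keras source) that prints the block names generated for the 'conv2' stage:
name, blocks = 'conv2', 3
print(name + '_block1  (conv_shortcut=True: adds the extra _0_conv / _0_bn pair)')
for i in range(2, blocks + 1):
    print(name + '_block' + str(i) + '  (identity shortcut)')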
In the code below, block1 builds the residual blocks of ResNet50:
def block1(x, filters, kernel_size=3, stride=1, conv_shortcut=True, name=None):
    bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1
    if conv_shortcut:
        shortcut = layers.Conv2D(
            4 * filters, 1, strides=stride, name=name + '_0_conv')(x)
        shortcut = layers.BatchNormalization(
            axis=bn_axis, epsilon=1.001e-5, name=name + '_0_bn')(shortcut)
    else:
        shortcut = x
    x = layers.Conv2D(filters, 1, strides=stride, name=name + '_1_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_1_bn')(x)
    x = layers.Activation('relu', name=name + '_1_relu')(x)
    x = layers.Conv2D(
        filters, kernel_size, padding='SAME', name=name + '_2_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_2_bn')(x)
    x = layers.Activation('relu', name=name + '_2_relu')(x)
    x = layers.Conv2D(4 * filters, 1, name=name + '_3_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_3_bn')(x)
    x = layers.Add(name=name + '_add')([shortcut, x])
    x = layers.Activation('relu', name=name + '_out')(x)
    return x
My question is: why does the ordering of the model instance's weights differ from the actual layer structure?
Update
Sorry, I might have misunderstood your question previously.
As shown in the listing below, there seem to be two contiguous conv layers, and I assume this is what you meant. However, they are in fact not contiguous.
ResNet has a branching (residual) structure, which means it is not sequential. But TensorFlow's summary() prints the layers sequentially, so note the last column: it shows which layer each layer is connected to. That is how TensorFlow represents parallel structures: by specifying what is connected to what.
For example, conv2_block1_0_conv is connected to pool1_pool, while conv2_block1_3_conv is connected to conv2_block1_2_relu.
This means that although they are printed side by side, they are not contiguous; they are parallel branches!
conv2_block1_0_conv and conv2_block1_0_bn are on the shortcut path
while conv2_block1_3_conv and conv2_block1_3_bn are on the residual path
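If you prefer to check this connectivity programmatically rather than reading the summary, here is a small sketch (assuming tf.keras; the printed tensor names may differ slightly by version):
import tensorflow as tf

model = tf.keras.applications.ResNet50()
# A layer's input tensor tells you which layer feeds it.
print(model.get_layer('conv2_block1_0_conv').input.name)  # comes from pool1_pool
print(model.get_layer('conv2_block1_3_conv').input.name)  # comes from conv2_block1_2_relu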
Please feel free to comment if you have more questions on this part, or open a new post if you have other questions.
model.weights returns the weights of a model (self-explanatory from the name).
Conv - BN - ReLU - Conv - BN - ReLU are layers.
Conv stands for Convolution, BN stands for Batch Normalization, and ReLU is the activation.
To get a list of layers, use model.layers (which returns a list of Layer objects). If you simply want to see the model structure, use model.summary() to print it.
For example, ResNet50().summary() gives (partial output)
odel: "resnet50"
__________________________________________________________________________________________________ Layer (type) Output Shape Param #
Connected to
================================================================================================== input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________ conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0
input_1[0][0]
__________________________________________________________________________________________________ conv1_conv (Conv2D) (None, 112, 112, 64) 9472
conv1_pad[0][0]
__________________________________________________________________________________________________ conv1_bn (BatchNormalization) (None, 112, 112, 64) 256
conv1_conv[0][0]
__________________________________________________________________________________________________ conv1_relu (Activation) (None, 112, 112, 64) 0
conv1_bn[0][0]
__________________________________________________________________________________________________ pool1_pad (ZeroPadding2D) (None, 114, 114, 64) 0
conv1_relu[0][0]
__________________________________________________________________________________________________ pool1_pool (MaxPooling2D) (None, 56, 56, 64) 0
pool1_pad[0][0]
__________________________________________________________________________________________________ conv2_block1_1_conv (Conv2D) (None, 56, 56, 64) 4160
pool1_pool[0][0]
__________________________________________________________________________________________________ conv2_block1_1_bn (BatchNormali (None, 56, 56, 64) 256
conv2_block1_1_conv[0][0]
__________________________________________________________________________________________________ conv2_block1_1_relu (Activation (None, 56, 56, 64) 0
conv2_block1_1_bn[0][0]
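If you would rather inspect the layers yourself instead of reading the printed summary, a quick sketch (assuming tf.keras):
import tensorflow as tf

model = tf.keras.applications.ResNet50()
# Layer objects in construction order; the variables in model.weights live inside these layers.
for layer in model.layers[:10]:
    print(layer.name, type(layer).__name__, layer.count_params())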
Related
I would like to apply Keras MobileNetV2 on images of size 39 x 39 to classify 3 classes. My images represent heat maps (e.g. which keys have been pressed on the keyboard). I think MobileNet was designed to work on images of size 224 x 224. I will not use transfer learning but will train the model from scratch.
To make MobileNet work on my images, I would like to replace the first three stride-2 convolutions with stride 1. I have the following code:
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base_model = MobileNetV2(weights=None, include_top=False,
                         input_shape=[39, 39, 3])
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
output_tensor = Dense(3, activation='softmax')(x)
cnn_model = Model(inputs=base_model.input, outputs=output_tensor)
opt = Adam(lr=learning_rate)
cnn_model.compile(loss='categorical_crossentropy',
                  optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
How can I replace the first three stride 2 convolutions with stride 1 without building MobileNet myself?
Here is one workaround for your need, though a more general approach is probably possible. Note that in MobileNetV2 there is only one standalone conv layer with strides 2; the remaining downsampling happens inside the inverted residual blocks. If you follow the source code, the first conv is defined as:
x = layers.Conv2D(
    first_block_filters,
    kernel_size=3,
    strides=(2, 2),
    padding='same',
    use_bias=False,
    name='Conv1')(img_input)
x = layers.BatchNormalization(
    axis=channel_axis, epsilon=1e-3, momentum=0.999, name='bn_Conv1')(x)
x = layers.ReLU(6., name='Conv1_relu')(x)
And the rest of the blocks are defined as follows
x = _inverted_res_block(
    x, filters=16, alpha=alpha, stride=1, expansion=1, block_id=0)
x = _inverted_res_block(
    x, filters=24, alpha=alpha, stride=2, expansion=6, block_id=1)
x = _inverted_res_block(
    x, filters=24, alpha=alpha, stride=1, expansion=6, block_id=2)
So here I will deal with the first conv, the one with strides=(2, 2). The idea is simple: we add a new layer in the right place of the built-in model and then remove the unwanted layer.
def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that rounding down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

alpha = 1.0
first_block_filters = _make_divisible(32 * alpha, 8)
inputLayer = tf.keras.Input(shape=(39, 39, 3), name="inputLayer")
input_conv = tf.keras.layers.Conv2D(
    first_block_filters,
    kernel_size=3,
    strides=(1, 1),
    padding='same',
    use_bias=False,
    name='Conv1_')(inputLayer)
The _make_divisible function above is taken directly from the source code. Now we feed this new layer into MobileNetV2 as its input tensor, so that it sits right before the first conv layer:
base_model = tf.keras.applications.MobileNetV2(weights=None,
                                               include_top=False,
                                               input_tensor=input_conv)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
output_tensor = Dense(3, activation='softmax')(x)
cnn_model = Model(inputs=base_model.input, outputs=output_tensor)
Now, if we observe:
for i, l in enumerate(cnn_model.layers):
    print(l.name, l.output_shape)
    if i == 8: break
inputLayer [(None, 39, 39, 3)]
Conv1_ (None, 39, 39, 32)
Conv1 (None, 20, 20, 32)
bn_Conv1 (None, 20, 20, 32)
Conv1_relu (None, 20, 20, 32)
expanded_conv_depthwise (None, 20, 20, 32)
expanded_conv_depthwise_BN (None, 20, 20, 32)
expanded_conv_depthwise_relu (None, 20, 20, 32)
expanded_conv_project (None, 20, 20, 16)
The layers named Conv1_ and Conv1 are the new layer (with strides = 1) and the old layer (with strides = 2), respectively. Now, as intended, we remove the Conv1 layer with strides = 2 as follows:
cnn_model._layers.pop(2)  # remove Conv1
for i, l in enumerate(cnn_model.layers):
    print(l.name, l.output_shape)
    if i == 8: break
inputLayer [(None, 39, 39, 3)]
Conv1_ (None, 39, 39, 32)
bn_Conv1 (None, 20, 20, 32)
Conv1_relu (None, 20, 20, 32)
expanded_conv_depthwise (None, 20, 20, 32)
expanded_conv_depthwise_BN (None, 20, 20, 32)
expanded_conv_depthwise_relu (None, 20, 20, 32)
expanded_conv_project (None, 20, 20, 16)
expanded_conv_project_BN (None, 20, 20, 16)
Now you have a cnn_model whose first conv layer uses strides = 1. However, in case you're wondering about this approach and its possible issues, please see my other answer related to this one: Remove first N layers from a Keras Model?
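As a quick sanity check on the inserted layer (continuing from the cnn_model built above; layer names as used in this answer):
# The replacement convolution should report strides of (1, 1) and keep the 39x39 resolution.
print(cnn_model.get_layer('Conv1_').strides)       # (1, 1)
print(cnn_model.get_layer('Conv1_').output_shape)  # (None, 39, 39, 32)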
Problem
I was trying to re-build a MobileNet model identical to the version provided by Keras Applications on TensorFlow v2.1.0.
However, no matter what I tried (i.e., Conv2D, SeparableConv2D, DepthwiseConv2D), the parameter count is so far off that the model starts allocating 100+ GB of RAM on the system.
The model summaries for the Keras version and my own version, along with the layer code for my version, can be found below under the Snippets section.
For the sake of simplicity, I am not using any width or resolution multiplier (or, let's say, both have the value 1.0).
Question
How can I achieve a parameter count as low as the Keras-provided version?
Snippets
Portion of keras mobilenet model summary
Model: "mobilenet_1.00_224"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0
_________________________________________________________________
conv1 (Conv2D) (None, 112, 112, 32) 864
_________________________________________________________________
conv1_bn (BatchNormalization (None, 112, 112, 32) 128
_________________________________________________________________
conv1_relu (ReLU) (None, 112, 112, 32) 0
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D) (None, 112, 112, 32) 288
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 112, 112, 32) 128
Portion of self created model summary
Model: "dummy_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
zero_padding2d (ZeroPadding2 (None, 225, 225, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 112, 112, 32) 896
_________________________________________________________________
batch_normalization (BatchNo (None, 112, 112, 32) 128
_________________________________________________________________
re_lu (ReLU) (None, 112, 112, 32) 0
_________________________________________________________________
depthwise_conv2d (DepthwiseC (None, 112, 112, 32) 32800
_________________________________________________________________
batch_normalization_1 (Batch (None, 112, 112, 32) 128
Self created model with layers
inputs = Input(shape=(224, 224, 3))
x = ZeroPadding2D(padding=((1, 0), (1, 0)))(inputs)
x = Conv2D(32, (3, 3), strides=(2, 2), padding="valid")(x)
x = BatchNormalization()(x)
x = ReLU()(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x, 2)
x = AveragePooling2D(pool_size=7)(x)
x = Flatten()(x)
x = Dense(10, activation="softmax")(x)
return Model(inputs=inputs, outputs=x, name="dummy_model")
Depthwise separable convolution
def depthwise_separable_convolution(self, input, strides=1):
    input_depth = input.shape[-1]
    output_depth = input_depth * 2
    x = DepthwiseConv2D(input_depth, 1, padding="same")(input)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = Conv2D(output_depth, 1, padding="same")(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x
Environment:
I'm using tf.keras (TensorFlow 1.14) on Google Colab, and my model architecture is MobileNetV2 1.00 224.
Problem:
I am trying (and failing) to attach a new layer, with a new output, to an existing layer that is not the normal output of my model, i.e. to make a branch earlier in MobileNetV2.
I want this new branch to be a regression output, but I don't want that output connected serially off the final embedding layer of MobileNet; I want it attached to a much earlier stage (which one, I'm not sure; I'm experimenting). Basically a branch with its own output, plus the normal, pre-trained ImageNet embedding output.
Grab MobileNet V2 as base_model:
base_model = tf.keras.applications.MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False
Make my layers from base_model and make my new outputs.
# get layers from mobilenet base layer
mobilenet_input = base_model.get_layer('input_1')
mobilenet_output = base_model.get_layer('out_relu')
# add our average pooling layer to our MobileNetV2 output like all of our other classifiers so we split our graph on the same nodes
out_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name='embedding_pooling')(mobilenet_output.output)
out_global_pooling.trainable = False
# Our new branch and outputs for the branch
expanded_conv_depthwise_BN = base_model.get_layer('expanded_conv_depthwise_BN')
regression_dropout = tf.keras.layers.Dropout(0.5) (expanded_conv_depthwise_BN.output)
regression_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(regression_dropout)
new_regression_output = tf.keras.layers.Dense(num_labels, activation = 'sigmoid', name = "cinemanet_output") (regression_global_pooling)
This appears to be fine, and I can even make my model via the functional API:
model = tf.keras.Model(inputs=mobilenet_input.input, outputs=[out_global_pooling, new_regression_output])
My Training Code
My dataset is a set of 30 floats (10 RGB triplets) that I want to predict from an input image. My dataset works when training a plain sequential model, but fails when I try to train this model.
ops.reset_default_graph()
tf.keras.backend.set_learning_phase(1) # 0 testing, 1 training mode
# preview contents of CSV to verify things are sane
import csv
import math
def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())

def csvheaderrow(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        return next(reader, None)
# !head {label_file}
NUM_IMAGES = ( lenopenreadlines(label_file) - 1) # remove header
COLUMN_NAMES = csvheaderrow(label_file)
LABEL_NAMES = COLUMN_NAMES[:]
LABEL_NAMES.remove("filepath")
ALL_LABELS.extend(LABEL_NAMES)
# make our data set
BATCH_SIZE = 256
NUM_EPOCHS = 50
FILE_PATH = ["filepath"]
LABELS_TO_PRINT = ' '.join(LABEL_NAMES)
print("Label contains: " + str(NUM_IMAGES) + " images")
print("Label Are: " + LABELS_TO_PRINT)
print("Creating Data Set From " + label_file)
csv_dataset = get_dataset(label_file, BATCH_SIZE, NUM_EPOCHS, COLUMN_NAMES)
#make a new data set from our csv by mapping every value to the above function
split_dataset = csv_dataset.map(split_csv_to_path_and_labels)
# make a new datas set that loads our images from the first path
image_and_labels_ds = split_dataset.map(load_and_preprocess_image_batch, num_parallel_calls=AUTOTUNE)
# update our image floating point range to match -1, 1
ds = image_and_labels_ds.map(change_range)
print(image_and_labels_ds)
model = build_model(LABEL_NAMES, use_masked_loss)
#split the final data set into train / validation splits to use for our model.
DATASET_SIZE = NUM_IMAGES
ds = ds.repeat()
steps_per_epoch = int(math.floor(DATASET_SIZE/BATCH_SIZE))
history = model.fit(ds, epochs=NUM_EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[TensorBoardColabCallback(tbc)])
print(history)
# results = model.evaluate(test_dataset)
# print('test loss, test acc:', results)
export_model(model, model_name, LABEL_NAMES, date)
ValueError: Error when checking model target:
the list of Numpy arrays that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1 arrays:
[<tf.Tensor 'IteratorGetNext:1' shape=(?, 30) dtype=float32>]
If I instead use a sequential model and naively train my regression task against the final output of MobileNet (rather than the branch), training works fine (although I get poor results).
My model summary appears to tell me things are wired as I expect: my dropout is connected to expanded_conv_depthwise_BN, my regression pooling is connected to my dropout, and my output layer appears in the summary connected to my regression pooling.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
Conv1 (Conv2D) (None, 112, 112, 32) 864 Conv1_pad[0][0]
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization) (None, 112, 112, 32) 128 Conv1[0][0]
__________________________________________________________________________________________________
Conv1_relu (ReLU) (None, 112, 112, 32) 0 bn_Conv1[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise (Depthw (None, 112, 112, 32) 288 Conv1_relu[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise_BN (Bat (None, 112, 112, 32) 128 expanded_conv_depthwise[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise_relu (R (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
__________________________________________________________________________________________________
expanded_conv_project (Conv2D) (None, 112, 112, 16) 512 expanded_conv_depthwise_relu[0][0
__________________________________________________________________________________________
< snip for brevity >
________
block_16_project (Conv2D) (None, 7, 7, 320) 307200 block_16_depthwise_relu[0][0]
__________________________________________________________________________________________________
block_16_project_BN (BatchNorma (None, 7, 7, 320) 1280 block_16_project[0][0]
__________________________________________________________________________________________________
Conv_1 (Conv2D) (None, 7, 7, 1280) 409600 block_16_project_BN[0][0]
__________________________________________________________________________________________________
Conv_1_bn (BatchNormalization) (None, 7, 7, 1280) 5120 Conv_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
__________________________________________________________________________________________________
out_relu (ReLU) (None, 7, 7, 1280) 0 Conv_1_bn[0][0]
__________________________________________________________________________________________________
regression_pooling (GlobalAvera (None, 32) 0 dropout[0][0]
__________________________________________________________________________________________________
embedding_pooling (GlobalAverag (None, 1280) 0 out_relu[0][0]
__________________________________________________________________________________________________
cinemanet_output (Dense) (None, 30) 990 regression_pooling[0][0]
==================================================================================================
Total params: 2,258,974
Trainable params: 990
Non-trainable params: 2,257,984
It looks like you are setting things up correctly, but your training dataset doesn't include tensors for both outputs. If you only want to train the new output, you can provide dummy tensors (or even real training data) for the other one while using a loss weight of 0 to prevent the parameters from updating. That should also prevent any parameters that are not directly "upstream" of the new output layer from updating during training.
When compiling your model, use the loss_weights argument to pass the weights either as a list (e.g., loss_weights=[0, 1], matching the order of the outputs) or as a dictionary keyed by output layer name (e.g., loss_weights={'embedding_pooling': 0, 'cinemanet_output': 1}).
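A minimal sketch of what that compile call could look like, using the output layer names from the summary above (the losses themselves are placeholders you would adapt to your task):
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # One loss per output; the embedding output gets weight 0 so nothing upstream of it updates.
    loss={'embedding_pooling': 'mse', 'cinemanet_output': 'mse'},
    loss_weights={'embedding_pooling': 0.0, 'cinemanet_output': 1.0})
You would then also supply a (dummy) target for 'embedding_pooling' in your dataset so that each batch yields two target tensors.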
After the permute layer, the dimensions become (None, None, 12, 16).
I want to summarize the last two dimensions with an LSTM (48 units) with input_shape (12, 16),
so that the overall dimension becomes (None, None, 48).
Currently I have a workaround with a custom LSTM and LSTM cell; however, it's very slow, since I have used another LSTM within the cell, etc.
What I would want to have is this:
(None, None, 12, 16)
(None, None, 48)
(None, None, 60)
The last two transformations are currently done in the custom LSTM; is there a way to separate them?
What's the proper way of doing this?
Can we create different (or more than one) LSTMs for the cells, sharing the same weights but keeping different cell states?
Would you give me some direction?
inputs (InputLayer) (None, 36, None, 1) 0
convlayer (Conv2D) (None, 36, None, 16) 160 inputs[0][0]
mp (MaxPooling2D) (None, 12, None, 16) 0 convlayer[0][0]
permute_1 (Permute) (None, None, 12, 16) 0 mp[0][0]
reshape_1 (Reshape) (None, None, 192) 0 permute_1[0][0]
custom_lstm_extended_1 (CustomL (None, None, 60) 26160 reshape_1[0][0]
Custom LSTM is called like this:
CustomLSTMExtended(units=60, summarizeUnits=48, return_sequences=True, return_state=False, input_shape=(None, 192))(inner)
LSTM class:
self.summarizeUnits = summarizeUnits
self.summarizeLSTM = CuDNNLSTM(summarizeUnits, input_shape=(None, 16),
                               return_sequences=False, return_state=True)
cell = SummarizeLSTMCellExtended(self.summarizeLSTM, units,
                                 activation=activation,
                                 recurrent_activation=recurrent_activation,
                                 use_bias=use_bias,
                                 kernel_initializer=kernel_initializer,
                                 recurrent_initializer=recurrent_initializer,
                                 unit_forget_bias=unit_forget_bias,
                                 bias_initializer=bias_initializer,
                                 kernel_regularizer=kernel_regularizer,
                                 recurrent_regularizer=recurrent_regularizer,
                                 bias_regularizer=bias_regularizer,
                                 kernel_constraint=kernel_constraint,
                                 recurrent_constraint=recurrent_constraint,
                                 bias_constraint=bias_constraint,
                                 dropout=dropout,
                                 recurrent_dropout=recurrent_dropout,
                                 implementation=implementation)
RNN.__init__(self, cell,
             return_sequences=return_sequences,
             return_state=return_state,
             go_backwards=go_backwards,
             stateful=stateful,
             unroll=unroll,
             **kwargs)
Cell class:
def call(self, inputs, states, training=None):
    # cell
    reshaped = Reshape([12, 16])(inputs)
    state_h = self.summarizeLayer(reshaped)
    inputsx = state_h[0]
    return super(SummarizeLSTMCellExtended, self).call(inputsx, states, training)
I have done this using tf.reshape rather than the Keras Reshape layer, because the Keras Reshape layer doesn't want you to interfere with the "batch_size" dimension:
shape = Lambda(lambda x: tf.shape(x), output_shape=(4,))(inner)
...
inner = Lambda(lambda x: customreshape(x), output_shape=(None, 48))([inner, shape])
...
def customreshape(inputs):
    inner = inputs[0]
    shape = inputs[1]
    import tensorflow as tf2
    reshaped = tf2.reshape(inner, [shape[0], shape[1], 48])
    return reshaped
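For reference, here is a self-contained sketch of the same Lambda + tf.reshape pattern on a toy input (the shapes are illustrative; the last two axes are simply merged to 12*16 = 192 here, whereas in the code above the 48 comes from the summarizing LSTM, which is omitted):
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(None, 12, 16))           # (batch, time, 12, 16), batch and time unknown

# Capture the dynamic shape as a tensor, as above.
shape = layers.Lambda(lambda x: tf.shape(x), output_shape=(4,))(inp)

def merge_last_two(inputs):
    inner, shp = inputs
    # Keep the dynamic batch/time axes, merge the trailing 12 x 16 into 192.
    return tf.reshape(inner, [shp[0], shp[1], 12 * 16])

merged = layers.Lambda(merge_last_two, output_shape=(None, 192))([inp, shape])
print(Model(inp, merged).output_shape)             # (None, None, 192)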
I'm basically following this guide to build a convolutional autoencoder with the TensorFlow backend. The main difference from the guide is that my data consists of 257x257 grayscale images. The following code:
TRAIN_FOLDER = 'data/OIRDS_gray/'
EPOCHS = 10
SHAPE = (257,257,1)
FILELIST = os.listdir(TRAIN_FOLDER)
def loadTrainData():
    train_data = []
    for fn in FILELIST:
        img = misc.imread(TRAIN_FOLDER + fn)
        img = np.reshape(img, (len(img[0,:]), len(img[:,0]), SHAPE[2]))
        if img.shape != SHAPE:
            print "image shape mismatch!"
            print "Expected: "
            print SHAPE
            print "but got:"
            print img.shape
            sys.exit()
        train_data.append(img)
    train_data = np.array(train_data)
    train_data = train_data.astype('float32') / 255
    return np.array(train_data)
def createModel():
    input_img = Input(shape=SHAPE)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    return Model(input_img, decoded)
x_train = loadTrainData()
autoencoder = createModel()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
print x_train.shape
autoencoder.summary()
# Run the network
autoencoder.fit(x_train, x_train,
                epochs=EPOCHS,
                batch_size=128,
                shuffle=True)
gives me an error:
ValueError: Error when checking target: expected conv2d_7 to have shape (None, 260, 260, 1) but got array with shape (859, 257, 257, 1)
As you can see, this is not the standard Theano/TensorFlow backend dim-ordering problem, but something else. I checked that my data is what it's supposed to be with print x_train.shape:
(859, 257, 257, 1)
And I also ran autoencoder.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 257, 257, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 257, 257, 16) 160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 129, 129, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 129, 129, 8) 1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 65, 65, 8) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 65, 65, 8) 584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 33, 33, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 33, 33, 8) 584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 66, 66, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 66, 66, 8) 584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 132, 132, 8) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 132, 132, 16) 1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 264, 264, 16) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 264, 264, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
Now I'm not exactly sure where the problem is, but it does look like things go wrong around conv2d_6 (the Param # is too high). I do know how CAEs work in principle, but I'm not that familiar with the exact technical details yet, and I have tried to solve this mainly by messing with the deconvolution padding (using valid instead of same). The closest I got to matching dimensions was (None, 258, 258, 1), which I achieved by blindly trying different combinations of padding on the deconvolution side, not really a smart way to solve a problem...
At this point I'm at a loss, and any help would be appreciated.
Since your input and output data are the same, your final output shape should be the same as the input shape.
The last convolutional layer should have output shape (None, 257, 257, 1).
The problem happens because your image size is an odd number (257).
When you apply MaxPooling, it divides the size by two and has to round either up or down (here it rounds up: see the 129, coming from 257/2 = 128.5).
Later, when you do UpSampling, the model doesn't know the current dimensions were rounded; it simply doubles the value. Happening three times in sequence, this adds 7 pixels to the final result.
You could try either cropping the result or padding the input.
I usually work with images of compatible sizes. If you have 3 MaxPooling layers, your size should be a multiple of 2³; the nearest one here is 264.
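The arithmetic is easy to sanity-check with a few lines of plain Python (no Keras needed):
size = 257
for _ in range(3):             # three 2x2 max-poolings with padding='same' round up
    size = (size + 1) // 2     # 257 -> 129 -> 65 -> 33
for _ in range(3):             # three 2x2 up-samplings simply double the size
    size *= 2                  # 33 -> 66 -> 132 -> 264
print(size)                    # 264, i.e. 7 pixels more than the original 257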
Padding the input data directly:
x_train = numpy.lib.pad(x_train,((0,0),(3,4),(3,4),(0,0)),mode='constant')
This will require that SHAPE=(264,264,1)
Padding inside the model:
import keras.backend as K
input_img = Input(shape=SHAPE)
x = Lambda(lambda x: K.spatial_2d_padding(x, padding=((3, 4), (3, 4))), output_shape=(264,264,1))(input_img)
Cropping the results:
This will be required in any case where you do not change the actual data (numpy array) directly.
decoded = Lambda(lambda x: x[:,3:-4,3:-4,:], output_shape=SHAPE)(x)
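For completeness, here is a sketch of how the cropping variant would slot into the createModel function above (assuming Lambda is imported from keras.layers; the 3:-4 slice removes the 3+4 extra pixels again):
# ... inside createModel(), replacing the last two lines:
x = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)            # (None, 264, 264, 1)
decoded = Lambda(lambda t: t[:, 3:-4, 3:-4, :], output_shape=SHAPE)(x)    # back to (None, 257, 257, 1)
return Model(input_img, decoded)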