How to save a part of a network? - tensorflow

I have made an autoencoder consisting of an encoder part and a decoder part.
I have managed to separate the encoder from the full network, but I am having some trouble with the decoder part.
This part works:
encoder = tf.keras.Model(inputs=autoencoder.input, outputs=autoencoder.layers[5].output)
This part, however, doesn't:
decoder = tf.keras.Model(inputs=autoencoder.layers[6].input, outputs=autoencoder.output)
The error:
W0514 14:57:48.965506 78976 network.py:1619] Model inputs must come from tf.keras.Input (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to "model_15" was not an Input tensor, it was generated by layer flatten.
Note that input tensors are instantiated via tensor = tf.keras.Input(shape).
The tensor that caused the issue was: flatten/Reshape:0
Any ideas what to try?
Thanks,
/mikael
EDIT:
for kruxx
autoencoder = tf.keras.models.Sequential()
# Encoder Layers
autoencoder.add(tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=x_train_tensor.shape[1:]))
autoencoder.add(tf.keras.layers.MaxPooling2D((2, 2), padding='same'))
autoencoder.add(tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(tf.keras.layers.MaxPooling2D((2, 2), padding='same'))
autoencoder.add(tf.keras.layers.Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
# Flatten encoding for visualization
autoencoder.add(tf.keras.layers.Flatten())
autoencoder.add(tf.keras.layers.Reshape((4, 4, 8)))
# Decoder Layers
autoencoder.add(tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(tf.keras.layers.UpSampling2D((2, 2)))
autoencoder.add(tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(tf.keras.layers.UpSampling2D((2, 2)))
autoencoder.add(tf.keras.layers.Conv2D(16, (3, 3), activation='relu'))
autoencoder.add(tf.keras.layers.UpSampling2D((2, 2)))
autoencoder.add(tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same'))
> Model: "sequential"
> _________________________________________________________________
> Layer (type)                 Output Shape              Param #
> =================================================================
> conv2d (Conv2D)              (None, 28, 28, 16)        160
> _________________________________________________________________
> max_pooling2d (MaxPooling2D) (None, 14, 14, 16)        0
> _________________________________________________________________
> conv2d_1 (Conv2D)            (None, 14, 14, 8)         1160
> _________________________________________________________________
> max_pooling2d_1 (MaxPooling2 (None, 7, 7, 8)           0
> _________________________________________________________________
> conv2d_2 (Conv2D)            (None, 4, 4, 8)           584
> _________________________________________________________________
> flatten (Flatten)            (None, 128)               0
> _________________________________________________________________
> reshape (Reshape)            (None, 4, 4, 8)           0
> _________________________________________________________________
> conv2d_3 (Conv2D)            (None, 4, 4, 8)           584
> _________________________________________________________________
> up_sampling2d (UpSampling2D) (None, 8, 8, 8)           0
> _________________________________________________________________
> conv2d_4 (Conv2D)            (None, 8, 8, 8)           584
> _________________________________________________________________
> up_sampling2d_1 (UpSampling2 (None, 16, 16, 8)         0
> _________________________________________________________________
> conv2d_5 (Conv2D)            (None, 14, 14, 16)        1168
> _________________________________________________________________
> up_sampling2d_2 (UpSampling2 (None, 28, 28, 16)        0
> _________________________________________________________________
> conv2d_6 (Conv2D)            (None, 28, 28, 1)         145
> =================================================================
> Total params: 4,385
> Trainable params: 4,385
> Non-trainable params: 0
> _________________________________________________________________

I would approach the problem the other way:
# Encoder model:
encoder_input = Input(...)
# Encoder Hidden Layers
encoded = Dense()(...)
encoder_model = Model(inputs=[encoder_input], outputs=encoded)
# Decoder model:
decoder_input = Input(...)
# Decoder Hidden Layers
decoded = Dense()(...)
decoder_model = Model(inputs=[decoder_input], outputs=decoded)
And then the autoencoder can be defined by composing the two:
autoencoder = Model(inputs=[encoder_input], outputs=decoder_model(encoder_model(encoder_input)))
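A minimal runnable sketch of that approach, applied to the convolutional architecture from the question (layer arguments taken from the EDIT; the 28x28x1 input shape is an assumption that matches the summary above):

```python
import numpy as np
import tensorflow as tf

# Encoder model: same layers as in the question, up to and including Flatten.
encoder_input = tf.keras.Input(shape=(28, 28, 1))  # assumed input shape
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoder_input)
x = tf.keras.layers.MaxPooling2D((2, 2), padding='same')(x)
x = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding='same')(x)
x = tf.keras.layers.Conv2D(8, (3, 3), strides=(2, 2), activation='relu', padding='same')(x)
encoded = tf.keras.layers.Flatten()(x)             # (None, 128)
encoder_model = tf.keras.Model(encoder_input, encoded, name='encoder')

# Decoder model: gets its own Input of the encoding size, so it is a
# standalone Model and never references the encoder's internal tensors.
decoder_input = tf.keras.Input(shape=(128,))
y = tf.keras.layers.Reshape((4, 4, 8))(decoder_input)
y = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(y)
y = tf.keras.layers.UpSampling2D((2, 2))(y)
y = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(y)
y = tf.keras.layers.UpSampling2D((2, 2))(y)
y = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')(y)
y = tf.keras.layers.UpSampling2D((2, 2))(y)
decoded = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(y)
decoder_model = tf.keras.Model(decoder_input, decoded, name='decoder')

# The full autoencoder is the composition of the two sub-models.
autoencoder = tf.keras.Model(encoder_input,
                             decoder_model(encoder_model(encoder_input)))
```

Because `encoder_model` and `decoder_model` are built around their own `tf.keras.Input` tensors, each can be used (and saved) independently after training, which avoids the "inputs must come from tf.keras.Input" error entirely.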

Related

How to merge 2 trained models in Keras?

Good evening everyone,
I have 5 classes with 2000 images each. I built 2 models with different model names, and this is my model code:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation=tf.nn.softmax)
], name="Model1")
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_images, train_labels,
                    batch_size=128, epochs=30, validation_split=0.2)
model.save('f3_1st_model_seg.h5')
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation=tf.nn.softmax)
], name="Model2")
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_images, train_labels,
                    batch_size=128, epochs=30, validation_split=0.2)
model.save('f3_2nd_model_seg.h5')
Then I used this code to merge the 2 models:
input_shape = [150, 150, 3]
model = keras.models.load_model('1st_model_seg.h5')
model.summary()
Layer (type)                   Output Shape           Param #
=================================================================
conv2d (Conv2D)                (None, 148, 148, 32)   896
max_pooling2d (MaxPooling2D)   (None, 74, 74, 32)     0
conv2d_1 (Conv2D)              (None, 72, 72, 32)     9248
max_pooling2d_1 (MaxPooling2D) (None, 36, 36, 32)     0
conv2d_2 (Conv2D)              (None, 34, 34, 64)     18496
max_pooling2d_2 (MaxPooling2D) (None, 17, 17, 64)     0
conv2d_3 (Conv2D)              (None, 15, 15, 128)    73856
max_pooling2d_3 (MaxPooling2D) (None, 7, 7, 128)      0
flatten (Flatten)              (None, 6272)           0
dense (Dense)                  (None, 5)              31365
=================================================================
Total params: 133,861
Trainable params: 133,861
Non-trainable params: 0
model2 = keras.models.load_model('2nd_model_seg.h5')
model2.summary()
Layer (type)                   Output Shape           Param #
=================================================================
conv2d (Conv2D)                (None, 148, 148, 32)   896
max_pooling2d (MaxPooling2D)   (None, 74, 74, 32)     0
conv2d_1 (Conv2D)              (None, 72, 72, 32)     9248
max_pooling2d_1 (MaxPooling2D) (None, 36, 36, 32)     0
conv2d_2 (Conv2D)              (None, 34, 34, 64)     18496
max_pooling2d_2 (MaxPooling2D) (None, 17, 17, 64)     0
conv2d_3 (Conv2D)              (None, 15, 15, 128)    73856
max_pooling2d_3 (MaxPooling2D) (None, 7, 7, 128)      0
flatten (Flatten)              (None, 6272)           0
dense (Dense)                  (None, 5)              31365
=================================================================
Total params: 133,861
Trainable params: 133,861
Non-trainable params: 0
def concat_horizontal(models, input_shape):
    models_count = len(models)
    hidden = []
    input = tf.keras.layers.Input(shape=input_shape)
    for i in range(models_count):
        hidden.append(models[i](input))
    output = tf.keras.layers.concatenate(hidden)
    model = tf.keras.Model(inputs=input, outputs=output)
    return model
new_model = concat_horizontal([model, model2], input_shape)
new_model.save('f1_1st_merged_seg.h5')
new_model.summary()
Layer (type)               Output Shape            Param #   Connected to
==================================================================================================
input_1 (InputLayer)       [(None, 150, 150, 3)]   0         []
model1 (Sequential)        (None, 5)               133861    ['input_1[0][0]']
model2 (Sequential)        (None, 5)               133861    ['input_1[0][0]']
concatenate (Concatenate)  (None, 10)              0         ['model1[0][0]', 'model2[0][0]']
==================================================================================================
Total params: 267,722
Trainable params: 267,722
Non-trainable params: 0
So after I tested the merged model, I found some images getting classes 7 and 9, although I have only 5 classes. This is my code for prediction:
class_names = ['A', 'B', 'C', 'D', 'E']
for img in os.listdir(path):
    # predicting images
    img2 = tf.keras.preprocessing.image.load_img(
        os.path.join(path, img), target_size=(150, 150))
    x = tf.keras.preprocessing.image.img_to_array(img2)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = np.argmax(model.predict(images), axis=-1)
    y_out = class_names[classes[0]]
I got this error
y_out = class_names[classes[0]]
IndexError: list index out of range
In this case it could even have been done with the Sequential approach. Look: you are concatenating two output layers with 5 columns each, which raises the number of classes from 5 to 10. Instead, define the two models only up to the layer before the output (the Flatten layer as the last layer of both models), then define the final model with the input layer, the two branch models, the concatenate layer, and a single output layer with five units and a softmax activation.
So remove the output layer
tf.keras.layers.Dense(5, activation=tf.nn.softmax)
from those two models, and add it as a single layer after the concatenate layer, like this:
def concat_horizontal(models, input_shape):
    models_count = len(models)
    hidden = []
    input = tf.keras.layers.Input(shape=input_shape)
    for i in range(models_count):
        hidden.append(models[i](input))
    output = tf.keras.layers.concatenate(hidden)
    output = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(output)
    model = tf.keras.Model(inputs=input, outputs=output)
    return model
But note that for cases like this it would be better to define the branch models with the functional API.
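The out-of-range predictions can be reproduced with a small numpy sketch (the probability values below are made up for illustration):

```python
import numpy as np

# Hypothetical softmax outputs of the two 5-class models for one image.
p1 = np.array([0.10, 0.10, 0.10, 0.10, 0.60])
p2 = np.array([0.05, 0.05, 0.05, 0.05, 0.80])

# The Concatenate layer glues them into one 10-column vector, so argmax
# can return indices 5..9 that do not exist in the 5-entry class_names list.
merged = np.concatenate([p1, p2])
print(merged.shape)       # (10,)
print(np.argmax(merged))  # 9
```

Index 9 is just "class 4 of the second model" shifted by 5, which is exactly why `class_names[classes[0]]` raises `IndexError`.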

Is this architecture an autoencoder?

I want to create an autoencoder. I built this architecture and it works, but I want to know whether it is an autoencoder architecture.
## Encoder
layer = layers.Conv2D(16, (3, 3), activation="relu", padding="same",data_format = 'channels_first')(input)
layer = layers.MaxPooling2D((2, 2), padding="same",data_format = 'channels_first')(layer)
layer = layers.Conv2D(32, (3, 3), activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.MaxPooling2D((2, 2), padding="same",data_format = 'channels_first')(layer)
## Decoder
layer = layers.Conv2DTranspose(16, (3, 3), strides=2, activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.UpSampling2D((2,2))(layer)
layer = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.UpSampling2D((2,2))(layer)
#layer = layers.UpSampling2D((2,2))(layer)
layer = layers.Flatten()(layer)
dense = layers.Dense(784, activation="sigmoid")
output = dense(layer)
There are some problems in your code:
You need an input layer to your model if you are using functional:
input = layers.Input(shape=(3, 192, 192))
In an autoencoder, the output of your model needs to have the same dimensions as the input. However, in your model your output is a dense vector (1D), while your input is obviously at least 2D (or 3D if you have channels, like in images).
You have specified the argument data_format = 'channels_first', which means that your input tensor has the channel dimension in position 0. For example, if your input is an RGB image, it has shape (color_channel, width, height) instead of the more common (width, height, color_channel). That is OK, but 1) make sure your images really have channels first, and 2) you need to pass the same argument to your upsampling layers.
With a couple of changes, the model looks like this:
## Encoder
input = layers.Input(shape=(3, 192, 192))
layer = layers.Conv2D(16, (3, 3), activation="relu", padding="same",data_format = 'channels_first')(input)
layer = layers.MaxPooling2D((2, 2), padding="same",data_format = 'channels_first')(layer)
layer = layers.Conv2D(32, (3, 3), activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.MaxPooling2D((2, 2), padding="same",data_format = 'channels_first')(layer)
## Decoder
layer = layers.Conv2DTranspose(16, (3, 3), strides=1, activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.UpSampling2D((2,2), data_format='channels_first')(layer)
layer = layers.Conv2DTranspose(32, (3, 3), strides=1, activation="relu", padding="same",data_format = 'channels_first')(layer)
layer = layers.UpSampling2D((2,2), data_format='channels_first')(layer)
output = layers.Conv2DTranspose(3, (3, 3), strides=1, activation="relu", padding="same",data_format = 'channels_first')(layer)
model = tf.keras.Model(inputs=input, outputs=output)
model.summary()
Model: "model_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 3, 192, 192)] 0
_________________________________________________________________
conv2d_19 (Conv2D) (None, 16, 192, 192) 448
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 16, 96, 96) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 32, 96, 96) 4640
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 32, 48, 48) 0
_________________________________________________________________
conv2d_transpose_19 (Conv2DT (None, 16, 48, 48) 4624
_________________________________________________________________
up_sampling2d_17 (UpSampling (None, 16, 96, 96) 0
_________________________________________________________________
conv2d_transpose_20 (Conv2DT (None, 32, 96, 96) 4640
_________________________________________________________________
up_sampling2d_18 (UpSampling (None, 32, 192, 192) 0
_________________________________________________________________
conv2d_transpose_21 (Conv2DT (None, 3, 192, 192) 867
=================================================================
Total params: 15,219
Trainable params: 15,219
Non-trainable params: 0
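As a quick sanity check on the fixed model: the Conv2D/Conv2DTranspose layers all use padding="same" with stride 1, so only the pooling and upsampling layers change the spatial size, and the dimensions can be traced by hand:

```python
# Shape bookkeeping for the fixed model (192x192 input, channels_first):
size = 192
for _ in range(2):   # two MaxPooling2D((2, 2)) layers
    size //= 2       # 192 -> 96 -> 48
for _ in range(2):   # two UpSampling2D((2, 2)) layers
    size *= 2        # 48 -> 96 -> 192
print(size)  # 192: output spatial dims match the input, as an autoencoder requires
```

This round trip only works because 192 is divisible by 2 twice; with an odd input size the pooling would round and the output would no longer match (see the last question in this page for exactly that problem).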

Error: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)

I'm trying to train a neural network on a dataset for liveness anti-spoofing. I have some videos in two folders named genuine and fake. I have extracted 10 frames from each video and saved them in two folders with the aforementioned names under a new directory training.
--/training/
----/genuine/ # contains 10 frames * 300 videos = 3000 images
----/fake/ # contains 10 frames * 800 videos = 8000 images
I designed the following 3D ConvNet using Keras as my first try, but after running it, it throws the following exception:
from keras.preprocessing.image import ImageDataGenerator
from keras import Model, optimizers, activations, losses, regularizers, backend, Sequential
from keras.layers import Dense, MaxPooling3D, AveragePooling3D, Conv3D, Input, Flatten, BatchNormalization
BATCH_SIZE = 10
TARGET_SIZE = (224, 224)
train_datagen = ImageDataGenerator(rescale=1.0/255,
                                   data_format='channels_last',
                                   validation_split=0.2,
                                   shear_range=0.0,
                                   zoom_range=0,
                                   horizontal_flip=False,
                                   featurewise_center=False,
                                   featurewise_std_normalization=False,
                                   width_shift_range=False,
                                   height_shift_range=False)
train_generator = train_datagen.flow_from_directory("./training/",
                                                    target_size=TARGET_SIZE,
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='binary',
                                                    shuffle=False,
                                                    subset='training')
validation_generator = train_datagen.flow_from_directory("./training/",
                                                         target_size=TARGET_SIZE,
                                                         batch_size=BATCH_SIZE,
                                                         class_mode='binary',
                                                         shuffle=False,
                                                         subset='validation')
SHAPE = (10, 224, 224, 3)
model = Sequential()
model.add(Conv3D(filters=128, kernel_size=(1, 3, 3), data_format='channels_last', activation='relu', input_shape=(10, 224, 224, 3)))
model.add(MaxPooling3D(data_format='channels_last', pool_size=(1, 2, 2)))
model.add(Conv3D(filters=64, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(AveragePooling3D())
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer=optimizers.adam(), loss=losses.binary_crossentropy, metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=train_generator.samples/train_generator.batch_size, epochs=5, validation_data=validation_generator, validation_steps=validation_generator.samples/validation_generator.batch_size)
model.save('3d.h5')
Here is the Error:
ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)
And this is the output of model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv3d_1 (Conv3D) (None, 10, 222, 222, 128) 3584
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 10, 111, 111, 128) 0
_________________________________________________________________
conv3d_2 (Conv3D) (None, 9, 109, 109, 64) 147520
_________________________________________________________________
max_pooling3d_2 (MaxPooling3 (None, 9, 54, 54, 64) 0
_________________________________________________________________
conv3d_3 (Conv3D) (None, 8, 52, 52, 32) 36896
_________________________________________________________________
conv3d_4 (Conv3D) (None, 7, 50, 50, 32) 18464
_________________________________________________________________
max_pooling3d_3 (MaxPooling3 (None, 7, 25, 25, 32) 0
_________________________________________________________________
conv3d_5 (Conv3D) (None, 6, 23, 23, 16) 9232
_________________________________________________________________
conv3d_6 (Conv3D) (None, 5, 21, 21, 16) 4624
_________________________________________________________________
average_pooling3d_1 (Average (None, 2, 10, 10, 16) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 2, 10, 10, 16) 64
_________________________________________________________________
dense_1 (Dense) (None, 2, 10, 10, 32) 544
_________________________________________________________________
dense_2 (Dense) (None, 2, 10, 10, 1) 33
=================================================================
Total params: 220,961
Trainable params: 220,929
Non-trainable params: 32
__________________________________________________________
I'd appreciate any help fixing the exception. By the way, I'm using TensorFlow as the backend, if that helps to solve the problem.
As @thushv89 mentioned in the comments, Keras has no built-in video generator, which causes a lot of problems for those who work with big video datasets. Therefore, I wrote a simple VideoDataGenerator which works almost as simply as ImageDataGenerator. The script can be found here on my GitHub in case someone needs it in the future.
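Since that script isn't shown here, a minimal numpy sketch of the core idea might look like this: ImageDataGenerator yields 4D batches of single frames, but Conv3D needs 5D input, so consecutive frames must be regrouped into clips. The `stack_frames` helper and its layout assumptions (frames of one video stored consecutively) are hypothetical:

```python
import numpy as np

def stack_frames(frames, frames_per_video=10):
    """Group consecutive frames into 5D clips: (clips, frames, H, W, C).

    Assumes `frames` is a 4D array of shape (N, H, W, C) where the frames of
    each video are consecutive and N is a multiple of frames_per_video.
    """
    n, h, w, c = frames.shape
    return frames.reshape(n // frames_per_video, frames_per_video, h, w, c)

# Dummy data: 3 videos x 10 frames of 224x224 RGB, as in the question.
frames = np.zeros((30, 224, 224, 3), dtype='float32')
clips = stack_frames(frames)
print(clips.shape)  # (3, 10, 224, 224, 3) -- 5D, which Conv3D accepts
```

A real generator would additionally shuffle at the clip level (not the frame level) and emit one label per clip rather than per frame.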

Keras (Tensorflow) Reshape Layer input error

I have a Reshape input error and I don't know why.
The requested shape is 1058400, which is (1, 21168) multiplied by the batch size of 50.
What I do not understand is the apparent input size of 677376.
I don't know where this value comes from. The layer before the Reshape is a Flatten layer, and I directly use its shape when I define the target shape of the Reshape layer.
The model compiles just fine, and I use TensorFlow as the backend, so it is defined before runtime. But the error appears only when I put data into it.
Code:
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras import Model
from keras.layers import LSTM, Conv2D, Dense, Flatten, Input, Reshape
from keras.optimizers import Adam
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)
K.set_session(sess)
input = Input(batch_shape=(50, 230, 230, 1))
conv1 = Conv2D(
    filters=12, kernel_size=(7, 7), strides=(1, 1), padding="valid", activation="relu"
)(input)
conv2 = Conv2D(
    filters=24, kernel_size=(5, 5), strides=(1, 1), padding="valid", activation="relu"
)(conv1)
conv3 = Conv2D(
    filters=48, kernel_size=(3, 3), strides=(2, 2), padding="valid", activation="relu"
)(conv2)
conv4 = Conv2D(
    filters=48, kernel_size=(5, 5), strides=(5, 5), padding="valid", activation="relu"
)(conv3)
conv_out = Flatten()(conv4)
conv_out = Reshape(target_shape=(1, int(conv_out.shape[1])))(conv_out)
conv_out = Dense(128, activation="relu")(conv_out)
rnn_1 = LSTM(128, stateful=True, return_sequences=True)(conv_out)
rnn_2 = LSTM(128, stateful=True, return_sequences=True)(rnn_1)
rnn_3 = LSTM(128, stateful=True, return_sequences=False)(rnn_2)
value = Dense(1, activation="linear")(rnn_3)
policy = Dense(5, activation="softmax")(rnn_3)
model = Model(inputs=input, outputs=[value, policy])
adam = Adam(lr=0.001)
model.compile(loss="mse", optimizer=adam)
model.summary()
out = model.predict(np.random.randint(1, 5, size=(50, 230, 230, 1)))
print(out)
Summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (50, 230, 230, 1) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (50, 224, 224, 12) 600 input_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (50, 220, 220, 24) 7224 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (50, 109, 109, 48) 10416 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (50, 21, 21, 48) 57648 conv2d_2[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (50, 21168) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (50, 1, 21168) 0 flatten[0][0]
__________________________________________________________________________________________________
dense (Dense) (50, 1, 128) 2709632 reshape[0][0]
__________________________________________________________________________________________________
lstm (LSTM) (50, 1, 128) 131584 dense[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (50, 1, 128) 131584 lstm[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) (50, 128) 131584 lstm_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (50, 1) 129 lstm_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (50, 5) 645 lstm_2[0][0]
==================================================================================================
Total params: 3,181,046
Trainable params: 3,181,046
Non-trainable params: 0
EDIT:
Error for the above code:
Traceback (most recent call last):
File "foo.py", line 45, in <module>
out = model.predict(np.random.randint(1, 5, size=(50, 230, 230, 1)))
File "/home/vyz/.conda/envs/stackoverflow/lib/python3.6/site-packages/keras/engine/training.py", line 1157, in predict
'Batch size: ' + str(batch_size) + '.')
ValueError: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size. Found: 50 samples. Batch size: 32.
Edit to the question
Important: I have edited your question so it actually runs and represents your problem. Input should take batch_shape as provided currently. Next time please make sure your code works; it will be easier.
Solution
The solution is quite simple: the batch passed to the network has the wrong dimension.
677376 / 21168 = 32, which is the default batch size expected by predict. You are supposed to specify it if it's different (50 in your case), like this:
out = model.predict(np.random.randint(1, 5, size=(50, 230, 230, 1)), batch_size=50)
Everything should work fine now; remember to specify the batch size if you want it hardcoded.
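The numbers in the error message can be checked directly; both element counts are multiples of the flattened feature size 21168:

```python
# Where 1058400 and 677376 come from: the model declares batch_shape=(50, ...),
# while predict defaults to batch_size=32 when none is passed.
declared = 50 * 21168   # elements per batch the stateful model expects
default = 32 * 21168    # elements per batch predict actually fed it
print(declared, default)  # 1058400 677376
```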

keras-tensorflow CAE dimension mismatch

I'm basically following this guide to build a convolutional autoencoder with the TensorFlow backend. The main difference from the guide is that my data consists of 257x257 grayscale images. The following code:
TRAIN_FOLDER = 'data/OIRDS_gray/'
EPOCHS = 10
SHAPE = (257,257,1)
FILELIST = os.listdir(TRAIN_FOLDER)
def loadTrainData():
    train_data = []
    for fn in FILELIST:
        img = misc.imread(TRAIN_FOLDER + fn)
        img = np.reshape(img, (len(img[0,:]), len(img[:,0]), SHAPE[2]))
        if img.shape != SHAPE:
            print "image shape mismatch!"
            print "Expected: "
            print SHAPE
            print "but got:"
            print img.shape
            sys.exit()
        train_data.append(img)
    train_data = np.array(train_data)
    train_data = train_data.astype('float32') / 255
    return np.array(train_data)
def createModel():
    input_img = Input(shape=SHAPE)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    return Model(input_img, decoded)
x_train = loadTrainData()
autoencoder = createModel()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
print x_train.shape
autoencoder.summary()
# Run the network
autoencoder.fit(x_train, x_train,
                epochs=EPOCHS,
                batch_size=128,
                shuffle=True)
gives me a error:
ValueError: Error when checking target: expected conv2d_7 to have shape (None, 260, 260, 1) but got array with shape (859, 257, 257, 1)
As you can see this is not the standard problem with theano/tensorflow backend dim ordering, but something else. I checked that my data is what it's supposed to be with print x_train.shape:
(859, 257, 257, 1)
And I also run autoencoder.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 257, 257, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 257, 257, 16) 160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 129, 129, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 129, 129, 8) 1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 65, 65, 8) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 65, 65, 8) 584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 33, 33, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 33, 33, 8) 584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 66, 66, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 66, 66, 8) 584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 132, 132, 8) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 132, 132, 16) 1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 264, 264, 16) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 264, 264, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
Now I'm not exactly sure where the problem is, but it does look like things go wrong around conv2d_6 (Param # too high). I do know how CAEs work in principle, but I'm not that familiar with the exact technical details yet, and I have tried to solve this mainly by messing with deconvolution padding (using valid instead of same). The closest I got to the dims matching was (None, 258, 258, 1). I achieved this by blindly trying different combinations of padding on the deconvolution side, not really a smart way to solve a problem...
At this point I'm at a loss, and any help would be appreciated
Since your input and output data are the same, your final output shape should be the same as the input shape.
The last convolutional layer should have shape (None, 257,257,1).
The problem is happening because you have an odd number as the sizes of the images (257).
When you apply MaxPooling, it divides the size by two and must round either up or down (here it rounds up: see the 129 coming from 257/2 = 128.5).
Later, when you do UpSampling, the model doesn't know the earlier dimensions were rounded; it simply doubles the value. Happening three times in sequence, this adds 7 pixels to the final result.
You could try either cropping the result or padding the input.
I usually work with images of compatible sizes. If you have 3 MaxPooling layers, your size should be a multiple of 2³. The answer is 264.
Padding the input data directly:
x_train = numpy.lib.pad(x_train,((0,0),(3,4),(3,4),(0,0)),mode='constant')
This will require that SHAPE=(264,264,1)
Padding inside the model:
import keras.backend as K
from keras.layers import Lambda  # needed for the padding layer below
input_img = Input(shape=SHAPE)
x = Lambda(lambda x: K.spatial_2d_padding(x, padding=((3, 4), (3, 4))), output_shape=(264,264,1))(input_img)
Cropping the results:
This will be required in any case where you do not change the actual data (numpy array) directly.
decoded = Lambda(lambda x: x[:,3:-4,3:-4,:], output_shape=SHAPE)(x)
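Putting the padding and cropping together, a small numpy round trip (with a dummy batch standing in for the real (859, 257, 257, 1) array) confirms the shape arithmetic:

```python
import numpy as np

# Dummy batch of 4 grayscale 257x257 images.
x = np.zeros((4, 257, 257, 1), dtype='float32')

# Pad 257 -> 264, the nearest multiple of 2**3 for the three MaxPooling layers.
padded = np.lib.pad(x, ((0, 0), (3, 4), (3, 4), (0, 0)), mode='constant')
print(padded.shape)  # (4, 264, 264, 1)

# Cropping [:, 3:-4, 3:-4, :] inverts the (3, 4) padding exactly.
cropped = padded[:, 3:-4, 3:-4, :]
print(cropped.shape)  # (4, 257, 257, 1)
```

The asymmetric (3, 4) split is needed because 264 - 257 = 7 is odd; any split summing to 7 would work as long as the crop uses the same offsets.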