ValueError: `logits` and `labels` must have the same shape, received ((None, 10) vs (None, 1)) - tensorflow

I am running an Involution model (based on this example), and I keep running into an error during the training stage. This is my error:
ValueError: `logits` and `labels` must have the same shape, received ((None, 10) vs (None, 1)).
Below is the relevant code for dataset loading:
train_datagen = ImageDataGenerator(
test_datagen = ImageDataGenerator(rescale=1./255)
train_ds = train_datagen.flow_from_directory(
target_size=(150, 150),
test_ds = test_datagen.flow_from_directory(
target_size=(150, 150),
And this is the code for training:
print("building the involution model...")
inputs = keras.Input(shape=(224, 224, 3))
x, _ = Involution(channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_1")(inputs)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_2")(x)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_3")(x)
x = keras.layers.ReLU()(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(10)(x)
inv_model = keras.Model(inputs=[inputs], outputs=[outputs], name="inv_model")
print("compiling the involution model...")
print("inv model training...")
inv_hist = inv_model.fit(train_ds, epochs=20, validation_data=test_ds)
The model itself is the same as the one in the Keras example; the only change is that I use my own dataset instead of the CIFAR dataset (the model works for me with that dataset). So I am sure the error is in my data loading, but I am unable to identify what it is.
Model Summary:
Model: "inv_model"
Layer (type) Output Shape Param #
input_14 (InputLayer) [(None, 224, 224, 3)] 0
inv_1 (Involution) ((None, 224, 224, 3), 26
(None, 224, 224, 9, 1,
re_lu_39 (ReLU) (None, 224, 224, 3) 0
max_pooling2d_26 (MaxPoolin (None, 112, 112, 3) 0
inv_2 (Involution) ((None, 112, 112, 3), 26
(None, 112, 112, 9, 1,
re_lu_40 (ReLU) (None, 112, 112, 3) 0
max_pooling2d_27 (MaxPoolin (None, 56, 56, 3) 0
inv_3 (Involution) ((None, 56, 56, 3), 26
(None, 56, 56, 9, 1, 1)
re_lu_41 (ReLU) (None, 56, 56, 3) 0
flatten_15 (Flatten) (None, 9408) 0
dense_26 (Dense) (None, 64) 602176
dense_27 (Dense) (None, 10) 650

When you called train_datagen.flow_from_directory(), you used class_mode='binary', which yields a single scalar label per image (intended for two classes, 0 and 1), whereas the final layer of your model has 10 neurons and therefore produces 10 predictions per image. Hence the shapes of the labels and the logits don't match.
Solution: use class_mode='categorical', which yields a one-hot label vector with one entry per class. Do the same in test_datagen.flow_from_directory() as well.
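The shape mismatch can be reproduced without Keras. The following is a minimal NumPy sketch, assuming 10 classes and a made-up batch of four integer class ids; `one_hot` is a hypothetical helper, not a Keras function. It only mimics what the two label layouts look like to the loss: one value per image versus one 10-element vector per image.

```python
import numpy as np

NUM_CLASSES = 10  # matches the final Dense(10) layer in the question

def one_hot(class_ids, num_classes):
    """Hypothetical helper: turn integer class ids into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[class_ids]

# Made-up batch of four integer class ids, for illustration only.
class_ids = np.array([0, 3, 9, 3])

# One scalar label per image: the loss sees this as shape (batch, 1).
scalar_labels = class_ids.reshape(-1, 1).astype(np.float32)

# One vector per image: this matches the (batch, 10) model output.
vector_labels = one_hot(class_ids, NUM_CLASSES)

print(scalar_labels.shape)  # (4, 1)
print(vector_labels.shape)  # (4, 10)
```

Only the second layout lines up with the (None, 10) logits reported in the error message.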


How to merge 2 trained model in keras?

Good evening everyone,
I have 5 classes with 2000 images each. I built 2 models with different model names; this is my model code:
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
input_shape=(150, 150, 3)),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(5, activation=tf.nn.softmax)
], name="Model1")
loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history =, train_labels,
batch_size=128, epochs=30, validation_split=0.2)'f3_1st_model_seg.h5')
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
input_shape=(150, 150, 3)),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(5, activation=tf.nn.softmax)
], name="Model2")
loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history =, train_labels,
batch_size=128, epochs=30, validation_split=0.2)'f3_2nd_model_seg.h5')
Then I used this code to merge the 2 models:
input_shape = [150, 150, 3]
model = keras.models.load_model('1st_model_seg.h5')
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 148, 148, 32) 896
max_pooling2d (MaxPooling2D (None, 74, 74, 32) 0
conv2d_1 (Conv2D) (None, 72, 72, 32) 9248
max_pooling2d_1 (MaxPooling (None, 36, 36, 32) 0
conv2d_2 (Conv2D) (None, 34, 34, 64) 18496
max_pooling2d_2 (MaxPooling (None, 17, 17, 64) 0
conv2d_3 (Conv2D) (None, 15, 15, 128) 73856
max_pooling2d_3 (MaxPooling (None, 7, 7, 128) 0
flatten (Flatten) (None, 6272) 0
dense (Dense) (None, 5) 31365
Total params: 133,861
Trainable params: 133,861
Non-trainable params: 0
model2 = keras.models.load_model('2nd_model_seg.h5')
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 148, 148, 32) 896
max_pooling2d (MaxPooling2D (None, 74, 74, 32) 0
conv2d_1 (Conv2D) (None, 72, 72, 32) 9248
max_pooling2d_1 (MaxPooling (None, 36, 36, 32) 0
conv2d_2 (Conv2D) (None, 34, 34, 64) 18496
max_pooling2d_2 (MaxPooling (None, 17, 17, 64) 0
conv2d_3 (Conv2D) (None, 15, 15, 128) 73856
max_pooling2d_3 (MaxPooling (None, 7, 7, 128) 0
flatten (Flatten) (None, 6272) 0
dense (Dense) (None, 5) 31365
Total params: 133,861
Trainable params: 133,861
Non-trainable params: 0
def concat_horizontal(models, input_shape):
    models_count = len(models)
    hidden = []
    input = tf.keras.layers.Input(shape=input_shape)
    for i in range(models_count):
        # Call each model on the shared input (see "Connected to" in the summary below)
        hidden.append(models[i](input))
    output = tf.keras.layers.concatenate(hidden)
    model = tf.keras.Model(inputs=input, outputs=output)
    return model
new_model = concat_horizontal([model, model2], input_shape)
new_model.save('f1_1st_merged_seg.h5')
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) [(None, 150, 150, 3 0 []
model1 (Sequential) (None, 5) 133861 ['input_1[0][0]']
model2 (Sequential) (None, 5) 133861 ['input_1[0][0]']
concatenate (Concatenate) (None, 10) 0 ['model1[0][0]',
Total params: 267,722
Trainable params: 267,722
Non-trainable params: 0
After testing the merged model, I found that some images were assigned classes 7 and 9, although I have only 5 classes. This is my prediction code:
class_names = ['A', 'B', 'C', 'D', 'E']
for img in os.listdir(path):
    # predicting images
    img2 = tf.keras.preprocessing.image.load_img(
        os.path.join(path, img), target_size=(150, 150))
    x = tf.keras.preprocessing.image.img_to_array(img2)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = np.argmax(model.predict(images), axis=-1)
    y_out = class_names[classes[0]]
I got this error:
y_out = class_names[classes[0]]
IndexError: list index out of range
For this case a Sequential model would also have worked. The problem is that you are concatenating two output layers with 5 columns each, which increases the number of output columns (and hence the possible class indices) from 5 to 10. Instead, define the two models only up to the Flatten layer (so that Flatten is the last layer of each), and then define the final model as: an input layer, the two models, a concatenate layer, and finally one output layer with five units and a softmax activation.
So remove the output layer
tf.keras.layers.Dense(5, activation=tf.nn.softmax)
from both models, and add it once, after the concatenate layer, in the function you have defined:
def concat_horizontal(models, input_shape):
    models_count = len(models)
    hidden = []
    input = tf.keras.layers.Input(shape=input_shape)
    for i in range(models_count):
        # Call each (headless) model on the shared input
        hidden.append(models[i](input))
    output = tf.keras.layers.concatenate(hidden)
    # Single 5-way classification head on top of the concatenated features
    output = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(output)
    model = tf.keras.Model(inputs=input, outputs=output)
    return model
Note that for cases like this it would be better to define the branch models with the functional API.
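Why class indices 7 and 9 show up can be seen with plain NumPy. This is a minimal sketch with made-up softmax outputs for one image; the probability values are illustrative assumptions, not results from the models in the question. The averaging step at the end is an alternative that is not part of the answer above: it keeps five columns and needs no new layer, whereas a newly added Dense(5) head starts with untrained weights and would need further training.

```python
import numpy as np

class_names = ['A', 'B', 'C', 'D', 'E']

# Made-up softmax outputs of the two 5-class models for one image.
probs_model1 = np.array([[0.10, 0.10, 0.20, 0.30, 0.30]])
probs_model2 = np.array([[0.05, 0.05, 0.60, 0.20, 0.10]])

# Concatenating gives 10 columns, so argmax can return 5..9.
merged = np.concatenate([probs_model1, probs_model2], axis=-1)
merged_index = int(np.argmax(merged, axis=-1)[0])
print(merged.shape, merged_index)  # (1, 10) 7

# Alternative (not from the answer): average the two predictions,
# which keeps 5 columns and therefore only valid class indices.
averaged = (probs_model1 + probs_model2) / 2.0
averaged_index = int(np.argmax(averaged, axis=-1)[0])
print(class_names[averaged_index])  # C
```

Indexing `class_names` with `merged_index` would raise the same IndexError as in the question, because the list has only five entries.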

Keras-TensorFlow ValueError when using Jaccard or IoU

I am starting my journey in deep learning and relatively new to this topic. Hoping someone could help me out as I am stuck here for a long time.
I am trying to train a DeepLabV3+ based model for semantic segmentation and ran into the error below. Earlier I was using categorical cross-entropy as the loss and accuracy as the metric, and the model compiled just fine.
However, I soon realized that mIoU or the Jaccard index is a more appropriate metric, and with it I get the following error. (I think it may be because, although the "values" are the same, one of them is a tensor? I might be wrong.)
The error:
ValueError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/", line 1021, in train_function *
return step_function(self, iterator)
File "<ipython-input-24-4f4d58e1a657>", line 11, in jacard_coef_loss *
return -jacard_coef(y_true, y_pred) # -1 ultiplied as we want to minimize this value as loss function
File "<ipython-input-24-4f4d58e1a657>", line 6, in jacard_coef *
intersection = K.sum(y_true_f * y_pred_f)
ValueError: Dimensions must be equal, but are 1048576 and 4194304 for '{{node jacard_coef_loss/mul}} = Mul[T=DT_FLOAT](jacard_coef_loss/Reshape, jacard_coef_loss/Reshape_1)' with input shapes: [1048576], [4194304].
My dataset shapes:
Train Dataset: <BatchDataset element_spec=(TensorSpec(shape=(4, 512, 512, 3), dtype=tf.float32, name=None), TensorSpec(shape=(4, 512, 512, 1), dtype=tf.float32, name=None))>
Val Dataset: <BatchDataset element_spec=(TensorSpec(shape=(4, 512, 512, 3), dtype=tf.float32, name=None), TensorSpec(shape=(4, 512, 512, 1), dtype=tf.float32, name=None))>
Code for model creation (adapted from the Keras DeepLabV3+ example):
def convolution_block(
x = layers.Conv2D(
x = layers.BatchNormalization()(x)
return tf.nn.relu(x)
def DilatedSpatialPyramidPooling(dspp_input):
dims = dspp_input.shape
x = layers.AveragePooling2D(pool_size=(dims[-3], dims[-2]))(dspp_input)
x = convolution_block(x, kernel_size=1, use_bias=True)
out_pool = layers.UpSampling2D(
size=(dims[-3] // x.shape[1], dims[-2] // x.shape[2]), interpolation="bilinear",
out_1 = convolution_block(dspp_input, kernel_size=1, dilation_rate=1)
out_6 = convolution_block(dspp_input, kernel_size=3, dilation_rate=6)
out_12 = convolution_block(dspp_input, kernel_size=3, dilation_rate=12)
out_18 = convolution_block(dspp_input, kernel_size=3, dilation_rate=18)
x = layers.Concatenate(axis=-1)([out_pool, out_1, out_6, out_12, out_18])
output = convolution_block(x, kernel_size=1)
return output
def DeeplabV3Plus(image_size, num_classes):
model_input = keras.Input(shape=(image_size, image_size, 3))
resnet50 = keras.applications.ResNet50(
weights="imagenet", include_top=False, input_tensor=model_input
x = resnet50.get_layer("conv4_block6_2_relu").output
x = DilatedSpatialPyramidPooling(x)
input_a = layers.UpSampling2D(
size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]),
input_b = resnet50.get_layer("conv2_block3_2_relu").output
input_b = convolution_block(input_b, num_filters=48, kernel_size=1)
x = layers.Concatenate(axis=-1)([input_a, input_b])
x = convolution_block(x)
x = convolution_block(x)
x = layers.UpSampling2D(
size=(image_size // x.shape[1], image_size // x.shape[2]),
model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
return keras.Model(inputs=model_input, outputs=model_output)
model = DeeplabV3Plus(image_size=IMAGE_SIZE, num_classes=NUM_CLASSES)
the model:
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) [(None, 512, 512, 3 0 []
conv1_pad (ZeroPadding2D) (None, 518, 518, 3) 0 ['input_3[0][0]']
conv1_conv (Conv2D) (None, 256, 256, 64 9472 ['conv1_pad[0][0]']
conv1_bn (BatchNormalization) (None, 256, 256, 64 256 ['conv1_conv[0][0]']
conv1_relu (Activation) (None, 256, 256, 64 0 ['conv1_bn[0][0]']
pool1_pad (ZeroPadding2D) (None, 258, 258, 64 0 ['conv1_relu[0][0]']
pool1_pool (MaxPooling2D) (None, 128, 128, 64 0 ['pool1_pad[0][0]']
conv2_block1_1_conv (Conv2D) (None, 128, 128, 64 4160 ['pool1_pool[0][0]']
up_sampling2d_7 (UpSampling2D) (None, 128, 128, 25 0 ['tf.nn.relu_23[0][0]']
tf.nn.relu_24 (TFOpLambda) (None, 128, 128, 48 0 ['batch_normalization_24[0][0]']
concatenate_5 (Concatenate) (None, 128, 128, 30 0 ['up_sampling2d_7[0][0]',
4) 'tf.nn.relu_24[0][0]']
conv2d_27 (Conv2D) (None, 128, 128, 25 700416 ['concatenate_5[0][0]']
batch_normalization_25 (BatchN (None, 128, 128, 25 1024 ['conv2d_27[0][0]']
ormalization) 6)
tf.nn.relu_25 (TFOpLambda) (None, 128, 128, 25 0 ['batch_normalization_25[0][0]']
conv2d_28 (Conv2D) (None, 128, 128, 25 589824 ['tf.nn.relu_25[0][0]']
batch_normalization_26 (BatchN (None, 128, 128, 25 1024 ['conv2d_28[0][0]']
ormalization) 6)
tf.nn.relu_26 (TFOpLambda) (None, 128, 128, 25 0 ['batch_normalization_26[0][0]']
up_sampling2d_8 (UpSampling2D) (None, 512, 512, 25 0 ['tf.nn.relu_26[0][0]']
conv2d_29 (Conv2D) (None, 512, 512, 4) 1028 ['up_sampling2d_8[0][0]']
Total params: 11,853,124
Trainable params: 11,820,388
Non-trainable params: 32,736
Compile and fit code (the error occurs when I try to fit the model):
from keras import backend as K
def jacard_coef(y_true, y_pred):
    # Flatten both tensors and compute a smoothed intersection over union
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (intersection + 1.0) / (K.sum(y_true_f) + K.sum(y_pred_f) - intersection + 1.0)
def jacard_coef_loss(y_true, y_pred):
    # Negated so that minimizing the loss maximizes the coefficient
    return -jacard_coef(y_true, y_pred)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss=jacard_coef_loss, metrics=[jacard_coef])
history =, validation_data=val_dataset, epochs=10)
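The two numbers in the error message are the element counts of the flattened tensors: 4 * 512 * 512 * 1 = 1048576 for a batch of masks and 4 * 512 * 512 * 4 = 4194304 for the model output, whose last layer (conv2d_29) has 4 channels. The sketch below checks that arithmetic and shows one possible way to make the shapes agree, by one-hot encoding the mask before flattening. It is written in NumPy on a tiny stand-in batch and assumes that the mask stores integer class ids 0..3, which the question does not state; in a Keras loss, `tf.one_hot` could play the role of the `np.eye` lookup.

```python
import numpy as np

def jaccard_coef(y_true, y_pred, smooth=1.0):
    """NumPy version of the Jaccard coefficient from the question."""
    y_true_f = y_true.reshape(-1)
    y_pred_f = y_pred.reshape(-1)
    intersection = np.sum(y_true_f * y_pred_f)
    return (intersection + smooth) / (
        np.sum(y_true_f) + np.sum(y_pred_f) - intersection + smooth)

# Element counts reported in the error message.
mask_elems = 4 * 512 * 512 * 1   # labels: (4, 512, 512, 1)
pred_elems = 4 * 512 * 512 * 4   # output: (4, 512, 512, 4)
print(mask_elems, pred_elems)    # 1048576 4194304

# Tiny stand-in batch; ASSUMES the mask stores class ids 0..3.
num_classes = 4
mask = np.array([[[[0], [1]], [[2], [3]]]])        # shape (1, 2, 2, 1)
one_hot_mask = np.eye(num_classes)[mask[..., 0]]   # shape (1, 2, 2, 4)

# A perfect prediction equals the one-hot mask, so the score is 1.
score = jaccard_coef(one_hot_mask, one_hot_mask)
print(one_hot_mask.shape, score)
```

Since the model output has no softmax, the predictions would presumably also need to be turned into probabilities before such a coefficient is meaningful.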

Changed model input_shape but got ValueError: Input 0 of layer dense_44 is incompatible with the layer

I am new to Python and deep learning.
Please help me to correct the error.
This class was originally created for the MNIST dataset (28 x 28). I tried to adapt it to my work, where the images are 224 x 224. I changed the input image shape, but I still get an incompatible-shape error and the model still uses the old MNIST shapes.
Note that the data I am using has these shapes: X_train (676, 224, 224), y_train (676,), X_test (170, 224, 224), y_test (170,).
The code:
from __future__ import print_function, division
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, multiply, concatenate
from keras.layers import BatchNormalization, Activation, Embedding, ZeroPadding2D, Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils import to_categorical
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
class INFOGAN():
def __init__(self):
self.img_rows = 224
self.img_cols = 224
self.channels = 1
self.num_classes = 3
self.img_shape = (self.img_rows, self.img_cols, self.channels)
self.latent_dim = 72
optimizer = Adam(0.0002, 0.5)
losses = ['binary_crossentropy', self.mutual_info_loss]
# Build and the discriminator and recognition network
self.discriminator, self.auxilliary = self.build_disk_and_q_net()
# Build and compile the recognition network Q
# Build the generator
self.generator = self.build_generator()
# The generator takes noise and the target label as input
# and generates the corresponding digit of that label
gen_input = Input(shape=(self.latent_dim,))
img = self.generator(gen_input)
# For the combined model we will only train the generator
self.discriminator.trainable = False
# The discriminator takes generated image as input and determines validity
valid = self.discriminator(img)
# The recognition network produces the label
target_label = self.auxilliary(img)
# The combined model (stacked generator and discriminator)
self.combined = Model(gen_input, [valid, target_label])
def build_generator(self):
model = Sequential()
model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
model.add(Reshape((7, 7, 128)))
model.add(Conv2D(128, kernel_size=3, padding="same"))
model.add(Conv2D(64, kernel_size=3, padding="same"))
model.add(Conv2D(self.channels, kernel_size=3, padding='same'))
gen_input = Input(shape=(self.latent_dim,))
img = model(gen_input)
return Model(gen_input, img)
def build_disk_and_q_net(self):
img = Input(shape=self.img_shape)
# Shared layers between discriminator and recognition network
model = Sequential()
model.add(Conv2D(64, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(256, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(512, kernel_size=3, strides=2, padding="same"))
img_embedding = model(img)
# Discriminator
validity = Dense(1, activation='sigmoid')(img_embedding)
# Recognition
q_net = Dense(128, activation='relu')(img_embedding)
label = Dense(self.num_classes, activation='softmax')(q_net)
# Return discriminator and recognition network
return Model(img, validity), Model(img, label)
def mutual_info_loss(self, c, c_given_x):
"""The mutual information metric we aim to minimize"""
eps = 1e-8
conditional_entropy = K.mean(- K.sum(K.log(c_given_x + eps) * c, axis=1))
entropy = K.mean(- K.sum(K.log(c + eps) * c, axis=1))
return conditional_entropy + entropy
def sample_generator_input(self, batch_size):
# Generator inputs
sampled_noise = np.random.normal(0, 1, (batch_size, 62))
sampled_labels = np.random.randint(0, self.num_classes, batch_size).reshape(-1, 1)
sampled_labels = to_categorical(sampled_labels, num_classes=self.num_classes)
return sampled_noise, sampled_labels
def train(self, epochs, batch_size=128, sample_interval=50):
# Rescale -1 to 1
X_train = (X_train.astype(np.float32) - 127.5) / 127.5
X_train = np.expand_dims(X_train, axis=3)
y_train = y_train.reshape(-1, 1)
# Adversarial ground truths
valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))
for epoch in range(epochs):
# ---------------------
# Train Discriminator
# ---------------------
# Select a random half batch of images
idx = np.random.randint(0, X_train.shape[0], batch_size)
imgs = X_train[idx]
# Sample noise and categorical labels
sampled_noise, sampled_labels = self.sample_generator_input(batch_size)
gen_input = np.concatenate((sampled_noise, sampled_labels), axis=1)
# Generate a half batch of new images
gen_imgs = self.generator.predict(gen_input)
# Train on real and generated data
d_loss_real = self.discriminator.train_on_batch(imgs, valid)
d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
# Avg. loss
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
# ---------------------
# Train Generator and Q-network
# ---------------------
g_loss = self.combined.train_on_batch(gen_input, [valid, sampled_labels])
# Plot the progress
print ("%d [D loss: %.2f, acc.: %.2f%%] [Q loss: %.2f] [G loss: %.2f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss[1], g_loss[2]))
# If at save interval => save generated image samples
if epoch % sample_interval == 0:
def sample_images(self, epoch):
r, c = 10, 10
fig, axs = plt.subplots(r, c)
for i in range(c):
sampled_noise, _ = self.sample_generator_input(c)
label = to_categorical(np.full(fill_value=i, shape=(r,1)), num_classes=self.num_classes)
gen_input = np.concatenate((sampled_noise, label), axis=1)
gen_imgs = self.generator.predict(gen_input)
gen_imgs = 0.5 * gen_imgs + 0.5
for j in range(r):
axs[j,i].imshow(gen_imgs[j,:,:,0], cmap='gray')
fig.savefig("images/%d.png" % epoch)
def save_model(self):
def save(model, model_name):
model_path = "saved_model/%s.json" % model_name
weights_path = "saved_model/%s_weights.hdf5" % model_name
options = {"file_arch": model_path,
"file_weight": weights_path}
json_string = model.to_json()
open(options['file_arch'], 'w').write(json_string)
save(self.generator, "generator")
save(self.discriminator, "discriminator")
if __name__ == '__main__':
infogan = INFOGAN()
infogan.train(epochs=50000, batch_size=128, sample_interval=50)
The error:
Model: "sequential_23"
Layer (type) Output Shape Param #
dense_47 (Dense) (None, 6272) 457856
reshape_11 (Reshape) (None, 7, 7, 128) 0
batch_normalization_87 (Batc (None, 7, 7, 128) 512
up_sampling2d_40 (UpSampling (None, 14, 14, 128) 0
conv2d_99 (Conv2D) (None, 14, 14, 128) 147584
activation_42 (Activation) (None, 14, 14, 128) 0
batch_normalization_88 (Batc (None, 14, 14, 128) 512
up_sampling2d_41 (UpSampling (None, 28, 28, 128) 0
conv2d_100 (Conv2D) (None, 28, 28, 64) 73792
activation_43 (Activation) (None, 28, 28, 64) 0
batch_normalization_89 (Batc (None, 28, 28, 64) 256
conv2d_101 (Conv2D) (None, 28, 28, 1) 577
activation_44 (Activation) (None, 28, 28, 1) 0
Total params: 681,089
Trainable params: 680,449
Non-trainable params: 640
WARNING:tensorflow:Model was constructed with shape (None, 224, 224, 1) for input Tensor("input_22:0", shape=(None, 224, 224, 1), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (None, 224, 224, 1) for input Tensor("conv2d_95_input:0", shape=(None, 224, 224, 1), dtype=float32), but it was called on an input with incompatible shape (None, 28, 28, 1).
ValueError Traceback (most recent call last)
<ipython-input-45-60a1c6b0bc8b> in <module>()
226 if __name__ == '__main__':
--> 227 infogan = INFOGAN()
228 infogan.train(epochs=50000, batch_size=128, sample_interval=50)
7 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/ in assert_input_compatibility(input_spec, inputs, layer_name)
214 ' incompatible with the layer: expected axis ' + str(axis) +
215 ' of input shape to have value ' + str(value) +
--> 216 ' but received input with shape ' + str(shape))
217 # Check shape.
218 if spec.shape is not None:
ValueError: Input 0 of layer dense_44 is incompatible with the layer: expected axis -1 of input shape to have value 115200 but received input with shape [None, 2048]
You forgot to change the architecture of the generator. The generator's output shape and the discriminator's input shape have to match. That's what is causing the error.
To fix it, you need to change the architecture. The generator produces images of shape (28, 28, 1), but you want (224, 224, 1). The output shape is determined by the architecture itself and its parameters.
So I added two UpSampling2D layers and changed the size of the other layers to match the discriminator's output.
Also, I removed the ZeroPadding2D layer from the discriminator, since it made the shape odd (15, 15, ...), which made it impossible to produce a matching size in the generator.
Here's the code:
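def build_generator(self):
    model = Sequential()
    model.add(Dense(512 * 14 * 14, activation="relu", input_dim=self.latent_dim))
    model.add(Reshape((14, 14, 512)))
    # Layers between the convolutions are restored from the generator summary
    # below; the activation types ("relu", "tanh") are an assumption.
    for filters in (256, 128, 64):
        model.add(BatchNormalization())
        model.add(UpSampling2D())
        model.add(Conv2D(filters, kernel_size=3, padding="same"))
        model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(UpSampling2D())
    model.add(Conv2D(self.channels, kernel_size=3, padding='same'))
    model.add(Activation("tanh"))
    gen_input = Input(shape=(self.latent_dim,))
    img = model(gen_input)
    return Model(gen_input, img)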
def build_disk_and_q_net(self):
img = Input(shape=self.img_shape)
# Shared layers between discriminator and recognition network
model = Sequential()
model.add(Conv2D(64, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(256, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(512, kernel_size=3, strides=2, padding="same"))
img_embedding = model(img)
# Discriminator
validity = Dense(1, activation='sigmoid')(img_embedding)
# Recognition
q_net = Dense(128, activation='relu')(img_embedding)
label = Dense(self.num_classes, activation='softmax')(q_net)
# Return discriminator and recognition network
return Model(img, validity), Model(img, label)
And the summaries:
Model: "sequential_14"
Layer (type) Output Shape Param #
conv2d_53 (Conv2D) (None, 112, 112, 64) 640
leaky_re_lu_28 (LeakyReLU) (None, 112, 112, 64) 0
dropout_28 (Dropout) (None, 112, 112, 64) 0
conv2d_54 (Conv2D) (None, 56, 56, 128) 73856
leaky_re_lu_29 (LeakyReLU) (None, 56, 56, 128) 0
dropout_29 (Dropout) (None, 56, 56, 128) 0
batch_normalization_46 (Batc (None, 56, 56, 128) 512
conv2d_55 (Conv2D) (None, 28, 28, 256) 295168
leaky_re_lu_30 (LeakyReLU) (None, 28, 28, 256) 0
dropout_30 (Dropout) (None, 28, 28, 256) 0
batch_normalization_47 (Batc (None, 28, 28, 256) 1024
conv2d_56 (Conv2D) (None, 14, 14, 512) 1180160
leaky_re_lu_31 (LeakyReLU) (None, 14, 14, 512) 0
dropout_31 (Dropout) (None, 14, 14, 512) 0
batch_normalization_48 (Batc (None, 14, 14, 512) 2048
flatten_7 (Flatten) (None, 100352) 0
Total params: 1,553,408
Trainable params: 1,551,616
Non-trainable params: 1,792
Model: "sequential_15"
Layer (type) Output Shape Param #
dense_31 (Dense) (None, 100352) 7325696
reshape_7 (Reshape) (None, 14, 14, 512) 0
batch_normalization_49 (Batc (None, 14, 14, 512) 2048
up_sampling2d_18 (UpSampling (None, 28, 28, 512) 0
conv2d_57 (Conv2D) (None, 28, 28, 256) 1179904
activation_25 (Activation) (None, 28, 28, 256) 0
batch_normalization_50 (Batc (None, 28, 28, 256) 1024
up_sampling2d_19 (UpSampling (None, 56, 56, 256) 0
conv2d_58 (Conv2D) (None, 56, 56, 128) 295040
activation_26 (Activation) (None, 56, 56, 128) 0
batch_normalization_51 (Batc (None, 56, 56, 128) 512
up_sampling2d_20 (UpSampling (None, 112, 112, 128) 0
conv2d_59 (Conv2D) (None, 112, 112, 64) 73792
activation_27 (Activation) (None, 112, 112, 64) 0
batch_normalization_52 (Batc (None, 112, 112, 64) 256
up_sampling2d_21 (UpSampling (None, 224, 224, 64) 0
conv2d_60 (Conv2D) (None, 224, 224, 1) 577
activation_28 (Activation) (None, 224, 224, 1) 0
Total params: 8,878,849
Trainable params: 8,876,929
Non-trainable params: 1,920
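The spatial sizes in the two summaries above can be checked by hand. The following is a small sketch in plain Python, assuming that a Conv2D with padding="same" and stride s maps a size n to ceil(n / s), and that each UpSampling2D uses its default factor of 2; `conv_same_out` is a hypothetical helper name.

```python
import math

def conv_same_out(size, stride):
    """Spatial size after a Conv2D with padding='same'."""
    return math.ceil(size / stride)

# Discriminator: four stride-2 convolutions on a 224x224 input.
size = 224
disc_sizes = [size]
for _ in range(4):
    size = conv_same_out(size, stride=2)
    disc_sizes.append(size)
print(disc_sizes)          # [224, 112, 56, 28, 14]
print(size * size * 512)   # 100352, the Flatten size in the summary

# Generator: start at 14x14 and double with each UpSampling2D layer.
gen_size = 14
gen_sizes = [gen_size]
for _ in range(4):
    gen_size *= 2
    gen_sizes.append(gen_size)
print(gen_sizes)           # [14, 28, 56, 112, 224]
```

The generator ends at 224, which is the size the discriminator expects as input.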
Because you decreased the number of classes from 10 to 3, you also have to change the latent_dim parameter to 65. Notice that the method sample_generator_input generates noise of size 62 and labels whose size is the number of classes, and these are then concatenated (the size becomes 62 + 3 = 65).
The generator is defined to accept an input_dim of self.latent_dim, so it would be more appropriate to calculate latent_dim in the constructor from the number of classes: self.latent_dim = 62 + self.num_classes.
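The size arithmetic can be verified with NumPy alone. This is a minimal sketch that mirrors what sample_generator_input and the concatenation in train do; `NOISE_DIM` is a name introduced here for the hardcoded 62, and the `np.eye` lookup stands in for to_categorical.

```python
import numpy as np

NOISE_DIM = 62      # hardcoded in sample_generator_input
num_classes = 3
latent_dim = NOISE_DIM + num_classes

batch_size = 8
sampled_noise = np.random.normal(0, 1, (batch_size, NOISE_DIM))
class_ids = np.random.randint(0, num_classes, batch_size)
sampled_labels = np.eye(num_classes)[class_ids]   # one-hot, like to_categorical

# Noise and labels are joined along the feature axis.
gen_input = np.concatenate((sampled_noise, sampled_labels), axis=1)
print(gen_input.shape, latent_dim)  # (8, 65) 65
```

The second dimension of `gen_input` is what the generator's first Dense layer must accept as input_dim.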
Moreover, the method sample_images contains hardcoded magic numbers.
How can one know what they mean? I mean this line: r, c = 10, 10.
I assume that it means the number of classes. Since you changed it from 10 to 3 in your example, I suggest you change the line to:
r, c = self.num_classes, self.num_classes
Overall, the code is fragile: if you change one constant, it all breaks. Be careful when copying whole pieces of code, and make sure you understand every part before copying.
Here's the full code:
from __future__ import print_function, division
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, multiply, concatenate
from keras.layers import BatchNormalization, Activation, Embedding, ZeroPadding2D, Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils import to_categorical
import keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
class INFOGAN():
def __init__(self):
self.img_rows = 224
self.img_cols = 224
self.channels = 1
self.num_classes = 3
self.img_shape = (self.img_rows, self.img_cols, self.channels)
self.latent_dim = 62 + self.num_classes
optimizer = Adam(0.0002, 0.5)
losses = ['binary_crossentropy', self.mutual_info_loss]
# Build and the discriminator and recognition network
self.discriminator, self.auxilliary = self.build_disk_and_q_net()
# Build and compile the recognition network Q
# Build the generator
self.generator = self.build_generator()
# The generator takes noise and the target label as input
# and generates the corresponding digit of that label
gen_input = Input(shape=(self.latent_dim,))
img = self.generator(gen_input)
# For the combined model we will only train the generator
self.discriminator.trainable = False
# The discriminator takes generated image as input and determines validity
valid = self.discriminator(img)
# The recognition network produces the label
target_label = self.auxilliary(img)
# The combined model (stacked generator and discriminator)
self.combined = Model(gen_input, [valid, target_label])
def build_generator(self):
model = Sequential()
model.add(Dense(512 * 14 * 14, activation="relu", input_dim=self.latent_dim))
model.add(Reshape((14, 14, 512)))
model.add(Conv2D(256, kernel_size=3, padding="same"))
model.add(Conv2D(128, kernel_size=3, padding="same"))
model.add(Conv2D(64, kernel_size=3, padding="same"))
model.add(Conv2D(self.channels, kernel_size=3, padding='same'))
gen_input = Input(shape=(self.latent_dim,))
img = model(gen_input)
return Model(gen_input, img)
def build_disk_and_q_net(self):
img = Input(shape=self.img_shape)
# Shared layers between discriminator and recognition network
model = Sequential()
model.add(Conv2D(64, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(256, kernel_size=3, strides=2, padding="same"))
model.add(Conv2D(512, kernel_size=3, strides=2, padding="same"))
img_embedding = model(img)
# Discriminator
validity = Dense(1, activation='sigmoid')(img_embedding)
# Recognition
q_net = Dense(128, activation='relu')(img_embedding)
label = Dense(self.num_classes, activation='softmax')(q_net)
# Return discriminator and recognition network
return Model(img, validity), Model(img, label)
def mutual_info_loss(self, c, c_given_x):
"""The mutual information metric we aim to minimize"""
eps = 1e-8
conditional_entropy = K.mean(- K.sum(K.log(c_given_x + eps) * c, axis=1))
entropy = K.mean(- K.sum(K.log(c + eps) * c, axis=1))
return conditional_entropy + entropy
def sample_generator_input(self, batch_size):
# Generator inputs
sampled_noise = np.random.normal(0, 1, (batch_size, 62))
sampled_labels = np.random.randint(0, self.num_classes, batch_size).reshape(-1, 1)
sampled_labels = to_categorical(sampled_labels, num_classes=self.num_classes)
return sampled_noise, sampled_labels
def train(self, epochs, batch_size=128, sample_interval=50):
X_train = np.ones([batch_size, 224, 224])
y_train = np.zeros([batch_size,])
# Rescale -1 to 1
X_train = (X_train.astype(np.float32) - 127.5) / 127.5
X_train = np.expand_dims(X_train, axis=3)
y_train = y_train.reshape(-1, 1)
# Adversarial ground truths
valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))
for epoch in range(epochs):
# ---------------------
# Train Discriminator
# ---------------------
# Select a random half batch of images
idx = np.random.randint(0, X_train.shape[0], batch_size)
imgs = X_train[idx]
# Sample noise and categorical labels
sampled_noise, sampled_labels = self.sample_generator_input(batch_size)
gen_input = np.concatenate((sampled_noise, sampled_labels), axis=1)
print(sampled_labels.shape, batch_size)
# Generate a half batch of new images
gen_imgs = self.generator.predict(gen_input)
# Train on real and generated data
d_loss_real = self.discriminator.train_on_batch(imgs, valid)
d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
# Avg. loss
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
# ---------------------
# Train Generator and Q-network
# ---------------------
g_loss = self.combined.train_on_batch(gen_input, [valid, sampled_labels])
# Plot the progress
print ("%d [D loss: %.2f, acc.: %.2f%%] [Q loss: %.2f] [G loss: %.2f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss[1], g_loss[2]))
# If at save interval => save generated image samples
if epoch % sample_interval == 0:
    def sample_images(self, epoch):
        r, c = self.num_classes, self.num_classes
        fig, axs = plt.subplots(r, c)
        for i in range(c):
            sampled_noise, _ = self.sample_generator_input(c)
            label = to_categorical(np.full(fill_value=i, shape=(r,1)), num_classes=self.num_classes)
            gen_input = np.concatenate((sampled_noise, label), axis=1)
            gen_imgs = self.generator.predict(gen_input)
            gen_imgs = 0.5 * gen_imgs + 0.5
            for j in range(r):
                axs[j,i].imshow(gen_imgs[j,:,:,0], cmap='gray')
        fig.savefig("images/%d.png" % epoch)
    def save_model(self):
        def save(model, model_name):
            model_path = "saved_model/%s.json" % model_name
            weights_path = "saved_model/%s_weights.hdf5" % model_name
            options = {"file_arch": model_path,
                       "file_weight": weights_path}
            json_string = model.to_json()
            open(options['file_arch'], 'w').write(json_string)
        save(self.generator, "generator")
        save(self.discriminator, "discriminator")
if __name__ == '__main__':
    infogan = INFOGAN()
    infogan.train(epochs=50000, batch_size=8, sample_interval=50)

How does MobileNet v1 achieve small parameter count on tensorflow?

I was trying to rebuild a MobileNet model identical to the version provided by Keras Applications, on TensorFlow v2.1.0.
However, no matter which layer I tried (i.e., Conv2D, SeparableConv2D, DepthwiseConv2D), the parameter count is so far off that the model starts allocating 100+ GB of RAM.
The model summaries for the Keras version and my own version, along with the layers of my version, can be found below.
For the sake of simplicity, I am not using any width or resolution multiplier (or, put differently, both have the value 1.0).
How can I achieve a parameter count as low as that of the Keras-provided version?
Portion of keras mobilenet model summary
Model: "mobilenet_1.00_224"
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
conv1_pad (ZeroPadding2D)    (None, 225, 225, 3)       0
conv1 (Conv2D)               (None, 112, 112, 32)      864
conv1_bn (BatchNormalization (None, 112, 112, 32)      128
conv1_relu (ReLU)            (None, 112, 112, 32)      0
conv_dw_1 (DepthwiseConv2D)  (None, 112, 112, 32)      288
conv_dw_1_bn (BatchNormaliza (None, 112, 112, 32)      128
Portion of self created model summary
Model: "dummy_model"
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
zero_padding2d (ZeroPadding2 (None, 225, 225, 3)       0
conv2d (Conv2D)              (None, 112, 112, 32)      896
batch_normalization (BatchNo (None, 112, 112, 32)      128
re_lu (ReLU)                 (None, 112, 112, 32)      0
depthwise_conv2d (DepthwiseC (None, 112, 112, 32)      32800
batch_normalization_1 (Batch (None, 112, 112, 32)      128
Self created model with layers
inputs = Input(shape=(224, 224, 3))
x = ZeroPadding2D(padding=((1, 0), (1, 0)))(inputs)
x = Conv2D(32, (3, 3), strides=(2, 2), padding="valid")(x)
x = BatchNormalization()(x)
x = ReLU()(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x)
x = depthwise_separable_convolution(x, 2)
x = depthwise_separable_convolution(x, 2)
x = AveragePooling2D(pool_size=7)(x)
x = Flatten()(x)
x = Dense(10, activation="softmax")(x)
return Model(inputs=inputs, outputs=x, name="dummy_model")
Depthwise separable convolution
def depthwise_separable_convolution(self, input, strides=1):
    input_depth = input.shape[-1]
    output_depth = input_depth * 2
    x = DepthwiseConv2D(input_depth, 1, padding="same")(input)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = Conv2D(output_depth, 1, padding="same")(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x
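One way to see where the two summaries diverge is to recompute the parameter counts by hand. The sketch below is plain Python arithmetic, not Keras code, and the helper names conv2d_params and depthwise_params are made up for illustration. It assumes the usual weight-count formulas for these layers. Under that assumption, the Keras numbers (864 and 288) match 3x3 kernels without bias. The self-created numbers (896 and 32800) match a 3x3 convolution with bias and a depthwise layer whose kernel is 32x32, which is what happens when DepthwiseConv2D(input_depth, 1, ...) takes its first positional argument as kernel_size.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Weight count of a standard 2D convolution (illustrative helper)."""
    kh, kw = kernel_size
    # One kh x kw x in_channels kernel per output filter, plus optional bias.
    return kh * kw * in_channels * filters + (filters if use_bias else 0)


def depthwise_params(kernel_size, in_channels, depth_multiplier=1, use_bias=True):
    """Weight count of a depthwise 2D convolution (illustrative helper)."""
    kh, kw = kernel_size
    out_channels = in_channels * depth_multiplier
    # One kh x kw kernel per output channel, plus optional bias.
    return kh * kw * out_channels + (out_channels if use_bias else 0)


# First convolution: 3 input channels, 32 filters, 3x3 kernel.
keras_conv1 = conv2d_params((3, 3), 3, 32, use_bias=False)
own_conv1 = conv2d_params((3, 3), 3, 32, use_bias=True)

# First depthwise layer on 32 channels.
keras_dw1 = depthwise_params((3, 3), 32, use_bias=False)
# Assumption: the first positional argument (32) is read as kernel_size.
own_dw1 = depthwise_params((32, 32), 32, use_bias=True)

print(keras_conv1, own_conv1)  # 864 896
print(keras_dw1, own_dw1)      # 288 32800
```

If this reading is right, passing kernel_size=(3, 3), the intended strides, and use_bias=False to the depthwise layer should bring the counts back in line with the Keras summary.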

keras-tensorflow CAE dimension mismatch

I'm basically following this guide to build a convolutional autoencoder with the TensorFlow backend. The main difference from the guide is that my data consists of 257x257 grayscale images. The following code:
TRAIN_FOLDER = 'data/OIRDS_gray/'
SHAPE = (257,257,1)
def loadTrainData():
    train_data = []
    for fn in FILELIST:
        img = misc.imread(TRAIN_FOLDER + fn)
        img = np.reshape(img,(len(img[0,:]), len(img[:,0]), SHAPE[2]))
        if img.shape != SHAPE:
            print "image shape mismatch!"
            print "Expected: "
            print SHAPE
            print "but got:"
            print img.shape
        train_data.append (img)
    train_data = np.array(train_data)
    train_data = train_data.astype('float32')/ 255
    return np.array(train_data)
def createModel():
    input_img = Input(shape=SHAPE)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid',padding='same')(x)
    return Model(input_img, decoded)
x_train = loadTrainData()
autoencoder = createModel()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
print x_train.shape
# Run the network, x_train,
gives me an error:
ValueError: Error when checking target: expected conv2d_7 to have shape (None, 260, 260, 1) but got array with shape (859, 257, 257, 1)
As you can see, this is not the standard problem with Theano/TensorFlow backend dim ordering, but something else. I checked that my data has the expected shape with print x_train.shape:
(859, 257, 257, 1)
And I also ran autoencoder.summary():
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 257, 257, 1)       0
conv2d_1 (Conv2D)            (None, 257, 257, 16)      160
max_pooling2d_1 (MaxPooling2 (None, 129, 129, 16)      0
conv2d_2 (Conv2D)            (None, 129, 129, 8)       1160
max_pooling2d_2 (MaxPooling2 (None, 65, 65, 8)         0
conv2d_3 (Conv2D)            (None, 65, 65, 8)         584
max_pooling2d_3 (MaxPooling2 (None, 33, 33, 8)         0
conv2d_4 (Conv2D)            (None, 33, 33, 8)         584
up_sampling2d_1 (UpSampling2 (None, 66, 66, 8)         0
conv2d_5 (Conv2D)            (None, 66, 66, 8)         584
up_sampling2d_2 (UpSampling2 (None, 132, 132, 8)       0
conv2d_6 (Conv2D)            (None, 132, 132, 16)      1168
up_sampling2d_3 (UpSampling2 (None, 264, 264, 16)      0
conv2d_7 (Conv2D)            (None, 264, 264, 1)       145
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
Now I'm not exactly sure where the problem is, but it does look like things go wrong around conv2d_6 (Param # too high). I do know how CAEs work in principle, but I'm not that familiar with the exact technical details yet, and I have tried to solve this mainly by changing the padding on the deconvolution side (using valid instead of same). The closest I got to matching dims was (None, 258, 258, 1). I achieved this by blindly trying different combinations of padding on the deconvolution side, which is not really a smart way to solve a problem...
At this point I'm at a loss, and any help would be appreciated.
Since your input and output data are the same, your final output shape should be the same as the input shape.
The last convolutional layer should have shape (None, 257,257,1).
The problem occurs because your image size is an odd number (257).
When you apply MaxPooling, the size is divided by two and has to be rounded either up or down; with padding='same' it rounds up (see the 129, coming from 257/2 = 128.5).
Later, when you do UpSampling, the model doesn't know the dimensions were rounded; it simply doubles the value. Repeated over three layers, this adds 7 pixels to the final result (257 -> 129 -> 65 -> 33 -> 66 -> 132 -> 264).
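The rounding can be traced with a few lines of plain Python. This is only a sketch of the shape arithmetic, not Keras code: trace_sizes is a made-up helper, and it assumes 2x2 pooling with padding='same' (which rounds up) followed by 2x upsampling, as in the model above.

```python
import math


def trace_sizes(size, n_pool=3, n_up=3):
    """Follow one spatial dimension through the pooling and upsampling layers."""
    sizes = [size]
    for _ in range(n_pool):
        # MaxPooling2D((2, 2), padding='same') rounds the halved size up.
        size = math.ceil(size / 2)
        sizes.append(size)
    for _ in range(n_up):
        # UpSampling2D((2, 2)) simply doubles the size.
        size *= 2
        sizes.append(size)
    return sizes


sizes_257 = trace_sizes(257)
sizes_264 = trace_sizes(264)
print(sizes_257)  # [257, 129, 65, 33, 66, 132, 264]
print(sizes_264)  # [264, 132, 66, 33, 66, 132, 264]
```

An input of 257 comes out as 264, while an input of 264 comes back out at its original size.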
You could try either cropping the result or padding the input.
I usually work with images of compatible sizes. If you have 3 MaxPooling layers, your size should be a multiple of 2³ = 8. The smallest such size that fits a 257-pixel image is 264.
Padding the input data directly:
x_train = np.pad(x_train,((0,0),(3,4),(3,4),(0,0)),mode='constant')
This requires SHAPE = (264,264,1).
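As a rough illustration of where the (3, 4) padding comes from, the sketch below computes the padding needed to reach the next multiple of 8 and applies it with np.pad. The helper pad_to_multiple and the dummy array are assumptions for the example; the real x_train would take the dummy's place.

```python
import numpy as np


def pad_to_multiple(batch, multiple=8):
    """Zero-pad height and width of an (N, H, W, C) array up to the next multiple."""
    _, h, w, _ = batch.shape
    pad_h = -h % multiple  # pixels missing to reach the next multiple
    pad_w = -w % multiple
    pads = ((0, 0),
            (pad_h // 2, pad_h - pad_h // 2),
            (pad_w // 2, pad_w - pad_w // 2),
            (0, 0))
    return np.pad(batch, pads, mode='constant'), pads


dummy = np.ones((2, 257, 257, 1), dtype='float32')  # stand-in for x_train
padded, pads = pad_to_multiple(dummy)
print(padded.shape)  # (2, 264, 264, 1)
print(pads[1])       # (3, 4)
```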
Padding inside the model:
import keras.backend as K
input_img = Input(shape=SHAPE)
x = Lambda(lambda x: K.spatial_2d_padding(x, padding=((3, 4), (3, 4))), output_shape=(264,264,1))(input_img)
Cropping the results:
This will be required in any case where you do not change the actual data (numpy array) directly.
decoded = Lambda(lambda x: x[:,3:-4,3:-4,:], output_shape=SHAPE)(x)
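The slice used for cropping can be checked on a plain numpy array, independently of Keras. In this sketch the random array is only a stand-in for real image data; it shows that cropping with 3:-4 on both spatial axes undoes a (3, 4) padding exactly.

```python
import numpy as np

# Stand-in for a batch of 257x257 grayscale images.
original = np.random.rand(2, 257, 257, 1).astype('float32')

# Pad to 264x264, as the model input would be.
padded = np.pad(original, ((0, 0), (3, 4), (3, 4), (0, 0)), mode='constant')

# Same slice as in the cropping Lambda layer.
cropped = padded[:, 3:-4, 3:-4, :]

print(padded.shape)                        # (2, 264, 264, 1)
print(cropped.shape)                       # (2, 257, 257, 1)
print(np.array_equal(cropped, original))   # True
```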