TensorFlow - different image size for training and inference

I trained my U-Net architecture on images of size 1024*1024 and now want to run inference on high-resolution images with different sizes and aspect ratios. But I get this error:
Conv2DSlowBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 32 spatial_dim: 2 input: 64 filter: 3 output: 64 stride: 2 dilation: 1
[[node model_1/sequential_8/conv2d_transpose_8/conv2d_transpose (defined at tmp/ipykernel_40/3635850533.py:1) ]] [Op:__inference_predict_function_9960]
Function call stack:
predict_function
Model architecture:
base_model = tf.keras.applications.MobileNetV2(input_shape=[1024, 1024, 3], include_top=False)
# Use the activations of these layers
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]
base_model_outputs = [base_model.get_layer(name).output for name in layer_names]
# Create the feature extraction model
down_stack = tf.keras.Model(inputs=base_model.input, outputs=base_model_outputs)
down_stack.trainable = False
up_stack = [
    pix2pix.upsample(512, 3),  # 4x4 -> 8x8
    pix2pix.upsample(256, 3),  # 8x8 -> 16x16
    pix2pix.upsample(128, 3),  # 16x16 -> 32x32
    pix2pix.upsample(64, 3),   # 32x32 -> 64x64
]
def unet_model(output_channels:int):
    inputs = tf.keras.layers.Input(shape=[1024, 1024, 3])
    # Downsampling through the model
    skips = down_stack(inputs)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # This is the last layer of the model
    last = tf.keras.layers.Conv2DTranspose(
        filters=output_channels, kernel_size=3, strides=2,
        padding='same', activation='sigmoid')  # 64x64 -> 128x128
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)

Related

How to stop training CNN part while continue training ANN part in a Multi-input Model?

I made a multi-input model in Keras which takes image shape=[N, 640, 480, 3] as well as numerical data shape=[N, 19] and does prediction on 12 classes.
Following is the model defining part of code:
# # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# # MODEL === CNN
# # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#
base_model = keras.applications.ResNet50(
weights='imagenet', # Load weights pre-trained on ImageNet.
input_shape=(640, 480, 3),
include_top=False) # Do not include the ImageNet classifier at the top.
base_model.trainable = False
input_Cnn = keras.Input(shape=(640, 480, 3))
x = base_model(input_Cnn, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
# Dense layers on top of the pooled features (12 output classes)
x1 = keras.layers.Dense(1024, activation="relu")(x)
out_Cnn = keras.layers.Dense(12, activation="relu")(x1)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# MODEL === NN
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
inp_num = keras.layers.Input(shape=(19,)) # no. of columns of the numerical data
fc1 = keras.layers.Dense(units=2 ** 6, activation="relu")(inp_num)
fc2 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc1)
fc3 = keras.layers.Dense(units=2 ** 10, activation="relu")(fc2)
fc4 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc3)
fc5 = keras.layers.Dense(units=2 ** 6, activation="relu")(fc4)
out_NN = keras.layers.Dense(12, activation="relu")(fc5)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# CONCATENATION
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
result = keras.layers.concatenate((out_Cnn, out_NN), axis=-1) # [N, 12] --- concatenate [N, 12] ==> [N, 24]
result = keras.layers.Dense(1024, activation='relu')(result)
result = keras.layers.Dense(units=12, activation="softmax")(result)
model = keras.Model([input_Cnn, inp_num], result)
print(model.summary())
The problem is that the CNN part (if trained independently) converges in fewer epochs, while the ANN part (if trained independently) takes longer (more epochs). But in this code, when both are combined, accuracy doesn't go beyond 10%. Is there any way to stop gradients flowing into the CNN part after a certain number of epochs, so that from then on the model trains only the ANN part?
I'm not using Keras, but after a quick Google search this should be the answer:
You can freeze layers, so that certain parameters are no longer learnable:
# this freezes the first N layers
for layer in model.layers[:N]:
    layer.trainable = False
Where N is the number of convolutional layers you have.

Image Segmentation Tensorflow tutorials

In this TF tutorial, the U-Net model is divided into two parts. The first is the contracting path, for which they use MobileNet, and it is not trainable. In the second part, I can't work out which layers are being trained. As far as I can see, only the last layer, the Conv2DTranspose, seems trainable. Am I right?
And if I am, how can a single layer perform a task as complex as segmentation?
Tutorial link: https://www.tensorflow.org/tutorials/images/segmentation
The code for the image segmentation model from the tutorial is shown below:
def unet_model(output_channels):
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs
    # Downsampling through the model
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # This is the last layer of the model
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 3, strides=2,
        padding='same')  # 64x64 -> 128x128
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)
The first part of the model, the downsampling path, does not use the entire MobileNet architecture but only the outputs of the layers
'block_1_expand_relu', # 64x64
'block_3_expand_relu', # 32x32
'block_6_expand_relu', # 16x16
'block_13_expand_relu', # 8x8
'block_16_project'
of the pre-trained MobileNet model, which are non-trainable.
The second part of the model (the part you are interested in), which comes before the final Conv2DTranspose layer, is the upsampling path, defined in the list
up_stack = [
pix2pix.upsample(512, 3), # 4x4 -> 8x8
pix2pix.upsample(256, 3), # 8x8 -> 16x16
pix2pix.upsample(128, 3), # 16x16 -> 32x32
pix2pix.upsample(64, 3), # 32x32 -> 64x64
]
This means the model calls a function named upsample from the pix2pix module. The code for the pix2pix module is available at this GitHub link.
The code for the upsample function is shown below:
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    """Upsamples an input.
    Conv2DTranspose => Batchnorm => Dropout => Relu
    Args:
      filters: number of filters
      size: filter size
      norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
      apply_dropout: If True, adds the dropout layer
    Returns:
      Upsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))
    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())
    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))
    result.add(tf.keras.layers.ReLU())
    return result
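This means that the second part of the model consists of the upsampling blocks whose functionality is defined above, with 512, 256, 128 and 64 filters. Each block contains its own Conv2DTranspose and normalization layers, so these blocks are trained along with the final Conv2DTranspose layer, not just that last layer alone.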
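To get a feel for how much of the model is actually trainable, the decoder's parameters can be counted by hand. The sketch below is only an illustration: the skip-connection channel counts are an assumption (the usual MobileNetV2 widths for these layers, which are not given in the post), and the helper names are made up for this example.

```python
# Rough trainable-parameter count for the decoder of the tutorial's U-Net.
# ASSUMPTION: these channel counts are the usual MobileNetV2 (alpha=1.0)
# output widths of the tapped layers; they are not stated in the post.
SKIP_CHANNELS = {
    'block_1_expand_relu': 96,
    'block_3_expand_relu': 144,
    'block_6_expand_relu': 192,
    'block_13_expand_relu': 576,
    'block_16_project': 320,
}
UP_FILTERS = [512, 256, 128, 64]  # from up_stack
KERNEL = 3


def upsample_params(in_channels, filters, kernel=KERNEL):
    """Conv2DTranspose (use_bias=False) plus BatchNormalization gamma/beta."""
    conv = kernel * kernel * in_channels * filters
    batchnorm = 2 * filters  # moving mean/variance are not trainable
    return conv + batchnorm


def decoder_trainable_params(output_channels):
    """Return the trainable parameter count of each decoder layer."""
    skips = ['block_13_expand_relu', 'block_6_expand_relu',
             'block_3_expand_relu', 'block_1_expand_relu']
    in_channels = SKIP_CHANNELS['block_16_project']
    per_layer = []
    for filters, skip in zip(UP_FILTERS, skips):
        per_layer.append(upsample_params(in_channels, filters))
        # After concatenation the next block sees both tensors' channels.
        in_channels = filters + SKIP_CHANNELS[skip]
    # The final Conv2DTranspose keeps its bias.
    per_layer.append(KERNEL * KERNEL * in_channels * output_channels
                     + output_channels)
    return per_layer


counts = decoder_trainable_params(output_channels=3)
print(counts, sum(counts))
```

Under these assumptions, the final layer accounts for only a few thousand of the several million trainable parameters; the upsample blocks carry almost all of them.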

Not understanding the data flow in UNET-like architectures and having problems with the output of the Conv2DTranspose layers

I have a problem or two with the input dimensions of a modified U-Net architecture. To save you time and make my results easier to understand and reproduce, I'll post the code and the output dimensions. The modified U-Net architecture is the MultiResUNet architecture from https://github.com/nibtehaz/MultiResUNet/blob/master/MultiResUNet.py and is based on this paper: https://arxiv.org/abs/1902.04049. Please don't be put off by the length of this code. You can simply copy-paste it, and it shouldn't take longer than 10 seconds to reproduce my results. You also don't need a dataset for this. Tested with TF v1.9, Keras v2.20.
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate, BatchNormalization, Activation, add
from tensorflow.keras.models import Model
from tensorflow.keras.activations import relu
###{ 2D Convolutional layers
# Arguments: ######################################################################
# x {keras layer} -- input layer #
# filters {int} -- number of filters #
# num_row {int} -- number of rows in filters #
# num_col {int} -- number of columns in filters #
# Keyword Arguments:
# padding {str} -- mode of padding (default: {'same'})
# strides {tuple} -- stride of convolution operation (default: {(1, 1)})
# activation {str} -- activation function (default: {'relu'})
# name {str} -- name of the layer (default: {None})
# Returns:
# [keras layer] -- [output layer]}
# # ############################################################################
def conv2d_bn(x, filters, num_row, num_col, padding="same", strides=(1, 1), activation='relu', name=None):
    x = Conv2D(filters, (num_row, num_col), strides=strides, padding=padding, use_bias=False)(x)
    x = BatchNormalization(axis=3, scale=False)(x)
    if activation is None:
        return x
    x = Activation(activation, name=name)(x)
    return x
# our 2D transposed Convolution with batch normalization
# 2D Transposed Convolutional layers
# Arguments: #############################################################
# x {keras layer} -- input layer #
# filters {int} -- number of filters #
# num_row {int} -- number of rows in filters #
# num_col {int} -- number of columns in filters
# Keyword Arguments:
# padding {str} -- mode of padding (default: {'same'})
# strides {tuple} -- stride of convolution operation (default: {(2, 2)})
# name {str} -- name of the layer (default: {None})
# Returns:
# [keras layer] -- [output layer] ###################################
def trans_conv2d_bn(x, filters, num_row, num_col, padding='same', strides=(2, 2), name=None):
    x = Conv2DTranspose(filters, (num_row, num_col), strides=strides, padding=padding)(x)
    x = BatchNormalization(axis=3, scale=False)(x)
    return x
# Our Multi-Res Block
# Arguments: ############################################################
# U {int} -- Number of filters in a corrsponding UNet stage #
# inp {keras layer} -- input layer #
# Returns: #
# [keras layer] -- [output layer] #
###################################################################
def MultiResBlock(U, inp, alpha=1.67):
    W = alpha * U
    shortcut = inp
    shortcut = conv2d_bn(shortcut, int(W*0.167) + int(W*0.333) +
                         int(W*0.5), 1, 1, activation=None, padding='same')
    conv3x3 = conv2d_bn(inp, int(W*0.167), 3, 3,
                        activation='relu', padding='same')
    conv5x5 = conv2d_bn(conv3x3, int(W*0.333), 3, 3,
                        activation='relu', padding='same')
    conv7x7 = conv2d_bn(conv5x5, int(W*0.5), 3, 3,
                        activation='relu', padding='same')
    out = concatenate([conv3x3, conv5x5, conv7x7], axis=3)
    out = BatchNormalization(axis=3)(out)
    out = add([shortcut, out])
    out = Activation('relu')(out)
    out = BatchNormalization(axis=3)(out)
    return out
# Our ResPath:
# ResPath
# Arguments:#######################################
# filters {int} -- [description]
# length {int} -- length of ResPath
# inp {keras layer} -- input layer
# Returns:
# [keras layer] -- [output layer]#############
def ResPath(filters, length, inp):
    shortcut = inp
    shortcut = conv2d_bn(shortcut, filters, 1, 1,
                         activation=None, padding='same')
    out = conv2d_bn(inp, filters, 3, 3, activation='relu', padding='same')
    out = add([shortcut, out])
    out = Activation('relu')(out)
    out = BatchNormalization(axis=3)(out)
    for i in range(length-1):
        shortcut = out
        shortcut = conv2d_bn(shortcut, filters, 1, 1,
                             activation=None, padding='same')
        out = conv2d_bn(out, filters, 3, 3, activation='relu', padding='same')
        out = add([shortcut, out])
        out = Activation('relu')(out)
        out = BatchNormalization(axis=3)(out)
    return out
# MultiResUNet
# Arguments: ############################################
# height {int} -- height of image
# width {int} -- width of image
# n_channels {int} -- number of channels in image
# Returns:
# [keras model] -- MultiResUNet model###############
def MultiResUnet(height, width, n_channels):
    inputs = Input((height, width, n_channels))
    # downsampling part begins here
    mresblock1 = MultiResBlock(32, inputs)
    pool1 = MaxPooling2D(pool_size=(2, 2))(mresblock1)
    mresblock1 = ResPath(32, 4, mresblock1)
    mresblock2 = MultiResBlock(32*2, pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(mresblock2)
    mresblock2 = ResPath(32*2, 3, mresblock2)
    mresblock3 = MultiResBlock(32*4, pool2)
    pool3 = MaxPooling2D(pool_size=(2, 2))(mresblock3)
    mresblock3 = ResPath(32*4, 2, mresblock3)
    mresblock4 = MultiResBlock(32*8, pool3)
    # Upsampling part
    up5 = concatenate([Conv2DTranspose(
        32*4, (2, 2), strides=(2, 2), padding='same')(mresblock4), mresblock3], axis=3)
    mresblock5 = MultiResBlock(32*8, up5)
    up6 = concatenate([Conv2DTranspose(
        32*4, (2, 2), strides=(2, 2), padding='same')(mresblock5), mresblock2], axis=3)
    mresblock6 = MultiResBlock(32*4, up6)
    up7 = concatenate([Conv2DTranspose(
        32*2, (2, 2), strides=(2, 2), padding='same')(mresblock6), mresblock1], axis=3)
    mresblock7 = MultiResBlock(32*2, up7)
    conv8 = conv2d_bn(mresblock7, 1, 1, 1, activation='sigmoid')
    model = Model(inputs=[inputs], outputs=[conv8])
    return model
So, back to my problem with mismatched input/output dimensions in the U-Net architecture.
If I choose an input height/width of (128,128), (256,256) or (512,512) and do:
model = MultiResUnet(128, 128,3)
display(model.summary())
TensorFlow gives me a complete summary of what the whole architecture looks like. Now if I do this:
model = MultiResUnet(36, 36,3)
display(model.summary())
I get this error :
--------------------------------------------------------------------------- ValueError Traceback (most recent call
last) in
----> 1 model = MultiResUnet(36, 36,3)
2 display(model.summary())
in MultiResUnet(height, width,
n_channels)
25
26 up5 = concatenate([Conv2DTranspose(
---> 27 32*4, (2, 2), strides=(2, 2), padding='same')(mresblock4), mresblock3], axis=3)
28 mresblock5 = MultiResBlock(32*8, up5)
29
~/miniconda3/envs/MastersThenv/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py
in concatenate(inputs, axis, **kwargs)
682 A tensor, the concatenation of the inputs alongside axis axis.
683 """
--> 684 return Concatenate(axis=axis, **kwargs)(inputs)
685
686
~/miniconda3/envs/MastersThenv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py
in call(self, inputs, *args, **kwargs)
694 if all(hasattr(x, 'get_shape') for x in input_list):
695 input_shapes = nest.map_structure(lambda x: x.get_shape(), inputs)
--> 696 self.build(input_shapes)
697
698 # Check input assumptions set after layer building, e.g. input shape.
~/miniconda3/envs/MastersThenv/lib/python3.6/site-packages/tensorflow/python/keras/utils/tf_utils.py
in wrapper(instance, input_shape)
146 else:
147 input_shape = tuple(tensor_shape.TensorShape(input_shape).as_list())
--> 148 output_shape = fn(instance, input_shape)
149 if output_shape is not None:
150 if isinstance(output_shape, list):
~/miniconda3/envs/MastersThenv/lib/python3.6/site-packages/tensorflow/python/keras/layers/merge.py
in build(self, input_shape)
388 'inputs with matching shapes '
389 'except for the concat axis. '
--> 390 'Got inputs shapes: %s' % (input_shape))
391
392 def _merge_function(self, inputs):
ValueError: A Concatenate layer requires inputs with matching shapes
except for the concat axis. Got inputs shapes: [(None, 8, 8, 128),
(None, 9, 9, 128)]
Why does the Conv2DTranspose give me the wrong dimension
(None, 8, 8, 128)
instead of
(None, 9, 9, 128)
and why doesn't the concatenate function complain when I choose input sizes like (128,128), (256,256), etc. (multiples of 32)?
So, to generalize this question: how can I make this U-Net architecture work with any input size, and how can I deal with the Conv2DTranspose layer producing an output that is one pixel smaller (in width/height) than the dimension actually needed (when the input size isn't a multiple of 32 or isn't square)? Why doesn't this happen with input sizes that are multiples of 32? And what if I had variable input sizes?
Any help would be highly appreciated.
cheers,
H
The U-Net family of models (such as the MultiResUNet model above) follows an encoder-decoder architecture. The encoder is a downsampling path that performs feature extraction, whereas the decoder is an upsampling path. Feature maps from the encoder are concatenated into the decoder through skip connections. These feature maps are concatenated along the last axis, the 'channel' axis (taking the features to have dimensions [batch_size, height, width, channels]). For features to be concatenated along any axis (the 'channel' axis, in our case), the dimensions along all the other axes must match.
In the above model architecture, 3 downsampling/max-pooling operations are performed (through MaxPooling2D) in the encoder path. In the decoder path, 3 upsampling/transposed-convolution operations are performed, aiming to restore the image to its full dimensions. However, for the concatenations (through skip connections) to happen, the height, width and batch_size of the downsampled and upsampled features must be identical at every "level" of the model. I'll illustrate this with the examples you mentioned in the question:
1st case : Input dimensions (128,128,3) : 128 -> 64 -> 32 -> 16 -> 32 -> 64 -> 128
2nd case: Input dimensions (36,36,3) : 36 -> 18 -> 9 -> 4 -> 8 -> 16 -> 32
In the 2nd case, when the height and width of feature map reaches 9 in the encoder path, further downsampling leads to a dimension change (loss) that cannot be regained in the decoder while upsampling. Hence, it throws an error due to inability to concatenate feature maps of dimensions [(None, 8, 8, 128)] & [(None, 9, 9, 128)].
In general, for a simple encoder-decoder model (with skip-connections) having 'n' downsampling (MaxPooling2D) layers, the input dimension must be a multiple of 2^n to be able to concatenate the model's encoder features at the decoder. In this case n=3, hence the input must be a multiple of 8 to not run into these dimension mismatch errors.
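The size chains above can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming each MaxPooling2D halves the size with floor division and each stride-2 Conv2DTranspose with 'same' padding doubles it; the function names are made up for this illustration.

```python
def trace_sizes(size, n_levels):
    """Follow one spatial dimension through the encoder and the decoder."""
    encoder = [size]
    for _ in range(n_levels):
        encoder.append(encoder[-1] // 2)  # 2x2 max-pooling floors odd sizes
    decoder = []
    current = encoder[-1]
    for _ in range(n_levels):
        current *= 2                      # stride-2 transposed convolution
        decoder.append(current)
    return encoder, decoder


def first_mismatch(size, n_levels):
    """Return (upsampled, skip) for the first failing concatenation, or None."""
    encoder, decoder = trace_sizes(size, n_levels)
    for i, upsampled in enumerate(decoder):
        skip = encoder[n_levels - 1 - i]  # skip connection at the same level
        if upsampled != skip:
            return upsampled, skip
    return None


print(trace_sizes(128, 3))     # ([128, 64, 32, 16], [32, 64, 128])
print(first_mismatch(128, 3))  # None
print(trace_sizes(36, 3))      # ([36, 18, 9, 4], [8, 16, 32])
print(first_mismatch(36, 3))   # (8, 9)
```

The (8, 9) pair matches the two shapes reported in the Concatenate error from the question.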
Hope this helps! :)
Thanks @Balraj Ashwath for the great answer! Building on it: if your input has size h along one dimension and you want to use this architecture with depth d (h >= 2^d), one possibility is to pad that dimension with delta_h zeros, where delta_h is given by the following expression:
import numpy as np
h, d = 36, 3
delta_h = np.ceil(h/(2**d)) * (2**d) - h
print(delta_h)
> 4.0
Then, following the example of @Balraj Ashwath, with the padded size h + delta_h = 40:
40 -> 20 -> 10 -> 5 -> 10 -> 20 -> 40
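The same padding amount can also be computed with integer arithmetic and split between the two sides of the image. The sketch below is just one way to do it; pad_to_multiple is a hypothetical helper name, and splitting the padding evenly is a design choice, not something the answer above prescribes.

```python
def pad_to_multiple(h, d):
    """Return (total, before, after) zero-padding so that h + total is a
    multiple of 2**d. The padding is split as evenly as possible."""
    multiple = 2 ** d
    total = (-h) % multiple  # 0 when h is already a multiple of 2**d
    before = total // 2
    after = total - before
    return total, before, after


print(pad_to_multiple(36, 3))   # (4, 2, 2) -> padded size 40
print(pad_to_multiple(128, 3))  # (0, 0, 0)
print(pad_to_multiple(37, 3))   # (3, 1, 2) -> padded size 40
```

The (before, after) pair can then be used for each spatial axis when zero-padding the input, for example with np.pad, and the model output can be cropped back to the original size afterwards.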

I don't get the keras fit_generator to work with mixed (image and number) input

I've been racking my brain over this for 3 days now.
I mainly followed this link to create my own data generator. But one way or another I'm doing something wrong, and I can't figure out what. My error is:
ValueError: Error when checking input: expected dense_4_input to have 2 dimensions, but got array with shape (5, 128, 128, 3)
The network for the number:
def create_mlp(dim, regress=False):
    # define our MLP network
    model = Sequential()
    model.add(Dense(8, input_dim=dim, activation="relu"))
    model.add(Dense(4, activation="relu"))
    # check to see if the regression node should be added
    if regress:
        model.add(Dense(1, activation="linear"))
    return model
The CNN for the image:
def create_cnn(inputshape, filters=(16, 32, 64), regress=True):
    chanDim = -1
    # define the model input
    inputs = Input(shape=inputshape)
    # loop over the number of filters
    for (i, f) in enumerate(filters):
        # if this is the first CONV layer then set the input
        # appropriately
        if i == 0:
            x = inputs
        # CONV => RELU => BN => POOL
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
    # flatten the volume, then FC => RELU => BN => DROPOUT
    x = Flatten()(x)
    x = Dense(16)(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = Dropout(0.5)(x)
    # apply another FC layer, this one to match the number of nodes
    # coming out of the MLP
    x = Dense(4)(x)
    x = Activation("relu")(x)
    # check to see if the regression node should be added
    if regress:
        x = Dense(1, activation="linear")(x)
    # construct the CNN
    model = Model(inputs, x)
    # return the CNN
    return model
My own generator:
def aux_generator(img="todo", aux_input="todo", batch_size=3):
    while True:
        # Select files (paths/indices) for the batch
        # todo make this random
        img_path, gridnum, batch_output = get_batch_path()
        batch_input_img = []
        batch_input_sattelite = []
        # Read in each input, perform preprocessing and get labels
        for input_path in img_path:
            input_img = get_input_image(input_path)
            input = preprocess_input(image=input_img)
            batch_input_img += [input]
        for GridNum in gridnum:
            # append is not good!
            batch_input_sattelite.append(get_input_sattelite(GridNum))
        # Return a tuple of (input,output) to feed the network
        batch_x1 = np.array(batch_input_img)
        batch_x2 = np.array(batch_input_sattelite)
        batch_y = np.array(batch_output)
        print("image shape : ", batch_x1.shape)  # (5, 128, 128, 3)
        print("Aux shape: ", batch_x2.shape, batch_x2)  # (5,)
        yield [batch_x1, batch_x2], batch_y
def get_batch_path():
    # use the df we produced in downloadMpas to know where the images are and what their NO2 concentration is
    img_info_df = pd.read_csv(r"Small/mappingTest.csv", delimiter=',', header=None,
                              names=['GridNum', 'id', 'score', 'lat', 'lon'])
    img_info_df = img_info_df[img_info_df.score != "score"]
    # the keras network needs float for the score (not object which is default when he reads in)
    img_info_df = img_info_df.astype({"GridNum": 'float64', "id": 'object', "score": 'float64'})
    return img_info_df['id'].head(n=5), img_info_df['GridNum'].head(n=5), img_info_df['score'].head(n=5)
def get_input_image(path):
    # get image
    img = image.load_img(r"Small/" + path)
    img = image.img_to_array(img)
    # get the corresponding value of the sattelite data
    return img
def get_input_sattelite(GridNum):
    sattelite_no2 = sattelite_df[sattelite_df['GridNum'] == GridNum]['sattelite'].values[0]
    print("sattelite no2:", sattelite_no2)
    return sattelite_no2
def preprocess_input(image):
    # do whatever we want to the images
    return (image)
The main:
sattelite_df = pd.read_csv(r"Small/sattelite.csv", delimiter=',', header=None,
names=['GridNum', 'id', 'score', 'lat', 'lon', 'sattelite'])
input_img_shape = (128, 128, 3)
input_aux_shape = (1)
img_model = create_cnn(input_img_shape)
aux_model = create_mlp(input_aux_shape, regress=False)
combinedInput = concatenate([aux_model.output, img_model.output])
# our final FC layer head will have two dense layers, the final one
# being our regression head
x = Dense(4, activation="relu")(combinedInput)
x = Dense(1, activation="linear")(x)
# our final model will accept categorical/numerical data on the MLP
# input and images on the CNN input, outputting a single value (the
# predicted price of the house)
model = Model(inputs=[aux_model.input, img_model.input], outputs=x)
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)
batch_size = 2
early = EarlyStopping(monitor='val_loss', patience=3, verbose=1, restore_best_weights=True)
ImageFile.LOAD_TRUNCATED_IMAGES = True
model.fit_generator(
aux_generator(batch_size=batch_size),
steps_per_epoch=10 // batch_size,
epochs=2,
validation_data=aux_generator(batch_size=3),
validation_steps=20 // batch_size,
callbacks=[early])
Any help is welcome, as I don't know what I'm doing wrong?

logits and labels must be same size logits_size

Hi, I used my own dataset to train the model, but I get the error shown below. My dataset has 124 classes with labels 0 to 123, the images are 60*60 grayscale, and the batch size is 10. The output is:
lables.eval() --> [ 1 101 101 103 103 103 103 100 102 1] -- len(lables.eval())= 10
orginal pic size -- > (?, 60, 60, 1)
First convolutional layer (?, 30, 30, 32)
Second convolutional layer. (?, 15, 15, 64)
flatten. (?, 14400)
dense .1 (?, 2048)
dense .2 (?, 124)
Error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and
labels must have the same first dimension, got logits shape [40,124] and
labels shape [10]
Code:
def model_fn(features, labels, mode, params):
    # Reference to the tensor named "image" in the input-function.
    x = features["image"]
    # The convolutional layers expect 4-rank tensors
    # but x is a 2-rank tensor, so reshape it.
    net = tf.reshape(x, [-1, img_size, img_size, num_channels])
    # First convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv1',
                           filters=32, kernel_size=3,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
    # Second convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv2',
                           filters=64, kernel_size=3,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
    # Flatten to a 2-rank tensor.
    net = tf.contrib.layers.flatten(net)
    # Eventually this should be replaced with:
    # net = tf.layers.flatten(net)
    # First fully-connected / dense layer.
    # This uses the ReLU activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc1',
                          units=2048, activation=tf.nn.relu)
    # Second fully-connected / dense layer.
    # This is the last layer so it does not use an activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc_2',
                          units=num_classes)
    # Logits output of the neural network.
    logits = net
    y_pred = tf.nn.softmax(logits=logits)
    y_pred_cls = tf.argmax(y_pred, axis=1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        spec = tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=y_pred_cls)
    else:
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits)
        loss = tf.reduce_mean(cross_entropy)
        optimizer = tf.train.AdamOptimizer(
            learning_rate=params["learning_rate"])
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())
        metrics = {
            "accuracy": tf.metrics.accuracy(labels, y_pred_cls)
        }
        spec = tf.estimator.EstimatorSpec(
            mode=mode,
            loss=loss,
            train_op=train_op,
            eval_metric_ops=metrics)
    return spec
The labels come from this input function, which reads them from TFRecords:
def input_fn(filenames, train, batch_size=10, buffer_size=2048):
    # Args:
    # filenames: Filenames for the TFRecords files.
    # train: Boolean whether training (True) or testing (False).
    # batch_size: Return batches of this size.
    # buffer_size: Read buffers of this size. The random shuffling
    # is done on the buffer, so it must be big enough.
    # Create a TensorFlow Dataset-object which has functionality
    # for reading and shuffling data from TFRecords files.
    dataset = tf.data.TFRecordDataset(filenames=filenames)
    # Parse the serialized data in the TFRecords files.
    # This returns TensorFlow tensors for the image and labels.
    dataset = dataset.map(parse)
    if train:
        # If training then read a buffer of the given size and
        # randomly shuffle it.
        dataset = dataset.shuffle(buffer_size=buffer_size)
        # Allow infinite reading of the data.
        num_repeat = None
    else:
        # If testing then don't shuffle the data.
        # Only go through the data once.
        num_repeat = 1
    # Repeat the dataset the given number of times.
    dataset = dataset.repeat(num_repeat)
    # Get a batch of data with the given size.
    dataset = dataset.batch(batch_size)
    # Create an iterator for the dataset and the above modifications.
    iterator = dataset.make_one_shot_iterator()
    # Get the next batch of images and labels.
    images_batch, labels_batch = iterator.get_next()
    # The input-function must return a dict wrapping the images.
    x = {'image': images_batch}
    y = labels_batch
    print(x, ' - ', y.get_shape())
    return x, y
I generate the labels with this code. For example, for an image named math-1 the label is 1:
def get_lable_and_image(path):
    lbl = []
    img = []
    for filename in glob.glob(os.path.join(path, '*.png')):
        img.append(filename)
        lable = filename[41:].split()[0].split('-')[1]
        lbl.append(int(lable))
    lables = np.array(lbl)
    images = np.array(img)
    # print(images[1], lables[1])
    return images, lables
I pass the images and labels to this function to create the TFRecords:
def convert(image_paths, labels, out_path):
    # Args:
    # image_paths List of file-paths for the images.
    # labels Class-labels for the images.
    # out_path File-path for the TFRecords output file.
    print("Converting: " + out_path)
    # Number of images. Used when printing the progress.
    num_images = len(image_paths)
    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:
        # Iterate over all the image-paths and class-labels.
        for i, (path, label) in enumerate(zip(image_paths, labels)):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images-1)
            # Load the image-file using matplotlib's imread function.
            img = imread(path)
            # Convert the image to raw bytes.
            img_bytes = img.tostring()
            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = {
                'image': wrap_bytes(img_bytes),
                'label': wrap_int64(label)
            }
            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)
            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)
            # Serialize the data.
            serialized = example.SerializeToString()
            # Write the serialized data to the TFRecords file.
            writer.write(serialized)
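One thing worth checking with this error: the logits batch (40) is exactly 4 times the label batch (10). Because model_fn reshapes with -1 in the batch position, any extra values per image are silently absorbed into the batch dimension. The sketch below only illustrates that arithmetic. The factor of 4 is an assumption about the stored data (for example, 4 channels per pixel or 4 decoded values per pixel); the parse function is not shown in the post, so this is a guess rather than a confirmed diagnosis, and inferred_batch is a made-up helper name.

```python
def inferred_batch(n_images, values_per_image, img_size, num_channels):
    """Batch size that a reshape to [-1, img_size, img_size, num_channels]
    would infer from n_images flattened images."""
    total = n_images * values_per_image
    per_example = img_size * img_size * num_channels
    if total % per_example != 0:
        raise ValueError("total number of values is not divisible by the target shape")
    return total // per_example


# Each image really holds 60*60*1 values: batch size stays at 10.
print(inferred_batch(10, 60 * 60 * 1, img_size=60, num_channels=1))  # 10

# ASSUMPTION: each stored image decodes to 4 values per pixel.
print(inferred_batch(10, 60 * 60 * 4, img_size=60, num_channels=1))  # 40
```

Printing the shape of the decoded image tensor inside the parse function, before the reshape, would show whether this is what is happening.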