Keras model.fit causes InvalidArgumentError when training on TPU - tensorflow

When running the model.fit function an error is thrown. The main question is, what does this error mean? The code is run on a TPU V3-8 and uses Google cloud for data retrieval. I did try to look up the error on the web, however I could not find a single case of someone else getting this error.
model.fit(
dataset,
steps_per_epoch = N_IMGS // BATCH_SIZE,
epochs = EPOCHS,
)
Throws the error
InvalidArgumentError: {{function_node __inference_train_function_528542}} Compilation failure: Depth of output must be a multiple of the number of groups: 3 vs 2
[[{{node sequential/conv2d/Conv2D}}]]
TPU compilation failed
[[tpu_compile_succeeded_assert/_15965336225898828069/_5]]
The error message is not clear to me, what exactly is going wrong? The following model is used.
def get_model():
# reset to free memory and training variables
tf.keras.backend.clear_session()
with strategy.scope():
net = efn.EfficientNetB0(include_top=False, weights='noisy-student', input_shape=(HEIGHT, WIDTH, 3))
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(3, (3, 3), padding='same', input_shape=(HEIGHT, WIDTH, 1)),
net,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(N_LABELS, activation='softmax', dtype='float32'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
return model
model = get_model()
tf.keras.utils.plot_model(model, 'model.png', show_shapes=True)
The dataset gives the following output
for images, labels in dataset.take(1): # only take first element of dataset
print(f'images.shape: {images.shape}, images.dtype: {images.dtype}, labels.shape: {labels.shape}, labels.dtype: {labels.dtype}')
images.shape: (64, 224, 400, 1), images.dtype: <dtype: 'float32'>, labels.shape: (64,), labels.dtype: <dtype: 'int32'>

Related

Efficientnet inputshape error with tf.data.Dataset

when feeding a tf.data.Dataset to train EfficientnetB0 model I get the following error:
ValueError: in converted code:
C:\Users\fconrad\AppData\Local\Continuum\anaconda3\envs\venv_spielereien\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py:677 map_fn
batch_size=None)
C:\Users\fconrad\AppData\Local\Continuum\anaconda3\envs\venv_spielereien\lib\site-packages\tensorflow_core\python\keras\engine\training.py:2410 _standardize_tensors
exception_prefix='input')
C:\Users\fconrad\AppData\Local\Continuum\anaconda3\envs\venv_spielereien\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py:573 standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected efficientnet-b0_input to have 4 dimensions, but got array with shape (224, 224, 3)
I realy wonder why this happens, since when I create a batch from my Dataset:
train_generator = (tf.data.Dataset
.from_tensor_slices((train_imgs, train_labels))
.map(read_img)
.map(flip_img)
.map(brightness)
.map(blur)
.map(noise)
.map(rotate_90)
.repeat()
.shuffle(512)
.batch(BATCH_SIZE)
.prefetch(True))
validation_generator = (tf.data.Dataset
.from_tensor_slices((validation_imgs, validation_labels))
.map(read_img)
)
print(train_generator.__iter__().__next__()[0].shape)
I get the expected result (64, 224, 224, 3).
But after creating the model the error above raises when I start training:
effn = tfkeras.EfficientNetB0(include_top=False, input_shape=img_shape, classes=4)
effn_model = tf.keras.Sequential()
effn_model.add(effn)
effn_model.add(tf.keras.layers.GlobalAveragePooling2D())
effn_model.add(tf.keras.layers.Dense(4, 'softmax'))
effn_model.compile(optimizer= 'adam', loss='categorical_crossentropy', metrics= ['categorical_accuracy'])
effn_model.fit(train_generator,
epochs=20,
steps_per_epoch=train_imgs.shape[0] // BATCH_SIZE,
validation_data= validation_generator)
Does anyone know why the slices from dataset have shape (64,224,224,3) but the model doesnt recognize the batch dimension? when I try to train a keras.application model, everything works fine.
I use tensorflow 2.1 and the pip install of efficientnet. Thanks
as explained here keras.io/api/applications/efficientnet/
input_shape: Optional shape tuple, only to be specified if include_top is False. It should have exactly 3 inputs channels.
as so try this->
from tensorflow.keras.applications.efficientnet import EfficientNetB0, EfficientNetB5
mm = EfficientNetB0(include_top=True, weights=None, input_tensor=None, input_shape=(128, 128, 3), pooling=None, classes=2, classifier_activation="sigmoid")
mm.summary()
note the input_shape=(128, 128, 3) It has 3 channels.

Keras(Tensorflow) LSTM error in spyder and jupyter

when I use google colab, there's no error in code
but when I use spyder or jupyter, the error occurs.
Model_10 = Sequential()
Model_10.add(LSTM(128, batch_input_shape = (1,10,5), stateful = True))
Model_10.add(Dense(5, activation = 'linear'))
Model_10.compile(loss = 'mse', optimizer = 'rmsprop')
Model_10.fit(x_train, y_train, epochs=1, batch_size=1, verbose=2, shuffle=False, callbacks=[history])
x_train_data.shape = (260,10,5)
y_train_data.shape = (260,1,5)
I'm using python3.7 and tensorflow 2.0
I don't know why error occurs in anaconda only.
ㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡ
error code
ValueError: A target array with shape (260, 1, 5) was passed for an output of shape (1, 5) while using as loss mean_squared_error. This loss expects targets to have the same shape as the output.
You should reshape your labels/targets:
y_train_data = y_train_data.reshape((260,5))
Since you're using batch_input_shape in the input layer and specifying batch size of 1, the model will take one example from your labels at each step which will have a shape of (1, 5) for the labels anyway.

2GB limit error when training Keras sequential model using Tensorflow dataset

I'm using tf.data.experimental.make_csv_dataset function to create the input to a Keras sequential model. My first layer is a DenseFeature that receives a list of tf.feature_column (indicator, bucketized, numeric etc). The following layers are Dense using relu activation. When I run the fit function I get the error: "Cannot create a tensor proto whose content is larger than 2GB.". What do I need to change to make this model train?
The below is the main part of the code:
train_input = tf.data.experimental.make_csv_dataset(["df_train.csv"], batch_size=64, label_name="loss_rate", num_epochs=1)
eval_input = tf.data.experimental.make_csv_dataset(["df_val.csv"], batch_size=64, label_name="loss_rate", shuffle=False, num_epochs=1)
#all_features is generated by a function (it has 87 tf.feature_column objects)
feature_layer = layers.DenseFeatures(all_features)
def deep_sequential_model():
model = tf.keras.Sequential([
feature_layer,
layers.Dense(64, activation='relu'),
layers.Dense(32, activation='relu'),
layers.Dense(1, activation='sigmoid')
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae', 'mse'])
return model
model = deep_sequential_model()
model.fit(train_input,
validation_data=eval_input,
epochs=10)
I'm getting the error:
/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in __init__(self, node_def, g, inputs, output_types, control_inputs, input_types, original_op, op_def)
1696 "Cannot create a tensor proto whose content is larger than 2GB.")
1697 if not _VALID_OP_NAME_REGEX.match(node_def.name):
-> 1698 raise ValueError("'%s' is not a valid node name" % node_def.name)
1699 c_op = None
1700 elif type(node_def).__name__ == "SwigPyObject":
ValueError: '_5' is not a valid node name```
I've just found the problem. The csv that I was loading had an index column that was creating a Tensor without any name and I believe this was causing issues. I just removed the index from the csv and it worked.

AlreadyExistsError while training a network on colab

I'm trying to train an LSTMs network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where can be the issue. This is the model of the network:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index+1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
activation='tanh', dropout=0.2, recurrent_dropout = 0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size, activation='tanh', dropout=0.2, recurrent_dropout = 0.2, return_sequences=False))
)
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!
You need to restart your runtime -- this happens when you have defined multiple graphs built in a single jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager exection and how you've defined your sessions this may or may not work.

ValueError: Error when checking target: expected max_pooling2d_1 to have 4 dimensions, but got array with shape (61, 1)

I am working in keras tensorflow backend on Windows 10.
I am not able to interpret the meaning of the error
Here is a snippet of my code
{
model = Sequential([
#Dense(32, input_shape=(1080,1920,2)),
Dense(32, input_shape=(250,250, 3)),
#Dense(32, input_shape=(3,1080,1920,2)),
Activation('relu'),
Dense(10),
Activation('softmax'),
Dropout(0.02),
])
layer = Dropout(0.02)
#further layers:
model.add(Dense(units=3)) #hidden layer 1
model.add(Dense(units=1)) #output layer
model.add(Conv2D(3, (3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2),strides=None,padding='valid', data_format=None))
model.compile(loss=losses.mean_squared_error, optimizer='sgd')
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
test_generator = ImageDataGenerator()
validation_generator = test_generator.flow_from_directory(
'human_faces/validation',
target_size=(250,250),
batch_size=3,
class_mode=None,classes=0)
model.fit_generator(
train_generator,
steps_per_epoch=1,## batch_size,
#steps_per_epoch=3,
epochs=5,
validation_data=validation_generator,
# validation_steps=61 ) # batch_size)
validation_steps=1)
}
My error:
File "C:/Users/Owner/PycharmProjects/untitled1/work.py", line 89, in
validation_steps=1) ValueError: Error when checking target: expected max_pooling2d_1 to have 4 dimensions, but got array with
shape (61, 1)
There is a mismatch between the shapes of the output of your network (which is the output of the MaxPooling2D layer) and the output you seem to expect (based on the desired "true" output example you feed together with each input to model.fit_generator().
To investigate the mismatch you have to examine your (unshown) code of train_generator to see what output shape you are expecting, and can use model.summary() to see the conflicting output shape generated by the MaxPooling2D layer.
Try adding the following argument to Cov2D:
padding='SAME'
Like:
model.add(Conv2D(3, (3, 3),padding='SAME'))