This is my code:
import os
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dropout, Dense

number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
absPath = 'F:/Projects/AI/Tensorflow/verificationcode/image/'
imagePaths = os.listdir('./image')
model = tf.keras.models.Sequential([
    Conv2D(32, kernel_size=3, activation='relu'),
    Conv2D(32, kernel_size=3, activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(64, kernel_size=3, activation='relu'),
    Conv2D(64, kernel_size=3, activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(128, kernel_size=3, activation='relu'),
    Conv2D(128, kernel_size=3, activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(256, kernel_size=3, activation='relu'),
    Conv2D(256, kernel_size=3, activation='relu'),
    MaxPool2D((2, 2)),
    Flatten(),
    Dropout(0.25),
    Dense(40, activation='softmax')
])
model(inputs=tf.keras.Input(shape=(80, 170, 3)))  # build the model on an 80x170 RGB input
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
history = model.fit(x_train, y_train, batch_size=32, shuffle=True, epochs=5, validation_freq=1)
This is my model:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 78, 168, 32) 896
_________________________________________________________________
conv2d_1 (Conv2D) (None, 76, 166, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 38, 83, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 36, 81, 64) 18496
_________________________________________________________________
conv2d_3 (Conv2D) (None, 34, 79, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 17, 39, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 15, 37, 128) 73856
_________________________________________________________________
conv2d_5 (Conv2D) (None, 13, 35, 128) 147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 17, 128) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 4, 15, 256) 295168
_________________________________________________________________
conv2d_7 (Conv2D) (None, 2, 13, 256) 590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 6, 256) 0
_________________________________________________________________
flatten (Flatten) (None, 1536) 0
_________________________________________________________________
dropout (Dropout) (None, 1536) 0
_________________________________________________________________
dense (Dense) (None, 40) 61480
=================================================================
Total params: 1,233,736
Trainable params: 1,233,736
Non-trainable params: 0
I used 3,935 samples to train; this is the result:
3616/3935 [==========================>...] - ETA: 10s - loss: 14.7558 - accuracy: 0.0105
3648/3935 [==========================>...] - ETA: 9s - loss: 14.7558 - accuracy: 0.0112
3680/3935 [===========================>..] - ETA: 8s - loss: 14.7558 - accuracy: 0.0128
3712/3935 [===========================>..] - ETA: 7s - loss: 14.7558 - accuracy: 0.0129
3744/3935 [===========================>..] - ETA: 6s - loss: 14.7558 - accuracy: 0.0134
3776/3935 [===========================>..] - ETA: 5s - loss: 14.7558 - accuracy: 0.0143
3808/3935 [============================>.] - ETA: 4s - loss: 14.7558 - accuracy: 0.0142
3840/3935 [============================>.] - ETA: 3s - loss: 14.7558 - accuracy: 0.0148
3872/3935 [============================>.] - ETA: 2s - loss: 14.7558 - accuracy: 0.0155
3904/3935 [============================>.] - ETA: 1s - loss: 14.7558 - accuracy: 0.0166
3935/3935 [==============================] - 135s 34ms/sample - loss: 14.7558 - accuracy: 0.0170
This is a captcha image:
[captcha image]
The loss stays unchanged and the accuracy is very low.
How can I solve this? Thanks!
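Two things in the code above are worth checking. First, model.compile passes from_logits=True while the last layer already applies a softmax, so the loss treats probabilities as raw logits. Second, if the 40 outputs are meant to encode 4 captcha characters × 10 digits (an assumption based on the digit list and Dense(40)), a single softmax over all 40 units cannot produce four independent one-hot groups. A minimal sketch of a per-character output head under that assumption, with y_train reshaped to (N, 4, 10):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape, Softmax

# hypothetical head replacing Dense(40, activation='softmax')
head = tf.keras.Sequential([
    Dense(40),           # raw logits, no activation
    Reshape((4, 10)),    # one row of 10 logits per character
    Softmax(axis=-1),    # independent softmax per character
])

# with probabilities as outputs, from_logits must be False:
# model.compile(optimizer='adam',
#               loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
#               metrics=['accuracy'])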
Related
I'm currently learning about gensim word2vec, and I'm in the middle of building a model that predicts a review's rating from its words. All of my code compiles fine; however, every epoch shows the same val_accuracy and almost exactly the same training accuracy. I've tried changing the architecture and hyperparameters, but have had no luck whatsoever. Any advice would be appreciated!
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense, Dropout

word_index = {w: i + 1 for i, w in enumerate(index_to_key) if i < max_words - 1}  # keep just max_words (zero is reserved for unknown)
sequences = [[word_index.get(w, 0) for w in sent] for sent in reviews]  # encode the sentences
seqs_truncated = pad_sequences(sequences, maxlen=max_review_length, padding="pre", truncating="post")
ratings = np.asarray(Ratings)
# prepare training and validation data
x_val = seqs_truncated[:len_val]
partial_x_train = seqs_truncated[len_val:]
y_val = to_categorical(ratings[:len_val] - 1, num_classes=5)
partial_y_train = to_categorical(ratings[len_val:] - 1, num_classes=5)
print('Length of validation set =', len(x_val))
print('Length of training set =', len(partial_x_train))

def sample_network(embedding):
    network = Sequential()
    network.add(embedding)
    network.add(Bidirectional(LSTM(64)))
    network.add(Dense(64, activation='relu'))
    network.add(Dense(32, activation='relu'))
    network.add(Dense(16, activation='relu'))
    network.add(Dropout(0.5))
    network.add(Dense(5, activation='softmax'))
    return network
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_24 (Embedding) (None, 100, 300) 1500000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 128) 186880
_________________________________________________________________
dense_16 (Dense) (None, 64) 8256
_________________________________________________________________
dense_17 (Dense) (None, 32) 2080
_________________________________________________________________
dense_18 (Dense) (None, 16) 528
_________________________________________________________________
dropout_10 (Dropout) (None, 16) 0
_________________________________________________________________
dense_19 (Dense) (None, 5) 85
=================================================================
Total params: 1,697,829
Trainable params: 1,697,829
Non-trainable params: 0
hist_word_vec = network_wordvec.fit(partial_x_train, partial_y_train, epochs=no_epochs,
                                    batch_size=256, validation_data=(x_val, y_val))
65/65 [==============================] - 39s 606ms/step - loss: 1.5886 - accuracy: 0.4449 - val_loss: 1.5750 - val_accuracy: 0.3875
Epoch 2/8
65/65 [==============================] - 38s 587ms/step - loss: 1.5491 - accuracy: 0.4554 - val_loss: 1.5458 - val_accuracy: 0.3875
Epoch 3/8
65/65 [==============================] - 37s 564ms/step - loss: 1.5152 - accuracy: 0.4554 - val_loss: 1.5215 - val_accuracy: 0.3875
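A val_accuracy frozen at 0.3875 across epochs usually means the network predicts the same rating for every review, which often happens when the data is ordered or one class dominates. A minimal sketch of two first steps, shuffling before the split and weighting the classes, reusing the variable names above (sklearn's compute_class_weight is one way to get the weights):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# shuffle before carving off the validation split, in case the reviews are ordered by rating
perm = np.random.permutation(len(seqs_truncated))
seqs_truncated, ratings = seqs_truncated[perm], ratings[perm]

# weight the classes so a majority rating cannot dominate training;
# keys are 0..4 to match the "-1" shift used with to_categorical
weights = compute_class_weight('balanced', classes=np.arange(1, 6), y=ratings)
class_weight = {i: w for i, w in enumerate(weights)}

# network_wordvec.fit(partial_x_train, partial_y_train, epochs=no_epochs,
#                     batch_size=256, validation_data=(x_val, y_val),
#                     class_weight=class_weight)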
I'm trying to build a denoising autoencoder in which the encoder is VGG16 and the decoder is the mirror of the VGG16 encoder network. My dataset consists of 5K grayscale images, and these are the steps I've followed to prepare and normalize them:
# input_X: 5K noisy images, input_Y: 5K ground-truth images,
# test_X: 1K noisy test images; all grayscale, 224x224
input_X = input_X / 255.0
input_Y = input_Y / 255.0
test_X = test_X / 255.0
print(input_X.shape)
print(input_Y.shape)
print(test_X.shape)
input_X = np.repeat(input_X[..., np.newaxis], 3, -1)  # creating 3 channels
input_Y = np.repeat(input_Y[..., np.newaxis], 3, -1)
test_X = np.repeat(test_X[..., np.newaxis], 3, -1)
print(input_X.shape)  # (5000, 224, 224, 3)
print(input_Y.shape)
print(test_X.shape)
encoder network:
vggmodel = keras.applications.vgg16.VGG16()
model_encoder = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
    if i < 19:
        model_encoder.add(layer)
model_encoder.summary()
for layer in model_encoder.layers:
    layer.trainable = False
output:
Metal device set to: Apple M1 Pro
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
decoder network:
encoder_output = Input(shape=(7, 7, 512,))
x = Conv2D(512, (3, 3), activation='relu', padding='same')(encoder_output)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2,2))(x)
# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
x = UpSampling2D((2, 2))(x)
model_decoder = Model(inputs=encoder_output, outputs=x)
training:
model_decoder.compile(optimizer='Adam', loss='mean_squared_error', metrics=['accuracy'])
model_decoder.fit(trainx_encoded, train_Y,
                  epochs=no_epocs,
                  batch_size=batch_size,
                  validation_split=validation_split)
Now, while training, the loss and accuracy don't change. I can think of reducing the filters in the initial decoder layers, but I fear that's going to hurt the autoencoder.
Here, I'm really clueless about what approach to follow.
training output:
Epoch 1/50
2022-04-27 22:10:05.336322: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
27/27 [==============================] - ETA: 0s - loss: 0.1678 - accuracy: 0.9689
2022-04-27 22:11:34.044172: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
27/27 [==============================] - 97s 4s/step - loss: 0.1678 - accuracy: 0.9689 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 2/50
27/27 [==============================] - 87s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 3/50
27/27 [==============================] - 86s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 4/50
27/27 [==============================] - 86s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 5/50
27/27 [==============================] - 91s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 6/50
27/27 [==============================] - 85s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 7/50
27/27 [==============================] - 87s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 8/50
27/27 [==============================] - 91s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 9/50
27/27 [==============================] - 85s 3s/step - loss: 0.1645 - accuracy: 1.0000 - val_loss: 0.1732 - val_accuracy: 1.0000
Epoch 10/50
27/27 [==============================] - ETA: 0s - loss: 0.1645 - accuracy: 1.0000
VGG16 is not trained to be used as an encoder for image reconstruction; it is trained to extract features from an image, on top of which a classification task can be done.
This is why you cannot use VGG16 as-is as the encoder part of your denoising autoencoder.
However, if you want, you can still use the VGG16 architecture as the encoder part of your autoencoder by simply retraining those VGG16 layers:
vggmodel = keras.applications.vgg16.VGG16()
model_encoder = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
    if i < 19:
        model_encoder.add(layer)
model_encoder.summary()
for layer in model_encoder.layers:
    layer.trainable = True  # set the encoder to trainable, and your autoencoder should work
Again, there is a problem with your current autoencoder architecture: it downsamples too much, leading to a significant loss of information. A smaller architecture would work better. For example:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, UpSampling2D

model = Sequential()
model.add(Input(shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(3, (3, 3), activation='relu', padding='same'))  # a 'sigmoid' here may fit [0, 1] targets better
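The sketch above stops before compiling. A possible continuation, reusing the question's arrays (input_X as the noisy input, input_Y as the clean target, both scaled to [0, 1]); 'accuracy' is dropped because it isn't a meaningful metric for pixel regression, which is likely why it sat frozen at 1.0000 in the logs above:

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(input_X, input_Y,
          epochs=50,
          batch_size=32,
          validation_split=0.1,
          shuffle=True)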
I'm training the VGG16 model in Colab. While it runs, the session sometimes disconnects and reconnects, and sometimes around epoch 20 or 21 of 35 the connection is lost entirely; when I reconnect, the Drive mount restarts, and because of this I lose all outputs and have to re-run all the code. How can this problem be solved?
I'm only using a dataset of 3,000 images, which is divided into valid, train, and test sets.
The code which I run is:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

vgg16_model = tf.keras.applications.vgg16.VGG16()
vgg16_model.summary()
model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)
for layer in model.layers:
    layer.trainable = False
model.add(Dense(units=2, activation='softmax'))
model.summary()
model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=train_batches,
          steps_per_epoch=len(train_batches),
          validation_data=valid_batches,
          validation_steps=len(valid_batches),
          epochs=35,
          verbose=2)
I was able to execute sample code using VGG16 without any issues. Please refer to the working code shown below.
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg16
from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from google.colab import drive
drive.mount('/content/drive')
train_dir = '/content/drive/My Drive/Dogs_Vs_Cats/train'
valid_dir = '/content/drive/My Drive/Dogs_Vs_Cats/test'
img_width, img_height = 224, 224
input_shape = (img_width, img_height, 3)
batch_size = 32
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    horizontal_flip=True)
valid_datagen = ImageDataGenerator(
    rescale=1. / 255)
train_batches = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
valid_batches = valid_datagen.flow_from_directory(
    valid_dir,
    target_size=(img_width, img_height),
    class_mode='binary')
vgg16_model = vgg16.VGG16()
model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)
for layer in model.layers:
    layer.trainable = False
model.add(Dense(units=1, activation='sigmoid'))  # sigmoid for a single-unit binary output (softmax over one unit is always 1.0)
model.summary()
model.compile(optimizer = Adam(learning_rate=0.0001), loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(
    train_batches,
    steps_per_epoch=10,
    epochs=2,
    validation_data=valid_batches,
    verbose=1,
    validation_steps=32)
Output:
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Found 2000 images belonging to 2 classes.
Found 1018 images belonging to 2 classes.
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 1) 4097
=================================================================
Total params: 134,264,641
Trainable params: 4,097
Non-trainable params: 134,260,544
_________________________________________________________________
Epoch 1/2
10/10 [==============================] - 503s 54s/step - loss: 0.6935 - accuracy: 0.4292 - val_loss: 0.6861 - val_accuracy: 0.4912
Epoch 2/2
10/10 [==============================] - 506s 55s/step - loss: 0.6856 - accuracy: 0.4669 - val_loss: 0.6748 - val_accuracy: 0.4912
Note: to debug your code, try with 5 epochs first.
If you are facing session timeouts, you can refer to the solutions discussed in Google Colab session timeout.
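To avoid losing all progress when the session drops, it can also help to checkpoint the weights to Drive after every epoch and resume from the last saved file after reconnecting. A minimal sketch, with a hypothetical checkpoint folder on the mounted Drive:

from tensorflow.keras.callbacks import ModelCheckpoint

# '/content/drive/My Drive/checkpoints/' is a hypothetical folder; adjust to your layout
ckpt = ModelCheckpoint('/content/drive/My Drive/checkpoints/vgg16_{epoch:02d}.h5',
                       save_weights_only=True)
model.fit(train_batches,
          steps_per_epoch=10,
          epochs=2,
          validation_data=valid_batches,
          verbose=1,
          validation_steps=32,
          callbacks=[ckpt])

# after a disconnect: remount Drive, rebuild the model, call
# model.load_weights(<last checkpoint>), then resume with
# model.fit(..., initial_epoch=<last completed epoch>)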
Hello, I am trying to train a small model using tf.keras with TF 2.2.0. I'm using a generator which returns sequences of shape [5, 120, 32, 64, 9] and labels of shape [5, 120, 1], and I'm importing from tf.keras:
from tensorflow.keras.metrics import Recall, Precision, Metric
Additionally, I am adding them to the compile and fit sections:
model.compile(
    loss="mse",
    optimizer=Adam(learning_rate=self.learning_rate),
    metrics=[Recall(), Precision()],
    sample_weight_mode="temporal",
)
if callbacks is None:
    callbacks = []
model.fit(
    data.training(),
    callbacks=callbacks,
    steps_per_epoch=epoch_size,
    epochs=epochs,
    validation_data=data.training(),
    validation_steps=validation_size,
    verbose=0,
)
(I'm aware that I'm using the training set as both training and validation data. I'm trying to find a bug in my code or in TF, since we get strange and strong swings in recall and precision w.r.t. validation; it never converges and produces extreme changes, for example from 0 to 0.8 to 0.2 to 0.9 to 0.4 to 0.8...)
Additionally, I'm using a generator which yields tuples of inputs and outputs, since that "corrected the problem".
However, I'm still getting precision and recall of 0.00000:
100/100 [==============================] - 224s 2s/step - loss: 0.0371 - recall: 0.0000e+00 - precision: 0.0000e+00 - val_loss: 0.0331 - val_recall: 0.0000e+00 - val_precision: 0.0000e+00
Does anyone know any other trick for TF 2.2 that I can use to solve this problem?
A summary of my NN is the following:
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, None, 32, 64, 9)] 0
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 30, 62, 20) 20960
_________________________________________________________________
time_distributed_MP_1 (TimeD (None, None, 15, 31, 20) 0
_________________________________________________________________
time_distributed_BN_1 (TimeD (None, None, 15, 31, 20) 80
_________________________________________________________________
time_distributed_F (TimeDist (None, None, 9300) 0
_________________________________________________________________
time_distributed_D1 (TimeDis (None, None, 32) 297632
_________________________________________________________________
time_distributed (TimeDistri (None, None, 32) 0
_________________________________________________________________
time_distributed_D2 (TimeDis (None, None, 24) 792
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 24) 0
_________________________________________________________________
time_distributed_D3 (TimeDis (None, None, 16) 400
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 16) 0
_________________________________________________________________
output (TimeDistributed) (None, None, 1) 17
=================================================================
This was happening to me, and I finally figured out why: my data was ordered. For example, all my negative samples were at the end of the array and all my positive samples at the beginning, so at the start of training the network would only see samples of the negative class.
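Under that diagnosis, shuffling the features and labels with one shared permutation before splitting is enough to fix it. A minimal sketch, with x and y standing in for the full arrays:

import numpy as np

perm = np.random.permutation(len(x))  # one shared permutation for both arrays
x, y = x[perm], y[perm]               # now every batch mixes both classes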
I am training an autoencoder network on TensorFlow GPU 1.13.1. Initially I used batch sizes of 32/64/128, but it seems the GPU is not being used at all, although the "Memory-Usage" reported by nvidia-smi shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 34C P0 53W / 300W | 31316MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
And the training stalls at the 39th step every time.
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 256, 256, 3) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 64, 64, 96) 34944
_________________________________________________________________
batch_normalization_6 (Batch (None, 64, 64, 96) 384
_________________________________________________________________
activation_6 (Activation) (None, 64, 64, 96) 0
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 31, 31, 96) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 31, 31, 256) 614656
_________________________________________________________________
batch_normalization_7 (Batch (None, 31, 31, 256) 1024
_________________________________________________________________
activation_7 (Activation) (None, 31, 31, 256) 0
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 15, 15, 256) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 15, 15, 384) 885120
_________________________________________________________________
batch_normalization_8 (Batch (None, 15, 15, 384) 1536
_________________________________________________________________
activation_8 (Activation) (None, 15, 15, 384) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 15, 15, 384) 1327488
_________________________________________________________________
batch_normalization_9 (Batch (None, 15, 15, 384) 1536
_________________________________________________________________
activation_9 (Activation) (None, 15, 15, 384) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 15, 15, 256) 884992
_________________________________________________________________
batch_normalization_10 (Batc (None, 15, 15, 256) 1024
_________________________________________________________________
activation_10 (Activation) (None, 15, 15, 256) 0
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 7, 7, 256) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 1, 1, 1024) 12846080
_________________________________________________________________
batch_normalization_11 (Batc (None, 1, 1, 1024) 4096
_________________________________________________________________
encoded (Activation) (None, 1, 1, 1024) 0
_________________________________________________________________
reshape_1 (Reshape) (None, 2, 2, 256) 0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 4, 4, 128) 819328
_________________________________________________________________
activation_11 (Activation) (None, 4, 4, 128) 0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 8, 8, 64) 204864
_________________________________________________________________
activation_12 (Activation) (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 16, 16, 32) 51232
_________________________________________________________________
activation_13 (Activation) (None, 16, 16, 32) 0
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 32, 32, 16) 12816
_________________________________________________________________
activation_14 (Activation) (None, 32, 32, 16) 0
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 64, 64, 8) 3208
_________________________________________________________________
activation_15 (Activation) (None, 64, 64, 8) 0
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 128, 128, 4) 804
_________________________________________________________________
activation_16 (Activation) (None, 128, 128, 4) 0
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 256, 256, 3) 303
=================================================================
Total params: 17,695,435
Trainable params: 17,690,635
Non-trainable params: 4,800
_________________________________________________________________
Epoch 1/1
Found 11058 images belonging to 1 classes.
Found 11058 images belonging to 1 classes.
Found 11058 images belonging to 1 classes.
Found 44234 images belonging to 1 classes.
Found 11058 images belonging to 1 classes.
Found 44234 images belonging to 1 classes.
Found 44234 images belonging to 1 classes.
Found 44234 images belonging to 1 classes.
1/1382 [..............................] - ETA: 19:43:47 - loss: 0.6934 - accuracy: 0.1511
2/1382 [..............................] - ETA: 10:04:16 - loss: 0.6933 - accuracy: 0.1545
3/1382 [..............................] - ETA: 7:28:21 - loss: 0.6933 - accuracy: 0.1571
4/1382 [..............................] - ETA: 6:07:30 - loss: 0.6932 - accuracy: 0.1590
5/1382 [..............................] - ETA: 5:21:58 - loss: 0.6931 - accuracy: 0.1614
6/1382 [..............................] - ETA: 4:55:45 - loss: 0.6930 - accuracy: 0.1648
7/1382 [..............................] - ETA: 4:32:58 - loss: 0.6929 - accuracy: 0.1668
8/1382 [..............................] - ETA: 4:15:07 - loss: 0.6929 - accuracy: 0.1692
9/1382 [..............................] - ETA: 4:02:22 - loss: 0.6928 - accuracy: 0.1726
10/1382 [..............................] - ETA: 3:50:11 - loss: 0.6926 - accuracy: 0.1745
11/1382 [..............................] - ETA: 3:39:13 - loss: 0.6925 - accuracy: 0.1769
12/1382 [..............................] - ETA: 3:29:38 - loss: 0.6924 - accuracy: 0.1797
13/1382 [..............................] - ETA: 3:21:11 - loss: 0.6923 - accuracy: 0.1824
14/1382 [..............................] - ETA: 3:13:42 - loss: 0.6922 - accuracy: 0.1845
15/1382 [..............................] - ETA: 3:07:17 - loss: 0.6920 - accuracy: 0.1871
16/1382 [..............................] - ETA: 3:01:59 - loss: 0.6919 - accuracy: 0.1896
17/1382 [..............................] - ETA: 2:57:36 - loss: 0.6918 - accuracy: 0.1916
18/1382 [..............................] - ETA: 2:53:06 - loss: 0.6917 - accuracy: 0.1938
19/1382 [..............................] - ETA: 2:49:37 - loss: 0.6915 - accuracy: 0.1956
20/1382 [..............................] - ETA: 2:45:51 - loss: 0.6915 - accuracy: 0.1979
21/1382 [..............................] - ETA: 2:43:18 - loss: 0.6914 - accuracy: 0.2000
22/1382 [..............................] - ETA: 2:41:02 - loss: 0.6913 - accuracy: 0.2022
23/1382 [..............................] - ETA: 2:39:23 - loss: 0.6912 - accuracy: 0.2039
24/1382 [..............................] - ETA: 2:37:23 - loss: 0.6911 - accuracy: 0.2060
25/1382 [..............................] - ETA: 2:35:58 - loss: 0.6909 - accuracy: 0.2080
26/1382 [..............................] - ETA: 2:34:06 - loss: 0.6909 - accuracy: 0.2098
27/1382 [..............................] - ETA: 2:33:19 - loss: 0.6908 - accuracy: 0.2115
28/1382 [..............................] - ETA: 2:32:24 - loss: 0.6906 - accuracy: 0.2130
29/1382 [..............................] - ETA: 2:31:43 - loss: 0.6904 - accuracy: 0.2143
30/1382 [..............................] - ETA: 2:31:09 - loss: 0.6904 - accuracy: 0.2157
31/1382 [..............................] - ETA: 2:30:34 - loss: 0.6902 - accuracy: 0.2173
32/1382 [..............................] - ETA: 2:29:26 - loss: 0.6901 - accuracy: 0.2185
33/1382 [..............................] - ETA: 2:28:55 - loss: 0.6900 - accuracy: 0.2199
34/1382 [..............................] - ETA: 2:28:05 - loss: 0.6899 - accuracy: 0.2213
35/1382 [..............................] - ETA: 2:27:23 - loss: 0.6898 - accuracy: 0.2227
36/1382 [..............................] - ETA: 2:27:02 - loss: 0.6897 - accuracy: 0.2238
37/1382 [..............................] - ETA: 2:26:56 - loss: 0.6895 - accuracy: 0.2253
38/1382 [..............................] - ETA: 2:26:32 - loss: 0.6893 - accuracy: 0.2266
39/1382 [..............................] - ETA: 2:26:11 - loss: 0.6891 - accuracy: 0.2278
Even after waiting for hours, the training process doesn't move any further.
Another unusual thing I noticed: when I set the batch size to 1, the GPU is utilized continuously.
What could be the problem?
This might be an issue with the drive where you placed the dataset. The code was working fine everywhere except on this server; I changed the drive (from one NFS share to another) and everything now works well.
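If moving the dataset is not an option, overlapping file reads with training can also help rule out a storage bottleneck. A hedged tf.data sketch, where file_pattern and parse_image stand in for your own file list and decoding function:

import tensorflow as tf

dataset = (tf.data.Dataset.list_files(file_pattern)
           .map(parse_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.experimental.AUTOTUNE))  # overlap I/O with GPU work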