Some folds in k-fold cross-validation contain 0 true/false positives - TensorFlow

I am working on time-series segmentation of sleep sensor data using a U-Net (CNN-based) TensorFlow (Keras) implementation, and I am getting some really strange behavior. In a 10-fold cross-validation run, typically 9 folds work as expected, while one or two folds produce 0 positive predictions. I am at a complete loss as to why some folds refuse to learn anything and seem to get stuck in a local minimum after 1 epoch... I am really looking for ANY input that could point me in the right direction, thanks!
For example I receive:
TP=87153
TN=1889185
FP=35217
FN=93021
for a typical fold, but in a problematic one:
TP=0
TN=1912697
FP=0
FN=191879
As you can see, the model predicts no positives for these runs.
Stats for 10 folds; recall/precision is 0.000 for the problematic fold #2 (technically undefined):
| fold | acc   | precision | recall | f1    | specificity |
|------|-------|-----------|--------|-------|-------------|
| 1    | 0.895 | 0.752     | 0.629  | 0.685 | 0.954       |
| 2    | 0.862 | 0.000     | 0.000  | 0.000 | 1.000       |
| 3    | 0.903 | 0.727     | 0.592  | 0.653 | 0.960       |
| 4    | 0.894 | 0.760     | 0.531  | 0.625 | 0.966       |
| 5    | 0.893 | 0.805     | 0.561  | 0.661 | 0.969       |
| 6    | 0.901 | 0.755     | 0.583  | 0.658 | 0.963       |
| 7    | 0.900 | 0.760     | 0.518  | 0.616 | 0.970       |
| 8    | 0.865 | 0.857     | 0.522  | 0.649 | 0.973       |
| 9    | 0.891 | 0.779     | 0.572  | 0.659 | 0.963       |
| 10   | 0.895 | 0.806     | 0.592  | 0.683 | 0.967       |
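For reference, a minimal sketch (my own, not from the original post) of how these per-fold metrics follow from the confusion-matrix counts, with the zero-positive case guarded so precision/recall are reported as 0.0:

def fold_metrics(tp, tn, fp, fn):
    """Derive the table's metrics from raw confusion-matrix counts.
    Precision/recall are reported as 0.0 when their denominator is 0
    (technically undefined, as noted above)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return acc, precision, recall, f1, specificity

# The zero-positive fold: precision = recall = f1 = 0.0, specificity = 1.0
print(fold_metrics(tp=0, tn=1912697, fp=0, fn=191879))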
I have tried the following measures, with no effect:
- Reducing the size of the model (number of nodes/feature channels).
- Reducing the number of convolutional layers in the U-Net.
- Balancing the dataset (it is quite unbalanced); a weighted-loss sketch follows this list.
- Removing dropout layers.
- Upgrading to TensorFlow 2.3.
- Running 10 runs with different randomSeeds. This simply shifts the problem to another fold, and the zero-fold behavior appeared in 7/10 of the runs, which tells me the error is not data-dependent.
- Reducing the size of the dataset to get some clues. This makes the problem worse and introduces more problematic zero-positive folds (up to 5/10), but the problematic folds remain a superset of the problematic folds in the runs with more data, which is further evidence that it is not data-dependent.
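The original post does not show how the balancing was done; as a sketch only, one common alternative for an unbalanced pixel-wise binary target is to re-weight the positive class in the loss instead of resampling (the pos_weight value and the weighted_bce helper below are my own assumptions, not code from the question):

import tensorflow as tf

def weighted_bce(pos_weight):
    """Binary cross-entropy with the (rare) positive class scaled by pos_weight.
    pos_weight is a hypothetical value, e.g. n_negative_pixels / n_positive_pixels."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # element-wise cross-entropy, with positives up-weighted by pos_weight
        ce = -(pos_weight * y_true * tf.math.log(y_pred)
               + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(ce, axis=-1)
    return loss

# model.compile(optimizer="adam", loss=weighted_bce(pos_weight=10.0), metrics=["accuracy"])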
A problematic fold shows the following behavior: the loss starts noticeably higher than in a normal fold and decreases only gradually, while val_accuracy and accuracy stay flat (except for the first epoch). I am using ReLU as the activation function, which I think should prevent vanishing gradients(?):
Epoch 1/50
397/397 [==============================] - 13s 33ms/step - loss: 0.6851 - accuracy: 0.9010 - val_loss: 0.6777 - val_accuracy: 0.8963
Epoch 2/50
397/397 [==============================] - 12s 30ms/step - loss: 0.6688 - accuracy: 0.9207 - val_loss: 0.6628 - val_accuracy: 0.8963
Epoch 3/50
397/397 [==============================] - 12s 30ms/step - loss: 0.6532 - accuracy: 0.9207 - val_loss: 0.6484 - val_accuracy: 0.8963
Epoch 4/50
397/397 [==============================] - 12s 30ms/step - loss: 0.6381 - accuracy: 0.9207 - val_loss: 0.6345 - val_accuracy: 0.8963
Epoch 5/50
397/397 [==============================] - 12s 30ms/step - loss: 0.6235 - accuracy: 0.9207 - val_loss: 0.6210 - val_accuracy: 0.8963
A normal fold's epochs by contrast:
Epoch 1/50
397/397 [==============================] - 12s 31ms/step - loss: 0.2710 - accuracy: 0.9195 - val_loss: 0.2063 - val_accuracy: 0.8963
Epoch 2/50
397/397 [==============================] - 12s 31ms/step - loss: 0.1671 - accuracy: 0.9195 - val_loss: 0.1953 - val_accuracy: 0.8963
Epoch 3/50
397/397 [==============================] - 12s 31ms/step - loss: 0.1613 - accuracy: 0.9342 - val_loss: 0.1915 - val_accuracy: 0.9302
Epoch 4/50
397/397 [==============================] - 12s 31ms/step - loss: 0.1582 - accuracy: 0.9436 - val_loss: 0.1867 - val_accuracy: 0.9318
Epoch 5/50
397/397 [==============================] - 12s 31ms/step - loss: 0.1567 - accuracy: 0.9444 - val_loss: 0.1871 - val_accuracy: 0.9316
Hyperparameters:
validationFraction: 0.33
batchSize: 500
numFolds: 10
numEpochs: 50
I would really appreciate any thoughts or anecdotal insights if you have encountered something similar.

Related

CNN1D learns only one class in a binary classification problem (Keras)

I am trying to find the nodes in a graph that belong to a particular structure (for example a clique). The output must therefore be a vector [0,0,1,0,1,1,0 .......] where a 1 marks a node belonging to a clique.
My inputs are graphs where each node is represented by an embedding vector; the input has this form:
[[-1.548624, 2.6481668, 0.21574, -0.324527 ........]
[.....] ...[.....]].
The problem is that my model only learns one of the two classes, either (1) or (0), whichever is more frequent in the dataset. After rebalancing the data, the accuracy is around 0.5.
I tried rebalancing the data and changing the embedding method, but the result remains the same.
Does anyone have any idea what is causing the problem?
Here is the code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, LSTM, Flatten, Dense

def model(input_shape):
    model = Sequential()
    model.add(Conv1D(30, 3, input_shape=input_shape, activation="sigmoid"))
    model.add(MaxPool1D(9))
    model.add(LSTM(50, return_sequences=True))
    model.add(Flatten())
    model.add(Dense(889, activation="sigmoid"))
    # model.summary()
    opt = tf.keras.optimizers.SGD(learning_rate=0.01)  # defined but unused: compile() below uses the 'sgd' string with default settings
    model.compile(loss='mse', optimizer='sgd', metrics=['binary_accuracy'])
    return model

train_x, train_y, test_x, test_y, val_x, val_y = load_data()
model = model((889, 64))
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500, batch_size=4)
I am not sure about my choice of activation function, loss, and metric, even though these gave the best results so far.
120/120 [==============================] - 4s 13ms/step - loss: 0.7967 - binary_accuracy: 0.3721 - val_loss: 0.4342 - val_binary_accuracy: 0.3979
Epoch 2/8
120/120 [==============================] - 1s 8ms/step - loss: 0.3795 - binary_accuracy: 0.4164 - val_loss: 0.2758 - val_binary_accuracy: 0.4871
Epoch 3/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2594 - binary_accuracy: 0.5262 - val_loss: 0.2304 - val_binary_accuracy: 0.6379
Epoch 4/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2255 - binary_accuracy: 0.6643 - val_loss: 0.2181 - val_binary_accuracy: 0.6910
Epoch 5/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2161 - binary_accuracy: 0.6914 - val_loss: 0.2148 - val_binary_accuracy: 0.6921
Epoch 6/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2136 - binary_accuracy: 0.6922 - val_loss: 0.2139 - val_binary_accuracy: 0.6921
Epoch 7/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2132 - binary_accuracy: 0.6917 - val_loss: 0.2137 - val_binary_accuracy: 0.6921
Epoch 8/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2129 - binary_accuracy: 0.6919 - val_loss: 0.2136 - val_binary_accuracy: 0.6921
5/5 [==============================] - 0s 11ms/step - loss: 0.2137 - binary_accuracy: 0.6915
[0.21371755003929138, 0.6915410757064819]
Thank you in advance for your feedback ;).
First of all, 1D convs are quite different from 2D convs. For 2D convs you want to keep the kernel size small because the number of calculations scales with the square of the kernel size, roughly C*K^2 (for 3D convs it gets cubed!), but for 1D convs it scales linearly. You really want to increase that kernel size.
Secondly, you need to normalize your inputs for certain ML approaches, neural networks being one of them.
Finally, I'm a bit worried a CNN is the wrong approach for this anyway. On top of that you have an LSTM, which I don't really follow. Maybe start off with just a couple of dense layers as a baseline.
One debugging tip: make a really small dataset and make sure you can overfit it, i.e. memorize the training set.
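A rough sketch of that overfitting sanity check, reusing the load_data() and model() helpers from the question and adding the input normalization suggested above (the subset size and epoch count are arbitrary):

import numpy as np

train_x, train_y, test_x, test_y, val_x, val_y = load_data()

# Normalize the embedding inputs (per-feature standardization).
mean = train_x.mean(axis=(0, 1), keepdims=True)
std = train_x.std(axis=(0, 1), keepdims=True) + 1e-8
train_x = (train_x - mean) / std

# Take a handful of samples and check that the model can memorize them.
tiny_x, tiny_y = train_x[:8], train_y[:8]
m = model((889, 64))
history = m.fit(tiny_x, tiny_y, epochs=300, batch_size=4, verbose=0)
print("final training loss on the tiny set:", history.history["loss"][-1])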

Keras model gets worse when fine-tuning

I'm trying to follow the fine-tuning steps described in https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets to get a trained model for binary segmentation.
I create an encoder-decoder where the encoder weights come from MobileNetV2 and are frozen with encoder.trainable = False. Then I define my decoder as described in the tutorial and train the network for 300 epochs with a learning rate of 0.005. I get the following loss value and Jaccard index during the last epochs:
Epoch 297/300
55/55 [==============================] - 85s 2s/step - loss: 0.2443 - jaccard_sparse3D: 0.5556 - accuracy: 0.9923 - val_loss: 0.0440 - val_jaccard_sparse3D: 0.3172 - val_accuracy: 0.9768
Epoch 298/300
55/55 [==============================] - 75s 1s/step - loss: 0.2437 - jaccard_sparse3D: 0.5190 - accuracy: 0.9932 - val_loss: 0.0422 - val_jaccard_sparse3D: 0.3281 - val_accuracy: 0.9776
Epoch 299/300
55/55 [==============================] - 78s 1s/step - loss: 0.2465 - jaccard_sparse3D: 0.4557 - accuracy: 0.9936 - val_loss: 0.0431 - val_jaccard_sparse3D: 0.3327 - val_accuracy: 0.9769
Epoch 300/300
55/55 [==============================] - 85s 2s/step - loss: 0.2467 - jaccard_sparse3D: 0.5030 - accuracy: 0.9923 - val_loss: 0.0463 - val_jaccard_sparse3D: 0.3315 - val_accuracy: 0.9740
I save all the weights of this model and then run the fine-tuning with the following steps:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model.load_weights('my_pretrained_weights.h5')
model.trainable = True
model.compile(optimizer=Adam(learning_rate=0.00001, name='adam'),
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=[jaccard, "accuracy"])
model.fit(training_generator, validation_data=(val_x, val_y), epochs=5,
          validation_batch_size=2, callbacks=callbacks)
Suddenly the performance of my model is much worse than during the training of the decoder:
Epoch 1/5
55/55 [==============================] - 89s 2s/step - loss: 0.2417 - jaccard_sparse3D: 0.0843 - accuracy: 0.9946 - val_loss: 0.0079 - val_jaccard_sparse3D: 0.0312 - val_accuracy: 0.9992
Epoch 2/5
55/55 [==============================] - 90s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1179 - accuracy: 0.9927 - val_loss: 0.0138 - val_jaccard_sparse3D: 7.1138e-05 - val_accuracy: 0.9998
Epoch 3/5
55/55 [==============================] - 95s 2s/step - loss: 0.2173 - jaccard_sparse3D: 0.1227 - accuracy: 0.9932 - val_loss: 0.0171 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 0.9999
Epoch 4/5
55/55 [==============================] - 94s 2s/step - loss: 0.2428 - jaccard_sparse3D: 0.1319 - accuracy: 0.9927 - val_loss: 0.0190 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Epoch 5/5
55/55 [==============================] - 97s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1107 - accuracy: 0.9926 - val_loss: 0.0215 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Is there any known reason why this is happening? Is it normal?
Thank you in advance!
OK, I found out what I do differently that makes it NOT necessary to recompile. I do not set encoder.trainable = False. What I do in the code below is equivalent:
for layer in encoder.layers:
    layer.trainable = False
Then train your model. Afterwards, you can unfreeze the encoder weights with:
for layer in encoder.layers:
    layer.trainable = True
You do not need to recompile the model. I tested this and it works as expected. You can verify this by printing the model summary before and after and looking at the number of trainable parameters. As for changing the learning rate, I find it best to use the Keras callback ReduceLROnPlateau to automatically adjust the learning rate based on validation loss. I also recommend the EarlyStopping callback, which monitors validation loss and halts training if it fails to improve after 'patience' consecutive epochs. Setting restore_best_weights=True loads the weights from the epoch with the lowest validation loss, so you don't have to save and reload the weights yourself. Set epochs to a large number to make sure this callback activates. The code I use is shown below:
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
callbacks = [es, rlronp]
In model.fit, set callbacks=callbacks.
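Putting the pieces of this answer together, a minimal sketch of the freeze-then-unfreeze flow might look like the following (it assumes the encoder, model, training_generator, val_x and val_y objects from the question; the learning rate, loss and epoch counts are placeholders):

import tensorflow as tf

# Phase 1: train the decoder with the encoder frozen.
for layer in encoder.layers:
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
callbacks = [es, rlronp]

model.fit(training_generator, validation_data=(val_x, val_y),
          epochs=300, callbacks=callbacks)

# Phase 2: unfreeze the encoder and keep training; per the answer above,
# no recompile is done, and ReduceLROnPlateau handles the learning rate.
for layer in encoder.layers:
    layer.trainable = True

model.fit(training_generator, validation_data=(val_x, val_y),
          epochs=300, callbacks=callbacks)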

Is validation dataset initialized/created every epoch during the training process?

Setup:
- A U-Net network is trained to process small patches (e.g. 64x64 pixels).
- The network is fed a training dataset and a validation dataset using the Tensorflow Dataset API.
- Small patches are generated by (randomly) sampling much larger images.
- The sampling of image patches takes place during the training process (both training and validation image patches are cropped on the fly).
- Tensorflow 2.1 (eager execution mode).
Both training and validation datasets are the same:
dataset = tf.data.Dataset.from_tensor_slices((large_images, large_targets))
dataset = dataset.shuffle(buffer_size=num_large_samples)
dataset = dataset.map(get_patches_from_large_images, num_parallel_calls=num_parallel_calls)
dataset = dataset.unbatch()
dataset = dataset.shuffle(buffer_size=num_small_patches)
dataset = dataset.batch(patches_batch_size)
dataset = dataset.prefetch(1)
dataset = dataset.repeat()
The function get_patches_from_large_images samples a predefined number of small patches from a single large image using tf.image.random_crop. There are two nested loops, a for and a while. The outer for loop is responsible for generating the predefined number of small patches, and the while loop checks whether a patch randomly generated with tf.image.random_crop meets some predefined criteria (e.g. patches containing only background should be discarded). The inner while loop gives up if it cannot generate a proper patch within a predefined number of iterations, so we do not get stuck in this loop. This approach is based on the solution presented here.
patches = []  # collect all patches for this large image
for i in range(number_of_patches_from_one_large_image):
    num_tries = 0
    while num_tries < max_num_tries_befor_giving_up:
        patch = tf.image.random_crop(large_input_and_target_image,
                                     [patch_size, patch_size, 2])
        if patch_meets_some_criterions:  # placeholder for the predefined criteria
            break
        num_tries = num_tries + 1
    patches.append(patch)
Experiment:
- The training and validation datasets used to feed the model are the same (5 large pairs of input-target images), and both datasets produce exactly the same number of small patches from a single large image.
- batch_size for training and validation is the same and equals 50 image patches.
- steps_per_epoch and validation_steps are equal (20 batches).
Training is run with validation_freq=5:
unet_model.fit(dataset_train, epochs=10, steps_per_epoch=20, validation_data = dataset_val, validation_steps=20, validation_freq=5)
Train for 20 steps, validate for 20 steps
Epoch 1/10
20/20 [==============================] - 44s 2s/step - loss: 0.6771 - accuracy: 0.9038
Epoch 2/10
20/20 [==============================] - 4s 176ms/step - loss: 0.4952 - accuracy: 0.9820
Epoch 3/10
20/20 [==============================] - 4s 196ms/step - loss: 0.0532 - accuracy: 0.9916
Epoch 4/10
20/20 [==============================] - 4s 194ms/step - loss: 0.0162 - accuracy: 0.9942
Epoch 5/10
20/20 [==============================] - 42s 2s/step - loss: 0.0108 - accuracy: 0.9966 - val_loss: 0.0081 - val_accuracy: 0.9975
Epoch 6/10
20/20 [==============================] - 1s 36ms/step - loss: 0.0074 - accuracy: 0.9978
Epoch 7/10
20/20 [==============================] - 4s 175ms/step - loss: 0.0053 - accuracy: 0.9985
Epoch 8/10
20/20 [==============================] - 3s 169ms/step - loss: 0.0034 - accuracy: 0.9992
Epoch 9/10
20/20 [==============================] - 3s 171ms/step - loss: 0.0023 - accuracy: 0.9995
Epoch 10/10
20/20 [==============================] - 43s 2s/step - loss: 0.0016 - accuracy: 0.9997 - val_loss: 0.0013 - val_accuracy: 0.9998
We can see that the first epoch and the epochs with validation (every 5th epoch) took much more time than the epochs without validation. The same experiment, but this time with validation run every epoch, gives the following result:
history = unet_model.fit(dataset_train, epochs=10, steps_per_epoch=20, validation_data = dataset_val, validation_steps=20)
Train for 20 steps, validate for 20 steps
Epoch 1/10
20/20 [==============================] - 84s 4s/step - loss: 0.6775 - accuracy: 0.8971 - val_loss: 0.6552 - val_accuracy: 0.9542
Epoch 2/10
20/20 [==============================] - 41s 2s/step - loss: 0.5985 - accuracy: 0.9833 - val_loss: 0.4677 - val_accuracy: 0.9951
Epoch 3/10
20/20 [==============================] - 43s 2s/step - loss: 0.1884 - accuracy: 0.9950 - val_loss: 0.0173 - val_accuracy: 0.9948
Epoch 4/10
20/20 [==============================] - 44s 2s/step - loss: 0.0116 - accuracy: 0.9962 - val_loss: 0.0087 - val_accuracy: 0.9969
Epoch 5/10
20/20 [==============================] - 44s 2s/step - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0051 - val_accuracy: 0.9983
Epoch 6/10
20/20 [==============================] - 45s 2s/step - loss: 0.0039 - accuracy: 0.9989 - val_loss: 0.0033 - val_accuracy: 0.9991
Epoch 7/10
20/20 [==============================] - 44s 2s/step - loss: 0.0025 - accuracy: 0.9994 - val_loss: 0.0023 - val_accuracy: 0.9995
Epoch 8/10
20/20 [==============================] - 44s 2s/step - loss: 0.0019 - accuracy: 0.9996 - val_loss: 0.0017 - val_accuracy: 0.9996
Epoch 9/10
20/20 [==============================] - 44s 2s/step - loss: 0.0014 - accuracy: 0.9997 - val_loss: 0.0013 - val_accuracy: 0.9997
Epoch 10/10
20/20 [==============================] - 45s 2s/step - loss: 0.0012 - accuracy: 0.9998 - val_loss: 0.0011 - val_accuracy: 0.9998
Question:
In the first example, we can see that the initialization/creation of the training dataset (dataset_train) took about 40 s. However, subsequent epochs (without validation) were shorter and took about 4 s. Nevertheless, the duration went back up to about 40 s for the epochs with the validation step. The validation dataset (dataset_val) is exactly the same as the training dataset (dataset_train), so its creation/initialization also takes about 40 s. However, I am surprised that every validation run is this expensive. I expected the first validation to take 40 s, but subsequent validations to take about 4 s. I thought the validation dataset would behave like the training dataset, so the first fetch would take long but subsequent fetches would be much shorter. Am I right, or am I missing something?
Update:
I have checked that creating an iterator from the dataset takes about 40 s:
dataset_val_it = iter(dataset_val) #40s
If we look inside the fit function, we see that a data_handler object is created once for the whole training run, and it returns the data iterator that is used in the main training loop. The iterator is created by calling the enumerate_epochs function. When fit wants to perform validation, it calls the evaluate function. Whenever evaluate is called, it creates a new data_handler object and then calls the enumerate_epochs function, which in turn creates a fresh iterator from the dataset. Unfortunately, for complicated datasets this process is time-consuming.
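A quick way to see where the time goes (my own sketch, reusing the dataset_val pipeline defined above):

import time

t0 = time.time()
it = iter(dataset_val)      # building a fresh iterator triggers the expensive setup
first_batch = next(it)      # the first fetch also pays for filling the shuffle buffers
print(f"iterator creation + first batch: {time.time() - t0:.1f} s")

t0 = time.time()
second_batch = next(it)     # subsequent fetches from the same iterator are fast
print(f"second batch: {time.time() - t0:.1f} s")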
If you just want a quick fix to speed up your input pipeline, you can try caching the elements of the validation dataset.
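For example, one possible placement of the cache (my own sketch of a validation-only pipeline; note that caching freezes the randomly cropped validation patches after the first pass instead of re-sampling them on every run, and the shuffles are dropped since ordering does not matter for validation):

dataset_val = tf.data.Dataset.from_tensor_slices((large_images, large_targets))
dataset_val = dataset_val.map(get_patches_from_large_images,
                              num_parallel_calls=num_parallel_calls)
dataset_val = dataset_val.unbatch()
dataset_val = dataset_val.cache()          # materialize the patches once, reuse on every validation run
dataset_val = dataset_val.batch(patches_batch_size)
dataset_val = dataset_val.prefetch(1)
dataset_val = dataset_val.repeat()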
I've never dug very deep into the tf.data code, but you seem to have a point in your update about evaluate recreating the iterator each time. I think it could be interesting to open an issue on GitHub for this.

Networks with multiple outputs: how is the loss computed?

When training a network with more than one branch, and therefore more than one loss, the Keras documentation says that the global loss is a weighted sum of the partial losses, i.e. final_loss = l1*loss1 + l2*loss2.
However, I am training a model with two branches, compiled with a categorical cross-entropy loss for each branch and the option loss_weights=[1., 1.]. I expected the global loss to be the average of the two losses (since the two partial losses are equally weighted), which is not the case: I get a relatively high global loss and cannot work out how it was computed from the partial losses and their weights. Below are some training values. Could anyone explain how the global loss was computed with these parameters? And should the sum of the loss weights not exceed 1 (i.e. should I use loss_weights=[0.5, 0.5] instead)?
I would be very grateful for any help, as I have been blocked on this for a long time.
Epoch 2/200
26/26 [==============================] - 39s 1s/step - loss: 9.2902 -
dense_1_loss: 0.0801 - dense_2_loss: 0.0717 -
Epoch 3/200
26/26 [==============================] - 39s 1s/step - loss: 8.2261 -
dense_1_loss: 0.0251 - dense_2_loss: 0.0199 -
Epoch 4/200
26/26 [==============================] - 39s 2s/step - loss: 7.3107 -
dense_1_loss: 0.0595 - dense_2_loss: 0.0048 -
Epoch 5/200
26/26 [==============================] - 39s 1s/step - loss: 6.4586 -
dense_1_loss: 0.0560 - dense_2_loss: 0.0025 -
Epoch 6/200
26/26 [==============================] - 39s 1s/step - loss: 5.9463 -
dense_1_loss: 0.1964 - dense_2_loss: 0.0653 -
Epoch 7/200
26/26 [==============================] - 39s 1s/step - loss: 5.3730 -
dense_1_loss: 0.1722 - dense_2_loss: 0.0447 -
Epoch 8/200
26/26 [==============================] - 39s 1s/step - loss: 4.8407 -
dense_1_loss: 0.1396 - dense_2_loss: 0.0169 -
Epoch 9/200
26/26 [==============================] - 39s 1s/step - loss: 4.4465 -
dense_1_loss: 0.1614 - dense_2_loss: 0.0124 -
Epoch 10/200
26/26 [==============================] - 39s 2s/step - loss: 3.9898 -
dense_1_loss: 0.0588 - dense_2_loss: 0.0119 -
Epoch 11/200
26/26 [==============================] - 39s 1s/step - loss: 3.6347 -
dense_1_loss: 0.0302 - dense_2_loss: 0.0085 -
Correct. The global loss is the weighted sum of the two partial losses:
global loss = (loss1 * weight1 + loss2 * weight2)
I have used a Keras functional model to demonstrate that the global loss is the weighted sum of the two partial losses. Please take a look at the entire code here.
Model compiled as:
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=[keras.losses.BinaryCrossentropy(from_logits=True),
                    keras.losses.CategoricalCrossentropy(from_logits=True)],
              loss_weights=[1., 0.2])
Model trained as:
model.fit({'title': title_data, 'body': body_data, 'tags': tags_data},
          {'priority': priority_targets, 'department': dept_targets},
          epochs=2, batch_size=32)
Epoch 1/2
40/40 [==============================] - 2s 45ms/step - loss: 1.2723 - priority_loss: 0.7062 - department_loss: 2.8304
Epoch 2/2
40/40 [==============================] - 2s 46ms/step - loss: 1.2593 - priority_loss: 0.6995 - department_loss: 2.7993
Check how the weights and the two losses are combined to get the overall loss:
(loss1*weight1 + loss2*weight2)
(0.7062*1.0 + 2.8304*0.2) = 1.27228, which matches the reported overall loss of 1.2723.
Hope this helps.

Understanding output during training - what do the durations mean & what does TF do between two epochs?

I am quite new to TensorFlow and Keras, and mighty Google couldn't help me with the following question so far:
Below you can see the TF/Keras output from training a pre-trained CNN in Spyder (using Anaconda):
What are those (bold) timings about? As far as I could measure, it is the total time needed for the epoch. Am I correct?
The italicized numbers are the seconds for the complete set of batches (steps * batch size) and the time per batch.
What is TF/Keras doing in the significant time span between two training batches?
Let's look at Epoch 2:
The whole epoch took 42 seconds, the training itself only 7 seconds. What's going on in the remaining 42 - 7 = 35 seconds?
From my understanding, the training time includes:
- everything about learning (forward prop, calculating gradients, backward prop)
Is the remaining time purely loading and re-scaling images? (A quick timing check of the generator alone is sketched after the code below.)
Epoch 1/50
50/50 [==============================] - *9s 186ms/step* - loss: 0.6557 - acc: 0.9076
- **53s** - loss: 0.8610 - acc: 0.8472 - val_loss: 0.6557 - val_acc: 0.9076
Epoch 2/50
50/50 [==============================] - *7s 147ms/step* - loss: 0.4148 - acc: 0.9478
- **41s** - loss: 0.2432 - acc: 0.9097 - val_loss: 0.4148 - val_acc: 0.9478
Epoch 3/50
50/50 [==============================] - *8s 158ms/step* - loss: 0.5873 - acc: 0.9384 - **42s** - loss: 0.1696 - acc: 0.9335 - val_loss: 0.5873 - val_acc: 0.9384
Epoch 4/50
50/50 [==============================] - *7s 149ms/step* - loss: 0.5356 - acc: 0.9492
- **41s** - loss: 0.1274 - acc: 0.9548 - val_loss: 0.5356 - val_acc: 0.9492
.....
If it matters: I am using an image generator (see code below) and augmentation. The small (usually <500kb) pics are loaded from an SSD (Samsung 960 1TB).
train_datagen = ImageDataGenerator(rescale=1./255.)
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(IMAGE_WIDTH, IMAGE_HEIGHT))
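One way to check whether that remaining time is spent in the input pipeline is to time the generator on its own (a rough sketch, assuming the train_generator defined above and the 50 steps per epoch shown in the logs):

import time

t0 = time.time()
for i, (x, y) in enumerate(train_generator):
    # flow_from_directory yields batches forever, so stop after one epoch's worth
    if i + 1 >= 50:
        break
print(f"50 batches from the generator alone: {time.time() - t0:.1f} s")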
Thanks a lot guys.