Local output of Keras cost is nan but colab output of Keras cost is valid - tensorflow

I was training a neural network on the CIFAR-10 dataset, but when I ran it on my laptop's GTX 1650 the training reported nan as the loss in the first epoch.
I tried normalizing the data with tf.keras.layers.Normalization to mean 0 and standard deviation 1, and I also tried tf.keras.layers.Rescaling(1./255) to get values between 0 and 1. I added a LossScaleOptimizer to prevent underflow and used clipnorm = 1 in the optimizer to prevent overflow. None of this fixed the issue.
However, when I copied the code to Colab and used a GPU runtime, training succeeded without any nan in the loss.
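For context on the clipnorm = 1 attempt: the Keras optimizer docs describe clipnorm as clipping each weight's gradient individually so that its norm is no higher than the given value. Below is a NumPy sketch of that per-tensor rule. It is a re-implementation for illustration, not Keras's own code. It also shows why clipping cannot rescue a step once a gradient already contains nan: the norm is then nan, so the nan values pass straight through.

```python
import numpy as np

def clip_by_norm(grad, clipnorm=1.0):
    """Rescale one gradient tensor so its L2 norm is at most clipnorm."""
    norm = np.linalg.norm(grad)
    if norm > clipnorm:          # False when norm is nan, so nan passes through
        return grad * (clipnorm / norm)
    return grad

print(clip_by_norm(np.array([3.0, 4.0])))     # norm 5 -> rescaled to norm 1
print(clip_by_norm(np.array([0.3, 0.4])))     # norm 0.5 -> unchanged
print(clip_by_norm(np.array([np.nan, 1.0])))  # nan stays nan
```

In other words, clipnorm guards against very large but finite gradients. It does not repair values that are already non-finite.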
Code
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import seaborn as sns
sns.set(style="dark")
from tensorflow.keras.datasets.cifar10 import load_data
from tensorflow.keras import Model, Input, Sequential
from tensorflow.keras.layers import Add, Rescaling, Dense, Conv2D, GlobalAveragePooling2D, MaxPool2D, Dropout, BatchNormalization, ReLU, Layer, Reshape, Flatten, Activation, Normalization, Multiply, AveragePooling2D
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, TerminateOnNaN, CSVLogger
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.regularizers import l2
# from tensorflow.keras.applications import efficientnet_v2
from functools import partial
from tensorflow.image import random_flip_left_right, random_crop, resize_with_crop_or_pad
from tensorflow.keras.models import load_model
from tensorflow.keras import mixed_precision
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
mixed_precision.set_global_policy('mixed_float16')  # if this line is deleted, there is an out-of-memory error
(x_train, y_train), (x_test, y_test) = load_data()
regparam = 0.0005
class WideResNet(Model):
    def __init__(self, activation, numfilters, identity=False):
        super().__init__()
        self.add = Add()
        self.activation = Activation(activation)
        self.batchnorm = BatchNormalization()
        self.batchnorm2 = BatchNormalization()
        self.mainconv1 = Conv2D(numfilters, (3, 3), padding='same', kernel_regularizer=l2(regparam))
        self.mainconv2 = Conv2D(numfilters, (3, 3), padding='same', kernel_regularizer=l2(regparam))
        # 1x1 projection on the shortcut when the channel count changes
        self.sideconv = Conv2D(numfilters, (1, 1), padding='same', kernel_regularizer=l2(regparam)) if not identity else None
    def call(self, X):
        # pre-activation residual block: BN -> activation -> conv, twice
        mainbranch = self.batchnorm(X)
        mainbranch = self.activation(mainbranch)
        mainbranch = self.mainconv1(mainbranch)
        mainbranch = self.batchnorm2(mainbranch)
        mainbranch = self.activation(mainbranch)
        mainbranch = self.mainconv2(mainbranch)
        sidebranch = self.sideconv(X) if self.sideconv is not None else X
        return self.add([mainbranch, sidebranch])
def buildwrnmodel(k, shapes=[16, 32, 64], n_inner_layers=4, n_classes=10, imageshape=(32, 32, 3), loss='sparse_categorical_crossentropy',
                  activation='relu', optimizer='adam', metric='accuracy'):
    inputs = Input(imageshape)
    x = tf.keras.layers.Rescaling(1./255)(inputs)
    x = Conv2D(16, (3, 3))(x)
    x = BatchNormalization()(x)
    x = Activation(activation)(x)
    for i, length in enumerate(shapes):
        for j in range(n_inner_layers):
            # the first block of each stage changes the width, so it gets the 1x1 shortcut
            x = WideResNet(activation, length * k, identity=(j != 0))(x)
    x = BatchNormalization()(x)
    x = Activation('tanh')(x)
    x = GlobalAveragePooling2D()(x)
    x = Flatten()(x)
    # keep the softmax output in float32 under the mixed_float16 policy
    outputs = Dense(n_classes, activation="softmax", dtype='float32')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss=loss, optimizer=optimizer, metrics=[metric])
    model.summary()  # summary() prints by itself and returns None
    return model
from tensorflow.keras.mixed_precision import LossScaleOptimizer
from tensorflow.keras.layers import LeakyReLU  # from tf.keras, not the standalone keras package
wrn = buildwrnmodel(7, optimizer=LossScaleOptimizer(tf.keras.optimizers.Adam(clipnorm=1), initial_scale=2**30), activation=LeakyReLU(0.1))
history = wrn.fit(x=x_train, y=y_train, epochs=50, batch_size=16, validation_split=0.2, verbose=1)
Output for Local / GTX 1650:
Epoch 1/50
2500/2500 [==============================] - ETA: 0s - loss: nan - accuracy: 0.0997 # <---- the nan cost here
Output for Colab:
2500/2500 [==============================] - 350s 131ms/step - loss: 2.0917 - accuracy: 0.3268 - val_loss: 2.2110 - val_accuracy: 0.2778
Epoch 2/50
2500/2500 [==============================] - 326s 131ms/step - loss: 1.6841 - accuracy: 0.4144 - val_loss: 2.0834 - val_accuracy: 0.2945
Epoch 3/50
2500/2500 [==============================] - 326s 130ms/step - loss: 1.5518 - accuracy: 0.4737 - val_loss: 1.7759 - val_accuracy: 0.4252
Epoch 4/50
2500/2500 [==============================] - 325s 130ms/step - loss: 1.4494 - accuracy: 0.5221 - val_loss: 1.7152 - val_accuracy: 0.4548
Epoch 5/50
2500/2500 [==============================] - 325s 130ms/step - loss: 1.3813 - accuracy: 0.5464 - val_loss: 1.9141 - val_accuracy: 0.3800
Epoch 6/50
2500/2500 [==============================] - 325s 130ms/step - loss: 1.3320 - accuracy: 0.5684 - val_loss: 1.5846 - val_accuracy: 0.4920
Epoch 7/50
2500/2500 [==============================] - 324s 129ms/step - loss: 1.2882 - accuracy: 0.5847 - val_loss: 1.7444 - val_accuracy: 0.4798
Epoch 8/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.2460 - accuracy: 0.6057 - val_loss: 1.2865 - val_accuracy: 0.5981
Epoch 9/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.2215 - accuracy: 0.6112 - val_loss: 1.5941 - val_accuracy: 0.4577
Epoch 10/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.1926 - accuracy: 0.6244 - val_loss: 1.5356 - val_accuracy: 0.5154
Epoch 11/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.1703 - accuracy: 0.6353 - val_loss: 1.6718 - val_accuracy: 0.4706
Epoch 12/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.1521 - accuracy: 0.6450 - val_loss: 1.4850 - val_accuracy: 0.5209
Epoch 13/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.1241 - accuracy: 0.6562 - val_loss: 1.7300 - val_accuracy: 0.4685
Epoch 14/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.1133 - accuracy: 0.6625 - val_loss: 2.5892 - val_accuracy: 0.3180
Epoch 15/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.0970 - accuracy: 0.6719 - val_loss: 1.2511 - val_accuracy: 0.6163
Epoch 16/50
2500/2500 [==============================] - 324s 129ms/step - loss: 1.0848 - accuracy: 0.6785 - val_loss: 1.6947 - val_accuracy: 0.5217
Epoch 17/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.0742 - accuracy: 0.6823 - val_loss: 2.1976 - val_accuracy: 0.4288
Epoch 18/50
2500/2500 [==============================] - 324s 130ms/step - loss: 1.0602 - accuracy: 0.6896 - val_loss: 1.5810 - val_accuracy: 0.5695
Epoch 19/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.0435 - accuracy: 0.6962 - val_loss: 1.4429 - val_accuracy: 0.5653
Epoch 20/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.0381 - accuracy: 0.6973 - val_loss: 1.5911 - val_accuracy: 0.5423
Epoch 21/50
2500/2500 [==============================] - 323s 129ms/step - loss: 1.0231 - accuracy: 0.7044 - val_loss: 1.4593 - val_accuracy: 0.5889
Epoch 22/50
2500/2500 [==============================] - 322s 129ms/step - loss: 1.0160 - accuracy: 0.7096 - val_loss: 1.4631 - val_accuracy: 0.5841
Epoch 23/50
2500/2500 [==============================] - 322s 129ms/step - loss: 1.0110 - accuracy: 0.7095 - val_loss: 1.8995 - val_accuracy: 0.5124
Epoch 24/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9988 - accuracy: 0.7141 - val_loss: 1.1256 - val_accuracy: 0.6848
Epoch 25/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9927 - accuracy: 0.7188 - val_loss: 1.9539 - val_accuracy: 0.4719
Epoch 26/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9923 - accuracy: 0.7165 - val_loss: 1.4381 - val_accuracy: 0.6026
Epoch 27/50
2500/2500 [==============================] - 323s 129ms/step - loss: 0.9826 - accuracy: 0.7223 - val_loss: 2.3859 - val_accuracy: 0.4096
Epoch 28/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9820 - accuracy: 0.7217 - val_loss: 1.7952 - val_accuracy: 0.5303
Epoch 29/50
2500/2500 [==============================] - 321s 129ms/step - loss: 0.9767 - accuracy: 0.7260 - val_loss: 1.5632 - val_accuracy: 0.5590
Epoch 30/50
2500/2500 [==============================] - 321s 129ms/step - loss: 0.9643 - accuracy: 0.7307 - val_loss: 2.1064 - val_accuracy: 0.4547
Epoch 31/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9550 - accuracy: 0.7322 - val_loss: 4.3578 - val_accuracy: 0.3707
Epoch 32/50
2500/2500 [==============================] - 321s 128ms/step - loss: 0.9596 - accuracy: 0.7326 - val_loss: 2.3511 - val_accuracy: 0.4620
Epoch 33/50
2500/2500 [==============================] - 321s 128ms/step - loss: 0.9552 - accuracy: 0.7345 - val_loss: 3.2045 - val_accuracy: 0.3117
Epoch 34/50
2500/2500 [==============================] - 322s 129ms/step - loss: 0.9454 - accuracy: 0.7367 - val_loss: 2.4369 - val_accuracy: 0.4574
Epoch 35/50
1970/2500 [======================>.......] - ETA: 1:04 - loss: 0.9404 - accuracy: 0.7360
I suspected that mixed_precision.set_global_policy('mixed_float16') was causing this problem on the laptop, because float16 has a much smaller range than float32. However, the same mixed-precision policy (with the same float16 limitations) does not seem to cause a problem on Colab.
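For reference, float16 holds finite magnitudes only up to 65504. The smallest positive float16 is about 6e-8 (a subnormal), so smaller values round to zero. The NumPy sketch below shows both limits. The gradient values 1e-4 and 1e-8 are arbitrary illustrations, not taken from this model. The sketch also shows what a loss scale of 2**30 (the initial_scale used in the question) does to a moderate float16 gradient, compared with 2**15. As far as I know, Keras's dynamic loss scaling is meant to skip steps with non-finite gradients and lower the scale. So this only illustrates the range limits and does not prove they cause the nan here.

```python
import numpy as np

f16 = np.finfo(np.float16)
print("float16 max:", f16.max)                        # largest finite float16
print("smallest subnormal:", np.float16(2.0 ** -24))  # much smaller values round to 0

grad = np.float32(1e-4)       # illustrative gradient value
tiny_grad = np.float32(1e-8)  # illustrative gradient too small for float16

with np.errstate(over="ignore"):  # silence the overflow warning on the cast
    scaled_2_30 = (grad * np.float32(2.0 ** 30)).astype(np.float16)
scaled_2_15 = (grad * np.float32(2.0 ** 15)).astype(np.float16)
tiny_unscaled = tiny_grad.astype(np.float16)
tiny_scaled = (tiny_grad * np.float32(2.0 ** 15)).astype(np.float16)

print("1e-4 * 2**30 in float16:", scaled_2_30)  # overflows to inf
print("1e-4 * 2**15 in float16:", scaled_2_15)  # finite
print("1e-8 unscaled in float16:", tiny_unscaled)  # underflows to 0
print("1e-8 * 2**15 in float16:", tiny_scaled)     # survives as a nonzero value
```

This is the trade-off loss scaling manages: the scale has to be large enough to lift tiny gradients out of the underflow range, but not so large that ordinary gradients overflow.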

The epsilon parameter of the Adam optimizer is used for numerical stability:
A small constant for numerical stability. This epsilon is
"epsilon hat" in the Kingma and Ba paper (in the formula just before
Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to
1e-7.
I tried your model, and increasing the epsilon parameter helped:
wrn = buildwrnmodel( 7 ,optimizer = LossScaleOptimizer(tf.keras.optimizers.Adam(clipnorm = 1, epsilon=1e-4) , initial_scale = 2**30 ) , activation = LeakyReLU(0.1))
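The quoted docs refer to the "epsilon hat" form of the update from the paper: theta <- theta - alpha_t * m_t / (sqrt(v_t) + epsilon_hat), with alpha_t = lr * sqrt(1 - beta_2^t) / (1 - beta_1^t). Below is a pure-Python sketch of the first step (t = 1) for a single scalar weight. It assumes Keras's default learning_rate=0.001, beta_1=0.9 and beta_2=0.999, and it is a re-derivation from the paper's formula, not Keras source. It shows what raising epsilon does: steps driven by gradients much smaller than epsilon shrink sharply, while steps driven by large gradients are barely affected. Whether this is the exact reason the laptop run diverges is not established here.

```python
import math

def adam_first_step(grad, epsilon, lr=0.001, beta_1=0.9, beta_2=0.999):
    """Size of the first Adam update for one scalar weight (t = 1).

    Uses the "epsilon hat" form from Kingma and Ba:
    step = alpha_t * m / (sqrt(v) + epsilon),
    alpha_t = lr * sqrt(1 - beta_2**t) / (1 - beta_1**t).
    """
    t = 1
    m = (1 - beta_1) * grad       # first-moment estimate after one step
    v = (1 - beta_2) * grad ** 2  # second-moment estimate after one step
    alpha_t = lr * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    return alpha_t * m / (math.sqrt(v) + epsilon)

for grad in (1e-5, 1.0):
    small_eps = adam_first_step(grad, epsilon=1e-7)
    large_eps = adam_first_step(grad, epsilon=1e-4)
    print(f"grad={grad:g}: step with eps=1e-7 is {small_eps:.3e}, with eps=1e-4 is {large_eps:.3e}")
```

With epsilon=1e-7, a gradient of 1e-5 still produces a step close to the full learning rate. With epsilon=1e-4, that step is a couple of hundred times smaller, while the step for a gradient of 1.0 changes by well under 1%.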

Related

Keras load model after saving the model, why start training from the beginning?

Epoch 1/8
222/222 [==============================] - 18s 67ms/step - loss: 1.4523 - accuracy: 0.9709 - val_loss: 1.3310 - val_accuracy: 0.9865
Epoch 2/8
222/222 [==============================] - 14s 63ms/step - loss: 1.3345 - accuracy: 0.9747 - val_loss: 1.2312 - val_accuracy: 0.9865
Epoch 3/8
222/222 [==============================] - 14s 64ms/step - loss: 1.1911 - accuracy: 0.9868 - val_loss: 1.1245 - val_accuracy: 0.9887
Epoch 4/8
222/222 [==============================] - 14s 63ms/step - loss: 1.0926 - accuracy: 0.9873 - val_loss: 1.0798 - val_accuracy: 0.9769
Epoch 5/8
222/222 [==============================] - 14s 63ms/step - loss: 1.0622 - accuracy: 0.9760 - val_loss: 1.0887 - val_accuracy: 0.9555
Epoch 6/8
222/222 [==============================] - 14s 63ms/step - loss: 0.9589 - accuracy: 0.9841 - val_loss: 0.9216 - val_accuracy: 0.9814
Epoch 7/8
222/222 [==============================] - 14s 64ms/step - loss: 0.8648 - accuracy: 0.9885 - val_loss: 0.8241 - val_accuracy: 0.9896
Epoch 8/8
222/222 [==============================] - 14s 63ms/step - loss: 0.7993 - accuracy: 0.9908 - val_loss: 0.7694 - val_accuracy: 0.9893
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
model_1 (Functional) (None, 10) 3250058
=================================================================
Total params: 3,250,058
Trainable params: 3,228,170
Non-trainable params: 21,888
_________________________________________________________________
Epoch 1/8
222/222 [==============================] - 18s 66ms/step - loss: 1.4423 - accuracy: 0.9741 - val_loss: 1.3361 - val_accuracy: 0.9839
Epoch 2/8
222/222 [==============================] - 14s 64ms/step - loss: 1.3457 - accuracy: 0.9734 - val_loss: 1.2327 - val_accuracy: 0.9845
Epoch 3/8
222/222 [==============================] - 14s 63ms/step - loss: 1.1927 - accuracy: 0.9893 - val_loss: 1.1287 - val_accuracy: 0.9870
This is my output. As you can see, when I load the model after training, the loss is about the same as it was before training. I am really confused about this.
This is my code. I want to use two models (After combining, Final combining), and I use load_model and model.save, because I want to mimic a federated learning process.
I hope someone can give me some ideas.
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model, load_model
# X_train1/X_train2, y_1_train/y_2_train, X_test, y_test, batch_size and epochs1 are defined elsewhere in the script
def train2():
    img_input = Input(shape=(32, 32, 3))
    Mobilenet2 = load_model('Final combining.h5')
    output = Mobilenet2(img_input)
    model = Model(img_input, output)
    model.summary()
    # set optimizer
    sgd = optimizers.SGD(learning_rate=.1, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    # start training
    h2 = model.fit(X_train2, y_2_train, batch_size=batch_size,
                   steps_per_epoch=len(X_train2) // batch_size,
                   epochs=epochs1,
                   # callbacks=cbks,
                   validation_data=(X_test, y_test))
    # callbacks=callbacks
    model.save('After combining.h5')
def train3():
    img_input = Input(shape=(32, 32, 3))
    Mobilenet1 = load_model('After combining.h5')
    output = Mobilenet1(img_input)
    model = Model(img_input, output)
    model.summary()
    # set optimizer
    sgd = optimizers.SGD(learning_rate=.1, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    # start training
    h3 = model.fit(X_train1, y_1_train, batch_size=batch_size,
                   steps_per_epoch=len(X_train1) // batch_size,
                   epochs=epochs1,
                   # callbacks=cbks,
                   validation_data=(X_test, y_test))
    # callbacks=callbacks
    model.save('Final combining.h5')
I use the for loop below to control the training process. The output above is from the last iteration, and its accuracy and loss are almost the same as in the first iteration.
for _ in range(5):
    num = 0
    if num % 2 == 0:
        train2()
        num += 1
    else:
        train3()
        num += 1
I solved it by changing the models to use the same name:
_________________________________________________________________
Epoch 1/8
222/222 [==============================] - 25s 100ms/step - loss: 0.2912 - accuracy: 0.9854 - val_loss: 0.3016 - val_accuracy: 0.9800
Epoch 2/8
222/222 [==============================] - 22s 98ms/step - loss: 0.2637 - accuracy: 0.9906 - val_loss: 0.3110 - val_accuracy: 0.9800
Epoch 3/8
222/222 [==============================] - 22s 97ms/step - loss: 0.2420 - accuracy: 0.9922 - val_loss: 0.2764 - val_accuracy: 0.9865
Epoch 4/8
222/222 [==============================] - 22s 97ms/step - loss: 0.2960 - accuracy: 0.9743 - val_loss: 0.2632 - val_accuracy: 0.9842
Epoch 5/8
222/222 [==============================] - 22s 98ms/step - loss: 0.2291 - accuracy: 0.9928 - val_loss: 0.2757 - val_accuracy: 0.9789
Epoch 6/8
222/222 [==============================] - 22s 97ms/step - loss: 0.2286 - accuracy: 0.9921 - val_loss: 0.2806 - val_accuracy: 0.9744
Epoch 7/8
222/222 [==============================] - 22s 98ms/step - loss: 0.2161 - accuracy: 0.9920 - val_loss: 0.2381 - val_accuracy: 0.9828
Epoch 8/8
222/222 [==============================] - 22s 98ms/step - loss: 0.1936 - accuracy: 0.9953 - val_loss: 0.2192 - val_accuracy: 0.9887
Model: "model_20"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_22 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
model_19 (Functional) (None, 10) 3250058
=================================================================
Total params: 3,250,058
Trainable params: 3,228,170
Non-trainable params: 21,888
_________________________________________________________________
Epoch 1/8
222/222 [==============================] - 25s 101ms/step - loss: 0.1774 - accuracy: 0.9972 - val_loss: 0.2197 - val_accuracy: 0.9876
Epoch 2/8
222/222 [==============================] - 22s 98ms/step - loss: 0.1805 - accuracy: 0.9928 - val_loss: 0.2880 - val_accuracy: 0.9713
Epoch 3/8
222/222 [==============================] - 22s 98ms/step - loss: 0.2062 - accuracy: 0.9852 - val_loss: 0.2234 - val_accuracy: 0.9814
Epoch 4/8
222/222 [==============================] - 22s 97ms/step - loss: 0.1765 - accuracy: 0.9938 - val_loss: 0.2218 - val_accuracy: 0.9769
Epoch 5/8
222/222 [==============================] - 22s 98ms/step - loss: 0.1792 - accuracy: 0.9905 - val_loss: 0.2180 - val_accuracy: 0.9803
Epoch 6/8
222/222 [==============================] - 22s 98ms/step - loss: 0.1608 - accuracy: 0.9942 - val_loss: 0.2602 - val_accuracy: 0.9679
Epoch 7/8
222/222 [==============================] - 22s 98ms/step - loss: 0.1581 - accuracy: 0.9925 - val_loss: 0.1826 - val_accuracy: 0.9873
Epoch 8/8
222/222 [==============================] - 22s 98ms/step - loss: 0.2309 - accuracy: 0.9734 - val_loss: 0.2034 - val_accuracy: 0.9831
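Independently of the file names, the loop in the question resets num = 0 at the start of every pass. As a result, the if branch is always taken and only train2() ever runs. Each of those runs reloads 'Final combining.h5', which only train3() writes, so every round starts from the same saved weights. That would match the symptom of training restarting from the beginning, and it would also explain why using a single file name helps. Below is a minimal pure-Python sketch of the two loop shapes. The lambdas are hypothetical stand-ins that only record which training function would be called.

```python
def original_loop(n_rounds, train2, train3):
    for _ in range(n_rounds):
        num = 0  # reset on every pass, as in the question
        if num % 2 == 0:
            train2()
            num += 1
        else:
            train3()
            num += 1

def alternating_loop(n_rounds, train2, train3):
    # the round index itself drives the alternation, so nothing gets reset
    for num in range(n_rounds):
        if num % 2 == 0:
            train2()
        else:
            train3()

original_calls = []
original_loop(5, lambda: original_calls.append("train2"), lambda: original_calls.append("train3"))
fixed_calls = []
alternating_loop(5, lambda: fixed_calls.append("train2"), lambda: fixed_calls.append("train3"))
print(original_calls)  # only train2, five times
print(fixed_calls)     # train2 and train3 alternate
```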

Fine tuning in CNN using Tensor Flow - 2.0

I am currently working on a defect classification problem for solar panels. It's a multi-class classification problem, currently with 3 classes. I have finished the code, but my accuracy is very low. How can I improve it?
Total training images - 900
Testing/validation - 300
Class - 3
My code is given below -
import tensorflow as tf
import keras_preprocessing
from keras_preprocessing import image
from keras_preprocessing.image import ImageDataGenerator
TRAINING_DIR = "/content/drive/My Drive/solar_images/solar_images/train/"
training_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
VALIDATION_DIR = "/content/drive/My Drive/solar_images/solar_images/test/"
validation_datagen = ImageDataGenerator(rescale=1./255)
train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=64
)
validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=64
)
model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image: 150x150 with 3 color channels
    # This is the first convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
batch_size = 64
history = model.fit(train_generator,
                    epochs=20,
                    steps_per_epoch=int(894/batch_size),
                    validation_data=validation_generator,
                    verbose=1,
                    validation_steps=int(289/batch_size))
model.save("solar_images_weight.h5")
My accuracy is -
Epoch 1/20
13/13 [==============================] - 1107s 85s/step - loss: 1.2893 - accuracy: 0.3470 - val_loss: 1.0926 - val_accuracy: 0.3594
Epoch 2/20
13/13 [==============================] - 1239s 95s/step - loss: 1.1037 - accuracy: 0.3566 - val_loss: 1.0954 - val_accuracy: 0.3125
Epoch 3/20
13/13 [==============================] - 1203s 93s/step - loss: 1.0964 - accuracy: 0.3904 - val_loss: 1.0841 - val_accuracy: 0.5625
Epoch 4/20
13/13 [==============================] - 1182s 91s/step - loss: 1.0980 - accuracy: 0.3750 - val_loss: 1.0894 - val_accuracy: 0.3633
Epoch 5/20
13/13 [==============================] - 1218s 94s/step - loss: 1.1086 - accuracy: 0.3386 - val_loss: 1.0874 - val_accuracy: 0.3125
Epoch 6/20
13/13 [==============================] - 1214s 93s/step - loss: 1.0953 - accuracy: 0.3257 - val_loss: 1.0763 - val_accuracy: 0.6094
Epoch 7/20
13/13 [==============================] - 1136s 87s/step - loss: 1.0851 - accuracy: 0.3831 - val_loss: 1.0754 - val_accuracy: 0.3164
Epoch 8/20
13/13 [==============================] - 1170s 90s/step - loss: 1.1005 - accuracy: 0.3940 - val_loss: 1.0545 - val_accuracy: 0.5039
Epoch 9/20
13/13 [==============================] - 1138s 88s/step - loss: 1.1294 - accuracy: 0.4337 - val_loss: 1.0130 - val_accuracy: 0.5703
Epoch 10/20
13/13 [==============================] - 1131s 87s/step - loss: 1.0250 - accuracy: 0.4531 - val_loss: 0.8911 - val_accuracy: 0.6055
Epoch 11/20
13/13 [==============================] - 1162s 89s/step - loss: 1.0243 - accuracy: 0.4735 - val_loss: 0.9160 - val_accuracy: 0.4727
Epoch 12/20
13/13 [==============================] - 1153s 89s/step - loss: 0.9978 - accuracy: 0.4783 - val_loss: 0.7754 - val_accuracy: 0.6406
Epoch 13/20
13/13 [==============================] - 1187s 91s/step - loss: 1.0080 - accuracy: 0.4687 - val_loss: 0.7701 - val_accuracy: 0.6602
Epoch 14/20
13/13 [==============================] - 1204s 93s/step - loss: 0.9851 - accuracy: 0.5048 - val_loss: 0.7450 - val_accuracy: 0.6367
Epoch 15/20
13/13 [==============================] - 1181s 91s/step - loss: 0.9699 - accuracy: 0.4892 - val_loss: 0.7409 - val_accuracy: 0.6289
Epoch 16/20
13/13 [==============================] - 1187s 91s/step - loss: 0.8884 - accuracy: 0.5241 - val_loss: 0.7169 - val_accuracy: 0.6133
Epoch 17/20
13/13 [==============================] - 1197s 92s/step - loss: 0.9372 - accuracy: 0.5084 - val_loss: 0.7464 - val_accuracy: 0.5859
Epoch 18/20
13/13 [==============================] - 1224s 94s/step - loss: 0.9230 - accuracy: 0.5229 - val_loss: 0.9198 - val_accuracy: 0.5156
Epoch 19/20
13/13 [==============================] - 1270s 98s/step - loss: 0.9161 - accuracy: 0.5192 - val_loss: 0.6785 - val_accuracy: 0.6289
Epoch 20/20
13/13 [==============================] - 1173s 90s/step - loss: 0.8728 - accuracy: 0.5193 - val_loss: 0.6674 - val_accuracy: 0.5781
Training and validation accuracy plot is given below -
You could use transfer learning: take a model pre-trained on ImageNet, such as MobileNet or Inception, and fine-tune it on your dataset. With only about 900 training images, this is likely to improve your accuracy significantly.
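A separate detail in the question's code: int(894/batch_size) truncates. Each epoch therefore runs 13 batches of 64 (832 images) and leaves 62 training images unseen, and int(289/batch_size) validates on only 4 x 64 = 256 of the 289 validation images. Rounding up covers every image. For these directory iterators, Keras should also be able to infer the step counts if steps_per_epoch and validation_steps are simply omitted. The arithmetic:

```python
import math

batch_size = 64
n_train, n_val = 894, 289  # image counts used in the question's fit() call

for name, n in (("train", n_train), ("validation", n_val)):
    truncated = int(n / batch_size)         # what the question's code passes
    rounded_up = math.ceil(n / batch_size)  # enough batches to see every image
    skipped = n - truncated * batch_size
    print(f"{name}: int -> {truncated} steps ({skipped} images skipped), ceil -> {rounded_up} steps")
```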

Increase accuracy of CNN model

I have been working on an image classification problem. I used ImageDataGenerator to load and preprocess the data and then trained my CNN model on the image dataset, but the accuracy is stuck at 51%.
My dataset covers 1000 signatures, with 4000 real and 4000 fake sample images (8000 images in total).
I have tried:
Different train/test split ratios
Different numbers of epochs and batch sizes
Increasing/decreasing the number of CNN layers
but either the model overfits while the accuracy stays at 51%, or the accuracy decreases further.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 128
epochs = 15
IMG_HEIGHT = 150
IMG_WIDTH = 150
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    fill_mode='nearest',
    validation_split=0.4)
# both subsets come from the same generator object defined above
train_data_gen = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size,
    class_mode='binary',
    subset='training')
val_data_gen = train_datagen.flow_from_directory(
    train_dir,  # same directory as training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation')
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1)])  # single raw logit, matched by from_logits=True below
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
# model.fit accepts generators directly; fit_generator is deprecated
history_more = model.fit(
    train_data_gen,
    steps_per_epoch=train_data_gen.samples // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=val_data_gen.samples // batch_size)
37/37 [==============================] - 2886s 78s/step - loss: 0.8010 - accuracy: 0.4994 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 2/15
37/37 [==============================] - 985s 27s/step - loss: 0.6934 - accuracy: 0.5015 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 3/15
37/37 [==============================] - 986s 27s/step - loss: 0.6931 - accuracy: 0.4991 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 4/15
37/37 [==============================] - 985s 27s/step - loss: 0.6931 - accuracy: 0.4998 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 5/15
37/37 [==============================] - 988s 27s/step - loss: 0.6930 - accuracy: 0.4961 - val_loss: 0.6927 - val_accuracy: 0.5000
Epoch 6/15
37/37 [==============================] - 991s 27s/step - loss: 0.6934 - accuracy: 0.5021 - val_loss: 0.6923 - val_accuracy: 0.5000
Epoch 7/15
37/37 [==============================] - 979s 26s/step - loss: 0.6917 - accuracy: 0.5028 - val_loss: 0.6909 - val_accuracy: 0.5000
Epoch 8/15
37/37 [==============================] - 974s 26s/step - loss: 0.6858 - accuracy: 0.4998 - val_loss: 0.6897 - val_accuracy: 0.4991
Epoch 9/15
37/37 [==============================] - 967s 26s/step - loss: 0.6802 - accuracy: 0.5078 - val_loss: 0.6909 - val_accuracy: 0.5003
Epoch 10/15
37/37 [==============================] - 970s 26s/step - loss: 0.6808 - accuracy: 0.5045 - val_loss: 0.6943 - val_accuracy: 0.5081
Epoch 11/15
37/37 [==============================] - 967s 26s/step - loss: 0.6741 - accuracy: 0.5103 - val_loss: 0.7072 - val_accuracy: 0.5131
Epoch 12/15
37/37 [==============================] - 950s 26s/step - loss: 0.6732 - accuracy: 0.5128 - val_loss: 0.7064 - val_accuracy: 0.5041
Epoch 13/15
37/37 [==============================] - 947s 26s/step - loss: 0.6707 - accuracy: 0.5171 - val_loss: 0.6996 - val_accuracy: 0.5078
Epoch 14/15
37/37 [==============================] - 951s 26s/step - loss: 0.6675 - accuracy: 0.5103 - val_loss: 0.7122 - val_accuracy: 0.5016
Epoch 15/15
37/37 [==============================] - 952s 26s/step - loss: 0.6724 - accuracy: 0.5197 - val_loss: 0.7105 - val_accuracy: 0.5119
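One observation on this log: a loss of 0.6931 is ln 2. That is exactly the binary cross-entropy when the model's logit is 0 (sigmoid output 0.5) for every image, whatever the labels are. With 4000 real and 4000 fake images, such a constant prediction also scores about 0.5 accuracy. A val_loss pinned at 0.6931 with val_accuracy at 0.5000 in the early epochs therefore looks like the network giving the same 50/50 output for everything. Below is a small pure-Python check. It uses the numerically stable from-logits form of binary cross-entropy, and the labels are made up for illustration.

```python
import math

def bce_from_logits(logits, labels):
    """Mean binary cross-entropy computed from raw logits (stable form)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # max(z, 0) - z*y + log(1 + exp(-|z|)) equals -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

labels = [0, 1, 1, 0, 1, 0]             # illustrative labels
constant_logits = [0.0] * len(labels)   # sigmoid(0) = 0.5 for every image
loss = bce_from_logits(constant_logits, labels)
print(loss, math.log(2))                # both are ln 2, about 0.6931
```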

Validation Loss Increases every iteration

Recently I have been trying to do multi-class classification on a dataset of 17 image categories. Previously I used 3 conv layers and 2 hidden layers, and the model overfit, with a huge validation loss of around 11.0+ and very low validation accuracy. So I reduced the conv layers and the hidden layers by one each. I also removed dropout, but the validation results still show the same overfitting, even though the training accuracy and loss keep improving.
Here is my code for prepared datasets:
import cv2
import numpy as np
import os
import pickle
import random
CATEGORIES = ["apple_pie", "baklava", "caesar_salad", "donuts",
              "fried_calamari", "grilled_salmon", "hamburger",
              "ice_cream", "lasagna", "macaroni_and_cheese", "nachos", "omelette", "pizza",
              "risotto", "steak", "tiramisu", "waffles"]
DATALOC = "D:/Foods/Datasets"
IMAGE_SIZE = 50
data_training = []
def create_data_training():
    for category in CATEGORIES:
        path = os.path.join(DATALOC, category)
        class_num = CATEGORIES.index(category)  # integer label 0..16
        for image in os.listdir(path):
            try:
                image_array = cv2.imread(os.path.join(path, image), cv2.IMREAD_GRAYSCALE)
                new_image_array = cv2.resize(image_array, (IMAGE_SIZE, IMAGE_SIZE))
                data_training.append([new_image_array, class_num])
            except Exception as exc:
                pass  # silently skip files OpenCV cannot read or resize
create_data_training()
random.shuffle(data_training)
X = []
y = []
for features, label in data_training:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
y = np.array(y)
with open("X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)
with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)
with open("X.pickle", "rb") as pickle_in:
    X = pickle.load(pickle_in)
Here is the code of my model:
import pickle
import tensorflow as tf
import time
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
NAME = "Foods-Model-{}".format(int(time.time()))
tensorboard = TensorBoard(log_dir='logs/{}'.format(NAME))  # forward slash avoids the invalid '\{' escape sequence
X = pickle.load(open("X.pickle","rb"))
y = pickle.load(open("y.pickle","rb"))
X = X/255.0
model = Sequential()
model.add(Conv2D(32,(3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size =(2,2)))
model.add(Conv2D(64,(3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size =(2,2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation("relu"))
model.add(Dense(17))
model.add(Activation('softmax'))
model.compile(loss = "sparse_categorical_crossentropy", optimizer = "adam", metrics = ['accuracy'])
model.fit(X, y, batch_size = 16, epochs = 20 , validation_split = 0.1, callbacks = [tensorboard])
The result of the trained model:
Train on 7650 samples, validate on 850 samples
Epoch 1/20
7650/7650 [==============================] - 242s 32ms/sample - loss: 2.7826 - accuracy: 0.1024 - val_loss: 2.7018 - val_accuracy: 0.1329
Epoch 2/20
7650/7650 [==============================] - 241s 31ms/sample - loss: 2.5673 - accuracy: 0.1876 - val_loss: 2.5597 - val_accuracy: 0.2059
Epoch 3/20
7650/7650 [==============================] - 234s 31ms/sample - loss: 2.3529 - accuracy: 0.2617 - val_loss: 2.5329 - val_accuracy: 0.2153
Epoch 4/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 2.0707 - accuracy: 0.3510 - val_loss: 2.6628 - val_accuracy: 0.2059
Epoch 5/20
7650/7650 [==============================] - 231s 30ms/sample - loss: 1.6960 - accuracy: 0.4753 - val_loss: 2.8143 - val_accuracy: 0.2047
Epoch 6/20
7650/7650 [==============================] - 230s 30ms/sample - loss: 1.2336 - accuracy: 0.6247 - val_loss: 3.3130 - val_accuracy: 0.1929
Epoch 7/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 0.7738 - accuracy: 0.7715 - val_loss: 3.9758 - val_accuracy: 0.1776
Epoch 8/20
7650/7650 [==============================] - 231s 30ms/sample - loss: 0.4271 - accuracy: 0.8827 - val_loss: 4.7325 - val_accuracy: 0.1882
Epoch 9/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 0.2080 - accuracy: 0.9519 - val_loss: 5.7198 - val_accuracy: 0.1918
Epoch 10/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 0.1402 - accuracy: 0.9668 - val_loss: 6.0608 - val_accuracy: 0.1835
Epoch 11/20
7650/7650 [==============================] - 236s 31ms/sample - loss: 0.0724 - accuracy: 0.9872 - val_loss: 6.7468 - val_accuracy: 0.1753
Epoch 12/20
7650/7650 [==============================] - 232s 30ms/sample - loss: 0.0549 - accuracy: 0.9895 - val_loss: 7.4844 - val_accuracy: 0.1718
Epoch 13/20
7650/7650 [==============================] - 229s 30ms/sample - loss: 0.1541 - accuracy: 0.9591 - val_loss: 7.3335 - val_accuracy: 0.1553
Epoch 14/20
7650/7650 [==============================] - 231s 30ms/sample - loss: 0.0477 - accuracy: 0.9905 - val_loss: 7.8453 - val_accuracy: 0.1729
Epoch 15/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 0.0346 - accuracy: 0.9908 - val_loss: 8.1847 - val_accuracy: 0.1753
Epoch 16/20
7650/7650 [==============================] - 231s 30ms/sample - loss: 0.0657 - accuracy: 0.9833 - val_loss: 7.8582 - val_accuracy: 0.1624
Epoch 17/20
7650/7650 [==============================] - 233s 30ms/sample - loss: 0.0555 - accuracy: 0.9830 - val_loss: 8.2578 - val_accuracy: 0.1553
Epoch 18/20
7650/7650 [==============================] - 230s 30ms/sample - loss: 0.0423 - accuracy: 0.9892 - val_loss: 8.6970 - val_accuracy: 0.1694
Epoch 19/20
7650/7650 [==============================] - 236s 31ms/sample - loss: 0.0291 - accuracy: 0.9927 - val_loss: 8.5275 - val_accuracy: 0.1882
Epoch 20/20
7650/7650 [==============================] - 234s 31ms/sample - loss: 0.0443 - accuracy: 0.9873 - val_loss: 9.2703 - val_accuracy: 0.1812
Thank you for your time. Any help and suggestions would be really appreciated.
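Your training log suggests early overfitting: training loss keeps falling while validation loss climbs steadily (from 2.53 at epoch 3 to 9.27 at epoch 20), and validation accuracy stays between roughly 0.16 and 0.22.
Get rid of the hidden dense layers completely and use global pooling instead, keeping only the final Dense output layer.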
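from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, GlobalAveragePooling2D
model = Sequential()
# X is your training array of shape (samples, height, width, channels)
model.add(Conv2D(32,(3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(Conv2D(64,(3,3)))
model.add(Activation("relu"))
model.add(Conv2D(128,(3,3)))
model.add(Activation("relu"))
# Average each feature map to a single value instead of Flatten + hidden Dense layers
model.add(GlobalAveragePooling2D())
# Output layer: one unit per class (17 here)
model.add(Dense(17))
model.add(Activation('softmax'))
model.summary()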
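Counting parameters shows why dropping the large dense layers helps against overfitting. This sketch assumes a hypothetical 32x32 input (CIFAR-sized; the real shape comes from X.shape[1:]) and Keras' default 'valid' padding with stride 1 for Conv2D. It then compares the output layer's parameter count when that layer is fed a flattened feature map with the count when it receives the 128 globally pooled channels:

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a Conv2D with 'valid' padding and stride 1 (Keras defaults)."""
    return size - kernel + 1


def dense_params(n_inputs, n_units):
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units


INPUT_SIZE = 32    # hypothetical square input, e.g. CIFAR-10 images
CHANNELS = 128     # filters in the last Conv2D of the model above
NUM_CLASSES = 17   # units in the final Dense layer of the model above

size = INPUT_SIZE
for _ in range(3):  # three 3x3 Conv2D layers
    size = conv_output_size(size)

flatten_features = size * size * CHANNELS  # Flatten keeps every spatial position
pooled_features = CHANNELS                 # GlobalAveragePooling2D keeps one value per channel

flatten_head = dense_params(flatten_features, NUM_CLASSES)
pooled_head = dense_params(pooled_features, NUM_CLASSES)

print(f"feature map: {size}x{size}x{CHANNELS}")
print(f"Flatten -> Dense({NUM_CLASSES}): {flatten_head:,} parameters")
print(f"GlobalAveragePooling2D -> Dense({NUM_CLASSES}): {pooled_head:,} parameters")
```

Under these assumptions, the flattened head needs 1,470,993 parameters and the pooled one needs 2,193. That leaves the pooled model far fewer weights with which to memorize the training set.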
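Use SpatialDropout2D after the conv layers. It drops entire feature maps instead of individual activations, which the TensorFlow docs recommend when adjacent pixels in a feature map are strongly correlated, as they are in early conv layers.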
ref: https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout2D
Use early stopping (the EarlyStopping callback monitoring val_loss) to stop training before the model starts overfitting.
Your output also suggests that categorical_crossentropy would be a better-fitting loss.
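Here is a minimal sketch combining these suggestions, assuming TensorFlow 2.x with tf.keras. It uses SpatialDropout2D after the conv layers, global average pooling, and a softmax output trained with categorical_crossentropy on one-hot labels, plus an EarlyStopping callback. The input shape, dropout rate, patience and random data are placeholders, not values from the question.

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (Conv2D, Dense, GlobalAveragePooling2D,
                                     Input, SpatialDropout2D)
from tensorflow.keras.models import Sequential

INPUT_SHAPE = (32, 32, 3)  # placeholder; use X.shape[1:] for the real data
NUM_CLASSES = 17           # matches Dense(17) in the answer above


def build_model(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES, drop_rate=0.2):
    """Conv stack with SpatialDropout2D and global pooling instead of dense layers."""
    model = Sequential([
        Input(shape=input_shape),
        Conv2D(32, (3, 3), activation="relu"),
        SpatialDropout2D(drop_rate),  # drops whole feature maps, not single activations
        Conv2D(64, (3, 3), activation="relu"),
        SpatialDropout2D(drop_rate),
        Conv2D(128, (3, 3), activation="relu"),
        GlobalAveragePooling2D(),
        Dense(num_classes, activation="softmax"),
    ])
    # categorical_crossentropy expects one-hot encoded labels
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


# Stop once val_loss has not improved for 3 epochs and roll back to the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)

if __name__ == "__main__":
    # Random data only to demonstrate the call pattern; it carries no signal
    rng = np.random.default_rng(0)
    X = rng.random((64, *INPUT_SHAPE), dtype=np.float32)
    y = np.eye(NUM_CLASSES, dtype="float32")[rng.integers(0, NUM_CLASSES, size=64)]
    model = build_model()
    model.fit(X, y, validation_split=0.25, epochs=5, batch_size=16,
              callbacks=[early_stop], verbose=0)
```

Patience, dropout rate and input shape here are illustrative and would need tuning on the actual dataset.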

Keras is not learning anything

I am trying to learn Keras, and none of the models I train are learning anything. This ranges from the example code in Deep Learning with Python to the code at https://medium.com/nybles/create-your-first-image-recognition-classifier-using-cnn-keras-and-tensorflow-backend-6eaab98d14dd. With the last link, I am not able to use the full 10,000-image dataset, but even with my 1589-image training set the accuracy stays at 0.5 the whole time.
I'm almost starting to think the problem is my overclocked CPU and RAM, but that's more of a wild guess.
I initially thought the problem was that I had TensorFlow 2.0.0-alpha installed. However, even after switching to the regular tensorflow-gpu package, nothing is learning.
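For reference, a Keras generator such as flow_from_directory yields one batch per step. One full pass over the data therefore takes the number of samples divided by the batch size, rounded up. The following quick check uses the 1589 training images mentioned above, plus the batch_size=32 and steps_per_epoch=1589 from the fit_generator call in the code below:

```python
import math

num_train = 1589        # training images mentioned above
batch_size = 32         # batch_size passed to flow_from_directory
steps_per_epoch = 1589  # value passed to fit_generator

# Batches needed to see every training image once (the last batch is smaller)
steps_for_one_pass = math.ceil(num_train / batch_size)

# Images drawn per logged "epoch", expressed as passes over the training set
passes_per_epoch = steps_per_epoch * batch_size / num_train

print(steps_for_one_pass)  # 50
print(passes_per_epoch)    # 32.0
```

So with steps_per_epoch=1589, each logged epoch covers the augmented training set 32 times, while steps_per_epoch=50 would correspond to a single pass.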
#Convolutional Neural Network
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.models import model_from_json
import os
#initialize the cnn
classifier = Sequential()
#Step 1 convolution
classifier.add(Convolution2D(32, 3, 3, input_shape = (64, 64, 3), activation = 'relu'))
#Step 2 Pooling
classifier.add(MaxPooling2D(pool_size = (2,2)))
#Step 3 Flattening
classifier.add(Flatten())
#Step 4 Full Connection
classifier.add(Dense(output_dim = 128, activation = 'relu'))
classifier.add(Dense(output_dim = 64, activation = 'relu'))
classifier.add(Dense(output_dim = 32, activation = 'relu'))
classifier.add(Dense(output_dim = 1, activation = 'sigmoid'))
#Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
#Part 2 Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1/.255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
    'dataset/training_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    'dataset/test_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
from IPython.display import display
from PIL import Image
classifier.fit_generator(
    training_set,
    steps_per_epoch=1589,
    epochs=10,
    validation_data=test_set,
    validation_steps=378)
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/test_set/cats/cat.4012.jpg', target_size = (64,64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
training_set.class_indices
if result[0][0] >= 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)
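One detail in the code above is worth double-checking. train_datagen uses rescale=1/.255, while test_datagen uses rescale=1./255, and the single test_image is passed to classifier.predict without any rescaling. The arithmetic below shows the range each factor produces for an 8-bit pixel. The random array is only a stand-in for a loaded image.

```python
import numpy as np

train_rescale = 1/.255   # as written for train_datagen: 1 / 0.255, about 3.92
test_rescale = 1./255    # as written for test_datagen: 1.0 / 255, about 0.0039

max_pixel = 255.0        # largest value of an 8-bit pixel
print(max_pixel * train_rescale)  # about 1000: training inputs reach ~1000
print(max_pixel * test_rescale)   # about 1.0: validation inputs stay in [0, 1]

# Stand-in for image.img_to_array(...) on a 64x64 RGB image (values 0-255)
rng = np.random.default_rng(0)
test_image = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")
test_image = np.expand_dims(test_image, axis=0)  # add the batch dimension

# Apply the same 1./255 scaling as test_datagen before classifier.predict(test_image)
test_image = test_image * test_rescale
print(test_image.shape, float(test_image.max()))
```

If train_datagen is also given rescale=1./255, then training, validation and single-image prediction all see inputs in the same [0, 1] range.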
Example from Deep Learning with Python:
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu',input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
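For anyone unfamiliar with the book's preprocessing: vectorize_sequences multi-hot encodes each review, setting position i to 1 if word index i appears at least once. Below is a tiny standalone check with a made-up vocabulary size of 5. It also shows the 15,000/10,000 train/validation split that the slicing above produces from the 25,000 IMDB training reviews, matching "Train on 15000 samples, validate on 10000 samples" in the log below.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    # Multi-hot encoding: one row per sequence, 1.0 at every word index it contains
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


toy = vectorize_sequences([[1, 3], [0, 3, 3]], dimension=5)
print(toy)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 1. 0.]]

# Same slicing as the book's code, on a dummy array with 25,000 rows
x_train = np.zeros((25000, 1))
x_val, partial_x_train = x_train[:10000], x_train[10000:]
print(len(partial_x_train), len(x_val))  # 15000 10000
```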
Dogs and cats output:
Epoch 1/10
1589/1589 [==============================] - 112s 70ms/step - loss: 7.8736 - acc: 0.5115 - val_loss: 7.9528 - val_acc: 0.4976
Epoch 2/10
1589/1589 [==============================] - 111s 70ms/step - loss: 7.8697 - acc: 0.5117 - val_loss: 7.9606 - val_acc: 0.4971
Epoch 3/10
1589/1589 [==============================] - 111s 70ms/step - loss: 7.8740 - acc: 0.5115 - val_loss: 7.9499 - val_acc: 0.4978
Epoch 4/10
1589/1589 [==============================] - 111s 70ms/step - loss: 7.8674 - acc: 0.5119 - val_loss: 7.9634 - val_acc: 0.4969
Epoch 5/10
1589/1589 [==============================] - 111s 70ms/step - loss: 7.8765 - acc: 0.5113 - val_loss: 7.9499 - val_acc: 0.4977
Epoch 6/10
1589/1589 [==============================] - 111s 70ms/step - loss: 7.8737 - acc: 0.5115 - val_loss: 7.9634 - val_acc: 0.4970
Epoch 7/10
1589/1589 [==============================] - 129s 81ms/step - loss: 7.8623 - acc: 0.5122 - val_loss: 7.9626 - val_acc: 0.4970
Epoch 8/10
1589/1589 [==============================] - 112s 71ms/step - loss: 7.8758 - acc: 0.5114 - val_loss: 7.9508 - val_acc: 0.4977
Epoch 9/10
1589/1589 [==============================] - 115s 72ms/step - loss: 7.8708 - acc: 0.5117 - val_loss: 7.9519 - val_acc: 0.4976
Epoch 10/10
1589/1589 [==============================] - 112s 70ms/step - loss: 7.8738 - acc: 0.5115 - val_loss: 7.9614 - val_acc: 0.4971
cat
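The plateau value in this log is itself a clue. This assumes Keras' default epsilon of 1e-7, under which binary_crossentropy clips predicted probabilities to [1e-7, 1 - 1e-7]. A fully saturated but wrong prediction then costs about -ln(1e-7) ≈ 16.12, and a saturated correct one costs almost nothing. If the network always outputs the same saturated class, the mean loss is roughly the error rate times 16.12, which lines up with the training numbers:

```python
import math

EPSILON = 1e-7  # assumed: Keras' default backend epsilon used for clipping


def saturated_bce(error_rate, eps=EPSILON):
    """Mean binary cross-entropy when every prediction is clipped to eps or 1 - eps."""
    wrong = -math.log(eps)       # loss of a confidently wrong prediction
    right = -math.log(1 - eps)   # loss of a confidently right prediction (~1e-7)
    return error_rate * wrong + (1 - error_rate) * right


train_acc = 0.5115  # training accuracy from the Epoch 1 line above
print(round(-math.log(EPSILON), 4))            # 16.1181
print(round(saturated_bce(1 - train_acc), 4))  # close to the logged loss of 7.8736
```

This suggests, without proving it, that the sigmoid output is saturated on the training batches. The validation loss does not follow the formula as exactly.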
Deep Learning with Python IMDB example output:
WARNING:tensorflow:From C:\Users\Mike\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 15000 samples, validate on 10000 samples
Epoch 1/20
15000/15000 [==============================] - 4s 246us/step - loss: 0.6932 - acc: 0.4982 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 2/20
15000/15000 [==============================] - 2s 115us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 3/20
15000/15000 [==============================] - 2s 115us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 4/20
15000/15000 [==============================] - 2s 119us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 5/20
15000/15000 [==============================] - 2s 120us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 6/20
15000/15000 [==============================] - 2s 119us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 7/20
15000/15000 [==============================] - 2s 113us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 8/20
15000/15000 [==============================] - 2s 113us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 9/20
15000/15000 [==============================] - 2s 119us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 10/20
15000/15000 [==============================] - 2s 122us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 11/20
15000/15000 [==============================] - 2s 116us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 12/20
15000/15000 [==============================] - 2s 116us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 13/20
15000/15000 [==============================] - 2s 121us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6933 - val_acc: 0.4947
Epoch 14/20
15000/15000 [==============================] - 2s 127us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 15/20
15000/15000 [==============================] - 2s 121us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 16/20
15000/15000 [==============================] - 2s 113us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 17/20
15000/15000 [==============================] - 2s 115us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 18/20
15000/15000 [==============================] - 2s 114us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 19/20
15000/15000 [==============================] - 2s 114us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
Epoch 20/20
15000/15000 [==============================] - 2s 119us/step - loss: 0.6931 - acc: 0.5035 - val_loss: 0.6932 - val_acc: 0.4947
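Likewise, 0.6931 is ln 2, the binary cross-entropy of predicting 0.5 for every sample regardless of its label. A constant prediction equal to the class balance gives essentially the same number here, because the labels are almost exactly 50/50. So the IMDB model appears to be outputting a (nearly) constant probability rather than learning from its inputs:

```python
import math


def bce(p, y):
    """Binary cross-entropy of predicting probability p for label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Predicting 0.5 costs ln 2 whatever the label is
print(round(bce(0.5, 1), 4), round(bce(0.5, 0), 4), round(math.log(2), 4))  # 0.6931 0.6931 0.6931

# A constant predictor's accuracy equals the share of the class it always predicts;
# predicting that share q as the probability gives a mean loss equal to the entropy of q
q = 0.5035  # training accuracy from the log above
mean_loss = q * bce(q, 1) + (1 - q) * bce(q, 0)
print(round(mean_loss, 4))
```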