I'm currently developing a CNN to predict image classification between two classes: Weapons, and not Weapons. The purpose of this project is to be able to detect whether or not a weapon (handgun/rifle) is present in an image.
My issue: No matter what I try, the classifier predicts that there is no weapon in the image. Can you guys find a flaw in my code that might be causing this issue?
I am senior Computer Science Student, but I have very little background the realm of Machine Learning.
Any help is appreciated!
# Initializing the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
# Compiling the CNN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1. / 255)
test_datagen = ImageDataGenerator(rescale=1. / 255)
valid_datagen = ImageDataGenerator(rescale=1. / 255)
training_set = train_datagen.flow_from_directory('C:/Users/chill/PycharmProjects/499Actual/venv/data/TrainDataSet/',
target_size=(64, 64),
batch_size=29,
class_mode='binary')
test_set = test_datagen.flow_from_directory('C:/Users/chill/PycharmProjects/499Actual/venv/data/TestDataSet/',
target_size=(64, 64),
batch_size=7,
class_mode='binary')
valid_set = valid_datagen.flow_from_directory('C:/Users/chill/PycharmProjects/499Actual/venv/data/ValidationDataSet/',
target_size=(64, 64),
batch_size=7,
class_mode='binary')
classifier.fit_generator(training_set,
steps_per_epoch=348,
epochs=1,
validation_data=valid_set,
validation_steps=100)
# Part 3 - Making new predictions
import numpy as np
from keras.preprocessing import image
# test_image = image.load_img('C:/Users/chill/PycharmProjects/499Actual/venv/data/TestDataSet/ProbablySoap/P1030135.jpg',
# target_size=(64, 64))
test_image = image.load_img('C:/Users/chill/PycharmProjects/499Actual/venv/data/TestDataSet/Guns/301.jpeg',
target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
result = classifier.predict_classes(test_image)
print(result[0][0])
var = training_set.class_indices
if result[0][0] == 1:
prediction = 1
print("Gun!")
else:
prediction = 0
print("Not.")
Disclaimer: "ProbablySoap" is just the set of images that do not contain weapons.
UPDATE
The input image in this scenario is an image containing a weapon.
The output predicts "Not." every time.
UPDATE 2
Here is the output of the code:
Found 348 images belonging to 2 classes.
Found 42 images belonging to 2 classes.
Found 42 images belonging to 2 classes.
Epoch 1/1
1/348 [..............................] - ETA: 1:15 - loss: 0.6915 - accuracy: 0.5517
2/348 [..............................] - ETA: 47s - loss: 0.6994 - accuracy: 0.6724
3/348 [..............................] - ETA: 38s - loss: 0.7130 - accuracy: 0.6897
4/348 [..............................] - ETA: 33s - loss: 0.6565 - accuracy: 0.7155
5/348 [..............................] - ETA: 30s - loss: 0.6496 - accuracy: 0.7103
6/348 [..............................] - ETA: 28s - loss: 0.6384 - accuracy: 0.7241
7/348 [..............................] - ETA: 27s - loss: 0.6301 - accuracy: 0.7340
...
346/348 [============================>.] - ETA: 0s - loss: 0.0940 - accuracy: 0.9628
347/348 [============================>.] - ETA: 0s - loss: 0.0937 - accuracy: 0.9629
348/348 [==============================] - 34s 98ms/step - loss: 0.0935 - accuracy: 0.9630 - val_loss: 0.2081 - val_accuracy: 0.9757
0
Not.
Process finished with exit code 0
I think your problem:
You generated a test_set with scaling:
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_set = test_datagen.flow_from_directory('C:/Users/chill/PycharmProjects/499Actual/venv/data/TestDataSet/',
target_size=(64, 64),
batch_size=7,
class_mode='binary')
But you never use it, you use an imported file from the test directory and use it without scaling:
test_image = image.load_img('C:/Users/chill/PycharmProjects/499Actual/venv/data/TestDataSet/Guns/301.jpeg',
target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)
That is why the predictor can not classify the later imported image correctly.
Hope it helps.
Related
We are using CNN to classify images with labels 0 and 1 in tensorflow.
However, in reality, images have probability values between 0 and 1, not one-hot labels of 0 and 1. Images with probabilities in the range [0, 0.5) are labeled 0, and images in the range [0.5, 1.0] are labeled 1. I want to check whether the classification performance is better if binary classification is performed using soft labels between 0 and 1 instead of one-hot labels.
The code below is an example of binary classification only with data labeled 0 and 1 in the cifar10 dataset.
In the code below, the accuracy is about 98% without 'making soft labels part', but about 48% with 'making soft labels part'.
Should I modify the 'BinaryCrossEntropy_custom' function, which is the loss function, to solve the problem? Or is something else wrong?
This answer says that using logits solves it. I understand soft_labels argument, but what value should I put in logits argument in this example code?
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import optimizers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import vgg16
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
# Rewrite the binary cross entropy function. We will modify this function to return a loss that fits the soft label later.
def BinaryCrossEntropy_custom(y_true, y_pred):
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
term_1 = y_true * K.log(y_pred + K.epsilon())
return -K.mean(term_0 + term_1, axis=0)
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Only data with labels 0 and 1 are used.
train_ind01 = np.where((train_labels == 0) | (train_labels == 1))[0]
test_ind01 = np.where((test_labels == 0) | (test_labels == 1))[0]
train_images = train_images[train_ind01, :, :, :]
test_images = test_images[test_ind01, :, :, :]
train_labels = train_labels[train_ind01, :]
test_labels = test_labels[test_ind01, :]
train_labels = np.array(train_labels).astype('float64')
test_labels = np.array(test_labels).astype('float64')
# making soft labels part start
# Samples with label 0 are replaced with labels in the range [0,0.2],
# and samples with label 1 are replaced by labels in the range [0.8, 1.0].
sampl_train = np.random.uniform(low=-0.2, high=0.2, size=train_labels.shape)
sampl_test = np.random.uniform(low=-0.2, high=0.2, size=test_labels.shape)
train_labels = train_labels + sampl_train
test_labels = test_labels + sampl_test
train_labels = np.clip(train_labels, 0.0, 1.0)
test_labels = np.clip(test_labels, 0.0, 1.0)
# making soft labels part end
vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
output = vgg.layers[-1].output
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
output = layers.Dropout(0.2)(output)
output = layers.Dense(256, activation='relu')(output)
output = layers.Dropout(0.2)(output)
predictions = layers.Dense(units=1, activation="sigmoid")(output)
model = Model(inputs=vgg.input, outputs=predictions)
model.compile(optimizer=Adam(learning_rate=.0001), loss=BinaryCrossEntropy_custom, metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=100,
validation_data=(test_images, test_labels))
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)
The console output with making soft labels part is
Epoch 1/100
2022-09-16 15:29:29.136931: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8101
313/313 [==============================] - 17s 42ms/step - loss: 0.2951 - accuracy: 0.4779 - val_loss: 0.2775 - val_accuracy: 0.4650
Epoch 2/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2419 - accuracy: 0.4931 - val_loss: 0.2488 - val_accuracy: 0.4695
Epoch 3/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2290 - accuracy: 0.4978 - val_loss: 0.2424 - val_accuracy: 0.4740
Epoch 4/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2161 - accuracy: 0.5002 - val_loss: 0.2404 - val_accuracy: 0.4765
Epoch 5/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2139 - accuracy: 0.5007 - val_loss: 0.2620 - val_accuracy: 0.4730
Epoch 6/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2118 - accuracy: 0.5023 - val_loss: 0.2480 - val_accuracy: 0.4745
Epoch 7/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2097 - accuracy: 0.5019 - val_loss: 0.2350 - val_accuracy: 0.4775
Epoch 8/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2098 - accuracy: 0.5024 - val_loss: 0.2289 - val_accuracy: 0.4780
Epoch 9/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2034 - accuracy: 0.5039 - val_loss: 0.2364 - val_accuracy: 0.4780
Epoch 10/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2025 - accuracy: 0.5040 - val_loss: 0.2481 - val_accuracy: 0.4720
I have been a Tensorflow user and start to use Pytorch. As a trial, I implemented simple classification tasks with both libraries.
However, PyTorch is much slower than Tensorflow: Pytorch takes 42min while TensorFlow 11min. I referred to PyTorch official Tutorial, and made only little change from it.
Could anyone share some advice for this problem?
Here is the summary what I tried.
environment: Colab Pro+
dataset: Cifar10
classifier: VGG16
optimizer: Adam
loss: crossentropy
batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy
trans = transforms.Compose([transforms.Resize((224, 224)),
transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train = (phase=='train'), transform=trans, download=True) for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True) for phase in ['train', 'test']}
def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
since = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
for epoch in range(num_epochs):
print('Epoch {}/{}'.format(epoch, num_epochs - 1))
print('-' * 10)
# Each epoch has a training and validation phase
for phase in ['train', 'test']:
if phase == 'train':
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
running_corrects = 0
# Iterate over data.
for inputs, labels in tqdm(iter(dataloaders[phase])):
inputs = inputs.to(device)
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
# track history if only in train
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
# backward + optimize only if in training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
epoch_loss = running_loss / len(dataloaders[phase])
epoch_acc = running_corrects.double() / len(dataloaders[phase])
print('{} Loss: {:.4f} Acc: {:.4f}'.format(
phase, epoch_loss, epoch_acc))
# deep copy the model
if phase == 'test' and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
print()
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:4f}'.format(best_acc))
# load best model weights
model.load_state_dict(best_model_wts)
return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
criterion=nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
dataloaders=dataloaders,
device=device,
)
Result:
Epoch 0/4
----------
0%| | 0/1563 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
Tensorflow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
image = tf.expand_dims(image,0)
label = tf.one_hot(label,10)
label = tf.expand_dims(label,0)
return (image, label)
ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)
model = applications.vgg16.VGG16(input_shape = (224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics= ['accuracy'])
batch_size = 32
since = time.time()
history = model.fit(ds_train_,
batch_size = batch_size,
steps_per_epoch = len(ds_train)//batch_size,
epochs = 5,
validation_steps = len(ds_test),
validation_data = ds_test_,
shuffle = True,)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format( time_elapsed // 60, time_elapsed % 60 ))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
It is because in your tensorflow codes, the data pipeline is feeding a batch of 1 image into the model per step instead of a batch of 32 images.
Passing batch_size into model.fit does not really control the batch size when the data is in the form of datasets. The reason why it showed a seemingly correct steps per epoch from the log is that you passed steps_per_epoch into model.fit.
To correctly set the batch size:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
image = ip['image']
label = ip['label']
image = tf.image.resize(image, (224, 224))
label = tf.one_hot(label,10)
return (image, label)
train_size=len(ds_train)
test_size=len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
model.fit call:
history = model.fit(ds_train_,
epochs = 1,
validation_data = ds_test_)
After fixed the problem, tensorflow got similar speed performance with pytorch. In my machine, pytorch took ~27 minutes per epoch while tensorflow took ~24 minutes per epoch.
According to the benchmarks from NVIDIA, pytorch and tensorflow had similar speed performance in most popular deep learning applications with real-world datasets and problem size. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)
I was training a CNN with 120 thousand pictures, and it was ok. About 320 seconds per epoch.
3073/3073 [==============================] - 340s 110ms/step - loss: 0.4146 - accuracy: 0.8319 - val_loss: 0.3776 - val_accuracy: 0.8489
Epoch 2/20
3073/3073 [==============================] - 324s 105ms/step - loss: 0.3462 - accuracy: 0.8683 - val_loss: 0.3241 - val_accuracy: 0.8770
Epoch 3/20
3073/3073 [==============================] - 314s 102ms/step - loss: 0.3061 - accuracy: 0.8878 - val_loss: 0.2430 - val_accuracy: 0.9052
Epoch 4/20
3073/3073 [==============================] - 327s 107ms/step - loss: 0.2851 - accuracy: 0.8977 - val_loss: 0.2236 - val_accuracy: 0.9149
Epoch 5/20
3073/3073 [==============================] - 318s 104ms/step - loss: 0.2725 - accuracy: 0.9033 - val_loss: 0.2450 - val_accuracy: 0.9119
Epoch 6/20
3073/3073 [==============================] - 309s 101ms/step - loss: 0.2642 - accuracy: 0.9065 - val_loss: 0.2168 - val_accuracy: 0.9218
Epoch 7/20
3073/3073 [==============================] - 311s 101ms/step - loss: 0.2589 - accuracy: 0.9083 - val_loss: 0.1996 - val_accuracy: 0.9286
Epoch 8/20
3073/3073 [==============================] - 317s 103ms/step - loss: 0.2538 - accuracy: 0.9110 - val_loss: 0.2653 - val_accuracy: 0.9045
Epoch 9/20
3073/3073 [==============================] - 1346s 438ms/step - loss: 0.2497 - accuracy: 0.9116 - val_loss: 0.2353 - val_accuracy: 0.9219
Epoch 10/20
3073/3073 [==============================] - 1434s 467ms/step - loss: 0.2457 - accuracy: 0.9141 - val_loss: 0.1943 - val_accuracy: 0.9326`
Then, after a few tests, it became 12x (times) slower. With the same parameters. I thought it was because the CPU got too hot, but it never worked like before again. I tried reseting my keras session, tried reinstalling Ubuntu 18.04, tried setting up my GPU, tried installing on conda, but none of these worked. Also tried other codes, with different datasets, which are also slower. Quite frustrating.
No other symptom of a burnt CPU problem. Everything runs as always did.
Tensorflow 2.2.0
I would appreciate some help.
Edit: actually, if I wait one hour, it gets back to normal speed! Now I think it could be some problem with the first import from the disk into the memory.
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
from keras.models import model_from_json
from keras.preprocessing import image
import os
#======= GPU
#os.environ["CUDA_VISIBLE_DEVICES"] = "-1" #disables GPU
#os.environ['TF_CPP_MIN_LOG_LEVEL'] = '4' #verbose
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
from tensorflow.compat.v1.keras import backend as K
K.clear_session()
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
K.set_session(session)
#======= GPU
print(tf.__version__)
callbacks = [
tf.keras.callbacks.EarlyStopping(
# Stop training when `val_loss` is no longer improving
monitor='val_accuracy',
# "no longer improving" being defined as "no better than 1e-2 less"
min_delta=1e-3,
# "no longer improving" being further defined as "for at least 2 epochs"
patience=2,
verbose=1)
]
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
#flow_from_directory knows categories are divided by folders
training_set = train_datagen.flow_from_directory('dataset/training_set', #applies a method to the object
target_size = (64, 64),
batch_size = 32, #minibatch
class_mode = 'categorical')
print(training_set.class_indices)
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
target_size = (64, 64), #has to be the same as that of the training set
batch_size = 32,
class_mode = 'categorical')
cnn = tf.keras.models.Sequential() #an ANN base
#convolution
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3])) #convolutional layer
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2)) #maxPooling
#new convolution
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu')) #remove input_shape
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=2, activation='softmax'))
cnn.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
print("Got here.")
cnn.fit(x = training_set, validation_data = test_set, epochs = 20)
# serialize model to JSON
model_json = cnn.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
# serialize weights to HDF5
cnn.save_weights("model.h5")
print("Saved model to disk.")
I was verifying with a basic example my TensorFlow (v2.2.0), Cuda (10.1), and cudnn (libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb) and I'm getting weird results...
When running the following example in Keras as shown in https://keras.io/examples/mnist_cnn/ I get the ~99% acc #validation. When I adapt the imports run via the TensorFlow I get only 86%.
I might be forgetting something.
To run using tensorflow:
from __future__ import print_function
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Sadly, I get the following output:
Epoch 2/12
469/469 [==============================] - 3s 6ms/step - loss: 2.2245 - accuracy: 0.2633 - val_loss: 2.1755 - val_accuracy: 0.4447
Epoch 3/12
469/469 [==============================] - 3s 7ms/step - loss: 2.1485 - accuracy: 0.3533 - val_loss: 2.0787 - val_accuracy: 0.5147
Epoch 4/12
469/469 [==============================] - 3s 6ms/step - loss: 2.0489 - accuracy: 0.4214 - val_loss: 1.9538 - val_accuracy: 0.6021
Epoch 5/12
469/469 [==============================] - 3s 6ms/step - loss: 1.9224 - accuracy: 0.4845 - val_loss: 1.7981 - val_accuracy: 0.6611
Epoch 6/12
469/469 [==============================] - 3s 6ms/step - loss: 1.7748 - accuracy: 0.5376 - val_loss: 1.6182 - val_accuracy: 0.7039
Epoch 7/12
469/469 [==============================] - 3s 6ms/step - loss: 1.6184 - accuracy: 0.5750 - val_loss: 1.4296 - val_accuracy: 0.7475
Epoch 8/12
469/469 [==============================] - 3s 7ms/step - loss: 1.4612 - accuracy: 0.6107 - val_loss: 1.2484 - val_accuracy: 0.7719
Epoch 9/12
469/469 [==============================] - 3s 6ms/step - loss: 1.3204 - accuracy: 0.6402 - val_loss: 1.0895 - val_accuracy: 0.7945
Epoch 10/12
469/469 [==============================] - 3s 6ms/step - loss: 1.2019 - accuracy: 0.6650 - val_loss: 0.9586 - val_accuracy: 0.8097
Epoch 11/12
469/469 [==============================] - 3s 7ms/step - loss: 1.1050 - accuracy: 0.6840 - val_loss: 0.8552 - val_accuracy: 0.8216
Epoch 12/12
469/469 [==============================] - 3s 7ms/step - loss: 1.0253 - accuracy: 0.7013 - val_loss: 0.7734 - val_accuracy: 0.8337
Test loss: 0.7734305262565613
Test accuracy: 0.8337000012397766
Nowhere near 99.25% as when I import Keras.
What am I missing?
Discrepancy in optimiser parameters between keras and tensorflow.keras
So the crux of the issue lies in the different default parameters for the Adadelta optimisers in Keras and Tensorflow. Specifically, the different learning rates. We can see this with a simple check. Using the Keras version of the code, print(keras.optimizers.Adadelta().get_config()) outpus
{'learning_rate': 1.0, 'rho': 0.95, 'decay': 0.0, 'epsilon': 1e-07}
And in the Tensorflow version, print(tf.optimizers.Adadelta().get_config() gives us
{'name': 'Adadelta', 'learning_rate': 0.001, 'decay': 0.0, 'rho': 0.95, 'epsilon': 1e-07}
As we can see, there is a discrepancy between the learning rates for the Adadelta optimisers. Keras has a default learning rate of 1.0 while Tensorflow has a default learning rate of 0.001 (consistent with their other optimisers).
Effects of a higher learning rate
Since the Keras version of the Adadelta optimiser has a larger learning rate, it converges much faster and achieves a high accuracy within 12 epochs, while the Tensorflow Adadelta optimiser requires a longer training time. If you increased the number of training epochs, the Tensorflow model could potentially achieve a 99% accuracy as well.
The fix
But instead of increasing the training time, we can simply initialise the Tensorflow model to behave in a similar way to the Keras model by changing the learning rate of Adadelta to 1.0. i.e.
model.compile(
loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(learning_rate=1.0), # Note the new learning rate
metrics=['accuracy'])
Making this change, we get the following performance running on Tensorflow:
Epoch 12/12
60000/60000 [==============================] - 102s 2ms/sample - loss: 0.0287 - accuracy: 0.9911 - val_loss: 0.0291 - val_accuracy: 0.9907
Test loss: 0.029134796149221757
Test accuracy: 0.9907
which is close to the desired 99.25% accuracy.
p.s. Incidentally, it seems that the different default parameters between Keras and Tensorflow is a known issue that was fixed but then reverted:
https://github.com/keras-team/keras/pull/12841 software development is hard.
I am using TF 2.0. I was trying to train a network on my own data. It was not going well. The validation accuracy was close to 0 and stagnant. I tried many regularizations to no effect. Then I tried training a network on 3 classes of data where all images in each class are the same so as to eliminate the possibility of variability. But this is not working either. Since all in-class images are the same, I would expect the validation accuracy to perfectly match the training accuracy since there is no new data. Why is that not the case? Here is my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications.mobilenet import preprocess_input
import matplotlib.pyplot as plt
base_model = tf.keras.applications.MobileNet(weights='imagenet', include_top=False)
def turn_off(n):
for layer in model.layers[:n]:
layer.trainable = False
for layer in model.layers[n:]:
layer.trainable = True
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(
x) # we add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(1024, activation='relu')(x) # dense layer 2
x = Dense(512, activation='relu')(x) # dense layer 3
preds = Dense(3, activation='softmax')(x) # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
turn_off(87)
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
rescale=1. / 255,
validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='training',
shuffle=True) # set as training data
validation_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='validation',
shuffle=True) # set as validation data
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
validation_data=validation_generator,
validation_steps=validation_generator.samples // train_generator.batch_size,
epochs=6)
Here is the training output
9/9 [==============================] - 19s 2s/step - loss: 0.2645 - accuracy: 0.9134 - val_loss: 1.6668 - val_accuracy: 0.3438
Epoch 2/6
9/9 [==============================] - 20s 2s/step - loss: 0.0417 - accuracy: 0.9567 - val_loss: 2.6176 - val_accuracy: 0.3438
Epoch 3/6
9/9 [==============================] - 17s 2s/step - loss: 0.4771 - accuracy: 0.9422 - val_loss: 4.0694 - val_accuracy: 0.3438
Epoch 4/6
9/9 [==============================] - 18s 2s/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 2.1304 - val_accuracy: 0.3125
Epoch 5/6
9/9 [==============================] - 18s 2s/step - loss: 9.7658e-07 - accuracy: 1.0000 - val_loss: 3.1633 - val_accuracy: 0.3125
Epoch 6/6
9/9 [==============================] - 18s 2s/step - loss: 2.2571e-05 - accuracy: 1.0000 - val_loss: 3.4949 - val_accuracy: 0.3125
My image folders look like this
where there are exactly 128 identical images per folder.
I've been reading all day, trying different images, I can't seem to get anywhere. What is causing this particular behavior? It has to be something obvious but I'm not sure.