keras predict only gives 1 but loss value decreases - tensorflow

I'm trying to implement a JSCC autoencoder using Keras on the CIFAR-10 dataset, but the values of the output image are always just 1.
I'm new to Keras and I haven't figured out how to fix this.
import keras
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Conv2DTranspose, Dense, Flatten, Reshape

model = Sequential()
model.add(Conv2D(16,(5,5),padding = 'same', strides = 2, input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32,(5,5),padding = 'same', strides = 2))
model.add(Activation('relu'))
model.add(Conv2D(32,(5,5),padding = 'same'))
model.add(Activation('relu'))
model.add(Conv2D(32,(5,5),padding = 'same'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(100))
model.add(Activation('relu'))
model.add(keras.layers.GaussianNoise(0.1))
model.add(Dense(2048))
model.add(Activation('relu'))
model.add(Reshape((8,8,32)))
model.add(Conv2DTranspose(32,(5,5), padding = 'same'))
model.add(Activation('relu'))
model.add(Conv2DTranspose(32,(5,5), padding = 'same'))
model.add(Activation('relu'))
model.add(Conv2DTranspose(32,(5,5), strides = 2 ,padding = 'same'))
model.add(Activation('relu'))
model.add(Conv2DTranspose(3,(5,5), strides = 2 ,padding = 'same'))
model.add(Activation('sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.fit(X_train_norm, X_train_norm,
          batch_size=128,
          epochs=20,
          validation_data=(X_test_norm, X_test_norm),
          shuffle=True)
This model compresses the image to a vector of length 100, adds Gaussian noise to it, and then upsamples the vector back to the original input size.
Train on 50000 samples, validate on 10000 samples
Epoch 1/20
50000/50000 [==============================] - 7s 138us/step - loss: 0.0245 - val_loss: 0.0226
Epoch 2/20
50000/50000 [==============================] - 6s 120us/step - loss: 0.0225 - val_loss: 0.0222
Epoch 3/20
50000/50000 [==============================] - 6s 121us/step - loss: 0.0220 - val_loss: 0.0216
Epoch 4/20
50000/50000 [==============================] - 6s 121us/step - loss: 0.0214 - val_loss: 0.0211
Epoch 5/20
50000/50000 [==============================] - 6s 119us/step - loss: 0.0208 - val_loss: 0.0207
...
>>>model.predict(X_train[:32])
array([[[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
...,

You used the normalised data during training but the raw data at prediction time, so with raw pixel values in [0, 255] the sigmoid output layer saturates at 1.
Instead of:
model.predict(X_train[:32])
use:
model.predict(X_train_norm[:32])
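A minimal sketch of the normalisation step (assuming X_train holds the raw uint8 CIFAR-10 images from keras.datasets.cifar10; the /255 scaling is an assumption about how X_train_norm was produced):
from keras.datasets import cifar10

# Load CIFAR-10 and scale pixel values to [0, 1], matching what the model saw during training
(X_train, _), (X_test, _) = cifar10.load_data()
X_train_norm = X_train.astype('float32') / 255.0
X_test_norm = X_test.astype('float32') / 255.0

# Predict on the normalised images, not the raw uint8 ones
reconstructions = model.predict(X_train_norm[:32])
print(reconstructions.min(), reconstructions.max())  # should now span values below 1 instead of being stuck at 1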

Related

Keras Callback - Save the per-epoch outputs

I am trying to create a callback for extracting the per-epoch outputs of the 1st hidden layer of my model. With what I have written, self.model.layers[0].output returns a symbolic Tensor object, but I cannot see the actual entries.
Ideally I would like to save these output tensors and visualise them using an epoch vs. mean-output plot. This has been implemented in Glorot & Bengio (2010), but the source code is not available.
How should I edit my code so that the model fitting process saves the outputs at each epoch? Thanks in advance.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

class PerEpochOutputCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print('First layer output of epoch:', epoch+1, self.model.layers[0].output)

model_relu_3 = Sequential()
# Use ReLU hidden layers
model_relu_3.add(Dense(3, input_dim=8, activation='relu', kernel_initializer='uniform'))
model_relu_3.add(Dense(5, input_dim=3, activation='relu', kernel_initializer='uniform'))
model_relu_3.add(Dense(5, input_dim=5, activation='relu', kernel_initializer='uniform'))
model_relu_3.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
model_relu_3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train model
tr_results = model_relu_3.fit(X, y, validation_split=0.2, epochs=10, batch_size=32,
                              verbose=2, callbacks=[PerEpochOutputCallback()])
====
Train on 614 samples, validate on 154 samples
Epoch 1/10
First layer output of epoch: 1 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6915 - accuracy: 0.6531 - val_loss: 0.6897 - val_accuracy: 0.6429
Epoch 2/10
First layer output of epoch: 2 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6874 - accuracy: 0.6531 - val_loss: 0.6853 - val_accuracy: 0.6429
Epoch 3/10
First layer output of epoch: 3 Tensor("dense_42/Relu:0", shape=(None, 3), dtype=float32)
614/614 - 0s - loss: 0.6824 - accuracy: 0.6531 - val_loss: 0.6783 - val_accuracy: 0.6429

Tricks to improve CNN model performance

I am fitting a large CNN on my training data, validating on 20%. It appears the model performs better on the training set than on the validation set. What do you suggest so I can improve the model's performance?
CNN Architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
activ = 'relu'
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ, input_shape=(1, 100, 4)))
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ))
#model.add(BatchNormalization(axis = 3))
model.add(MaxPooling2D(pool_size=(1, 2) ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Dropout(.5))
model.add(Flatten())
A = model.output_shape
model.add(Dense(int(A[1] * 1/4.), activation=activ))
model.add(Dropout(.5))
model.add(Dense(5, activation='softmax'))
optimizer = Adam(lr=0.003, beta_1=0.9, beta_2=0.999, epsilon=1e-04, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=100, batch_size=64, shuffle=False,
          validation_split=0.2)
However, the validation accuracy doesn't change over the epochs.
Epoch 1/100
1065/1065 [==============================] - 14s 13ms/step - loss: 1.4174 - accuracy: 0.5945 - val_loss: 1.4966 - val_accuracy: 0.4417
Epoch 2/100
1065/1065 [==============================] - 14s 13ms/step - loss: 1.1494 - accuracy: 0.6207 - val_loss: 1.4634 - val_accuracy: 0.4417
Epoch 3/100
1065/1065 [==============================] - 19s 18ms/step - loss: 1.1111 - accuracy: 0.6196 - val_loss: 1.4674 - val_accuracy: 0.4417
Epoch 4/100
1065/1065 [==============================] - 15s 14ms/step - loss: 1.1040 - accuracy: 0.6196 - val_loss: 1.4660 - val_accuracy: 0.4417
Epoch 5/100
1065/1065 [==============================] - 18s 17ms/step - loss: 1.1027 - accuracy: 0.6196 - val_loss: 1.4624 - val_accuracy: 0.4417
NOTE: I tried Adam's default learning rate of 0.001 as well as 0.003, but the output is the same (see the log above).
Your model is working but improving very slowly. I would reduce the dropout value down to 0.1 initially, then run the model and see whether it overfits or not. If it does, slowly increase the dropout rate. Unless your data is already shuffled, I would set shuffle=True in model.fit. You might also try replacing the Flatten layer with a GlobalMaxPooling2D layer. I also recommend using the EarlyStopping callback, which monitors the validation loss and halts training if the loss fails to reduce after 'patience' consecutive epochs. Setting restore_best_weights=True will load the weights from the epoch with the lowest validation loss, so you don't have to save and then reload the weights. Set epochs to a large number to ensure this callback activates. Also use ReduceLROnPlateau to automatically adjust the learning rate based on the validation loss.
The code I use is shown below:
import tensorflow as tf

es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1,
                                              verbose=1)
callbacks = [es, rlronp]
In model.fit, set callbacks=callbacks. Increase the number of epochs to, say, 100 so that the early stopping callback has a chance to trigger.
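Applied to the original fit call, a minimal sketch (keeping the poster's x_train and y_train and the callbacks defined above) could look like this:
history = model.fit(x_train, y_train,
                    epochs=100,            # large enough for EarlyStopping to decide when to stop
                    batch_size=64,
                    shuffle=True,          # shuffle the training data each epoch
                    validation_split=0.2,
                    callbacks=callbacks)   # [es, rlronp] from above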

tensorflow accuracy, val_accuracy remains the same while training

I built a CNN based on the Chest X-Ray Images (Pneumonia) dataset, and for some reason, when I train the model, I get the same accuracy and val_accuracy over the epochs.
from datetime import datetime
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_ds = ImageDataGenerator()
traindata = train_ds.flow_from_directory(directory="../input/chest-xray-pneumonia/chest_xray/train", target_size=(IMG_HEIGHT, IMG_WIDTH), shuffle=True)
# Found 5216 images belonging to 2 classes.
test_ds = ImageDataGenerator()
testdata = test_ds.flow_from_directory(directory="../input/chest-xray-pneumonia/chest_xray/test", target_size=(IMG_HEIGHT, IMG_WIDTH), shuffle=True)
# Found 624 images belonging to 2 classes.
model = keras.Sequential([
keras.layers.Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"),
keras.layers.Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Flatten(),
keras.layers.Dense(units=4096,activation="relu"),
# keras.layers.Dropout(.5),
keras.layers.Dense(units=4096,activation="relu"),
# keras.layers.Dropout(.5),
keras.layers.Dense(units=2, activation="softmax"),
])
opt = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=opt,
              loss="categorical_crossentropy",
              metrics=['accuracy'])
logdir = "logs\\training\\" + datetime.now().strftime("%Y%m%d-%H%M%S")
checkpoint = keras.callbacks.ModelCheckpoint("vgg16_1.h5", verbose=1, monitor='val_accuracy', save_best_only=True, mode='auto')
early = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
hist = model.fit(traindata,
                 steps_per_epoch=STEPS_PER_EPOCH,
                 epochs=100,
                 validation_data=testdata,
                 validation_steps=VALIDATION_STEPS,
                 callbacks=[early, tensorboard_callback])
Epoch 1/100
163/163 [==============================] - 172s 1s/step - loss: 62.6885 - accuracy: 0.7375 - val_loss: 0.6827 - val_accuracy: 0.6250
Epoch 2/100
163/163 [==============================] - 157s 961ms/step - loss: 0.5720 - accuracy: 0.7429 - val_loss: 0.7133 - val_accuracy: 0.6250
Epoch 3/100
163/163 [==============================] - 159s 975ms/step - loss: 0.5725 - accuracy: 0.7429 - val_loss: 0.6691 - val_accuracy: 0.6250
Epoch 4/100
163/163 [==============================] - 159s 973ms/step - loss: 0.5721 - accuracy: 0.7429 - val_loss: 0.7036 - val_accuracy: 0.6250
Epoch 5/100
163/163 [==============================] - 158s 971ms/step - loss: 0.5715 - accuracy: 0.7429 - val_loss: 0.7169 - val_accuracy: 0.6250
Epoch 6/100
163/163 [==============================] - 160s 983ms/step - loss: 0.5718 - accuracy: 0.7429 - val_loss: 0.6982 - val_accuracy: 0.6250
I've tried changing the activation function of the last layer, adding dropout layers, and toying with the number of neurons, but nothing seemed to work. Does anyone have any idea what causes this strange behaviour?
You only have a small dataset. From the looks of your training loss, I would suppose that your network has actually already converged after 1 epoch (with a substantial amount of overfitting).
For the amount of data you have, I would suggest trying a much smaller network, or working with data augmentation techniques to regularize your model.
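A minimal sketch of the data augmentation suggestion (the specific parameter values are illustrative, not from the answer) is to turn augmentation on in the training generator only:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment only the training data; the test generator stays untouched
train_ds = ImageDataGenerator(
    rescale=1. / 255,        # scaling the pixel values is also commonly done here
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1)
traindata = train_ds.flow_from_directory(
    directory="../input/chest-xray-pneumonia/chest_xray/train",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    shuffle=True)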

Significantly higher testing accuracy on mnist with keras than tensorflow.keras

I was verifying my TensorFlow (v2.2.0), CUDA (10.1) and cuDNN (libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb) installation with a basic example, and I'm getting weird results...
When running the example from https://keras.io/examples/mnist_cnn/ in Keras, I get ~99% validation accuracy. When I adapt the imports to run it via tensorflow.keras, I get only 86%.
I might be forgetting something.
To run using tensorflow:
from __future__ import print_function
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Sadly, I get the following output:
Epoch 2/12
469/469 [==============================] - 3s 6ms/step - loss: 2.2245 - accuracy: 0.2633 - val_loss: 2.1755 - val_accuracy: 0.4447
Epoch 3/12
469/469 [==============================] - 3s 7ms/step - loss: 2.1485 - accuracy: 0.3533 - val_loss: 2.0787 - val_accuracy: 0.5147
Epoch 4/12
469/469 [==============================] - 3s 6ms/step - loss: 2.0489 - accuracy: 0.4214 - val_loss: 1.9538 - val_accuracy: 0.6021
Epoch 5/12
469/469 [==============================] - 3s 6ms/step - loss: 1.9224 - accuracy: 0.4845 - val_loss: 1.7981 - val_accuracy: 0.6611
Epoch 6/12
469/469 [==============================] - 3s 6ms/step - loss: 1.7748 - accuracy: 0.5376 - val_loss: 1.6182 - val_accuracy: 0.7039
Epoch 7/12
469/469 [==============================] - 3s 6ms/step - loss: 1.6184 - accuracy: 0.5750 - val_loss: 1.4296 - val_accuracy: 0.7475
Epoch 8/12
469/469 [==============================] - 3s 7ms/step - loss: 1.4612 - accuracy: 0.6107 - val_loss: 1.2484 - val_accuracy: 0.7719
Epoch 9/12
469/469 [==============================] - 3s 6ms/step - loss: 1.3204 - accuracy: 0.6402 - val_loss: 1.0895 - val_accuracy: 0.7945
Epoch 10/12
469/469 [==============================] - 3s 6ms/step - loss: 1.2019 - accuracy: 0.6650 - val_loss: 0.9586 - val_accuracy: 0.8097
Epoch 11/12
469/469 [==============================] - 3s 7ms/step - loss: 1.1050 - accuracy: 0.6840 - val_loss: 0.8552 - val_accuracy: 0.8216
Epoch 12/12
469/469 [==============================] - 3s 7ms/step - loss: 1.0253 - accuracy: 0.7013 - val_loss: 0.7734 - val_accuracy: 0.8337
Test loss: 0.7734305262565613
Test accuracy: 0.8337000012397766
Nowhere near the 99.25% I get when I import Keras.
What am I missing?
Discrepancy in optimiser parameters between keras and tensorflow.keras
So the crux of the issue lies in the different default parameters for the Adadelta optimisers in Keras and TensorFlow. Specifically, the different learning rates. We can see this with a simple check. Using the Keras version of the code, print(keras.optimizers.Adadelta().get_config()) outputs
{'learning_rate': 1.0, 'rho': 0.95, 'decay': 0.0, 'epsilon': 1e-07}
And in the TensorFlow version, print(tf.optimizers.Adadelta().get_config()) gives us
{'name': 'Adadelta', 'learning_rate': 0.001, 'decay': 0.0, 'rho': 0.95, 'epsilon': 1e-07}
As we can see, there is a discrepancy between the learning rates for the Adadelta optimisers. Keras has a default learning rate of 1.0 while Tensorflow has a default learning rate of 0.001 (consistent with their other optimisers).
Effects of a higher learning rate
Since the Keras version of the Adadelta optimiser has a larger learning rate, it converges much faster and achieves a high accuracy within 12 epochs, while the Tensorflow Adadelta optimiser requires a longer training time. If you increased the number of training epochs, the Tensorflow model could potentially achieve a 99% accuracy as well.
The fix
But instead of increasing the training time, we can simply initialise the Tensorflow model to behave in a similar way to the Keras model by changing the learning rate of Adadelta to 1.0. i.e.
model.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer=tf.optimizers.Adadelta(learning_rate=1.0),  # note the new learning rate
    metrics=['accuracy'])
Making this change, we get the following performance running on Tensorflow:
Epoch 12/12
60000/60000 [==============================] - 102s 2ms/sample - loss: 0.0287 - accuracy: 0.9911 - val_loss: 0.0291 - val_accuracy: 0.9907
Test loss: 0.029134796149221757
Test accuracy: 0.9907
which is close to the desired 99.25% accuracy.
P.S. Incidentally, it seems that the different default parameters between Keras and TensorFlow are a known issue that was fixed but then reverted:
https://github.com/keras-team/keras/pull/12841
Software development is hard.

Convolutional Neural Network seems to be randomly guessing

So I am currently trying to build a race recognition program using a convolutional neural network. I'm inputting 200px by 200px versions of the UTKFace recognition dataset (I put my dataset on a Google Drive if you want to take a look). I'm using 8 different classes (4 races * 2 genders) with Keras and TensorFlow, each having about 700 images, although I have also tried it with 1000. The problem is that when I run the network it gets at best 13.5% accuracy and about 11-12.5% validation accuracy, with a loss around 2.079-2.081, and even after 50 epochs or so it won't improve at all. My current hypothesis is that it is randomly guessing / not learning, because 1/8 = 12.5%, which is about what it is getting, and on other models I have made with 3 classes it was getting about 33%.
I noticed the validation accuracy is different on the first and sometimes second epoch, but after that it stays constant. I've increased the pixel resolution; changed the number of layers, the types of layers and the neurons per layer; tried different optimizers (SGD at the normal learning rate and at very large and very small values, 0.1 and 10^-6); and tried different loss functions like KLDivergence, but nothing seems to have any effect except KLDivergence, which on one run did pretty well (about 16%) but then flopped again. Some ideas I had are that maybe there's too much noise in the dataset, or maybe it has to do with the number of dense layers, but honestly I don't know why it is not learning.
Here's the code to make the tensors:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import cv2
import random
import pickle
WIDTH_SIZE = 200
HEIGHT_SIZE = 200
CATEGORIES = []
for CATEGORY in os.listdir('./TRAINING'):
    CATEGORIES.append(CATEGORY)
DATADIR = "./TRAINING"
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path)[:700]:
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_COLOR)
                new_array = cv2.resize(img_array, (WIDTH_SIZE, HEIGHT_SIZE))
                training_data.append([new_array, class_num])
            except Exception as error:
                print(error)

create_training_data()
random.shuffle(training_data)
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, WIDTH_SIZE, HEIGHT_SIZE, 3)
y = np.array(y)
pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)
Here's the model I built:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(256, (2,2), activation = 'relu', input_shape = X.shape[1:]))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(8, activation="softmax"))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
model.fit(X, y, batch_size=16,epochs=100,validation_split=.1)
Here's a log of 10 epochs I ran:
5040/5040 [==============================] - 55s 11ms/sample - loss: 2.0803 - accuracy: 0.1226 - val_loss: 2.0796 - val_accuracy: 0.1250
Epoch 2/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1147 - val_loss: 2.0798 - val_accuracy: 0.1161
Epoch 3/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1190 - val_loss: 2.0800 - val_accuracy: 0.1161
Epoch 4/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1173 - val_loss: 2.0799 - val_accuracy: 0.1107
Epoch 5/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1183 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 6/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1226 - val_loss: 2.0801 - val_accuracy: 0.1107
Epoch 7/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1238 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 8/100
5040/5040 [==============================] - 54s 11ms/sample - loss: 2.0797 - accuracy: 0.1169 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 9/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1212 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 10/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1177 - val_loss: 2.0802 - val_accuracy: 0.1107
So yeah, any help on why my network seems to be just guessing? Thank you!
The problem lies in the design of your network.
Typically you'd want the first layers to learn coarse features, using larger kernels with an odd size. Currently, with 2x2 kernels, you're essentially interpolating neighbouring pixels. Why an odd size? Read e.g. here.
The number of filters typically increases from a small value (e.g. 16, 32) to larger values as you go deeper into the network. In your network, all layers learn the same number of filters. The reasoning is that the deeper you go, the more fine-grained the features you'd like to learn, hence the increase in the number of filters.
In your network, each convolutional layer also cuts out valuable information from the image (by default you are using valid padding).
Here's a very basic network that, after 40 seconds and 10 epochs, gets me over 95% training accuracy:
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(16, (5,5), activation = 'relu', input_shape = X.shape[1:], padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
Architecture:
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_19 (Conv2D) (None, 200, 200, 16) 1216
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 100, 100, 16) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 100, 100, 32) 4640
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 50, 50, 32) 0
_________________________________________________________________
conv2d_21 (Conv2D) (None, 50, 50, 64) 18496
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 25, 25, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 40000) 0
_________________________________________________________________
dense_7 (Dense) (None, 512) 20480512
_________________________________________________________________
dense_8 (Dense) (None, 8) 4104
=================================================================
Total params: 20,508,968
Trainable params: 20,508,968
Non-trainable params: 0
Training:
Train on 5040 samples, validate on 560 samples
Epoch 1/10
5040/5040 [==============================] - 7s 1ms/sample - loss: 2.2725 - accuracy: 0.1897 - val_loss: 1.8939 - val_accuracy: 0.2946
Epoch 2/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.7831 - accuracy: 0.3375 - val_loss: 1.8658 - val_accuracy: 0.3179
Epoch 3/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.4857 - accuracy: 0.4623 - val_loss: 1.9507 - val_accuracy: 0.3357
Epoch 4/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.1294 - accuracy: 0.6028 - val_loss: 2.1745 - val_accuracy: 0.3250
Epoch 5/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.8060 - accuracy: 0.7179 - val_loss: 3.1622 - val_accuracy: 0.3000
Epoch 6/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.5574 - accuracy: 0.8169 - val_loss: 3.7494 - val_accuracy: 0.2839
Epoch 7/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3756 - accuracy: 0.8813 - val_loss: 4.9125 - val_accuracy: 0.2643
Epoch 8/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3001 - accuracy: 0.9036 - val_loss: 5.6300 - val_accuracy: 0.2821
Epoch 9/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.2345 - accuracy: 0.9337 - val_loss: 5.7263 - val_accuracy: 0.2679
Epoch 10/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.1549 - accuracy: 0.9581 - val_loss: 7.3682 - val_accuracy: 0.2732
As you can see, the validation score is terrible, but the point was to demonstrate that a poor architecture can prevent training altogether.