I am attempting to create a simple CNN to distinguish eye (retinal) scans of different severities. It is a multi-class classification problem with 5 classes. By now this is probably a fairly standard, textbook case for CNNs. I am using the Kaggle EyePACS dataset. The photos are very large, so I'm using a version of the dataset in which they have been rescaled.
My issue is that when I train the model, I expect to see the usual learning curves, where the training and validation curves increase together, like this example from Google:
However my curves look like this:
I haven't done any image pre-processing on the data; I was hoping there would be some rudimentary learning going on, which I could then improve upon using CLAHE and the like. I've also changed the classes: instead of trying to predict the grades from 0 to 4, I removed the middle classes so that only the extremes, 0 and 4, remain (so it became a binary classification problem, with class 4 relabelled as 1). However, the curve didn't change much and still looks like this:
What could be the issue? I thought that as the model gets better on the training data, it must also improve on the validation data. Yes, this is overfitting, but I assumed that kicks in after some positive learning, not straight away. The model doesn't seem to improve on the validation set at all. Also, shouldn't these models start with random parameters, so that the initial accuracy is at chance level? Instead it's around 0.75 from the get-go, and it just doesn't improve after that. What's going on? What should I look at changing? Is this a data problem or a hyperparameter problem? Shall I include the code here? Many thanks.
Here's the code I used. I know it's rudimentary, it's a mishmash of both the 'image classification from scratch' Keras tutorial as well as some standard MNIST tutorials you get around the web. Grateful for any pointers.
Creating the image-label dataset objects for train (+validation split) and test:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
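# The arguments after each path were cut off; the 0.2 split, image size and batch size are inferred from the output and model below, and the seed is an arbitrary placeholder.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    validation_split=0.2, subset="training", seed=1337,
    image_size=(256, 256), batch_size=32)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    validation_split=0.2, subset="validation", seed=1337,  # same seed so the two subsets do not overlap
    image_size=(256, 256), batch_size=32)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized test 15/0-4/",
    image_size=(256, 256), batch_size=32)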
Found 26518 files belonging to 2 classes.
Using 21215 files for training.
Found 26518 files belonging to 2 classes.
Using 5303 files for validation.
Found 36759 files belonging to 2 classes.
#To make it run faster (I think?):
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
test_ds = test_ds.prefetch(buffer_size=32)
#The architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Rescaling, Conv2D, MaxPool2D, Flatten
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())  # flatten the feature maps so the Dense layer gives one prediction per image
model.add(Dense(units=2, activation='sigmoid'))
#Compile it:
from tensorflow.keras import optimizers
model.compile(optimizer=optimizers.Adam(1e-3), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#And then finally train (run) it:
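history = model.fit(train_ds, validation_data=val_ds, epochs=10)  # the arguments were cut off; epochs=10 is a placeholder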
#I think this is how I evaluate the trained model against the test data:
loss, acc = model.evaluate(test_ds)
print("Accuracy", acc)
#It prints out the following output:
1149/1149 [==============================] - 278s 238ms/step - loss: 0.1408 - accuracy: 0.9672
Accuracy 0.9671916961669922
And then of course I end it with model.save('Binary CNN 0-4').
I think I have already spotted one thing I can change: switch the loss function to binary_crossentropy and reduce the final dense layer to 1 unit (instead of 2). But surely that small change won't actually address why the validation accuracy isn't improving.
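As a rough sanity check on that suspicion, here is a minimal pure-Python sketch. The function names are made up for illustration and are not Keras APIs. It shows the textbook result that one sigmoid unit with binary cross-entropy gives the same loss as a two-unit softmax with sparse categorical cross-entropy when the two logits are [0, z]. It is not the exact computation Keras performs for the two-unit sigmoid layer above, but it suggests the change is mostly a reparameterisation rather than a fix.

```python
import math

def sigmoid(z):
    """Logistic function: probability of class 1 from a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(label, p):
    """Loss for one sample, where p is the predicted P(class 1)."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample, where probs holds one probability per class."""
    return -math.log(probs[label])

z = 1.3  # arbitrary logit, chosen only for illustration
for label in (0, 1):
    one_unit = binary_crossentropy(label, sigmoid(z))
    two_units = sparse_categorical_crossentropy(label, softmax([0.0, z]))
    print(label, round(one_unit, 6), round(two_units, 6))
```

The two printed losses agree for each label, because softmax([0, z]) assigns class 1 the probability sigmoid(z).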

You've not included code, so I hope it's OK to give a couple of
tentative general answers.
Q) initial accuracy, how can it be as high as 0.75?
A) TensorFlow reports the training accuracy averaged over the whole epoch,
and if there are many batches the model is already learning during epoch 0.
The first accuracy reported is therefore the average over epoch 0
and can be much better than random.
If, for example, the input data is unbalanced and has 75% of
labels in one category, the model may learn very quickly that
it can achieve 75% accuracy by assigning 100% of the training data
to that category.
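Both effects can be illustrated with a small pure-Python sketch. The label counts and per-batch accuracies below are invented for illustration and are not taken from the EyePACS data, and the helper names are hypothetical. The first function gives the accuracy of always predicting the most common class; the second mimics, roughly, a figure averaged over equally sized batches within one epoch.

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def epoch_average(batch_accuracies):
    """Mean of per-batch accuracies, as a stand-in for an epoch-level figure."""
    return sum(batch_accuracies) / len(batch_accuracies)

# Hypothetical labels: 75% class 0 and 25% class 1.
labels = [0] * 750 + [1] * 250
print(majority_class_baseline(labels))  # 0.75

# Hypothetical per-batch accuracies changing during the first epoch.
print(epoch_average([0.5, 0.75, 1.0, 0.75]))  # 0.75
```

If the baseline for your own labels comes out near 0.75, then a flat accuracy of about 0.75 probably means the network is predicting a single class, and accuracy alone will not show whether it is learning anything.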
Q) Can overfitting start at the beginning?
A) It can start very close to the beginning. A network may in effect just be memorising the training set.
There are standard approaches to reducing overfitting, which include:
i) Try a simpler network. It makes sense anyway to start simple and add
complexity as required.
ii) Regularization of layers - add (e.g.) L2 regularizers to your layers
iii) Add dropout layers between hidden layers
iv) Batch normalisation between hidden layers.
v) Image augmentation (randomly add some rotation, shift, flipping if appropriate)
vi) Get more training data
vii) Use transfer learning, as another answer has suggested. This is most
likely appropriate if you don't have much training data. You can then
just add a layer or two to the pre-built model (probably removing its
last layer or two), and train only the new layers.
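Several of the options above could be combined along the following lines. This is only a sketch, assuming TensorFlow 2.6 or later (where the `RandomFlip`, `RandomRotation` and `Rescaling` layers live in `tf.keras.layers`) and a binary problem with 256x256 RGB inputs. The function name `build_regularized_cnn` and every hyperparameter value are illustrative guesses, not tuned settings, and whether flips and rotations are appropriate for retinal scans is something you would need to judge.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_regularized_cnn(input_shape=(256, 256, 3), l2_weight=1e-4, dropout_rate=0.5):
    """Small binary CNN combining options i) to v); all values are guesses."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # v) augmentation layers, active only during training
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.05),
        layers.Rescaling(1.0 / 255),
        # ii) L2 penalty on the kernels, iv) batch normalisation
        layers.Conv2D(16, 3, kernel_regularizer=regularizers.l2(l2_weight)),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, kernel_regularizer=regularizers.l2(l2_weight)),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(),
        # i) global pooling keeps the classifier head small
        layers.GlobalAveragePooling2D(),
        # iii) dropout before the output layer
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])

model = build_regularized_cnn()
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

It is probably easier to add these one at a time and compare validation curves after each change than to switch them all on at once.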
Only trial and error will show what works.


