I am training my first ML model. I am working on a 10-class classification problem. From what I can see, the model is overfitting since there is a significant difference between the training and validation accuracy.
This is the relevant code for the model:
model = keras.Sequential()
model.add(keras.Input(shape=(x_train[0].shape)))
model.add(tf.keras.layers.Conv2D(filters=32,kernel_size=3, strides = (3, 3), padding = "same", activation = "relu", kernel_regularizer=tf.keras.regularizers.l1_l2(0.01)))
model.add(tf.keras.layers.MaxPool2D(strides=2))
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), padding='valid', activation='relu', kernel_regularizer=tf.keras.regularizers.l1_l2(0.01)))
model.add(tf.keras.layers.MaxPool2D(strides=2))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001/2)
model.summary()
model.compile(optimizer=optimizer,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs = 30, validation_data = (x_val, y_val), callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=4))
There are large fluctuations in the validation accuracy and I am not sure why.
I have tried augmenting the data and have also injected noise into the training data. (This is an audio classification problem with 10 different classes)
https://i.stack.imgur.com/TXe50.png
I am (very) new to deep learning and I am trying to train a dog breed classifier using Tensorflow/Keras. I have selected a subset of 10 breeds to speed up calculations, and I am using all the images available in the Stanford dataset for those breeds, which I have placed in train/test/val directories. I have 1338 images for training, 379 images for validation and 200 images for test.
I first tried building a simple CNN from scratch without data augmentation. I quickly reached 99% accuracy on the training set but got stuck at 30% on the validation set (which I assume is quite normal without augmentation?).
Then I applied data augmentation and tried two approaches: building a CNN from scratch and using transfer learning. With the "home-made" CNN I can't reach more than around 30% accuracy, even on the training set, and I can't figure out what the problem is. With transfer learning I am stuck around 80%, which I guess is not that good either?
Here is the code for data augmentation:
```
# Creating image generator steps
train_datagen = ImageDataGenerator(rescale=1.0/255.0,
                                   rotation_range=60,
                                   width_shift_range=0.3,
                                   height_shift_range=0.3,
                                   shear_range=0.2,
                                   zoom_range=[0.5, 1.5],
                                   brightness_range=[0.5, 1.5],
                                   horizontal_flip=True
                                   )
val_datagen = ImageDataGenerator(rescale=1.0/255.0)
test_datagen = ImageDataGenerator(rescale=1.0/255.0)
# shuffle takes a boolean; a non-empty string such as 'False' is truthy
train_generator = train_datagen.flow_from_directory(
    directory="split_output/train",
    target_size=(224, 224),
    color_mode="rgb",
    batch_size=8,
    class_mode='sparse',
    shuffle=True,
    seed=42
)
val_generator = val_datagen.flow_from_directory(
    directory="split_output/val",
    target_size=(224, 224),
    color_mode="rgb",
    batch_size=8,
    class_mode='sparse',
    shuffle=True,
    seed=42
)
test_generator = test_datagen.flow_from_directory(
    directory="split_output/test",
    target_size=(224, 224),
    color_mode="rgb",
    batch_size=8,
    class_mode='sparse',
    shuffle=False,
    seed=42
)
```
Here is the first CNN I tried (for which accuracies are both stuck around 25 %):
```
# The CNN architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D((2, 2)))
# 32 = number of filters
# (3, 3) = kernel size
model.add(Conv2D(64, (3, 3), padding="same", activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), padding="same", activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Fitting the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy']
              )
# model.fit accepts generators directly; fit_generator is deprecated in TF 2.x
history = model.fit(train_generator,
                    # steps_per_epoch=1000,
                    epochs=50,
                    validation_data=val_generator,
                    # validation_steps=250,
                    verbose=1
                    )
```
And the second one, a bit deeper and including BatchNorm and Dropout (accuracies are stuck around 35%):
```
# The CNN architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D((2, 2)))
# 32 = number of filters
# (3, 3) = kernel size
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), padding="same", activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), padding="same", activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
# learning_rate replaces the deprecated lr argument
opt = Adam(learning_rate=0.0001)
# Fitting the model
model.compile(optimizer=opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy']
              )
history = model.fit(train_generator,
                    # steps_per_epoch=1000,
                    epochs=50,
                    validation_data=val_generator,
                    # validation_steps=250,
                    verbose=1
                    )
```
Here is the history for that second CNN:
[Figure: accuracies for 2nd CNN]
And finally I tried with a resnet, which gets stuck around 90% for train and 80% for val:
```
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg', weights="imagenet"))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(2048, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
# learning_rate replaces the deprecated lr argument
opt = Adam(learning_rate=0.0001)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_generator,
                    # steps_per_epoch=1000,
                    epochs=150,
                    validation_data=val_generator,
                    # validation_steps=250,
                    verbose=1
                    )
```
And the history for this last one:
[Figure: resnet history]
I'm a bit surprised at how the accuracies (especially val) get stuck so fast at a nearly constant value...
Again I'm very new at this so there could be very basic mistakes!
I'm trying to get the 'logits' out of my Keras CNN classifier.
I have tried the suggested method here: link.
First I created two models to check the implementation:
1. create_CNN_MNIST: a CNN classifier that returns the softmax probabilities.
2. create_CNN_MNIST_logits: a CNN with the same layers as in (1), with a small twist in the last layer: the activation function is changed to linear so that it returns logits.
Both models were fed the same MNIST train and test data. When I applied softmax to the logits, I got a different output from the softmax CNN.
I couldn't find a problem in my code. Maybe you could advise another method to extract 'logits' from the model?
The code:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def create_CNN_MNIST_logits():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='linear'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    def my_sparse_categorical_crossentropy(y_true, y_pred):
        return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True)
    model.compile(optimizer=opt, loss=my_sparse_categorical_crossentropy, metrics=['accuracy'])
    return model

def create_CNN_MNIST():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
# load data
X_train = np.load('./data/X_train.npy')
X_test = np.load('./data/X_test.npy')
y_train = np.load('./data/y_train.npy')
y_test = np.load('./data/y_test.npy')
#create models
model_softmax = create_CNN_MNIST()
model_logits = create_CNN_MNIST_logits()
pixels = 28
channels = 1
num_labels = 10
# Reshaping to format which CNN expects (batch, height, width, channels)
trainX_cnn = X_train.reshape(X_train.shape[0], pixels, pixels, channels).astype('float32')
testX_cnn = X_test.reshape(X_test.shape[0], pixels, pixels, channels).astype('float32')
# Normalize images from 0-255 to 0-1
trainX_cnn /= 255
testX_cnn /= 255
train_y_cnn = utils.to_categorical(y_train, num_labels)
test_y_cnn = utils.to_categorical(y_test, num_labels)
#train the models:
model_logits.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
model_softmax.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
At the evaluation stage, I apply softmax to the logits to check whether the result is the same as for the regular model:
#predict
y_pred_softmax = model_softmax.predict(testX_cnn)
y_pred_logits = model_logits.predict(testX_cnn)
#apply softmax on the logits to get the same result of regular CNN
y_pred_logits_activated = softmax(y_pred_logits)
Now I get different values in both y_pred_logits_activated and y_pred_softmax that lead to different accuracy on the test set.
Your models are probably being trained differently. Make sure to set the seed before building and fitting each model so that they're initialised with the same weights and have the same train/val split. Also, your softmax might be incorrect:
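def softmax(x):
    """Compute softmax values for each set of scores in x (shape [batch, outputs])."""
    e_x = np.exp(x)
    # keepdims=True keeps the row sums as a column so the division broadcasts per row
    return e_x / e_x.sum(axis=1, keepdims=True)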
This is mathematically equivalent to subtracting the max (https://stackoverflow.com/a/34969389/10475762), which only serves to avoid overflow in the exponential. The axis should be 1 if your matrix is of shape [batch, outputs].
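To make the axis point concrete, here is a minimal NumPy sketch. The helper name `softmax_rows` and the small `logits` array are made up for illustration, and unlike the snippet above it keeps the max subtraction for stability. It shows that normalising over axis 1 gives one probability distribution per sample, while normalising over axis 0 mixes samples together.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax for an array of shape [batch, outputs]."""
    # Subtracting each row's max does not change the result but avoids overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    e_x = np.exp(shifted)
    return e_x / e_x.sum(axis=1, keepdims=True)

# Hypothetical logits for a batch of 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])

probs = softmax_rows(logits)
row_sums = probs.sum(axis=1)  # each sample's probabilities sum to 1

# Normalising over axis=0 (as in the question) makes columns sum to 1, not rows
e_x = np.exp(logits - np.max(logits))
probs_axis0 = e_x / e_x.sum(axis=0)
col_sums_axis0 = probs_axis0.sum(axis=0)
row_sums_axis0 = probs_axis0.sum(axis=1)
```

Because softmax is monotonic within each row, `probs.argmax(axis=1)` matches `logits.argmax(axis=1)`, so accuracy can also be computed directly from the logits.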
I have a dataset of about 500 .mat files (300 train and 200 test). These are very small cropped images, at most 3 KB each. When I train the architecture below with the following parameters, I get a test accuracy and loss of 69%, and the validation accuracy over 25 epochs remains around 51%. I want to know how to improve my test accuracy and fix the constant validation accuracy problem.
Note: this is a binary classification problem and the class split is 60:40.
weight_decay = 1e-3
model = models.Sequential()
model.add(layers.Conv2D(16, (3, 3), kernel_regularizer=regularizers.l2(weight_decay),padding='same',input_shape=X_train.shape[1:]))
model.add(layers.Activation('relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(32, (3, 3),kernel_regularizer=regularizers.l2(weight_decay), padding='same'))
model.add(layers.Activation('relu'))
#model.add(layers.Dropout(0.2))
model.add(layers.Flatten())
#model.add(layers.Dropout(0.4))
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dropout(0.50))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(learning_rate=0.001), metrics=['acc'])
es_callback = callbacks.EarlyStopping(monitor='val_loss', patience=5)
history= model.fit(#train_generator,
X_train,Y_train,
batch_size= batch_size,
#steps_per_epoch=trainSize,
epochs=25,
validation_data=(X_val,Y_val),#val_generator,
#validation_steps=valSize,
#callbacks=[LearningRateScheduler(lr_schedule)]
callbacks=[es_callback]
)
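As a reference point for reading these numbers, the sketch below computes the accuracy of always predicting the majority class. The `labels` array is synthetic and only assumes the 60:40 split mentioned above; the real validation split may differ.

```python
import numpy as np

# Synthetic labels with a 60:40 class split (hypothetical, 200 samples)
labels = np.array([1] * 120 + [0] * 80)

# Find the most frequent class and score a constant prediction of that class
values, counts = np.unique(labels, return_counts=True)
majority_class = values[np.argmax(counts)]
baseline_acc = np.mean(labels == majority_class)  # 0.6 for this synthetic array
```

If the validation set has a similar split, an accuracy around 51% would be below this constant-prediction baseline, which may be worth checking before tuning the architecture.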
I have 1D sequences which I want to use as input to a Keras VGG classification model, split in x_train and x_test. For each sequence, I also have custom features stored in feats_train and feats_test which I do not want to input to the convolutional layers, but to the first fully connected layer.
A complete sample of train or test would thus consist of a 1D sequence plus n floating point features.
What is the best way to feed the custom features first to the fully connected layer? I thought about concatenating the input sequence and the custom features, but I do not know how to make them separate inside the model. Are there any other options?
The code without the custom features:
x_train, x_test, y_train, y_test, feats_train, feats_test = load_balanced_datasets()
model = Sequential()
model.add(Conv1D(10, 5, activation='relu', input_shape=(timesteps, 1)))
model.add(Conv1D(10, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5, seed=789))
model.add(Conv1D(5, 6, activation='relu'))
model.add(Conv1D(5, 6, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.5, seed=789))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5, seed=789))
model.add(Dense(2, activation='softmax'))
model.compile(loss='logcosh', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=batch_size, epochs=20, shuffle=False, verbose=1)
y_pred = model.predict(x_test)
The Sequential model is not very flexible. You should look into the functional API.
I would try something like this:
from keras.layers import (Conv1D, MaxPool1D, Dropout, Flatten, Dense,
                          Input, concatenate)
from keras.models import Model, Sequential

timesteps = 50
n = 5

def network():
    sequence = Input(shape=(timesteps, 1), name='Sequence')
    features = Input(shape=(n,), name='Features')

    # Convolutional branch: sees only the sequence input
    conv = Sequential()
    conv.add(Conv1D(10, 5, activation='relu', input_shape=(timesteps, 1)))
    conv.add(Conv1D(10, 5, activation='relu'))
    conv.add(MaxPool1D(2))
    conv.add(Dropout(0.5, seed=789))
    conv.add(Conv1D(5, 6, activation='relu'))
    conv.add(Conv1D(5, 6, activation='relu'))
    conv.add(MaxPool1D(2))
    conv.add(Dropout(0.5, seed=789))
    conv.add(Flatten())
    part1 = conv(sequence)

    # Custom features join the network at the first fully connected layer
    merged = concatenate([part1, features])
    final = Dense(512, activation='relu')(merged)
    final = Dropout(0.5, seed=789)(final)
    final = Dense(2, activation='softmax')(final)

    model = Model(inputs=[sequence, features], outputs=[final])
    model.compile(loss='logcosh', optimizer='adam', metrics=['accuracy'])
    return model

m = network()
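To see what the first Dense layer receives, the shape arithmetic can be traced by hand. This is a plain-Python sketch with hypothetical helper names (`conv1d_len`, `pool1d_len`), assuming 'valid' padding, stride 1 for the convolutions and a pooling stride equal to the pool size, which are the Keras defaults used above.

```python
def conv1d_len(length, kernel_size):
    """Output length of a Conv1D with 'valid' padding and stride 1."""
    return length - kernel_size + 1

def pool1d_len(length, pool_size):
    """Output length of a MaxPool1D with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1

timesteps, n = 50, 5  # same example values as in the answer

length = timesteps
length = conv1d_len(length, 5)  # 46
length = conv1d_len(length, 5)  # 42
length = pool1d_len(length, 2)  # 21
length = conv1d_len(length, 6)  # 16
length = conv1d_len(length, 6)  # 11
length = pool1d_len(length, 2)  # 5

flat_width = length * 5          # last Conv1D has 5 filters, so Flatten yields 25 values
merged_width = flat_width + n    # 25 conv features + 5 custom features = 30
```

With a two-input model like this, the training data would be passed as a list in the same order as `inputs`, for example `m.fit([x_train, feats_train], y_train, ...)`.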
Y_train = to_categorical(Y_train, num_classes=10)
random_seed = 2
X_train,X_val,Y_train,Y_val = train_test_split(X_train, Y_train, test_size = 0.1, random_state=random_seed)
Y_train.shape
model = Sequential()
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size = 86, epochs = 3,validation_data = (X_val, Y_val), verbose =2)
I have to classify the MNIST data into 10 classes. I am converting the Y_train into one hot encoded array. I have gone through a number of answers but none have helped. Kindly guide me in this regard as I am a novice in ML and neural network.
It seems there is no need to use model.add(Flatten()) as your first layer. Instead, you can use a dense layer with a specific input size, like: model.add(Dense(64, input_shape=your_input_shape, activation="relu")).
To check whether the issue comes from the layers, you can test the to_categorical() function on its own in a Jupyter notebook.
Updated Answer
Before building the model, you should reshape your input data, in this case from 28*28 to 784.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
I also suggest normalizing the data, which can be done by simply dividing the images by 255.
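The reshape and normalization steps together might look like the sketch below. It uses random stand-in data rather than the real MNIST arrays, so the variable names `images` and `flat` are illustrative only.

```python
import numpy as np

# Stand-in for MNIST: 4 random 28x28 uint8 images (hypothetical data)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image to 784 values, convert to float, and scale to [0, 1]
flat = images.reshape((-1, 784)).astype("float32") / 255.0
```

The same two operations would be applied to both the training and the test images.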
After that step you should create your model.
model = Sequential([
Dense(64, activation='relu', input_shape=(784,)),
Dense(64, activation='relu'),
Dense(10, activation='softmax'),
])
Note input_shape=(784,): that is the shape of your flattened input.
Last step, compiling and fitting.
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)
model.fit(
train_images,
train_labels,
epochs=10,
batch_size=16,
)
What you did was flatten the input without telling the network its input shape. That's why you experience the issue. The point is that you should manually reshape your inputs and feed them to the Dense() layers with the input_shape parameter.
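One more detail that may matter here: the compile call above uses 'categorical_crossentropy', whereas the question's code combines 'sparse_categorical_crossentropy' with labels passed through to_categorical. In Keras the sparse variant expects integer class ids and the non-sparse variant expects one-hot vectors, so the label format and the loss need to match. The NumPy sketch below (with made-up labels) shows the two formats and how to convert between them.

```python
import numpy as np

# Integer class ids: the format 'sparse_categorical_crossentropy' expects
labels = np.array([3, 0, 9])

# One-hot rows: the format 'categorical_crossentropy' expects
# (np.eye indexing gives the same encoding as to_categorical for these ids)
one_hot = np.eye(10, dtype="float32")[labels]  # shape (3, 10)

# argmax recovers the integer ids from the one-hot rows
recovered = one_hot.argmax(axis=1)
```

So either keep the one-hot labels and use 'categorical_crossentropy', or skip to_categorical and keep 'sparse_categorical_crossentropy'.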