Loss function and Loss Weight for Multi-Output Keras Classification model - tensorflow

I am trying to understand the loss function using Keras functional API.
I have a sample multi-output model based on the B-CNN model.
img_input = Input(shape=input_shape, name='input')
#--- block 1 ---
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
#--- coarse 1 branch ---
c_1_bch = Flatten(name='c_flatten')(x)
c_1_bch = Dense(64, activation='relu', name='c_dense')(c_1_bch)
c_1_bch = BatchNormalization()(c_1_bch)
c_1_bch = Dropout(0.5)(c_1_bch)
c_1_pred = Dense(num_c, activation='softmax', name='pred_coarse')(c_1_bch)
#--- block 3 ---
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
#--- fine block ---
x = Flatten(name='flatten')(x)
x = Dense(128, activation='relu', name='fc_1')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
fine_pred = Dense(num_classes, activation='softmax', name='pred_fine')(x)
model = keras.Model(inputs=[img_input],
                    outputs=[c_1_pred, fine_pred],
                    name='B-CNN_Model')
This classification model takes one input and provides 2 predictions.
According to this post, we first need to compile it with the proper loss function, metrics, and optimizer, referring to each output layer by its name.
I have done this in the following way.
model.compile(optimizer=optimizers.SGD(learning_rate=0.003, momentum=0.9, nesterov=True),
              loss={'pred_coarse': 'mse',
                    'pred_fine': 'categorical_crossentropy'},
              loss_weights={'pred_coarse': beta,
                            'pred_fine': gamma},
              metrics={'pred_coarse': 'accuracy',
                       'pred_fine': 'accuracy'})
[Note: Here, the output layer pred_coarse uses the mean squared error loss and pred_fine uses the categorical cross-entropy loss. The loss weights beta and gamma are variables whose values are updated after certain epochs by a keras.callbacks.Callback.]
Now, my question is: what happens if we compile the model without naming each output layer and provide only one loss function instead? For example, we compile the model as follows:
model.compile(optimizer=optimizers.SGD(learning_rate=0.003, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy',
              loss_weights=[beta, gamma],
              metrics=['accuracy'])
Unlike the previous compile example, this one specifies only the categorical cross-entropy loss function. The model compiles and runs without any errors. Does the model use the categorical cross-entropy loss for both the pred_coarse and pred_fine output layers?
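To make the role of loss_weights concrete: Keras minimizes the weighted sum of the per-output losses, and in tf.keras 2.x a single loss passed to a multi-output model is applied to every output (worth re-checking on other versions). The following is a minimal pure-Python sketch of that arithmetic for one sample, not Keras code; the helper names, the example targets and the values of beta and gamma are made up for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(t * log(p))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def mean_squared_error(y_true, y_pred):
    """MSE for one sample: mean of the squared differences over the classes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def total_loss(losses, weights):
    """Weighted sum of the per-output losses, as done with loss_weights."""
    return sum(w * l for w, l in zip(weights, losses))

# Hypothetical one-hot targets and softmax outputs for the two heads.
coarse_true, coarse_pred = [1.0, 0.0], [0.8, 0.2]
fine_true, fine_pred = [0.0, 1.0, 0.0], [0.1, 0.7, 0.2]
beta, gamma = 0.3, 0.7  # illustrative values only

# First compile call: MSE on pred_coarse, cross-entropy on pred_fine.
mixed = total_loss(
    [mean_squared_error(coarse_true, coarse_pred),
     categorical_crossentropy(fine_true, fine_pred)],
    [beta, gamma])

# Second compile call: the single loss is used for both outputs.
shared = total_loss(
    [categorical_crossentropy(coarse_true, coarse_pred),
     categorical_crossentropy(fine_true, fine_pred)],
    [beta, gamma])

print(f"mixed losses:  {mixed:.4f}")
print(f"shared losses: {shared:.4f}")
```

When loss_weights is given as a list, as in the second compile call, the weights are matched to the outputs in the order they were passed to keras.Model.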

Related

Keras Shape errors when trying to use pre-trained model

I want to use a pre-trained model (from Keras Applications), with weights, and append my (very simple) CNN model at the end. To this end I am trying to loosely follow the tutorial here under the sub-header 'Fine-tune InceptionV3 on a new set of classes'.
My original simple CNN model was this:
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=5, activation='softmax'))
As I'm following the tutorial, I've converted it as so:
x = base_model.output
x = Rescaling(1.0 / 255)(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3))(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
As you can see, the difference is that the top model is a Sequential() model while the bottom is Functional (I think?), and also, that the Flatten() layer has been replaced with GlobalAveragePooling2D(). I did this because I kept getting shape-related errors and it wasn't compiling. I thought I got it once I replaced the Flatten() layer with the GlobalAveragePooling() as this part of the code finally did compile, however now that I'm trying to train the model, it's giving me the following error:
ValueError: Exception encountered when calling layer "max_pooling2d_7" (type MaxPooling2D).
Negative dimension size caused by subtracting 2 from 1 for '{{node model/max_pooling2d_7/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](model/conv2d_10/Relu)' with input shapes: [?,1,1,64].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 1, 64), dtype=float32)
I don't want to remove the MaxPooling layer, as I want the layers appended to the pre-trained model to be as close as possible to the 'simple CNN' model I originally had, so that I can compare the two results. But I keep getting hit with these shape errors, which I don't really understand, and it's coming to the end of the day.
Is there a nice quick-fix that can enable this VGG16+simple CNN to work?
The first and most important technical problem in your model structure is that you rescale the images after they have passed through base_model; the rescaling should be applied just before the base model.
The second is that you set input_shape on a convolution layer, although the data passes through the base model first; you should define an Input layer before the base model and then pass its output through base_model and the remaining layers.
Here is your code, edited:
inputs = Input(shape=(256, 256, 3))
x = Rescaling(1.0 / 255)(inputs)
x = base_model(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
model = keras.Model(inputs = [inputs], outputs = [predictions])
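As for the error that was raised: the feature map has already shrunk to 1x1 by the time it reaches the last MaxPool2D layer, which needs at least a 2x2 input. In this case you could set the convolution layers' padding parameter to 'same', or resize the images to a larger size, to work around the problem.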
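The shape arithmetic behind the error can be traced by hand. Below is a rough pure-Python sketch, assuming base_model is VGG16 with include_top=False (which reduces a 256x256 input to an 8x8 feature map through its five 2x2 poolings) and that the appended layers use Keras' default 'valid' padding unless stated otherwise. The helper names are hypothetical.

```python
import math

def out_size(size, window, stride, padding):
    """Spatial output size of a conv/pool layer along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if size < window:  # with 'valid' padding the window must fit the input
        raise ValueError(f"window {window} does not fit input of size {size}")
    return (size - window) // stride + 1

def trace_head(size, conv_padding):
    """Sizes after Conv(3x3) -> Pool(2x2, s=2) -> Conv(3x3) -> Pool(2x2, s=2)."""
    sizes = [size]
    for window, stride, padding in [(3, 1, conv_padding), (2, 2, "valid"),
                                    (3, 1, conv_padding), (2, 2, "valid")]:
        size = out_size(size, window, stride, padding)
        sizes.append(size)
    return sizes

vgg16_out = 256 // 2 ** 5  # 8x8 feature map leaving the base model

print(trace_head(vgg16_out, "same"))   # [8, 8, 4, 4, 2]
try:
    trace_head(vgg16_out, "valid")     # 8 -> 6 -> 3 -> 1, then pooling fails
except ValueError as err:
    print("valid padding:", err)
```

With 'valid' convolutions the second pooling layer receives a 1x1 map, which matches the [?,1,1,64] input shape reported in the error message.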

Tensorflow NIH Chest X-ray CNN validation accuracy not improving even with regularization

I’ve been working on a CNN that takes in a 224x224 grayscale xray image and outputs either 0 or 1 based on whether it detects an abnormality.
This is the dataset I am using. I split the dataset into two with 106496 images for training and the remaining 5624 for validation. Since they’re both from the same dataset, they should both come from the same distribution.
I tried training the model I described above using the pretrained InceptionV3 and VGG19 architectures without success. I then tried making my own model similar to the VGG19 architecture.
I simplified the model as much as possible so that the training accuracy was above 90% and added various regularizers such as dropout and l2. I also tried different hyperparameters and image augmentation but the validation accuracy wouldn’t exceed 70% after 5-10 epochs. The validation loss doesn't seem to drop at all either.
Here are my accuracy vs epoch and loss vs epoch curves (pink is train, green is validation):
And here is my code:
def create_model(settings):
    """
    Create a basic model
    """
    # create model
    model = tf.keras.models.Sequential()
    model.add(layers.Input((224, 224, 1)))
    # block 1
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block1_conv'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block1_pool'))
    # block 2
    model.add(layers.Conv2D(96, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block2_conv'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block2_pool'))
    # block 3
    model.add(layers.Conv2D(192, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block3_conv1'))
    model.add(layers.Conv2D(192, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block3_conv2'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block3_pool'))
    # block 4
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv1'))
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv2'))
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv3'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block4_pool'))
    # block 5
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv1'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv2'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv3'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block5_pool'))
    # fully connected
    model.add(layers.GlobalAveragePooling2D(name='fc_pool'))
    model.add(layers.Dropout(0.3, name='fc_dropout'))
    model.add(layers.Dense(1, activation='sigmoid', name='fc_output'))
    # compile model
    model.compile(
        optimizers.SGD(
            learning_rate=settings["lr_init"],
            momentum=settings["momentum"],
        ),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            metrics.Precision(),
            metrics.Recall(),
            metrics.AUC()
        ]
    )
    model.summary()
    return model
def configure_callbacks(settings):
    """
    Create a list of callback objects
    """
    # tensorboard
    log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    tensorboard_callback = callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
    # learning rate reduction on plateau
    lrreduce_callback = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=settings["lr_factor"],
        patience=settings["lr_patience"],
        min_lr=settings["lr_min"],
        verbose=1,
    )
    # save model
    checkpoint_callback = callbacks.ModelCheckpoint(
        filepath="saves/" + settings["modelname"] + "/cp-{epoch:03d}",
        monitor='val_accuracy',
        mode='max',
        save_weights_only=True,
        save_best_only=True,
        verbose=1,
    )
    return [tensorboard_callback, lrreduce_callback, checkpoint_callback]
def get_data(settings):
    """
    Create the generators that will be used for training and validation
    """
    df = pd.read_csv("dataset/y_train_binary.csv")
    columns = [
        "Abnormal"
    ]
    datagen = ImageDataGenerator(
        rescale=1./255.,
        rotation_range=5,
        brightness_range=(0.9, 1.1),
        zoom_range=(1, 1.1),
    )
    # 94.984% for training (106496 = 64*1664)
    traindata = datagen.flow_from_dataframe(
        dataframe=df[:NTRAIN],
        directory="dataset/images",
        x_col="Image Index",
        y_col=columns,
        color_mode='grayscale',
        batch_size=settings["batchsize"],
        class_mode="raw",
        target_size=(224, 224),
        shuffle=True,
    )
    # 5.016% for testing (5624)
    testdata = datagen.flow_from_dataframe(
        dataframe=df[NTRAIN:],
        directory="dataset/images",
        x_col="Image Index",
        y_col=columns,
        color_mode='grayscale',
        batch_size=settings["batchsize"],
        class_mode="raw",
        target_size=(224, 224),
        shuffle=True,
    )
    return (traindata, testdata)
def newtrain(settings):
    """
    Create a new model "(modelname)" and train for (epoch) epochs
    """
    model = create_model(settings)
    callbacks = configure_callbacks(settings)
    traindata, testdata = get_data(settings)
    # train
    model.fit(
        x=traindata,
        epochs=settings["epoch"],
        validation_data=testdata,
        callbacks=callbacks,
        verbose=1,
    )
    model.save_weights(f"saves/{settings['modelname']}/cp-{settings['epoch']:03d}")
I’m running out of ideas and it takes half a day to train 50 epochs so I would appreciate if anyone knows how I can solve this issue. Thanks.
If you do some EDA on the NIH Chest X-rays dataset, you will see that there is a significant class imbalance among the 14 classes. From your model definition, I assume that you put normal images on one side and abnormal images (13 cases) on the other. If that is true, I would say it is better to classify all cases, since all of them are important in clinical practice.
Shift to 14-class classification
You are using a model of your own design, but you should start with a pre-trained model first. That gives a better baseline, and you can then gradually integrate your own ideas.
Use a pretrained model, e.g. DenseNet, EfficientNet, NFNet, etc.
In your data generator you use shuffle=True for the test set, which is wrong; it should be False, so that predictions stay in the same order as the labels in the dataframe.
testdata = datagen.flow_from_dataframe(
....
target_size=(224,224),
shuffle=False
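To illustrate why the evaluation generator should not shuffle: if predictions are later compared against labels stored in dataframe order, a shuffled generator silently misaligns the two. This is a toy pure-Python sketch with no Keras involved; the "perfect model" and the fixed permutation standing in for a shuffle are hypothetical.

```python
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # labels in dataframe order

def predict(order):
    """A 'perfect' model: returns the true label of each sample it is fed,
    in the order in which the generator serves the samples."""
    return [labels[i] for i in order]

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

in_order = list(range(len(labels)))  # what shuffle=False serves
permuted = in_order[::-1]            # fixed permutation standing in for a shuffle

# shuffle=False: predictions line up with the stored labels.
print(accuracy(labels, predict(in_order)))  # 1.0
# Shuffled order: the same perfect model looks wrong against the stored labels.
print(accuracy(labels, predict(permuted)))  # 0.0
```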
For better control of your input pipeline, IMO, you should write your own custom data generator and experiment with advanced augmentation to reduce overfitting.

Custom loss function: How to add hidden layer's output in loss function in keras with Tensorflow

In my model, the output of the hidden layer, named 'encoded', has two channels (e.g. shape: [None, 128, 128, 2]). I would like to add the SSIM between these two channels to the loss function:
loss = ssim(input, output) + theta*ssim(encoded(channel1), encoded(channel2)).
How could I implement this? The following is the architecture of my model.
def structural_similarity_index(y_true, y_pred):
    loss = 1 - tf.image.ssim(y_true, y_pred, max_val=1.0)
    return loss
def mymodel():
    input_img = Input(shape=(256, 256, 1))
    # encoder
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    encoded = Conv2D(2, (3, 3), activation='relu', padding='same', name='encoder')(x)
    # decoder
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss=structural_similarity_index)
    autoencoder.summary()
    return autoencoder
I tried to define a 'loss_wrapper' function as shown below, but it didn't work. This is how I added the loss function:
autoencoder.add_loss(loss_wrapper(encoded[:,:,:,0],encoded[:,:,:,1])(input_img, decoded))
The 'loss_wrapper' function:
def loss_wrapper(CH1, CH2):
    def structural_similarity_index(y_true, y_pred):
        regweight = 0.01
        loss = 1 - tf.image.ssim(y_true, y_pred, max_val=1.0)
        loss = loss + regweight*(1-tf.image.ssim(CH1, CH2, max_val=1.0))
        return loss
    return structural_similarity_index
The error message:
File "E:/Autoencoder.py", line 160, in trainprocess
validation_data= (x_validate, x_validate))
...
ValueError: ('Error when checking model target: expected no data, but got:', array([...]...[...]))
Does anyone know how to implement this? Any help is highly appreciated!

Pretrained Tensorflow model RGB -> RGBY channel extension

I am working on a protein analysis project. We receive images* of proteins taken with 4 filters (Red, Green, Blue and Yellow). Each of these RGBY channels contains unique data, as different cellular structures are visible with different filters.
The idea is to use a pre-trained network e.g. VGG19 and extend the number of channels from default 3 to 4. Something like this:
<img src="https://i.stack.imgur.com/TZKka.png" alt="Italian Trulli">
Picture: VGG model with RGB extended to RGBY
The Y channel should be the copy of the existing pretrained channel. Then it is possible to make use of the pretrained weights.
Does anyone have an idea of how such extension of a pretrained network can be achieved?
* Author of the collage: Allunia from Kaggle, "Protein Atlas - Exploration and Baseline" kernel.
Use the layer.get_weights() and layer.set_weights() functions of the Keras API.
Create a template structure for a 4-channel VGG (set the input shape to (width, height, 4)). Then load the weights from the 3-channel RGB model into the 4-channel one as RGBB.
Below is the code that does this. In the case of the sequential VGG, the only layer that needs to be modified is the first convolution layer. The structure of the subsequent layers is independent of the number of input channels.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from keras.applications.vgg19 import VGG19
from keras.models import Model
vgg19 = VGG19(weights='imagenet')
vgg19.summary() # To check which layers will be omitted in 'pretrained' model
# Load part of the VGG without the top layers into 'pretrained' model
pretrained = Model(inputs=vgg19.input, outputs=vgg19.get_layer('block5_pool').output)
pretrained.summary()
#%% Prepare model template with 4 input channels
config = pretrained.get_config() # run config['layers'][i] for reference
# to restore layer-by layer structure
from keras.layers import Input, Conv2D, MaxPooling2D
from keras import optimizers
# For training from scratch change kernel_initializer to e.g.'VarianceScaling'
inputs = Input(shape=(224, 224, 4), name='input_17')
# block 1
x = Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block1_conv1')(inputs)
x = Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block1_conv2')(x)
x = MaxPooling2D(pool_size=(2, 2), name='block1_pool')(x)
# block 2
x = Conv2D(128, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block2_conv1')(x)
x = Conv2D(128, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block2_conv2')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block2_pool')(x)
# block 3
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv1')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv2')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv3')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block3_pool')(x)
# block 4
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv1')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv2')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv3')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block4_pool')(x)
# block 5
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv1')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv2')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv3')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block5_pool')(x)
vgg_template = Model(inputs=inputs, outputs=x)
vgg_template.compile(optimizer=optimizers.RMSprop(learning_rate=2e-4),
                     loss='categorical_crossentropy',
                     metrics=['acc'])
#%% Rewrite the weight loading/modification function
import numpy as np
layers_to_modify = ['block1_conv1']  # Turns out the only layer that changes
                                     # shape due to the 4th channel is the
                                     # first convolution layer.
for layer in pretrained.layers:  # The pretrained model and the template have
                                 # the same layers, so it doesn't matter
                                 # which one we iterate over.
    if layer.get_weights():  # Skip input, pooling and other layers without weights
        target_layer = vgg_template.get_layer(name=layer.name)
        if layer.name in layers_to_modify:
            kernels = layer.get_weights()[0]
            biases = layer.get_weights()[1]
            kernels_extra_channel = np.concatenate((kernels,
                                                    kernels[:, :, -1:, :]),
                                                   axis=-2)  # For channels_last
            target_layer.set_weights([kernels_extra_channel, biases])
        else:
            target_layer.set_weights(layer.get_weights())
#%% Save the 4-channel model populated with weights for further use
vgg_template.save('vgg19_modified_clear.hdf5')
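Beyond the RGBY case, the following snippet works more generally by repeating or truncating the layer's weight and/or bias arrays as needed to fit the target shapes. Please refer to the numpy documentation on what numpy.resize does: it fills the new shape with repeated copies of the flattened array, so it guarantees matching shapes but, for a multi-dimensional kernel, it does not keep the existing channel weights in place. If the B-channel weights must be copied onto the Y channel exactly, as in the original question, concatenate along the channel axis instead.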
import numpy as np
import tensorflow as tf
...
model = ... # your RGBY model is here
pretrained_model = tf.keras.models.load_model(...) # pretrained RGB model
# the following assumes that the layers match with the two models and
# only the shapes of weights and/or biases are different
for pretrained_layer, layer in zip(pretrained_model.layers, model.layers):
    pretrained = pretrained_layer.get_weights()
    target = layer.get_weights()
    if len(pretrained) == 0:  # skip input, pooling and other no weights layers
        continue
    try:
        # set the pretrained weights as is whenever possible
        layer.set_weights(pretrained)
    except ValueError:
        # numpy.resize to the rescue whenever there is a shape mismatch
        for idx, (l1, l2) in enumerate(zip(pretrained, target)):
            target[idx] = np.resize(l1, l2.shape)
        layer.set_weights(target)
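Since numpy.resize fills the target shape with repeated copies of the flattened source array, it may be worth checking what it does to a convolution kernel before relying on it. Below is a small check with a toy (3, 3, 3, 2) kernel (the shapes are illustrative, not VGG's), comparing it with the explicit channel copy used in the first snippet.

```python
import numpy as np

# Toy kernel in Keras channels_last layout: (height, width, in_channels, filters).
rgb_kernel = np.arange(3 * 3 * 3 * 2, dtype=np.float32).reshape(3, 3, 3, 2)

# Explicit copy of the last input channel (B) onto a new 4th channel.
concat_kernel = np.concatenate((rgb_kernel, rgb_kernel[:, :, -1:, :]), axis=-2)

# Shape-only fix: flattens the array and refills the new shape in C order.
resized_kernel = np.resize(rgb_kernel, (3, 3, 4, 2))

print(concat_kernel.shape, resized_kernel.shape)  # (3, 3, 4, 2) (3, 3, 4, 2)

# With concatenate, the RGB weights stay in place and channel 4 equals B.
print(np.array_equal(concat_kernel[:, :, :3, :], rgb_kernel))             # True
print(np.array_equal(concat_kernel[:, :, 3, :], rgb_kernel[:, :, 2, :]))  # True

# With np.resize, the original RGB weights are no longer in place.
print(np.array_equal(resized_kernel[:, :, :3, :], rgb_kernel))            # False
```

For 1-D bias vectors whose length does not change, both approaches leave the values untouched, so the difference only matters for the kernels whose channel axis grows.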

CNN Overfitting

I have a siamese CNN that is performing very well (96% accuracy, 0.08 loss) on training data but poorly (70% accuracy, 0.1 loss) on testing data.
The architecture is below:
input_main = Input(shape=input_shape, dtype='float32')
x = Conv2D(32, (3, 3), padding='same', activation='relu',
kernel_regularizer=l2(0.005))(input_main)
x = Conv2D(16, (5, 5), activation='relu',
kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(5, 5))(x)
x = Dropout(0.5)(x)
x = Conv2D(32, (3, 3), padding='same', activation='relu',
kernel_regularizer=l2(0.0005))(x)
x = Conv2D(32, (7, 7), activation='relu',
kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(3, 3))(x)
x = Dropout(0.5)(x)
x = Flatten()(x)
#x = Dropout(0.5)(x)
x = Dense(16, activation='relu',
kernel_regularizer=l2(0.005))(x)
model = Model(inputs=input_main, outputs=x)
Two of these are then combined to make a siamese architecture, and the difference between the vectors from the final layer informs the result. I have experimented with dropout and regularization, and neither has been able to solve the problem (these parameters are the ones I am testing at time of posting)
I have also tried simplifying the architecture to fewer conv layers, and this has not solved the problem.
The data is 256x128x1 images, sent through the network in pairs with binary labels based on whether they are the same or not. I also use data augmentation, with some small rotations and translations.
Can anyone suggest anything else to try to solve this overfitting problem?