How to freeze training of particular layers after a particular epoch - TensorFlow

I want to freeze training of the first two layers of the following code after the 3rd epoch. The total number of epochs is set to 10.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

How can I "freeze" Keras layers?
To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model or using fixed embeddings for a text input.
You can change the trainable attribute of a layer.
for layer in model.layers[:2]:
    layer.trainable = False
For this to take effect, you will need to call compile() on your model after modifying the trainable property. If you don't, you will receive a warning "Discrepancy between trainable weights and collected trainable weights" and all your layers will still be trainable. So:
1. Build and compile the model.
2. Train it for 3 epochs.
3. Freeze the layers you want.
4. Compile the model again.
5. Train for the remaining epochs.

This should work:
for epoch in range(3):
    model.fit(..., epochs=1)
# save the weights of this model
model.save_weights("weight_file.h5")
# freeze the layers you want
for layer in model.layers[:2]:
    layer.trainable = False
In order to train further with these weights but with the first two layers frozen, you need to compile the model again.
model.compile(...)
# train further
for epoch in range(3, 10):
    model.fit(..., epochs=1)
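A minimal end-to-end sketch of the same recipe, assuming TensorFlow 2.x with its bundled Keras. The random `x_train`/`y_train` arrays, `input_shape = (28, 28, 1)` and `num_classes = 10` are hypothetical stand-ins for values the question leaves undefined. Instead of looping over single-epoch `fit()` calls, it uses two `fit()` calls and passes `initial_epoch=3` to the second one, so epoch numbering continues from 3 to 10:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-ins for values the question leaves undefined
input_shape = (28, 28, 1)
num_classes = 10

# Random placeholder data, only so the example runs end to end
rng = np.random.default_rng(0)
x_train = rng.random((64, *input_shape), dtype=np.float32)
y_train = keras.utils.to_categorical(rng.integers(0, num_classes, size=64), num_classes)

# Same architecture as in the question
model = keras.Sequential([
    keras.Input(shape=input_shape),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])

# Phase 1: every layer trainable, epochs 1-3
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=3, verbose=0)

# Freeze the two Conv2D layers, then recompile so the change takes effect
for layer in model.layers[:2]:
    layer.trainable = False
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Snapshot the frozen weights so we can check that they do not change
frozen_before = [w.copy() for layer in model.layers[:2] for w in layer.get_weights()]

# Phase 2: epochs 4-10; initial_epoch keeps the epoch numbering continuous
history = model.fit(x_train, y_train, batch_size=32, initial_epoch=3, epochs=10, verbose=0)

frozen_after = [w for layer in model.layers[:2] for w in layer.get_weights()]
```

With real data you would pass your own arrays (and usually `validation_data`); the weight snapshot is only there to show that the two frozen layers stay fixed during the last 7 epochs.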

Related

Keras binary classification model's AUC score doesn't increase

I have an imbalanced dataset which has 57000 zeros and 2500 ones. I gave class weights as an input to my model, tried changing optimizers, and tried resizing the number of layers and neurons. Finally I stuck with ;
because it was the only one that seemed systematic. I also tried changing the layer weight regularization rules, but nothing has helped yet. I am not talking only about my validation AUC score; even the training AUC doesn't rise satisfactorily.
Here is how I declared my model. Please don't assume the problem is the layer and node sizes; I think I have tried everything that sounds sensible.
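from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
class_weight = {0: 23.59,
                1: 1.}
model = Sequential()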
model.add(Dense(40, input_dim=x_train.shape[1], activation='relu'))
model.add(Dense(33, activation='relu',kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),bias_regularizer=regularizers.l2(1e-4),activity_regularizer=regularizers.l2(1e-5)))
model.add(Dense(28, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(9, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='sigmoid',kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),bias_regularizer=regularizers.l2(1e-4),activity_regularizer=regularizers.l2(1e-5)))
opt = keras.optimizers.SGD(learning_rate=0.1)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['AUC'])
model.fit(x_train,y_train,epochs=600,verbose=1,validation_data=(x_test,y_test),class_weight=class_weight)
After approximately 100 epochs it got stuck at 0.73-0.75 AUC and doesn't rise anymore. I couldn't even overfit my model.
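One thing worth checking in the code above (an editor's observation, not from the original thread): Keras scales each sample's loss by `class_weight[label]`, so the minority class is normally the one given the larger weight, whereas `{0: 23.59, 1: 1.}` up-weights the 57000 zeros. A minimal sketch of the common "balanced" heuristic, weight = n_samples / (n_classes * n_c), which is the formula scikit-learn's `compute_class_weight('balanced', ...)` uses, applied to the label counts from the question (the helper name `balanced_class_weights` is hypothetical):

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * count)} for integer labels."""
    classes, counts = np.unique(labels, return_counts=True)
    n_samples = labels.shape[0]
    n_classes = classes.shape[0]
    return {int(c): n_samples / (n_classes * int(n)) for c, n in zip(classes, counts)}

# Label counts taken from the question: 57000 zeros and 2500 ones
y_train = np.concatenate([np.zeros(57000, dtype=int), np.ones(2500, dtype=int)])
class_weight = balanced_class_weights(y_train)
print(class_weight)  # about 0.52 for class 0 and 11.9 for class 1
```

The resulting dictionary is passed to `model.fit` through the same `class_weight` argument the question already uses.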

Why is my CNN/Image Classifier model accuracy so low?

I'm currently trying to build a CNN that can detect whether or not a patient has pneumonia caused by COVID, and no matter what parameters I change, the model accuracy stays at 49%/50%, so it's basically useless because it's the same as a coin flip. Here is my code; I thought I would try using the VGG-16 model.
from tensorflow import keras
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, GlobalAveragePooling2D
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Loading in the dataset
traindata = ImageDataGenerator(rescale=1/255)
trainingdata = traindata.flow_from_directory(
    directory="Covid-19CT/TrainingData",
    target_size=(224,224),
    batch_size=100,
    class_mode="binary")
testdata = ImageDataGenerator(rescale=1/255)
testingdata = testdata.flow_from_directory(
    directory="Covid-19CT/TestingData",
    target_size=(224,224),
    batch_size=100,
    class_mode="binary")
# Initialize the model w/ Sequential & add layers + input and output <- will refer to the VGG 16 model architecture
model = Sequential()
model.add(Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(2,2),padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation ="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(GlobalAveragePooling2D())
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=1000, activation="relu"))
model.add(Dense(units=1, activation="softmax"))
# Compile the model
model_optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=model_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# Add the callbacks
checkpoint = ModelCheckpoint(filepath="Covid-19.hdf5", monitor='val_accuracy', verbose=1, save_best_only=True, save_weights_only=False, mode='auto')
early = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=50, verbose=1, mode='auto')
fit = model.fit(trainingdata, steps_per_epoch=25, validation_data=testingdata, validation_steps=10, epochs=10, callbacks=[checkpoint, early])
This always gives:
Epoch 1/10 6/25 [======>.......................] - ETA: 1:22:37 - loss: 7.5388 - accuracy: 0.5083
Well, it just always gives really poor accuracy...
Additional info:
Some of the images in the dataset are JPG and others are PNG (not sure if this is the culprit).
The dataset has 2072 images for training Covid CTs and 2098 images for training NonCovid CTs.
The dataset has 576 images for testing Covid CTs and 532 images for testing NonCovid CTs.
The file structure looks like this: Covid19ModelImages -> Training Data & Testing Data. Training Data has 2 subfolders, Covid19CT and noncovid19 CT, and Testing Data also has 2 subfolders, Covid19CT and noncovid19CT.
Also: am I just being too impatient? I never let it run past the 1st epoch because I just assume it's never going to get better than 50%. Could it be that the model will improve over the next epochs?
If anyone would be willing to help out, or if you need any other additional info to maybe help you gain a better understanding of the problem, please let me know!
Since you are using binary cross-entropy, the activation function in the dense layer with 1 unit should be "sigmoid". A softmax over a single unit always outputs 1.0, so the model predicts the same class for every image.
Since you are not using a GPU, each epoch takes a very long time. To see whether the model is working correctly, you may want to reduce this time. There are a few things you can do. Try reducing the image size to, say, 128 x 128. With 224 x 224 you have 50176 pixels to process versus 16384 for a 128 x 128 image, so you reduce the computation by about a factor of 3. Also, you have two dense layers with 4096 units. These are computationally expensive too, and they may lead to overfitting. Try your model initially without these layers and see how it performs.
I am not a fan of early stopping, because it is a crutch to avoid dealing with the overfitting issue. If you encounter overfitting, add a dropout layer to help avoid it. Finally, I recommend you use an adjustable learning rate. The ReduceLROnPlateau callback makes this easy to do. Set it to monitor validation loss. You can set its parameters so that it reduces the learning rate by a factor < 1 if the loss fails to decrease for "patience" consecutive epochs. I usually use factor=.5 and patience=1. This also enables you to use a larger initial learning rate for faster convergence. See the Keras documentation for ReduceLROnPlateau. You need to let your model run for several epochs to see whether the training loss and validation loss are decreasing.
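A minimal sketch of the answer's suggestions, assuming TensorFlow 2.x. It first checks the sigmoid point numerically (a softmax over one unit is 1.0 whatever the logit), then rebuilds the question's VGG-16-style convolutional stack at 128 x 128 without the two 4096-unit layers (and, for brevity, without the 1000-unit layer), with a sigmoid output and a `ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)` callback. Using 3 x 3 kernels in the first layer (the question uses 2 x 2), the `Dropout(0.5)` before the output, the helper name `build_model` and the epoch count in the commented `fit` call are assumptions of this sketch:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1) Why a 1-unit softmax cannot work: softmax over a single logit is always 1.0
logits = tf.constant([[-3.0], [0.0], [5.0]])
single_softmax = tf.nn.softmax(logits, axis=-1).numpy()  # every entry is 1.0
single_sigmoid = tf.sigmoid(logits).numpy()              # varies with the logit

# 2) The question's VGG-16-style conv stack with the answer's suggestions applied
def build_model(image_size=128):
    model = keras.Sequential([keras.Input(shape=(image_size, image_size, 3))])
    # (filters, number of 3x3 conv layers) per block, as in the question
    for filters, n_convs in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
        for _ in range(n_convs):
            model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dropout(0.5))                    # instead of the large dense layers
    model.add(layers.Dense(1, activation="sigmoid"))  # sigmoid pairs with binary cross-entropy
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# 3) Halve the learning rate whenever val_loss fails to improve for one epoch
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, verbose=1)

model = build_model()
# With the question's generators (rebuilt with target_size=(128, 128)):
# model.fit(trainingdata, validation_data=testingdata, epochs=20, callbacks=[reduce_lr])
```

Keeping the convolutional blocks in a loop makes it easy to drop whole blocks while you check, on a few epochs, whether training and validation loss actually decrease.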

Keras: How to load CNN pre-trained weights (freezing the net) to use them in LSTM?

I have this cnn model:
model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(n_rows,n_cols,1)))
model.add(Convolution2D(32, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(1,3)))
model.add(Flatten())
model.add(Dense(1024, activation='relu')) #needed?
model.add(Dense(3)) #default linear activation
I can train it and obtain related weights.
Afterwards, I want to load the weights up to Flatten (the dense part is not useful for the second stage) and pass the Flatten output to an LSTM.
Of course, it is also suggested to use TimeDistributed on the CNN net.
How to do all this: load weights, take only CNN part, TimeDistribute it, and finally add LSTM?
Thanks!
You can use model.save_weights("filename.h5") to save the weights, and model.load_weights("filename.h5") to load them back into the model.
Source: https://keras.io/getting-started/faq/#savingloading-only-a-models-weights
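The answer covers only saving and loading. A minimal sketch of the remaining steps the question asks about, assuming TensorFlow 2.x: save the CNN's weights, load them into a fresh copy, then copy the weights of the layers up to Flatten into `TimeDistributed` wrappers that feed an `LSTM`, freezing those layers. The sizes `n_rows = 20`, `n_cols = 30`, `n_steps = 5`, the `LSTM(64)` size, the 3-way output, the MSE loss, the helper name `build_cnn` and the file name `cnn.weights.h5` are all hypothetical, and wrapping each layer separately in `TimeDistributed` (rather than wrapping the whole CNN model) is a design choice of this sketch:

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_rows, n_cols, n_steps = 20, 30, 5  # hypothetical sizes

def build_cnn():
    # The question's CNN, with an explicit Input so its weights are created immediately
    return keras.Sequential([
        keras.Input(shape=(n_rows, n_cols, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.AveragePooling2D(pool_size=(1, 3)),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dense(3),
    ])

cnn = build_cnn()
# (train cnn here)
weights_path = os.path.join(tempfile.mkdtemp(), "cnn.weights.h5")
cnn.save_weights(weights_path)

# Later: rebuild the same architecture and load the saved weights into it
restored = build_cnn()
restored.load_weights(weights_path)

# Second stage: the layers up to Flatten, each wrapped in TimeDistributed, then an LSTM
n_cnn_layers = 4  # Conv2D, Conv2D, AveragePooling2D, Flatten
lstm_model = keras.Sequential([
    keras.Input(shape=(n_steps, n_rows, n_cols, 1)),
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu")),
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu")),
    layers.TimeDistributed(layers.AveragePooling2D(pool_size=(1, 3))),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(64),
    layers.Dense(3),
])

# Copy the pre-trained CNN weights into the wrapped layers and freeze them
for td_layer, cnn_layer in zip(lstm_model.layers[:n_cnn_layers], restored.layers[:n_cnn_layers]):
    td_layer.set_weights(cnn_layer.get_weights())
    td_layer.trainable = False

lstm_model.compile(optimizer="adam", loss="mse")
```

Because the pooling and Flatten layers have no weights, only the two convolutional layers actually receive copied values; compiling after setting `trainable = False` makes the freeze take effect.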

How to specify number of layers in keras?

I'm trying to define a fully connected neural network in Keras using the TensorFlow backend. I have a sample code, but I don't know what it means.
model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(50, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(20, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(10, input_dim=x.shape[1], kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
model.add(Dense(y.shape[1],activation='softmax'))
From the above code, I want to know the number of inputs to my network, the number of outputs, the number of hidden layers, and the number of neurons in each layer. And what is the number that comes after model.add(Dense( ? Assume x.shape[1] = 60.
What exactly is the name of this network? Should I call it a fully connected network or a convolutional network?
That should be quite easy.
To inspect the model's inputs and outputs, use:
input_tensor = model.input
output_tensor = model.output
You can print these tensor objects to see their shape and dtype.
To fetch the layers of a model, use:
layers = model.layers
print(layers[0].units)
With these tricks you can easily get the input and output tensors of a model or of any of its layers.
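The answer does not spell out what the numbers mean: the first positional argument of `Dense` is `units`, the number of neurons in that layer, and a stack of `Dense` layers is a fully connected network, not a convolutional one. A minimal sketch, assuming TensorFlow 2.x, that rebuilds the question's model with `x.shape[1] = 60` and a hypothetical `y.shape[1] = 3`, then reads the layer sizes back. It declares the 60 input features once with `keras.Input(shape=(60,))` instead of repeating `input_dim` (Keras only uses that on the first layer), and omits the `kernel_initializer` arguments since they do not affect layer sizes:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 60  # x.shape[1] from the question
n_outputs = 3    # hypothetical value for y.shape[1]

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(20, activation="relu"),
    layers.Dense(10, activation="relu"),
    layers.Dense(1),
    layers.Dense(n_outputs, activation="softmax"),
])

units = [layer.units for layer in model.layers]
hidden_units = units[:-1]  # every Dense layer except the output layer
print("inputs:", n_features)
print("hidden layers:", len(hidden_units), "with neurons", hidden_units)
print("outputs:", units[-1])
model.summary()  # also lists each layer's output shape and parameter count
```

So the network takes 60 inputs, has 5 hidden layers with 10, 50, 20, 10 and 1 neurons, and produces `y.shape[1]` outputs.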

A neural network that can't overfit?

I am fitting a model to some noisy satellite data. The labels are measurements of rock on the bars of a river. There is a noisy but significant relationship. I only have 250 points but the method would expand and eventually run off much bigger datasets. I'm looking at a mix of models (RANSAC, Huber, SVM Regression) and DNNs. My DNN results seem too good to be true. The network looks like:
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
def build_model():  # name assumed: the original def line is missing; NetworkDims is defined elsewhere
    model = Sequential()
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), input_dim=NetworkDims, kernel_initializer='he_normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(32, kernel_regularizer=regularizers.l2(0.001), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
And when I save the history and plot training loss (green dots) and validation loss (cyan line) vs epoch I get this:
Training and validation loss just creep down. With a small dataset, I was expecting the validation loss to go its own way. In fact, if I run a 10-fold cross val score with this network, the error reported by cross val score does creep down. This just looks too good to be true. It implies that I could train this thing for 1000 epochs and still improve results. If it looks too good to be true, it usually is, but why?
EDIT: More results.
So I tried cutting the dropout to 0.1 at each layer and removing the L2. Interesting: with the toned-down dropout, I get even better results:
10% dropout rate
Without the L2, there is overfitting:
No L2 reg
My guess is that you have such high dropout on every layer that the model has trouble even overfitting the training data. My prediction is that if you lower the dropout and regularization, it'll learn the training data much faster.
I'm not sure whether the results are too good to be true, because it's hard to judge how good a model is from the loss function alone. But it should be the dropout and regularization that are preventing it from overfitting within a few epochs.
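One mechanism behind the answer's point, as an editor's illustration assuming TensorFlow 2.x: Keras applies dropout only in training mode, so the training loss reported by `fit()` is computed on a randomly thinned network, while the validation loss uses the full network. With `rate=0.5` on every layer, the training loss is therefore measured under a heavy handicap. The sketch below calls a `Dropout` layer directly in both modes on a vector of ones:

```python
import numpy as np
from tensorflow.keras import layers

dropout = layers.Dropout(0.5)
x = np.ones((1, 1000), dtype="float32")

# Training mode: about half the units are zeroed, survivors are scaled by 1 / (1 - 0.5) = 2
train_out = np.asarray(dropout(x, training=True))
# Inference mode (used for validation and predict): inputs pass through unchanged
infer_out = np.asarray(dropout(x, training=False))

print("fraction zeroed in training mode:", np.mean(train_out == 0.0))
print("distinct values in training mode:", np.unique(train_out))
print("inference output unchanged:", np.array_equal(infer_out, x))
```

The scaling keeps the expected activation the same in both modes, but each training step still sees only about half of every layer, which is why heavy dropout slows down how quickly the training loss can fall.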