How to show latent layer in tensorboard? - tensorflow

I have a trained autoencoder model and I want to visualize its latent layer in TensorBoard.
How can I do it?
el1 = Conv2D(8, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3))
el2 = MaxPooling2D((2, 2), padding='same')
el3 = Conv2D(8, (3, 3), activation='relu', padding='same')
el4 = MaxPooling2D((2, 2), padding='same')
dl1 = Conv2DTranspose(8, (3, 3), strides=2, activation='relu', padding='same')
dl2 = Conv2DTranspose(8, (3, 3), strides=2, activation='relu', padding='same')
output_layer = Conv2D(3, (3, 3), activation='sigmoid', padding='same')
autoencoder = Sequential()
autoencoder.add(el1)
autoencoder.add(el2)
autoencoder.add(el3)
autoencoder.add(el4)
autoencoder.add(dl1)
autoencoder.add(dl2)
autoencoder.add(output_layer)
autoencoder.compile(optimizer='adam', loss="binary_crossentropy")
logdir = os.path.join("logs/fit/", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
autoencoder.fit(X_train, X_train, epochs=100, batch_size=64, validation_data=(X_test, X_test), verbose=1,
callbacks=[tensorboard_callback])
After the model is fitted, how can I add the latent layer to TensorBoard and view it after running t-SNE or PCA?

You can follow the guide: Visualizing Data using the Embedding Projector in TensorBoard.
I assumed that by "latent layer" you mean "latent space", i.e. the representation of the encoded input.
In your case, if you want to represent your latent space, you first need to extract the encoder part from your autoencoder. This can be achieved with the functional API of Keras:
# After fitting the autoencoder, we create a model that represents the encoder
encoder = tf.keras.Model(autoencoder.input, autoencoder.get_layer(el4.name).output)
Then, it's possible to calculate the latent representation of your test set using the encoder:
latent_test = encoder(X_test)
Then, by following the guide linked above, the latent representation can be saved in a Checkpoint to be visualized with the Tensorboard projector:
# Save the weights we want to analyze as a variable.
# The weights need to have the shape (Number of sample, Total Dimensions)
# Hence why we flatten the Tensor
weights = tf.Variable(tf.reshape(latent_test,(X_test.shape[0],-1)), name="latent_test")
# Create a checkpoint from embedding, the filename and key are the
# name of the tensor.
checkpoint = tf.train.Checkpoint(latent_test=weights)
checkpoint.save(os.path.join(logdir, "embedding.ckpt"))
from tensorboard.plugins import projector
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`.
embedding.tensor_name = "latent_test/.ATTRIBUTES/VARIABLE_VALUE"
projector.visualize_embeddings(logdir, config)
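Optionally, the points can be labelled so that they are coloured in the projector. This is a small sketch on top of the code above, not part of the original answer: it assumes a hypothetical y_test array with one label per sample of X_test, writes a metadata.tsv into logdir, and rewrites the projector config so the metadata is picked up:
# Hypothetical: y_test holds one label per row of latent_test / X_test.
with open(os.path.join(logdir, "metadata.tsv"), "w") as f:
    for label in y_test:
        f.write(f"{label}\n")
embedding.metadata_path = "metadata.tsv"        # path is relative to logdir
projector.visualize_embeddings(logdir, config)  # rewrite the config with the metadata attached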
Finally, the projector can be accessed by running the Tensorboard:
$ tensorboard --logdir /path/to/logdir
Finally, an image of the projector with PCA (here with some random data):
[Image: TensorBoard Embedding Projector showing the latent vectors, PCA projection]

Related

Keras Shape errors when trying to use pre-trained model

I want to use a pre-trained model (from Keras Applications), with weights, and append my (very simple) CNN model at the end. To this end I am trying to loosely follow the tutorial here under the sub-header 'Fine-tune InceptionV3 on a new set of classes'.
My original simple CNN model was this:
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=5, activation='softmax'))
As I'm following the tutorial, I've converted it as so:
x = base_model.output
x = Rescaling(1.0 / 255)(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3))(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
As you can see, the differences are that the top model is a Sequential() model while the bottom one is functional (I think?), and that the Flatten() layer has been replaced with GlobalAveragePooling2D(). I did this because I kept getting shape-related errors and it wasn't compiling. I thought I had it once I replaced Flatten() with GlobalAveragePooling2D(), as this part of the code finally compiled; however, now that I'm trying to train the model, it's giving me the following error:
ValueError: Exception encountered when calling layer "max_pooling2d_7" (type MaxPooling2D).
Negative dimension size caused by subtracting 2 from 1 for '{{node model/max_pooling2d_7/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](model/conv2d_10/Relu)' with input shapes: [?,1,1,64].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 1, 64), dtype=float32)
I don't want to remove the MaxPooling layer, as I want the part appended to this fine-tuned model to be as close as possible to the 'simple CNN' model I originally had, so that I can compare the two results. But I keep getting hit with these shape errors, which I don't really understand, and it's coming to the end of the day.
Is there a nice quick-fix that can enable this VGG16+simple CNN to work?
The first and most important technical problem in your model structure is that you are rescaling the images after they have passed through the base_model; the rescaling should happen just before the base model.
The second one is that you defined input_shape in a convolution layer that comes after the base model, while the data passes through the base model first; you should define an Input layer before the base model and then pass its output through the base_model and the other layers.
Here is your code, edited:
inputs = Input(shape=(256, 256, 3))
x = Rescaling(1.0 / 255)(inputs)
x = base_model(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
model = keras.Model(inputs = [inputs], outputs = [predictions])
As for the error raised: in this case you could set the convolution layers' padding parameter to 'same', or even resize the images to a larger size, to get around the problem.
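For illustration, here is a minimal sketch of that padding fix applied to the appended head (same layer sizes as above; it assumes the same imports as the question and that base_model's output feature map is larger than 1x1, e.g. 8x8 for VGG16 on 256x256 inputs):
inputs = Input(shape=(256, 256, 3))
x = Rescaling(1.0 / 255)(inputs)
x = base_model(x)
# 'same' padding keeps the spatial size through the convolutions, so the
# feature map no longer shrinks below the 2x2 pooling window
x = Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
model = keras.Model(inputs=[inputs], outputs=[predictions])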

Keras - Trying to get 'logits' - one layer before the softmax activation function

I'm trying to get the 'logits' out of my Keras CNN classifier.
I have tried the suggested method here: link.
First, I created two models to check the implementation:
1. create_CNN_MNIST - a CNN classifier that returns the softmax probabilities.
2. create_CNN_MNIST_logits - a CNN with the same layers as in (1), with a little twist in the last layer: the activation function is changed to linear so that it returns logits.
Both models were fed the same train and test data from MNIST. Then I applied softmax to the logits, but I got a different output from the softmax CNN.
I couldn't find a problem in my code. Maybe you could advise another method to extract the 'logits' from the model?
the code:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def create_CNN_MNIST_logits():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='linear'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    def my_sparse_categorical_crossentropy(y_true, y_pred):
        return keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True)
    model.compile(optimizer=opt, loss=my_sparse_categorical_crossentropy, metrics=['accuracy'])
    return model

def create_CNN_MNIST():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
# load data
X_train = np.load('./data/X_train.npy')
X_test = np.load('./data/X_test.npy')
y_train = np.load('./data/y_train.npy')
y_test = np.load('./data/y_test.npy')
#create models
model_softmax = create_CNN_MNIST()
model_logits = create_CNN_MNIST_logits()
pixels = 28
channels = 1
num_labels = 10
# Reshaping to format which CNN expects (batch, height, width, channels)
trainX_cnn = X_train.reshape(X_train.shape[0], pixels, pixels, channels).astype('float32')
testX_cnn = X_test.reshape(X_test.shape[0], pixels, pixels, channels).astype('float32')
# Normalize images from 0-255 to 0-1
trainX_cnn /= 255
testX_cnn /= 255
train_y_cnn = utils.to_categorical(y_train, num_labels)
test_y_cnn = utils.to_categorical(y_test, num_labels)
#train the models:
model_logits.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
model_softmax.fit(trainX_cnn, train_y_cnn, validation_split=0.2, epochs=10,
batch_size=32)
At the evaluation stage, I apply softmax to the logits to check whether it gives the same result as the regular model:
#predict
y_pred_softmax = model_softmax.predict(testX_cnn)
y_pred_logits = model_logits.predict(testX_cnn)
#apply softmax on the logits to get the same result of regular CNN
y_pred_logits_activated = softmax(y_pred_logits)
Now I get different values for y_pred_logits_activated and y_pred_softmax, which leads to different accuracies on the test set.
Your models are probably being trained differently: make sure to set the seed prior to both fit commands so that they're initialised with the same weights and get the same train/val split. Also, the softmax might be incorrect:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=1, keepdims=True)  # keepdims so the per-row sums broadcast over the (batch, outputs) matrix
This is mathematically equivalent to subtracting the max, which only matters for numerical stability (https://stackoverflow.com/a/34969389/10475762), and the axis should be 1 if your matrix is of shape [batch, outputs].
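A minimal sketch of the seeding suggestion (the seed value is arbitrary and not from the original code); resetting the seed immediately before each constructor is what makes the two models start from identical initial weights:
import numpy as np
import tensorflow as tf

np.random.seed(0)
tf.random.set_seed(0)
model_softmax = create_CNN_MNIST()

np.random.seed(0)
tf.random.set_seed(0)   # reset again so both models get the same initial weights
model_logits = create_CNN_MNIST_logits()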

Tensorflow NIH Chest X-ray CNN validation accuracy not improving even with regularization

I’ve been working on a CNN that takes in a 224x224 grayscale xray image and outputs either 0 or 1 based on whether it detects an abnormality.
This is the dataset I am using. I split the dataset into two with 106496 images for training and the remaining 5624 for validation. Since they’re both from the same dataset, they should both come from the same distribution.
I tried training the model I described above using the pretrained InceptionV3 and VGG19 architectures without success. I then tried making my own model similar to the VGG19 architecture.
I simplified the model as much as possible so that the training accuracy was above 90% and added various regularizers such as dropout and l2. I also tried different hyperparameters and image augmentation but the validation accuracy wouldn’t exceed 70% after 5-10 epochs. The validation loss doesn't seem to drop at all either.
Here are my accuracy vs epoch and loss vs epoch curves (pink is train, green is validation):
And here is my code:
def create_model(settings):
    """
    Create a basic model
    """
    # create model
    model = tf.keras.models.Sequential()
    model.add(layers.Input((224, 224, 1)))
    # block 1
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block1_conv'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block1_pool'))
    # block 2
    model.add(layers.Conv2D(96, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block2_conv'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block2_pool'))
    # block 3
    model.add(layers.Conv2D(192, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block3_conv1'))
    model.add(layers.Conv2D(192, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block3_conv2'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block3_pool'))
    # block 4
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv1'))
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv2'))
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block4_conv3'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block4_pool'))
    # block 5
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv1'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv2'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', use_bias=True, name='block5_conv3'))
    model.add(layers.MaxPool2D((2, 2), strides=(2, 2), name='block5_pool'))
    # fully connected
    model.add(layers.GlobalAveragePooling2D(name='fc_pool'))
    model.add(layers.Dropout(0.3, name='fc_dropout'))
    model.add(layers.Dense(1, activation='sigmoid', name='fc_output'))
    # compile model
    model.compile(
        optimizers.SGD(
            learning_rate=settings["lr_init"],
            momentum=settings["momentum"],
        ),
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            metrics.Precision(),
            metrics.Recall(),
            metrics.AUC()
        ]
    )
    model.summary()
    return model
def configure_callbacks(settings):
    """
    Create a list of callback objects
    """
    # tensorboard
    log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    tensorboard_callback = callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
    # learning rate reduction on plateau
    lrreduce_callback = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=settings["lr_factor"],
        patience=settings["lr_patience"],
        min_lr=settings["lr_min"],
        verbose=1,
    )
    # save model
    checkpoint_callback = callbacks.ModelCheckpoint(
        filepath="saves/" + settings["modelname"] + "/cp-{epoch:03d}",
        monitor='val_accuracy',
        mode='max',
        save_weights_only=True,
        save_best_only=True,
        verbose=1,
    )
    return [tensorboard_callback, lrreduce_callback, checkpoint_callback]
def get_data(settings):
    """
    Create a generator that will be used for training
    """
    df = pd.read_csv("dataset/y_train_binary.csv")
    columns = [
        "Abnormal"
    ]
    datagen = ImageDataGenerator(
        rescale=1./255.,
        rotation_range=5,
        brightness_range=(0.9, 1.1),
        zoom_range=(1, 1.1),
    )
    # 94.983% for training (106496 = 64*6656)
    traindata = datagen.flow_from_dataframe(
        dataframe=df[:NTRAIN],
        directory="dataset/images",
        x_col="Image Index",
        y_col=columns,
        color_mode='grayscale',
        batch_size=settings["batchsize"],
        class_mode="raw",
        target_size=(224,224),
        shuffle=True,
    )
    # 5.017% for testing (5624)
    testdata = datagen.flow_from_dataframe(
        dataframe=df[NTRAIN:],
        directory="dataset/images",
        x_col="Image Index",
        y_col=columns,
        color_mode='grayscale',
        batch_size=settings["batchsize"],
        class_mode="raw",
        target_size=(224,224),
        shuffle=True,
    )
    return (traindata, testdata)
def newtrain(settings):
    """
    Create a new model "(modelname)" and train for (epoch) epochs
    """
    model = create_model(settings)
    callbacks = configure_callbacks(settings)
    traindata, testdata = get_data(settings)
    # train
    model.fit(
        x=traindata,
        epochs=settings["epoch"],
        validation_data=testdata,
        callbacks=callbacks,
        verbose=1,
    )
    model.save_weights(f"saves/{settings['modelname']}/cp-{settings['epoch']:03d}")
I’m running out of ideas, and it takes half a day to train 50 epochs, so I would appreciate it if anyone knows how I can solve this issue. Thanks.
If you do some EDA on the NIH Chest X-rays dataset, you may also see that there is a significant class-imbalance issue among the 14 classes. From your model definition, I assume that you have put the normal images on one side and the abnormal ones (13 cases) on the other side. First of all, if this is true, I would say it's better to classify all cases - all of them are important in clinical practice.
Shift to 14-class classification.
You're using a model of your own design, but you should first start with a pre-trained model. It's better, and you can then gradually integrate your own ideas.
Use a pretrained model, e.g. DenseNet, EfficientNet, NFNet, etc.
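As a rough sketch of that suggestion (not from the original answer): a DenseNet121 backbone with a sigmoid head, assuming the x-rays are fed as 3-channel 224x224 images (e.g. color_mode='rgb' in the generator, or the grayscale channel repeated three times) and as raw 0-255 pixel values, since preprocess_input does its own scaling:
import tensorflow as tf

base = tf.keras.applications.DenseNet121(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3))
base.trainable = False  # optionally freeze the backbone for the first epochs

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.densenet.preprocess_input(inputs)  # expects 0-255 inputs, so drop rescale=1./255 in the generator
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])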
In your data generator, you use shuffle=True for the test set, which is wrong; it should rather be False.
testdata = datagen.flow_from_dataframe(
    ....
    target_size=(224,224),
    shuffle=False,
)
For better control of your input pipeline, IMO, you should write your own custom data generator and experiment with advanced augmentation to prevent overfitting.
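For instance, a bare-bones sketch of such a custom generator built on tf.keras.utils.Sequence (the column names 'Image Index' and 'Abnormal' are taken from the dataframe used above; the augmentation hook is left out for brevity):
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import load_img, img_to_array

class XraySequence(tf.keras.utils.Sequence):
    """Yields (images, labels) batches from a dataframe of file names and labels."""
    def __init__(self, df, image_dir, batch_size=64, target_size=(224, 224), shuffle=True):
        self.df = df.reset_index(drop=True)
        self.image_dir = image_dir
        self.batch_size = batch_size
        self.target_size = target_size
        self.shuffle = shuffle
        self.indices = np.arange(len(self.df))

    def __len__(self):
        return int(np.ceil(len(self.df) / self.batch_size))

    def __getitem__(self, idx):
        batch = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        images, labels = [], []
        for i in batch:
            row = self.df.iloc[i]
            img = load_img(os.path.join(self.image_dir, row["Image Index"]),
                           color_mode='grayscale', target_size=self.target_size)
            images.append(img_to_array(img) / 255.0)
            labels.append(row["Abnormal"])
        return np.stack(images), np.array(labels, dtype="float32")

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)

# usage (hypothetical), mirroring the split above:
# traindata = XraySequence(df[:NTRAIN], "dataset/images")
# testdata = XraySequence(df[NTRAIN:], "dataset/images", shuffle=False)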

VGG16 training with colab returns WARNING:root:kernel

I'm still a beginner with DL. Here I'm trying to run VGG16 over a list of images to return 2048 features, but my issue was that it returned 4096 features instead of 2048. So, to solve the issue, I created a sequential model and removed one layer to get to 2048, but during the run on Google Colab it returns the kernel issue below:
WARNING:root:kernel d3b89e05-3e27-4035-afcd-79a2c35a319a restarted
I have restarted the runtime and done a factory reset, but the issue still exists, maybe because I did something wrong in my code. The issue happens only in the last code block.
def read_images(folder_path, classlbl):
    # load all images into a list
    images = []
    img_width, img_height = 224, 224
    class1 = []
    for img in os.listdir(folder_path):
        img = os.path.join(folder_path, img)
        img = load_img(img, target_size=(img_width, img_height))
        class1.append(classlbl)  # class one.
        images.append(img)
    return images, class1
# compute features for each image.
def computefeatures(model, image):
    # convert the image pixels to a numpy array
    image = img_to_array(image)
    # reshape data for the model
    image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
    # prepare the image for the VGG model
    image = preprocess_input(image)
    # get extracted features
    features = model.predict(image)
    return features
model = Sequential()
# second convolutional block
model.add(Conv2D(128, (3,3), strides=(1,1), padding="same", activation="relu"))
model.add(Conv2D(128,(2,2), strides=(1,1), padding="same",activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2,2)))
# third convolutional block
model.add(Conv2D(256, (3,3), strides=(1,1), padding="same", activation="relu"))
model.add(Conv2D(256,(3,3), strides=(1,1), padding="same",activation="relu"))
model.add(Conv2D(256,(3,3), strides=(1,1), padding="same",activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2,2)))
# fourth convolutional block
model.add(Conv2D(512, (3,3), strides=(1,1), padding="same", activation="relu"))
model.add(Conv2D(512,(3,3), strides=(1,1), padding="same",activation="relu"))
model.add(Conv2D(512,(3,3), strides=(1,1), padding="same",activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2,2)))
# fifth convolutional block
model.add(Conv2D(512, (3,3), strides=(1,1), padding="same", activation="relu"))
model.add(Conv2D(512,(3,3), strides=(1,1), padding="same",activation="relu"))
model.add(Conv2D(512,(3,3), strides=(1,1), padding="same",activation="relu"))
#DNN Backend
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
#Output layer for classification (1000 classes)
model.add(Dense(1000, activation='softmax'))
#use the categorical cross entropy loss function for a multi-class classifier.
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# call the image read function
folder_path = '/content/imageDir/101_ObjectCategories/windsor_chair'
classlbl = 0
images, class1 = read_images(folder_path, classlbl)
# call the function to compute the features for each image.
list_features1 = []
list_features1 = np.empty((0, 4096), float)  # create an empty array with 0 rows and 4096 columns;
                                             # this number comes from the VGG16 feature extraction
for img in range(len(images)):
    f2 = computefeatures(model, images[img])  # compute features for each image
    list_features1 = np.append(list_features1, f2, axis=0)
One of the most common reasons for the kernel restarting itself is running out of memory (CPU RAM). It seems that, because of the read_images function, you were trying to read and store too many image files at the same time.
You may want to try the ImageDataGenerator API to read the images in small batches.
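For example, a minimal sketch of that route (assuming the folder layout from the question above, where /content/imageDir/101_ObjectCategories contains one subfolder per class such as windsor_chair, and that preprocess_input comes from the VGG16 application):
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.vgg16 import preprocess_input

datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
gen = datagen.flow_from_directory(
    '/content/imageDir/101_ObjectCategories',  # parent folder, one subfolder per class
    classes=['windsor_chair'],                 # restrict to the class used in the question
    target_size=(224, 224),
    class_mode=None,                           # images only; no labels needed for feature extraction
    batch_size=32,
    shuffle=False,
)
features = model.predict(gen)  # one row per image, computed batch by batch instead of all in memory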
OR
You can simply change your read_images function as below:
def read_images(folder_path, classlbl):
    # load all images into a list
    images = []
    img_width, img_height = 224, 224
    class1 = []
    for img in os.listdir(folder_path):
        img = os.path.join(folder_path, img)
        img = load_img(img, target_size=(img_width, img_height))
        yield img, classlbl  ## <--- this line
And use it like this:
result = read_images(folder_path, classlbl)
# call the function to compute the features for each image.
list_features1 = []
list_features1 = np.empty((0, 4096), float)  # create an empty array with 0 rows and 4096 columns;
                                             # this number comes from the VGG16 feature extraction
for img, _ in result:  # <--- change this line
    f2 = computefeatures(model, img)  # compute features for each image
    list_features1 = np.append(list_features1, f2, axis=0)
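As a side note (an assumption on my part, not something the answer requires): np.append copies the whole array on every iteration, so collecting the per-image feature vectors in a Python list and stacking them once at the end scales better for large image sets:
feature_list = []
for img, _ in read_images(folder_path, classlbl):
    feature_list.append(computefeatures(model, img))
list_features1 = np.vstack(feature_list)  # stack all per-image feature rows at once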

Pretrained Tensorflow model RGB -> RGBY channel extension

I am working on a protein analysis project. We receive images* of proteins taken with 4 filters (Red, Green, Blue and Yellow). Each of those RGBY channels contains unique data, as different cellular structures are visible with different filters.
The idea is to use a pre-trained network, e.g. VGG19, and extend the number of channels from the default 3 to 4. Something like this:
[Image: VGG model with the RGB input extended to RGBY - https://i.stack.imgur.com/TZKka.png]
The Y channel should be the copy of the existing pretrained channel. Then it is possible to make use of the pretrained weights.
Does anyone have an idea of how such extension of a pretrained network can be achieved?
* Author of the collage: Allunia from Kaggle, "Protein Atlas - Exploration and Baseline" kernel.
Use the layer.get_weights() and layer.set_weights() functions of the Keras API.
Create a template structure for a 4-channel VGG (set input shape=(width, height, 4)). Then load the weights from the 3-channel RGB model into the 4-channel one as RGBB.
Below is the code that does the procedure. In the case of a sequential VGG, the only layer that needs to be modified is the first convolution layer; the structure of the subsequent layers is independent of the number of channels.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from keras.applications.vgg19 import VGG19
from keras.models import Model
vgg19 = VGG19(weights='imagenet')
vgg19.summary() # To check which layers will be omitted in 'pretrained' model
# Load part of the VGG without the top layers into 'pretrained' model
pretrained = Model(inputs=vgg19.input, outputs=vgg19.get_layer('block5_pool').output)
pretrained.summary()
#%% Prepare model template with 4 input channels
config = pretrained.get_config() # run config['layers'][i] for reference
# to restore layer-by layer structure
from keras.layers import Input, Conv2D, MaxPooling2D
from keras import optimizers
# For training from scratch change kernel_initializer to e.g.'VarianceScaling'
inputs = Input(shape=(224, 224, 4), name='input_17')
# block 1
x = Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block1_conv1')(inputs)
x = Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block1_conv2')(x)
x = MaxPooling2D(pool_size=(2, 2), name='block1_pool')(x)
# block 2
x = Conv2D(128, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block2_conv1')(x)
x = Conv2D(128, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block2_conv2')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block2_pool')(x)
# block 3
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv1')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv2')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv3')(x)
x = Conv2D(256, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block3_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block3_pool')(x)
# block 4
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv1')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv2')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv3')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block4_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block4_pool')(x)
# block 5
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv1')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv2')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv3')(x)
x = Conv2D(512, (3,3), padding='same', activation='relu', kernel_initializer='zeros', name='block5_conv4')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2,2), name='block5_pool')(x)
vgg_template = Model(inputs=inputs, outputs=x)
vgg_template.compile(optimizer=optimizers.RMSprop(lr=2e-4),
loss='categorical_crossentropy',
metrics=['acc'])
#%% Rewrite the weight loading/modification function
import numpy as np
layers_to_modify = ['block1_conv1'] # Turns out the only layer that changes
# shape due to 4th channel is the first
# convolution layer.
for layer in pretrained.layers:      # pretrained Model and template have the same
                                     # layers, so it doesn't matter which to
                                     # iterate over.
    if layer.get_weights() != []:    # Skip input, pooling and no weights layers
        target_layer = vgg_template.get_layer(name=layer.name)
        if layer.name in layers_to_modify:
            kernels = layer.get_weights()[0]
            biases = layer.get_weights()[1]
            kernels_extra_channel = np.concatenate((kernels,
                                                    kernels[:, :, -1:, :]),
                                                   axis=-2)  # For channels_last
            target_layer.set_weights([kernels_extra_channel, biases])
        else:
            target_layer.set_weights(layer.get_weights())
#%% Save 4 channel model populated with weights for futher use
vgg_template.save('vgg19_modified_clear.hdf5')
Beyond the RGBY case, the following snippet works generally: it copies or drops dimensions of each layer's weight and/or bias arrays as needed. Please refer to the numpy documentation for what numpy.resize does; in the case of the original question, it copies the B-channel weights onto the Y-channel (or, more generally, onto any higher dimensionality).
import numpy as np
import tensorflow as tf
...
model = ... # your RGBY model is here
pretrained_model = tf.keras.models.load_model(...) # pretrained RGB model
# the following assumes that the layers match with the two models and
# only the shapes of weights and/or biases are different
for pretrained_layer, layer in zip(pretrained_model.layers, model.layers):
    pretrained = pretrained_layer.get_weights()
    target = layer.get_weights()
    if len(pretrained) == 0:  # skip input, pooling and other no weights layers
        continue
    try:
        # set the pretrained weights as is whenever possible
        layer.set_weights(pretrained)
    except:
        # numpy.resize to the rescue whenever there is a shape mismatch
        for idx, (l1, l2) in enumerate(zip(pretrained, target)):
            target[idx] = np.resize(l1, l2.shape)
        layer.set_weights(target)