Tensorboard graph orphan layers - tensorflow

While building a model that uses transfer learning (from VGG-16), I encountered this strange behavior: the TensorBoard graph shows layers that are not part of the new model but belonged to the old one, above the point of separation, and they are just dangling there.
Investigating further, model.summary() does not show these layers, model.get_layer("block4_conv1") can't find them either, and tf.keras.utils.plot_model doesn't show them. But if they are not part of the graph, how would TensorBoard know about them?
To build the new model, I used the recommended method.
Model first stage:
vgg_input_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_tensor=x)
final_vgg_layer = vgg_input_model.get_layer("block3_pool")
input_model = tf.keras.Model(inputs=vgg_input_model.inputs, outputs=final_vgg_layer.output)
input_model.trainable = True
x = tf.keras.layers.Conv2D(512, 1, padding="same", activation='relu', name="stage0_final_conv1")(input_model.output)
x = tf.keras.layers.Conv2D(512, 1, padding="same", activation='relu', name="stage0_final_conv2")(x)
x = tf.keras.layers.Conv2D(256, 1, padding="same", activation='relu', name="stage0_final_conv3")(x)
x = tf.keras.layers.Conv2D(128, 1, padding="same", activation='relu', name="stage0_final_conv4")(x)
TF:2.1 (nightly-2.x)
PY:3.5
Tensorboard: 2.1.0a20191124

After trying multiple methods, I came to the conclusion that the recommended way is wrong: doing model_b = tf.keras.Model(inputs=model_a.inputs, outputs=model_a.get_layer("some_layer").output) will leave dangling layers from model_a.
Using tf.keras.backend.clear_session() in between may clean the Keras graph, but then TensorBoard's graph is left empty.
The best solution I found is a config+weights copy of the required part of the model, layer by layer, rebuilding the connections in a new model. That way there is no relationship whatsoever in the Keras graph between the two models.
(This is simple for a sequential model like VGG, but might be more difficult for something like ResNet.)
Sample code:
tf.keras.backend.clear_session()
input_shape = (368, 368, 3)  # only the input shape is shared between the models

# transfer-learning model definition
input_layer_vgg = tf.keras.layers.Input(shape=input_shape)
vgg_input_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_tensor=input_layer_vgg)
name_last_layer = "block3_pool"  # the last layer to copy

tf.keras.backend.clear_session()  # clean the graph from the transfer-learning model

input_layer = tf.keras.layers.Input(shape=input_shape)  # define the input layer for the new model
x = input_layer
for layer in vgg_input_model.layers[1:]:  # copy over layers, skipping the original input layer
    config = layer.get_config()    # get config
    weights = layer.get_weights()  # get weights
    #print(config)
    copy_layer = type(layer).from_config(config)  # create the new layer from config
    x = copy_layer(x)  # connect to the previous layers;
                       # required for the proper sizing of the layer,
                       # set_weights will not work without it
    copy_layer.set_weights(weights)
    if layer.name == name_last_layer:
        break
del vgg_input_model
input_model = tf.keras.Model(inputs=input_layer, outputs=x)  # create the new model;
                                                             # if needed, x can be used further down the line
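As a quick sanity check (not part of the original answer, and assuming the snippet above has been run), the rebuilt model should list only the copied layers, and x can then be used for the next stage exactly as in the question:

input_model.summary()  # should show only the copied layers up to block3_pool,
                       # with no dangling references to the original VGG model

x = tf.keras.layers.Conv2D(512, 1, padding="same", activation='relu',
                           name="stage0_final_conv1")(input_model.output)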

Related

Keras feature extractor clarification - which layers does an input go through

When extracting a model layer's output as in the TensorFlow Sequential-model documentation example below, does the input x go through my_first_layer before going into the my_intermediate_layer layer? Or does it go directly into my_intermediate_layer without passing through my_first_layer?
If it goes directly into my_intermediate_layer, then the input to my_intermediate_layer is missing the transformation done by my_first_layer's Conv2D. That doesn't seem right to me, because the input should go through all the preceding layers.
Please help me understand which layers x goes through.
Feature extraction with a Sequential model
initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu", name="my_first_layer"),
        layers.Conv2D(32, 3, activation="relu", name="my_intermediate_layer"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
# The model goes through the training.
...
# Feature extractor
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=initial_model.get_layer(name="my_intermediate_layer").output,
)
# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)
Keras offers a higher-level API that runs on top of the TensorFlow machine learning platform. Keras offers two classes for defining a neural network model: the Sequential class and the Model class.
Sequential Class:
It groups a linear stack of layers into a model, such that each layer has exactly one input tensor and one output tensor. One can add the required layers to the model (schema 1) as shown below, and they execute sequentially, as the name suggests:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(8, input_shape=(16,)))
model.add(tf.keras.layers.Dense(4))
The other schema for defining a sequential model (schema 2) is shown below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call model on a test input
x = tf.ones((3, 3))
y = model(x)
Model Class
It allows the user to build a custom model with arbitrarily connected layers, as shown below:
import tensorflow as tf
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
It also allows one to create a new functional-API model on top of additional layers, as follows:
inputs = keras.Input(shape=(None, None, 3))
processed = keras.layers.RandomCrop(width=32, height=32)(inputs)
conv = keras.layers.Conv2D(filters=2, kernel_size=3)(processed)
pooling = keras.layers.GlobalAveragePooling2D()(conv)
feature = keras.layers.Dense(10)(pooling)
Note: the input tensors support only dicts, lists, or tuples, but not lists of lists or dicts of dicts.
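To connect this back to the question: the input x does pass through my_first_layer before reaching my_intermediate_layer, because the output tensor of my_intermediate_layer is defined in terms of every layer that precedes it in the graph. A quick shape check (a minimal sketch, assuming the question's initial_model and feature_extractor have been built) makes this visible:

x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)
# my_first_layer: Conv2D(32, 5, strides=2), 'valid' padding: (250 - 5) // 2 + 1 = 123
# my_intermediate_layer: Conv2D(32, 3), 'valid' padding:      123 - 3 + 1       = 121
print(features.shape)  # (1, 121, 121, 32)
# If x skipped my_first_layer, the spatial size here would be 248, not 121.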
I hope that this helps.

Using only certain layers from pretrained network for transfer learning

I am building a model for human face segmentation into skin and non-skin areas. I am using the model/method shown here as a starting point and adding a dense layer at the end with sigmoid activation. The model works very well for my purpose, giving a good Dice metric score. The model uses 2 pre-trained layers from ResNet50 as the backbone for feature detection. I have read several articles, books and code, but couldn't find any information on how to determine which layers to choose for feature extraction.
I compared the ResNet50 architecture with Xception, picked two similar layers, replaced the layers in the original network (here) and ran the training. I got similar results, neither better nor worse.
I have the following questions:
How do I determine which layers are responsible for low-level/high-level features?
Is using only some pre-trained layers any better than using the full pre-trained network, in terms of training time and the number of trainable parameters?
Where can I find more information about using only some layers from pre-trained networks?
Here is the code for a quick overview:
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    resnet50 = keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_tensor=model_input)
    x = resnet50.get_layer("conv4_block6_2_relu").output
    x = DilatedSpatialPyramidPooling(x)

    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    input_b = resnet50.get_layer("conv2_block3_2_relu").output
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)

    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    return keras.Model(inputs=model_input, outputs=model_output)
And here is my modified code using Xception layers as the backbone:
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    Xception_model = keras.applications.Xception(
        weights="imagenet", include_top=False, input_tensor=model_input)
    xception_x1 = Xception_model.get_layer("block9_sepconv3_act").output
    x = DilatedSpatialPyramidPooling(xception_x1)

    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    input_a = layers.AveragePooling2D(pool_size=(2, 2))(input_a)
    xception_x2 = Xception_model.get_layer("block4_sepconv1_act").output
    input_b = convolution_block(xception_x2, num_filters=256, kernel_size=1)

    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    x = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    model_output = layers.Dense(x.shape[2], activation='sigmoid')(x)
    return keras.Model(inputs=model_input, outputs=model_output)
Thanks in advance!
In general, the first layers (the ones closer to the input) are responsible for learning low-level, generic features, whereas the last layers learn higher-level, more dataset/task-specific features. This is the reason why, when doing transfer learning, you usually want to remove only the last few layers and replace them with others that can deal with your specific problem.
It depends. Transferring the whole network, without deleting or adding any layer, basically means that the network won't learn anything new (unless you leave the layers unfrozen, in which case you are fine-tuning). On the other hand, if you delete some layers and add a few more, then the number of trainable parameters depends only on the new layers you just added.
What I suggest you do is:
Delete a few layers from the pre-trained network, freeze the remaining pre-trained layers, and add a few new layers (even just one);
Train the new network with a certain learning rate (usually this learning rate is not very low);
Fine-tune: unfreeze all the layers, lower the learning rate, and re-train the whole network (a sketch of this workflow follows below).
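A minimal sketch of that workflow (the backbone, head, class count, and learning rates below are illustrative assumptions, not part of the original answer):

import tensorflow as tf

# 1) Take a pre-trained backbone without its classification head and freeze it.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False

# 2) Add a small task-specific head (here: a single dense layer for 3 classes).
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

# 3) Train only the new head with a "normal" learning rate.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# 4) Fine-tune: unfreeze the backbone, lower the learning rate, re-train everything.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)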

Trying to visualize activations in Tensorflow

I created a simple CNN to detect custom digits and I am trying to visualize the activations of my layers. When I run the following code, layer_outputs = [layer.output for layer in model.layers[:9]], I get the error Layer conv2d has no inbound nodes.
When I searched online, the advice was to define the input shape of the first layer, but I've already done that and I'm not sure why this is happening. Below is my model.
class myModel(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same',
                            input_shape=(image_height, image_width, num_channels))
        self.maxPool1 = MaxPool2D(pool_size=(2,2))
        self.conv2 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool2 = MaxPool2D(pool_size=(2,2))
        self.conv3 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool3 = MaxPool2D(pool_size=(2,2))
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.maxPool1(x)
        x = self.conv2(x)
        x = self.maxPool2(x)
        x = self.conv3(x)
        x = self.maxPool3(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.d2(x)
        return x
Based on your stated goal and what you've posted, I believe the problem here is a slight (and very understandable) misunderstanding of the way the TensorFlow APIs work. The model object and its constituent parts only store state for the model, not the evaluation of it: for example, the hyperparameters you've set and the parameters the model learns when it's fed training data. Even if you fixed the error you're hitting, the .output of the layer objects would not return the activations you want to visualize; it returns the part of the TensorFlow graph that represents that part of the computation.
For what you want to do, you'll need to manipulate an object that is the result of calling the .predict function on the model you've set up and trained, or drop down below the Keras abstractions and manipulate the tensors directly.
If I gave this more thought, there is probably a reasonably elegant way to get this by evaluating your graph (i.e., calling .predict) only once, but the most obvious naive way is simply to instantiate several new models (or several subclasses of your model), each with a layer of interest as its terminal output. That should get you what you want.
For example, you could do something like this for each of the layers whose outputs you're interested in:
my_test_image = # get an image
inp = Input(shape=(256, 256, 3))  # input size (without the batch dimension) will need to be set according to the relevant model
outputs_of_interest = Model(inp, my_model.layers[-2].output)
outputs_of_interest.predict(my_test_image)  # <=== this has the output you want
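Note that for a subclassed model like the one in the question, layer.output has no inbound nodes (that is exactly the error above), so Model(inp, my_model.layers[-2].output) would raise the same error. One variant of the same idea that works for subclassed models is to rebuild the forward pass functionally, reusing the already-trained layers. A minimal sketch, assuming my_model = myModel() has already been trained and that the attribute order matches the call() order (it does for the model posted); the input size is a hypothetical placeholder:

import tensorflow as tf
from tensorflow.keras import Input, Model

image_height, image_width, num_channels = 28, 28, 1  # hypothetical input size

inputs = Input(shape=(image_height, image_width, num_channels))
x = inputs
activations = []
for layer in my_model.layers:   # conv1, maxPool1, ..., d2
    x = layer(x)                # reuses the existing (trained) layer and its weights
    activations.append(x)

activation_model = Model(inputs=inputs, outputs=activations)

my_test_image = tf.random.uniform((1, image_height, image_width, num_channels))
layer_activations = activation_model.predict(my_test_image)  # one array per layer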

'Channels first' training accuracy very low compared to 'channels last'

My issue:
I am trying to train a semantic segmentation model in tf.keras; it works very well when I use channels_last (HWC) mode (it reaches 96%+ validation accuracy). I wanted to train it in channels_first (CHW) mode so the weights are compatible with TensorRT. When I do this, the ~80% training accuracy in the first few epochs dips down to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I used the guidelines provided and changed only a few things here:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
I changed the channel order in the input tensor from input_tensor=Input(shape=(376, 672, 3)) to input_tensor=Input(shape=(3, 376, 672)).
In my image preprocessing (using tf.data.Dataset), I use tf.transpose(img, perm=[2, 0, 1]) on both my input image and the one-hot encoded mask to change the channel order. I checked this with an equality assertion to make sure it's correct, and it seems to be fine.
When I change these, the training starts fine, but as I said, the training accuracy goes down to almost zero. When I revert everything back, it's fine again.
Possible leads:
What am I doing wrong, or what could be the problematic part here? My suspicions revolve around these questions:
Are the pre-trained ImageNet weights also converted to the 'channels_first' order when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background.) A quick check for this is sketched below, after this list.
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
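Regarding the second suspicion, here is a quick way to convince yourself that tf.transpose only reorders axes and keeps every pixel's one-hot vector intact (a minimal sketch with a hypothetical 2x2 mask and 3 classes, not taken from the actual pipeline):

import tensorflow as tf

mask_hwc = tf.constant([[[1., 0., 0.], [0., 1., 0.]],
                        [[0., 0., 1.], [1., 0., 0.]]])  # shape (2, 2, 3), HWC

mask_chw = tf.transpose(mask_hwc, perm=[2, 0, 1])       # shape (3, 2, 2), CHW

# every pixel should still sum to exactly 1 across the class axis (now axis 0)
tf.debugging.assert_near(tf.reduce_sum(mask_chw, axis=0), tf.ones((2, 2)))
print(mask_chw.shape)  # (3, 2, 2)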
EDIT:
08/17: This is still an ongoing issue. I have tried several things:
I checked whether the image and the mask are correct after the transpose with a NumPy assertion; they seem correct.
I suspected that the loss function calculates over the wrong axis, so I customized the loss function for the first axis (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels_first' backend setting does nothing to transpose the pre-trained 'imagenet' weights for the MobileNet part. Is there an updated way in TF 2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (skipNet() is the head network, MobileNet is the base, and they are connected in the create_model() function):
def skipNet(encoder_output, feed1, feed2, classes):
    # random initializer and regularizer
    stddev = 0.01
    init = RandomNormal(stddev=stddev)
    weight_decay = 1e-3
    reg = l2(weight_decay)

    score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed2)
    score_feed2_bn = BatchNormalization()(score_feed2)

    score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed1)
    score_feed1_bn = BatchNormalization()(score_feed1)

    upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(encoder_output)
    height_pad1 = ZeroPadding2D(padding=((1,0),(0,0)))(upscore2)
    upscore2_bn = BatchNormalization()(height_pad1)

    fuse_feed1 = add([score_feed1_bn, upscore2_bn])

    upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(fuse_feed1)
    height_pad2 = ZeroPadding2D(padding=((0,1),(0,0)))(upscore4)
    upscore4_bn = BatchNormalization()(height_pad2)

    fuse_feed2 = add([score_feed2_bn, upscore4_bn])

    upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg, activation="softmax")(fuse_feed2)

    return upscore8

def create_model(classes):
    base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
                                                 include_top=False,
                                                 weights='imagenet')

    conv4_2_output = base_model.get_layer(index=43).output
    conv3_2_output = base_model.get_layer(index=30).output
    conv_score_output = base_model.output

    head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)

    for layer in base_model.layers:
        layer.trainable = False

    model = Model(inputs=base_model.input, outputs=head_model)

    return model

Tensorflow CNN: Accessing the data from the Convolution layer to be used in the Lambda layer

I'm working on creating a Lambda layer after a convolution layer using TensorFlow 2.x.
I have a function named custom_layer which takes the tensor output from the previous convolution layer.
I need to extract each feature map of the convolution layer from this tensor and perform mathematical operations on it.
Finally, the outputs have to be combined into a single tensor and returned to be used by the next layer.
# Lambda layer
def custom_layer(tensor):
    # perform operation on individual feature maps
    # return the combined tensor output
    return tensor

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), input_shape=(28,28,1), activation='relu', name='conv2D_1'),
    tf.keras.layers.Lambda(custom_layer, name="lambda_layer"),
    tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu', name='conv2D_2'),
    tf.keras.layers.MaxPooling2D(pool_size=(2,2), name='MaxPool2D_1'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])
Using tf.print(tensor) I was able to view the tensor output (feature maps), but I was not able to figure out a method to access those individual feature maps.
The issue is solved: I used tf.py_function() within the custom_layer(tensor) function.
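For completeness, here is a minimal sketch of what that can look like; process_feature_maps and the per-map normalization are hypothetical illustrations, not the actual operations used in the original code:

import numpy as np
import tensorflow as tf

def process_feature_maps(tensor):
    maps = np.copy(tensor.numpy())        # (batch, height, width, channels)
    for c in range(maps.shape[-1]):       # operate on each feature map separately
        fmap = maps[..., c]
        maps[..., c] = (fmap - fmap.mean()) / (fmap.std() + 1e-7)
    return maps.astype(np.float32)

def custom_layer(tensor):
    out = tf.py_function(func=process_feature_maps, inp=[tensor], Tout=tf.float32)
    out.set_shape(tensor.shape)           # tf.py_function drops static shape information
    return out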