Image Segmentation Tensorflow tutorials - tensorflow

In this TensorFlow tutorial, the U-Net model is divided into two parts. The first is the contraction (downsampling) path, which uses MobileNet and is not trainable. For the second part, I can't work out which layers are being trained. As far as I can see, only the last layer, Conv2DTranspose, seems trainable. Am I right?
And if I am, how can a single layer handle a task as complex as segmentation?
Tutorial link: https://www.tensorflow.org/tutorials/images/segmentation

The code for the image segmentation model from the tutorial is shown below:
def unet_model(output_channels):
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs
    # Downsampling through the model
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # This is the last layer of the model
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 3, strides=2,
        padding='same')  # 64x64 -> 128x128
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)
The first part of the model, the downsampling path, does not use the entire MobileNet architecture, only the outputs of these layers:
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
These come from the pre-trained MobileNet model and are non-trainable.
The second part of the model (the part you are asking about), which comes before the final Conv2DTranspose layer, is the upsampling path. It is defined in this list:
up_stack = [
    pix2pix.upsample(512, 3),  # 4x4 -> 8x8
    pix2pix.upsample(256, 3),  # 8x8 -> 16x16
    pix2pix.upsample(128, 3),  # 16x16 -> 32x32
    pix2pix.upsample(64, 3),   # 32x32 -> 64x64
]
Each entry calls a function named upsample from the pix2pix module. The source code for the pix2pix module is available on GitHub.
The code for the upsample function is shown below:
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    """Upsamples an input.

    Conv2DTranspose => Batchnorm => Dropout => Relu

    Args:
        filters: number of filters
        size: filter size
        norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
        apply_dropout: If True, adds the dropout layer

    Returns:
        Upsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))
    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())
    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))
    result.add(tf.keras.layers.ReLU())
    return result
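This means that the second part of the model consists of the upsampling blocks whose behaviour is defined above, with 512, 256, 128 and 64 filters respectively. Each block contains a Conv2DTranspose layer and a BatchNormalization layer, both of which have trainable weights, so the whole upsampling path is trained, not just the final Conv2DTranspose layer.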
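To get a rough feel for how much of the decoder is trainable, here is a back-of-the-envelope parameter count in plain Python. It is only a sketch: the channel widths of the MobileNet outputs (320 at the bottleneck and 576, 192, 144, 96 for the skip connections) and output_channels = 3 are my assumptions and are not stated in this thread, so check them against model.summary() on your own model.

```python
def conv_transpose_params(kernel, in_ch, filters, use_bias):
    """Weights of a Conv2DTranspose layer with a square kernel."""
    weights = kernel * kernel * in_ch * filters
    return weights + (filters if use_bias else 0)


def upsample_trainable_params(kernel, in_ch, filters):
    """One upsample block: Conv2DTranspose without bias, plus the
    trainable gamma and beta of BatchNormalization (2 per filter)."""
    return conv_transpose_params(kernel, in_ch, filters, use_bias=False) + 2 * filters


# ASSUMPTIONS (not from the thread): channel widths of the frozen encoder outputs.
bottleneck_channels = 320            # assumed width of 'block_16_project'
skip_channels = [576, 192, 144, 96]  # assumed widths of the skips, deepest first
up_filters = [512, 256, 128, 64]     # from up_stack in the answer
output_channels = 3                  # assumed number of segmentation classes

in_ch = bottleneck_channels
per_block = []
for filters, skip_ch in zip(up_filters, skip_channels):
    per_block.append(upsample_trainable_params(3, in_ch, filters))
    # After upsampling, the skip connection is concatenated on the channel axis.
    in_ch = filters + skip_ch

last_layer = conv_transpose_params(3, in_ch, output_channels, use_bias=True)
decoder_total = sum(per_block) + last_layer

print("per upsample block:", per_block)
print("last Conv2DTranspose:", last_layer)
print("decoder total:", decoder_total)
print("share of last layer: {:.2%}".format(last_layer / decoder_total))
```

Under those assumptions the final Conv2DTranspose accounts for well under 1% of the trainable weights, and the four upsample blocks carry nearly all of them.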

Related

Extracting gradient of Keras Embedding layer

I want to extract the gradient of an RNN model that starts with an embedding layer, using TensorFlow's GradientTape (TensorFlow 1.14 with eager execution). The model is a simple LSTM binary classifier, trained with a binary cross-entropy loss:
inputs = Input(name='inputs', shape=[150])
layer = Embedding(2000, 50, input_length=150)(inputs)
layer = LSTM(64)(layer)
layer = Dense(256, name='FC1')(layer)
layer = Activation('relu')(layer)
layer = Dropout(0.5)(layer)
layer = Dense(1, name='out_layer')(layer)
layer = Activation('sigmoid')(layer)
model = Model(inputs=inputs, outputs=layer)
GradientTape should return "... a list or nested structure of Tensors (or IndexedSlices, or None, or CompositeTensor), one for each element in sources". What is the correct way to use it to recover (and apply) the gradient?
I tried the following code:
with tf.GradientTape() as tape:
    y_ = model(inputs)
    loss_value = BinaryCrossentropy()(y_true=targets, y_pred=y_)
grads = tape.gradient(loss_value, model.trainable_variables)
# some custom processing
optimizer = RMSprop(learning_rate=0.001, name="context")
optimizer.apply_gradients(list(zip(grads, model.trainable_variables)), name="context")
I would expect the returned gradient to have shape (2000, 50), i.e., the shape of the embedding layer's weights. Instead, its size depends on the batch size, and it cannot be used (at least with the code above) with apply_gradients. Changing the number of inputs consistently changes the first dimension of the gradient to batch_size * 150, while the shapes of the trainable variables stay correct. With 8 inputs, for example, I get the following result:
input shape: (8, 150), output shape: (8, 1)
model.trainable_variables shapes: (2000, 50),(50, 256),(64, 256),(256,),(64, 256),(256,),(256, 1),(1,)
tape.gradient shapes: (1200, 50),(50, 256),(64, 256),(256,),(64, 256),(256,),(256, 1),(1,)
With a batch size of 32, the first component would be (4800, 50), and so on. This doesn't match my understanding of GradientTape.gradient, since the returned gradient doesn't have the same shape as the sources parameter. What did I miss?
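One likely explanation (my assumption, since this thread has no accepted answer) is that the gradient for the embedding comes back as a tf.IndexedSlices object rather than a dense tensor: one gradient row per looked-up token (8 * 150 = 1200 rows) plus the index of the embedding row each one belongs to. Summing rows with the same index gives the dense (2000, 50) gradient. The plain-Python sketch below shows that scatter-add on toy sizes; densify and the toy values are hypothetical names for illustration only.

```python
def densify(indices, values, num_rows):
    """Scatter-add sparse gradient rows into a dense (num_rows, dim) matrix.

    indices[k] is the embedding row that values[k] is a gradient for.
    Rows that share an index are summed.
    """
    dim = len(values[0])
    dense = [[0.0] * dim for _ in range(num_rows)]
    for row, grad_row in zip(indices, values):
        for j, g in enumerate(grad_row):
            dense[row][j] += g
    return dense


# Toy stand-ins for (2000, 50) and a batch of shape (8, 150).
vocab_size, embed_dim = 7, 2
batch_tokens = [[1, 4, 1], [0, 4, 5]]  # batch of 2 sequences, 3 tokens each

# One gradient row per looked-up token: batch_size * sequence_length rows.
indices = [tok for seq in batch_tokens for tok in seq]
values = [[1.0, 0.5] for _ in indices]

dense_grad = densify(indices, values, vocab_size)

print(len(values), "sparse rows ->", len(dense_grad), "dense rows")
for row_id, row in enumerate(dense_grad):
    print(row_id, row)
```

In TensorFlow itself, tf.convert_to_tensor(grad) is one way to turn an IndexedSlices gradient into a dense tensor before custom processing. That costs memory for large vocabularies, so treat it as one option rather than the recommended fix.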

How to stop training the CNN part while continuing to train the ANN part in a multi-input model?

I made a multi-input model in Keras which takes image shape=[N, 640, 480, 3] as well as numerical data shape=[N, 19] and does prediction on 12 classes.
Following is the model defining part of code:
# # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# # MODEL === CNN
# # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#
base_model = keras.applications.ResNet50(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(640, 480, 3),
    include_top=False)  # Do not include the ImageNet classifier at the top.
base_model.trainable = False
input_Cnn = keras.Input(shape=(640, 480, 3))
x = base_model(input_Cnn, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
# Dense layers on top of the pooled CNN features (12 outputs, one per class)
x1 = keras.layers.Dense(1024, activation="relu")(x)
out_Cnn = keras.layers.Dense(12, activation="relu")(x1)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# MODEL === NN
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
inp_num = keras.layers.Input(shape=(19,)) # no. of columns of the numerical data
fc1 = keras.layers.Dense(units=2 ** 6, activation="relu")(inp_num)
fc2 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc1)
fc3 = keras.layers.Dense(units=2 ** 10, activation="relu")(fc2)
fc4 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc3)
fc5 = keras.layers.Dense(units=2 ** 6, activation="relu")(fc4)
out_NN = keras.layers.Dense(12, activation="relu")(fc5)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# CONCATENATION
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
result = keras.layers.concatenate((out_Cnn, out_NN), axis=-1) # [N, 12] --- concatenate [N, 12] ==> [N, 24]
result = keras.layers.Dense(1024, activation='relu')(result)
result = keras.layers.Dense(units=12, activation="softmax")(result)
model = keras.Model([input_Cnn, inp_num], result)
model.summary()  # summary() prints the table itself and returns None
The problem is that the CNN part (if trained independently) trains in fewer epochs, while the ANN part (if trained independently) takes longer (more epochs). But in this code, when both are combined, accuracy doesn't go beyond 10%. Is there any way to stop gradients flowing into the CNN part after a certain number of epochs, so that after that the model trains only the ANN part?
I'm not using Keras, but after a quick Google search this should be the answer:
You can freeze layers, so that certain parameters are no longer learnable:
# this freezes the first N layers
for layer in model.layers[:N]:
    layer.trainable = False
where N is the number of convolutional layers you have.
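In a two-branch model, the order of model.layers can interleave the branches, so slicing the first N layers may not pick out exactly the CNN part. Here is a small sketch that freezes by layer name instead. The tiny model, the "cnn_" name prefix and the helper freeze_by_prefix are all made up for illustration and are not from the question. As far as I know, Keras needs the model to be compiled again after changing trainable for the change to take effect in fit.

```python
import tensorflow as tf
from tensorflow import keras


def build_two_branch_model():
    """Tiny stand-in for the image + numeric model in the question."""
    img_in = keras.Input(shape=(32, 32, 3), name="image")
    x = keras.layers.Conv2D(4, 3, activation="relu", name="cnn_conv")(img_in)
    x = keras.layers.GlobalAveragePooling2D(name="cnn_pool")(x)
    cnn_out = keras.layers.Dense(12, activation="relu", name="cnn_out")(x)

    num_in = keras.Input(shape=(19,), name="numeric")
    y = keras.layers.Dense(16, activation="relu", name="nn_fc1")(num_in)
    nn_out = keras.layers.Dense(12, activation="relu", name="nn_out")(y)

    merged = keras.layers.concatenate([cnn_out, nn_out])
    out = keras.layers.Dense(12, activation="softmax", name="head")(merged)
    return keras.Model([img_in, num_in], out)


def freeze_by_prefix(model, prefix):
    """Hypothetical helper: mark every layer whose name starts with prefix as frozen."""
    for layer in model.layers:
        if layer.name.startswith(prefix):
            layer.trainable = False


model = build_two_branch_model()
model.compile(optimizer="adam", loss="categorical_crossentropy")
weights_before = len(model.trainable_weights)

# ... train both branches for a few epochs with model.fit(...) ...

freeze_by_prefix(model, "cnn_")
# Compile again so training picks up the new trainable flags.
model.compile(optimizer="adam", loss="categorical_crossentropy")
weights_after = len(model.trainable_weights)

print(weights_before, "->", weights_after, "trainable weight tensors")
```

After the second compile, a further model.fit call would update only the numeric branch and the shared head.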

Is there a method in tensorflow.keras.layers to get the output shape of a specific layer before running the model?

I am building a CNN with Keras using a TensorFlow backend with the following structure:
# Create the Second Model in Ensemble
def createModel(self, model_input, n_outputs, first_session=True):
    if first_session != True:
        model = load_model('ideal_model.hdf5')
        return model
    # Define Input Layer
    inputs = model_input
    # Define Max Pooling Layer
    conv = MaxPooling2D(pool_size=(3, 3), padding='same')(inputs)
    # Define Layer Normalization Layer
    conv = LayerNormalization()(inputs)
    # Define Leaky ReLU Layer
    conv = LeakyReLU(alpha=0.1)(conv)
    # Define Dropout Layer
    conv = Dropout(0.2)(conv)
    # Define First Conv2D Layer
    conv = Conv2D(filters=64,
                  kernel_size=(3, 3),
                  activation='relu',
                  padding='same',
                  strides=(3, 2))(conv)
    conv = Dropout(0.3)(conv)
    # Define Second Conv2D Layer
    conv = Conv2D(filters=32,
                  kernel_size=(5, 5),
                  activation='relu',
                  padding='same',
                  strides=(3, 2))(conv)
    conv = Dropout(0.3)(conv)
    # Define Softmax Layer
    conv = Softmax(axis=1)(conv)
    # Define Reshape Layer
    conv = Reshape((conv._keras_shape[1]*conv._keras_shape[2]*conv._keras_shape[3],))(conv)
    # Define Sigmoid Dense Layer
    conv = Dense(64, activation='sigmoid')(conv)
    # Define Output Layer
    outputs = Dense(n_outputs, activation='softmax')(conv)
    # Create Model
    model = Model(inputs, outputs)
    model.summary()
    return model
Currently, I am running into a bit of trouble since I am trying to use a Reshape layer to flatten the tensor, and I am trying to avoid hard-coding the dimensions of the output from the previous layer into the Reshape layer, if possible. (Note: Flatten layers are not supported by the kernels in the FPGA on which the program will ultimately run, so I cannot use them.) The above code produces the following error:
AttributeError: 'Tensor' object has no attribute '_keras_shape'
This occurs because I had to import the layers using tensorflow.keras.layers (as opposed to keras.layers) due to the LayerNormalization layer at the beginning of the model architecture.
So, I was wondering if there is a method to get the output shape of a specific layer in tensorflow.keras.layers before compiling the model.
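Try conv.shape, which gives the static shape known while the model is being built, or possibly tf.shape(conv), which gives the shape as a tensor at run time.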
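A minimal sketch of the conv.shape route, assuming a 28x28x1 input and a single Conv2D layer (both chosen only for illustration, not taken from the question): read the static shape, multiply everything after the batch axis, and pass the result to Reshape in place of a Flatten layer.

```python
import math

import tensorflow as tf

# Assumed toy input; the real model_input comes from the question's code.
inputs = tf.keras.Input(shape=(28, 28, 1))
conv = tf.keras.layers.Conv2D(filters=8,
                              kernel_size=(3, 3),
                              activation='relu',
                              padding='same',
                              strides=(3, 2))(inputs)

# conv.shape is (None, height, width, channels); skip the batch axis.
flat_dim = math.prod(int(d) for d in conv.shape[1:])

# Reshape to a vector without hard-coding the dimensions.
flat = tf.keras.layers.Reshape((flat_dim,))(conv)

print(tuple(conv.shape), "->", tuple(flat.shape))
```

This only works when every non-batch dimension is statically known. If any of them is None, int(d) will fail, and that is the case where tf.shape would be needed instead.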

How to compute saliency map using keras backend

I am trying to construct a basic "vanilla gradient" saliency heatmap (gradient-based feature attribution) for MNIST using keras. I know there are libraries such as this one to compute saliency heatmaps, but I would like to construct this from scratch since the vanilla gradient approach seems conceptually straightforward to implement. I have trained the following digit classifier in Keras using functional model definition:
input = layers.Input(shape=(28,28,1), name='input')
conv2d_1 = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(input)
maxpooling2d_1 = layers.MaxPooling2D(pool_size=(2, 2), name='maxpooling2d_1')(conv2d_1)
conv2d_2 = layers.Conv2D(64, kernel_size=(3, 3), activation='relu')(maxpooling2d_1)
maxpooling2d_2 = layers.MaxPooling2D(pool_size=(2, 2))(conv2d_2)
flatten = layers.Flatten(name='flatten')(maxpooling2d_2)
dropout = layers.Dropout(0.5, name='dropout')(flatten)
dense = layers.Dense(num_classes, activation='softmax', name='dense')(dropout)
model = keras.models.Model(inputs=input, outputs=dense)
Now, I want to compute the saliency map for a single MNIST image. Since the final layer has a softmax activation and the denominator is a normalization term (so that the output nodes add up to 1), I believe that I need to either take the pre-softmax output or change the activation of the trained model to linear before computing saliency maps. I will do the latter.
model.layers[-1].activation = tf.keras.activations.linear # swap activation to linear
input = loaded_model.layers[0].input
output = loaded_model.layers[-1].output
input_image = x_test[0] # shape is (28, 28, 1)
pred = np.argmax(loaded_model.predict(np.expand_dims(input_image, axis=0))) # predicted class
However, I am not sure what to do beyond this. I know I can use the following K.gradients(output, input) to compute gradients. That being said, I believe I should compute the gradient of the predicted class with respect to the input image, versus computing the gradient of the entire output. How would I do this? Also, I'm not sure how to evaluate the saliency heatmap for a specific image/prediction. I imagine I will have to use sess = tf.keras.backend.get_session() and sess.run(), but not sure exactly. I would greatly appreciate any help with completing the saliency heatmap code. Thanks!
If you add the softmax activation as a separate layer after the last dense layer with:
keras.layers.Activation('softmax')
you can then build a model that stops at the pre-softmax output:
linear_model = keras.Model(inputs=model.inputs, outputs=model.layers[-2].output)
and compute the gradients like this:
def get_saliency_map(model, image, class_idx):
    # image is expected to include a batch dimension, e.g. shape (1, 28, 28, 1);
    # tape.watch needs a tensor, so convert NumPy input first
    image = tf.convert_to_tensor(image, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)
        predictions = model(image)
        loss = predictions[:, class_idx]
    # Get the gradients of the loss w.r.t. the input image.
    gradient = tape.gradient(loss, image)
    # take maximum across channels
    gradient = tf.reduce_max(gradient, axis=-1)
    # convert to numpy
    gradient = gradient.numpy()
    # normalize between 0 and 1
    min_val, max_val = np.min(gradient), np.max(gradient)
    smap = (gradient - min_val) / (max_val - min_val + keras.backend.epsilon())
    return smap
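A quick way to sanity-check the "gradient of one class score with respect to the input" idea is a model where the answer is known in advance. The sketch below uses a hypothetical linear scorer (logits_fn and its weights are made up, not the trained MNIST model), for which the gradient of class c with respect to the input is just column c of the weight matrix.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a classifier without softmax:
# 4 input features, 3 classes, fixed weights.
weights = np.arange(12, dtype="float32").reshape(4, 3)


def logits_fn(x):
    """Linear class scores: x @ weights."""
    return tf.matmul(x, weights)


image = tf.constant([[1.0, 2.0, 3.0, 4.0]])  # batch of one "image"
class_idx = 2

with tf.GradientTape() as tape:
    tape.watch(image)  # the input is a constant, so it must be watched explicitly
    class_score = logits_fn(image)[:, class_idx]  # score of one class only

# Gradient of the selected class score with respect to the input.
grad = tape.gradient(class_score, image)

print(grad.numpy())
print(weights[:, class_idx])
```

Both printed arrays should hold the same values. Selecting predictions[:, class_idx] before calling tape.gradient is what restricts the gradient to the chosen class rather than the whole output.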

a problem using LSTM network (neural networks)

I'm trying to create a speaker diarization system using an LSTM (I want the network to tell the difference between speakers).
This is the model I've created:
model = Sequential()
model.add(LSTM(768, batch_input_shape=(39, 40, 1), return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768, return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768, return_sequences=True))
model.add(Dense(4))
There are 4 different speakers.
In my dataset I have the array 'features' (length 256, for 256 speech segments).
Each segment in 'features' is represented by 39 vectors, and each of these vectors has size 40.
Each of these 39 vectors is extracted from a different time window (I used log mel filterbank energies).
I also have the array 'labels', which also has length 256 and contains the label for each segment.
I used 'to_categorical' for it:
labels = tf.keras.utils.to_categorical(labels, num_classes=4)
I tried using a generator to feed it to the network, but it didn't work.
This is the class I used:
class KerasBatchGenerator(object):
    def __init__(self, features, batch_size, labels):
        self.features = features
        self.batch_size = batch_size
        self.labels = labels

    def generate(self):
        while True:
            for i in self.labels:
                for j in self.features:
                    temp = [j, i]
                    # temp = np.expand_dims(temp, axis=1)
                    temp = np.expand_dims(temp, axis=2)
                    yield tuple(temp)
And the code I used to run the network is:
train_data_generator = KerasBatchGenerator(features, batch_size, labels)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit_generator(train_data_generator.generate(), 100, 1)
please help!!!
If I guessed correctly, you want to classify which input is spoken by which speaker.
In that case your final layer should produce an output of shape (batch_size, numOfClasses), or (39, 4).
But if you take a close look at the summary, the output shape of the final layer is (39, 40, 4).
To get the proper shape, remove the argument return_sequences=True from the last LSTM layer.
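The effect on the output shape can be checked without any training. This sketch builds two cut-down versions of the model (8 LSTM units instead of 768 and fewer layers, purely to keep it fast; the helper name build is made up) and compares their output shapes.

```python
import tensorflow as tf
from tensorflow import keras


def build(last_returns_sequences):
    """Cut-down version of the question's model with a fixed batch of 39."""
    return keras.Sequential([
        keras.Input(shape=(40, 1), batch_size=39),
        keras.layers.LSTM(8, return_sequences=True),
        keras.layers.Dense(4),
        # Only this flag differs between the two variants.
        keras.layers.LSTM(8, return_sequences=last_returns_sequences),
        keras.layers.Dense(4),
    ])


per_step_model = build(last_returns_sequences=True)
per_segment_model = build(last_returns_sequences=False)

print(tuple(per_step_model.output_shape))     # one prediction per time step
print(tuple(per_segment_model.output_shape))  # one prediction per sequence
```

With return_sequences=True the last LSTM emits one vector per time step, so the final Dense layer produces a 3D output. With it removed, only the last time step is passed on and the output matches one-hot labels of shape (batch_size, 4).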