Related
I have fine-tuned a model and saved all of its weights as checkpoint_vgg16.h5. I want to apply Grad-CAM to my fine-tuned model (not the pretrained VGG model); if Grad-CAM requires anything special to be done during the training phase, I did not do it.
My implementation looks like this:
base_model = keras.applications.VGG16(include_top=False, input_shape=(128, 88, 3))
model1 = Sequential()
for layer in base_model.layers:
    model1.add(layer)
model1.add(Flatten())
model1.add(Dense(4096, activation="relu"))
model1.add(Dense(124, activation="softmax"))
model1.load_weights("checkpoint_vgg16.h5")
model = Model(model1.input, model1.layers[-1].output)
GradCAM class:
class GradCAM:
    def __init__(self, model, classIdx, layerName=None):
        # store the model, the class index used to measure the class
        # activation map, and the layer to be used when visualizing
        # the class activation map
        self.model = model
        self.classIdx = classIdx
        self.layerName = layerName
        # if the layer name is None, attempt to automatically find
        # the target output layer
        if self.layerName is None:
            self.layerName = self.find_target_layer()

    def find_target_layer(self):
        # attempt to find the final convolutional layer in the network
        # by looping over the layers of the network in reverse order
        for layer in reversed(self.model.layers):
            # check to see if the layer has a 4D output
            if len(layer.output_shape) == 4:
                return layer.name
        # otherwise, we could not find a 4D layer so the GradCAM
        # algorithm cannot be applied
        raise ValueError("Could not find 4D layer. Cannot apply GradCAM.")

    def compute_heatmap(self, image, eps=1e-8):
        # construct our gradient model by supplying (1) the inputs
        # to our pre-trained model, (2) the output of the (presumably)
        # final 4D layer in the network, and (3) the output of the
        # softmax activations from the model
        gradModel = Model(
            inputs=[self.model.inputs],
            outputs=[self.model.get_layer(self.layerName).output, self.model.output])
        # record operations for automatic differentiation
        with tf.GradientTape() as tape:
            # cast the image tensor to a float-32 data type, pass the
            # image through the gradient model, and grab the loss
            # associated with the specific class index
            inputs = tf.cast(image, tf.float32)
            (convOutputs, predictions) = gradModel(inputs)
            loss = predictions[:, tf.argmax(predictions[0])]
            print(loss)
        # use automatic differentiation to compute the gradients
        grads = tape.gradient(loss, convOutputs)
        print(grads)
        # compute the guided gradients
        castConvOutputs = tf.cast(convOutputs > 0, "float32")
        castGrads = tf.cast(grads > 0, "float32")
        guidedGrads = castConvOutputs * castGrads * grads
        # the convolution and guided gradients have a batch dimension
        # (which we don't need) so let's grab the volume itself and
        # discard the batch
        convOutputs = convOutputs[0]
        guidedGrads = guidedGrads[0]
        # compute the average of the gradient values, and using them
        # as weights, compute the ponderation of the filters with
        # respect to the weights
        weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
        cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)
        # grab the spatial dimensions of the input image and resize
        # the output class activation map to match the input image
        # dimensions
        (w, h) = (image.shape[2], image.shape[1])
        heatmap = cv2.resize(cam.numpy(), (w, h))
        # normalize the heatmap such that all values lie in the range
        # [0, 1], scale the resulting values to the range [0, 255],
        # and then convert to an unsigned 8-bit integer
        numer = heatmap - np.min(heatmap)
        denom = (heatmap.max() - heatmap.min()) + eps
        heatmap = numer / denom
        heatmap = (heatmap * 255).astype("uint8")
        # return the resulting heatmap to the calling function
        return heatmap

    def overlay_heatmap(self, heatmap, image, alpha=0.5,
                        colormap=cv2.COLORMAP_VIRIDIS):
        # apply the supplied color map to the heatmap and then
        # overlay the heatmap on the input image
        heatmap = cv2.applyColorMap(heatmap, colormap)
        output = cv2.addWeighted(image, alpha, heatmap, 1 - alpha, 0)
        # return a 2-tuple of the color mapped heatmap and the output,
        # overlaid image
        return (heatmap, output)
For the prediction part:
image = cv2.imread('sample.jpg')
image = image.astype('float32') / 255
image = np.expand_dims(image, axis=0)
preds = model.predict(image)
i = np.argmax(preds[0])
icam = GradCAM(model, i, 'block5_conv3') #block5_conv3: the last conv block in my model
print(icam)
heatmap = icam.compute_heatmap(image)
heatmap = cv2.resize(heatmap, (128, 88))
image = cv2.imread('sample.jpg')
(heatmap, output) = icam.overlay_heatmap(heatmap, image, alpha=0.5)
fig, ax = plt.subplots(1, 3)
ax[0].imshow(heatmap)
ax[1].imshow(image)
ax[2].imshow(output)
However, I get following error:
castGrads = tf.cast(grads > 0, "float32")
TypeError: '>' not supported between instances of 'NoneType' and 'int'
The line grads = tape.gradient(loss, convOutputs) in the compute_heatmap method of GradCAM returns None.
How can I fix this error? I want to apply Grad-CAM to my model.
Ref. Grad-CAM class activation visualization
Also, this question is similar to mine, but its solution didn't work for me.
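One way to narrow the problem down is a minimal sanity check (a sketch, assuming the rebuilt, weight-loaded model from the snippet above is available as model): build the same gradient model outside the class and test whether any gradient reaches block5_conv3 on a random input. If the gradient is still None here, the target layer is not on the path from model.inputs to model.output in the rebuilt model, which points at the model reconstruction rather than at the GradCAM class itself.
import tensorflow as tf
from tensorflow.keras.models import Model

# hypothetical diagnostic, not a fix: check whether any gradient flows to block5_conv3
grad_model = Model(inputs=model.inputs,
                   outputs=[model.get_layer("block5_conv3").output, model.output])
x = tf.random.uniform((1, 128, 88, 3))
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(x)
    loss = preds[:, 0]
grads = tape.gradient(loss, conv_out)
print(grads is None)  # True likely means the conv layer is disconnected from the output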
How do I multiply tf.keras.layers with tf.Variable?
Context: I am creating a sample-dependent convolutional filter, which consists of a generic filter W that is transformed through sample-dependent shifting and scaling. That is, the original convolutional filter W is transformed into aW + b, where a is a sample-dependent scale and b is a sample-dependent shift. One application of this is training an autoencoder where the sample dependency is the label, so each label shifts and scales the convolutional filter. Because of the sample/label-dependent convolutions, I am using tf.nn.conv2d, which takes the actual filters as input (as opposed to just the number/size of filters), and a Lambda layer with tf.map_fn to apply a different "transformed filter" (based on the label) to each sample. Although the details are different, this kind of sample-dependent convolution approach is discussed in this post: Tensorflow: Convolutions with different filter for each sample in the mini-batch.
Here is what I am thinking:
input_img = keras.Input(shape=(28, 28, 1))
label = keras.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
# filter is of shape (filter_h, filter_w, input channels, output filters)
filter = tf.Variable(tf.ones((3,3,input_img.shape[-1],num_filters)))
# TODO: need to scale and shift -> scale*(filter) + shift along each output filter dimension (32 filter dimensions)
I am not sure how to implement the TODO part. I was thinking of tf.keras.layers.Multiply() for scaling and tf.keras.layers.Add() for shifting, but to my knowledge they do not work with a tf.Variable. How do I get around this? Assuming the dimensions/shape broadcasting work out, I would like to do something like the following (note: the output should still be the same shape as var and is just scaled along each of the 32 output filter dimensions):
output = tf.keras.layers.Multiply()([var, scale])
This requires some work and needs a custom layer; for example, you cannot use a tf.Variable inside a tf.keras Lambda layer.
class ConvNorm(layers.Layer):
    def __init__(self, height, width, n_filters):
        super(ConvNorm, self).__init__()
        self.height = height
        self.width = width
        self.n_filters = n_filters

    def build(self, input_shape):
        self.filter = self.add_weight(shape=(self.height, self.width, input_shape[-1], self.n_filters),
                                      initializer='glorot_uniform',
                                      trainable=True)
        # TODO: Add bias too

    def call(self, x, scale, shift):
        shift_reshaped = tf.expand_dims(tf.expand_dims(shift, 1), 1)
        scale_reshaped = tf.expand_dims(tf.expand_dims(scale, 1), 1)
        norm_conv_out = tf.nn.conv2d(x, self.filter * scale + shift, strides=(1, 1, 1, 1), padding='SAME')
        return norm_conv_out
Using the layer
import tensorflow as tf
import tensorflow.keras.layers as layers
input_img = layers.Input(shape=(28, 28, 1))
label = layers.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
conv_norm_out = ConvNorm(3,3,32)(input_img, scale, shift)
print(conv_norm_out.shape)
Note: I haven't added a bias. You will need a bias for the convolution layer as well, but that is straightforward.
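For completeness, here is one possible way that bias could be added (a sketch, not part of the original answer; the attribute name is illustrative):
# inside build(): create a per-filter bias alongside the filter weight
self.bias = self.add_weight(shape=(self.n_filters,),
                            initializer='zeros',
                            trainable=True)

# inside call(): apply the bias after the convolution
norm_conv_out = tf.nn.conv2d(x, self.filter * scale + shift, strides=(1, 1, 1, 1), padding='SAME')
norm_conv_out = tf.nn.bias_add(norm_conv_out, self.bias)
return norm_conv_out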
I have had considerable trouble getting my TensorFlow model to run on my own input data.
I am drawing images from labeled directories: I have two classes of images, "good" and "bad", each stored in its own directory.
I read them in using TensorFlow's built-in list_files(glob) and process them with strictly TensorFlow operations. However, now that I try to run my model, it stops on the first epoch and outputs the error: tensorflow/core/framework/op_kernel.cc:1730] OP_REQUIRES failed at cast_op.cc:123 : Unimplemented: Cast string to float is not supported
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import pathlib
import tensorflow as tf
from tensorflow.keras import layers, models
import random
import os
class E6Classifier:
    image_size = 256
    train_test_proportion = .8
    current_working_directory = pathlib.Path.cwd()
    data_directory = current_working_directory.parent / 'Jupyter Notebooks' / 'Tensorflow' / 'e6Classifier' / 'data'
    categories = ['good', 'bad']
    good_images = []
    bad_images = []
    batch_size = 32
    total_dataset = None
    train_dataset = None
    test_dataset = None

    def __init__(self):
        self.read_images()
        self.print_statistics()
        #TensorFlow implementation
        self.make_tensorflow_dataset()
        self.train_test_split()
        self.create_model()
        self.train_model()

    def read_images(self):
        good_path = self.data_directory / self.categories[0]
        bad_path = self.data_directory / self.categories[1]
        filetypes = ('*.jpg', '*.png')
        for filetype in filetypes:
            self.good_images.extend(good_path.glob(filetype))
            self.bad_images.extend(bad_path.glob(filetype))

    def print_statistics(self):
        self.num_good_images = len(self.good_images)
        self.num_bad_images = len(self.bad_images)
        self.total_images = self.num_good_images + self.num_bad_images
        self.proportion_good = round(self.num_good_images / self.total_images * 100, 2)
        print(str(self.total_images) + ' total images | ' + str(self.num_good_images) + ' good images, ' + str(self.num_bad_images) + ' bad images | ' + str(self.proportion_good) + ' percent good to bad')

    def make_tensorflow_dataset(self):
        directory_strings = []
        for filetype in ['*.jpg', '*.png']:
            directory_strings.append(str(self.data_directory / 'good' / filetype))
            directory_strings.append(str(self.data_directory / 'bad' / filetype))
        list_dataset = tf.data.Dataset.list_files(directory_strings)
        labeled_dataset = list_dataset.map(self.process_tensor_path)
        self.total_dataset = labeled_dataset

    def process_tensor_path(self, filepath):
        label = tf.strings.split(filepath, os.sep)[-2]
        image = tf.io.read_file(filepath)
        image = tf.image.decode_image(image, channels = 3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        image = tf.image.resize_with_pad(image, target_width = self.image_size, target_height = self.image_size)
        return image, label

    def train_test_split(self):
        num_training_images = int(round(self.total_images * self.train_test_proportion, 0))
        self.total_dataset.shuffle(buffer_size = self.total_images)
        self.train_dataset = self.total_dataset.take(num_training_images)
        self.test_dataset = self.total_dataset.skip(num_training_images)

    def create_model(self):
        #Batch datasets
        self.train_dataset = self.train_dataset.batch(self.batch_size, drop_remainder = True)
        self.test_dataset = self.test_dataset.batch(self.batch_size, drop_remainder = True)
        self.total_dataset = self.total_dataset.batch(self.batch_size, drop_remainder = True)
        #Create model
        self.model = models.Sequential()
        #Add a convolutional layer to detect features in the image
        self.model.add(layers.Conv2D(32, (3, 3), activation = 'relu', input_shape = (self.image_size, self.image_size, 3)))
        #Add a pooling layer to remove sensitivity to position in the image of the feature
        self.model.add(layers.MaxPooling2D((2, 2)))
        #Repeat ad nauseum
        self.model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
        self.model.add(layers.MaxPooling2D(2, 2))
        self.model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
        self.model.add(layers.MaxPooling2D(2, 2))
        self.model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
        self.model.add(layers.MaxPooling2D(2, 2))
        self.model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
        #Add dense layers
        self.model.add(layers.Flatten())
        self.model.add(layers.Dense(64, activation = 'relu'))
        self.model.add(layers.Dense(2))
        #Compile the model for training
        self.model.compile(optimizer = 'adam', loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True), metrics = ['accuracy'])

    def train_model(self):
        #self.train_dataset.repeat()
        #self.test_dataset.repeat()
        self.model.fit(self.train_dataset, epochs = 1, validation_data = self.test_dataset, verbose = True)

def main():
    E6Classifier()

if __name__ == '__main__':
    main()
I don't understand what is causing this error to be thrown, and there doesn't seem to be a tremendous amount of information about where the call is actually failing. I have looked at the datatypes of my TensorFlow datasets, and they indicate they are tuples of Tensors with types float32 and string, respectively.
Is this an issue with having strings for category names? If so, how would I go about replacing the category names with numbers?
You want to classify your images into two categories: good and bad. Since training a network involves calculating a loss value (which is a numerical value) and then backpropagating it to update the weights (also numerical values), you should have numerical outputs and labels.
Convert your 'good' label to e.g. 1 and the 'bad' label to 0. You can do this in TensorFlow (see the example below) or in your folder structure (rename the folders and modify your code accordingly).
string_label = tf.strings.split(filepath, os.sep)[-2]
label = tf.constant(1.) if tf.math.equal(string_label, tf.constant('good', dtype=tf.string)) else tf.constant(0.)
For TensorFlow version 2.10.0, the accepted answer produced the following error:
OperatorNotAllowedInGraphError: Using a symbolic tf.Tensor as a Python bool is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
This can be rectified by using the tf.cond function, like so:
str_label = tf.strings.split(file_path, os.sep)[-2]
label = tf.cond(tf.math.equal(str_label, tf.constant('land', dtype=tf.string)),
                lambda: tf.constant(1, shape=(1,)),
                lambda: tf.constant(0, shape=(1,)))
N.B. including the shape was required for some use cases, so I included it here.
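Another option that sidesteps both the Python conditional and tf.cond (a sketch, assuming the same 'good'/'bad' folder names as in the question) is to compare and cast directly; this works in graph mode as well:
string_label = tf.strings.split(filepath, os.sep)[-2]
# tf.math.equal gives a boolean tensor; casting it yields 1 for 'good' and 0 for 'bad'
label = tf.cast(tf.math.equal(string_label, 'good'), tf.int32)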
The following code is copied from a GAN MNIST tutorial on UDEMY. When I run the code, it converges towards creating images with a large white area in the center that is black at the sides (picture an empty filled circle against a black background). I have no idea what the problem is as I have only done what the tutorial told me to do word for word. The only difference is that I extract the MNIST data differently. Is there something about tensorflow that has changed recently?
import tensorflow as tf
import numpy as np
import gzip
from PIL import Image
import os.path
def extract_data(filename, num_images):
    """Extract the images into a 4D tensor [image index, y, x, channels].
    Values are rescaled from [0, 255] down to [-0.5, 0.5].
    """
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        bytestream.read(16)
        buf = bytestream.read(28 * 28 * num_images)
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
        #data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH
        data = data.reshape(num_images, 28, 28, 1)
        return data
fname_img_train = extract_data('../Data/MNIST/train-images-idx3-ubyte.gz', 60000)
def generator(z, reuse=None):
    with tf.variable_scope('gen', reuse=reuse):
        hidden1 = tf.layers.dense(inputs=z, units=128)
        alpha = 0.01
        hidden1 = tf.maximum(alpha*hidden1, hidden1)
        hidden2 = tf.layers.dense(inputs=hidden1, units=128)
        hidden2 = tf.maximum(alpha*hidden2, hidden2)
        output = tf.layers.dense(hidden2, units=784, activation=tf.nn.tanh)
        return output

def discriminator(X, reuse=None):
    with tf.variable_scope('dis', reuse=reuse):
        hidden1 = tf.layers.dense(inputs=X, units=128)
        alpha = 0.01
        hidden1 = tf.maximum(alpha*hidden1, hidden1)
        hidden2 = tf.layers.dense(inputs=hidden1, units=128)
        hidden2 = tf.maximum(alpha*hidden2, hidden2)
        logits = tf.layers.dense(hidden2, units=1)
        output = tf.sigmoid(logits)
        return output, logits
real_images=tf.placeholder(tf.float32,shape=[None,784])
z=tf.placeholder(tf.float32,shape=[None,100])
G = generator(z)
D_output_real, D_logits_real = discriminator(real_images)
D_output_fake, D_logits_fake = discriminator(G,reuse=True)
def loss_func(logits_in, labels_in):
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=logits_in, labels=labels_in))
D_real_loss = loss_func(D_logits_real,tf.ones_like(D_logits_real)*0.9)
D_fake_loss = loss_func(D_logits_fake,tf.zeros_like(D_logits_real))
D_loss = D_real_loss + D_fake_loss
G_loss = loss_func(D_logits_fake,tf.ones_like(D_logits_fake))
learning_rate = 0.001
tvars = tf.trainable_variables()
d_vars= [var for var in tvars if 'dis' in var.name]
g_vars = [var for var in tvars if 'gen' in var.name]
D_trainer = tf.train.AdamOptimizer(learning_rate).minimize(D_loss,var_list=d_vars)
G_trainer = tf.train.AdamOptimizer(learning_rate).minimize(G_loss,var_list=g_vars)
batch_size=100
epochs=30
set_size=60000
init = tf.global_variables_initializer()
samples=[]
def create_image(img, name):
    img = np.reshape(img, (28, 28))
    print("before")
    print(img)
    img = (np.multiply(np.divide(np.add(img, 1.0), 2.0), 255.0).astype(np.int16))
    print("after")
    print(img)
    im = Image.fromarray(img.astype('uint8'))
    im.save(name)
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        np.random.shuffle(fname_img_train)
        num_batches = int(set_size/batch_size)
        for i in range(num_batches):
            batch = fname_img_train[i*batch_size:((i+1)*batch_size)]
            batch_images = np.reshape(batch, (batch_size, 784))
            batch_images = batch_images*2.0-1.0
            batch_z = np.random.uniform(-1, 1, size=(batch_size, 100))
            _ = sess.run(D_trainer, feed_dict={real_images: batch_images, z: batch_z})
            _ = sess.run(G_trainer, feed_dict={z: batch_z})
        print("ON EPOCH {}".format(epoch))
        sample_z = np.random.uniform(-1, 1, size=(batch_size, 100))
        gen_sample = sess.run(G, feed_dict={z: sample_z})
        create_image(gen_sample[0], "img" + str(epoch) + ".png")
As far as I can see, you are not normalizing the training data. Instead of using your extract_data() function, it is much easier to do the following:
from tensorflow.keras.datasets.mnist import load_data
(train_data, train_labels), _ = load_data()
train_data = train_data / 255.
Besides, people usually sample from the latent space twice per training step: once for the discriminator update and once for the generator update. Still, it did not seem to make a difference here.
After implementing these changes, using a batch size of 200 and training for 100 epochs, I got the following result (see the gen_sample image). The result is pretty bad, but it is definitely better than an "empty filled circle against a black background".
Note that the architecture of the generator and of the discriminator that you are using is very simple. From my experience, stacking some convolutional layers gives perfect results. In addition, I would not use the tf.maximum() function, since it creates discontinuities that may negatively impact the flow of the gradients.
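If you do want to drop the hand-written tf.maximum() pattern, TensorFlow also provides a built-in leaky ReLU that can be substituted directly (a sketch, reusing the alpha value from the question's code):
hidden1 = tf.layers.dense(inputs=z, units=128)
hidden1 = tf.nn.leaky_relu(hidden1, alpha=0.01)  # replaces tf.maximum(alpha*hidden1, hidden1)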
Finally, instead of your create_image() function, I used the following:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

def plot_mnist(samples, name):
    fig = plt.figure(figsize=(6, 6))
    gs = gridspec.GridSpec(6, 6)
    gs.update(wspace=0.05, hspace=0.05)
    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')
    plt.savefig('{}.png'.format(name))
    plt.close()
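A hypothetical call inside the training loop (assuming at least 36 generated samples per epoch, to fill the 6x6 grid):
plot_mnist(gen_sample[:36], "img" + str(epoch))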
There are many different ways of improving the quality of a GAN model, and the majority of those techniques can be easily found online. Please let me know if you have any specific question.
Posting here to check if there's anything wrong with my implementation of a simple semantic segmentation model in TensorFlow. This code represents a sanity check I'm doing with just a single image from the database, for which I'm trying to overfit the model.
It is a binary classification problem with each image pixel mapped to [0,1] in the ground truth label.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
img = plt.imread('image.png')  # image of size [750, 750, 3]
lab = plt.imread('map.png')    # ground truth of size [750, 750]
img = np.expand_dims(img, 0)
lab = np.expand_dims(lab, 0)
w1 = tf.Variable(tf.constant(0.001, shape=[3,3,3,32]))
b1 = tf.Variable(tf.constant(0.0, shape=[32]))
w2 = tf.Variable(tf.constant(0.001, shape=[3,3,32,2]))
b2 = tf.Variable(tf.constant(0.0, shape=[2]))
mul = tf.nn.conv2d(img, w1, strides=[1,1,1,1], padding='SAME')
bias_add = tf.add(mul, b1)
conv1 = tf.nn.relu(bias_add)
mul2 = tf.nn.conv2d(conv1, w2, strides=[1,1,1,1], padding='SAME')
bias_add2 = tf.add(mul2, b2)
conv2 = tf.nn.relu(bias_add2)
sess = tf.InteractiveSession()
lab = lab.astype('int32')
conv2_out = tf.reshape(conv2, [-1, 2])
lab = np.reshape(lab, [-1])
prediction = tf.nn.softmax(conv2_out)  # I use this to visualize the model's prediction and calculate accuracy
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(conv2_out, lab))
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
correct_pred = tf.equal(tf.argmax(prediction, 1), lab)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.int32)
tf.initialize_all_variables().run()
step = 1
iter = 5
while step < iter:
    sess.run(optimizer, feed_dict={x: img, y: lab})
    loss_val, acc = sess.run([loss, accuracy], feed_dict={x: img, y: lab})
    print("Iter: " + str(step) + " Loss: " + "{:.6f}".format(loss_val) + " Accuracy: " + "{:.6f}".format(acc))
    step += 1
print ("optimization finished!")
prediction_logits = prediction.eval()
weights = w1.eval() # first layer learned weights
prediction_logits = np.reshape(prediction_logits, [750,750,2])
plt.figure() # Plotting original image with predicted labels
plt.imshow(img[0,:,:,:])
plt.imshow(prediction_logits[:,:,0], cmap=plt.cm.binary)
plt.show()
plt.figure() # Plotting first layer weights
for i in range(32):
    plt.subplot(8, 4, i+1)
    plt.imshow(weights[:, :, :, i])
plt.show()
When I run this (as an interactive session), just to train the model to overfit on this single image, the loss decreases, but the accuracy doesn't seem to change. I'm not quite sure I understand how the tf.argmax function works or whether I've implemented it correctly, and the accuracy sticks to a single value no matter how many iterations I run.
Thoughts? Also, am I going about plotting the figure and predicted label correctly, or are there errors here? (Please point out any other errors or best practices I'm not following as well.)
Additionally, what is the recommended way to implement a regularization over the weights? I found tf.contrib.layers.l2_regularizer to be a feasible option - how do I include it in this scenario, though? A simple sum with the loss function?
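On the tf.argmax point, a minimal illustration with hypothetical values: for logits of shape [N, 2], tf.argmax(logits, 1) returns the index of the larger class score in each row, and correct_pred then compares those indices element-wise with the flattened labels.
logits = tf.constant([[2.0, 1.0],
                      [0.1, 3.0]])
pred_classes = tf.argmax(logits, 1)  # evaluates to [0, 1], the winning class index per row
On the regularization question, one common pattern (a sketch, assuming a simple weighted sum with the data loss is acceptable; the 0.001 factor is an arbitrary example) is to add an L2 penalty on the weights to the existing loss:
reg_strength = 0.001  # arbitrary example value
l2_penalty = reg_strength * (tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2))
total_loss = loss + l2_penalty
optimizer = tf.train.AdamOptimizer(0.001).minimize(total_loss)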