This question is about ensuring that prediction-time input images are in the same range as the images fed during training. I know it's the usual practice to repeat at prediction time the same steps that were used to process images during training. But in my case, I apply a random_transform() function inside a custom data generator during training, which wouldn't make sense to add at prediction time.
import cv2
import tensorflow as tf
import seaborn as sns
import matplotlib.pyplot as plt
To simplify my problem, assume I apply the following changes to a grayscale image read in a custom data generator.
img_1 is an output of the data generator and is supposed to be the input to a VGG19 model.
# using a simple augmenter
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
brightness_range=(0.75, 1.25),
preprocessing_function=tf.keras.applications.vgg19.preprocess_input # preprocessing function of VGG19
)
# read the image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# apply a random transform
img_1 = augmenter.random_transform(img)/255
The above random_transform() has produced the following grayscale value distribution (between [0, 1]):
plt.imshow(img_1); plt.show();
sns.histplot(img_1[:, :, 0].ravel());  # select channel 0 and ravel; the augmenter stacks 3 copies of the grayscale image to make it RGB
Now I want to do the same at prediction time, but I don't want a random transform applied to the image, so I just pass the input image through the preprocessing_function().
# read image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# pass through the preprocessing function
img_2 = tf.keras.applications.vgg19.preprocess_input(img)/255
But I'm unable to get the input into the [0, 1] range the way it was during training.
plt.imshow(img_2); plt.show();
sns.histplot(img_2[:, :, 0].ravel());
This makes the predictions completely incorrect. How can I make sure that the inputs to the model at prediction time undergo the same steps, so that they end up with a distribution similar to the inputs fed during training? I don't want to add a random_transform() at prediction time as well.
I recommend adding per-image standardization to your model. This ensures that each image has a mean of 0 and a standard deviation of 1, both in your training set and at inference time.
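A minimal sketch of that suggestion, assuming a VGG19 base and 224x224 RGB inputs (both assumptions for illustration): a Lambda layer wrapping tf.image.per_image_standardization at the front of the model scales training and inference inputs identically, so the manual preprocess_input and /255 steps can be dropped.
import tensorflow as tf
# Per-image standardization baked into the model itself, so the same
# scaling runs at training and at prediction time (input shape assumed).
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Lambda(tf.image.per_image_standardization)(inputs)
base = tf.keras.applications.VGG19(include_top=False, weights=None)
outputs = base(x)
model = tf.keras.Model(inputs, outputs)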
The Context:
I am creating a test application that largely replicates the functionality described here.
I was able to run the code found in the tutorial linked above, and I see losses and accuracies that are reasonable, even after just a couple of epochs.
Tutorial Code: Early into the training of the two-headed CNN, losses and accuracy look good
This is because the code starts with the VGG16 model and the already trained weights, and it freezes those layers so that no learning is required for the core classification.
My test code largely replicates the tutorial structure. It uses the exact same dataset and the already-trained VGG16 weights. However, I load the image dataset using generators (rather than pulling all the data into memory, as the tutorial does).
You can find how I created those generators in the answer provided here. I had struggled for a while, before I finally got it to a point that I think is correct.
The Problem:
When I train my model, the classification loss and accuracy are as expected; however, the bounding box loss grows and the bounding box accuracy does not improve over the epochs.
My Code: Even after just a couple epochs you see the bounding box loss starting to grow
Further Details:
I've spent a lot of time looking at the (image, target) tuples yielded by the generator, and I think I am handling the yielded data properly (including the unitrect).
A pycharm view of the images and target tuples yielded by generator
In fact I've also added a debug mode that allows me to display the images and rectangles fed into the training session.
A motorcycle with the bounding box as computed from the unit rectangle bounding box loaded from CSV into the dataframe (df); df is an input to flow_from_dataframe
The model I am using:
imodel = tf.keras.applications.vgg16.VGG16(weights=None, include_top=False,
input_tensor=Input(shape=(224, 224, 3)))
imodel.load_weights(weights, by_name=True)
imodel.trainable = False
# flatten the max-pooling output of VGG
flatten = imodel.output
flatten = Flatten()(flatten)
# construct a fully-connected layer header to output the predicted
# bounding box coordinates
bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid",
name="bounding_box")(bboxHead)
# construct a second fully-connected layer head, this one to predict
# the class label
softmaxHead = Dense(512, activation="relu")(flatten)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(512, activation="relu")(softmaxHead)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(len(classes), activation="softmax",
name="class_label")(softmaxHead)
# put together our model, which accepts an input image and then outputs
# bounding box coordinates and a class label
model = Model(
inputs=imodel.input,
outputs=(bboxHead, softmaxHead))
# define a dictionary to set the loss methods -- categorical
# cross-entropy for the class label head and mean squared error
# for the bounding box head
losses = {
    "class_label": "categorical_crossentropy",
    "bounding_box": "mean_squared_error",
}
# define a dictionary that specifies the weights per loss (both the
# class label and bounding box outputs will receive equal weight)
lossWeights = {
    "class_label": 1.0,
    "bounding_box": 1.0
}
# initialize the optimizer, compile the model, and show the model
# summary
opt = Adam(lr=learning_rate)
model.compile(loss=losses, optimizer=opt, metrics=["accuracy"], loss_weights=lossWeights)
My call to "fit":
model.fit(x=train_generator[0], steps_per_epoch=train_generator[1],
validation_data=validation_generator[0], validation_steps=validation_generator[1],
epochs=epochs, verbose=1)
The weights I load are ones I've used in other experiments, downloaded from Kaggle (see vgg16_weights_tf_dim_ordering_tf_kernels.h5).
My Generator:
def generate_image_generator(generator, data_directory, df, subset, target_size, batch_size, shuffle, seed):
    genImages = generator.flow_from_dataframe(dataframe=df, directory=data_directory, target_size=target_size,
                                              x_col="file",
                                              y_col=['cls_onehot', 'bbox'],
                                              subset=subset,
                                              class_mode="multi_output",
                                              batch_size=batch_size, shuffle=shuffle, seed=seed)
    while True:
        images, labels = next(genImages)
        targets = {
            'class_label': labels[0],
            'bounding_box': np.array(labels[1], dtype="float32")
        }
        yield images, targets
def get_train_and_validate_generators(self, data_directory, files, max_images, validation_split, shuffle, seed, target_size):
    generator = ImageDataGenerator(validation_split=validation_split,
                                   rescale=1./255.)
    df = get_dataframe(data_directory, files)
    if max_images:
        df = df.head(max_images)
    train_generator = generate_image_generator(generator, data_directory, df, "training",
                                               target_size,
                                               self.batch_size,
                                               shuffle, seed)
    valid_generator = generate_image_generator(generator, data_directory, df, "validation",
                                               target_size,
                                               self.batch_size,
                                               shuffle, seed)
Loading the dataframe from a list of CSVs:
def get_dataframe(data_directory, files):
    frames = []
    for di in files:
        df = pd.read_csv(data_directory + di["file"])
        frames.append(df)
    df = pd.concat(frames)
    df['cls_onehot'] = df['cls'].str.get_dummies().values.tolist()
    df['bbox'] = df[['sxu', 'syu', 'exu', 'eyu']].values.tolist()
    return df
A snippet of the CSV:
id,file,sx,sy,ex,ey,cls,sxu,syu,exu,eyu,w,h
0,motorcycle.0001.jpg,31,19,233,141,motorcycle,0.1183206106870229,0.11801242236024845,0.8893129770992366,0.8757763975155279,262,161
1,motorcycle.0002.jpg,32,15,232,142,motorcycle,0.12167300380228137,0.09259259259259259,0.8821292775665399,0.8765432098765432,263,162
2,motorcycle.0003.jpg,30,20,234,143,motorcycle,0.11406844106463879,0.12269938650306748,0.8897338403041825,0.8773006134969326,263,163
3,motorcycle.0004.jpg,30,15,231,132,motorcycle,0.11450381679389313,0.1,0.8816793893129771,0.88,262,150
4,motorcycle.0005.jpg,31,19,232,145,motorcycle,0.1183206106870229,0.1144578313253012,0.8854961832061069,0.8734939759036144,262,166
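For illustration, here is roughly what the two derived columns hold for the first row above, assuming 'motorcycle' is the only class present in this snippet (so its one-hot vector has a single element):
print(df['cls_onehot'].iloc[0])  # -> [1] (one-hot over the classes found by get_dummies)
print(df['bbox'].iloc[0])        # -> [0.1183206106870229, 0.11801242236024845, 0.8893129770992366, 0.8757763975155279]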
When I load weights from "imagenet", rather than using the ones I received from Kaggle, I see the very same increase in bounding box loss:
imodel = tf.keras.applications.vgg16.VGG16(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))
The Question:
Please provide suggestions on how to isolate this bounding box loss growth problem.
OK, it looks like the problem was not with my generator at all. The code was fine except for one silly oversight: I still had an old call to compile in place. I called compile correctly the first time, with the composite loss function; then I called it a second time with only categorical cross-entropy as the loss, effectively ignoring my bounding boxes.
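For anyone who hits the same thing, a sketch of what the oversight looked like (names as in the code above); the second compile() silently replaces the first configuration:
# first compile: correct, with a loss per output head
model.compile(loss=losses, optimizer=opt, metrics=["accuracy"],
              loss_weights=lossWeights)
# leftover second compile: overrides the above, so the bounding-box
# head is no longer trained against mean squared error
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])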
Anyway, if someone stumbles on this post, I hope they find the complete view of how to do classification and object detection with a generator function useful.
I edited the question above with the correct details, so it now reflects the right answer.
I'd still like to get the perspective of the experts who have had to dig into the workings of a model to better understand the underlying details that lead to loss calculation.
Now that I'm starting to understand TensorFlow at a high level, it's clear how to recognize when things are working; it's not clear how to diagnose things when they aren't.
Can I add a dimension to an image within the flow_from_directory() pipeline, or do I need to write my own implementation for this? Can I use it on ingested data after the fact, or does it pull images into memory at training time?
Thanks
The generator pulls in batches of images as needed by model.fit, so it does not try to store them all in memory, avoiding an OOM (out of memory) error. The generator does have a preprocessing_function, which I tried to use to expand the dimensions, but it throws an error, so that will not work. I guess you have to create your own generator to accomplish what you wish. I think what I show below will work:
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir = r'c:\temp\people\test'  # point this to your training directory
# put your data into the ImageDataGenerator and flow_from_directory; my images are 128,128,3 color images
train_gen = ImageDataGenerator(rescale=1.0/255).flow_from_directory(train_dir, target_size=(128, 128),
                                                                    batch_size=10, seed=123,
                                                                    class_mode='categorical', color_mode='rgb', shuffle=True)
def img_gen(input_gen, axis):
    while True:  # loop forever so model.fit can keep drawing batches
        images, labels = next(input_gen)  # get the next batch of images and labels
        images = np.expand_dims(images, axis=axis)  # expand dimensions of the images
        yield images, labels  # output a batch of tuples to model.fit
# show that the dimensions of the images have been expanded
images, labels = next(img_gen(train_gen, axis=4))
print(images.shape)
images will now have shape (10, 128, 128, 3, 1), where 10 is the batch size I used
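With the while True loop in place, the wrapped generator can be handed straight to model.fit (or fit_generator on older Keras). A rough usage sketch; model is assumed to be a compiled Keras model expecting the expanded 5-D input, and steps_per_epoch is derived from the batch size of 10 used above:
model.fit(img_gen(train_gen, axis=4),
          steps_per_epoch=train_gen.samples // 10,  # 10 = batch_size used above
          epochs=5)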
In text processing there are embeddings that represent (if I understood correctly) the dataset's words as vectors (after dimensionality reduction).
Now I am wondering: is there a similar method to visualize the features extracted by a CNN?
For example: suppose we have a CNN and train and test sets. We want to train the CNN on the train set and, meanwhile, see the extracted features (from a dense layer) with their corresponding class labels in the embeddings section of TensorBoard.
The purpose of this is to see the features of the input data in every batch and understand how close or far they are from one another. Finally, on the trained model, we can find the accuracy of our classifier (softmax, etc.).
Thank you in advance for your help.
I have taken help from the TensorFlow documentation.
For in-depth information on how to run TensorBoard and make sure you are logging all the necessary information, see TensorBoard: Visualizing Learning.
To visualize your embeddings, there are 3 things you need to do:
1) Set up a 2D tensor that holds your embedding(s).
embedding_var = tf.get_variable(....)
2) Periodically save your model variables in a checkpoint in LOG_DIR.
saver = tf.train.Saver()
saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), step)
3) (Optional) Associate metadata with your embedding.
If you have any metadata (labels, images) associated with your embedding, you can tell TensorBoard about it either by directly storing a projector_config.pbtxt in the LOG_DIR, or by using our Python API.
For instance, the following projector_config.pbtxt associates the word_embedding tensor with metadata stored in $LOG_DIR/metadata.tsv:
embeddings {
tensor_name: 'word_embedding'
metadata_path: '$LOG_DIR/metadata.tsv'
}
The same config can be produced programmatically using the following code snippet:
from tensorflow.contrib.tensorboard.plugins import projector
# Create randomly initialized embedding weights which will be trained.
vocabulary_size = 10000
embedding_size = 200
embedding_var = tf.get_variable('word_embedding', [vocabulary_size,
embedding_size])
# Format: tensorflow/tensorboard/plugins/projector/projector_config.proto
config = projector.ProjectorConfig()
# You can add multiple embeddings. Here we add only one.
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = os.path.join(LOG_DIR, 'metadata.tsv')
# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)
# The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard will
# read this file during startup.
projector.visualize_embeddings(summary_writer, config)
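For the CNN use case in the question, the metadata file is just a TSV with one class label per embedded row. A minimal sketch; LOG_DIR and the labels list are assumptions for illustration:
import os

LOG_DIR = '/tmp/logs'  # assumed log directory
labels = [3, 0, 7]     # placeholder: one class label per embedded sample
with open(os.path.join(LOG_DIR, 'metadata.tsv'), 'w') as f:
    for label in labels:
        f.write('{}\n'.format(label))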
Almost all examples on GitHub or other blogs use the MNIST dataset for demos. When I try to use the same deep NN for my image data, I encounter the following problem.
They use:
batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop)
sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
That is, they use the next_batch() method to feed data in batches.
My question is:
Do we have any similar method to feed data in batches?
You should have a look at tf.contrib.data.Dataset. You can create an input pipeline: define the source, apply a transformation, and batch it. See the programmer's guide for importing data.
From the documentation:
The Dataset API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training.
EDIT:
I guess what you have is an array of pictures (filenames). Here is an example from the programmer's guide.
Depending on your input files, the transformation part will change. Here is the extract for consuming an array of picture files.
# Reads an image from a file, decodes it into a dense tensor, and resizes it
# to a fixed shape.
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
# A vector of filenames.
filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...])
# labels[i] is the label for the image in filenames[i].
labels = tf.constant([0, 37, ...])
dataset = tf.contrib.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
# Now you have a dataset of (image, label). Basically kind of a list with
# all your pictures encoded along with a label.
# Batch it.
dataset = dataset.batch(32)
# Create an iterator.
iterator = dataset.make_one_shot_iterator()
# Retrieve the next element.
image_batch, label_batch = iterator.get_next()
You could also shuffle your images.
Now you can use your image_batch and label_batch as placeholders in your model definition.
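Putting it together with shuffling, a rough sketch of a next_batch-style loop (buffer_size is an arbitrary choice; shuffle before batching so individual examples, not whole batches, are shuffled):
dataset = tf.contrib.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=1000)  # shuffle individual examples
dataset = dataset.batch(32)
iterator = dataset.make_one_shot_iterator()
image_batch, label_batch = iterator.get_next()

with tf.Session() as sess:
    # each sess.run call yields the next batch, like mnist.train.next_batch
    imgs, lbls = sess.run([image_batch, label_batch])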