How to make an image cube for 3D convolution from file paths - tensorflow

Recently, I have been studying 3D convolution for video processing with TensorFlow.
I built a model by following a tutorial blog, but now I want to build my own custom dataset. Each input image has shape (128,128,3), and I want to stack 100 of them into an image cube of shape (128,128,100,3). I am using tf.data.Dataset and tried to write a map function by adapting what I remembered from 2D convolution. I want to build each image cube from a path array of shape (number of image cubes, 100) inside a tf.data.Dataset map function, because loading everything with NumPy runs out of memory.
I tried code like the following:
def load_image(path):
    images = []
    for i, p in enumerate(path):
        image_string = tf.io.read_file(p)
        image = tf.io.decode_jpeg(image_string, channels=3)
        image = tf.reshape(image, [128, 128, 1, 3])
        image = image / 255
        images.append(image)
    image_block = tf.concat(images, axis=2)
    return image_block
train_data = tf.data.Dataset.from_tensor_slices(total_files)  # shape (1077, 100)
train_data = train_data.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
But I get an error saying that the tensor's shape changes. I also tried a tf.Variable with .assign, but hit a similar error.
How can I build the 3D convolution's input image cubes from file paths? I am using TensorFlow 2.0.

So you cannot iterate over a tensor like for x in tensor. In that case you can, for example, iterate over a range and get each value by index, like:
for x in range(tf.shape(tensor)[0]):
    y = tensor[x]
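Putting that together with the question's setup: a minimal sketch of the full map function, assuming every row of total_files holds 100 path strings and every decoded image is 128x128 (load_cube and load_one are illustrative names). Using tf.map_fn keeps the per-path decoding inside the graph, so no Python-level iteration over tensors is needed:

def load_cube(paths):
    # paths: a string tensor of shape (100,), one row of total_files
    def load_one(p):
        image = tf.io.decode_jpeg(tf.io.read_file(p), channels=3)
        image = tf.reshape(image, [128, 128, 3])
        return tf.cast(image, tf.float32) / 255.0
    # Decode every path in graph mode; result shape: (100, 128, 128, 3)
    # (in TF 2.0 the output type is given via dtype=; newer versions use fn_output_signature=)
    images = tf.map_fn(load_one, paths, dtype=tf.float32)
    # Move the frame axis into third position: (128, 128, 100, 3)
    return tf.transpose(images, [1, 2, 0, 3])

train_data = tf.data.Dataset.from_tensor_slices(total_files)  # shape (1077, 100)
train_data = train_data.map(load_cube, num_parallel_calls=tf.data.experimental.AUTOTUNE)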

Related

Three channel vs. grayscale image input to a CNN

I am trying to mimic and build an image classification model described in a paper. At first I was using 100x100x3 images as input to the neural network. The model gave very good accuracy (~95%) in the training phase but produced very poor predictions, resulting in a confusion matrix with the same value in every cell. But when I trained the model with single-band grayscale images as input, as done in the paper itself, the accuracy dropped to ~84% but the model made good predictions with a good confusion matrix. I am building the CNN model with Keras and using ImageDataGenerator to flow the images.
This is how I create the image augmentations (same as described in the paper):
import os
import numpy as np
import PIL.Image
from tensorflow import image as tfimg
from tensorflow.keras.preprocessing import image

# Flip image along y=x line: returns flipped image as an integer array
def flip_xy(img):
    img_flip = tfimg.rot90(np.flipud(PIL.Image.fromarray(img).resize((100,100))))
    return np.array(img_flip)

# Rotate img thrice by 90 degrees: returns 3 rotated images as integer arrays
def rotate90(img):
    img90 = tfimg.rot90(np.array(img.resize((100,100))))
    img180 = tfimg.rot90(np.array(img90))
    img270 = tfimg.rot90(np.array(img180))
    return np.array(img90), np.array(img180), np.array(img270)

# Crop image once from each corner and once from centre: returns image objects
def getCrops(img):
    upper_left = img.crop((10,10,110,110))
    upper_right = img.crop((0,10,100,110))
    lower_left = img.crop((10,0,110,100))
    lower_right = img.crop((0,0,100,100))
    centre = img.crop((5,5,105,105))
    return upper_left, upper_right, lower_left, lower_right, centre

# Create 40 copies from a single image: data augmentation
def createCopies(img_path, target_folder):
    img = image.load_img(img_path, target_size=(110,110))
    img_file = img_path.split("\\")[-1].split(".")[0]
    k = 1
    ul, ur, ll, lr, ctr = getCrops(img)
    for image0 in [ul, ur, ll, lr, ctr]:
        i90, i180, i270 = rotate90(image0)
        for image1 in [i90, i180, i270, np.array(image0)]:
            flip = flip_xy(image1)
            PIL.Image.fromarray(image1).save(os.path.join(target_folder, f"{img_file}({k}).jpg"))
            PIL.Image.fromarray(flip).save(os.path.join(target_folder, f"{img_file}({k+1}).jpg"))
            k += 2
I then iterate over all the original images, creating augmentations.
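A minimal sketch of that iteration, assuming the originals live in a single folder (both folder names are hypothetical; note that createCopies splits the path on Windows-style "\\" separators):

import os

source_folder = "originals"   # hypothetical folder with the raw images
target_folder = "augmented"   # hypothetical output folder
os.makedirs(target_folder, exist_ok=True)

for fname in os.listdir(source_folder):
    if fname.lower().endswith((".jpg", ".jpeg", ".png")):
        createCopies(os.path.join(source_folder, fname), target_folder)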
Here is the code for model training:
Model training (screenshot). Ignore the fact that the training phase is not completed in the notebook; that is from a later run. In the original run, the training ran for all 30 epochs.
Here is the code for model evaluation:
Model evaluation (screenshot). Here I have a second doubt: why do predicting each image individually and predicting on the generator object produce different results?
Can anyone explain why this happens? Shouldn't three-channel images train the model better? And why is there a discrepancy between the model's accuracy during training and its actual predictions?

Using saved models from TF hub to extract feature vectors

I've been playing with different models from TF Hub to extract feature vectors:
import tensorflow_hub as hub

module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')
features = module(image)
What I don't quite understand is how the input image should be preprocessed.
Every model from the hub has this generic instruction:
The input images are expected to have color values in the range [0,1], following the common image input conventions. The expected size of the input images is height x width = 299 x 299 pixels by default, but other input sizes are possible (within limits).
where "common image input" is a link to a the following:
A signature that takes a batch of images as input accepts them as a dense 4-D tensor of dtype float32 and shape [batch_size, height, width, 3] whose elements are RGB color values of pixels normalized to the range [0, 1]. This is what you get from tf.image.decode_*() followed by tf.image.convert_image_dtype(..., tf.float32).
And this is indeed what I see quite often online:
image = tf.io.read_file(path)
# Decodes the image to W x H x 3 shape tensor with type of uint8
image = tf.io.decode_jpeg(image, channels=3)
# Resize the image for the model
image = tf.image.resize(image, [model_input_size, model_input_size])
# 1 x model_input_size x model_input_size x 3 tensor with the data type of float32
image = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
BUT the color values are expected to be in the range [0,1], and in this snippet they end up in [0,255]: tf.image.resize already returns float32, so the convert_image_dtype call after it performs no rescaling. The colors should be scaled down explicitly:
image = numpy.array(image) * (1. / 255)
Is it just a common mistake, or is the TF documentation not up to date?
I was also playing with models from tf.keras.applications and reading the source code on GitHub. I noticed that in some of the models (EfficientNet) the first layer is:
x = layers.Rescaling(1. / 255.)(x)
but in some models there is no such layer; instead a utility function rescales the colors, for example tf.keras.applications.mobilenet.preprocess_input (which actually maps them to [-1, 1]).
So, how important is it for TF Hub saved models that the image colors be in the [0,1] range?
This is just a convention TF Hub proposes: "Models for the same task are encouraged to implement a common API so that model consumers can easily exchange them without modifying the code that uses them, even if they come from different publishers" (from here).
As you've noted, the publisher of google/tf2-preview/inception_v3/feature_vector/4 decided that input images "are expected to have color values in the range [0,1]", while the publisher of tensorflow/efficientdet/d1/1 decided to add a Rescaling layer to the model itself such that "[a tensor] with values in [0, 255]" can be passed. So ultimately, it's up to the publisher how they implement their model. In any case, when using models from tfhub.dev, the expected preprocessing steps will always be documented on the model page.
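As a concrete illustration for the module quoted above, here is a preprocessing sketch that is guaranteed to land in [0,1]: convert the dtype while the values are still uint8, then resize (the file path is a placeholder):

import tensorflow as tf
import tensorflow_hub as hub

module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')

def preprocess(path, size=299):
    image = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image, channels=3)             # uint8, values in [0, 255]
    image = tf.image.convert_image_dtype(image, tf.float32)  # rescales to [0, 1]
    image = tf.image.resize(image, [size, size])             # resizing keeps the range
    return image[tf.newaxis, ...]                            # add a batch dimension

features = module(preprocess('example.jpg'))  # 'example.jpg' is a placeholder path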

How to resize elements in a ragged tensor in TensorFlow

I would like to resize every element in a ragged tensor. For example, if I have a ragged tensor of variously sized images, how can I resize each one so that the dimensions are the same?
For example,
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
resize_lambda = lambda x: tf.image.resize(x, (60,60))
res = tf.ragged.map_flat_values(resize_lambda, digits)
I wish res to be a tensor of shape (2,60,60,1). How can I achieve this?
To clarify, this would be useful if, within a custom layer, we wanted to slice or crop sections from a single image to batch for inference in the next layer. In my case, I am attempting to combine two models (a model to segment an image into multiple cropped images of varying size, and a classifier to predict each sub-image). I am also using TF 2.0.
You should be able to do the following:
import tensorflow as tf
import numpy as np

digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
res = tf.concat(
    [tf.image.resize(digits[i].to_tensor(), (60,60)) for i in tf.range(digits.nrows())],
    axis=0)
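As a quick sanity check under the same assumptions (eager execution, every element shaped (1, H, W, 1)):

print(res.shape)  # (2, 60, 60, 1), as requested
print(res.dtype)  # float32, since tf.image.resize returns floats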

Batch input to a certain layer in tensorflow

I'm working on a network based on Inception-v3. I trained the network successfully, and now I want to feed a batch of OpenCV images to the network and get some output.
The original placeholder of the network accepts a string and decodes it as a JPEG. But I read the video frames with OpenCV and convert them into a list of np arrays:
for cnt in range(batch_size):
    frameBuffer = []
    if (currentPosition >= nFrames):
        break
    ret, frame = vidFile.read()
    img_data = np.asarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frameBuffer.append(img_data)
    currentPosition += multiplier
If I want to work with a single image, because I read the frames directly from OpenCV, I convert it to an np array and then feed it to the "Cast:0" layer of the Inception network:
pred = sess.run([predictions], {'Cast:0': img_data})
The results are OK up to this point. But I want to feed a batch of frames, so I tried to use feed_dict in the following way:
images = tf.placeholder(tf.float32, [batch_size,width,height, 3])
image_batch = tf.stack(frameBuffer)
feed_dict = {images: image_batch}
avgRepresentation, pred = sess.run([pool_avg, predictions],{'Cast:0': feed_dict})
but I got errors; I know I have a mistake in feeding the batch. Do you have any suggestions on how I can feed a batch of images to a certain layer of a network?
There is (at least) one problem with your feed_dict: a feed_dict is typically a dictionary with tensors or strings (tensor names) as keys, and the values given as usual types (np arrays, etc.).
Here you're using {'Cast:0': feed_dict}, so the value of your dictionary is itself a dictionary, which makes no sense to TensorFlow. You need to put the values there, i.e. the concatenation of the images (decoded, converted, etc.). Also, sorry if I'm missing something, but I guess frameBuffer should contain all the images of the batch, so it should be initialized outside of the for loop.
This code should work:
frameBuffer = []
for cnt in range(batch_size):
    if (currentPosition >= nFrames):
        break
    ret, frame = vidFile.read()
    img_data = np.asarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frameBuffer.append(img_data)
    currentPosition += multiplier

avgRepresentation, pred = sess.run([pool_avg, predictions], {'Cast:0': np.asarray(frameBuffer)})

Tensorflow slim how to specify batch size during training

I'm trying to use the slim interface to create and train a convolutional neural network, but I can't figure out how to specify the batch size for training.
During training my net crashes with "Out of Memory" on my graphics card.
So I think there should be a way to handle this condition...
Do I have to split the data and labels into batches and loop explicitly, or does slim.learning.train take care of it?
In the code I paste, train_data is all the data in my training set (a NumPy array); the model definition is not included here.
I had a quick look through the sources but no luck so far...
g = tf.Graph()
with g.as_default():
    # Set up the data loading:
    images = train_data
    labels = tf.contrib.layers.one_hot_encoding(labels=train_labels, num_classes=num_classes)

    # Define the model:
    predictions = model7_2(images, num_classes, is_training=True)

    # Specify the loss function:
    slim.losses.softmax_cross_entropy(predictions, labels)
    total_loss = slim.losses.get_total_loss()
    tf.scalar_summary('losses/total loss', total_loss)

    # Specify the optimization scheme:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
    train_tensor = slim.learning.create_train_op(total_loss, optimizer)
    slim.learning.train(train_tensor,
                        train_log_dir,
                        number_of_steps=1000,
                        save_summaries_secs=300,
                        save_interval_secs=600)
Any hints or suggestions?
Edit:
I re-read the documentation... and I found this example:
image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)
But it's not at all clear how image and label are meant to be fed to tf.train.batch, since the MyPascalVocDataLoader function is not specified...
In my case my dataset is loaded from an SQLite database, and I have the training data and labels as NumPy arrays... still confused.
Of course I tried to pass my NumPy arrays (converted to constant tensors) to tf.train.batch, like this:
image = tf.constant(train_data)
label = tf.contrib.layers.one_hot_encoding(labels=train_labels, num_classes=num_classes)
images, labels = tf.train.batch([image, label], batch_size=32)
But this seems not to be the right path to follow... it seems that tf.train.batch wants only a single element from my dataset... (how would I pass that? it doesn't make sense to me to pass only train_data[0] and train_labels[0])
Here you can create TFRecords, which is a binary file format used by TensorFlow. As you mentioned, you have the training images and labels, so you can easily create TFRecords for training and validation.
After creating the TFRecords, all you need to do is decode the images from the encoded TFRecords and feed them to your model input. There you can select the batch size and so on.
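A minimal sketch of that round trip, written against the same TF 1.x-era API the question uses; train_data, train_labels, height and width come from the question's setup, while the file name and feature keys are illustrative:

import numpy as np
import tensorflow as tf

# Write one tf.train.Example per (image, label) pair
with tf.python_io.TFRecordWriter('train.tfrecords') as writer:
    for img, lbl in zip(train_data, train_labels):
        example = tf.train.Example(features=tf.train.Features(feature={
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(
                value=[img.astype(np.uint8).tobytes()])),
            'label': tf.train.Feature(int64_list=tf.train.Int64List(
                value=[int(lbl)])),
        }))
        writer.write(example.SerializeToString())

# Read back: decode each record and let tf.data handle the batching
def parse(serialized):
    feats = tf.parse_single_example(serialized, {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.decode_raw(feats['image'], tf.uint8)
    image = tf.reshape(image, [height, width, 3])  # height/width of your images
    return tf.cast(image, tf.float32), feats['label']

dataset = tf.data.TFRecordDataset('train.tfrecords').map(parse).batch(32)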