I am using
train_data_gen = tf.keras.utils.image_dataset_from_directory(...)
AUTOTUNE = tf.data.AUTOTUNE
train_data_gen = train_data_gen.cache().prefetch(buffer_size=AUTOTUNE)
I want to apply multiple image transforms the way it is done in PyTorch:
# Define the transformation pipeline
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),  # Convert the image to a PyTorch tensor
    torchvision.transforms.Resize((256, 256)),  # Resize the image to 256x256
    torchvision.transforms.Pad(padding=0, fill=0.5, padding_mode='constant'),  # Add constant padding with fill value 0.5
    torchvision.transforms.RandomRotation(degrees=(-15, 15)),  # Randomly rotate the image between -15 and 15 degrees
    torchvision.transforms.RandomCrop(224),  # Randomly crop the image to the target input size
    torchvision.transforms.RandomHorizontalFlip(),  # Flip the image horizontally with a probability of 50%
    torchvision.transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),  # Randomly change the brightness, contrast, saturation, and hue
    torchvision.transforms.RandomGrayscale(p=0.2),  # Convert the image to grayscale with a probability of 20%
    torchvision.transforms.RandomErasing(p=0.2, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0.5),  # Randomly erase a rectangular region of the image
    torchvision.transforms.Resize((256, 256)),  # Resize the image back to 256x256
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize each channel to roughly [-1, 1]
])
So how can I do that? Can you show me how it's done, preferably with a code example?
Hey, are you aware of Albumentations?
The Albumentations library artificially creates training images through different kinds of processing, or combinations of multiple kinds of processing, such as random rotations, shifts, shears and flips.
You can refer to the link for code reference:
https://albumentations.ai/
Albumentations Tutorial Kernel Link
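For reference, here is a rough sketch (not an exact translation of the PyTorch pipeline; the transform arguments and batch size are assumptions) of plugging Albumentations into the tf.data pipeline from the question via tf.numpy_function:
import numpy as np
import tensorflow as tf
import albumentations as A

# Assumes the dataset yields unbatched HWC images, e.g.
# image_dataset_from_directory(..., batch_size=None)
transform = A.Compose([
    A.Resize(256, 256),
    A.Rotate(limit=15),
    A.RandomCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1, p=1.0),
    A.ToGray(p=0.2),
    A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

def aug_fn(image):
    # Albumentations operates on one numpy array at a time
    return transform(image=image)["image"].astype(np.float32)

def augment(image, label):
    image = tf.numpy_function(aug_fn, [tf.cast(image, tf.uint8)], tf.float32)
    image.set_shape([224, 224, 3])  # restore the static shape lost by numpy_function
    return image, label

train_data_gen = (train_data_gen
                  .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
                  .batch(32)
                  .prefetch(tf.data.AUTOTUNE))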
Related
This question is about ensuring that the input images at prediction time are in the same range as the images fed during training. I know that the usual practice is to repeat at prediction time the same steps that were used to process images during training. But in my case, I apply the random_transform() function inside a custom data generator during training, which doesn't make sense to add at prediction time.
import cv2
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
To simplify my problem, assume I'm doing the following changes to a grayscale image that I read in a custom data generator.
img_1 is an output of the data generator and is supposed to be the input to a VGG19 model.
# using a simple augmenter
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    brightness_range=(0.75, 1.25),
    preprocessing_function=tf.keras.applications.vgg19.preprocess_input  # preprocessing function of VGG19
)
# read the image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# apply a random transform
img_1 = augmenter.random_transform(img)/255
The above random_transform() has produced the following grayscale value distribution (values between [0, 1]):
plt.imshow(img_1); plt.show();
sns.histplot(img_1[:, :, 0].ravel()); # select the 0th layer and ravel because the augmenter stacks 3 layers of the grayscale image to make it an RGB image
Now I want to do the same at prediction time, but I don't want a random transform applied to the image, so I just pass the input image through the preprocessing_function():
# read image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# pass through the preprocessing function
img_2 = tf.keras.applications.vgg19.preprocess_input(img)/255
But I'm unable to get the input into the [0, 1] range the way it was during training.
plt.imshow(img_2); plt.show();
sns.histplot(img_2[:, :, 0].ravel());
This makes the predictions completely incorrect. How can I make sure that the inputs to the model at prediction time undergo the same steps, so that they end up with a distribution similar to that of the inputs fed during training? I don't want to add random_transform() at prediction time as well.
I would recommend adding per-image standardization inside your model. This ensures that each image has zero mean and unit standard deviation, both on your training set and at inference time.
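A minimal sketch of what that could look like with tf.image.per_image_standardization (the input shape and surrounding layers are placeholders):
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
# Standardization happens inside the model, so training and prediction
# inputs go through exactly the same scaling.
x = tf.keras.layers.Lambda(tf.image.per_image_standardization)(inputs)
x = tf.keras.layers.Conv2D(32, 3, activation='relu')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)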
I've been playing with different models from TF Hub to extract feature vectors:
import tensorflow_hub as hub

module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')
features = module(image)
What I don't quite understand is how the input image should be preprocessed.
Every model from the hub has this generic instruction:
The input images are expected to have color values in the range [0,1], following the common image input conventions. The expected size of the input images is height x width = 299 x 299 pixels by default, but other input sizes are possible (within limits).
where "common image input" is a link to the following:
A signature that takes a batch of images as input accepts them as a dense 4-D tensor of dtype float32 and shape [batch_size, height, width, 3] whose elements are RGB color values of pixels normalized to the range [0, 1]. This is what you get from tf.image.decode_*() followed by tf.image.convert_image_dtype(..., tf.float32).
and this is indeed what I see quite often online:
image = tf.io.read_file(path)
# Decode the image into an H x W x 3 tensor of dtype uint8
image = tf.io.decode_jpeg(image, channels=3)
# Resize the image for the model
image = tf.image.resize(image, [model_input_size, model_input_size])
# 1 x model_input_size x model_input_size x 3 tensor of dtype float32
image = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
BUT the color values are expected to be in the range [0,1], while in this case the colors are in the range [0,255] and should be scaled down:
image = numpy.array(image) * (1. / 255)
Is this just a common mistake, or is the TF documentation not up to date?
I was playing with models from tf.keras.applications and reading the source code on GitHub. I noticed that in some of the models (EfficientNet) the first layer is:
x = layers.Rescaling(1. / 255.)(x)
but in some models there is no such layer; instead, a utility function rescales the color values, for example tf.keras.applications.mobilenet.preprocess_input.
So, how important is it for TF Hub saved models that the image colors be in the [0,1] range?
This is just a convention TF Hub proposes: "Models for the same task are encouraged to implement a common API so that model consumers can easily exchange them without modifying the code that uses them, even if they come from different publishers" (from here).
As you've noted, the publisher of google/tf2-preview/inception_v3/feature_vector/4 decided that input images "are expected to have color values in the range [0,1]", while the publisher of tensorflow/efficientdet/d1/1 decided to add a Rescaling layer to the model itself such that "[a tensor] with values in [0, 255]" can be passed. So ultimately, it's up to the publisher how they implement their model. In any case, when using models from tfhub.dev, the expected preprocessing steps will always be documented on the model page.
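As a concrete illustration for the module in the question (the file path is hypothetical): tf.image.resize already returns float32, so tf.image.convert_image_dtype only rescales if it is applied before resizing; otherwise you have to divide by 255 yourself.
import tensorflow as tf
import tensorflow_hub as hub

module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')

image = tf.io.read_file('some_image.jpg')                 # hypothetical path
image = tf.io.decode_jpeg(image, channels=3)              # uint8, values in [0, 255]
image = tf.image.convert_image_dtype(image, tf.float32)   # float32, values in [0, 1]
image = tf.image.resize(image, [299, 299])                # resizing keeps the [0, 1] range
features = module(image[tf.newaxis, ...])                 # batch of one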
I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused when it comes to how to preprocess the images prior to passing them through the module.
Based on the related GitHub explanation, the following should be done:
image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the aforementioned transformations, the results I am getting suggest that something might be wrong. Moreover, the ResNet paper says that the images should be preprocessed as follows:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
which I can't quite understand. Can someone point me in the right direction?
Looking forward to your answers!
The image modules on TensorFlow Hub all expect pixel values in range [0,1], like you get in your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range that the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see documentation), which uses yet another convention for normalizing inputs than He&al. -- but all this is taken care of.
To demystify the language in He&al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer expended this degree of attention to dataset-specific preprocessing.
The citation from the ResNet paper you mentioned is based on the following explanation from the AlexNet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the ResNet paper, a similar process consists of taking a 224×224-pixel part of the image (or of its horizontally flipped version) to ensure the network is given constant-sized inputs, and then centering it by subtracting the mean.
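To make that concrete, here is a rough TF2-style sketch of the recipe (the per-channel mean values are a common stand-in for the per-pixel mean image the papers describe):
import tensorflow as tf

def preprocess(image):
    # image: uint8 tensor of shape [height, width, 3]
    shape = tf.cast(tf.shape(image)[:2], tf.float32)
    scale = 256.0 / tf.reduce_min(shape)                        # shorter side -> 256
    new_size = tf.cast(tf.round(shape * scale), tf.int32)
    image = tf.image.resize(image, new_size)
    image = tf.image.resize_with_crop_or_pad(image, 224, 224)   # central 224x224 crop
    mean = tf.constant([123.68, 116.78, 103.94])                # assumed per-channel ImageNet mean
    return image - mean                                         # zero-center the input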
I found many methods in tf.image to resize images, but almost all of them crop, pad, or interpolate. Can the methods with interpolation shrink images? I just want to shrink my images without cropping.
Thanks a lot!
Well, it seems you need tf.image.resize_images?
res = tf.image.resize_images(images, size, method=tf.image.ResizeMethod.BILINEAR, align_corners=False)
The default resize method is BILINEAR:
import numpy as np
import tensorflow as tf

val = np.random.rand(100, 70, 3)
x = tf.constant(val)
y = tf.image.resize_images(x, (30, 30))
with tf.Session() as sess:
    a = sess.run(y)  # a has shape (30, 30, 3)
At the end of the day, images are matrices/tensors. In order to shrink an image directly you will have to do some lossy compression, namely average or max pooling over groups of pixels (or matrix values), and then save them out as pre-processed images for your model. Try looking into the TensorFlow max-pool and average-pool functions:
tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
tf.nn.avg_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
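For instance, a rough usage sketch (the shapes are made up) that halves an image with average pooling:
import numpy as np
import tensorflow as tf

# A batch of one 100x70 RGB image in NHWC layout
img = np.random.rand(1, 100, 70, 3).astype(np.float32)
# 2x2 average pooling with stride 2 halves the spatial dimensions
small = tf.nn.avg_pool(img, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# small has shape (1, 50, 35, 3)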
Depending on the application, you may just want to "shrink" images using a model that "creates features", such as a Convolutional Neural Network (CNN), which you can then use for your model. However, you still have to process your original-sized images on every run.
Another, more advanced approach is to do some sort of decomposition, such as the SVD, to get a lower-dimensional representation of your images. However, most such methods are linear techniques and might not preserve everything exactly. Yet another option is to train an autoencoder and then save the lower-dimensional results as the pre-processed data.
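As a toy illustration of the decomposition idea (the rank value is arbitrary), a low-rank SVD approximation of a single grayscale image could look like this:
import tensorflow as tf

img = tf.random.uniform([256, 256])        # stand-in grayscale image
s, u, v = tf.linalg.svd(img)               # singular values and bases
k = 32                                     # keep only the top-k components
# Reconstruct a rank-k approximation: u[:, :k] @ diag(s[:k]) @ v[:, :k]^T
approx = tf.matmul(u[:, :k] * s[:k], v[:, :k], transpose_b=True)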
Hope this helps!
In the TensorFlow tutorial, example usage of TFRecords is provided with the MNIST dataset.
The MNIST dataset is converted to a TFRecords file like this:
def convert_to(data_set, name):
    images = data_set.images
    labels = data_set.labels
    num_examples = data_set.num_examples
    if images.shape[0] != num_examples:
        raise ValueError('Images size %d does not match label size %d.' %
                         (images.shape[0], num_examples))
    rows = images.shape[1]
    cols = images.shape[2]
    depth = images.shape[3]
    filename = os.path.join(FLAGS.directory, name + '.tfrecords')
    print('Writing', filename)
    writer = tf.python_io.TFRecordWriter(filename)
    for index in range(num_examples):
        image_raw = images[index].tostring()
        example = tf.train.Example(features=tf.train.Features(feature={
            'height': _int64_feature(rows),
            'width': _int64_feature(cols),
            'depth': _int64_feature(depth),
            'label': _int64_feature(int(labels[index])),
            'image_raw': _bytes_feature(image_raw)}))
        writer.write(example.SerializeToString())
    writer.close()
And then it is read and decoded like this:
def read_and_decode(filename_queue):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        # Defaults are not specified since both keys are required.
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64),
        })
    # Convert from a scalar string tensor (whose single string has
    # length mnist.IMAGE_PIXELS) to a uint8 tensor with shape
    # [mnist.IMAGE_PIXELS].
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image.set_shape([mnist.IMAGE_PIXELS])
    # OPTIONAL: Could reshape into a 28x28 image and apply distortions
    # here. Since we are not applying any distortions in this
    # example, and the next step expects the image to be flattened
    # into a vector, we don't bother.
    # Convert from [0, 255] -> [-0.5, 0.5] floats.
    image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
    # Convert label from a scalar uint8 tensor to an int32 scalar.
    label = tf.cast(features['label'], tf.int32)
    return image, label
Question: is there a way to read images of different sizes from TFRecords? Because at this point
image.set_shape([mnist.IMAGE_PIXELS])
all tensor sizes need to be known. This means I can't do something like
width = tf.cast(features['width'], tf.int32)
height = tf.cast(features['height'], tf.int32)
tf.reshape(image, [width, height, 3])
So how do I use TFRecords in this case?
Also, I can't understand why the tutorial authors save the height and width in the TFRecords file if they don't use them afterwards and instead use a predefined constant when reading and decoding the image.
For training in this particular case there is no reason to keep the width and height. However, since the images are serialized into a single byte stream, a future you might wonder what shape that data originally had instead of just seeing 784 bytes; essentially, they're creating self-contained examples.
As for differently sized images, you have to keep in mind that at some point you need to map your feature tensors to weights, and since the number of weights is fixed for a given network, the dimensions of the feature tensors have to be fixed as well. Another point to think about is data normalization: if you're using differently shaped images, do they have the same mean and variance? You might choose to ignore that point, but if you don't, you have to come up with a solution for it as well.
If you are just asking how to use images of a different size, i.e. 100x100x3 instead of 28x28x1, you can of course use
image.set_shape([100, 100, 3])
in order to reshape a single tensor of 30000 "elements" total to a single rank-3 tensor.
Or, if you are working with batches (of to-be-determined size), you might use
image_batch.set_shape([None, 100, 100, 3])
Note that this is not a list of tensors but a single rank 4 tensor and because of that all images in that batch have to have the same dimensions; i.e. having a 100x100x3 image followed by a 28x28x1 image in the same batch is not possible.
Before batching, though, you are free to have whatever size and shape you want, and you can also load the shapes from the records, which they did not do in the MNIST example. You might, for example, apply any of the image processing operations in order to obtain augmented images of fixed size for further processing.
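For example, a sketch of loading the shape from the record (reusing the feature names from convert_to() above; TF1-style to match the tutorial code) might look like this:
features = tf.parse_single_example(
    serialized_example,
    features={
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'depth': tf.FixedLenFeature([], tf.int64),
        'image_raw': tf.FixedLenFeature([], tf.string),
    })
image = tf.decode_raw(features['image_raw'], tf.uint8)
height = tf.cast(features['height'], tf.int32)
width = tf.cast(features['width'], tf.int32)
depth = tf.cast(features['depth'], tf.int32)
# tf.reshape accepts a dynamic shape, so every record may have its own size...
image = tf.reshape(image, tf.stack([height, width, depth]))
# ...but you still need a fixed size before batching, e.g. with
# tf.image.resize_image_with_crop_or_pad(image, 224, 224)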
Note also that the serialized representations of the images may indeed have different lengths and shapes. You may for example decide to store JPEG or PNG bytes instead of raw pixel values; they would obviously have different sizes.
Finally, there's tf.VarLenFeature() as well, but that creates SparseTensor representations, which are typically not what you want for (non-binary) image data.