Using saved models from TF hub to extract feature vectors - tensorflow

I've been playing with different models from TF Hub to extract feature vectors:
module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')
features = module(image)
What I don't quite understand is how the input image should be preprocessed.
Every model from the hub has this generic instruction:
The input images are expected to have color values in the range [0,1], following the common image input conventions. The expected size of the input images is height x width = 299 x 299 pixels by default, but other input sizes are possible (within limits).
where "common image input" is a link to a the following:
A signature that takes a batch of images as input accepts them as a dense 4-D tensor of dtype float32 and shape [batch_size, height, width, 3] whose elements are RGB color values of pixels normalized to the range [0, 1]. This is what you get from tf.image.decode_*() followed by tf.image.convert_image_dtype(..., tf.float32).
and this is indeed what I see quite often online:
image = tf.io.read_file(path)
# Decode the image to an H x W x 3 tensor of dtype uint8
image = tf.io.decode_jpeg(image, channels=3)
# Resize the image to the model's input size
image = tf.image.resize(image, [model_input_size, model_input_size])
# 1 x model_input_size x model_input_size x 3 tensor with the data type of float32
image = tf.image.convert_image_dtype(image, tf.float32)[tf.newaxis, ...]
BUT, the color values are expected to be in the range [0,1]; with this code they end up in the range [0,255] and have to be scaled down manually:
image = numpy.array(image) * (1. / 255)
Is this just a common mistake, or is the TF documentation not up to date?
I was also playing with models from tf.keras.applications and reading the source code on GitHub. I noticed that in some models (EfficientNet) the first layer is:
x = layers.Rescaling(1. / 255.)(x)
but in other models there is no such layer; instead a utility function handles the scaling, for example tf.keras.applications.mobilenet.preprocess_input.
So, how important is it for TF Hub saved models that the image colors are in the [0,1] range?

This is just a convention TF Hub proposes: "Models for the same task are encouraged to implement a common API so that model consumers can easily exchange them without modifying the code that uses them, even if they come from different publishers" (from here).
As you've noted, the publisher of google/tf2-preview/inception_v3/feature_vector/4 decided that input images "are expected to have color values in the range [0,1]", while the publisher of tensorflow/efficientdet/d1/1 decided to add a Rescaling layer to the model itself such that "[a tensor] with values in [0, 255]" can be passed. So ultimately, it's up to the publisher how they implement their model. In any case, when using models from tfhub.dev, the expected preprocessing steps will always be documented on the model page.
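For what it's worth, here is a minimal preprocessing sketch for the Inception V3 module (the 299x299 input size is the module's documented default; the file path is just a placeholder). The trick is to apply tf.image.convert_image_dtype before tf.image.resize: convert_image_dtype only rescales integer inputs, so if it runs after resize (which already returns float32 values in [0, 255]) it is a no-op and the values stay unscaled.
import tensorflow as tf
import tensorflow_hub as hub

MODEL_INPUT_SIZE = 299  # default input size for this module

def load_image(path):
    image = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image, channels=3)              # uint8, values in [0, 255]
    image = tf.image.convert_image_dtype(image, tf.float32)   # float32, values in [0, 1]
    image = tf.image.resize(image, [MODEL_INPUT_SIZE, MODEL_INPUT_SIZE])
    return image[tf.newaxis, ...]                              # shape [1, H, W, 3]

module = hub.load('https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4')
features = module(load_image('example.jpg'))                   # 'example.jpg' is a placeholder path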

Related

How to make image cube for 3d convolution with file path

Recently, I have been studying 3D convolution for video processing with TensorFlow.
I built a model following a tutorial blog post, but I want to use my own custom dataset. Each input image has shape (128,128,3), and I want to build an image cube of shape (128,128,100,3). I am using tf.data.Dataset and tried to write a map function from what I remembered from 2D convolution. I want to build each image cube inside a tf.data.Dataset map function from an array of paths of shape (number of image cubes, 100), because building the cubes with NumPy runs out of memory.
I tried code like the following:
def load_image(path):
    images = []
    for i, p in enumerate(path):
        image_string = tf.io.read_file(p)
        image = tf.io.decode_jpeg(image_string, channels=3)
        image = tf.reshape(image, [128, 128, 1, 3])
        image = image / 255
        images.append(image)
    image_block = tf.concat(images, axis=2)
    return image_block

train_data = tf.data.Dataset.from_tensor_slices(total_files)  # shape (1077, 100)
train_data = train_data.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
But I get an error saying that the tensor's shape changes. I also tried using tf.Variable with .assign, but got a similar error.
How can I build the 3D convolution's input image cube from file paths? I am using TensorFlow 2.0.
You cannot iterate over a tensor like for x in tensor. In that case you can, for example, iterate over a range and get each value by index:
for x in range(tf.shape(tensor)[0]):
    y = tensor[x]
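Inside a tf.data map function, another option is to avoid the Python loop entirely and let tf.map_fn decode the paths. A rough sketch under the question's assumptions (each dataset element is a 1-D tensor of 100 file paths, every JPEG is 128x128x3, and total_files is the (1077, 100) path array from the question):
import tensorflow as tf

def load_cube(paths):  # paths: shape (100,) string tensor
    def load_one(p):
        image = tf.io.decode_jpeg(tf.io.read_file(p), channels=3)
        image = tf.ensure_shape(image, [128, 128, 3])
        return tf.cast(image, tf.float32) / 255.0
    images = tf.map_fn(load_one, paths, dtype=tf.float32)  # shape (100, 128, 128, 3)
    return tf.transpose(images, [1, 2, 0, 3])              # shape (128, 128, 100, 3)

train_data = tf.data.Dataset.from_tensor_slices(total_files)
train_data = train_data.map(load_cube, num_parallel_calls=tf.data.experimental.AUTOTUNE)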

How to resize elements in a ragged tensor in TensorFlow

I would like to resize every element in a ragged tensor. For example, if I have a ragged tensor of various sized images, how can I resize each one so that the dimensions are the same?
For example,
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
resize_lambda = lambda x: tf.image.resize(x, (60,60))
res = tf.ragged.map_flat_values(resize_lambda, digits)
I wish res to be a tensor of shape (2,60,60,1). How can I achieve this?
To clarify, this would be useful if within a custom layer we wanted to slice or crop sections from a single image to batch for inference in the next layer. In my case, I am attempting to combine two models (a model to segment an image into multiple cropped images of varying size and a classifier to predict each sub-image). I am also using tf 2.0
You should be able to do the following.
import tensorflow as tf
import numpy as np

digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
res = tf.concat(
    [tf.image.resize(digits[i].to_tensor(), (60,60)) for i in tf.range(digits.nrows())],
    axis=0)
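A quick sanity check of the result when running eagerly in TF 2.x: the concatenation returns a regular dense tensor with the desired shape, so it can be passed straight to the next layer.
print(res.shape)                         # (2, 60, 60, 1)
print(isinstance(res, tf.RaggedTensor))  # False: res is a plain dense tensor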

Resnet50 image preprocessing

I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused when it comes to how to preprocess the images prior to passing them through the module.
According to the related GitHub explanation, the following should be done:
image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the aforementioned transformation, the results I am getting suggest that something might be wrong. Moreover, the ResNet paper says that the images should be preprocessed by:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
and I can't quite understand what that means. Can someone point me in the right direction?
Looking forward to your answers!
The image modules on TensorFlow Hub all expect pixel values in range [0,1], like you get in your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range that the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see its documentation), which uses a different input normalization convention than He et al. -- but all of this is taken care of for you.
To demystify the language in He et al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer expended this degree of attention to dataset-specific preprocessing.
The quotation from the ResNet paper you mentioned is based on the following explanation from the AlexNet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the ResNet paper, the similar process consists of taking a 224×224-pixel crop of the image (or of its horizontal flip) to ensure the network is given constant-sized inputs, and then centering it by subtracting the mean.
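Purely to illustrate that wording, here is a small sketch of the classic per-channel mean subtraction; the mean values are the commonly quoted ImageNet RGB means, and none of this is needed for the TF Hub module above, which only expects values in [0, 1].
import tensorflow as tf

IMAGENET_MEAN_RGB = tf.constant([123.68, 116.779, 103.939])  # commonly quoted ImageNet channel means

def center_image(image_uint8):
    # image_uint8: [H, W, 3] uint8 tensor with values in [0, 255]
    image = tf.cast(image_uint8, tf.float32)
    return image - IMAGENET_MEAN_RGB  # roughly zero-mean per channel, NOT in [0, 1]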

How to use the black/white image as the input to tensorflow

When implementing reinforcement learning with TensorFlow, the inputs are black/white images. Each pixel can be represented as a single bit, 1/0.
Can I give the data directly to TensorFlow, with each bit as a feature? Or do I have to expand the bits to bytes before sending them to TensorFlow? I'm new to TensorFlow, so a code example would be nice.
Thanks
You can load the image data directly as you normally would; the image being binary has no effect other than the input channel width becoming 1.
Whenever you put an image through a convnet, each output filter generally learns features across all the input channels, so in the case of a binary image there is a separate kernel defined for each input channel / output channel combination (here only 1 input channel) in the first layer.
Each layer is defined by its number of filters, and each filter has a 2D kernel per input channel, so you will have weights/parameters equal to input_channels * number_of_filters * filter_dims; for the first layer, input_channels is one.
Since you asked for some sample code: let your image batch be in a tensor X, then simply use
X_out = tf.layers.conv2d(X, filters=6, kernel_size=[height, width])  # kernel height/width of your choice
After that you can apply an activation; this will give your output image 6 channels. If you face any problems or have doubts, feel free to comment. For theoretical clarification, check out https://www.coursera.org/learn/convolutional-neural-networks/lecture/nsiuW/one-layer-of-a-convolutional-network
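To make the parameter count concrete, here is a small tf.keras sketch for a single-channel (binary) input with 6 filters; the 3x3 kernel size and 128x128 input are assumed values. The Conv2D layer ends up with input_channels * number_of_filters * filter_height * filter_width weights plus one bias per filter.
import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 1))                       # 1 channel for a binary image
conv = tf.keras.layers.Conv2D(filters=6, kernel_size=(3, 3))(inputs)
model = tf.keras.Model(inputs, conv)
model.summary()                                                    # Conv2D params: 3*3*1*6 + 6 = 60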
Edit
Since the question was about a simple neural net, not a convnet, here is the code for that.
X_train_orig is the variable in which the images are stored with (n_x, n_x) resolution; n_x is used later.
You will need to flatten the input.
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
This flattens each image into a row and then transposes the result so that each example becomes a column.
Then you will create the placeholder tensor X as:
X = tf.placeholder(tf.float32, [n_x*n_x, None])  # input dimension must match your input layer; cast the 0/1 pixels to float32 before feeding
Let W1 and b1 be the weight and bias respectively.
Z1 = tf.add(tf.matmul(W1,X),b1) #Linear Transformation step
A1 = tf.nn.relu(Z1) #Activation Step
And you keep on building your graph from there. I think that answers your question; if not, let me know.
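Here is a rough end-to-end sketch of the feeding step in the same TF 1.x style (the image side length and hidden-layer size are assumed values): the 0/1 pixels are simply cast to float32 before being fed, with no bit-to-byte expansion needed.
import numpy as np
import tensorflow as tf

n_x = 28                                              # assumed image side length
X = tf.placeholder(tf.float32, [n_x * n_x, None])
W1 = tf.get_variable("W1", [64, n_x * n_x])           # 64 hidden units, assumed
b1 = tf.get_variable("b1", [64, 1])
A1 = tf.nn.relu(tf.add(tf.matmul(W1, X), b1))

binary_batch = (np.random.rand(n_x * n_x, 32) > 0.5).astype(np.float32)  # 32 fake 0/1 examples
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    activations = sess.run(A1, feed_dict={X: binary_batch})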

Stick two gray images back to back in Tensorflow

I have two gray images and I want to stack them back-to-back along the third dimension to form a W x H x 2 image. The generated image is then labeled and fed to TensorFlow for training. How can I do this?
Thanks
Reshape image1 and image2 to be of shape (H, W, 1) and do
x = numpy.dstack((image1, image2))
to make x have shape (H, W, 2), which you can feed into TensorFlow as input.
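A minimal sketch of both options, assuming two grayscale arrays of shape (H, W) with H = W = 128; tf.stack does the same thing directly on tensors.
import numpy as np
import tensorflow as tf

image1 = np.zeros((128, 128), dtype=np.float32)   # placeholder grayscale images
image2 = np.ones((128, 128), dtype=np.float32)

x_np = np.dstack((image1, image2))                # shape (128, 128, 2)
x_tf = tf.stack([image1, image2], axis=-1)        # shape (128, 128, 2), TensorFlow equivalent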