How to convert an image to a 4D Tensor (1, 150, 80, 1) [batch_size, width, height, channels]?
The model I am training, following the tutorial, receives 16 images (16, 150, 80, 1).
https://keras.io/api/layers/preprocessing_layers/categorical/string_lookup/
But I want to try using a single image.
# 1. Read image
img = tf.io.read_file(img_path)
# 2. Decode and convert to grayscale
img = tf.io.decode_jpeg(img, channels=1)
# 3. Convert to float32 in [0, 1] range
img = tf.image.convert_image_dtype(img, tf.float32)
# 4. Resize to the desired size
img = tf.image.resize(img, [80, 150])
# 5. Transpose the image because we want the time
# dimension to correspond to the width of the image.
img = tf.transpose(img, perm=[1, 0, 2])
# 6. Convert
# ...
Thanks @Innat for the answer and @Vipz for confirming the solution worked. Adding the comment in the answer section for the community's benefit.
Use tf.expand_dims(image, axis=0) to convert [h, w, c] to [1, h, w, c].
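Putting the preprocessing steps above and the suggested call together, a minimal sketch (assuming img_path points to a valid JPEG) that produces the (1, 150, 80, 1) tensor:
import tensorflow as tf

img = tf.io.read_file(img_path)
img = tf.io.decode_jpeg(img, channels=1)
img = tf.image.convert_image_dtype(img, tf.float32)
img = tf.image.resize(img, [80, 150])
img = tf.transpose(img, perm=[1, 0, 2])  # (150, 80, 1)
img = tf.expand_dims(img, axis=0)        # (1, 150, 80, 1)
print(img.shape)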
I am fairly new to TensorFlow and I have a tflite model which needs inference on a single image (i.e. no datasets). The docs say the input should be 224, 224, 3 and scaled to [0, 1] (https://www.tensorflow.org/lite/tutorials/model_maker_image_classification#advanced_usage), but I am having trouble doing this rescaling to [0, 1].
Currently I have something like so:
img = tf.io.read_file(image_path)
img = tf.io.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.uint8)
print('min max img value',tf.reduce_min(img),tf.reduce_max(img))
The min and max are 0 and 255, respectively. I would like to scale this to [0, 1].
I am on TF 2.5 and I do not see a built-in method to do this.
I tried doing this:
img = tf.io.read_file(image_path)
img = tf.io.decode_image(img, channels=3)
scale=1./255
img=img*scale
img = tf.image.convert_image_dtype(img, tf.uint8)
print('min max img value',tf.reduce_min(img),tf.reduce_max(img))
and I get thrown:
TypeError: Cannot convert 0.00392156862745098 to EagerTensor of dtype uint8
I think there is some casting error :(
In order to avoid the
TypeError: Cannot convert 0.00392156862745098 to EagerTensor of dtype uint8
error, we have to cast img from tf.uint8 to tf.float32 before scaling:
img = tf.cast(img, dtype=tf.float32) / tf.constant(255.0, dtype=tf.float32)
print('min max img value', tf.reduce_min(img), tf.reduce_max(img))
Converting an image tensor that is in tf.float32 and normalized to the [0, 1] scale back to tf.uint8 is probably not a good idea.
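As a side note, a minimal alternative sketch: tf.image.convert_image_dtype does the [0, 1] scaling itself when converting from an integer dtype to a float dtype, so the manual division can be skipped (image_path is assumed to exist):
import tensorflow as tf

img = tf.io.read_file(image_path)
img = tf.io.decode_image(img, channels=3)            # uint8 in [0, 255]
img = tf.image.convert_image_dtype(img, tf.float32)  # float32 in [0, 1]
print('min max img value', tf.reduce_min(img), tf.reduce_max(img))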
I am trying to generate a random boolean mask sampled according to a predefined probability distribution. The probability distribution is stored in a tensor of the same shape as the resulting mask. Each entry contains the probability that the mask will be true at that particular location.
In short I am looking for a function that takes 4 inputs:
pdf: A tensor to use as a PDF
s: The number of samples per mask
n: The total number of masks to generate
replace: A boolean indicating if sampling should be done with replacement
and returns n boolean masks
A simplified way to do this using numpy would look like this:
def sample_mask(pdf, s, replace):
    height, width = pdf.shape
    # Flatten to 1 dimension
    pdf = np.resize(pdf, (height * width))
    # Sample according to pdf, the result is an array of indices
    samples = np.random.choice(np.arange(height * width),
                               size=s, replace=replace, p=pdf)
    mask = np.zeros(height * width)
    # Apply indices to mask
    for idx in samples:
        mask[idx] = 1
    # Resize back to the original shape
    mask = np.resize(mask, (height, width))
    return mask
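Hypothetical usage of this numpy version (with numpy imported as np), drawing 3 samples without replacement from a uniform 4x4 PDF:
pdf = np.full((4, 4), 1.0 / 16)
print(sample_mask(pdf, 3, replace=False))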
I already figured out that the sampling part, without the replace parameter, can be done like this:
samples = tf.multinomial(tf.log(pdf_tensor), n)
But I am stuck when it comes to transforming the samples to a mask.
I must have been sleeping, here is how I solved it:
def sample_mask(pdf, s, n, replace):
    """Generate random boolean masks sampled according to a PDF.
    Args:
        pdf: A 4D Tensor of shape (batch_size, height, width, channels=1) to use as a PDF
        s: The number of samples per mask. This value should be less than height*width
        n: The total number of masks to generate
        replace: A boolean indicating if sampling should be done with replacement
    Returns:
        A Tensor of shape (batch_size, height, width, channels=1, n) containing
        values 1 or 0.
    """
    batch_size, height, width, channels = pdf.shape
    # Flatten pdf
    pdf = tf.reshape(pdf, (batch_size, height * width))
    if replace:
        # Sample with replacement. Output is a tensor of shape (batch_size, s)
        sample_fun = lambda: tf.multinomial(tf.log(pdf), s)
    else:
        # Sample without replacement. Output is a tensor of shape (batch_size, s).
        # Cast the output to 'int64' to match the type needed for SparseTensor's indices
        sample_fun = lambda: tf.cast(sample_without_replacement(tf.log(pdf), s), dtype='int64')
    # Create batch indices
    idx = tf.range(batch_size, dtype='int64')
    idx = tf.expand_dims(idx, 1)
    # Transform idx to a 2D tensor of shape (batch_size, samples_per_batch)
    # Example: [[0 0 0 0 0], [1 1 1 1 1], [2 2 2 2 2]]
    idx = tf.tile(idx, [1, s])
    mask_list = []
    for i in range(n):
        # Generate samples
        samples = sample_fun()
        # Combine batch indices and samples
        samples = tf.stack([idx, samples])
        # Transform samples to a list of indices: (batch_index, sample_index)
        sample_indices = tf.transpose(tf.reshape(samples, [2, -1]))
        # Create the mask as a sparse tensor and set sampled indices to 1
        mask = tf.SparseTensor(indices=sample_indices, values=tf.ones(s * batch_size),
                               dense_shape=[batch_size, height * width])
        # Convert mask to a dense tensor. Non-sampled values are set to 0.
        # Don't validate the indices, since this requires indices to be ordered
        # and unique.
        mask = tf.sparse.to_dense(mask, default_value=0, validate_indices=False)
        # Reshape to input shape and append to list of tensors
        mask_list.append(tf.reshape(mask, [batch_size, height, width, channels]))
    # Combine all masks into a tensor of shape:
    # (batch_size, height, width, channels=1, number_of_masks)
    return tf.stack(mask_list, axis=-1)
Function for sampling without replacement as proposed here: https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125
It uses the Gumbel-max trick: https://timvieira.github.io/blog/post/2014/07/31/gumbel-max-trick/
def sample_without_replacement(logits, K):
    # Add Gumbel noise to the logits and take the top K entries, which is
    # equivalent to sampling K indices without replacement (Gumbel-max trick).
    z = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits), 0, 1)))
    _, indices = tf.nn.top_k(logits + z, K)
    return indices
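A hypothetical usage sketch (TF 1.x graph mode assumed, since the answer uses tf.multinomial and tf.log): draw 4 masks of 10 samples each, without replacement, from a uniform PDF over an 8x8 grid with a batch of 2:
import numpy as np
import tensorflow as tf

pdf = tf.constant(np.full((2, 8, 8, 1), 1.0 / 64), dtype=tf.float32)
masks = sample_mask(pdf, s=10, n=4, replace=False)

with tf.Session() as sess:
    print(sess.run(masks).shape)  # (2, 8, 8, 1, 4)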
The image below shows the output of a single intermediate filter layer of a CNN, before (LHS of the arrow) and after (RHS of the arrow) a max-pooling layer.
I want to store the coordinates of the pixel with intensity 4 (on the bottom right of the matrix on the LHS of the arrow) as they are in that matrix. That is, the pixel at coordinate (4, 4) (1-based indexing) in the left matrix is the one that gets stored in the bottom-right cell of the matrix on the RHS of the arrow. Now what I want to do is store this coordinate value (4, 4), along with the coordinates of the other pixels {(2, 2) for the pixel with intensity 6, (2, 4) for the pixel with intensity 8 and (3, 1) for the pixel with intensity 3}, as a list for later processing. How do I do this in TensorFlow?
Max pooling done with a filter of size 2 x 2 and stride of 2
You can use tf.nn.max_pool_with_argmax (link).
Note:
The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c.
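For example, with height = 6, width = 4, channels = 1 and b = c = 0, a maximum at position [0, 1, 3, 0] gets the flattened index ((0 * 6 + 1) * 4 + 3) * 1 + 0 = 7.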
We need to do some processing to make it fit your coordinates.
An example:
import tensorflow as tf
import numpy as np
def max_pool_with_argmax(net, filter_h, filter_w, stride):
    output, mask = tf.nn.max_pool_with_argmax(net, ksize=[1, filter_h, filter_w, 1],
                                              strides=[1, stride, stride, 1], padding='SAME')
    # If your ksize looks like [1, stride, stride, 1]
    # (with batch_size == 1 and channels == 1 the flattened index is y*width + x,
    # so loc_x below is the row index and loc_y the column index)
    loc_x = mask // net.shape[2]
    loc_y = mask % net.shape[2]
    loc = tf.concat([loc_x + 1, loc_y + 1], axis=-1)  # count from 0 so add 1
    # If your ksize is all changing, use the following
    # c = tf.mod(mask, net.shape[3])
    # remain = tf.cast(tf.divide(tf.subtract(mask, c), net.shape[3]), tf.int64)
    # x = tf.mod(remain, net.shape[2])
    # remain = tf.cast(tf.divide(tf.subtract(remain, x), net.shape[2]), tf.int64)
    # y = tf.mod(remain, net.shape[1])
    # remain = tf.cast(tf.divide(tf.subtract(remain, y), net.shape[1]), tf.int64)
    # b = tf.mod(remain, net.shape[0])
    # loc = tf.concat([y + 1, x + 1], axis=-1)
    return output, loc
input = tf.Variable(np.random.rand(1, 6, 4, 1), dtype=np.float32)
output, mask = max_pool_with_argmax(input, 2, 2, 2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    input_value, output_value, mask_value = sess.run([input, output, mask])
    print(input_value[0, :, :, 0])
    print(output_value[0, :, :, 0])
    print(mask_value[0, :, :, :])
#print
[[0.20101677 0.09207255 0.32177696 0.34424785]
[0.4116488 0.5965447 0.20575707 0.63288754]
[0.3145412 0.16090539 0.59698933 0.709239 ]
[0.00252096 0.18027237 0.11163216 0.40613824]
[0.4027637 0.1995668 0.7462126 0.68812144]
[0.8993007 0.55828506 0.5263306 0.09376772]]
[[0.5965447 0.63288754]
[0.3145412 0.709239 ]
[0.8993007 0.7462126 ]]
[[[2 2]
[2 4]]
[[3 1]
[3 4]]
[[6 1]
[5 3]]]
You can see (2,2) for pixel with intensity 0.5965447, (2, 4) for pixel with intensity 0.63288754 and so on.
Let's say you have the following max-pooling layer:
pool_layer = tf.nn.max_pool(conv_output,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1],
                            padding='VALID')
you can use:
max_pos = tf.gradients([pool_layer], [conv_output])[0]
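A minimal sketch of what this gradient trick returns (TF 1.x graph mode assumed): the gradient of the pooled output with respect to its input is 1 exactly at the positions that were selected as maxima, so max_pos acts as a 0/1 mask of the argmax locations:
import numpy as np
import tensorflow as tf

conv_output = tf.constant(np.random.rand(1, 4, 4, 1), dtype=tf.float32)
pool_layer = tf.nn.max_pool(conv_output, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='VALID')
max_pos = tf.gradients([pool_layer], [conv_output])[0]

with tf.Session() as sess:
    print(sess.run(max_pos)[0, :, :, 0])  # 1 at each pooled maximum, 0 elsewhere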
I run the code below and it raises a ValueError: 'images' contains no shape. Therefore I have to add the commented-out line to set the static shape, but img_raw may have different shapes, and fixing the shape that way defeats the purpose of tf.image.resize_images.
I just want to turn images with different shapes into [227, 227, 3]. How should I do that?
def tf_read(file_queue):
    reader = tf.WholeFileReader()
    file_name, content = reader.read(file_queue)
    img_raw = tf.image.decode_image(content, 3)
    # img_raw.set_shape([227, 227, 3])
    img_resized = tf.image.resize_images(img_raw, [227, 227])
    img_shape = tf.shape(img_resized)
    return file_name, img_resized, img_shape
The issue here actually comes from the fact that tf.image.decode_image doesn't return the shape of the image. This was explained in these two GitHub issues: issue1, issue2.
The problem comes from the fact that tf.image.decode_image also handles .gif, which returns a 4D tensor, whereas .jpg and .png return 3D images. Therefore, the correct shape cannot be returned.
The solution is to simply use tf.image.decode_jpeg or tf.image.decode_png (both work the same and can be used on .png and .jpg images).
def _decode_image(filename):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    image_resized = tf.image.resize_images(image, [224, 224])
    return image_resized
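A hypothetical usage sketch with the tf.data API (the file paths are placeholders), to show where _decode_image would typically be applied:
filenames = ['./dog1.jpg', './cat1.jpg']  # assumed paths
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(_decode_image)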
No, tf.image.resize_images can handle dynamic shapes:
file_queue = tf.train.string_input_producer(['./dog1.jpg'])
# shape of dog1.jpg is (720, 720)
reader = tf.WholeFileReader()
file_name, content = reader.read(file_queue)
img_raw = tf.image.decode_jpeg(content, 3)  # size (?, ?, 3) <= dynamic h and w
# img_raw.set_shape([227, 227, 3])
img_resized = tf.image.resize_images(img_raw, [227, 227])
img_shape = tf.shape(img_resized)

with tf.Session() as sess:
    tf.train.start_queue_runners(sess)  # start the file queue so eval() does not block
    print(img_shape.eval())  # [227, 227, 3]
BTW, I am using tf v0.12, and there is no function called tf.image.decode_image, but I don't think it is important
Of course you can use a tensor object as the size input for tf.image.resize_images.
So, by saying "turn images with different shapes to [227, 227, 3]", I suppose you don't want to lose their aspect ratio, right? To achieve this, you have to rescale the input image first, then pad the rest with zeros.
It should be noted, though, that you should consider performing image distortion and standardization before padding.
# Rescale so that one side of the image fits the corresponding side of the box,
# then pad the rest with zeros.
# target height is 227
# target width is 227
image = a_image_tensor_you_read
shape = tf.shape(image)
img_h = shape[0]
img_w = shape[1]
box_h = tf.convert_to_tensor(target_height)
box_w = tf.convert_to_tensor(target_width)
img_ratio = tf.cast(tf.divide(img_h, img_w), tf.float32)
aim_ratio = tf.cast(tf.divide(box_h, box_w), tf.float32)
# If the image is relatively taller than the box, fit the height and scale the
# width by the same factor (scaled width = img_w * box_h / img_h); otherwise
# fit the width and scale the height accordingly.
aim_h, aim_w = tf.cond(tf.greater(img_ratio, aim_ratio),
                       lambda: (box_h,
                                tf.cast(tf.divide(img_w * box_h, img_h), tf.int32)),
                       lambda: (tf.cast(tf.divide(img_h * box_w, img_w), tf.int32),
                                box_w))
image_resize = tf.image.resize_images(image, tf.cast([aim_h, aim_w], tf.int32), align_corners=True)

# Perform image standardization and distortion
image_standardized_distorted = blablabla

image_padded = tf.image.resize_image_with_crop_or_pad(image_standardized_distorted, box_h, box_w)

return image_padded
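As an aside (not part of the original answer): newer TensorFlow versions provide tf.image.resize_with_pad, which performs the same resize-then-pad in a single call:
image_padded = tf.image.resize_with_pad(image, target_height, target_width)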
While reading a TensorFlow implementation of the VGG model, I noticed that the author performs some scaling operations on the input RGB images, such as the following. I have two questions: what does VGG_MEAN
mean and how is it obtained? Secondly, why do we need to subtract these mean values to get bgr?
VGG_MEAN = [103.939, 116.779, 123.68]

def build(self, rgb):
    """
    load variable from npy to build the VGG
    :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1]
    """
    start_time = time.time()
    print("build model started")
    rgb_scaled = rgb * 255.0
    # Convert RGB to BGR
    red, green, blue = tf.split(3, 3, rgb_scaled)
    assert red.get_shape().as_list()[1:] == [224, 224, 1]
    assert green.get_shape().as_list()[1:] == [224, 224, 1]
    assert blue.get_shape().as_list()[1:] == [224, 224, 1]
    bgr = tf.concat(3, [
        blue - VGG_MEAN[0],
        green - VGG_MEAN[1],
        red - VGG_MEAN[2],
    ])
    assert bgr.get_shape().as_list()[1:] == [224, 224, 3]
First off: the OpenCV code you'd use to convert RGB to BGR is:
from cv2 import cvtColor, COLOR_RGB2BGR
img = cvtColor(img, COLOR_RGB2BGR)
In your code, the code that does this is:
bgr = tf.concat(3, [
blue - VGG_MEAN[0],
green - VGG_MEAN[1],
red - VGG_MEAN[2],
])
Images aren't [Height x Width] matrices, they're [H x W x C] cubes, where C is the color channel. In RGB to BGR, you're swapping the first and third channels.
Second: you don't subtract the mean to get BGR, you do this to normalize color channel values to center around the means -- so values will be in the range of, say, [-125, 130], rather than the range of [0, 255].
See: Subtract mean from image
I wrote a python script to get the BGR channel means over all images in a directory, which might be useful to you: https://github.com/ebigelow/save-deep/blob/master/get_mean.py
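As a small aside (my own sketch, not from the original answer): in TensorFlow the same RGB-to-BGR swap can be done by reversing the channel axis, and broadcasting then subtracts the per-channel means, which reproduces the split-and-concat snippet above:
bgr = tf.reverse(rgb_scaled, axis=[-1]) - VGG_MEAN  # swap channels 0 and 2, then subtract the means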
The mean values come from computing the per-channel average over the training data.
The RGB -> BGR conversion is an OpenCV convention issue.
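A minimal sketch of how such per-channel means could be computed (hypothetical images array of shape (N, H, W, 3) in BGR order, with values in [0, 255]):
import numpy as np

bgr_mean = images.mean(axis=(0, 1, 2))  # one mean per channel, roughly [104, 117, 124] for VGG's data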
The model is ported from Caffe, which I believe relies on OpenCV functionalities and uses the OpenCV convention of BGR channels.