OpenCV and PyTorch inverse transform not working - numpy

I have a transforms class which only does:
if transform is None:
    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor()
    ])
root = os.path.join(PROJECT_ROOT_DIR, "data")
super(AttributesDataset, self).__init__()
self.data = torchvision.datasets.CelebA(
    root=root,
    split=split,
    target_type='attr',
    download=True,
    transform=transform
)
From the documentation, I understand that this just scales the values down so that all pixel values lie in [0, 1] (I have verified this as well).
I want to visualize some of the outputs coming from the model, so I created a simple method which does:
for img, label in dataloader:
    img.squeeze_(0)
    # permute the channels; cv2 expects images in (h, w, c) format
    unscaled_img = img.permute(1, 2, 0)
    # rescale values from [0, 1] back to [0, 255] and convert to uint8
    unscaled_img = torch.round(unscaled_img * 255)
    unscaled_img = unscaled_img.to(torch.uint8)
    # unscaled_img = np.rint(unscaled_img * 255).astype(np.uint8)
    # move the image to the cpu and convert to numpy as required by cv2
    unscaled_img = unscaled_img.cpu().numpy()
    unscaled_img = cv2.cvtColor(unscaled_img, cv2.COLOR_RGB2BGR)
    cv2.imshow('sample', unscaled_img)
However, all the images that are displayed have an unusually blue shade.
Can someone please tell me what exactly I am doing wrong here? Your help would be highly appreciated.

Solved by @LajosArpad's comment. The culprit was
unscaled_img = cv2.cvtColor(unscaled_img, cv2.COLOR_RGB2BGR)
Removing it resulted in correct values.
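For reference, here is a minimal sketch of the corrected visualization loop (the same code as above minus the cvtColor call; the window name and the waitKey call are additions, since cv2.imshow needs both to display anything):

for img, label in dataloader:
    img.squeeze_(0)                                 # drop the batch dimension
    unscaled_img = img.permute(1, 2, 0)             # (c, h, w) -> (h, w, c)
    unscaled_img = torch.round(unscaled_img * 255).to(torch.uint8)
    cv2.imshow('sample', unscaled_img.numpy())      # no cv2.cvtColor call
    cv2.waitKey(0)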

Related

How to cut and paste a part of an image randomly to a different location using tensorflow?

I am trying to implement a custom layer in keras.layers where I want to do a custom image augmentation. My idea is to cut out a part of an image from a random location and paste it to a different random location in the same image. The code I have written below works well for PIL images, but when I integrate it into my final code (which is a tensorflow model), I get an error saying that a tensor doesn't support item assignment.
Below is the class that I have implemented:
class Cut_Paste(layers.Layer):
    def __init__(self, x_scale=10, y_scale=10, IMG_SIZE=(224, 224), **kwargs):
        super().__init__(**kwargs)
        """
        defining the x span and the y span of the box to cut out;
        x_scale and y_scale are taken as inputs as % of the width and height
        of the image size
        """
        self.size_x, self.size_y = IMG_SIZE
        self.span_x = int(x_scale * self.size_x * 0.01)
        self.span_y = int(y_scale * self.size_y * 0.01)

    #getting the vertices for cut and paste
    def get_vertices(self):
        #determining random points for cut and paste
        """ since the images in the dataset have the object of interest in the
        center of the image, the cutout will be taken from the central 25% of
        the image """
        fraction = 0.25
        vert_x = random.randint(int(self.size_x * 0.5 * (1 - fraction)),
                                int(self.size_x * 0.5 * (1 + fraction)))
        vert_y = random.randint(int(self.size_y * 0.5 * (1 - fraction)),
                                int(self.size_y * 0.5 * (1 + fraction)))
        start_x = int(vert_x - self.span_x / 2)
        start_y = int(vert_y - self.span_y / 2)
        end_x = int(vert_x + self.span_x / 2)
        end_y = int(vert_y + self.span_y / 2)
        return start_x, start_y, end_x, end_y

    def call(self, image):
        #getting random vertices for cutting
        cut_start_x, cut_start_y, cut_end_x, cut_end_y = self.get_vertices()
        #getting the cutout as a sub-image
        #image = tf.Variable(image)
        sub_image = image[cut_start_x:cut_end_x, cut_start_y:cut_end_y, :]
        #getting random vertices for pasting
        paste_start_x, paste_start_y, paste_end_x, paste_end_y = self.get_vertices()
        #replacing a part of the image at a random location with sub_image
        image[paste_start_x:paste_end_x,
              paste_start_y:paste_end_y, :] = sub_image
        return image
I am calling it from my model class this way:
class Contrastive_learning_model(keras.Model):
    def __init__(self):
        super().__init__()
        self.cut_paste = Cut_Paste(**cut_paste_augmentation)

    def train_step(self, data):
        augmented_images_2 = self.cut_paste.call(images)
I have removed the part of the code which is irrelevant. But upon executing this is the error I get:
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment
I understood from other sources that it is not possible to do item assignment on a tensor. So here I am seeking help to do this in an easier way. I need to use tensors for this. Any help will be much appreciated.
TensorFlow does not support item assignment, unlike PyTorch.
One workaround is to convert the tensor to a tf.Variable and then to a numpy array, which does support item assignment:
image = tf.Variable(image).numpy()
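If you need to stay in pure TensorFlow ops (for example inside a traced train_step), here is a sketch of the paste step using tf.tensor_scatter_nd_update in place of item assignment; paste_patch is a hypothetical helper, not part of the original code:

import tensorflow as tf

def paste_patch(image, patch, start_x, start_y):
    # build the (row, col) index of every pixel in the paste region
    h, w = patch.shape[0], patch.shape[1]
    rows = tf.range(start_x, start_x + h)
    cols = tf.range(start_y, start_y + w)
    grid_r, grid_c = tf.meshgrid(rows, cols, indexing='ij')
    indices = tf.reshape(tf.stack([grid_r, grid_c], axis=-1), [-1, 2])
    updates = tf.reshape(patch, [-1, patch.shape[-1]])
    # functional equivalent of image[start_x:start_x+h, start_y:start_y+w, :] = patch
    return tf.tensor_scatter_nd_update(image, indices, updates)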

Input array must have a shape == (..., 3)), got (299, 299, 4)

I am using a pretrained resnet50 model to validate some classes, and I am using LIME to test how the model handles this data. However, some of the images are not RGB and may be in different formats: I noticed that RGB arrays have 3 channels, while some of my images have a different number (like 4). I am using skimage to preprocess the images and test them with LIME. Any suggestions on how I can fix this with skimage and tensorflow? I am using pandas dataframes to collect the images, and train and test generators to see if the model is able to guess correctly.
code:
def transform_img_fn_ori(url):
    img = skimage.io.imread(url)
    img = skimage.transform.resize(img, (299, 299))
    img = (img - 0.5) * 2
    img = np.expand_dims(img, axis=0)
    return img
url="" #this is a path on pc
images=transform_img_fn_ori(url)
explanation= explainer.explain_instance(images[0].astype('double'), model.predict, top_labels=3, hide_color=0, num_samples=1000)
temp_1, mask_1 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15,15))
ax1.imshow(mark_boundaries(temp_1, mask_1))
ax2.imshow(mark_boundaries(temp_2, mask_2))
ax1.axis('off')
ax2.axis('off')
Your model expects RGB images, but your url may point to non-RGB images.
In this situation the best approach is to make sure images are read in RGB. For instance, OpenCV always reads images in BGR by default.
With skimage, you can't be sure of the format being read: it can be grayscale, RGB or RGBA, according to the docs.
In addition, skimage doesn't provide a single method to convert any image to RGB, like the convert method in Pillow. So you need to detect your color mode and convert it to RGB yourself:
img = skimage.io.imread(url)
if img.ndim == 2 or (img.ndim == 3 and img.shape[2] == 1):
    # your image is grayscale
    img = skimage.color.gray2rgb(img)
elif img.ndim == 3 and img.shape[2] == 4:
    # your image is RGBA
    img = skimage.color.rgba2rgb(img)
else:
    # your image should already be RGB
    assert img.ndim == 3 and img.shape[2] == 3
The last assert is to make sure everything is ok.
Finally, and probably not your case, images may contain any number of channels and use color spaces other than RGB. That's why I don't like skimage and prefer OpenCV. So, whatever method you use to read images, check the docs to make sure what it returns: in some cases it is impossible to distinguish, for instance between RGB and BGR.
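For comparison, here is a minimal sketch of the Pillow route mentioned above, where a single convert call handles grayscale, RGBA and palette images alike (assuming Pillow is installed):

from PIL import Image
import numpy as np

# Pillow normalizes any input mode to RGB in one call
img = np.asarray(Image.open(url).convert('RGB'))  # shape (H, W, 3)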

Receiving the same (not random) augmentations of image dataset

dataset = tf.data.Dataset.range(1, 6)

def aug(y):
    x = np.random.uniform(0, 1)
    if x > 0.5:
        y = 100
    return y

dataset = dataset.map(aug)
print(list(dataset))
Run this code, and either all the elements in the dataset are left as they were, or all of them are equal to 100. How do I make it so each element is individually transformed?
My more specific question below is basically asking this.
I create my segmentation training set by:
dataset = tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
I then apply my augmentation function to the dataset:
def augment(image_path, mask_path):
    # use tf.io.read_file and tf.io.decode_jpeg to convert the paths to image and mask tensors
    x = np.random.choice([0, 1])
    if x == 1:
        image = tf.image.flip_up_down(image)
        mask = tf.image.flip_up_down(mask)
    return image, mask
training_dataset = dataset.map(augment)
BATCH_SIZE=2
training_dataset = training_dataset.shuffle(100, reshuffle_each_iteration=True)
training_dataset = training_dataset.batch(BATCH_SIZE)
training_dataset = training_dataset.repeat()
training_dataset = training_dataset.prefetch(-1)
However, when I visualise my training dataset, all the images have the same flip applied: they are all either flipped upside down, or none of them are. Whereas I'm expecting them to have different flips, some upside down and some not.
Why is this happening?
You need to use TensorFlow operations (not numpy or plain Python) because tf.data.Dataset.map() executes the mapped function as a graph. When converting a function to a graph, numpy and plain Python code are converted to constants, so the augmentation function only runs np.random.uniform(0, 1) once and stores the result as a constant.
Note that irrespective of the context in which map_func is defined (eager vs. graph), tf.data traces the function and executes it as a graph.
The source for the above is the tf.data.Dataset.map documentation.
One solution is to use tensorflow operations. I have included an example below. Note that the y value in the if has to be cast to the same dtype as the input.
dataset = tf.data.Dataset.range(1, 6)

def aug(y):
    x = tf.random.uniform([], 0, 1)
    if x > 0.5:
        y = tf.cast(100, y.dtype)
    return y

dataset = dataset.map(aug)
print(list(dataset))
You can use a uniform random function or another probability distribution:
tf.random.uniform(
    shape, minval=0, maxval=None, dtype=tf.dtypes.float32, seed=None, name=None
)
You can even use a prebuilt method in TensorFlow or Keras for flipping:
tf.keras.layers.experimental.preprocessing.RandomFlip(
    mode=HORIZONTAL_AND_VERTICAL, seed=None, name=None, **kwargs
)
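Putting this together for the segmentation pipeline in the question, here is a sketch of a per-element augment function using only TF ops (assuming JPEG-encoded images and masks; AutoGraph turns the if on a tensor into a graph conditional):

def augment(image_path, mask_path):
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    mask = tf.io.decode_jpeg(tf.io.read_file(mask_path), channels=1)
    # tf.random.uniform is re-executed for every element of the dataset
    if tf.random.uniform([], 0, 1) > 0.5:
        image = tf.image.flip_up_down(image)
        mask = tf.image.flip_up_down(mask)
    return image, mask

training_dataset = dataset.map(augment)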

About use tf.image.crop_and_resize

I'm working on the ROI pooling layer used in Fast R-CNN, and I usually use tensorflow. I found that tf.image.crop_and_resize can act as the ROI pooling layer.
But I have tried many times and cannot get the result that I expected. Or is the true result exactly what I got?
Here is my code:
import cv2
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

img_path = r'F:\IMG_0016.JPG'
img = cv2.imread(img_path)  # img shape is [580, 580, 3]
img = img.reshape([1, 580, 580, 3])
img = img.astype(np.float32)
#img = np.concatenate([img,img],axis=0)
img_ = tf.Variable(img)
boxes = tf.Variable([[100, 100, 300, 300], [0.5, 0.1, 0.9, 0.5]])
box_ind = tf.Variable([0, 0])
crop_size = tf.Variable([100, 100])
#b = tf.image.crop_and_resize(img,[[0.5,0.1,0.9,0.5]],[0],[50,50])
c = tf.image.crop_and_resize(img_, boxes, box_ind, crop_size)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
a = c.eval(session=sess)
plt.imshow(a[0])
plt.imshow(a[1])
I passed in my original image and got results a0 and a1 (images omitted).
If I am doing it wrong, can anyone teach me how to use this function? Thanks.
Actually, there's no problem with Tensorflow here.
From the doc of tf.image.crop_and_resize (emphasis is mine) :
boxes: A Tensor of type float32. A 2-D tensor of shape [num_boxes, 4].
The i-th row of the tensor specifies the coordinates of a box in the
box_ind[i] image and is specified in normalized coordinates [y1, x1,
y2, x2]. A normalized coordinate value of y is mapped to the image
coordinate at y * (image_height - 1), so as the [0, 1] interval of
normalized image height is mapped to [0, image_height - 1] in image
height coordinates. We do allow y1 > y2, in which case the sampled
crop is an up-down flipped version of the original image. The width
dimension is treated similarly. Normalized coordinates outside the [0,
1] range are allowed, in which case we use extrapolation_value to
extrapolate the input image values.
The boxes argument needs normalized coordinates. That's why you get a black box with your first set of coordinates [100,100,300,300] (not normalized, and no extrapolation value provided), and not with your second set [0.5,0.1,0.9,0.5].
As for why matplotlib shows you gibberish on your second attempt: it's because you're using the wrong datatype.
Quoting the matplotlib documentation of plt.imshow (emphasis is mine):
All values should be in the range [0 .. 1] for floats or [0 .. 255]
for integers. Out-of-range values will be clipped to these bounds.
As you're using floats outside the [0, 1] range, matplotlib clips your values to 1. That's why you get those colored pixels (either solid red, solid green, solid blue, or a mix of these). Cast your array to uint8 to get an image that makes sense:
plt.imshow(a[1].astype(np.uint8))
Edit:
As requested, I will dive a bit more into tf.image.crop_and_resize.
[When providing non-normalized coordinates and no extrapolation value], why do I just get a blank result?
Quoting the doc:
Normalized coordinates outside the [0, 1] range are allowed, in which
case we use extrapolation_value to extrapolate the input image values.
So, normalized coordinates outside [0, 1] are allowed. But they still need to be normalized!
With your example, [100,100,300,300], the coordinates you provide, interpreted as normalized, describe a box far larger than the image itself: your original image covers only a tiny region in the upper-left corner of that box. The default value of the argument extrapolation_value is 0, so the values outside the frame of the original image are inferred as [0,0,0], hence the black.
But if your use case needs another value, you can provide it. The pixels will take an RGB value of extrapolation_value % 256 on each channel. This option is useful if the zone you need to crop is not fully included in your original image (a possible use case would be sliding windows, for example).
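To make this concrete, here is a sketch of normalizing a pixel-coordinate box before passing it to crop_and_resize, following the y * (image_height - 1) mapping quoted above (the 580x580 size and the variable names are taken from the question):

h, w = 580, 580
y1, x1, y2, x2 = 100, 100, 300, 300
# normalize pixel coordinates into the [0, 1] range the API expects
box = [y1 / (h - 1), x1 / (w - 1), y2 / (h - 1), x2 / (w - 1)]
c = tf.image.crop_and_resize(img_, [box], [0], [100, 100])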
It seems that tf.image.crop_and_resize expects pixel values in the range [0,1].
Changing your code to
test = tf.image.crop_and_resize(image=image_np_expanded/255., ...)
solved the problem for me.
Yet another variant is to use the tf.image.central_crop function.
Below is a concrete implementation of the tf.image.crop_and_resize API (tf version 1.14):
import tensorflow as tf
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

tf.enable_eager_execution()

def single_data_2(img_path):
    img = tf.read_file(img_path)
    img = tf.image.decode_bmp(img, channels=1)
    img_4d = tf.expand_dims(img, axis=0)
    processed_img = tf.image.crop_and_resize(img_4d,
                                             boxes=[[0.4529, 0.72, 0.4664, 0.7358]],
                                             crop_size=[64, 64],
                                             box_ind=[0])
    processed_img_2 = tf.squeeze(processed_img, 0)
    raw_img_3 = tf.squeeze(img_4d, 0)
    return raw_img_3, processed_img_2

def plot_two_image(raw, processed):
    fig = plt.figure(figsize=(35, 35))
    raw_ = fig.add_subplot(1, 2, 1)
    raw_.set_title('Raw Image')
    raw_.imshow(raw, cmap='gray')
    processed_ = fig.add_subplot(1, 2, 2)
    processed_.set_title('Processed Image')
    processed_.imshow(processed, cmap='gray')

img_path = 'D:/samples/your_bmp_image.bmp'
raw_img, process_img = single_data_2(img_path)
print(raw_img.dtype, process_img.dtype)
print(raw_img.shape, process_img.shape)
raw_img = tf.squeeze(raw_img, -1)
process_img = tf.squeeze(process_img, -1)
print(raw_img.dtype, process_img.dtype)
print(raw_img.shape, process_img.shape)
plot_two_image(raw_img, process_img)
Below is my working code; the output image is not black, so this may be of help to someone:
for idx in range(len(bboxes)):
    if bscores[idx] >= Threshold:
        #Region of Interest
        y_min = int(bboxes[idx][0] * im_height)
        x_min = int(bboxes[idx][1] * im_width)
        y_max = int(bboxes[idx][2] * im_height)
        x_max = int(bboxes[idx][3] * im_width)
        class_label = category_index[int(bclasses[idx])]['name']
        class_labels.append(class_label)
        bbox.append([x_min, y_min, x_max, y_max, class_label, float(bscores[idx])])

        #Crop Image - Working Code
        cropped_image = tf.image.crop_to_bounding_box(image, y_min, x_min, y_max - y_min, x_max - x_min).numpy().astype(np.int32)

        # encode_jpeg encodes a tensor of type uint8 to string
        output_image = tf.image.encode_jpeg(cropped_image)
        # decode_jpeg decodes the string tensor to a tensor of type uint8
        #output_image = tf.image.decode_jpeg(output_image)

        score = bscores[idx] * 100
        file_name = tf.constant(OUTPUT_PATH + image_name[:-4] + '_' + str(idx) + '_' + class_label + '_' + str(round(score)) + '%' + '_' + os.path.splitext(image_name)[1])
        writefile = tf.io.write_file(file_name, output_image)

Color map an image with TensorFlow?

I'm saving grayscale images in TFRecord files. The idea then was to color map them on my GPU (only using TF, of course) so they get three channels (they are going to be used with a pre-trained VGG-16 model, so they have to have three channels).
Does anyone have any idea how to do this properly?
I tried to do it with my homemade TF color-mapping script, using for-loops, tf.scatter_nd and a mapping array with shape = (256, 3)... but it took forever.
EDIT:
img_rgb = ...  # grayscale image replicated to 3 channels
cmp = [[255,255,255],
       [255,255,253],
       [255,254,250],
       [255,254,248],
       [255,254,245],
       ...
       [4,0,0],
       [0,0,0]]
cmp = tf.convert_to_tensor(cmp, tf.int32)  # (256, 3)
hot = tf.zeros([224, 224, 3], tf.int32)
for i in range(img_rgb.shape[2]):
    for j in range(img_rgb.shape[1]):
        for k in range(img_rgb.shape[0]):
            indices = tf.constant([[k, j, i]])
            updates = tf.Variable([cmp[img_rgb[k, j, i], i]])
            shape = tf.constant([256, 3])
            hot = tf.scatter_nd(indices, updates, shape)
This was my attempt. I know it's not optimal in any way, but it was the only solution I could come up with.
This builds on work by jimfleming: https://gist.github.com/jimfleming/c1adfdb0f526465c99409cc143dea97b
import matplotlib
import matplotlib.cm
import numpy as np
import tensorflow as tf

def colorize(value, vmin=None, vmax=None, cmap=None):
    """
    A utility function for TensorFlow that maps a grayscale image to a matplotlib
    colormap for use with TensorBoard image summaries.
    Arguments:
      - value: 2D Tensor of shape [height, width] or 3D Tensor of shape
        [height, width, 1].
      - vmin: the minimum value of the range used for normalization.
        (Default: value minimum)
      - vmax: the maximum value of the range used for normalization.
        (Default: value maximum)
      - cmap: a valid cmap name for use with matplotlib's `get_cmap`.
        (Default: 'gray')
    Example usage:
    ```
    output = tf.random_uniform(shape=[256, 256, 1])
    output_color = colorize(output, vmin=0.0, vmax=1.0, cmap='plasma')
    tf.summary.image('output', output_color)
    ```
    Returns a 3D tensor of shape [height, width, 3].
    """
    # normalize to vmin..vmax
    vmin = tf.reduce_min(value) if vmin is None else vmin
    vmax = tf.reduce_max(value) if vmax is None else vmax
    value = (value - vmin) / (vmax - vmin)
    # squeeze last dim if it exists
    value = tf.squeeze(value)
    # quantize to 256 levels
    indices = tf.to_int32(tf.round(value * 255))
    # gather: sample the colormap at 256 points
    # (cm.colors only exists on ListedColormap, so evaluate the colormap instead)
    cm = matplotlib.cm.get_cmap(cmap if cmap is not None else 'gray')
    colors = tf.constant(cm(np.arange(256))[:, :3], dtype=tf.float32)
    value = tf.gather(colors, indices)
    return value
You could also try tf.image.grayscale_to_rgb, although there seems to be only one choice of color map, gray.
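A minimal usage sketch (grayscale_to_rgb simply replicates the single channel three times):

img_gray = tf.random.uniform([224, 224, 1])    # grayscale, last dim must be 1
img_rgb = tf.image.grayscale_to_rgb(img_gray)  # shape (224, 224, 3)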
We're here to help. If everyone wrote optimal code, there would be no need for Stackoverflow. :)
Here's how I would do it in place of the last 7 lines (untested code):
conv_img = tf.gather(params=cmp,
                     indices=img_rgb[:, :, 0])
Basically, there's no need for the for-loops; Tensorflow will do that for you, and much quicker. tf.gather() will collect elements from cmp according to the indices provided, which here are the 0th channel of img_rgb. Each collected element has the three channels from cmp, so when you put them all together, it forms an image.
I don't have time to test right now, gotta run, sorry. Hope it works.
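For completeness, here is a self-contained sketch to verify the shapes (the 224x224 size and the integer-valued colormap are assumptions based on the question):

import tensorflow as tf

cmp = tf.random.uniform([256, 3], maxval=256, dtype=tf.int32)  # stand-in colormap
img_rgb = tf.zeros([224, 224, 3], tf.int32)                    # stand-in image
conv_img = tf.gather(params=cmp, indices=img_rgb[:, :, 0])     # shape (224, 224, 3)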