Why resize dataset images before a CNN if it stretches them? - tensorflow

I initialize my dataset using the following function (simplified):
WIDTH = ...
HEIGHT = ...
def load_data(dataset_path):
    images = []
    labels = []
    for image_path in all_image_paths:
        image = cv2.imread(image_path)
        image = cv2.resize(image, (WIDTH, HEIGHT))  # ???
        images.append(image)
        labels.append(corresponding_label)
    return (np.array(images).reshape(-1, WIDTH, HEIGHT, 3) / 255, np.array(labels))
In the tutorials I watched, people resize the input images to (WIDTH, HEIGHT). But this stretches the images. I don't understand why we have to do that, because in the model I'm using the input images go through convolution layers anyway. So I tried not resizing the input images, but I got an error during the reshape at the end of my function.
What am I missing?

You aren't limited to stretching the image: you could either crop it or pad it with a buffer zone of a consistent color. Cropping is more convenient if you can afford to lose part of the image, but you can also just fill the rest of the space with a fixed color; the model couldn't care less.
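For illustration, here is a minimal sketch of the pad-with-a-fixed-color idea, assuming OpenCV is used for loading and that WIDTH and HEIGHT are the target dimensions from the question; resize_with_padding is a hypothetical helper:

import cv2

def resize_with_padding(image, width, height, pad_color=(0, 0, 0)):
    # Scale the image so it fits inside (width, height) without distortion,
    # then pad the leftover space with a fixed color.
    h, w = image.shape[:2]
    scale = min(width / w, height / h)
    resized = cv2.resize(image, (int(w * scale), int(h * scale)))
    top = (height - resized.shape[0]) // 2
    bottom = height - resized.shape[0] - top
    left = (width - resized.shape[1]) // 2
    right = width - resized.shape[1] - left
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=pad_color)

Every output has the same (HEIGHT, WIDTH, 3) shape, so the reshape at the end of load_data still works, and nothing gets stretched.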

What kind of error did you get when reshaping? Chances are that if you do not resize the images, you cannot later reshape the NumPy array to (-1, WIDTH, HEIGHT, 3), because the images won't all have those dimensions. In that case, you must change the values of WIDTH and HEIGHT.

Related

Input array must have a shape == (..., 3)), got (299, 299, 4)

I am using a pretrained ResNet50 model to validate some classes, and I am using LIME to examine how the model handles this data. However, some of the images are not RGB and may be in different formats, and I noticed that RGB arrays have 3 channels rather than other numbers (like 4). I am using skimage to preprocess the images and test them with LIME. Any suggestions on how I can fix this with skimage and tensorflow? I am using pandas dataframes to collect the images, and train and test generators to see whether the model is able to guess correctly.
code:
def transform_img_fn_ori(url):
    img = skimage.io.imread(url)
    img = skimage.transform.resize(img, (299, 299))
    img = (img - 0.5) * 2
    img = np.expand_dims(img, axis=0)
    return img

url = ""  # this is a path on pc
images = transform_img_fn_ori(url)
explanation = explainer.explain_instance(images[0].astype('double'), model.predict, top_labels=3, hide_color=0, num_samples=1000)
temp_1, mask_1 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True)
temp_2, mask_2 = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10, hide_rest=False)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 15))
ax1.imshow(mark_boundaries(temp_1, mask_1))
ax2.imshow(mark_boundaries(temp_2, mask_2))
ax1.axis('off')
ax2.axis('off')
Your model expects RGB images, and your url may point to non-RGB images.
In this situation the best approach is to make sure images are read in RGB. For instance, OpenCV always reads images in BGR by default.
With skimage you can't be sure of the format being read: according to the docs it can be grayscale, RGB or RGBA.
In addition, skimage doesn't provide a single method to convert any image to RGB, like the convert method in Pillow. So you need to work out which color mode you have and convert it to RGB.
img = skimage.io.imread(url)
if img.ndim == 2 or (img.ndim == 3 and img.shape[2] == 1):
    # your image is in grayscale
    img = skimage.color.gray2rgb(img)
elif img.ndim == 3 and img.shape[2] == 4:
    # your image is in RGBA
    img = skimage.color.rgba2rgb(img)
else:
    # your image is already in RGB
    assert img.ndim == 3 and img.shape[2] == 3
The last assert is there to make sure everything is OK.
Finally, and probably not in your case, images may contain any number of channels and color spaces other than RGB. That's why I don't like skimage and prefer OpenCV. So, whatever method you use to read images, check the docs to be sure what it returns: in some cases, such as RGB versus BGR, it is impossible to tell the formats apart from the array alone.
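For comparison, a minimal sketch of the OpenCV route, assuming path points to the image file: cv2.imread returns BGR (or BGRA/grayscale when read with IMREAD_UNCHANGED), so the conversion to RGB is explicit:

import cv2

img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # BGR, BGRA or grayscale
if img.ndim == 2:
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
elif img.shape[2] == 4:
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
else:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)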

Rotating image using rotation matrix in Python

So I am trying to write code that rotates an image counterclockwise in Python by implementing the rotation matrix. This code is supposed to rotate the image counterclockwise, so why does it rotate the picture clockwise?
import math
import numpy as np
from PIL import Image

img = Image.open('squidward.jpg')
Im = np.array(img)
angle = 30

# Define the most occurring variables
angle = math.radians(angle)  # converting degrees to radians
cosine = math.cos(angle)
sine = math.sin(angle)
height = Im.shape[0]  # define the height of the image
width = Im.shape[1]   # define the width of the image

# Define the height and width of the new image that is to be formed
new_height = round(abs(Im.shape[0]*cosine) + abs(Im.shape[1]*sine)) + 1
new_width = round(abs(Im.shape[1]*cosine) + abs(Im.shape[0]*sine)) + 1

# Define another image variable of dimensions new_height and new_width filled with zeros
Rot_Im = np.zeros((new_height, new_width, Im.shape[2]))

# Find the centre of the image about which we have to rotate the image
original_centre_height = round(((Im.shape[0]+1)/2)-1)  # with respect to the original image
original_centre_width = round(((Im.shape[1]+1)/2)-1)   # with respect to the original image

# Find the centre of the new image that will be obtained
new_centre_height = round(((new_height+1)/2)-1)  # with respect to the new image
new_centre_width = round(((new_width+1)/2)-1)    # with respect to the new image

for i in range(height):
    for j in range(width):
        # co-ordinates of the pixel with respect to the centre of the original image
        y0 = Im.shape[0]-1-i-original_centre_height
        x0 = Im.shape[1]-1-j-original_centre_width
        # co-ordinates of the pixel with respect to the rotated image
        new_y0 = round(x0*sine + y0*cosine)
        new_x0 = round(x0*cosine - y0*sine)
        # since the image will be rotated the centre will change too, so to adjust
        # for that we need to shift new_x and new_y with respect to the new centre
        new_y0 = new_centre_height - new_y0
        new_x0 = new_centre_width - new_x0
        # bounds check to prevent any errors in the processing
        if 0 <= new_x0 < new_width and 0 <= new_y0 < new_height:
            Rot_Im[new_y0, new_x0, :] = Im[i, j, :]  # writing the pixel to its new destination in the output image

pil_img = Image.fromarray(Rot_Im.astype(np.uint8))  # converting array to image
pil_img.save("rotated_image.png")  # saving the image
Use -30 instead: with the way your coordinates are set up, the positive angle ends up rotating the picture clockwise, so negating the angle gives you the counterclockwise rotation. I think you will get the answer, but it is a bit late, I suppose.
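As a quick sanity check, you can compare your output against PIL's built-in rotation, which rotates counterclockwise for positive angles (a minimal sketch, assuming the same test image):

from PIL import Image

img = Image.open('squidward.jpg')
# PIL rotates counterclockwise for positive angles;
# expand=True grows the canvas so the corners are not cropped.
img.rotate(30, expand=True).save('rotated_reference.png')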

How to numpy.tile a PIL image without changing original size?

I'm trying to speed up tiling a PIL image converted to a NumPy array, without changing the size of the image.
The input is an image of x,y dimensions and the output is an image of same x, y dimensions but with the image inside tiled.
This is what I did first, without numpy:
import numpy
from PIL import Image

def tile_image(texture, texture_tiling=(5, 5)):
    # texture is a PIL image, e.g. Image.open(filename)
    width, height = texture.size
    tile = texture.copy()
    tiled_texture = Image.new('RGBA', (width*texture_tiling[0], height*texture_tiling[1]))
    for x in range(texture_tiling[0]):
        for y in range(texture_tiling[1]):
            x_ = width*x
            y_ = height*y
            tiled_texture.paste(tile, (x_, y_))
    tiled_texture = tiled_texture.resize(texture.size, Image.BILINEAR)
    return tiled_texture
This is the function with numpy:
def tile_image(texture, texture_tiling=(5, 5)):
    tile = numpy.array(texture.copy())
    tile = numpy.tile(tile, (texture_tiling[1], texture_tiling[0], 1))
    tile = Image.fromarray(tile)
    tile = tile.resize(texture.size, Image.BILINEAR)
    return tile
The problem with both of these is that they require increasing the image size before resizing it back down, which becomes expensive with higher-resolution textures. But replacing the pixel at [x, y] with the one at [(texture_tiling[0]*x)%width, (texture_tiling[1]*y)%height] is way too slow with a regular for loop. What can I do to speed up this pixel operation?
NOTE: I don't try resizing the tile to be smaller than paste on an empty layer, because the tiling could be odd and mess up the tile size.
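For reference, the modular pixel remapping described in the question can be vectorized with NumPy index arrays instead of a Python loop; a minimal sketch (tile_image_indexed is a hypothetical helper, and note that it samples pixels directly rather than doing the bilinear resize of the originals):

import numpy as np
from PIL import Image

def tile_image_indexed(texture, texture_tiling=(5, 5)):
    # Map every output pixel (y, x) to the source pixel
    # ((texture_tiling[1]*y) % height, (texture_tiling[0]*x) % width)
    # in one vectorized step instead of a per-pixel Python loop.
    arr = np.asarray(texture)
    height, width = arr.shape[:2]
    ys = (np.arange(height) * texture_tiling[1]) % height
    xs = (np.arange(width) * texture_tiling[0]) % width
    return Image.fromarray(arr[np.ix_(ys, xs)])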

Python OpenCV Duplicate a transparent shape in the same image

I have an image of a circle; refer to the image attached below. I already retrieved the transparent circle and want to paste that circle back onto the image to make some overlapping circles.
Below is my code, but it led to problem A: it leaves a (transparent) hole in the image. I need to have the circles on a normal white background.
height, width, channels = circle.shape
original_image[60:60+height, 40:40+width] = circle
I used cv2.addWeighted but got a blending issue; I need clear circles:
circle = cv2.addWeighted(original_image[60:60+height, 40:40+width],0.5,circle,0.5,0)
original_image[60:60+rows, 40:40+cols] = circle
If you already have a transparent black circle, then in Python/OpenCV here is one way to do that.
- Read the transparent image unchanged
- Extract the bgr channels and the alpha channel
- Create a colored image of the background color and size desired
- Create similar sized white and black images
- Initialize a copy of the background color image for the output
- Define a list of offset coordinates in the larger image
- Loop over the list of offsets and do the following
- Insert the bgr image into a copy of the white image as the base image
- Insert the alpha channel into a copy of the black image for a mask
- Composite the initialized output and base images using the mask image
- When finished with the loop, save the result
Input (transparent):
import cv2
import numpy as np
# load image with transparency
img = cv2.imread('black_circle_transp.png', cv2.IMREAD_UNCHANGED)
height, width = img.shape[:2]
print(img.shape)
# extract the bgr channels and the alpha channel
bgr = img[:,:,0:3]
aa = img[:,:,3]
aa = cv2.merge([aa,aa,aa])
# create whatever color background you want, in this case white
background=np.full((500,500,3), (255,255,255), dtype=np.float64)
# create white image of the size you want
white=np.full((500,500,3), (255,255,255), dtype=np.float64)
# create black image of the size you want
black=np.zeros((500,500,3), dtype=np.float64)
# initialize output
result = background.copy()
# define top left corner x,y locations for circle offsets
xy_offsets = [(100,100), (150,150), (200,200)]
# insert bgr and alpha into white and black images respectively of desired output size and composite
for offset in xy_offsets:
    xoff = offset[0]
    yoff = offset[1]
    base = white.copy()
    base[yoff:height+yoff, xoff:width+xoff] = bgr
    mask = black.copy()
    mask[yoff:height+yoff, xoff:width+xoff] = aa
    result = (result * (255-mask) + base * mask) / 255
result = result.clip(0,255).astype(np.uint8)
# save resulting masked image
cv2.imwrite('black_circle_composite.png', result)
# display result, though it won't show transparency
cv2.imshow("image", img)
cv2.imshow("aa", aa)
cv2.imshow("bgr", bgr)
cv2.imshow("result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:

Data augmentation in the object detection API: random_image_scale

I am trying to use the data augmentation features of the object detection API, specifically random_image_scale.
Digging a bit, I found the function implementing it (pasted below). Am I missing something, or is the ground truth of the boxes not handled here? I have looked around and did not find anything. If the ground truth is not modified according to the scaling done to the image, it will mess up the model being trained, won't it?
Please let me know if I am missing something or I should avoid this feature to train my network.
The file is /object_detection/core/preprocessor.py
def random_image_scale(image,
                       masks=None,
                       min_scale_ratio=0.5,
                       max_scale_ratio=2.0,
                       seed=None):
    """Scales the image size.

    Args:
        image: rank 3 float32 tensor contains 1 image -> [height, width, channels].
        masks: (optional) rank 3 float32 tensor containing masks with
            size [height, width, num_masks]. The value is set to None if there are no
            masks.
        min_scale_ratio: minimum scaling ratio.
        max_scale_ratio: maximum scaling ratio.
        seed: random seed.

    Returns:
        image: image which is the same rank as input image.
        masks: If masks is not none, resized masks which are the same rank as input
            masks will be returned.
    """
    with tf.name_scope('RandomImageScale', values=[image]):
        result = []
        image_shape = tf.shape(image)
        image_height = image_shape[0]
        image_width = image_shape[1]
        size_coef = tf.random_uniform([],
                                      minval=min_scale_ratio,
                                      maxval=max_scale_ratio,
                                      dtype=tf.float32, seed=seed)
        image_newysize = tf.to_int32(
            tf.multiply(tf.to_float(image_height), size_coef))
        image_newxsize = tf.to_int32(
            tf.multiply(tf.to_float(image_width), size_coef))
        image = tf.image.resize_images(
            image, [image_newysize, image_newxsize], align_corners=True)
        result.append(image)
        if masks:
            masks = tf.image.resize_nearest_neighbor(
                masks, [image_newysize, image_newxsize], align_corners=True)
            result.append(masks)
        return tuple(result)
If you are using a tfrecord file, the box boundaries are not absolute pixel values but relative (normalized) coordinates, so if you scale the image, the boxes stay the same.
So using that feature should be fine.
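To illustrate, a minimal hypothetical sketch: a box stored as normalized [ymin, xmin, ymax, xmax] refers to fractions of the image, so it stays valid no matter what size the image is scaled to; only the conversion to absolute pixels depends on the current size:

import numpy as np

# A box covering the central quarter of the image, in normalized coordinates.
box = np.array([0.25, 0.25, 0.75, 0.75])  # [ymin, xmin, ymax, xmax]

for height, width in [(480, 640), (240, 320), (960, 1280)]:
    # The normalized box never changes when the image is rescaled;
    # only its pixel-space equivalent does.
    pixel_box = box * np.array([height, width, height, width])
    print((height, width), pixel_box)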