How to Zero Pad RGB Image? - numpy

I want to pad an RGB image of size 500x500x3 to 512x512x3. I understand that I need to add 6 pixels on each border, but I cannot figure out how. I have read the numpy.pad docs but couldn't understand how to use the function. Code snippets would be appreciated.

If you need to pad with 0:
RGB = np.pad(RGB, pad_width=[(6, 6),(6, 6),(0, 0)], mode='constant')
Use the constant_values argument to pad with different values (default is 0). It accepts one (before, after) pair per axis:
RGB = np.pad(RGB, pad_width=[(6, 6),(6, 6),(0, 0)], mode='constant', constant_values=[(3,3),(5,5),(0,0)])

We could solve this with border padding, but it gets a bit complex. I would suggest an alternative approach: first create a canvas of size 512x512, then place your original image inside it. The following code may help:
import numpy as np
# Create a larger black canvas; np.zeros takes the shape as a single tuple
canvas = np.zeros((512, 512, 3), dtype=your_500_500_img.dtype)
# Copy the image into the middle, leaving a 6-pixel border on every side
canvas[6:506, 6:506] = your_500_500_img
Obviously you can replace 6 and 506 with more general variables (padding, 512 - padding, etc.), but this code illustrates the concept.
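The generalization mentioned above could look something like the sketch below. The function name pad_to_shape and the all-white test image are made up for illustration, and it assumes the image should be centered, with any odd leftover pixel going to the bottom/right edge.

```python
import numpy as np

def pad_to_shape(img, target_h, target_w):
    """Center img (H x W x C) on a zero-filled canvas of target_h x target_w."""
    h, w = img.shape[:2]
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target canvas")
    top = (target_h - h) // 2    # rows of padding above the image
    left = (target_w - w) // 2   # columns of padding left of the image
    canvas = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    canvas[top:top + h, left:left + w] = img
    return canvas

# Stand-in for a real 500x500 RGB image (hypothetical data)
img = np.full((500, 500, 3), 255, dtype=np.uint8)
padded = pad_to_shape(img, 512, 512)
print(padded.shape)  # (512, 512, 3)
```

For the 500 to 512 case this should give the same array as the np.pad call with (6, 6) on the first two axes.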

Related

How do I see the actual color of a single RGB value in Google Colab?

Very basic question. I have a single vector (e.g., [53, 21, 110]) and I want to print the RGB color it represents in a colab notebook. Like a color swatch. What's the simplest way to do this?
The simplest way would be using the Image module from PIL. According to the documentation, you can construct an image with:
PIL.Image.new(mode, size, color=0)
mode [required]: determines the mode used for the image, it can be RGB, RGBA, HSV, etc. You can find more modes in the docs
size [required]: this is a tuple (width, height) that represents the dimensions of your image in pixels.
color [optional]: this is the color of the image, it can receive a tuple to represent the RGB color in your case. The default color is black.
Then, to show the image within colab, you would use
display(img)
Given your question, the mode would need to be 'RGB', and if your vector is a list, you need to convert it into a tuple to use it.
To show a 300px by 300px image, the code would look like this:
from PIL import Image
img = Image.new('RGB', (300,300), color = (53, 21, 110))
display(img)

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract recognizes only the first captcha correctly. Is there a better transformation?
Final Update:
As @Neil suggested, I tried to remove noise by detecting connected pixels. To find connected pixels, I used a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing because it's a lot of code, but here is the general strategy I adopted:
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using the minesweeper algorithm (essentially a flood fill), which is easy to search for.
Set some threshold value of pixels that all legitimate letters are expected to have. This will be dependent on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
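The steps above can be sketched roughly as follows. This is not the original poster's code: to stay library-independent it works on a nested list instead of a Pillow image, it assumes 1 means ink and 0 means background, and it treats only the four direct neighbours as connected. The name remove_small_components and the sample grid are made up for illustration.

```python
from collections import deque

def remove_small_components(grid, min_size):
    """Return a copy of grid (1 = ink, 0 = background) without small ink blobs."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one 4-connected component
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the size threshold are treated as noise
            if len(component) < min_size:
                for y, x in component:
                    result[y][x] = 0
    return result

# Hypothetical binarized image: a 2x2 "letter" plus two stray dots
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
cleaned = remove_small_components(noisy, min_size=3)
for row in cleaned:
    print(row)
```

With min_size=3 the two isolated pixels are cleared and the 2x2 block survives. On real images the threshold would need tuning to the resolution, as noted above.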
Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there exist a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
Rather than chaining blurs, blur once with either a Gaussian or a median blur and experiment with the parameters to get the amount of blur you need. You could try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects the characters. Save several images, apply the filter, and train an SVM algorithm; that may help you. I trained an algorithm with only a few images, and the results were acceptable. Check this link.
Wish you luck
I know the post is a bit old, but I suggest you try this library I developed some time ago. If you have a set of labelled captchas, it should fit your case. Take a look: https://github.com/punkerpunker/captcha_solver
In README there is a section "Train model on external data" that you might be interested in.

RGB to gray filter doesn't preserve the shape

I have 209 cat/noncat images and I am looking to augment my dataset. To do so, I am using the following code to apply a grey filter to each NumPy array of RGB values. The problem is that I need the dimensions to stay the same for my neural network to work, but they come out different. The code:
def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])
Normal image dimensions: (64, 64, 3)
After applying the filter: (64, 64)
I know that the missing 3 is probably the RGB value or something, but I cannot find a way to have a "dummy" third dimension that would not affect the actual image. Can someone provide an alternative to the rgb2gray function that maintains the dimensions?
The whole point of applying that greyscale filter is to reduce the number of channels from 3 (i.e. R,G and B) down to 1 (i.e. grey).
If you really, really want to get a 3-channel image that looks just the same but takes 3x as much memory, just make all 3 channels equal:
grey = np.dstack((grey, grey, grey))
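def rgb2gray(rgb):
    # Each row holds one input channel's weight, repeated for the 3 output channels,
    # so every output channel is 0.2989*R + 0.5870*G + 0.1140*B
    return np.dot(rgb[...,:3], [[0.2989, 0.2989, 0.2989],[0.5870, 0.5870, 0.5870],[0.1140, 0.1140, 0.1140]])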
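Another way to keep a third dimension, sketched here as an assumption about what the network needs, is to compute the single grey channel first and then repeat it. The helper name rgb2gray_keep_shape and its channels parameter are made up for illustration; channels=1 gives a (64, 64, 1) array if a dummy axis is enough.

```python
import numpy as np

# Luma weights taken from the question's rgb2gray function
WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb2gray_keep_shape(rgb, channels=3):
    """Convert to grey, then repeat the grey channel so the array stays 3-D."""
    gray = np.dot(rgb[..., :3], WEIGHTS)        # shape (H, W)
    gray = gray[..., np.newaxis]                # shape (H, W, 1)
    return np.repeat(gray, channels, axis=-1)   # shape (H, W, channels)

# Hypothetical 64x64 RGB input with values in [0, 1)
rgb = np.linspace(0.0, 1.0, 64 * 64 * 3, endpoint=False).reshape(64, 64, 3)
gray3 = rgb2gray_keep_shape(rgb)
gray1 = rgb2gray_keep_shape(rgb, channels=1)
print(gray3.shape, gray1.shape)  # (64, 64, 3) (64, 64, 1)
```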

Is there shorthand for getting the center pixels of an image?

Is there any indexing shorthand in numpy to get the center pixels of an image (or any ND array)?
Example:
cutout = xc[xc.shape[0]//2-30:xc.shape[0]//2+30, xc.shape[1]//2-30:xc.shape[1]//2+30]
I could define a function
def get_center_pixels(arr, npix):
    # Integer division keeps the slice bounds integral; indexing needs a tuple of slices
    slices = tuple(slice(shape//2-npix, shape//2+npix) for shape in arr.shape)
    return arr[slices]
cutout = get_center_pixels(xc, 30)
Is there a better way or a built-in way to do this?
The closest standard function in numpy I can think of is numpy.fft.fftshift, which rolls the data along the selected axes so that the center point ends up at [0, 0]. That holds for even-length axes; numpy.fft.ifftshift does it for any length.
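As far as I know there is no dedicated built-in, so a small helper is probably the most readable option. The sketch below is one way to write it; the name center_crop and the test array are made up, and it crops every axis, so for an image with a channel axis you would want to restrict it to the first two axes.

```python
import numpy as np

def center_crop(arr, npix):
    """Return the central 2*npix elements along every axis of arr."""
    if any(npix > n // 2 for n in arr.shape):
        raise ValueError("npix is larger than half of at least one axis")
    # One slice per axis, centered on the middle index of that axis
    slices = tuple(slice(n // 2 - npix, n // 2 + npix) for n in arr.shape)
    return arr[slices]

# Hypothetical 100x80 array standing in for an image
xc = np.arange(100 * 80).reshape(100, 80)
cutout = center_crop(xc, 30)
print(cutout.shape)  # (60, 60)
```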

pygtk / rsvg - getting size of drawing?

Is it possible for RSVG and Cairo to find the extents of a drawing within an SVG image?
i.e. not the page width/height, but the space actually used by drawing elements.
This doesn't work, it just returns page size:
svg = rsvg.Handle(file="myfile.svg")
(w, h, w2, h2) = svg.get_dimension_data() # gives document's declared size
This doesn't seem to return any information about size:
svg.render_cairo(context) # returns None
This doesn't work, it also returns the page size:
self.svg.get_pixbuf().get_width()
This is with pygtk-all-in-one-2.24.0.win32-py2.7 and RSVG 2.22.3-1_win32, in which I can't find the get_dimensions_sub() function mentioned in other answers.
I've searched the web tonight trying to solve this seemingly simple problem. There does not seem to be a simple way of getting the bounding box of the drawing with rsvg, cairo or similar tools. Unless I'm missing something obvious.
However, you can call Inkscape with the --query-all option. This gives you the dimensions of all objects in an SVG file, and the full drawing as the first entry in the list.
import subprocess
# --query-all prints one "id,x,y,width,height" line per object; the first line is the whole drawing
output = subprocess.check_output(["inkscape", "--query-all", "myfile.svg"]).decode()
values = [line.split(',') for line in output.splitlines()]
whole_drawing = values[0]
x, y, width, height = map(float, whole_drawing[1:])
Now you'll have the drawing's position in x, y and its width and height. With this, it becomes simple to use rsvg and cairo to redraw the clipped SVG to a new file.
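If you need the boxes of individual objects as well, the same output can be parsed into a list. This is only a sketch: parse_query_all is a made-up helper, and the sample text below is invented to show the "id,x,y,width,height" layout rather than real Inkscape output.

```python
def parse_query_all(text):
    """Parse 'id,x,y,width,height' lines into (id, x, y, width, height) tuples."""
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        obj_id, x, y, w, h = line.split(',')
        boxes.append((obj_id, float(x), float(y), float(w), float(h)))
    return boxes

# Hypothetical output for a file with a root element and one rectangle
sample = "svg2,10.5,20,300,150.25\nrect1,10.5,20,100,50\n"
boxes = parse_query_all(sample)
drawing_id, x, y, width, height = boxes[0]
print(drawing_id, x, y, width, height)
```

In real use, text would be the decoded output of the subprocess call shown earlier.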
I created a simple tool to do this. I hope the code is fairly easy to understand.