pygtk / rsvg - getting size of drawing?

Is it possible for RSVG and Cairo to find the extents of a drawing within an SVG image?
i.e. not the page width/height, but the space actually used by drawing elements.
This doesn't work, it just returns page size:
svg = rsvg.Handle(file="myfile.svg")
(w, h, w2, h2) = svg.get_dimension_data() # gives the document's declared size
This doesn't seem to return any information about size:
svg.render_cairo(context) # returns None
This doesn't work, it also returns the page size:
svg.get_pixbuf().get_width()
This is with pygtk-all-in-one-2.24.0.win32-py2.7 and RSVG 2.22.3-1_win32, in which I can't find the get_dimensions_sub() function mentioned in other answers.

I've searched the web tonight trying to solve this seemingly simple problem. There does not seem to be a simple way of getting the bounding box of the drawing with rsvg, cairo or similar tools. Unless I'm missing something obvious.
However, you can call Inkscape with the --query-all option. This gives you the dimensions of all objects in an SVG file, and the full drawing as the first entry in the list.
import subprocess
output = subprocess.check_output(["inkscape", "--query-all", "myfile.svg"])
values = [line.split(',') for line in output.splitlines() if line]
whole_drawing = values[0]  # the first entry covers the full drawing
x, y, width, height = map(float, whole_drawing[1:])  # skip the leading object id
Now you'll have the drawing's position in x and y, and its width and height. With these, it becomes simple to use rsvg and cairo to redraw the clipped SVG to a new file.
I created a simple tool to do this; I hope the code is rather easy to understand.
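For reference, a minimal sketch of that last step, rendering just the drawing's bounding box to a PNG with the same pygtk-era bindings as above (the output file name is an assumption):
import cairo
import rsvg
svg = rsvg.Handle(file="myfile.svg")
# x, y, width, height come from the Inkscape query above
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, int(width), int(height))
context = cairo.Context(surface)
context.translate(-x, -y)  # shift so the drawing's bounding box lands at the origin
svg.render_cairo(context)
surface.write_to_png("cropped.png")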

Related

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there a better transformation?
Final Update:
As @Neil suggested, I tried to remove the noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
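A minimal sketch of that connected-component filtering (the file names and the min_size threshold are assumptions to tune for your captchas):
import cv2
import numpy as np
img = cv2.imread('captcha.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)  # ink becomes white
n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
min_size = 20  # assumed minimum pixel count for a legitimate stroke
cleaned = np.zeros_like(binary)
for label in range(1, n_labels):  # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] >= min_size:
        cleaned[labels == label] = 255
cv2.imwrite('cleaned.png', cv2.bitwise_not(cleaned))  # dark text on white again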
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted (a sketch follows the list):
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using the minesweeper flood-fill algorithm, which is easy to search for.
Set some threshold value of pixels that all legitimate letters are expected to have. This will be dependent on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
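A rough sketch of that strategy with Pillow (the file names, the binarization threshold, and MIN_PIXELS are all assumptions to tune):
from PIL import Image
img = Image.open('page.png').convert('L')
binary = img.point(lambda p: 0 if p < 128 else 255)  # binarize
pixels = binary.load()
w, h = binary.size
seen = set()
def flood(x0, y0):
    # Collect one group of connected black pixels (the minesweeper pass)
    stack, group = [(x0, y0)], []
    while stack:
        x, y = stack.pop()
        if not (0 <= x < w and 0 <= y < h) or (x, y) in seen or pixels[x, y] != 0:
            continue
        seen.add((x, y))
        group.append((x, y))
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return group
MIN_PIXELS = 30  # expected minimum size of a legitimate letter
for x in range(w):
    for y in range(h):
        group = flood(x, y)
        if 0 < len(group) < MIN_PIXELS:
            for gx, gy in group:
                pixels[gx, gy] = 255  # erase the splotch
binary.save('cleaned.png')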
Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there are a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
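For instance, a simple unsharp-mask sketch in OpenCV (the weights and sigma are assumptions to experiment with):
import cv2
img = cv2.imread('res.png')
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
# Subtract a fraction of the blur to boost edges (unsharp masking)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
cv2.imwrite('sharpened.png', sharpened)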
Rather than chaining blurs, blur once using either a Gaussian or a median blur, and experiment with the parameters to get the blur amount you need. Perhaps try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects the characters: save several images, apply the filter, and train an SVM. That may help you. I trained an algorithm with just a few images, and the results were acceptable. Check this link.
Wish you luck
I know the post is a bit old, but I suggest you try a library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section "Train model on external data" that you might be interested in.

How to Zero Pad RGB Image?

I want to pad an RGB image of size 500x500x3 to 512x512x3. I understand that I need to add 6 pixels on each border, but I cannot figure out how. I have read the numpy.pad function docs but couldn't understand how to use it. Code snippets would be appreciated.
If you need to pad with zeros:
RGB = np.pad(RGB, pad_width=[(6, 6),(6, 6),(0, 0)], mode='constant')
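A quick shape check (a hypothetical random image stands in for yours):
import numpy as np
RGB = np.random.randint(0, 256, (500, 500, 3), dtype=np.uint8)
padded = np.pad(RGB, pad_width=[(6, 6), (6, 6), (0, 0)], mode='constant')
print(padded.shape)  # (512, 512, 3)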
Use the constant_values argument to pad with different values per axis (the default is 0):
RGB = np.pad(RGB, pad_width=[(6, 6), (6, 6), (0, 0)], mode='constant', constant_values=[(3, 3), (5, 5), (0, 0)])
We could work out a solution with border padding, but it would get a bit complex, so I would like to suggest an alternative approach. First create a canvas of size 512x512x3, then place your original image inside it. The following code illustrates this:
import numpy as np
# Create a larger black canvas, then paste the image into its center
canvas = np.zeros((512, 512, 3), dtype=np.uint8)
canvas[6:506, 6:506] = your_500_500_img
Obviously you can generalize 6 and 506 with a padding variable (padding and 512 - padding, etc.), but this code illustrates the concept.

Using machine learning to remove background in image of hand-written signature

I am new to machine learning.
I want to prepare a document with a signature at the bottom of it.
For this purpose I am taking a photo of the user's signature for placement in the document.
How can I use machine learning to extract only the signature from the image and place it on the document?
Input example:
Output expected in gif format:
Extract the green image plane. Then take the complement of each pixel's gray value as the transparency coefficient, and composite the result onto the destination.
https://en.wikipedia.org/wiki/Alpha_compositing
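A rough sketch of that idea with numpy and Pillow (file names are assumptions; the ink is rendered as black with variable opacity):
import numpy as np
from PIL import Image
img = np.array(Image.open('signature.jpg').convert('RGB'))
green = img[:, :, 1]    # the green image plane
alpha = 255 - green     # darker ink -> higher opacity
black = np.zeros_like(green)
rgba = np.dstack([black, black, black, alpha])
Image.fromarray(rgba.astype(np.uint8), 'RGBA').save('signature.png')
# Compositing onto the document is then a masked paste, e.g.:
# doc.paste(sig, (x, y), sig)  # the third argument uses the alpha as a mask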
A simple image-processing technique using OpenCV should work. The idea is to obtain a binary image then bitwise-and the image to remove the non-signature details. Here's the results:
Input image
Binary image
Result
Code
import cv2
# Load image, convert to grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Bitwise-and and color background white
result = cv2.bitwise_and(image, image, mask=thresh)
result[thresh==0] = [255,255,255]
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Please do some research before posting questions like these. A simple Google search for "extract signature from image python" gives many results.
Git Repo
Git Repo
Stack Overflow
There are many other such alternatives. Please have a look and try a few approaches.
If you still have some questions or doubts, then post the approach you have taken and a discussion is warranted.

How do I save color mapped array of same dimensions of the original array?

I have data that I would like to save as PNGs. I need to keep the exact pixel dimensions - I don't want any inter-pixel interpolation, smoothing, or up/down sizing. I do want to use a colormap, though (and maybe some other features of matplotlib's imshow). As I see it, there are a couple of ways I could do this:
1) Manually roll my own colormapping. (I'd rather not do this)
2) Figure out how to make sure the pixel dimensions of the image in the figure produced by imshow are exactly correct, and then extract just the image portion of the figure for saving.
3) Use some other method which will directly give me a color mapped array (i.e. my NxN grayscale array -> NxNx3 array, using one of matplotlibs colormaps). Then save it using another png save method such as scipy.misc.imsave.
How can I do one of the above? (Or another alternate)
My problem arose when I was saving the figure directly using savefig and realized that I couldn't zoom into details. Upscaling wouldn't solve the problem, since blurring between pixels is exactly one of the things I'm trying to avoid - and the pixel size has a physical meaning.
EDIT:
Example:
import numpy as np
import matplotlib.pyplot as plt
X,Y = np.meshgrid(np.arange(-50.0,50,.1), np.arange(-50.0,50,.1))
Z = np.abs(np.sin(2*np.pi*(X**2+Y**2)**.5))/(1+(X/20)**2+(Y/20)**2)
plt.imshow(Z,cmap='inferno', interpolation='nearest')
plt.savefig('colormapeg.png')
plt.show()
Note that zooming in on the interactive figure gives you a very different view than trying to zoom in on the saved figure. I could increase the resolution of the saved figure, but that has its own problems. I really just need the pixel resolution fixed.
It seems you are looking for plt.imsave().
In this case,
plt.imsave("filename.png", Z, cmap='inferno')

TCPDF - Cropping polygons

I'm using TCPDF::Polygon() to render coastline (land) coordinates from a text file on top of a blue TCPDF::Rect(). The text file contains coastlines for the entire world, however by specifying a center latitude and longitude in the map projection, together with some multiplication to get a 'zooming' effect, I manage to display the desired area within the A4 page.
Problem:
As you can see in the image, the coastlines are drawn all the way to the edge of the document (and beyond). Although most of the coastline coordinates from the text file are 'outside' the document's visible area, they still take up a few hundred kilobytes in the output file.
Is there a nice way to 'crop' the coastline polygon, so that the coastlines fit nicely inside the blue area and the excess vertices are completely excluded from the document (not taking up file space)?
Solution:
The 'cropping' I was looking for is done using clipping, as suggested by @Rad Lexus:
// Start clipping
$pdf->StartTransform();
// Draw clipping rectangle
$pdf->Rect($DOC_MARG, $DOC_MARG, $MAP_W, $MAP_H, 'CNZ');
// -- Draw all polygons here (land areas) --
// Stop clipping
$pdf->StopTransform();
Source: https://stackoverflow.com/a/9400490/2667737
To save space in the output file, I check every point in each polygon (land area) and render only the polygons that have one or more points within the bounds of the page - also suggested by @Rad. In the example view in my first post, the file size was halved using this method.
Thanks for the help!