Simple Captcha Solving - captcha

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there a better transformation?
Final Update:
As #Neil suggested, I tried to remove the noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
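A minimal sketch of that cleanup (the min_size threshold here is an illustrative value that needs tuning per image resolution):
import cv2
import numpy as np
import pytesseract

img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)  # ink becomes white (foreground)

# Label connected components and drop the small ones (noise dots)
n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=8)
min_size = 20  # minimum pixel count for a component to be kept
cleaned = np.zeros_like(img)
for label in range(1, n_labels):  # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] >= min_size:
        cleaned[labels == label] = 255

cleaned = cv2.bitwise_not(cleaned)  # back to dark text on white for tesseract
cv2.imwrite('res.png', cleaned)
print(pytesseract.image_to_string('res.png'))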
And here are the new resulting images:

I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted (a rough sketch follows the list):
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using a flood-fill ("minesweeper") algorithm, which is easy to search for.
Set some threshold value of pixels that all legitimate letters are expected to have. This will be dependent on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
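A rough, unoptimized sketch of that strategy using Pillow (the black_thresh and min_pixels values are illustrative assumptions that depend on your image resolution):
from collections import deque
from PIL import Image

def remove_small_splotches(path, min_pixels=20, black_thresh=128):
    """Binarize, flood-fill connected black pixel groups, whiten small groups."""
    img = Image.open(path).convert('L')
    w, h = img.size
    px = img.load()
    # Binarize
    for y in range(h):
        for x in range(w):
            px[x, y] = 0 if px[x, y] < black_thresh else 255
    seen = set()
    for y in range(h):
        for x in range(w):
            if px[x, y] == 0 and (x, y) not in seen:
                # Flood fill ("minesweeper") to collect one connected group
                group, queue = [], deque([(x, y)])
                seen.add((x, y))
                while queue:
                    cx, cy = queue.popleft()
                    group.append((cx, cy))
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and px[nx, ny] == 0 and (nx, ny) not in seen:
                            seen.add((nx, ny))
                            queue.append((nx, ny))
                if len(group) < min_pixels:  # too small to be a letter, treat as noise
                    for gx, gy in group:
                        px[gx, gy] = 255
    return img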

Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there exist a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
Rather than chaining blurs, blur once, using either a Gaussian or a median blur; experiment with the parameters to get the amount of blur you need. You could try one method after the other, but there is no reason to chain blurs of the same method.
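For example, a simple unsharp mask along those lines (the weights are a common starting point, not tuned values):
import cv2

img = cv2.imread('res.png', cv2.IMREAD_GRAYSCALE)

# Unsharp masking: subtract a blurred copy to boost edges
blurred = cv2.GaussianBlur(img, (5, 5), 0)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

cv2.imwrite('sharpened.png', sharpened)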

There is an OCR example in Python that detects the characters. Save several images, apply your filter, and train an SVM on them; that may help you. I trained an algorithm with only a few images and the results were acceptable. Check this link.
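A minimal sketch of that idea, here using scikit-learn's SVM with its digits dataset standing in for your own filtered, labelled character crops:
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Stand-in dataset: sklearn's digit images; in practice X would be your
# filtered, flattened captcha character crops and y their labels.
digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = svm.SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)
print('accuracy:', clf.score(X_test, y_test))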
Wish you luck

I know the post is a bit old, but I suggest you try this library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In README there is a section "Train model on external data" that you might be interested in.

Related

How to resize a nifti (nii.gz medical image) file

I have some medical images in nii.gz format which are of different shapes. I want to resize all of them to the same shape in order to feed them to a deep learning model. I tried using resample_img() from nibabel, but it destroys my images. I want some other function that just resizes the volume to a particular shape, say (512, 512, 129).
Can someone please help me with this? I have been stuck on this step for quite a few days.
Maybe you can use this:
https://scikit-image.org/docs/dev/api/skimage.transform.html
I saw it in one of the papers. Here is the example in function ScaleToFixed:
https://github.com/sacmehta/3D-ESPNet/blob/master/Transforms.py
Here is how I did it. I have a volume of shape 320x320x130 (black and white, so no RGB dimension) and I wanted to halve its in-plane size. This worked for me:
import nibabel as nib
import skimage.transform as skTrans
im = nib.load(file_path).get_fdata()
result1 = skTrans.resize(im, (160,160,130), order=1, preserve_range=True)
You can use TorchIO:
import torchio as tio
image = tio.ScalarImage('path/to/image.nii.gz')
transform = tio.CropOrPad((512,512,129))
output = transform(image)
If you would like to keep the original field of view, you could use the Resample transform instead.
Disclaimer: I'm the main developer of TorchIO.
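For the Resample route mentioned above, a rough sketch that computes the spacing needed to reach a target shape and then pads/crops any leftover voxels (image.spacing and image.spatial_shape are TorchIO Image attributes; the arithmetic is an illustrative approximation):
import torchio as tio

image = tio.ScalarImage('path/to/image.nii.gz')
target_shape = (512, 512, 129)

# Spacing that (approximately) maps the current spatial shape onto the target shape
new_spacing = tuple(sp * sz / tsz for sp, sz, tsz
                    in zip(image.spacing, image.spatial_shape, target_shape))

transform = tio.Compose([tio.Resample(new_spacing), tio.CropOrPad(target_shape)])
output = transform(image)
output.save('resampled.nii.gz')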

try to implement cv2.findContours for person detection

I'm new to OpenCV and I'm trying to detect people through cv2.findContours with a morphological transformation of the video. Here is the code snippet:
import numpy as np
import imutils
import cv2 as cv
import time

cap = cv.VideoCapture(0)
avg = None  # running average of the background frame

while cap.isOpened():
    ret, frame = cap.read()
    #frame = imutils.resize(frame, width=700,height=100)
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    gray = cv.GaussianBlur(gray, (21, 21), 0)
    if avg is None:
        avg = gray.copy().astype("float")  # initialize the background model
    cv.accumulateWeighted(gray, avg, 0.5)
    mask2 = cv.absdiff(gray, cv.convertScaleAbs(avg))
    mask = cv.absdiff(gray, cv.convertScaleAbs(avg))
    contours0, hierarchy = cv.findContours(mask2, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    for cnt in contours0:
        .
        .
        .
The rest of the code has the logic of a contour passing a line and incrementing the count.
The problem I'm encountering is that cv.findContours detects every movement/change in the frame (including the person). What I want is for cv.findContours to detect only the person and not any other movement. I know that person detection can be achieved through a Haar cascade, but is there any way I can implement detection using cv2.findContours?
If not, is there a way I can still do a morphological transformation and detect people? The project I'm working on requires filtering out noise and much of the background to detect the person and increment its count on passing the line.
I will show you two options to do this.
The method I mentioned in the comments, which you can use with Yolo to detect humans (a rough sketch of the middle steps follows the list):
Use saliency to detect the standout parts of the video
Apply K-Means Clustering to cluster the objects into individual clusters.
Apply background subtraction and erosion or dilation (or both, depending on the video; try them all and see which one does the best job).
Crop the objects
Send the cropped objects to Yolo
If the class name is a pedestrian or human then draw the bounding boxes on them.
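A rough sketch of the background-subtraction, morphology and cropping steps only (the saliency/K-Means steps and the Yolo call itself are left out; the area threshold and kernel size are illustrative):
import cv2

cap = cv2.VideoCapture(0)
backsub = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    fg = backsub.apply(frame)                  # background subtraction
    fg = cv2.erode(fg, kernel, iterations=1)   # clean up speckle noise
    fg = cv2.dilate(fg, kernel, iterations=2)  # reconnect the blobs
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    crops = []
    for cnt in contours:
        if cv2.contourArea(cnt) < 500:         # ignore tiny regions
            continue
        x, y, w, h = cv2.boundingRect(cnt)
        crops.append(frame[y:y+h, x:x+w])      # cropped object to hand to the detector
    # each crop would then be passed to Yolo and kept only if classified as a person
cap.release()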
Using OpenCV's built-in pedestrian detection, which is much easier (a sketch follows):
Convert frames to black and white
Use pedestrian_cascade.detectMultiScale() on the grey frames.
Draw a bounding box over each pedestrian
The second method is much simpler, but it depends on what is expected of you for this project.
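A minimal sketch of the second method, assuming the haarcascade_fullbody.xml file that ships with OpenCV (the detectMultiScale parameters are just starting points):
import cv2

# The full-body Haar cascade bundled with OpenCV stands in for pedestrian_cascade
pedestrian_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_fullbody.xml')

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pedestrians = pedestrian_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in pedestrians:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('pedestrians', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()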

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channels into 5 channels, so the output would be (channels, width, height = 5, 2500, 2500). One simple way to do this is to apply PCA. However, the performance is not so good. Thus, I want to use a Variational AutoEncoder (VAE).
When I looked at the available solutions in the TensorFlow and Keras libraries, they show examples that work on a whole dataset of images using a Convolutional Variational AutoEncoder (CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice to implement a CVAE in this case? Is it to generate sample images with a moving-window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
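A minimal sketch of that per-pixel spectral VAE, loosely following the keras.io VAE example linked above (layer sizes, epochs and batch size are illustrative assumptions):
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_bands, latent_dim = 15, 5

class Sampling(layers.Layer):
    # Reparameterization trick: z = mean + sigma * epsilon
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder: one 15-band spectrum in, 5-dim latent code out
enc_in = layers.Input(shape=(n_bands,))
h = layers.Dense(64, activation="relu")(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_log_var])
encoder = tf.keras.Model(enc_in, [z_mean, z_log_var, z])

# Decoder: 5-dim code back to a 15-band spectrum
dec_in = layers.Input(shape=(latent_dim,))
dec_out = layers.Dense(n_bands)(layers.Dense(64, activation="relu")(dec_in))
decoder = tf.keras.Model(dec_in, dec_out)

class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder, self.decoder = encoder, decoder

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            recon = self.decoder(z)
            recon_loss = tf.reduce_mean(tf.reduce_sum(tf.square(data - recon), axis=-1))
            kl_loss = -0.5 * tf.reduce_mean(
                tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
            loss = recon_loss + kl_loss
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss}

# Treat every pixel's 15-band spectrum as one training sample: (2500*2500, 15)
cube = np.random.rand(15, 2500, 2500).astype("float32")  # placeholder for the real image
pixels = cube.reshape(n_bands, -1).T

vae = VAE(encoder, decoder)
vae.compile(optimizer="adam")
vae.fit(pixels, epochs=5, batch_size=4096)

# The 5-channel compressed cube is the latent mean for every pixel
z_means, _, _ = encoder.predict(pixels, batch_size=4096)
compressed = z_means.T.reshape(latent_dim, 2500, 2500)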
As a disclaimer, I haven't seen CVAEs used in this way before, but like this you would also get many data samples, which is needed in order for the learning to generalise well.
Another option would be indeed what you suggested -- to just generate the samples (patches) using a moving window (maybe with a stride that is the half size of the patch). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well on all HSI images, I guess it doesn't matter (if it overfits), since you want to use it on that same image.

Using machine learning to remove background in image of hand-written signature

I am new to machine learning.
I want to prepare a document with a signature at the bottom of it.
For this purpose I am taking a photo of the user's signature for placement in the document.
How can I use machine learning to extract only the signature part from the image and place it on the document?
Input example:
Output expected in gif format:
Extract the green image plane. Then take the complement of the gray value of every pixel as the transparency coefficient. Then you can perform compositing onto the destination.
https://en.wikipedia.org/wiki/Alpha_compositing
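A rough sketch of that compositing (the file names and the assumption that the two images have the same size are mine):
import cv2
import numpy as np

sig = cv2.imread('signature.jpg')   # photo of the signature
doc = cv2.imread('document.jpg')    # page to composite onto, assumed same size here

green = sig[:, :, 1].astype(np.float32)
alpha = (255.0 - green) / 255.0     # complement of the green plane: dark ink -> opaque
alpha = alpha[:, :, None]

ink = np.zeros_like(doc, dtype=np.float32)  # composite the ink as black
out = alpha * ink + (1.0 - alpha) * doc.astype(np.float32)
cv2.imwrite('signed.png', out.astype(np.uint8))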
A simple image-processing technique using OpenCV should work. The idea is to obtain a binary image then bitwise-and the image to remove the non-signature details. Here's the results:
Input image
Binary image
Result
Code
import cv2
# Load image, convert to grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Bitwise-and and color background white
result = cv2.bitwise_and(image, image, mask=thresh)
result[thresh==0] = [255,255,255]
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Please do research before posting questions like these. A simple Google search of "extract signature from image python" returns many results.
Git Repo
Git Repo
Stack Overflow
There are many other such alternatives. Please have a look and try a few approaches.
If you still have some questions or doubts, then post the approach you have taken and a discussion is warranted.

face alignment algorithm on images

How can I do a basic face alignment on a 2-dimensional image with the assumption that I have the position/coordinates of the mouth and eyes.
Is there any algorithm that I could implement to correct the face alignment on images?
Face (or image) alignment refers to aligning one image (or face in your case) with respect to another (or a reference image/face). It is also referred to as image registration. You can do that using either appearance (intensity-based registration) or key-point locations (feature-based registration). The second category stems from image motion models where one image is considered a displaced version of the other.
In your case the landmark locations (3 points for eyes and nose?) provide a good reference set for straightforward feature-based registration. Assuming you have the location of a set of points in both of the 2D images, x_1 and x_2 you can estimate a similarity transform (rotation, translation, scaling), i.e. a planar 2D transform S that maps x_1 to x_2. You can additionally add reflection to that, though for faces this will most-likely be unnecessary.
Estimation can be done by forming the normal equations and solving a linear least-squares (LS) problem for the x_1 = Sx_2 system using linear regression. For the 4 unknown parameters (rotation, 2 translation, scaling) you need at least 2 point correspondences (each point gives 2 equations), so your 3 landmarks give an over-determined system. A solution to the above LS problem can be obtained through the Direct Linear Transform (e.g. by applying SVD or a matrix pseudo-inverse). For cases with a sufficiently large number of (e.g. automatically detected) reference points, a RANSAC-type method can be used for point filtering and outlier removal (though this is not your case here).
After estimating S, apply image warping on the second image to get the transformed grid (pixel) coordinates of the entire image 2. The transform will change pixel locations but not their appearance. Unavoidably some of the transformed regions of image 2 will lie outside the grid of image 1, and you can decide on the values for those null locations (e.g. 0, NaN etc.).
For more details: R. Szeliski, "Image Alignment and Stitching: A Tutorial" (Section 4.3 "Geometric Registration")
In OpenCV see: Geometric Image Transformations, e.g. cv::getRotationMatrix2D, cv::getAffineTransform and cv::warpAffine. Note though that you should estimate and apply a similarity transform (a special case of an affine transform) in order to preserve angles and shapes.
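In Python, a minimal sketch of that estimate-and-warp pipeline (the landmark coordinates are made up; cv2.estimateAffinePartial2D fits a 4-DOF similarity by least squares):
import cv2
import numpy as np

img = cv2.imread('face.jpg')

# Landmark locations in the input image (x_2) and where they should end up (x_1):
# left eye, right eye, mouth; the values here are purely illustrative.
src_pts = np.float32([[120, 145], [185, 140], [155, 220]])
dst_pts = np.float32([[110, 120], [190, 120], [150, 200]])

# Estimate a similarity transform (rotation, uniform scale, translation)
S, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)

aligned = cv2.warpAffine(img, S, (img.shape[1], img.shape[0]),
                         flags=cv2.INTER_LINEAR, borderValue=0)
cv2.imwrite('aligned.jpg', aligned)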
For the face there is a lot of variability in the feature points, so it won't be possible to fit all feature points perfectly with just an affine transform. The only way to align all the points perfectly is to warp the image given the points. Basically, you can triangulate the image given the points and apply an affine warp to each triangle to get a warped image where all the points are aligned.
Face alignment could be handled based on just the eye positions.
Here, OpenCV, Dlib and MTCNN offer face and eye detection. Besides, deepface is a Python-based framework that wraps those methods and offers an out-of-the-box detection and alignment function.
The detectFace function applies detection and alignment, respectively, in the background.
#!pip install deepface
from deepface import DeepFace
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
DeepFace.detectFace("img.jpg", detector_backend = backends[0])
Besides, you can apply detection and alignment manually.
import matplotlib.pyplot as plt
from deepface.commons import functions
img = functions.load_image("img.jpg")
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
detected_face = functions.detect_face(img = img, detector_backend = backends[3])
plt.imshow(detected_face)
aligned_face = functions.align_face(img = img, detector_backend = backends[3])
plt.imshow(aligned_face)
processed_img = functions.detect_face(img = aligned_face, detector_backend = backends[3])
plt.imshow(processed_img)
There's a section Aligning Face Images in OpenCV's Face Recognition guide:
http://docs.opencv.org/trunk/modules/contrib/doc/facerec/facerec_tutorial.html#aligning-face-images
The script aligns given images at the eyes. It's written in Python, but should be easy to translate to other languages. I know of a C# implementation by Sorin Miron:
http://code.google.com/p/stereo-face-recognition/