sparse_image_warp in Tensorflow doesn't work? - tensorflow

I want to apply a warp to an image specified by the source and destination locations of a (potentially small) number of control points, in a deep learning framework. I thought the function tf.contrib.image.sparse_image_warp could do exactly what I want, but after I tried it, the warped image didn't look good.
More specifically, I want to warp the source image to the destination image using face landmarks, so I used the following code:
warped_image, dense_flows = sparse_image_warp(source_image, source_image_landmarks, dest_image_landmarks)
And the results are here:
source image with landmarks:
destination image with landmarks:
warped result:
desired result generated by another method:
Am I using the function in the wrong way, or can it not do what I need?

Pay close attention to tf.contrib.image.sparse_image_warp: you need to supply the control points (facial landmarks in your example) in (y, x) coordinates rather than (x, y).
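For example, a minimal sketch of that fix (the landmark tensor names and shapes here are assumptions): reverse the last axis of the landmark tensors before calling the function.
import tensorflow as tf
# source_landmarks_xy, dest_landmarks_xy: [1, num_points, 2] float tensors in (x, y) order
source_landmarks_yx = tf.reverse(source_landmarks_xy, axis=[-1])  # now (y, x)
dest_landmarks_yx = tf.reverse(dest_landmarks_xy, axis=[-1])
warped_image, dense_flows = tf.contrib.image.sparse_image_warp(
    source_image, source_landmarks_yx, dest_landmarks_yx)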

Related

How to resize a nifti (nii.gz medical image) file

I have some medical images in nii.gz format which are of different shapes. I want to resize them all to the same shape in order to feed them to a deep learning model. I tried using resample_img() from nibabel, but it destroys my images. I just want some function that resizes them to a particular shape, say (512, 512, 129).
Could someone please help me with this? I have been stuck on this step for quite a few days.
Maybe you can use this:
https://scikit-image.org/docs/dev/api/skimage.transform.html
I saw it in one of the papers. Here is an example, in the function ScaleToFixed:
https://github.com/sacmehta/3D-ESPNet/blob/master/Transforms.py
Here is how I did it. I have a volume of shape 320x320x130 (black and white, so no RGB dimension) and I want to downsample it by a factor of two in the first two dimensions. This worked for me:
import nibabel as nib
import skimage.transform as skTrans
im = nib.load(file_path).get_fdata()
# order=1 -> linear interpolation; preserve_range keeps the original intensity values
result1 = skTrans.resize(im, (160, 160, 130), order=1, preserve_range=True)
You can use TorchIO:
import torchio as tio
image = tio.ScalarImage('path/to/image.nii.gz')
transform = tio.CropOrPad((512,512,129))
output = transform(image)
If you would like to keep the original field of view, you could use the Resample transform instead.
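A minimal sketch of that alternative (the target shape and file path are taken from the question; the spacing arithmetic is an assumption to check against your data):
import torchio as tio
image = tio.ScalarImage('path/to/image.nii.gz')
target_shape = (512, 512, 129)
# new spacing = old spacing * old size / target size, per axis, so the field of view is preserved
new_spacing = tuple(
    sp * sz / tgt for sp, sz, tgt in zip(image.spacing, image.spatial_shape, target_shape)
)
output = tio.Resample(new_spacing)(image)  # output shape should be close to target_shape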
Disclaimer: I'm the main developer of TorchIO.

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there any better transformation?
Final Update:
As #Neil suggested, I tried to remove noise by detecting connected pixels. To find them, I used a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
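A minimal sketch of that filtering step (the threshold of 127 matches the code above; min_area is an assumption to tune for these captchas):
import cv2
import numpy as np
img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)  # text and dots become white
# label every group of connected white pixels and get per-component statistics
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
min_area = 20  # components smaller than this are treated as noise
cleaned = np.zeros_like(binary)
for label in range(1, num_labels):  # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] >= min_area:
        cleaned[labels == label] = 255
cv2.imwrite('res.png', cv2.bitwise_not(cleaned))  # back to dark text on white for pytesseract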
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted:
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and count how many pixels are in each connected group. You can do this with a flood-fill ("minesweeper") algorithm, which is easy to search for.
Set a threshold number of pixels that all legitimate letters are expected to exceed. This will depend on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
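A minimal sketch of that strategy with Pillow (the binarization threshold and min_pixels value are assumptions to tune):
from PIL import Image
img = Image.open('captcha.png').convert('L')   # grayscale for direct pixel access
px = img.load()
w, h = img.size
dark = [[px[x, y] < 128 for y in range(h)] for x in range(w)]   # binarize: True = dark pixel
visited = [[False] * h for _ in range(w)]
min_pixels = 30   # legitimate letters are expected to have at least this many pixels
for sx in range(w):
    for sy in range(h):
        if dark[sx][sy] and not visited[sx][sy]:
            # iterative flood fill ("minesweeper" search) over 4-connected neighbours
            stack, group = [(sx, sy)], []
            visited[sx][sy] = True
            while stack:
                x, y = stack.pop()
                group.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and dark[nx][ny] and not visited[nx][ny]:
                        visited[nx][ny] = True
                        stack.append((nx, ny))
            if len(group) < min_pixels:        # too small to be a letter: erase it
                for x, y in group:
                    px[x, y] = 255
img.save('cleaned.png')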
Your final output image is too blurry. To improve the performance of pytesseract, you need to sharpen it.
Sharpening is not as easy as blurring, but there exist a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
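For example, a minimal unsharp-masking sketch (the kernel sigma and the weights are assumptions to experiment with):
import cv2
img = cv2.imread('res.png')
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
# unsharp mask: emphasise edges by subtracting a blurred copy from the original
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
cv2.imwrite('sharpened.png', sharpened)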
Rather than chaining blurs, blur once, using either a Gaussian or a median blur, and experiment with the parameters to get the amount of blur you need. You could try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects the characters. Save several images, apply your filter, and train an SVM; that may help you. I trained an algorithm with only a few images, and the results were acceptable. Check this link.
Good luck.
I know the post is a bit old, but I suggest you try a library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section, "Train model on external data", that you might be interested in.

Get the location of an object to crop by providing its pixel label in tensorflow

I have a dataset of images (every image is in RGB format) and corresponding label images (which contain the label of every pixel in the image).
I need to extract the objects (pixels) of a particular class from the original images.
First, I have to find the location of the object using the label image (by providing the label of the given object). This is doable with explicit for loops, but I don't want to use explicit for loops.
Now my questions:
Is there any built-in function in TensorFlow that gives me the location (rectangles are fine) of a given object, if I provide the label of that object?
After that I can use tf.image.crop_and_resize to crop the image, but I am not able to find any function that will give me the location of objects.
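As a minimal, hedged sketch (the function and variable names here are made up for illustration), one way to get a bounding rectangle for a given label without Python loops is to combine tf.where with reductions:
import tensorflow as tf
def label_bounding_box(label_image, target_label):
    # label_image: [height, width] integer tensor of per-pixel class labels
    coords = tf.where(tf.equal(label_image, target_label))   # [n, 2] (row, col) indices
    y_min = tf.reduce_min(coords[:, 0])
    x_min = tf.reduce_min(coords[:, 1])
    y_max = tf.reduce_max(coords[:, 0])
    x_max = tf.reduce_max(coords[:, 1])
    return y_min, x_min, y_max, x_max
# tf.image.crop_and_resize expects normalized [y1, x1, y2, x2] boxes, so divide these
# indices by the image height and width (cast to float) before passing the box in.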

How to refine the GraphCut cmex code based on a specific energy function?

I downloaded the following graph-cut code:
https://github.com/shaibagon/GCMex
I compiled the mex files and ran it on the predefined image in the code (which is an RGB image).
I want to optimize the image segmentation results.
I have a probability map of the image, whose dimensions are (width, height, 5): five probability distributions over the image are stacked together, each relating to one of the classes.
My problem is which parts of the code should be changed according to this probability image.
I want to define the data and smoothness terms based on my application.
My questions are:
1) Has anyone adapted the code to define a different energy function? (I want to change the unary and pairwise formulations.)
2) I have a stack of 3D images. I want to define a 6-neighborhood system: 4 neighbors in the current slice and the other two from the two adjacent slices. In which function and part of the code can I make these refinements?
Thanks

Feature Pyramid Network with tensorflow/models/object_detection

If I want to implement k = k0 + log2(√(w*h)/224) from Feature Pyramid Networks for Object Detection, which file should I change, and where?
Note: this formula is for ROI pooling. w and h are the width and height of the ROI, whereas k represents the level of the feature pyramid this ROI should be assigned to.
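For reference, a minimal sketch of that level assignment (k0 = 4 and the level clamping are assumptions following the paper's setup):
import math
def fpn_level(w, h, k0=4, k_min=2, k_max=5):
    # Eqn 1: smaller ROIs map to finer pyramid levels; a 224x224 ROI maps to k0
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k_max, k))
# e.g. fpn_level(224, 224) == 4, fpn_level(112, 112) == 3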
*Pointing to the FasterRCNN meta_architecture file in object_detection might be helpful, but please tell me which method I should change.
Take a look at this document for a rough overview of the process. In a nutshell, you'll have to create a "FeatureExtractor" sub-class for your desired meta-architecture. For FasterRCNN, you can probably use a copy of our Resnet101 feature extractor as a starting point.
The short answer is that the change won't be trivial as we don't currently support cropping regions from multiple layers. Here is an outline of what would need to change if you would like to pursue this anyway:
Generating a new anchor set
Currently Faster RCNN uses a "GridAnchorGenerator" as the first_stage_anchor_generator; instead you will have to use a MultipleGridAnchorGenerator (the same as we use in the SSD pipeline).
You will have to use a 32^2 anchor box: for the scales field of the anchor generator, you will basically have to add 0.125.
You will have to modify the code to generate and crop from multiple layers: to start, look for a function in the faster_rcnn_meta_arch file called "_extract_rpn_feature_maps", which is suggestively named, but currently returns just a single tensor! You will also have to add some logic to determine which layer to crop from based on the size of the proposal (Eqn 1 from the paper)
Finally, you will have to create a new feature extractor, following the directions that Derek linked to.