deep learning for shape localization and recognition - tensorflow

There is a set of images, each of which contains different shape entities, as shown in the following figure. I am trying to localize and recognize these different shapes, for instance by adding a bounding box around each shape and perhaps even labeling it. What are the major research papers/deep learning models that have been able to solve this kind of problem?

Object detection models such as R-CNN, Faster R-CNN, YOLO and SSD would help you solve this if you are set on using a deep learning approach.
It’s easy to say this is a trivial problem that can be solved with tools in OpenCV and that deep learning is overkill, but there are many reasons to use deep learning here, and that objection does not answer your question.
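For instance, here is a minimal sketch of running a pretrained SSD detector from TensorFlow Hub on an image; the hub handle, the image filename and the output dictionary keys are assumptions that follow the TF2 detection-model convention, so check them against the documentation of whichever model you pick:

```python
# Minimal sketch: run a pretrained SSD detector from TensorFlow Hub.
# The hub handle, image file and output keys are assumptions following the
# TF2 Detection Model Zoo convention; verify them against the model's docs.
import tensorflow as tf
import tensorflow_hub as hub

detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")  # assumed handle

image = tf.io.decode_image(tf.io.read_file("shapes.png"), channels=3)   # hypothetical file
result = detector(tf.expand_dims(image, 0))                             # uint8 batch [1, H, W, 3]

boxes = result["detection_boxes"][0].numpy()     # normalized [ymin, xmin, ymax, xmax]
scores = result["detection_scores"][0].numpy()
classes = result["detection_classes"][0].numpy()

for box, score, cls in zip(boxes, scores, classes):
    if score > 0.5:
        print(f"class {int(cls)} with score {score:.2f} at {box}")
```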

Let's assume your shapes appear at different scales and rotations. The full image shown above is too large to train on directly, and it would take a lot of training samples to reach good accuracy on test data. In this case it is better to train a Convolutional Neural Network on small images (e.g. 128x128) with only one shape per image, and then use a sliding-window trick.
This project will have three main steps:
Generate training and test samples; each image should contain only one shape.
Train a classifier to recognize a single shape within each input image.
Apply the sliding-window trick: break your original image containing many shapes into overlapping 128x128 blocks, and pass each block to the model trained in the second step.
This way, at the end you will have a label for each shape from your trained model, and the sliding window gives you the location of each shape.
For the classifier you can use exactly the CNN architecture from TensorFlow's MNIST tutorial; a rough sketch of the whole approach is given below.
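Here is a minimal sketch of both pieces: a small MNIST-tutorial-style CNN trained on 128x128 single-shape crops, and the sliding-window scan over the large image. The number of classes, window stride, confidence threshold and data pipeline are placeholders to adapt to your own shapes:

```python
# Minimal sketch: small CNN classifier (MNIST-tutorial style) + sliding window.
# NUM_CLASSES, STRIDE, the threshold and the data pipeline are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4           # e.g. circle, square, triangle, star (assumption)
WINDOW, STRIDE = 128, 64  # overlapping blocks

model = models.Sequential([
    layers.Input(shape=(WINDOW, WINDOW, 1)),
    layers.Conv2D(32, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_crops, train_labels, epochs=10)  # 128x128 crops, one shape each

def sliding_window_detect(image, threshold=0.9):
    """Scan a large grayscale image with overlapping 128x128 blocks and return
    (x, y, class_id, confidence) for blocks the classifier is confident about."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - WINDOW + 1, STRIDE):
        for x in range(0, w - WINDOW + 1, STRIDE):
            block = image[y:y + WINDOW, x:x + WINDOW].reshape(1, WINDOW, WINDOW, 1)
            probs = model.predict(block / 255.0, verbose=0)[0]
            if probs.max() >= threshold:
                detections.append((x, y, int(probs.argmax()), float(probs.max())))
    return detections
```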
Here is a paper that applies exactly the same method to fingerprint images to extract local features:
A direct fingerprint minutiae extraction approach based on convolutional neural networks

Related

Bald detection using Keras

I was wondering if anyone can help by providing me with some guidelines for creating a bald-or-not image classifier.
So far I have a model for face and eye detection and, to sum it up, these are my main questions:
Where can I find datasets for this kind of classification without going to Google and downloading thousands of images by hand?
What classification model (i.e. the structure of layers in the network) should be used for this?
Question 1:
You could start by looking at some of the datasets available on Kaggle or in TensorFlow Datasets to see if anything suitable already exists.
If not, you could try an image-scraper tool to download images much faster than collecting them by hand.
Question 2:
An image classification model typically uses convolutional layers and max-pooling layers, with the commonly used dense layers of a multi-layer perceptron on top; a minimal sketch is shown below.
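For example, a small binary (bald / not-bald) classifier with that structure might look like the following; the image size and directory layout are assumptions to adapt to your own dataset:

```python
# Minimal sketch of a conv + max-pooling + dense binary classifier in Keras.
# Image size and the "data/train" directory layout are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(160, 160), batch_size=32)  # hypothetical path

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(160, 160, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # bald vs. not bald
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```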
To get started you can study the TensorFlow tutorial on image classification at this link, which classifies whether an image shows a cat or a dog. That example gives you the general idea of how to build an image classifier.
Hope this helps you. Thanks

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most one object from my 10 classes (i.e. the same image cannot contain two classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially due to the nature of my problem: my objects have different sizes. This is what I have tried:
Trained a classifier with images that contain only my objects/classes, i.e. every image is the object itself with the background pre-removed. Since the objects/images have different shapes (aspect ratios), I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose were only to build a classifier, but since I also need to detect the objects, it didn't work so well.
The second approach was similar to (1), except that instead of reshaping the objects naively, I kept the aspect ratios by padding the images with 0 (black). This completely destroyed my classifier's ability to perform well (accuracy < 5%).
Mask R-CNN - I followed this blog post to try to build a detector + classifier in the same model. The approach took forever and I wasn't sure it was the right one. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
Should I build 2 separate models? (One for detection/localization and one for classification?)
Should I be annotating my images using annotations file as in approach (3)?
Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
No, you don't have to build two separate models. What you are describing is called object detection, which is classification along with localization. There are many models that do this: Mask R-CNN, YOLO, Detectron, SSD, etc.
Yes, you do need to annotate your images to train a model for your custom classes. Each of the models mentioned above needs a different annotation format.
No, you don't need to do any image resizing yourself. Most of the time it is done when the model loads the data for training or inference.
You are on the right track with trying Mask R-CNN.
Other than Mask R-CNN, you could also try YOLO, which has an accompanying easy-to-use annotation tool, Yolo-Mark; a sketch of its annotation format is given below.
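For reference, the YOLO/Darknet convention that Yolo-Mark produces is one .txt file per image with one line per object, using box coordinates normalized by the image width and height. A small helper illustrating it, with made-up class numbers and boxes:

```python
# Minimal sketch of the YOLO (Darknet / Yolo-Mark) annotation line format:
# "class x_center y_center width height", all coordinates normalized to [0, 1].
def yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a YOLO-format annotation line."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# e.g. an object of class 3 occupying pixels (120, 80)-(360, 300) in a 640x480 image
print(yolo_line(3, 120, 80, 360, 300, 640, 480))
# -> "3 0.375000 0.395833 0.375000 0.458333"
```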
If you go through this tutorial, you will understand what you need:
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small, so training does not take too long.
There are several other object detection models available as well.
In RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can also export TFRecord files for TensorFlow.
https://rectlabel.com/help#tf_record

Deep learning training with nonidentical images?

I am reconstructing some images using dual photography. Next, I want to train a network to reconstruct clean images by removing noise (a denoising autoencoder).
The input for training the network is the reconstructed images, whereas the output is the ground truth, i.e. standard computer test images. The input, e.g. Lena, is not an exact version of Lena: the image is shifted in position and has some artifacts.
If I keep my reconstructed image as the input and the Lena test image (the standard computer test image) as the training output, will it work?
I just want to know whether training still works if the input and output are shifted relative to each other, or if some details are missing in one of them (due to some cropping).
It depends on many factors, such as your training images and the architecture of the network.
However, what you want is a network that learns the noise or low-level information, and for this purpose Generative Adversarial Networks (GANs) are very popular. You can read about them here. If, after you have tried your approach, the results are not satisfactory, then try GANs such as DCGAN (Deep Convolutional GAN).
Also, share your outcomes with the community if you would like.
Denoising Autoencoders! Love it!
There is no reason not to train your model with those images. If there is enough data, a well-trained autoencoder will eventually learn the transformation.
However, if you have the 'positive' (clean) images, I strongly recommend creating your own noisy versions and training in that controlled setting. You will simplify your problem and it will be easier to solve; a sketch of this idea is given below.
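A minimal sketch of that idea, assuming 128x128 grayscale images scaled to [0, 1]: corrupt your clean images with synthetic noise and small shifts, then train a small convolutional denoising autoencoder on the (noisy, clean) pairs. The noise level, shift range and architecture are placeholders to tune for your dual-photography setup:

```python
# Minimal sketch: synthesize noisy/shifted inputs from clean images and train
# a small convolutional denoising autoencoder. All sizes/levels are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def corrupt(clean, noise_std=0.1, max_shift=4):
    """Add Gaussian noise and a small random translation to a batch of images."""
    noisy = clean + np.random.normal(0.0, noise_std, clean.shape)
    shift = np.random.randint(-max_shift, max_shift + 1, size=2)
    noisy = np.roll(noisy, shift, axis=(1, 2))
    return np.clip(noisy, 0.0, 1.0)

inp = layers.Input(shape=(128, 128, 1))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2, padding="same")(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")

# clean_images: float32 array of shape (N, 128, 128, 1) scaled to [0, 1]
# autoencoder.fit(corrupt(clean_images), clean_images, epochs=50, batch_size=32)
```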
What is stopping you from doing just that?

How to know what TensorFlow actually "sees"?

I'm using a CNN built with Keras (TensorFlow) to do visual recognition.
I wonder if there is a way to know what my own TensorFlow model "sees".
Google published a news story showing the cat face that emerged in the AI's "brain":
https://www.smithsonianmag.com/innovation/one-step-closer-to-a-brain-79159265/
Can anybody tell me how to extract such images from my own CNN?
For example, what does my own CNN model see when it recognizes a car?
We have to distinguish between what TensorFlow actually sees:
As we go deeper into the network, the feature maps look less like the original image and more like an abstract representation of it. As you can see in block3_conv1 the cat is somewhat visible, but after that it becomes unrecognizable. The reason is that deeper feature maps encode high level concepts like “cat nose” or “dog ear” while lower level feature maps detect simple edges and shapes. That’s why deeper feature maps contain less information about the image and more about the class of the image. They still encode useful features, but they are less visually interpretable by us.
and what we can reconstruct from it through a kind of reverse "deconvolution" process (which is not a real mathematical deconvolution, in fact).
To answer your actual question, there are many good example solutions out there; one you can study with success is Visualizing output of convolutional layer in tensorflow. A minimal sketch is also shown below.
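The core of that approach is to build a second Keras model that outputs the activations of an intermediate convolutional layer and then plot each feature map. In the sketch below, the model file, layer name and input size are assumptions that depend on your own network:

```python
# Minimal sketch: extract and plot intermediate feature maps of a Keras CNN.
# The model file, layer name and input size are assumptions.
import matplotlib.pyplot as plt
import tensorflow as tf

model = tf.keras.models.load_model("my_cnn.h5")        # your trained model (hypothetical file)
layer = model.get_layer("conv2d_1")                     # pick any convolutional layer
feature_extractor = tf.keras.Model(model.input, layer.output)

image = tf.io.decode_image(tf.io.read_file("car.jpg"), channels=3)
image = tf.image.resize(image, (128, 128)) / 255.0      # match your model's input size
maps = feature_extractor(tf.expand_dims(image, 0))[0]   # (H, W, num_filters)

for i in range(min(16, maps.shape[-1])):
    plt.subplot(4, 4, i + 1)
    plt.imshow(maps[..., i], cmap="viridis")
    plt.axis("off")
plt.show()
```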
When you build a model to perform visual recognition, you give it labelled data, pictures in this case, so that it can adjust its weights according to the training data. If you wish to build a model that can recognize a car, you have to train it on a large dataset of labelled pictures. This type of recognition is basically categorical recognition.
You can experiment with the MNIST dataset, which provides pictures of handwritten digits for image recognition.

What to expect from deep learning object detection on black and white pictures?

With TensorFlow, I want to train an object detection model on my own images, based on the ssd_inception_v2_coco model. The problem I have is that all my pictures are black and white. What performance can I expect? Should I try to colorize my B&W pictures first? Or, on the contrary, should I retrain the base network with "uncolorized" images? Are there general guidelines for B&W processing of images for deep learning object detection?
I wouldn't go through the trouble of colorizing if you are planning on using a pretrained model. I would expect that explicitly colorizing your images as a pre-processing step would help very little (if at all) since in theory the features that a colorizing network learns can also be learned by the detection network.
If you plan to initialize your detection network from weights pretrained on an RGB dataset, make sure you either (i) replace the first convolution in the network with a convolutional layer that expects a single-channel input, or (ii) pad your image with two all-zero channels; a sketch of the second option is given below.
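A minimal sketch of option (ii): pad a grayscale image with two all-zero channels so its shape matches the RGB input the pretrained detector expects. Replicating the grayscale channel into all three channels is another common choice and is shown as an alternative; the file name is a placeholder:

```python
# Minimal sketch: turn a single-channel (grayscale) image into a 3-channel
# tensor for a detector pretrained on RGB data. File name is hypothetical.
import tensorflow as tf

gray = tf.io.decode_image(tf.io.read_file("frame_bw.png"), channels=1)  # (H, W, 1)
gray = tf.cast(gray, tf.float32)

# Option (ii) from the answer: pad with two all-zero channels.
zeros = tf.zeros_like(gray)
padded = tf.concat([gray, zeros, zeros], axis=-1)    # (H, W, 3)

# Common alternative: replicate the grayscale channel into all three channels.
replicated = tf.image.grayscale_to_rgb(gray)         # (H, W, 3)

print(padded.shape, replicated.shape)
```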
You may get slightly worse detection performance simply because you lose two of the three color channels, and with them information the RGB model could otherwise exploit, when using B&W instead of RGB.