Can I find the region of the found categories in TensorFlow? - tensorflow

We have been using Tensorflow for image classification, and we all see the results for the Admiral Grace Hopper, and we get:
military uniform (866): 0.647296
suit (794): 0.0477196
academic gown (896): 0.0232411
bow tie (817): 0.0157356
bolo tie (940): 0.0145024
I was wondering if there is any way to get the coordinates for each category within the image.

Tensorflow doesn't have sample code yet on image detection and localization but it's an open research problem with different approaches to do it using deep nets; for example you can lookup the papers on algorithms called OverFeat and YOLO (You Only Look Once).
Also, usually there's some preprocessing on the object coordinates labels, or postprocessing to suppress duplicate detections. Usually a second, different network is used to classify the object once it's detected.

Related

Is this the correct way of using YOLO for image classification in a custom project?

I'm a beginner in computer vision. Could anyone tell me whether what I'm considering to do is correct or not? I wanted to detect a certain cyst in teeth. So my dataset consists of a part of the dental x-ray that contains that cyst. I train my model with these pictures. The one with the colored area contains cyst (infected teeth), and the one below it is the uninfected teet.
Image with cyst
Uninfected teeth
After training my model, I want to use it on a full dental x-ray, and determine if this picture has the cyst or not. A full dental x-ray is shown below.
Full dental X-Ray
Does this work? Or I'm completely wrong?
Instead of treating this as an object detection problem, you would get far better results if you were to treat this as a classification problem.
There are already various architectures for such classification tasks.
There are various architectures in TensorFlow to get you started.
Take a look at this. If you have enough data you can train them from scratch instead of using pre-trained weights
Note - The architecture provided in TensorFlow will almost always give you better results than the architectures that you create.
Object detection is suitable for cases where you have well-defined objects. If you take a look at recently published research papers you can see that these types of problems are considered as classification problems instead of object detection problems.

How to know what Tensorflow actually "see"?

I'm using cnn built by keras(tensorflow) to do visual recognition.
I wonder if there is a way to know what my own tensorflow model "see".
Google had a news showing the cat face in the AI brain.
https://www.smithsonianmag.com/innovation/one-step-closer-to-a-brain-79159265/
Can anybody tell me how to take out the image in my own cnn networks.
For example, what my own cnn model recognize a car?
We have to distinguish between what Tensorflow actually see:
As we go deeper into the network, the feature maps look less like the
original image and more like an abstract representation of it. As you
can see in block3_conv1 the cat is somewhat visible, but after that it
becomes unrecognizable. The reason is that deeper feature maps encode
high level concepts like “cat nose” or “dog ear” while lower level
feature maps detect simple edges and shapes. That’s why deeper feature
maps contain less information about the image and more about the class
of the image. They still encode useful features, but they are less
visually interpretable by us.
and what we can reconstruct from it as a result of some kind of reverse deconvolution (which is not a real math deconvolution in fact) process.
To answer to your real question, there is a lot of good example solution out there, one you can study it with success: Visualizing output of convolutional layer in tensorflow.
When you are building a model to perform visual recognition, you actually give it similar kinds of labelled data or pictures in this case to it to recognize so that it can modify its weights according to the training data. If you wish to build a model that can recognize a car, you have to perform training on a large train data containing labelled pictures. This type of recognition is basically a categorical recognition.
You can experiment with the MNIST dataset which provides with a dataset of pictures of digits for image recognition.

Counting Pedestrians Using TensorFlow's Object Detection

I am new to machine learning field and based on what I have seen on youtube and read on internet I conjectured that it might be possible to count pedestrians in a video using tensorflow's object detection API.
Consequently, I did some research on tensorflow and read documentation about how to install tensorflow and then finally downloaded tensorflow and installed it. Using the sample files provided on github I adapted the code related to object_detection notebook provided here ->https://github.com/tensorflow/models/tree/master/research/object_detection.
I executed the adapted code on the videos that I collected while making changes to visualization_utils.py script so as to report number of objects that cross a defined region of interest on the screen. That is I collected bounding boxes dimensions (left,right,top, bottom) of person class and counted all the detection's that crossed the defined region of interest (imagine a set of two virtual vertical lines on video frame with left and right pixel value and then comparing detected bounding box's left & right values with predefined values). However, when I use this procedure I am missing on lot of pedestrians even though they are detected by the program. That is the program correctly classifies them as persons but sometimes they don't meet the criteria that I defined for counting and as such they are not counted. I want to know if there is a better way of counting unique pedestrians using the code rather than using the simplistic method that I am trying to develop. Is the approach that I am using the right one ? Could there be other better approaches ? Would appreciate any kind of help.
Please go easy on me as I am not a machine learning expert and just a novice.
You are using a pretrained model which is trained to identify people in general. I think you're saying that some people are pedestrians whereas some other people are not pedestrians, for example, someone standing waiting at the light is a pedestrian, but someone standing in their garden behind the street is not a pedestrian.
If I'm right, then you've reached the limitations of what you'll get with this model and you will probably have to train a model yourself to do what you want.
Since you're new to ML building your own dataset and training your own model probably sounds like a tall order, there's a learning curve to be sure. So I'll suggest the easiest way forward. That is, use the object detection model to identify people, then train a new binary classification model (about the easiest model to train) to identify if a particular person is a pedestrian or not (you will create a dataset of images and 1/0 values to identify them as pedestrian or not). I suggest this because a boolean classification model is about as easy a model as you can get and there are dozens of tutorials you can follow. Here's a good one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network.ipynb
A few things to note when doing this:
When you build your dataset you will want a set of images, at least a few thousand along with the 1/0 classification for each (pedestrian or not pedestrian).
You will get much better results if you start with a model that is pretrained on imagenet than if you train it from scratch (though this might be a reasonable step-2 as it's an extra task). Especially if you only have a few thousand images to train it on.
Since your images will have multiple people in it you have a problem of identifying which person you want the model to classify as a pedestrian or not. There's no one right way to do this necessarily. If you have a yellow box surrounding the person the network may be successful in learning this notation. Another valid approach might be to remove the other people that were detected in the image by deleting them and leaving that area black. Centering on the target person may also be a reasonable approach.
My last bullet-point illustrates a problem with the idea as it's I've proposed it. The best solution would be to alter the object detection network to ouput both a bounding box per-person, and also a pedestrian/non pedestrian classification with it; or to only train the model to identify pedestrians, specifically, in the first place. I mention this as more optimal, but I consider it a more advanced task than my first suggestion, and a more complex dataset to manage. It's probably not the first thing you want to tackle as you learn your way around ML.

Training keras with tensorflow: Redundancy in labelling the object or multiple labels on same object

I was training keras with tensorflow for person detection. After the training, when the testing was done so many images contains redundant labeling of person. ie; for a single person in an image, multiple labeling as a person was shown. What is the actual reason behind this?
My training set contains nearly 2000 images, a single class person, batch=32, epoch=100, threshold=0.55 and testing images=250.
Overtraining of samples may lead to redundancy and if you are using different angles of an image, for example if you train for detecting people and you are providing samples of human from different angles, then it may show errors on detection in real cases. If this is not the issue, then non- maximal suppression will be the better option.

How to locate multiple objects in the same image?

I am a newbie in TensorFlow.
Currently, I am testing some classification's examples "Convolutional Neural Network" in the TensorFlow website, and it explains how to classify input images into pre-defined classes, but the problem is: I can't figure out how to locate multiple objects in the same image. For example, I had an input image with a cat and dog and I want my graph to display in the output that there are both of them "a cat and a dog" in the image.
Great question. Detecting multiple objects in the same image boils is essentially a "segmentation problem". Two nice and popular algorithms are
YOLO (You Only Look Once), and SSD(Single Shot Multibox Detector). I included links to them at the bottom.
I would watch a few videos on how YOLO works, and see if you grasp the idea. Then read the paper on SSD, and see if you get why this algorithm is even faster and more precise.
Both algorithms are single-pass: they only look at the image "once" and predict bounding boxes for the categories they spot. There are more precise algorithms, but they are slower (they first pick many spots they want to look, and then run a classifier on only that spot. The result is that they run this classifier many times per image, which is slow).
As you stated you are a newbie to Tensorflow, you can try this code other people made: https://github.com/thtrieu/darkflow . The very extensive readme shows you how to get started on your own dataset.
Good luck, and let us know if you have other questions, or if these algorithms do not fit your use-case.
YOLO 9000 (https://pjreddie.com/darknet/yolo/)
SSD (Single shot multibox detector) (https://arxiv.org/abs/1512.02325)
A naive approach for what you are trying to do would be to classify parts of the image independently.
But there are some better techniques for object detection. Actually, there is the TensorFlow Object Detection API, which gives you access to the most common object detection methods like Faster R-CNN or SSD.