How can I enrich a Convolutional Neural Network with meta information? - tensorflow

I would very much like to understand how I can enrich a CNN with provided meta information. As I understand, a CNN 'just' looks at the images and classifies it into objects without looking at possibly existing meta-parameters such as time, weather conditions, etc etc.
To be more precise, I am using a keras CNN with tensorflow in the backend. I have the typical Conv2D and MaxPooling Layers and a fully connected model at the end of the pipeline. It works nicely and gives me a good accuracy. However, I do have additional meta information for each image (the manufacturer of the camera with which the image was taken) that is unused so far.
What is a recommended way to incorporate this meta information into the model? I could not yet come out with a good solution by myself.
Thanks for any help!

Usually it is done by adding this information in one of the fully connected layer before the prediction. The fully connected layer gives you K features representing your image, you just concatenate them with the additional information you have.

Related

Bald detection using Keras

I was wondering if anyone can help by providing me with some guidelines for creating a bald-or-not image classifier.
So far I have a model for face and eye detection and to sum it up, this is my main questions:
Where can I find datasets for this kind of classification without going to google and download thousands of images by hand?
What classification model (i.e. the structure of layers in the network) should be used for this?
Question 1:
You could start by looking at some of the datasets available in Kaggle or Tensor Flow Datasets to see if there is anything available.
If none, you could try using an Image scraper tool to download images quickly compared to by hand.
Question 2:
Typically Image Classification model uses Convolutional Layers and MaxPooling layers. On top of the commonly used Dense Layer for Multi-layer Perceptron.
To get started you can study the Tensor Flow tutorial for Image Classification in this link,
which classifies whether the Image is Cat or Dog.
This example can provide you with the general idea of how to build an Image Classifier.
Hope this helps you. Thanks

Classification of a sequence of images (fixed number)

I successfully trained a CNN for a single image classification, using pre-trained resnet50 from tensorflow_hub.
Now my goal is to give as input to my network a chronological sequence of images (not a video), to classify the behavior of the subject.
Each sequence consists of 20 images taken every 100ms.
What is the best kind of NN? Where can I find documentation/examples for problems similar to mine?
Any time there is sequential data some type of Recurrent Neural Network is a great candidate (usually in the form of an LSTM).
Your model may look like a combination of an CNN-LSTM because your pictures have some sort of sequential relationship.
Here is a link to some examples and tutorials. He will set up a CNN in his example but you could probably rig your architecture to use the resNet you have already made. Though your are not dealing with a video your problem shares the same domain.
Here is a paper than uses a NN architecture like the one described above you might find useful.

Misclassification using VGG16 Net

I use Faster RCNN to classify 33 items. But most of them are misclassified among each other. All items are snack packets and sweet packets like in the link below.
https://redmart.com/product/lays-salt-and-vinegar-potato-chips
https://www.google.com/search?q=ice+breakers&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj5qqXMofHfAhUQY48KHbIgCO8Q_AUIDigB&biw=1855&bih=953#imgrc=TVDtryRBYCPlnM:
https://www.google.com/search?biw=1855&bih=953&tbm=isch&sa=1&ei=S5g-XPatEMTVvATZgLiwDw&q=disney+frozen+egg&oq=disney+frozen+egg&gs_l=img.3..0.6353.6886..7047...0.0..0.43.116.3......1....1..gws-wiz-img.OSreIYZziXU#imgrc=TQVYPtSi--E7eM:
So color and shape are similar.
What could be the best way to solve this misclassification problem?
Fine tuning is a way to use features, learned on some big dataset, in our problem, which means instead of training the complete network again, we freeze out weights of the lower layer of the network and add few layers at the end of network, as per requirement. Now we train it on our data-set again. So the advantage here is that, we don't need to train all-millions of parameters, but few only. Another is that we don't need large-dataset to fine-tune.
More you can find here. This is another-useful resource, where author has explained this in more detail(with code).
Note: This is also known as transfer-learning.

How to know what Tensorflow actually "see"?

I'm using cnn built by keras(tensorflow) to do visual recognition.
I wonder if there is a way to know what my own tensorflow model "see".
Google had a news showing the cat face in the AI brain.
https://www.smithsonianmag.com/innovation/one-step-closer-to-a-brain-79159265/
Can anybody tell me how to take out the image in my own cnn networks.
For example, what my own cnn model recognize a car?
We have to distinguish between what Tensorflow actually see:
As we go deeper into the network, the feature maps look less like the
original image and more like an abstract representation of it. As you
can see in block3_conv1 the cat is somewhat visible, but after that it
becomes unrecognizable. The reason is that deeper feature maps encode
high level concepts like “cat nose” or “dog ear” while lower level
feature maps detect simple edges and shapes. That’s why deeper feature
maps contain less information about the image and more about the class
of the image. They still encode useful features, but they are less
visually interpretable by us.
and what we can reconstruct from it as a result of some kind of reverse deconvolution (which is not a real math deconvolution in fact) process.
To answer to your real question, there is a lot of good example solution out there, one you can study it with success: Visualizing output of convolutional layer in tensorflow.
When you are building a model to perform visual recognition, you actually give it similar kinds of labelled data or pictures in this case to it to recognize so that it can modify its weights according to the training data. If you wish to build a model that can recognize a car, you have to perform training on a large train data containing labelled pictures. This type of recognition is basically a categorical recognition.
You can experiment with the MNIST dataset which provides with a dataset of pictures of digits for image recognition.

how to use tensorflow object detection API for face detection

Open CV provides a simple API to detect and extract faces from given images. ( I do not think it works perfectly fine though because I experienced that it cuts frames from the input pictures that have nothing to do with face images. )
I wonder if tensorflow API can be used for face detection. I failed finding relevant information but hoping that maybe an experienced person in the field can guide me on this subject. Can tensorflow's object detection API be used for face detection as well in the same way as Open CV does? (I mean, you just call the API function and it gives you the face image from the given input image.)
You can, but some work is needed.
First, take a look at the object detection README. There are some useful articles you should follow. Specifically: (1) Configuring an object detection pipeline, (3) Preparing inputs and (3) Running locally. You should start with an existing architecture with a pre-trained model. Pretrained models can be found in Model Zoo, and their corresponding configuration files can be found here.
The most common pre-trained models in Model Zoo are on COCO dataset. Unfortunately this dataset doesn't contain face as a class (but does contain person).
Instead, you can start with a pre-trained model on Open Images, such as faster_rcnn_inception_resnet_v2_atrous_oid, which does contain face as a class.
Note that this model is larger and slower than common architectures used on COCO dataset, such as SSDLite over MobileNetV1/V2. This is because Open Images has a lot more classes than COCO, and therefore a well working model need to be much more expressive in order to be able to distinguish between the large amount of classes and localizing them correctly.
Since you only want face detection, you can try the following two options:
If you're okay with a slower model which will probably result in better performance, start with faster_rcnn_inception_resnet_v2_atrous_oid, and you can only slightly fine-tune the model on the single class of face.
If you want a faster model, you should probably start with something like SSDLite-MobileNetV2 pre-trained on COCO, but then fine-tune it on the class of face from a different dataset, such as your own or the face subset of Open Images.
Note that the fact that the pre-trained model isn't trained on faces doesn't mean you can't fine-tune it to be, but rather that it might take more fine-tuning than a pre-trained model which was pre-trained on faces as well.
just increase the shape of the input, I tried and it's work much better