Machine Learning: Creating Class Activation Map - tensorflow

I recently followed this tutorial to train my own image classifier:
https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/?utm_campaign=chrome_series_machinelearning_063016&utm_source=gdev&utm_medium=yt-desc#0
For those who don't know it, it allows retraining a MobileNet network on your own custom images/categories. I have had success training networks on my own images using TensorFlow, but I would like to go further now.
I would now like to generate a class activation map (CAM) for images fed into the model. I have read that your convolutional neural network needs a Global Average Pooling layer for CAMs to work, and I suspect the MobileNet network does not have one. Is it possible to generate CAMs from the network I have already trained, or will I need to retrain using a different network like VGG16?
If it is possible, could someone point me in the right direction on how to generate CAMs? If not, could someone point me in the right direction on how to retrain a different network on my own images that allows for CAMs, and how to create CAMs on those networks? (A rough sketch of the CAM computation itself follows below.)
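For reference, the CAM computation is just a weighted sum of the final convolutional feature maps, using the classifier weights of the predicted class. A minimal sketch, assuming the retrained model was exported as a Keras model ending in GlobalAveragePooling2D plus a single Dense classifier (the file name and layer name here are assumptions, not from the tutorial):

    import numpy as np
    import tensorflow as tf

    # Assumed file name; the model is assumed to end with
    # GlobalAveragePooling2D followed by a single Dense classifier.
    model = tf.keras.models.load_model("retrained_mobilenet.h5")

    # Return the last conv feature maps alongside the predictions.
    # "conv_pw_13_relu" is the last conv activation in Keras' MobileNet v1;
    # adjust it to whatever layer your model actually contains.
    last_conv = model.get_layer("conv_pw_13_relu")
    cam_model = tf.keras.Model(model.inputs, [last_conv.output, model.output])

    image = np.random.rand(1, 224, 224, 3).astype("float32")  # stand-in input
    features, preds = cam_model.predict(image)

    # CAM: weight each feature map by the Dense weight of the top class.
    class_idx = int(np.argmax(preds[0]))
    weights = model.layers[-1].get_weights()[0]   # shape (channels, classes)
    cam = features[0] @ weights[:, class_idx]     # (h, w) heat map
    cam = np.maximum(cam, 0)                      # keep positive evidence
    cam /= cam.max() + 1e-8                       # normalize to [0, 1]

The heat map can then be upsampled to the input resolution and overlaid on the image. For what it's worth, Keras' MobileNet does apply global average pooling before its classifier, and Grad-CAM is the more general alternative for architectures without GAP.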
Sorry for the vague nature of the question. I am by no means familiar with or trained in computer science, but I am interested in learning more. Your help would be greatly appreciated. Please contact me with further inquiries about my question if needed. Hope to hear from you soon.
Thanks!

Related

How to do transfer learning or fine-tuning on YOLOv4-darknet with some layers frozen?

I'm a beginner in the object detection field.
First, I followed the YOLOv4 custom-training tutorial from here, and I completed it successfully. Then I started to think: if I have a new task that is similar to what YOLOv4 was pre-trained on (the 80 COCO classes), and I only have a small dataset, it would be great if I could fine-tune the model (unfreeze only the last layer) to keep or even increase detector performance using only that small, similar dataset. This reference seems to support my idea about the fine-tuning I want to do.
Then I went to Alexey's GitHub here to check how to freeze layers, and found that I should use stopbackward=1. It says:
"...set param stopbackward=1 for layer-136 in cfg-file"
But I have no idea where "layer-136" is in the cfg-file here, and I also have no idea where to put stopbackward=1 if I only want to unfreeze the last layer (freezing all the other layers). So, to summarize my questions:
Where (on which line) should I put stopbackward=1 in yolov4-custom.cfg if I want to unfreeze the last layer and freeze all the others?
What is the "layer-136" mentioned in Alexey's GitHub reference? (Is it one of the classifier layers, or something else?)
On which line of yolov4-custom.cfg should I put stopbackward=1 for that layer-136?
Any further information from you is really appreciated. Please advise.
Thank you in advance.
Regards,
Sona
the "layer-136" is located before the head of yolov4. To make it easy to see, try to visualize the .cfg file to Netron apps and read the .cfg via text editor, so you can understand the location of layer. You can notice the input and output (the x-layer) when you analyze it with Netron

Bald detection using Keras

I was wondering if anyone can help by providing me with some guidelines for creating a bald-or-not image classifier.
So far I have a model for face and eye detection and, to sum it up, these are my main questions:
Where can I find datasets for this kind of classification without going to Google and downloading thousands of images by hand?
What classification model (i.e. the structure of layers in the network) should be used for this?
Question 1:
You could start by looking at the datasets available on Kaggle or in TensorFlow Datasets to see if anything suitable exists.
If not, you could try an image-scraper tool to download images much faster than by hand.
Question 2:
Image classification models typically use convolutional layers and max-pooling layers, on top of the Dense layers commonly used in multi-layer perceptrons.
To get started, you can study the TensorFlow tutorial for image classification at this link,
which classifies whether an image is a cat or a dog.
That example gives you the general idea of how to build an image classifier.
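For orientation, a minimal sketch of that kind of model, adapted to a binary bald / not-bald output (the input size and layer sizes are assumptions, not taken from the tutorial):

    import tensorflow as tf

    # A small Conv/MaxPooling stack with a sigmoid head for binary output.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(160, 160, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # bald vs. not bald
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])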
Hope this helps you. Thanks

YOLO v3 complete architecture

I am attempting to implement YOLO v3 in TensorFlow-Keras from scratch, with the aim of training my own model on a custom dataset, that is, without using pretrained weights. I have gone through all three papers for YOLOv1, YOLOv2 (YOLO9000) and YOLOv3, and find that although Darknet53 is used as the feature extractor for YOLOv3, I am unable to pin down the complete architecture that extends after it: the "detection" layers talked about here. After a lot of reading of blog posts on Medium, KDnuggets and similar sites, I ended up with a few significant questions:
Have I missed the complete architecture of the detection layers (those extending after the Darknet53 feature extractor) somewhere in the YOLOv3 paper?
The author seems to use different image sizes at different stages of training. Does the network automatically do this upscaling/downscaling of images?
For preprocessing the images, is it really just enough to resize them and then normalize them (dividing by 255)? (See the sketch after these questions.)
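On the preprocessing question, the commonly described pipeline is indeed just resizing to the network input size and scaling the pixel values; a minimal sketch (the 416x416 target is one common YOLOv3 choice, an assumption here):

    import tensorflow as tf

    def preprocess(image):
        # Resize to the network input resolution and scale [0, 255] -> [0, 1].
        image = tf.image.resize(image, (416, 416))
        return image / 255.0

(YOLO implementations often also letterbox the image to preserve aspect ratio; that detail is omitted here.)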
Please be kind enough to point me in the right direction. I appreciate the help!

How can I enrich a Convolutional Neural Network with meta information?

I would very much like to understand how I can enrich a CNN with provided meta information. As I understand it, a CNN 'just' looks at the image and classifies it into objects without considering any meta-parameters that may exist, such as time, weather conditions, etc.
To be more precise, I am using a Keras CNN with TensorFlow as the backend. I have the typical Conv2D and MaxPooling layers and a fully connected head at the end of the pipeline. It works nicely and gives me good accuracy. However, I do have additional meta information for each image (the manufacturer of the camera with which the image was taken) that is unused so far.
What is a recommended way to incorporate this meta information into the model? I have not been able to come up with a good solution myself.
Thanks for any help!
Usually this is done by adding the information to one of the fully connected layers before the prediction. The fully connected layer gives you K features representing your image; you just concatenate them with the additional information you have.
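A minimal sketch of that idea with the Keras functional API (all shapes and sizes are assumptions, e.g. the metadata is taken to be a 10-way one-hot encoding of the camera manufacturer):

    import tensorflow as tf

    image_in = tf.keras.Input(shape=(128, 128, 3))
    meta_in = tf.keras.Input(shape=(10,))  # one-hot camera manufacturer

    # CNN branch producing K image features.
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(image_in)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)  # K = 64 features

    # Concatenate the image features with the metadata, then predict.
    x = tf.keras.layers.Concatenate()([x, meta_in])
    out = tf.keras.layers.Dense(5, activation="softmax")(x)  # 5 classes, assumed

    model = tf.keras.Model([image_in, meta_in], out)
    model.compile(optimizer="adam", loss="categorical_crossentropy")

At training time you then pass [images, metadata] as the model input.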

Process to build our own model for image detection

Currently, I am working on deep neural networks for image detection. I found a model called the YOLO network, which is very powerful for object detection, but I have a question:
How can we design and conceive our own model? Do we use brute force for that, for example "I use 2 convolutional layers, 1 pooling layer and 1 fully connected layer", and then, if the result isn't good, change the number of layers and the parameters until I find the best model? If anyone knows something about this, please show me how.
I use TensorFlow.
Thanks,
There are a couple of papers addressing this issue. For example, http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf mentions some general principles, like preserving information by not having too rapid changes in any cut of the graph separating the output from the input.
Another paper is https://arxiv.org/pdf/1606.02228.pdf, where specific hyperparameter combinations are tried.
The rest is just what you observe in practice, and it depends on your dataset and your requirements. Maybe you have performance requirements because you want to deploy to mobile, or you need more than 90% accuracy. Then you have to choose your model accordingly.
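In practice, the "brute force" you describe is usually made systematic by parameterizing the architecture and comparing a few configurations on a validation set. A minimal sketch (all default values here are illustrative):

    import tensorflow as tf

    def build_model(num_conv_blocks=2, filters=32,
                    dense_units=128, num_classes=10):
        # A simple configurable Conv/MaxPooling stack.
        model = tf.keras.Sequential()
        model.add(tf.keras.Input(shape=(64, 64, 3)))
        for i in range(num_conv_blocks):
            model.add(tf.keras.layers.Conv2D(filters * (2 ** i), 3,
                                             activation="relu", padding="same"))
            model.add(tf.keras.layers.MaxPooling2D())
        model.add(tf.keras.layers.Flatten())
        model.add(tf.keras.layers.Dense(dense_units, activation="relu"))
        model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
        return model

    # Compare a few depths and keep whichever validates best.
    candidates = [build_model(num_conv_blocks=b) for b in (2, 3, 4)]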