How to train a set of images (apart from the tutorial) and use it to classify in TensorFlow? - tensorflow

I am using code from the TensorPy GitHub repo (built by Michael Mintz) (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/imagenet/classify_image.py) to easily handle image classification of individual or multiple images directly from web pages. However, I want to train the system with my own set of images and then run the classifier. Could anyone please point me in the right direction for training a new model, and how do I use it to classify a set of images in bulk?
Thank you.

Related

How to do transfer learning or fine-tune YOLOv4-darknet with some layers frozen?

I'm a beginner in the object detection field.
First, I followed the YOLOv4 custom-training tutorial from here and completed it successfully. Then I started to think: if I have a new task that is similar to what the YOLOv4 pre-trained model covers (the 80 COCO classes) and I only have a small dataset, it would be great to fine-tune the model (unfreeze only the last layer) to keep or even increase detector performance using only that small, similar dataset. This reference seems to support the kind of fine-tuning I want to do.
Then I went to Alexey's GitHub here to check how to freeze layers, and found that I should use stopbackward=1. It says:
"...set param stopbackward=1 for layer-136 in cfg-file"
But I have no idea where "layer-136" is in the cfg-file here, and I also have no idea where to put stopbackward=1 if I only want to unfreeze the last layer (freezing all the other layers). So, to summarize my questions:
Where (in which line) should I put stopbackward=1 in yolov4-custom.cfg if I want to unfreeze the last layer and freeze all the other layers?
What is the "layer-136" mentioned in the Alexey GitHub reference? (Is it one of the classifier layers, or something else?)
In which line of yolov4-custom.cfg should I put stopbackward=1 for that layer-136?
Any further information from you is really appreciated. Please advise.
Thank you in advance.
Regards,
Sona
the "layer-136" is located before the head of yolov4. To make it easy to see, try to visualize the .cfg file to Netron apps and read the .cfg via text editor, so you can understand the location of layer. You can notice the input and output (the x-layer) when you analyze it with Netron

Bald detection using Keras

I was wondering if anyone can help by providing me with some guidelines for creating a bald-or-not image classifier.
So far I have a model for face and eye detection, and to sum it up, these are my main questions:
Where can I find datasets for this kind of classification without going to Google and downloading thousands of images by hand?
What classification model (i.e. the structure of layers in the network) should be used for this?
Question 1:
You could start by looking at the datasets available on Kaggle or in TensorFlow Datasets to see if there is anything suitable.
If there is none, you could try using an image scraper tool to download images much faster than by hand; a sketch is below.
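As an illustration, here is a minimal sketch using the icrawler package (my assumed choice of scraper; the keywords, counts, and folder names are also made up):

```python
# pip install icrawler
from icrawler.builtin import BingImageCrawler

# Download candidate images for each class into its own folder.
for keyword, folder in [("bald man portrait", "data/bald"),
                        ("man with hair portrait", "data/not_bald")]:
    crawler = BingImageCrawler(storage={"root_dir": folder})
    crawler.crawl(keyword=keyword, max_num=500)
```

Scraped images are noisy, so plan on a manual pass to throw out mislabeled ones before training.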
Question 2:
Typically, an image classification model uses convolutional layers and max-pooling layers, topped with the dense layers commonly used in a multi-layer perceptron.
To get started, you can study the TensorFlow tutorial for image classification at this link, which classifies whether an image shows a cat or a dog.
This example can give you the general idea of how to build an image classifier; a minimal sketch is below.
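For reference, here is a hedged Keras sketch of that kind of Conv/MaxPooling/Dense architecture, adapted to a binary bald-or-not output. The layer sizes, image size, and data directory are assumptions of mine, not details from the question:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Small Conv/MaxPooling stack followed by dense layers,
# ending in a single sigmoid unit for bald vs. not bald.
model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(160, 160, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(bald)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Assumes images are arranged as data/bald/... and data/not_bald/...
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(160, 160), batch_size=32)
model.fit(train_ds, epochs=10)
```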
Hope this helps you. Thanks

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most one object from my 10 classes (i.e. the same image cannot contain two classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially because my objects come in different sizes. This is what I have tried:
Trained a classifier on images that contain only my objects/classes, i.e. every image is the object itself with the background pre-removed. Since the objects/images have different shapes (aspect ratios), I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose were only to build a classifier, but since I also need to detect the objects, it didn't work so well.
The second approach was similar to (1), except that I didn't reshape the objects naively but kept the aspect ratios by padding the images with 0 (black). This completely destroyed my classifier's ability to perform well (accuracy < 5%).
Mask R-CNN: I followed this blog post to try to build a detector + classifier in the same model. The approach took forever, and I wasn't sure it was the right one. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
Should I build 2 separate models? (One for detection/localization and one for classification?)
Should I be annotating my images with an annotations file, as in approach (3)?
Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
No, you don't have to build two separate models. What you are describing is called object detection, which is classification along with localization. There are many models that do this: Mask R-CNN, YOLO, Detectron, SSD, etc.
Yes, you do need to annotate your images to train a model on your custom classes. Each of the models mentioned above needs a different annotation format.
No, you don't need to do any image resizing yourself. Most of the time it is done when the model loads the data for training or inference; see the sketch below.
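On the resizing point: if you ever need a fixed input size without squashing the aspect ratio (the thing that hurt approach 2), the usual trick is letterboxing, i.e. an aspect-preserving resize plus zero padding. A minimal TensorFlow sketch, with an arbitrary target size:

```python
import tensorflow as tf

def letterbox(image, target=512):
    """Resize while preserving aspect ratio, padding the rest with zeros."""
    return tf.image.resize_with_pad(image, target, target)

# A 300x600 image becomes 512x512 with black bars, not a squashed square.
img = tf.zeros([300, 600, 3])
print(letterbox(img).shape)  # (512, 512, 3)
```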
You are on the right track with trying Mask R-CNN.
Other than Mask R-CNN, you could also try YOLO. There is also an accompanying, easy-to-use annotation tool, Yolo-Mark.
If you go through this tutorial, you will learn what you need:
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small, so training does not take too much time.
There are several object detection models you can choose from.
On RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can export a TFRecord for TensorFlow.
https://rectlabel.com/help#tf_record
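If you go the PASCAL VOC route, reading those XML annotations back in Python is straightforward. A minimal sketch (the file name is a placeholder):

```python
import xml.etree.ElementTree as ET

# Parse one PASCAL VOC annotation file (path is hypothetical).
root = ET.parse("annotations/image_001.xml").getroot()
for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (int(box.find(tag).text)
                              for tag in ("xmin", "ymin", "xmax", "ymax"))
    print(name, (xmin, ymin, xmax, ymax))
```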

Can we use YOLO to detect and recognize text in an image?

Currently I am using a deep learning model called "YOLOv2" for object detection, and I want to use it to extract text and save it to disk, but I don't know how to do that. If anyone knows more about this, please advise me.
I use Tensorflow
Thanks
If you use the pretrained model, you would need to save those detection outputs and feed the cropped images into a character recognition network (if using a neural net) or another OCR approach; a sketch of that pipeline is below.
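As an illustration, here is a hedged sketch that crops one detected box out of an image and runs it through pytesseract; the OCR choice, file names, and box coordinates are assumptions of mine, and in practice the boxes would come from your YOLOv2 detections:

```python
# pip install pytesseract pillow  (also requires the tesseract binary)
from PIL import Image
import pytesseract

def recognize_box(image_path, box):
    """Crop one detected text region and OCR it.

    box is (left, upper, right, lower) in pixels, e.g. a YOLO detection
    converted to corner coordinates.
    """
    crop = Image.open(image_path).crop(box)
    return pytesseract.image_to_string(crop)

# Hypothetical usage with a made-up detection box:
text = recognize_box("street_sign.jpg", (40, 100, 320, 160))
with open("detected_text.txt", "w") as f:
    f.write(text)  # save the recognized text to disk
```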
What you are doing is "scene text recognition". You can check out the Reading Text in the Wild with Convolutional Neural Networks paper; here's a demo and the homepage. GitHub user chongyangtao has a whole list of resources on the topic.
I have a similar question: I am building a digit detection model with the SVHN dataset. It is not a finished project yet, but it seems to work well. You can see the code at Yolo-digit-detector.

Object detection using CNTK

I am very new to CNTK.
I wanted to train on a set of images (to detect objects like alcohol glasses/bottles) using CNTK with ResNet/Fast R-CNN.
I am trying to follow the documentation from GitHub below; however, it does not appear to be a straightforward procedure. https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN
I cannot find proper documentation on generating ROIs for images of different sizes and shapes, or on how to create object labels based on the trained models. Can someone point me to proper documentation or a training link with which I can work on the CNTK model? Please see the attached image, in which I was able to load a sample image with default ROIs in the script. How do I properly set the sizes and label the objects in the image? Thanks in advance!
[attached image: sample image loaded for training]
I am not sure what you mean by proper documentation. This is an implementation of the Fast R-CNN paper (https://arxiv.org/pdf/1504.08083.pdf). It looks like you are trying to generate ROIs; have a look at the helper functions documented at the site to find what you need:
To run the toy example, make sure that in PARAMETERS.py the datasetName is set to "grocery".
Run A1_GenerateInputROIs.py to generate the input ROIs for training and testing.
Run A2_RunCntk_py3.py to train a Fast R-CNN model using the CNTK Python API and compute test results.
The algorithm works on several candidate regions and then produces two outputs: one for the classes of the objects, and another for the bounding boxes of the objects belonging to those classes. Please refer to the code for the details of the implementation; a generic sketch of the two-output idea follows.
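To make that two-output structure concrete, here is a hedged Keras sketch (not CNTK, and not the repo's actual code) of a detection head that emits class scores and box coordinates for each candidate region; the class count and feature size are arbitrary assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 21  # e.g. 20 object classes + background (an assumption)

# Input: one fixed-size feature vector per candidate region (post ROI pooling).
roi_features = layers.Input(shape=(7 * 7 * 512,), name="roi_features")
x = layers.Dense(1024, activation="relu")(roi_features)

# Head 1: which class (if any) the region contains.
class_scores = layers.Dense(NUM_CLASSES, activation="softmax", name="cls")(x)
# Head 2: bounding-box refinement, 4 coordinates per class.
box_deltas = layers.Dense(NUM_CLASSES * 4, name="bbox")(x)

model = Model(roi_features, [class_scores, box_deltas])
model.compile(optimizer="sgd",
              loss={"cls": "categorical_crossentropy", "bbox": "huber"})
model.summary()
```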
Can someone point out to a proper documentation or training link using which I can work on the cntk model?
You can take a look at my repository on GitHub.
It will guide you through all the steps required to train your own model for object detection and classification with CNTK.
But in short, the proper steps should look something like this:
Setup environment
Prepare data
Tag images (ground truth)
Download pretrained model and create mappings for your custom dataset
Run training
Evaluate the model on test set