How to transfer learning or fine tune YOLOv4-darknet with freeze some layers? - object-detection

I'm a beginner in object detection field.
First, I followed YOLOv4 custom-train from here, I have successfully followed the tutorial. Then I started to think that if I have a new task which is similar to YOLOv4 pre-trained (which using COCO 80 classes) and I have only small dataset size, then I think it would be great if I can fine tune the model (unfreeze only the last layer) to keep or even to increase the detector performance by using only small & similar dataset. This reference seems to legitimate my thought about the fine-tuning I wanted to do.
Then I go to Alexey github here to check how to freeze layers, and found that I should use stopbackward=1. It says that
"...set param stopbackward=1 for layer-136 in cfg-file"
But I have no idea about where is "layer-136" in the cfg-file here and also I have no idea where to put stopbackward=1 if I only want to unfreeze the last layer (with freezing all the other layers). So to summarize my questions.
Where (in which line) to put stopbackward=1 in the yolov4-custom.cfg if I want to unfreeze last layer and freeze the other layers?
What is "layer-136" which mentioned in Alexey github reference? (is it one of the classifier layer? or else?)
In which line of yolov4-custom.cfg should I put the stopbackward=1 for that layer-136?
Any further information from you is really appreciated. Please advise.
Thank you in advance.
Regards,
Sona

the "layer-136" is located before the head of yolov4. To make it easy to see, try to visualize the .cfg file to Netron apps and read the .cfg via text editor, so you can understand the location of layer. You can notice the input and output (the x-layer) when you analyze it with Netron

Related

Bald detection using Keras

I was wondering if anyone can help by providing me with some guidelines for creating a bald-or-not image classifier.
So far I have a model for face and eye detection and to sum it up, this is my main questions:
Where can I find datasets for this kind of classification without going to google and download thousands of images by hand?
What classification model (i.e. the structure of layers in the network) should be used for this?
Question 1:
You could start by looking at some of the datasets available in Kaggle or Tensor Flow Datasets to see if there is anything available.
If none, you could try using an Image scraper tool to download images quickly compared to by hand.
Question 2:
Typically Image Classification model uses Convolutional Layers and MaxPooling layers. On top of the commonly used Dense Layer for Multi-layer Perceptron.
To get started you can study the Tensor Flow tutorial for Image Classification in this link,
which classifies whether the Image is Cat or Dog.
This example can provide you with the general idea of how to build an Image Classifier.
Hope this helps you. Thanks

Different evaluation accuracy when loading BERT from checkpoint

For some reason, I get wildly different loss and acc when I evaluate my BERT test set right after training vs. when I load from a saved checkpoint. I thought it might have been my adaptation of BERT, so I tried modifying the run_classifier.py script as little as possible to fit my use case, and I still am seeing this problem.
The only reason I can think of is that the model isn't loading correctly, but I don't know how to fix it. I believe I'm loading how originally intended. For the init_checkpoint parameter, I pass path/to/classifier/model.ckpt-{last_step}. There are three model files (meta, index, data) but there are also the checkpoint, events, and graph files. Do I need to be doing something with those other three files as well? I'm used to using keras, and this pure tensorflow saving/loading process seems unnecessarily convoluted to me.
Thank you in advance for any help/insight regarding BERT or pure tf saving/loading! If you're unfamiliar with BERT, here's the github link: BERT GitHub

Machine Learning: Creating Class Activation Map

I recently followed this tutorial to train my own image classifier
https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/?utm_campaign=chrome_series_machinelearning_063016&utm_source=gdev&utm_medium=yt-desc#0
for those who don't know it allows the retraining of a MobileNet network own your own custom images/categories. I have had sucess training networks on my own images using tensorflow but would like to go further now.
I now would like to generate a class activation map (CAM) for images inputted into the model. I have read you need your convolution neural network to have a Global Average Pooling layer for CAM's to work and am thinking the MoblieNet network does not have that. Is it possible to generate CAM's from the network I have already trained or will I need to retrain using a different network like VGG16??
If it is possible could someone point me in the right direction on how to generate CAM's? If not could someone point me in the right direction on how to retrain a different network with my own images that will allow for CAM's and how to creat CAM's on these networks??
Sorry for the vague nature of the question. I am by no means that familiar and/or trained regarding computer science but am interested to learn more. Your help would be greatly appreciate. Please contact me with further inquires about my question if needed. Hope to hear from you soon.
Thanks!

How can i detect and localize object using tensorflow and convolutional neural network?

My problem statement is as follows :
" Object Detection and Localization using Tensorflow and convolutional neural network "
What i did ?
I am done with the cat detection from images using tflearn library.I successfully trained a model using 25000 images of cats and its working fine with good accuracy.
Current Result :
What i wanted to do?
If my image consist of two or more than two objects in the same image for example cat and dog together so my result should be 'cat and dog' and apart from this i have to find the exact location of these two objects on the image(bounding box)
I came across many high level libraries like darknet , SSD but not able to get the concept behind it.
Please guide me about the approach to solve the problem.
Note : I am using supervised learning techniques.
Expected Result :
You have several ways to go about it.
The most straight forward way is to get some suggested bounding boxes using some bounding box suggestion algorithm like selective search and run on each on of the suggestion the classification net that you already trained. This approach is the approach taken by R-CNN.
For more advanced algorithm based on the above approach i suggest you read about Fast-R-CNN and Faster R-CNN.
Look at Object detection with R-CNN? for some basic explanation.
Darknet and SSD are based on a different approach if you want to undestand them you can read about them on
http://www.cs.unc.edu/~wliu/papers/ssd.pdf
https://pjreddie.com/media/files/papers/yolo.pdf
Image localization is a complex problem with many different implementations achieving the same result with different efficiency.
There are 2 main types of implementation
-Localize objects with regression
-Single Shot Detectors
Read this https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html to get a better idea.
Cheers
I have done a similar project (detection + localization) on Indian Currencies using PyTorch and ResNet34. Following is the link of my kaggle notebook, hope you find it helpful. I have manually collected images from the internet and made bounding box around them and saved their annotation file (Pascal VOC) using "LabelImg" annotation tool.
https://www.kaggle.com/shweta2407/objectdetection-on-custom-dataset-resnet34

Can we use Yolo to detect and recognize text in a image

Currently I am using a deep learing model which is called "Yolov2" for object detection, and I want to use it to extract text and use save it in disk, but i don't know how to do that, if anyone know more about that, please advice me
I use Tensorflow
Thanks
If you use the pretrained model, you would need to save those outputs and input the images into a character recognition network, if using neural net, or another approach.
What you are doing is "scene text recognition". You can check out the Reading Text in the Wild with Convolutional Neural Networks paper, here's a demo and homepage. Github user chongyangtao has a whole list of resources on the topic.
I have a similar question and I am making a digit detection model with svhn dataset. It is not a finished project yet, but it seems to work well. You can see the code at Yolo-digit-detector.