When to use YOLO vs vanilla CNN? [closed] - tensorflow

I want to build a computer vision model that can identify an object in an image. For example, identify the (x, y, width, height) pixel coordinates of the bounding box of somebody's hand. I know of complex object detection algorithms like YOLO and R-CNN, but I am curious why I couldn't just create a vanilla conv net with an output layer of 4 neurons (one for each coordinate value) with linear activation functions.
For clarity, I am not trying to identify multiple objects in the image; I'm assuming that only one hand is present in each image.
Any help would be appreciated!

You can certainly do it; there's no math stopping you. YOLO is designed for multiple objects, after all (a minimal sketch of the single-box approach is shown after these notes).
Some thoughts, though:
Your model will always guess some box, even when there's no hand in the image.
If you do use YOLO, you gain the benefit of starting from a pre-trained network, which makes it robust, or at least more robust, when using the model in new environments.
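For reference, here is a minimal sketch of that vanilla approach in Keras. The input size, layer depths, and MSE loss are assumptions for illustration, not a tuned design:

# Minimal sketch of a "vanilla" conv net that regresses one bounding box.
# Assumes exactly one hand per image and box coordinates normalized to [0, 1].
from tensorflow.keras import layers, models

def build_hand_regressor(input_shape=(224, 224, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        # 4 linear outputs: (x, y, width, height)
        layers.Dense(4, activation="linear"),
    ])
    # Plain regression loss on the box coordinates
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_hand_regressor()
model.summary()

Because the last layer is pure regression, this network always outputs a box, which is exactly the caveat mentioned above.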

Related

predict the position of an image in another image [closed]

If one image is part of another image, how can I compute its exact location using deep learning?
Currently I can compute this by extracting and matching key points using OpenCV, but I would like to solve it with neural networks.
Any ideas on how to design the network and the loss function?
Thanks very much.
This is a detection problem. The simplest approach is to create a network with two heads, one for classification and the other for the bounding box (regression).
You feed your network the image and the respective labels, sum the losses, and do a backward pass. Train for some epochs and you'll get yourself a detection model that you can use to detect what you need. It's just a simple approach, though, and it can get much more complex.
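A hedged sketch of that two-head idea with the Keras functional API; the backbone, layer sizes, and loss weights here are assumptions:

# Two heads on one backbone: classification ("is the object there?") and
# bounding-box regression ("where is it?"). Keras sums the per-head losses.
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

cls_out = layers.Dense(1, activation="sigmoid", name="cls")(x)   # present / absent
box_out = layers.Dense(4, activation="linear", name="box")(x)    # (x, y, w, h)

model = Model(inputs, [cls_out, box_out])
model.compile(optimizer="adam",
              loss={"cls": "binary_crossentropy", "box": "mse"},
              loss_weights={"cls": 1.0, "box": 1.0})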
You may as well skip this and use an existing detection architecture, or better, a framework that simplifies your life considerably.
For TensorFlow I believe you can use the Object Detection API, and for PyTorch you can use Detectron, Detectron2, or mmdetection, among others.

Semantic Segmentation with a dominant class [closed]

I am training a semantic segmentation model with 3 classes (including the background).
The background is the dominant class, and the problem is that the model predicts every pixel as background.
I am currently using the cross-entropy loss function.
What are the solutions for this situation?
This is a typical case of strong class imbalance in image segmentation; below are a couple of solutions to tackle the problem.
1. Use the Jaccard (IoU) loss or the Dice loss. Rather than optimizing for accuracy, you optimize for the overlap between prediction and ground truth, and these losses have been shown to work much better than cross-entropy on imbalanced problems.
2. You can try class weights (sample weights in Keras/TF) to assign greater importance to classes 2 and 3, which are not background.
3. Focal loss has shown improvements in detection and other deep learning tasks where the dataset is strongly imbalanced. It can be combined with a loss from (1) and has the potential to improve your results further.
You should expect the biggest performance improvement from employing (1) alone; a sketch of a Dice loss is given below.
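One common formulation of the Dice loss for point (1), assuming y_true is one-hot and y_pred is a softmax output of shape (batch, height, width, classes):

import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Overlap and size per class, summed over the spatial dimensions
    y_true = tf.cast(y_true, y_pred.dtype)
    intersection = tf.reduce_sum(y_true * y_pred, axis=[1, 2])
    union = tf.reduce_sum(y_true, axis=[1, 2]) + tf.reduce_sum(y_pred, axis=[1, 2])
    dice = (2.0 * intersection + smooth) / (union + smooth)
    # Average the per-class Dice scores and turn the score into a loss
    return 1.0 - tf.reduce_mean(dice)

# model.compile(optimizer="adam", loss=dice_loss)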

Machine learning - Train medical image [closed]

I am trying to create a deep neural network based classifier for chest X-rays to check whether TB is present or not. I read that transfer learning can be used for this with the Inception v3 model. My question is: the Inception model is created by training on ImageNet (physical objects), right? How can it be used for training on medical images?
One intuition is that physical objects and medical images do share some similarities, especially in low-level features such as edges, curves, and small object regions.
Experiments indicate that pretraining a network on ImageNet can benefit most computer vision tasks, even if the images from the target domain look very different from what is in ImageNet.
To achieve the best performance, you can take a network pretrained on ImageNet and fine-tune the last layer, or all layers with small learning rates, on your dataset; a sketch is shown below.
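A hedged sketch of that fine-tuning recipe in Keras; the head size, learning rates, and binary TB/normal setup are assumptions:

from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, Model, optimizers

# ImageNet-pretrained backbone, without its original classification head
base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # first stage: train only the new head

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # TB present / absent

model = Model(base.input, out)
model.compile(optimizer=optimizers.Adam(1e-3), loss="binary_crossentropy",
              metrics=["accuracy"])

# Second stage (optional): unfreeze everything and continue with a small learning rate
# base.trainable = True
# model.compile(optimizer=optimizers.Adam(1e-5), loss="binary_crossentropy")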

How can we calculate the area covered by trees in a Google Earth image, or even just a ratio of trees to other things? [closed]

Is there an efficient way to calculate the area covered by trees in a Google Earth image using machine learning? We can retrain on our data using TensorFlow and an Inception-pretrained model to identify whether there is a tree or not, but I can't think of any way to find out how many trees there are or how much area they cover. Is there anything we can do?
I use Python and TensorFlow for machine learning.
P.S.: I don't know much about machine learning, but I can follow steps.
In computer vision there are different ways of finding objects in images:
image classification will tell you if an image is something (e.g. this image is a cat)
object detection will tell you where something is in an image (e.g. it will draw a box around the cat)
image segmentation will try to extract the exact contour of something in an image (e.g. the precise contour of the cat, not just a box containing it)
You need a neural network capable of doing the second or third task with aerial images of trees.
Then simply sum all the trees' areas and compare the result with the image size; a sketch of that last step is given below.
Here you can find a TensorFlow framework for doing object detection: https://github.com/tensorflow/models/tree/master/research/object_detection.
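Once you have per-pixel predictions (segmentation) or boxes (detection), the coverage ratio is simple arithmetic; a hypothetical sketch, not tied to any specific model:

import numpy as np

def tree_cover_ratio(mask: np.ndarray) -> float:
    # mask: boolean or 0/1 array of shape (H, W), nonzero where a tree was predicted
    return np.count_nonzero(mask) / mask.size

def boxes_cover_ratio(boxes, image_h, image_w):
    # boxes: iterable of (x_min, y_min, x_max, y_max) in pixels;
    # note that summing box areas over-counts overlapping trees
    area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    return area / (image_h * image_w)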

Any way to manually make a variable more important in a machine learning model? [closed]

Sometimes you know, by experience or by some expert knowledge, that a variable will play a key role in the model. Is there a way to manually make that variable count more, so that training can speed up and the method can incorporate some human knowledge/wisdom/intelligence?
I still think machine learning combined with human knowledge is the strongest weapon we have now.
This might work by scaling your input data accordingly.
On the other hand, the strength of a neural network is to figure out from the data which features are in fact important, and which combinations with other features are important.
You might argue that you'll decrease training time. Somebody else might argue that you're biasing your training in such a way that it might even take more time.
Anyway, if you wanted to do this, assuming a fully connected layer, you could initialize the weights of the input feature you found important with larger values.
Another way could be to first pretrain the model on a training loss that has your feature as an output, then keep the weights and switch to the actual loss. I have never tried this, but it could work. A sketch of the first two ideas is shown below.
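A hypothetical sketch of scaling the input column and of enlarging its initial weights in the first fully connected layer; the feature index and factor are made up for illustration:

import numpy as np
from tensorflow.keras import layers, models

IMPORTANT_COL = 3    # index of the feature you believe is important (assumption)
FACTOR = 5.0

def emphasize_feature(X: np.ndarray) -> np.ndarray:
    # Option (a): rescale the input column so it numerically dominates the others
    X = X.copy()
    X[:, IMPORTANT_COL] *= FACTOR
    return X

model = models.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(32, activation="relu", name="dense_in"),
    layers.Dense(1),
])

# Option (b): enlarge the initial weights leaving the important input
w, b = model.get_layer("dense_in").get_weights()
w[IMPORTANT_COL, :] *= FACTOR
model.get_layer("dense_in").set_weights([w, b])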