Object Detection without labels / annotation - tensorflow

Let's say I have 3 images (an apple, an orange, a banana) and another 1000 arbitrary images. What I want to do is check whether those 1000 arbitrary images contain objects similar to the former 3 images and, if so, draw bounding boxes around those objects. However, none of these 1003 images or objects are labelled or have any annotations.
I have done some research on the internet and tried to find deep learning object detection approaches (e.g. Faster R-CNN, YOLOv3), but I couldn't work out how they relate to my task.
I have also noticed a technique called template matching, but it does not seem closely related to deep learning.
So my question is:
Is there any good approach or deep learning model that could meet my needs?
Will I benefit from any pre-trained Faster R-CNN or YOLOv3 models? (e.g. if they were trained on cars, people, dogs, and cats, can those learned features also transfer to a new domain?)

"What I want to do is to see if those 1000 arbitrary images contain object(s) similar to the former 3 images"
What did you mean by "similar"?
If you meant "I want to see if the 1000 images contain objects from the target classes: orange, apple, and banana", then here's the answer:
If your models were pre-trained on your target classes (orange, apple, and banana), then you can use those pre-trained models to detect the objects in your 1003 images. You can simply select orange, apple, and banana as the class names in the configuration.
If your pre-trained models weren't trained on your target classes and you only have your 1003 images, you will need to do what is called fine-tuning: retraining the last layer(s) of the model. 1003 images might not be enough to train the model, and you might need to perform data augmentation to expand your data. Also, consider balancing your classes (i.e. having roughly the same number of objects per class).
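If you go the TF Object Detection API route, fine-tuning is mostly driven by the pipeline config. An illustrative fragment (all paths and values below are placeholders, check them against the sample config shipped with your chosen model):

```
model {
  faster_rcnn {
    num_classes: 3   # apple, orange, banana
    ...
  }
}
train_config {
  fine_tune_checkpoint: "path/to/pretrained/model.ckpt"
  from_detection_checkpoint: true
  # one simple form of data augmentation
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
```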
For something close to a "similarity score," you can consider the confidence score for class x, which is the likelihood that a bounding box contains an object of class x. However, this confidence score mainly depends on how well trained the model is on class x. Different models may produce different confidence scores for the same image, and the same model may produce different confidence scores for the same object under different angles, lighting, and orientations. Thus, it might be better to fine-tune the models anyway so that they become more robust to varied appearances of your target classes.
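As a rough sketch of that idea, assuming you already have a detector's raw output as parallel lists of boxes, class labels, and scores (the function name, box format, and 0.5 threshold below are illustrative, not from any specific API):

```python
def filter_detections(boxes, labels, scores, target_classes, threshold=0.5):
    """Keep only detections of the target classes whose confidence
    score (a crude stand-in for 'similarity') passes the threshold."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if label in target_classes and score >= threshold:
            kept.append((box, label, score))
    return kept

# Toy detector output: one confident apple, one low-confidence dog.
boxes = [(10, 10, 50, 50), (60, 60, 90, 90)]
labels = ["apple", "dog"]
scores = [0.91, 0.30]
print(filter_detections(boxes, labels, scores, {"apple", "orange", "banana"}))
# -> [((10, 10, 50, 50), 'apple', 0.91)]
```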

Related

What neural network model is the most effective?

At the moment I am studying neural networks. I tried different models to recognize people and came across a very interesting question. I used YOLOv3 and Mask R-CNN, but all of them missed people in photos taken from an indirect angle. Which of the existing models is the most accurate and effective?
This is the main problem with deep learning models: for every instance of an object you want to detect, there should be at least one similar object (in angle, size, color, shape, etc.) in the training set. The more similar objects there are in the training data, the higher the probability that the object will be detected.
In terms of speed and accuracy, YOLOv3 is currently one of the best. Mask R-CNN is also one of the best models if you want the exact boundaries of the object (segmentation). If there is no need for exact boundaries, I would recommend YOLO for its efficiency. You can work on your training data by adding multiple instances of people with different sizes, angles, and shapes, and by including cases of truncation and occlusion (when only parts of a person are visible), to make the model generalize better.
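As a minimal sketch of one such augmentation, here is a horizontal flip that also mirrors the bounding box to match (pure Python; the image-as-list-of-rows representation and the (x_min, y_min, x_max, y_max) pixel box format are just illustrative assumptions):

```python
def hflip_with_box(image_rows, box, width):
    """Horizontally flip an image (a list of pixel rows) and mirror its
    bounding box (x_min, y_min, x_max, y_max) so it still fits the object."""
    flipped = [list(reversed(row)) for row in image_rows]
    x_min, y_min, x_max, y_max = box
    # After flipping, the left edge becomes width - old right edge.
    return flipped, (width - x_max, y_min, width - x_min, y_max)

# 2x4 toy "image" with a box covering the two left-most columns.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
flipped, new_box = hflip_with_box(img, (0, 0, 2, 2), width=4)
print(flipped)   # [[4, 3, 2, 1], [8, 7, 6, 5]]
print(new_box)   # (2, 0, 4, 2)
```

Real pipelines would do this with image tensors and normalized coordinates, but the box arithmetic is the same.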

Weakly labelled dataset for object detection

I have a few datasets that are well labeled for each class they represent. I'm trying to build an object detection model using the tensorflow/research/object_detection pipeline to detect each object.
However... each dataset is not labelled for the other classes. I'm concerned that mined examples will be labelled as the background class when they really represent a class in the other datasets.
For example, if I'm trying to make a fruit detector and I have a dataset labelled with apples, another labelled with oranges and another labelled with bananas - how would I go about weighting the classification loss so it ignores the apple predictions on the orange examples and vice versa?
If I understand your post correctly, each image can contain objects of multiple classes. If that's the case, just create a bounding box for each object of each class in the image. An image can have multiple bounding boxes, each labelled with its class name, and all of them can be used for training.
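On the loss-weighting part of the question, one common idea is to mask out the loss terms for classes that were never annotated in a given example's source dataset, so the model is not penalized for "apple" predictions on orange-only data. A hand-rolled sketch (pure Python, per-class probabilities and a hypothetical per-dataset mask; this is not how the tensorflow/research/object_detection internals are structured):

```python
import math

def masked_cross_entropy(probs, target_index, class_mask):
    """Cross-entropy computed only over classes that are actually
    labelled in this example's source dataset; unlabelled classes
    contribute nothing. `probs` are predicted class probabilities,
    `class_mask[i]` is 1 if class i is annotated in this dataset."""
    loss = 0.0
    for i, p in enumerate(probs):
        target = 1.0 if i == target_index else 0.0
        if class_mask[i]:  # skip classes this dataset says nothing about
            loss += -(target * math.log(p) + (1 - target) * math.log(1 - p))
    return loss

# Apple-only dataset: orange and banana predictions are ignored.
probs = [0.8, 0.6, 0.1]  # apple, orange, banana
loss = masked_cross_entropy(probs, target_index=0, class_mask=[1, 0, 0])
```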

Training different objects using tensorflow Object detection API

I recently came across this link for learning tensorflow object detection
https://www.youtube.com/watch?v=Rgpfk6eYxJA&t=993s
However I have few doubts and want suggestion on how to proceed.
1) How should I train different objects using the same model? (I mean, what should my dataset contain if I want to train on cats and dogs as objects?)
2) And once I have trained it on dogs and then continue training on cars, will the model still detect dogs?
Your dataset should contain a large variety of examples for every object (class) you wish to detect. It sounds like you're misunderstanding the training process by assuming that you train it on each class of objects in sequence; this is incorrect. When you train the model, you take a random batch of samples (64, for example) drawn across all classes.
Training simultaneously on all or many of the classes makes sense: you have one model that has to perform equally well on all classes. So when you train the model, you compute the error of the parameters with respect to a random batch of samples drawn across classes and average the error to produce each update step, yielding a model that performs well across classes.
Note that it's quite common to run into class imbalance issues. If you have only a few samples of cats and millions of samples of dogs, you will disproportionately penalize the network for misclassifying dogs as cats, and the network will simply always predict "dog" to hedge its bet. Ideally you will have a roughly equal balance of data per class; if not, there are books and tutorials galore on strategies for dealing with this.
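One common remedy, sketched in plain Python, is to weight each class's contribution to the loss inversely to its frequency (a rough heuristic, one of several possible strategies; resampling is another):

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rare
    classes contribute more to the loss per sample."""
    total = sum(counts.values())
    n = len(counts)
    return {cls: total / (n * c) for cls, c in counts.items()}

# Heavily imbalanced toy dataset: dogs outnumber cats 9 to 1.
weights = inverse_frequency_weights({"dog": 900, "cat": 100})
print(weights)  # cat gets weight 5.0, dog roughly 0.56
```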

Retrain TF object detection API to detect a specific car model -- How to prepare the training data?

I am new to object detection and trying to retrain the object-detection API in TensorFlow to detect a specific car model in photos. When preparing my own training data to retrain the model, besides things like drawing bounding boxes, my question is: should I also prepare negative examples (cars that are not the model I am interested in) to reach good performance?
I have read through some tutorials and they usually give examples of detecting one type of object, preparing training data with labels only for that type. I was thinking: since the model first proposes areas of interest and then tries to classify those areas, should I also prepare negative examples if I want to detect very specific objects in photos?
I am retraining a faster_rcnn based model. Thanks for the help.
Yes, you will need negative examples for better performance. It sounds like you are thinking about using transfer learning to train a pre-trained faster_rcnn model to add a new class for your custom car. You should start with an equal number of positive and negative examples (images with labelled bounding boxes). You will need examples of several negative classes (e.g. negative car type 1, negative car type 2, negative car type 3) in addition to your target car type.
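In the TF Object Detection API, the positive and negative classes would all appear in the label map. An illustrative label_map.pbtxt (all class names are placeholders):

```
item {
  id: 1
  name: 'target_car_model'
}
item {
  id: 2
  name: 'other_car_type_1'
}
item {
  id: 3
  name: 'other_car_type_2'
}
```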
You can look at examples of training data with one positive class and several negative classes for transfer learning in the data folder of my GitHub repo: PSV Detector Github

How to create a class for unclassified objects in tensorflow?

Hi, I have built my CNN with two classes, dogs and cats. I have trained it and am now able to classify dog and cat images. But what if I want to introduce a class for new, unclassified objects? For example, if I feed my network a flower image, it gives me a wrong classification. I want to build my network with a third class for new, unclassified objects. But how can I build this third class? Which images should I use to build a class for new objects that are neither dogs nor cats?
At the end of my network I use softmax, and my code is written using TensorFlow. Could someone give me some suggestions? Thanks
You need to add a third "something else" class to your network. There are several ways to go about it. In general, if there is a class you want to detect, you should have examples of that class, so you could add images without cats or dogs to your training data, labelled with the new class. However, this is a bit tricky, because the new class is, by definition, everything in the universe except dogs and cats, so you cannot possibly have enough data to cover it. In practice, though, if you have enough examples the network will probably learn that the third class is triggered whenever the first two are not.
Another option that I have used in the past is to model the "default" class slightly differently from the regular ones. Instead of trying to learn what a "not cat or dog" image looks like, you can explicitly define it as whatever does not activate the cat or dog neurons. I did this by replacing the softmax in the last layer with sigmoids (so the loss becomes sigmoid cross-entropy instead of softmax cross-entropy, and the output is no longer a categorical probability distribution; honestly, it didn't make much difference performance-wise in my case), then expressing the "default" class as 1 minus the maximum activation value over all other classes. So, if no class had an activation of 0.5 or greater (i.e. a 50% estimated probability of being that class), the "default" class would be the highest scoring one. You can explore this and other similar schemes.
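That scheme can be sketched in a few lines (plain Python; the logit values and the idea of a 0.5 cutoff are just illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scores_with_default(logits):
    """Per-class sigmoid activations plus a synthetic 'default' score,
    defined as 1 minus the highest regular-class activation."""
    acts = [sigmoid(z) for z in logits]
    return acts + [1.0 - max(acts)]

# Neither 'cat' nor 'dog' fires strongly, so 'default' wins.
cat, dog, default = scores_with_default([-2.0, -1.5])
print(cat, dog, default)
```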
You should just add images to your dataset that are neither dogs nor cats, label them as "Other", and treat "Other" as a normal class in all your code. In particular, you'll get a softmax over 3 classes.
The images you use can be anything (except cats and dogs, of course), but should be of the same kind as the ones you'll probably test against when using your network. For instance, if you know you'll be testing on images of dogs, cats, and other animals, train with other animals, not with pictures of flowers. If you don't know what you'll be testing on, try to gather very varied images from different sources, so that the network learns that this class means "anything but cats and dogs" (the wide range of real-world images that fall into this category should be reflected in your training dataset).
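For completeness, the 3-way softmax described above, with "Other" treated as an ordinary class (plain Python sketch; the logit values are made up for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Logits for [cat, dog, other]: a flower-like input scores highest on 'other'.
probs = softmax([0.2, 0.1, 2.0])
print(probs)
```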