How to create a class for a non-classified object in TensorFlow?

Hi, I have built my CNN with two classes, dogs and cats. I have trained it, and I am now able to classify dog and cat images. But what if I want to introduce a class for a new, unclassified object? For example, if I feed my network a flower image, the network gives me a wrong classification. I want to build my network with a third class for new unclassified objects. But how can I build this third class? Which images should I use to get a class for new objects that are different from dogs or cats?
At the end of my network I use softmax, and my code is developed using TensorFlow. Could someone offer me some suggestions? Thanks

You need to add a third "something else" class to your network. There are several ways you can go about it. In general, if you have a class that you want to detect, you should have examples of that class, so you could add images without cats or dogs to your training data, labelled with the new class. However, this is a bit tricky, because the new class is, by definition, everything in the universe but dogs and cats, so you cannot possibly expect to have enough data to train for it. In practice, though, if you have enough examples, the network will probably learn that the third class is triggered whenever the first two are not.
Another option that I have used in the past is to model the "default" class slightly differently from the regular ones. So, instead of trying to actually learn what a "not cat or dog" image is, you can just explicitly say that it is whatever does not activate the cat or dog neurons. I did this by replacing the softmax in the last layer with sigmoids (so the loss becomes sigmoid cross-entropy instead of softmax cross-entropy, and the output is no longer a categorical probability distribution, but honestly it didn't make much difference performance-wise in my case), then expressing the "default" class as 1 minus the maximum activation value across every other class. So, if no class had an activation of 0.5 or greater (i.e. a 50% estimated probability of being that class), the "default" class would be the highest-scoring one. You can explore this and other similar schemes; a sketch follows.
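Here is a minimal TensorFlow sketch of that sigmoid-based "default class" scheme. The names (NUM_CLASSES, predict_with_default) and the 0.5 threshold are illustrative assumptions, not code from the answer above:

```python
import tensorflow as tf

NUM_CLASSES = 2  # known classes: cat, dog

def loss_fn(labels, logits):
    # labels: float multi-hot [batch, NUM_CLASSES]; logits: [batch, NUM_CLASSES].
    # Per-class sigmoid cross-entropy instead of softmax cross-entropy, so each
    # class is scored independently rather than as a probability distribution.
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

def predict_with_default(logits, threshold=0.5):
    probs = tf.sigmoid(logits)                # independent per-class scores
    max_prob = tf.reduce_max(probs, axis=-1)  # strongest known-class score
    best_class = tf.argmax(probs, axis=-1)
    # The "default" score is 1 - max activation: if no known class reaches the
    # threshold, the default class (index NUM_CLASSES) wins.
    default_idx = tf.fill(tf.shape(best_class),
                          tf.constant(NUM_CLASSES, tf.int64))
    return tf.where(max_prob < threshold, default_idx, best_class)
```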

You should just add images to your dataset that are neither dogs nor cats, label them as "Other", and treat "Other" as a normal class everywhere in your code. In particular, you'll get a softmax over 3 classes.
The images you use can be anything (except cats and dogs, of course), but they should be of the same kind as the ones you'll probably be testing against when using your network. So, for instance, if you know you'll be testing on images of dogs, cats, and other animals, train with other animals, not with pictures of flowers. If you don't know what you'll be testing with, try to get very varied images from different sources, etc., so that the network learns well that this class means "anything but cats and dogs" (the wide range of real-world images that fall into this category should be reflected in your training dataset).
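For concreteness, here is a minimal tf.keras sketch of the three-class setup described above; the architecture, input size, and class order (0=cat, 1=dog, 2=other) are illustrative assumptions:

```python
import tensorflow as tf

# Minimal illustrative CNN with a softmax over 3 classes: cat, dog, other.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # cat, dog, other
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels 0/1/2
              metrics=["accuracy"])
```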

Related

Is it possible to combine two different custom YOLOv4 models?

I'm working on an object detection project where I have to identify the type of animal and its posture given an image/video. For this purpose, I have two custom YOLOv4 models which were trained separately: Model 1 identifies the type of animal, and Model 2 identifies the posture of the animal. I have converted these models to TensorFlow models.
Now, since both models take the same image/video as input, I want to combine their outputs, and the final output should display the bounding boxes from both models.
I'm stuck at this point; I have been researching solutions and I'm confused by the various methods. Could anyone help me with this?
I don't think you need an object detection model as the pose identifier, because you've already localized the animal with the first net.
The easiest (and clearly not the most accurate) solution I see is to run a classifier on top of the detections (with the cropped bounding box as input). In that case the animal's anatomy is not taken into account explicitly, but I'd guess that approach is still a good baseline; a sketch is given below.
For further experiments you can take a look at these and these solutions for animal pose estimation, but they are more complex to use.
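As a rough illustration of that crop-and-classify baseline (not the original poster's code): the box format and posture_classifier are hypothetical stand-ins, assuming normalized box coordinates from the detector:

```python
import tensorflow as tf

# Baseline sketch: crop each detected box and feed it to a posture classifier.
# `boxes` are assumed normalized [ymin, xmin, ymax, xmax] from the animal
# detector; `posture_classifier` stands in for the second (posture) model.
def classify_postures(image, boxes, posture_classifier, crop_size=(224, 224)):
    h, w = image.shape[0], image.shape[1]
    postures = []
    for ymin, xmin, ymax, xmax in boxes:
        crop = image[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]
        crop = tf.image.resize(crop, crop_size)[tf.newaxis, ...]  # add batch dim
        postures.append(int(tf.argmax(posture_classifier(crop), axis=-1)[0]))
    return postures
```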

Object Detection without labels / annotation

Let's say I have 3 images (an apple, an orange, a banana) and another 1000 arbitrary images. What I want to do is to see if those 1000 arbitrary images contain object(s) similar to the former 3 images and, if so, draw bounding boxes to indicate those objects. However, none of these 1003 images or objects are labelled or annotated.
I have done some research on the internet and tried to find deep learning object detection approaches (e.g. Faster R-CNN, YOLOv3), but I couldn't work out how they relate to my task.
I have also noticed that there is a term called template matching, but it seems not much related to deep learning.
So my questions are:
Is there any good approach or deep learning model that could meet my needs?
Would I benefit from any pre-trained Faster R-CNN or YOLOv3 models? (e.g. if they were trained on car, people, dog, and cat image sets, would those learned features also apply to a new domain?)
"What I want to do is to see if those 1000 arbitrary images contain object(s) similar to the former 3 images"
What did you mean by "similar"?
If you meant "I want to see if the 1000 images contain objects from the target classes: orange, apple, and banana", then here's the answer:
If your models were pre-trained with your target classes (orange, apple, and banana), then you can use those pre-trained models to detect the objects in your 1003 images. You can just select orange, apple, and banana as the class names in the configuration.
If your pre-trained models weren't trained on your target classes and you only have your 1003 images, you will need to do what is called fine-tuning, which means retraining the last layer of the model (sketched below). 1003 images might not be enough to train the model, and you might need to perform data augmentation to expand your data. Also, consider making your classes balanced (meaning having the same number of objects per class).
For something close to a "similarity score," you can consider the confidence score for class x, which is the likelihood that the bounding box contains an object of class x. However, this confidence score mainly depends on how well trained the model is on class x. For example, different models may differ in their confidence scores for the same images. Also, the same model may give different confidence scores for the same object under different angles, lighting, and orientations. Thus, it might be a better idea to fine-tune the models anyway, so that they are more "robust" to any representation of your target classes.
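Here is a minimal tf.keras sketch of that fine-tuning recipe, freezing a pre-trained backbone and training a new 3-class head; the backbone choice, input size, and class set (orange, apple, banana) are assumptions for illustration:

```python
import tensorflow as tf

# Fine-tuning sketch: reuse frozen ImageNet features, train a new 3-class head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep pre-trained features fixed; train only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # orange, apple, banana
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# With only ~1003 images, augmentation helps; e.g. insert
# tf.keras.layers.RandomFlip("horizontal") before the base.
```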

Training different objects using the TensorFlow Object Detection API

I recently came across this link for learning TensorFlow object detection:
https://www.youtube.com/watch?v=Rgpfk6eYxJA&t=993s
However, I have a few doubts and want suggestions on how to proceed.
1) How should I train different objects using the same model? (I mean, what should my dataset contain if I want to train on cats and dogs as objects?)
2) Once I have trained it on dogs and then continue training on cars, will the model still detect dogs?
Your dataset should contain a large variety of examples for every object (class) you wish to detect. It sounds like you're misunderstanding the training process by assuming that you train the model on each class of objects in sequence; this is incorrect. When you train the model, you take a random batch of samples (maybe 64, for example) across all classes.
Training simultaneously on all or many of the classes makes sense: you have one model that has to perform equally well on all classes. So when you train the model, you compute the error of the parameters with respect to a random selection of classes and average the error to come up with each update step, yielding a model that performs well across classes.
Note that it's quite common to run into class imbalance issues. If you have only a few samples of cats and millions of samples of dogs, you will disproportionately penalize the network for misclassifying dogs as cats, and the network will simply always predict dog to hedge its bets. Ideally, you will have a roughly equal balance of data per class; if not, there are books and tutorials galore on strategies to deal with this (one common one is sketched below).
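One of the simplest of those strategies is inverse-frequency class weighting; here is a minimal tf.keras sketch with purely illustrative sample counts:

```python
# Class-imbalance sketch: weight each class inversely to its frequency so the
# loss penalizes rare-class mistakes as heavily as common-class ones.
counts = {"cat": 500, "dog": 1_000_000}        # illustrative sample counts
total = sum(counts.values())
class_weight = {i: total / (len(counts) * n)   # inverse-frequency weights
                for i, n in enumerate(counts.values())}

# tf.keras applies these weights to the loss during training:
# model.fit(train_dataset, epochs=10, class_weight=class_weight)
```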

Retrain TF object detection API to detect a specific car model -- How to prepare the training data?

I am new to object detection and am trying to retrain the object detection API in TensorFlow to detect a specific car model in photos. When preparing my own training data to retrain the model, besides things like drawing bounding boxes, etc., my question is: should I also prepare negative examples (cars that are not the model I am interested in) in the training data to reach good performance?
I have read through some tutorials, and they usually give an example of detecting one type of object, with training data labelled only for that type. I was thinking that, since the model first proposes some regions of interest and then tries to classify those regions, I should perhaps also prepare negative examples if I want to detect very specific things in photos.
I am retraining a faster_rcnn-based model. Thanks for the help.
Yes, you will also need negative examples for better performance. It seems like you are thinking about using transfer learning to train a pre-trained faster_rcnn model to add a new class for your custom car. You should start with an equal number of positive and negative examples (images with labelled bounding boxes). You will need examples of several negative classes (e.g. negative car type 1, negative car type 2, negative car type 3) in addition to your target car type; a balanced-sampling sketch follows below.
You can look at an example of training data with one positive class and several negative classes for transfer learning in the data folder of my GitHub repo: PSV Detector Github
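As a rough illustration of keeping positives and negatives balanced when feeding training (not code from the linked repo, and assuming a recent TF version): the TFRecord file names are hypothetical:

```python
import tensorflow as tf

# Balance sketch: sample positives and negatives 50/50 into the training stream.
positives = tf.data.TFRecordDataset("target_car.tfrecord")     # hypothetical
negatives = tf.data.TFRecordDataset("negative_cars.tfrecord")  # hypothetical
balanced = tf.data.Dataset.sample_from_datasets(
    [positives.repeat(), negatives.repeat()], weights=[0.5, 0.5])
```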

How to make a model of 10000 unique items using TensorFlow? Will it scale?

I have a use case where I have around 100 images each of 10000 unique items. At test time I have 10 items, all of which are from the 10000-item set; I know which 10 items they are, but only at the time of testing on live data. I then have to match the 10 items with their names. What would be an efficient way to recognise these items? I have full control of the training environment background and the testing environment background. If I make one model of all 10000 items, will it scale? Or should I make 10000 different models and run the 10 items through the 10 models I have pretrained?
Your question is about something called "one-vs-all classification"; you can do a Google search for that, and the first hit is a video lecture by Andrew Ng that's almost certainly worth watching.
The question has been long studied and in a plethora of contexts. The answer to your question does very much depend on what model you use. But I'll assume that, if you're doing image classification, you are using convolutional neural networks, because, after all, they're state of the art for most such image classification tasks.
In the context of convolutional networks, there is something called "multi-task learning" that you should read up on. Boiled down to a single sentence, the concept is that the more you ask the network to learn, the better it is at the individual tasks. So, in this case, you're almost certain to perform better training one model on 10,000 classes than training 10,000 models, each performing a one-vs-all classification scheme.
Take for example the 1,000-class Imagenet dataset and CIFAR-10's 10-class dataset. It has been demonstrated in numerous papers that first training against Imagenet's 1,000-class dataset, and then simply replacing the last layer with a 10-class output and re-training on CIFAR-10's dataset, will produce a better result than training on CIFAR-10's dataset alone. There are admittedly multiple reasons for this result (Imagenet is a larger dataset, for one), but the richness of class labels in the Imagenet dataset, i.e. multi-task learning, is certainly among them.
So that was a long-winded way of saying: use one model with 10,000 classes.
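For illustration, here is a minimal tf.keras sketch of the last-layer-replacement recipe on CIFAR-10 mentioned above; the backbone and hyperparameters are assumptions, not from the answer:

```python
import tensorflow as tf

# Head-swap sketch: pre-trained ImageNet features, new 10-class output layer,
# then re-train on CIFAR-10.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(32, 32, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # replaced last layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train / 255.0, y_train, epochs=5)  # re-train on CIFAR-10
```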
An aside:
If you want to get really, really interesting, and jump into the realm of research-level thinking, you might consider a one-hot vector of 10,000 classes rather sparse and start thinking about whether you could reduce the dimensionality of your output layer using an embedding. An embedding would be a dense vector, let's say of size 100 as a good starting point. Now class labels turn into clusters of points in your 100-dimensional space. I bet your network will perform even better under these conditions.
If this little aside didn't make sense, it's completely safe to ignore it; your 10,000-class output is fine. But if it did pique your interest, look up information on Word2Vec, and read this really nice post on how face recognition is achieved using embeddings: https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78. You might also consider using an autoencoder to generate an embedding for the images (though I favor triplet embeddings, as typically used in face recognition, myself).
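To make the aside concrete, here is a minimal embedding-head sketch that classifies by nearest class centroid instead of a 10,000-way softmax; the backbone, the 100-d size, and the precomputed centroid matrix are illustrative assumptions:

```python
import tensorflow as tf

# Embedding sketch: map each image to a dense, L2-normalized 100-d vector.
backbone = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(100),  # 100-d embedding
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=-1)),
])

def nearest_item(embeddings, centroids):
    # centroids: [10000, 100] mean embedding per item, precomputed once.
    sims = tf.matmul(embeddings, centroids, transpose_b=True)  # cosine similarity
    return tf.argmax(sims, axis=-1)  # index of the best-matching item
```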