Training different objects using tensorflow Object detection API

I recently came across this link for learning tensorflow object detection
https://www.youtube.com/watch?v=Rgpfk6eYxJA&t=993s
However, I have a few doubts and would like suggestions on how to proceed.
1) How should I train different objects using the same model? (I mean, what should my dataset contain if I want to train on cats and dogs as objects?)
2) Once I have trained it on dogs and then continue training on cars, will the model still detect dogs?

Your dataset should contain a large variety of examples for every object (class) you wish to detect. It sounds like you're misunderstanding the training process: you're assuming that you train the model on each class of objects in sequence, and that is incorrect. When you train the model, each step uses a random batch of samples (maybe 64, for example) drawn across all classes.
Training simultaneously on all or many of the classes makes sense: you have one model that has to perform equally well on all classes. At each update step you compute the error (loss) on a random batch drawn from many classes, average it, and take the gradient with respect to the parameters. This yields a model that performs well across classes.
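A minimal sketch of what "a random batch across all classes" means, assuming the training set is simply a list of (image_id, label) pairs; the file names and the 100/100 split are hypothetical. The whole pool is shuffled before batching, so every batch mixes cats and dogs instead of the model seeing one class after the other.

```python
import random

def shuffled_batches(samples, batch_size, seed=0):
    """Yield batches drawn from one shuffled pool that contains every class.

    `samples` is a list of (image_id, label) pairs; the ids and labels used
    below are hypothetical placeholders for real annotated images.
    """
    pool = list(samples)
    random.Random(seed).shuffle(pool)  # mix classes before batching
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]

# Hypothetical dataset: 100 cat images and 100 dog images.
dataset = [(f"cat_{i}.jpg", "cat") for i in range(100)] + \
          [(f"dog_{i}.jpg", "dog") for i in range(100)]

for batch in shuffled_batches(dataset, batch_size=64):
    labels = [label for _, label in batch]
    print(len(batch), "cats:", labels.count("cat"), "dogs:", labels.count("dog"))
```

In a real training loop the loss for each such batch would be averaged over its samples before the update step; the shuffling is what keeps the classes mixed.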
Note that it's quite common to run into class imbalance issues. If you have only a few samples of cats and millions of samples of dogs, errors on dogs will dominate the loss, so you disproportionately penalize the network for misclassifying dogs as cats, and the network will simply always predict dog to hedge its bets. Ideally you will have a roughly equal balance of data per class; if not, there are books and tutorials galore on strategies to deal with this.
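One common strategy is to weight each class in the loss by its inverse frequency. A small sketch, using the same heuristic scikit-learn calls class_weight="balanced" (n_samples / (n_classes * count_per_class)); the 900/100 split is a hypothetical example. With Keras, a dict like this could be passed to model.fit(class_weight=...) once the class names are mapped to integer indices.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples).

    Rare classes get weights > 1 and common classes < 1, so each class
    contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical imbalanced dataset: 900 dogs, 100 cats.
labels = ["dog"] * 900 + ["cat"] * 100
print(inverse_frequency_weights(labels))  # dog ≈ 0.56, cat = 5.0
```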

Related

Is it possible to train a CNN on a dataset and test it on another dataset with different classes?

I am new to deep learning, and I am doing research using CNNs. I need to train a CNN model on a dataset of images (landmark images) and test the same model on a different dataset (also landmark images). One of the motivations is to see how well the model generalizes. But the problem is that, since the training and test datasets are not the same, the classes are not the same, and possibly neither is the number of classes. This means the predictions made on the test dataset are not trustworthy, since the weights of the output layer were computed for the classes of the training dataset. Is there any way to evaluate a model on a different dataset without affecting test accuracy?
The performance of a neural network on one dataset will not generally be the same as its performance on another. Images in one dataset can be more difficult to distinguish than those in another. As a rule of thumb: if your landmark datasets are similar, it's likely that performance will be similar. However, this is not always the case: subtle differences between the datasets can result in significantly different performance.
You can account for the potentially different performance on the two datasets by training another network on the other dataset. This will give you a baseline of what to expect when you try to generalize your network to it.
You can apply your neural network trained for one set of classes to another set of classes. There are two main approaches to this:
Transfer learning. The last layer of your trained network is replaced with one or more new layers that are trained, on their own, to classify the new images. (Use for many classes; can also be used for few classes.)
All-Transfer learning. Rather than replacing the last layer, add a new layer after it and train only the final layers. (Use for few classes.)
Both approaches are much quicker than training a neural network from scratch.
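A minimal numpy sketch of the first approach: the "pretrained" layers are frozen and only a new softmax output layer is trained for the new classes. Assumptions: the frozen weights and the three-class "new dataset" are random stand-ins so the example runs on its own; in practice the frozen part would be your network trained on the first dataset, and a framework such as Keras would handle the training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the layers of a network already trained on the first dataset.
# Assumption: in practice these weights come from that training run; here they
# are random so the sketch is self-contained. They are never updated below.
W_FROZEN = rng.normal(size=(20, 16)) / np.sqrt(20)

def frozen_features(x):
    """Output of the frozen penultimate layer: a fixed ReLU projection."""
    return np.maximum(0.0, x @ W_FROZEN)

def train_new_head(x, y, num_classes, lr=0.05, epochs=500):
    """Transfer learning: fit only a new softmax output layer."""
    feats = frozen_features(x)
    W = np.zeros((feats.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)  # gradient of mean cross-entropy
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(x, W, b):
    return np.argmax(frozen_features(x) @ W + b, axis=1)

# Hypothetical "new dataset": 3 classes the frozen layers never saw.
centers = rng.normal(scale=2.0, size=(3, 20))
y_new = rng.integers(0, 3, size=300)
x_new = centers[y_new] + rng.normal(scale=0.3, size=(300, 20))

W_head, b_head = train_new_head(x_new, y_new, num_classes=3)
print("train accuracy:", np.mean(predict(x_new, W_head, b_head) == y_new))
```

Only W and b of the new head receive gradient updates, which is why this is much cheaper than training the whole network from scratch.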
I assume that you are facing a classification problem.
What exactly do you mean? Do you have classes A, B and C in your training dataset and the same classes in your test dataset with different labels, or do you have completely different classes in your test dataset?
You can solve the first problem by creating a mapping from training labels to test labels or vice versa.
The second depends on what you are trying to achieve. If you want the model to predict classes it was never trained on, you won't get any meaningful output.
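For the first case (same classes, different labels), the mapping can be as simple as a dictionary applied to the predictions before scoring. A sketch with hypothetical label names:

```python
# Hypothetical label names: the same three landmarks, labeled differently
# in the training and test datasets.
train_to_test = {"eiffel_tower": "tour_eiffel", "big_ben": "elizabeth_tower",
                 "colosseum": "colosseo"}

def accuracy_with_mapping(train_space_preds, test_labels, mapping):
    """Translate predictions into the test set's label names, then score."""
    mapped = [mapping[p] for p in train_space_preds]
    correct = sum(m == t for m, t in zip(mapped, test_labels))
    return correct / len(test_labels)

preds = ["eiffel_tower", "big_ben", "colosseum", "big_ben"]
truth = ["tour_eiffel", "elizabeth_tower", "colosseo", "colosseo"]
print(accuracy_with_mapping(preds, truth, train_to_test))  # 0.75
```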

Training keras with tensorflow: Redundancy in labelling the object or multiple labels on same object

I was training Keras with TensorFlow for person detection. After training, testing showed that many images contain redundant labels: a single person in an image was labeled as a person multiple times. What is the actual reason behind this?
My training set contains nearly 2000 images, a single class person, batch=32, epoch=100, threshold=0.55 and testing images=250.
Overtraining on the samples may lead to redundant detections. Also, if you train for detecting people and provide samples of humans from many different angles, the detector may make errors on real images. If neither of these is the issue, then non-maximum suppression (NMS) will be the better option.
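Non-maximum suppression keeps the highest-scoring box and discards any other box that overlaps it too much, which removes duplicate boxes on the same person. TensorFlow provides tf.image.non_max_suppression for this; the plain-Python sketch below only illustrates the greedy idea, with boxes as (x1, y1, x2, y2) and hypothetical detections.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Hypothetical detections: three boxes on one person, one on another person.
boxes = [(10, 10, 60, 110), (12, 8, 62, 112), (8, 12, 58, 108), (200, 20, 250, 120)]
scores = [0.90, 0.85, 0.60, 0.80]
print(non_max_suppression(boxes, scores))  # [0, 3]
```

The IoU threshold is a tuning knob: raising it keeps more overlapping boxes, lowering it merges more aggressively.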

Retrain TF object detection API to detect a specific car model -- How to prepare the training data?

I am new to object detection and am trying to retrain a model with the TensorFlow Object Detection API to detect a specific car model in photos. When preparing my own training data, besides things like drawing bounding boxes, should I also prepare negative examples (cars that are not the model I am interested in) to get good performance?
I have read through some tutorials; they usually give an example of detecting one type of object, and they prepare training data labeled only for that type. Since the model first proposes regions of interest and then tries to classify those regions, I was wondering whether I should also prepare negative examples if I want to detect something very specific in photos.
I am retraining a faster_rcnn-based model. Thanks for the help.
Yes, you will also need negative examples for better performance. It seems you are thinking about using transfer learning to train a pre-trained faster_rcnn model with a new class for your custom car. You should start with an equal number of positive and negative examples (images with labelled bounding boxes). You will need examples of several negative classes (e.g. negative car type 1, negative car type 2, negative car type 3) in addition to your target car type.
You can look at example training data with one positive class and several negative classes for transfer learning in the data folder of my GitHub repo: PSV Detector Github
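The TensorFlow Object Detection API reads class names from a label map (.pbtxt) file, where ids start at 1 because 0 is reserved for the background. A small sketch that writes one for the target car plus several negative car types; the class names and the output file name are hypothetical.

```python
def make_label_map(class_names):
    """Build label_map.pbtxt text; ids start at 1 (0 is the background)."""
    entries = []
    for idx, name in enumerate(class_names, start=1):
        entries.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(entries)

# Hypothetical classes: the positive target plus several negative car types.
classes = ["target_car_model", "negative_car_type_1",
           "negative_car_type_2", "negative_car_type_3"]
label_map_text = make_label_map(classes)
print(label_map_text)

# Writing it to disk (the file name is an arbitrary choice):
# with open("label_map.pbtxt", "w") as f:
#     f.write(label_map_text)
```

Each negative car type gets its own class id, so the detector learns to tell them apart from the target rather than lumping everything into "car".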

How to make a model of 10000 Unique items using tensorflow? Will it scale?

I have a use case where I have around 100 images each of 10,000 unique items. At test time on live data I will have 10 items, all from the 10,000 set, and I will know which 10 items they are, but only at that point. I then have to match the 10 items with their names. What would be an efficient way to recognise these items? I have full control of both the training and the testing environment backgrounds. If I make one model of all 10,000 items, will it scale? Or should I make 10,000 different models and run the 10 items on the 10 models I have pretrained?
Your question is about something called "one-vs-all classification". If you do a Google search for that, the first hit is a video lecture by Andrew Ng that's almost certainly worth watching.
The question has been long studied and in a plethora of contexts. The answer to your question does very much depend on what model you use. But I'll assume that, if you're doing image classification, you are using convolutional neural networks, because, after all, they're state of the art for most such image classification tasks.
In the context of convolutional networks, there is something called "multi-task learning" that you should read up on. Boiled down to a single sentence, the idea is that the more you ask the network to learn, the better it gets at the individual tasks. So, in this case, you're almost certain to do better training one model on 10,000 classes than training 10,000 models, each performing one-vs-all classification.
Take, for example, the 1,000-class ImageNet dataset and the 10-class CIFAR-10 dataset. It has been demonstrated in numerous papers that first training on ImageNet's 1,000 classes, then simply replacing the last layer with a 10-class output and re-training on CIFAR-10, produces a better result than training on CIFAR-10 alone. There are admittedly multiple reasons for this result; for one, ImageNet is a larger dataset. But the richness of class labels in ImageNet (multi-task learning) is certainly among them.
So that was a long winded way of saying, use one model with 10,000 classes.
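To make the "will it scale?" part concrete, a back-of-the-envelope parameter count comparing one shared model with a 10,000-way output layer against 10,000 separate binary models. The layer sizes are assumptions (roughly a ResNet-50-style backbone with a 2048-dimensional feature vector), not numbers from the question.

```python
# Assumptions (hypothetical sizes, roughly ResNet-50-like):
BACKBONE_PARAMS = 23_500_000   # shared convolutional feature extractor
FEATURE_DIM = 2048             # length of the feature vector fed to the head
NUM_ITEMS = 10_000

def single_model_params(num_classes):
    """One backbone plus one softmax layer with `num_classes` outputs."""
    return BACKBONE_PARAMS + FEATURE_DIM * num_classes + num_classes

def one_vs_all_params(num_classes):
    """A separate backbone plus a single-output (binary) head per class."""
    return num_classes * (BACKBONE_PARAMS + FEATURE_DIM + 1)

print(f"one model:     {single_model_params(NUM_ITEMS):,}")  # 43,990,000
print(f"10,000 models: {one_vs_all_params(NUM_ITEMS):,}")    # 235,020,490,000
```

Under these assumptions, growing the output layer to 10,000 classes adds about 20 million weights to a single model, while separate models repeat the whole backbone 10,000 times.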
An aside:
If you want to get really, really interesting, and jump into the realm of research level thinking, you might consider a 1-hot vector of 10,000 classes rather sparse and start thinking about whether you could reduce the dimensionality of your output layer using an embedding. An embedding would be a dense vector, let's say size 100 as a good starting point. Now class labels turn into clusters of points in your 100 dimensional space. I bet your network will perform even better under these conditions.
If this little aside didn't make sense, it's completely safe to ignore it; your 10,000-class output is fine. But if it did pique your interest, look up information on Word2Vec, and read this really nice post on how face recognition is achieved using embeddings: https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78. You might also consider using an autoencoder to generate an embedding for the images (though I favor triplet embeddings, as typically used in face recognition, myself).
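A sketch of the two pieces the aside refers to: a FaceNet-style triplet loss, which pulls an embedding toward a same-class example and away from a different-class example by a margin, and classification by the nearest class centroid in embedding space (one simple way to use "clusters of points"; swapped in here for illustration). The embedding network itself is left out, and the arrays are hypothetical 2-dimensional embeddings rather than the ~100 dimensions suggested above.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on batches of embeddings (one embedding per row).

    Penalizes cases where an anchor is not closer to its positive
    (same class) than to its negative (other class) by at least `margin`.
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0))

def nearest_centroid(query, centroids):
    """Assign each query embedding to the closest class centroid."""
    dists = np.linalg.norm(query[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Hypothetical embeddings.
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
n = np.array([[1.0, 0.0]])
print(triplet_loss(a, p, n))  # 0.0: the triplet already satisfies the margin

centroids = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])  # one per class
queries = np.array([[0.3, -0.2], [4.8, 5.1], [-4.0, 6.0]])
print(nearest_centroid(queries, centroids))  # [0 1 2]
```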

One class classification - interpreting the models accuracy

I am using LIBSVM to classify data, mainly doing one-class classification.
My training set consists of data from only one class, and my test set consists of data from two classes (one that belongs to the target class and one that doesn't).
After running svmtrain and svmpredict on both the training and test sets, the accuracy is 48% on the training set and 34.72% on the test set.
Is it good? How can I know whether LIBSVM is classifying the datasets correctly?
Whether it is good or not depends entirely on the data you are trying to classify. You should look up the state-of-the-art accuracy of SVM models for your kind of classification problem; then you will be able to judge whether your model is good.
What I can say from your results is that the test accuracy is worse than the training accuracy, which is normal, as a classifier usually performs better on data it has already seen.
What you can try now is to tune the regularization parameter (C for a standard C-SVC; a one-class SVM in LIBSVM is controlled by nu instead, set with -n) and see whether performance on the test set improves.
You can also plot learning curves to see whether your classifier overfits, which will help you decide whether to increase or decrease the regularization.
In your case, you might also want to apply class weights, as data like this is often heavily skewed toward negative examples.
To know whether LIBSVM is classifying the dataset correctly, you can look at which examples it predicted correctly and which it predicted incorrectly. Then you can try changing your features to improve the results.
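A quick way to see which examples are right and wrong is a per-class breakdown of the predictions. The sketch assumes LIBSVM's one-class convention of +1 for the target class and -1 for outliers, and uses hypothetical label lists; in practice y_pred would come from svmpredict. A single accuracy number can hide whether the errors come from rejecting target examples or from accepting outliers.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, target=1):
    """Count TP/FN/FP/TN, treating `target` as the in-class label."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == target:
            counts["TP" if p == target else "FN"] += 1
        else:
            counts["FP" if p == target else "TN"] += 1
    return counts

def summarize(y_true, y_pred, target=1):
    """Overall accuracy plus the recall of each class separately."""
    c = confusion_counts(y_true, y_pred, target)
    n_target, n_outlier = c["TP"] + c["FN"], c["TN"] + c["FP"]
    return {"accuracy": (c["TP"] + c["TN"]) / len(y_true),
            "target_recall": c["TP"] / n_target if n_target else 0.0,
            "outlier_recall": c["TN"] / n_outlier if n_outlier else 0.0,
            **c}

# Hypothetical labels: +1 = target class, -1 = outlier.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, -1, -1, 1, -1, 1, -1, -1, 1, -1]
print(summarize(y_true, y_pred))
```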
If you are worried about your code being correct, you can try to code a toy example and play with it or use an example of someone on the web and replicate their results.