Can we ignore unnecessary classes in the TensorFlow Object Detection API by simply omitting their labels from the .pbtxt label map file?

So I am trying to create custom datasets for object detection using the TensorFlow Object Detection API. When working with open-source datasets, the annotation files I have come across are PASCAL VOC XMLs or JSONs. These contain a list of labels for each class, for example:
<annotation>
  <folder>open_images_volume</folder>
  <filename>0d2471ff2d033ccd.jpg</filename>
  <path>/mnt/open_images_volume/0d2471ff2d033ccd.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>Chair</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>8</xmin>
      <ymin>670</ymin>
      <xmax>409</xmax>
      <ymax>1020</ymax>
    </bndbox>
  </object>
  <object>
    <name>Table</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>232</xmin>
      <ymin>654</ymin>
      <xmax>555</xmax>
      <ymax>1020</ymax>
    </bndbox>
  </object>
</annotation>
Here the annotation file describes two classes, Table and Chair. I am only interested in detecting chairs, which is why the label map .pbtxt file I have generated is simply:
item {
  id: 1
  display_name: "Chair"
}
My question is: will the model train only on the annotations of the class "Chair", because that is what I have defined via label_map.pbtxt, or do I need to go through all the annotation files and remove the extra bounding box coordinates (with regex or an XML parser such as xml.etree) to make sure the additional bounding boxes do not interfere with training? In other words, is it possible to select only specific classes for training through the TF Object Detection API even if the annotation files contain additional classes, or is it necessary to clean up the entire dataset and manually remove the unnecessary class labels? Will it affect training in any way?

You can use a .pbtxt that contains only the classes you need to train on, and you don't have to change the XMLs.
Also, make sure to set num_classes in your pipeline config to your number of classes.
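In practice the extra classes get dropped at TFRecord-creation time: depending on the conversion script you use, you may need to skip objects whose class is not in the label map explicitly. A minimal sketch of that filtering step, with illustrative stand-in values taken from the XML above (the dict layout and variable names are assumptions, not part of the API):

# Stand-ins for values parsed from the .xml annotation above.
width, height = 1024, 1024
annotation = {'object': [
    {'name': 'Chair', 'bndbox': {'xmin': 8, 'ymin': 670, 'xmax': 409, 'ymax': 1020}},
    {'name': 'Table', 'bndbox': {'xmin': 232, 'ymin': 654, 'xmax': 555, 'ymax': 1020}},
]}
label_map_dict = {'Chair': 1}   # built from label_map.pbtxt

xmins, xmaxs, ymins, ymaxs, classes_text, classes = [], [], [], [], [], []
for obj in annotation['object']:
    name = obj['name']
    if name not in label_map_dict:   # 'Table' is skipped here
        continue
    xmins.append(float(obj['bndbox']['xmin']) / width)
    xmaxs.append(float(obj['bndbox']['xmax']) / width)
    ymins.append(float(obj['bndbox']['ymin']) / height)
    ymaxs.append(float(obj['bndbox']['ymax']) / height)
    classes_text.append(name.encode('utf8'))
    classes.append(label_map_dict[name])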

Related

Conversion of image annotation formats into tfrecords for tensorflow object detection api

Seeking help regarding image annotation formats for object detection API.
Background:
As we know, there are two common annotation formats for images: Pascal VOC and COCO. Both have their own specification; here is the main difference between the two (a small box-format conversion sketch follows the two lists below):
Pascal VOC:
Stores annotations in .xml file format.
Bounding box format: [x-top-left, y-top-left, x-bottom-right, y-bottom-right].
Creates a separate .xml annotation file for each image in the dataset.
COCO:
Stores annotations in .json file format.
Bounding box format: [x-top-left, y-top-left, width, height].
Creates one annotation file each for the training, testing and validation splits.
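To make the bounding-box difference concrete, here is a small illustrative sketch (the function name is mine, not from any library) that converts a COCO-style box into Pascal-VOC-style corners:

def coco_box_to_voc(x, y, box_width, box_height):
    # COCO stores [x-top-left, y-top-left, width, height];
    # Pascal VOC stores [xmin, ymin, xmax, ymax].
    return x, y, x + box_width, y + box_height

print(coco_box_to_voc(232, 654, 323, 366))  # -> (232, 654, 555, 1020)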
Current issue:
I have two datasets to deal with, and this is how they are annotated.
Dataset-1:
File format: Pascal VOC (.xml)
Bounding box format: COCO
File creation: As in Pascal VOC (a separate .xml annotation file for each image in the dataset)
Dataset-2:
File format: Pascal VOC (.xml)
Bounding box format: COCO
File creation: As in COCO (one annotation file each for training, testing and validation)
The thing I am not able to get past is which format (Pascal VOC or COCO) I should follow to convert my annotations into TFRecords (.xml to .record), since as you can see the annotations of my datasets don't belong purely to either format.
For instance, in this link the author wrote a script to convert .xml into .record files, but it deals with the pure Pascal VOC format.
And in this link they are dealing with the pure COCO annotation format.
Which path should I follow as I am standing in the middle of both formats?
Use the Pascal VOC format for the conversion of .xml into .record files.
Make the following changes in the create_tf_example function of this link:
for index, row in group.TextLine.iterrows():
    xmin.append(row['X'] / imgwidth)
    xmax.append((row['X'] + row['Width']) / imgwidth)
    ymin.append(row['Y'] / imgheight)
    ymax.append((row['Y'] + row['Height']) / imgheight)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
Use these changes in the case where your .xml annotations contain X, Y, Width and Height instead of xmin, ymin, xmax and ymax.

Background images in one class object detection

When training a single-class object detector in TensorFlow, I am trying to pass instances of images where no signal object exists, so that the model doesn't learn that every image contains at least one instance of that class. E.g. if my signal were cats, I'd want to pass pictures of other animals/landscapes as background - this could also reduce false positives.
I can see that a class id (0) is reserved in the Object Detection API for background, but I am unsure how to encode this in the TFRecords for my background images - the class could be 0, but what would the bounding box coordinates be? Or do I need a simpler classifier on top of this model to detect whether there is a signal in the image or not, prior to detecting its position?
The latter approach of a simple classifier makes sense; I don't think there is a way to do the first part. Apart from checking whether the object is present, you can also put a check on the confidence score.
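For the confidence-score check, a minimal sketch of filtering detections by score, assuming the standard output_dict keys produced by the Object Detection API inference tutorial (the threshold and the stand-in values are only there to make the snippet self-contained):

import numpy as np

SCORE_THRESHOLD = 0.5  # illustrative threshold

# `output_dict` is assumed to come from the API's inference helper;
# the values below are placeholders.
output_dict = {
    'detection_scores': np.array([0.92, 0.31]),
    'detection_boxes': np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6]]),
    'detection_classes': np.array([1, 1]),
}

keep = output_dict['detection_scores'] >= SCORE_THRESHOLD
boxes = output_dict['detection_boxes'][keep]
classes = output_dict['detection_classes'][keep]

if boxes.size == 0:
    print('No confident detections - treat the image as background.')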
It is good practice to create a dataset that also contains images with no objects of interest. For this you can use the same tools (like labelImg) that you used for adding the boxes; an image with no bounding boxes will have an .xml file with no bounding-box details, only details of the image. The create-tf-record script will then create the TFRecord from the .xml files. Look at the links below for more information:
Create tf record example -
https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py
Using your own dataset-
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md
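As a sketch of what a background (no-object) example can look like in the TFRecord, the box and class lists are simply left empty. The feature keys follow the using_your_own_dataset.md example linked above; the image values here are placeholders you would fill from your own loading code:

import tensorflow as tf
from object_detection.utils import dataset_util

# Placeholder values; in practice these come from reading the background image.
height, width = 1024, 1024
filename = 'background_0001.jpg'
encoded_jpg = b''  # the raw JPEG bytes of the background image

tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename.encode('utf8')),
    'image/source_id': dataset_util.bytes_feature(filename.encode('utf8')),
    'image/encoded': dataset_util.bytes_feature(encoded_jpg),
    'image/format': dataset_util.bytes_feature(b'jpeg'),
    # Empty lists: this image contains no objects of interest.
    'image/object/bbox/xmin': dataset_util.float_list_feature([]),
    'image/object/bbox/xmax': dataset_util.float_list_feature([]),
    'image/object/bbox/ymin': dataset_util.float_list_feature([]),
    'image/object/bbox/ymax': dataset_util.float_list_feature([]),
    'image/object/class/text': dataset_util.bytes_list_feature([]),
    'image/object/class/label': dataset_util.int64_list_feature([]),
}))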

Unable to get readable class labels for Inception-ResNet-v2 model

I am using the pretrained Inception-ResNet-v2 model to classify images and I need human-readable class labels for it. I found a mapping on the following site: https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a.
However, when I try to validate these labels against the images, I find that they don't map to the correct classes. One such instance: when I classify a "Panda" image, the label it matches is "barracouta, snoek" with score 0.927924, while "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca" gets score 0.001053.
Please point me to a source with the correct mapping of class labels to human-readable text for this model.
The human-readable labels at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a do work, but initialize the class-label list with an "unused background" entry before loading them, because the Inception-ResNet-v2 model is trained on 1001 classes while this list has 1000.
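A minimal sketch of that offset fix, assuming the gist's 1000 labels have already been loaded into a Python list (the placeholder values below are only there to make the snippet self-contained):

import numpy as np

# Placeholders: `imagenet_labels` would be the 1000 human-readable strings from
# the gist, and `predictions` the (1, 1001) softmax output of Inception-ResNet-v2.
imagenet_labels = ['label_%d' % i for i in range(1000)]
predictions = np.random.rand(1, 1001)

# Prepend a background entry so model output i+1 maps to gist label i.
labels = ['unused background'] + imagenet_labels

top_idx = int(predictions[0].argmax())
print('Predicted:', labels[top_idx], 'score:', float(predictions[0][top_idx]))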

tensorflow retrain model file

I'm getting started with TensorFlow and am using retrain.py to teach it some new categories - this works well - however I have some questions:
In the comments of retrain.py it says:
"This produces a new model file that can be loaded and run by any TensorFlow
program, for example the label_image sample code"
However, I haven't found where this new model file is saved to?
Also: it does contain the whole model, right? Not just the retrained part?
Thanks for clearing this up.
1) I think you may want to save the new model.
When you want to save a model after some processing, you can use
saver.save(sess, 'directory/model-name', global_step=step)  # the global_step argument is optional
Check out https://www.tensorflow.org/api_docs/python/tf/train/Saver
If you vary model-name by epoch or any other measure you would like to use, you can save a new model each time (otherwise it may overwrite previously saved models).
You can find the saved model by searching for the 'checkpoint', '.index' and '.meta' files.
2) Saving the whole model or just part of it?
This is where you need to learn a bit about tf.Session and savers. You can save either the whole model or just part of it; it's up to you. Again, start from the link above. The idea is that you put the variables you would like to save in a list passed as var_list (as described in the link), and the saver will save only those. When you restore them later, you also need to specify which variables in your current model correspond to the loaded variables.
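A minimal TF1-style sketch of both options; the variable scope, checkpoint paths and step counter below are illustrative, not taken from retrain.py itself:

import os
import tensorflow as tf

os.makedirs('checkpoints', exist_ok=True)

# An illustrative variable living under a named scope.
with tf.variable_scope('final_training_ops'):
    weights = tf.get_variable('weights', shape=[10, 5])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 100

    # Option 1: save every variable in the graph (the whole model).
    saver_all = tf.train.Saver()
    saver_all.save(sess, 'checkpoints/whole-model', global_step=step)

    # Option 2: save only a subset of variables via var_list.
    retrained_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                       scope='final_training_ops')
    saver_part = tf.train.Saver(var_list=retrained_vars)
    saver_part.save(sess, 'checkpoints/retrained-part', global_step=step)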
While running retrain.py you can pass the --output_graph and --output_labels parameters, which specify where to save the graph (the default is /tmp/output_graph.pb) and the labels. You can change those as per your requirements.
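For example, an invocation might look like this (the paths are illustrative):

python retrain.py --image_dir ~/my_training_images --output_graph /tmp/my_graph.pb --output_labels /tmp/my_labels.txt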

Comparing the structures of two graphs

Is there a way in TensorFlow to find out whether two graphs have the same structure?
I am designing an abstract class whose individual instances are expected to represent different architectures. I have provided an abc.abstractmethod get() which defines the graph. However, I also want to be able to load a pre-trained graph from disk, and I want to check whether the pre-trained graph has the same definition as the one described by the get() method of a concrete class.
How may I achieve this structural comparison?
You can get the graph definition of the current graph as str(tf.get_default_graph().as_graph_def()) and compare it for exact equality against your previous result.
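For example, a minimal sketch of that exact-equality check; build_graph() is just an illustrative stand-in for whatever the get() method defines:

import tensorflow as tf

def build_graph():
    # Stand-in for the architecture defined by get().
    x = tf.placeholder(tf.float32, [None, 4], name='x')
    w = tf.get_variable('w', shape=[4, 2], initializer=tf.zeros_initializer())
    tf.matmul(x, w, name='logits')

g1 = tf.Graph()
with g1.as_default():
    build_graph()

g2 = tf.Graph()
with g2.as_default():
    build_graph()

print(str(g1.as_graph_def()) == str(g2.as_graph_def()))  # True when the definitions match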
Also, the TensorFlow tests have a more advanced function, EqualGraphDef, which can tell that two graphs are equal even when the graph format has changed. I.e., if actual and expected are GraphDef proto objects, you could do:
from tensorflow.python import pywrap_tensorflow

diff = pywrap_tensorflow.EqualGraphDefWrapper(actual.SerializeToString(),
                                              expected.SerializeToString())
assert not diff