Using a different feature extractor with the SSD meta-architecture in TensorFlow's Object Detection API - tensorflow

Is it possible to use a different feature extractor with the SSD meta-architecture in TensorFlow's Object Detection API? I know that .config files for MobileNet and Inception are provided, but is it possible to use a different architecture like AlexNet or VGG?

It's possible, but it takes a little bit of work, as explained here; you should read this page for a detailed explanation and links to examples.
In short, you'll need to create a custom FasterRCNNFeatureExtractor class corresponding to VGG or AlexNet (this may require some knowledge of these architectures, for instance the amount of subsampling involved). In this class, you'll define how your data should be preprocessed, how to retrieve the 1st and 2nd stage features (typically, what the last convolutional layer is called), and how to load the pretrained weights.
Then you'll need to register your feature extractor (tell the Object Detection API that it exists) by modifying the file object_detection/builders/model_builder.py.
Finally you should be able to make a config file with your custom feature extractor, et voilà !
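For concreteness, here is a rough skeleton of what such a class could look like. This is a hedged sketch rather than working code: the class name is made up, the method names follow the existing extractors in the TF1.x Object Detection API, and the VGG body itself is left as a stub.

import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch

class FasterRCNNVGGFeatureExtractor(
        faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):

    def preprocess(self, resized_inputs):
        # VGG-style preprocessing: subtract the per-channel ImageNet means.
        channel_means = [123.68, 116.779, 103.939]
        return resized_inputs - [[channel_means]]

    def _extract_proposal_features(self, preprocessed_inputs, scope):
        # 1st stage features: everything up to the last conv layer
        # (conv5_3 for VGG16). Plug in the VGG conv body here.
        raise NotImplementedError

    def _extract_box_classifier_features(self, proposal_feature_maps, scope):
        # 2nd stage features: the layers applied to each cropped proposal
        # (for VGG, the fully connected head).
        raise NotImplementedError

Registering it then amounts to adding the new class to the dictionary of feature extractors in object_detection/builders/model_builder.py (FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP at the time of writing), keyed by the name you will use in the config file.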

Related

How do I use the Object Detection API to evaluate my own custom model? What do I write into the config files?

I have a custom object detection model that I can call with model = MyModel() and model.load_weights(checkpoint), and I want to evaluate it using the Object Detection API.
From what I understand there are two possibilities: either I use the legacy eval.py, but then I don't know what to put into the pipeline_config file;
or I use the newer version implemented in model_main_tf2.py, but there I would have to save my model as a model.config, and I don't know what to put in the pipeline file either.
Since my model is a YOLO model, it is not included in the sample configs yet.
https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2
Would really appreciate the help!
You can't calculate the mAP using the Object Detection API because there's no pipeline.config file for YOLO.
However, you can check this repo out. It's a Tensorflow-based implementation of YOLOv3. They have working code for calculating mAP, which you can adapt to your model.
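If you do end up rolling your own evaluation, the core of a mAP computation is small enough to sketch. The following is a minimal, hedged example in plain NumPy (single class, single image, PASCAL VOC style 11-point interpolation), not the repo's actual code:

import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thresh=0.5):
    # preds: list of (score, box); gts: list of ground-truth boxes.
    preds = sorted(preds, key=lambda p: -p[0])
    matched, hits = set(), []
    for _, box in preds:
        # Greedily match each prediction to the best unmatched ground truth.
        best, best_i = 0.0, -1
        for i, gt in enumerate(gts):
            o = iou(box, gt)
            if o > best and i not in matched:
                best, best_i = o, i
        if best >= iou_thresh:
            matched.add(best_i)
            hits.append(1)
        else:
            hits.append(0)
    tp = np.cumsum(hits)
    fp = np.cumsum([1 - h for h in hits])
    recall = tp / max(len(gts), 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    # Mean of interpolated precision at 11 evenly spaced recall levels.
    return float(np.mean([precision[recall >= r].max()
                          if (recall >= r).any() else 0.0
                          for r in np.linspace(0, 1, 11)]))

mAP is then just the mean of average_precision over all classes.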

How to create custom Tensorflow Graph from Binary Encoding of NN

I would like to create a custom Tensorflow topology, specifically one built from a binary encoding of a neural network. I have attached a picture of what I mean by 'binary encoding of a neural network' below.
Binary Encoding of ANN (Source: Yao99)
Unfortunately, I am only familiar with using Tensorflow through complete layers, mostly via the Keras API, and I don't know how to create a whole topology in a custom way from scratch.
I don't require a complete solution, but I would highly appreciate links to tutorials on how to create such custom topologies from scratch. The final translation from the binary encoding to the custom graph creation I can do myself. Unfortunately, I am unable to find resources for such custom topologies online. Thank you for your help!
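As a starting point, here is a hedged sketch of one way to turn a binary connectivity matrix into a computation graph with low-level TF2 ops. The matrix, node count, and sigmoid activation are toy assumptions for illustration, not the exact scheme from Yao99:

import numpy as np
import tensorflow as tf

# Toy encoding: node i feeds node j whenever conn[i, j] == 1.
# Nodes 0 and 1 are inputs; node 4 is the output.
conn = np.array([[0, 0, 1, 0, 1],
                 [0, 0, 1, 1, 0],
                 [0, 0, 0, 1, 1],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0]])
n_nodes = conn.shape[0]

# One trainable scalar weight per connection in the encoding.
weights = {(i, j): tf.Variable(tf.random.normal([], stddev=0.1))
           for i in range(n_nodes) for j in range(n_nodes) if conn[i, j]}

def forward(inputs):
    # Process nodes in index order, which is topological for this encoding.
    acts = {0: inputs[0], 1: inputs[1]}
    for j in range(2, n_nodes):
        incoming = [acts[i] * weights[(i, j)]
                    for i in range(j) if conn[i, j]]
        acts[j] = tf.nn.sigmoid(tf.add_n(incoming))
    return acts[n_nodes - 1]

out = forward([tf.constant(1.0), tf.constant(0.5)])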

Customize Input to Tensorflow Hub module

I know how to load a pre-trained image model from Tensorflow Hub, like so:
import tensorflow_hub as hub

# load the model
image_module = hub.Module('https://tfhub.dev/google/imagenet/mobilenet_v2_035_128/feature_vector/2')
# get feature vectors
features = image_module(batch_images)
I also know how to customize the output of this model (fine-tune it on a new dataset). The existing modules expect the input batch_images to be an RGB image tensor.
My question: instead of the input being an RGB image of certain dimensions, I would like to use a tensor (dim 20x20x128, from a different model) as input to the Hub model. This means I need to bypass the initial layers of the tf-hub model definition (I don't need them). Is this possible with the tf-hub module APIs? The documentation is not clear on this aspect.
P.S.: I can do this easily by defining my own layers, but I am trying to see if I can use the TF-Hub APIs.
The existing https://tfhub.dev/google/imagenet/... modules do not support this.
Generally speaking, the hub.Module format allows multiple signatures (that is, combinations of input/output tensors; think feeds and fetches as in tf.Session.run()). So module publishers can arrange for that if there is a common usage pattern they want to support.
But for free-form experimentation at this level of sophistication, you are probably better off directly using and tweaking the code that defines the models, such as TF Slim (for TF1.x) or Keras Applications (for TF2). Both provide ImageNet-pretrained checkpoints for downloading and restoring on the side.
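To illustrate the Keras Applications route, here is a hedged sketch that re-applies the tail of VGG16 to a new input standing in for the 20x20x128 tensor. The cut point is an assumption (block3_conv1 is the first VGG16 layer whose kernels expect 128 input channels), and this sequential re-application only works because VGG16's topology is linear; architectures with branches need the functional API:

import tensorflow as tf

# Load a pretrained backbone; only its later layers will be used.
base = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
cut_index = [l.name for l in base.layers].index('block3_conv1')

# New input standing in for the 20x20x128 tensor from the other model.
inputs = tf.keras.Input(shape=(20, 20, 128))
x = inputs
for layer in base.layers[cut_index:]:
    x = layer(x)  # reuses the pretrained weights on the new input

tail = tf.keras.Model(inputs, x)
features = tail(tf.random.normal([1, 20, 20, 128]))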

Tensorflow Stored Learning

I haven't tried Tensorflow yet, but I am still curious: how does it store the acquired learning of a machine learning program for later use, and in what form, data type, and file type?
For example, Tensorflow was used to sort cucumbers in Japan. The computer took a long time to learn, from the example images given, what good cucumbers look like. In what form was the learning saved for future use?
Because I think it would be inefficient if the program had to re-learn the images every time it needs to sort cucumbers.
Ultimately, a high-level way to think about a machine learning model is as three components - the code for the model, the data for that model, and the metadata needed to make the model run.
In Tensorflow, the code for this model is written in Python and is saved in what is known as a GraphDef. This uses a serialization format created at Google called Protobuf. Other libraries commonly use formats such as Python's native Pickle.
The main reason you write this code is to "learn" from some training data - which is ultimately a large set of matrices full of numbers. These are the "weights" of the model, and they too are stored using Protobuf, although other formats like HDF5 exist.
Tensorflow also stores metadata associated with the model - for instance, what the input should look like (eg: an image? some text?) and the output (eg: a class of image, i.e. cucumber 1 or 2? with scores, or without?). This too is stored in Protobuf.
At prediction time, your code loads the graph, the weights, and the metadata, and takes some input data to produce an output.
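In current TF2/Keras terms, the whole save-and-reuse cycle looks something like this hedged sketch (the toy model and file name are made up for illustration):

import tensorflow as tf

# Train once, save the learned weights to disk, reload later - no re-learning.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
# model.fit(x_train, y_train)  # the slow "learning" step, done once

model.save('cucumber_sorter.h5')  # architecture + weights + metadata (HDF5)
restored = tf.keras.models.load_model('cucumber_sorter.h5')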
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
Here are some resources that discuss the library and tensor flow in general: some tutorials, some background on the field, and the github page.
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than to any particular application. Your question is still too vague for this website, but I'll try to point you toward a few resources you might find interesting.
Image recognition in tensorflow typically acts on an ANN (Artificial Neural Network). What this means is that the tensorflow library handles the number crunching for the neural network, which I'm sure you can read all about with a quick google search.
The point is that tensorflow isn't a form of machine learning itself; it serves more as a useful number-crunching library, similar to something like numpy in python, for large-scale deep learning.

Object detection using CNTK

I am very new to CNTK.
I wanted to train a set of images (to detect objects like alcohol glasses/bottles) using CNTK - ResNet/Fast R-CNN.
I am trying to follow the documentation from GitHub below; however, it does not appear to be a straightforward procedure. https://github.com/Microsoft/CNTK/wiki/Object-Detection-using-Fast-R-CNN
I cannot find proper documentation on generating ROIs for images with different sizes and shapes, or on how to create object labels based on the trained models. Can someone point me to proper documentation or a training link with which I can work on the CNTK model? Please see the attached image, in which I was able to load a sample image with default ROIs in the script. How do I properly set the size and label the object in the image? Thanks in advance!
sample image loaded for training
Not sure what you mean by proper documentation; this is an implementation of the paper (https://arxiv.org/pdf/1504.08083.pdf). It looks like you are trying to generate ROIs. Have a look at the helper functions documented at that site for what you might need:
To run the toy example, make sure that in PARAMETERS.py the datasetName is set to "grocery".
Run A1_GenerateInputROIs.py to generate the input ROIs for training and testing.
Run A2_RunCntk_py3.py to train a Fast R-CNN model using the CNTK Python API and compute test results.
The algorithm works on several candidate regions and then generates two outputs: one for the classes of objects and another for the bounding boxes of the objects belonging to those classes. Please refer to the code for the details of the implementation.
Can someone point me to proper documentation or a training link with which I can work on the CNTK model?
You can take a look at my repository on GitHub.
It will guide you through all the steps required to train your own model for object detection and classification with CNTK.
But in short, the proper steps look something like this (a minimal evaluation sketch follows the list):
Setup environment
Prepare data
Tag images (ground truth)
Download pretrained model and create mappings for your custom dataset
Run training
Evaluate the model on test set
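For the last step, once training has produced a model file, a generic CNTK evaluation call is short. This is a hedged sketch with a made-up file name and input shape; a Fast R-CNN model actually takes both an image and ROIs as inputs, in which case every entry of model.arguments must be fed:

import cntk
import numpy as np

# Load the trained model and evaluate it on one dummy CHW image.
model = cntk.load_model('fast_rcnn.model')
image = np.random.rand(3, 400, 400).astype(np.float32)
output = model.eval({model.arguments[0]: [image]})
print(output)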