How can I normalize my loss function in relation to the number of elements in a class in Tensorflow?

In an image segmentation problem, we usually get a background label that dominates the mask, with the object(s) occupying only a small area of the mask.
As such, during training, I have observed that my neural network does very well at classifying the background label but very poorly at classifying the objects.
I am using Tensorflow's tf.nn.sparse_softmax_cross_entropy_with_logits as my loss function. Possibly related to this problem, my loss value is also not decreasing/converging.
I was told that I should consider adding a weighting factor to my loss function so that the dominant class gets a small weight compared to the non-dominant classes. Can anyone share some insights?
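One common way to do this (a minimal sketch, not the only option; the shapes and the class_weights values below are made up for illustration) is to compute the per-pixel cross entropy, look up a weight for each pixel from its ground-truth class, and take a weighted mean:

```python
import tensorflow as tf

# Illustrative shapes: logits [batch, H, W, num_classes], labels [batch, H, W] (int32).
# The weights themselves are up to you (e.g. inverse class frequency); these are placeholders.
class_weights = tf.constant([0.1, 1.0, 1.0])  # small weight for the dominant background class

def weighted_sparse_softmax_loss(logits, labels):
    # Per-pixel cross entropy, shape [batch, H, W].
    per_pixel_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # Weight of each pixel, taken from its ground-truth class.
    pixel_weights = tf.gather(class_weights, labels)
    # Weighted mean; dividing by the weight sum keeps the loss scale comparable.
    return tf.reduce_sum(per_pixel_loss * pixel_weights) / tf.reduce_sum(pixel_weights)
```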

Related

Class agnostic Mask R-CNN

How can I use the implementation of https://github.com/fizyr/keras-maskrcnn to perform class-agnostic segmentation?
Indeed, the original Mask R-CNN paper mentions that class-agnostic segmentation gives nearly the same performance as class-specific masks.
However, I am not sure how the authors actually perform this class-agnostic segmentation and then classify each segment. In fact, I do not see how one can label class-agnostic masks, since the very large number of predicted masks has to be filtered by thresholding on the class probability (in short, we need a label probability to threshold masks that are label-agnostic...).
The reason why I need to go class-agnostic is that I have to deal with classes that are potentially unseen in the training set.
Therefore, I would like to get the features that lead to this segmentation, so as to cut off the classification part and be able to deal with new classes by analysing the features that lead to the segmentation.
Thanks!

Using Tensorflow Object Detection API: RPN losses keep increasing. Are there ways to make RPN losses decrease?

I am using Tensorflow Object Detection API for fine-tuning, using my own data. The goal is to detect 2 classes of objects. I am using the pre-trained faster_rcnn_resnet101_coco model.
The various detection box precision and recall measures are generally increasing (see screenshots below) and are fairly high.
The box classifier losses are decreasing. HOWEVER, the RPN losses are increasing (see screenshots below). It looks like the model is having a hard time distinguishing foregrounds from backgrounds (hence the increasing RPN losses), but once the model is able to identify and locate the right foreground, it classifies it well (hence the decreasing box classifier losses)? I think this can be observed in the model's performance on test images: the false positive rate (on images that do not contain any of the two classes of target objects) is rather high. On the other hand, on images that do contain those target objects, the model does a fantastic job of accurately identifying and locating them.
So my question is essentially: what are some things I could try to help make the RPN losses decrease as well?

Is Capsule Network really rotationally invariant in practice?

Capsule networks are said to perform well under rotation, but is that true in practice?
I trained a Capsule Network with (train-dataset) and got a train accuracy of ~100%.
I tested the network with (test-dataset-original) and got a test accuracy of ~99%.
I rotated (test-dataset-original) by 0.5 degrees to get (test-dataset-rotate0p5) and by 1 degree to get (test-dataset-rotate1), and got a test accuracy of just ~10%.
I used the network from this repo as a seed: https://github.com/naturomics/CapsNet-Tensorflow
10% accuracy is not acceptable at all on rotated test data; perhaps something isn't implemented correctly.
We implemented CapsNet on some non-English digit datasets (similar to MNIST) and the results were unbelievably good.
The implemented model was invariant not only to rotation but also to other transforms such as pan, zoom, perspective, etc.
The first layer of a capsule network is a normal convolution. The filters here are not rotation invariant; the primary capsule layer only applies a pose matrix to the output feature maps.
I think this is why you also need to show the CapsNet rotated images, although far fewer than for normal convnets.
Capsule networks encapsulate vectors or 4x4 matrices in a neural network. However, matrices can be used for many things, rotations being just one of them. There's no way the network can know that you want to use the encapsulated representation for rotations, unless you specifically show it rotated examples so it can learn to use this representation for rotations.
Capsule Networks came into existence to solve the problem of viewpoint variance in convolutional neural networks (CNNs). CapsNet is said to be viewpoint invariant, which includes rotational and translational invariance.
CNNs gain translational invariance by using max-pooling, but that results in information loss in the receptive field. As the network goes deeper, the receptive field also increases gradually, and hence max-pooling in deeper layers causes more information loss. This results in loss of spatial information: only local information is learned by the network, and CNNs fail to learn the bigger picture of the input.
The weights Wij (between the primary and secondary capsule layers) are learned through backpropagation to represent an affine transformation of the entity encoded by the i-th primary capsule, producing a prediction vector u_j|i = Wij * u_i. So basically Wij is responsible for learning rotational transformations for a given entity.
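As a concrete illustration of the Wij transform described above (a NumPy sketch using the dimensions from the original CapsNet MNIST architecture; routing and squashing are omitted, and the random values stand in for learned weights):

```python
import numpy as np

# Illustrative sizes: 1152 primary capsules of dim 8 predicting 10 digit capsules of dim 16.
num_primary, dim_primary = 1152, 8
num_digit, dim_digit = 10, 16

u = np.random.randn(num_primary, dim_primary)                         # primary capsule outputs u_i
W = np.random.randn(num_primary, num_digit, dim_digit, dim_primary)   # learned weights Wij

# Prediction vectors u_j|i = Wij * u_i, shape [num_primary, num_digit, dim_digit].
u_hat = np.einsum('ijkl,il->ijk', W, u)
```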

Do I need every class in a training image for object detection?

I am just trying to dive into TensorFlow's Object Detection. So far I have a very small training set of about 40 images. Each image can contain up to 3 classes. Now a question came to mind: does every training image need every class? Is that important for efficient training? Or is it okay if an image only contains one of the object classes?
I am getting a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore the absolute loss value; it doesn't mean much on its own. Look at the curve to see that the loss is decreasing, and stop training when the curve flattens out. Compare the loss value on a test dataset to check that the values are sufficiently similar (i.e. that you are not overfitting). You might also compare it to another training run of the exact same system (to check that training is stable, for example).

Fully convolutional neural network for semantic segmentation

I have perhaps a naive question, and sorry if this is not the appropriate channel to ask this kind of question. I have successfully implemented an FCNN for semantic segmentation, but it doesn't involve deconvolution or unpooling layers.
What I simply do is resize the ground truth image to the size of my final FCNN layer and then compute my loss. In this way, I obtain a smaller image as output, but one that is correctly segmented.
Is the process of deconvolution or unpooling needed at all?
I mean, resizing images in Python is quite easy, so why should one involve complicated techniques such as deconvolution or unpooling to do the same? Surely I am missing something.
What's the advantage of enlarging images using unpooling and performing deconvolution?
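For reference, a minimal sketch of the resize-the-labels approach described above (assuming TF 2.x-style APIs; shapes and names are illustrative). The labels should be resized with nearest-neighbour interpolation so that class ids are not blended:

```python
import tensorflow as tf

def downsampled_loss(logits, labels):
    # logits: [batch, h_out, w_out, num_classes] from the final FCN layer.
    # labels: [batch, H, W] full-resolution ground truth (integer class ids).
    small_labels = tf.image.resize(labels[..., tf.newaxis],
                                   size=tf.shape(logits)[1:3],
                                   method='nearest')          # keeps labels as valid class ids
    small_labels = tf.cast(small_labels[..., 0], tf.int32)
    per_pixel = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=small_labels, logits=logits)
    return tf.reduce_mean(per_pixel)
```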
The output of your network after the convolution steps is smaller than your original image: you probably don't want that; you want a semantic segmentation at the resolution of the image you give as input.
If you simply resize it to its original size, new pixels will be interpolated and will therefore lack precision. Deconvolution layers allow the network to learn this resizing (since their weights are learned during training, through backpropagation) and therefore to increase your segmentation precision.
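As an illustration of the learned-upsampling alternative (a Keras sketch; the input size, number of classes and kernel/stride values are only placeholders, chosen FCN-style so that an 8x-downsampled score map is brought back to the input resolution):

```python
import tensorflow as tf

num_classes = 21                                          # placeholder value
coarse = tf.keras.Input(shape=(32, 32, num_classes))      # coarse per-class scores, 1/8 resolution

# Transposed convolution: the upsampling kernel is learned by backpropagation,
# unlike a fixed bilinear or nearest-neighbour resize.
upsampled = tf.keras.layers.Conv2DTranspose(
    filters=num_classes,
    kernel_size=16,
    strides=8,
    padding='same',
    use_bias=False)(coarse)

model = tf.keras.Model(coarse, upsampled)                 # output shape: (batch, 256, 256, num_classes)
```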