How to recover the spatial probability distribution of a GPS location from its accuracy?

A location coordinate determined by GPS is not perfectly accurate; it follows a probability distribution. An Android location comes with an accuracy property, and the Android developer documentation for Location.getAccuracy() says:
We define horizontal accuracy as the radius of 68% confidence.
Theoretically, what kind of probability distribution does this GPS location follow? I have read some non-authoritative articles that say it follows a normal distribution. And why is the number 68% chosen?
Suppose we know the form of this distribution. Given the location's coordinate and accuracy, how do we recover its probability density function? Namely, how do we determine the unknown parameters of that density?
Moreover, satellite data can also be obtained. Can it help refine the distribution even more?
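For what it's worth, here is a minimal sketch of the usual working assumption rather than anything the Android documentation states: model the horizontal error as a symmetric bivariate normal centred on the reported coordinate. The 68% figure then echoes the one-standard-deviation convention of a 1-D normal, and because the radial error of such a Gaussian is Rayleigh-distributed, the 68% radius pins down the single unknown parameter sigma.

import math

def sigma_from_accuracy(accuracy_m, confidence=0.68):
    # Assumption: horizontal error ~ symmetric bivariate normal N(0, sigma^2 * I),
    # so the radial error r is Rayleigh-distributed:
    #   P(r <= R) = 1 - exp(-R^2 / (2 * sigma^2))
    # Solve P(r <= accuracy_m) = confidence for sigma.
    return accuracy_m / math.sqrt(-2.0 * math.log(1.0 - confidence))

def horizontal_error_pdf(dx_m, dy_m, accuracy_m):
    # Density of the true position at an offset (dx, dy) in metres
    # from the reported coordinate, under the same Gaussian assumption.
    sigma = sigma_from_accuracy(accuracy_m)
    r2 = dx_m ** 2 + dy_m ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

print(sigma_from_accuracy(10.0))  # accuracy of 10 m -> sigma of roughly 6.6 m

Whether a real receiver's reported accuracy truly corresponds to this one-parameter model varies by device and fix type, so treat it as an approximation rather than ground truth.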

Related

Is it possible to do probability calibration in TFX?

I am interested in calibrating a binary probabilistic classifier in TFX. I was about to try doing it in standard Python externally to TFX, but then I found this piecewise linear calibration layer.
The description is a bit cryptic to me. Is this layer the sort of thing one could stack onto the output layer of a TFX model and calibrate the output using recent y_true and y_pred values?
If not, is there a standard way to do calibration in TFX?
Calibration of the data should be done prior to the data transformation and classification. The piecewise calibration is only applicable when the data coincides with regions of observed data.
We are not given enough information to properly answer this question.
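For reference, the "standard Python externally to TFX" route mentioned in the question could look roughly like the sketch below, which uses scikit-learn's isotonic regression on held-out predictions; the toy arrays and the way y_true/y_pred are exported from the pipeline are assumptions, not TFX APIs.

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Uncalibrated probabilities from the binary classifier on a held-out set,
# plus the corresponding 0/1 labels (both assumed exported from the pipeline).
y_pred = np.array([0.10, 0.40, 0.35, 0.80, 0.95, 0.60])
y_true = np.array([0, 0, 1, 1, 1, 0])

# Fit a monotone piecewise mapping from raw scores to calibrated probabilities.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(y_pred, y_true)

# Apply it to new predictions.
print(calibrator.predict(np.array([0.30, 0.70])))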

SSD mobilenet model does not detect objects at longer distances

I have trained an SSD MobileNet model with a custom dataset (battery). A sample image of the battery is given below, along with the config file I used to train the model.
When the object is close to the camera (tested with a webcam) it is detected accurately with probability over 0.95, but when I move the object to a longer distance it is not detected. Upon debugging, I found that the object does get detected, but with a lower probability of 0.35. The minimum threshold is set to 0.5. If I change the threshold from 0.5 to 0.2, the object is detected, but there are more false detections.
According to this link, SSD does not perform very well for small objects, and an alternative solution is to use Faster R-CNN, but that model is too slow for real-time use. I would like the battery to be detected from a longer distance too using SSD.
Please help me with the following
If we want to detect longer distance objects with higher probability, do we need to change the aspect ratios and scale params in the config?
If we need to change the aspect ratios, how do we choose those values with respect to the object?
Changing aspect ratios and scales won't help improve the detection accuracy of small objects (since the original scale is already small enough, e.g. min_scale = 0.2). The most important parameter you need to change is feature_map_layout. feature_map_layout determines the number of feature maps (and their sizes) and their corresponding depths (channels). Unfortunately, this parameter cannot be configured in the pipeline_config file; you will have to modify it directly in the feature extractor.
Here is why this feature_map_layout is important in detecting small objects.
In the referenced figure (not reproduced here), (b) and (c) are two feature maps of different layouts. The dog in the groundtruth image matches the red anchor box on the 4x4 feature map, while the cat matches the blue one on the 8x8 feature map. Now if the object you want to detect is the cat's ear, then there would be no anchor boxes to match it. So the intuition is: if no anchor boxes match an object, the object simply won't be detected. To successfully detect the cat's ear, what you need is probably a 16x16 feature map.
Here is how you can make the change to feature_map_layout. This parameter is configured in each specific feature extractor implementation. Suppose you use ssd_mobilenet_v1_feature_extractor, then you can find it in this file.
feature_map_layout = {
    'from_layer': ['Conv2d_11_pointwise', 'Conv2d_13_pointwise', '', '', '', ''],
    'layer_depth': [-1, -1, 512, 256, 256, 128],
    'use_explicit_padding': self._use_explicit_padding,
    'use_depthwise': self._use_depthwise,
}
Here there are 6 feature maps of different scales. The first two layers are taken directly from MobileNet layers (hence the depths are both -1), while the remaining four result from extra convolutional operations. It can be seen that the lowest-level feature map comes from the layer Conv2d_11_pointwise of MobileNet. Generally, the lower the layer, the finer the feature map, and the better it is for detecting small objects. So you can change this Conv2d_11_pointwise to Conv2d_5_pointwise (why this one? It can be seen from the TensorFlow graph that this layer has a bigger feature map than Conv2d_11_pointwise); it should help detect smaller objects.
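As a rough sketch, the change suggested above amounts to editing only the first entry of that dict inside the ssd_mobilenet_v1 feature extractor (the exact file layout depends on the TF Object Detection API version you use):

feature_map_layout = {
    # Use Conv2d_5_pointwise instead of Conv2d_11_pointwise as the lowest-level
    # source layer, giving a finer (larger) base feature map for small objects.
    'from_layer': ['Conv2d_5_pointwise', 'Conv2d_13_pointwise', '', '', '', ''],
    'layer_depth': [-1, -1, 512, 256, 256, 128],
    'use_explicit_padding': self._use_explicit_padding,
    'use_depthwise': self._use_depthwise,
}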
But better accuracy comes at an extra cost: detection speed will drop a little because there are more anchor boxes to take care of (bigger feature maps). Also, since we choose Conv2d_5_pointwise over Conv2d_11_pointwise, we lose the detection power of Conv2d_11_pointwise.
If you don't want to change the layer but simply add an extra feature map, e.g. making it 7 feature maps in total, you will have to change num_layers in the config file to 7 as well. You can think of this parameter as the resolution of the detection network: the more low-level layers there are, the finer the resolution.
Now, if you have performed the above operations, one more thing that helps is adding more images containing small objects. If this is not feasible, you can at least try adding data augmentation operations such as random_image_scale (see the sketch below).
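For illustration, the two pipeline_config fragments mentioned above might look roughly like this; the field names follow the TF Object Detection API protos, but the specific values are placeholders and should be tuned for your data:

anchor_generator {
  ssd_anchor_generator {
    num_layers: 7        # one entry per feature map, matching feature_map_layout
    min_scale: 0.1
    max_scale: 0.95
    aspect_ratios: 1.0
  }
}

train_config {
  data_augmentation_options {
    random_image_scale {
      min_scale_ratio: 0.5   # randomly rescale training images so that small
      max_scale_ratio: 2.0   # objects appear at a wider range of sizes
    }
  }
}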

Save positive and negative samples used in Tensorflow Object Detection API

I use Tensorflow Object Detection API with MobilenetV2 as network backbone and SSD as meta-structure to do the object detection job.
In SSD, for each anchor point we generate several candidate bounding boxes with different aspect_ratios. For each candidate box, if its overlap (IoU) with a ground-truth bounding box is greater than a threshold, we say this box is positive; otherwise, it is negative. These positives and negatives are then used for training. (So it is important to note that it is NOT the entire image that is used for training, but only one or several crops of it.)
To debug, I'd like to save these positive and negative crops to the hard disk to see what samples the algorithm really uses for training.
I read the Python code of the Tensorflow Object Detection API but I'm lost :(
If you have any hint, please show me!
Thanks!
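While this doesn't point at the exact hook inside the API, the matching logic described in the question can be reproduced standalone to inspect which anchors would end up positive or negative. The sketch below is plain NumPy; the 0.5 threshold and the box coordinates are made-up values, not the API's own target assigner:

import numpy as np

def iou(boxes, gt):
    # boxes: (N, 4) as [ymin, xmin, ymax, xmax] in normalized coordinates; gt: (4,)
    ymin = np.maximum(boxes[:, 0], gt[0])
    xmin = np.maximum(boxes[:, 1], gt[1])
    ymax = np.minimum(boxes[:, 2], gt[2])
    xmax = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)

anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.1, 0.1, 0.6, 0.6],
                    [0.5, 0.5, 1.0, 1.0]])
groundtruth = np.array([0.1, 0.1, 0.55, 0.55])

overlap = iou(anchors, groundtruth)
positive = overlap >= 0.5   # anchors matched to the ground truth
print(overlap, positive)    # the remaining anchors are the negatives

Once you know which anchors are positive or negative, the corresponding crops could be cut from the image and written to disk with any image library; where to hook this into the API's training loop depends on the version you are using.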

How to sweep a neural-network through an image with tensorflow?

My question is about finding an efficient (mostly in terms of parameter count) way to implement a sliding window in tensorflow (1.4), in order to apply a neural network across an image and produce a 2-D map in which each pixel (or region) represents the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either an MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with an input size of [1,N,N,1] and a single scalar output is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to the likelihood that the corresponding NxN patch belongs to a certain class.
PatchGAN Discriminator: A more general architecture; as far as I understand, the network that is strided across the image outputs a map itself instead of a single value, and these maps are then combined with adjacent maps to produce the final map.
While I cannot find any tensorflow implementation of MTANN, I did find a PatchGAN implementation, which is treated as a convolutional network, but I couldn't figure out how to apply it in practice.
Let's say I have a pre-trained network of which I have the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input, and what I'm trying to do can clearly be represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed size?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] map from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8 and filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using tf.image.extract_image_patches, which explicitly extracts all the image patches and stacks them in the depth dimension, but I think it would consume too many resources, as I'm switching to the PatchGAN discriminator from a fully convolutional network precisely because of memory constraints - also, the composition of the final map is not so straightforward.
Adding a convolutional layer before the network I got, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model work on 8x8 images while integrating it in a model which works on bigger images.
From what I can tell, it should be something like whole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...), but I don't think this would work, as the filter is an operator that depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (e.g. 1024x1024) in an efficient way, since my current model progressively downscales the images and doesn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking of the number of layers as a variable dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model only needs to contain the learned filters for each layer, and these stay valid as long as the filter shapes are compatible with the layers' input/output depths. Thus, feeding the network an input of a different (i.e. bigger) spatial resolution produces an output of a different shape, which can be seen as applying the network to a sliding window sweeping across the input image.
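A minimal sketch of that idea, written with tf.keras for brevity rather than the TF 1.4 graph API used in the question (the layer count and filter sizes are placeholders, not the actual MTANN/PatchGAN architecture): build the network with an unspecified spatial input size, so the same learned filters apply equally to an 8x8 patch or to a full 64x64 image, and the output map simply grows with the input.

import tensorflow as tf

def build_fcn():
    # Fully convolutional: no Dense/Flatten, so the spatial input size is free.
    # Three 3x3 'valid' convolutions give a 7x7 receptive field at stride 1.
    inp = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation='relu')(inp)
    x = tf.keras.layers.Conv2D(16, 3, activation='relu')(x)
    out = tf.keras.layers.Conv2D(1, 3, activation='sigmoid')(x)
    return tf.keras.Model(inp, out)

net = build_fcn()
print(net(tf.zeros([1, 8, 8, 3])).shape)    # (1, 2, 2, 1): the per-patch map
print(net(tf.zeros([1, 64, 64, 3])).shape)  # (1, 58, 58, 1): the same filters swept
                                            # over the whole image, one output pixel
                                            # per 7x7 window at stride 1

With 'valid' padding the full-image map comes out slightly smaller than 64x64; with 'same' padding (or an appropriately padded input) it would be exactly [1,64,64,1], which is what the question asks for.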

Is Capsule Network really rotationally invariant in practice?

Capsule networks are said to perform well under rotation, but is that true in practice?
I trained a Capsule Network on (train-dataset) and got a training accuracy of ~100%.
I tested the network on (test-dataset-original) and got a test accuracy of ~99%.
I rotated (test-dataset-original) by 0.5 degrees (test-dataset-rotate0p5) and by 1 degree (test-dataset-rotate1), and got a test accuracy of just ~10%.
I used the network from this repo as a seed: https://github.com/naturomics/CapsNet-Tensorflow
10% accuracy on rotated test data is not acceptable at all; perhaps something is not implemented correctly.
We implemented CapsNet on some non-English digit datasets (similar to MNIST) and the results were unbelievably good.
The implemented model was invariant not only to rotation but also to other transforms such as pan, zoom, perspective, etc.
The first layer of a capsule network is a normal convolution. The filters here are not rotation invariant; only the output feature maps have a pose matrix applied to them by the primary capsule layer.
I think this is why you also need to show the CapsNet rotated images, though far fewer than for normal convnets.
Capsule networks encapsulate vectors or 4x4 matrices in a neural network. However, matrices can be used for many things, rotations being just one of them. There's no way the network can know that you want to use the encapsulated representation for rotations, unless you specifically show it rotated examples so it can learn to use this representation for rotations.
Capsule Networks came into existence to solve the problem of viewpoint variance in convolutional neural networks (CNNs). CapsNet is said to be viewpoint invariant, which includes rotational and translational invariance.
CNNs achieve translational invariance by using max-pooling, but that results in information loss within the receptive field. As the network goes deeper, the receptive field also increases gradually, and hence max-pooling in deeper layers causes more information loss. This results in a loss of spatial information, and only local/temporal information is learned by the network. CNNs fail to learn the bigger picture of the input.
The weights Wij (between the primary and secondary capsule layers) are learned by backpropagation to capture the affine transformation of the entity represented by the i-th capsule in the primary layer, and they produce the prediction vector u_j|i = Wij * u_i. So basically this Wij is responsible for learning rotational transformations for a given entity.
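As a small illustration of that last step (plain NumPy, with made-up dimensions: 8-D primary capsules and 16-D higher-level capsules; this shows only the prediction-vector computation, not the full routing-by-agreement algorithm):

import numpy as np

num_primary, num_higher = 6, 3   # toy capsule counts
d_in, d_out = 8, 16              # primary / higher-level capsule dimensions

u = np.random.randn(num_primary, d_in)                     # primary capsule outputs u_i
W = np.random.randn(num_primary, num_higher, d_out, d_in)  # learned transforms Wij

# Prediction vectors u_hat[j|i] = Wij @ u_i: what capsule i predicts
# the pose of higher-level capsule j to be.
u_hat = np.einsum('ijab,ib->ija', W, u)
print(u_hat.shape)   # (6, 3, 16)

Dynamic routing then weights these predictions by agreement; whether the learned Wij actually end up encoding rotations depends on what transformations the training data exhibits, which ties back to the answers above.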