How to test convergence in a BUGS model? - bayesian

I want to assess convergence in a BUGS model with the plot() command. An example of the output is in the following figure.
I'm not sure that I can read this output correctly, thanks to everyone :)

Unfortunately, it does not look as if you can confirm convergence from the figure that you are showing (EDIT: There is at least some information, see below). The left hand side of the figure is just a caterpillar plot, which effectively just shows the 95% intervals of the distribution for each parameter.
Assessing convergence is a much more nuanced process, as there are multiple ways to decide whether your model has converged. You will want to check that your model has appropriately explored the parameter space for each parameter (through trace plots, the traceplot function in the coda package), compare between-chain and within-chain variance (the Gelman-Rubin diagnostic, gelman.diag in coda), and examine the auto-correlation in your chains (autocorr.plot in coda). A variety of other measures have been suggested for assessing convergence, and looking through the rest of the coda package will illustrate this.
I highly suggest that you go through the WinBUGS tutorial in the user manual (link to pdf); it has a section that addresses checking model convergence. You want to ensure that the trace plots are well mixed (see the tutorial for what that means), that your Gelman-Rubin diagnostic is < 1.10 for each parameter (a general rule of thumb), and that your chains are not too autocorrelated (high autocorrelation reduces the effective sample size of your chains).
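If you end up doing these checks from Python rather than R, the same three diagnostics can be sketched with the arviz package; this is an assumption on my part, since the advice above is framed around R's coda, and the random draws below merely stand in for your real chains.

```python
import numpy as np
import arviz as az

# Fake draws standing in for real MCMC output: shape (n_chains, n_draws) per parameter.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"beta": rng.normal(size=(3, 1000))})

az.plot_trace(idata)        # trace plots: chains should overlap and look well mixed
print(az.rhat(idata))       # Gelman-Rubin / R-hat: want values below roughly 1.1
print(az.ess(idata))        # effective sample size, shrunk by autocorrelation
az.plot_autocorr(idata)     # autocorrelation within each chain
```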
Good luck, and read up a bit on the subject, it will greatly benefit you if you are interested in Bayesian inference!
Edit
As @jacobsocolar pointed out, and I completely missed, the plots in this question do at least contain some information indicating that the model converged. I did not notice the R-hat plot to the right of the caterpillar plot. These values should be less than 1.1 for each parameter if the model did indeed converge. Eyeballing the plot above does hint that the model converged, but this would be far easier to see if there were a vertical line at the 1.1 mark on the plot, which there is not.

Your output figure is indeed enough to (begin to) assess convergence, contra M_Fidino's answer. Next to the caterpillar plot there is a plot of 'R-hat' values. These are the Gelman-Rubin statistic, which compares between-chain and within-chain variance, and they are all < 1.10.
This is an encouraging first sign that the model has converged, assuming that the initial values were chosen to be nicely overdispersed.
Otherwise, I agree with everything in M_Fidino's answer.

Related

Box-Cox transformation with tree-based models (XGBoost to be specific)

I have a question regarding Box-Cox transformation (or log transformation). I am working on a data set in which I have lots of skewed features. When I apply the Box-Cox transformation I get quite a nice distribution, but the correlation with the target decreases. If I were working with linear models, I would simply use the correlation to decide whether or not to transform the feature. But as I mentioned, I am working with tree-based models, so should I transform the feature to get a more dispersed distribution, or leave the feature as it is to avoid the decrease in correlation?
I have added a screenshot of the distribution and its relationship with the target variable, for both the transformed and untransformed feature (the left two plots show the original feature and target).
PS: Guessing from the plots, it seems to me that if I transform the feature it will be easier for the tree to find a split on this particular feature.
Thanks a lot,
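To make the comparison concrete, here is a minimal sketch of the check being described; the data are synthetic stand-ins, since the real feature is not shown, and scipy's boxcox is assumed as the transformer.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed feature and target, standing in for the real data.
rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
y = 2.0 * x + rng.normal(scale=5.0, size=1000)

x_bc, lam = stats.boxcox(x)   # Box-Cox requires strictly positive input

print("skewness before/after:", stats.skew(x), stats.skew(x_bc))
print("correlation with target before/after:",
      np.corrcoef(x, y)[0, 1], np.corrcoef(x_bc, y)[0, 1])
```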

Is it possible to approximate missing position data by imputation?

I would like to increase the density of my AIS or GPS data in order to carry out more precise analyses afterwards. During my research I came across different approaches such as interpolation, filtering, and imputation. For the first two approaches there is no doubt that they can be used to approximate the points between two collected data points.
In the case of imputation (e.g. MICE), however, I have not yet found an approach in the literature for determining position data.
That's why I wanted to ask if anyone knew a paper dealing with this subject and whether it makes sense at all to determine further position data approximately by imputation.
The problem you are describing is trajectory reconstruction for AIS/GPS data. There are a number of papers on general trajectory reconstruction (see this for example), but AIS data are quite specific.
The irregularity of AIS data is a well-known problem with no standard approach for dealing with it, as far as I know.
However, there is a handful of publications that try to deal with this issue. The reconstruction problem is connected to the trajectory prediction problem, since the two share some of the same methods (the latter is more popular in the scientific community, I think).
Traditionally, AIS trajectory reconstruction is done using physical models, which take into account the curvature of the earth and other factors such as data noise (see examples here, here, and here).
More recent approaches try to use LSTM neural networks.
I don't know much about GPS data, but I think the methods are very similar to the ones mentioned above (especially taking into account the fact that you probably want to deal with maritime data).
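For what it's worth, the plain time-based interpolation mentioned in the question is an easy baseline to try before reaching for these reconstruction methods; the timestamps and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical irregular AIS fixes (latitude/longitude indexed by timestamp).
fixes = pd.DataFrame(
    {"lat": [54.32, 54.35, 54.41], "lon": [10.12, 10.15, 10.22]},
    index=pd.to_datetime(["2021-06-01 10:00", "2021-06-01 10:07", "2021-06-01 10:20"]),
)

# Resample to a 1-minute grid and interpolate linearly in time between observed fixes.
dense = fixes.resample("1min").mean().interpolate(method="time")
print(dense.head())
```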

Forming conditional distributions in TensorFlow probability

I am using TensorFlow Probability to build a VAE which includes image pixels as well as some other variables. The output of the VAE is:
tfp.distributions.Independent(tfp.distributions.Bernoulli(logits), 2, name="decoder-dist")
I am trying to understand how to form other conditional distributions based on this which I can use with the inference methods (MCMC or VI). Say the output above was P(A,B,C | Z), how would I take that distribution to form a posterior P(A|B, C, Z) that I could perform inference on? I have been trying to read through the docs but I am having some trouble grasping them.
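For concreteness, the output distribution above might be built along these lines; the image shape here is only an assumption for illustration, since the question does not give it.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Hypothetical decoder logits for a batch of four 28x28 binary images.
logits = tf.zeros([4, 28, 28])
dist = tfd.Independent(tfd.Bernoulli(logits=logits),
                       reinterpreted_batch_ndims=2, name="decoder-dist")

sample = dist.sample()
print(dist.log_prob(sample).shape)  # (4,): the last two dims are treated as one joint event
```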
The answer to your question depends very much on the nature of the joint model within which you'd like to do the conditioning. Much has been written about the topic, and in short it's a very hard problem in general :). Without knowing a bit more about the particulars of your problem, it's near impossible to recommend a useful generic inference procedure. However, we do have a bunch of examples (scripts and jupyter/colab notebooks) in the TFP repo here: https://github.com/tensorflow/probability/tree/master/tensorflow_probability/examples
In particular, there's
The Hierarchical Linear Model example, which is a sort of Rosetta stone showing how to do posterior inference using Hamiltonian Monte Carlo (an MCMC technique) in TFP, R, and Stan,
The Linear Mixed Effects Model example, showing how you might use VI to solve a standard LME problem,
among many others. You can click the "Run in Google Colab" link at the top of any of these notebooks to open and run them on https://colab.research.google.com.
Please also feel free to reach out to us via email at tfprobability@tensorflow.org. This is a public Google Group where users can engage directly with the team that builds TFP. If you provide us some more info there on what you'd like to do, we're happy to provide guidance on modeling and inference with TFP.
Hope this gives at least a start in the right direction!

Correcting SLAM drift error using GPS measurements

I'm trying to figure out how to correct drift errors introduced by a SLAM method using GPS measurements. I have two point sets in Euclidean 3D space taken at fixed moments in time:
The red dataset comes from GPS and contains no drift errors, while the blue dataset is based on the SLAM algorithm and drifts over time.
The idea is that SLAM is accurate over short distances but eventually drifts, while GPS is accurate over long distances and inaccurate over short ones. So I would like to figure out how to fuse SLAM data with GPS in such a way that takes the best accuracy from both measurements. At the very least, how should I approach this problem?
Since your GPS looks like it is very locally biased, I'm assuming it is low-cost and doesn't use any correction techniques, e.g. that it is not differential. As you are probably aware, GPS errors are not Gaussian. The authors of this paper show that a good way to model GPS noise is as v + eps, where v is a locally constant "bias" vector (it is usually constant for a few meters, and then changes more or less smoothly or abruptly) and eps is Gaussian noise.
Given this information, one option would be to use Kalman-based fusion, e.g. you add the GPS noise and bias to the state vector, and define your transition equations appropriately and proceed as you would with an ordinary EKF. Note that if we ignore the prediction step of the Kalman, this is roughly equivalent to minimizing an error function of the form
measurement_constraints + some_weight * GPS_constraints
and that gives you a more straightforward second option. For example, if your SLAM is visual, you can just use the sum of squared reprojection errors (i.e. the bundle adjustment error) as the measurement constraints, and define your GPS constraints as ||x - x_{gps}||, where x_{gps} are the 2D or 3D GPS positions (you might want to ignore the altitude with low-cost GPS).
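As a toy sketch of that second option (not tied to any particular SLAM system; a fake residual function stands in for the bundle adjustment term):

```python
import numpy as np
from scipy.optimize import least_squares

# Fake measurement residuals; in a visual SLAM system these would be reprojection errors.
def measurement_residuals(x, x_slam):
    return x - x_slam

def fused_residuals(x, x_slam, x_gps, weight):
    # measurement constraints stacked with weighted GPS constraints ||x - x_gps||
    return np.concatenate([measurement_residuals(x, x_slam),
                           np.sqrt(weight) * (x - x_gps)])

x_slam = np.array([10.0, 5.0, 0.0])   # current SLAM position estimate (drifted)
x_gps = np.array([10.8, 5.5, 0.2])    # GPS position (noisy but unbiased in the long run)

sol = least_squares(fused_residuals, x0=x_slam, args=(x_slam, x_gps, 0.25))
print(sol.x)                          # compromise pulled toward the GPS fix
```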
If your SLAM is visual and feature-point based (you didn't really say what type of SLAM you were using so I assume the most widespread type), then fusion with any of the methods above can lead to "inlier loss". You make a sudden, violent correction, and augment the reprojection errors. This means that you lose inliers in SLAM's tracking. So you have to re-triangulate points, and so on. Plus, note that even though the paper I linked to above presents a model of the GPS errors, it is not a very accurate model, and assuming that the distribution of GPS errors is unimodal (necessary for the EKF) seems a bit adventurous to me.
So, I think a good option is to use barrier-term optimization. Basically, the idea is this: since you don't really know how to model GPS errors, assume that you have more confidence in SLAM locally, and minimize a function S(x) that captures the quality of your SLAM reconstruction. Denote by x_opt the minimizer of S. Then, fuse with GPS data as long as it does not deteriorate S(x_opt) by more than a given threshold. Mathematically, you'd want to minimize
some_coef / (thresh - S(x)) + ||x - x_{gps}||
and you'd initialize the minimization with x_opt. A good choice for S is the bundle adjustment error, since by not degrading it, you prevent inlier loss. There are other choices of S in the literature, but they are usually meant to reduce computational time and add little in terms of accuracy.
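Here is a rough sketch of this barrier-term idea, with a toy quadratic standing in for the bundle adjustment error S(x):

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for S(x); in practice this would be the bundle adjustment error.
def S(x, x_slam):
    return np.sum((x - x_slam) ** 2)

def barrier_objective(x, x_slam, x_gps, thresh, coef):
    s = S(x, x_slam)
    if s >= thresh:                   # never let S degrade past the threshold
        return np.inf
    return coef / (thresh - s) + np.sum((x - x_gps) ** 2)

x_slam = np.array([10.0, 5.0, 0.0])
x_gps = np.array([10.8, 5.5, 0.2])
x_opt = x_slam.copy()                 # initialize at the minimizer of S, as suggested

thresh = S(x_opt, x_slam) + 1.0       # allow S to degrade by at most 1.0
res = minimize(barrier_objective, x_opt,
               args=(x_slam, x_gps, thresh, 0.1), method="Nelder-Mead")
print(res.x)                          # fused position, limited by the barrier on S
```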
This, unlike the EKF, does not have a nice probabilistic interpretation, but produces very nice results in practice (I have used it for fusion with other things than GPS too, and it works well). You can for example see this excellent paper that explains how to implement this thoroughly, how to set the threshold, etc.
Hope this helps. Please don't hesitate to tell me if you find inaccuracies/errors in my answer.

Drawing cartograms with Matplotlib?

In case somebody doesn't know: A cartogram is a type of map where some country/region-dependent numeric property scales the respective regions so that that property's density is (close to) constant. An example is
from worldmapper.org. In this example, countries are scaled according to their population, resulting in near-constant population density.
Needless to say, this is really cool. Does anyone know of a Matplotlib-based library for drawing such maps? The method used at worldmapper.org is described in (1), so it would surprise me if no one has implemented this yet...
I'm also interested in hearing about other cartogram libraries, even if they're not made for Matplotlib.
(1) Michael T. Gastner and M. E. J. Newman,
Diffusion-based method for producing density-equalizing maps,
Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004). Available at arXiv.
There's this, though it's based on a different algorithm (and though it's on the ESRI site, it doesn't require ArcGIS). Of course, once you have the cartogram you can plot it in matplotlib.
Here is a Javascript plugin to make cartograms using D3. It is a good, simple solution if you are not too concerned about the regions being sized accurately. If accuracy is important, there are other options available that give you more freedom to play with the algorithm's parameters to get to a more accurate result.
Here are two great standalone programs I know of:
Scapetoad
Carto3F
Scapetoad is very easy to use. Just give it a shapefile, tell it which attribute to use for the scaling, and set a few accuracy parameters. If there is any doubt, this post describes the process.
Carto3F is more complex and allows for greater accuracy, though it is a bit trickier to figure out - lots of parameter settings without much documentation explaining them.
There is also a QGIS cartogram plugin, written in Python. However, I have not been able to get it to work, so I cannot comment on that one.
In short, no. But Newman has an excellent little implementation of his and Gastner's method on his website. Installing it is easy and it works from the command line. Here's an example of a workflow using this software that worked for me.
Compute a grid of density estimates over some region, e.g. in Python. Store it as a matrix of numbers.
Run the cart program with your density matrix as input, from the command line or as a subprocess in Python.
The program returns a list of new coordinates for each grid point.
Pipe your shapefile points through the interp program and into a new shapefile to get the transformed map.
There are nice instructions on the main page.
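A sketch of the density-grid step and the call to cart from Python follows; the exact command-line arguments are assumptions from memory, so check them against the usage notes on Newman's page.

```python
import subprocess
import numpy as np

# Step 1: a density grid over the region, here just a smooth synthetic example.
nx, ny = 512, 256
xs, ys = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))
density = 1.0 + np.exp(-((xs - 0.5) ** 2 + (ys - 0.5) ** 2) / 0.05)
np.savetxt("density.dat", density)

# Step 2: run Newman's cart program on the grid (argument order assumed; see his page).
subprocess.run(["cart", str(nx), str(ny), "density.dat", "displacement.dat"], check=True)

# Steps 3/4: displacement.dat now holds the new coordinates of each grid point;
# pipe the shapefile's points through the interp program to get the transformed map.
```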
The geoplot.cartogram function, from "Geoplot: geospatial data visualization" (geoplot 0.2.0), may also work; the documentation describes geoplot as a high-level Python geospatial plotting library and an extension to cartopy and matplotlib.
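A minimal usage sketch, assuming geopandas' bundled naturalearth_lowres layer and its pop_est column (swap in your own GeoDataFrame and scaling variable):

```python
import geopandas as gpd
import geoplot as gplt

# Example polygon layer with a population column shipped with geopandas.
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

ax = gplt.cartogram(world, scale="pop_est")  # polygons resized by population estimate
ax.figure.savefig("cartogram.png")
```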
Try this library if you are using geopandas; it is quick and doesn't require much customization. https://github.com/mthh/cartogram_geopandas