how to overcome tracking jitter - tracking

I'm working on an object tracking project.
Steps:
1. Preprocess the image to obtain some candidate regions of interest.
2. For each region, test whether it is the target using ORB features with brute-force (BF) matching.
3. Once the target region is determined, acquire the coordinates of some points on the target and their corresponding coordinates in the world coordinate system.
4. Use solvePnP (in OpenCV) to get the rotation vector and translation vector.
5. Use the translation vector in VR for localization and view control.
Tracking jitter means that, although the object is stationary, tracking errors such as noise cause the estimated position of the target to change slightly. As a result, in steps 4 and 5 the translation vector changes slightly, and with the head-mounted device I feel the jitter all the time.
It seems to me that tracking jitter is unavoidable because of changes in the environment or noise. But a one-pixel change can lead to a change of a few centimeters in the z value of the translation vector. Is there a proper way to deal with it?
I have googled, but there doesn't seem to be much information. "Effects of Tracking Technology, Latency, and Spatial Jitter on Object Movement" mentions the phenomenon but does not provide a solution. Another interesting paper is "Motion Tracking Requirements and Technologies". Can anyone offer some useful information?
It occurs to me that a filter is needed to post-process the tracking data, but the idea is not ideal. A Kalman filter can be used for tracking and to attenuate noise, but I don't know whether it can compensate well for this kind of jitter (I mean very small fluctuations in values). Investigating how to incorporate a Kalman filter into this project is another topic and needs extra time.
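To make the filtering idea concrete, here is a minimal sketch of a per-axis Kalman filter with a constant-position model, applied to a sequence of translation vectors. It is not taken from any library: the class name `ScalarKalman`, the function `smooth_translation`, and the variances are all hypothetical and would need tuning against real solvePnP output. A smaller process variance gives stronger smoothing but more lag when the head really moves.

```python
import random

class ScalarKalman:
    """Constant-position Kalman filter for one coordinate (illustrative only)."""

    def __init__(self, process_var, meas_var):
        self.q = process_var   # how much the true value may drift per frame (assumed)
        self.r = meas_var      # variance of the measurement noise (assumed)
        self.x = None          # current estimate
        self.p = 1.0           # variance of the current estimate

    def update(self, z):
        if self.x is None:     # first measurement initialises the state
            self.x, self.p = z, self.r
            return self.x
        self.p += self.q                   # predict: uncertainty grows
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with the new measurement
        self.p *= (1.0 - k)
        return self.x

def smooth_translation(tvecs, process_var=1e-5, meas_var=1e-2):
    """Filter each of the x, y, z components independently."""
    filters = [ScalarKalman(process_var, meas_var) for _ in range(3)]
    return [[f.update(v) for f, v in zip(filters, t)] for t in tvecs]

# Synthetic stand-in for solvePnP output: a stationary target plus noise.
random.seed(0)
true_t = [0.1, -0.2, 1.5]
noisy = [[c + random.gauss(0.0, 0.02) for c in true_t] for _ in range(200)]
smoothed = smooth_translation(noisy)

def spread(samples, axis):
    """Peak-to-peak variation of one axis over the second half of the run."""
    vals = [s[axis] for s in samples[100:]]
    return max(vals) - min(vals)

print("z jitter raw:     ", spread(noisy, 2))
print("z jitter filtered:", spread(smoothed, 2))
```

With a stationary target the filtered z value should fluctuate far less than the raw one. Whether the added lag is acceptable in a head-mounted display is something only testing can tell.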

Related

Correcting SLAM drift error using GPS measurements

I'm trying to figure out how to correct drift errors introduced by a SLAM method using GPS measurements. I have two point sets in Euclidean 3D space taken at fixed moments in time:
The red dataset comes from GPS and contains no drift errors, while the blue dataset is based on a SLAM algorithm and drifts over time.
The idea is that SLAM is accurate over short distances but eventually drifts, while GPS is accurate over long distances and inaccurate over short ones. So I would like to figure out how to fuse SLAM data with GPS in a way that takes the best accuracy of both measurements. At the least, how should I approach this problem?
Since your GPS looks like it is very locally biased, I'm assuming it is low-cost and doesn't use any correction techniques, e.g. that it is not differential. As you are probably aware, GPS errors are not Gaussian. The authors of this paper show that a good way to model GPS noise is as v+eps, where v is a locally constant "bias" vector (it is usually constant for a few meters, and then changes more or less smoothly or abruptly) and eps is Gaussian noise.
Given this information, one option would be to use Kalman-based fusion, e.g. you add the GPS noise and bias to the state vector, define your transition equations appropriately, and proceed as you would with an ordinary EKF. Note that if we ignore the prediction step of the Kalman filter, this is roughly equivalent to minimizing an error function of the form
measurement_constraints + some_weight * GPS_constraints
and that gives you a more straightforward second option. For example, if your SLAM is visual, you can just use the sum of squared reprojection errors (i.e. the bundle adjustment error) as the measurement constraints, and define your GPS constraints as ||x - x_{gps}||, where the x are 2D or 3D positions and x_{gps} the corresponding GPS positions (you might want to ignore the altitude with low-cost GPS).
If your SLAM is visual and feature-point based (you didn't really say what type of SLAM you were using, so I assume the most widespread type), then fusion with any of the methods above can lead to "inlier loss". You make a sudden, violent correction and increase the reprojection errors. This means that you lose inliers in SLAM's tracking, so you have to re-triangulate points, and so on. Plus, note that even though the paper I linked to above presents a model of the GPS errors, it is not a very accurate model, and assuming that the distribution of GPS errors is unimodal (necessary for the EKF) seems a bit adventurous to me.
So, I think a good option is to use barrier-term optimization. Basically, the idea is this: since you don't really know how to model GPS errors, assume that you have more confidence in SLAM locally, and minimize a function S(x) that captures the quality of your SLAM reconstruction. Denote by x_opt the minimizer of S. Then, fuse with GPS data as long as it does not degrade S beyond a given threshold above S(x_opt). Mathematically, you'd want to minimize
some_coef/(thresh - S(x)) + ||x - x_{gps}||
and you'd initialize the minimization with x_opt. A good choice for S is the bundle adjustment error, since by not degrading it, you prevent inlier loss. There are other choices of S in the literature, but they are usually meant to reduce computational time and add little in terms of accuracy.
This, unlike the EKF, does not have a nice probabilistic interpretation, but produces very nice results in practice (I have used it for fusion with other things than GPS too, and it works well). You can for example see this excellent paper that explains how to implement this thoroughly, how to set the threshold, etc.
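As a toy illustration of the barrier-term cost above (not the implementation from the paper), the sketch below minimizes some_coef/(thresh - S(x)) + ||x - x_gps|| with plain gradient descent and step halving, starting from x_opt. Everything specific here is an assumption made for the example: S is a simple quadratic stand-in for a bundle adjustment error, and `barrier_fuse`, `coef`, and the numbers are hypothetical. `math.dist` needs Python 3.8 or later.

```python
import math

def barrier_fuse(x_opt, x_gps, S, grad_S, thresh, coef=1e-3, iters=500):
    """Minimise coef/(thresh - S(x)) + ||x - x_gps||, starting from x_opt."""

    def cost(x):
        slack = thresh - S(x)
        if slack <= 0.0:            # outside the region where SLAM quality is acceptable
            return math.inf
        return coef / slack + math.dist(x, x_gps)

    x = list(x_opt)
    step = 1.0
    for _ in range(iters):
        slack = thresh - S(x)
        d = math.dist(x, x_gps)
        if d < 1e-12:               # already at the GPS position
            break
        g_s = grad_S(x)
        # Gradient of the barrier term plus gradient of the GPS distance.
        grad = [coef * gs / slack ** 2 + (xi - gi) / d
                for gs, xi, gi in zip(g_s, x, x_gps)]
        current = cost(x)
        t = step
        while t > 1e-12:            # halve the step until the cost decreases
            cand = [xi - t * gi for xi, gi in zip(x, grad)]
            if cost(cand) < current:
                x = cand
                break
            t *= 0.5
        else:
            break                   # no improving step found: stop
        step = 2.0 * t
    return x

# Toy problem: S is the squared distance to the SLAM estimate (an assumption).
x_slam = [0.0, 0.0]
S = lambda x: sum((a - b) ** 2 for a, b in zip(x, x_slam))
grad_S = lambda x: [2.0 * (a - b) for a, b in zip(x, x_slam)]
thresh = 0.25                       # allow S to grow to at most 0.25
x_gps = [2.0, 0.0]

fused = barrier_fuse(x_slam, x_gps, S, grad_S, thresh)
print("fused position:", fused, "S =", S(fused))
```

The fused position moves toward the GPS fix but stops before S reaches the threshold, which is the behaviour that prevents inlier loss. In a real system S and its gradient would come from the SLAM back end.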
Hope this helps. Please don't hesitate to tell me if you find inaccuracies/errors in my answer.

Smoothed Particle Hydrodynamics - Particle Density Estimation Issue

I'm currently writing an SPH solver using CUDA at https://github.com/Mathiasb17/sph_opengl.
I have pretty good results and performance, but to me they still look pretty weird for some reason:
https://www.youtube.com/watch?v=_DdHN8qApns
https://www.youtube.com/watch?v=Afgn0iWeDoc
In some implementations, I saw that a particle does not contribute to its own internal forces (which would be 0 anyway due to the formulas), but it does contribute to its own density.
My simulations work "pretty fine" (I don't like "pretty fine", I want it perfect), and in my implementation a particle does not contribute to its own density.
Besides, when I change the code so that it does contribute to its own density, the resulting simulation becomes way too unstable (particles explode).
I asked a lecturer in physics-based animation about this. He told me a particle should not contribute to its own density, but did not give me specific details about this assertion.
Any idea how it should be?
As long as you calculate the density with the summation formula instead of the continuity equation, yes, you need to include the self-contribution.
Here is why:
SPH is an interpolation scheme, which allows you to interpolate a quantity at any position in space over a particle cloud. Any position means you are not restricted to evaluating it at a particle, but anywhere in space. If you do so, you obviously need to consider all particles within the influence radius. From this point of view, it is easy to see that evaluating the interpolation at a particle's own position does not remove that particle's contribution.
For other quantities like forces, where the derivative of some quantity is approximated, you don't need to apply self-contribution (that would lead to the evaluation of 0/0).
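A small sketch of the summation density may help here. It assumes the poly6 kernel, which is a common choice but not necessarily the one used in the solver in question, and a regular lattice of equal-mass particles at rest density. The names `poly6` and `density` are made up for the example.

```python
import math

def poly6(r2, h):
    """Poly6 kernel in 3D, evaluated from the squared distance r2."""
    if r2 > h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3

def density(i, positions, masses, h, include_self=True):
    """Summation density at particle i: rho_i = sum_j m_j W(|x_i - x_j|, h)."""
    xi = positions[i]
    rho = 0.0
    for j, (xj, mj) in enumerate(zip(positions, masses)):
        if j == i and not include_self:
            continue
        r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
        rho += mj * poly6(r2, h)
    return rho

# 5 x 5 x 5 lattice with spacing dx; each particle carries mass rho0 * dx^3.
dx, rho0 = 0.1, 1000.0
h = 2.0 * dx                                   # support radius (assumed ratio)
positions = [(i * dx, j * dx, k * dx)
             for i in range(5) for j in range(5) for k in range(5)]
masses = [rho0 * dx ** 3] * len(positions)
center = 2 * 25 + 2 * 5 + 2                    # index of the middle particle

rho_with_self = density(center, positions, masses, h, include_self=True)
rho_without_self = density(center, positions, masses, h, include_self=False)
print("with self-contribution:   ", rho_with_self)
print("without self-contribution:", rho_without_self)
```

For this setup the interior density with the self term lands close to the rest density, while leaving the self term out underestimates it noticeably. The exact numbers depend on the kernel and on the ratio of h to the particle spacing.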
To discover the source of the instability:
check whether the kernel is normalised
check whether the stiffness of the liquid and the time step size are compatible (for the weakly compressible case)
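Both checks can be done numerically. The sketch below integrates a kernel over its support and compares the result to 1, again assuming the poly6 kernel in 3D. The time step helper uses a CFL-style bound dt <= cfl * h / c, where c is the speed of sound implied by the stiffness. The coefficient 0.4 is a commonly quoted value that I'm treating as an assumption, and `kernel_integral` and `max_time_step` are hypothetical names.

```python
import math

def poly6(r2, h):
    """Poly6 kernel in 3D, evaluated from the squared distance r2."""
    if r2 > h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3

def kernel_integral(kernel, h, n=10000):
    """Integrate the kernel over a sphere of radius h with the midpoint rule."""
    dr = h / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        total += kernel(r * r, h) * 4.0 * math.pi * r * r * dr
    return total

def max_time_step(h, sound_speed, cfl=0.4):
    """CFL-style upper bound on the time step (coefficient is an assumption)."""
    return cfl * h / sound_speed

h = 0.2
integral = kernel_integral(poly6, h)
print("kernel integral (should be close to 1):", integral)
print("time step bound for c = 100 m/s:", max_time_step(h, 100.0))
```

If the integral is far from 1, the densities are scaled incorrectly and the pressure forces will be too. If the time step in use is well above the bound, a stiff equation of state can make particles explode.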

Is it possible to recognize all objects in a room with Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. bed, table, etc.) and a person in that room. Is something like that possible using Microsoft Kinect?
Thank you!
Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of the user from the Kinect.
Also, by mapping the color stream to the depth stream you can detect how far a particular pixel is from the Kinect. In your implementation, if you have unique characteristics of the different objects, like color, shape and size, you can probably detect them and also detect the distance.
OpenCV is one of the libraries that I use for computer vision.
Again, it's up to you how you use this data.
The Kinect camera provides depth and consequently 3D information (a point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane) and possibly the walls and the ceiling. This step is important since these surfaces often connect separate objects, making them one big object.
The remaining parts of the point cloud can be segmented by depth if they don't physically touch each other. Using color, one can separate the objects even further. Note that we implicitly define an object as a 3D-dense and color-consistent entity, while other definitions are also possible.
As soon as you have your objects segmented, you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, a skeleton library can recognize humans after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.
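To illustrate the floor-removal step, here is a minimal RANSAC plane fit in plain Python on a synthetic point cloud. This is only one way to fit the plane and is not taken from the Kinect SDK. The function names, the distance threshold, and the assumption that y points up are all made up for the example. A library such as PCL offers tuned versions of the same idea.

```python
import random

def plane_from_points(p, q, r):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:                      # the three points are collinear
        return None
    n = [c / norm for c in n]
    return n, -sum(n[i] * p[i] for i in range(3))

def ransac_plane(points, dist_thresh=0.02, iters=200, seed=0):
    """Return the plane with the most inliers and the indices of those inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(points)
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) < dist_thresh]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Synthetic scene: a slightly noisy floor at y = 0 and a box standing on it.
rng = random.Random(1)
floor = [[rng.uniform(0, 3), rng.uniform(-0.005, 0.005), rng.uniform(0, 3)]
         for _ in range(300)]
box = [[rng.uniform(1.0, 1.5), rng.uniform(0.3, 0.8), rng.uniform(1.0, 1.5)]
       for _ in range(100)]
cloud = floor + box

(normal, offset), floor_idx = ransac_plane(cloud)
floor_set = set(floor_idx)
remaining = [p for i, p in enumerate(cloud) if i not in floor_set]
print("floor normal:", normal)
print("points left after removing the floor:", len(remaining))
```

Once the dominant plane is removed, the remaining points can be clustered by distance to obtain separate objects.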

Transformation between point clouds

I hope to find some hints where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3d point clouds. I created a 3d object detector which is already working.
Here is my task:
Let's say I have a point cloud 1. I detected an object in cloud 1 and I know the centroid position of my object (x1,y1,z1). Now I move my sensor along a path and create new clouds (e.g. cloud 2). In cloud 2 I see the same object, but e.g. from the side, where the object detection does not work well.
I would like to transform the detected object from cloud 1 to cloud 2, to get the centroid in cloud 2 as well. To me it sounds like I need a matrix (translation, rotation) to transform points from cloud 1 to cloud 2.
Any ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimate of which points in cloud 1 correspond to which points in cloud 2 (more specifically, given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation.
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
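For the case where correspondences are known, here is a dependency-free sketch of one closed-form way to get the rotation and translation. It uses Horn's unit-quaternion method, with the dominant eigenvector found by shifted power iteration so that no linear algebra library is needed. This is not necessarily the method behind the link above. An SVD-based solution is a common alternative and is what I would use with NumPy or PCL at hand. The function names and test values are made up for the example.

```python
import math

def centroid(pts):
    return [sum(p[k] for p in pts) / len(pts) for k in range(3)]

def apply(R, t, p):
    """Apply the rigid transform p -> R p + t."""
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]

def rigid_transform(src, dst, iters=500):
    """Return (R, t) such that dst_i is approximately R src_i + t."""
    cs, cd = centroid(src), centroid(dst)
    # Cross-covariance S[a][b] = sum of centred src_a * centred dst_b.
    S = [[0.0] * 3 for _ in range(3)]
    for p, q in zip(src, dst):
        for a in range(3):
            for b in range(3):
                S[a][b] += (p[a] - cs[a]) * (q[b] - cd[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its top eigenvector is the rotation quaternion.
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Shift by the Frobenius norm so all eigenvalues are non-negative.
    shift = math.sqrt(sum(v * v for row in N for v in row))
    q = [1.0, 0.5, 0.25, 0.125]            # arbitrary starting vector
    for _ in range(iters):
        q = [sum(N[r][c] * q[c] for c in range(4)) + shift * q[r] for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    w, x, y, z = q
    R = [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
         [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
         [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]
    t = [cd[r] - sum(R[r][c] * cs[c] for c in range(3)) for r in range(3)]
    return R, t

# Matched points: cloud 2 is cloud 1 rotated 30 degrees about z and shifted.
a = math.radians(30.0)
R_true = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
t_true = [0.5, -0.2, 1.0]
cloud1 = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0.5]]
cloud2 = [apply(R_true, t_true, p) for p in cloud1]

R, t = rigid_transform(cloud1, cloud2)
object_centroid_1 = [0.4, 0.4, 0.3]        # centroid detected in cloud 1
object_centroid_2 = apply(R, t, object_centroid_1)
print("centroid in cloud 2:", object_centroid_2)
```

With the transform in hand, the centroid detected in cloud 1 maps directly into cloud 2. The sketch assumes at least three non-collinear matched points and no outliers among the matches.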
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc.) in your point clouds, computing a rotation and translation as above, and then performing ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned, because it estimates correspondences using one of two metrics: minimal point-to-point distance or minimal point-to-plane distance. According to Wikipedia, the latter works better in practice, but it does involve estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
I think your question was specifically about the Kinect sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in the Sensor Fusion namespace. This should take care of it automatically, using methods similar to the answer given by #pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. Documentation for all of these can be found here.

Smoothing data received from CoreLocation

I'm trying to develop an app which allows you to walk around, and where you walked will be drawn on a map. I have this all working fine, but I'm finding that even with a reasonably accurate GPS location the points still jump around a bit. When drawn on a map this has the effect of creating a squiggly or zig-zag line.
I'm looking for suggestions/strategies on how to smooth the data, so that the line drawn on the map is more of a smooth best fit, rather than an accurate point to point drawing.
There are many different types of smoothing algorithms you could apply to the data (for a few starting points, see this Wikipedia article). The only way to know for sure which is/are suitable for your application is to implement and test them.
Simple or weighted moving averages (taking the last n samples and averaging them) are fairly common, but have the problem of lagging behind the data. A common one for filtering signal noise is a low-pass filter, which attenuates rapid (noisy) fluctuations while passing through slower, larger movements. Apple has some code for this in their AccelerometerGraph sample.
I'd suggest trying those out first, as they're easy to implement, before looking at the more complex ones.
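As a rough sketch of the two simplest options, here are a trailing moving average and a first-order low-pass (exponential smoothing) filter applied to a list of (latitude, longitude) pairs. This is written in Python for brevity rather than in the app's own language. The window size `n`, the factor `alpha`, and the synthetic zig-zag track are assumptions chosen for illustration.

```python
def moving_average(points, n=5):
    """Average each point with up to n-1 preceding points."""
    out = []
    for i in range(len(points)):
        window = points[max(0, i - n + 1): i + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out

def low_pass(points, alpha=0.2):
    """Exponential smoothing: move a fraction alpha toward each new sample."""
    out, prev = [], None
    for p in points:
        if prev is None:
            prev = p
        else:
            prev = (prev[0] + alpha * (p[0] - prev[0]),
                    prev[1] + alpha * (p[1] - prev[1]))
        out.append(prev)
    return out

# Synthetic walk heading north with a zig-zag error in longitude.
base_lat, base_lon, wobble = 37.0, -122.0, 2e-5
track = [(base_lat + i * 1e-5, base_lon + (wobble if i % 2 == 0 else -wobble))
         for i in range(50)]

averaged = moving_average(track)
filtered = low_pass(track)

def lon_wobble(points):
    """Largest longitude deviation from the true path over the last 10 points."""
    return max(abs(p[1] - base_lon) for p in points[-10:])

print("raw wobble:           ", lon_wobble(track))
print("moving average wobble:", lon_wobble(averaged))
print("low-pass wobble:      ", lon_wobble(filtered))
```

Both filters flatten the zig-zag considerably, at the cost of the drawn line trailing slightly behind the latest fix. A smaller `alpha` or larger `n` smooths more and lags more.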