I am working with a Kinect and have implemented skeleton tracking to record joint trajectories over a sequence of human gait using a single Kinect. The problem is the accuracy of the measurements. I want to overcome occlusion and jitter. I found some filter implementations for dealing with jitter (more advanced than averaging filters), but occlusion is more difficult. Is there a good open-source project for skeleton tracking that gives more accurate results? It doesn't matter whether it uses the Microsoft SDK or OpenNI.
Thanks in advance
I actually did gait recognition using a Kinect with the official Microsoft SDK. The accuracy of the measurements depends on how you calculate them. Check this as an example.
Related
I want to create an application that converts 2D images/video into a 3D model. While researching this I found similar applications such as Trnio, Scann3D, Qlone, and a few others (though some of them produce poor 3D models). I also found out about a technology from Microsoft Research called MobileFusion, which showed the same vision I was hoping for in my application, but none of these apps were like that.
Creating a 3D modelling app is a complex task, and doing it to a high standard requires a lot of study. To point you in the right direction, you most likely want to perform something called Structure-from-Motion (SfM) or Simultaneous Localization and Mapping (SLAM).
If you want to program this yourself, OpenCV is a good place to start if you know C++ or Python. A typical pipeline involves feature extraction and matching, camera pose estimation, triangulation, and then optimisation using bundle adjustment. All pipelines for SfM and SLAM follow these general steps (with exceptions, of course). All of these steps are possible in OpenCV, although Google's Ceres Solver is an excellent open-source option for bundle adjustment. SfM generally goes on to dense matching, which produces very dense point clouds that are good for creating meshes. A free open-source pipeline for this is OpenSfM. Another good source of tools is OpenMVG, which has all of the tools you need to build a full pipeline.
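To make the triangulation step concrete, here is a minimal sketch in plain Python. It uses the simple midpoint method (the point halfway between the closest points of two viewing rays), which is a stand-in for illustration and not necessarily what OpenCV or any of the pipelines above use internally. The function name `triangulate_midpoint` and the example camera positions are assumptions made up for this sketch; it also assumes the camera centres and ray directions are already known from the pose estimation step.

```python
def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint triangulation of two rays c1 + s*d1 and c2 + t*d2 (3D tuples).

    Returns the point halfway between the two closest points on the rays.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel rays never converge, so there is no depth to recover.
        raise ValueError("rays are (nearly) parallel")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


# Two hypothetical cameras 1 m apart, both observing a point at (0.5, 0, 2).
point = triangulate_midpoint(
    (0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
    (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0),
)
print(point)
```

With noisy feature matches the two rays will not intersect exactly, which is why the result is then refined by bundle adjustment over all cameras and points together.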
SLAM is similar to SfM but focuses more on real-time application and less on absolute accuracy. Its applications are centred more around robotics, where a robot wants to know where it is relative to its environment but is not so concerned with absolute accuracy. The top SLAM algorithms are ORB-SLAM and LSD-SLAM. Both are open-source and free for you to integrate into your own software.
So really it depends on what you want: SfM for high accuracy, SLAM for real-time. If you want a good 3D model, I would recommend using existing algorithms, as they are very good.
The best commercial software, in my opinion, is Agisoft Photoscan. If you can make anything half as good as this, I'd be very impressed. To answer your question about what resources you will require: in my opinion, Python/C++ skills, the ability to google well, and spare time to read up on photogrammetry and SfM properly.
First of all, I have looked into the list of all branches of StackExchange, and this seems to be the one best suited for this question.
I am looking for a comparative analysis (can be both theoretical and implementation-oriented) between various popular methods of object recognition and tracking using a Microsoft Kinect 360. These methods do not have to include specialised Kinect API features like hand gesture recognition or skeleton tracking. Could you point me to some technical literature that tackles this subject? I have found quite a few papers that talk about doing object detection and training based on the PCL library. I have also found some papers on using just the RGB image of the Kinect to do classic object tracking. But to put things into perspective, I want to know which method gives good performance and is less challenging to implement when a set of Kinects (possibly with overlapping projection cones) is used to do object recognition and/or tracking.
In my opinion, building a unified point cloud to analyse and label the objects to recognise/classify them using multiple Kinects (assuming I have multiple BUSes) would have too high processing overhead. Would it be a viable alternative to do foreground extraction on the depth images separately and then somehow identify duplicates?
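To make the alternative I have in mind concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: depth frames are lists of rows in millimetres with 0 meaning "no reading", each Kinect has a stored depth frame of the empty scene, blob centroids have already been transformed into a shared world frame (which requires calibrated extrinsics), and the 0.3 m merge radius is arbitrary. The function names `foreground_mask` and `merge_duplicates` are hypothetical.

```python
import math


def foreground_mask(depth, background, min_diff_mm=100):
    """Mark pixels noticeably closer than the stored empty-scene depth frame."""
    return [
        [d > 0 and b > 0 and (b - d) > min_diff_mm for d, b in zip(drow, brow)]
        for drow, brow in zip(depth, background)
    ]


def merge_duplicates(detections, radius_m=0.3):
    """Greedily merge world-space centroids (x, y, z) reported by all sensors.

    A detection within radius_m of an existing cluster's mean is treated as
    the same object seen by another Kinect.
    """
    clusters = []  # each entry: [sum_x, sum_y, sum_z, count]
    for x, y, z in detections:
        for cluster in clusters:
            mean = tuple(cluster[i] / cluster[3] for i in range(3))
            if math.dist((x, y, z), mean) <= radius_m:
                cluster[0] += x
                cluster[1] += y
                cluster[2] += z
                cluster[3] += 1
                break
        else:
            clusters.append([x, y, z, 1])
    return [(c[0] / c[3], c[1] / c[3], c[2] / c[3]) for c in clusters]


# Two sensors report the same object 10 cm apart, plus one distinct object.
print(merge_duplicates([(1.0, 0.0, 2.0), (1.1, 0.0, 2.0), (3.0, 0.0, 1.0)]))
```

The per-sensor work here is only a subtraction and a threshold per pixel, and the cross-sensor step touches a handful of centroids instead of full point clouds, which is why I suspect it would be much cheaper.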
How can I track fast hand movement using a Kinect?
I've tried both OpenNI and the Microsoft SDK to track hands. With both of them there is a lot of jitter and inaccurate joint movement.
Here is an example video of Kinect Fruit Ninja: Example Video
In that video there is no jitter or inaccuracy, and it also tracks fast hand movements.
What am I missing? Are there particular Kinect hardware versions or types that I should look into?
My best guess is that Fruit Ninja applies some sort of smoothing at some point. What you're seeing in that video is almost certainly not the raw data they're getting from the Kinect. The data from the Kinect will always have some kind of jitter; real-world sensor data almost always does. You'll need to smooth it - exactly how to do that depends on the application; it could be something simple like modelling a kind of damping and/or inertia on the point that's being moved by the hand (which is what I suspect Fruit Ninja is doing), or you could look at something like a Kalman filter for a robust (but more computationally-intensive) way to reduce the noise in your sensor readings.
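As a rough illustration of the two options mentioned above, here is a minimal sketch in plain Python: an exponential smoother (a simple form of damping) and a scalar Kalman filter with a constant-position model. Neither is claimed to be what Fruit Ninja actually does. The names `exponential_smooth` and `Kalman1D`, and the default values of `alpha`, `process_var` and `meas_var`, are assumptions that would need tuning against real joint data; you would run one filter per joint coordinate.

```python
from dataclasses import dataclass


def exponential_smooth(samples, alpha=0.3):
    """Damping-style smoothing: each output moves a fraction alpha toward the new sample."""
    smoothed = []
    state = None
    for x in samples:
        state = x if state is None else state + alpha * (x - state)
        smoothed.append(state)
    return smoothed


@dataclass
class Kalman1D:
    """Scalar Kalman filter assuming the position stays roughly constant between frames."""
    process_var: float = 1e-3  # assumed drift of the true position per frame
    meas_var: float = 1e-2     # assumed sensor noise variance
    estimate: float = 0.0
    error_var: float = 1.0

    def update(self, measurement):
        # Predict: position unchanged, uncertainty grows by the process noise.
        self.error_var += self.process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.error_var / (self.error_var + self.meas_var)
        self.estimate += gain * (measurement - self.estimate)
        self.error_var *= 1.0 - gain
        return self.estimate


# A joint coordinate that should be steady at 1.0 but jitters by +/- 0.1.
noisy = [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(40)]
kalman = Kalman1D()
filtered = [kalman.update(x) for x in noisy]
print(exponential_smooth(noisy)[-1], filtered[-1])
```

A smaller `alpha` (or a larger `meas_var`) gives a steadier point but more lag behind fast hand movements, so the trade-off has to be tuned for the application.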
I want to evaluate the performance of several SDKs / frameworks for depth cameras. These cameras can either be using Time-of-Flight or structured light.
The framework should be capable (at least) of person tracking / blob detection and gesture recognition.
So far I found the following frameworks:
OpenNI (structured light only)
Microsoft Kinect SDK (Kinect only)
Beckon SDK by Omek Interactive (ToF and structured light)
iisu by SoftKinetic (ToF and structured light)
Are there any other frameworks I should be aware of?
EDIT: I found this article by Techradar that seems to indicate that these are indeed the only options currently available.
Any feedback would be very much appreciated!
I have found some interesting links on this. You can take MIT's approach using CoDAC. They list lots of facts in their post; I will quote the most important ones here.
9. What are limitations of this technique?
The main limitation of our framework is inapplicability to scenes with curvilinear
objects, which would require extensions of the current mathematical model.
Another limitation is that a periodic light source creates a wrap-around error
as it does in other TOF devices. For scenes in which surfaces have high reflectance
or texture variations, availability of a traditional 2D image prior to our data
acquisition allows for improved depth map reconstruction as discussed in our paper.
10. What are advantages of this technique/device and how does it
compare with existing TOF-based range sensing techniques?
In laser scanning, spatial resolution is limited by the scanning time.
TOF cameras do not provide high spatial resolution because they rely on a
low-resolution 2D pixel array of range-sensing pixels. CoDAC is a single-sensor,
high spatial resolution depth camera which works by exploiting the sparsity of natural
scene structure.
11. What is the range resolution and spatial resolution of the CoDAC system?
We have demonstrated sub-centimeter range resolution in our experiments.
This is significantly better than fundamental limit of about 10 cm that would
arise from using a detector with 0.7 nanosecond rise time if we were not using
parametric signal modeling. The improvement in range resolution comes from the
parametric modeling and deconvolution in our framework. We refer the reader to
our publications for complete details and analysis.
We have demonstrated 64-by-64 pixel spatial resolution,
as this is the spatial resolution of our spatial light modulator.
Spatially patterning with a digital micromirror device (DMD) will enable
much higher spatial resolution. Our experiments use only 205 projection patterns,
which correspond to just 5% of number of pixels in the reconstructed depth map.
This is a significant improvement over raster scanning in LIDAR, and it is
obtained without the 2D sensor array used in TOF cameras.
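As a quick sanity check on the numbers quoted above, the small Python sketch below reproduces them under my own assumptions: that the "about 10 cm" limit comes from the usual round-trip relation (range = c * t / 2) applied to the 0.7 ns rise time, and that the 5% figure is 205 patterns divided by the 64 x 64 pixels of the depth map. The FAQ itself does not spell out either calculation, and the function names are made up for this sketch.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_range_resolution(rise_time_s):
    """Range uncertainty for a round-trip time-of-flight measurement: c * t / 2."""
    return SPEED_OF_LIGHT * rise_time_s / 2.0


def pattern_fraction(num_patterns, width, height):
    """Number of projection patterns as a fraction of the depth-map pixel count."""
    return num_patterns / (width * height)


print(round(tof_range_resolution(0.7e-9) * 100, 1), "cm")  # 10.5 cm
print(round(pattern_fraction(205, 64, 64) * 100, 1), "%")  # 5.0 %
```

Both results are consistent with the quoted "about 10 cm" and "just 5%".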
Another interesting project I found on YouTube uses libfreenect and libusb.
There is also dSensingNI, which is described as follows:
This work presents an approach to overcome the disadvantages of existing interaction
frameworks and technologies for touch detection and object interaction. The robust and
easy to use framework dSensingNI (Depth Sensing Natural Interaction) is described,
which supports multitouch and tangible interaction with arbitrary objects. It uses
images from a depth-sensing camera and provides tracking of users fingers of palm of
hands and combines this with object interaction, such as grasping, grouping and
stacking, which can be used for advanced interaction techniques.
So you have already found most of them, especially those that use Kinect, but there are a few other options out there! Hope this helps!
I'm currently using a Processing Kinect library which supplies a depth map. I was wondering how I could take that and use it to create a 2D skeleton, if possible. Not looking for any code here, just a general process I could use to achieve those results.
Also, given that we've seen this in several of the Kinect games so far, would it be difficult to have multiple skeletons running at once?
Disclaimer: the reason you haven't received an answer to this question yet is probably that it is a current research problem. So I can't give you a direct answer, but I will try to help with some information and useful resources on the topic.
There are mainly 2 different approaches to create a skeleton from a depth map. The first one is to use machine learning, the second is purely algorithmic.
For the machine learning approach, you'd need many samples of people performing predetermined moves, and you would use those samples to train your favorite learning algorithm. That's the approach taken and implemented by Microsoft for the Xbox (source). It works really well, BUT you need millions of samples to make it reliable... quite a drawback.
The "algorithmic" approach (that is, one that does not use a training set) can be done in many different ways and is a research problem. It's often based on modeling the possible body postures and trying to match them against the depth image received. That's the approach chosen by PrimeSense (the guys behind the Kinect depth camera technology) for their skeleton tracking tool NITE.
The OpenKinect community maintains a wiki where they list some interesting research material about this topic. You might also be interested in this thread on the OpenNI mailing list.
If you're looking for an implementation of a skeleton tracking tool, PrimeSense released NITE (closed source), the one they made: it's part of the OpenNI framework. That's what's used in most of the videos you might have seen that involve skeleton tracking. I think it's able to handle up to 2 skeletons at the same time, but that requires confirmation.
The best solution is to use FAAST (http://projects.ict.usc.edu/mxr/faast/) which requires OpenNI. I have struggled to get OpenNI to work on my computer. I have not seen an approach yet using Code Laboratories' CL NUI.
An algorithmic approach is http://code.google.com/p/skeletonization/ but you may run into a problem because your depth map only represents surfaces, not closed objects.