What frameworks for depth cameras are out there?

I want to evaluate the performance of several SDKs/frameworks for depth cameras. These cameras may use either time-of-flight (ToF) or structured light.
The framework should be capable (at least) of person tracking / blob detection and gesture recognition.
So far I found the following frameworks:
OpenNI (structured light only)
Microsoft Kinect SDK (Kinect only)
Beckon SDK by Omek Interactive (ToF and structured light)
iisu by SoftKinetic (ToF and structured light)
Are there any other frameworks I should be aware of?
EDIT: I found this TechRadar article, which seems to indicate that these are indeed the only options currently available.
Any feedback would be very much appreciated!

I have found some interesting links on this. You can take MIT's approach using CoDAC. They list lots of facts in their FAQ post; I will quote the most important ones here.
9. What are limitations of this technique?
The main limitation of our framework is inapplicability to scenes with curvilinear objects, which would require extensions of the current mathematical model. Another limitation is that a periodic light source creates a wrap-around error as it does in other TOF devices. For scenes in which surfaces have high reflectance or texture variations, availability of a traditional 2D image prior to our data acquisition allows for improved depth map reconstruction as discussed in our paper.
10. What are advantages of this technique/device and how does it compare with existing TOF-based range sensing techniques?
In laser scanning, spatial resolution is limited by the scanning time. TOF cameras do not provide high spatial resolution because they rely on a low-resolution 2D pixel array of range-sensing pixels. CoDAC is a single-sensor, high spatial resolution depth camera which works by exploiting the sparsity of natural scene structure.
11. What is the range resolution and spatial resolution of the CoDAC system?
We have demonstrated sub-centimeter range resolution in our experiments. This is significantly better than the fundamental limit of about 10 cm that would arise from using a detector with 0.7 nanosecond rise time if we were not using parametric signal modeling. The improvement in range resolution comes from the parametric modeling and deconvolution in our framework. We refer the reader to our publications for complete details and analysis.
We have demonstrated 64-by-64 pixel spatial resolution, as this is the spatial resolution of our spatial light modulator. Spatial patterning with a digital micromirror device (DMD) will enable much higher spatial resolution. Our experiments use only 205 projection patterns, which correspond to just 5% of the number of pixels in the reconstructed depth map. This is a significant improvement over raster scanning in LIDAR, and it is obtained without the 2D sensor array used in TOF cameras.
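As a quick sanity check of that 5% figure: the reconstruction is 64 × 64 = 4096 pixels, and 205 / 4096 ≈ 0.05, i.e. the number of projection patterns is indeed about 5% of the number of output pixels.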
Another interesting project I found on YouTube uses libfreenect and libusb.
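For context, here is a minimal hedged sketch of what a libfreenect depth-capture loop looks like. This is not that project's code; the callback body is illustrative, the header path varies by install, and error handling is trimmed.

```cpp
// Hedged sketch: minimal libfreenect depth capture (C API, usable from C++).
// Only shows frame delivery; tracking/gestures would be layered on top.
#include <libfreenect/libfreenect.h>  // header path varies by install
#include <cstdint>
#include <cstdio>

// Called by libfreenect whenever a new depth frame arrives.
static void depth_cb(freenect_device*, void* depth, uint32_t timestamp) {
    const uint16_t* d = static_cast<const uint16_t*>(depth);
    std::printf("frame at %u, first raw depth sample: %u\n", timestamp, d[0]);
}

int main() {
    freenect_context* ctx;
    freenect_device*  dev;
    if (freenect_init(&ctx, nullptr) < 0) return 1;
    if (freenect_open_device(ctx, &dev, 0) < 0) return 1;  // first Kinect

    freenect_set_depth_mode(dev, freenect_find_depth_mode(
        FREENECT_RESOLUTION_MEDIUM, FREENECT_DEPTH_11BIT));
    freenect_set_depth_callback(dev, depth_cb);
    freenect_start_depth(dev);

    // Pump USB events; libfreenect invokes depth_cb from inside here.
    while (freenect_process_events(ctx) >= 0) { /* until error or signal */ }

    freenect_stop_depth(dev);
    freenect_close_device(dev);
    freenect_shutdown(ctx);
}
```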
There is also dSensingNI, which is described as:
This work presents an approach to overcome the disadvantages of existing interaction frameworks and technologies for touch detection and object interaction. The robust and easy-to-use framework dSensingNI (Depth Sensing Natural Interaction) is described, which supports multitouch and tangible interaction with arbitrary objects. It uses images from a depth-sensing camera and provides tracking of users' fingers or palms of hands, and combines this with object interaction, such as grasping, grouping and stacking, which can be used for advanced interaction techniques.
So you have already hit most of the major ones, especially those that use the Kinect, but there are a few other options out there. Hope this helps!

Related

Is programming a voxel based graphics API theoretically possible?

This is entirely a theoretical question, because I understand the time it would take to do such a thing would be ridiculous.
I've been working with "voxels" a lot lately, and the only way I can display them to a user is to either triangulate the visible surfaces or write a CPU ray tracer, but both come with their own problems.
Simply put, if we dismiss the storage space needed for voxel meshes and targeted a very specific GPU: could someone who wanted to create a graphics API like OpenGL, but with "true" voxel primitives that don't need to be converted, actually make such a thing? Or are GPUs designed specifically for triangles, with no way to introduce a new base primitive?
It's possible, and it has already been done many times:
games like Minecraft, Space Engineers, ...
3D printing tools and slicers
MRI/PET scan tools
Yes, rendering on the GPU is possible with the two base methods you mention. Games usually transform the voxels into boundary-representation 3D geometry. With the rise of shaders, even ray tracers are now possible; here is mine:
simple GLSL voxel ray tracer
It uses the native OpenGL architecture and passes the geometry as a 3D texture. To obtain speed you need to add a BVH or a similar spatial subdivision of the geometry...
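To make the traversal concrete, here is a hedged CPU-side sketch of the classic Amanatides & Woo grid stepping that such a ray tracer runs per ray (the grid size and test scene are made up; a GLSL version would do the same stepping over a 3D texture):

```cpp
// Hedged sketch: Amanatides & Woo DDA traversal of a voxel grid.
// Returns the first solid voxel hit by a ray, or false if the ray exits.
#include <cmath>
#include <cstdio>
#include <vector>

const int N = 64;                            // voxels per axis (assumed)
std::vector<unsigned char> grid(N * N * N, 0);

bool solid(int x, int y, int z) {
    if (x < 0 || y < 0 || z < 0 || x >= N || y >= N || z >= N) return false;
    return grid[(z * N + y) * N + x] != 0;
}

bool traverse(const float origin[3], const float dir[3], int hit[3]) {
    int   v[3];       // current voxel
    int   step[3];    // +1 or -1 per axis
    float tMax[3];    // ray parameter at the next voxel boundary per axis
    float tDelta[3];  // ray parameter advance per whole voxel per axis
    for (int i = 0; i < 3; ++i) {
        v[i] = (int)std::floor(origin[i]);
        step[i] = dir[i] >= 0 ? 1 : -1;
        float next = dir[i] >= 0 ? v[i] + 1.0f : (float)v[i];
        tMax[i]   = dir[i] != 0 ? (next - origin[i]) / dir[i] : 1e30f;
        tDelta[i] = dir[i] != 0 ? std::fabs(1.0f / dir[i])    : 1e30f;
    }
    while (v[0] >= 0 && v[1] >= 0 && v[2] >= 0 &&
           v[0] < N && v[1] < N && v[2] < N) {
        if (solid(v[0], v[1], v[2])) {
            hit[0] = v[0]; hit[1] = v[1]; hit[2] = v[2];
            return true;
        }
        // Step along the axis whose voxel boundary is crossed first.
        int a = (tMax[0] < tMax[1]) ? (tMax[0] < tMax[2] ? 0 : 2)
                                    : (tMax[1] < tMax[2] ? 1 : 2);
        v[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return false;
}

int main() {
    grid[(32 * N + 32) * N + 32] = 1;        // one solid voxel in the middle
    float o[3] = {0.5f, 32.5f, 32.5f}, d[3] = {1.0f, 0.0f, 0.0f};
    int hit[3];
    if (traverse(o, d, hit))
        std::printf("hit voxel (%d,%d,%d)\n", hit[0], hit[1], hit[2]);
}
```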
However, voxel-based tools have been around for quite some time. For example, many isometric games/engines are voxel based (a tile is a voxel), like this one:
Improving performance of click detection on a staggered column isometric grid
Also, do you remember UFO? It was playable on a 286, and it was also a "voxel/tile"-based isometric game.

Custom rendering with GPU, Direct3D or OpenGL

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to make better use of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh-type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian points, stored in a very compact manner, that require quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z-buffer, and camera parameters, such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D or OpenGL, or whether I should use a higher-level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows-based, I'd probably prefer to use Direct3D, as this is likely to lead to the fewest driver issues, which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following:
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low-performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the software culling and drawing part of item 1 above, which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with a matching z-buffer) based on my current camera parameters, and then add my point cloud prior to output.
Alternatively, I could move to a scene-based framework that managed the cameras and navigation for me and provided the points in view as spheres or splats, based on volume and level of detail, during the rendering loop. In this scenario I'd also need to be able to add cursor information to the view.
In either scenario, the hosting application will be MFC C++ on VS2017, which would require too much work to change for the purposes of this exercise.
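For the first scenario, here is a hedged OpenGL sketch of the buffer-access mechanism being asked about (buffer sizes and names are illustrative; Direct3D offers the equivalent via render targets and staging resources): render the scene into an offscreen framebuffer, then read back both color and depth for software compositing of the point cloud.

```cpp
// Hedged sketch: offscreen render plus color/depth readback in OpenGL.
// Assumes a GL context already exists (e.g. created by the MFC host)
// and glewInit() has been called. Sizes are illustrative.
#include <GL/glew.h>
#include <vector>

const int W = 1920, H = 1080;
GLuint fbo, colorTex, depthTex;

void createTargets() {
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    glGenTextures(1, &depthTex);
    glBindTexture(GL_TEXTURE_2D, depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, W, H, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTex, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, depthTex, 0);
}

void renderAndReadBack(std::vector<unsigned char>& rgba,
                       std::vector<float>& depth) {
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, W, H);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ... draw the triangle/line/text scene here ...

    rgba.resize(W * H * 4);
    depth.resize(W * H);
    glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
    glReadPixels(0, 0, W, H, GL_DEPTH_COMPONENT, GL_FLOAT, depth.data());
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    // The CPU-side point-cloud renderer can now splat points into 'rgba',
    // testing each projected point against 'depth' before writing.
}
```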
It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's a successor to both D3D and OGL, and is designed by the GPU manufacturers themselves to provide optimal performance exceeding both of its predecessors.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now; it's not clear from your question where you are, but you sound US-based.

What are 'physically-based' lighting/rendering/materials?

What's the difference between physically based lighting, physically based rendering, and physically based materials? How are they different from deferred and forward rendering? And is physically based rendering used in films while deferred rendering is used in games?
It would appear that "Physically-based rendering" (PBR), "Physically-based materials", and "Physically-based shading" are different names for the same general concept of creating materials with shading that lend themselves to looking more "real" or physically-based when rendered.
PBR can be used in both movies (offline rendering systems, Pixar has a paper) and games (realtime: Unity and Unreal have docs). It can also be done in WebGL, check out Marmoset's PBR-enabled WebGL Viewer, find the picture of the camera lens, and try rotating that around. Another WebGL example can be found in OpenSceneGraphJS's PBR Demo.
Different PBR implementations have different parameters, but most center around a concept of specifying how "metallic" and how "rough" a particular material is, and allowing the rendering engine to make sure that the material doesn't reflect light in an unrealistic way, or reflect more light than it receives, etc. For example, a composite of screenshots from the OSGJS demo above shows a single base color rendered with varying degrees of metallic and roughness. The idea is that the artist can vary these parameters per-material and even per-texel, and not wind up with something that appears to violate the laws of physics when it comes to lighting. This is considered better than more traditional rendering techniques, where you can create a specular highlight that overexposes objects even in low-light conditions.
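To make the metallic/roughness idea concrete, below is a hedged sketch of one common specular term (GGX distribution, Smith geometry, Schlick Fresnel). This is textbook metallic-roughness math, not the shader of any specific engine mentioned above.

```cpp
// Sketch of a common metallic-roughness specular BRDF (GGX + Smith +
// Schlick). Vectors n, v, l, h are assumed normalized; per-channel
// evaluation is shown as a scalar for brevity.
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Fresnel-Schlick: f0 is ~0.04 for dielectrics, the base color for metals.
static float fresnelSchlick(float f0, float vDotH) {
    return f0 + (1.0f - f0) * std::pow(1.0f - vDotH, 5.0f);
}

// GGX normal distribution: fraction of microfacets facing the half vector.
static float distributionGGX(float nDotH, float roughness) {
    const float PI = 3.14159265f;
    float a2 = roughness * roughness * roughness * roughness;
    float d  = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (PI * d * d);
}

// Smith/Schlick-GGX geometry term: microfacet shadowing and masking.
static float geometrySmith(float nDotV, float nDotL, float roughness) {
    float r = roughness + 1.0f;
    float k = r * r / 8.0f;
    float gv = nDotV / (nDotV * (1.0f - k) + k);
    float gl = nDotL / (nDotL * (1.0f - k) + k);
    return gv * gl;
}

// Combined specular term; by construction it cannot reflect more energy
// than it receives, which is the "physically based" constraint above.
float specularBRDF(Vec3 n, Vec3 v, Vec3 l, Vec3 h,
                   float f0, float roughness) {
    float nDotV = std::max(dot(n, v), 1e-4f);
    float nDotL = std::max(dot(n, l), 1e-4f);
    float nDotH = std::max(dot(n, h), 0.0f);
    float vDotH = std::max(dot(v, h), 0.0f);
    float D = distributionGGX(nDotH, roughness);
    float G = geometrySmith(nDotV, nDotL, roughness);
    float F = fresnelSchlick(f0, vDotH);
    return D * G * F / (4.0f * nDotV * nDotL);
}
```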
Disclaimer: I'm still learning about PBR myself. I've been doing a lot of reading up on it lately in hopes of being able to contribute to the current effort underway to add PBR to the glTF 3D model spec. Khronos (the standards organization behind OpenGL and WebGL) is creating an open-standard 3D model delivery format that eventually will support PBR and can convey PBR materials to a wide variety of rendering engines on the web, mobile, and desktop. But the PBR portion of this is currently still a draft extension (as of this writing), and it will remain under development for some time to come.
I'm not a deferred/forward rendering expert; perhaps someone else can chime in with the differences there. In the meantime, I'd suggest reading more from Marmoset about the theory of PBR, and PBR in practice. Good luck.

Comparative Analysis of Object Recognition or Tracking Methods with Multiple Kinects

First of all, I have looked through the list of all Stack Exchange sites, and this seems to be the one best suited for this question.
I am looking for a comparative analysis (can be both theoretical and implementation-oriented) between various popular methods of object recognition and tracking using a Microsoft Kinect 360. These methods do not have to include specialised Kinect API features like hand gesture recognition or skeleton tracking. Could you point me to some technical literature that tackles this subject? I have found quite a few papers that talk about doing object detection and training based on the PCL library. I have also found some papers on using just the RGB image of the Kinect to do classic object tracking. But to put things into perspective, I want to know which method gives good performance and is less challenging to implement when a set of Kinects (possibly with overlapping projection cones) is used to do object recognition and/or tracking.
In my opinion, building a unified point cloud from multiple Kinects (assuming I have multiple USB buses) in order to analyse and label the objects to recognise/classify them would have too high a processing overhead. Would it be a viable alternative to do foreground extraction on the depth images separately and then somehow identify duplicates?
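To illustrate the per-camera route, here is a hedged OpenCV sketch of depth-image foreground extraction (the file names and the 50 mm threshold are made up). This step is cheap enough to run per Kinect before any cross-camera duplicate matching.

```cpp
// Hedged sketch: per-Kinect foreground extraction on the depth image
// using OpenCV. Depth frames are assumed 16-bit, in millimeters.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    cv::Mat background = cv::imread("background_depth.png",
                                    cv::IMREAD_UNCHANGED); // CV_16U, empty room
    cv::Mat frame      = cv::imread("frame_depth.png",
                                    cv::IMREAD_UNCHANGED); // CV_16U, live frame

    // Absolute depth difference against the empty-room reference.
    cv::Mat diff;
    cv::absdiff(frame, background, diff);

    // Anything that moved closer/farther than 50 mm counts as foreground;
    // zero-depth pixels (no sensor reading) are masked out.
    cv::Mat foreground = (diff > 50) & (frame > 0) & (background > 0);

    // Clean up speckle noise before blob extraction.
    cv::morphologyEx(foreground, foreground, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_RECT, {5, 5}));

    std::vector<std::vector<cv::Point>> blobs;
    cv::findContours(foreground, blobs, cv::RETR_EXTERNAL,
                     cv::CHAIN_APPROX_SIMPLE);
    std::printf("found %zu foreground blobs\n", blobs.size());
}
```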

Is it possible to recognize all objects in a room with Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. a bed, a table, etc.) and a person in that room. Is something like that possible using the Microsoft Kinect?
Thank you!
The Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of a user from the Kinect.
Also, by mapping the color stream to the depth stream, you can detect how far a particular pixel is from the Kinect. If, in your implementation, the different objects have unique characteristics like color, shape, and size, you can probably detect them and also measure their distance.
OpenCV is one of the libraries that I use for computer vision, etc.
Again, it's up to you how you use this data.
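As one hedged illustration of reading per-pixel depth, OpenCV's OpenNI capture backend (available only if OpenCV was built with OpenNI support) exposes the Kinect's depth map in millimeters; the probed pixel below is arbitrary.

```cpp
// Hedged sketch: grabbing depth (mm) and color frames from a Kinect
// through OpenCV's OpenNI capture backend.
#include <opencv2/opencv.hpp>
#include <cstdio>

int main() {
    cv::VideoCapture capture(cv::CAP_OPENNI);
    if (!capture.isOpened()) {
        std::fprintf(stderr, "no OpenNI-compatible depth camera found\n");
        return 1;
    }
    cv::Mat depthMap, bgrImage;
    while (true) {
        if (!capture.grab()) break;
        capture.retrieve(depthMap, cv::CAP_OPENNI_DEPTH_MAP); // CV_16U, mm
        capture.retrieve(bgrImage, cv::CAP_OPENNI_BGR_IMAGE); // CV_8UC3

        // Depth of the center pixel, in millimeters (0 = no reading).
        unsigned short mm = depthMap.at<unsigned short>(depthMap.rows / 2,
                                                        depthMap.cols / 2);
        std::printf("center pixel is %u mm away\n", (unsigned)mm);

        cv::imshow("color", bgrImage);
        if (cv::waitKey(30) >= 0) break;
    }
}
```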
The Kinect camera provides depth, and consequently 3D information (a point cloud), about matte objects in the range of 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane), and possibly the walls and the ceiling. This step is important since these surfaces often connect separate objects, making them one big object.
The remaining parts of the point cloud can be segmented by depth if they don't physically touch each other. Using color, one can separate the objects even further. Note that we implicitly define an object as a dense, color-consistent 3D entity, while other definitions are also possible.
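Here is a hedged PCL sketch of those two steps, plane removal followed by depth-based clustering (the thresholds are illustrative, and the input cloud would come from your Kinect driver):

```cpp
// Hedged sketch: RANSAC plane removal (floor/walls), then Euclidean
// clustering of what remains, using PCL.
#include <pcl/ModelCoefficients.h>
#include <pcl/point_types.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/segmentation/extract_clusters.h>
#include <vector>

using Cloud = pcl::PointCloud<pcl::PointXYZ>;

std::vector<pcl::PointIndices> segmentObjects(Cloud::Ptr cloud) {
    // 1) Fit the dominant plane (e.g. the floor) and remove it.
    pcl::SACSegmentation<pcl::PointXYZ> seg;
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.02);            // 2 cm plane tolerance
    pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
    pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
    seg.setInputCloud(cloud);
    seg.segment(*inliers, *coeffs);

    Cloud::Ptr objects(new Cloud);
    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(cloud);
    extract.setIndices(inliers);
    extract.setNegative(true);                 // keep everything but the plane
    extract.filter(*objects);

    // 2) Cluster the rest: pieces farther apart than 5 cm become
    //    separate objects.
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(
        new pcl::search::KdTree<pcl::PointXYZ>);
    tree->setInputCloud(objects);
    std::vector<pcl::PointIndices> clusters;
    pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
    ec.setClusterTolerance(0.05);
    ec.setMinClusterSize(100);
    ec.setSearchMethod(tree);
    ec.setInputCloud(objects);
    ec.extract(clusters);
    return clusters;                           // one entry per object
}
```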
As soon as you have your objects segmented, you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, the skeleton library can only recognize humans after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.