How can I get an approximate upper body volume from the Kinect skeleton? - kinect

I want to build a fitting-room app using the Kinect. Since I need to measure the player's clothing size (S, M, L, XL), I must get an approximation of the player's upper-body mass using only the skeleton (not the depth data). I don't need a very precise calculation.

Examine the lengths of the bones by calculating the distances between the relevant upper-body joints:
SHOULDER_LEFT
SHOULDER_CENTER
SHOULDER_RIGHT
SPINE
HIP_LEFT
HIP_CENTER
HIP_RIGHT
For example, two of the most relevant features to calculate are likely related to the user's height - the distance between SHOULDER_CENTER and SPINE joints, and the distance between SPINE and HIP_CENTER joints.
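For illustration, here is a minimal sketch of that measurement step (the joint coordinates below are made up; in practice they would come from the skeleton stream, in meters, in camera space):

```python
import math

def joint_distance(a, b):
    """Euclidean distance between two joints given as (x, y, z) tuples in meters."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical skeleton frame: joint name -> (x, y, z) position in camera space.
skeleton = {
    "SHOULDER_LEFT":   (-0.18, 0.42, 2.11),
    "SHOULDER_CENTER": ( 0.02, 0.45, 2.10),
    "SHOULDER_RIGHT":  ( 0.21, 0.42, 2.09),
    "SPINE":           ( 0.01, 0.20, 2.12),
    "HIP_CENTER":      ( 0.00, 0.02, 2.14),
}

upper_torso = joint_distance(skeleton["SHOULDER_CENTER"], skeleton["SPINE"])
lower_torso = joint_distance(skeleton["SPINE"], skeleton["HIP_CENTER"])
shoulder_width = joint_distance(skeleton["SHOULDER_LEFT"], skeleton["SHOULDER_RIGHT"])

# These lengths would then feed a simple classifier mapping measurements to S/M/L/XL.
print(upper_torso, lower_torso, shoulder_width)
```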
I suggest using Kinect Studio to store recordings of users and classify each recording according to the user's clothing size. With this data, you should be able to iterate on an algorithm (assuming it's feasible to approximate this accurately enough using only the skeleton data).
(As a side note, to do this more accurately, you'll probably need the depth data and 3D scanning. For example, there is an existing company called Styku that has a related Kinect product that does 3D body scanning.)

Related

How to find the transformation between two IMU measurements?

I am looking at the Clubs Dataset https://clubs.github.io/ for some research into multiway registration of point clouds. I am initially trying to use the ICP registration method in a sequential order adding one point cloud at a time.
The dataset has the RGB parameters for the various poses of the object.
1525694104,-0.0913515162993169,-0.16815189159691318,0.4504956847817425,0.4591556084159897,-0.7817619820248951,-0.3682132612946675,0.20601769355692517
1525694157,-0.22510740390250225,-0.32514596548025265,0.45221561140129063,0.2388698281592328,-0.8750788198918591,-0.4081451880445991,0.10293936130543749
1525694174,-0.4179094161019803,-0.39403349319958664,0.4522321523188167,-0.004021371419719342,0.9070013543525104,0.4210736433584828,0.005455788205342825
The columns are: filename, translation along the 3 axes, and a quaternion for the rotation.
I converted the images into point clouds, and to align the point clouds I need a good estimate of the transformation. Given the measurements at two poses, how would I find the transformation between them? I would use that as the initial estimate for my ICP registration algorithm.
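In case it helps, here is a rough sketch of one common approach (assumptions: each row stores a camera-to-world pose, and the quaternion is in (x, y, z, w) order; both would need checking against the dataset's documentation). The relative transform inv(T1) @ T2 can then serve as the initial estimate for ICP:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(row):
    """row = (tx, ty, tz, qx, qy, qz, qw) -> 4x4 homogeneous transform.

    Quaternion order (x, y, z, w) is an assumption here; adjust if the
    dataset stores (w, x, y, z).
    """
    tx, ty, tz, qx, qy, qz, qw = row
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T

# Translation + quaternion from the first two rows above (first column dropped).
row1 = (-0.09135, -0.16815, 0.45050, 0.45916, -0.78176, -0.36821, 0.20602)
row2 = (-0.22511, -0.32515, 0.45222, 0.23887, -0.87508, -0.40815, 0.10294)

T1, T2 = pose_to_matrix(row1), pose_to_matrix(row2)

# Assuming camera-to-world poses, this maps points from frame 2 into frame 1
# and is a reasonable initial guess for ICP between the two clouds.
T_init = np.linalg.inv(T1) @ T2
print(T_init)
```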

Compute road plane normal with an embedded camera

I am developing some computer vision algorithms for vehicle applications.
I am in front of a problem and some help would be appreciated.
Let's say we have a calibrated camera attached to a vehicle which captures a frame of the road ahead of the vehicle:
Initial frame
We apply a first filter to keep only the road markers and return a binary image:
Filtered image
Once the road lanes are separated, we can approximate them with linear expressions and detect the vanishing point:
Objective
What I am looking to recover is the equation of the normal n from the image, without any prior knowledge of the rotation matrix and the translation vector. Nevertheless, I assume L1, L2 and L3 lie on the same plane.
In 3D space the problem is quite simple. In the 2D image plane it is more complex, since the camera's projective transformation does not preserve angles, and I am not able to find a way to work out the equation of the normal.
Do you have any idea about how I could compute the normal?
Thanks,
Pm
No can do, you need a minimum of two independent vanishing points (i.e. vanishing points representing the images of the points at infinity of two different pencils of parallel lines).
If you have them, the answer is trivial: express the image positions of said vanishing points in homogeneous coordinates. Then their cross product is equal (up to scale) to the normal vector of the 3D plane said pencils define, decomposed in camera coordinates.
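To make that concrete, a small sketch (the calibration matrix K and the two vanishing points are made-up values in pixel coordinates):

```python
import numpy as np

# Hypothetical intrinsics and two vanishing points in homogeneous pixel coordinates.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
v1 = np.array([450.0, 230.0, 1.0])    # vanishing point of the first pencil of lines
v2 = np.array([-4000.0, 235.0, 1.0])  # vanishing point of the second pencil

# 3D directions of the two pencils, decomposed in camera coordinates.
d1 = np.linalg.inv(K) @ v1
d2 = np.linalg.inv(K) @ v2

# The plane normal (up to scale) is the cross product of the two directions;
# equivalently K^T (v1 x v2), since v1 x v2 is the vanishing line of the plane.
n = np.cross(d1, d2)
n /= np.linalg.norm(n)
print(n)
```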
Your information is insufficient, as the others have stated. If your data is coming from a video, a common way to get the road ground plane is to take two or more images, compute the associated homography, then decompose the homography matrix into the surface normal and relative camera motion. You can do the decomposition with OpenCV's decomposeHomographyMat method. You can compute the homography by associating four or more point correspondences using OpenCV's findHomography method. If it is hard to determine these correspondences, it is also possible to do it with a combination of point and line correspondences (see the linked paper); however, this is not implemented in OpenCV.
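A loose sketch of that route with OpenCV (the intrinsics K and the point correspondences between the two frames are placeholders you would have to supply from tracked road features):

```python
import numpy as np
import cv2

# Hypothetical intrinsics and ground-plane correspondences between two frames (pixels).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts1 = np.float32([[100, 400], [540, 400], [300, 300], [350, 250], [220, 350]])
pts2 = np.float32([[108, 396], [532, 403], [304, 298], [353, 251], [226, 348]])

# Homography induced by the road plane between the two views.
H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC)

# Decompose into candidate rotations, translations and plane normals.
# Up to four solutions are returned; extra constraints (e.g. points must be
# in front of the camera) are needed to select the physically valid one.
num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
for n in normals:
    print(n.ravel())
```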
You do not have sufficient information in the example you provide.
If you are wondering "which way is up", one thing you might be able to do is to detect the line on the horizon. If K is the calibration matrix, then K^T l will give you the plane normal in 3D relative to your camera. (The general equation for back-projection of a line l in the image to a plane E through the center of projection is E = P^T l, with a 3x4 projection matrix P.)
A better alternative might be to establish a homography to rectify the ground-plane. To do so, however, you need at least four non-collinear points with known coordinates - or four lines, no three of which may be parallel.
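A minimal sketch of that rectification alternative (the four image points and their assumed metric positions on the road are placeholders; the 50 px-per-meter output scale is arbitrary):

```python
import numpy as np
import cv2

# Four non-collinear road points in the image (pixels) and their assumed
# positions on the ground plane (meters), e.g. from known lane width/length.
img_pts = np.float32([[250, 470], [390, 470], [340, 300], [300, 300]])
scale = 50.0  # output pixels per meter, arbitrary choice
ground_pts = np.float32([[0.0, 0.0], [3.5, 0.0], [3.5, 20.0], [0.0, 20.0]]) * scale

# Homography mapping image coordinates to (scaled) ground-plane coordinates.
H = cv2.getPerspectiveTransform(img_pts, ground_pts)

# Bird's-eye view of the road.
image = cv2.imread("road.png")  # placeholder file name
top_down = cv2.warpPerspective(image, H, (int(3.5 * scale), int(20.0 * scale)))
```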

how to reconstruct scene from different views' point clouds

I am facing a problem in 3D reconstruction, since I am new to this field. I have depth maps (point clouds) from several different views, and I want to use them to reconstruct the scene, to get an effect like using Kinect Fusion. Is there any paper or source code that addresses this problem, or any ideas on it?
PS: the point cloud is stored as a file with (x, y, z) coordinates; you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages:
Integration
Rendering
Pose Estimation
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
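For intuition, here is a toy version of the integration stage only (a dense voxel grid of truncated signed distances fused by a per-voxel running average; real systems use the optimized structures described in the papers below, and the per-voxel signed distances would come from projecting voxel centers into the depth map under the current pose):

```python
import numpy as np

RES, TRUNC = 64, 0.05                # voxel grid resolution and truncation distance (m)
tsdf = np.zeros((RES, RES, RES))     # running average of truncated signed distances
weight = np.zeros((RES, RES, RES))   # number of observations fused per voxel

def integrate(tsdf, weight, sdf_obs, seen):
    """Fuse one frame's signed-distance observations into the volume.

    sdf_obs: per-voxel signed distance to the surface observed in this frame;
    seen: boolean mask of voxels actually visible in this frame.
    """
    d = np.clip(sdf_obs, -TRUNC, TRUNC)
    w_new = weight + seen
    fused = (tsdf * weight + d * seen) / np.maximum(w_new, 1)
    tsdf[...] = np.where(seen, fused, tsdf)
    weight[...] = w_new
    return tsdf, weight

# Example: fuse a synthetic observation (every voxel seen, surface 2 cm away).
obs = np.full((RES, RES, RES), 0.02)
seen = np.ones((RES, RES, RES), dtype=bool)
tsdf, weight = integrate(tsdf, weight, obs, seen)
```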
With regards to pertinent literature, I would advise the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices

Is it possible to recognize all objects in a room with Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. bed, table, etc.) and a person in that room. Is something like that possible using Microsoft Kinect?
Thank you!
Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of the user from the Kinect.
Also, by mapping the color stream to the depth stream you can detect how far a particular pixel is from the Kinect. In your implementation, if the different objects have unique characteristics such as color, shape and size, you can probably detect them and also measure the distance.
OpenCV is one of the libraries that I use for computer vision, etc.
Again, it's up to you how you use this data.
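To make the pixel-to-distance idea concrete, here is a rough sketch that back-projects two depth pixels (say, the centroids of two detected objects) into 3D and measures the gap between them; the intrinsics below are approximate, assumed values for the Kinect depth camera:

```python
import numpy as np

# Approximate Kinect v1 depth-camera intrinsics (assumed values).
FX, FY, CX, CY = 571.0, 571.0, 320.0, 240.0

def deproject(u, v, depth_m):
    """Back-project a depth pixel (u, v) with depth in meters to a 3D point."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

# Hypothetical centroids of two segmented objects in the depth image.
bed = deproject(180, 260, 2.4)     # pixel (180, 260) at 2.4 m
table = deproject(460, 250, 3.1)   # pixel (460, 250) at 3.1 m

print(np.linalg.norm(bed - table))  # Euclidean distance between the objects, in meters
```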
The Kinect camera provides depth and consequently 3D information (a point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane) and possibly the walls and the ceiling. This step is important since these surfaces often connect separate objects, making them one big object.
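As an illustration of the plane-fitting step, a bare-bones RANSAC fit on an (N, 3) point cloud (libraries such as PCL or Open3D provide ready-made, faster versions of this):

```python
import numpy as np

def fit_plane_ransac(points, iters=200, threshold=0.02, rng=np.random.default_rng(0)):
    """Fit a dominant plane (e.g. the floor) to an (N, 3) point cloud.

    Returns (normal, d, inlier_mask) for the plane n . x + d = 0.
    """
    best_inliers, best_model = None, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample, skip it
            continue
        n = n / norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model[0], best_model[1], best_inliers
```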
The remaining parts of the point cloud can be segmented by depth if they don't touch each other physically. Using color, one can separate the objects even further. Note that we implicitly define an object as a 3D-dense and color-consistent entity, while other definitions are also possible.
As soon as you have your objects segmented, you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, the skeleton library can recognize humans only after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.

minimum working distance to track moving object with kinect

I'm wondering what the minimum working distance for the Kinect is.
I'd like to track a moving object (10cm x 10cm) from a distance of 1m. The area that the object will be moving in, is 120cm x 60cm.
Given Kinect's specs, will it be possible to track the object across the entire area?
Wikipedia says:
The sensor has an angular field of view of 57° horizontally and 43° vertically
so the answer to your question would be: no, 120 cm would be too wide, since the maximum horizontal viewing field at 1 m would be tan(57°/2)*2 = ~1.08 m,
though at a distance of ~1.2 m it should work (tan(57°/2)*2*1.2 = ~1.3 m).
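The same arithmetic in code, if it helps (width of the horizontal field of view at a given distance, width = 2 * d * tan(FOV/2)):

```python
import math

def horizontal_coverage(distance_m, fov_deg=57.0):
    """Width of the Kinect's horizontal field of view at a given distance."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg / 2.0))

print(horizontal_coverage(1.0))   # ~1.09 m -> too narrow for a 1.2 m wide area
print(horizontal_coverage(1.2))   # ~1.30 m -> wide enough
```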
The distance range handled by the sensor is 850 mm minimum and 4000 mm maximum, so this should be possible.
I strongly recommend watching the Kinect SDK Quickstarts videos, as they cover all the basics to get you started. In fact, the "Working with depth data" video probably contains exactly the kind of info you're looking for.