For an app I have to take the measurements of a person. Is there any way to measure the size of their hips from the IR and image data I have?
The depth image stream has an option to mark which pixels belong to each tracked person. You can use that together with the position of the hip joint to measure the hips with a simple method: start at the hip joint position and loop outwards to the left and to the right. Once you have the pixels on the depth image that correspond to the edges of the user's silhouette, convert them to world coordinates to get the distance between them in millimeters. But be warned that you will not get accurate results: things like clothing and accessories can prevent the Kinect from detecting the user's shape correctly.
To determine if a pixel in the depth frame belongs to a user, take a look at this example from the MSDN (https://msdn.microsoft.com/en-us/library/jj131025.aspx):
private void FindPlayerInDepthPixel(short[] depthFrame)
{
    foreach (short depthPixel in depthFrame)
    {
        int player = depthPixel & DepthImageFrame.PlayerIndexBitmask;
        if (player > 0 && this.skeletonData != null)
        {
            // Found the player at this pixel
            // ...
        }
    }
}
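For what it's worth, here is a rough sketch of the left/right scan described above, written as C++-style pseudocode rather than actual Kinect SDK calls. The helpers playerIndexAt() and depthToWorldMm() are hypothetical placeholders: in practice you would read the player-index bits of each depth pixel (as in the snippet above) and use the SDK's coordinate mapper to go from a depth pixel to world coordinates.
#include <cmath>

struct Vec3 { float x, y, z; };

int  playerIndexAt(const unsigned short* depthFrame, int x, int y, int width); // hypothetical helper
Vec3 depthToWorldMm(int x, int y, unsigned short depthValue);                  // hypothetical helper

float hipWidthMm(const unsigned short* depthFrame, int width, int hipX, int hipY)
{
    // Walk left from the hip joint until the pixel no longer belongs to the player.
    int left = hipX;
    while (left > 0 && playerIndexAt(depthFrame, left - 1, hipY, width) > 0)
        --left;

    // Walk right until the pixel no longer belongs to the player.
    int right = hipX;
    while (right < width - 1 && playerIndexAt(depthFrame, right + 1, hipY, width) > 0)
        ++right;

    // Convert both silhouette edges to world coordinates and measure the gap.
    Vec3 l = depthToWorldMm(left,  hipY, depthFrame[hipY * width + left]);
    Vec3 r = depthToWorldMm(right, hipY, depthFrame[hipY * width + right]);
    float dx = r.x - l.x, dy = r.y - l.y, dz = r.z - l.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}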
I'm trying to make a 'Stone Age' clone.
And I'm trying to figure out a way to limit my player's movement to certain tiles, meaning the player can only move on certain tiles.
What I did so far is make two TileMaps: one for 'walkable floor' and one for 'unwalkable floor'. I put them on different collision layers and have my player collide only with the 'unwalkable floor', so the player is prevented from entering it:
Here (https://i.imgur.com/sPGCS0V.png): the walkable floor is the brown tiles and the unwalkable ones are those that say 'FLOOR'.
This method is a problem because I have platforms I want to move my player on (the tiles with the arrows on them). But if all the background is unwalkable, my player will collide with it when he's on a platform, so he won't be able to use the platforms to move...
Is there a better way to do this?
I found a solution in this guide and slightly modified it. In a nutshell, the solution is to cast a ray in the direction of the movement and check whether there is a collision.
What I did was create a tilemap and name all the tiles that allow player movement 'Walkable_[something]'. In the player script, where I detect the collision from the raycast, I check the name of the tile the raycast collided with. If the name begins with 'Walkable' I allow the movement; in all other cases I don't.
onready var ray = $RayCast2D

var speed = 256                  # big number because it's multiplied by delta
var tile_size = 64               # size in pixels of tiles on the grid
var last_position = Vector2()    # last idle position
var target_position = Vector2()  # desired position to move towards
var movedir = Vector2()          # move direction

func _process(delta):
    # MOVEMENT
    if ray.is_colliding():
        var collision = ray.get_collider()
        if collision is TileMap:
            var tile_name = get_tile(collision)
            if !(tile_name.begins_with("Walkable")):
                position = last_position
                target_position = last_position
            else:
                position += speed * movedir * delta
                if position.distance_to(last_position) >= tile_size - speed * delta: # if we've moved further than one space
                    position = target_position # snap the player to the intended position
    else:
        position += speed * movedir * delta
        if position.distance_to(last_position) >= tile_size - speed * delta: # if we've moved further than one space
            position = target_position # snap the player to the intended position
This is the get_tile() function; it returns the string name of the tile the raycast collided with. Because the raycast collision returns the whole tile map, we need to work out which specific tile we collided with, so I use the position of the player plus the raycast vector to figure out the correct tile. I first add half a tile's width and height because the player's position is considered the top-left corner of the player node. For example, if the player is a square of size 2x2 positioned at (0,0), its top-left corner is at (0,0), which means (1,1) is the center of the square.
func get_tile(tile_map):
    if tile_map is TileMap:
        var v = Vector2(tile_size / 2, tile_size / 2)
        var tile_pos = tile_map.world_to_map(position + v + ray.cast_to)
        var tile_id = tile_map.get_cellv(tile_pos)
        if tile_id == -1:
            return "None"
        else:
            return tile_map.tile_set.tile_get_name(tile_id)
For the sake of completeness, here is a pastebin of the full player script.
Recently I've been struggling with a pose estimation problem with a single camera. I have some 3D points and the corresponding 2D points on the image. Then I use solvePnP to get the rotation and translation vectors. The problem is: how can I determine whether the vectors are the right results?
Now I use an indirect way to do this:
I use the rotation matrix, the translation vector and the world 3D coordinates of a certain point to obtain the coordinates of that point in the camera system. Then all I have to do is determine whether those coordinates are reasonable. I think I know the directions of the x, y and z axes of the camera system.
Is the camera center the origin of the camera system?
Now consider the x component of that point. Is x equivalent to the distance between the camera and the point in world space along the camera's x-axis direction (with the sign determined by which side of the camera the point is on)?
The figure below is drawn in world space, while the axes depicted are those of the camera system.
====== How the camera and the point are placed in world space ======
|
|
Camera--------------------------> z-axis
|                        |} Xw?
|                        P(Xw, Yw, Zw)
|
v x-axis
My rvec and tvec results seem partly right and partly wrong. For a specific point, the z value seems reasonable: if the point is about one meter away from the camera in the z direction, then the z value is about 1. But for x and y, based on the location of the point I think x and y should be positive, yet they are negative. What's more, the pattern detected in the original image is like this:
But using the points coordinates calculated in Camera system and the camera intrinsic parameters, I get an image like this:
The target keeps its pattern. But it moved from bottom right to top left. I cannot understand why.
Yes, the camera center is the origin of the camera coordinate system, which seems to be right according to this post.
In the case of camera pose estimation, "the values seem reasonable" can be quantified as the backprojection error. That's a measure of how well your resulting rotation and translation map the 3D points to the 2D pixels. Unfortunately, solvePnP does not return a residual error measure, so one has to compute it:
cv::solvePnP(worldPoints, pixelPoints, camIntrinsics, camDistortion, rVec, tVec);

// Use the computed solution to project the 3D pattern back onto the image
std::vector<cv::Point2f> projectedPattern;
cv::projectPoints(worldPoints, rVec, tVec, camIntrinsics, camDistortion, projectedPattern);

// Compute the error of each 2D-3D correspondence
std::vector<float> errors;
for (size_t i = 0; i < pixelPoints.size(); ++i)
{
    float dx = pixelPoints.at(i).x - projectedPattern.at(i).x;
    float dy = pixelPoints.at(i).y - projectedPattern.at(i).y;
    // Euclidean distance between the projected and the measured pixel
    float err = std::sqrt(dx * dx + dy * dy);
    errors.push_back(err);
}
// Here, compute the max or average of your "errors"
An average backprojection error for a calibrated camera is typically in the range of 0-2 pixels. According to your two pictures, yours would be way more than that. To me, it looks like a scaling problem. If I am right, you compute the projection yourself. Maybe you can try cv::projectPoints() once and compare.
When it comes to transformations, I learned not to follow my imagination :) The first thing I usually do with the returned rVec and tVec is create a 4x4 rigid transformation matrix out of them (I once posted code for that here). This makes things even less intuitive, but in exchange it is compact and handy.
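For reference, a minimal sketch of that step, assuming rVec and tVec are 3x1 CV_64F matrices as returned by solvePnP:
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/core/core.hpp>

// Build a 4x4 rigid transformation (world -> camera) from rVec and tVec.
cv::Mat rigidTransform(const cv::Mat& rVec, const cv::Mat& tVec)
{
    cv::Mat R;
    cv::Rodrigues(rVec, R);                  // 3x1 rotation vector -> 3x3 rotation matrix

    cv::Mat T = cv::Mat::eye(4, 4, CV_64F);  // start from the identity
    R.copyTo(T(cv::Rect(0, 0, 3, 3)));       // top-left 3x3 block = R
    tVec.copyTo(T(cv::Rect(3, 0, 1, 3)));    // last column = t
    return T;                                // X_cam = T * X_world (homogeneous coordinates)
}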
Now I know the answers.
Yes, the camera center is the origin of the camera coordinate system.
Consider that the coordinates of a point in the camera system are calculated as (xc, yc, zc). Then xc should be the distance between the camera and the point in the real world along the camera's x direction.
Next, how to determine whether the output matrices are right?
1. As #eidelen points out, the backprojection error is one indicative measure.
2. Transform the points' world coordinates with the returned rotation and translation and check whether the resulting camera-system coordinates are reasonable (see the sketch below).
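A small sketch of check 2, again assuming the rVec and tVec from the solvePnP call above are CV_64F; the world point used here is just an example:
// Transform a known world point into the camera frame and eyeball the result.
cv::Mat R;
cv::Rodrigues(rVec, R);                                   // 3x3 rotation matrix

cv::Mat Xw = (cv::Mat_<double>(3, 1) << 0.0, 0.0, 1.0);   // example world point
cv::Mat Xc = R * Xw + tVec;                               // camera-frame coordinates

std::cout << "Camera-frame point: " << Xc.t() << std::endl;
// Xc.at<double>(2) should roughly match the known distance to the point along z.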
So why did I get a wrong result (the pattern remained, but moved to a different region of the image)?
The cameraMatrix parameter of solvePnP() supplies the camera's intrinsic parameters. In the camera matrix you should use width/2 and height/2 for cx and cy, whereas I had used the full width and height of the image. I think that caused the error. After I corrected that and re-calibrated the camera, everything seems fine.
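To illustrate, here is a sketch of how that matrix is laid out; the focal lengths fx and fy and the image size below are placeholders, in practice they come from your calibration:
// Intrinsic camera matrix for solvePnP: the principal point (cx, cy) should be
// near the image centre, not at (width, height).
double imageWidth = 640.0, imageHeight = 480.0;  // example image size
double fx = 1000.0, fy = 1000.0;                 // placeholder focal lengths in pixels
double cx = imageWidth / 2.0;                    // principal point ~ image centre
double cy = imageHeight / 2.0;

cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) <<
    fx,  0.0, cx,
    0.0, fy,  cy,
    0.0, 0.0, 1.0);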
How do I convert Kinect v2 raw depth data to a distance in meters? The raw depth data was obtained with the Windows Kinect 2 SDK, and it is all integer data.
Note: it's Kinect v2 rather than Kinect v1.
I already have the equation for Kinect v1, but it doesn't match.
Can anyone help me?
The equation for Kinect v1:
if (depthValue < 2047)
{
    depthM = 1.0 / (depthValue * -0.0030711016 + 3.3309495161);
}
The Kinect API Overview (2) indicates:
The DepthFrame Class represents a frame where each pixel represents the distance of the closest object seen by that pixel. The data for this frame is stored as 16-bit unsigned integers, where each value represents the distance in millimeters. The maximum depth distance is 8 meters, although reliability starts to degrade at around 4.5 meters. Developers can use the depth frame to build custom tracking algorithms in cases where the BodyFrame Class isn’t enough.
V. Pterneas also comments in his blog that the values are in millimeters; he refers to this code. So you could retrieve the depth from the DepthFrame in C# (extracted partially from this code):
public ushort[] DepthData;

public override void Update(DepthFrame frame)
{
    ushort minDepth = frame.DepthMinReliableDistance;
    ushort maxDepth = frame.DepthMaxReliableDistance;
    //..
    Width = frame.FrameDescription.Width;
    Height = frame.FrameDescription.Height;
    //...
    DepthData = new ushort[Width * Height];
    frame.CopyFrameDataToArray(DepthData);
    // now DepthData contains the depth of each pixel, in millimeters
}
The minimum and maximum reliable depths are given by the DepthMinReliableDistance and DepthMaxReliableDistance properties, also in millimeters.
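So, to answer the question: with the Kinect v2 there is no conversion formula like the one for the Kinect v1; each raw value already is the distance in millimeters, and you only divide by 1000. A tiny sketch using the names from the snippet above, where x and y are the pixel coordinates you are interested in (the same division works in C# or C++):
// Each raw Kinect v2 depth value is already a distance in millimeters.
unsigned short rawDepth = DepthData[y * Width + x];  // one pixel from the copied frame
float depthMeters = rawDepth / 1000.0f;              // millimeters -> meters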
I am tracking the motion of a human body using the Lucas-Kanade algorithm. I am working with the RGB and depth streams of the Kinect. I have to place some points on the body and track those points. Finally, their xyz coordinates have to be calculated.
My question is: how do I get the z coordinate for a particular pixel from the depth stream? I know the raw depth values for the Kinect lie in the 0-2047 range, and applying formulas to them can give me the distance in any units I want. But how do I get this raw depth? Can anyone help?
I am using the libfreenect driver for the Kinect and OpenCV.
With OpenCV 2.3.x you can retrieve all the information that you want. You will need to compile it from source with OpenNI support for this to work. Look at this code:
VideoCapture capture(CV_CAP_OPENNI);
for (;;)
{
    Mat depthMap;
    Mat rgbImage;

    capture.grab();
    capture.retrieve(depthMap, CV_CAP_OPENNI_DEPTH_MAP);
    capture.retrieve(rgbImage, CV_CAP_OPENNI_BGR_IMAGE);

    if (waitKey(30) >= 0)
        break;
}
Yes, it's that easy. depthMap will store a depth value at each index. This depth represents the distance, in millimeters, from the sensor to the surface where the depth has been measured with the IR. More details at: http://opencv.itseez.com/doc/user_guide/ug_highgui.html
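For example, to read the depth under a tracked point (the pixel coordinates px and py below are placeholders for whatever your Lucas-Kanade tracker gives you):
// The OpenNI depth map is CV_16UC1: one unsigned 16-bit value per pixel, in millimeters.
int px = 320, py = 240;                        // example pixel from your tracker
ushort depthMm = depthMap.at<ushort>(py, px);  // note: row (y) first, then column (x)
float depthMeters = depthMm / 1000.0f;         // millimeters -> meters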
I display an audio level meter for recording and playback of that recording. The level values are from 0 - 1.0. I draw a bar representing the 0 - 1.0 value on screen. To get the audio levels for recording I use:
OSStatus status = AudioQueueGetProperty(aqData.mQueue, kAudioQueueProperty_CurrentLevelMeter, aqLevelMeterState, &levelMeterStateDataSize);
float level = aqLevelMeterState[0].mAveragePower;
For playback I use:
// soundPlayer is an AVSoundPlayer
float level = [soundPlayer averagePowerForChannel:0];
I normalize level from 0 - 1.0.
Right now they look very different when showing the bar. The recording meter bar is more on the low end while the playback bar meter, when playing back that same recorded audio, stays more in the middle.
I'm trying to make the two meters look the same, but I'm fairly new to audio. I've done a bit of research and know that the recording is returning an RMS value and the playback is returning it in decibels.
Can someone knowledgeable in audio point me to links or documents, or give a little hint to make sense of these two values so that I can start making them look the same?
It is the RMS, or root mean square, over a given time interval. RMS is calculated by summing the square of each signal value, dividing that total by the number of samples to get the mean, and then taking the square root of the mean.
uint64_t avgTotal = 0;
for (int i = 0; i < sampleCount; i++)
    avgTotal += (sample[i] * sample[i]) / (uint64_t)sampleCount; // divide each term to help avoid overflow
float rms = sqrt(avgTotal);
You will have to understand enough about the data you are playing to get the signal values. The duration of time that you consider should not matter that much; 50-80 ms should do it.
Decibels are logarithmically scaled. So you probably want some equation such as:
myDb = 20.0 * log10(myRMS / scaleFactor);
somewhere, where you'd have to calibrate the scaleFactor to match up the values you want for full scale.
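If it helps, here is a rough sketch of one way to put both meters on the same decibel-based 0..1 scale; kMinDb and the silence clamp are assumptions you would calibrate against your own recordings:
#include <algorithm>
#include <cmath>

const float kMinDb = -60.0f;  // anything quieter than this maps to 0 on the meter

// Recording side: RMS value (0..1 full scale) -> decibels
float rmsToDb(float rms)
{
    return 20.0f * std::log10(std::max(rms, 1e-6f));  // clamp to avoid log(0)
}

// Both sides: decibels -> 0..1 meter value
float dbToMeter(float db)
{
    float v = (db - kMinDb) / (0.0f - kMinDb);        // map [kMinDb, 0] dB to [0, 1]
    return std::min(std::max(v, 0.0f), 1.0f);
}
The playback value from averagePowerForChannel: is already in decibels, so it can go straight into dbToMeter(); the recording value from mAveragePower goes through rmsToDb() first, which should make the two bars behave much more alike.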