I work with the MS SDK on Windows 7, and I'm aiming to extract head orientation from the 3D skeleton. But I was surprised when I drew the 3D coordinates: the estimated head position always leans forward, even though I was standing still and facing the camera. I am including 3 images as examples. Both hands and the head are colored yellow, the right side of the skeleton is magenta, and the left side is cyan.
(1) 3D skeleton from the front view
(2) the same skeleton from (1), rotated
(3) top view of (1)
So, the question is:
Is this the correct technique?
Though I have changed the default parameters, I am not getting any improvement. Any tips on setting up the skeleton filtering parameters?
Rotation about the neck axis can be calculated from the depth of the two shoulders. First compute the depth (z) difference between the neck center and a shoulder joint, then the horizontal (x) difference between the same two joints. The ratio of the two is the tangent of the rotation angle, so taking the arctangent tells you how far the body is rotated.
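A minimal sketch of that calculation, assuming joint positions are (x, y, z) tuples in camera space with z as depth. The function name and the sign convention are illustrative assumptions, not part of the Kinect SDK; the sign of the result depends on which shoulder you pick and on the axis directions of your data.

```python
import math

def body_yaw_degrees(neck_center, shoulder):
    """Rotation about the vertical axis, from neck center to one shoulder.

    Both arguments are (x, y, z) tuples; z is the depth reported by the sensor.
    This helper is hypothetical and only illustrates the tangent relation.
    """
    dx = shoulder[0] - neck_center[0]  # horizontal offset of the shoulder
    dz = shoulder[2] - neck_center[2]  # depth offset of the shoulder
    # atan2 avoids a division by zero when the shoulder is directly
    # in front of or behind the neck (dx == 0).
    return math.degrees(math.atan2(dz, dx))
```

A shoulder at the same depth as the neck gives 0 degrees; equal x and z offsets give 45 degrees.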
I'm not sure whether you're using V1 or V2 but they are similar with respect to the questions...
This is correct, both according to the Kinect SDK and anatomically. Drawing a line from the center of your shoulders to the top center of your head would result in a line with forward lean. You can see examples of this in the SDK Browser Samples (Body Basics).
I'm not sure what you're trying to map the joint positions to, but here is a demo that visualizes the JointOrientation data for each joint. There is source code in the comments that demonstrates how to obtain the values.
Here are the Kinect SDK Joint Types (V2, but similar to V1)
I'm fairly new to computer vision and currently trying to make the following happen, but had no success so far.
My situation: I want to track different landmarks of a person with mediapipe. Tracking with a single camera works fine, and so does tracking with two cameras at the same time. What I want is to receive the same coordinates from each camera for a point that has been detected. For example: camera 1 found the landmark of the left shoulder at the x and y coordinates (1,2). For camera 2 the same landmark obviously has different coordinates, let's say (2,3). Ideally there is a way to map or transform the coordinates of camera 2 to camera 1.
The following picture shows the camera setup:
Camera setup (I can't post images yet)
So far I've tried to use stereo camera calibration as described here: https://temugeb.github.io/opencv/python/2021/02/02/stereo-camera-calibration-and-triangulation.html. But this doesn't seem to do the trick. I receive a rotation and translation matrix as an output from the calibration, but when I concatenate them to a transformation matrix and multiply it with the coordinates of camera 2, the results don't match with the coordinates of camera 1.
I've also tried to implement planar homography, but since the observed scene isn't limited to a plane, it isn't working well.
The idea behind this is to increase the probability that the landmarks will be detected, and to use both camera streams to build a full set of coordinates for all desired landmarks.
Is it possible to do what I want to do? If so, what's the name of this process?
I'm really grateful for any help. Thanks in advance.
First off, I am not sure if this is the right place so I apologize if this belongs elsewhere - please let me know if it does. I am currently doing some prototyping with this in VB so that's why I come here first.
My Goal
I am trying to make a program to be able to log different types of information for a video game that I play. I would like to be able to map out the entire game with my program and add locations for mobs, resources, etc.
What I have
The in-game map can be downloaded, so I have just stuck it in as a background image on the form (for now). The downloaded map is not exactly how the map appears in the game, though, since the game adds extra water around everything when scrolling around. This makes it a bit tricky to match up where the origin for the map is in game compared to where it would be on the downloaded map.
The nice thing though is that while I am in the game I can print my current coordinates to the screen. So I thought that maybe I can somehow use this to get the right calculation for the rest of the points on the map.
Here is an example image I will refer to now:
In the above map you will see a dotted bounding box. This is an invisible box in the game: once you move your mouse out of it, the longitude and latitude points no longer show. This is what I refer to above when I say I can't find the exact point of origin for the in-game map.
You will also see 2 points: A and B. In the game there are teleporters. This is what I would use to get the most accurate position possible. I am thinking I can find the position (in game) of point A and point B and then somehow calculate that into a conversion for my mouse drag event in VB.
In VB the screen origin (0,0) is at the top-left. I did already try to get the 2 points like this and just add or subtract the number to the x and y pixel position of the mouse, but it didn't quite line up right.
So with all this information does anyone know if it is possible to write a lon/lat conversion to pixels based on this kind of data?
I appreciate any thoughts and suggestions and if you need any clarification of any information I have posted please let me know and I will be happy to expand on it. I am really hoping I can get this solved!
Thanks!
EDIT:
I also want to mention I am not sure if there is an exact pixel to lat/lon point for the in game map. I.e. the in game map could be 1 pixel = 100 latitude or something. So I might also need to figure out what that conversion number is?
Some clarifications about converting between pixel locations and 'latitude and longitude'.
First, the map in your game is in a geometry coordinate system, which means everything lies in 2D and you can measure the distance between two points by calculating it from their pixel positions.
But when we talk about longitude and latitude, we are actually talking about a geography coordinate system, which is a '3D' model of the surface of the earth as a sphere. All maps of the earth are abstracted from 3D to 2D through a step called projection, as in Google Maps or your GPS. In this projection process the 3D model is converted to a 2D model, but some part of the map is always distorted, so the same distance in pixels on a map can correspond to different lengths in reality.
So if you don't care about the accuracy, you can treat the geometry point as the geography point. Otherwise, you need to use a GIS library to handle the geodesic distance and calculate the geography point based on the projection coordinate system.
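If the map is treated as a plain 2D geometry system, two reference points with known game and pixel coordinates (such as teleporters A and B) are enough to fit a scale and an offset per axis. The following is a minimal sketch in Python under the assumptions that the map is axis-aligned, that the relation is linear, and that A and B differ in both coordinates; the function names are hypothetical, and the same arithmetic carries over to VB.

```python
def fit_axis(game_a, game_b, pixel_a, pixel_b):
    """Fit pixel = scale * game + offset along one axis from two samples."""
    scale = (pixel_b - pixel_a) / (game_b - game_a)
    offset = pixel_a - scale * game_a
    return scale, offset

def make_game_to_pixel(game_a, game_b, pixel_a, pixel_b):
    """Build a converter from (lon, lat) game coordinates to (x, y) pixels.

    Each argument is an (x, y) pair for reference point A or B.
    """
    sx, ox = fit_axis(game_a[0], game_b[0], pixel_a[0], pixel_b[0])
    sy, oy = fit_axis(game_a[1], game_b[1], pixel_a[1], pixel_b[1])

    def convert(lon, lat):
        # A negative sy simply means the game axis points up while
        # screen pixels grow downward from the top-left corner.
        return (sx * lon + ox, sy * lat + oy)

    return convert

# Made-up reference values, for illustration only.
convert = make_game_to_pixel((10, 20), (30, 60), (100, 400), (300, 200))
print(convert(20, 40))
```

Fitting the scale as well as the offset covers the case where one pixel corresponds to many game units, which adding or subtracting a constant alone cannot.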
I've been searching around the web about how to do this and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples that I find are for separated shapes detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I have found, it's not too difficult to find the circles with the camera using contours, but how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this but from what I already saw, I believe I need to go native for this and any Objective C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.
The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation and scaling in a single go.
Talking about rotation and scaling seems to indicate that you want to extract a rectangle of a given aspect ratio, hence perform a similarity transform. To define such a transform, three points are too many; two suffice. The construction of the affine matrix is a little tricky.
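One way to build that matrix, sketched here in plain Python, is to treat 2D points as complex numbers: a similarity transform is then z -> a*z + b, and two point pairs determine a and b. The function names are hypothetical; the returned 2x3 layout is the row format that warpAffine takes for a source-to-destination matrix, but passing it to OpenCV (as a float array) is left as an assumption about your setup.

```python
def similarity_from_two_points(src0, src1, dst0, dst1):
    """Return the 2x3 matrix of the similarity mapping src0->dst0, src1->dst1."""
    p0, p1 = complex(*src0), complex(*src1)
    q0, q1 = complex(*dst0), complex(*dst1)
    if p0 == p1:
        raise ValueError("source points must be distinct")
    a = (q1 - q0) / (p1 - p0)  # rotation and uniform scale
    b = q0 - a * p0            # translation
    return [[a.real, -a.imag, b.real],
            [a.imag,  a.real, b.imag]]

def apply_affine(m, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, the source points could be two detected circle centers and the destination points their intended positions in the cropped output image.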
I'm attempting to calculate vertex normals for various game assets. The normals I calculate are used for "inflating" the model (to draw behind the real model producing a thick outline).
I currently compute the normal for each face and average all of them (several other questions on Stack Overflow suggest this approach). However, this doesn't work for sharp corners like this one (adjacent faces' normals marked in orange, the normal I'm trying to calculate is outlined in green).
The object looks like a small pedestal and we're looking at the front-left corner. There are three adjoining faces (the bottom face isn't visible; its normal points straight down).
Blender computes an excellent normal that lies squarely in the middle of the three faces' normals; it seems like it somehow calculates a normal that has minimum rotation to each of the three face normals. Blender's normal also doesn't change when the quads are triangulated differently.
Averaging the faces' normals gives me a different normal that points slightly upward in the Z-axis (-0.45, -0.89, +0.08). Inflating my model this way doesn't produce a good outline because the bottom face of the outline is shifted up and doesn't enclose the original model.
I attempted to look at the Blender source code but couldn't find what I was looking for. If anyone can point me to the algorithm in the Blender source, I'd accept that also.
Weight the surface normals by the angle of the faces where they join. It is a common practice in surface rendering (see discussion here: http://www.bytehazard.com/code/vertnorm.html), and will ensure that your bottom face is weighted stronger than the two slanted side faces. I don't know if Blender does it differently, but you should give it a try.
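A minimal sketch of angle weighting in plain Python, not Blender's implementation. It assumes each face touching the vertex is described by the two neighbouring vertices along that face, given as (previous, next) in counter-clockwise winding seen from outside; all names are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def angle_weighted_normal(vertex, corners):
    """Average the face normals at `vertex`, weighted by each face's corner angle.

    `corners` is a list of (prev_point, next_point) pairs, one per face.
    """
    total = [0.0, 0.0, 0.0]
    for prev_point, next_point in corners:
        e_next = normalize(sub(next_point, vertex))
        e_prev = normalize(sub(prev_point, vertex))
        face_normal = normalize(cross(e_next, e_prev))
        # Clamp to guard acos against rounding slightly outside [-1, 1].
        angle = math.acos(max(-1.0, min(1.0, dot(e_next, e_prev))))
        for i in range(3):
            total[i] += angle * face_normal[i]
    return normalize(total)
```

Because the corner angles of the triangles that make up a planar quad add up to the quad's own corner angle, this weighting should not depend on how the quads are triangulated.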
Why should you ever want something like this?
I want to track a single user that is mounted above the ground in a horizontal position. The user is facing downwards to allow free movement of legs and arms. Think of swimming for example.
I mounted the Kinect at the ceiling facing downwards so I have a free view of all extremities.
The sensor is rotated 90° in z-axis to have the maximum resolution (you're usually taller than wide).
Therefore the user is seen from the back, rotated by 90°. It is impossible to get a proper skeleton from OpenNI 1.5. My tests showed that OpenNI expects the user to face the camera with the head up along the y-axis (see my other answer). Microsoft's SDK is the same, but I excluded it here because it won't allow you to change the source code and cannot be adapted. OpenNI 2.0 is not working with the current SensorKinect to interface the Kinect in Linux. So:
Which class is generating the skeleton in OpenNI 1.5.x?
My best guess would be to rotate the prototype skeleton by 180° about y and 90° about z, if you know where I could find it.
EDIT: As I just learned, there is no open source software that generates a skeleton from depth images, so I fall back to the question in the header:
How can I get a user skeleton from a rotated back view?