Pose tracking with mediapipe with two cameras and receive the same coordinates for corresponding points in both images

Pose tracking with mediapipe with two cameras and receive the same coordinates for corresponding points in both images - coordinate-transformation

I'm fairly new to computer vision and currently trying to make the following happen, but had no success so far.
My situation: I want to track different landmarks of a person with mediapipe. Tracking with a single camera works fine and also tracking with two cameras at the same time. What I want is to receive the same coordinate from each camera for a point that has been detected. For example: Camera 1 found the landmark of the left shoulder with the x and y coordinates (1,2). For camera 2 the same landmark has obviously different coordinates lets say (2,3). Ideally there is a way to map or transform the coordinates of camera 2 to camera 1.
the following picture shows the camera setup
Camera setup (I can't post images yet)
So far I've tried to use stereo camera calibration as described here: https://temugeb.github.io/opencv/python/2021/02/02/stereo-camera-calibration-and-triangulation.html. But this doesn't seem to do the trick. I receive a rotation and translation matrix as an output from the calibration, but when I concatenate them to a transformation matrix and multiply it with the coordinates of camera 2, the results don't match with the coordinates of camera 1.
I've also tried to implement planar homography, but since the observed scene isn't limited to a plane, it isn't working well.
The idea behind this is to increase to probability that the landmarks will be detected and use both camera streams to build a full set of coordinates for all desired landmarks.
Is it possible to do what I want to do? If so, what's the name of this process?
I'm really grateful for any help. Thanks in advance.

Related

Extracting a plane from an image taken by a camera

I have a camera at a known fixed location and orientation.
I also have a plane at a known location whose z position changes.
I want to turn the image from the camera into a top down view of the plane.
I can do this without knowing any positions by using the 4 points of the plane for a homography matrix and warping the image but each time the plane moves in Z I have to repeat this process.
After searching around online most methods seem to center on finding features of the image (using SIFT or something like it) then computing a homography matrix.
With the problem so constrained I thought there may be a simple linear algebra based approach.

Unreal Engine 4 minimum distance of camera from object

I placed a standard Camera in front of a moving Actor. When I set the current view to this camera I noticed a strange behaviour: If the actor get really close to another object on the scenario (a default cube) it disappear from the view. It looks like the camera is getting into the cube. I'm pretty sure the camera is not colliding with the cube because the actor has a couple of bumpers that prevent the side where the camera is placed to collide with other objects and the whole camera mesh is placed fully 'inside' the actor. The problem maybe is related with the size of the actor that's about 40cm x 30cm x 10cm. The observed cube is 1mt x 1mt x 1mt, the minimum distance of camera from cube is around 3 cm.

Sounds to me like you're experiencing an issue with an object passing your camera's "clipping plane." In the 3D world, this is simply just draw distance minimum and maximum values. For more information on what you are experiencing, check out this brilliant explanation by Autodesk: https://knowledge.autodesk.com/support/maya/learn-explore/caas/CloudHelp/cloudhelp/2018/ENU/Maya-Rendering/files/GUID-D69C23DA-ECFB-4D95-82F5-81118ED41C95-htm.html
Now, let's fix the issue! In Unreal Engine, it's super easy. Go into your Project Settings/General Settings. There is a value called Near Clip Plane, which simply changes the minimum clipping value for Camera components. I would bet making this value smaller will fix your issue! For a visual representation, check out this tutorial by Kyle Dail: https://www.youtube.com/watch?v=oO79qxNnOfU

Convert a Lat/Lon coordinate on a map based on preset coordinates

First off, I am not sure if this is the right place so I apologize if this belongs elsewhere - please let me know if it does. I am currently doing some prototyping with this in VB so that's why I come here first.
My Goal
I am trying to make a program to be able to log different types of information for a video game that I play. I would like to be able to map out the entire game with my program and add locations for mobs, resources, etc.
What I have
The in game map can be downloaded so I have literally just stuck this in as a background image on the form (just for now). The map that I get downloaded though is not exactly as the map appears in the game though since the game will add extra water around everything when scrolling around. This makes it a bit tricky to match up where the origin for the map is in game compared to where it would be on the downloaded map.
The nice thing though is that while I am in the game I can print my current coordinates to the screen. So I thought that maybe I can somehow use this to get the right calculation for the rest of the points on the map.
Here is an example image I will refer to now:
In the above map you will see a dotted bounding box. This is an invisible box in the game where once you move your mouse out of the longitude and latitude points will no longer show. This is what I refer to above when I mean I can't find the exact point of origin for the in game map.
You will also see 2 points: A and B. In the game there are teleporters. This is what I would use to get the most accurate position possible. I am thinking I can find the position (in game) of point A and point B and then somehow calculate that into a conversion for my mouse drag event in VB.
In VB the screen starts at top-left and is 0,0. I did already try to get the 2 points like this and just add or subtract the number to the x and y pixel position of the mouse, but it didn't quite line up right.
So with all this information does anyone know if it is possible to write a lon/lat conversion to pixels based on this kind of data?
I appreciate any thoughts and suggestions and if you need any clarification of any information I have posted please let me know and I will be happy to expand on it. I am really hoping I can get this solved!
Thanks!
EDIT:
I also want to mention I am not sure if there is an exact pixel to lat/lon point for the in game map. I.e. the in game map could be 1 pixel = 100 latitude or something. So I might also need to figure out what that conversion number is?

Some clarifications about conversion between the pixel location to 'latitude and longitude'.
First the map in your game is in a geometry coordinate system, which means everything lies in 2D and you can measure the distance between two points by calculate the pixel position.
But when we talk about longitude and latitude, we are actually talking about a geography coordinate system, which is a '3D' model of the sphere oabout the surface of the earth. All the maps on earth are abstracted from 3D to 2D through one step called projection. Like google maps or your GPS. In this projection process, the 3D model converted to 2D model but there is always some part of the map will be tortured, so that same distance in pixels on a map could be different in length in reality.
So if you don't care about the accuracy then you can consider the geometry point as geography point. Otherwise, you need to implement some GIS library to handle the geodesic distance and calculate the geography point based on the projection coordinate system.

How to detect an image between shapes from camera

I've been searching around the web about how to do this and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples that I find are for separated shapes detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I searched, its not to difficult to find the circles with the camera using contours but, how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this but from what I already saw, I believe I need to go native for this and any Objective C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.

The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation and scaling in a single go.
Talking about rotation and scaling seem to indicate that you want to extract a rectangle of a given aspect ratio, hence perform a similarity transform. To define such a transform, three points are too much, two suffice. The construction of the affine matrix is a little tricky.

How to calibrate a camera and a robot

I have a robot and a camera. The robot is just a 3D printer where I changed the extruder for a tool, so it doesn't print but it moves every axis independently. The bed is transparent, and below the bed there is a camera, the camera never moves. It is just a normal webcam (playstation eye).
I want to calibrate the robot and the camera, so that when I click on a pixel on a image provided by the camera, the robot will go there. I know I can measure the translation and the rotation between the two frames, but that will probably return lots of errors.
So that's my question, how can I relate the camera and a robot. The camera is already calibrated using chessboards.
In order to make everything easier, the Z-axis can be ignored. So the calibration will be over X and Y.

It depends of what error is acceptable for you.
We have similar setup where we have camera which looks at some plane with object on it that can be moved.
We assume that the image and plane are parallel.
First lets calculate the rotation. Put the tool in such position that you see it on the center of the image, move it on one axis select the point on the image that is corresponding to tool position.
Those two points will give you a vector in the image coordinate system.
The angle between this vector and original image axis will give the rotation.
The scale may be calculated in the similar way, knowing the vector length (in pixels) and the distance between the tool positions(in mm or cm) will give you the scale factor between the image and real world axis.
If this method won't provide enough accuracy you may calibrate the camera for distortion and relative position to the plane using computer vision techniques. Which is more complicated.
See the following links
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas