How to do a transformation between two cameras (RGB and thermal)?

I have two cameras: an RGB camera (Raspberry Pi High Quality Camera with a 6 mm lens, FOV 63°) and a thermal camera (MLX90640, FOV 110°).
Current view
How do I transform one image so that it matches the other, i.e. compensate for the different fields of view so that both cameras show exactly the same scene? Please see the expected result.
After transformation
I tried an OpenCV transformation, but it doesn't work well because of the different fields of view of the two cameras.
Here is the arrangement of the cameras in the hardware:
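For reference, an OpenCV alignment of the two views usually comes down to estimating a homography from a few corresponding points and warping the thermal image onto the RGB one. Below is a minimal sketch, assuming the corresponding points are picked by hand (file names and coordinates are placeholders); because the cameras sit side by side, a single homography only aligns the overlapping part of the wider thermal view well, and only for scenes at roughly one distance.

import cv2
import numpy as np

# Placeholder file names for one RGB frame and one (upscaled) thermal frame.
rgb = cv2.imread("rgb.png")
thermal = cv2.imread("thermal.png")

# Pixel coordinates of the same physical points in both images, e.g. the
# corners of a heated calibration target, picked by hand (placeholders).
pts_thermal = np.float32([[30, 40], [280, 35], [290, 200], [25, 210]])
pts_rgb = np.float32([[110, 90], [520, 85], [530, 380], [105, 390]])

# Homography that maps thermal pixels onto the RGB image plane.
H, _ = cv2.findHomography(pts_thermal, pts_rgb)

# Warp the thermal image into the RGB camera's view so the two overlap.
aligned = cv2.warpPerspective(thermal, H, (rgb.shape[1], rgb.shape[0]))

# Blend for a quick visual check of the alignment.
overlay = cv2.addWeighted(rgb, 0.6, aligned, 0.4, 0)
cv2.imwrite("overlay.png", overlay)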

Related

Pose tracking with MediaPipe using two cameras: how to get the same coordinates for corresponding points in both images

I'm fairly new to computer vision and am currently trying to make the following happen, but have had no success so far.
My situation: I want to track different landmarks of a person with MediaPipe. Tracking with a single camera works fine, and so does tracking with two cameras at the same time. What I want is to receive the same coordinates from each camera for a point that has been detected. For example: camera 1 found the landmark of the left shoulder at the x and y coordinates (1, 2). For camera 2 the same landmark obviously has different coordinates, let's say (2, 3). Ideally there is a way to map or transform the coordinates of camera 2 to camera 1.
The following picture shows the camera setup:
Camera setup (I can't post images yet)
So far I've tried to use stereo camera calibration as described here: https://temugeb.github.io/opencv/python/2021/02/02/stereo-camera-calibration-and-triangulation.html. But this doesn't seem to do the trick. I receive a rotation and a translation matrix as output from the calibration, but when I concatenate them into a transformation matrix and multiply it with the coordinates of camera 2, the results don't match the coordinates of camera 1.
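Concretely, the step described above amounts to something like the sketch below (the numbers are made-up stand-ins for the real calibration output):

import numpy as np

# Made-up stand-ins for the stereo calibration output:
# R (3x3 rotation) and T (3x1 translation) between the two cameras.
R = np.eye(3)
T = np.array([[-0.5], [0.0], [0.0]])

# Concatenate them into a 4x4 transformation matrix.
M = np.eye(4)
M[:3, :3] = R
M[:3, 3] = T.ravel()

# Landmark coordinates from camera 2 (e.g. the left shoulder), in pixels.
p2 = np.array([640.0, 360.0, 1.0, 1.0])

# Multiplying the pixel coordinates with the transformation matrix:
# this is the step whose result doesn't match camera 1's coordinates.
# (M relates 3D points between the two camera frames, not 2D pixels.)
p1 = M @ p2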
I've also tried to implement a planar homography, but since the observed scene isn't limited to a plane, it doesn't work well.
The idea behind this is to increase the probability that the landmarks will be detected and to use both camera streams to build a full set of coordinates for all desired landmarks.
Is it possible to do what I want to do? If so, what's the name of this process?
I'm really grateful for any help. Thanks in advance.

3D Human Pose Estimation

I am working on human pose estimation.
I am able to generate the 2D coordinates of different joints of a person in an image.
But I need 3D coordinates for the purpose of my project.
Is there any library or code available to generate the 3D coordinates of the joints?
Please help.
For 3D coordinates in pose estimation there is a limitation: you cannot get a 3D pose with only one (monocular) camera. You have two ways to estimate them:
use an RGB-D (red, green, blue and depth) camera like the Kinect,
or use stereo vision with at least two cameras.
For RGB-D, opencv_contrib has a module for that.
If you want to use stereo vision, there are a few steps (a rough code sketch follows below):
1. Get the camera calibration parameters. For calibration you can follow this.
2. Undistort your points using the calibration parameters.
3. Get the projection matrices of both cameras.
4. Finally, use OpenCV's triangulation to get the 3D coordinates.
For more information about each step, you can read up on stereo vision, camera calibration, triangulation, etc.
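A rough sketch of steps 2-4 in OpenCV Python, with placeholder calibration values standing in for the real intrinsics, distortion coefficients, R and T from step 1:

import cv2
import numpy as np

# Placeholder calibration results standing in for step 1:
# K = intrinsics, dist = distortion coefficients, R/T = stereo extrinsics.
K1 = K2 = np.array([[700.0, 0.0, 640.0],
                    [0.0, 700.0, 360.0],
                    [0.0,   0.0,   1.0]])
dist1 = dist2 = np.zeros(5)
R = np.eye(3)
T = np.array([[-0.1], [0.0], [0.0]])   # 10 cm baseline between the cameras

# 2D joint detections in each image (placeholder pixel coordinates),
# shaped (N, 1, 2) as OpenCV expects.
uv1 = np.array([[[800.0, 400.0]]])
uv2 = np.array([[[730.0, 400.0]]])

# Step 2: undistort and normalise the points with the calibration parameters.
n1 = cv2.undistortPoints(uv1, K1, dist1)
n2 = cv2.undistortPoints(uv2, K2, dist2)

# Step 3: projection matrices. The points are already normalised, so the
# intrinsics drop out and only the extrinsics remain.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, T])

# Step 4: triangulate to get the joint in 3D (camera 1's coordinate frame).
X_h = cv2.triangulatePoints(P1, P2, n1.reshape(-1, 2).T, n2.reshape(-1, 2).T)
X = X_h[:3] / X_h[3]
print(X.ravel())   # [x, y, z] in the same units as T (metres here)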

Key Difference between Active IR image and depth image in Kinect V2

I'm a bit confused about the difference between the Active IR image and the depth image of the Kinect v2. Can anyone tell me what special features the Active IR image has compared to the depth image?
In the depth image, the value of a pixel relates to the distance from the camera as measured by time of flight. For the active infrared image, the value of a pixel is determined by the amount of infrared light reflected back to the camera.
Sidenote:
I think there is only one sensor that does both of these. The Kinect uses the reflected IR to calculate time of flight, but it also makes the reflected IR available as an IR image.

Asus Xtion Pro sensor calibration or texture offset workaround

I'm using two of the sensors on the Asus Xtion Pro (a Kinect knockoff): the RGB camera and the user data as a mask. I'm not sure if it's called user data; it's the one that's not depth or color.
It works except for two issues:
When you combine the two textures, the mask is slightly offset from the color texture, leaving an outline of the background around the character. I believe it's because the two sensors simply shoot straight out and aren't calibrated to each other, so one is 2" off from the other in real-world space.
The second problem is a question about optimizing the mask edges. Is there any way to feather the edges around the character, or to smooth them based on the difference of neighboring pixels? I find that the edges really jump around on the boundaries of objects.
The shader I'm using to combine the base texture with the mask requires both textures to be the same size, so I can't simply resize the mask to be slightly smaller to get rid of the gap around the character.
I'm curious how you would shrink the mask texture by a couple of percent and add more black around the edges; if you resized the texture smaller than the rect it occupies, how could you fill in the perimeter with black?
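In image-processing terms, what's described here is an erosion (pull the silhouette inwards while the texture keeps its size, i.e. add black at the border) followed by a blur to feather the edge. Below is a minimal OpenCV sketch of that idea with a placeholder file name; in Unity the equivalent would have to be done in a shader or on the Texture2D pixels.

import cv2
import numpy as np

# Placeholder file name for the user-mask texture exported as a grayscale image.
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Shrink the silhouette by a few pixels without changing the texture size:
# erosion pulls the white region inwards, which effectively adds black
# around the edges instead of resizing the whole texture.
kernel = np.ones((5, 5), np.uint8)
shrunk = cv2.erode(mask, kernel, iterations=2)

# Feather the edge: blurring the binary mask turns the hard boundary into a
# soft alpha ramp, so edge pixels blend instead of flickering frame to frame.
feathered = cv2.GaussianBlur(shrunk, (9, 9), 0)

cv2.imwrite("mask_feathered.png", feathered)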
I'm using Unity + OpenNI + the Asus Xtion Pro sensor. The misalignment isn't noticeable in most uses, but when doing something really precise it's not accurate enough.
Any ideas or pointers? Looking for direction.
Are you using two Asus Xtion Pro sensors at the same time? The three images on the left look like the scene, depth and RGB streams.
Which version of OpenNI are you using? I haven't used OpenNI with Unity, but I assume there are equivalent calls to the original API.
For OpenNI 1.5.x, look into the Alternative Viewpoint Capability:
yourDepthGenerator.GetAlternativeViewPointCap().SetViewPoint(image);
For OpenNI 2.x it should be something like:
device.setDepthColorSyncEnabled(true);
If you want to calibrate between two sensors, it's a bit more complicated.

How to calibrate a camera and a robot

I have a robot and a camera. The robot is just a 3D printer where I replaced the extruder with a tool, so it doesn't print, but it moves every axis independently. The bed is transparent, and below the bed there is a camera that never moves. It is just a normal webcam (a PlayStation Eye).
I want to calibrate the robot and the camera so that when I click on a pixel in an image provided by the camera, the robot goes to that spot. I know I can measure the translation and the rotation between the two frames, but that will probably introduce a lot of error.
So that's my question: how can I relate the camera and the robot? The camera is already calibrated using chessboards.
To make everything easier, the Z axis can be ignored, so the calibration will only be over X and Y.
It depends on what error is acceptable for you.
We have a similar setup, where a camera looks at a plane with an object on it that can be moved.
We assume that the image and the plane are parallel.
First, let's calculate the rotation. Put the tool in a position where you see it at the center of the image, then move it along one axis and select the point in the image that corresponds to the new tool position.
Those two points give you a vector in the image coordinate system.
The angle between this vector and the original image axis gives you the rotation.
The scale can be calculated in a similar way: knowing the vector length (in pixels) and the distance between the tool positions (in mm or cm) gives you the scale factor between the image axes and the real-world axes.
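A small numeric sketch of this two-point procedure (all coordinates below are made up, and it assumes the image axes and the robot axes have the same handedness; if the image y axis points the other way, flip the sign of one y coordinate):

import numpy as np

# The tool is driven along one robot axis and its position is clicked in the
# image at both ends of the move. All numbers are placeholders.
px_a = np.array([320.0, 240.0])    # tool at robot position a, in pixels
px_b = np.array([452.0, 251.0])    # tool at robot position b, in pixels
robot_a = np.array([100.0, 50.0])  # position a in robot coordinates (mm)
robot_b = np.array([140.0, 50.0])  # position b: moved 40 mm along robot X

# The vector between the two clicks, in the image coordinate system.
v_img = px_b - px_a

# Rotation between the image axis and the robot axis, and the mm-per-pixel scale.
angle = np.arctan2(v_img[1], v_img[0])
scale = np.linalg.norm(robot_b - robot_a) / np.linalg.norm(v_img)

# Rotation matrix that takes image-frame vectors into robot-frame vectors.
c, s = np.cos(-angle), np.sin(-angle)
R = np.array([[c, -s], [s, c]])

def pixel_to_robot(px):
    # Map a clicked pixel to robot X/Y using the two reference points.
    return robot_a + scale * (R @ (np.asarray(px, dtype=float) - px_a))

print(pixel_to_robot(px_b))   # reproduces robot_b: [140.  50.]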
If this method doesn't provide enough accuracy, you can calibrate the camera for distortion and for its position relative to the plane using computer-vision techniques, which is more complicated.
See the following links
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html