How to highlight the desk in RGB with the assistance of a 3D point cloud? - object-detection

What I describe next is a preprocessing step for object detection.
I have a desk with some snacks and pasters on it, and a depth camera (Intel RealSense R300). I want to mask out the pasters (set those pixel values to 0) and also mask out the interfering objects around the desk (objects that are on the ground, not on the desk). To make full use of the depth camera, my idea is to take the depth image, build a 3D point cloud, and use the RANSAC algorithm to detect the plane of the desk, which excludes the pasters at the same time.
However, here is the problem: I have successfully extracted the desk plane with RANSAC in the 3D point cloud, but I don't know how to map it back onto the 2D RGB image so that I can set the pixel values of that plane to 0.
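One way to close that gap, assuming you have the RGB camera's intrinsic matrix and the depth-to-color extrinsics (the RealSense SDK can provide both), is to project the RANSAC inliers of the desk plane into the RGB image and use the projected pixels as a mask. The sketch below is only illustrative: the function name, arguments and dilation radius are my own, not part of any SDK.

```python
import numpy as np
import cv2

def mask_plane_in_rgb(rgb, plane_points, K, R=np.eye(3), t=np.zeros(3), dilate_px=5):
    """Project RANSAC plane inliers into the RGB image and zero those pixels.

    Assumptions (adapt to your setup):
      - plane_points: (N, 3) inliers in meters. If the cloud lives in the depth
        camera's frame, R/t are the depth-to-color extrinsics; if it is already
        in the RGB camera's frame, leave them at the defaults.
      - K: 3x3 intrinsic matrix of the RGB camera.
    """
    # Move the points into the RGB camera frame and drop points behind the camera.
    pts = (R @ plane_points.T).T + t
    pts = pts[pts[:, 2] > 0]

    # Pinhole projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.
    u = np.round(K[0, 0] * pts[:, 0] / pts[:, 2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts[:, 1] / pts[:, 2] + K[1, 2]).astype(int)

    h, w = rgb.shape[:2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # The projected cloud is sparse, so dilate the mask to close small holes.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[v[ok], u[ok]] = 255
    mask = cv2.dilate(mask, np.ones((dilate_px, dilate_px), np.uint8))

    out = rgb.copy()
    out[mask > 0] = 0  # set the plane pixels to 0
    return out, mask
```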

Related

Extracting a plane from an image taken by a camera

I have a camera at a known fixed location and orientation.
I also have a plane at a known location whose z position changes.
I want to turn the image from the camera into a top down view of the plane.
I can do this without knowing any positions by using 4 points of the plane for a homography matrix and warping the image, but each time the plane moves in Z I have to repeat this process.
After searching online, most methods seem to center on finding image features (using SIFT or something like it) and then computing a homography matrix.
With the problem so constrained, I thought there might be a simple linear-algebra-based approach.
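Since the pose and the plane's Z are known, the homography can indeed be written down in closed form instead of being re-estimated each time. A minimal sketch under the usual pinhole model x ~ K[R|t]X, with the plane given by Z = z0 in world coordinates (the function names and sampling parameters are illustrative, not from any particular library beyond NumPy/OpenCV):

```python
import numpy as np
import cv2

def plane_to_image_homography(K, R, t, z0):
    """Homography taking (X, Y) on the world plane Z = z0 to image pixels.

    For a plane point (X, Y, z0): x ~ K (X*r1 + Y*r2 + (z0*r3 + t)),
    so H = K [r1  r2  z0*r3 + t], where r1, r2, r3 are the columns of R.
    """
    r1, r2, r3 = R[:, 0], R[:, 1], R[:, 2]
    return K @ np.column_stack((r1, r2, z0 * r3 + t))

def top_down_view(img, K, R, t, z0, x_min, y_min, ppu, out_size):
    """Warp the camera image to a top-down view of the plane region starting
    at (x_min, y_min) in world units, with ppu output pixels per world unit."""
    H = plane_to_image_homography(K, R, t, z0)
    # S maps output pixels to plane coordinates (scale + offset).
    S = np.array([[1.0 / ppu, 0.0, x_min],
                  [0.0, 1.0 / ppu, y_min],
                  [0.0, 0.0, 1.0]])
    # H @ S maps output pixels to camera pixels; warpPerspective wants the
    # forward (camera -> output) map, hence the inverse.
    return cv2.warpPerspective(img, np.linalg.inv(H @ S), out_size)
```

When the plane moves, only z0 changes, so updating the warp is a single matrix expression rather than a new feature-matching pass.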

What algorithm do I need to convert a 2D image file into a representative 2D triangle mesh file?

I am looking for some advice to point me in the direction of the algorithm I would need to convert an image file into a mesh. Note that I am not asking to convert from 2D into 3D - the output mesh is not required to have any depth.
By "image file" I mean a black-and-white image of a relatively simple shape, such as a stick figure, stored in an easy-to-read uncompressed bitmap file. The shape would have high contrast between the black and white areas of the image to help an algorithm detect the edges.
By "static mesh" I mean the data that can be used to construct a typical indexed triangle mesh (a list of vertices and a list of indices) in a modern 3D game engine such as Unreal. The mesh would need to represent the shape of the image in 2D but is not required to have any 3D depth in itself, i.e. zero thickness. The mesh will ultimately be used in a 3D environment like a cardboard cut-out shape; for example, imagine it standing on a ground plane.
This conversion is not required to work in real time; it can be batch processed and the resulting mesh data then read in by the game engine.
Thanks in advance.
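One possible batch pipeline (a rough sketch under my own assumptions, not a definitive answer): threshold the bitmap, extract and simplify the outer contour with OpenCV, Delaunay-triangulate the contour vertices, and keep only the triangles that fall inside the shape. A constrained triangulator (e.g. Triangle, CGAL or poly2tri) would give a cleaner mesh; the function name and parameters below are made up for illustration.

```python
import numpy as np
import cv2
from scipy.spatial import Delaunay

def image_to_mesh(path, epsilon_px=2.0):
    """Black-and-white bitmap -> flat (z = 0) indexed triangle mesh (rough sketch).

    Assumes OpenCV 4 and a white shape on a black background; invert the
    threshold if your image is the other way around.
    """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, bw = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

    # Outer contour of the shape, simplified to a polygon.
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    poly = cv2.approxPolyDP(contour, epsilon_px, True).reshape(-1, 2)

    # Delaunay triangulation of the polygon vertices covers the convex hull,
    # so discard triangles whose centroid lies outside the polygon.
    tri = Delaunay(poly.astype(np.float64))
    poly_cv = poly.reshape(-1, 1, 2).astype(np.float32)
    indices = [s for s in tri.simplices
               if cv2.pointPolygonTest(poly_cv,
                                       tuple(map(float, poly[s].mean(axis=0))),
                                       False) >= 0]

    # Vertex list as (x, y, 0) so the mesh has zero thickness.
    vertices = np.column_stack((poly, np.zeros(len(poly))))
    return vertices, np.array(indices)
```

Writing the result out as an OBJ or FBX file that Unreal can import is then a matter of serializing the vertex and index lists.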

3D Human Pose Estimation

I am working on human pose estimation.
I am able to generate the 2D coordinates of the different joints of a person in an image.
But I need 3D coordinates for the purpose of my project.
Is there any library or code available to generate the 3D coordinates of the joints?
Please help.
For 3D coordinates in pose estimation there is a limitation: you can't get a 3D pose with only one (monocular) camera. You have two ways to estimate them:
use an RGB-D (red, green, blue and depth) camera such as the Kinect,
or use stereo vision with at least two cameras.
For RGB-D, the OpenCV contrib modules have a library for that.
If you want to use stereo vision, the steps are as follows (a minimal OpenCV sketch of steps 2-4 appears after the list):
1. Get the camera calibration parameters; for calibration you can follow this.
2. Undistort your 2D points using the calibration parameters.
3. Compute the projection matrices of both cameras.
4. Finally, use OpenCV's triangulation to get the 3D coordinates.
For more information about each step, search for stereo vision, camera calibration, triangulation, etc.
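A minimal sketch of steps 2-4 with OpenCV, assuming step 1 has already produced the stereo calibration (K1, dist1, K2, dist2 and the right camera's pose R, T relative to the left) and that the same joints were detected in both images; all names here are illustrative.

```python
import numpy as np
import cv2

def triangulate_joints(pts_left, pts_right, K1, dist1, K2, dist2, R, T):
    """Triangulate (N, 2) joint detections from a calibrated stereo pair."""
    # Step 2: undistort and normalize the 2D joint coordinates.
    nl = cv2.undistortPoints(pts_left.reshape(-1, 1, 2).astype(np.float64), K1, dist1)
    nr = cv2.undistortPoints(pts_right.reshape(-1, 1, 2).astype(np.float64), K2, dist2)

    # Step 3: projection matrices in the left camera's frame. The points are
    # already normalized, so the intrinsic part is the identity.
    P1 = np.hstack((np.eye(3), np.zeros((3, 1))))
    P2 = np.hstack((R, T.reshape(3, 1)))

    # Step 4: triangulate and convert from homogeneous to Euclidean coordinates.
    pts4d = cv2.triangulatePoints(P1, P2, nl.reshape(-1, 2).T, nr.reshape(-1, 2).T)
    return (pts4d[:3] / pts4d[3]).T  # (N, 3), in the left camera's frame
```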

Camera's extrinsic matrix

I am trying to use MATLAB's camera calibrator to calibrate an infrared camera. I was able to get the intrinsic matrix by just feeding around 100 images to the calibrator. But I'm struggling with how to get the extrinsic matrix [R|t].
Because the extrinsic matrix maps the world frame to the camera frame, in theory there will be many extrinsic matrices as the camera (the object) moves.
In the picture below, if the intrinsic matrix is determined using 50 images, then there are 50 extrinsic matrices, one corresponding to each image. Am I correct?
You are right. Usually, a by-product of an intrinsic calibration is the extrinsic matrix for each observed pattern; this is mostly used to draw the patterns with respect to the camera, as in the picture you posted.
What you usually do afterwards is define some external reference frame that makes sense for your application, also known as the 'world' reference frame, and compute the pose of the camera with respect to it. That's the extrinsic matrix you always hear about.
For this, you:
Define the reference frame and take some points with known 3D coordinates on it; this can be a grid drawn on the floor, for example.
Take a picture of the 3D points with the calibrated camera and get a list of the corresponding 2D (image) coordinates of the points.
Use a pose estimation function that takes the camera's intrinsic parameters, the 3D points and the corresponding 2D image points (a rough OpenCV sketch follows below). I am more familiar with OpenCV, but the MATLAB function that seems to do the job is: https://www.mathworks.com/help/vision/ref/estimateworldcamerapose.html
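For completeness, a rough OpenCV equivalent of these three steps built on solvePnP; the function name and argument layout are just illustrative.

```python
import numpy as np
import cv2

def camera_extrinsics(object_points, image_points, K, dist_coeffs):
    """Extrinsic matrix [R|t] (world -> camera) from known 3D points
    (e.g. a grid drawn on the floor) and their 2D image projections.

    object_points: (N, 3) world coordinates, image_points: (N, 2) pixels,
    K / dist_coeffs: intrinsics from the prior calibration.
    """
    ok, rvec, tvec = cv2.solvePnP(object_points.astype(np.float64),
                                  image_points.astype(np.float64),
                                  K, dist_coeffs)
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 rotation matrix
    return np.hstack((R, tvec))       # 3x4 [R|t]
```

If you need the camera's position expressed in the world frame instead (which is what estimateWorldCameraPose returns), it is -R' * t.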

How to transfer depth pixel to camera space using kinect sdk v2

I'm using Kinect v2 and Kinect SDK v2.
I have couple of questions about coordinate mapping:
How do I transfer a camera space point (a point in the 3D coordinate system) to depth space together with its depth value?
The current MapCameraPointToDepthSpace method only returns the depth space coordinate.
But without the depth value, this method is useless.
Does anyone know how to get the depth value?
How do I get the color camera intrinsics?
There is only a GetDepthCameraIntrinsics method, which returns the depth camera intrinsics.
What about the color camera?
How do I use the depth camera intrinsics?
It seems that the Kinect v2 accounts for radial distortion.
But how do I use these intrinsics to transform between a depth pixel and a 3D point?
Is there any example code that does this?
Regarding 1: The depth value of your remapped camera space point is the same as the original point's Z value. Read the descriptions of the depth buffer and of camera space: in both, this value is simply the distance from the point to the Kinect's plane, in meters. If instead you want the depth value of the object seen in the depth frame directly behind your remapped coordinate, you have to read the depth image buffer at that position.
Regarding 3: You use the camera intrinsics when you have to construct a CoordinateMapper object manually (i.e. when you don't have a Kinect available). The CoordinateMapper you get from a Kinect (via the Kinect object's CoordinateMapper property) already contains that Kinect's intrinsics; that's why there is a GetDepthCameraIntrinsics method that returns that specific Kinect's intrinsics (they can vary from device to device).
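For question 3, the transformation the CoordinateMapper performs is essentially the pinhole back-projection below, ignoring the radial distortion terms that the intrinsics also carry and the sign conventions of the SDK's camera space axes (verify those against MapDepthPointToCameraSpace). This is a generic sketch, not Kinect SDK code; fx, fy, cx, cy correspond to the focal length and principal point fields of the CameraIntrinsics structure.

```python
def depth_pixel_to_camera_space(u, v, depth_mm, fx, fy, cx, cy):
    """Back-project depth pixel (u, v) with depth in millimetres to a 3D point
    in the depth camera's space, ignoring radial distortion."""
    z = depth_mm / 1000.0        # camera space is expressed in metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```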
Regarding 2: There is no way to get the color camera intrinsics from the SDK. You have to estimate them yourself by camera calibration.