depth image based rendering - rendering

I have to implement depth image based rendering. Given a 2D image and a depth map, the algorithm generates a virtual view - what the scene would look like if the camera were placed in a different position. I wrote the function below, where V is the matrix with the pixels of the 2D view, D holds the pixels of the depth map, and camerashift is a parameter.
Z = 1.1 - D./255; is a normalization. I tried to follow these instructions:
For each pixel in the depth map, compute the disparity that results from the depth. For each pixel in the source 2D image, find a new location for it in the virtual view: old location + disparity of that specific pixel.
The function doesn't work very well; what's wrong?
function [virtualView] = renderViews(V, D, camerashift)
Z = 1.1 - D./255;
[M, N] = size(Z);
for m = 1:M
    for n = 1:N
        d = camerashift / Z(m,n);
        shift = round(abs(d));
        V2(m,n) = V(m+shift, n);
    end
end
imshow(V2)
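For reference, a minimal Python sketch of one way to read those instructions (the horizontal shift direction, keeping the sign of the disparity, and the bounds check are assumptions, not part of the code above):
import numpy as np

def render_view(V, D, camera_shift):
    # Forward-warp each source pixel horizontally by its per-pixel disparity.
    Z = 1.1 - D.astype(float) / 255.0           # same normalisation as above
    M, N = Z.shape
    virtual = np.zeros_like(V)                  # holes stay black
    for m in range(M):
        for n in range(N):
            shift = int(round(camera_shift / Z[m, n]))
            n_new = n + shift                   # horizontal shift for a horizontal camera move
            if 0 <= n_new < N:                  # guard against out-of-range indices
                virtual[m, n_new] = V[m, n]
    return virtual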

Calculate angle on a plane in 3D space from a 2D image

I have 2 input images of a plane where the (static) camera is at an unknown angle. I managed to extract edges and points of interest using OpenCV, but I'm stuck calculating real angles from the images.
From image #1 I need to calculate the camera angle relative to the plane. I know 3 points on the plane that form an equilateral triangle (angles of 60 degrees). The center point of the triangle is also the center point of the plane. However, the plane's center point in the image is covered by another object.
From image #2 I need to calculate the real angle of an object (point C) on the plane relative to one of the 3 points and the plane center point (= line A to B).
How can I calculate the real angle β as if the camera had no angle towards the plane?
Update:
I was looking for a solution for my problem at https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html
There are a number of functions, but I couldn't figure out how to apply them to my specific problem.
There is a function to calculate a homography from keypoints in two images, but I do not have images of the scene from different camera angles.
Then there is cv::findHomography, which finds a perspective transformation between two planes. I know 4 source points, but what are my 4 destination points?
I was also looking at cv::solvePnP and cv::solvePnPRansac, but again I only know 4 source points on the plane; I don't know their corresponding 3D points.
What am I missing?
@Micka: Thanks for your input. I have 4 points for processing the image (the 3 static base points + the object at point C). I can assume these points are all located on the plane at z=0. However, I have neither the coordinates of a second plane nor the (x,y) of the corresponding 3D points.
Your description does not explicitly say it, but if you can assume that segment AB bisects the base of the triangle, then you have 4 point correspondences between the plane and its image, so you can use cv::findHomography.

How to calculate the Horizontal and Vertical FOV for the KITTI cameras from the camera intrinsic matrix?

I would like to calculate the Horizontal and Vertical field of view from the camera intrinsic matrix for the cameras used in the KITTI dataset. The reason I need the Field of view is to convert a depth map into 3D point clouds.
Though this question has been asked quite a long time ago, I felt it needed an answer as I ran into the same issue and was unable to find any info on it.
I have, however, solved it using the information available in this document and some more general camera calibration documents.
Firstly, we need to convert the supplied disparity into distance. This is done by first converting the disparity map into floats using the method in the dev_kit, where they state:
disp(u,v) = ((float)I(u,v))/256.0;
This disparity can then be converted into a distance through the standard stereo vision equation:
Depth = Baseline * FocalLength / Disparity
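As a rough numpy sketch of these two steps (baseline and focal length below are placeholders that should be taken from the calibration files, and I is the raw KITTI disparity image):
import numpy as np

# I is the raw uint16 disparity image from KITTI; 0 marks invalid pixels.
disp = I.astype(np.float32) / 256.0
valid = disp > 0

baseline = 0.54          # placeholder, metres - take it from the calibration files
focal_length = 721.5     # placeholder, pixels - e.g. P_rect[0, 0]

depth = np.zeros_like(disp)
depth[valid] = baseline * focal_length / disp[valid]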
Now come some tricky parts. I searched high and low for the focal length and was unable to find it in the documentation.
I realised just now while writing this that the baseline is documented in the aforementioned source; however, from section IV.B we can see that it can also be found indirectly in P(i)_rect.
The P_rect matrices can be found in the calibration files and will be used both for calculating the baseline and for translating from (u,v) in the image to (x,y,z) in the real world.
The steps are as follows:
For each pixel in the depth map:
xyz_normalised = P_rect \ [u,v,1]
where u and v are the x and y coordinates of the pixel respectively,
which will give you an xyz_normalised of the form [x, y, z, 0] with z = 1.
You can then multiply it by the depth given at that pixel to obtain an xyz coordinate.
For completeness: as P_rect here corresponds to the depth map, you need to use P_3 from the cam_cam calibration txt files to get the baseline (as it contains the baseline between the colour cameras), while P_2 belongs to the left camera, which is used as the reference for the occ_0 files.
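A small Python sketch of the steps above (the least-squares solve plays the role of the backslash, and I normalise by the z component before scaling, which is my own safeguard):
import numpy as np

def pixel_to_xyz(u, v, depth_uv, P_rect):
    # Back-project pixel (u, v) through the 3x4 projection matrix, then scale by its depth.
    ray, *_ = np.linalg.lstsq(P_rect, np.array([u, v, 1.0]), rcond=None)
    ray = ray[:3] / ray[2]           # normalise so that z = 1
    return ray * depth_uv            # (x, y, z) in the rectified camera frame

# e.g. cloud = [pixel_to_xyz(u, v, depth[v, u], P_rect)
#               for v in range(depth.shape[0]) for u in range(depth.shape[1]) if depth[v, u] > 0]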

Pointcloud and RGB Image alignment on RealSense ROS

I am working on a dog detection system using deep learning (TensorFlow object detection) and a RealSense D425 camera. I am using the Intel(R) RealSense(TM) ROS Wrapper in order to get images from the camera.
I am executing "roslaunch rs_rgbd.launch" and my Python code is subscribed to the "/camera/color/image_raw" topic in order to get the RGB image. Using this image and the object detection library, I am able to infer (at 20 fps) the location of a dog in an image as a bounding box (xmin, xmax, ymin, ymax).
I would like to crop the point cloud using the object detection output (xmin, xmax, ymin, ymax) and determine whether the dog is near the camera or far away. I would like to use the pixel-by-pixel alignment between the RGB image and the point cloud.
How can I do it? Is there any topic for that?
Thanks in advance
Intel publishes a Python notebook about the same problem at: https://github.com/IntelRealSense/librealsense/blob/jupyter/notebooks/distance_to_object.ipynb
What they do is as follows:
get the color frame and the depth frame (the point cloud in your case)
align the depth to the color frame (see the alignment sketch after the snippet below)
use SSD to detect the dog inside the color frame
get the average depth for the detected dog and convert it to meters
depth = np.asanyarray(aligned_depth_frame.get_data())
# Crop depth data:
depth = depth[xmin_depth:xmax_depth,ymin_depth:ymax_depth].astype(float)
# Get data scale from the device and convert to meters
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
depth = depth * depth_scale
dist,_,_,_ = cv2.mean(depth)
print("Detected a {0} {1:.3} meters away.".format(className, dist))
Hope this helps.

Convert grid of dots in XY plane from camera coordinates to real world coordinates

I am writing a program. I have, say, a grid of dots on a piece of paper. I fix one end and bend the paper toward the screen, giving me a trapezoidal shape from the camera's point of view. I have the (x,y) camera coordinates of each dot. Is there a simple way to convert these (x,y) to real-life (x,y), which should give me a rectangle? I have the camera/real (x,y) coordinates of the original flat sheet of paper before bending, if that helps.
I have looked at 3D Camera coordinates to world coordinates (change of basis?) and Transforming screen coordinates from security camera to real world coordinates.
Look up "homography". The transformation from a plane in 3D space to its image as captured by an ideal pinhole camera is a homography. It can be represented as a 3x3 matrix H that transforms the 3D coordinates X of points in the world to their corresponding homogeneous image coordinates x:
x = H * X
where X is a 3x1 vector of the world point coordinates, and x = [u, v, w]^T is the image point in homogeneous coordinates.
Given a minimum of 4 matches between world and image points (e.g. the corners of a rectangle) you can estimate the parameters of the matrix H. For details, look up "DLT algorithm". In OpenCV the routine to use is findHomography.
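For example, a sketch in Python (the correspondences below are placeholders; in practice you would use four dots whose real-world positions on the sheet you know):
import numpy as np
import cv2

# Placeholder correspondences: pixel coordinates of 4 known dots and their
# real-world positions on the flat sheet (e.g. in centimetres).
camera_pts = np.float32([[102, 88], [510, 95], [540, 400], [80, 380]])
world_pts  = np.float32([[0, 0], [20, 0], [20, 15], [0, 15]])

H, _ = cv2.findHomography(camera_pts, world_pts)

# Map every detected dot from camera coordinates to real-world coordinates on the plane.
dots_cam = np.float32([[250, 240], [300, 260]]).reshape(-1, 1, 2)   # placeholder detections
dots_world = cv2.perspectiveTransform(dots_cam, H)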

pose estimation: determine whether rotation and translation matrices are right

Recently I've been struggling with a pose estimation problem using a single camera. I have some 3D points and the corresponding 2D points on the image. Then I use solvePnP to get the rotation and translation vectors. The problem is: how can I determine whether the vectors are the right results?
Now I use an indirect way to do this:
I use the rotation matrix, the translation vector and the world 3D coordinates of a certain point to obtain the coordinates of that point in the camera coordinate system. Then all I have to do is determine whether those coordinates are reasonable. I think I know the directions of the x, y and z axes of the camera coordinate system.
Is the camera center the origin of the camera coordinate system?
Now consider the x component of that point. Is x equivalent to the distance between the camera and the point in world space along the camera's x-axis direction (with the sign determined by which side of the camera the point is on)?
The figure below is in world space, while the axes depicted are in the camera coordinate system.
======== How the camera and the point are placed in world space ========
|
|
Camera--------------------------> Z axis
| |} Xw?
| P(Xw, Yw, Zw)
|
v x-axis
My rvec and tvec results seem both right and wrong. For a specified point the z value seems reasonable; I mean, if this point is about one meter away from the camera in the z direction, then the z value is about 1. But for x and y, given the location of the point, I think x and y should be positive, yet they are negative. What's more, the pattern detected in the original image looks like this:
But using the point coordinates calculated in the camera system and the camera intrinsic parameters, I get an image like this:
The target keeps its pattern, but it has moved from the bottom right to the top left. I cannot understand why.
Yes, the camera center is the origin of the camera coordinate system, which seems to be right according to this post.
In the case of camera pose estimation, the "value seems reasonable" check can be quantified as the backprojection error. That is a measure of how well your resulting rotation and translation map the 3D points to the 2D pixels. Unfortunately, solvePnP does not return a residual error measure, so one has to compute it:
cv::solvePnP(worldPoints, pixelPoints, camIntrinsics, camDistortion, rVec, tVec);
// Use the computed solution to project the 3D pattern back into the image
std::vector<cv::Point2f> projectedPattern;
cv::projectPoints(worldPoints, rVec, tVec, camIntrinsics, camDistortion, projectedPattern);
// Compute the error of each 2D-3D correspondence
std::vector<float> errors;
for (size_t i = 0; i < pixelPoints.size(); ++i)
{
    float dx = pixelPoints[i].x - projectedPattern[i].x;
    float dy = pixelPoints[i].y - projectedPattern[i].y;
    // Euclidean distance between the projected and the measured pixel
    errors.push_back(std::sqrt(dx * dx + dy * dy));
}
// Here, compute the max or average of your "errors"
An average backprojection error for a calibrated camera might be in the range of 0 - 2 pixels. According to your two pictures, it would be way more than that here. To me it looks like a scaling problem. If I am right, you compute the projection yourself. Maybe you can try cv::projectPoints() once and compare.
When it comes to transformations, I learned not to follow my imagination :) The first thing I do with the returned rVec and tVec is usually to create a 4x4 rigid transformation matrix out of them (I once posted code for this here). This makes things even less intuitive, but it is compact and handy.
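A sketch of that construction in Python with OpenCV (the function name is mine):
import numpy as np
import cv2

def rigid_transform(rvec, tvec):
    # Build a 4x4 world-to-camera transform from solvePnP's rvec/tvec.
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(tvec).reshape(3)
    return T

# A homogeneous world point Pw = [Xw, Yw, Zw, 1] then maps into the camera frame as Pc = T @ Pw.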
Now I know the answers.
Yes, the camera center is the origin of the camera coordinate system.
Consider that the coordinates of a point in the camera system are calculated as (xc, yc, zc). Then xc should be the distance between the camera and the point in the real world along the camera's x direction.
Next, how to determine whether the output matrices are right?
1. As @eidelen points out, the backprojection error is one indicative measure.
2. Calculate the coordinates of the points in the camera system from their world coordinates and the returned matrices, and check whether they are reasonable.
So why did I get a wrong result (the pattern remained but moved to a different region of the image)?
The parameter cameraMatrix in solvePnP() supplies the camera's intrinsic parameters. In the camera matrix you should use width/2 and height/2 for cx and cy, while I had used the full width and height of the image. I think that caused the error. After I corrected that and re-calibrated the camera, everything seems fine.
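For illustration, a sketch of such a camera matrix in Python, with the principal point approximated at the image centre (the focal lengths and image size below are placeholders):
import numpy as np

fx, fy = 800.0, 800.0        # placeholder focal lengths in pixels
width, height = 640, 480     # placeholder image size
cameraMatrix = np.array([[fx,  0.0, width  / 2.0],
                         [0.0, fy,  height / 2.0],
                         [0.0, 0.0, 1.0]])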