Extracting a plane from an image taken by a camera

I have a camera at a known fixed location and orientation.
I also have a plane at a known location whose z position changes.
I want to turn the image from the camera into a top down view of the plane.
I can do this without knowing any positions by using the 4 points of the plane to compute a homography matrix and warping the image, but each time the plane moves in Z I have to repeat this process.
After searching around online, most methods seem to center on finding features of the image (using SIFT or something like it) and then computing a homography matrix.
With the problem so constrained I thought there may be a simple linear algebra based approach.
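To make the setup concrete, here is roughly what I am hoping is possible (only a sketch, assuming OpenCV-style intrinsics K and extrinsics R, t for the fixed camera, and that the plane's 4 corner coordinates are known at each Z; topdown_homography and corners_world are placeholder names): project the plane's known corners through the camera model for the current Z, then recompute the homography from those projections, so only the cheap projection step is repeated when the plane moves.

```python
import numpy as np
import cv2

def topdown_homography(K, R, t, corners_world, out_size=(500, 500)):
    """Project the plane's 4 known corners through the camera and map them
    to a fixed top-down rectangle. corners_world is a 4x3 array ordered to
    match the top-down rectangle (e.g. TL, TR, BR, BL) at the current Z."""
    rvec, _ = cv2.Rodrigues(R)
    img_pts, _ = cv2.projectPoints(corners_world.astype(np.float64),
                                   rvec, t.astype(np.float64), K, None)
    img_pts = img_pts.reshape(-1, 2).astype(np.float32)

    w, h = out_size
    top_down = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    return cv2.getPerspectiveTransform(img_pts, top_down)

# usage: warped = cv2.warpPerspective(frame, H, out_size)
```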

Related

Compute absolute rotation matrix from relative rotation matrix

Using a homography matrix, I am able to find a mapping from one image to another. From this matrix I can also compute a relative rotation matrix between the two images. How can I then compute an absolute rotation matrix? And what are the differences between these two matrices?
General points:
A general homography between images does not imply a camera motion that is a pure rotation.
However, camera motion that is a pure rotation, or one whose translation is very small compared to the distance from the camera and the scene, is well modeled by a homography.
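For example, under these assumptions (a pure rotation R between two views that share the same intrinsic matrix K), the induced homography can be written, up to scale, as H = K R K^-1. A minimal sketch:

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R, up to scale,
    for two views sharing the same intrinsic matrix K."""
    H = K @ R @ np.linalg.inv(K)
    return H / H[2, 2]  # normalize so the bottom-right entry is 1
```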
Specifically to your question:
A "relative" rotation is just that, a motion from the orientation of the first camera to the one of the second camera.
An "absolute" rotation, or orientation, describes a motion with respect to a specified "reference" coordinate frame that is constant and independent of the camera motion.
As a special case, if you have only two camera poses, and you use the first one as the reference, then the relative pose of the second one is also its absolute pose.
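As a rough sketch of the bookkeeping (the multiplication order depends on your convention; R_abs_1, R_rel and the function name are placeholders):

```python
import numpy as np

# Convention assumed here: orientations map reference-frame vectors into
# camera-frame vectors, and R_rel maps the camera-1 frame to the camera-2 frame.
def absolute_from_relative(R_abs_1, R_rel):
    """Chain the relative rotation onto the first camera's absolute orientation."""
    return R_rel @ R_abs_1

# Special case from above: if the first camera is the reference frame,
# R_abs_1 is the identity and the relative rotation IS the absolute one.
R_rel = np.eye(3)                                   # placeholder relative rotation
R_abs_2 = absolute_from_relative(np.eye(3), R_rel)  # equals R_rel
```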

How does Blender calculate vertex normals?

I'm attempting to calculate vertex normals for various game assets. The normals I calculate are used for "inflating" the model (to draw behind the real model producing a thick outline).
I currently compute the normal for each face and average all of them (several other questions on Stack Overflow suggest this approach). However, this doesn't work for sharp corners like this one (adjacent faces' normals marked in orange, the normal I'm trying to calculate is outlined in green).
The object looks like a small pedestal and we're looking at the front-left corner. There are three adjoining faces (the bottom face isn't visible; its normal points straight down).
Blender computes an excellent normal that lies squarely in the middle of the three faces' normals; it seems like it somehow calculates a normal that has minimum rotation to each of the three face normals. Blender's normal also doesn't change when the quads are triangulated differently.
Averaging the faces' normals gives me a different normal that points slightly upward in the Z-axis (-0.45, -0.89, +0.08). Inflating my model this way doesn't produce a good outline because the bottom face of the outline is shifted up and doesn't enclose the original model.
I attempted to look at the Blender source code but couldn't find what I was looking for. If anyone can point me to the algorithm in the Blender source, I'd accept that also.
Weight each face normal by the angle that face subtends at the vertex where they join. This is a common practice in surface rendering (see the discussion here: http://www.bytehazard.com/code/vertnorm.html), and it will ensure that your bottom face is weighted more strongly than the two slanted side faces. I don't know if Blender does it differently, but you should give it a try.
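Something along these lines (only a sketch for triangle meshes; quads would need to be triangulated first, and the function name is just illustrative):

```python
import numpy as np

def angle_weighted_vertex_normals(vertices, faces):
    """vertices: (V, 3) float array, faces: (F, 3) int array of triangles.
    Each face normal is accumulated at a vertex weighted by the angle the
    face makes at that vertex, then the result is normalized."""
    normals = np.zeros_like(vertices, dtype=float)
    for tri in faces:
        v0, v1, v2 = vertices[tri]
        face_n = np.cross(v1 - v0, v2 - v0)
        n_len = np.linalg.norm(face_n)
        if n_len == 0:
            continue  # skip degenerate triangles
        face_n /= n_len
        # angle of the triangle at each of its three vertices
        corners = [(v0, v1, v2), (v1, v2, v0), (v2, v0, v1)]
        for idx, (a, b, c) in zip(tri, corners):
            e1, e2 = b - a, c - a
            cos_a = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
            angle = np.arccos(np.clip(cos_a, -1.0, 1.0))
            normals[idx] += angle * face_n
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.where(lengths == 0, 1, lengths)
```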

Circular Motion of calibrated camera with given degree

Following this project (carving a dinosaur), I'd like to create a dataset of 36 images taken of an object and estimate the appropriate camera projection matrix for each one.
I calibrated my camera once (extrinsic/intrinsic) for the first image with three chessboard patterns, and now I want to add circular motion (roughly 10 degrees per step) across the 36 images I've taken to get something like shown here:
My camera is static while the photographed object was rotated 10 degrees for every image.
How do I achieve this? Is it correct to create the rotation matrices by hand and simply apply them to my camera projection matrix?
Thanks for any advice.
Modifying the rotation matrices is not enough; you also need to change the position of the camera. In the structure-from-motion problem it is assumed that the scene is static while the camera moves. You can treat your setup this way because only the relative motion matters.
Let the extrinsic camera matrix be A = R[I | -C], where C is the position of the camera center in the global frame and R is the rotation from the global frame to the camera frame. Let Ra represent a rotation by angle alpha about the vertical axis of the global frame:

Ra = ( cos(alpha)  -sin(alpha)  0
       sin(alpha)   cos(alpha)  0
       0            0           1 )

Then the required camera matrix can be computed as A2 = R2[I | -C2], where R2 = R * transpose(Ra) and C2 = Ra * C.
However, you should ensure two things when using this approach. Firstly, the vertical axis of the global frame must correspond to the real-world vertical direction. Secondly, the origin of the global frame must lie on the axis about which the camera center rotates. The latter can be achieved by putting the object at the origin of the global frame.
If the angles are measured inaccurately or the global frame is not centered well, the computed extrinsic matrix will also be inaccurate. In that case it can still be used as an initial estimate for a structure-from-motion algorithm. The other alternative is to calibrate the camera for each frame, not only the first one.
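A sketch of that per-frame computation (assuming the global Z axis is the vertical rotation axis and that R and C come from your first-frame calibration; the function name is just illustrative):

```python
import numpy as np

def extrinsics_for_frame(R, C, alpha_deg):
    """R, C: rotation and camera center from the first (calibrated) frame.
    alpha_deg: cumulative turntable angle for this frame (e.g. i * 10).
    Returns the 3x4 extrinsic matrix A2 = R2 [I | -C2] described above,
    assuming the global Z axis is the (vertical) rotation axis."""
    a = np.radians(alpha_deg)
    Ra = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    R2 = R @ Ra.T
    C2 = Ra @ C
    return R2 @ np.hstack([np.eye(3), -C2.reshape(3, 1)])

# P_i = K @ extrinsics_for_frame(R, C, i * 10)   # projection matrix for image i
```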

How to calibrate a camera and a robot

I have a robot and a camera. The robot is just a 3D printer where I changed the extruder for a tool, so it doesn't print but it moves every axis independently. The bed is transparent, and below the bed there is a camera, the camera never moves. It is just a normal webcam (playstation eye).
I want to calibrate the robot and the camera, so that when I click on a pixel in an image provided by the camera, the robot will go there. I know I can measure the translation and the rotation between the two frames, but that will probably introduce a lot of error.
So that's my question, how can I relate the camera and a robot. The camera is already calibrated using chessboards.
In order to make everything easier, the Z-axis can be ignored. So the calibration will be over X and Y.
It depends on what error is acceptable to you.
We have a similar setup, where a camera looks at a plane with an object on it that can be moved.
We assume that the image and plane are parallel.
First let's calculate the rotation. Put the tool in a position where you see it at the center of the image, then move it along one axis and select the point on the image that corresponds to the new tool position.
Those two points will give you a vector in the image coordinate system.
The angle between this vector and original image axis will give the rotation.
The scale can be calculated in a similar way: the vector length (in pixels) and the distance between the tool positions (in mm or cm) give you the scale factor between the image and real-world axes.
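A sketch of that two-point procedure (assuming the image and bed really are parallel, with no lens distortion and no mirroring; the helper names are just illustrative):

```python
import numpy as np

def fit_pixel_to_robot(px0, px1, rb0, rb1):
    """px0, px1: the tool's pixel positions (x, y) at two spots;
    rb0, rb1: the corresponding robot X/Y positions (mm).
    Returns a function mapping a clicked pixel to robot coordinates,
    assuming a pure rotation + uniform scale between image and bed."""
    v_px = np.subtract(px1, px0)
    v_rb = np.subtract(rb1, rb0)
    scale = np.linalg.norm(v_rb) / np.linalg.norm(v_px)             # mm per pixel
    angle = np.arctan2(v_rb[1], v_rb[0]) - np.arctan2(v_px[1], v_px[0])
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])

    def pixel_to_robot(px):
        return np.asarray(rb0) + scale * (R @ np.subtract(px, px0))

    return pixel_to_robot

# usage: to_robot = fit_pixel_to_robot((320, 240), (500, 250), (0, 0), (40, 0))
#        x_mm, y_mm = to_robot(clicked_pixel)
```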
If this method doesn't provide enough accuracy, you can calibrate the camera for distortion and for its position relative to the plane using computer vision techniques, which is more complicated.
See the following links
http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html

World space to screen space (perspective projection)

I'm using a 3d engine and need to translate between 3d world space and 2d screen space using perspective projection, so I can place 2d text labels on items in 3d space.
I've seen a few posts of various answers to this problem but they seem to use components I don't have.
I have a Camera object, and can only set its current position and lookat position; it cannot roll. The camera is moving along a path, and a certain target object may appear in its view and then disappear.
I have only the following values
lookat position
position
vertical FOV
Z far
Z near
and obviously the position of the target object.
Can anyone please give me an algorithm that will do this using just these components?
Many thanks.
All graphics engines use matrices to transform between different coordinate systems. Indeed, OpenGL and DirectX use them, because they are the standard way to do this.
Cameras usually construct the matrices using the parameters you have:
View matrix (transforms the world so that you are looking at it from the camera position); it uses the lookat position and the camera position (and also the up vector, which is usually (0, 1, 0)).
Projection matrix (transforms from 3D coordinates to 2D coordinates); it uses the FOV, near, far and aspect ratio.
You can find information on how to construct these matrices on the internet by searching for the OpenGL functions that create them:
gluLookAt: creates the view matrix
gluPerspective: creates the projection matrix
But I can't imagine an engine that doesn't let you get these matrices; I can assure you they are in there somewhere, because the engine is using them.
Once you have those matrices, you multiply them to get the view-projection matrix. This matrix transforms from world coordinates to screen coordinates. So just multiply the matrix by the position you want to know (as a 4-component vector, with the 4th component set to 1.0).
But wait, the result will be in homogeneous coordinates: you need to divide the X, Y, Z of the resulting vector by W, and then you have the position in normalized screen coordinates (0 means the center, 1 means right, -1 means left, etc.).
From here it is easy to transform to pixel coordinates by multiplying by the width and height.
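A sketch of the whole pipeline using only the values you listed (lookat, position, vertical FOV, near, far) plus the viewport size and aspect ratio; the helpers follow gluLookAt/gluPerspective conventions and the names are just illustrative:

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Right-handed view matrix, in the spirit of gluLookAt."""
    eye, target, up = map(np.asarray, (eye, target, up))
    f = target - eye; f = f / np.linalg.norm(f)      # forward
    s = np.cross(f, up); s = s / np.linalg.norm(s)   # right
    u = np.cross(s, f)                               # true up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def perspective(fovy_deg, aspect, z_near, z_far):
    """Projection matrix, in the spirit of gluPerspective."""
    f = 1.0 / np.tan(np.radians(fovy_deg) / 2.0)
    proj = np.zeros((4, 4))
    proj[0, 0] = f / aspect
    proj[1, 1] = f
    proj[2, 2] = (z_far + z_near) / (z_near - z_far)
    proj[2, 3] = 2.0 * z_far * z_near / (z_near - z_far)
    proj[3, 2] = -1.0
    return proj

def world_to_screen(p_world, view, proj, width, height):
    clip = proj @ view @ np.append(p_world, 1.0)   # homogeneous coordinates
    if clip[3] <= 0:                               # behind the camera
        return None
    ndc = clip[:3] / clip[3]                       # divide by W
    x = (ndc[0] * 0.5 + 0.5) * width               # map [-1, 1] to pixels
    y = (1.0 - (ndc[1] * 0.5 + 0.5)) * height      # flip Y for screen space
    return x, y
```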
I have some slides explaining all this here: https://docs.google.com/presentation/d/13crrSCPonJcxAjGaS5HJOat3MpE0lmEtqxeVr4tVLDs/present?slide=id.i0
Good luck :)
P.S.: when you work with 3D it is really important to understand the three matrices (model, view and projection); otherwise you will stumble every time.
"so I can place 2d text labels on items in 3d space"
Have you looked up "billboard" techniques? Sometimes just knowing the right term to search under is all you need. This refers to polygons (typically rectangles) that always face the camera, regardless of camera position or orientation.