What is the right way to resize using NVIDIA NPP to exact destination dimensions? - gpu

I'm trying to use NVIDIA NPP to experiment with some image resizing routines. I want to resize to an exact dimension. I've been looking at image resizing using NVIDIA NPP but all of its resize functions take scale factors for X and Y Dimensions, and I could not see any API taking direct destination dimensions.
As an example, this is one API:
NppStatus nppiResizeSqrPixel_8u_C1R(const Npp8u * pSrc, NppiSize oSrcSize, int nSrcStep, NppiRect oSrcROI, Npp8u * pDst, int nDstStep, NppiRect oDstROI, double nXFactor, double nYFactor, double nXShift, double nYShift, int eInterpolation);
I realize one way could be to find the appropriate scale factor for the destination dimensions, but we don't know exactly how the API derives the destination ROI from the scale factor (since it is floating-point math). We could reverse the calculation in the jpegNPP sample to find the scale factor, but the API itself does not make any guarantees, so I'm not sure how safe that is. What are the possibilities?
As a side question, the API also takes two params, nXShift and nYShift, but just says "Source pixel shift in x-direction". I'm not exactly clear what shift is being talked about here. Do you have an idea?

If I wanted to map the whole SRC image to the smaller rectangle in the DST image as shown in the image below I would use xFactor = yFactor = 0.5 and xShift = 0.5*DST.width and yShift = 0.
Mapping src to half size destination image
In other words, the pixel at (x,y) in the SRC is mapped to the pixel (x',y') in the DST as
x' = xFactor * x + xShift
y' = yFactor * y + yShift
In this case, both the source and dest ROI could be the entire support of the respective images.
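To make the mapping above concrete, here is a minimal sketch in plain Python. It does not call NPP; it only evaluates the two formulas from this answer. The helper names resize_factors and map_pixel are hypothetical, and how NPP itself rounds the destination ROI is not something this sketch can confirm.

```python
def resize_factors(src_w, src_h, dst_w, dst_h):
    """Scale factors that would map a src_w x src_h image onto dst_w x dst_h."""
    return dst_w / src_w, dst_h / src_h


def map_pixel(x, y, x_factor, y_factor, x_shift=0.0, y_shift=0.0):
    """Apply x' = xFactor * x + xShift and y' = yFactor * y + yShift."""
    return x_factor * x + x_shift, y_factor * y + y_shift


# Assumed example: a 640x480 source placed at half size in the right half
# of a 640x480 destination, i.e. xShift = 0.5 * DST.width and yShift = 0.
x_factor, y_factor = resize_factors(640, 480, 320, 240)
x_shift, y_shift = 0.5 * 640, 0.0

top_left = map_pixel(0, 0, x_factor, y_factor, x_shift, y_shift)
bottom_right = map_pixel(640, 480, x_factor, y_factor, x_shift, y_shift)
print(top_left, bottom_right)
```

The source corners land on the corners of the smaller rectangle, which is a quick way to sanity-check a chosen factor and shift before passing them to the resize call.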

Related

Draw circular disk in open3d from center, normal, and radius

I am trying to see if I can cover a pointcloud with disks using an algorithm and I have generated millions of "disks" that have a center, radius, and normal. I am currently doing this:
center, rad, normal = getcirc()
disk = o3d.geometry.TriangleMesh.create_cylinder(radius = rad, height = 1, resolution = 100).translate(center)
I am not sure how to orient this so that it has the normal I want. The only methods open3d has are rotate (which takes a 3 by 3 matrix) and transform (which takes a 4 by 4 matrix), and I am not sure how to use either of those to do this, as I have never really studied linear algebra before.

Difference between channel_shift_range and brightness_range in ImageDataGenerator (Keras)?

There are multiple pages (like this and this) that present examples about the effect of channel_shift_range in images. At first glance, it appears as if the images have only had a change in brightness applied.
This issue has multiple comments mentioning this observation. So, if channel_shift_range and brightness_range do the same, why do they both exist?
After long hours of reverse engineering, I found that:
channel_shift_range: applies the (R + i, G + i, B + i) operation to all pixels in an image, where i is an integer value within the range [0, 255].
brightness_range: applies the (R * f, G * f, B * f) operation to all pixels in an image, where f is a float value around 1.0.
Both parameters are related to brightness. However, I found a very interesting difference: the operation applied by channel_shift_range roughly preserves the contrast of an image, while the operation applied by brightness_range roughly multiplies the contrast of an image by f and roughly preserves its saturation. It is important to note that these conclusions may not hold for large values of i and f, since the image becomes so bright that pixel values saturate and much of its information is lost.
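The contrast claim can be checked with a few lines of arithmetic. This is a simplified model of the two operations as described above (add a constant, or multiply by a factor, then clip to [0, 255]); it is not the Keras implementation, and the helper names are hypothetical.

```python
def clip(value, lo=0.0, hi=255.0):
    """Clamp a channel value to the valid 8-bit range."""
    return max(lo, min(hi, value))


def channel_shift(pixel, i):
    """Model of the additive operation: (R + i, G + i, B + i)."""
    return tuple(clip(c + i) for c in pixel)


def brightness(pixel, f):
    """Model of the multiplicative operation: (R * f, G * f, B * f)."""
    return tuple(clip(c * f) for c in pixel)


def contrast(dark, bright):
    """Crude contrast measure: difference of the red channels."""
    return bright[0] - dark[0]


dark, bright = (40.0, 60.0, 80.0), (120.0, 140.0, 160.0)

original = contrast(dark, bright)
shifted = contrast(channel_shift(dark, 30.0), channel_shift(bright, 30.0))
scaled = contrast(brightness(dark, 1.5), brightness(bright, 1.5))
print(original, shifted, scaled)
```

As long as no value is clipped, the additive shift leaves the difference between the two pixels unchanged, while the multiplicative change scales it by f.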
Channel shift and Brightness change are completely different.
Channel Shift: Channel shift changes the color saturation level (e.g. light red/dark red) of pixels by changing the [R,G,B] channels of the input image. Channel shift is used to introduce color augmentation in the dataset so that the model learns color-based features irrespective of their saturation value.
Below is the example of Channel shift from the mentioned article:
In the above image, if you observe carefully, objects (especially the cloud region) are still clearly visible and distinguishable from their neighboring regions even after channel shift augmentation.
Brightness change: The brightness level of the image describes the light intensity throughout the image and is used to add under-exposure and over-exposure augmentation to the dataset.
Below is the example of Brightness augmentation:
In the above image, at low brightness values objects (e.g. clouds) have lost their visibility due to the low light intensity level.

How to calculate the Horizontal and Vertical FOV for the KITTI cameras from the camera intrinsic matrix?

I would like to calculate the Horizontal and Vertical field of view from the camera intrinsic matrix for the cameras used in the KITTI dataset. The reason I need the Field of view is to convert a depth map into 3D point clouds.
Though this question was asked quite a long time ago, I felt it needed an answer, as I ran into the same issue and was unable to find any info on it.
I have, however, solved it using the information available in this document and some more general camera calibration documents.
Firstly, we need to convert the supplied disparity into distance. This can be done by first converting the disparity map into floats using the method in the dev_kit, where they state:
disp(u,v) = ((float)I(u,v))/256.0;
This disparity can then be converted into a distance through the standard stereo vision equation:
Depth = Baseline * FocalLength / Disparity
Now come some tricky parts. I searched high and low for the focal length and was unable to find it in documentation.
I realised just now, while writing this, that the baseline is documented in the aforementioned source; however, from section IV.B we can see that it can also be found indirectly in P(i)rect.
The P_rects can be found in the calibration files and will be used for both calculating the baseline and the translation from uv in the image to xyz in the real world.
The steps are as follows:
For pixel in depthmap:
xyz_normalised = P_rect \ [u,v,1]
where u and v are the x and y coordinates of the pixel respectively
which will give you an xyz_normalised of the form [x,y,z,0] with z = 1
You can then multiply it with the depth that is given at that pixel to result in a xyz coordinate.
For completeness, as P_rect is the depth map here, you need to use P_3 from the cam_cam calibration txt files to get the baseline (as it contains the baseline between the colour cameras) and the P_2 belongs to the left camera which is used as a reference for occ_0 files.
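The steps above, plus the field of view asked about in the question, can be sketched as follows. This is a simplified pinhole model, assuming the intrinsics fx, fy, cx, cy have been read from the left 3x3 block of the rectified projection matrix and ignoring its fourth column. All numeric values are placeholders for illustration, not KITTI calibration data, and the function names are hypothetical.

```python
import math


def disparity_to_depth(raw_disparity, fx, baseline):
    """Depth = Baseline * FocalLength / Disparity, with disp = raw / 256.0."""
    disp = raw_disparity / 256.0
    if disp <= 0.0:
        return None  # no valid disparity at this pixel
    return baseline * fx / disp


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with known depth -> (x, y, z) in the camera frame."""
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


def fov_degrees(image_size, focal_length):
    """Field of view along one axis: 2 * atan(size / (2 * f))."""
    return math.degrees(2.0 * math.atan(image_size / (2.0 * focal_length)))


# Placeholder values; read the real ones from the calibration files.
FX = FY = 720.0
CX, CY = 620.0, 187.0
BASELINE = 0.5          # metres
WIDTH, HEIGHT = 1240, 374

depth = disparity_to_depth(256 * 36, FX, BASELINE)
point = backproject(CX, CY, depth, FX, FY, CX, CY)
print(depth, point, fov_degrees(WIDTH, FX), fov_degrees(HEIGHT, FY))
```

Note that converting a depth map to a point cloud only needs the back-projection; the field of view follows from the same intrinsics but is not required for it.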

pose estimation: determine whether rotation and translation matrix are right

Recently I've been struggling with a pose estimation problem with a single camera. I have some 3D points and the corresponding 2D points on the image. Then I use solvePnP to get the rotation and translation vectors. The problem is, how can I determine whether the vectors are correct?
Now I use an indirect way to do this:
I use the rotation matrix, the translation vector and the world 3D coordinates of a certain point to obtain the coordinates of that point in Camera system. Then all I have to do is to determine whether the coordinates are reasonable. I think I know the directions of x, y and z axes of Camera system.
Is Camera center the origin of the Camera system?
Now consider the x component of that point. Is x equivalent to the distance between the camera and the point in world space along the camera's x-axis direction (with the sign determined by which side of the camera the point is on)?
The figure below is in world space, while the axes depicted are in Camera system.
========How Camera and the point be placed in the world space=============
|
|
Camera--------------------------> Z axis
| |} Xw?
| P(Xw, Yw, Zw)
|
v x-axis
My rvec and tvec results seem partly right and partly wrong. For a specified point, the z value seems reasonable: if the point is about one meter away from the camera in the z direction, then the z value is about 1. But for x and y, given the location of the point, I think x and y should be positive, yet they are negative. What's more, the pattern detected in the original image is like this:
But using the point coordinates calculated in the Camera system and the camera intrinsic parameters, I get an image like this:
The target keeps its pattern, but it moved from bottom right to top left. I cannot understand why.
Yes, the camera center is the origin of the camera coordinate system, which seems to be right according to this post.
In camera pose estimation, "the value seems reasonable" can be quantified as the backprojection error (often called the reprojection error). It measures how well your resulting rotation and translation map the 3D points to the 2D pixels. Unfortunately, solvePnP does not return a residual error measure, so one has to compute it:
// Assumes worldPoints is std::vector<cv::Point3f> and pixelPoints is std::vector<cv::Point2f>.
cv::solvePnP(worldPoints, pixelPoints, camIntrinsics, camDistortion, rVec, tVec);
// Use computed solution to project 3D pattern to image
std::vector<cv::Point2f> projectedPattern;
cv::projectPoints(worldPoints, rVec, tVec, camIntrinsics, camDistortion, projectedPattern);
// Compute error of each 2D-3D correspondence.
std::vector<float> errors;
for (size_t i = 0; i < pixelPoints.size(); ++i)
{
    float dx = pixelPoints.at(i).x - projectedPattern.at(i).x;
    float dy = pixelPoints.at(i).y - projectedPattern.at(i).y;
    // Euclidean distance between projected and real measured pixel
    float err = std::sqrt(dx*dx + dy*dy);
    errors.push_back(err);
}
// Here, compute max or average of your "errors"
An average backprojection error of a calibrated camera might be in the range of 0 to 2 pixels. Judging by your two pictures, yours would be far larger. To me, it looks like a scaling problem. If I am right, you compute the projection yourself. Maybe you can try cv::projectPoints() once and compare.
When it comes to transformations, I learned not to follow my imagination :) The first thing I do with the returned rVec and tVec is usually to create a 4x4 rigid transformation matrix out of them (I once posted code here). This makes things even less intuitive, but it is compact and handy.
Now I know the answers.
Yes, the camera center is the origin of the camera coordinate system.
Consider that the coordinates in the camera system are calculated as (xc,yc,zc). Then xc should be the distance between the camera and the point in the real world in the x direction.
Next, how to determine whether the output matrices are right?
1. As @eidelen points out, backprojection error is one indicative measure.
2. Calculate the coordinates of the points according to their coordinates in the world coordinate system and the matrices.
So why did I get a wrong result(the pattern remained but moved to a different region of the image)?
The cameraMatrix parameter of solvePnP() supplies the camera's intrinsic parameters. In the camera matrix, cx and cy should be the principal point, which is approximately width/2 and height/2, whereas I had used the full width and height of the image. I think that caused the error. After I corrected that and re-calibrated the camera, everything seemed fine.
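The effect of a wrong principal point can be reproduced with a bare pinhole projection. This is a minimal sketch without lens distortion and without OpenCV; the function names and all numeric values are assumptions chosen for illustration.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point: u = fx*x/z + cx, v = fy*y/z + cy."""
    x, y, z = point_cam
    return fx * x / z + cx, fy * y / z + cy


def mean_reprojection_error(points_cam, pixels, fx, fy, cx, cy):
    """Average Euclidean distance between measured and projected pixels."""
    total = 0.0
    for p, (u, v) in zip(points_cam, pixels):
        pu, pv = project(p, fx, fy, cx, cy)
        total += math.hypot(u - pu, v - pv)
    return total / len(pixels)


WIDTH, HEIGHT = 640, 480   # assumed image size
FX = FY = 500.0            # assumed focal length in pixels
points = [(0.0, 0.0, 1.0), (0.5, -0.25, 2.0)]

# Principal point at the image centre versus the mistaken (width, height).
good = [project(p, FX, FY, WIDTH / 2, HEIGHT / 2) for p in points]
bad = [project(p, FX, FY, WIDTH, HEIGHT) for p in points]
shifts = [(b[0] - g[0], b[1] - g[1]) for g, b in zip(good, bad)]
error = mean_reprojection_error(points, good, FX, FY, WIDTH, HEIGHT)
print(good, bad, shifts, error)
```

Every projected point moves by the same offset, so the pattern keeps its shape but lands in a different region of the image, and the mean reprojection error is far above the 0 to 2 pixel range mentioned earlier.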

Draw a scatterplot matrix using glut, opengl

I am new to GLUT and opengl. I need to draw a scatterplot matrix for n dimensional array.
I have loaded the data from a CSV file into a vector of vectors, where each vector corresponds to a row. I have plotted just one scatterplot and used GL_LINES to draw the grid. My question:
1. How do I draw points in a particular grid? Using GL_POINTS I can only draw points in the entire window.
Please let me know if you need any further info to answer this question.
Thanks
What you need to do is transform your data's (x,y) coordinates into screen coordinates. The most straightforward way to do it does not actually rely on OpenGL or GLUT. All you have to do is use a little math. Determine the screen (x,y) coordinates where you want the data point (0,0) to appear, and then determine how far apart you want one increment to be on the screen. Simply take your original data points, scale them, and then apply the offset to get your screen coordinates, which you then pass into glVertex2f() (or whatever function you are using to specify points in your API).
For instance, you might decide you want point (0,0) in your data to be at location (100,100) on your screen, and the distance between 0 and 1 in your data to be 30 pixels on the screen. This operation will look like this:
int x = 0, y = 0; //Original data points
int scaleX = 30, scaleY = 30; //Scaling values for each component
int offsetX = 100, offsetY = 100; //Where you want the origin of your graph to be
// Apply the scaling values and offsets:
int screenX = x * scaleX + offsetX;
int screenY = y * scaleY + offsetY;
// Calls to your drawing functions using screenX and screenY as your coordinates
You will have to determine values that make sense for the scaling and offsets. You can also have your program use different values for different sets of data, so you can display multiple graphs on the same screen. But this is a simple way to do it.
There are also other ways you can go about this. OpenGL has very powerful coordinate transformation functions and matrix math capabilities. Those may become more useful when you develop increasingly elaborate programs. They're most useful if you're going to be moving things around the screen in real-time, or operating on incredibly large data sets, as they allow you to perform these mathematical calculations very quickly using your graphics hardware (which is able to do them much faster than the CPU). However, for simple calculations like these, done once or very infrequently on limited sets of data, the CPU time involved is not a problem for computers today.
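For a scatterplot matrix, the per-graph scale and offset described above can be derived from the grid cell's row and column. The following is a minimal sketch of that arithmetic only, written in Python rather than GLUT code; the function name to_screen, the square cell size, and the data ranges are all assumptions for illustration.

```python
def to_screen(x, y, row, col, cell, x_range, y_range):
    """Map a data point into grid cell (row, col) of a scatterplot matrix.

    cell is the side length of one square cell in pixels; x_range and
    y_range are (min, max) of the data plotted on each axis of the cell.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    scale_x = cell / (x_max - x_min)   # pixels per data unit
    scale_y = cell / (y_max - y_min)
    offset_x = col * cell              # left edge of this cell
    offset_y = row * cell              # bottom edge of this cell
    return (x - x_min) * scale_x + offset_x, (y - y_min) * scale_y + offset_y


# Assumed example: 100-pixel cells, data in [0, 10] on both axes.
centre = to_screen(5.0, 5.0, row=1, col=2, cell=100.0,
                   x_range=(0.0, 10.0), y_range=(0.0, 10.0))
corner = to_screen(0.0, 0.0, row=1, col=2, cell=100.0,
                   x_range=(0.0, 10.0), y_range=(0.0, 10.0))
print(centre, corner)
```

The resulting pair is what would be passed to the point-drawing call for that cell; looping over all (row, col) pairs of dimensions fills the whole matrix.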