Vulkan: Framebuffer larger than Image dimensions

This question primarily relates to the dimension parameters (width, height, and layers) in the structure VkFramebufferCreateInfo.
The actual question:
In the case that one or more of the VkImageViews used to create a VkFramebuffer has dimensions larger than those specified in the VkFramebufferCreateInfo used to create that VkFramebuffer, how does one control which part of that VkImageView is used during a render pass instance?
Alternatively worded question:
I am basically asking: in the case that the image is larger than the framebuffer (not the same dimensions), what defines which part of the image is used (read/write)?
Some Details:
The specification states this is a valid situation (I have seen many people state the attachments used by a framebuffer must match the dimensions of the framebuffer itself, but I can't find support for this in the specification):
Each element of pAttachments must have dimensions at least as large as the corresponding framebuffer dimension.
I want to be clear that I understand that if I just wanted to draw to part of an image I could use a framebuffer that has the same dimensions as the image and use viewports and scissors. But scissors and viewports are defined relative to the framebuffer's (0,0) as far as I can tell from the spec, although it is not clear to me.
I'm asking this question to improve my understanding of the framebuffer, as I am certain I have misunderstood something. I feel it may well be the case that (x,y) in framebuffer space is always (x,y) in image space (as in, there is no way of controlling which part of the VkImageView is used).
I have been stuck on this for quite some time (~4 days), and have tried both the Vulkan Cookbook and the Vulkan Programming Guide, read most of the specification, and searched online.
If the question needs clarification, please ask. I just didn't want to make it overly long.
Thank you for reading.

There isn't a way to control which part of the image is used by the framebuffer when the framebuffer is smaller than the image. The framebuffer origin always maps to the image origin.
Allowing attachments to be larger than the framebuffer is only meant to allow reusing memory/images/views for several purposes in a frame even when they don't all need the same dimensions. The typical example is reusing a depth buffer (but not its contents) for several different render passes. You could accomplish the same thing with memory aliasing, but engines that have to support multiple APIs might find it easier to do it this way.
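For illustration (the handles and sizes below are placeholders, not from the question), creating a framebuffer smaller than one of its attachment views is perfectly legal:

#include <vulkan/vulkan.h>

// Sketch: a 1024x1024 framebuffer that reuses a larger (e.g. 2048x2048) depth view.
// Handles and sizes are illustrative assumptions.
VkFramebuffer makeSmallFramebuffer(VkDevice device, VkRenderPass renderPass,
                                   VkImageView colorView1024, VkImageView depthView2048)
{
    VkImageView attachments[2] = { colorView1024, depthView2048 };

    VkFramebufferCreateInfo fbInfo = {};
    fbInfo.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    fbInfo.renderPass      = renderPass;
    fbInfo.attachmentCount = 2;
    fbInfo.pAttachments    = attachments;
    fbInfo.width           = 1024;   // each attachment must be at least this large
    fbInfo.height          = 1024;
    fbInfo.layers          = 1;

    VkFramebuffer framebuffer = VK_NULL_HANDLE;
    vkCreateFramebuffer(device, &fbInfo, nullptr, &framebuffer);  // error handling omitted
    // Rendering with this framebuffer always touches the attachments starting at (0,0);
    // there is no way to offset into the larger depth view.
    return framebuffer;
}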

The way to control where you render to is by controlling the viewport. That is, you specify a framebuffer size that's actually big enough to cover the total area of the target images that you may want to render to, and use the viewport transform/scissoring to render to a specific area of those images.
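A rough sketch of that approach (assuming a pipeline created with dynamic viewport/scissor state; the 512x512 region is a placeholder):

#include <vulkan/vulkan.h>

// Sketch: restrict rendering to a 512x512 region of a larger framebuffer.
// Assumes `cmd` is inside a render pass instance and the bound pipeline uses
// VK_DYNAMIC_STATE_VIEWPORT / VK_DYNAMIC_STATE_SCISSOR.
void renderToSubRegion(VkCommandBuffer cmd, uint32_t vertexCount)
{
    VkViewport viewport = {};
    viewport.x        = 0.0f;     // offset within the framebuffer
    viewport.y        = 0.0f;
    viewport.width    = 512.0f;
    viewport.height   = 512.0f;
    viewport.minDepth = 0.0f;
    viewport.maxDepth = 1.0f;
    vkCmdSetViewport(cmd, 0, 1, &viewport);

    VkRect2D scissor = {};
    scissor.extent.width  = 512;
    scissor.extent.height = 512;
    vkCmdSetScissor(cmd, 0, 1, &scissor);

    // Draws now only affect that 512x512 area of the attachments.
    vkCmdDraw(cmd, vertexCount, 1, 0, 0);
}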
There is no post-viewport transformation that goes from framebuffer space to image space. That would be decidedly redundant, since we already have a post-NDC transform. There's no point in having two of them.
Sure, VkRenderPassBeginInfo has the renderArea member, but that is more of a promise from the user than a guarantee from the system:
The application must ensure (using scissor if necessary) that all rendering is contained within the render area, otherwise the pixels outside of the render area become undefined and shader side effects may occur for fragments outside the render area.
So basically, the implementation doesn't do anything with renderArea. It doesn't set up a transformation or anything; you're just promising that no framebuffer pixels outside of that area will be impacted.
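For reference, a sketch of where renderArea lives (handles, sizes, and clear values are placeholders):

#include <vulkan/vulkan.h>

// Sketch: filling in renderArea when beginning a render pass. renderArea itself
// does not clip; it is a promise that you will keep rendering (via scissor) inside it.
void beginPassWithRenderArea(VkCommandBuffer cmd, VkRenderPass renderPass,
                             VkFramebuffer framebuffer,
                             const VkClearValue *clearValues, uint32_t clearCount)
{
    VkRenderPassBeginInfo beginInfo = {};
    beginInfo.sType       = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
    beginInfo.renderPass  = renderPass;
    beginInfo.framebuffer = framebuffer;
    beginInfo.renderArea.offset.x      = 0;
    beginInfo.renderArea.offset.y      = 0;
    beginInfo.renderArea.extent.width  = 512;
    beginInfo.renderArea.extent.height = 512;
    beginInfo.clearValueCount = clearCount;
    beginInfo.pClearValues    = clearValues;

    vkCmdBeginRenderPass(cmd, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);
}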
In any case, there's really little point to providing a framebuffer size that's smaller than the image sizes. That sort of thing is more the purview of renderArea than the framebuffer specification.

Related

Culling off-screen objects in OpenGL ES 2 2D

I'm playing about with OpenGL ES 2.0. If I'm working with a simple 2D projection, and I have a large 2D grid of vertices which are pretty much static (think map tiles), of which only a small proportion is visible at any one time, would it be better to...
Work out on the CPU which vertices are visible, and create a VBO to draw just those triangles that make up the visible tiles in each frame?
or
Keep a static VBO with the entire tiled grid, and then just rely on the graphics card (RPi, in my case) to clip out the off-screen triangles?
Or perhaps some combination of the two (like sets of overlapping pre-computed grids)? How big does the grid have to be before the latter option becomes unworkable?
Edit
I decided to make several calls to glDrawElements(), drawing sub-ranges of the index buffer that I knew would overlap the viewport. At the scale I'm working at it doesn't seem to make any difference to the speed over drawing the entire element array, even on a Pi Zero.
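Roughly what that looks like, as a sketch (the per-row index layout and names are just illustrative):

#include <GLES2/gl2.h>
#include <stddef.h>

// Sketch: one glDrawElements() call per visible row of tiles.
// Assumes the index buffer stores each row of tiles contiguously and that the
// VBO/IBO and vertex attributes are already bound.
void drawVisibleRows(int firstVisibleRow, int lastVisibleRow, GLsizei indicesPerRow)
{
    for (int row = firstVisibleRow; row <= lastVisibleRow; ++row) {
        size_t byteOffset = (size_t)row * (size_t)indicesPerRow * sizeof(GLushort);
        glDrawElements(GL_TRIANGLES, indicesPerRow, GL_UNSIGNED_SHORT,
                       (const void *)byteOffset);   // byte offset into the bound IBO
    }
}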
However, this approach would require more computation to determine which ranges of elements needed to be rendered if there was any rotation of the grid involved - effectively rasterising my own quad. I'm interested to hear if this is a reasonable approach.
There are some other options like a more exotic structure for breaking up the plane into sub areas, I guess. Still not sure if any of this is really necessary, though.
Thanks!
Please note: I don't want to discuss drawing tiles in the fragment shader, I'm more interested in the correct way to work with the vertex shader than actually solving the described problem.
If that's a regular grid, I'd split it into large chunks, so that the screen width (the larger side) would fit 2-3 such chunks. They don't need to overlap if it's a regular grid.
Checking one chunk's visibility is trivial and cheap, as is finding/selecting the few that must be drawn (see the sketch below). And the wasted/clipped area is small enough not to worry about; you don't have to go crazy and trim every single vertex that's outside of the screen.
Each chunk would have its own VBO, and it would be weakly cached when it goes fully outside of the screen, so you don't have to rebuild/reload the resources needed to draw that chunk if you quickly return to that part of the map.
Splitting into chunks minimizes the memory requirements and speeds up level loading. You spend time only loading the part of the map that the user will see immediately. This also allows quite huge maps, since you can prefetch the areas you're moving towards.
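A rough sketch of the chunk selection (the Chunk struct and view rectangle here are illustrative, not prescriptive):

#include <GLES2/gl2.h>

// Sketch: draw only the chunks whose bounds overlap the visible rectangle.
typedef struct {
    float minX, minY, maxX, maxY;   // world-space bounds of the chunk
    GLuint vbo;                     // this chunk's vertex buffer
    GLsizei vertexCount;
} Chunk;

static int chunkVisible(const Chunk *c,
                        float viewMinX, float viewMinY,
                        float viewMaxX, float viewMaxY)
{
    // Simple AABB overlap test: cheap enough to run for every chunk each frame.
    return c->maxX >= viewMinX && c->minX <= viewMaxX &&
           c->maxY >= viewMinY && c->minY <= viewMaxY;
}

void drawVisibleChunks(const Chunk *chunks, int chunkCount,
                       float viewMinX, float viewMinY,
                       float viewMaxX, float viewMaxY)
{
    for (int i = 0; i < chunkCount; ++i) {
        if (!chunkVisible(&chunks[i], viewMinX, viewMinY, viewMaxX, viewMaxY))
            continue;                       // chunk is fully off-screen
        glBindBuffer(GL_ARRAY_BUFFER, chunks[i].vbo);
        // (set up vertex attribute pointers for this VBO here)
        glDrawArrays(GL_TRIANGLES, 0, chunks[i].vertexCount);
    }
}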

Guarantee no anti-aliasing rendering to integer texture

For a WebGL 2 canvas, I need a simple 'picking' system, i.e. given a point p in 2D, the system can tell which object (if any) was rendered to p. (I don't need the pick results in the CPU, only in a shader.)
To implement this, each object will be rendered with a different 'color id' to a framebuffer dedicated to picking. I am thinking of using an R16UI or R32UI texture format, and GL_NEAREST filtering. My concern is anti-aliasing: how do I guarantee that the edges of the objects won't get anti-aliased, thus changing the output values, and corrupting the pick system?
I am looking for both the code to disable anti-aliasing, and explanations on why this is/isn't guaranteed, from those who know the standards.
WebGL (and OpenGL ES) don't antialias framebuffers in any automatic way. Antialiasing of framebuffers is a manual operation. In WebGL1 you can't antialias a framebuffer at all. In WebGL2 you'd create a multisample renderbuffer. So basically, if you don't create a multisample renderbuffer, you'll get no antialiasing.
Also, integer and unsigned integer textures are not filterable, which means they only support gl.NEAREST.
So there's nothing to show you. If you use an R16UI or R32UI texture and render to it, it will just work as you were hoping.
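For reference, a sketch of that setup written with the equivalent OpenGL ES 3.0 calls (WebGL2 exposes the same entry points on the gl object); the size variables are placeholders:

#include <GLES3/gl3.h>

// Sketch: a non-multisampled R32UI color attachment for picking.
// No multisample renderbuffer is created anywhere, so nothing is antialiased.
GLuint createPickTarget(GLsizei width, GLsizei height, GLuint *outTexture)
{
    GLuint tex = 0, fbo = 0;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32UI, width, height);
    // Integer textures are not filterable; NEAREST is the only valid choice anyway.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    // The picking fragment shader writes each object's integer id to this attachment.

    *outTexture = tex;
    return fbo;
}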

What's the name of Polygon Masking?

When masking a bitmap by reducing the polygons it's displayed on, what is this technique called?
I often see this done for physics, sort of, to have an edge around an object for detection etc., but a long while ago I remember seeing a similar approach being used with polygons for visual masking, and I have completely forgotten what it was called or how to find it.
ADDITIONAL INFO:
In this example, the polygons are used to "mask" some of the image:
http://fancyratstudios.com/2010/02/programming/progresstimer-for-cocos2d/
What's the name of using polygons to do masking in this manner?
According to the definition you gave, the technique you're referring to could be Gouraud shading, whose definition is also reported in the Video-Based Rendering book on page 50:
Hole-filling: for polygons that are not depicted in any image, determine appropriate vertex colors; during rendering, Gouraud shading is used to mask the missing texture information.
Not sure what you're talking about, but it's either rasterization (projecting polygons to a plane), or "applying a texture" (projecting an image onto a set of polygons), or decimation (reducing the number of polygons).

Perspective distortions correction

I'm searching for methods of text recognition based on document borders.
Or for methods that can solve the problem of finding a new viewpoint.
For example, the camera is at point (x1,y1,z1) and the resulting picture has perspective distortions, but we can find (x2,y2,z2) for the camera that would correct the picture.
Thanks.
The usual approach, which assumes that the document's page is approximately flat in 3D space, is to warp the quadrangle encompassing the page into a rectangle. To do so you must estimate a homography, i.e. a (linear) projective transformation between the original image and its warped counterpart.
The estimation requires matching points (or lines) between the two images, and a common choice for documents is to map the page corners in the original images to the image corners of the warped image. This will in general produce a rectangle with an incorrect aspect ratio (i.e. the warped page will look "wider" or "taller" than the real one), but this can be easily corrected if you happen to know in advance what the real aspect ratio is (for example, because you know the type of paper used, whether letter, A4, etc.).
A simple algorithm to perform the estimation is the so-called Direct Linear Transformation.
The OpenCV library contains routines to help accomplish all these tasks; look into it.
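For example, a rough sketch using OpenCV's C++ API (the corner ordering and output size are assumptions you would obtain from your own page-detection step and the known aspect ratio):

#include <opencv2/imgproc.hpp>

// Sketch: warp the detected page quadrangle into an upright rectangle.
cv::Mat rectifyPage(const cv::Mat &photo, const cv::Point2f pageCorners[4], cv::Size outSize)
{
    const float w = (float)outSize.width  - 1.0f;
    const float h = (float)outSize.height - 1.0f;
    cv::Point2f target[4] = {
        cv::Point2f(0.0f, 0.0f), cv::Point2f(w, 0.0f),   // TL, TR
        cv::Point2f(w, h),       cv::Point2f(0.0f, h)    // BR, BL
    };

    // Estimate the 3x3 homography from the four point correspondences
    // (a direct linear transformation under the hood), then warp.
    cv::Mat H = cv::getPerspectiveTransform(pageCorners, target);
    cv::Mat warped;
    cv::warpPerspective(photo, warped, H, outSize);
    return warped;
}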

How do I rotate an OpenGL view relative to the center of the view as opposed to the center of the object being displayed?

I'm working on a fork of Pleasant3D.
When rotating an object being displayed, the object always rotates around the same point relative to itself, even if that point is not at the center of the view (e.g. because the user has panned to move the object in the view).
I would like to change this so that the view always rotates the object around the point at the center of the view as it appears to the user instead of the center of the object.
Here is the core of the current code that rotates the object around its center (slightly simplified) (from here):
glLoadIdentity();
// midPlatform is the offset to reach the "middle" of the object (or, more specifically, the platform on which the object sits) in the x/y dimension.
// This is the point around which the view is currently rotated.
Vector3 *midPlatform = [self.currentMachine calcMidBuildPlatform];
glTranslatef((GLfloat)cameraTranslateX - midPlatform.x,
             (GLfloat)cameraTranslateY - midPlatform.y,
             (GLfloat)cameraOffset);
// trackBallRotation and worldRotation come from trackball.h/c, which appears to be
// from an Apple OpenGL sample.
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(midPlatform.x, midPlatform.y, 0.);
// Now draw object...
What transformations do I need to apply in what order to get the effect I desire?
Some of what I've tried so far
As I understand it this is what the current code does:
"OpenGL performs matrices multiplications in reverse order if multiple transforms are applied to a vertex" (from here). This means that the first transformation to be applied is actually the last one in the code above. It moves the center of the view (0,0) to the center of the object.
This point is then used as the center of rotation for the next two transformations (the rotations).
Finally, the midPlatform translation is done in reverse to move the center back to the original location, and the XY translations (panning) done by the user are applied. Here the "camera" is also moved away from the object to the proper location (indicated by cameraOffset).
This seems straightforward enough. So what I need to change is this: instead of translating the center of the view to the center of the object (midPlatform), I need to translate it to the current center of the view as seen by the user, right?
Unfortunately this is where the transformations start affecting each other in interesting ways and I am running into trouble.
I tried changing the code to this:
glLoadIdentity();
glTranslatef(0,
             0,
             (GLfloat)cameraOffset);
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(cameraTranslateX, cameraTranslateY, 0.);
In other words, I translate the center of the view to the previous center, rotate around that, and then apply the camera offset to move the camera away to the proper position. This makes the rotation behave exactly the way I want it to, but it introduces a new issue. Now any panning done by the user is relative to the object. For example, if the object is rotated so that the camera is looking along the X axis end-on, then when the user pans left to right the object appears to be moving closer to/further from the user instead of left or right.
I think I can understand why this is (XY camera translations being applied before the rotation), and I think what I need to do is figure out a way to cancel out the translation from before the rotation, after the rotation (to avoid the weird panning effect), and then do another translation which translates relative to the viewer (eye coordinate space) instead of the object (object coordinate space), but I'm not sure exactly how to do this.
I found what I think are some clues in the OpenGL FAQ (http://www.opengl.org/resources/faq/technical/transformations.htm), for example:
9.070 How do I transform my objects around a fixed coordinate system rather than the object's local coordinate system?
If you rotate an object around its Y-axis, you'll find that the X- and Z-axes rotate with the object. A subsequent rotation around one of these axes rotates around the newly transformed axis and not the original axis. It's often desirable to perform transformations in a fixed coordinate system rather than the object’s local coordinate system.
The root cause of the problem is that OpenGL matrix operations postmultiply onto the matrix stack, thus causing transformations to occur in object space. To affect screen space transformations, you need to premultiply. OpenGL doesn't provide a mode switch for the order of matrix multiplication, so you need to premultiply by hand. An application might implement this by retrieving the current matrix after each frame. The application multiplies new transformations for the next frame on top of an identity matrix and multiplies the accumulated current transformations (from the last frame) onto those transformations using glMultMatrix().
You need to be aware that retrieving the ModelView matrix once per frame might have a detrimental impact on your application’s performance. However, you need to benchmark this operation, because the performance will vary from one implementation to the next.
And
9.120 How do I find the coordinates of a vertex transformed only by the ModelView matrix?
It's often useful to obtain the eye coordinate space value of a vertex (i.e., the object space vertex transformed by the ModelView matrix). You can obtain this by retrieving the current ModelView matrix and performing simple vector / matrix multiplication.
But I'm not sure how to apply these in my situation.
You need to transform/translate the "center of view" point into the origin, rotate, then invert that translation, back to the object's transform. This is known as a basis change in linear algebra.
This is way easier to work with if you have a proper 3D math library (I'm assuming you do have one), and it also helps to stay away from the deprecated fixed-pipeline APIs (more on that later).
Here's how I'd do it:
Find the transform for the center-of-view point in world coordinates (figure it out, then draw it to make sure it's correct, with the x, y, z axes too, since the axes are supposed to be correct w.r.t. the view). If you use the center-of-view point and the rotation (usually the inverse of the camera's rotation), this will be a transform from the world origin to the view center. Store this in a 4x4 matrix transform.
Apply the inverse of the above transform, so that it becomes the origin: glMultMatrixf(center_of_view_tf.inverse());
Rotate about this point however you want (glRotatef()).
Transform everything back to world space: glMultMatrixf(center_of_view_tf);
Apply the object's own world transform (glTranslate/glRotate or glMultMatrix) and draw it (a sketch of these steps follows below).
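As a rough sketch of those steps (GLM and all the variable names here are just stand-ins for whatever math library and scene state you actually have, and the result is fed to the fixed-function stack only for illustration):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
#include <OpenGL/gl.h>   // macOS header; adjust for your platform

// Illustrative stand-ins for your own scene state:
static glm::vec3 centerOfView(0.0f, 0.0f, 0.0f);    // the point to rotate about
static glm::vec3 rotationAxis(0.0f, 1.0f, 0.0f);
static float     angleDeg = 30.0f;
static glm::mat4 viewMatrix(1.0f);                  // camera transform (pan/zoom)
static glm::mat4 objectTransform(1.0f);             // the object's own world transform

void applyPivotRotation()
{
    // Step 1: transform from the world origin to the center-of-view point.
    glm::mat4 centerOfViewTf = glm::translate(glm::mat4(1.0f), centerOfView);

    // Steps 2-4: move that point to the origin, rotate about it, move back.
    glm::mat4 rotation = glm::rotate(glm::mat4(1.0f), glm::radians(angleDeg), rotationAxis);
    glm::mat4 pivotRotation = centerOfViewTf * rotation * glm::inverse(centerOfViewTf);

    // Step 5: apply the object's own transform and hand the result to GL.
    glm::mat4 modelView = viewMatrix * pivotRotation * objectTransform;
    glLoadMatrixf(glm::value_ptr(modelView));   // or upload as a uniform in modern GL
    // ... draw the object here ...
}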
About the fixed function pipeline
Back in the old days, there were separate transistors for transforming a vertex (or its texture coordinates), computing where light was in relation to it, applying lights (up to 8), and texturing fragments in many different ways. Simply put, glEnable() enabled fixed blocks of silicon to do some computation in the hardware graphics pipeline. As performance grew, die sizes shrank, and people demanded more features, the amount of dedicated silicon grew too, and much of it went unused.
Eventually, it got so advanced that you could program it in rather obscene ways (register combiners, anyone?). And then it became feasible to actually upload a small assembler program for all vertex-level transforms. Then it made no sense to keep a lot of silicon there that just did one thing (especially as you could've used those transistors to make the programmable stuff faster), so everything became programmable. If "fixed function" rendering was called for, the driver just converted the state (X lights, texture projections, etc.) to shader code and uploaded that as a vertex shader.
So, currently, where even the fragment processing is programmable, there are just a lot of fixed-function options that are used by tons and tons of OpenGL applications, but the silicon on the GPU just runs shaders (and lots of them, in parallel).
...
To make OpenGL more efficient, the drivers less bulky, and the hardware simpler and usable on mobile/console devices, and to take full advantage of the programmable hardware that OpenGL runs on these days, many functions in the API are now marked deprecated. They are not available on OpenGL ES 2.0 and beyond (mobile), and you won't be getting the best performance out of them even on desktop systems (where they will still be in the driver for ages to come, serving equally ancient code bases dating back to the dawn of accelerated 3D graphics).
The fixed-functionness mostly concerns how transforms/lighting/texturing etc. are done by "default" in OpenGL (i.e. glEnable(GL_LIGHTING)), instead of you specifying these ops in your custom shaders.
In the new, programmable OpenGL, transform matrices are just uniforms in the shader. Any rotate/translate/mult/inverse (like the above) should be done by client code (your code) before being uploaded to OpenGL. (Using only glLoadMatrix is one way to start thinking about it, but instead of using gl_ModelViewProjectionMatrix and its ilk in your shader, use your own uniforms.)
It's a bit of a bother, since you have to implement quite a bit of what was done by the GL driver before, but if you have your own object list/graph with transforms stored somewhere, etc., it's not that much work. (OTOH, if you have a lot of glTranslate/glRotate calls in your code, it might be...) As I said, a good 3D math library is indispensable here.
...
So, to change the above code to "programmable pipeline" style, you'd just do all these matrix multiplications in your own code (instead of the GL driver doing it, still on the CPU) and then send the resulting matrix to OpenGL as a uniform before you activate the shaders and draw your object from VBOs.
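For example, roughly (the program handle, uniform name, and VAO are placeholders for your own code; GLM headers as in the earlier sketch):

#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>
#include <OpenGL/gl3.h>   // macOS core-profile header; adjust for your platform

// Sketch: feed your CPU-computed matrix to your own shader uniform, then draw.
void drawWithOwnMatrix(GLuint program, const glm::mat4 &mvp, GLuint vao, GLsizei vertexCount)
{
    glUseProgram(program);

    // The uniform replaces gl_ModelViewProjectionMatrix from the fixed pipeline.
    GLint mvpLoc = glGetUniformLocation(program, "u_modelViewProjection");
    glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, glm::value_ptr(mvp));

    glBindVertexArray(vao);                  // VBO/attribute setup captured in the VAO
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
}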
(Note that modern cards do not have fixed-function code, just a lot of code in the driver to compile fixed-function rendering state to a shader that does the job. No wonder "classic" GL drivers are huge...)
...
Some info about this process is available at Tom's Hardware Guide and probably Google too.