Out-of-bounds index in an OpenGL index buffer

What happens if I use an out-of-bounds index in my index buffer and then call glDrawElements?
It seems that in OpenGL ES it uses a default constant value: for example, if three colors are provided for four positions, then using index 3 yields solid black for the color.
But in WebGL, it generates an INVALID_OPERATION error. However, these are just the results on my device, and I couldn't find anything about this in the official documentation. What is the documented behavior in each of the OpenGL APIs (core, ES, and WebGL)?

The behaviour of out-of-bounds buffer accesses is clearly worded in all of the specifications:
WebGL
For WebGL, the relevant section is "6.6 Enabled Vertex Attributes and Range Checking":
It is possible for draw commands to request data outside the bounds of a WebGLBuffer by calling a drawing command that requires fetching data for an active vertex attribute, when it is enabled as an array, either directly (drawArrays), or indirectly from an indexed draw (drawElements). If this occurs, then one of the following behaviors will result:
The WebGL implementation may generate an INVALID_OPERATION error and draw no geometry.
Out-of-range vertex fetches may return any of the following values:
Values from anywhere within the buffer object.
Zero values, or (0,0,0,x) vectors for vector reads where x is a valid value represented in the type of the vector components and may be any of:
0, 1, or the maximum representable positive integer value, for signed or unsigned integer components
0.0 or 1.0, for floating-point components
So, a compliant WebGL implementation may do any of the above. The implementation you are using seems to generate an error (which is permitted); other implementations may behave differently and return some constant value for the fetched vertex attribute.
OpenGL Desktop
For OpenGL Desktop, the relevant section is "6.4 Effects of Accessing Outside Buffer Bounds":
Most, but not all GL commands operating on buffer objects will detect attempts to read from or write to a location in a bound buffer object at an offset less than zero, or greater than or equal to the buffer’s size. When such an attempt is detected, a GL error is generated. Any command which does not detect these attempts, and performs such an invalid read or write, has undefined results, and may result in GL interruption or termination.
So, here, an implementation is quite free to do whatever on an out-of-bounds buffer access, such as with an out-of-bounds index in an element buffer when doing an indexed draw call.
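To make the failure mode concrete, here is a minimal sketch (names and data invented for illustration; assumes a current 3.3+ context with a VAO bound) of an indexed draw whose index runs past the end of one attribute's buffer. Without robust access, the spec gives no guarantees about what the out-of-range fetch returns:

```cpp
// Hypothetical sketch: 4 positions, but only 3 colors. Index 3 reads past
// the end of the color buffer; without robust buffer access the result of
// that fetch is undefined (it may even terminate the program).
GLfloat positions[] = { -1,-1,  1,-1,  1,1,  -1,1 };  // 4 vec2 positions
GLfloat colors[]    = { 1,0,0,  0,1,0,  0,0,1 };      // only 3 vec3 colors
GLuint  indices[]   = { 0, 1, 2, 3 };                 // 3 is out of bounds for 'colors'

GLuint vbo[2], ebo;
glGenBuffers(2, vbo);
glGenBuffers(1, &ebo);

glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, nullptr);
glEnableVertexAttribArray(0);

glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferData(GL_ARRAY_BUFFER, sizeof(colors), colors, GL_STATIC_DRAW);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
glEnableVertexAttribArray(1);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

// What vertex 3's color attribute reads here is undefined without robustness.
glDrawElements(GL_TRIANGLE_FAN, 4, GL_UNSIGNED_INT, nullptr);
```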
There is also the "robust buffer access" feature, which was previously exposed through the extensions KHR_robust_buffer_access_behavior/ARB_robust_buffer_access_behavior and became core in OpenGL 4.3. It clearly defines the behaviour of indexing out of the bounds of a vertex buffer:
Robust buffer access can be enabled by creating a context with robust access enabled through the window system binding APIs. When enabled, indices within the elements array that reference vertex data that lies outside the enabled attribute's vertex buffer objects result in reading zero.
See also section "10.3.7 Robust Buffer Access" in the OpenGL 4.6 core specification.
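As an illustration of the "window system binding APIs" part, here is a sketch using GLFW (purely as an example; any context-creation path with a robustness attribute works). GLFW's robustness hint maps to the {WGL,GLX}_ARB_create_context_robustness attributes; note that the GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT query itself was only added with KHR_robustness / OpenGL 4.5:

```cpp
// Request a context created with robust access through the window system binding.
glfwWindowHint(GLFW_CONTEXT_ROBUSTNESS, GLFW_LOSE_CONTEXT_ON_RESET);
GLFWwindow* window = glfwCreateWindow(800, 600, "robust", nullptr, nullptr);
glfwMakeContextCurrent(window);

// Verify the context actually has robust access
// (query requires GL 4.5 or KHR_robustness).
GLint flags = 0;
glGetIntegerv(GL_CONTEXT_FLAGS, &flags);
bool robust = (flags & GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT) != 0;
```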
OpenGL ES
Before OpenGL ES 3.2, accessing out-of-bounds vertices from indexed draw calls seems to be completely undefined behaviour, which may also result in program termination.
Starting with OpenGL ES 3.2, there is the same "Robust Buffer Access" capability that can be enabled, just as in OpenGL since 4.3 or with ARB_robust_buffer_access_behavior. See section "10.3.5 Robust Buffer Access" in the OpenGL ES 3.2 specification:
Robust buffer access is enabled by creating a context with robust access enabled through the window system binding APIs. When enabled, indices within the element array (see section 10.3.8) that reference vertex data that lies outside the enabled attribute’s vertex buffer object result in undefined values for the corresponding attributes, but cannot result in application failure.
Before OpenGL ES 3.2, the extension KHR_robust_buffer_access_behavior can still be used where available.

Related

How to get a soft particle effect using the Direct3D 11 API

I've tried every way I could find to calculate the particle alpha and to bind the shader resource for the draw call, but in RenderDoc the screenDepthTexture always shows "No Resource".
You’re probably trying to use the same depth buffer texture in two stages of the pipeline at the same time: read in pixel shader to compute softness, and use the same texture as a depth render target in output merger stage.
This is not going to work. When you call OMSetRenderTargets, the pixel shader binding is unset for the resource view of that texture.
An easy workaround is making a copy of your depth texture with CopyResource, and bind the copy to the input of the pixel shader. This way your pixel shader can read from there, while output merger stage uses another copy as depth/stencil target.
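A minimal sketch of that workaround (names like depthTexture are placeholders; this assumes the depth buffer was created with the typeless DXGI_FORMAT_R24G8_TYPELESS storage format so it can be viewed as an SRV):

```cpp
// Create a second texture with the same description as the depth buffer,
// but bindable only as a shader resource.
D3D11_TEXTURE2D_DESC desc = {};
depthTexture->GetDesc(&desc);
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
ID3D11Texture2D* depthCopy = nullptr;
device->CreateTexture2D(&desc, nullptr, &depthCopy);

// View the depth portion of the copy as a readable format.
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R24_UNORM_X8_TYPELESS;  // assumes D24S8 storage
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MipLevels = 1;
ID3D11ShaderResourceView* depthSRV = nullptr;
device->CreateShaderResourceView(depthCopy, &srvDesc, &depthSRV);

// Each frame, after the opaque pass: snapshot the depth buffer, then read
// the snapshot in the particle pixel shader while the real depth buffer
// stays bound to the output merger.
context->CopyResource(depthCopy, depthTexture);
context->PSSetShaderResources(0, 1, &depthSRV);
```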
In the future, to troubleshoot such issues, use the D3D11_CREATE_DEVICE_DEBUG flag when creating the device and read the debug output in Visual Studio. RenderDoc is awesome for higher-level bugs, when you're rendering something but it doesn't look right; your mistake here is incorrect use of the D3D API, and for such things the debug layer in the Windows SDK is more useful than RenderDoc.
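Enabling the debug layer is a one-flag change at device creation (sketch, assuming a typical device setup):

```cpp
UINT flags = 0;
#if defined(_DEBUG)
flags |= D3D11_CREATE_DEVICE_DEBUG;  // reports API misuse to the debugger output
#endif
ID3D11Device* device = nullptr;
ID3D11DeviceContext* context = nullptr;
D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, flags,
                  nullptr, 0, D3D11_SDK_VERSION,
                  &device, nullptr, &context);
```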

Vulkan: Uploading non-pow-of-2 texture data with vkCmdCopyBufferToImage

There is a similar thread (Loading non-power-of-two textures in Vulkan) but it pertains to updating data in a host-visible mapped region.
I wanted to utilize the fully-fledged function vkCmdCopyBufferToImage to copy data present in a host-visible buffer to a device-local image. My image has dimensions which are not powers of 2 (1280x720, to be more specific).
When doing so, I've seen that it works fine on NVIDIA and Intel, but not on AMD, where the image becomes "distorted", which indicates a problem with row pitch/padding.
My host-visible buffer is tightly packed so bufferRowLength and bufferImageHeight are the same as imageExtent.width and imageExtent.height.
Shouldn't this function cater for non-power-of-2 textures? Or maybe I'm doing it wrong?
I could implement a workaround with a compute shader but I thought this function's purpose was to be generic. Also, the downside of using a compute shader is that the copy operation could not be performed on a transfer-only queue.
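For reference, here is a sketch of the copy described above, assuming an RGBA8 1280x720 image and a tightly packed staging buffer (cmd, stagingBuffer, and image are placeholders). Setting bufferRowLength and bufferImageHeight to 0 explicitly means "tightly packed" and is equivalent to passing the extent values; also note that bufferOffset must be a multiple of the texel size:

```cpp
VkBufferImageCopy region = {};
region.bufferOffset      = 0;  // must be a multiple of the texel size (4 for RGBA8)
region.bufferRowLength   = 0;  // 0 = tightly packed, same as imageExtent.width
region.bufferImageHeight = 0;  // 0 = tightly packed, same as imageExtent.height
region.imageSubresource.aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT;
region.imageSubresource.mipLevel       = 0;
region.imageSubresource.baseArrayLayer = 0;
region.imageSubresource.layerCount     = 1;
region.imageOffset = { 0, 0, 0 };
region.imageExtent = { 1280, 720, 1 };  // non-power-of-two extents are fine

// The image must already be in TRANSFER_DST_OPTIMAL via a pipeline barrier.
vkCmdCopyBufferToImage(cmd, stagingBuffer, image,
                       VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
```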

What is the DirectX 12 equivalent of Vulkan's "transient attachment"?

I have a compute shader which I'd like to output to an image/buffer that is meant to be intermediate storage between two pipelines: a compute pipeline and a graphics pipeline. The graphics pipeline is actually a "dummy", in that it does nothing apart from copy the contents of the intermediate buffer into a swapchain image. This is necessitated by the fact that DX12 deprecated the ability of compute pipelines to use UAVs to write directly into swapchain images.
I think the intermediate storage should be a "transient" attachment, in the Vulkan sense:
VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT specifies that the memory bound to this image will have been allocated with the VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT (see Memory Allocation for more detail). This bit can be set for any image that can be used to create a VkImageView suitable for use as a color, resolve, depth/stencil, or input attachment.
This is explained in this article:
Finally, Vulkan includes the concept of transient attachments. These are framebuffer attachments that begin in an uninitialized or cleared state at the beginning of a renderpass, are written by one or more subpasses, consumed by one or more subpasses and are ultimately discarded at the end of the renderpass. In this scenario, the data in the attachments only lives within the renderpass and never needs to be written to main memory. Although we’ll still allocate memory for such an attachment, the data may never leave the GPU, instead only ever living in cache. This saves bandwidth, reduces latency and improves power efficiency.
Does DirectX 12 have a similar image usage concept?
Direct3D 12 does not have this concept. And the reason for that limitation ultimately boils down to why transient allocation exists. TL;DR: It's not for doing the kind of thing you're trying to do.
Vulkan's render pass system exists for one purpose: to make tile-based renderers first-class citizens of the rendering system. TBRs do not fit well in OpenGL or D3D's framebuffer model. In both APIs, you can just swap framebuffers in and out whenever you want.
TBRs do not render to memory directly. They perform rendering operations into internal buffers, which are seeded from memory and then possibly written to memory after the rendering operation is complete. Switching rendered images whenever you want works against this structure, which is why TBR vendors have a list of things you're not supposed to do if you want high performance in your OpenGL ES code.
Vulkan's render pass system is an abstraction of a TBR system. In the abstract model, the render pass system potentially reads data from the images in the frame buffer, then performs a bunch of subpasses on copies of this data, and at the end, potentially writes the updated data back out into the images. So from the outside of the process, it looks like you're rendering to the images, but you're not. To maintain this illusion, for the duration of a render pass, you can only use those framebuffer images in the way that the render pass model allows: as attachments.
Now consider deferred rendering. In deferred rendering, you render to g-buffers, which you then read in your lighting passes to generate the final image. Once you've generated the final image, you don't need those g-buffers anymore. In a regular GPU, that doesn't mean anything; because rendering goes directly to memory, those g-buffers must take up actual storage.
But consider how a TBR works. It does rendering into a single tile; in optimal cases, it processes all of the fragments for a single tile at once. Which means it goes through the geometry and lighting passes. For a TBR, the g-buffer is just a piece of scratch memory you use to get the final answer; it doesn't need to be read from memory or copied to memory.
In short, it doesn't need memory.
Enter lazily allocated memory and transient attachment images. They exist to allow TBRs to keep g-buffers in tile memory and never to have to allocate actual storage for them (or at least, it only happens if some runtime circumstance occurs that forces it, like shoving too much geometry at the GPU). And it only works within a render pass; if you end a render pass and have to use one of the g-buffers in another render pass, then the magic has to go away and the data has to touch actual storage.
The Vulkan API is very explicit about how specific this use case is. You cannot bind a piece of lazily-allocated memory to an image that does not have the USAGE_TRANSIENT_ATTACHMENT flag set on it (or to a buffer of any kind). And you'll notice that it says "transient attachment", as in render pass attachments. It says this because transient attachments cannot be used for non-attachment uses (part of the valid usage tests for VkImageCreateInfo). At all.
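For completeness, here is a sketch of how such a transient attachment is typically created (findMemoryType is a placeholder helper): the usage flags are restricted to attachment uses, and the memory type is chosen by its LAZILY_ALLOCATED property:

```cpp
// Image that may only ever be used as an attachment within a render pass.
const uint32_t width = 1280, height = 720;  // example dimensions
VkImageCreateInfo info = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
info.imageType   = VK_IMAGE_TYPE_2D;
info.format      = VK_FORMAT_R8G8B8A8_UNORM;
info.extent      = { width, height, 1 };
info.mipLevels   = 1;
info.arrayLayers = 1;
info.samples     = VK_SAMPLE_COUNT_1_BIT;
info.tiling      = VK_IMAGE_TILING_OPTIMAL;
info.usage       = VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT |
                   VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT |
                   VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT;  // attachment uses only
VkImage gbuffer;
vkCreateImage(device, &info, nullptr, &gbuffer);

// Back it with lazily allocated memory if the implementation offers any.
VkMemoryRequirements req;
vkGetImageMemoryRequirements(device, gbuffer, &req);
VkMemoryAllocateInfo alloc = { VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO };
alloc.allocationSize  = req.size;
alloc.memoryTypeIndex = findMemoryType(req.memoryTypeBits,  // placeholder helper
                                       VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT);
VkDeviceMemory mem;
vkAllocateMemory(device, &alloc, nullptr, &mem);
vkBindImageMemory(device, gbuffer, mem, 0);
```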
What you want to do is not the sort of thing that lazily allocated memory is made for. It can't work.
As for Direct3D 12, the API is not designed to run on mobile GPUs, and since only mobile GPUs are tile-based renderers (some recent desktop GPUs have TBR similarities, but are not full TBRs), it has no facilities designed explicitly for them. And thus, it has no need for lazily allocated memory or transient attachments.

Why do we need multiple render passes and subpasses?

I had some experience with DirectX 12 in the past, and I don't remember anything similar to render passes in Vulkan, so I can't make an analogy. If I'm understanding correctly, commands inside the same subpass don't need to be synchronized. So why complicate things and make multiple subpasses? Why can't I just take one command buffer and put all my frame-related work there?
Imagine that the GPU cannot render to images directly. Imagine that it can only render to special framebuffer memory storage, which is completely separate from regular image memory. You cannot talk to this framebuffer memory directly, and you cannot allocate from it. However, during a rendering operation, you can copy data from images into it, read data out of it into images, and of course render to this internal memory.
Now imagine that your special framebuffer memory is fixed in size, a size which is smaller than the size of the overall framebuffer you want to render to (perhaps much smaller). To be able to render to images that are bigger than your framebuffer memory, you basically have to execute all rendering commands for those targets multiple times. To avoid running vertex processing multiple times, you need a way to store the output of vertex processing stages.
Furthermore, when generating rendering commands, you need to have some idea of how to apportion your framebuffer memory. You may have to divide up your framebuffer memory differently if you're rendering to one 32-bpp image than if you're rendering to two. And how you assign your framebuffer memory can affect how your fragment shader code works. After all, this framebuffer rendering memory may be directly accessible by the fragment shader during a rendering operation.
That is the basic idea of the render pass model: you are rendering to special framebuffer memory, of an indeterminate size. Every aspect of the render pass system's complexity is based on this conceptual model.
Subpasses are the part where you determine exactly which things you're rendering to at the moment. Because this affects framebuffer memory arrangement, graphics pipelines are always built by referring to a subpass of a render pass. Similarly, secondary command buffers that are to be executed within a subpass must provide the subpass they will be used within.
When a render pass instance begins execution on a queue, it (conceptually) copies the attachment images we intend to render to into framebuffer rendering memory. At the end of the render pass, the data we render is copied back out to the attachment images.
During the execution of a render pass instance, the data for attachment images is considered "indeterminate". While the model says that we're copying into framebuffer rendering memory, Vulkan doesn't want to force implementations to actually copy stuff if they directly render to images.
As such, Vulkan merely states that no operation can access images that are being used as attachments, except for those which access the images as attachments. For example, you cannot read an attachment image as a texture. But you can read from it as an input attachment.
This is a conceptual description of the way tile-based renderers work. And this is the conceptual model that is the foundation of the Vulkan render pass architecture. Render targets are not accessible memory; they're special things that can only be accessed in special ways.
You can't "just" read from a G-buffer because, while you're rendering to that G-buffer, it exists in special framebuffer memory that isn't in the image yet.
Both features primarily exist for tile-based GPUs, which are common in mobile but, historically, uncommon on desktop computers. That's why DX12 doesn't have an equivalent, and Metal (iOS) does. Though both Nvidia's and AMD's recent architectures do a variant of tile-based rendering now also, and with the recent Windows-on-ARM PCs using Qualcomm chips (tile-based GPU), it will be interesting to see how DX12 evolves.
The benefit of render passes is that during pixel shading, you can keep the framebuffer data in on-chip memory instead of constantly reading and writing external memory. Caches help some, but without reordering pixel shading, the cache tends to thrash quite a bit since it's not large enough to store the entire framebuffer. A related benefit is you can avoid reading in previous framebuffer contents if you're just going to completely overwrite them anyway, and avoid writing out framebuffer contents at the end of the render pass if they're not needed after it's over. In many applications, tile-based GPUs never have to read and write depth buffer data or multisample data to or from external memory, which saves a lot of bandwidth and power.
Subpasses are an advanced feature that, in some cases, allow the driver to effectively merge multiple render passes into one. The goal and underlying mechanism is similar to the OpenGL ES Pixel Local Storage extension, but the API is a bit different in order to allow more GPU architectures to support it and to make it more extensible / future-proof. The classic example where this helps is with basic deferred shading: the first subpass writes out gbuffer data for each pixel, and later subpasses use that to light and shade pixels. Gbuffers can be huge, so keeping all of that on-chip and never having to read or write it to main memory is a big deal, especially on mobile GPUs which tend to be more bandwidth- and power-constrained.
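A condensed sketch of the classic deferred example from the last paragraph: one render pass whose first subpass writes a g-buffer attachment and whose second subpass consumes it as a per-pixel input attachment (a single g-buffer attachment is shown; a real deferred renderer would have several, plus depth):

```cpp
VkAttachmentDescription attachments[2] = {};
// 0: g-buffer - cleared on entry, never stored to memory (pairs with transient images).
attachments[0].format        = VK_FORMAT_R8G8B8A8_UNORM;
attachments[0].samples       = VK_SAMPLE_COUNT_1_BIT;
attachments[0].loadOp        = VK_ATTACHMENT_LOAD_OP_CLEAR;
attachments[0].storeOp       = VK_ATTACHMENT_STORE_OP_DONT_CARE;
attachments[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
attachments[0].finalLayout   = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
// 1: final image - stored so it can be presented afterwards.
attachments[1].format        = VK_FORMAT_B8G8R8A8_UNORM;
attachments[1].samples       = VK_SAMPLE_COUNT_1_BIT;
attachments[1].loadOp        = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
attachments[1].storeOp       = VK_ATTACHMENT_STORE_OP_STORE;
attachments[1].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
attachments[1].finalLayout   = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

VkAttachmentReference gbufWrite = { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
VkAttachmentReference gbufRead  = { 0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };
VkAttachmentReference finalOut  = { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

VkSubpassDescription subpasses[2] = {};
subpasses[0].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpasses[0].colorAttachmentCount = 1;
subpasses[0].pColorAttachments    = &gbufWrite;  // subpass 0 writes the g-buffer
subpasses[1].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpasses[1].inputAttachmentCount = 1;
subpasses[1].pInputAttachments    = &gbufRead;   // subpass 1 reads it per-pixel
subpasses[1].colorAttachmentCount = 1;
subpasses[1].pColorAttachments    = &finalOut;

// Dependency: g-buffer writes in subpass 0 must finish before reads in subpass 1.
VkSubpassDependency dep = {};
dep.srcSubpass      = 0;
dep.dstSubpass      = 1;
dep.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dep.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
dep.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;  // allows tile-local execution

VkRenderPassCreateInfo rp = { VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO };
rp.attachmentCount = 2;  rp.pAttachments  = attachments;
rp.subpassCount    = 2;  rp.pSubpasses    = subpasses;
rp.dependencyCount = 1;  rp.pDependencies = &dep;
VkRenderPass renderPass;
vkCreateRenderPass(device, &rp, nullptr, &renderPass);
```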

Does Vulkan allow draw commands in memory buffers?

I've mainly used DirectX in my 3D programming; I'm just learning Vulkan now.
Is this correct:
In Vulkan, a draw call (an operation that causes a group of primitives to be rendered, indexed or non-indexed) can only end up in a command stream by recording a draw call while building that command stream. If you want to draw 3 objects using different vertex or index buffers (/offsets), you will, in the general case, make 3 API calls.
In d3d12, instead, the arguments for a draw call can come from a GPU memory buffer, filled in any of the ways buffers are filled, including using the GPU.
I'm aware of (complex) ways to essentially draw separate models as one batch, even on API's older than dx12. And of course drawing repeated geometry without multiple drawcalls is trivial.
But the straightforward "write draw commands into GPU memory like you write other data into GPU memory" feature is only available on DX12, correct?
Indirect drawing is a thing in Vulkan. It does require that a single vertex buffer contain all the data, but you don't need to draw all of the buffer in each call.
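A sketch of that core indirect path (commandBuffer and indirectBuffer are placeholders): the draw parameters live in a VkBuffer created with VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT and can be written by the CPU or by a compute shader. Issuing more than one draw per call requires the multiDrawIndirect device feature:

```cpp
// Parameters for each draw, laid out in GPU-readable memory. Any writer works:
// a mapped upload, vkCmdUpdateBuffer, or a compute shader.
VkDrawIndexedIndirectCommand cmds[3] = {
    // indexCount, instanceCount, firstIndex, vertexOffset, firstInstance
    { 36, 1,  0,    0, 0 },  // object 0
    { 24, 1, 36,  100, 0 },  // object 1: different index/vertex offsets
    { 60, 1, 60,  200, 0 },  // object 2
};
// ... upload cmds[] into 'indirectBuffer' (usage VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT) ...

// One API call issues all three draws; the vertex/index bindings stay fixed,
// which is why the data for all draws must live in the bound buffers.
vkCmdDrawIndexedIndirect(commandBuffer, indirectBuffer, /*offset*/ 0,
                         /*drawCount*/ 3,
                         /*stride*/ sizeof(VkDrawIndexedIndirectCommand));
```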
There is also an extension that allows you to build a set of drawing commands in GPU memory; it additionally allows binding different descriptor sets and vertex buffers between draws.