I'm using WebGL (≈ OpenGL ES 2.0). At the moment, my application uses texture unit 0 as a per-model texture, and 1 and 2 as constant textures used by the shader.
In order to load and set up a texture, I do
var texture = gl.createTexture();        // new texture object
gl.bindTexture(gl.TEXTURE_2D, texture);  // bind it so the following calls affect it
gl.texImage2D(...);                      // upload the pixel data
gl.texParameteri(...);                   // set filtering/wrapping
However, this stomps on the state of which texture is bound to the current texture unit. In order to avoid this problem, I added
gl.activeTexture(gl.TEXTURE0);
to the beginning of the above code, so that it only affects the binding for texture unit 0, which is reset before drawing any geometry anyway.
Is there a better approach, i.e. one that does not involve happening to have a “scratch” texture unit available? Is there an established best practice for this state management problem?
For me this issue falls into the same category as a few others: what can you assume about the GL state at certain points in the program?
AFAICT you basically have two options:
assume nothing, and set the state the way you need it, i.e. call glActiveTexture() as you are currently doing (sketched below)
set rules for what state the GL should be in at certain points in your program, e.g. for your case, have the drawing code call glActiveTexture(gl.TEXTURE0) when it is done, so the pre-drawing parts can assume it's set that way.
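A minimal sketch of the first option, written against the OpenGL ES 2.0 C API (WebGL's gl.* calls map one-to-one); the helper name and the choice of unit 0 as scratch are my own assumptions, and note that the equivalent gl.getParameter() round-trips can be slow in WebGL, which is an argument for the second option:
#include <GLES2/gl2.h>
// Sketch: assume nothing about the current state; save and restore whatever
// we are about to disturb.
GLuint createTexturePreservingState()
{
    GLint prevUnit = 0, prevBinding = 0;
    glGetIntegerv(GL_ACTIVE_TEXTURE, &prevUnit);        // remember the active unit
    glActiveTexture(GL_TEXTURE0);                       // use unit 0 as "scratch"
    glGetIntegerv(GL_TEXTURE_BINDING_2D, &prevBinding); // remember its binding

    GLuint texture = 0;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    // glTexImage2D(...); glTexParameteri(...);  -- as in the question

    glBindTexture(GL_TEXTURE_2D, (GLuint)prevBinding);  // restore the binding
    glActiveTexture((GLenum)prevUnit);                  // restore the active unit
    return texture;
}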
But really, don't overthink it. In OpenGL you optimise your program by maximising (within certain practical limits) the ratio of polygons to function calls, which you can do just as well by passing lots of polygons to your glDrawArrays() calls etc.
Does changing the blend function in Metal need setting a whole new MTLRenderPipelineState?
I assume YES, because MTLRenderPipelineState is immutable, so I cannot change its descriptor's properties, for example sourceRGBBlendFactor. But I wanted to confirm, because generating a large object just to change a single parameter sounds a little inefficient.
Edit:
I am thinking about a case where I draw one vertex buffer holding a series of meshes, with multiple calls to -drawPrimitives:. Each mesh can use a different blend mode, but all use the same vertex and fragment shaders. In OpenGL I could switch glBlendFunc() between the draw calls. In Metal I need to set a whole separate MTLRenderPipelineState carrying several state values.
Some objects in Metal are designed to be transient and extremely lightweight, while others are more expensive and can last for a long time, perhaps for the lifetime of the app.
Command buffer and command encoder objects are transient and designed for a single use. They are very inexpensive to allocate and deallocate, so their creation methods return autoreleased objects.
MTLRenderPipelineState is not transient.
Does changing the blend function in Metal need setting a whole new MTLRenderPipelineState?
Yes, you must create a whole new MTLRenderPipelineState object for each blending configuration.
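The usual pattern is therefore to build one pipeline state per blend configuration up front (or cache them lazily) and switch between them in the encoder; creation is the expensive part, switching is cheap. A sketch, assuming the metal-cpp C++ bindings (the helper name, pixel format, and parameters are illustrative):
#include <Metal/Metal.hpp>  // metal-cpp
// Sketch: create one immutable pipeline state per blend configuration at
// load time; cache the results and reuse them every frame.
MTL::RenderPipelineState* makePipeline(MTL::Device* device,
                                       MTL::Function* vs, MTL::Function* fs,
                                       MTL::BlendFactor srcRGB,
                                       MTL::BlendFactor dstRGB)
{
    MTL::RenderPipelineDescriptor* desc =
        MTL::RenderPipelineDescriptor::alloc()->init();
    desc->setVertexFunction(vs);
    desc->setFragmentFunction(fs);

    MTL::RenderPipelineColorAttachmentDescriptor* att =
        desc->colorAttachments()->object(0);
    att->setPixelFormat(MTL::PixelFormatBGRA8Unorm);  // assumed format
    att->setBlendingEnabled(true);
    att->setSourceRGBBlendFactor(srcRGB);
    att->setDestinationRGBBlendFactor(dstRGB);

    NS::Error* error = nullptr;
    MTL::RenderPipelineState* pso =
        device->newRenderPipelineState(desc, &error);
    desc->release();
    return pso;  // switch between cached states with setRenderPipelineState()
}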
I have thought of two ways to write my OpenGL ES 2.0 code.
First, I issue many draw calls to put the elements on the screen, using many VAOs and VBOs, or a single VAO and many VBOs.
Second, I collect the coordinates of all elements in one list, write all of the resulting vertices into a single VAO and a single VBO, and draw all the vertices at once.
Which is the better way to follow?
These are the approaches I have thought of; what other ways are there?
The VAO is meant to save you some setup calls when setting the vertex attributes pointers and enabling/disabling the pipeline states related to that setup. Having just one VAO isn't saving you anything, because you will repeatedly re-bind the vertex buffers and change some settings. So you should aim to have multiple VAOs, one per "static" rendering batch, but not necessarily one per object drawn.
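A sketch of the "one VAO per static batch" setup, using the core-profile names (on ES 2.0 the same entry points come from the OES_vertex_array_object extension with an OES suffix; the two-attribute layout here is an assumption):
#include <GLES3/gl3.h>
// Sketch: record the vertex setup for one batch into a VAO once; afterwards
// a single glBindVertexArray() replays the whole setup at draw time.
GLuint makeBatchVAO(GLuint vbo, GLsizei stride)
{
    GLuint vao = 0;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);  // attribute 0: position (assumed layout)
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)0);
    glEnableVertexAttribArray(1);  // attribute 1: texcoord
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, stride,
                          (void*)(3 * sizeof(float)));

    glBindVertexArray(0);
    return vao;
}
// Per frame, per batch:
//   glBindVertexArray(vao);
//   glDrawArrays(GL_TRIANGLES, 0, vertexCount);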
As to having all vertices in single VBO or many VBOs - that really depends on the task.
Having all the data in a single VBO has no benefit if you draw it all in many calls. But there's also no point in allocating one VBO per sprite. It's always about the balance between the costs of the different calls needed to set up the pipeline, so ideally you try different approaches and decide what's best in your particular case.
There might be restrictions on buffer sizes, and there are definitely "reasonable" sizes preferred by specific implementations. I remember issues with old Intel drivers where rendering a portion of the buffer would process the entire buffer, skipping the unneeded vertices.
I'm trying to draw scenes on top of each other. My way of doing so in OpenGL was to draw one scene, clear the depth-stencil buffer, and then draw the next scene.
To translate my previous approach to the Vulkan world, I would start a render pass, draw one scene, clear the depth-stencil attachment (vkCmdClearAttachments), draw the next scene, and end the render pass. But several questions come to mind:
Is there a better strategy for this (e.g. starting a new render pass for each scene)?
I found vkCmdClearAttachments, but I'm not sure whether it needs synchronization with the previous and next commands or not.
The "optimal" way to do this, whether in Vulkan or OpenGL, is to partition your depth buffer into two regions: one for each scene. You use the viewport transform for partitioning the depth buffer. One scene would get the [0.5, 1] range, while the next would get the [0, 0.5] range. glDepthRange is what you would use to set this in OpenGL, and the min/maxDepth fields of VkViewport handle the similar transform in Vulkan.
If for some reason you can't partition your depth buffer (perhaps because both scenes need the extra precision of small floating-point values up close), then which alternative is more optimal will almost certainly depend on the underlying hardware. On a tile-based renderer, I would imagine that vkCmdClearAttachments is not good news as far as performance is concerned. But for a traditional renderer, such a clear call is hardly a performance problem, and the cost of having to begin a new render pass would be the main thing to avoid.
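If you do go the clear route, here is a sketch of the call. Within a render pass, vkCmdClearAttachments is ordered like a draw command, so no extra barrier against the surrounding draws in the same subpass should be needed (the helper is illustrative):
#include <vulkan/vulkan.h>
// Sketch: clear only the depth aspect between the two scenes, inside the
// same render pass.
void clearDepthBetweenScenes(VkCommandBuffer cmd, VkExtent2D extent)
{
    VkClearAttachment clear{};
    clear.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;  // depth only, keep color
    clear.clearValue.depthStencil = {1.0f, 0};

    VkClearRect rect{};
    rect.rect.offset = {0, 0};
    rect.rect.extent = extent;
    rect.baseArrayLayer = 0;
    rect.layerCount = 1;

    vkCmdClearAttachments(cmd, 1, &clear, 1, &rect);
}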
Let me describe the scenario: we have several meshes using the same shader (one material type, e.g. a PBR material), but the meshes' materials differ in the uniform buffers and textures used to render them.
For uniform buffers there is the dynamic uniform buffer technique, where uniform buffer offsets can be specified per draw in the command buffer; but for images I have so far not found a way of specifying a descriptor set's image view in the command buffer. In all the sample code I have seen so far, there is a new pipeline, new descriptor sets, etc. for every mesh and every material of that mesh.
I think that is not the best way; there must be a way to have only one pipeline, descriptor set, etc. per material type and change only the uniform buffer offset, texture image view, and sampler. Am I right?
If I'm wrong, are these samples doing it the best way?
How should I specify VkDescriptorPoolCreateInfo.maxSets (or other limits like that) for a dynamic scene where meshes are added and removed every minute?
Update:
I think it is possible to have the same pipeline and descriptor set layout for all of the objects, but the problem with VkDescriptorPoolCreateInfo.maxSets (and other limits like that) and the question of best practice remain.
It is not a duplicate:
I was looking for a way of specifying textures similar to what dynamic uniform buffers allow (to reduce the number of descriptor sets), and along with this question came complementary questions, mostly about best practices for whatever approach an answer would suggest.
You have many options.
The simplest mechanism is to divide your descriptor set layout into sets based on the frequency of changes. Things that change per-scene would be in set 0, things that change per-kind-of-object (character, static mesh, etc), would be in set 1, and things that change per-object would be in set 2. Or whatever. The point is that the things that change with greater frequency go in higher numbered sets.
This texture is per-object, so it would be in the highest numbered set. So you would give each object its own descriptor set containing that texture, then apply that descriptor set when you go to render.
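A sketch of what the recording then looks like, assuming the set numbering above, a dynamic uniform buffer for per-object data, and a hypothetical Object record: set 0 is bound once per scene, set 2 is rebound per object, with the object's dynamic offset passed at bind time:
#include <vulkan/vulkan.h>
#include <vector>

struct Object {                     // hypothetical per-object record
    VkDescriptorSet descriptorSet;  // set 2: this object's texture
    uint32_t uniformOffset;         // offset into one big dynamic UBO
    uint32_t vertexCount;
    uint32_t firstVertex;
};

void recordScene(VkCommandBuffer cmd, VkPipelineLayout layout,
                 VkDescriptorSet sceneSet, const std::vector<Object>& objects)
{
    vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
                            0, 1, &sceneSet, 0, nullptr);        // set 0 once

    for (const Object& obj : objects) {
        vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
                                2, 1, &obj.descriptorSet,        // set 2 per object
                                1, &obj.uniformOffset);
        vkCmdDraw(cmd, obj.vertexCount, 1, obj.firstVertex, 0);
    }
}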
As for VkDescriptorPoolCreateInfo.maxSets, you set that to whatever you feel is appropriate for your system. And if you run out, you can always create another pool; nobody's forcing you to use just one.
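A sketch of that "just create another pool" idea: keep a list of pools and, when an allocation fails (e.g. with VK_ERROR_OUT_OF_POOL_MEMORY), add a fresh pool and retry; the pool sizes here are arbitrary:
#include <vulkan/vulkan.h>
#include <vector>
// Sketch: grow by adding pools instead of trying to guess maxSets perfectly.
VkDescriptorSet allocateSet(VkDevice device,
                            std::vector<VkDescriptorPool>& pools,
                            VkDescriptorSetLayout layout)
{
    VkDescriptorSetAllocateInfo ai{VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO};
    ai.descriptorSetCount = 1;
    ai.pSetLayouts = &layout;

    VkDescriptorSet set = VK_NULL_HANDLE;
    if (!pools.empty()) {
        ai.descriptorPool = pools.back();
        if (vkAllocateDescriptorSets(device, &ai, &set) == VK_SUCCESS)
            return set;
    }

    // Current pool exhausted (or none yet): create another one and retry.
    VkDescriptorPoolSize size{VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 1024};
    VkDescriptorPoolCreateInfo pi{VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO};
    pi.maxSets = 1024;                  // arbitrary batch size; tune as needed
    pi.poolSizeCount = 1;
    pi.pPoolSizes = &size;
    VkDescriptorPool pool = VK_NULL_HANDLE;
    vkCreateDescriptorPool(device, &pi, nullptr, &pool);
    pools.push_back(pool);

    ai.descriptorPool = pool;
    vkAllocateDescriptorSets(device, &ai, &set);
    return set;
}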
However, this is only one option. You can also employ array textures or arrays of textures (depending on your hardware capabilities). In either method, you have an array of different images (either as a single arrayed image view or as multiple views bound to the same arrayed descriptor). Your per-object uniform data would carry that object's texture index, so the shader can use it to fetch from the array texture/array of textures.
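A sketch of the "arrays of textures" variant at the layout level: a single binding whose descriptorCount is the array size (the count of 64, the set number, and the stage flags are assumptions; check limits such as maxPerStageDescriptorSampledImages):
#include <vulkan/vulkan.h>
// Sketch: one binding holding an array of combined image samplers. In GLSL
// this pairs with something like
//   layout(set = 1, binding = 0) uniform sampler2D uTextures[64];
// and the per-object uniform data carries an index into it.
VkDescriptorSetLayout makeTextureArrayLayout(VkDevice device)
{
    VkDescriptorSetLayoutBinding binding{};
    binding.binding         = 0;
    binding.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    binding.descriptorCount = 64;   // arbitrary; respect device limits
    binding.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    VkDescriptorSetLayoutCreateInfo ci{
        VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO};
    ci.bindingCount = 1;
    ci.pBindings    = &binding;

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &ci, nullptr, &layout);
    return layout;
}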
I've got some OpenGL drawing code that I'm trying to optimize. It's currently testing all drawing objects for visibility client-side before deciding whether or not to send rendering data to OpenGL. (This is easier than it sounds. It's drawing a 2D scene so clipping is trivial: just test against the current coordinates of the viewport rectangle.)
It occurs to me that the entire model could be greatly simplified by passing the entire scene to OpenGL and letting the GPU take care of the clipping. But sometimes the scene can be very, very complex, involving up to 100,000 sprites in total, most of which never get rendered because they're off-camera, and I'd prefer not to kill the framerate in the name of simplicity.
I'm using OpenGL 2.0, and I've got a pretty simple vertex shader and a much more complicated fragment shader. Is there any guarantee that says that if the vertex shader runs and determines coordinates that are completely off-camera for all vertices of a polygon, that a clipping test will be applied somewhere between there and the fragment shader and prevent the fragment shader from ever running for that polygon? And if so, is this automatic or is there something I need to do to enable it? I've looked around online for information on this but I haven't found anything conclusive...
Clipping happens after the vertex transform stage, around NDC space: clip planes are applied in clip space, and viewport clipping is done in NDC space. That is one step before rasterization. Clipping means that a face which is only partially visible is "cut" by inserting new vertices at the visibility border, or that fragments outside the viewport are discarded. What you mean is usually called culling: faces completely outside the viewport are culled, at the same stage as clipping.
From a performance point of view, the best code is code that is never executed, and the best data is data that is never accessed. So in your case, sending off a single drawing call that makes the GPU process a large batch of vertices clearly takes load off the CPU, but it consumes GPU processing power. Culling those vertices before sending the drawing command consumes CPU power, but takes load off the GPU. The goal is to find the right balance. If the number of vertices is low, a simple brute-force approach (just render the whole thing) may easily outperform every other scheme.
However, a simple yet effective data management scheme can greatly improve performance on both ends. For example, a spatial subdivision structure like a Kd tree is easily built (you don't have to balance it). Having sorted the vertices into the Kd tree, you can omit (cull) large portions of the tree whenever a branch near the root is completely outside the viewport. When preparing a frame, you iterate through the visible parts of the tree, build the list of vertices to draw, and pass this list to the rendering command. A Kd tree can be built in O(n log n) time, and a culling traversal only ever visits the visible branches.
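A sketch of the traversal idea for a 2D scene, with a hypothetical Kd-node type: skip any branch whose bounding box misses the viewport, and collect the sprites of the visible nodes for one batched draw:
#include <vector>

struct Rect { float x0, y0, x1, y1; };   // axis-aligned box, hypothetical

static bool overlaps(const Rect& a, const Rect& b)
{
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

struct KdNode {
    Rect bounds;                  // bounds of everything in this subtree
    std::vector<int> sprites;     // sprite indices stored at this node
    KdNode* left = nullptr;
    KdNode* right = nullptr;
};

// Walk the tree; one failed overlap test culls an entire branch.
void collectVisible(const KdNode* node, const Rect& viewport,
                    std::vector<int>& out)
{
    if (!node || !overlaps(node->bounds, viewport))
        return;
    out.insert(out.end(), node->sprites.begin(), node->sprites.end());
    collectVisible(node->left, viewport, out);
    collectVisible(node->right, viewport, out);
}
// The collected indices become the vertex list for the drawing command.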
It's important to understand the difference between clipping and culling. You appear to be talking about the latter.
Clipping means taking a triangle and literally cutting it into pieces to fit into the viewport. The OpenGL specification defines this process to happen post-vertex shader, for any triangle that is only partially in view.
Culling means throwing something away entirely. If a triangle is not entirely in view, it can therefore be culled. OpenGL does not say that culling has to happen. Remember: the OpenGL specification defines behavior, not performance.
That being said, hardware makers are not stupid. Obvious efforts like not rasterizing triangles that are outside of the viewport are easily implemented and improve performance. Pretty much any hardware that exists will do this.
Similarly, clipping is typically implemented (where possible) with rasterizer tricks, rather than by creating new triangles. Fragments that would be outside of the viewport simply aren't generated by the rasterizer. This is also legal according to OpenGL, because the spec defines apparent behavior. It doesn't really care whether you actually cut the triangle into pieces, as long as the result is indistinguishable from if you did.
Your question is essentially one of, "How much work should I do to not render off-screen objects?" That really depends on what your scene is and how you're rendering it. You say you're rendering 100,000 sprites. Are you making 100,000 draw calls, or are these sprites part of larger structures that you render with larger granularity? Do you stream the vertex data to the GPU every frame, or is the vertex data static?
Clipping and culling happen before fragment processing. http://www.opengl.org/wiki/Rendering_Pipeline_Overview
However, you will still be passing 100,000 × 4 vertices (assuming you're rendering the sprites with quads and not point sprites) to the card if you don't do the culling yourself. Depending on the card's memory performance, this can be an issue.