I'm currently programming an OpenGL ES 2.0 application on both iOS and Android platforms.
In this application I render multiple meshes that all use VBOs. In the process of optimizing the rendering, I realized that the meshes I render share two vertex formats. So I wanted to do the following:
First, set up all vertex attribute pointer offsets, and then simply bind each VBO that uses this vertex format and render it without calling glVertexAttribPointer again.
But it gives me strange results.
My question is: do we have to call glVertexAttribPointer each time we bind a new VBO?
First of all, like all OpenGL state, the state set with glVertexAttribPointer remains unchanged until someone calls glVertexAttribPointer again for the same attribute index. But the important point is that glVertexAttribPointer does not just store a buffer offset that is later applied to whatever VBO happens to be bound when you call glDraw.... It also stores the buffer object that was bound to GL_ARRAY_BUFFER at the time of the glVertexAttribPointer call.
So yes, whenever you want your vertex data sourced from another VBO, you need to bind that VBO and make the appropriate glVertexAttribPointer calls while it is bound. While this may seem cumbersome in your case, it is in fact a good thing. This way you don't need to worry about the currently bound buffer when rendering something, only about what was set up with glVertexAttribPointer. Even more importantly, it lets you bind a different VBO before each glVertexAttribPointer call, so you can source different vertex attributes from different VBOs in a single draw call (how else would you do that?).
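To make the capture behaviour concrete, here is a minimal toy model in Python. It is not real OpenGL: the class ToyGL, its method names and the buffer names "vbo_a" / "vbo_b" are all made up for illustration. It only mimics the rule described above, namely that the pointer call records the buffer bound at that moment.

```python
class ToyGL:
    """Toy model of the vertex-array part of GL state (not real OpenGL)."""

    def __init__(self):
        self.array_buffer = None   # what glBindBuffer(GL_ARRAY_BUFFER, ...) sets
        self.attribs = {}          # index -> (buffer, offset), set by the pointer call

    def bind_buffer(self, vbo):
        # Only changes the binding point; no attribute is touched.
        self.array_buffer = vbo

    def vertex_attrib_pointer(self, index, offset):
        # Captures the buffer bound right now together with the offset.
        self.attribs[index] = (self.array_buffer, offset)

    def draw_source(self, index):
        # A draw reads from the captured buffer, not from the current binding.
        return self.attribs[index]


gl = ToyGL()
gl.bind_buffer("vbo_a")
gl.vertex_attrib_pointer(0, 0)     # position at offset 0 of vbo_a
gl.vertex_attrib_pointer(1, 12)    # normal at offset 12 of vbo_a

gl.bind_buffer("vbo_b")            # bind only, no new pointer calls
stale = gl.draw_source(0)          # attribute 0 still points into vbo_a

gl.vertex_attrib_pointer(0, 0)     # re-specify while vbo_b is bound
gl.vertex_attrib_pointer(1, 12)
fresh = gl.draw_source(0)          # attribute 0 now points into vbo_b
```

In this model, binding "vbo_b" alone leaves both attributes pointing at "vbo_a", which would match the strange results described in the question.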
EDIT: You can, however, use Vertex Array Objects to ease the process of setting up your vertex data. They encapsulate all the state necessary for rendering from a set of arrays (that is, everything changed by glVertexAttribPointer and gl(En/Dis)ableVertexAttribArray, plus the buffer bound to GL_ELEMENT_ARRAY_BUFFER, but, as said, not the buffer bound to GL_ARRAY_BUFFER). You still have to bind the right buffer before calling glVertexAttribPointer, of course. But with a VAO you only need this code in some setup routine, and all you need to do for rendering is call glBindVertexArray. I don't know whether your particular ES device supports them, though.
A cool resource I found on drawing with VAOs / VBOs:
http://www.arcsynthesis.org/gltut/Positioning/Tutorial%2005.html#d0e4720
It shows how you can drive several objects, with several VAOs and a single VBO for example (each VAO holding pointers with different offsets to the same VBO)
Definitely worth a look, not least for the part where you learn that glBindBuffer(GL_ARRAY_BUFFER... doesn't bind any data to the VAO; it just sets a global binding used by the following glVertexAttribPointer calls, which do the actual data binding.
BUT glBindBuffer(GL_ELEMENT_ARRAY_BUFFER... DOES bind/save the element array into the current VAO.
So I basically have some fair knowledge of OpenGL 4.0. In OpenGL you can render the same object in many places. This is a technique called instancing. It saves CPU-side draw calls.
I wanted to do this in Godot. So I looked in the docs, and they basically just tell me to duplicate an object. But I think this does not save the CPU calls to the GPU the way instancing does (please let me know if I'm wrong about this).
Plus, I cannot have all the nodes beforehand, because the number of times I need to render the object (at different places) is determined at runtime and can change.
Is there a solution to this?
Any help would be appreciated.
Thank you
Instancing can be thought of as making copies of an object from a blueprint. The reason it saves memory and draw calls is that essentially, only the "blueprint" must be kept in memory. The recommended way that Godot addresses this (as per the documentation) is through (packed) scenes.
To do this, create the object as its own scene. Remember that you can right-click on the root node of a scene (even an empty one) and change the type to whatever you want. Once you have the object set up the way you like, save it as its own scene (e.g. myInstance.tscn).
From there, you can call upon the instance from your main scene (or whatever scene you need it in). To do this you need to do a couple of things:
First, create a variable for your instance in the script you want to call it from by declaring something like onready var instancedObject = preload("res://myInstance.tscn"). (Using whatever path you used for the scene).
From there, you call the variable from whatever function you need by writing something like: var myObject = instancedObject.instance()
You then must add the instance to the current scene with add_child(myObject)
After this, you can (optionally) specify things like transforms and rotations to determine where the instance gets put (Ex: myObject.transform.origin = Vector3(0,10,0) - For 3D, or myObject.position = Vector2(10,0) for 2D)
Alternatively, you can load and instance the object at the same time by writing onready var instancedObject = preload("res://myInstance.tscn").instance(), and then add it in a function with add_child(instancedObject). Although this requires fewer steps, it has limitations (the variable holds a single instance, so it can only be added to the scene once), and I personally have had much more success with the first approach.
If, however, you are looking to instance multiple thousands of objects (or more) in the same scene, I recommend following Calinou's answer and using a MultiMeshInstance. However, one of the limitations of the MultiMeshInstance is that it takes an all-or-nothing approach to drawing: the instances are either all drawn at once or not drawn at all. There is no in-between. This could be good or bad depending on what you need it for.
Hope this helps.
Instancing in Godot is handled using the MultiMeshInstance node. It's the instanced counterpart to MeshInstance. See Optimization using MultiMeshes in the documentation for more information.
Keep in mind that MultiMeshes aren't suited if you need to move the objects in different directions every frame (although you can achieve this by using INSTANCE_ID in a shader shared among all instances). MultiMeshInstance lets you change how many instances are visible by setting its visible_instance_count property.
I'm learning Metal, and there's a conceptual question that I'm trying to wrap my head around: at what level, exactly, should my code handle successive drawing operations that require different pipeline states? As I understand it (from answers like this: https://stackoverflow.com/a/43827775/2752221), I can use a single MTLRenderCommandEncoder and change its pipeline state, the vertex buffer it's using, etc., between calls to drawPrimitives:, and the encoder state that was current at the time of each call to drawPrimitives: will be preserved. So that's great. But it also seems like the design of Metal is such that one can make multiple MTLRenderCommandEncoder instances, and use them to sequentially throw batches of commands into a MTLCommandBuffer. Given that the former works – using one MTLRenderCommandEncoder and changing its state – why would one do the latter? Under what circumstances is it correct to do the former, and under what circumstances is it necessary to do the latter? What is an example of a situation where the latter would be necessary/appropriate?
If it matters, I'm working on a macOS app, using Objective-C. Thanks.
Ignoring multithreaded encoding cases, which are somewhat advanced, the main reason you'd want to create multiple render command encoders during a frame is because you need to change which textures you're rendering to.
You'll notice that you need to provide a render pass descriptor when creating a render command encoder. For this reason, we often say that the sequence of commands belonging to a particular encoder constitutes a render pass. The attachments of that descriptor refer to the textures that will be written to by the commands encoded by the encoder.
Many different techniques, including shadow mapping and postprocessing effects like bloom, require multiple passes to produce. Since you can't change attachments in the midst of a pass, creating a new encoder is the only way to encode multiple passes in a frame.
Relatedly, you should ordinarily use one command buffer per frame. You can, however, sometimes reduce frame time by splitting your passes across multiple command buffers, but this is highly dependent on the shape of your workload and should only be done in tandem with profiling, as it's not always an optimization.
In addition to Warren's answer, another way to look at the question is by examining the API. A number of Metal objects are created from descriptors. The properties of the descriptor at the time an object is created from it govern that object for its lifetime. Those are aspects of the object that can't be changed after creation.
By contrast, the object will have various setter methods to modify other properties over its lifetime.
For a render command encoder, the properties that are fixed for its lifetime are those specified by the MTLRenderPassDescriptor used to create it. If you want to render with different values for any of those properties, the only way to do so is to create a new encoder from a different descriptor. On the other hand, if you can do everything you need/want to do by using the encoder's setter methods, then you don't need a new encoder.
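As a rough illustration of that rule, here is a toy planning function in Python. It is not Metal code: the Draw record, the target and pipeline names, and split_into_passes are hypothetical. It assumes the render target stands in for everything fixed by the render pass descriptor, and the pipeline name stands in for anything changeable through the encoder's setters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draw:
    target: str     # hypothetical name of the texture being rendered to
    pipeline: str   # hypothetical pipeline state name


def split_into_passes(draws):
    """Group consecutive draws that share a render target.

    Each group stands for one encoder; pipeline changes stay inside it.
    """
    passes = []
    for draw in draws:
        if not passes or passes[-1][0] != draw.target:
            passes.append((draw.target, []))   # attachments changed: new encoder
        passes[-1][1].append(draw.pipeline)    # setter-style change: same encoder
    return passes


frame = [
    Draw("shadow_map", "depth_only"),
    Draw("scene_color", "opaque"),
    Draw("scene_color", "transparent"),
    Draw("drawable", "bloom_composite"),
]
passes = split_into_passes(frame)
```

With this made-up frame, the four draws end up in three passes: the two draws into "scene_color" share one encoder even though they use different pipelines, while each change of target starts a new one.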
Say I have one VkBuffer bound to every device allocation, and use appropriate combinations of vkCmdCopyBuffer to perform defragmentation of an arena block-by-block.
Say an arena may contain linear and non-linear data in any appropriately-aligned arrangement. Due to the immutability of VkImage after binding, defragmentation will involve constructing and binding new VkImages at the new locations of image data that have been moved.
None of the resources within an arena undergoing defragmentation are bound to anything, or could be considered "in-use".
This isn't difficult to implement, though I have a concern:
Is it UB to use vkCmdCopyBuffer to move an image's data (so as to avoid redundant layout transitions), then construct a new VkImage at the new location?
My thoughts are that perhaps an implementation would do something strange, like rely on absolute device addresses within some internal bookkeeping structure, making it UB to treat image data as POD until bound to a new object.
Well, let's look at this systematically.
So you find a suitable destination area for your image. You do a vkCmdCopyBuffer to copy from the source area to the destination area. Now you create a new VkImage for that destination area, and the initial layout you specify is... what?
See, there are only two valid initialLayout values in VkImageCreateInfo: undefined or preinitialized. And preinitialized only works for images that use linear tiling, since there is no well-defined layout for optimally tiled images.
So you can't use preinitialized layout. And using undefined layout means that the next image transition you use will potentially mangle whatever data is there. Now, undefined layout might work on some implementations. On implementations that don't care about layout, it might work. It also might work on implementations if the source image was in the general layout.
But none of that is guaranteed to work by the standard. As far as the standard is concerned, if you set the layout to be undefined, then the data will not be preserved. So regardless of questions of buffer/image aliasing, this can't work.
You have to create the VkImage at the destination, then use vkCmdCopyImage to copy from the source image to the destination.
It should also be noted that, even if the layout issue worked, the aliasing rules tell us that copying from non-host-accessible memory (ie: images in optimal tiling or non-general/preinitialized layout) to host-accessible memory yields undefined values. So even if the layout issue weren't a problem, the copy itself doesn't work. In theory, at least.
In Vulkan, I understand that a descriptor pool is used to allocate descriptor sets of some layout for use in a shader, but in the VkDescriptorPoolCreateInfo passed to vkCreateDescriptorPool, there is a field pPoolSizes that takes a bunch of objects containing a descriptor type and a number.
The documentation seems somewhat vague, but is this saying that a given descriptor pool can only have a certain, predetermined amount of each type of descriptor allocated from it in descriptor sets? If so, how do I determine how many I will need beforehand? What happens if it runs out?
Your understanding of descriptor pools is correct.
If so, how do I determine how many I will need beforehand?
That's up to you and your application's needs.
If your application needs to be completely flexible and freeform, then you will need to create descriptor pools dynamically as needed. If your application has greater foreknowledge of what the scene will look like, then it will need fewer such gymnastics.
Many serious Vulkan applications try to avoid having the number of descriptor sets be based on the number of objects in the scene. Push constants and/or dynamic UBO/SSBO descriptors allow different per-object state to be used without changing the descriptor itself. Textures for lots of objects can be bundled together into array textures, or depending on the hardware, arrays of textures.
In a perfect world, all meshes of a type (say, skinned meshes) could be rendered with the exact same descriptor set, using some per-object state to fetch the right matrix/texture data for that object.
But that's how they render. Such applications have firm control over the kinds of objects they render, what per-object data looks like, and so forth. Other applications may have different needs.
Vulkan is a tool; how you use it is entirely up to you.
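For the dynamic UBO approach mentioned above, the per-object offsets have to respect the device's minUniformBufferOffsetAlignment limit. The following is a small arithmetic sketch in Python rather than Vulkan code; the function names and the sizes used (80 bytes per object, 256-byte alignment) are assumptions for illustration only.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def dynamic_offsets(object_count, per_object_size, min_alignment):
    """Byte offset of each object's block inside one shared uniform buffer."""
    stride = align_up(per_object_size, min_alignment)
    return [i * stride for i in range(object_count)]


# Assumed numbers: a 4x4 float matrix (64 bytes) plus a 16-byte color per
# object, on a device that reports a 256-byte minimum offset alignment.
offsets = dynamic_offsets(object_count=4, per_object_size=80, min_alignment=256)
```

Each value in offsets is what would be passed as the dynamic offset when binding the one shared descriptor set for that object, so the descriptor itself never changes.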
What happens if it runs out?
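Then you cannot allocate more descriptors from that pool; vkAllocateDescriptorSets can fail with VK_ERROR_OUT_OF_POOL_MEMORY (Vulkan 1.1, or VK_KHR_maintenance1 on 1.0). If you need to allocate another descriptor set, you will need to create another pool.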
My approach was to have a class that initially allocates a pool with N descriptors and, if it runs out, creates another pool with N*2 entries. It keeps doubling in size. It uses a simple linked list, and when it comes to allocating, it just tries the first pool and then moves on to the next if it's full.
That's all pretty inefficient, so I also had my code fire an assert if it ever had to create a second pool, that way I can make sure I choose a value of N that's big enough so that the retail version should never have to do it (but if it does somehow manage to due to some unforeseen set of circumstances, it'll still render correctly).
At the time, I remember cursing the spec and wishing descriptor pools would auto grow like command pools do. Still I imagine there's a good reason that they are like they are.
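The doubling scheme described in this answer can be sketched as a toy model in Python. This is not the author's code and makes no Vulkan calls: ToyPool and GrowingAllocator are hypothetical names, a plain list replaces the linked list, and each pool just counts allocations against a fixed capacity.

```python
class ToyPool:
    """Stand-in for one descriptor pool with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_allocate(self):
        if self.used == self.capacity:
            return False           # models running out of pool space
        self.used += 1
        return True


class GrowingAllocator:
    def __init__(self, initial_capacity):
        self.pools = [ToyPool(initial_capacity)]

    def allocate(self):
        # Try existing pools in order, starting with the first.
        for index, pool in enumerate(self.pools):
            if pool.try_allocate():
                return index
        # All full: add a pool twice the size of the newest one.
        self.pools.append(ToyPool(self.pools[-1].capacity * 2))
        self.pools[-1].try_allocate()
        return len(self.pools) - 1


allocator = GrowingAllocator(initial_capacity=2)
pool_indices = [allocator.allocate() for _ in range(7)]
capacities = [pool.capacity for pool in allocator.pools]
```

Here allocate returns the index of the pool that served the request. A debug build could assert whenever that index is greater than zero, which is the check the answer uses to tune N.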
I think - as I googled around, searched stackoverflow, and didn't find a clear answer - we need to clarify for the future: what is the exact scope of glEnableVertexAttribArray?
What do I mean exactly? Well, we know: Uniform state is bound to the shader program. So calling glUseProgram(X) (X > 0) will also set all used uniforms to either their default value (except on ATI), or the value we provided earlier via glUniformXXX() when that same program was active. So my answer to my question like "what is the exact scope of glUniform" would be "every use of one specific shader program".
Now I have the situation that a Mesh may consist of multiple sets of buffer objects. The pseudocode of rendering looks like this:
mat = mesh->getMaterial();
mesh->overrideUniforms();
mat->prepareGL(); // calls glUseProgram, updates changed uniforms, binds textures
mesh->render();
Now mesh->render obviously deals with binding Attributes and drawing. In case a mesh has multiple sets of buffers, it'd look like that (assuming each buffer object set contains all data for all attributes/one render pass):
for_each(set_of_bufferObjects)
    bindBufferObjects
    for_each(Attribute)
        glEnableVertexAttribArray
        glVertexAttribPointer
    glDraw
If the scope of glEnableVertexAttribArray were, for example, "every use of program X", I could skip those glEnableVertexAttribArray calls, as long as I had enabled the arrays before (when program X was in use for the first time).
If on the other hand the scope was "during one specific program use", I could set them up once within mesh->render and then forget about them. This would particularly explain why I don't suffer side-effects from not disabling any VAAs.
So is anybody out there enlightened to know which piece of GL state the glEnableVertexAttribArray belongs to?
P.S.: I'm explicitly asking for gl/es 2.0, as there are no VAOs by spec! So please don't answer "just use VAOs".
This state is global. Not related to the program state at all.
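A tiny toy model in Python of what "global" means here. It is not real OpenGL: ToyContext, its methods and the program names are invented for illustration. It only assumes that the enabled flags live in the context and that switching programs leaves them alone.

```python
class ToyContext:
    """Toy model of an ES 2.0 context without VAOs (not real OpenGL)."""

    def __init__(self):
        self.program = None
        self.enabled = set()        # context-wide set of enabled attribute indices

    def use_program(self, program):
        self.program = program      # does not touch self.enabled

    def enable_vertex_attrib_array(self, index):
        self.enabled.add(index)

    def disable_vertex_attrib_array(self, index):
        self.enabled.discard(index)


ctx = ToyContext()
ctx.use_program("program_x")
ctx.enable_vertex_attrib_array(0)
ctx.enable_vertex_attrib_array(1)

ctx.use_program("program_y")        # suppose program_y only reads attribute 0
still_enabled = set(ctx.enabled)    # attribute 1 is still enabled here

ctx.disable_vertex_attrib_array(1)  # it stays enabled until explicitly disabled
```

In this model an array enabled while one program was active remains enabled under every later program, so the enable calls only need repeating when the set of arrays you want actually changes.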
Edit: I just noticed the last paragraph of your question. So to the original poster, please ignore the part below, since you did not want to hear about VAOs. ;) I'll leave it there, just in case it helps somebody else.
Full OpenGL, as well as OpenGL ES 3.0, have an additional object type called Vertex Array Object (often abbreviated as VAO). This allows you to store all the setup state for a given set of vertex buffers in an object, and switch to the set of state with a single glBindVertexArray call. If you use this feature, the scope of the state becomes the VAO.