I have a compute shader that reads a signed normalized integer image using imageLoad.
The image itself (which contains both positive and negative values) is created with format R16G16_SNORM and is written by a fragment shader in a previous G-buffer pass.
The image view bound to the descriptor set layout binding in the compute shader is also created with the same R16G16_SNORM format.
Everything works as expected.
Yesterday I realized that in the compute shader I had used the wrong image format qualifier, rg16.
A bit puzzled (I could not understand how reading through an unsigned normalized qualifier could produce correct signed values), I corrected it to rg16_snorm, and... nothing changed.
I performed several tests (I even specified rg16f) and always got the same (correct, signed [-1,1]) result.
It seems like Vulkan (at least my implementation) silently ignores the image format qualifier and falls back (I guess) to the format of the image view bound to the descriptor set.
This seems to be in line with what the spec says about format in image view creation:
format is a VkFormat describing the format and type used to interpret texel blocks in the image
but then in Appendix A (Vulkan Environment for SPIR-V, "Compatibility Between SPIR-V Image Formats And Vulkan Formats") there is a clear distinction between Rg16 and Rg16Snorm... so:
is it a bug or a feature?
I am working with an Nvidia 2070 Super under Ubuntu 20.04.
UPDATE
The initial image write happens as a fragment shader color attachment output, so no descriptor set layout binding declaration is involved. The fragment shader outputs a vec2 to the R16G16_SNORM color attachment specified by the active framebuffer and render pass.
The resulting image (after the relevant barriers) is then read (correctly, despite the wrong format qualifier) by a compute shader via imageLoad.
Note that validation layers are enabled and silent.
Note also that the resulting values are far from random, and exactly match the expected values (both positive and negative), using either rg16, rg16f or rg16_snorm.
What you're getting is undefined behavior.
There is a rule in the spec for image write operations that makes the OpTypeImage's format (equivalent to the format layout qualifier in GLSL) significant:
If the image format of the OpTypeImage is not compatible with the VkImageView’s format, the write causes the contents of the image’s memory to become undefined.
The texel input validation rules contain the matching statement for reads (which is what your imageLoad performs): if the formats are not compatible, the read returns undefined values. Note that when the spec says "compatible" here, it doesn't mean image view compatibility; it means "exactly match". Your OpTypeImage format did not exactly match that of the VkImageView, so your reads were undefined. And "undefined" can mean "works as if you had specified the correct format".
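To make "exactly match" concrete, here is a minimal sketch in Vulkan C terms (handles and names are placeholders, not taken from the question):

#include <vulkan/vulkan.h>

/* Sketch: the view is created with the exact format the shader declares.
 * "device" and "image" are assumed to be valid handles created elsewhere. */
static VkImageView createSnormView(VkDevice device, VkImage image)
{
    VkImageViewCreateInfo viewInfo = {
        .sType    = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
        .image    = image,
        .viewType = VK_IMAGE_VIEW_TYPE_2D,
        .format   = VK_FORMAT_R16G16_SNORM, /* must match the GLSL qualifier exactly */
        .subresourceRange = {
            .aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT,
            .baseMipLevel   = 0,
            .levelCount     = 1,
            .baseArrayLayer = 0,
            .layerCount     = 1,
        },
    };
    VkImageView view = VK_NULL_HANDLE;
    vkCreateImageView(device, &viewInfo, NULL, &view);
    /* The compute shader must then declare the matching qualifier:
     *   layout(set = 0, binding = 0, rg16_snorm) readonly uniform image2D img;
     * rg16 or rg16f also compiles and may even appear to work, but the
     * mismatch makes every imageLoad result undefined. */
    return view;
}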
Related
I have a case where I am writing to integer framebuffers, and I want to use logic operations when writing to pixels in the fragment shader. These are the steps I followed:
When creating the logical device, I set VkPhysicalDeviceFeatures.logicOp to VK_TRUE (so this feature is enabled).
When creating the pipeline, I set VkPipelineColorBlendStateCreateInfo.logicOpEnable to VK_TRUE and VkPipelineColorBlendStateCreateInfo.logicOp to VK_LOGIC_OP_COPY.
My framebuffer format is VK_FORMAT_R32G32B32A32_SINT
Once I render the frame, I see that nothing is getting updated in the framebuffer. Is there any step I am missing? (By the way, I don't get any validation errors.)
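For reference, a minimal sketch of the color blend state described above, in Vulkan C terms (all names are placeholders; the explicit colorWriteMask and the note about the fragment shader's output type are details not mentioned in the steps above, and either one being wrong silently produces no visible writes):

#include <vulkan/vulkan.h>

/* Sketch: color blend state with logic ops for one VK_FORMAT_R32G32B32A32_SINT
 * attachment. When logicOpEnable is VK_TRUE, blending is replaced by the logic
 * op, but each attachment's colorWriteMask still applies (a zero mask writes
 * nothing), and the fragment shader output for a SINT attachment must be
 * declared as an ivec4, not a vec4. */
static const VkPipelineColorBlendAttachmentState attachmentState = {
    .blendEnable    = VK_FALSE, /* ignored while logic ops are enabled */
    .colorWriteMask = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                      VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT,
};

static const VkPipelineColorBlendStateCreateInfo blendState = {
    .sType           = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
    .logicOpEnable   = VK_TRUE,
    .logicOp         = VK_LOGIC_OP_COPY,
    .attachmentCount = 1,
    .pAttachments    = &attachmentState,
};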
Thanks!
Using pyiron, I want to calculate the mean square displacement of the ions in my system. How do I see the total displacement (i.e. not folded back by periodic boundary conditions) without dumping very frequently and checking when an atom passes over the boundary and gets wrapped?
Try comparing job['output/generic/unwrapped_positions'][-1] with job.structure.positions + job.output.total_displacements[-1]. If they deliver the same values, either way is fine. If not, post the relevant lines of your notebook here.
I'd like to add a few comments to Jan's answer:
While job['output/generic/unwrapped_positions'] returns the unwrapped positions parsed from the output files, job.output.total_displacements returns the displacement of atoms calculated from each pair of consecutive snapshots. So if an atom moves more than half the box length in any direction, job.output.total_displacements will give wrong coordinates. Therefore, job['output/generic/unwrapped_positions'] is generally more trustworthy, but it is not available in all the codes (since some codes simply do not provide an output for unwrapped positions).
Moreover, if an interactive job is used, it is possible that job.structure.positions does not return the initial positions, i.e. job.structure.positions+job.output.total_displacements won't be initial positions + displacements.
So, in short, my answer to your question would be: use job['output/generic/unwrapped_positions'], and if it's not available, use job.structure.positions + job.output.total_displacements, but be aware of the potential problems described above.
First of all, I'm a total newbie with Vulkan (I'm using the bindings provided by LWJGL). I know I should paste more code, but I don't even know what would be relevant at this point (so don't hesitate to ask for specific pieces of code).
I am trying to do something like this:
Use a compute shader to compute a buffer of pixels.
Use vkCmdCopyBufferToImage to copy that buffer directly into a framebuffer image.
So, no vertex/fragment shaders for now.
I allocated a compute pipeline and a framebuffer. I have one {queue/command pool/command buffer} for computation and another for rendering.
When I try to submit the graphics queue with:
vkQueueSubmit(graphicQueue, renderPipeline.getFrameSubmission().getSubmitInfo(imageIndex));
I obtain the following error messages (from the validation layers):
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT bit set when device does not have geometryShader feature enabled. The spec valid usage text states 'If the geometry shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00076)
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT and/or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT bit(s) set when device does not have tessellationShader feature enabled. The spec valid usage text states 'If the tessellation shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00077)
I tried changing VkSubmitInfo.pWaitDstStageMask to different values (such as VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT...), but nothing changed.
So, what would be the best pWaitDstStageMask for my use case?
OK, I found my problem:
pWaitDstStageMask must be an array of the same size as pWaitSemaphores.
I had supplied only one stage mask for two semaphores, so the driver read an uninitialized second entry, which presumably contained the geometry/tessellation stage bits reported by the validation layers.
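A sketch of the corrected submit in Vulkan C terms (the LWJGL builders mirror these fields; the semaphore names and stage choices are assumptions, not taken from the question):

#include <vulkan/vulkan.h>

/* Sketch: one pWaitDstStageMask entry per wait semaphore. */
static void submitFrame(VkQueue graphicsQueue, VkCommandBuffer commandBuffer,
                        VkSemaphore computeDoneSemaphore,
                        VkSemaphore imageAvailableSemaphore,
                        VkSemaphore renderDoneSemaphore)
{
    VkSemaphore waitSemaphores[2] = { computeDoneSemaphore, imageAvailableSemaphore };
    VkPipelineStageFlags waitStages[2] = {
        VK_PIPELINE_STAGE_TRANSFER_BIT,                /* the buffer-to-image copy waits here */
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, /* writes to the acquired image wait here */
    };
    VkSubmitInfo submitInfo = {
        .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount   = 2, /* pWaitDstStageMask must have exactly this many entries */
        .pWaitSemaphores      = waitSemaphores,
        .pWaitDstStageMask    = waitStages,
        .commandBufferCount   = 1,
        .pCommandBuffers      = &commandBuffer,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores    = &renderDoneSemaphore,
    };
    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
}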
I have read (after running into the limitation myself) that for copying data from the host to a VK_IMAGE_TILING_OPTIMAL VkImage, you're better off using a VkBuffer rather than a VkImage as the staging resource, to avoid restrictions on mipmap and layer counts. (Here and Here)
So, when it came to implementing a glReadPixels-esque piece of functionality to read the results of a render-to-texture back to the host, I thought that reading to a staging VkBuffer with vkCmdCopyImageToBuffer instead of using a staging VkImage would be a good idea.
However, I haven't been able to get it to work yet: I'm seeing most of the intended image, but with rectangular blocks of it in incorrect locations and even some bits duplicated.
There is a good chance that I've messed up my synchronization or layout transitions somewhere and I'll continue to investigate that possibility.
However, I couldn't figure out from the spec whether using vkCmdCopyImageToBuffer with an image source using VK_IMAGE_TILING_OPTIMAL is actually supposed to 'un-tile' the image, or whether I should actually expect to receive a garbled implementation-defined image layout if I attempt such a thing.
So my question is: Does vkCmdCopyImageToBuffer with a VK_IMAGE_TILING_OPTIMAL source image fill the buffer with linearly tiled data or optimally (implementation defined) tiled data?
Section 18.4 of the specification describes the layout of the data in the source/destination buffers relative to the image being copied from/to. This is outlined in the description of the VkBufferImageCopy struct. There is no language in this section that would permit different behavior for tiled images.
The specification even has pseudocode for how copies work (this is for non-block-compressed images):
rowLength = region->bufferRowLength;
if (rowLength == 0)
rowLength = region->imageExtent.width;
imageHeight = region->bufferImageHeight;
if (imageHeight == 0)
imageHeight = region->imageExtent.height;
texelSize = <texel size taken from the src/dstImage>;
address of (x,y,z) = region->bufferOffset + (((z * imageHeight) + y) * rowLength + x) * texelSize;
where x, y, z range from (0,0,0) to (region->imageExtent.width, height, depth).
The (x,y,z) triple is the location of the texel in question within the image. Since this location does not depend on the tiling of the image (as evidenced by the absence of any language stating that it would), buffer/image copies work identically for both kinds of tiling.
Also note that this specification is shared between vkCmdCopyImageToBuffer and vkCmdCopyBufferToImage. As such, if a copy works one way, it must by necessity work the other way as well.
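Concretely, a readback like the one described in the question can rely on that linear addressing. A minimal sketch (all names are assumptions; layout transitions and the barrier before the host read are omitted):

#include <vulkan/vulkan.h>

/* Sketch of a glReadPixels-style readback; the image is assumed to already be
 * in VK_IMAGE_LAYout_TRANSFER_SRC_OPTIMAL. bufferRowLength = 0 means "tightly
 * packed at imageExtent.width", per the addressing formula above. */
static void recordReadback(VkCommandBuffer cmd, VkImage image,
                           VkBuffer stagingBuffer, uint32_t width, uint32_t height)
{
    VkBufferImageCopy region = {
        .bufferOffset      = 0,
        .bufferRowLength   = 0, /* tightly packed rows */
        .bufferImageHeight = 0,
        .imageSubresource  = {
            .aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT,
            .mipLevel       = 0,
            .baseArrayLayer = 0,
            .layerCount     = 1,
        },
        .imageOffset = { 0, 0, 0 },
        .imageExtent = { width, height, 1 },
    };
    vkCmdCopyImageToBuffer(cmd, image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                           stagingBuffer, 1, &region);
    /* The buffer receives linearly addressed texels regardless of the image's
     * tiling; no "un-tiling" step is needed on the host. */
}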
I'm loading a 3D skeleton model (exported from Blender) using Cocos3D, but I get the following assertion failure:
*** Assertion failure in -[CC3OpenGLES2IOS drawIndicies:ofLength:andType:as:], /Users/phamdacloc/Downloads/xxx/cocos3d201/Projects/CC3HelloWorld/cocos3d/cocos3d/OpenGL/CC3OpenGL.m:282
Here's where the assert came from:
-(void) drawIndicies: (GLvoid*) indicies ofLength: (GLuint) len andType: (GLenum) type as: (GLenum) drawMode {
#if CC3_OGLES
    CC3Assert((type == GL_UNSIGNED_SHORT || type == GL_UNSIGNED_BYTE),
              @"OpenGL ES permits drawing a maximum of 65536 indexed vertices, and supports only"
              @" GL_UNSIGNED_SHORT or GL_UNSIGNED_BYTE types for vertex indices");
#endif
    glDrawElements(drawMode, len, type, indicies);
    LogGLErrorTrace(@"glDrawElements(%@, %u, %@, %p)", NSStringFromGLEnum(drawMode), len, NSStringFromGLEnum(type), indicies);
    CC_INCREMENT_GL_DRAWS(1);
}
From the message above, my understanding is that the model is too detailed and contains more vertices than allowed (65,536). I then removed the spinal cord, head, and legs, and this time Cocos3D loads the model successfully. Is there a way to keep all these vertices, or should I split the model into several .pod files?
On a side note, when I open the skeleton.blend file in "Object Mode", I see 205,407 vertices at the top right of Blender. However, when I change from "Object Mode" to "Edit Mode" and select all the vertices, only 33,574 + 4,773 = 38,347 vertices are present. Why does "Object Mode" show more vertices than "Edit Mode"?
If you're OK with a certain amount of device/platform dependence, most reasonably recent devices support the OES_element_index_uint extension. This adds support for indices of type GL_UNSIGNED_INT, on top of the GL_UNSIGNED_SHORT and GL_UNSIGNED_BYTE types supported by base ES 2.0.
On iOS, which you seem to be using, this extension is supported by all SGX Series 5, A7, and A8 GPUs (source: https://developer.apple.com/library/ios/documentation/DeviceInformation/Reference/iOSDeviceCompatibility/OpenGLESPlatforms/OpenGLESPlatforms.html). I would expect support for this extension to also be present on Android devices of a similar age.
With 32-bit indices, you will run out of memory long before you exhaust the index range.
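A sketch of how that could look in ES 2.0 C code (the mesh parameters are placeholders):

#include <string.h>
#include <GLES2/gl2.h>

/* Sketch: check for the extension once at startup, then index with 32-bit
 * values. "indexCount" and "indices" stand in for your mesh data. */
static int hasUintIndices(void)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    return ext != NULL && strstr(ext, "GL_OES_element_index_uint") != NULL;
}

static void drawMesh(GLenum mode, GLsizei indexCount, const GLuint *indices)
{
    if (hasUintIndices()) {
        /* GL_UNSIGNED_INT is only legal here when the extension is present. */
        glDrawElements(mode, indexCount, GL_UNSIGNED_INT, indices);
    } else {
        /* Fall back to splitting the mesh into <= 65536-vertex pieces. */
    }
}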
If you want to get this working with pure ES 2.0 functionality, you pretty much have to split up your model into smaller parts. At least I can't think of a reasonable alternative.