What is the right way to use Logic operations (VkLogicOp) in vulkan - vulkan

I have a case where I am writing to integer framebuffers, and I want to use logic operations when writing to pixels in the fragment shader. These are the steps I followed:
When creating the logical device, I set the VkPhysicalDeviceFeatures.logicOp to VK_TRUE (so this feature is enabled)
when creating the pipeline, I set VkPipelineColorBlendStateCreateInfo.logicOpEnable to VK_TRUE, and VkPipelineColorBlendStateCreateInfo.logicOp to VK_LOGIC_OP_COPY.
My framebuffer format is VK_FORMAT_R32G32B32A32_SINT
Once I render the frame, I see that nothing is getting updated in the frame buffer. Is there any step I am missing? (btw, I don't get any validation errors).
Thanks!

Related

Vulkan ignoring GLSL image format qualifier

I have a compute shader that reads a signed normalized integer image using imageLoad.
The image itself (which contains both positive and negative values) is created as a R16G16_SNORM and is written by a fragment shader in a previous gpass.
The imageview bound to the descriptorsetlayout binding in the compute shader is also created with the same R16G16_SNORM format.
Everything works as expected.
Yesterday I realized that in the compute shader I used the wrong image format qualifier rg16.
A bit puzzled (I could not understand how it could work properly reading an unsigned normalized value) I corrected to rg16_snorm, and.. nothing changed.
I performed several tests (I even specified a rg16f) and always had the same (correct, [-1,1] signed) result.
It seems like Vulkan (at least my implementation) silently ignores any image format qualifier, and falls back (I guess) to the imageview format bound to the descriptorset.
This seems to be in line with the spec regarding format in imageview creation
format is a VkFormat describing the format and type used to interpret texel blocks in the image
but then in Appendix A (Vulkan Environment for SPIR-V - "Compatibility Between SPIR-V Image Formats And Vulkan Formats") there is a clear distinction between Rg16 and Rg16Snorm.. so:
is it a bug or a feature?
I am working with an Nvidia 2070 Super under ubuntu 20.04
UPDATE
The initial image writing operation happens as the result of a fragment shader color attachment output, and as such, there is no descriptorsetlayout binding declaration. The fragment shader outputs a vec2 to the R16G16_SNORM color attachment as specified by the active framebuffer and renderpass.
The resulting image (after the relevant barriers) is then read (correctly, despite the wrong layout qualifier) by a compute shader as an image/imageLoad operation.
Note that validation layers are enabled and silent.
Note also that the resulting values are far from random, and exactly match the expected values (both positive and negative), using either rg16, rg16f or rg16_snorm.
What you're getting is undefined behavior.
There is a validation check on Image Write Operations that prevents the OpTypeImage's format (equivalent to the layout format specifier in GLSL) from being incompatible with the backing VkImageView's format:
If the image format of the OpTypeImage is not compatible with the VkImageView’s format, the write causes the contents of the image’s memory to become undefined.
Note that when it says "compatible", it doesn't mean image view compatibility; it means "exactly match". Your OpTypeImage format did not exactly match that of the shader, so your writes were undefined. And "undefined" can mean "works as if you had specified the correct format".

TensorFlow Dataset `.map` - Is it possible to ignore errors?

Short version:
When using Dataset map operations, is it possible to specify that any 'rows' where the map invocation results in an error are quietly filtered out rather than having the error bubble up and kill the whole session?
Specifics:
I have an input pipeline set up that (more or less) does the following:
reads a set of file paths of images stored locally (images of varying dimensions)
reads a suggested set of 'bounding boxes' from a csv
Produces the set of all image path to bounding box combinations
Reads and decodes the image then produces the set of 'cropped' images for each of these combinations using tf.image.crop_to_bounding_box
My issue is that there are (very rare) instances where my suggested bounding boxes are outside the bounds of a given image so (understandably) tf.image.crop_to_bounding_box throws an error something like this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [width must be >= target + offset.]
which kills the session.
I'd prefer it if these errors were simply ignored and that the pipeline moved onto the next combination.
(I understand that the correct fix for this specific issue would be commit the time to checking each bounding box and image dimension size are possible the step before and filter them out using a filter operation before it got to the map with the cropping operation. I was wondering if there was an easy way to just ignore an error and move on to the next case both for easy of implementation in this specific case and also in more general cases)
For Tensorflow 2
dataset = dataset.apply(tf.data.experimental.ignore_errors())
There is tf.contrib.data.ignore_errors. I've never tried this myself, but according to the docs the usage is simply
dataset = dataset.map(some_map_function)
dataset = dataset.apply(tf.contrib.data.ignore_errors())
It should simply pass through the inputs (i.e. returns the same dataset) but ignore any that throw an error.

vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT bit set when device does not have geometryShader feature enabled

First of all, I'm a total newbie with Vulkan (I'm using the binding provided by LWJGL). I know I should copy/paste more code, but I don't even know what would be relevant for now (so don't hesitate to ask me some specific piece of code).
I try to make something like that :
Use a ComputeShader to compute a buffer of pixel.
Use vkCmdCopyBufferToImage to directly copy this array into a framebuffer image.
So, no vertex/fragment shaders for now.
I allocated a Compute Pipeline, and a FrameBuffer. I have one {Queue/CommandPool/CommandBuffer} for Computation, and one other for Rendering.
When I try to submit the graphic queue with:
vkQueueSubmit(graphicQueue, renderPipeline.getFrameSubmission().getSubmitInfo(imageIndex));
I obtain the following error message (from validation) :
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT bit set when device does not have geometryShader feature enabled. The spec valid usage text states 'If the geometry shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00076)
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT and/or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT bit(s) set when device does not have tessellationShader feature enabled. The spec valid usage text states 'If the tessellation shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00077)
I tried to change the VkSubmitInfo.pWaitDstStageMask to different values (like VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT...) but nothing change.
So, what would be the best pWaitDstStageMask for my use case ?
Ok, I found my problem:
The pWaitDstStageMask must be an array with the same size than pWaitSemaphores.
I only putted 1 stage mask, for 2 semaphores.

Does vkCmdCopyImageToBuffer work when source image uses VK_IMAGE_TILING_OPTIMAL?

I have read (after running into the limitation myself) that for copying data from the host to a VK_IMAGE_TILING_OPTIMAL VkImage, you're better off using a VkBuffer rather than a VkImage for the staging image to avoid restrictions on mipmap and layer counts. (Here and Here)
So, when it came to implementing a glReadPixels-esque piece of functionality to read the results of a render-to-texture back to the host, I thought that reading to a staging VkBuffer with vkCmdCopyImageToBuffer instead of using a staging VkImage would be a good idea.
However, I haven't been able to get it to work yet, I'm seeing most of the intended image, but with rectangular blocks of the image in incorrect locations and even some bits duplicated.
There is a good chance that I've messed up my synchronization or layout transitions somewhere and I'll continue to investigate that possibility.
However, I couldn't figure out from the spec whether using vkCmdCopyImageToBuffer with an image source using VK_IMAGE_TILING_OPTIMAL is actually supposed to 'un-tile' the image, or whether I should actually expect to receive a garbled implementation-defined image layout if I attempt such a thing.
So my question is: Does vkCmdCopyImageToBuffer with a VK_IMAGE_TILING_OPTIMAL source image fill the buffer with linearly tiled data or optimally (implementation defined) tiled data?
Section 18.4 describes the layout of the data in the source/destination buffers, relative to the image being copied from/to. This is outlined in the description of the VkBufferImageCopy struct. There is no language in this section which would permit different behavior from tiled images.
The specification even has pseudo code for how copies work (this is for non-block compressed images):
rowLength = region->bufferRowLength;
if (rowLength == 0)
rowLength = region->imageExtent.width;
imageHeight = region->bufferImageHeight;
if (imageHeight == 0)
imageHeight = region->imageExtent.height;
texelSize = <texel size taken from the src/dstImage>;
address of (x,y,z) = region->bufferOffset + (((z * imageHeight) + y) * rowLength + x) * texelSize;
where x,y,z range from (0,0,0) to region->imageExtent.width,height,depth}.
The x,y,z part is the location of the pixel in question from the image. Since this location is not dependent on the tiling of the image (as evidenced by the lack of anything stating that it would be), buffer/image copies will work equally on both kinds of tiling.
Also, do note that this specification is shared between vkCmdCopyImageToBuffer and vkCmdCopyBufferToImage. As such, if a copy works one way, it by necessity must work the other.

GPUImage capturePhotoAsImageProcessedUpToFilter only working for the last filter

In my application I am using a stack of 3 filters and adding that to a stillCamera. I am trying to take the image from filter1, its an empty filter so it returns the actual image.
[stillCamera addTarget:filter1];
[filter1 addTarget:filter2];
[filter2 addTarget:filter3];
[filter3 addTarget:cameraView];
When I call capturePhotoAsImageProcessedUpToFilter, it only ever returns an image when I pass it filter3 like below.
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter3 with...
The two examples below never return images
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter1 with...
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter2 with...
Am I doing something wrong? As a fix I am using:
[filter1 imageFromCurrentlyProcessedOutput]
Is there any difference between calling capturePhotoAsImageProcessedUpToFilter and imageFromCurrentlyProcessedOutput?
I think this is a side effect of a memory conservation optimization I tried to put in place last year. For very large images, like photos, what I try to do is destroy the framebuffer that backs each filter as the filtered image progresses through the filter chain. The idea is to try to minimize memory spikes by only having one or two copies of the large image in memory at any point in time.
Unfortunately, that doesn't seem to work as intended much of the time, and because the framebuffers are deleted as the image progresses, only the last filter in the chain ends up having a valid framebuffer to read from. I'm probably going to yank this optimization out at some point in the near future in favor of an internal framebuffer and texture cache, but I'm not sure what can be done in the meantime to read from these intermediary filters in a chain.