How to read an R32G32_SFLOAT image from the GPU in Vulkan

I am able to dump an R32G32B32A32 image for a screenshot. I would also like to read a pixel back from an R32G32_SFLOAT image, but the result looks wrong.
Below is my working image dump code (no validation errors):
void DumpImageToFile(VkTool::VulkanDevice &device, VkQueue graphics_queue, VkTool::Wrapper::CommandBuffers &command_buffer, VkImage image, uint32_t width, uint32_t height, const char *filename)
{
auto image_create_info = VkTool::Initializer::GenerateImageCreateInfo(VK_IMAGE_TYPE_2D, VK_FORMAT_R8G8B8A8_UNORM, {width, height, 1},
VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT, VK_SAMPLE_COUNT_1_BIT);
VkTool::Wrapper::Image staging_image(device, image_create_info, VK_MEMORY_HEAP_DEVICE_LOCAL_BIT);
auto buffer_create_info = VkTool::Initializer::GenerateBufferCreateInfo(width * height * 4, VK_BUFFER_USAGE_TRANSFER_DST_BIT);
VkTool::Wrapper::Buffer staging_buffer(device, buffer_create_info, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
// Copy texture to buffer
command_buffer.Begin();
auto image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, staging_image.Get());
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
// Copy!!
VkImageBlit region = {};
region.srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.srcOffsets[0] = { 0, 0, 0 };
region.srcOffsets[1] = { static_cast<int32_t>(width), static_cast<int32_t>(height), 1};
region.dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.dstOffsets[0] = { 0, 0, 0 };
region.dstOffsets[1] = { static_cast<int32_t>(width), static_cast<int32_t>(height), 1 };
device.vkCmdBlitImage(command_buffer.Get(), image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging_image.Get(), VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region, VK_FILTER_LINEAR);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, staging_image.Get());
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
auto buffer_image_copy = VkTool::Initializer::GenerateBufferImageCopy({ VK_IMAGE_ASPECT_COLOR_BIT , 0, 0, 1 }, { width, height, 1 });
device.vkCmdCopyImageToBuffer(command_buffer.Get(), staging_image.Get(), VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging_buffer.Get(), 1, &buffer_image_copy);
command_buffer.End();
std::vector<VkCommandBuffer> raw_command_buffers = command_buffer.GetAll();
auto submit_info = VkTool::Initializer::GenerateSubmitInfo(raw_command_buffers);
VkTool::Wrapper::Fence fence(device);
device.vkQueueSubmit(graphics_queue, 1, &submit_info, fence.Get());
fence.Wait();
fence.Destroy();
const uint8_t *mapped_address = reinterpret_cast<const uint8_t *>(staging_buffer.MapMemory());
lodepng::encode(filename, mapped_address, width, height);
staging_buffer.UnmapMemory();
staging_image.Destroy();
staging_buffer.Destroy();
}
Sorry for the ugly self-made wrapper; there was no official wrapper. Basically, it creates a staging image and a staging buffer. It first copies from the source image to the staging image with vkCmdBlitImage, then uses vkCmdCopyImageToBuffer and maps the buffer into host memory. This method works on multiple GPUs, and I believe it does not need to worry about padding (correct me if I am wrong).
However, I have had no luck using this method to read R32G32_SFLOAT. At first I thought it was an endianness problem, until I dumped the whole image out.
The image above is the result of directly converting R32G32_SFLOAT to R8G8B8A8_UNORM, which I know does not make sense. But even without changing the format, there are still a lot of "holes" in the image and the values are badly wrong.

I am not sure this is the problem, but if I understand your code correctly, you want to write the contents of image to filename.
So you want to read from this image. However, your barrier declares the old layout of this image (not the staging one) as VK_IMAGE_LAYOUT_UNDEFINED, so the implementation is free to assume you do not care about the data stored in it. Use the image's real layout instead (I think it is COLOR_ATTACHMENT or something like that).
Moreover, you are using both a staging image and a staging buffer, and I do not understand why. Why not simply call vkCmdCopyImageToBuffer from image directly into staging_buffer?
By the way, with Vulkan, the fact that code works on some GPUs does not mean it is correct.
Also, I think you need a memory barrier after the transfer to the buffer, with VK_PIPELINE_STAGE_HOST_BIT as the destination stage and VK_ACCESS_HOST_READ_BIT as the destination access. The specification says:
Signaling a fence and waiting on the host does not guarantee that the results of memory accesses will be visible to the host, as the access scope of a memory dependency defined by a fence only includes device access. A memory barrier or other memory dependency must be used to guarantee this. See the description of host access types for more information.
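If the image is copied straight into a host-visible buffer with vkCmdCopyImageToBuffer, as suggested above, the mapped memory holds raw R32G32_SFLOAT texels: two 32-bit floats, 8 bytes per texel, so a buffer of width * height * 4 bytes would be too small. Below is a minimal sketch of sizing and decoding such a buffer on the CPU. It assumes tightly packed rows (bufferRowLength and bufferImageHeight left at 0) and a host where float is 32 bits wide. RG32F, StagingBufferSize and ReadTexel are hypothetical names for illustration, not part of the asker's wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One texel of VK_FORMAT_R32G32_SFLOAT: two 32-bit floats, 8 bytes in total.
// (Hypothetical helper type, not part of the Vulkan API.)
struct RG32F {
    float r;
    float g;
};
static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
static_assert(sizeof(RG32F) == 8, "sketch assumes no padding in RG32F");

constexpr std::size_t kBytesPerTexel = sizeof(RG32F);

// Bytes needed for a tightly packed width x height R32G32_SFLOAT image.
std::size_t StagingBufferSize(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(width) * height * kBytesPerTexel;
}

// Reads texel (x, y) from tightly packed data (row pitch == width * 8 bytes).
// memcpy avoids alignment and strict-aliasing issues on the mapped pointer.
RG32F ReadTexel(const std::uint8_t* mapped, std::uint32_t width,
                std::uint32_t x, std::uint32_t y) {
    const std::size_t offset =
        (static_cast<std::size_t>(y) * width + x) * kBytesPerTexel;
    RG32F texel;
    std::memcpy(&texel, mapped + offset, sizeof(texel));
    return texel;
}
```

With a mapped pointer such as the one returned by staging_buffer.MapMemory() in the question, ReadTexel(mapped_address, width, x, y) would return both channels of one pixel, provided the buffer was created with StagingBufferSize(width, height) bytes.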

This part of your code seems weird:
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
This basically means that after the barrier your source image may not contain any data. Using VK_IMAGE_LAYOUT_UNDEFINED as the old layout does not guarantee that the contents of the image are preserved.

Related

<Vulkan> Use rendered vkImage as Texture

I want to use a VkImage rendered in a previous render pass as a texture for a composite operation in a fragment shader. From here I learned that vkCmdPipelineBarrier is used to wait for the GPU to finish a rendering operation, so I wrote this code. It works well on Snapdragon devices, but not on the Mali-G52, where a write-after-write error partially occurs. Is this code not enough? Any suggestions?
vkCmdEndRenderPass(cb);
vkCmdBeginRenderPass(cb, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
VkViewport viewport = vks::initializers::viewport((float)offscreenPass.width, (float)offscreenPass.height, 0.0f, 1.0f);
vkCmdSetViewport(cb, 0, 1, &viewport);
VkRect2D scissor = vks::initializers::rect2D(offscreenPass.width, offscreenPass.height, 0, 0);
vkCmdSetScissor(cb, 0, 1, &scissor);
// https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/pipeline_barriers/pipeline_barriers.cpp
VkImageMemoryBarrier imageMemoryBarrier = vks::initializers::imageMemoryBarrier();
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
imageMemoryBarrier.image = offscreenPass.color[drawframe].image;
imageMemoryBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageMemoryBarrier.subresourceRange.baseMipLevel = 0;
imageMemoryBarrier.subresourceRange.levelCount = 1;
imageMemoryBarrier.subresourceRange.baseArrayLayer = 0;
imageMemoryBarrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;
imageMemoryBarrier.image = offscreenPass.depth.image;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
I have tried every pattern written here.
If you want to synchronize render passes then your pipeline barrier must be outside of the render pass in the command stream. I.e. it must be after the vkCmdEndRenderPass() of the first pass, and before the vkCmdBeginRenderPass() of the second pass. Pipeline barriers issued inside a render pass, as you are currently doing, are used for synchronization only within the current subpass.
Also, try to avoid:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
... for pipeline barriers when you only consume the output of the first pass as a fragment shader input in the second. This is overly conservative and needlessly serializes execution of the geometry processing too. In this case, you should use:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
... which allows the non-dependent vertex shading and binning for the second pass to run in parallel to the first pass.
Self solved.
The difference in the precision of sampler2D between Adreno and Mali causes this issue. I can read correct data using "precision highp sampler2D".

How to move image in Glcontrol in OpenTk

I am new to OpenTK. I want to move an image from left to right smoothly. How can I do this?
I have tried moving the viewport, but the rendering does not look smooth.
int cnt = 1920;
public void DrawImage(int image)
{
GL.Viewport(new Rectangle(cnt, 0, ScreenWidth, ScreenHeight));
cnt--;
GL.MatrixMode(MatrixMode.Projection);
GL.PushMatrix();
GL.LoadIdentity();
//GL.Ortho(0, 1920, 0, 1080, 0, 1);
GL.MatrixMode(MatrixMode.Modelview);
GL.PushMatrix();
GL.LoadIdentity();
GL.Disable(EnableCap.Lighting);
GL.Enable(EnableCap.Texture2D);
GL.ActiveTexture(TextureUnit.Texture0);
GL.BindTexture(TextureTarget.Texture2D, image);
RunShaders();
GL.Disable(EnableCap.Texture2D);
GL.PopMatrix();
GL.MatrixMode(MatrixMode.Projection);
GL.PopMatrix();
GL.MatrixMode(MatrixMode.Modelview);
//ErrorCode ec = GL.GetError();
//if (ec != 0)
// System.Console.WriteLine(ec.ToString());
//Console.Read();
glControl1.SwapBuffers();
}

Vulkan validation error when I try to reset a commandPool after vkQueueWaitIdle

I have a small Vulkan program that runs a compute shader in a loop.
There is only one commandBuffer that is allocated from the only commandPool I have.
After the commandBuffer is built, I submit it to the queue and wait for it to complete with vkQueueWaitIdle. It does indeed wait for a while on that line of code. After that, I call vkResetCommandPool, which should reset all commandBuffers allocated from that pool (there is only one anyway).
...
vkEndCommandBuffer(commandBuffer);
{
VkSubmitInfo info = {};
info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
info.commandBufferCount = 1;
info.pCommandBuffers = &commandBuffer;
vkQueueSubmit(queue, 1, &info, VK_NULL_HANDLE);
}
vkQueueWaitIdle(queue);
vkResetCommandPool(device, commandPool, VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT);
When it tries to reset the commandPool the validation gives me the following error.
VUID-vkResetCommandPool-commandPool-00040(ERROR / SPEC): msgNum: -1254218959
- Validation Error: [ VUID-vkResetCommandPool-commandPool-00040 ]
Object 0: handle = 0x20d2ce0b718, type = VK_OBJECT_TYPE_COMMAND_BUFFER; |
MessageID = 0xb53e2331 |
Attempt to reset command pool with VkCommandBuffer 0x20d2ce0b718[] which is in use.
The Vulkan spec states: All VkCommandBuffer objects allocated from commandPool must not be in the pending state
(https://vulkan.lunarg.com/doc/view/1.2.176.1/windows/1.2-extensions/vkspec.html#VUID-vkResetCommandPool-commandPool-00040)
Objects: 1
[0] 0x20d2ce0b718, type: 6, name: NULL
But I don't understand why, since I'm already waiting with vkQueueWaitIdle. According to the documentation, once the commandBuffer is done executing, it should go to the invalid state, and I should be able to reset it.
Here's the relevant surrounding code:
VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
beginInfo.pInheritanceInfo = nullptr;
for (i64 i = 0; i < numIterations; i++)
{
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipelineLayout,
0, 2, descriptorSets, 0, nullptr);
uniforms.start = i * numThreads;
vkCmdUpdateBuffer(commandBuffer, unifsBuffer, 0, sizeof(uniforms), &uniforms);
vkCmdPipelineBarrier(commandBuffer,
VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, 0,
0, nullptr,
1, &memBarriers[0],
0, nullptr);
vkCmdDispatch(commandBuffer, numThreads, 1, 1);
vkCmdPipelineBarrier(commandBuffer,
VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
0, nullptr,
1, &memBarriers[1],
0, nullptr);
VkBufferCopy copyInfo = {};
copyInfo.srcOffset = 0;
copyInfo.dstOffset = 0;
copyInfo.size = sizeof(i64) * numThreads;
vkCmdCopyBuffer(commandBuffer,
buffer, stagingBuffer, 1, &copyInfo);
vkEndCommandBuffer(commandBuffer);
{
VkSubmitInfo info = {};
info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
info.commandBufferCount = 1;
info.pCommandBuffers = &commandBuffer;
vkQueueSubmit(queue, 1, &info, VK_NULL_HANDLE);
}
vkQueueWaitIdle(queue);
vkResetCommandPool(device, commandPool, VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT);
i64* result;
vkMapMemory(device, stagingBufferMem, 0, sizeof(i64) * numThreads, 0, (void**)&result);
for (int i = 0; i < numThreads; i++)
{
if (result[i]) {
auto res = result[i];
vkUnmapMemory(device, stagingBufferMem);
return res;
}
}
vkUnmapMemory(device, stagingBufferMem);
}
I have found my problem. I thought the parameters of vkCmdDispatch specified the global size (the number of compute shader invocations), but they are actually the number of work groups. Therefore, I was dispatching more threads than I intended, and my buffer wasn't big enough, so the threads were writing out of bounds.
I believe the validation layer wasn't giving me the right hints though.
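Since vkCmdDispatch takes work group counts rather than invocation counts, the count passed in is usually the desired number of invocations divided by the shader's local size, rounded up. Below is a minimal sketch of that calculation. WorkGroupCount is a hypothetical helper name, and localSize is assumed to match the local_size_x declared in the compute shader.

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed so that groups * localSize >= totalInvocations.
// Written as divide-then-round-up to avoid overflow in the addition that the
// usual (n + localSize - 1) / localSize form performs for very large n.
// localSize must be non-zero.
constexpr std::uint32_t WorkGroupCount(std::uint32_t totalInvocations,
                                       std::uint32_t localSize) {
    return totalInvocations / localSize +
           (totalInvocations % localSize != 0 ? 1u : 0u);
}
```

In the loop from the question this would look something like vkCmdDispatch(commandBuffer, WorkGroupCount(numThreads, localSize), 1, 1). Because the count rounds up, the last group may contain invocations beyond numThreads, so the shader would also need to bounds-check gl_GlobalInvocationID.x before writing to the buffer.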

Vulkan Device - Host - Device synchronization with VkEvent

I'm trying to synchronize a host stage into my pipeline, where I basically edit some data on the host during the execution of a command buffer on the device. From reading the specification I think I'm doing the correct synchronization, execution/memory dependencies and availability/visibility operations, but it neither works on NV nor AMD hardware. Is this even possible? If so, what am I doing wrong in terms of synchronization?
In summary I'm doing the following:
[D] A device buffer (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) is copied to a host visible and coherent one (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT).
[D] The first event is set.
[D] The second event is waited for.
[H] Meanwhile the host waits for the first event.
[H] After it has been set, it increments the numbers in the host visible buffer.
[H] Then it sets the second event.
[D] The device then continues to copy the host visible buffer back to the device local buffer.
What happens?
On NV the first part works, the correct data arrives at the host side, but the altered data never arrives at the device side. On AMD not even the first part works and I already don't get the data on the host.
Command buffer recording:
// ...
VkMemoryBarrier barrier = {};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = ...;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(command_buffer, ..., VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, host_buffer, device_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_HOST_READ_BIT;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_HOST_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
vkCmdSetEvent(command_buffer, device_to_host_sync_event, VK_PIPELINE_STAGE_TRANSFER_BIT);
barrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdWaitEvents(command_buffer, 1, &host_to_device_sync_event, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, device_buffer, host_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = ...;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, ..., 0, 1, &barrier, 0, nullptr, 0, nullptr);
// ...
Execution
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
while(vkGetEventStatus(device, device_to_host_sync_event) != VK_EVENT_SET)
std::this_thread::sleep_for(std::chrono::microseconds(10));
void* data;
vkMapMemory(device, host_buffer, 0, BUFFER_SIZE, 0, &data);
// read and write parts of the memory
vkUnmapMemory(device, host_buffer);
vkSetEvent(device, host_to_device_sync_event);
vkDeviceWaitIdle(device);
I've uploaded a working example: https://gist.github.com/neXyon/859b2e52bac9a5a56b804d8a9d5fa4a5
The interesting bits start at line 292. Please have a look and see whether it works for you.
I opened an issue on github: https://github.com/KhronosGroup/Vulkan-Docs/issues/755
After a bit of discussion there, the conclusion is that Device to Host synchronization is not possible with an event and a fence has to be used.

iOS OpenGL ES2 make change in vertex buffer object

I have a terrain in OpenGL. I want to dynamically change the spacing between points.
But once the vertex data has been sent to the vertex buffer object, I cannot modify anything.
The only thing I can do is delete the VBO and create a replacement VBO with the new positions of each point.
Is there a better way to do this?
As mentioned in the comments, it sounds like you want glBufferSubData.
If you plan to modify the data often, first set up your VBO's initial state:
GLfloat positions[] = { 0, 0, 0, 0, 0, 0 };
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
// Allocate storage and upload the initial data; GL_DYNAMIC_DRAW hints at frequent updates
glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_DYNAMIC_DRAW);
Then later, say you want to change the last two values to 1. You would do this:
GLfloat update[] = { 1, 1 };
int offset = 4; // index of the first float to overwrite
glBindBuffer(GL_ARRAY_BUFFER, vbo);
// Arguments: target, byte offset, byte size, data pointer
glBufferSubData(GL_ARRAY_BUFFER, sizeof(GLfloat) * offset, sizeof(update), update);
Check out the docs.gl page on glBufferSubData for more information.