I'm trying to synchronize a host stage into my pipeline, where I basically edit some data on the host during the execution of a command buffer on the device. From reading the specification I think I'm doing the correct synchronization, with the right execution/memory dependencies and availability/visibility operations, but it works on neither NV nor AMD hardware. Is this even possible? If so, what am I doing wrong in terms of synchronization?
In summary I'm doing the following:
[D] A device buffer (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) is copied to a host visible and coherent one (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT).
[D] The first event is set.
[D] The second event is waited for.
[H] Meanwhile the host waits for the first event.
[H] After it has been set, it increments the numbers in the host visible buffer.
[H] Then it sets the second event.
[D] The device then continues to copy the host visible buffer back to the device local buffer.
What happens?
On NV the first part works: the correct data arrives on the host side, but the altered data never arrives on the device side. On AMD not even the first part works, and the data never reaches the host.
Command buffer recording:
// ...
VkMemoryBarrier barrier = {};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = ...;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(command_buffer, ..., VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, host_buffer, device_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_HOST_READ_BIT;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_HOST_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
vkCmdSetEvent(command_buffer, device_to_host_sync_event, VK_PIPELINE_STAGE_TRANSFER_BIT);
barrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdWaitEvents(command_buffer, 1, &host_to_device_sync_event, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, device_buffer, host_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = ...;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, ..., 0, 1, &barrier, 0, nullptr, 0, nullptr);
// ...
Execution
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
while(vkGetEventStatus(device, device_to_host_sync_event) != VK_EVENT_SET)
std::this_thread::sleep_for(std::chrono::microseconds(10));
void* data;
vkMapMemory(device, host_buffer, 0, BUFFER_SIZE, 0, &data);
// read and write parts of the memory
vkUnmapMemory(device, host_buffer);
vkSetEvent(device, host_to_device_sync_event);
vkDeviceWaitIdle(device);
I've uploaded a working example: https://gist.github.com/neXyon/859b2e52bac9a5a56b804d8a9d5fa4a5
The interesting bits start at line 292. Please check whether it works for you.
I opened an issue on github: https://github.com/KhronosGroup/Vulkan-Docs/issues/755
After a bit of discussion there, the conclusion is that Device to Host synchronization is not possible with an event and a fence has to be used.
Related
I want to use a VkImage rendered in a previous render pass as a texture for a composite operation in a fragment shader. From here I learned that vkCmdPipelineBarrier is used to wait for the GPU to finish a rendering operation, so I wrote this code. It works well on Snapdragon devices, but not on Mali-G52, where a write-after-write error sometimes occurs. Is this code not enough? Any suggestions?
vkCmdEndRenderPass(cb);
vkCmdBeginRenderPass(cb, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
VkViewport viewport = vks::initializers::viewport((float)offscreenPass.width, (float)offscreenPass.height, 0.0f, 1.0f);
vkCmdSetViewport(cb, 0, 1, &viewport);
VkRect2D scissor = vks::initializers::rect2D(offscreenPass.width, offscreenPass.height, 0, 0);
vkCmdSetScissor(cb, 0, 1, &scissor);
// https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/pipeline_barriers/pipeline_barriers.cpp
VkImageMemoryBarrier imageMemoryBarrier = vks::initializers::imageMemoryBarrier();
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
imageMemoryBarrier.image = offscreenPass.color[drawframe].image;
imageMemoryBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageMemoryBarrier.subresourceRange.baseMipLevel = 0;
imageMemoryBarrier.subresourceRange.levelCount = 1;
imageMemoryBarrier.subresourceRange.baseArrayLayer = 0;
imageMemoryBarrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;
imageMemoryBarrier.image = offscreenPass.depth.image;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
I have tried every pattern written here.
If you want to synchronize render passes then your pipeline barrier must be outside of the render pass in the command stream. I.e. it must be after the vkCmdEndRenderPass() of the first pass, and before the vkCmdBeginRenderPass() of the second pass. Pipeline barriers issued inside a render pass, as you are currently doing, are used for synchronization only within the current subpass.
Also, try to avoid:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
... for pipeline barriers when you only consume the output of the first pass as a fragment shader input in the second. This is overly conservative and needlessly serializes execution of the geometry processing too. In this case, you should use:
srcStage=VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
dstStage=VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
... which allows the non-dependent vertex shading and binning for the second pass to run in parallel to the first pass.
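To make the effect of the destination stage concrete, here is a toy model in plain C++ (no Vulkan calls). It assumes a shortened, illustrative stage list and treats a barrier's destination scope as "the destination stage and every logically later stage". The names Stage, isHeldBack and stagesFreeToOverlap are made up for this sketch.

```cpp
#include <cassert>
#include <vector>

// Simplified, illustrative ordering of graphics pipeline stages.
// This is NOT the full Vulkan stage list, only enough to show the idea.
enum class Stage : int {
    TopOfPipe = 0,
    VertexShader,
    FragmentShader,
    ColorAttachmentOutput,
    BottomOfPipe
};

// Toy model: work recorded after the barrier is held back in dstStage and in
// every logically later stage. Earlier stages are not part of the scope.
bool isHeldBack(Stage stage, Stage dstStage) {
    return static_cast<int>(stage) >= static_cast<int>(dstStage);
}

// Lists the stages of the second pass that may overlap with the first pass.
std::vector<Stage> stagesFreeToOverlap(Stage dstStage) {
    std::vector<Stage> result;
    const int first = static_cast<int>(Stage::TopOfPipe);
    const int last = static_cast<int>(Stage::BottomOfPipe);
    for (int s = first; s <= last; ++s) {
        const Stage stage = static_cast<Stage>(s);
        if (!isHeldBack(stage, dstStage)) {
            result.push_back(stage);
        }
    }
    return result;
}
```

In this model a top-of-pipe destination leaves nothing free to overlap, while a fragment-shader destination leaves the vertex work of the second pass unblocked, which is the parallelism described above.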
Self solved.
The difference in the precision of sampler2D between Adreno and Mali causes this issue. I can read correct data using "precision highp sampler2D".
I am very much a Vulkan/graphics API beginner. I've read some resources on Vulkan synchronization and understand it better than at the beginning, but the code still doesn't work. I'm expecting the ray tracing pipeline to output a flat blue image, but it flickers intensely between blue and plain black. The validation layers complain every frame that "images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED."
This is more or less what my code looks like:
vkBeginCommandBuffer();
// ... bind pipeline and descriptor sets
vkCmdTraceRaysKHR();
// Prepare current swap chain image as transfer destination
VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapchainImage;
barrier.subresourceRange = subresource_range;
// No need to make anything available
barrier.srcAccessMask = 0;
// The result of this transition should be visible for transfers
barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT | VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, // No need to wait for anything
VK_PIPELINE_STAGE_TRANSFER_BIT, // Should make transfers wait
0,
0, nullptr,
0, nullptr,
1, &barrier
);
// Prepare ray tracing output image as transfer source
barrier = VkImageMemoryBarrier{}; // reuse the variable declared above instead of redeclaring it
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = renderImage.image;
barrier.subresourceRange = subresource_range;
// The data written by the ray tracing should be made available
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
// The transition and data should be visible for transfers
barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT | VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_RAY_TRACING_SHADER_BIT_KHR, // Should wait until ray tracing is done
VK_PIPELINE_STAGE_TRANSFER_BIT, // Should make transfers wait
0,
0, nullptr,
0, nullptr,
1, &barrier
);
vkCmdCopyImage();
// Transition swap chain image back for presentation
barrier = VkImageMemoryBarrier{}; // reuse the variable declared above instead of redeclaring it
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapchainImage;
barrier.subresourceRange = subresource_range;
// The effects of the transfer should be made available
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
// The effects of the transfer should be made visible for swapchain presentation
barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TRANSFER_BIT, // Wait for transfers
VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, // Block all commands after this barrier
0,
0, nullptr,
0, nullptr,
1, &barrier
);
// Transition ray tracing output image back to general layout
barrier = VkImageMemoryBarrier{}; // reuse the variable declared above instead of redeclaring it
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = renderImage.image;
barrier.subresourceRange = subresource_range;
// The effects of the transfer should be made available (possibly unnecessary?)
barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
// (possibly unnecessary?)
barrier.dstAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
vkCmdPipelineBarrier(
cmdBuffer,
VK_PIPELINE_STAGE_TRANSFER_BIT, // Wait for transfers
VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, // Block all commands after this barrier
0,
0, nullptr,
0, nullptr,
1, &barrier
);
After all this the queue is submitted with two semaphores:
one wait semaphore that is signalled by vkAcquireNextImageKHR. Its wait stage is set to VK_PIPELINE_STAGE_RAY_TRACING_SHADER_BIT_KHR.
one signal semaphore that is later used as a wait semaphore in vkQueuePresentKHR.
So how do I get rid of the vulkan validation layer message and properly display the rendered image?
Edit: The culprit was found somewhere else (choosing the wrong swapchain image for rendering), but I would still appreciate it if someone could confirm or correct my rationale behind the chosen stage and access masks. Especially since I now can't even make it break on purpose, for example by setting the semaphore wait stage to BOTTOM_OF_PIPE (I thought that would mean no stages wait, so the render would run and write without a swapchain image).
I don't quite understand this example:
https://github.com/SaschaWillems/Vulkan/blob/master/examples/computeshader/computeshader.cpp
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
}
drawCmdBuffers[currentBuffer] runs before compute.commandBuffer, but the consumer drawCmdBuffers[currentBuffer] requires the textureComputeTarget produced by the producer compute.commandBuffer.
I don't understand why drawCmdBuffers[currentBuffer] is called before compute.commandBuffer.
In the following code, only the first frame is rendered, while the right picture does not get the textureComputeTarget, so it is rendered with a blue background.
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000); // <-------- Wait
}
Executed when calling vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE):
VkImageMemoryBarrier imageMemoryBarrier = {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
// We won't be changing the layout of the image
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.image = textureComputeTarget.image;
imageMemoryBarrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
imageMemoryBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
vkCmdPipelineBarrier(
drawCmdBuffers[i],
VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
VK_FLAGS_NONE,
0, nullptr,
0, nullptr,
1, &imageMemoryBarrier);
vkCmdBeginRenderPass(drawCmdBuffers[i], &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
Wait for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but this phase has not been executed before, why is the pipeline not stuck? Is it because there is no pipeline before, so there is no need to wait?
In section 6.6 Pipeline Barriers
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass.
void draw()
{
printf("%p, %p\n", queue, compute.queue);
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000);
}
Print results:
0x6000039c4a20, 0x6000039c4a20
Here queue and compute.queue are the same queue, but the code above could also end up with two different queues.
Can VkImageMemoryBarrier be synchronized in multiple queues?
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass. why use "or", why not use "and"?
I don't understand why drawCmdBuffers[currentBuffer] is called before compute.commandBuffer.
Dunno, it is an example. The author was probably not awfully worried about what happens in the first frame. It would simply be drawn with one frame of delay. Submitting the compute work before the draw should also work with some effort.
Wait for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but this phase has not been executed before, why is the pipeline not stuck? Is it because there is no pipeline before, so there is no need to wait?
Because that is not how pipelines and dependencies work. vkCmdPipelineBarrier makes sure any command/operation in the queue before the barrier reaches (and finishes) at least the srcStage stage (i.e. COMPUTE) before any command/operation recorded after it reaches dstStage.
Such dependency is satisfied even if there are no commands recorded before. I.e. by definition of "nothing", there are no commands that have not reached COMPUTE stage yet.
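The "nothing to wait for" case can be illustrated with a toy model in plain C++ (not Vulkan API code). It assumes each earlier command is represented by the furthest stage index it has completed, which is a made-up representation for this sketch; real drivers do not expose such a thing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: the source scope of a barrier is satisfied when every earlier
// command has completed at least srcStage. The vector and the integer stage
// indices are hypothetical stand-ins chosen for illustration.
bool sourceScopeSatisfied(const std::vector<int>& completedStageOfEarlierCommands,
                          int srcStage) {
    return std::all_of(completedStageOfEarlierCommands.begin(),
                       completedStageOfEarlierCommands.end(),
                       [srcStage](int completed) { return completed >= srcStage; });
}
```

With an empty list of earlier commands the check is vacuously true, so the barrier has nothing to block on. That mirrors why the first frame in the example does not hang.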
Can VkImageMemoryBarrier be synchronized in multiple queues?
Yes, with the help of a Semaphore.
For VK_SHARING_MODE_EXCLUSIVE and different queue family it is called Queue Family Ownership Transfer (QFOT).
Otherwise, a semaphore already performs a memory dependency and a VkImageMemoryBarrier is not needed.
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass. why use "or", why not use "and"?
vkCmdPipelineBarrier is either outside a subpass, in which case it forms a dependency between the commands recorded before it and after it in the queue.
Or vkCmdPipelineBarrier is inside a subpass, in which case it is called "subpass self-dependency" and its scope is limited only to that subpass (among other restrictions).
I am able to dump an R32G32B32A32 image for a screenshot. I would like to read out a pixel from an R32G32_SFLOAT image as well, but the result looks weird.
Below is my working image dump code (no validation errors):
void DumpImageToFile(VkTool::VulkanDevice &device, VkQueue graphics_queue, VkTool::Wrapper::CommandBuffers &command_buffer, VkImage image, uint32_t width, uint32_t height, const char *filename)
{
auto image_create_info = VkTool::Initializer::GenerateImageCreateInfo(VK_IMAGE_TYPE_2D, VK_FORMAT_R8G8B8A8_UNORM, {width, height, 1},
VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT, VK_SAMPLE_COUNT_1_BIT);
VkTool::Wrapper::Image staging_image(device, image_create_info, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
auto buffer_create_info = VkTool::Initializer::GenerateBufferCreateInfo(width * height * 4, VK_BUFFER_USAGE_TRANSFER_DST_BIT);
VkTool::Wrapper::Buffer staging_buffer(device, buffer_create_info, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
// Copy texture to buffer
command_buffer.Begin();
auto image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, staging_image.Get());
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
// Copy!!
VkImageBlit region = {};
region.srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.srcOffsets[0] = { 0, 0, 0 };
region.srcOffsets[1] = { static_cast<int32_t>(width), static_cast<int32_t>(height), 1};
region.dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.dstOffsets[0] = { 0, 0, 0 };
region.dstOffsets[1] = { static_cast<int32_t>(width), static_cast<int32_t>(height), 1 };
device.vkCmdBlitImage(command_buffer.Get(), image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging_image.Get(), VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region, VK_FILTER_LINEAR);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
{ VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, staging_image.Get());
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0
, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
auto buffer_image_copy = VkTool::Initializer::GenerateBufferImageCopy({ VK_IMAGE_ASPECT_COLOR_BIT , 0, 0, 1 }, { width, height, 1 });
device.vkCmdCopyImageToBuffer(command_buffer.Get(), staging_image.Get(), VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging_buffer.Get(), 1, &buffer_image_copy);
command_buffer.End();
std::vector<VkCommandBuffer> raw_command_buffers = command_buffer.GetAll();
auto submit_info = VkTool::Initializer::GenerateSubmitInfo(raw_command_buffers);
VkTool::Wrapper::Fence fence(device);
device.vkQueueSubmit(graphics_queue, 1, &submit_info, fence.Get());
fence.Wait();
fence.Destroy();
const uint8_t *mapped_address = reinterpret_cast<const uint8_t *>(staging_buffer.MapMemory());
lodepng::encode(filename, mapped_address, width, height);
staging_buffer.UnmapMemory();
staging_image.Destroy();
staging_buffer.Destroy();
}
Sorry for the ugly self-made wrapper; there was no official wrapper. Basically, it creates a staging image and a staging buffer. It first copies from the source image to the staging image with vkCmdBlitImage, then uses vkCmdCopyImageToBuffer and maps the buffer to host memory. This method works on multiple GPUs and does not need to worry about padding (I guess, correct me if I am wrong).
However, I have had no luck using this method to read R32G32_SFLOAT. At first I thought it was because of endianness, until I dumped the whole image.
The image above is what I get when I directly convert R32G32_SFLOAT to R8G8B8A8_UNORM; I know that does not make sense. But even without changing the format, there are still a lot of "holes" in the image and the values are badly wrong.
I am not really sure this is THE problem, but if I understand your code, you want to write image into filename.
So you want to read from this image. However, you declared the old layout of this image (not the staging one) as UNDEFINED. The implementation is then free to assume you do not care about the data stored in it. Use the real layout instead (I think it is COLOR_ATTACHMENT or something like that).
Moreover, you are using one staging image and one staging buffer. I do not really understand why. Why not simply call vkCmdCopyImageToBuffer to copy image straight into staging_buffer?
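If the image is copied straight into a buffer without a format-converting blit, the host has to size and index that buffer for the source format. Note that the posted function allocates width * height * 4 bytes, which fits R8G8B8A8_UNORM but would be half of what unconverted R32G32_SFLOAT texels need; that may or may not be related to the "holes". A minimal sketch, assuming a tightly packed buffer and 4-byte floats (Rg32f, rg32fBufferSize and readRg32f are hypothetical helper names, not part of the asker's wrapper):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One R32G32_SFLOAT texel: two 32-bit floats.
struct Rg32f {
    float r;
    float g;
};

// Bytes needed for a tightly packed width x height buffer of such texels.
std::size_t rg32fBufferSize(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(width) * height * 2 * sizeof(float);
}

// Reads the texel at (x, y) from mapped bytes. Assumes rows are tightly
// packed, i.e. no extra row padding was requested in the copy region.
Rg32f readRg32f(const std::uint8_t* mapped, std::uint32_t width,
                std::uint32_t x, std::uint32_t y) {
    const std::size_t offset =
        (static_cast<std::size_t>(y) * width + x) * 2 * sizeof(float);
    Rg32f texel{};
    // memcpy avoids alignment and aliasing problems with the raw byte pointer.
    std::memcpy(&texel.r, mapped + offset, sizeof(float));
    std::memcpy(&texel.g, mapped + offset + sizeof(float), sizeof(float));
    return texel;
}
```

The same pointer returned by the buffer mapping call could be passed as mapped, provided the buffer was created with at least rg32fBufferSize(width, height) bytes.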
BTW, with Vulkan, code that works on some GPUs is not necessarily correct.
Also, I think you must add a memory barrier after your transfer to the buffer, with HOST as the destination stage and HOST_READ as the destination access. The specification says:
Signaling a fence and waiting on the host does not guarantee that the results of memory accesses will be visible to the host, as the access scope of a memory dependency defined by a fence only includes device access. A memory barrier or other memory dependency must be used to guarantee this. See the description of host access types for more information.
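As a quick reference for which destination stage and access mask go with which kind of reader, here is a small lookup in plain C++. It only returns the names of real Vulkan enumerants as strings; the Consumer enum, the DstScope struct and dstScopeFor are hypothetical helpers for this sketch and do not call into Vulkan.

```cpp
#include <cassert>
#include <string>

// Kinds of readers that may follow a write (illustrative selection only).
enum class Consumer { HostRead, TransferRead, FragmentShaderRead };

// Names of the destination stage and access mask for a barrier.
struct DstScope {
    std::string stage;
    std::string access;
};

// Maps a reader to the destination scope a barrier after the write would name.
DstScope dstScopeFor(Consumer consumer) {
    switch (consumer) {
        case Consumer::HostRead:
            return {"VK_PIPELINE_STAGE_HOST_BIT", "VK_ACCESS_HOST_READ_BIT"};
        case Consumer::TransferRead:
            return {"VK_PIPELINE_STAGE_TRANSFER_BIT", "VK_ACCESS_TRANSFER_READ_BIT"};
        case Consumer::FragmentShaderRead:
            return {"VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT", "VK_ACCESS_SHADER_READ_BIT"};
    }
    return {"", ""};  // unreachable for valid enum values
}
```

For the readback discussed here, the writer is the transfer that fills the staging buffer and the reader is the host, so the HostRead entry is the relevant one.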
This part of your code seems weird:
image_memory_barrier = VkTool::Initializer::GenerateImageMemoryBarrier(VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }, image);
device.vkCmdPipelineBarrier(command_buffer.Get(), VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1, &image_memory_barrier);
This basically means that after the barrier your source image may not contain any data. Using UNDEFINED as the old layout does not guarantee that the contents of the image are preserved.
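One common way to avoid passing UNDEFINED by accident is to remember the layout each image was last transitioned to and use that as the old layout of the next barrier. A minimal sketch in plain C++, assuming stand-in types: Layout and ImageId below are hypothetical substitutes for VkImageLayout and VkImage, and LayoutTracker is not part of any library.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for VkImageLayout values and image handles.
enum class Layout { Undefined, ColorAttachment, TransferSrc, TransferDst, PresentSrc };
using ImageId = int;

class LayoutTracker {
public:
    // Records newLayout for the image and returns the layout it was in before.
    // The returned value is what a barrier should name as oldLayout when the
    // existing contents must survive the transition.
    Layout transition(ImageId image, Layout newLayout) {
        const auto it = current_.find(image);
        const Layout old = (it == current_.end()) ? Layout::Undefined : it->second;
        current_[image] = newLayout;
        return old;
    }

private:
    std::unordered_map<ImageId, Layout> current_;  // last known layout per image
};
```

An image the tracker has never seen reports Undefined, which matches a freshly created image whose contents do not matter yet.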
I have a MusicPlayer that holds a MusicSequence containing 3 MusicTracks. I have set up an AUGraph with 3 AUSampler Nodes plugged into a multichannel mixer, which in turn is connected to an output node.
I am using a SoundFont, and would like my 3 different MusicTracks to play on 3 different musical instruments, as is described here. However, the code I've got doesn't work - instead, it plays only one of the parts.
I create the AUGraph as follows:
NewAUGraph (&_processingGraph);
AUNode samplerNode, samplerNodeTwo, samplerNodeThree, ioNode, mixerNode;
AudioComponentDescription cd = {};
cd.componentManufacturer = kAudioUnitManufacturer_Apple;
//----------------------------------------
// Add 3 Sampler unit nodes to the graph
//----------------------------------------
cd.componentType = kAudioUnitType_MusicDevice;
cd.componentSubType = kAudioUnitSubType_Sampler;
AUGraphAddNode (self.processingGraph, &cd, &samplerNode);
AUGraphAddNode (self.processingGraph, &cd, &samplerNodeTwo);
AUGraphAddNode (self.processingGraph, &cd, &samplerNodeThree);
//-----------------------------------
// 2. Add a Mixer unit node to the graph
//-----------------------------------
cd.componentType = kAudioUnitType_Mixer;
cd.componentSubType = kAudioUnitSubType_MultiChannelMixer;
AUGraphAddNode (self.processingGraph, &cd, &mixerNode);
//--------------------------------------
// 3. Add the Output unit node to the graph
//--------------------------------------
cd.componentType = kAudioUnitType_Output;
cd.componentSubType = kAudioUnitSubType_RemoteIO; // Output to speakers
AUGraphAddNode (self.processingGraph, &cd, &ioNode);
//---------------
// Open the graph
//---------------
AUGraphOpen (self.processingGraph);
//-----------------------------------------------------------
// Obtain the mixer unit instance from its corresponding node
//-----------------------------------------------------------
AUGraphNodeInfo (
self.processingGraph,
mixerNode,
NULL,
&mixerUnit
);
//--------------------------------
// Set the bus count for the mixer
//--------------------------------
UInt32 numBuses = 3;
AudioUnitSetProperty(mixerUnit,
kAudioUnitProperty_ElementCount,
kAudioUnitScope_Input,
0,
&numBuses,
sizeof(numBuses));
//------------------
// Connect the nodes
//------------------
AUGraphConnectNodeInput (self.processingGraph, samplerNode, 0, mixerNode, 0);
AUGraphConnectNodeInput (self.processingGraph, samplerNodeTwo, 0, mixerNode, 1);
AUGraphConnectNodeInput (self.processingGraph, samplerNodeThree, 0, mixerNode, 2);
// Connect the mixer unit to the output unit
AUGraphConnectNodeInput (self.processingGraph, mixerNode, 0, ioNode, 0);
// Obtain references to all of the audio units from their nodes
AUGraphNodeInfo (self.processingGraph, samplerNode, 0, &_samplerUnit);
AUGraphNodeInfo (self.processingGraph, samplerNodeTwo, 0, &_samplerUnitTwo);
AUGraphNodeInfo (self.processingGraph, samplerNodeThree, 0, &_samplerUnitThree);
AUGraphNodeInfo (self.processingGraph, ioNode, 0, &_ioUnit);
I then load the 3 instruments from the SoundFont (IDs 0, 1 and 2 in the SoundFont) as follows, passing in the 'bankURL' of the SoundFont:
// Load the first instrument
AUSamplerBankPresetData bpdata;
bpdata.bankURL = (__bridge CFURLRef) bankURL;
bpdata.bankMSB = kAUSampler_DefaultMelodicBankMSB;
bpdata.bankLSB = kAUSampler_DefaultBankLSB;
bpdata.presetID = (UInt8) 0;
AudioUnitSetProperty(self.samplerUnit,
kAUSamplerProperty_LoadPresetFromBank,
kAudioUnitScope_Global,
0,
&bpdata,
sizeof(bpdata));
// Load the second instrument
AUSamplerBankPresetData bpdataTwo;
bpdataTwo.bankURL = (__bridge CFURLRef) bankURL;
bpdataTwo.bankMSB = kAUSampler_DefaultMelodicBankMSB;
bpdataTwo.bankLSB = kAUSampler_DefaultBankLSB;
bpdataTwo.presetID = (UInt8) 1;
AudioUnitSetProperty(self.samplerUnitTwo,
kAUSamplerProperty_LoadPresetFromBank,
kAudioUnitScope_Global,
0,
&bpdataTwo,
sizeof(bpdataTwo));
// Load the third instrument
AUSamplerBankPresetData bpdataThree;
bpdataThree.bankURL = (__bridge CFURLRef) bankURL;
bpdataThree.bankMSB = kAUSampler_DefaultMelodicBankMSB;
bpdataThree.bankLSB = kAUSampler_DefaultBankLSB;
bpdataThree.presetID = (UInt8) 2;
AudioUnitSetProperty(self.samplerUnitThree,
kAUSamplerProperty_LoadPresetFromBank,
kAudioUnitScope_Global,
0,
&bpdataThree,
sizeof(bpdataThree));
Finally, I set the AUSampler nodes to be used by each MusicTrack as follows:
//-------------------------------------------------
// Set the AUSampler nodes to be used by each track
//-------------------------------------------------
MusicTrack track, trackTwo, trackThree;
MusicSequenceGetIndTrack(testSequence, 0, &track);
MusicSequenceGetIndTrack(testSequence, 1, &trackTwo);
MusicSequenceGetIndTrack(testSequence, 2, &trackThree);
AUNode samplerNode, samplerNodeTwo, samplerNodeThree;
AUGraphGetIndNode (self.processingGraph, 0, &samplerNode);
AUGraphGetIndNode (self.processingGraph, 1, &samplerNodeTwo);
AUGraphGetIndNode (self.processingGraph, 2, &samplerNodeThree);
MusicTrackSetDestNode(track, samplerNode);
MusicTrackSetDestNode(trackTwo, samplerNodeTwo);
MusicTrackSetDestNode(trackThree, samplerNodeThree);
However, when I then play the MusicPlayer, I only hear a single part playing. The problem is arising in trying to use different instruments - when I use a single instrument with the standard MusicPlayer setup (instead of editing the AUGraph as I do above), it works fine.
Does anyone have any idea what I'm doing wrong?
I've found the solution. Before loading the instruments from the SoundFont, the following line is needed:
MusicSequenceSetAUGraph(testSequence, self.processingGraph);
As long as the point at which this line is run comes before the instruments are loaded from the SoundFont and before the various MusicTracks are assigned AUSampler nodes, it seems to work - all parts are played on different instruments, as desired. This answer to a related question helped me figure this out.
I had exactly the same issue as you: all tracks played with the first SoundFont instrument.
I followed your solution, but it did not work at first. I finally resolved the problem.
As you mentioned, the order of the function calls really matters. The calls should actually be made in this order:
.....
MusicSequenceSetAUGraph(s, _processingGraph);
.......
MusicTrackSetDestNode(track[i], samplerNodes[i]);
......
[self loadFromDLSOrSoundFont];
......
MusicPlayerStart(p);
This works in my project.
BTW, thanks for sharing your code. It really helped :)