vulkan: VkImageMemoryBarrier

I don't quite understand this part of the compute shader example:
https://github.com/SaschaWillems/Vulkan/blob/master/examples/computeshader/computeshader.cpp
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
}
drawCmdBuffers[currentBuffer] runs before compute.commandBuffer, yet drawCmdBuffers[currentBuffer] is the consumer: it samples textureComputeTarget, which is produced by compute.commandBuffer.
So why is drawCmdBuffers[currentBuffer] submitted before compute.commandBuffer?
With the following change (a wait added after the compute submission), only the first frame is rendered, and the right-hand image never receives textureComputeTarget, so it is drawn with a blue background.
void draw()
{
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000); // <-------- Wait
}
The draw command buffer executed by vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE) was recorded with:
VkImageMemoryBarrier imageMemoryBarrier = {};
imageMemoryBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
// We won't be changing the layout of the image
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
imageMemoryBarrier.image = textureComputeTarget.image;
imageMemoryBarrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
imageMemoryBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
imageMemoryBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
vkCmdPipelineBarrier(
drawCmdBuffers[i],
VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
VK_FLAGS_NONE,
0, nullptr,
0, nullptr,
1, &imageMemoryBarrier);
vkCmdBeginRenderPass(drawCmdBuffers[i], &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
The barrier waits for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but that stage has not executed before it. Why doesn't the pipeline stall? Is it because there is no earlier work, so there is nothing to wait for?
Section 6.6, Pipeline Barriers, of the specification says:
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass.
void draw()
{
printf("%p, %p\n", queue, compute.queue);
VulkanExampleBase::prepareFrame();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
VulkanExampleBase::submitFrame();
// Submit compute commands
// Use a fence to ensure that the compute command buffer has finished executing before using it again
vkWaitForFences(device, 1, &compute.fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &compute.fence);
VkSubmitInfo computeSubmitInfo = vks::initializers::submitInfo();
computeSubmitInfo.commandBufferCount = 1;
computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
sleep(1000);
}
Output:
0x6000039c4a20, 0x6000039c4a20
Here queue and compute.queue are the same queue, but the code above could also end up with two different queues.
Can VkImageMemoryBarrier be synchronized in multiple queues?
"vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass." Why does it say "or" rather than "and"?

I don't understand why drawCmdBuffers[currentBuffer] is called before compute.commandBuffer.
Dunno, it is an example. The author was probably not awfully worried about what happens in the first frame. The compute result is simply drawn with a one-frame delay. Moving the compute submission before the draw should also work, with some effort.
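The following CPU-only model (not Vulkan code) shows why the first frame comes out blank. It reproduces the order used in draw(): the draw that samples the storage image is submitted before the compute dispatch that writes it. The function name and the -1 "never written" marker are made up for this sketch. In the real sample, what an unwritten image looks like depends on how it was initialized.

```cpp
#include <vector>

// CPU-only model of the submission order in draw(). Each frame either
// submits compute then draw, or (as in the sample) draw then compute.
// The storage image holds the frame number that last wrote it; -1 means
// "never written". Returns what the draw sampled in each frame.
std::vector<int> displayedComputeResults(int frameCount, bool computeFirst) {
    int storageImage = -1;                  // stand-in for textureComputeTarget
    std::vector<int> shown;
    for (int frame = 0; frame < frameCount; ++frame) {
        if (computeFirst) {
            storageImage = frame;           // compute writes this frame's result
            shown.push_back(storageImage);  // draw samples it
        } else {
            shown.push_back(storageImage);  // draw samples the previous result
            storageImage = frame;           // compute writes after the draw
        }
    }
    return shown;
}
```

Over three frames, the draw-first order shows -1, 0, 1: nothing valid in the first frame, then a constant one-frame lag. Compute-first shows 0, 1, 2.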
The barrier waits for VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, but that stage has not executed before it. Why doesn't the pipeline stall? Is it because there is no earlier work, so there is nothing to wait for?
Because that is not how pipelines and dependencies work. vkCmdPipelineBarrier makes sure that every command/operation submitted to the queue before the barrier reaches (and finishes) at least the srcStage stage (i.e. COMPUTE) before any command/operation recorded after it reaches dstStage.
Such a dependency is satisfied even if no commands were recorded before the barrier: by definition, "nothing" contains no commands that have not yet reached the COMPUTE stage.
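The "nothing to wait for" case is just an all-of test over the commands recorded before the barrier, and std::all_of returns true for an empty range. This is a deliberately simplified model. It uses a hypothetical Command type and is not how a driver actually implements barriers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a command recorded before the barrier.
struct Command {
    bool reachedSrcStage;  // has it completed its srcStageMask stages?
};

// Commands after the barrier may enter dstStage only once every earlier
// command has finished its srcStage work. std::all_of is true for an empty
// range, so a barrier with nothing before it never blocks.
bool laterCommandsMayProceed(const std::vector<Command>& earlier) {
    return std::all_of(earlier.begin(), earlier.end(),
                       [](const Command& c) { return c.reachedSrcStage; });
}
```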
Can VkImageMemoryBarrier be synchronized in multiple queues?
Yes, with the help of a Semaphore.
For VK_SHARING_MODE_EXCLUSIVE resources used by different queue families, this is called a Queue Family Ownership Transfer (QFOT).
Otherwise, the semaphore already performs a memory dependency, and a VkImageMemoryBarrier is not needed.
vkCmdPipelineBarrier is a synchronization command that inserts a dependency between commands submitted to the same queue, or between commands in the same subpass. Why "or" and not "and"?
Either vkCmdPipelineBarrier is outside a subpass, in which case it forms a dependency with the commands recorded before and after it in the queue.
Or vkCmdPipelineBarrier is inside a subpass, in which case it is called a "subpass self-dependency" and its scope is limited to that subpass only (among other restrictions).

Related

<Vulkan> Use rendered vkImage as Texture

I want to use a vkImage rendered in a previous render pass as a texture for a composite operation in a fragment shader. From here I learned that vkCmdPipelineBarrier is used to wait for the GPU to finish a rendering operation, and I wrote the code below. It works well on Snapdragon devices, but not on Mali-G52, where a write-after-write error occurs intermittently. Is this code insufficient? Any suggestions?
vkCmdEndRenderPass(cb);
vkCmdBeginRenderPass(cb, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE);
VkViewport viewport = vks::initializers::viewport((float)offscreenPass.width, (float)offscreenPass.height, 0.0f, 1.0f);
vkCmdSetViewport(cb, 0, 1, &viewport);
VkRect2D scissor = vks::initializers::rect2D(offscreenPass.width, offscreenPass.height, 0, 0);
vkCmdSetScissor(cb, 0, 1, &scissor);
// https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/pipeline_barriers/pipeline_barriers.cpp
VkImageMemoryBarrier imageMemoryBarrier = vks::initializers::imageMemoryBarrier();
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
imageMemoryBarrier.image = offscreenPass.color[drawframe].image;
imageMemoryBarrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imageMemoryBarrier.subresourceRange.baseMipLevel = 0;
imageMemoryBarrier.subresourceRange.levelCount = 1;
imageMemoryBarrier.subresourceRange.baseArrayLayer = 0;
imageMemoryBarrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
imageMemoryBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageMemoryBarrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL;
imageMemoryBarrier.image = offscreenPass.depth.image;
imageMemoryBarrier.srcAccessMask = 0;
imageMemoryBarrier.dstAccessMask = 0;
vkCmdPipelineBarrier(
cb,
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
0, 0, nullptr, 0, nullptr, 1, &imageMemoryBarrier);
I have tried every pattern written here.
If you want to synchronize render passes then your pipeline barrier must be outside of the render pass in the command stream. I.e. it must be after the vkCmdEndRenderPass() of the first pass, and before the vkCmdBeginRenderPass() of the second pass. Pipeline barriers issued inside a render pass, as you are currently doing, are used for synchronization only within the current subpass.
Also, try to avoid:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
... for pipeline barriers when you only consume the output of the first pass as a fragment shader input in the second. This is overly conservative and needlessly serializes execution of the geometry processing too. In this case, you should use:
srcStage=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT
dstStage=VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
... which allows the non-dependent vertex shading and binning for the second pass to run in parallel to the first pass.
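Here is a rough model of the difference between the two dstStage choices. Stages of the second pass that come logically before dstStage are not held back by the barrier, so they may overlap with the first pass. The Stage enum is a simplified, hypothetical subset of the graphics stage order, not the real VkPipelineStageFlagBits values.

```cpp
#include <vector>

// Simplified subset of the logical order of graphics pipeline stages,
// for illustration only (not the real VkPipelineStageFlagBits values).
enum class Stage { TopOfPipe, VertexInput, VertexShader, EarlyFragmentTests,
                   FragmentShader, LateFragmentTests, ColorAttachmentOutput,
                   BottomOfPipe };

// Stages of the second pass that are logically earlier than dstStage and
// therefore not blocked by the barrier: they may overlap with the first pass.
std::vector<Stage> stagesFreeToOverlap(Stage dstStage) {
    std::vector<Stage> freeStages;
    for (int s = static_cast<int>(Stage::TopOfPipe);
         s < static_cast<int>(dstStage); ++s) {
        freeStages.push_back(static_cast<Stage>(s));
    }
    return freeStages;
}
```

With dstStage at TopOfPipe, no work in the second pass can start early. With FragmentShader, vertex input and vertex shading (and, in this model, early fragment tests) are free to run alongside the first pass.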
Self-solved.
The difference in sampler2D precision between Adreno and Mali caused this issue. Declaring "precision highp sampler2D" in the shader gives me correct data.

Does submitting to a queue without command buffers have a valid use?

What is the valid use case to call vkQueueSubmit with VkSubmitInfo command buffers parameters set to 0 count and nullptr pointer?
Example code:
VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
VkFence fence;
vkCreateFence(device, &fenceInfo, nullptr, &fence);
std::vector<VkSubmitInfo> submitInfos;
VkSubmitInfo submitInfo{};
...
submitInfo.commandBufferCount = 0;
submitInfo.pCommandBuffers = nullptr;
...
submitInfos.push_back(submitInfo);
vkQueueSubmit(queue, static_cast<uint32_t>(submitInfos.size()), submitInfos.data(), fence);
I can think of only one use, signaling the fence, but doing it this way makes no sense to me (the fence can be reset, or created in the signaled state).
Is there any (other) valid use case for doing so?

Vulkan: In the case of multiple frame buffer, should I create multiple VkBuffers for each shader uniform?

glsl:
#version 450
// GLSL for Vulkan requires non-opaque uniforms to live in a uniform block
layout(set = 0, binding = 0) uniform UniformData {
    mat4 MyMatrix;
};
void main() {
}
With multiple frames in flight, command buffers for several frames may be queued at the same time, for example:
while(!isClose) {
...
VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
std::vector<VkCommandBuffer> cs = {commandBuffers[imageIndex]}; // bind different frame buffer
...
vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]);
...
currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
...
}
MyMatrix needs to be updated every frame. If every frame uses the same VkBuffer for MyMatrix, could one frame's data be overwritten while an earlier frame is still reading it? Should I create a separate VkBuffer for MyMatrix per frame?
Multiple buffers is probably overkill - it's more usual to allocate a single large uniform buffer and then sub-allocate from that for each draw call batch. You can then just use offsets into that buffer for each draw, rather than needing an entire new buffer each time.
In general offsets are less expensive for hardware to handle than entirely new buffer descriptors.
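Below is a sketch of sub-allocating one uniform buffer into a slice per frame in flight. It assumes the slices are bound through a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC descriptor, with the offset passed in pDynamicOffsets of vkCmdBindDescriptorSets. Each offset must be a multiple of VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment, which the spec requires to be a power of two. The helper names are hypothetical.

In the loop from the question, it seems safest to index the slice by currentFrame rather than imageIndex, because inFlightFences[currentFrame] is what tells you the previous user of that slice has finished.

```cpp
#include <cstdint>
#include <vector>

// Round size up to a multiple of alignment. Assumes alignment is a power
// of two, as required for minUniformBufferOffsetAlignment.
constexpr std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Byte offset of each frame's slice inside one shared uniform buffer.
// The buffer must be at least offsets.back() + uniformSize bytes long.
std::vector<std::uint64_t> perFrameOffsets(std::uint64_t uniformSize,
                                           std::uint64_t minAlignment,
                                           std::uint32_t framesInFlight) {
    const std::uint64_t stride = alignUp(uniformSize, minAlignment);
    std::vector<std::uint64_t> offsets;
    for (std::uint32_t i = 0; i < framesInFlight; ++i) {
        offsets.push_back(i * stride);
    }
    return offsets;
}
```

For a 64-byte mat4 and an alignment of 256, three frames in flight get slices starting at 0, 256 and 512.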

Vulkan Device - Host - Device synchronization with VkEvent

I'm trying to synchronize a host stage into my pipeline: I edit some data on the host while a command buffer is executing on the device. From reading the specification, I think I have the correct synchronization, execution/memory dependencies and availability/visibility operations, but it works on neither NV nor AMD hardware. Is this even possible? If so, what am I doing wrong in terms of synchronization?
In summary I'm doing the following:
[D] A device buffer (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) is copied to a host visible and coherent one (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT).
[D] The first event is set.
[D] The second event is waited for.
[H] Meanwhile the host waits for the first event.
[H] After it has been set, it increments the numbers in the host visible buffer.
[H] Then it sets the second event.
[D] The device then continues to copy the host visible buffer back to the device local buffer.
What happens?
On NV the first part works, the correct data arrives at the host side, but the altered data never arrives at the device side. On AMD not even the first part works and I already don't get the data on the host.
Command buffer recording:
// ...
VkMemoryBarrier barrier = {};
barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
barrier.srcAccessMask = ...;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdPipelineBarrier(command_buffer, ..., VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, host_buffer, device_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_HOST_READ_BIT;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_HOST_BIT, 0, 1, &barrier, 0, nullptr, 0, nullptr);
vkCmdSetEvent(command_buffer, device_to_host_sync_event, VK_PIPELINE_STAGE_TRANSFER_BIT);
barrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
vkCmdWaitEvents(command_buffer, 1, &host_to_device_sync_event, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 1, &barrier, 0, nullptr, 0, nullptr);
copyWholeBuffer(command_buffer, device_buffer, host_buffer);
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = ...;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, ..., 0, 1, &barrier, 0, nullptr, 0, nullptr);
// ...
Execution:
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
while(vkGetEventStatus(device, device_to_host_sync_event) != VK_EVENT_SET)
std::this_thread::sleep_for(std::chrono::microseconds(10));
void* data;
vkMapMemory(device, host_buffer, 0, BUFFER_SIZE, 0, &data);
// read and write parts of the memory
vkUnmapMemory(device, host_buffer);
vkSetEvent(device, host_to_device_sync_event);
vkDeviceWaitIdle(device);
I've uploaded a self-contained example: https://gist.github.com/neXyon/859b2e52bac9a5a56b804d8a9d5fa4a5
The interesting part starts at line 292. Could you check whether it works for you?
I opened an issue on github: https://github.com/KhronosGroup/Vulkan-Docs/issues/755
After some discussion there, the conclusion was that device-to-host synchronization is not possible with an event; a fence has to be used instead.

How to change from one musicSequence to another without time delay

I'm playing a MIDI sequence, loaded from a MIDI file, via MusicPlayer, and I want to switch to another sequence during playback.
When I try this:
MusicPlayerSetSequence(_player, sequence);
MusicSequenceSetAUGraph(sequence, _processingGraph);
it stops the playback. So I start it back again and set the time with
MusicPlayerSetTime(_player, currentTime);
so it plays again where the previous sequence stopped, but there is a little delay.
I've tried adding to currentTime the interval measured between stopping and restarting the player, but there is still a delay.
I was wondering if there is an alternative to stopping -> changing sequence -> starting again.
You definitely need to manage the AUSamplers if you are adding and removing tracks or switching sequences. It probably is cleaner to dispose of the AUSampler and create a new one for each new track but it is also possible to 'recycle' AUSamplers but that means you will need to keep track of them.
Managing AUSamplers means that when you are no longer using an instance of one (for example if you delete or replace a MusicTrack), you need to disconnect it from the AUMixer instance, remove it from the AUGraph instance, and then update the AUGraph.
There are lots of ways to handle all this. To keep track of each AUSampler instance's bus number, loaded sound font and some other state, I use an NSObject subclass named SamplerAudioUnit that contains all the needed properties and methods. Same for MusicTracks: I have a Track class, but this may not be needed in your project.
The gist though is that AUSamplers need to be managed for performance and memory. If an instance is no longer being used it should be removed and the AUMixer bus input freed up.
BTW, I checked the docs and there is apparently no technical limit to the number of mixer busses, but the number does need to be specified.
// this is not cut and paste code - just an example of managing the AUSampler instance
- (OSStatus)deleteTrack:(Track*) trackObj
{
OSStatus result = noErr;
// turn off MP if playing
BOOL MPstate = [self isPlaying];
if (MPstate){
MusicPlayerStop(player);
}
//-disconnect node from mixer + update list of mixer buses
SamplerAudioUnit * samplerObj = trackObj.sampler;
UInt32 busNumber = samplerObj.busNumber;
result = AUGraphDisconnectNodeInput(graph, mixerNode, busNumber);
if (result) {[self printErrorMessage: @"AUGraphDisconnectNodeInput" withStatus: result];}
[self clearMixerBusState: busNumber]; // routine that keeps track of available busses
result = MusicSequenceDisposeTrack(sequence, trackObj.track);
if (result) {[self printErrorMessage: @"MusicSequenceDisposeTrack" withStatus: result];}
// remove AUSampler node
result = AUGraphRemoveNode(graph, samplerObj.samplerNode);
if (result) {[self printErrorMessage: @"AUGraphRemoveNode" withStatus: result];}
result = AUGraphUpdate(graph, NULL);
if (result) {[self printErrorMessage: @"AUGraphUpdate" withStatus: result];}
samplerObj = nil;
trackObj = nil;
if (MPstate){
MusicPlayerStart(player);
}
// CAShow(graph);
// CAShow(sequence);
return result;
}
Because
MusicPlayerSetSequence(_player, sequence);
MusicSequenceSetAUGraph(sequence, _processingGraph);
still stops the player, a short break can still be heard.
So instead of replacing the musicSequence, I changed the contents of its tracks, which doesn't cause any breaks:
MusicTrack currentTrack;
MusicTrack currentTrack2;
MusicSequenceGetIndTrack(musicSequence, 0, &currentTrack);
MusicSequenceGetIndTrack(musicSequence, 1, &currentTrack2);
MusicTrackClear(currentTrack, 0, _trackLen);
MusicTrackClear(currentTrack2, 0, _trackLen);
MusicSequence tmpSequence;
switch (number) {
case 0:
tmpSequence = musicSequence1;
break;
case 1:
tmpSequence = musicSequence2;
break;
case 2:
tmpSequence = musicSequence3;
break;
case 3:
tmpSequence = musicSequence4;
break;
default:
tmpSequence = musicSequence1;
break;
}
MusicTrack tmpTrack;
MusicTrack tmpTrack2;
MusicSequenceGetIndTrack(tmpSequence, 0, &tmpTrack);
MusicSequenceGetIndTrack(tmpSequence, 1, &tmpTrack2);
MusicTimeStamp trackLen = 0;
UInt32 trackLenLenLen = sizeof(trackLen);
MusicTrackGetProperty(tmpTrack, kSequenceTrackProperty_TrackLength, &trackLen, &trackLenLenLen);
_trackLen = trackLen;
MusicTrackCopyInsert(tmpTrack, 0, _trackLen, currentTrack, 0);
MusicTrackCopyInsert(tmpTrack2, 0, _trackLen, currentTrack2, 0);
No disconnection of nodes, no updating the graph, no stopping the player.