Vulkan: VK_PRESENT_MODE_MAILBOX_KHR with two images equivalent to VK_PRESENT_MODE_FIFO_KHR? - vulkan

I wrote some Vulkan code and I think I am hitting some driver bugs (Linux, Mesa 13, Intel). The driver only offers VK_PRESENT_MODE_MAILBOX_KHR (a spec violation). I was under the impression that if I create my swap chain with an imageCount of 2, the resulting behavior should be equivalent to VK_PRESENT_MODE_FIFO_KHR.
My reasoning is that while one image is being presented, the swap chain will only give me an image and signal its availability (vkAcquireNextImageKHR with a semaphore) once the other one has been submitted. It would then swap which image is being presented on the next vblank.
However, I get very high framerates, so it is clear that not all images are actually presented.
Is it possible that the present engine does some kind of blit to internal memory and releases the image practically immediately?

Turns out I missed the fact that the image count you provide when creating a swapchain is a minimum. So the Intel driver advertises that it wants at least 2 images, but will create 4 or more anyway, no matter what you tell it. How odd.
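For anyone else hitting this, here is a minimal sketch (the create-info is only partially filled in, and device/swapchain are placeholder handles) showing how to see how many images the driver actually created:

VkSwapchainCreateInfoKHR info = { .sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR };
info.minImageCount = 2;   /* this is a minimum, not an exact count */
/* ... surface, format, extent, present mode, etc. ... */
VkSwapchainKHR swapchain;
vkCreateSwapchainKHR(device, &info, NULL, &swapchain);

/* Ask how many images were really created; on this driver it can be 4 or more. */
uint32_t image_count = 0;
vkGetSwapchainImagesKHR(device, swapchain, &image_count, NULL);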

Related

How do I know when Vulkan isn't using memory anymore so I can overwrite it / reuse it?

When working with Vulkan, it's common that when creating a buffer, such as a uniform buffer, you create multiple 'versions' of it, because if you have double buffering, for example, you don't know whether the graphics API is still drawing the last frame (using the memory you bound and instructed it to use last loop). I've seen this done with uniform buffers, but not with vertex or index buffers or image/texture buffers. Is this because uniform buffers are updated regularly and vertex buffers or images are not?
If you wanted to update an image or a vertex buffer, how would you go about it given that you don't know whether the graphics API is still using it? Do you simply allocate new memory for that image/buffer and start anew, even if you just want to update a section of it? And if you do allocate a new buffer, when would you know it is safe to release the old one? Would, say, 5 frames into the future be OK? Or 2 seconds? After all, it could still be in use. How is this done?
given that you don't know whether the graphics API is still using it?
But you do know.
Vulkan doesn't arbitrarily use resources. It uses them exactly and only how your code tells it to use the resource. You created and submitted the commands that use those resources, so if you need to know when a resource is in use, it is you who must keep track of it and manage this.
You have to use API synchronization functions to follow the GPU's execution of commands.
If an action command uses some set of resources, then those resources are in use while that command is being executed. You have tools like events which can be used to stop subsequent commands from executing until some prior commands have finished. And events can tell when a particular command has finished, so that you'll know when those resources are no longer in use.
Semaphores have similar powers, but at the level of a batch of work. If a semaphore is signaled, then all of the commands in the batch that signaled it have completed and are no longer using the resources they use. Fences can be used for extremely coarse synchronization, at the level of a submit command.
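As a rough sketch (device, queue, cmd and fence are assumed to exist already), the fence-level version of this tracking looks like:

/* Associate a fence with the batch that uses the resources. */
VkSubmitInfo submit = { .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO };
submit.commandBufferCount = 1;
submit.pCommandBuffers = &cmd;
vkQueueSubmit(queue, 1, &submit, fence);

/* Once the fence signals, every resource used by that batch is idle again. */
vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fence);
/* Now it is safe to overwrite, free, or re-record those resources. */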
You multi-buffer uniform data because the nature of uniform data is such that it typically needs to change every frame. If you have vertex buffers or images to change every frame, then you'll need to do the same thing with those.
For infrequent changes, you may want to have extra memory available so that you can just create new images or buffers, then delete the old ones when the memory is no longer in use. Or you may have to stall the CPU until the GPU has finished using those resources.
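To make the multi-buffering concrete, here is a sketch assuming two frames in flight, with one uniform-buffer 'version' and one fence per slot (all of the names are placeholders):

#define FRAMES_IN_FLIGHT 2

uint32_t slot = frame_index % FRAMES_IN_FLIGHT;

/* Wait for the submission that last used this slot's uniform buffer. */
vkWaitForFences(device, 1, &frame_fence[slot], VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &frame_fence[slot]);

/* Only now is it safe to overwrite this slot's uniform data. */
memcpy(mapped_uniform[slot], &per_frame_data, sizeof(per_frame_data));

/* Then record/submit work that reads uniform_buffer[slot], signalling frame_fence[slot]. */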

Do I need to create all the surfaces before creating the device?

just a quick question here...
So, as you know, when you create a Vulkan device, you need to make sure the physical device you chose supports presenting to a surface with vkGetPhysicalDeviceSurfaceSupportKHR(), right? That means you need to create the surface before creating the device.
Now let's say that at run-time the user presses a button which opens a new window, and stuff is going to be drawn to that window, so you need a new surface, right? But the device has already been created...
Does this mean I have to create all the surfaces before I create the device, or do I have to recreate the device? And if I need to recreate it, what happens to all the stuff that has been created/allocated from that device?
Does this mean I have to create all the surfaces before I create the device or do I have to recreate the device
Neither.
If the physical device cannot draw to the surface... then you need to find a physical device which can. This could happen if you have 2 GPUs, and each one is plugged into a different monitor. Each GPU can only draw to surfaces that are on its monitor (though sometimes there are ways for implementations to get around this).
So if the physical device behind the logical VkDevice you're using cannot draw to the surface, you don't "recreate" the device. You create a new device, one which is almost certainly unable to draw to the surface that the old device could draw to. So in this case, you'd need 2 separate devices to render to the two surfaces.
But for most multi-monitor cases this isn't an issue. If you have a single GPU with multi-monitor output support, then any windows you create will almost certainly be compatible with that GPU. Integrated GPU + discrete GPU cases also tend to support the same surfaces.
The Vulkan API simply requires that you check to see if there is an incompatibility, and then deal with it however you can. Which could involve moving the window to the proper monitor or other OS-specific things.
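So in the run-time new-window case, you simply repeat the support query against the physical device you already picked. A sketch (queue_family_index and new_surface stand in for your own values):

/* A new window was opened at run-time, giving a new VkSurfaceKHR. */
VkBool32 supported = VK_FALSE;
vkGetPhysicalDeviceSurfaceSupportKHR(physical_device, queue_family_index,
                                     new_surface, &supported);

if (supported) {
    /* The existing VkDevice can present to it: just create a new swapchain. */
} else {
    /* Incompatible surface: use/create another device, or move the window. */
}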

Impossible to acquire and present in parallel with rendering?

Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.
Reading the Vulkan specifications, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet I think inefficient) way of doing things:
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
vkQueuePresentKHR waiting on sem_pre_present.
The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that vkAcquireNextImageKHR must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But vkAcquireNextImageKHR could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly itself, and more importantly, all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
vkQueuePresentKHR waiting on sem_pre_present.
Which would, again theoretically, allow the pipeline to execute all the way up to the fragment shader, while we wait for vkAcquireNextImageKHR. The only reason this doesn't work is that it is neither possible to tell the command buffer that this image will be determined later (with proper synchronization), nor is it possible to ask the presentation engine for a specific image.
My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?
My second question is: wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself? That way, you could know in advance which image you are going to ask for, and build and submit your command buffer accordingly.
Like Nicol said, you can record secondary command buffers independent of which image they will render to.
However, you can take it a step further and record command buffers for all swapchain images in advance, then select the correct one to submit based on the image you acquired.
This kind of reuse takes some extra consideration, because all the memory ranges used are baked into the command buffer. But in many situations the required render commands don't actually change from one frame to the next; only a little of the data they use does.
So the sequence of such a frame would be:
/* Acquire an image index; vk.acquire is signaled once the image is actually ready. */
vkAcquireNextImageKHR(vk.dev, vk.swap, UINT64_MAX, vk.acquire, VK_NULL_HANDLE, &vk.image_ind);
/* Wait until the submission that last used this image's resources has finished. */
vkWaitForFences(vk.dev, 1, &vk.fences[vk.image_ind], VK_TRUE, UINT64_MAX);
vkResetFences(vk.dev, 1, &vk.fences[vk.image_ind]);
/* Only the per-frame data changes; the pre-recorded command buffer is reused as-is. */
engine_update_render_data(vk.mapped_staging[vk.image_ind]);
VkSubmitInfo submit = build_submit(vk.acquire, vk.rend_cmd[vk.image_ind], vk.present);
vkQueueSubmit(vk.rend_queue, 1, &submit, vk.fences[vk.image_ind]);
VkPresentInfoKHR present = build_present(vk.present, vk.swap, vk.image_ind);
vkQueuePresentKHR(vk.queue, &present);
Granted, this does not allow for conditional rendering, but the GPU is in general fast enough that rendering some geometry that ends up out of frame causes no noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.
Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.
First, you can always build secondary command buffers; providing a VkFramebuffer is merely a courtesy, not a requirement. And this is very important if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of the selling points of Vulkan. For you to only be creating one is something of a waste for a performance-conscious application.
In such a case, only the primary command buffer needs the actual image.
Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.
Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only contents are an image copy from your image to the presentable one.
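A sketch of that worst case, assuming my_image and the acquired swapchain_image have already been transitioned into the right transfer layouts (cmd, width and height are placeholders):

/* Tiny per-image command buffer: copy the off-screen result into the acquired image. */
VkImageCopy region = {
    .srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
    .dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
    .extent = { width, height, 1 },
};
vkCmdCopyImage(cmd,
               my_image,        VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
               swapchain_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
               1, &region);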
all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.
Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.
wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself?
No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
Also, there's specific language in the specification that the semaphore and/or fence you provide must not only be unsignaled, but there cannot be any outstanding operations waiting on them. Why?
Because vkAcquireNextImageKHR can fail. If you have some operation in a queue that's waiting on a semaphore that's never going to fire, that will cause huge problems. You have to successfully acquire first, then submit work that is based on the semaphore.
Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.
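In code, that ordering looks roughly like this (recreate_swapchain() is a hypothetical helper, not part of the API):

VkResult res = vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                                     acquire_sem, VK_NULL_HANDLE, &image_index);
if (res == VK_ERROR_OUT_OF_DATE_KHR) {
    /* Acquire failed; nothing has been queued to wait on acquire_sem yet,
       so it is safe to rebuild the swapchain and try again. */
    recreate_swapchain();
    return;
}
/* VK_SUBOPTIMAL_KHR still delivers a usable image; only now do you submit
   work that waits on acquire_sem. */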

Check Device Type For GPUImage

GPUImage requires, for iPhone 4 and below, images smaller than 2048 pixels. The 4S and above can handle much larger. How can I check to see which device my app is currently running on? I haven't found anything in UIDevice that does what I'm looking for. Any suggestions/workarounds?
For this, you don't need to check device type, you simply need to read the maximum texture size supported by the device. Luckily, there is a built-in method within GPUImage that does this for you:
GLint maxTextureSize = [GPUImageContext maximumTextureSizeForThisDevice];
The above will give you the maximum texture size for the device you're running this on. That will determine the largest image size that GPUImage can work with on that device, and should be future-proof against whatever iOS devices come next.
This method works by caching the results of this OpenGL ES query:
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);
if you're curious.
I should also note that you can provide images larger than the max texture size to the framework, but they get scaled down to the largest size supported by the GPU before processing. At some point, I may complete my plan for tiling subsections of these images in processing so that larger images can be supported natively. That's a ways off, though.
This is among the best device-detection libraries I've come across: https://github.com/erica/uidevice-extension
EDIT: The readme seems to suggest that the more up-to-date versions are in her "Cookbook" sources, so perhaps this one is more current.
Here is a useful class that I have used several times in the past; it is very simple and easy to implement:
https://gist.github.com/Jaybles/1323251

How to modify DirectX camera

Suppose I have a 3D (but not stereoscopic) DirectX game or program. Is there a way for a second program (or a driver) to change the camera position in the game?
I'm trying to build a head-tracking plugin or driver that I can use for my DirectX games/programs. An inertial motion sensor will give me the position of my head but my problem is using that position data to change the camera position, not with the hardware/math concerns of head tracking.
I haven't been able to find anything on how to do this so far, but iZ3D was able to create two cameras near the original camera and use it for stereoscopic stuff, so I know there exists some hook/link/connection into DirectX that makes camera manipulation by a second program possible.
If I am able to get this to work I'll release the code.
-Shane
Hooking Direct3D calls is by its nature just hooking DLL calls, i.e. it's not something special to D3D but a generic technique. Try googling for "hook dll" or start from here: [C++] Direct3D hooking sample. As always happens with hooks, there are many caveats, and you'll have to write quite a lot of boilerplate to satisfy all the needs of the hooked application.
That said, manipulating the camera in games usually gives poor results. There are at least two key features of modern PC games which will severely limit your idea:
Pre-clipping. Almost any game engine culls objects that fall outside the view frustum. So when you rotate the camera to the side, you won't see the objects you'd expect to see in the real world - they were simply never sent to D3D, since the game doesn't know the view has changed.
Multi-pass rendering. Many popular post-processing effects are done in extra passes (either over the whole scene or just part of it). Mirrors and in-game "screens" are the best-known examples. Without knowing which camera you're manipulating, you'll most likely just break the scene.
Btw, #2 is the reason why stereoscopic mode is not 100% compatible with all games. For example, in the Source engine HDR scenes are rendered in three passes, and if you don't know how to distinguish them you'll do nothing but break the game. Take a look at how nVidia implements their stereoscopic mode: they make a separate hook for every popular game, and even with this approach it's not always possible to get the expected result.