Swap chains for windows covering multiple monitors - rendering

I'm currently developing a multi-monitor DX11 app and I've run into a very specific problem. When creating a swap chain for a window, a window handle and a pointer to a device object must be passed, and both parameters are required to be non-NULL. But when a window covers two monitors connected to different devices, a pointer to which device should be passed? Or should I create a swap chain for each monitor in order to render the corresponding parts of the window?
I'm aware that in windowed mode, DWM performs the final merging of the swap chains' back buffers into the real back buffer of its own swap chain. But I can't understand how to render to a window that can be dragged from one monitor to another and back.
On the other hand, I do understand that swap chain buffers live in device memory, so a device must be specified when creating a swap chain. A window handle is required too because rendering is performed to a window. The problem is that I can't work out which device should be used when a window spans two monitors and, if I do need a swap chain per monitor, whether I have to merge the rendering results from all swap chains.
Thank you!

In general, DWM makes it work. You can create the swapchain for your window on any device, and DWM will composite it. However, there may be a performance drop when your window moves from a monitor connected to the adapter on which the window's swapchain was created (most efficient) to a monitor connected to another adapter (less efficient, more copies).
Also, the window cannot go fullscreen on a monitor connected to an adapter different from the one on which the window's swapchain was created.
Perhaps for maximum performance you need one device per adapter and to juggle your rendering from one to the other depending on where the window sits, but I have no experience with that. (Also, those adapters may have very different performance profiles, to the point that copying may be less expensive than rendering on the slow adapter.)
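If you do go the one-device-per-adapter route, here is a rough sketch (names are illustrative, not a drop-in implementation) of how you might find which adapter currently drives the monitor the window sits on:

#include <windows.h>
#include <dxgi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Illustrative sketch: return the adapter whose output contains the window's monitor.
ComPtr<IDXGIAdapter1> FindAdapterForWindow(IDXGIFactory1* factory, HWND hwnd)
{
    HMONITOR monitor = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT a = 0; factory->EnumAdapters1(a, &adapter) != DXGI_ERROR_NOT_FOUND; ++a)
    {
        ComPtr<IDXGIOutput> output;
        for (UINT o = 0; adapter->EnumOutputs(o, &output) != DXGI_ERROR_NOT_FOUND; ++o)
        {
            DXGI_OUTPUT_DESC desc;
            if (SUCCEEDED(output->GetDesc(&desc)) && desc.Monitor == monitor)
                return adapter;   // this adapter drives the monitor the window sits on
        }
    }
    return nullptr;               // no match; fall back to the adapter you already use
}

Re-running a check like this when the window moves (e.g. on WM_EXITSIZEMOVE) would tell you when to switch which device does the rendering.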

Do I need to create all the surfaces before creating the device?

just a quick question here...
So, as you know, when you create a Vulkan device, you need to make sure the physical device you chose supports presenting to a surface with vkGetPhysicalDeviceSurfaceSupportKHR(), right? That means you need to create the surface before creating the device.
Now let's say that at run time the user presses a button which opens a new window, and stuff is going to be drawn to that window, so you need a new surface, right? But the device has already been created...
Does this mean I have to create all the surfaces before I create the device, or do I have to recreate the device? And if I need to recreate it, what happens to all the stuff that has been created/allocated from that device?
Does this mean I have to create all the surfaces before I create the device or do I have to recreate the device
Neither.
If the physical device cannot draw to the surface... then you need to find a physical device which can. This could happen if you have 2 GPUs, each plugged into a different monitor. Each GPU can only draw to surfaces that are on its monitor (though sometimes there are ways for implementations to get around this).
So if the physical device behind the logical VkDevice you're using cannot draw to the surface, you don't "recreate" the device. You create a new device, one which is almost certainly unable to draw to the surfaces the old device could draw to. So in this case, you'd need 2 separate devices to render to the two surfaces.
But for most multi-monitor cases this isn't an issue. If you have a single GPU with multi-monitor output support, then any windows you create will almost certainly be compatible with that GPU. Integrated GPU + discrete GPU cases also tend to support the same surfaces.
The Vulkan API simply requires that you check to see if there is an incompatibility, and then deal with it however you can. Which could involve moving the window to the proper monitor or other OS-specific things.
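Concretely, when a new window (and its surface) shows up at run time, the check is just a query against the physical device you already built your VkDevice from. A minimal sketch, where physicalDevice, presentQueueFamily and newSurface are placeholder names:

// Placeholder names; newSurface belongs to the window that just opened.
VkBool32 supported = VK_FALSE;
vkGetPhysicalDeviceSurfaceSupportKHR(physicalDevice, presentQueueFamily,
                                     newSurface, &supported);
if (supported) {
    // The existing VkDevice can present here: just create a swapchain
    // for the new surface; no new device is needed.
} else {
    // Rare case (e.g. the window is on a monitor driven by another GPU):
    // find a physical device that does report support and create a
    // second VkDevice for rendering/presenting to this surface.
}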

Fast EELS acquisition

To acquire EELS, I used the commands below:
img:=camera.cm_acquire(procType,exp,binX, binY,tp,lf,bt,rt)
imgSP:=img.verticalSum() //this is a custom function to do vertical sum
and this,
imgSP:=EELSAcquireSpectrum(exp, nFrames, binX, binY, processing)
When using either one in my customized 2D mapping, it is much slower than the "Spectrum Imaging" from Gatan (the first one is faster than the second). Is the lack of speed a natural limitation of scripting, or are there better function calls?
Yes, the lack of speed is a limitation of scripting giving you access to the camera only in single-read mode, i.e. one command initializes the camera, exposes it, reads it out and returns the image.
In Spectrum Imaging the camera is run in continuous mode, i.e. the same as when the live view is running. The camera is constantly exposed and read out (with a shutter, depending on the type of camera). This mode of camera acquisition is available as a camera script command from GMS 3.4.0 onward.

Vulkan: VK_PRESENT_MODE_MAILBOX_KHR with two images equivalent to VK_PRESENT_MODE_FIFO_KHR?

I wrote some Vulkan code and I think I am hitting some driver bugs (Linux, Mesa 13, Intel). The driver only offers VK_PRESENT_MODE_MAILBOX_KHR (a spec violation). I was under the impression that if I create my swap chain with an imageCount of 2, the resulting behavior should be equivalent to VK_PRESENT_MODE_FIFO_KHR.
My reasoning is that one image is being presented, so the swap chain will only give me an image and signal its availability (vkAcquireNextImageKHR with a semaphore) once the other one has been submitted. It would then swap which image is presented at the next vblank.
However, I get very high framerates, so it is clear that not all images are actually presented.
Is it possible that the present engine does some kind of blit to internal memory and releases the image practically immediately?
It turns out I missed the fact that the image count you provide when creating a swapchain is only a minimum. So the Intel driver advertises that it wants at least 2 images, but will create 4 or more anyway, no matter what you tell it. How odd.
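For anyone hitting the same thing, the actual number of images can be queried once the swapchain exists; a short sketch, where device, swapchain and createInfo are whatever you used at creation time:

// minImageCount is only a lower bound; ask how many images were really created.
uint32_t actualCount = 0;
vkGetSwapchainImagesKHR(device, swapchain, &actualCount, NULL);
// With the driver described above, actualCount can be 4 or more even though
// createInfo.minImageCount was 2.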

Impossible to acquire and present in parallel with rendering?

Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.
Reading the Vulkan specification, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet, I think, inefficient) way of doing things:
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
vkQueuePresentKHR waiting on sem_pre_present.
The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that vkAcquireNextImageKHR must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But vkAcquireNextImageKHR could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly itself, and more importantly, all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
vkQueuePresentKHR waiting on sem_pre_present.
Which would, again theoretically, allow the pipeline to execute all the way up to the fragment shader, while we wait for vkAcquireNextImageKHR. The only reason this doesn't work is that it is neither possible to tell the command buffer that this image will be determined later (with proper synchronization), nor is it possible to ask the presentation engine for a specific image.
My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?
My second question is: wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself? That way, you could know in advance which image you are going to ask for, and build and submit your command buffer accordingly.
Like Nicol said, you can record secondary command buffers independently of which image they will render to.
However, you can take it a step further and record command buffers for all swapchain images in advance, then select the correct one to submit based on the image acquired.
This type of reuse does take some extra consideration, because all memory ranges used are baked into the command buffer. But in many situations the required render commands don't actually change from one frame to the next, only a little of the data they use.
So the sequence of such a frame would be:
// Acquire the next image; vk.acquire is signalled when it is ready for rendering.
vkAcquireNextImageKHR(vk.dev, vk.swap, UINT64_MAX, vk.acquire, VK_NULL_HANDLE, &vk.image_ind);
// Wait for the previous submit that used this image, then reset the fence for reuse.
vkWaitForFences(vk.dev, 1, &vk.fences[vk.image_ind], VK_TRUE, UINT64_MAX);
vkResetFences(vk.dev, 1, &vk.fences[vk.image_ind]);
// Update only the per-frame data; the pre-recorded command buffer is reused as-is.
engine_update_render_data(vk.mapped_staging[vk.image_ind]);
// Submit the pre-recorded buffer: wait on vk.acquire, signal vk.present when done.
VkSubmitInfo submit = build_submit(vk.acquire, vk.rend_cmd[vk.image_ind], vk.present);
vkQueueSubmit(vk.rend_queue, 1, &submit, vk.fences[vk.image_ind]);
// Present the image once vk.present is signalled.
VkPresentInfoKHR present = build_present(vk.present, vk.swap, vk.image_ind);
vkQueuePresentKHR(vk.queue, &present);
Granted, this does not allow for conditional rendering, but the GPU is generally fast enough to render some geometry out of frame without any noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.
Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.
First, you can always build secondary command buffers; providing a VkFramebuffer is merely a courtesy, not a requirement. And this is very important if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of the selling points of Vulkan. For you to only be creating one is something of a waste for a performance-conscious application.
In such a case, only the primary command buffer needs the actual image.
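For reference, a minimal sketch of beginning such a secondary command buffer: the framebuffer member of the inheritance info may be left VK_NULL_HANDLE, so the target image does not need to be known yet (renderPass and secondaryCmdBuf are assumed to already exist):

VkCommandBufferInheritanceInfo inherit = {};
inherit.sType       = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
inherit.renderPass  = renderPass;
inherit.subpass     = 0;
inherit.framebuffer = VK_NULL_HANDLE;   // allowed: the target image is not known yet

VkCommandBufferBeginInfo begin = {};
begin.sType            = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
begin.flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
begin.pInheritanceInfo = &inherit;

vkBeginCommandBuffer(secondaryCmdBuf, &begin);
// ...record the draw commands; the primary command buffer that does know the
// framebuffer later runs them via vkCmdExecuteCommands.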
Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.
Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only contents are an image copy from your image to the presentable one.
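That worst-case command buffer is tiny; a rough sketch, where copyCmdBuf, offscreenImage, swapchainImages, width and height are assumed names and the layout transitions around the copy are omitted:

VkImageCopy region = {};
region.srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
region.extent         = { width, height, 1 };

vkCmdCopyImage(copyCmdBuf,
               offscreenImage,             VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
               swapchainImages[image_ind], VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
               1, &region);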
all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.
Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.
wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself?
No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
Also, there's specific language in the specification that the semaphore and/or fence you provide must not only be unsignaled, but there cannot be any outstanding operations waiting on them. Why?
Because vkAcquireNextImageKHR can fail. If you have some operation in a queue that's waiting on a semaphore that's never going to fire, that will cause huge problems. You have to successfully acquire first, then submit work that is based on the semaphore.
Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.

How to modify DirectX camera

Suppose I have a 3D (but not stereoscopic) DirectX game or program. Is there a way for a second program (or a driver) to change the camera position in the game?
I'm trying to build a head-tracking plugin or driver that I can use for my DirectX games/programs. An inertial motion sensor will give me the position of my head, but my problem is using that position data to change the camera position, not the hardware/math concerns of head tracking.
I haven't been able to find anything on how to do this so far, but iZ3D was able to create two cameras near the original camera and use them for stereoscopic rendering, so I know there is some hook/link/connection into DirectX that makes camera manipulation by a second program possible.
If I am able to get this to work I'll release the code.
-Shane
Hooking Direct3D calls is by its nature just hooking DLL calls; i.e. it's not something specific to D3D but a generic technique. Try googling for "hook dll" or start from here: [C++] Direct3D hooking sample. As always with hooks there are many caveats, and you'll have to write a fairly large amount of boilerplate to satisfy all the needs of the hooked application.
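As a rough illustration of the generic technique (not a complete solution), an injected DLL can detour an exported d3d9 function and then wrap or patch the interfaces it returns. This sketch assumes the Microsoft Detours library and only shows the attachment point:

#include <windows.h>
#include <d3d9.h>
#include <detours.h>

// Pointer to the real export; DetourAttach rewrites it to reach the original code.
static IDirect3D9* (WINAPI *Real_Direct3DCreate9)(UINT) = Direct3DCreate9;

static IDirect3D9* WINAPI Hooked_Direct3DCreate9(UINT sdkVersion)
{
    IDirect3D9* d3d = Real_Direct3DCreate9(sdkVersion);
    // From here you could wrap the returned interface (or patch the device
    // vtable) to intercept calls such as SetTransform and nudge the view
    // matrix with head-tracking data.
    return d3d;
}

BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
{
    if (reason == DLL_PROCESS_ATTACH)
    {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)Real_Direct3DCreate9, Hooked_Direct3DCreate9);
        DetourTransactionCommit();
    }
    return TRUE;
}

All the caveats below still apply: injecting the DLL, surviving device resets and telling render passes apart is where most of the boilerplate goes.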
That said, manipulating the camera in games usually does not give good results. There are at least two key features of modern PC games which will severely limit your idea:
Pre-clipping. Almost any game engine filters out objects that are behind the viewing plane. So when you rotate the camera to the side you won't see the objects you'd expect to see in the real world - they were simply never sent to D3D, because the game doesn't know the viewing plane has changed.
Multiple rendering passes. Many popular post-processing effects are done in extra passes (either over the whole scene or just part of it). Mirrors and in-game "screens" are the best-known examples. Without knowing which camera you're manipulating, you'll most likely just break the scene.
Btw, #2 is the reason why stereoscopic mode is not 100% compatible with all games. For example, in the Source engine HDR scenes are rendered in three passes, and if you don't know how to distinguish them you'll do nothing but break the game. Take a look at how nVidia implements their stereoscopic mode: they make a separate hook for every popular game, and even with this approach it's not always possible to get the expected result.