Does disabling depth test in Vulkan also disable depth write?

The source of doubt here is that in OpenGL, disabling the depth test also disables depth writes. However, I thought that in Vulkan it doesn't (they are separate):
VkPipelineDepthStencilStateCreateInfo::depthWriteEnable;
VkPipelineDepthStencilStateCreateInfo::depthTestEnable;
I just want to confirm this, as I can't find anything in the docs saying that disabling one disables the other; I think they are separate.

Depth writes can only be enabled when the depth test is also enabled. If the depth test is disabled, then depth writes are also disabled, regardless of the value of VkPipelineDepthStencilStateCreateInfo::depthWriteEnable.
See: https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/VkPipelineDepthStencilStateCreateInfo.html#_members
"depthWriteEnable controls whether depth writes are enabled when depthTestEnable is VK_TRUE. Depth writes are always disabled when depthTestEnable is VK_FALSE."
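As a practical consequence, if you want depth writes without an effective test (e.g. a depth pre-fill pass), the usual idiom is to keep the test enabled but make it always pass. A minimal sketch; the field values here are illustrative, not from the question:
VkPipelineDepthStencilStateCreateInfo ds = {0};
ds.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
ds.depthTestEnable = VK_TRUE;             /* must stay VK_TRUE, or depth writes are disabled */
ds.depthWriteEnable = VK_TRUE;
ds.depthCompareOp = VK_COMPARE_OP_ALWAYS; /* the test always passes, so this is effectively write-only */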

When does Image Layout Transition happen when no source or destination stage is specified

Based on the specs (https://registry.khronos.org/vulkan/specs/1.3-khr-extensions/pdf/vkspec.pdf), it says: "When a layout transition is specified in a memory dependency, it happens-after the availability operations in the memory dependency, and happens-before the visibility operations."
As we know, when recording vkCmdPipelineBarrier with an image memory barrier that transitions from layout1 to layout2, the srcAccessMask specifies the availability operation and the dstAccessMask the visibility operation. I wonder: if I set both of them to 0, meaning there is no availability operation and no visibility operation in this memory dependency, will the layout transition from layout1 to layout2 actually happen after the barrier call?
To answer my own question:
Based on the specs, the queue present call will execute the visibility operation in this case.
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueuePresentKHR.html
"Any writes to memory backing the images referenced by the pImageIndices and pSwapchains members of pPresentInfo, that are available before vkQueuePresentKHR is executed, are automatically made visible to the read access performed by the presentation engine. This automatic visibility operation for an image happens-after the semaphore signal operation, and happens-before the presentation engine accesses the image."
So we only need to set the src access mask to register an availability operation, and can leave the dst access mask as 0, deferring the visibility operation to the present call.
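A hedged sketch of what such a pre-present barrier might look like (cmd and swapchain_image are assumed handles, not from the question):
VkImageMemoryBarrier barrier = {0};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; /* availability operation for the rendering writes */
barrier.dstAccessMask = 0;                                    /* no visibility operation; vkQueuePresentKHR handles it */
barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = swapchain_image;
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(cmd,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
    0, 0, NULL, 0, NULL, 1, &barrier);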
There is nothing in the standard which says that the availability operations that are part of a memory dependency are optional. They always happen as part of executing the memory dependency. They can be empty and make nothing available, but it still always happens. The same goes for visibility.
So the layout transition happens regardless of what is available or visible. This is useful as any writes being consumed may have been made available before now by some other operation.
The bigger issue is the lack of visibility for operations down the line. A layout transition is ultimately a write operation, so anything that subsequently uses that image needs to see it in the new layout; the transition must be made visible to it. If you do this, you will likely see your validation layers complain later about some form of memory hazard.

Flags getting randomly set in RCC(Reset and Clock Control register) on Power On

I am working on an MM32Spin05 MCU. After power-on, all 6 flags in the RCC (Reset and Clock Control) CSR register are getting set.
After a reset, the default value of this register should be 0xXC000000 (where the X nibble is "don't care").
But I am observing it as 0xFC000000.
I am not doing anything with respect to the watchdog timers, the low-power module, or the software reset.
I have a requirement that, if a software reset is done, a certain page in the flash memory is to be cleared. But on boot-up, the flag is set for reasons unknown to me, and hence the flash memory page is getting cleared.
I am actually doing a power reset: I am turning the power supply to the MCU off and then on again. On boot-up, the software reset flag is set, and hence, according to my code, it triggers the flash memory page erase. The flash memory page should be erased ONLY on a software reset, not a power reset. Immediately after the MCU boots up, I print the RCC_CSR register value and see that all 6 flags are set.
LPWRRSTF: Low power reset flag
WDGRSTF: Window watchdog reset flag
IWDGRSTF: Independent watchdog reset flag
SFTRSTF: Software reset flag
PORRSTF: POR/PDR reset flag
PINRSTF: PIN reset flag
I am confused as to why a power reset is causing the software reset flag to be set.
I have been stuck on this for more than a week and am fully clueless about it. Any help or suggestions would be highly welcome.
Thanks in advance.
As the manual says, the top 4 bits are "don't care" on power-on (or pin) reset, and they can have any value.
Only if neither PORRSTF nor PINRSTF is set are the other bits relevant.
Even then you need to read the manual carefully to understand the conditions of these bits. There might be more to do than a simple single-bit check.
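A hedged sketch of such a boot-time check in C; RCC->CSR and the RCC_CSR_* / RMVF names are assumptions in the style of CMSIS vendor headers, so verify them against the MM32 headers, and erase_flash_page() is a hypothetical application hook:
uint32_t csr = RCC->CSR;
if (csr & (RCC_CSR_PORRSTF | RCC_CSR_PINRSTF)) {
    /* Power-on or pin reset: the remaining flags are "don't care", ignore them. */
} else if (csr & RCC_CSR_SFTRSTF) {
    /* Only now is the software-reset flag meaningful. */
    erase_flash_page();
}
RCC->CSR |= RCC_CSR_RMVF; /* clear all reset flags so the next boot sees a clean state */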

about LRU, How can the operating system know about a memory frame use when it's valid

In virtual memory, the LRU algorithm swaps out the page frame that has been least recently used. But how can the OS know about a frame's use, since no exception is raised while the mapping is valid?
Are there use counters of some sort in the MMU, or do OSes make regular checks by invalidating pages without really swapping them out, just to gather statistics from the resulting page faults?
I suppose it might be possible to find the information with Google, but I didn't find the right way to ask it :-/
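For what it's worth, on most architectures the MMU sets an "accessed" bit in the page-table entry whenever a mapping is used, and the OS periodically clears and samples that bit, approximating LRU with a clock/second-chance sweep. A minimal illustrative sketch in C; the pte_t layout is invented for illustration and assumes at least one resident frame:
#include <stdbool.h>

typedef struct { bool present; bool accessed; } pte_t;

/* Advance the clock hand until a frame not used since the last sweep is found. */
int clock_pick_victim(pte_t *ptes, int nframes, int hand) {
    for (;;) {
        hand = (hand + 1) % nframes;
        if (!ptes[hand].present)
            continue;
        if (ptes[hand].accessed)
            ptes[hand].accessed = false; /* used recently: clear and give a second chance */
        else
            return hand;                 /* not used since the last sweep: evict this frame */
    }
}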

Impossible to acquire and present in parallel with rendering?

Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.
Reading the Vulkan specifications, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet I think inefficient) way of doing things:
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
vkQueuePresentKHR waiting on sem_pre_present.
The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that vkAcquireNextImageKHR must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But vkAcquireNextImageKHR could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly itself, and more importantly, all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
vkQueuePresentKHR waiting on sem_pre_present.
Which would, again theoretically, allow the pipeline to execute all the way up to the fragment shader, while we wait for vkAcquireNextImageKHR. The only reason this doesn't work is that it is neither possible to tell the command buffer that this image will be determined later (with proper synchronization), nor is it possible to ask the presentation engine for a specific image.
My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?
My second question is: wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself? That way, you could know in advance which image you are going to ask for, and build and submit your command buffer accordingly.
Like Nicol said, you can record secondary command buffers independent of which image they will be rendering to.
However, you can take it a step further and record command buffers for all swapchain images in advance, then select the correct one to submit based on the image acquired.
This type of reuse does take some extra consideration, because all memory ranges used are baked into the command buffer. But in many situations the required render commands don't actually change from one frame to the next, only a little of the data used does.
So the sequence of such a frame would be:
/* Block until an image is available and get its index. */
vkAcquireNextImageKHR(vk.dev, vk.swap, UINT64_MAX, vk.acquire, VK_NULL_HANDLE, &vk.image_ind);
/* Make sure the previous frame that used this image's resources is done, then re-arm the fence. */
vkWaitForFences(vk.dev, 1, &vk.fences[vk.image_ind], true, ~0);
vkResetFences(vk.dev, 1, &vk.fences[vk.image_ind]);
/* Update only the data; the pre-recorded command buffer stays untouched. */
engine_update_render_data(vk.mapped_staging[vk.image_ind]);
/* Submit the command buffer recorded for this image: wait on the acquire
   semaphore, signal the present semaphore and the per-image fence. */
VkSubmitInfo submit = build_submit(vk.acquire, vk.rend_cmd[vk.image_ind], vk.present);
vkQueueSubmit(vk.rend_queue, 1, &submit, vk.fences[vk.image_ind]);
VkPresentInfoKHR present = build_present(vk.present, vk.swap, vk.image_ind);
vkQueuePresentKHR(vk.queue, &present);
Granted, this does not allow for conditional rendering, but the GPU is in general fast enough that some geometry can be rendered out of frame without any noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.
Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.
First, you can always build secondary command buffers; providing a VkFramebuffer is merely a courtesy, not a requirement. And this is very important if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of the selling points of Vulkan. For you to only be creating one is something of a waste for a performance-conscious application.
In such a case, only the primary command buffer needs the actual image.
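As a rough illustration of this; secondary_cmd, primary_cmd, render_pass, and rp_begin are assumed handles/structs, not from the answer:
VkCommandBufferInheritanceInfo inherit = {0};
inherit.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
inherit.renderPass = render_pass;     /* must be compatible with the pass used at execute time */
inherit.framebuffer = VK_NULL_HANDLE; /* the concrete framebuffer may be left unspecified */
VkCommandBufferBeginInfo begin = {0};
begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
begin.flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
begin.pInheritanceInfo = &inherit;
vkBeginCommandBuffer(secondary_cmd, &begin);
/* ... record draw calls that do not depend on the swapchain image ... */
vkEndCommandBuffer(secondary_cmd);

/* Later, once the acquired image (and thus the framebuffer) is known: */
vkCmdBeginRenderPass(primary_cmd, &rp_begin, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primary_cmd, 1, &secondary_cmd);
vkCmdEndRenderPass(primary_cmd);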
Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.
Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only content is an image copy from your image to the presentable one.
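A hedged sketch of that worst-case copy; offscreen_image, swapchain_image, copy_cmd, width, and height are assumed to exist, and both images must already be in the transfer layouts shown:
VkImageCopy region = {0};
region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
region.srcSubresource.layerCount = 1;
region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
region.dstSubresource.layerCount = 1;
region.extent.width  = width;  /* swapchain extent */
region.extent.height = height;
region.extent.depth  = 1;
vkCmdCopyImage(copy_cmd,
    offscreen_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    swapchain_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1, &region);
/* then transition swapchain_image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR before presenting */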
all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.
Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.
wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself?
No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
Also, there's specific language in the specification that the semaphore and/or fence you provide must not only be unsignaled, but there cannot be any outstanding operations waiting on them. Why?
Because vkAcquireNextImageKHR can fail. If you have some operation in a queue that's waiting on a semaphore that's never going to fire, that will cause huge problems. You have to successfully acquire first, then submit work that is based on the semaphore.
Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.

Why is rising edge preferred over falling edge

Flip-flops (registers, ...) are usually triggered by a rising or falling edge. But in code you mostly see an if-clause that uses rising-edge triggering; in fact, I have never seen code with a falling edge.
Why is that? Is it simply that programmers use the rising edge out of habit, or is there some physical/analog reason why rising-edge designs are faster/simpler/more energy-efficient/...?
As zennehoy says, it's convention, but one going back to when logic was done in discrete chips with a few gates or flip-flops within them. Those packages of flip-flops were always rising-edge triggered... as far as I recall, but maybe someone with better recollection of the yellow books will correct me!
So when synthesis came along, no doubt everyone felt comfortable carrying on that way!
Nothing more than a matter of convention.
Using the rising edge is more common, and most component libraries use the rising edge. This means that using those libraries requires you to also use rising edges, or add clock synchronization logic, or keep your paths so short that the delay is less than half a clock cycle. Just using rising edges everywhere is by far the easiest.
When you design a (single-edge) DFF in a chip, you must choose at which (rising or falling) clock edge it will operate. This decision is independent from the implementation approach (i.e., master-slave or pulsed-latch), and it does not alter the number of transistors in the DFF itself.
Since positive-edge is the typical default (as in FPGAs), to operate at the negative clock edge the usual procedure is to simply use a positive-edge DFF with an inverted version of the clock signal connected to its clock port. If this is done locally (near the DFF clock port), then two extra transistors are indeed needed (to build a CMOS inverter for the clock).
It is somewhat a matter of convention, but if you look at the design of falling- versus rising-edge flip-flops, the only difference is an added inverter, and the rising-edge version turned out to need two transistors fewer.
But there are designs out there that use both; for example, in some data caches you write on the rising edge and read on the falling edge, or vice versa, depending on design choices!
Good question; try it out, or take a course (maybe online) on digital integrated circuits.