I am trying to plot image data to a waveform graph and I'm seeing a memory leak. The leak doesn't seem to come from the .NET layer but from LabVIEW.
I have the block diagram below, and with this implementation there is a leak. Memory eventually fills up and the system hangs.
https://yoshidad-gmail.tinytake.com/sf/MjU3Njk1M183NzUyNzI0
If I don't connect the image data (RawImageData) to the graph then there is no leak.
I am puzzled as to why this is happening?
Thanks.
I would try closing your references, specifically the Frame reference. The way LabVIEW handles images is a little strange; they can act a bit more like actual memory references (as in C/C++) than your typical LabVIEW references.
It's a hunch, but the Frame might be held in memory until the reference is closed, especially since the data from the frame is being used by the WFGraph.
I created my own swapchain by creating images with vkCreateImage and allocating the appropriate memory for them (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT), and I read the image data back after vkQueueSubmit and vkQueueWaitIdle by mapping the memory bound to the image.
Because of the advantages of staging buffers, I then created the image's memory with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and recorded a vkCmdCopyImageToBuffer in the command buffer, but the resulting buffer is all zeros. However, if I simply bind a buffer created with vkCreateBuffer to the same image memory and do vkCmdCopyBuffer, I do get the full rendered image.
Is it expected behavior that vkCmdCopyImageToBuffer only works when the source is a system swapchain image?
Edit 1:
I am rendering to an image created with vkCreateImage whose memory has VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT. The thing is that when I do vkCmdCopyImageToBuffer, the image data in the destination buffer is all zeros.
However, when I create a buffer with vkCreateBuffer, bind it to the same image memory with vkBindBufferMemory, and then do vkCmdCopyBuffer, I do get the image data. Why does vkCmdCopyImageToBuffer not work? Is it because I am allocating the memory for the image myself? Swapchain images, where we do not allocate the memory ourselves, work fine with vkCmdCopyImageToBuffer. Why do I need the extra overhead of binding a buffer to my allocated image memory to make this work?
Check that you are paying attention to image layouts. You may need a barrier to transition the layouts appropriately.
You might also turn on the validation layers to see if they catch anything.
(I am confused by the description as well, so sorry for my vague answer.)
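For example, if the image was last written as a color attachment, a barrier recorded before the copy might look like the sketch below. This is only a sketch under that assumption; cmd, image, readback_buf, width, and height are placeholder names, and the old layout/access masks must match whatever your render actually did.

VkImageMemoryBarrier to_src = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
    .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,  /* whatever the render left it in */
    .newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = image,
    .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};
vkCmdPipelineBarrier(cmd,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
    0, 0, NULL, 0, NULL, 1, &to_src);

VkBufferImageCopy region = {
    .imageSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
    .imageExtent = { width, height, 1 },
};
/* The layout argument must match the layout the image is actually in at this point. */
vkCmdCopyImageToBuffer(cmd, image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                       readback_buf, 1, &region);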
Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.
Reading the Vulkan specifications, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet I think inefficient) way of doing things:
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
vkQueuePresentKHR waiting on sem_pre_present.
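For concreteness, the sequence above might look roughly like the following sketch (dev, swapchain, queue, cmd, and record_command_buffer are placeholder names; error handling and the recording details are omitted):

uint32_t image_index;
vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX, sem_post_acq, VK_NULL_HANDLE, &image_index);

/* Only now do we know which image the barriers in the command buffer must target. */
record_command_buffer(cmd, image_index);

VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
VkSubmitInfo submit = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &sem_post_acq,
    .pWaitDstStageMask = &wait_stage,
    .commandBufferCount = 1,
    .pCommandBuffers = &cmd,
    .signalSemaphoreCount = 1,
    .pSignalSemaphores = &sem_pre_present,
};
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

VkPresentInfoKHR present = {
    .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &sem_pre_present,
    .swapchainCount = 1,
    .pSwapchains = &swapchain,
    .pImageIndices = &image_index,
};
vkQueuePresentKHR(queue, &present);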
The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that vkAcquireNextImageKHR must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But vkAcquireNextImageKHR could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly itself, and more importantly, all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:
Build command buffer (or use pre-built) with:
Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
render
Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
Get image with vkAcquireNextImageKHR, signalling sem_post_acq
vkQueuePresentKHR waiting on sem_pre_present.
Which would, again theoretically, allow the pipeline to execute all the way up to the fragment shader, while we wait for vkAcquireNextImageKHR. The only reason this doesn't work is that it is neither possible to tell the command buffer that this image will be determined later (with proper synchronization), nor is it possible to ask the presentation engine for a specific image.
My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?
My second question is: wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself? That way, you could know in advance which image you are going to ask for, and build and submit your command buffer accordingly.
Like Nicol said, you can record secondary command buffers independent of which image they will be rendering to.
However, you can take it a step further and record command buffers for all swapchain images in advance, then select the correct one to submit based on the image you acquired.
This kind of reuse does take some extra consideration, because all memory ranges used are baked into the command buffer. But in many situations the required render commands don't actually change from one frame to the next; only a little of the data they use does.
So the sequence of such a frame would be:
// Acquire the next presentable image; vk.acquire is signalled once it is available.
vkAcquireNextImageKHR(vk.dev, vk.swap, UINT64_MAX, vk.acquire, VK_NULL_HANDLE, &vk.image_ind);
// Wait for the previous submission that used this image, then reset its fence for reuse.
vkWaitForFences(vk.dev, 1, &vk.fences[vk.image_ind], true, ~0);
vkResetFences(vk.dev, 1, &vk.fences[vk.image_ind]);
// Update only the per-frame data; the pre-recorded command buffer is left untouched.
engine_update_render_data(vk.mapped_staging[vk.image_ind]);
// Submit the pre-recorded command buffer for this image, waiting on vk.acquire and signalling vk.present.
VkSubmitInfo submit = build_submit(vk.acquire, vk.rend_cmd[vk.image_ind], vk.present);
vkQueueSubmit(vk.rend_queue, 1, &submit, vk.fences[vk.image_ind]);
// Present once vk.present has been signalled.
VkPresentInfoKHR present = build_present(vk.present, vk.swap, vk.image_ind);
vkQueuePresentKHR(vk.queue, &present);
Granted, this does not allow for conditional rendering, but the GPU is generally fast enough that some geometry can be rendered out of view without any noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.
Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.
First, you can always build secondary command buffers; providing a VkFramebuffer is merely a courtesy, not a requirement. And this is very important if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of the selling points of Vulkan. For you to only be creating one is something of a waste for a performance-conscious application.
In such a case, only the primary command buffer needs the actual image.
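A minimal sketch of what that looks like, assuming render_pass, secondary_cmd, primary_cmd, framebuffers, width, and height already exist (all names here are placeholders):

/* Record the expensive rendering work into a secondary command buffer,
   without committing to a particular framebuffer. */
VkCommandBufferInheritanceInfo inherit = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO,
    .renderPass = render_pass,
    .subpass = 0,
    .framebuffer = VK_NULL_HANDLE,   /* optional; the primary supplies the real one */
};
VkCommandBufferBeginInfo sec_begin = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
    .flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,
    .pInheritanceInfo = &inherit,
};
vkBeginCommandBuffer(secondary_cmd, &sec_begin);
/* ... bind pipelines, draw, etc. ... */
vkEndCommandBuffer(secondary_cmd);

/* Once vkAcquireNextImageKHR has returned image_index, a cheap primary
   command buffer ties the pre-recorded work to the actual image. */
VkCommandBufferBeginInfo pri_begin = { .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO };
vkBeginCommandBuffer(primary_cmd, &pri_begin);
VkRenderPassBeginInfo rp_begin = {
    .sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
    .renderPass = render_pass,
    .framebuffer = framebuffers[image_index],
    .renderArea = { { 0, 0 }, { width, height } },
};
vkCmdBeginRenderPass(primary_cmd, &rp_begin, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primary_cmd, 1, &secondary_cmd);
vkCmdEndRenderPass(primary_cmd);
vkEndCommandBuffer(primary_cmd);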
Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.
Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only content is an image copy from your image to the presentable one.
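A minimal sketch of that worst case, assuming offscreen_image and the acquired swapchain_images[image_index] have already been transitioned by earlier barriers to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL respectively (all names are placeholders):

VkImageCopy copy = {
    .srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
    .dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
    .extent = { width, height, 1 },
};
vkCmdCopyImage(cmd,
    offscreen_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    swapchain_images[image_index], VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1, &copy);
/* A final barrier then moves the swapchain image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR. */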
all stages before fragment can run without having any knowledge of which image the final result will be rendered to.
You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.
Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.
wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself?
No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
Also, there's specific language in the specification that the semaphore and/or fence you provide must not only be unsignaled, but there cannot be any outstanding operations waiting on them. Why?
Because vkAcquireNextImageKHR can fail. If you have some operation in a queue that's waiting on a semaphore that's never going to fire, that will cause huge problems. You have to successfully acquire first, then submit work that is based on the semaphore.
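In practice that means checking the result of the acquire before queueing anything that waits on the semaphore; a minimal sketch (names are placeholders, recreate_swapchain is a hypothetical helper):

uint32_t image_index;
VkResult res = vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX,
                                     sem_post_acq, VK_NULL_HANDLE, &image_index);
if (res == VK_ERROR_OUT_OF_DATE_KHR) {
    recreate_swapchain();   /* nothing has been queued against sem_post_acq yet */
    return;
}
if (res != VK_SUCCESS && res != VK_SUBOPTIMAL_KHR) {
    /* handle the error; again, no work waits on the semaphore */
    return;
}
/* Only now is it safe to submit work that waits on sem_post_acq. */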
Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.
I recently came across an error that I cannot understand. The game I'm developing with Cocos2D just freezes at a certain random point -- it gets a SIGSTOP -- and I cannot find the reason. What tool can I use (and how do I use it) to find out where the error occurs and what is causing it?
Jeremy's suggestion to stop in the debugger is a good one.
There's a really quick way to investigate a freeze (or any performance issue), especially when it's not easy to reproduce. You have to have a terminal handy (so you'll need to be running in the iOS simulator or on Mac OS X, not on an iOS device).
When the hang occurs pop over to a terminal and run:
sample YourProgramName
(If there are spaces in your program name wrap that in quotes like sample "My Awesome Game".) The output of sample is a log showing where your program is spending time, and if your program is actually hung, it will be pretty obvious which functions are stuck.
I disagree with Aaron Golden's answer above, as running on a device is extremely useful for getting a real-world picture of where the app freezes. The simulator has more memory and does not reproduce the device hardware accurately (for example, the frame rate is in certain cases lower).
"Obviously", you need to connect your device (with a developer profile) to Xcode and look at the console output for the traces that user @AaronGolden suggested.
If those are not enough you might want to enable a general exception breakpoint in Xcode to capture more of the stacktrace messages.
When I started learning Cocos2D my app often froze. This is a list of common causes:
I wasn't using sprite sheets, so the frame rate was dropping dramatically
I was using too much memory (too many high-definition sprites; have a look at TexturePacker and use the pvr.ccz or pvr.gz format, which cuts memory allocation in half)
Use Instruments to profile your app (for example, use the Allocations instrument and watch for memory warnings).
Can someone provide examples of how to run the MS Kinect Color, Skeleton, and Depth streams in different threads? I have searched the internet but have not been able to find anything. Thanks in advance.
The KinectExplorer example in the Microsoft Kinect Developer Toolkit provides a KinectDepthViewer control that shows how to process the depth data in a different thread -- the DepthColorizer class. The concepts can be adapted to process skeleton data as well.
You don't explain why you want to run these on different threads, so it is unclear why you would need to do so. All the data is already gathered off the UI thread, in its own process. It is what you do with the data once you touch it on the UI thread that matters...
The color stream is just an RGB stream. There may be some processing you need to do on this image (e.g., skinning or face tracking), but generally it is not used as much as the others. Often the only processing required is copying the bits from the stream into an Image for display, which has to be done on the UI thread anyway.
If you wish to color the depth stream for any reason, doing so on a non-UI thread is beneficial. If you're doing some special processing on it, that could be done on a non-UI thread too. The above example code could be easily adapted.
The skeleton stream already requires the most effort by the CPU, but all that effort is already done for you away from the UI. Once you have a chance to touch it, the data is just a series of objects and arrays. I can't really see what you would need to do on a separate thread at this point.
If you explain what you are trying to accomplish, the need for separate processing threads may become clearer.