Metal drawable musings... ugh

I have two issues in my Metal App.
My call to currentRenderPassDescriptor is stalling. I have too many drawables, apparently.
I'm wholly confused about how to configure the multiple MTKViews I'm using for the best performance.
Issue (1)
I have a problem with currentRenderPassDescriptor in my app. It occasionally blocks (for 1.00 s), which, according to the docs, means there is no currentDrawable available.
Background: I have 4 HD 1920x1080 videos playing concurrently, tiled onto a 3840x2160 second external display as a debugging configuration. The pixel buffers of these AVPlayer instances are captured by 4 independent CVDisplayLink callbacks and, from within each callback, draw is called on its assigned MTKView. A total of 4 MTKViews are tiled as subviews of a single NSWindow and are configured for manual drawing.
I'm driving the drawing manually from CVDisplayLink callbacks. If I don't, I get stutter when mousing up on the app's menus, for example.
Within each draw call, I do a bit of kernel shader work and then attempt to obtain the currentRenderPassDescriptor. If successful, I do one pass of a vertex/fragment shader and then present the drawable. My code flow follows Apple's sample code as well as published examples.
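For concreteness, here is a rough sketch of what each per-view draw call looks like in this setup. The compute pass and pipeline setup are elided, and `VideoTileRenderer` is just a placeholder name, not my actual class:

```swift
import MetalKit

// Sketch only: one MTKView configured for manual drawing. The CVDisplayLink
// callback for this view calls view.draw(), which lands in draw(in:) below.
final class VideoTileRenderer: NSObject, MTKViewDelegate {
    let queue: MTLCommandQueue

    init(view: MTKView, device: MTLDevice) {
        queue = device.makeCommandQueue()!
        super.init()
        view.device = device
        view.enableSetNeedsDisplay = false   // not driven by setNeedsDisplay
        view.isPaused = true                 // not driven by an internal timer
        view.delegate = self                 // view.draw() is called from CVDisplayLink
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let commandBuffer = queue.makeCommandBuffer() else { return }

        // 1. A bit of kernel (compute) shader work on the incoming frame goes here.

        // 2. Only now touch the drawable: this is the call that can block
        //    for ~1 s when no drawable is available.
        guard let passDescriptor = view.currentRenderPassDescriptor,
              let drawable = view.currentDrawable,
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)
        else { return }

        // 3. One vertex/fragment pass to draw the frame, then present it.
        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```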
According to the Metal System Trace, most draw calls take under 5 ms. The GPU is about 20-25% utilized and about 25% of GPU memory is free. I can also make the main thread usleep() for 1 second without any hiccups.
Without any user interaction, there's about a 5% chance of the videos stalling out in the first minute. If there's some UI work going on, I see it as WindowServer work in Instruments. I also note that AVFoundation seems to cache about 15 frames of video on the GPU for each AVPlayer.
If the cadence of the draw calls is upset, there's about a 10% chance that things stall: some videos will stall completely, some will stall down to 1 Hz updates, and some won't stall at all. There's also less chance of stalling when running Metal System Trace. The movies that have stalled appear to have done so while obtaining a currentRenderPassDescriptor.
It seems like poor design to have currentRenderPassDescriptor block for ≈1 s during a render loop, so much so that I'm thinking of eschewing MTKView altogether and just drawing to a CAMetalLayer myself. But the docs on CAMetalLayer seem to indicate the same blocking behaviour will occur.
I also grab these 4 pixel buffers on the fly and render sub-size regions of interest to 4 smaller MTKViews on the main monitor; the stutters still occur even if this code is removed.
Is the drawable buffer limit per MTKView or per the backing CALayer? The docs for maximumDrawableCount on CAMetalLayer say the number needs to be 2 or 3. This question ties into the configuration of the views.
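For reference, that property lives on the layer itself; something like this sketch, assuming `mtkView` is one of the tiled views:

```swift
import MetalKit
import QuartzCore

// maximumDrawableCount is a CAMetalLayer property; an MTKView is backed by
// its own CAMetalLayer, so this budget is configured on each view's layer.
func configureDrawablePool(of mtkView: MTKView) {
    if let metalLayer = mtkView.layer as? CAMetalLayer {
        metalLayer.maximumDrawableCount = 3   // the docs only allow 2 or 3
    }
}
```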
Issue (2)
My current setup is a 3840x2160 NSWindow with a single content view. This NSView subclass hides and reveals the mouse cursor via a tracking rect (NSTrackingRectTag). The MTKViews are tiled subviews of this content view.
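Roughly, the tracking-rect part looks like this sketch (the subclass name is a placeholder):

```swift
import AppKit

// Sketch: hide/reveal the cursor with a tracking rect on the content view.
final class ContentView: NSView {
    private var trackingTag: NSTrackingRectTag?

    override func updateTrackingAreas() {
        super.updateTrackingAreas()
        // Re-register the tracking rect whenever geometry changes.
        if let tag = trackingTag { removeTrackingRect(tag) }
        trackingTag = addTrackingRect(bounds, owner: self, userData: nil, assumeInside: false)
    }

    override func mouseEntered(with event: NSEvent) { NSCursor.hide() }
    override func mouseExited(with event: NSEvent)  { NSCursor.unhide() }
}
```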
Is this the best configuration? Namely, one NSWindow with tiled MTKViews… or should I do one MTKView per window?
I'm also not sure how best to configure these windows/layers, i.e. by setting (or clearing) wantsLayer, wantsUpdateLayer, and/or canDrawSubviewsIntoLayer. I'm currently just setting wantsLayer to YES on the single content view. Any hints on this would be great.
Does adjusting these properties collapse all the available drawables into the backing layer only, or are there still 2 or 3 per MTKView?
NB: I've attached a sample run of my Metal app. The longest 'work' on the top graph is just under 5ms. The clumps of green/blue are rendering on the 4 MTKViews. The 'work' alternates a bit because one of the videos is a 60fps source; the others are all 30fps.

Related

Number of hardware draw calls in Core Animation

I’m building a Metal app which renders a few objects to a view, similar to how Core Animation renders its CALayers. Soon I realized that rendering each layer with a separate draw call can be expensive and inefficient if there are many objects, compared to drawing all objects from a single vertex buffer with a single draw call. However then the single buffer size may become large and if only a few of the objects change frequently, the entire buffer must be sent to the GPU which may also be inefficient.
So I’m wondering how Core Animation works in this regard. Does it render each layer with a separate draw call or unifies all layers into a single draw call, or something in between?
I tried to profile a Core Animation app with Instruments but the Metal draw calls don’t seem to be recorded, even though Core Animation is said to use Metal under the hood. Any insights?

In Vulkan, using FIFO, what happens when you don't have an image ready to present when the vertical blank arrives?

I'm new to graphics, and I've been looking at Vulkan presentation modes. I was wondering: in a situation where we've only got 2 images in our swapchain (one that the screen is currently reading from and one that's free), what happens if we don't manage to finish drawing to the currently free image before the next vertical blank? Do we present anyway and get weird tearing, or do we skip the presentation and draw the same image again (I guess giving a "stuttering" effect)? Do we need to define what happens, or is it automatic?
As a side note, is this why people use longer swapchains? I.e. so that if you managed to draw 2 images into your swapchain while the screen was displaying the last image, but now you're running late, at least you can present the newer of those 2 images?
I'm not sure how much of this is specific to FIFO or mailbox mode: I guess with mailbox you'll already have used the newest image you've got, so you're stuck again?
[2-image swapchain][1]
[1]: https://i.stack.imgur.com/rxe51.png
Tearing never happens in regular FIFO (or mailbox) mode. When you present an image, this image will be used for all subsequent vblanks until a new image is presented. And since FIFO disallows tearing, in your case, the image will be fully displayed twice.
If you are using a 2-deep swapchain with FIFO, you have to produce each image on time in order to avoid stuttering. With longer swapchains and FIFO, you have more leeway to avoid visible stuttering. With longer swapchains and mailbox, you can get a similar effect, but there will be less visible latency when your application is running on-time.

How to improve MTKView rendering when using MPSImageScale and MTLBlitCommandEncoder

TL;DR: From within my MTKView's delegate drawInMTKView: method, part of my rendering pass involves adding an MPSImageBilinearScale performance shader and zero or more MTLBlitCommandEncoder requests for generateMipmapsForTexture. Is that a smart thing to do from within drawInMTKView:, which happens on the main thread? Does either of them block the main thread while running, or are they only encoded there and then executed later, entirely on the GPU?
Longer Version:
I'm playing around with Metal within the context of an imaging application. I use Core Image to load an image and apply filters. The output image is displayed as a 2D plane in a Metal view with a single texture. This works, but to improve performance I wanted to experiment with Core Image's ability to render out smaller tiles at a time. Each tile is rendered into its own IOSurface.
On each render pass, I check if there are any tiles that have been recently rendered. For each rendered tile (which is now an IOSurface), I create a Metal texture from a CVMetalTextureCache that is backed by the surface.
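Roughly, the tile-to-texture step looks like this sketch; the pixel format is an assumption and error handling is trimmed:

```swift
import CoreVideo
import Metal

// Sketch: wrap an IOSurface-backed CVPixelBuffer (one rendered tile) in an
// MTLTexture via a CVMetalTextureCache.
func makeTexture(for tile: CVPixelBuffer, cache: CVMetalTextureCache) -> MTLTexture? {
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, tile, nil,
        .bgra8Unorm,                       // pixel format is an assumption
        CVPixelBufferGetWidth(tile),
        CVPixelBufferGetHeight(tile),
        0, &cvTexture)
    guard status == kCVReturnSuccess, let texture = cvTexture else { return nil }
    // Keep the CVMetalTexture alive as long as the MTLTexture is in use.
    return CVMetalTextureGetTexture(texture)
}
```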
I then use a scaling MPS to copy from the tile texture into the "master" texture. If a tile was copied over, I then issue a blit command to generate the mipmaps on the master texture.
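In code, that per-frame step is roughly the following sketch; the MPSScaleTransform values that place each tile at its region of the master are app-specific and omitted here:

```swift
import Metal
import MetalPerformanceShaders

// Sketch: scale each freshly rendered tile into the master texture, then
// regenerate the master's mipmaps with a blit pass, all in one command buffer.
func encodeTileUpdates(tiles: [MTLTexture],
                       master: MTLTexture,
                       commandBuffer: MTLCommandBuffer,
                       device: MTLDevice) {
    for tile in tiles {
        let scale = MPSImageBilinearScale(device: device)
        // scale.scaleTransform = ...  // position/scale this tile within `master`
        scale.encode(commandBuffer: commandBuffer,
                     sourceTexture: tile,
                     destinationTexture: master)
    }
    if !tiles.isEmpty, let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.generateMipmaps(for: master)
        blit.endEncoding()
    }
}
```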
What I'm seeing is that if my master texture is quite large, then generating the mipmaps can take "a bit of time". The same is true if I have a lot of tiles. This appears to block the main thread, because my FPS drops significantly. (The MTKView is running at the standard 60 fps.)
If I play around with tile sizes, I can improve performance in some areas but decrease it in others. For example, increasing the tile size that Core Image renders creates fewer tiles, and thus fewer mipmap generations and blits, but at the cost of Core Image taking longer to render each region.
If I decrease the size of my "master" texture, then mipmap generation goes faster since only the dirty textures are updated, but there appears to be a lower bound on how small I should make the master texture: if I make it too small, I need to pass a large number of textures to the fragment shader. (And it looks like that limit might be 128?)
What's not entirely clear to me is how much of this I can move off the main thread while still using MTKView. If part of the rendering pass is going to block the main thread, then I'd prefer to move it to a background thread so that UI elements (like sliders and checkboxes) remain fully responsive.
Or maybe this isn't the right strategy in the first place? Is there a better way to display really large images in Metal other than tiling? (i.e.: Images larger than Metal's texture size limit of 16384?)

Correctly implement the blocks in the drawing view

How do I correctly implement the blocks in a drawing view so that a line can cut them into two parts? Should I use UIImageView or UIImage?
After the cut, the blocks should fall under the influence of physics.
First, how many cuts could happen in total? How many independent pieces of block could result: 10? 100? Before implementing any of these approaches, test moving that number of objects around on an iPhone or iPod touch. Just because it works in the simulator does not mean it will be fast enough on the actual device.
Second, as already noted, there are libraries for game graphics and physics that may do a lot of the work for you. Cocos2D appears to be a popular option, combining OpenGL drawing with relatively easy access to physics libraries.
Anyway, to do your own drawing, here are the choices:
Move all the graphics into OpenGL. This should not be undertaken lightly: you lose a lot of the ease of working in Cocoa Touch. However, you get maximum control over your graphics and animation, and you can achieve the smoothest performance if you take the time to optimise it.
Have a single UIView and add CALayer sublayers to its main layer for every independent block. CALayers are designed for rapid moving and compositing, but if you're running a physics simulation, your first step will be to remove their implicit animation behaviour. This tutorial series may be useful.
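A minimal sketch of that approach (one CALayer per block, implicit animations removed so a physics step can set positions directly):

```swift
import UIKit

// Sketch: one CALayer per block, with implicit ("action") animations removed
// so a physics step can move layers every frame without tweening.
func makeBlockLayer(frame: CGRect, image: CGImage) -> CALayer {
    let layer = CALayer()
    layer.frame = frame
    layer.contents = image
    // Disable implicit animations for the properties the physics step touches.
    layer.actions = ["position": NSNull(), "bounds": NSNull(), "transform": NSNull()]
    return layer
}

// Per physics tick, also wrap updates in a transaction with actions disabled.
func apply(position: CGPoint, to layer: CALayer) {
    CATransaction.begin()
    CATransaction.setDisableActions(true)
    layer.position = position
    CATransaction.commit()
}
```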
Have a separate UIView for each block. This will have similar performance to using CALayers, since UIViews are actually drawn with CALayers. This option uses more memory (you have at least as many layers and more views than before), but you get all of the power of CALayers plus a few drawing options that are easier on views.
Have a single UIView and draw every block in its drawRect: method. This may look easy to implement, but it will almost certainly be too slow.
If at all possible, test each of these. Before you continue with the cutting and physics parts, how many blocks can you animate across the screen before it slows down too far? Can you make a game with that number? Remember that your physics system will slow the game down further once it is doing real work.

Animate screen while loading textures

My RPG-like game has random battles. When the player enters a random battle, the game needs to load the textures used in that battle (animated monsters, animations, etc.). There are quite a lot of textures, and they are rather big (the battles are very graphically intensive).
This process takes significant time, and while the textures are loading the whole screen freezes. The game's map freezes too, and the wait is long enough that I personally find it annoying.
I can't afford to preload the textures because, after doing some math, I realized:
If I preload all the textures at the beginning of the game, the application will definitely crash.
If I preload the textures that are used in a specific map when the player enters the map, the application is very likely to crash as well.
I can only afford to load the textures when I need them, and dispose of them as soon as the battle ends.
I'd prefer to not use a "loading screen" image because it affects my game's design and concept. I want to avoid this approach.
If I could run some kind of animation while loading the textures, that would be great, which leads to my question: is that possible? What kind of animation, you ask? Well, how about... remember when Final Fantasy used to distort the screen while apparently loading the textures? Something like that. But distorting is quite a time-consuming process as well, so maybe just a cool frame-by-frame animation or something.
While writing this, I realized that I could make small pauses between textures (there are multiple textures) and, during those pauses, update the screen to reflect the animation's state. However, this is unlikely to work well, because each texture is 2048x2048, so the animation would refresh at a rather laggy (and annoying) rate. I'd prefer to avoid this as well.
In a similar bind, I chose to:
1. Convert all my animation textures to gzipped PVR. The load time (depending on the device) improves by a factor of 2 to 4. Any artefacts caused by the conversion to PVR are not noticeable in motion.
2. Preload the idle animations (they are almost always on, except during a skill use or when hurt). I do this during the battle scene's fade-in, which I control myself on a 50 ms tick; at every tick I fire off a preload of one of the idles (there are at most 8 of them, and they take 20 ms or so each). A sketch of this staggered preload follows the list.
3. Use an 'engagement' class which computes the entire fight ahead of time. When an animation becomes unneeded, I unload it. Also, during a 'hurt' animation, I prefetch the next skill animation.
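Here is a sketch of the staggered preload in point 2, with `advanceFadeIn()` and `preloadIdleTexture(at:)` standing in for the game's own fade and texture-loading code:

```swift
import Foundation

// Sketch: the battle scene's fade-in runs on a 50 ms tick, and each tick also
// preloads one idle animation. The fade length and helpers are placeholders.
final class BattleIntro {
    private var tick = 0
    private var timer: Timer?
    private let idleCount = 8   // at most 8 idle animations in this setup

    func begin() {
        timer = Timer.scheduledTimer(withTimeInterval: 0.05, repeats: true) { [weak self] t in
            guard let self = self else { t.invalidate(); return }
            self.advanceFadeIn()                        // placeholder: step the fade
            if self.tick < self.idleCount {
                self.preloadIdleTexture(at: self.tick)  // placeholder: ~20 ms each
            }
            self.tick += 1
            if self.fadeInFinished { t.invalidate() }
        }
    }

    // Placeholders standing in for the game's own implementation.
    private var fadeInFinished: Bool { tick >= 20 }     // assume a 1 s fade
    private func advanceFadeIn() {}
    private func preloadIdleTexture(at index: Int) {}
}
```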
Loads of fun. Best of luck with your game.
P.S. Don't trust the simulator for actual response times. Get onto devices early to determine whether you really have a performance issue.
P.P.S. Point 1 also produced a significant size reduction for my app.
Since the battles are supposed to be at random, would it be possible to preload the textures for the next battle before that battle happens? Then the battle can start whenever the loading has completed.
Game decides battle should happen soon
Generate random encounter (monsters/background/etc?)
Load textures for the encounter
Start the encounter after the textures have loaded
The battles are still random; the encounter is just determined a little before the user is aware a battle is about to happen.
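A sketch of that flow, with placeholder types standing in for the real encounter and texture code:

```swift
import Foundation

// Sketch: roll the encounter a bit early, load its textures in the background,
// and only start the battle once loading has finished.
struct Encounter {
    static func random() -> Encounter { Encounter() }   // placeholder: pick monsters/background
    func loadTextures() -> [Any] { [] }                 // placeholder: the slow texture load
}

func prepareNextBattle(start: @escaping (Encounter, [Any]) -> Void) {
    let encounter = Encounter.random()
    DispatchQueue.global(qos: .userInitiated).async {
        let textures = encounter.loadTextures()         // happens off the main thread
        DispatchQueue.main.async {
            start(encounter, textures)                  // battle begins only when ready
        }
    }
}
```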
You could load lower-resolution textures first and, in a background thread (an NSOperation, I think), kick off the load of the bigger textures, swapping them in when done.
As for animation, a lot of games start by loading small textures when the player is far away, and as they get closer, the higher-res textures 'fade' in.
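A sketch of that low-res-first swap, with `Texture`, `Sprite`, and `loadTexture(named:)` as placeholders for the game's own types and loader:

```swift
import Foundation

// Sketch: show a small texture immediately, load the full-size one on an
// OperationQueue, and swap it in on the main queue when it's ready.
struct Texture {}
final class Sprite { var texture: Texture? }
func loadTexture(named name: String) -> Texture { Texture() }   // placeholder loader

let textureLoadQueue = OperationQueue()

func display(_ sprite: Sprite, lowResName: String, highResName: String) {
    sprite.texture = loadTexture(named: lowResName)        // cheap, available immediately
    textureLoadQueue.addOperation {
        let highRes = loadTexture(named: highResName)       // slow load off the main thread
        OperationQueue.main.addOperation {
            sprite.texture = highRes                        // swap once loaded
        }
    }
}
```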