Multiple renders to single texture without blocking MTLCommandBuffer - objective-c

I am trying to render 3 separate things to one texture in Metal.
I have an MTLTexture that is used as a destination in 3 different MTLCommandBuffers. I commit them one after another. Each MTLCommandBuffer renders to a separate part of the texture - the first draws to the 0 - 1/3 part, the second draws to the middle 1/3 - 2/3 part, and the last one draws to the 2/3 - 1 part.
id<MTLTexture> dst_texture = ...;
id<MTLCommandBuffer> buffer1 = [self drawToTexture:dst_texture];
[buffer1 commit];
id<MTLCommandBuffer> buffer2 = [self drawToTexture:dst_texture];
[buffer2 commit];
id<MTLCommandBuffer> buffer3 = [self drawToTexture:dst_texture];
[buffer3 commit];
The problem is that it seems I can't share the destination texture between the different command buffers - I get glitches, and sometimes I can see only partial results on the destination texture...
Inside drawToTexture I use dst_texture this way:
_renderPassDescriptor.colorAttachments[0].texture = dst_texture;
_renderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionLoad;
The problem goes away when I call [buffer waitUntilCompleted] after each individual commit, but I suppose it hurts performance, and I would love to do this without blocking/waiting.
This works:
id<MTLTexture> dst_texture = ...;
id<MTLCommandBuffer> buffer1 = [self drawToTexture:dst_texture];
[buffer1 commit];
[buffer1 waitUntilCompleted];
id<MTLCommandBuffer> buffer2 = [self drawToTexture:dst_texture];
[buffer2 commit];
[buffer2 waitUntilCompleted];
id<MTLCommandBuffer> buffer3 = [self drawToTexture:dst_texture];
[buffer3 commit];
[buffer3 waitUntilCompleted];
What else could I do here to avoid the waitUntilCompleted calls?

To answer the questions "I am trying to render 3 separate things in one texture in Metal" and "What else could I do here to avoid waitUntilCompleted calls?" (Hamid has already explained why the problem occurs): you shouldn't be using multiple command buffers for basic rendering with multiple draw calls. If you're rendering to one texture, you need one command buffer, which you use with one renderPassDescriptor that you attach the texture to. Then you need one encoder, created from that renderPassDescriptor, into which you encode all the draw calls, buffer changes, state changes, and so on. So, as I said in the comment: you bind the shaders, set the buffers, etc., then draw, and then - without calling endEncoding - set shaders and buffers again and again for as many draw calls and buffer changes as you need.
If you wanted to draw to multiple textures then you typically create multiple renderPassDescriptors (but still use one command buffer). Generally you use one command buffer per frame, or for a set of offscreen render passes.
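A minimal sketch of that approach, assuming each of the three draws has its own pipeline state and vertex buffer (the _pipelineStateN / _vertexBufferN names are illustrative, not from the question):
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

_renderPassDescriptor.colorAttachments[0].texture = dst_texture;
_renderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionClear;

id<MTLRenderCommandEncoder> encoder =
    [commandBuffer renderCommandEncoderWithDescriptor:_renderPassDescriptor];

// First third of the texture
[encoder setRenderPipelineState:_pipelineState1];
[encoder setVertexBuffer:_vertexBuffer1 offset:0 atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];

// Middle third
[encoder setRenderPipelineState:_pipelineState2];
[encoder setVertexBuffer:_vertexBuffer2 offset:0 atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];

// Last third
[encoder setRenderPipelineState:_pipelineState3];
[encoder setVertexBuffer:_vertexBuffer3 offset:0 atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];

[encoder endEncoding];   // called once, after all draws
[commandBuffer commit];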

Manual synchronization is only required:
For Untracked Resources.
Across Multiple Devices.
Between a GPU and the CPU.
Between separate command queues.
Otherwise, Metal automatically synchronizes (tracked) resources between command buffers, even if they are running in parallel.
If a command buffer includes write or read operations on a given MTLTexture, you must ensure that these operations complete before reading or writing the MTLTexture contents. You can use the addCompletedHandler: method, waitUntilCompleted method, or custom semaphores to signal that a command buffer has completed execution.
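For example, a sketch of the addCompletedHandler: route, which avoids blocking the encoding thread and only waits at the point where the CPU actually needs the finished texture (the semaphore is an assumption, not part of the original code):
dispatch_semaphore_t textureReady = dispatch_semaphore_create(0);

[buffer3 addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    // Called on a Metal-owned thread once the GPU has finished buffer3.
    dispatch_semaphore_signal(textureReady);
}];
[buffer3 commit];

// ... later, only where the CPU must read dst_texture:
dispatch_semaphore_wait(textureReady, DISPATCH_TIME_FOREVER);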

Related

Changing setPreferredIOBufferDuration at Runtime results in Core Audio Error -50

I am writing an Audio Unit (Remote IO) based app that displays waveforms at a given buffer size. The app initially starts off with a preferred buffer size of 0.0001, which results in very small buffer frame sizes (I think it's 14 frames). Then at runtime I have a UI element that allows switching buffer frame sizes via AVAudioSession's setPreferredIOBufferDuration:error: method.
Here is the code; the first two cases change from a smaller to a larger buffer size, and cases 3-5 are not specified yet. The app crashes at AudioUnitRender with a -50 error code.
- (void)setBufferSizeFromMode:(int)mode {
    NSTimeInterval bufferDuration = 0.0;
    switch (mode) {
        case 1:
            bufferDuration = 0.0001;
            break;
        case 2:
            bufferDuration = 0.001;
            break;
        case 3:
            bufferDuration = 0.0; // reserved
            break;
        case 4:
            bufferDuration = 0.0; // reserved
            break;
        case 5:
            bufferDuration = 0.0; // reserved
            break;
        default:
            break;
    }
    AVAudioSession *session = [AVAudioSession sharedInstance];
    NSError *audioSessionError = nil;
    [session setPreferredIOBufferDuration:bufferDuration error:&audioSessionError];
    if (audioSessionError) {
        NSLog(@"Error %ld, %@",
              (long)audioSessionError.code, audioSessionError.localizedDescription);
    }
}
Based on reading the CoreAudio and AVFoundation documentation, I was led to believe that you can change audio hardware configuration at runtime. There may be some gaps in audio or distortion but I am fine with that for now. Is there an obvious reason for this crash? Or must I reinitialize everything (my audio session, my audio unit, my audio buffers, etc..) for each change of the buffer duration?
Edit: I have tried calling AudioOutputUnitStop(self.myRemoteIO); before changing the session buffer duration and then starting it again after it is set. I've also tried setting the session to inactive and then reactivating it, but both result in the -50 OSStatus from AudioUnitRender() in my AU input callback.
A -50 error usually means the audio unit code is trying to set or use an invalid parameter value.
Some iOS devices don't support actual buffer durations below about 5.8 ms (0.0058 seconds) on older devices. And iOS devices appear free to switch to an actual buffer duration 4X longer than that, or even to alternate between slightly different values, at times not under the app's control.
The inNumberFrames is given to the audio unit callback as a parameter, your app can't arbitrarily specify that value.
If you want to process given buffer sizes, pull them out of an intermediating lock-free circular FIFO, which the audio unit callback can feed into.
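As a rough illustration, a single-producer/single-consumer ring buffer along these lines (the names and capacity are made up, and production code should use proper atomics or memory barriers rather than volatile):
#include <stdint.h>

#define kRingCapacity 16384   // in samples; must be a power of two

typedef struct {
    float buffer[kRingCapacity];
    volatile uint32_t head;   // written only by the audio callback (producer)
    volatile uint32_t tail;   // written only by the processing thread (consumer)
} SampleRing;

// Called from the render callback with however many frames it was given.
static inline void RingPush(SampleRing *r, const float *src, uint32_t count) {
    for (uint32_t i = 0; i < count; i++) {
        r->buffer[(r->head + i) & (kRingCapacity - 1)] = src[i];
    }
    r->head += count;
}

// Called from the processing thread with whatever buffer size the app wants.
static inline uint32_t RingPop(SampleRing *r, float *dst, uint32_t maxCount) {
    uint32_t available = r->head - r->tail;
    uint32_t n = (available < maxCount) ? available : maxCount;
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = r->buffer[(r->tail + i) & (kRingCapacity - 1)];
    }
    r->tail += n;
    return n;
}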
Also: Try waiting a second or so after calling audio stop before changing parameters or restarting. There appears to be a delay between when you call stop, and when the hardware actually stops.

Reducing peak memory consumption

I generate bitmaps using the following code, simplified for the sake of clarity:
for (int frameIndex = 0; frameIndex < 90; frameIndex++) {
    UIGraphicsBeginImageContextWithOptions(CGSizeMake(130, 130), NO, 0);
    // Making some rendering on the context.
    // Save the current snapshot from the context.
    UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
    [self.snapshots addObject:snapshot];
    UIGraphicsEndImageContext();
}
So, nothing non-trivial, but everything gets complicated when the operating system gives the application only about 30 MB of memory for everything (in this particular case it is watchOS 2, but this is not an OS-specific question), and on exceeding the quota the OS just kills the application's process.
The next graph from the Allocations Instrument illustrates the question:
It is the same graph with different annotations of memory consumption - before, during, and after the code above executes. As can be seen, about 5.7 MB of bitmaps have been generated in the end, which is an absolutely acceptable result. What is not acceptable is the memory consumption at the peak of the graph (44.6 MB) - all of this memory is eaten by CoreUI: image data. Given that the work takes place on a background thread, the time of execution is not that important.
So, the questions: what is the right approach to decreasing memory consumption (perhaps by increasing the execution time) so that the app fits within the memory quota, and why does memory consumption keep increasing even though UIGraphicsEndImageContext is called?
Update 1:
I think splitting the whole operation up using NSOperation, NSTimer, etc. will do the trick, but I am still trying to come up with a synchronous solution.
I tried to gather all the answers together and tested the following piece of code:
UIGraphicsBeginImageContextWithOptions(CGSizeMake(130, 130), NO, 0);
for (int frameIndex = 0; frameIndex < 45; frameIndex++) {
    // Making some rendering on the context.
    @autoreleasepool {
        UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
        [self.snapshots addObject:snapshot];
    }
    CGContextClearRect(UIGraphicsGetCurrentContext(), CGRectMake(0, 0, 130, 130));
}
for (int frameIndex = 0; frameIndex < 45; frameIndex++) {
    // Making some rendering on the context.
    @autoreleasepool {
        UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
        [self.snapshots addObject:snapshot];
    }
    CGContextClearRect(UIGraphicsGetCurrentContext(), CGRectMake(0, 0, 130, 130));
}
UIGraphicsEndImageContext();
What has changed:
Split 90 iterations into 2 parts of 45.
Moved the graphics context outside the loop and cleared it after each iteration instead of creating a new one.
Wrapped taking and storing snapshots in the autorelease pool.
As a result, nothing changed; memory consumption remains at the same level.
Also, removing the taking and storing of snapshots entirely decreases memory consumption by only about 4 MB, i.e. less than 10%.
Update 2:
Rendering on a timer every 3 seconds produces the following graph:
As you can see, memory is not freed (to be precise, not fully freed) even when the rendering is divided into time intervals. Something tells me that the memory is not freed as long as the object that performs the rendering exists.
Update 3:
The problem has been solved by combining 3 approaches:
Splitting the whole rendering task into subtasks. For example, 90 drawings are split into 6 subtasks of 15 drawings each (the number 15 was found empirically).
Executing all subtasks serially using dispatch_after with a small interval after each one (0.05 s in my case).
And the last and most important one: to avoid the memory buildup seen on the last graph, each subtask should be executed in the context of a new object. For example:
self.snapshots = [[SnaphotRender new] renderSnapshotsInRange:NSMakeRange(0, 15)];
Thanks to everyone for answering, but @EmilioPelaez was closest to the right answer.
Corrected for the updated frame count in the question.
The total byte size of the images could be 130 * 130 * 4 (bytes per pixel) * 90 = ~6 MB.
Not too sure about the watch, but temporary memory might be building up; you could try wrapping the snapshot code in an @autoreleasepool block:
@autoreleasepool {
    UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
    [self.snapshots addObject:snapshot];
}
I think your problem is that you are doing all the snapshots in the same context (the context the for loop is in). I believe the memory is not being released until the context ends, which is when the graph goes down.
I would suggest you reduce the scope of the context: instead of using a for loop to draw all frames, keep track of the progress with some ivars and draw just one frame; whenever you finish rendering a frame, call the function again with dispatch_after and update the variables. Even if the delay is 0, it will allow the context to end and clean up the memory that is no longer being used.
P.S. By context I don't mean a graphics context, I mean a certain scope in your code.
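A rough sketch of that idea, assuming a frameIndex property and a hypothetical renderFrame: method containing the original drawing code:
- (void)renderNextFrame {
    if (self.frameIndex >= 90) {
        return; // all snapshots generated
    }

    UIGraphicsBeginImageContextWithOptions(CGSizeMake(130, 130), NO, 0);
    [self renderFrame:self.frameIndex]; // your drawing code
    UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    [self.snapshots addObject:snapshot];

    self.frameIndex += 1;

    // Returning to the run loop between frames lets the autoreleased
    // temporaries be cleaned up before the next frame is rendered.
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 0), dispatch_get_main_queue(), ^{
        [self renderNextFrame];
    });
}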
Wait - image resolution plays a big role. For example, a one-megabyte JPEG image with dimensions of 5000 px * 3000 px will consume about 60 MB of RAM, because images get decompressed into RAM: 5000 * 3000 * 4 bytes is 60 MB. So let's troubleshoot: first, please tell us what image sizes (dimensions) you are using.
Edit (after clarifications):
I think the proper way is not to store the UIImage objects directly, but compressed NSData objects (e.g. using UIImageJPEGRepresentation), and then, when needed, convert them back into UIImage objects.
However, if you use many of them simultaneously, you're going to run out of memory quite rapidly.
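A small sketch of that, with a hypothetical snapshotData array taking the place of the snapshots array of UIImages:
// Store compressed data instead of the decoded bitmap.
UIImage *snapshot = UIGraphicsGetImageFromCurrentImageContext();
NSData *compressed = UIImageJPEGRepresentation(snapshot, 0.8);
[self.snapshotData addObject:compressed];

// Later, when a frame is actually needed:
UIImage *restored = [UIImage imageWithData:self.snapshotData[frameIndex]];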
Original answer:
Actually, the total size can be higher than 10 MB (probably > 20 MB) depending on the scale factor (as seen here). Note that UIGraphicsBeginImageContextWithOptions requires a scale parameter, in contrast to UIGraphicsBeginImageContext. So my guess is that this is somehow related to a screenshot, isn't it?
Also, the UIGraphicsGetImageFromCurrentImageContext method is thread-safe, so it might be returning a copy or using more memory. Then you have your 40 MB.
This answer states that iOS somehow stores images compressed when they are not displayed. This could be the reason for the 6 MB of usage afterwards.
My final guess is that the device detects a peak of memory and then tries to save memory somehow. Since the images are not being used, it compresses them internally and recycles the memory.
So I wouldn't worry, because it looks like the system is taking care of it by itself. But you can also do as others have suggested: save each image to a file and don't keep it in memory if you're not going to use it.
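For example, a sketch of spilling each snapshot to a temporary file and reloading it on demand (the file-naming scheme is just illustrative):
NSData *pngData = UIImagePNGRepresentation(snapshot);
NSString *fileName = [NSString stringWithFormat:@"frame_%d.png", frameIndex];
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:fileName];
[pngData writeToFile:path atomically:YES];

// Reload only when the frame is actually displayed:
UIImage *frame = [UIImage imageWithContentsOfFile:path];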

Memory management issue with CIImage / CGImageRef

Good morning,
I am encountering a memory management issue in the video processing software I'm trying to write (video capture + (almost) real-time processing + display + recording).
The following code is part of the "..didOutputSampleBuffer.." function of AVCaptureVideoDataOutputSampleBufferDelegate.
capturePreviewLayer is a CALayer.
ctx is a CIContext I reuse over and over.
outImage is a vImage_Buffer.
With the commented section kept commented, memory usage is stable and acceptable, but if I uncomment it, memory won't stop increasing. Note that if I leave the filtering operation commented and only keep the CIImage creation and the conversion back to CGImageRef, the problem remains (I mean: I don't think this is related to the filter itself).
If I run Xcode's Analyze, it points out a potential memory leak if this part is uncommented, but none if it is commented.
Does anybody have an idea how to explain and fix this?
Thank you very much!
Note : I prefer not to use AVCaptureVideoPreviewLayer and its filters property.
CGImageRef convertedImage = vImageCreateCGImageFromBuffer(&outImage, &outputFormat, NULL, NULL, 0, &err);

//CIImage * img = [CIImage imageWithCGImage:convertedImage];
////[acc setValue:img forKey:@"inputImage"];
////img = [acc valueForKey:@"outputImage"];
//convertedImage = [self.ctx createCGImage:img fromRect:img.extent];

dispatch_sync(dispatch_get_main_queue(), ^{
    self.capturePreviewLayer.contents = (__bridge id)(convertedImage);
});

CGImageRelease(convertedImage);
free(outImage.data);
Both vImageCreateCGImageFromBuffer() and -[CIContext createCGImage:fromRect:] give you a reference you are responsible for releasing. You are only releasing one of them.
When you replace the value of convertedImage with the new CGImageRef, you are losing the reference to the previous one without releasing it. You need to add another call to CGImageRelease(convertedImage) after your last use of the old image and before you lose that reference.
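A sketch of the corrected flow, keeping the filtered image in its own variable so that both references can be released (assuming the same ctx, acc, outImage, and capturePreviewLayer as in the question):
CGImageRef convertedImage = vImageCreateCGImageFromBuffer(&outImage, &outputFormat, NULL, NULL, 0, &err);

CIImage *img = [CIImage imageWithCGImage:convertedImage];
[acc setValue:img forKey:@"inputImage"];
img = [acc valueForKey:@"outputImage"];

CGImageRef filteredImage = [self.ctx createCGImage:img fromRect:img.extent];
CGImageRelease(convertedImage);   // release the vImage-created image

dispatch_sync(dispatch_get_main_queue(), ^{
    self.capturePreviewLayer.contents = (__bridge id)(filteredImage);
});

CGImageRelease(filteredImage);    // release the CIContext-created image
free(outImage.data);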

Grand Central Strategy for Opening Multiple Files

I have a working implementation using Grand Central Dispatch queues that (1) opens a file and computes an OpenSSL DSA hash on "queue1", and (2) writes the hash out to a new "sidecar" file on "queue2" for later verification.
I would like to open multiple files at the same time, but based on some logic that doesn't "choke" the OS by having 100s of files open and exceeding the hard drive's sustainable output. Photo browsing applications such as iPhoto or Aperture seem to open multiple files and display them, so I'm assuming this can be done.
I'm assuming the biggest limitation will be disk I/O, as the application can (in theory) read and write multiple files simultaneously.
Any suggestions?
TIA
You are correct in that you'll be I/O bound, most assuredly. And it will be compounded by the random access nature of having multiple files open and being actively read at the same time.
Thus, you need to strike a bit of a balance. More likely than not, one file is not the most efficient, as you've observed.
Personally?
I'd use a dispatch semaphore.
Something like:
@property(nonatomic, assign) dispatch_queue_t dataQueue;
@property(nonatomic, assign) dispatch_semaphore_t execSemaphore;
And:
- (void)process:(NSData *)d {
    dispatch_async(self.dataQueue, ^{
        if (!dispatch_semaphore_wait(self.execSemaphore, DISPATCH_TIME_FOREVER)) {
            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                ... do calculation work here on d ...
                dispatch_async(dispatch_get_main_queue(), ^{
                    .... update main thread w/ new data here ....
                });
                dispatch_semaphore_signal(self.execSemaphore);
            });
        }
    });
}
Where it is kicked off with:
self.dataQueue = dispatch_queue_create("com.yourcompany.dataqueue", NULL);
self.execSemaphore = dispatch_semaphore_create(3);
[self process: ...];
[self process: ...];
[self process: ...];
[self process: ...];
[self process: ...];
.... etc ....
You'll need to determine how best you want to handle the queueing. If there are many items and there is a notion of cancellation, enqueueing everything is likely wasteful. Similarly, you'll probably want to enqueue URLs to the files to process, and not NSData objects like the above.
In any case, the above will process three things simultaneously, regardless of how many have been enqueued.
I'd use NSOperation for this because of the ease of handling both dependencies and cancellation.
I'd create one operation each for reading the data file, computing the data file's hash, and writing the sidecar file. I'd make each write operation dependent on its associated compute operation, and each compute operation dependent on its associated read operation.
Then I'd add the read and write operations to one NSOperationQueue, the "I/O queue," with a restricted width. The compute operations I'd add to a separate NSOperationQueue, the "compute queue," with a non-restricted width.
The reason for the restricted width on the I/O queue is that your work will likely be I/O bound; you may want it to have a width greater than 1, but it's very likely to be directly related to the number of physical disks on which your input files reside. (Probably something like 2x; you'll want to determine this experimentally.)
The code would wind up looking something like this:
@implementation FileProcessor

static NSOperationQueue *FileProcessorIOQueue = nil;
static NSOperationQueue *FileProcessorComputeQueue = nil;

+ (void)initialize
{
    if (self == [FileProcessor class]) {
        FileProcessorIOQueue = [[NSOperationQueue alloc] init];
        [FileProcessorIOQueue setName:@"FileProcessorIOQueue"];
        [FileProcessorIOQueue setMaxConcurrentOperationCount:2]; // limit width
        FileProcessorComputeQueue = [[NSOperationQueue alloc] init];
        [FileProcessorComputeQueue setName:@"FileProcessorComputeQueue"];
    }
}

- (void)processFilesAtURLs:(NSArray *)URLs
{
    for (NSURL *URL in URLs) {
        __block NSData *fileData = nil;     // set by readOperation
        __block NSData *fileHashData = nil; // set by computeOperation

        // Create operations to do the work for this URL
        NSBlockOperation *readOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                fileData = CreateDataFromFileAtURL(URL);
            }];
        NSBlockOperation *computeOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                fileHashData = CreateHashFromData(fileData);
                [fileData release]; // created in readOperation
            }];
        NSBlockOperation *writeOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                WriteHashSidecarForFileAtURL(fileHashData, URL);
                [fileHashData release]; // created in computeOperation
            }];

        // Set up dependencies between operations
        [computeOperation addDependency:readOperation];
        [writeOperation addDependency:computeOperation];

        // Add operations to appropriate queues
        [FileProcessorIOQueue addOperation:readOperation];
        [FileProcessorComputeQueue addOperation:computeOperation];
        [FileProcessorIOQueue addOperation:writeOperation];
    }
}

@end
It's pretty straightforward; rather than deal with multiply-nested layers of sync/async as you would with the dispatch_* APIs, NSOperation allows you to define your units of work and your dependencies between them independently. For some situations this can be easier to understand and debug.
You have received excellent answers already, but I wanted to add a couple points. I have worked on projects that enumerate all the files in a file system and calculate MD5 and SHA1 hashes of each file (in addition to other processing). If you are doing something similar, where you are searching a large number of files and the files may have arbitrary content, then some points to consider:
As noted, you will be I/O bound. If you read more than one file simultaneously, you will have a negative impact on the performance of each calculation. Obviously, the goal of scheduling calculations in parallel is to keep the disk busy between files, but you may want to consider structuring your work differently. For example, set up one thread that enumerates and opens the files, and a second thread that gets open file handles from the first thread one at a time and processes them. The file system will cache catalog information, so the enumeration won't have a severe impact on reading the data, which will actually have to hit the disk.
If the files can be arbitrarily large, Chris' approach may not be practical since the entire content is read into memory.
If you have no other use for the data than calculating the hash, then I suggest disabling file system caching before reading the data.
If using NSFileHandles, a simple category method will do this per-file:
@interface NSFileHandle (NSFileHandleCaching)
- (BOOL)disableFileSystemCache;
@end

#include <fcntl.h>

@implementation NSFileHandle (NSFileHandleCaching)
- (BOOL)disableFileSystemCache {
    return (fcntl([self fileDescriptor], F_NOCACHE, 1) != -1);
}
@end
If the sidecar files are small, you may want to collect them in memory and write them out in batches to minimize disruption of the processing.
The file system (HFS, at least) stores file records for files in a directory sequentially, so traverse the file system breadth-first (i.e., process each file in a directory before entering subdirectories).
The above is just suggestions, of course. You will want to experiment and measure performance to confirm the actual impact.
libdispatch actually provides APIs explicitly for this! Check out dispatch_io; it will handle parallelizing IO when appropriate, and otherwise serializing it to avoid thrashing the disk.
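For illustration, a rough sketch of reading one file through a dispatch I/O channel and feeding the chunks to an incremental hash (the chunk size and the hashing comments are placeholders, not a prescribed implementation):
#include <fcntl.h>

dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_io_t channel = dispatch_io_create_with_path(DISPATCH_IO_STREAM,
    [[URL path] fileSystemRepresentation], O_RDONLY, 0, queue,
    ^(int error) { /* channel closed */ });

// Deliver at most ~1 MB per handler invocation rather than the whole file.
dispatch_io_set_high_water(channel, 1024 * 1024);

dispatch_io_read(channel, 0, SIZE_MAX, queue,
    ^(bool done, dispatch_data_t data, int error) {
        if (data) {
            // feed `data` into an incremental hash update here
        }
        if (done) {
            dispatch_io_close(channel, 0);
            // finalize the hash and enqueue the sidecar write here
        }
    });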
The following link is to a Bitbucket project I set up that uses NSOperation and Grand Central Dispatch in a primitive file integrity application.
https://bitbucket.org/torresj/hashar-cocoa
I hope it is of help/use.

Optimize a view's drawing code

In a simple drawing application I have a model which has an NSMutableArray, curvedPaths, holding all the lines the user has drawn. A line itself is also an NSMutableArray, containing the point objects. As I draw curved NSBezierPaths, my point array has the following structure: linePoint, controlPoint, controlPoint, linePoint, controlPoint, controlPoint, etc. I thought having one array holding all the points plus control points would be more efficient than dealing with 2 or 3 different arrays.
Obviously my view draws the paths it gets from the model, which leads to the actual question: Is there a way to optimize the following code (inside the view's drawRect method) in terms of speed?
int lineCount = [[model curvedPaths] count];

// Go through paths
for (int i = 0; i < lineCount; i++)
{
    // Get the color
    NSColor *theColor = [model getColorOfPath:[[model curvedPaths] objectAtIndex:i]];
    // Get the points
    NSArray *thePoints = [model getPointsOfPath:[[model curvedPaths] objectAtIndex:i]];

    // Create a new path for performance reasons
    NSBezierPath *path = [[NSBezierPath alloc] init];

    // Set the color
    [theColor set];

    // Move to first point without drawing
    [path moveToPoint:[[thePoints objectAtIndex:0] myNSPoint]];

    int pointCount = [thePoints count] - 3;

    // Go through points
    for (int j = 0; j < pointCount; j += 3)
    {
        [path curveToPoint:[[thePoints objectAtIndex:j+3] myNSPoint]
             controlPoint1:[[thePoints objectAtIndex:j+1] myNSPoint]
             controlPoint2:[[thePoints objectAtIndex:j+2] myNSPoint]];
    }

    // Draw the path
    [path stroke];

    // Bye stuff
    [path release];
    [theColor release];
}
Thanks,
xonic
Hey xon1c, the code looks good. In general it is impossible to optimize without measuring performance in specific cases.
For example, let's say the code above is only ever called once. It draws a picture in a view and it never needs redrawing. Say the code above takes 50 milliseconds to run. You could rewrite it in OpenGL, do every optimisation under the sun, and get that time down to 20 milliseconds - but realistically the 30 milliseconds you have saved make no difference to anyone, and you have pretty much just wasted your time and added a load of code bloat that is going to be more difficult to maintain.
However, if the code above is called 50 times a second and most of those times it is drawing the same thing then you could meaningfully optimise it.
When it comes to drawing, the best way to optimise is to eliminate unnecessary drawing.
Each time you draw, you recreate the NSBezierPaths - are they always different? You may want to maintain the list of NSBezierPaths to draw, keep that in sync with your model, and keep drawRect: solely for drawing the paths (see the sketch below).
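A rough sketch of that caching idea (pathCache is a hypothetical NSMutableArray on the view, rebuilt whenever the model's lines change; the colours could be cached alongside in the same way):
- (void)rebuildPathCache {
    [self.pathCache removeAllObjects];
    for (id curvedPath in [model curvedPaths]) {
        NSArray *thePoints = [model getPointsOfPath:curvedPath];
        NSBezierPath *path = [NSBezierPath bezierPath];
        [path moveToPoint:[[thePoints objectAtIndex:0] myNSPoint]];
        int pointCount = (int)[thePoints count] - 3;
        for (int j = 0; j < pointCount; j += 3) {
            [path curveToPoint:[[thePoints objectAtIndex:j+3] myNSPoint]
                 controlPoint1:[[thePoints objectAtIndex:j+1] myNSPoint]
                 controlPoint2:[[thePoints objectAtIndex:j+2] myNSPoint]];
        }
        [self.pathCache addObject:path];
    }
}

// drawRect: then only sets the colour and strokes each cached path.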
Are you drawing to areas of your view which don't need redrawing? The argument to drawRect: is the area of the view that needs redrawing - you could test against that (or getRectsBeingDrawn:count:), but it may be that in your case you know the entire view needs to be redrawn.
If the paths themselves don't change often, but the view needs redrawing often - e.g. when the shapes of the paths aren't changing but their positions are animating and they overlap in different ways - you could draw the paths into images (textures) and then, inside drawRect:, draw the texture to the view instead of drawing the path. This can be faster because the texture is only created once and is uploaded to video memory, which is faster to draw to the screen. You should look at Core Animation if this is what you need to do.
If you find that drawing the paths is too slow, you could look at CGPath.
So, on the whole, it really does depend on what you are doing. The best advice is, as ever, not to get sucked into premature optimisation. If your app isn't actually too slow for your users, your code is just fine.