How to read VBR audio in Novocaine (as opposed to PCM) - Objective-C

The creator of Novocaine offered example code where audio data is read from a file and fed to a ring buffer. When the file reader is created, though, the output is forced to be PCM:
- (id)initWithAudioFileURL:(NSURL *)urlToAudioFile samplingRate:(float)thisSamplingRate numChannels:(UInt32)thisNumChannels
{
    ...
    // We're going to impose a format upon the input file
    // Single-channel float does the trick.
    _outputFormat.mSampleRate = self.samplingRate;
    _outputFormat.mFormatID = kAudioFormatLinearPCM;
    _outputFormat.mFormatFlags = kAudioFormatFlagIsFloat;
    _outputFormat.mBytesPerPacket = 4*self.numChannels;
    _outputFormat.mFramesPerPacket = 1;
    _outputFormat.mBytesPerFrame = 4*self.numChannels;
    _outputFormat.mChannelsPerFrame = self.numChannels;
    _outputFormat.mBitsPerChannel = 32;
}
I'm trying to contribute to the Novocaine project by allowing it to:
1. Read from the iPod library (which can only be accessed via AVAssetReader, rather than the Audio File Services API)
2. Read and write VBR packets rather than PCM.
So this is what my equivalent of the above function looks like (see the NOTE: parts):
- (id)initWithAudioAssetURL:(NSURL *)urlToAsset samplingRate:(float)thisSamplingRate numChannels:(UInt32)thisNumChannels
{
    self = [super init];
    if (self)
    {
        // Zero-out our timer, so we know we're not using our callback yet
        self.callbackTimer = nil;

        // Open a reference to the audio asset track and set up the reader
        self.assetURL = urlToAsset;
        AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:self.assetURL options:nil];
        NSError * error = nil;
        AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
        AVAssetTrack* track = [songAsset.tracks objectAtIndex:0];

        //NOTE: we use the track's native settings here, as opposed to forcing it to be PCM
        //like the example above
        _readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
                                                                   outputSettings:NULL];
        _nativeTrackASBD = [self getTrackNativeSettings:track];
        [reader addOutput:_readerOutput];
        [reader startReading];

        // Set a few defaults and presets
        self.samplingRate = thisSamplingRate;
        self.numChannels = thisNumChannels;
        self.latency = .011609977; // 512 samples / ( 44100 samples / sec ) default

        // Arbitrary buffer sizes that don't matter so much as long as they're "big enough"
        self.outputBufferSize = 100000; // buffer sample sizes vary around 60-70k, we keep it @ 100k to be safe
        self.numSamplesReadPerPacket = 8192;
        self.desiredPrebufferedSamples = self.numSamplesReadPerPacket*2;

        //NOTE: these buffers are float, whereas the above audio code is in SInt16
        self.outputBuffer = (float *)calloc(2*self.samplingRate, sizeof(float));
        self.holdingBuffer = (float *)calloc(2*self.samplingRate, sizeof(float));

        // Allocate a ring buffer (this is what's going to buffer our audio)
        ringBuffer = new RingBuffer(self.outputBufferSize, self.numChannels);

        // Fill up the buffers, so we're ready to play immediately
        [self bufferNewAudioFromAsset];
    }
    return self;
}
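(For reference, getTrackNativeSettings: isn't shown here; a plausible sketch of such a helper, purely my assumption and not the project's actual code, just pulls the native AudioStreamBasicDescription out of the track's first format description:)
- (AudioStreamBasicDescription)getTrackNativeSettings:(AVAssetTrack *)track
{
    // Hypothetical sketch: return the ASBD describing the track's native (possibly VBR) packet format
    AudioStreamBasicDescription asbd = {0};
    if (track.formatDescriptions.count > 0) {
        CMAudioFormatDescriptionRef desc =
            (__bridge CMAudioFormatDescriptionRef)[track.formatDescriptions objectAtIndex:0];
        const AudioStreamBasicDescription *native =
            CMAudioFormatDescriptionGetStreamBasicDescription(desc);
        if (native) asbd = *native;
    }
    return asbd;
}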
Looking at the code, it seems that everything is done in float (the audio buffers, the output format, etc.). Is there a reason for this? (Keep in mind that the canonical iOS audio format is SInt16, not float.) For example, see this part of the Novocaine::renderCallback function:
else if ( sm.numBytesPerSample == 2 ) // then we need to convert SInt16 -> Float (and also scale)
{
    float scale = (float)INT16_MAX;
    vDSP_vsmul(sm.outData, 1, &scale, sm.outData, 1, inNumberFrames*sm.numOutputChannels);
    for (int iBuffer=0; iBuffer < ioData->mNumberBuffers; ++iBuffer) {
        int thisNumChannels = ioData->mBuffers[iBuffer].mNumberChannels;
        for (int iChannel = 0; iChannel < thisNumChannels; ++iChannel) {
            vDSP_vfix16(sm.outData+iChannel, sm.numOutputChannels, (SInt16 *)ioData->mBuffers[iBuffer].mData+iChannel, thisNumChannels, inNumberFrames);
        }
    }
}
What is the list of things I have to change to make this library compatible with reading and writing VBR data?

Related

mmap() and newBufferWithBytesNoCopy causing IOAF code -536870211 error if the file is too small

I noticed that, while generating a texture from an MTLBuffer created from mmap() via newBufferWithBytesNoCopy, if the size requested by the len argument to mmap (page aligned) is larger than the actual size of the file (page aligned), the mmap call succeeds and the newBufferWithBytesNoCopy message does not return nil or an error. But when I pass the buffer to the GPU to copy the data to an MTLTexture, the following is printed to the console and all GPU commands fail to perform any action:
Execution of the command buffer was aborted due to an error during execution. Internal Error (IOAF code -536870211)
Here is code to demonstrate the problem:
static id<MTLDevice> Device;
static id<MTLCommandQueue> Queue;
static id<MTLTexture> BlockTexture[3];
#define TEX_LEN_1 1 // These are all made 1 in this question for simplicity
#define TEX_LEN_2 1
#define TEX_LEN_4 1
#define TEX_SIZE ((TEX_LEN_1<<10)+(TEX_LEN_2<<11)+(TEX_LEN_4<<12))
#define PAGE_ALIGN(S) ((S)+PAGE_SIZE-1&~(PAGE_SIZE-1))
int main(void) {
if (!(Queue = [Device = MTLCreateSystemDefaultDevice() newCommandQueue]))
return EXIT_FAILURE;
@autoreleasepool {
const id<MTLBuffer> data = ({
void *const map = ({
NSFileHandle *const file = [NSFileHandle fileHandleForReadingAtPath:[NSBundle.mainBundle pathForResource:@"Content" ofType:nil]];
if (!file)
return EXIT_FAILURE;
mmap(NULL, TEX_SIZE, PROT_READ, MAP_SHARED, file.fileDescriptor, 0);
});
if (map == MAP_FAILED)
return errno;
[Device newBufferWithBytesNoCopy:map length:PAGE_ALIGN(TEX_SIZE) options:MTLResourceStorageModeShared deallocator:^(void *const ptr, const NSUInteger len){
munmap(ptr, len);
}];
});
if (!data)
return EXIT_FAILURE;
const id<MTLCommandBuffer> buffer = [Queue commandBuffer];
const id<MTLBlitCommandEncoder> encoder = [buffer blitCommandEncoder];
if (!encoder)
return EXIT_FAILURE;
{
MTLTextureDescriptor *const descriptor = [MTLTextureDescriptor new];
descriptor.width = descriptor.height = 32;
descriptor.mipmapLevelCount = 6;
descriptor.textureType = MTLTextureType2DArray;
descriptor.storageMode = MTLStorageModePrivate;
const enum MTLPixelFormat format[] = {MTLPixelFormatR8Unorm, MTLPixelFormatRG8Unorm, MTLPixelFormatRGBA8Unorm};
const NSUInteger len[] = {TEX_LEN_1, TEX_LEN_2, TEX_LEN_4};
for (NSUInteger i = 3, off = 0; i--;) {
descriptor.pixelFormat = format[i];
const NSUInteger l = descriptor.arrayLength = len[i];
const id<MTLTexture> texture = [Device newTextureWithDescriptor:descriptor];
if (!texture)
return EXIT_FAILURE;
const NSUInteger br = 32<<i, bi = 1024<<i;
for (NSUInteger j = 0; j < l; off += bi)
[encoder copyFromBuffer:data sourceOffset:off sourceBytesPerRow:br sourceBytesPerImage:bi sourceSize:(const MTLSize){32, 32, 1} toTexture:texture destinationSlice:j++ destinationLevel:0 destinationOrigin:(const MTLOrigin){0}];
[encoder generateMipmapsForTexture:BlockTexture[i] = texture];
}
}
[encoder endEncoding];
[buffer commit];
}
// Rest of code to initialize application (omitted)
}
In this case, the command will fail if the size of the actual Content file is less than 4097 bytes, assuming a 4096-byte page size. What is strangest is that neither mmap() nor newBufferWithBytesNoCopy fails in this case, but the GPU execution fails so badly that any and all subsequent GPU calls also fail.
Is there a way to cause predictable behavior? I thought that mmap() space beyond the file was just valid 0 memory. Why is this apparently not the case if the space is being used by the GPU? At the very least, how can I detect GPU execution errors or invalid buffers like this to handle them gracefully, besides manually checking if the file is too small? Am I using these functions incorrectly somehow? Is something here undefined behavior?
My research efforts included Google searches for terms such as newBufferWithBytesNoCopy and/or mmap together with 536870211, which returned absolutely no results. Now, this question is the only result for such searches.
My guess is this problem has to do with the inner workings of the GPU and/or the MTLBuffer implementation and/or mmap() and its underlying facilities. Not having access to these inner workings, I have no idea how to even start figuring out a solution. I would appreciate an expert enlightening me as to what is actually going on behind the scenes to cause this error, and how to avoid it (besides manually checking whether the file is too small, which is a workaround rather than a fix at the root), or at the very least how to detect GPU crashes of this type and handle them gracefully.
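For what it's worth, here is a small, untested sketch of the kind of guard and detection being asked about (using the file and buffer variables from the code above): check the real file size with fstat() before mapping, and attach a completion handler so GPU-side failures surface as an NSError instead of silently poisoning later command buffers.
#include <sys/stat.h>

// Sketch only: refuse to map if the file cannot back the whole buffer
struct stat st;
if (fstat(file.fileDescriptor, &st) != 0 || st.st_size < TEX_SIZE)
    return EXIT_FAILURE; // file too small to safely back the MTLBuffer

// Sketch only: observe GPU execution errors (e.g. the IOAF error above)
[buffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    if (cb.status == MTLCommandBufferStatusError)
        NSLog(@"Command buffer failed: %@", cb.error);
}];
[buffer commit];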

How to setup MTLTexture and MTLBuffers for Metal Performance Shaders Find Keypoints

Problem
I'm trying out the performance shaders for the first time and encountered a runtime problem. The MTLTexture that MTKTextureLoader returns seems to be incompatible with the Metal Performance Shaders MPSImageFindKeypoints encoder.
The only hint I've found so far is @warrenm's sample code on MPS, which specifies MTKTextureLoaderOptions just like I did. I did not find any other mentions in the docs.
Any help is highly appreciated.
Error
/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalImage/MetalImage-121.0.2/MPSImage/Filters/MPSKeypoint.mm:166: failed assertion `Source 0x282ce8fc0 texture type (80) is unsupported
where 0x282ce8fc0 is the MTLTexture from the texture loader.
As far as I can see there is no MTLTexture type 80; the texture type enum only ranges up to about 8 (and the 80 is not hex).
Code
CGFloat w = CGImageGetWidth(_image);
CGFloat h = CGImageGetHeight(_image);
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
id<MTLCommandQueue> commandQueue = [device newCommandQueue];
NSDictionary* textureOptions = @{ MTKTextureLoaderOptionSRGB: [[NSNumber alloc] initWithBool:NO] };
id<MTLTexture> texture = [[[MTKTextureLoader alloc] initWithDevice:device] newTextureWithCGImage:_image
options:textureOptions
error:nil];
id<MTLBuffer> keypointDataBuffer;
id<MTLBuffer> keypointCountBuffer;
MTLRegion region = MTLRegionMake2D(0, 0, w, h);
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
MPSImageKeypointRangeInfo rangeInfo = {100,0.5};
MPSImageFindKeypoints* imageFindKeypoints = [[MPSImageFindKeypoints alloc] initWithDevice:device
info:&rangeInfo];
[imageFindKeypoints encodeToCommandBuffer:commandBuffer
sourceTexture:texture
regions:&region
numberOfRegions:1
keypointCountBuffer:keypointCountBuffer
keypointCountBufferOffset:0
keypointDataBuffer:keypointDataBuffer
keypointDataBufferOffset:0];
[commandBuffer commit];
NSLog(keypointCountBuffer);
NSLog(keypointDataBuffer);
Edit
After converting my image to the correct pixel format I am now initialising the buffers like so:
id<MTLBuffer> keypointDataBuffer = [device newBufferWithLength:maxKeypoints*(sizeof(MPSImageKeypointData)) options:MTLResourceOptionCPUCacheModeDefault];
id<MTLBuffer> keypointCountBuffer = [device newBufferWithLength:sizeof(int) options:MTLResourceOptionCPUCacheModeDefault];
There is no error anymore. But how can I read the contents now?
((MPSImageKeypointData*)[keypointDataBuffer contents])[0].keypointCoordinate returns (0,0) for all indexes. Also, I don't know how to read the keypointCountBuffer: its contents converted to an int show a value higher than the defined maxKeypoints, and I don't see where the docs say what format the count buffer has.
Finally the code is running, and just for completeness' sake I thought I should post the whole code as an answer.
Code
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
id<MTLCommandQueue> commandQueue = [device newCommandQueue];
// init textures
NSDictionary* textureOptions = @{ MTKTextureLoaderOptionSRGB: [[NSNumber alloc] initWithBool:NO] };
id<MTLTexture> texture = [[[MTKTextureLoader alloc] initWithDevice:device] newTextureWithCGImage:_lopoImage
options:textureOptions
error:nil];
MTLTextureDescriptor *descriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:(MTLPixelFormatR8Unorm) width:w height:h mipmapped:NO];
descriptor.usage = (MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite);
id<MTLTexture> unormTexture = [device newTextureWithDescriptor:descriptor];
// init arrays and buffers for keypoint finder
int maxKeypoints = w*h;
id<MTLBuffer> keypointDataBuffer = [device newBufferWithLength:sizeof(MPSImageKeypointData)*maxKeypoints options:MTLResourceOptionCPUCacheModeWriteCombined];
id<MTLBuffer> keypointCountBuffer = [device newBufferWithLength:sizeof(int) options:MTLResourceOptionCPUCacheModeWriteCombined];
MTLRegion region = MTLRegionMake2D(0, 0, w, h);
// init colorspace converter
CGColorSpaceRef srcColorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);
CGColorSpaceRef dstColorSpace = CGColorSpaceCreateWithName(kCGColorSpaceLinearGray);
CGColorConversionInfoRef conversionInfo = CGColorConversionInfoCreate(srcColorSpace, dstColorSpace);
MPSImageConversion *conversion = [[MPSImageConversion alloc] initWithDevice:device
srcAlpha:(MPSAlphaTypeAlphaIsOne)
destAlpha:(MPSAlphaTypeNonPremultiplied)
backgroundColor:nil
conversionInfo:conversionInfo];
// init keypoint finder
MPSImageKeypointRangeInfo rangeInfo = {maxKeypoints,0.75};
MPSImageFindKeypoints* imageFindKeypoints = [[MPSImageFindKeypoints alloc] initWithDevice:device
info:&rangeInfo];
// encode command buffer
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
[conversion encodeToCommandBuffer:commandBuffer sourceTexture:texture destinationTexture:unormTexture];
[imageFindKeypoints encodeToCommandBuffer:commandBuffer
sourceTexture:unormTexture
regions:&region
numberOfRegions:1
keypointCountBuffer:keypointCountBuffer
keypointCountBufferOffset:0
keypointDataBuffer:keypointDataBuffer
keypointDataBufferOffset:0];
// run command buffer
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
// read keypoints
int count = ((int*)[keypointCountBuffer contents])[0];
MPSImageKeypointData* keypointDataArray = ((MPSImageKeypointData*)[keypointDataBuffer contents]);
for (int i = 0 ; i<count;i++) {
simd_ushort2 coordinate = keypointDataArray[i].keypointCoordinate;
NSLog(#"color:%f | at:(%u,%u)", keypointDataArray[i].keypointColorValue, coordinate[0], coordinate[1] );
}
I guess there should be a cleverer way to allocate the keypoint buffers with [device newBufferWithBytesNoCopy:...] so that you would not need to copy the contents back into your own arrays; I just didn't figure out how to correctly align the buffer.
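A sketch of how that alignment could work (my assumption, not something I've verified as part of this answer): newBufferWithBytesNoCopy: wants a page-aligned pointer and a page-aligned length, so round the length up to the page size and allocate the backing memory with posix_memalign.
#include <stdlib.h>
#include <unistd.h>

// Sketch only: page-aligned backing store for the keypoint data buffer
size_t pageSize   = (size_t)getpagesize();
size_t rawLength  = sizeof(MPSImageKeypointData) * maxKeypoints;
size_t alignedLen = (rawLength + pageSize - 1) & ~(pageSize - 1);

void *backing = NULL;
if (posix_memalign(&backing, pageSize, alignedLen) != 0) { /* handle failure */ }

id<MTLBuffer> keypointDataBuffer =
    [device newBufferWithBytesNoCopy:backing
                              length:alignedLen
                             options:MTLResourceStorageModeShared
                         deallocator:^(void *ptr, NSUInteger len) { free(ptr); }];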
Also, I should mention that I guess you will usually already have a grayscale texture after any kind of feature detection, so the image conversion part will not be necessary.

CGImageRef faster way to access pixel data?

My current method is:
CGDataProviderRef provider = CGImageGetDataProvider(imageRef);
imageData.rawData = CGDataProviderCopyData(provider);
imageData.imageData = (UInt8 *) CFDataGetBytePtr(imageData.rawData);
I only get about 30 frames per second. I know part of the performance hit is copying the data, it'd be nice if I could just have access to the stream of bytes and not have it automatically create a copy for me.
I'm trying to get it to process CGImageRefs as fast as possible, is there a faster way?
Here's a snippet of my working solution:
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
// Insert code here to initialize your application
//timer = [NSTimer scheduledTimerWithTimeInterval:1.0/60.0 //2000.0
// target:self
// selector:@selector(timerLogic)
// userInfo:nil
// repeats:YES];
leagueGameState = [LeagueGameState new];
[self updateWindowList];
lastTime = CACurrentMediaTime();
// Create a capture session
mSession = [[AVCaptureSession alloc] init];
// Set the session preset as you wish
mSession.sessionPreset = AVCaptureSessionPresetMedium;
// If you're on a multi-display system and you want to capture a secondary display,
// you can call CGGetActiveDisplayList() to get the list of all active displays.
// For this example, we just specify the main display.
// To capture both a main and secondary display at the same time, use two active
// capture sessions, one for each display. On Mac OS X, AVCaptureMovieFileOutput
// only supports writing to a single video track.
CGDirectDisplayID displayId = kCGDirectMainDisplay;
// Create a ScreenInput with the display and add it to the session
AVCaptureScreenInput *input = [[AVCaptureScreenInput alloc] initWithDisplayID:displayId];
input.minFrameDuration = CMTimeMake(1, 60);
//if (!input) {
// [mSession release];
// mSession = nil;
// return;
//}
if ([mSession canAddInput:input]) {
NSLog(#"Added screen capture input");
[mSession addInput:input];
} else {
NSLog(#"Couldn't add screen capture input");
}
//**********************Add output here
//dispatch_queue_t _videoDataOutputQueue;
//_videoDataOutputQueue = dispatch_queue_create( "com.apple.sample.capturepipeline.video", DISPATCH_QUEUE_SERIAL );
//dispatch_set_target_queue( _videoDataOutputQueue, dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_HIGH, 0 ) );
AVCaptureVideoDataOutput *videoOut = [[AVCaptureVideoDataOutput alloc] init];
videoOut.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
[videoOut setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
// RosyWriter records videos and we prefer not to have any dropped frames in the video recording.
// By setting alwaysDiscardsLateVideoFrames to NO we ensure that minor fluctuations in system load or in our processing time for a given frame won't cause framedrops.
// We do however need to ensure that on average we can process frames in realtime.
// If we were doing preview only we would probably want to set alwaysDiscardsLateVideoFrames to YES.
videoOut.alwaysDiscardsLateVideoFrames = YES;
if ( [mSession canAddOutput:videoOut] ) {
NSLog(@"Added output video");
[mSession addOutput:videoOut];
} else {NSLog(@"Couldn't add output video");}
// Start running the session
[mSession startRunning];
NSLog(@"Set up session");
}
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
//NSLog(#"Captures output from sample buffer");
//CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription( sampleBuffer );
/*
if ( self.outputVideoFormatDescription == nil ) {
// Don't render the first sample buffer.
// This gives us one frame interval (33ms at 30fps) for setupVideoPipelineWithInputFormatDescription: to complete.
// Ideally this would be done asynchronously to ensure frames don't back up on slower devices.
[self setupVideoPipelineWithInputFormatDescription:formatDescription];
}
else {*/
[self renderVideoSampleBuffer:sampleBuffer];
//}
}
- (void)renderVideoSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
//CVPixelBufferRef renderedPixelBuffer = NULL;
//CMTime timestamp = CMSampleBufferGetPresentationTimeStamp( sampleBuffer );
//[self calculateFramerateAtTimestamp:timestamp];
// We must not use the GPU while running in the background.
// setRenderingEnabled: takes the same lock so the caller can guarantee no GPU usage once the setter returns.
//@synchronized( _renderer )
//{
// if ( _renderingEnabled ) {
CVPixelBufferRef sourcePixelBuffer = CMSampleBufferGetImageBuffer( sampleBuffer );
const int kBytesPerPixel = 4;
CVPixelBufferLockBaseAddress( sourcePixelBuffer, 0 );
int bufferWidth = (int)CVPixelBufferGetWidth( sourcePixelBuffer );
int bufferHeight = (int)CVPixelBufferGetHeight( sourcePixelBuffer );
size_t bytesPerRow = CVPixelBufferGetBytesPerRow( sourcePixelBuffer );
uint8_t *baseAddress = CVPixelBufferGetBaseAddress( sourcePixelBuffer );
int count = 0;
for ( int row = 0; row < bufferHeight; row++ )
{
uint8_t *pixel = baseAddress + row * bytesPerRow;
for ( int column = 0; column < bufferWidth; column++ )
{
count ++;
pixel[1] = 0; // De-green (the second byte in each BGRA pixel is green)
pixel += kBytesPerPixel;
}
}
CVPixelBufferUnlockBaseAddress( sourcePixelBuffer, 0 );
//NSLog(#"Test Looped %d times", count);
CIImage *ciImage = [CIImage imageWithCVImageBuffer:sourcePixelBuffer];
/*
CIContext *temporaryContext = [CIContext contextWithCGContext:
[[NSGraphicsContext currentContext] graphicsPort]
options: nil];
CGImageRef videoImage = [temporaryContext
createCGImage:ciImage
fromRect:CGRectMake(0, 0,
CVPixelBufferGetWidth(sourcePixelBuffer),
CVPixelBufferGetHeight(sourcePixelBuffer))];
*/
//UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
// Create a bitmap rep from the image...
NSBitmapImageRep *bitmapRep = [[NSBitmapImageRep alloc] initWithCIImage:ciImage];
// Create an NSImage and add the bitmap rep to it...
NSImage *image = [[NSImage alloc] init];
[image addRepresentation:bitmapRep];
// Set the output view to the new NSImage.
[imageView setImage:image];
//CGImageRelease(videoImage);
//renderedPixelBuffer = [_renderer copyRenderedPixelBuffer:sourcePixelBuffer];
// }
// else {
// return;
// }
//}
//Profile code? See how fast it's running?
if (CACurrentMediaTime() - lastTime > 3) //10 seconds
{
float time = CACurrentMediaTime() - lastTime;
[fpsText setStringValue:[NSString stringWithFormat:@"Elapsed Time: %f ms, %f fps", time * 1000 / loopsTaken, (1000.0)/(time * 1000.0 / loopsTaken)]];
lastTime = CACurrentMediaTime();
loopsTaken = 0;
[self updateWindowList];
if (leagueGameState.leaguePID == -1) {
[statusText setStringValue:@"No League Instance Found"];
}
}
else
{
loopsTaken++;
}
}
I get a very nice 60 frames per second even after looping through the data.
It captures the screen, I get the data, I modify the data and I re-show the data.
Which "stream of bytes" do you mean? CGImage represents the final bitmap data, but under the hood it may still be compressed. The bitmap may currently be stored on the GPU, so getting to it might require a GPU->CPU fetch (which is expensive, and should be avoided when you don't need it).
If you're trying to do this at greater than 30fps, you may want to rethink how you're attacking the problem, and use tools designed for that, like Core Image, Core Video, or Metal. Core Graphics is optimized for display, not processing (and definitely not real-time processing). A key difference in tools like Core Image is that you can perform more of your work on the GPU without shuffling data back to the CPU. This is absolutely critical for maintaining fast pipelines. Whenever possible, you want to avoid getting the actual bytes.
If you have a CGImage already, you can convert it to a CIImage with imageWithCGImage: and then use CIImage to process it further. If you really need access to the bytes, your options are the one you're using, or to render it into a bitmap context (which also will require copying) with CGContextDrawImage. There's just no promise that a CGImage has a bunch of bitmap bytes hanging around at any given time that you can look at, and it doesn't provide "lock your buffer" methods like you'll find in real-time frameworks like Core Video.
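For completeness, here is a minimal sketch of that bitmap-context route (assuming imageRef is the CGImageRef from the question); it still copies, but into a buffer you own and can reuse across frames:
// Sketch only: draw the CGImage into an RGBA bitmap context and read its bytes
size_t width  = CGImageGetWidth(imageRef);
size_t height = CGImageGetHeight(imageRef);

CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, width * 4, colorSpace,
                                         kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
CGColorSpaceRelease(colorSpace);

CGContextDrawImage(ctx, CGRectMake(0, 0, width, height), imageRef);
UInt8 *bytes = CGBitmapContextGetData(ctx); // valid while ctx is alive
// ... process bytes (width * height * 4 of them) ...
CGContextRelease(ctx);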
Some very good introductions to high-speed image processing from WWDC videos:
WWDC 2013 Session 509 Core Image Effects and Techniques
WWDC 2014 Session 514 Advances in Core Image
WWDC 2014 Sessions 603-605 Working with Metal

reading samples with AVAssetReader and timeRange in real time

Previously I read audio samples from a complete audio file using CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer. Now I would like to do the same using ranges (i.e., I specify a range in time, read a small chunk of audio for that time, then go back and read again). The reason I want to use a time range is that I want to control the size of each read (so it fits in a packet with a maximum size).
For some reason there is always a bump between reads. In my code you'll notice that I start the AVAssetReader and end it every time I set a time range, because I cannot dynamically adjust the time range after the reader has started (see here for more details).
Could it be that starting and ending a reader is just too expensive to produce a continuous real-time experience? Or are there other ways of doing this that I'm not aware of?
Also note that this jitter or lag happens at whatever point I set the time interval to, which makes me believe that starting and ending a reader the way I am doing is too expensive for real-time audio playback.
- (void) setupReader
{
NSURL *assetURL = [NSURL URLWithString:#"ipod-library://item/item.m4a?id=1053020204400037178"];
songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
track = [songAsset.tracks objectAtIndex:0];
nativeTrackASBD = [self getTrackNativeSettings:track];
// set CM time parameters
assetCMTime = songAsset.duration;
CMTimeReadDurationInSeconds = CMTimeMakeWithSeconds(1, assetCMTime.timescale);
currentCMTime = CMTimeMake(0,assetCMTime.timescale);
}
-(void)readVBRPackets
{
// make sure assetCMTime is greater than currentCMTime
while (CMTimeCompare(assetCMTime,currentCMTime) == 1 )
{
NSError * error = nil;
reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
outputSettings:nil];
[reader addOutput:readerOutput];
reader.timeRange = CMTimeRangeMake(currentCMTime, CMTimeReadDurationInSeconds);
[reader startReading];
while ((sample = [readerOutput copyNextSampleBuffer])) {
CMItemCount numSamples = CMSampleBufferGetNumSamples(sample);
if (numSamples == 0) {
continue;
}
NSLog(#"reading sample");
CMBlockBufferRef CMBuffer = CMSampleBufferGetDataBuffer( sample );
AudioBufferList audioBufferList;
OSStatus err = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
sample,
NULL,
&audioBufferList,
sizeof(audioBufferList),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&CMBuffer
);
const AudioStreamPacketDescription * inPacketDescriptions;
size_t packetDescriptionsSizeOut;
size_t inNumberPackets;
CheckError(CMSampleBufferGetAudioStreamPacketDescriptionsPtr(sample,
&inPacketDescriptions,
&packetDescriptionsSizeOut),
"could not read sample packet descriptions");
inNumberPackets = packetDescriptionsSizeOut/sizeof(AudioStreamPacketDescription);
AudioBuffer audioBuffer = audioBufferList.mBuffers[0];
for (int i = 0; i < inNumberPackets; ++i)
{
SInt64 dataOffset = inPacketDescriptions[i].mStartOffset;
UInt32 packetSize = inPacketDescriptions[i].mDataByteSize;
size_t packetSpaceRemaining;
packetSpaceRemaining = bufferByteSize - bytesFilled;
// if the space remaining in the buffer is not
// enough for the data contained in this packet
// then just write it
if (packetSpaceRemaining < packetSize)
{
[self enqueueBuffer];
}
// copy data to the audio queue buffer
AudioQueueBufferRef fillBuf = audioQueueBuffers[fillBufferIndex];
memcpy((char*)fillBuf->mAudioData + bytesFilled,
(const char*)(audioBuffer.mData + dataOffset), packetSize);
// fill out packet description
packetDescs[packetsFilled] = inPacketDescriptions[i];
packetDescs[packetsFilled].mStartOffset = bytesFilled;
bytesFilled += packetSize;
packetsFilled += 1;
// if this is the last packet, then ship it
size_t packetsDescsRemaining = kAQMaxPacketDescs - packetsFilled;
if (packetsDescsRemaining == 0) {
[self enqueueBuffer];
}
}
CFRelease(CMBuffer);
CMSampleBufferInvalidate(sample);
CFRelease(sample);
}
[reader cancelReading];
reader = NULL;
readerOutput = NULL;
currentCMTime = CMTimeAdd(currentCMTime, CMTimeReadDurationInSeconds);
}
}
I know what happens :-D It took me nearly a whole day to figure it out.
In fact, AVAssetReader fades in the first 1024 samples (maybe a little more). That's why you hear the jitter effect.
I fixed it by reading 1024 samples before the position I really want to read, then skipping those 1024 samples.
I hope it'll work for you also.
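Roughly, the workaround looks like this (a sketch using the variable names from the question; the exact bookkeeping for discarding the padded samples is up to you):
// Sketch only: pad each read by ~1024 samples and discard them after decoding
const int kFadeSamples = 1024;                     // samples AVAssetReader fades in
CMTime fadePad = CMTimeMake(kFadeSamples, 44100);  // assumes a 44.1 kHz track

CMTime paddedStart = CMTimeSubtract(currentCMTime, fadePad);
if (CMTimeCompare(paddedStart, kCMTimeZero) < 0)
    paddedStart = kCMTimeZero;

reader.timeRange = CMTimeRangeMake(paddedStart,
                                   CMTimeAdd(CMTimeReadDurationInSeconds, fadePad));
[reader startReading];
// ...then throw away the first kFadeSamples frames of this read before
// copying packets into the audio queue buffers.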

Setting Time Range in AVAssetReader causes freeze

So, I'm trying to do a simple calculation over previously recorded audio (from an AVAsset) in order to create a waveform visual. I currently do this by averaging a set of samples, the size of which is determined by dividing the audio file size by the resolution I want for the waveform.
This all works fine, except for one problem... it's too slow. Running on a 3GS, processing an audio file takes about 3% of the time it takes to play it, which is way too slow (for example, a 1-hour audio file takes about 2.5 minutes to process). I've tried to optimize the method as much as possible but it's not working. I'll post the code I use to process the file. Maybe someone will be able to help with that, but what I'm really looking for is a way to process the file without having to go over every single byte. So, say given a resolution of 2,000, I'd want to access the file and take a sample at each of the 2,000 points. I think this would be a lot quicker, especially if the file is larger. But the only way I know to get the raw data is to access the audio file in a linear manner. Any ideas? Here's the code I use to process the file (note: all class vars begin with '_'):
So I've completely changed this question. I belatedly realized that AVAssetReader has a timeRange property that's used for "seeking", which is exactly what I was looking for (see the original question above). Furthermore, the question has been asked and answered (I just didn't find it before) and I don't want to duplicate questions. However, I'm still having a problem. My app freezes for a while and then eventually crashes whenever I try to copyNextSampleBuffer. I'm not sure what's going on. I don't seem to be in any kind of recursion loop, it just never returns from the function call. Checking the logs gives me this error:
Exception Type: 00000020
Exception Codes: 0x8badf00d
Highlighted Thread: 0
Application Specific Information:
App[10570] has active assertions beyond permitted time:
{(
<SBProcessAssertion: 0xddd9300> identifier: Suspending process: App[10570] permittedBackgroundDuration: 10.000000 reason: suspend owner pid:52 preventSuspend preventThrottleDownCPU preventThrottleDownUI
)}
I use a time profiler on the app and yep, it just sits there with a minimal amount of processing. Can't quite figure out what's going on. It's important to note that this doesn't occur if I don't set the timeRange property of AVAssetReader. I've checked and the values for timeRange are valid, but setting it is causing the problem for some reason. Here's my processing code:
- (void) processSampleData{
if (!_asset || CMTimeGetSeconds(_asset.duration) <= 0) return;
NSError *error = nil;
AVAssetTrack *songTrack = _asset.tracks.firstObject;
if (!songTrack) return;
NSDictionary *outputSettingsDict = [[NSDictionary alloc] initWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM],AVFormatIDKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsBigEndianKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsNonInterleaved,
nil];
UInt32 sampleRate = 44100.0;
_channelCount = 1;
NSArray *formatDesc = songTrack.formatDescriptions;
for(unsigned int i = 0; i < [formatDesc count]; ++i) {
CMAudioFormatDescriptionRef item = (__bridge_retained CMAudioFormatDescriptionRef)[formatDesc objectAtIndex:i];
const AudioStreamBasicDescription* fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription (item);
if(fmtDesc ) {
sampleRate = fmtDesc->mSampleRate;
_channelCount = fmtDesc->mChannelsPerFrame;
}
CFRelease(item);
}
UInt32 bytesPerSample = 2 * _channelCount; //Bytes are hard coded by AVLinearPCMBitDepthKey
_normalizedMax = 0;
_sampledData = [[NSMutableData alloc] init];
SInt16 *channels[_channelCount];
char *sampleRef;
SInt16 *samples;
NSInteger sampleTally = 0;
SInt16 cTotal;
_sampleCount = DefaultSampleSize * [UIScreen mainScreen].scale;
NSTimeInterval intervalBetweenSamples = _asset.duration.value / _sampleCount;
NSTimeInterval sampleSize = fmax(100, intervalBetweenSamples / _sampleCount);
double assetTimeScale = _asset.duration.timescale;
CMTimeRange timeRange = CMTimeRangeMake(CMTimeMake(0, assetTimeScale), CMTimeMake(sampleSize, assetTimeScale));
SInt16 totals[_channelCount];
@autoreleasepool {
for (int i = 0; i < _sampleCount; i++) {
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:_asset error:&error];
AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:songTrack outputSettings:outputSettingsDict];
[reader addOutput:trackOutput];
reader.timeRange = timeRange;
[reader startReading];
while (reader.status == AVAssetReaderStatusReading) {
CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];
if (sampleBufferRef){
CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);
size_t length = CMBlockBufferGetDataLength(blockBufferRef);
int sampleCount = length / bytesPerSample;
for (int i = 0; i < sampleCount ; i += _channelCount) {
CMBlockBufferAccessDataBytes(blockBufferRef, i * bytesPerSample, _channelCount, channels, &sampleRef);
samples = (SInt16 *)sampleRef;
for (int channel = 0; channel < _channelCount; channel++)
totals[channel] += samples[channel];
sampleTally++;
}
CMSampleBufferInvalidate(sampleBufferRef);
CFRelease(sampleBufferRef);
}
}
for (int i = 0; i < _channelCount; i++){
cTotal = abs(totals[i] / sampleTally);
if (cTotal > _normalizedMax) _normalizedMax = cTotal;
[_sampledData appendBytes:&cTotal length:sizeof(cTotal)];
totals[i] = 0;
}
sampleTally = 0;
timeRange.start = CMTimeMake((intervalBetweenSamples * (i + 1)) - sampleSize, assetTimeScale); //Take the sample just before the interval
}
}
_assetNeedsProcessing = NO;
}
I finally figured out why. Apparently there is some sort of 'minimum' duration you can specify for the timeRange of an AVAssetReader. I'm not sure what exactly that minimum is, somewhere above 1,000 but less than 5,000. It's possible that the minimum changes with the duration of the asset...honestly I'm not sure. Instead, I kept the duration (which is infinity) the same and simply changed the start time. Instead of processing the whole sample, I copy only one buffer block, process that and then seek to the next time. I'm still having trouble with the code, but I'll post that as another question if I can't figure it out.
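A sketch of what that ends up looking like (using the variable names from the code above; not the final code, since I'm still cleaning it up): keep the range duration unbounded, move only the start time, and copy a single sample buffer per seek.
// Sketch only: seek by moving the start time, read one block, then tear down
CMTime start = CMTimeMake((intervalBetweenSamples * (i + 1)) - sampleSize, assetTimeScale);
reader.timeRange = CMTimeRangeMake(start, kCMTimePositiveInfinity);
[reader startReading];

CMSampleBufferRef buf = [trackOutput copyNextSampleBuffer]; // just one buffer
if (buf) {
    // ...average the samples in this single block as before...
    CMSampleBufferInvalidate(buf);
    CFRelease(buf);
}
[reader cancelReading]; // re-create the reader for the next sample point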