Previously I read audio samples from a complete audio file using CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer. Right now I would like to do the same using ranges (ie i specify the range in time.. read a small chunk of audio as per the time, and then go back and read again). The reason why I want to use time range is b/c I want to control the size of each read (to fit in a packet with a max size).
for some reason, there is always a bump between each read. In my code you'll notice that I start the AVAssetReader and end it every time I set a time range, and that's b/c I cannot dynamically adjust the time range after the reader has started (see here for more details).
Could it be that starting and ending a reader is just too expensive to produce a continuous real time experience? Or are there other ways of doing this that I'm not aware of?
Also note that this jitter or lag happens at whatever point I set the time interval to be.. which makes me believe that starting and ending a reader the way I am is too expensive for real time audio playback.
- (void) setupReader
{
NSURL *assetURL = [NSURL URLWithString:#"ipod-library://item/item.m4a?id=1053020204400037178"];
songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
track = [songAsset.tracks objectAtIndex:0];
nativeTrackASBD = [self getTrackNativeSettings:track];
// set CM time parameters
assetCMTime = songAsset.duration;
CMTimeReadDurationInSeconds = CMTimeMakeWithSeconds(1, assetCMTime.timescale);
currentCMTime = CMTimeMake(0,assetCMTime.timescale);
}
-(void)readVBRPackets
{
// make sure assetCMTime is greater than currentCMTime
while (CMTimeCompare(assetCMTime,currentCMTime) == 1 )
{
NSError * error = nil;
reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
outputSettings:nil];
[reader addOutput:readerOutput];
reader.timeRange = CMTimeRangeMake(currentCMTime, CMTimeReadDurationInSeconds);
[reader startReading];
while ((sample = [readerOutput copyNextSampleBuffer])) {
CMItemCount numSamples = CMSampleBufferGetNumSamples(sample);
if (numSamples == 0) {
continue;
}
NSLog(#"reading sample");
CMBlockBufferRef CMBuffer = CMSampleBufferGetDataBuffer( sample );
AudioBufferList audioBufferList;
OSStatus err = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
sample,
NULL,
&audioBufferList,
sizeof(audioBufferList),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&CMBuffer
);
const AudioStreamPacketDescription * inPacketDescriptions;
size_t packetDescriptionsSizeOut;
size_t inNumberPackets;
CheckError(CMSampleBufferGetAudioStreamPacketDescriptionsPtr(sample,
&inPacketDescriptions,
&packetDescriptionsSizeOut),
"could not read sample packet descriptions");
inNumberPackets = packetDescriptionsSizeOut/sizeof(AudioStreamPacketDescription);
AudioBuffer audioBuffer = audioBufferList.mBuffers[0];
for (int i = 0; i < inNumberPackets; ++i)
{
SInt64 dataOffset = inPacketDescriptions[i].mStartOffset;
UInt32 packetSize = inPacketDescriptions[i].mDataByteSize;
size_t packetSpaceRemaining;
packetSpaceRemaining = bufferByteSize - bytesFilled;
// if the space remaining in the buffer is not
// enough for the data contained in this packet
// then just write it
if (packetSpaceRemaining < packetSize)
{
[self enqueueBuffer];
}
// copy data to the audio queue buffer
AudioQueueBufferRef fillBuf = audioQueueBuffers[fillBufferIndex];
memcpy((char*)fillBuf->mAudioData + bytesFilled,
(const char*)(audioBuffer.mData + dataOffset), packetSize);
// fill out packet description
packetDescs[packetsFilled] = inPacketDescriptions[i];
packetDescs[packetsFilled].mStartOffset = bytesFilled;
bytesFilled += packetSize;
packetsFilled += 1;
// if this is the last packet, then ship it
size_t packetsDescsRemaining = kAQMaxPacketDescs - packetsFilled;
if (packetsDescsRemaining == 0) {
[self enqueueBuffer];
}
}
CFRelease(CMBuffer);
CMSampleBufferInvalidate(sample);
CFRelease(sample);
}
[reader cancelReading];
reader = NULL;
readerOutput = NULL;
currentCMTime = CMTimeAdd(currentCMTime, CMTimeReadDurationInSeconds);
}
}
I know what happens :-D It took me near a whole day to figure it out.
In fact AVAssetReader fades the first 1024 samples (maybe a little more) in. That's why you hear the jitter effect.
I fixed it by reading 1024 samples before the position I really want to read, then skip that 1024 samples.
I hope it'll work for you also.
Related
My current method is:
CGDataProviderRef provider = CGImageGetDataProvider(imageRef);
imageData.rawData = CGDataProviderCopyData(provider);
imageData.imageData = (UInt8 *) CFDataGetBytePtr(imageData.rawData);
I only get about 30 frames per second. I know part of the performance hit is copying the data, it'd be nice if I could just have access to the stream of bytes and not have it automatically create a copy for me.
I'm trying to get it to process CGImageRefs as fast as possible, is there a faster way?
Here's my working solutions snippet:
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
// Insert code here to initialize your application
//timer = [NSTimer scheduledTimerWithTimeInterval:1.0/60.0 //2000.0
// target:self
// selector:#selector(timerLogic)
// userInfo:nil
// repeats:YES];
leagueGameState = [LeagueGameState new];
[self updateWindowList];
lastTime = CACurrentMediaTime();
// Create a capture session
mSession = [[AVCaptureSession alloc] init];
// Set the session preset as you wish
mSession.sessionPreset = AVCaptureSessionPresetMedium;
// If you're on a multi-display system and you want to capture a secondary display,
// you can call CGGetActiveDisplayList() to get the list of all active displays.
// For this example, we just specify the main display.
// To capture both a main and secondary display at the same time, use two active
// capture sessions, one for each display. On Mac OS X, AVCaptureMovieFileOutput
// only supports writing to a single video track.
CGDirectDisplayID displayId = kCGDirectMainDisplay;
// Create a ScreenInput with the display and add it to the session
AVCaptureScreenInput *input = [[AVCaptureScreenInput alloc] initWithDisplayID:displayId];
input.minFrameDuration = CMTimeMake(1, 60);
//if (!input) {
// [mSession release];
// mSession = nil;
// return;
//}
if ([mSession canAddInput:input]) {
NSLog(#"Added screen capture input");
[mSession addInput:input];
} else {
NSLog(#"Couldn't add screen capture input");
}
//**********************Add output here
//dispatch_queue_t _videoDataOutputQueue;
//_videoDataOutputQueue = dispatch_queue_create( "com.apple.sample.capturepipeline.video", DISPATCH_QUEUE_SERIAL );
//dispatch_set_target_queue( _videoDataOutputQueue, dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_HIGH, 0 ) );
AVCaptureVideoDataOutput *videoOut = [[AVCaptureVideoDataOutput alloc] init];
videoOut.videoSettings = #{ (id)kCVPixelBufferPixelFormatTypeKey : #(kCVPixelFormatType_32BGRA) };
[videoOut setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
// RosyWriter records videos and we prefer not to have any dropped frames in the video recording.
// By setting alwaysDiscardsLateVideoFrames to NO we ensure that minor fluctuations in system load or in our processing time for a given frame won't cause framedrops.
// We do however need to ensure that on average we can process frames in realtime.
// If we were doing preview only we would probably want to set alwaysDiscardsLateVideoFrames to YES.
videoOut.alwaysDiscardsLateVideoFrames = YES;
if ( [mSession canAddOutput:videoOut] ) {
NSLog(#"Added output video");
[mSession addOutput:videoOut];
} else {NSLog(#"Couldn't add output video");}
// Start running the session
[mSession startRunning];
NSLog(#"Set up session");
}
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
//NSLog(#"Captures output from sample buffer");
//CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription( sampleBuffer );
/*
if ( self.outputVideoFormatDescription == nil ) {
// Don't render the first sample buffer.
// This gives us one frame interval (33ms at 30fps) for setupVideoPipelineWithInputFormatDescription: to complete.
// Ideally this would be done asynchronously to ensure frames don't back up on slower devices.
[self setupVideoPipelineWithInputFormatDescription:formatDescription];
}
else {*/
[self renderVideoSampleBuffer:sampleBuffer];
//}
}
- (void)renderVideoSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
//CVPixelBufferRef renderedPixelBuffer = NULL;
//CMTime timestamp = CMSampleBufferGetPresentationTimeStamp( sampleBuffer );
//[self calculateFramerateAtTimestamp:timestamp];
// We must not use the GPU while running in the background.
// setRenderingEnabled: takes the same lock so the caller can guarantee no GPU usage once the setter returns.
//#synchronized( _renderer )
//{
// if ( _renderingEnabled ) {
CVPixelBufferRef sourcePixelBuffer = CMSampleBufferGetImageBuffer( sampleBuffer );
const int kBytesPerPixel = 4;
CVPixelBufferLockBaseAddress( sourcePixelBuffer, 0 );
int bufferWidth = (int)CVPixelBufferGetWidth( sourcePixelBuffer );
int bufferHeight = (int)CVPixelBufferGetHeight( sourcePixelBuffer );
size_t bytesPerRow = CVPixelBufferGetBytesPerRow( sourcePixelBuffer );
uint8_t *baseAddress = CVPixelBufferGetBaseAddress( sourcePixelBuffer );
int count = 0;
for ( int row = 0; row < bufferHeight; row++ )
{
uint8_t *pixel = baseAddress + row * bytesPerRow;
for ( int column = 0; column < bufferWidth; column++ )
{
count ++;
pixel[1] = 0; // De-green (second pixel in BGRA is green)
pixel += kBytesPerPixel;
}
}
CVPixelBufferUnlockBaseAddress( sourcePixelBuffer, 0 );
//NSLog(#"Test Looped %d times", count);
CIImage *ciImage = [CIImage imageWithCVImageBuffer:sourcePixelBuffer];
/*
CIContext *temporaryContext = [CIContext contextWithCGContext:
[[NSGraphicsContext currentContext] graphicsPort]
options: nil];
CGImageRef videoImage = [temporaryContext
createCGImage:ciImage
fromRect:CGRectMake(0, 0,
CVPixelBufferGetWidth(sourcePixelBuffer),
CVPixelBufferGetHeight(sourcePixelBuffer))];
*/
//UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
// Create a bitmap rep from the image...
NSBitmapImageRep *bitmapRep = [[NSBitmapImageRep alloc] initWithCIImage:ciImage];
// Create an NSImage and add the bitmap rep to it...
NSImage *image = [[NSImage alloc] init];
[image addRepresentation:bitmapRep];
// Set the output view to the new NSImage.
[imageView setImage:image];
//CGImageRelease(videoImage);
//renderedPixelBuffer = [_renderer copyRenderedPixelBuffer:sourcePixelBuffer];
// }
// else {
// return;
// }
//}
//Profile code? See how fast it's running?
if (CACurrentMediaTime() - lastTime > 3) //10 seconds
{
float time = CACurrentMediaTime() - lastTime;
[fpsText setStringValue:[NSString stringWithFormat:#"Elapsed Time: %f ms, %f fps", time * 1000 / loopsTaken, (1000.0)/(time * 1000.0 / loopsTaken)]];
lastTime = CACurrentMediaTime();
loopsTaken = 0;
[self updateWindowList];
if (leagueGameState.leaguePID == -1) {
[statusText setStringValue:#"No League Instance Found"];
}
}
else
{
loopsTaken++;
}
}
I get a very nice 60 frames per second even after looping through the data.
It captures the screen, I get the data, I modify the data and I re-show the data.
Which "stream of bytes" do you mean? CGImage represents the final bitmap data, but under the hood it may still be compressed. The bitmap may currently be stored on the GPU, so getting to it might require a GPU->CPU fetch (which is expensive, and should be avoided when you don't need it).
If you're trying to do this at greater than 30fps, you may want to rethink how you're attacking the problem, and use tools designed for that, like Core Image, Core Video, or Metal. Core Graphics is optimized for display, not processing (and definitely not real-time processing). A key difference in tools like Core Image is that you can perform more of your work on the GPU without shuffling data back to the CPU. This is absolutely critical for maintaining fast pipelines. Whenever possible, you want to avoid getting the actual bytes.
If you have a CGImage already, you can convert it to a CIImage with imageWithCGImage: and then use CIImage to process it further. If you really need access to the bytes, your options are the one you're using, or to render it into a bitmap context (which also will require copying) with CGContextDrawImage. There's just no promise that a CGImage has a bunch of bitmap bytes hanging around at any given time that you can look at, and it doesn't provide "lock your buffer" methods like you'll find in real-time frameworks like Core Video.
Some very good introductions to high-speed image processing from WWDC videos:
WWDC 2013 Session 509 Core Image Effects and Techniques
WWDC 2014 Session 514 Advances in Core Image
WWDC 2014 Sessions 603-605 Working with Metal
The creator of novacaine offered example code where audio data is read from a a file and fed to a ring buffer. When the file reader is created though, the output is forced to be PCM:
- (id)initWithAudioFileURL:(NSURL *)urlToAudioFile samplingRate:(float)thisSamplingRate numChannels:(UInt32)thisNumChannels
{
...
// We're going to impose a format upon the input file
// Single-channel float does the trick.
_outputFormat.mSampleRate = self.samplingRate;
_outputFormat.mFormatID = kAudioFormatLinearPCM;
_outputFormat.mFormatFlags = kAudioFormatFlagIsFloat;
_outputFormat.mBytesPerPacket = 4*self.numChannels;
_outputFormat.mFramesPerPacket = 1;
_outputFormat.mBytesPerFrame = 4*self.numChannels;
_outputFormat.mChannelsPerFrame = self.numChannels;
_outputFormat.mBitsPerChannel = 32;
}
I'm trying to contribute to the novacaine project by allowing it to
Read from the iPod library (which can only be accessed via AVAssetReader, rather than the audio file services library)
Read and write VBR packets rather than PCM.
So this is what my equivalent function of the above looks like (see the NOTE: parts)
- (id)initWithAudioAssetURL:(NSURL *)urlToAsset samplingRate:(float)thisSamplingRate numChannels:(UInt32)thisNumChannels
{
self = [super init];
if (self)
{
// Zero-out our timer, so we know we're not using our callback yet
self.callbackTimer = nil;
// Open a reference to the audio Asset Track and setup the reader
self.assetURL = urlToAsset;
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:self.assetURL options:nil];
NSError * error = nil;
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
AVAssetTrack* track = [songAsset.tracks objectAtIndex:0];
//NOTE: we use the track's native settings here, as opposed to forcing it to be PCM
//like the example above
_readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
outputSettings:NULL];
_nativeTrackASBD = [self getTrackNativeSettings:track];
[reader addOutput:_readerOutput];
[reader startReading];
// Set a few defaults and presets
self.samplingRate = thisSamplingRate;
self.numChannels = thisNumChannels;
self.latency = .011609977; // 512 samples / ( 44100 samples / sec ) default
// Arbitrary buffer sizes that don't matter so much as long as they're "big enough"
self.outputBufferSize = 100000; //buffer sample sizes vary around 60-70k, we keep it # 100k to be safe
self.numSamplesReadPerPacket = 8192;
self.desiredPrebufferedSamples = self.numSamplesReadPerPacket*2;
//NOTE: these buffers are float, where as the above audio code is in SInt16
self.outputBuffer = (float *)calloc(2*self.samplingRate, sizeof(float));
self.holdingBuffer = (float *)calloc(2*self.samplingRate, sizeof(float));
// Allocate a ring buffer (this is what's going to buffer our audio)
ringBuffer = new RingBuffer(self.outputBufferSize, self.numChannels);
// Fill up the buffers, so we're ready to play immediately
[self bufferNewAudioFromAsset];
}
return self;
}
Looking at the code, it seems that everything is made in float (the audio buffers, the output format etc).. Is there a reason for this? (Keep in mind that the iOS audio canonical format is in SInt16, not float).. for example see in the Novocaine::renderCallback function:
else if ( sm.numBytesPerSample == 2 ) // then we need to convert SInt16 -> Float (and also scale)
{
float scale = (float)INT16_MAX;
vDSP_vsmul(sm.outData, 1, &scale, sm.outData, 1, inNumberFrames*sm.numOutputChannels);
for (int iBuffer=0; iBuffer < ioData->mNumberBuffers; ++iBuffer) {
int thisNumChannels = ioData->mBuffers[iBuffer].mNumberChannels;
for (int iChannel = 0; iChannel < thisNumChannels; ++iChannel) {
vDSP_vfix16(sm.outData+iChannel, sm.numOutputChannels, (SInt16 *)ioData->mBuffers[iBuffer].mData+iChannel, thisNumChannels, inNumberFrames);
}
}
}
What are the list of things that I gotta change to make this library compatible with reading and writing VBR Data?
A
I've been pulling my hair out over this.
I've found a few things here, but nothing actually seems to work. And the documentation is really limited.
What I'm trying to figure out here is how to get the start time code of a Quicktime movie in Objective-C from the timecode track, and getting a human-readable output from that.
I've found this:
SMPTE TimeCode from Quick Time
It works perfectly in 32-bit mode. But it doesn't work in 64-bit mode because of the Quicktime API. The software I need to incorporate it into already has been and must continue to run 64-bit.
I'm losing my mind here. Anyone out there know about these APIs?
Ultimately, the goal here is to figure out the start timecode of the Quicktime because its needed to set the OFFSET in FCP-X XML files. Without it, the video files are brought in without audio (or, really, its just slipped a lot).
Use AVFoundation framework instead of QuickTime. The player initialisation is well explained in the documentation: https://developer.apple.com/library/mac/#documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/02_Playback.html#//apple_ref/doc/uid/TP40010188-CH3-SW2
Once your AVAsset is loaded in memory, you can extract the first sample frame number (timeStampFrame) by reading the content of the timecode track if present:
long timeStampFrame = 0;
for (AVAssetTrack * track in [_asset tracks]) {
if ([[track mediaType] isEqualToString:AVMediaTypeTimecode]) {
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:_asset error:nil];
AVAssetReaderTrackOutput *assetReaderOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track outputSettings:nil];
if ([assetReader canAddOutput:assetReaderOutput]) {
[assetReader addOutput:assetReaderOutput];
if ([assetReader startReading] == YES) {
int count = 0;
while ( [assetReader status]==AVAssetReaderStatusReading ) {
CMSampleBufferRef sampleBuffer = [assetReaderOutput copyNextSampleBuffer];
if (sampleBuffer == NULL) {
if ([assetReader status] == AVAssetReaderStatusFailed)
break;
else
continue;
}
count++;
CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t length = CMBlockBufferGetDataLength(blockBuffer);
if (length>0) {
unsigned char *buffer = malloc(length);
memset(buffer, 0, length);
CMBlockBufferCopyDataBytes(blockBuffer, 0, length, buffer);
for (int i=0; i<length; i++) {
timeStampFrame = (timeStampFrame << 8) + buffer[i];
}
free(buffer);
}
CFRelease(sampleBuffer);
}
if (count == 0) {
NSLog(#"No sample in the timecode track: %#", [assetReader error]);
}
NSLog(#"Processed %d sample", count);
}
}
if ([assetReader status] != AVAssetReaderStatusCompleted)
[assetReader cancelReading];
}
}
This is a little more tricky than the QuickTime API and there must be some improvement to the code above but it works for me.
Currently, I'm doing a little test project to see if I can get samples from an AVAssetReader to play back using an AudioQueue on iOS.
I've read this:
( Play raw uncompressed sound with AudioQueue, no sound )
and this: ( How to correctly read decoded PCM samples on iOS using AVAssetReader -- currently incorrect decoding ),
Which both actually did help. Before reading, I was getting no sound at all. Now, I'm getting sound, but the audio is playing SUPER fast. This is my first foray into audio programming, so any help is greatly appreciated.
I initialize the reader thusly:
NSDictionary * outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];
output = [[AVAssetReaderAudioMixOutput alloc] initWithAudioTracks:uasset.tracks audioSettings:outputSettings];
[reader addOutput:output];
...
And I grab the data thusly:
CMSampleBufferRef ref= [output copyNextSampleBuffer];
// NSLog(#"%#",ref);
if(ref==NULL)
return;
//copy data to file
//read next one
AudioBufferList audioBufferList;
NSMutableData *data = [NSMutableData data];
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(ref, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
// NSLog(#"%#",blockBuffer);
if(blockBuffer==NULL)
{
[data release];
return;
}
if(&audioBufferList==NULL)
{
[data release];
return;
}
//stash data in same object
for( int y=0; y<audioBufferList.mNumberBuffers; y++ )
{
// NSData* throwData;
AudioBuffer audioBuffer = audioBufferList.mBuffers[y];
[self.delegate streamer:self didGetAudioBuffer:audioBuffer];
/*
Float32 *frame = (Float32*)audioBuffer.mData;
throwData = [NSData dataWithBytes:audioBuffer.mData length:audioBuffer.mDataByteSize];
[self.delegate streamer:self didGetAudioBuffer:throwData];
[data appendBytes:audioBuffer.mData length:audioBuffer.mDataByteSize];
*/
}
which eventually brings us to the audio queue, set up in this way:
//Apple's own code for canonical PCM
audioDesc.mSampleRate = 44100.0;
audioDesc.mFormatID = kAudioFormatLinearPCM;
audioDesc.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
audioDesc.mBytesPerPacket = 2 * sizeof (AudioUnitSampleType); // 8
audioDesc.mFramesPerPacket = 1;
audioDesc.mBytesPerFrame = 1 * sizeof (AudioUnitSampleType); // 8
audioDesc.mChannelsPerFrame = 2;
audioDesc.mBitsPerChannel = 8 * sizeof (AudioUnitSampleType); // 32
err = AudioQueueNewOutput(&audioDesc, handler_OSStreamingAudio_queueOutput, self, NULL, NULL, 0, &audioQueue);
if(err){
#pragma warning handle error
//never errs, am using breakpoint to check
return;
}
and we enqueue thusly
while (inNumberBytes)
{
size_t bufSpaceRemaining = kAQDefaultBufSize - bytesFilled;
if (bufSpaceRemaining < inNumberBytes)
{
AudioQueueBufferRef fillBuf = audioQueueBuffer[fillBufferIndex];
fillBuf->mAudioDataByteSize = bytesFilled;
err = AudioQueueEnqueueBuffer(audioQueue, fillBuf, 0, NULL);
}
bufSpaceRemaining = kAQDefaultBufSize - bytesFilled;
size_t copySize;
if (bufSpaceRemaining < inNumberBytes)
{
copySize = bufSpaceRemaining;
}
else
{
copySize = inNumberBytes;
}
if (bytesFilled > packetBufferSize)
{
return;
}
AudioQueueBufferRef fillBuf = audioQueueBuffer[fillBufferIndex];
memcpy((char*)fillBuf->mAudioData + bytesFilled, (const char*)(inInputData + offset), copySize);
bytesFilled += copySize;
packetsFilled = 0;
inNumberBytes -= copySize;
offset += copySize;
}
}
I tried to be as code inclusive as possible so as to make it easy for everyone to point out where I'm being a moron. That being said, I have a feeling my problem occurs either in the output settings declaration of the track reader or in the actual declaration of the AudioQueue (where I describe to the queue what kind of audio I'm going to be sending it). The fact of the matter is, I don't really know mathematically how to actually generate those numbers (bytes per packet, frames per packet, what have you). An explanation of that would be greatly appreciated, and thanks for the help in advance.
Not sure how much of an answer this is, but there will be too much text and links for a comment and hopefully it will help (maybe guide you to your answer).
First off I know with my current project adjusting the sample rate will effect the speed of the sound, so you can try to play with those settings. But 44k is what I see in most default implementation including the apple example SpeakHere. However I would spend some time comparing your code to that example because there are quite a few differences. like checking before enqueueing.
First check out this posting https://stackoverflow.com/a/4299665/530933
It talks about how you need to know the audio format, specifically how many bytes in a frame, and casting appropriately
also good luck. I have had quite a few questions posted here, apple forums, and the ios forum (not the official one). With very little responses/help. To get where I am today (audio recording & streaming in ulaw) I ended up having to open an Apple Dev Support Ticket. Which prior to tackling the audio I never knew existed (dev support). One good thing is that if you have a valid dev account you get 2 incidents for free! CoreAudio is not fun. Documentation is sparse, and besides SpeakHere there are not many examples. One thing I did find is that the framework headers do have some good info and this book. Unfortunately I have only started the book otherwise I may be able to help you further.
You can also check some of my own postings which I have tried to answer to the best of my abilities.
This is my main audio question which I have spent alot of time on to compile all pertinent links and code.
using AQRecorder (audioqueue recorder example) in an objective c class
trying to use AVAssetWriter for ulaw audio (2)
For some reason, even though every example I've seen of the audio queue using LPCM had
ASBD.mBitsPerChannel = 8* sizeof (AudioUnitSampleType);
For me it turns out I needed
ASBD.mBitsPerChannel = 2*bytesPerSample;
for a description of:
ASBD.mFormatID = kAudioFormatLinearPCM;
ASBD.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
ASBD.mBytesPerPacket = bytesPerSample;
ASBD.mBytesPerFrame = bytesPerSample;
ASBD.mFramesPerPacket = 1;
ASBD.mBitsPerChannel = 2*bytesPerSample;
ASBD.mChannelsPerFrame = 2;
ASBD.mSampleRate = 48000;
I have no idea why this works, which bothers me a great deal... but hopefully I can figure it all out eventually.
If anyone can explain to me why this works, I'd be very thankful.
So, I'm trying to do a simple calculation over previously recorded audio (from an AVAsset) in order to create a wave form visual. I currently do this by averaging a set of samples, the size of which is determined by dividing the audio file size by the resolution I want for the wave form.
This all works fine, except for one problem....it's too slow. Running on a 3GS, processing an audio file takes about 3% of the time it takes to play it, which is way to slow (for example, a 1 hour audio file takes about 2.5 minutes to process). I've tried to optimize the method as much as possible but it's not working. I'll post the code I use to process the file. Maybe someone will be able to help with that but what I'm really looking for is a way to process the file without having to go over every single byte. So, say given a resolution of 2,000 I'd want to access the file and take a sample at each of the 2,000 points. I think this would be a lot quicker, especially if the file is larger. But the only way I know to get the raw data is to access the audio file in a linear manner. Any ideas? Here's the code I use to process the file (note, all class vars begin with '_'):
So I've completely changed this question. I belatedly realized that AVAssetReader has a timeRange property that's used for "seeking", which is exactly what I was looking for (see original question above). Furthermore, the question has been asked and answered (I just didn't find it before) and I don't want to duplicate questions. However, I'm still having a problem. My app freezes for a while and then eventually crashes when ever I try to copyNextSampleBuffer. I'm not sure what's going on. I don't seem to be in any kind of recursion loop, it just never returns from the function call. Checking the logs show give me this error:
Exception Type: 00000020
Exception Codes: 0x8badf00d
Highlighted Thread: 0
Application Specific Information:
App[10570] has active assertions beyond permitted time:
{(
<SBProcessAssertion: 0xddd9300> identifier: Suspending process: App[10570] permittedBackgroundDuration: 10.000000 reason: suspend owner pid:52 preventSuspend preventThrottleDownCPU preventThrottleDownUI
)}
I use a time profiler on the app and yep, it just sits there with a minimal amount of processing. Can't quite figure out what's going on. It's important to note that this doesn't occur if I don't set the timeRange property of AVAssetReader. I've checked and the values for timeRange are valid, but setting it is causing the problem for some reason. Here's my processing code:
- (void) processSampleData{
if (!_asset || CMTimeGetSeconds(_asset.duration) <= 0) return;
NSError *error = nil;
AVAssetTrack *songTrack = _asset.tracks.firstObject;
if (!songTrack) return;
NSDictionary *outputSettingsDict = [[NSDictionary alloc] initWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM],AVFormatIDKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsBigEndianKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsNonInterleaved,
nil];
UInt32 sampleRate = 44100.0;
_channelCount = 1;
NSArray *formatDesc = songTrack.formatDescriptions;
for(unsigned int i = 0; i < [formatDesc count]; ++i) {
CMAudioFormatDescriptionRef item = (__bridge_retained CMAudioFormatDescriptionRef)[formatDesc objectAtIndex:i];
const AudioStreamBasicDescription* fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription (item);
if(fmtDesc ) {
sampleRate = fmtDesc->mSampleRate;
_channelCount = fmtDesc->mChannelsPerFrame;
}
CFRelease(item);
}
UInt32 bytesPerSample = 2 * _channelCount; //Bytes are hard coded by AVLinearPCMBitDepthKey
_normalizedMax = 0;
_sampledData = [[NSMutableData alloc] init];
SInt16 *channels[_channelCount];
char *sampleRef;
SInt16 *samples;
NSInteger sampleTally = 0;
SInt16 cTotal;
_sampleCount = DefaultSampleSize * [UIScreen mainScreen].scale;
NSTimeInterval intervalBetweenSamples = _asset.duration.value / _sampleCount;
NSTimeInterval sampleSize = fmax(100, intervalBetweenSamples / _sampleCount);
double assetTimeScale = _asset.duration.timescale;
CMTimeRange timeRange = CMTimeRangeMake(CMTimeMake(0, assetTimeScale), CMTimeMake(sampleSize, assetTimeScale));
SInt16 totals[_channelCount];
#autoreleasepool {
for (int i = 0; i < _sampleCount; i++) {
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:_asset error:&error];
AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:songTrack outputSettings:outputSettingsDict];
[reader addOutput:trackOutput];
reader.timeRange = timeRange;
[reader startReading];
while (reader.status == AVAssetReaderStatusReading) {
CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];
if (sampleBufferRef){
CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);
size_t length = CMBlockBufferGetDataLength(blockBufferRef);
int sampleCount = length / bytesPerSample;
for (int i = 0; i < sampleCount ; i += _channelCount) {
CMBlockBufferAccessDataBytes(blockBufferRef, i * bytesPerSample, _channelCount, channels, &sampleRef);
samples = (SInt16 *)sampleRef;
for (int channel = 0; channel < _channelCount; channel++)
totals[channel] += samples[channel];
sampleTally++;
}
CMSampleBufferInvalidate(sampleBufferRef);
CFRelease(sampleBufferRef);
}
}
for (int i = 0; i < _channelCount; i++){
cTotal = abs(totals[i] / sampleTally);
if (cTotal > _normalizedMax) _normalizedMax = cTotal;
[_sampledData appendBytes:&cTotal length:sizeof(cTotal)];
totals[i] = 0;
}
sampleTally = 0;
timeRange.start = CMTimeMake((intervalBetweenSamples * (i + 1)) - sampleSize, assetTimeScale); //Take the sample just before the interval
}
}
_assetNeedsProcessing = NO;
}
I finally figured out why. Apparently there is some sort of 'minimum' duration you can specify for the timeRange of an AVAssetReader. I'm not sure what exactly that minimum is, somewhere above 1,000 but less than 5,000. It's possible that the minimum changes with the duration of the asset...honestly I'm not sure. Instead, I kept the duration (which is infinity) the same and simply changed the start time. Instead of processing the whole sample, I copy only one buffer block, process that and then seek to the next time. I'm still having trouble with the code, but I'll post that as another question if I can't figure it out.