Is there a way in AppKit to measure the width of a large number of NSString objects (say a million) really fast? I have tried three different ways to do this:
[NSString sizeWithAttributes:]
[NSAttributedString size]
NSLayoutManager (get text width instead of height)
Here are some performance metrics (times in seconds):

Count \ Mechanism   sizeWithAttributes   NSAttributedString   NSLayoutManager
1000                0.057                0.031                0.007
10000               0.329                0.325                0.064
100000              3.06                 3.14                 0.689
1000000             29.5                 31.3                 7.06
NSLayoutManager is clearly the way to go, but there are two problems:
High memory footprint (more than 1 GB according to the profiler), because of the creation of heavyweight NSTextStorage objects.
High creation time. All of the time is taken up creating the above strings, which is a dealbreaker in itself. (Subsequently measuring NSTextStorage objects that already have glyphs created and laid out takes only about 0.0002 seconds.)
7 seconds is still too slow for what I am trying to do. Is there a faster way? One that can measure a million strings in about a second?
In case you want to play around, here is the GitHub project.
Here are some ideas I haven't tried.
Use Core Text directly. The other APIs are built on top of it.
Parallelize. All modern Macs (and even all modern iOS devices) have multiple cores. Divide up the string array into several subarrays. For each subarray, submit a block to a global GCD queue. In the block, create the necessary Core Text or NSLayoutManager objects and measure the strings in the subarray. Both APIs can be used safely this way. (Core Text) (NSLayoutManager)
Regarding “High memory footprint”: Use Local Autorelease Pool Blocks to Reduce Peak Memory Footprint.
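For example, a minimal sketch of that pattern, draining a pool per batch (the font, the `strings` array, and the batch size of 10,000 are assumptions, not code from the project):

NSDictionary *attributes = @{ NSFontAttributeName: [NSFont systemFontOfSize:13] }; // assumed font
const NSUInteger batchSize = 10000; // arbitrary; tune for your workload
CGFloat maxWidth = 0;
for (NSUInteger start = 0; start < strings.count; start += batchSize) {
    @autoreleasepool {
        // Temporaries created while measuring this batch are freed when the
        // pool drains, instead of accumulating across all one million strings.
        NSRange range = NSMakeRange(start, MIN(batchSize, strings.count - start));
        for (NSString *string in [strings subarrayWithRange:range]) {
            maxWidth = MAX(maxWidth, [string sizeWithAttributes:attributes].width);
        }
    }
}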
Regarding “All of the time taken is during creation of the above strings, which is a dealbreaker in itself”: Are you saying all the time is spent in these lines:
double random = (double)arc4random_uniform(1000) / 1000;
NSString *randomNumber = [NSString stringWithFormat:@"%f", random];
Formatting a floating-point number is expensive. Is this your real use case? If you just want to format a random rational of the form n/1000 for 0 ≤ n < 1000, there are faster ways. Also, in many fonts, all digits have the same width, so that it's easy to typeset columns of numbers. If you pick such a font, you can avoid measuring the strings in the first place.
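To make that concrete, here's a rough sketch of the measure-once idea (the font and the `someNumericString` variable are placeholders; with proportional figures this only gives an upper bound):

NSDictionary *attributes = @{ NSFontAttributeName: [NSFont systemFontOfSize:13] }; // assumed font
CGFloat maxCharWidth = 0;
for (NSString *ch in @[@"0", @"1", @"2", @"3", @"4",
                       @"5", @"6", @"7", @"8", @"9", @"."]) {
    maxCharWidth = MAX(maxCharWidth, [ch sizeWithAttributes:attributes].width);
}
// With tabular (fixed-width) figures, a numeric string's width is then just:
CGFloat width = maxCharWidth * someNumericString.length;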
UPDATE
Here's the fastest code I've come up with using Core Text. The dispatched version is almost twice as fast as the single-threaded version on my Core i7 MacBook Pro. My fork of your project is here.
static CGFloat maxWidthOfStringsUsingCTFramesetter(
    NSArray *strings, NSRange range) {
    // Note: font is a CTFontRef defined elsewhere in the project.
    NSString *bigString =
        [[strings subarrayWithRange:range] componentsJoinedByString:@"\n"];
    NSAttributedString *richText =
        [[NSAttributedString alloc]
            initWithString:bigString
            attributes:@{ NSFontAttributeName: (__bridge NSFont *)font }];
    CGPathRef path =
        CGPathCreateWithRect(CGRectMake(0, 0, CGFLOAT_MAX, CGFLOAT_MAX), NULL);
    CGFloat width = 0.0;
    CTFramesetterRef setter =
        CTFramesetterCreateWithAttributedString(
            (__bridge CFAttributedStringRef)richText);
    CTFrameRef frame =
        CTFramesetterCreateFrame(
            setter, CFRangeMake(0, bigString.length), path, NULL);
    NSArray *lines = (__bridge NSArray *)CTFrameGetLines(frame);
    for (id item in lines) {
        CTLineRef line = (__bridge CTLineRef)item;
        width = MAX(width, CTLineGetTypographicBounds(line, NULL, NULL, NULL));
    }
    CFRelease(frame);
    CFRelease(setter);
    CFRelease(path);
    return width;
}
static void test_CTFramesetter() {
    runTest(__func__, ^{
        return maxWidthOfStringsUsingCTFramesetter(
            testStrings, NSMakeRange(0, testStrings.count));
    });
}
static void test_CTFramesetter_dispatched() {
    runTest(__func__, ^{
        dispatch_queue_t gatherQueue = dispatch_queue_create(
            "test_CTFramesetter_dispatched result-gathering queue", nil);
        dispatch_queue_t runQueue =
            dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
        dispatch_group_t group = dispatch_group_create();
        __block CGFloat gatheredWidth = 0.0;
        const size_t Parallelism = 16;
        const size_t totalCount = testStrings.count;
        // Force unsigned long to get 64-bit math to avoid overflow for
        // large totalCounts.
        for (unsigned long i = 0; i < Parallelism; ++i) {
            NSUInteger start = (totalCount * i) / Parallelism;
            NSUInteger end = (totalCount * (i + 1)) / Parallelism;
            NSRange range = NSMakeRange(start, end - start);
            dispatch_group_async(group, runQueue, ^{
                double width =
                    maxWidthOfStringsUsingCTFramesetter(testStrings, range);
                dispatch_sync(gatherQueue, ^{
                    gatheredWidth = MAX(gatheredWidth, width);
                });
            });
        }
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        return gatheredWidth;
    });
}
I've been working on getting a clean sine wave sound that can change frequencies when different notes are played. From what I've understood, I need to resize the buffer's frameLength relative to the frequency to avoid those popping sounds caused when the frame ends on a sine's peak.
So on every iteration, I set the frameLength and then populate buffer with the signal.
AVAudioPlayerNode *audioPlayer = [[AVAudioPlayerNode alloc] init];
AVAudioPCMBuffer *buffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:[audioPlayer outputFormatForBus:0] frameCapacity:44100*10];
AVAudioChannelCount channelCount = buffer.format.channelCount;
float * const *floatChannelData = buffer.floatChannelData;

while (YES) {
    AVAudioFrameCount frameCount = ceil(44100.0 / osc.frequency);
    [buffer setFrameLength:frameCount];
    [audioPlayer scheduleBuffer:buffer atTime:nil options:AVAudioPlayerNodeBufferLoops completionHandler:nil];
    for (int i = 0; i < [buffer frameLength]; i++) {
        for (int channelNumber = 0; channelNumber < channelCount; channelNumber++) {
            float * const channelBuffer = floatChannelData[channelNumber];
            channelBuffer[i] = [self getSignalOnFrame:i];
        }
    }
}
where the signal is generated from:
- (float)getSignalOnFrame:(int)i {
    float sampleRate = 44100.0;
    return [osc amplitude] * sinf([osc frequency] * i * 2.0 * M_PI / sampleRate);
}
The starting tone sounds fine and there are no popping sounds when notes change but the notes themselves sound like they're being turned into sawtooth waves or something.
Any ideas on what I might be missing here?
Or should I just create a whole new audioPlayer with a fresh buffer for each note played?
Thanks for any advice!
If the buffers are contiguous, then a better method to not have discontinuities in sine wave generation is to remember the phase of the sinewave at the end of one buffer, and use that phase as the starting point (angle) to generate the next buffer.
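A minimal sketch of that phase-accumulator idea, reusing variable names from the question (the oscillator properties and where you store the phase are assumptions):

// Keep a running phase so each new buffer (or frequency) starts where
// the previous one left off.
static double phase = 0.0;  // better: an ivar or property on the oscillator
const double sampleRate = 44100.0;
const double phaseIncrement = 2.0 * M_PI * osc.frequency / sampleRate;
for (AVAudioFrameCount i = 0; i < buffer.frameLength; i++) {
    for (int channelNumber = 0; channelNumber < channelCount; channelNumber++) {
        floatChannelData[channelNumber][i] = osc.amplitude * sinf((float)phase);
    }
    phase += phaseIncrement;
    if (phase >= 2.0 * M_PI) phase -= 2.0 * M_PI;  // keep phase bounded
}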
If the buffers are not contiguous, then a common way to avoid clicks is to gradually taper the first and last few milliseconds of each buffer from full gain to zero. A linear gain taper will do, but a raised cosine taper is a slightly smoother taper.
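And a sketch of the linear taper, again with the question's variable names (the 5 ms taper length is an arbitrary choice):

// Fade the first and last ~5 ms of the buffer from zero gain to full and back.
const double sampleRate = 44100.0;
AVAudioFrameCount taperFrames = (AVAudioFrameCount)(0.005 * sampleRate);
taperFrames = MIN(taperFrames, buffer.frameLength / 2);
for (AVAudioFrameCount i = 0; i < taperFrames; i++) {
    float gain = (float)i / (float)taperFrames;  // linear ramp 0 -> 1
    for (int channelNumber = 0; channelNumber < channelCount; channelNumber++) {
        floatChannelData[channelNumber][i] *= gain;                           // fade in
        floatChannelData[channelNumber][buffer.frameLength - 1 - i] *= gain;  // fade out
    }
}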
When using CFSet and CFDictionary configured with custom callbacks to use integers as their keys, I've noticed some wildly varying performance of their internal hashing implementation. I'm using 64-bit integers (int64_t) in a range of roughly 1 to 1,000,000.
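For context, the kind of callback setup I mean looks roughly like this (a simplified sketch; the trivial identity hash stands in for my real configuration):

static CFHashCode IntegerHash(const void *value) {
    return (CFHashCode)(intptr_t)value;  // identity hash on the integer "pointer"
}
static Boolean IntegerEqual(const void *a, const void *b) {
    return a == b;
}
// NULL retain/release/description callbacks: the "values" are int64_t keys
// cast to pointers, not CF objects.
static const CFSetCallBacks kIntegerSetCallBacks = {
    0, NULL, NULL, NULL, IntegerEqual, IntegerHash
};
CFMutableSetRef set = CFSetCreateMutable(NULL, 0, &kIntegerSetCallBacks);
CFSetAddValue(set, (const void *)(intptr_t)42);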
While profiling my application, I noticed that every so often, a certain combination of factors would produce unusually poor performance. Looking at Instruments, CFBasicHash was taking much longer than usual.
After a bunch of investigating, I finally narrowed things down to a set of 400,000 integers that, when added to a CFSet or CFDictionary, cause terrible hashing performance.
The hashing implementation in CFBasicHash.m is beyond my understanding for a problem like this, so I was wondering if anyone had any idea why such a completely random set of integers could cause such dreadful performance.
The following test application will output an average iteration time of 37 ms for adding sequential integers to a set, but an average run time of 3622 ms when adding the same number of integers from the problematic data set.
(If you insert the same number of completely random integers, performance is much closer to 37 ms. And adding these problematic integers to a std::map or std::set produces acceptable performance.)
#import <Foundation/Foundation.h>

extern uint64_t dispatch_benchmark(size_t count, void (^block)(void));

int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *data = [NSString stringWithContentsOfFile:@"Integers.txt" encoding:NSUTF8StringEncoding error:NULL];
        NSArray *components = [data componentsSeparatedByString:@","];
        NSInteger count = components.count;
        int64_t *numbers = (int64_t *)malloc(sizeof(int64_t) * count);
        int64_t *sequentialNumbers = (int64_t *)malloc(sizeof(int64_t) * count);
        for (NSInteger c = 0; c < count; c++) {
            numbers[c] = [components[c] integerValue];
            sequentialNumbers[c] = c;
        }
        NSLog(@"Beginning test with %@ numbers...", @(count));
        // Test #1 - Loading sequential integers
        uint64_t t1 = dispatch_benchmark(10, ^{
            CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
            for (NSInteger c = 0; c < count; c++) {
                CFSetAddValue(mutableSetRef, (const void *)sequentialNumbers[c]);
            }
            NSLog(@"Sequential iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
            CFRelease(mutableSetRef);
        });
        NSLog(@"Sequential Numbers Average Runtime: %llu ms", t1 / NSEC_PER_MSEC);
        NSLog(@"-----");
        // Test #2 - Loading data set
        uint64_t t2 = dispatch_benchmark(10, ^{
            CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
            for (NSInteger c = 0; c < count; c++) {
                CFSetAddValue(mutableSetRef, (const void *)numbers[c]);
            }
            NSLog(@"Dataset iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
            CFRelease(mutableSetRef);
        });
        NSLog(@"Dataset Average Runtime: %llu ms", t2 / NSEC_PER_MSEC);
        free(sequentialNumbers);
        free(numbers);
    }
}
Example output:
Sequential Numbers Average Runtime: 37 ms
Dataset Average Runtime: 3622 ms
The integers are available here:
Gist (Integers.txt) or Dropbox (Integers.txt)
Can anyone help explain what is "special" about the given integers that might cause such a degradation in the hashing implementation used by CFSet and CFDictionary?
In a GPS app that allows the user to display complex location paths, which we call tracks, on various types of map, each track can consist of between 2k and 10k location points. The tracks are copiously clipped, pruned, and path-simplified when they are rendered on non-Google map types. This is to keep memory usage down and performance up. We typically wind up submitting far fewer than a thousand (aggregate) transformed location points to the OpenGL pipeline, even in the worst cases.
In integrating the Google Maps SDK for iOS, we initially attempted to continue to leverage our own OpenGL track-rendering system, but ran into issues with conflicting OpenGL context usage (rendering worked, but we couldn't get both GMSMapView and our own internal OpenGL resources to release without something touching deleted memory).
So we are trying to leverage the GMSPolyline constructs and just let the Google SDK do the track rendering, but we've run into major memory usage issues, and are looking for guidance in working around them.
Using Xcode Instruments, we've monitored memory usage when creating about 25 poly lines with about 23k location points total (not each). Over the course of poly line creation, app memory usage grows from about 14 MB to about 172 MB, a net peak of about 158 MB. Shortly after all the poly lines are created, memory usage finally drops back down to around 19 MB and seems stable, for a cumulative net of around 5 MB, so it seems each location point requires around 220 bytes (5 MB / 23k points) to store.
What hurts us is the peak memory usage. While our laboratory test only used 23k location points, in the real world there are often many more, and iOS seems to jettison our application after Google Maps has consumed around 450 MB on an iPhone 5 (whereas our internal poly line rendering system peaks at around 12 MB for the same test case).
Clearly the GMSPolyline construct is not intended for the heavyweight usage that we require.
We tried wrapping some of the poly line creation loops with separate autorelease pools, and then draining those at appropriate points, but this had no impact on memory use. The peak memory use after the poly lines were created and control returned to the main run loop didn't change at all. Later it became clear why: the Google Maps system isn't releasing resources until the first DisplayLink callback after the poly lines are created.
Our next effort will be to manually throttle the amount of data we're pushing at GMSPolyline, probably using our own bounds testing, clipping, pruning & minimization, rather than relying on Google Maps to do this efficiently.
The drawback here is that it will mean many more GMSPolyline objects being allocated and deallocated, potentially while the user is panning/zooming around the map. Each of these objects will have far fewer location points, but still, we're concerned about unforeseen consequences of this approach and the hidden overhead of many GMSPolyline allocations and deallocations.
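To make that concrete, the sort of throttling we have in mind is roughly the following (the helper name and the stride heuristic are placeholders, not real code from our app):

// Hypothetical helper: cap a track at maxPoints by striding through it.
static GMSPolyline *decimatedPolyline(NSArray *track /* of CLLocation */,
                                      NSUInteger maxPoints) {
    GMSMutablePath *path = [GMSMutablePath path];
    NSUInteger stride = MAX(1, track.count / maxPoints);
    for (NSUInteger i = 0; i < track.count; i += stride) {
        [path addCoordinate:((CLLocation *)track[i]).coordinate];
    }
    // Always keep the final point so the track ends where it should.
    [path addCoordinate:((CLLocation *)track.lastObject).coordinate];
    return [GMSPolyline polylineWithPath:path];
}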
So the question is: what is the best approach for dealing with this situation? Can someone from Google shed some light on any GMSPolyline best practices, upper bounds, bottlenecks, etc.?
Why don't you try the Google Directions API, based on basic HTTP requests? https://developers.google.com/maps/documentation/directions/ (check the licensing conditions and the number of requests allowed).
Then plot the data with iOS's MKPolyline. I'm sure you will have better performance, and you will only depend on Google for the positioning data.
To convert the response from the Google API to coordinates, use the well-known method below (taken from another post):
- (NSMutableArray *)parseResponse:(NSDictionary *)response
{
    NSArray *routes = [response objectForKey:@"routes"];
    NSDictionary *route = [routes lastObject];
    if (route) {
        NSString *overviewPolyline = [[route objectForKey:@"overview_polyline"] objectForKey:@"points"];
        return [self decodePolyLine:overviewPolyline];
    }
    return nil;
}

- (NSMutableArray *)decodePolyLine:(NSString *)encodedStr {
    NSMutableString *encoded = [[NSMutableString alloc] initWithCapacity:[encodedStr length]];
    [encoded appendString:encodedStr];
    [encoded replaceOccurrencesOfString:@"\\\\" withString:@"\\"
                                options:NSLiteralSearch
                                  range:NSMakeRange(0, [encoded length])];
    NSInteger len = [encoded length];
    NSInteger index = 0;
    NSMutableArray *array = [[NSMutableArray alloc] init];
    NSInteger lat = 0;
    NSInteger lng = 0;
    while (index < len) {
        // Decode one varint-encoded latitude delta.
        NSInteger b;
        NSInteger shift = 0;
        NSInteger result = 0;
        do {
            b = [encoded characterAtIndex:index++] - 63;
            result |= (b & 0x1f) << shift;
            shift += 5;
        } while (b >= 0x20);
        NSInteger dlat = ((result & 1) ? ~(result >> 1) : (result >> 1));
        lat += dlat;
        // Decode one varint-encoded longitude delta.
        shift = 0;
        result = 0;
        do {
            b = [encoded characterAtIndex:index++] - 63;
            result |= (b & 0x1f) << shift;
            shift += 5;
        } while (b >= 0x20);
        NSInteger dlng = ((result & 1) ? ~(result >> 1) : (result >> 1));
        lng += dlng;
        // Use doubles directly; a float round-trip loses precision at 1e-5 scale.
        CLLocation *location = [[CLLocation alloc] initWithLatitude:lat * 1e-5
                                                          longitude:lng * 1e-5];
        [array addObject:location];
    }
    return array;
}
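A possible usage sketch, turning the decoded CLLocation array into an MKPolyline overlay (jsonResponse and mapView are placeholders for your own objects):

NSArray *locations = [self parseResponse:jsonResponse]; // jsonResponse: the parsed Directions JSON
CLLocationCoordinate2D *coords = malloc(sizeof(CLLocationCoordinate2D) * locations.count);
for (NSUInteger i = 0; i < locations.count; i++) {
    coords[i] = ((CLLocation *)locations[i]).coordinate;
}
MKPolyline *polyline = [MKPolyline polylineWithCoordinates:coords count:locations.count];
free(coords);
[self.mapView addOverlay:polyline]; // self.mapView: your MKMapView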
I had a similar performance problem with the Google SDK, and this worked for me.
So, I'm trying to do a simple calculation over previously recorded audio (from an AVAsset) in order to create a waveform visual. I currently do this by averaging a set of samples, the size of which is determined by dividing the audio file size by the resolution I want for the waveform.
This all works fine, except for one problem... it's too slow. Running on a 3GS, processing an audio file takes about 3% of the time it takes to play it, which is way too slow (for example, a 1-hour audio file takes about 2.5 minutes to process). I've tried to optimize the method as much as possible, but it isn't working. I'll post the code I use to process the file. Maybe someone will be able to help with that, but what I'm really looking for is a way to process the file without having to go over every single byte. So, say, given a resolution of 2,000, I'd want to access the file and take a sample at each of 2,000 points. I think this would be a lot quicker, especially if the file is larger. But the only way I know to get the raw data is to access the audio file in a linear manner. Any ideas? Here's the code I use to process the file (note, all class vars begin with '_'):
So I've completely changed this question. I belatedly realized that AVAssetReader has a timeRange property that's used for "seeking", which is exactly what I was looking for (see the original question above). Furthermore, the question has been asked and answered (I just didn't find it before), and I don't want to duplicate questions. However, I'm still having a problem. My app freezes for a while and then eventually crashes whenever I call copyNextSampleBuffer. I'm not sure what's going on. I don't seem to be in any kind of recursion loop; it just never returns from the function call. Checking the logs gives me this error:
Exception Type: 00000020
Exception Codes: 0x8badf00d
Highlighted Thread: 0
Application Specific Information:
App[10570] has active assertions beyond permitted time:
{(
<SBProcessAssertion: 0xddd9300> identifier: Suspending process: App[10570] permittedBackgroundDuration: 10.000000 reason: suspend owner pid:52 preventSuspend preventThrottleDownCPU preventThrottleDownUI
)}
I ran a time profiler on the app, and yep, it just sits there doing a minimal amount of processing. I can't quite figure out what's going on. It's important to note that this doesn't occur if I don't set the timeRange property of the AVAssetReader. I've checked, and the values for timeRange are valid, but setting it is causing the problem for some reason. Here's my processing code:
- (void)processSampleData {
    if (!_asset || CMTimeGetSeconds(_asset.duration) <= 0) return;
    NSError *error = nil;
    AVAssetTrack *songTrack = _asset.tracks.firstObject;
    if (!songTrack) return;
    NSDictionary *outputSettingsDict = [[NSDictionary alloc] initWithObjectsAndKeys:
        [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
        [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
        nil];
    UInt32 sampleRate = 44100.0;
    _channelCount = 1;
    NSArray *formatDesc = songTrack.formatDescriptions;
    for (unsigned int i = 0; i < [formatDesc count]; ++i) {
        CMAudioFormatDescriptionRef item = (__bridge_retained CMAudioFormatDescriptionRef)[formatDesc objectAtIndex:i];
        const AudioStreamBasicDescription *fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item);
        if (fmtDesc) {
            sampleRate = fmtDesc->mSampleRate;
            _channelCount = fmtDesc->mChannelsPerFrame;
        }
        CFRelease(item);
    }
    UInt32 bytesPerSample = 2 * _channelCount; // 2 bytes per sample, fixed by AVLinearPCMBitDepthKey above
    _normalizedMax = 0;
    _sampledData = [[NSMutableData alloc] init];
    SInt16 *channels[_channelCount];
    char *sampleRef;
    SInt16 *samples;
    NSInteger sampleTally = 0;
    SInt16 cTotal;
    _sampleCount = DefaultSampleSize * [UIScreen mainScreen].scale;
    NSTimeInterval intervalBetweenSamples = _asset.duration.value / _sampleCount;
    NSTimeInterval sampleSize = fmax(100, intervalBetweenSamples / _sampleCount);
    double assetTimeScale = _asset.duration.timescale;
    CMTimeRange timeRange = CMTimeRangeMake(CMTimeMake(0, assetTimeScale), CMTimeMake(sampleSize, assetTimeScale));
    SInt16 totals[_channelCount];
    @autoreleasepool {
        for (int i = 0; i < _sampleCount; i++) {
            AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:_asset error:&error];
            AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:songTrack outputSettings:outputSettingsDict];
            [reader addOutput:trackOutput];
            reader.timeRange = timeRange;
            [reader startReading];
            while (reader.status == AVAssetReaderStatusReading) {
                CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];
                if (sampleBufferRef) {
                    CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);
                    size_t length = CMBlockBufferGetDataLength(blockBufferRef);
                    int sampleCount = length / bytesPerSample;
                    for (int j = 0; j < sampleCount; j += _channelCount) {
                        CMBlockBufferAccessDataBytes(blockBufferRef, j * bytesPerSample, _channelCount, channels, &sampleRef);
                        samples = (SInt16 *)sampleRef;
                        for (int channel = 0; channel < _channelCount; channel++)
                            totals[channel] += samples[channel];
                        sampleTally++;
                    }
                    CMSampleBufferInvalidate(sampleBufferRef);
                    CFRelease(sampleBufferRef);
                }
            }
            for (int channel = 0; channel < _channelCount; channel++) {
                cTotal = abs(totals[channel] / sampleTally);
                if (cTotal > _normalizedMax) _normalizedMax = cTotal;
                [_sampledData appendBytes:&cTotal length:sizeof(cTotal)];
                totals[channel] = 0;
            }
            sampleTally = 0;
            timeRange.start = CMTimeMake((intervalBetweenSamples * (i + 1)) - sampleSize, assetTimeScale); // Take the sample just before the interval
        }
    }
    _assetNeedsProcessing = NO;
}
I finally figured out why. Apparently there is some sort of 'minimum' duration you can specify for the timeRange of an AVAssetReader. I'm not sure what exactly that minimum is; it's somewhere above 1,000 but less than 5,000. It's possible that the minimum changes with the duration of the asset; honestly, I'm not sure. Instead, I kept the duration (which is infinity) the same and simply changed the start time. Instead of processing the whole sample, I copy only one buffer block, process that, and then seek to the next time. I'm still having trouble with the code, but I'll post that as another question if I can't figure it out.
I am developing an iPhone game with cocos2d-iphone.
I am particularly interested in how much memory CCSpriteFrameCache is "holding" at the moment. I am wondering: is there a way to know that, without using any Xcode tools?
Perhaps there is a variable that will give me an estimate of the memory consumption of my app?
Generally speaking, the problem you are posing is not easy to solve.
In the case of CCSpriteFrameCache, since this class contains a pointer to an NSMutableDictionary of sprite frames, which reference textures, you could iterate the dictionary and accumulate the texture dimensions (multiplied by the size of each pixel).
Another approach would be converting the dictionary into NSData like this:
NSData *data = [NSPropertyListSerialization dataFromPropertyList:spriteFrameDictionary
                                                           format:NSPropertyListBinaryFormat_v1_0
                                                 errorDescription:NULL];
NSLog(@"size: %lu", (unsigned long)[data length]);
but this would require you to implement the NSCoding protocol for the CCSpriteFrame class.
About accumulating the texture sizes: you can multiply width by height by pixel size; the pixel size depends on the pixel format (RGBA8888 is 32 bits, RGB565 is 16 bits). You also have to take into account that OpenGL textures only come in power-of-two sizes: 256x256, 512x512, 1024x512, etc.
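A rough sketch of that per-texture computation (the helper names are mine; the power-of-two padding only applies on hardware without NPOT texture support):

static NSUInteger nextPowerOfTwo(NSUInteger n) {
    NSUInteger p = 1;
    while (p < n) p <<= 1;
    return p;
}

static NSUInteger estimatedTextureBytes(NSUInteger width, NSUInteger height,
                                        NSUInteger bitsPerPixel) {
    // Pad each dimension up to a power of two before multiplying.
    return nextPowerOfTwo(width) * nextPowerOfTwo(height) * (bitsPerPixel / 8);
}

// e.g. a 300x200 RGBA8888 texture is stored as 512x256 x 4 bytes = 512 KB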
Actually, if you are concerned about the memory consumption of your textures, note that they are stored by CCTextureCache, and there is a dumpCachedTextureInfo method in a CCTextureCache (Debug) category. I have not tried it myself, but here it is:
@implementation CCTextureCache (Debug)

- (void)dumpCachedTextureInfo
{
    NSUInteger count = 0;
    NSUInteger totalBytes = 0;
    for (NSString *texKey in textures_) {
        CCTexture2D *tex = [textures_ objectForKey:texKey];
        NSUInteger bpp = [tex bitsPerPixelForFormat];
        // Each texture takes up width * height * bytesPerPixel bytes.
        NSUInteger bytes = tex.pixelsWide * tex.pixelsHigh * bpp / 8;
        totalBytes += bytes;
        count++;
        CCLOG(@"cocos2d: \"%@\" rc=%lu id=%lu %lu x %lu @ %ld bpp => %lu KB",
              texKey,
              (long)[tex retainCount],
              (long)tex.name,
              (long)tex.pixelsWide,
              (long)tex.pixelsHigh,
              (long)bpp,
              (long)bytes / 1024);
    }
    CCLOG(@"cocos2d: CCTextureCache dumpDebugInfo: %ld textures, for %lu KB (%.2f MB)", (long)count, (long)totalBytes / 1024, totalBytes / (1024.0f * 1024.0f));
}

@end
You want to calculate this per texture, since it is possible to store textures with different pixel formats in the cache, depending on your current needs. The last line gives you a summary of the cache contents, including the total memory consumed.