Wildly varying hashing performance with CFSet and CFDictionary on OS X - objective-c

When using CFSet and CFDictionary configured with custom callbacks to use integers as their keys, I've noticed some wildly varying performance of their internal hashing implementation. I'm using 64 bit integers (int64_t) with a range of roughly 1 - 1,000,000.
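For context, here is roughly what such a configuration looks like (a minimal sketch with identity hashing and pointer equality, for illustration; not necessarily the exact callbacks I'm using):

// Store int64_t values directly in a CFSet; on 64-bit, an int64_t fits in a pointer.
// Illustrative callbacks only: identity hash and pointer equality.
static CFHashCode IntegerHash(const void *value) {
    return (CFHashCode)value;
}
static Boolean IntegerEqual(const void *a, const void *b) {
    return a == b;
}
static const CFSetCallBacks kIntegerSetCallBacks = {
    0, NULL, NULL, NULL, IntegerEqual, IntegerHash
};
// Then, inside a function:
// CFMutableSetRef set = CFSetCreateMutable(NULL, 0, &kIntegerSetCallBacks);
// CFSetAddValue(set, (const void *)(int64_t)42);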
While profiling my application, I noticed that every so often a certain combination of factors would produce unusually poor performance. Looking at Instruments, CFBasicHash was taking much longer than usual.
After a bunch of investigating, I finally narrowed things down to a set of 400,000 integers that, when added to a CFSet or CFDictionary, cause terrible hashing performance.
The hashing implementation in CFBasicHash.m is beyond my understanding for a problem like this, so I was wondering if anyone had any idea why such a seemingly random set of integers could cause such dreadful performance.
The following test application outputs an average iteration time of 37 ms for adding sequential integers to a set, but an average of 3622 ms when adding the same number of integers drawn from the problematic data set.
(If you insert the same number of completely random integers, performance is much closer to 37 ms. Likewise, adding these problematic integers to a std::map or std::set produces acceptable performance.)
#import <Foundation/Foundation.h>

extern uint64_t dispatch_benchmark(size_t count, void (^block)(void));

int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *data = [NSString stringWithContentsOfFile:@"Integers.txt" encoding:NSUTF8StringEncoding error:NULL];
        NSArray *components = [data componentsSeparatedByString:@","];
        NSInteger count = components.count;

        int64_t *numbers = (int64_t *)malloc(sizeof(int64_t) * count);
        int64_t *sequentialNumbers = (int64_t *)malloc(sizeof(int64_t) * count);
        for (NSInteger c = 0; c < count; c++) {
            numbers[c] = [components[c] integerValue];
            sequentialNumbers[c] = c;
        }

        NSLog(@"Beginning test with %@ numbers...", @(count));

        // Test #1 - Loading sequential integers
        uint64_t t1 = dispatch_benchmark(10, ^{
            CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
            for (NSInteger c = 0; c < count; c++) {
                CFSetAddValue(mutableSetRef, (const void *)sequentialNumbers[c]);
            }
            NSLog(@"Sequential iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
            CFRelease(mutableSetRef);
        });
        NSLog(@"Sequential Numbers Average Runtime: %llu ms", t1 / NSEC_PER_MSEC);

        NSLog(@"-----");

        // Test #2 - Loading data set
        uint64_t t2 = dispatch_benchmark(10, ^{
            CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
            for (NSInteger c = 0; c < count; c++) {
                CFSetAddValue(mutableSetRef, (const void *)numbers[c]);
            }
            NSLog(@"Dataset iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
            CFRelease(mutableSetRef);
        });
        NSLog(@"Dataset Average Runtime: %llu ms", t2 / NSEC_PER_MSEC);

        free(sequentialNumbers);
        free(numbers);
    }
}
Example output:
Sequential Numbers Average Runtime: 37 ms
Dataset Average Runtime: 3622 ms
The integers are available here:
Gist (Integers.txt) or Dropbox (Integers.txt)
Can anyone help explain what is "special" about the given integers that might cause such a degradation in the hashing implementation used by CFSet and CFDictionary?

Related

How to print out an integer raised to the 100th power (handling overflow)

So my friend asked me this question as interview practice:
Using Objective-C & Foundation Kit, Write a method that takes a single digit int, and logs out to the console the precise result of that int being raised to the power of 100.
Initially I thought it sounded easy, but then I realized that even a single digit number raised to the power of 100 would quickly come close to 100 digits, which would overflow.
So I tried tackling this problem by creating an NSArray w/ NSNumbers (for reflection), where each object in the array is a place in the final result number. Then I perform the multiplication math (including factoring in carries), and then print out a string formed by concatenating the objects in the array. Here is my implementation w/ input 3:
NSNumber *firstNum = [NSNumber numberWithInteger:3];
NSMutableArray *numArray = [NSMutableArray arrayWithArray:@[firstNum]];
for (int i = 0; i < 99; i++)
{
    int previousCarry = 0;
    for (int j = 0; j < [numArray count]; j++)
    {
        int newInt = [firstNum intValue] * [[numArray objectAtIndex:j] intValue] + previousCarry;
        NSNumber *calculation = [NSNumber numberWithInteger:newInt];
        previousCarry = [calculation intValue] / 10;
        NSNumber *newValue = [NSNumber numberWithInteger:(newInt % 10)];
        [numArray replaceObjectAtIndex:j withObject:newValue];
    }
    if (previousCarry > 0)
    {
        [numArray addObject:[NSNumber numberWithInteger:previousCarry]];
    }
}
NSArray *reversedArray = [[numArray reverseObjectEnumerator] allObjects];
NSString *finalNumber = [reversedArray componentsJoinedByString:@""];
NSLog(@"%@", finalNumber);
This isn't a problem out of a textbook or anything, so I don't have any reference to double-check my work. How does this solution sound to you guys? I'm a little worried that it's pretty naive. Even though the complexity is O(N), I can't help but feel like I'm not utilizing a type/class or method unique to Objective-C or Foundation Kit that would produce a more optimal solution, or at the very least make the algorithm cleaner and look more impressive.
Write a method that takes a single digit int, and logs out to the console the precise result of that int being raised to the power of 100.
That strikes me as a typical interview "trick"[*] question - "single digit", "logs out to console"...
Here goes:
NSString *singleDigitTo100(int d)
{
    static NSString *powers[] =
    {
        @"0",
        @"1",
        @"1267650600228229401496703205376",
        @"515377520732011331036461129765621272702107522001",
        @"1606938044258990275541962092341162602522202993782792835301376",
        @"7888609052210118054117285652827862296732064351090230047702789306640625",
        @"653318623500070906096690267158057820537143710472954871543071966369497141477376",
        @"3234476509624757991344647769100216810857203198904625400933895331391691459636928060001",
        @"2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376",
        @"265613988875874769338781322035779626829233452653394495974574961739092490901302182994384699044001"
    };
    return powers[d % 10]; // simple bounds check...
}
And the rest is easy :-)
And if you are wondering, those numbers came from bc, the standard command-line calculator on U*ix and hence OS X. You could of course invoke bc from Objective-C if you really want to calculate the answers on the fly.
[*] It is not really a "trick" question but asking if you understand that sometimes the best solution is a simple lookup table.
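If you did take the on-the-fly route, a rough NSTask sketch for shelling out to bc might look like this (an illustration, not production code; note that bc wraps long output lines with a trailing backslash, which you would need to strip):

// Ask bc to compute 3^100 (OS X; error handling omitted).
NSTask *task = [[NSTask alloc] init];
task.launchPath = @"/usr/bin/bc";
NSPipe *input = [NSPipe pipe];
NSPipe *output = [NSPipe pipe];
task.standardInput = input;
task.standardOutput = output;
[task launch];
[[input fileHandleForWriting] writeData:[@"3 ^ 100\n" dataUsingEncoding:NSUTF8StringEncoding]];
[[input fileHandleForWriting] closeFile];
NSData *resultData = [[output fileHandleForReading] readDataToEndOfFile];
[task waitUntilExit];
NSString *result = [[NSString alloc] initWithData:resultData encoding:NSUTF8StringEncoding];
// result still contains bc's backslash-newline continuations for long numbers.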
As you have correctly figured out, you will need to use some sort of big integer library. This is a nice example you can refer to: https://mattmccutchen.net/bigint/
Furthermore, you can calculate x^n in O(lg(n)) rather than in O(n), using divide and conquer:
f(x, n):
    if n == 0:  # stopping condition
        return 1
    temp = f(x, n / 2)
    result = temp * temp
    if n % 2 == 1:
        result *= x
    return result

x = 5    # or another one-digit number
n = 100
result = f(x, n)  # this is the result you are looking for
Note that x represents your integer and n the power you are raising x to.
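In C (and hence Objective-C) terms, the same recursion looks like the sketch below. Note that a fixed-width integer overflows long before x^100, so this only illustrates the shape of the algorithm; the real problem still needs an arbitrary-precision type:

#include <stdint.h>

// Exponentiation by squaring: O(lg n) multiplications.
static uint64_t ipow(uint64_t x, unsigned n)
{
    if (n == 0) return 1;            // stopping condition
    uint64_t temp = ipow(x, n / 2);
    uint64_t result = temp * temp;
    if (n % 2 == 1)
        result *= x;
    return result;
}
// Example: ipow(3, 10) == 59049; ipow(3, 100) would overflow uint64_t.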

Convert really large decimal string to hex?

I've got a really large decimal number in an NSString, which is too large to fit into any variable including NSDecimal. I was doing the math manually, but if I can't fit the number into a variable then I can't be dividing it. So what would be a good way to convert the string?
Example Input: 423723487924398723478243789243879243978234
Output: 4DD361F5A772159224CE9EB0C215D2915FA
I was looking at the first answer here, but it's in C# and I don't know its Objective-C equivalent.
Does anyone have any ideas that don't involve using an external library?
If this is all you need, it's not too hard to implement, especially if you're willing to use Objective-C++. By using Objective-C++, you can use a vector to manage memory, which simplifies the code.
Here's the interface we'll implement:
// NSString+BigDecimalToHex.h
@interface NSString (BigDecimalToHex)
- (NSString *)hexStringFromDecimalString;
@end
To implement it, we'll represent an arbitrary-precision non-negative integer as a vector of base-65536 digits:
// NSString+BigDecimalToHex.mm
#import "NSString+BigDecimalToHex.h"
#import <vector>
// index 0 is the least significant digit
typedef std::vector<uint16_t> BigInt;
The "hard" part is to multiply a BigInt by 10 and add a single decimal digit to it. We can very easily implement this as long multiplication with a preloaded carry:
static void insertDecimalDigit(BigInt &b, uint16_t decimalDigit) {
    uint32_t carry = decimalDigit;
    for (size_t i = 0; i < b.size(); ++i) {
        uint32_t product = b[i] * (uint32_t)10 + carry;
        b[i] = (uint16_t)product;
        carry = product >> 16;
    }
    if (carry > 0) {
        b.push_back(carry);
    }
}
With that helper method, we're ready to implement the interface. First, we need to convert the decimal digit string to a BigInt by calling the helper method once for each decimal digit:
- (NSString *)hexStringFromDecimalString {
    NSUInteger length = self.length;
    unichar decimalCharacters[length];
    [self getCharacters:decimalCharacters range:NSMakeRange(0, length)];
    BigInt b;
    for (NSUInteger i = 0; i < length; ++i) {
        insertDecimalDigit(b, decimalCharacters[i] - '0');
    }
If the input string is empty, or all zeros, then b is empty. We need to check for that:
    if (b.size() == 0) {
        return @"0";
    }
Now we need to convert b to a hex digit string. The most significant digit of b is at the highest index. To avoid leading zeros, we'll handle that digit specially:
    NSMutableString *hexString = [NSMutableString stringWithFormat:@"%X", b.back()];
Then we convert each remaining base-65536 digit to four hex digits, in order from most significant to least significant:
    for (ssize_t i = (ssize_t)b.size() - 2; i >= 0; --i) {
        [hexString appendFormat:@"%04X", b[i]];
    }
And then we're done:
    return hexString;
}
You can find my full test program (to run as a Mac command-line program) in this gist.
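As a quick sanity check against the example in the question (assuming the category above is compiled into the project):

NSString *decimal = @"423723487924398723478243789243879243978234";
NSLog(@"%@", [decimal hexStringFromDecimalString]);
// Expected output: 4DD361F5A772159224CE9EB0C215D2915FA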

Arbitrary precision bit manipulation (Objective C)

I need to do bit operations on representations of arbitrary precision numbers in Objective C. So far I have been using NSData objects to hold the numbers - is there a way to bit shift the content of those? If not, is there a different way to achieve this?
Using NSMutableData you can fetch a byte into a char, shift its bits, and put it back with -replaceBytesInRange:withBytes:.
I don't see any other solution except for writing your own data holder class using a char * buffer to hold the raw data.
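For instance, a minimal sketch of touching one byte that way (illustrative only; carrying shifted-out bits into neighboring bytes is left to you):

NSMutableData *buffer = [NSMutableData dataWithLength:4];
unsigned char byte;
[buffer getBytes:&byte range:NSMakeRange(0, 1)];
byte <<= 1;  // shift this byte; the high bit is lost unless you carry it
[buffer replaceBytesInRange:NSMakeRange(0, 1) withBytes:&byte];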
As you'll have spotted, Apple doesn't provide arbitrary precision support. Nothing is provided larger than the 1024-bit integers in vecLib.
I also don't think NSData provides shifts and rolls. So you're going to have to roll your own. E.g. a very naive version, which may have some small errors as I'm typing it directly here:
@implementation NSData (Shifts)

- (NSData *)dataByShiftingLeft:(NSUInteger)bitCount
{
    // we'll work byte by byte; byte 0 is treated as the least significant
    NSUInteger wholeBytes = bitCount >> 3;
    NSUInteger extraBits = bitCount & 7;

    // zero-filled, so we can OR bits into it below
    NSMutableData *newData = [NSMutableData dataWithLength:self.length + wholeBytes + (extraBits ? 1 : 0)];
    const uint8_t *sourceBytes = self.bytes;
    uint8_t *destinationBytes = newData.mutableBytes;

    if (extraBits)
    {
        // each source byte lands across two destination bytes
        for (NSUInteger index = 0; index < self.length; index++)
        {
            destinationBytes[index + wholeBytes] |= sourceBytes[index] << extraBits;
            destinationBytes[index + wholeBytes + 1] |= sourceBytes[index] >> (8 - extraBits);
        }
    }
    else
    {
        // whole-byte shift: just copy all of self to the right offset
        memcpy(destinationBytes + wholeBytes, sourceBytes, self.length);
    }
    return newData;
}

@end
Of course, that assumes the number of bits you want to shift by is itself expressible as an NSUInteger, amongst other sins.
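A quick usage sketch, assuming the little-endian byte order used above (byte 0 least significant):

uint8_t raw[] = { 0x01, 0x80 };  // 0x8001, least significant byte first
NSData *value = [NSData dataWithBytes:raw length:sizeof(raw)];
NSData *shifted = [value dataByShiftingLeft:4];
// 0x8001 << 4 == 0x80010, i.e. bytes 10 00 08 (LSB first)
NSLog(@"%@", shifted);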

How can NSArray be this slow?

I'm coming from a C++/STL world and I wanted to check how Objective-C containers compare to the STL.
I wanted to compare an array of numbers, but the only way to add a number to an NSArray is to box it in an NSNumber, which is utterly slow and drank my RAM empty, so I guess I'd need to deallocate them manually. Since I didn't want to test side effects, I just added [NSNull null] to the array instead.
The results of adding 10k things into array 1k times:
NSArray - 0.923411 seconds
vector<int> - 0.129984 seconds
I thought it might be the allocations and deallocations, so I set the number of arrays (imax in the code) to 1 and the number of additions (jmax) to 10,000,000, but it was even slower:
NSArray - 2.19859 seconds
vector<int> - 0.223471 seconds
Edit:
As mentioned in the comments, the constantly increasing size of the array might be the problem, so I created the NSArray with arrayWithCapacity: and the vector with reserve, and it was even slower than before(!) (imax = 1, jmax = 10000000).
NSArray - 2.55942
vector<int> - 0.19139
End edit
Why is this so slow?
My code for reference:
#import <Foundation/Foundation.h>
#include <vector>
#include <iostream>
#include <time.h>

using namespace std;

int main (int argc, const char * argv[])
{
    int imax = 1000;
    int jmax = 10000;

    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    cout << "Vector insertions" << endl;
    clock_t start = clock();
    for (int i = 0; i < imax; i++)
    {
        vector<int> *v = new vector<int>();
        for (int j = 0; j < jmax; j++)
        {
            v->push_back(j);
        }
        delete v;
    }
    double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
    cout << interval << " seconds" << endl;

    cout << "NSArray insertions" << endl;
    start = clock();
    for (int i = 0; i < imax; i++)
    {
        NSMutableArray *v = [[NSMutableArray alloc] init];
        for (int j = 0; j < jmax; j++)
        {
            [v addObject:[NSNull null]];
        }
        [v release];
    }
    interval = (clock() - start) / (double)CLOCKS_PER_SEC;
    cout << interval << " seconds" << endl;

    [pool drain];
    return 0;
}
@JeremyP provides an excellent link and information. Always read the fish. Here's some breakdown of what's eating time, though, and what you might do about it.
First, there are the many calls to objc_msgSend() for dynamic dispatch. These can be avoided, and you'll save some of the time (though not as much as you'd think; objc_msgSend() is crazy optimized). You'll knock maybe 5% off by skipping it:
#import <objc/runtime.h>  // for class_getMethodImplementation()

IMP addObject = class_getMethodImplementation([NSMutableArray class], @selector(addObject:));
NSNull *null = [NSNull null];
start = clock();
for (int i = 0; i < imax; i++)
{
    NSMutableArray *v = [[NSMutableArray alloc] init];
    for (int j = 0; j < jmax; j++)
    {
        addObject(v, @selector(addObject:), null);
    }
    [v release];
}
A lot of time is eaten up with retain/release. You can avoid that (and stick real numbers in rather than NSNumber) by using a non-retaining CFMutableArray. This will get the append times to about 2x of vector:
CFArrayCallBacks cb = {0};
for (int i = 0; i < imax; i++)
{
    CFMutableArrayRef v = CFArrayCreateMutable(NULL, 0, &cb);
    for (int j = 0; j < jmax; j++)
    {
        CFArrayAppendValue(v, &j);
    }
    CFRelease(v);
}
The biggest cost of this one is the calls to memmove() (or the collectable version of it on the Mac).
Man, NSMutableArray sure is slow. How could Apple be so stupid, right? I mean, really... wait... I wonder if there's something NSMutableArray does better than vector?
Try swapping out these lines for their obvious counterparts:
v->insert(v->begin(), j);
NSNumber *num = [[NSNumber alloc] initWithInt:j];
[v insertObject:num atIndex:0];
[num release];
(Yes, including creating and releasing the NSNumber, not just using NSNull.)
Oh, and you might try this one too to see just how fast NSMutableArray and CFMutableArray really can be:
CFArrayInsertValueAtIndex(v, 0, &j);
In my tests I get:
Vector insertions
7.83188 seconds
NSArray insertions
2.66572 seconds
Non-retaining
0.310126 seconds
The short answer: Yes, NSArray really is quite a bit slower than C++'s STL collection classes. This has much to do with compile time vs. runtime behaviors, optimization opportunities on the part of the compiler, and numerous implementation details.
(And, as Rob points out, NSMutableArray is optimized for random insertion and performs better than C++ for that...)
The real answer:
Micro-benchmarks are useless for optimizing user facing applications.
Using a micro-benchmark to make implementation decisions is the very definition of premature optimization.
You would be hard pressed to find an Objective-C app targeted to iOS or Mac OS X where CPU profiling would show any significant time spent in code paths related to NSArray, yet the vast majority of those apps use the NS* collection classes pretty much exclusively.
Certainly, there are cases where the performance of NS* aren't viable and, for that, you turn to C++/STL.
None of this is to imply that your question is invalid. Without more context, it is difficult to say if the observed performance difference really matters (however, in my experience, just about every time a developer has asked a question based on a micro-benchmark, it has been misguided).
Oh -- and read this as it gives a bit of insight into the implementation of *Array.
It's a fully fledged Objective-C object, which means there is overhead each time you add an object, due to Cocoa's message lookup algorithm, which is necessary to implement dynamic binding properly.
There's also the point that NSArrays are not necessarily structured internally as a contiguous set of pointers. For very large arrays, NSArray performs much better (i.e. has much better big-O time complexity) than C++ vectors for some operations. Have a read of the definitive ridiculousfish blog post on the topic.
At least some of the time is consumed in repeatedly increasing the capacity of the NSArray. It should be faster to initialize the NSArray to the right (or at least a better) capacity initially with:
[NSMutableArray arrayWithCapacity:10000];
#include <stdio.h>
#include <time.h>

int main (int argc, char **argv)
{
    int imax = 1000;
    int jmax = 10000;

    clock_t start = clock();
    for (int i = 0; i < imax; i++)
    {
        int array[jmax];
        for (int j = 0; j < jmax; j++)
            j[array] = 0;  // j[array] is just another way of writing array[j]
    }
    double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
    printf("%f\n", interval);
    return 0;
}
Output in my 2GHz Core2Duo iMac (compiled with LLVM):
0.000003

Quickest way to be sure region of memory is blank (all NULL)?

If I have an unsigned char *data pointer, what is the fastest way to check whether size_t length bytes of the data at that pointer are all NULL? In other words, what's the fastest way to make sure a region of memory is blank?
I am implementing in iOS, so you can assume iOS frameworks are available, if that helps. On the other hand, simple C approaches (memcmp and the like) are also OK.
Note, I am not trying to clear the memory, but rather trying to confirm that it is already clear (I am trying to find out whether there is anything at all in some bitmap data, if that helps). For example, I think the following would work, though I have not tried it yet:
- (BOOL)data:(unsigned char *)data isNullToLength:(size_t)length {
    unsigned char tester[length];
    memset(tester, 0, length);
    if (memcmp(tester, data, length) != 0) {
        return NO;
    }
    return YES;
}
I would rather not create a tester array, though, because the source data may be quite large and I'd rather avoid allocating memory for the test, even temporarily. But I may just be being too conservative there.
UPDATE: Some Tests
Thanks to everyone for the great responses below. I decided to create a test app to see how these performed, the answers surprised me, so I thought I'd share them. First I'll show you the version of the algorithms I used (in some cases they differ slightly from those proposed) and then I'll share some results from the field.
The Tests
First I created some sample data:
size_t length = 1024 * 768;
unsigned char *data = (unsigned char *)calloc(sizeof(unsigned char), (unsigned long)length);
int i;
int count;
long check;
int loop = 5000;
Each test consisted of a loop run loop times. During the loop some random data was added to and removed from the data byte stream. Note that half the time there was actually no data added, so half the time the test should not find any non-zero data. Note the testZeros call is a placeholder for calls to the test routines below. A timer was started before the loop and stopped after the loop.
count = 0;
for (i = 0; i < loop; i++) {
    int r = random() % length;
    if (random() % 2) { data[r] = 1; }
    if (! testZeros(data, length)) {
        count++;
    }
    data[r] = 0;
}
Test A: nullToLength. This was more or less my original formulation above, debugged and simplified a bit.
- (BOOL)data:(void *)data isNullToLength:(size_t)length {
    void *tester = calloc(length, 1);
    int test = memcmp(tester, data, length);
    free(tester);
    return (! test);
}
Test B: allZero. Proposal by Carrotman.
BOOL allZero (unsigned char *data, size_t length) {
    bool allZero = true;
    for (int i = 0; i < length; i++) {
        if (*data++) {
            allZero = false;
            break;
        }
    }
    return allZero;
}
Test C: is_all_zero. Proposed by Lundin.
BOOL is_all_zero (unsigned char *data, size_t length)
{
    BOOL result = TRUE;
    unsigned char *end = data + length;
    unsigned char *i;
    for (i = data; i < end; i++) {
        if (*i > 0) {
            result = FALSE;
            break;
        }
    }
    return result;
}
Test D: sumArray. This is the top answer from the nearly duplicate question, proposed by vladr.
BOOL sumArray (unsigned char *data, size_t length) {
    int sum = 0;
    for (int i = 0; i < length; ++i) {
        sum |= data[i];
    }
    return (sum == 0);
}
Test E: lulz. Proposed by Steve Jessop.
BOOL lulz (unsigned char *data, size_t length) {
    if (length == 0) return 1;
    if (*data) return 0;
    return memcmp(data, data + 1, length - 1) == 0;
}
Test F: NSData. This is a test using NSData object I discovered in the iOS SDK while working on all of these. It turns out Apple does have an idea of how to compare byte streams that is designed to be hardware independent.
- (BOOL)nsdTestData:(NSData *)nsdData length:(NSUInteger)length {
    void *tester = calloc(length, 1);
    NSData *nsdTester = [NSData dataWithBytesNoCopy:tester length:length freeWhenDone:NO];
    BOOL test = [nsdData isEqualToData:nsdTester];
    free(tester);
    return test;
}
Results
So how did these approaches compare? Here are two sets of data, each representing 5000 loops through the check. First I tried this on the iPhone Simulator running on a relatively old iMac, then I tried this running on a first generation iPad.
On the iPhone 4.3 Simulator running on an iMac:
// Test A, nullToLength: 0.727 seconds
// Test F, NSData: 0.727
// Test E, lulz: 0.735
// Test C, is_all_zero: 7.340
// Test B, allZero: 8.736
// Test D, sumArray: 13.995
On a first generation iPad:
// Test A, nullToLength: 21.770 seconds
// Test F, NSData: 22.184
// Test E, lulz: 26.036
// Test C, is_all_zero: 54.747
// Test B, allZero: 63.185
// Test D, sumArray: 84.014
These are just two samples; I ran the test many times with only slightly varying results. The order of performance was always the same: A & F very close, E just behind, then C, B, and D. I'd say that A, F, and E are virtual ties; on iOS I'd prefer F because it takes advantage of Apple's protection from processor change issues, but A & E are very close. The memcmp approach clearly wins over the simple loop approach: close to ten times faster in the simulator and twice as fast on the device itself. Oddly enough, D, the winning answer from the other thread, performed very poorly in this test, probably because it does not break out of the loop when it hits the first difference.
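For what it's worth, here is a sketch of D with an early exit (my own variant, not from the original thread); it should behave much more like B and C:

BOOL sumArrayEarlyExit (unsigned char *data, size_t length) {
    int sum = 0;
    for (size_t i = 0; i < length; ++i) {
        sum |= data[i];
        if (sum) break;  // stop at the first non-zero byte
    }
    return (sum == 0);
}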
I think you should do it with an explicit loop, but just for lulz:
if (length == 0) return 1;
if (*pdata) return 0;
return memcmp(pdata, pdata+1, length-1) == 0;
Unlike memcpy, memcmp does not require that the two data sections don't overlap.
It may well be slower than the loop, though, because the un-alignedness of the input pointers means there probably isn't much the implementation of memcmp can do to optimize, plus it's comparing memory with memory rather than memory with a constant. Easy enough to profile it and find out.
Not sure if it's the best, but I probably would do something like this:
bool allZero = true;
for (size_t i = 0; i < length; i++) {
    if (*data++) {
        // Roll back so data points to the non-zero char
        data--;
        // Do whatever is needed if it isn't zero.
        allZero = false;
        break;
    }
}
If you've just allocated this memory, you can always call calloc rather than malloc (calloc guarantees that all the data is zeroed out). (Edit: reading your comment on the first post, you don't really need this. I'll just leave it here just in case.)
If you're allocating the memory yourself, I'd suggest using the calloc() function. It's just like malloc(), except it zeros out the buffer first. It's what's used to allocate memory for Objective-C objects and is the reason that all ivars default to 0.
On the other hand, if this is a statically declared buffer, or a buffer you're not allocating yourself, memset() is the easy way to do this.
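A minimal illustration of the difference (buffer names are hypothetical):

#include <stdlib.h>
#include <string.h>

size_t length = 1024;
unsigned char *a = calloc(length, 1);  // arrives zeroed
unsigned char *b = malloc(length);     // contents indeterminate
memset(b, 0, length);                  // now also zeroed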
Logic to get a value, check it, and set it will be at least as expensive as just setting it. You want it to be null, so just set it to null using memset().
This would be the preferred way to do it in C:
BOOL is_all_zero (const unsigned char* data, size_t length)
{
    BOOL result = TRUE;
    const unsigned char* end = data + length;
    const unsigned char* i;
    for (i = data; i < end; i++)
    {
        if (*i > 0)
        {
            result = FALSE;
            break;
        }
    }
    return result;
}
(Though note that, strictly and formally speaking, a memory cell containing a NULL pointer need not be all-bits zero, so long as a null pointer cast to an integer yields zero and zero cast to a pointer yields a NULL pointer. In practice this shouldn't matter, as all known compilers use 0 or (void*)0 for NULL.)
Note the edit to the initial question above. I did some tests and it is clear that the memcmp approach or using Apple's NSData object and its isEqualToData: method are the best approaches for speed. The simple loops are clearer to me, but slower on the device.