Finding objects in Core Data by array attribute, performantly in >10k elements

Short:
I need to find core data objects by a key, which holds a unique immutable array (fixed length, but chosen at runtime) of arbitrary objects (for which not only element membership, but also element order determines uniqueness). NSManagedObject however forbids overriding [isEqual:]. Now what?
Long:
I have an entity (see diagram image for entity "…Link") in my Core Data model for which I have to guarantee uniqueness based on an attribute key ("tuple"). So far so good.
The entity's unique attribute however has to be an NSArray.
To make things a bit more difficult, I know neither the class of the tuple's elements nor the tuple's element count. Well, actually the count is the same for every tuple (per Core Data context at least), but it is not known before the app runs.
There must only ever be one instance of my link entity with a given tuple.
And, for obvious reasons, only ever one tuple instance with a given array of arbitrary objects.
Two tuples are to be considered equal if [tuple_1 isEqual:tuple_n] returns YES. NSManagedObject forbids overriding [isEqual:] and [hash] though; otherwise things would be pretty much a piece of cake.
"…Tuple" objects are created together with their array of tokens (via a convenience method) and are immutable (and so is each "…Token" and its data attribute). (think of "…Tuple" as a "…Link"'s dictionary key.)
"…Tuple" implements "- (NSArray *)tokens;", which returns a neatly ordered array of tokens, based on the "order" keys of "…TokenOrder". (Tuples are expected to contain at most 5 elements.)
I however expect to have tens of thousands (potentially even more in some edge cases) of "…Link" objects, which I have to (frequently) find based on their "tuple" attribute.
Sadly I couldn't find any article (let alone solution) for such a scenario in any literature or the web.
Any ideas?
A possible solution I've come up with so far would be:
1) Narrow the number of elements to compare by adding another attribute to "…Tuple" called "tupleHash", which is pre-calculated on object creation via Snippet 1.
2) Query with NSPredicate for objects of matching tupleHash (narrowing down the list of candidates quite a bit).
3) Find the "…Link" featuring the given tuple in the narrowed candidate list via Snippet 2.
Snippet 1:
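// Seed with the class hash, then fold in the hash of each token's data.
// Note: XOR ignores order, so permutations of the same tokens share a hash;
// the order-sensitive comparison in Snippet 2 is still needed afterwards.
NSUInteger tupleHash = [[self class] hash];
for (id token in self.tokens) {
    tupleHash ^= [[token valueForKey:@"data"] hash];
}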
Snippet 2:
NSArray *tupleTokens = someTokens;
NSArray *filteredEntries = [narrowedCandidates filteredArrayUsingPredicate:
    [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        // Order-sensitive comparison of the candidate's tokens with the wanted tuple.
        return [[evaluatedObject valueForKeyPath:@"tuple.tokens"] isEqualToArray:tupleTokens];
    }]];
(Sorry, markdown appears to oppose mixing of lists with code snippets.)
Good idea or just insane?
Thanks in advance!

I strongly suggest that you calculate a hash for your objects and store it in your database.
Your second snippet will seriously hurt performance, that's for sure.
Update:
You don't need to use the hash method of NSArray.
To calculate the hash, you can perform a SHA1 or MD5 on the array values, concatenated. There are many algorithms for hashing, these are just two.
You can create a category for NSArray, say myHash to make the code reusable.
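A minimal sketch of such a category follows. It does not use SHA1 or MD5; it swaps in a simpler order-sensitive combination of the elements' own -hash values, which is enough to make [a, b] and [b, a] normally differ. The category name and the method name orderSensitiveHash are hypothetical. If the value is stored in the database, this assumes the elements' hashes are stable across launches, which -hash does not promise in general; a digest over a canonical serialization would be the safer choice in that case.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical category; the names are illustrative only.
@interface NSArray (OrderSensitiveHash)
- (NSUInteger)orderSensitiveHash;
@end

@implementation NSArray (OrderSensitiveHash)
- (NSUInteger)orderSensitiveHash {
    NSUInteger result = 17;
    for (id element in self) {
        // Multiplying before adding makes the result depend on each
        // element's position, unlike a plain XOR of the hashes.
        result = result * 31 + [element hash];
    }
    return result;
}
@end
```

Equal arrays always produce equal values, so the result can serve as a first filter; candidates that share it still need a final isEqualToArray: check.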

As recommended in a comment by Joe Blow I'm just gonna go with SQLite. Core Data simply appears to be the wrong tool here.
Benefits:
Fast thanks to SQL's column indexing
No object allocation/initialization on SELECT, prior to returning the results. (which Core Data would require for attribute checks)
Easily query link tuples using JOINs.
Easy use of SQLite's JOIN, GROUP BY, ORDER BY, etc
Little to no wrapper code thanks to EGODatabase (FMDB-inspired SQLite Objective-C wrapper)

Related

How to use NSCache with multiple pieces of information that together act as a 'key'?

I'm trying to understand the concept of NSCache, and one thing that strikes me is that a NSCache instance does not guarantee to give back the value to a key you stored before. It might not even store the key value pair when you try to add it, if it deems that the performance is more important at the moment.
What that implies, for me, is that:
Each key must 'hold' enough information to generate the value if necessary
Each query for the NSCache, which essentially is just in the form of a key, should thus wrap up all the information needed to generate the corresponding value.
From the above two points one can say that NSCache serves no purpose of establishing any kind of association between a key and a value - the user must be able to generate the value independent of the cache, and the sole purpose of using a NSCache is not to 'look up' some value, but rather just to trade memory for some performance boost
So my problem is about storing transparency masks for images. Initially I thought I just need to use the names of the images as the keys, but from my deductions above it seems that's not sufficient - I also have to include all other parameters used in generating a mask e.g. the transparency threshold, for example. It also means that every time I ask the cache for a mask I have to provide ALL the parameters. And the only way that I can think of about doing that is to use something like NSInvocation as the key; but that seems a rather clunky solution.

It is the very nature of a cache to be volatile, so caches should only ever be used to speed up access to information that could also be acquired some other way.
Your idea to create keys that hold all this information should work - just remember to store all your keys somewhere other than the cache as well.
As for the key, you can create a very simple class that has nothing but a couple of properties (the ones that make up a key), an isEqual: and hash method and maybe an initializer that takes parameters for each of your properties.
This requires extremely little code, since accessors and iVars for properties are autogenerated, so the only thing you really need to write is the isEqual: method (and hash).
This class is so small and tailor-made for the particular case you need it for that it makes sense to declare and implement it at the top of the .m file you're going to use it in. This way, you don't pollute the rest of the system. Just add @interface and @implementation sections for your class at the top of your .m file.
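As a rough sketch of such a key class: the name MaskKey and its two properties (imageName and an integer threshold) are assumptions made for illustration, not something the question specifies. NSCache does not copy its keys, so NSCopying is left out here; it would be needed if the same key were used in an NSDictionary.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical key class; imageName is assumed to be non-nil.
@interface MaskKey : NSObject
@property (nonatomic, copy, readonly) NSString *imageName;
@property (nonatomic, readonly) NSInteger threshold;
- (instancetype)initWithImageName:(NSString *)imageName threshold:(NSInteger)threshold;
@end

@implementation MaskKey
- (instancetype)initWithImageName:(NSString *)imageName threshold:(NSInteger)threshold {
    if ((self = [super init])) {
        _imageName = [imageName copy];
        _threshold = threshold;
    }
    return self;
}

// Two keys are equal when every parameter that influences the mask matches.
- (BOOL)isEqual:(id)other {
    if (other == self) return YES;
    if (![other isKindOfClass:[MaskKey class]]) return NO;
    MaskKey *key = other;
    return [self.imageName isEqualToString:key.imageName]
        && self.threshold == key.threshold;
}

// Equal keys must return equal hashes, so only the compared fields are used.
- (NSUInteger)hash {
    return self.imageName.hash ^ (NSUInteger)self.threshold;
}
@end
```

A lookup then builds a fresh key from the current parameters, for example [cache objectForKey:[[MaskKey alloc] initWithImageName:name threshold:128]].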

After more thought about this I think I've got one thing wrong - the keys in a NSCache do not necessarily need to hold all the information for generating the values. A key in a NSCache can serve the same purpose as that in a NSDictionary - a unique identifier to look up the value. The only difference, though, is that you'd always need to have a backup plan B for a NSCache in case the key-value pair added before is destroyed.
In simpler terms, operations on the two different classes look like this:
NSDictionary
generate each value V for each key K and add the pairs to the dictionary
look up V using K
NSCache
look up V using K
if V == nil, generate the value V and add the pair to the cache
Therefore it's possible to convert almost any NSDictionary to a NSCache, only that after the conversion you can't pass the NSCache around - you have to know how to generate the values at all times and thus the NSCache instance would most probably be a private property used exclusively in a certain class.
For my problem I've resolved to use a method like this (self is supposedly pointing to a subclass of NSCache, but I haven't tested it yet)
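- (Mask *)maskForImageName:(NSString *)name maskGenerator:(Mask *(^)(NSString *))generator {
    Mask *mask = [self objectForKey:name];
    if (!mask) {
        // Cache miss (or evicted entry): regenerate the value.
        mask = generator(name);
        if (mask) { // skip caching if the generator produced nothing
            [self setObject:mask forKey:name];
        }
    }
    return mask;
}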
It could be simplified further if Objective-C were a functional, lazy-style language, in which case I wouldn't even need to wrap the generator in a block; but I'm satisfied with this solution for now. In fact I feel that this pattern is almost always used with NSCache, so I'd just add it as a category to NSCache.
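The same look-up-or-generate pattern could be written as a category roughly like this. The category name and the selector objectForKey:orGenerateWith: are hypothetical; in real code a prefix on the selector would be advisable to avoid clashing with methods Apple may add later.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical category; not part of Foundation.
@interface NSCache (LazyLookup)
- (id)objectForKey:(id)key orGenerateWith:(id (^)(id key))generator;
@end

@implementation NSCache (LazyLookup)
- (id)objectForKey:(id)key orGenerateWith:(id (^)(id key))generator {
    id object = [self objectForKey:key];
    if (object == nil && generator != nil) {
        // Miss, or the entry was evicted earlier: rebuild the value.
        object = generator(key);
        if (object != nil) {
            [self setObject:object forKey:key];
        }
    }
    return object;
}
@end
```

The caller always gets a value back as long as the generator can produce one, whether or not the cache decided to keep the previous entry.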

Array from set: why does NSSet use allObjects, while NSOrderedSet uses array?

In Foundation, if I want to convert a set to an NSArray, I can use:
-[NSSet allObjects]
-[NSOrderedSet array]
Why are these different?

Speculation, but:
Because when NSSet was created the only other major collection type was NSArray, which was (and still is, largely) the most common collection type. So a method called "allObjects" would obviously return an NSArray.
When NSOrderedSet was added much more recently, it had to deal with the existence of prior collections - primarily, NSArray and NSSet. So an "allObjects" method would be ambiguous. Ergo it has two methods, -array and -set.
And/or, the -array and -set methods return proxies to what are likely the same or similar classes used internally. So in a functional sense they're a little different - those proxies will see mutations made on the original NSOrderedSet. -allObjects on the other hand does not - it genuinely creates an independent array, since its internal storage is likely a hashtable or similar that isn't readily proxied.

While there are other differences†, .allObjects does not imply a definite ordering, and .array does; and that's exactly what you are getting.
† .array returns a live proxy of the underlying NSOrderedSet, and if the underlying ordered set changes, the proxy will change with it.

Also... The NSArray returned by 'allObjects' is a copy of the values in the set.
But the NSArray returned by 'array' is a proxy of the objects in the set.
Thus if you change the value of any object in the set, you will change the value of the object in the array. A copy of the ordered set is not being made. So the two properties have different names because they do different things.
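The difference can be observed with a small experiment along these lines. The function names are made up for this sketch, and the behaviour of the -array result is taken from the documentation quoted in the answers rather than guaranteed by the code itself.

```objective-c
#import <Foundation/Foundation.h>

// Count seen through an -allObjects result after the set has grown.
static NSUInteger CountSeenViaAllObjects(void) {
    NSMutableSet *set = [NSMutableSet setWithObjects:@"a", @"b", nil];
    NSArray *snapshot = [set allObjects];  // independent array, order unspecified
    [set addObject:@"c"];
    return snapshot.count;                 // the snapshot does not grow
}

// Count seen through an -array result after the ordered set has grown.
static NSUInteger CountSeenViaArray(void) {
    NSMutableOrderedSet *ordered =
        [NSMutableOrderedSet orderedSetWithObjects:@"a", @"b", nil];
    NSArray *view = [ordered array];       // described as a proxy onto the set
    [ordered addObject:@"c"];
    return view.count;                     // expected to follow the set
}

// Building a new array detaches the contents from later mutations.
static NSUInteger CountSeenViaDetachedArray(void) {
    NSMutableOrderedSet *ordered =
        [NSMutableOrderedSet orderedSetWithObjects:@"a", @"b", nil];
    NSArray *detached = [NSArray arrayWithArray:[ordered array]];
    [ordered addObject:@"c"];
    return detached.count;
}
```

If a stable array is needed from an ordered set that may still change, building a new array as in the last function avoids depending on the proxy.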

ObjC: Best use an NSArray or NSDictionary for this (zBuffer)?

Say I have a collection of "node" instances. An integer property called zIndex will be used to group them.
What are the pros/cons for storing them in :
1) An array of arrays
2) A dictionary of arrays
In pseudo code, I would describe the expected result like this:
zBuffer[100] = [node1, node2];
zBuffer[105] = [playerNode, collectable1];
zBuffer[110] = [foreground1, foreground2];
And I'm wondering about what zBuffers should be; Must NSArrays only be used for sequential read/write? Like not using non-continuous indexes?
I tried with an NSMutableArray:
[zBuffer objectAtIndex:zOrder]
But it fails if the array contains no data for that index (like out-of-bound exception).
Thanks for your advice!
J

As far as I can see, one of your requirements is that the indexes you use to access zBuffer are not contiguous (100, 105, 110). In this case, I would not use an array, since the indexes you can use with an array must be less than the number of elements in the array (if you have 3 elements, then indexes range from 0 to 2).
Instead I would use NSMutableDictionary, where you can use the zIndex key as a "name" for groups of objects you are looking for.
This suggestion does not take into account any other requirements that you might have, especially concerning complexity and the kind of operations you are going to carry through on your collection of nodes (beyond accessing them through zIndex).
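A minimal sketch of the dictionary-of-arrays approach, assuming nodes can be any objects and that drawing order is ascending zIndex. The helper names AddNode and NodesInDrawOrder are invented for this example.

```objective-c
#import <Foundation/Foundation.h>

// Appends a node to the bucket for zIndex, creating the bucket on demand.
static void AddNode(NSMutableDictionary *zBuffer, id node, NSInteger zIndex) {
    NSNumber *key = @(zIndex);
    NSMutableArray *bucket = zBuffer[key];
    if (bucket == nil) {
        bucket = [NSMutableArray array];
        zBuffer[key] = bucket;
    }
    [bucket addObject:node];
}

// Returns all nodes back to front, i.e. by ascending zIndex.
// Dictionaries are unordered, so the keys are sorted explicitly.
static NSArray *NodesInDrawOrder(NSDictionary *zBuffer) {
    NSMutableArray *result = [NSMutableArray array];
    NSArray *sortedKeys =
        [[zBuffer allKeys] sortedArrayUsingSelector:@selector(compare:)];
    for (NSNumber *key in sortedKeys) {
        [result addObjectsFromArray:zBuffer[key]];
    }
    return result;
}
```

Looking up a zIndex that has no nodes simply returns nil from zBuffer[@(zIndex)] instead of raising an out-of-bounds exception.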

You could actually provide both. It looks like what you want to have is a sparse array: so you look up objects by index, but it's permissible for there not to be an object at a certain index. So you could make that.
I'd do that by creating an NSMutableArray subclass that implements the primitive methods documented. Internally, your subclass would use an NSMutableDictionary for storage, with numbers (the "filled" indices) as keys. -objectAtIndex: returns either the object with that number as its key or nil if the array is empty at that point.
There are some ambiguities in this use of the array contract that it's up to you to decide how to address:
does count return 1+(highest index in use), or the number of objects in the array?
the enumerator and fast enumeration patterns never expect to see nil, so you need to come up with an enumerator that always returns an object (but lets me see what index it's at) if you want users of your class to enumerate over the array.
you won't be able to initialise it with the +arrayWithObjects: (id) firstObject,... pattern of initialisers because they use nil as a sentinel.

Objective-C implementation of a histogram or bag datastructure

Instead of implementing my own I was wondering if anyone knows of a histogram or bag datastructure implementation in Objective-C that I can use.
Essentially a histogram is a hashmap of lists where the lists contain values that relate to their hash entry. A good example is a histogram of supermarket items where you place each group of items dairy, meat, canned goods in their own bag. You can then very easily access each group of items according to their type.

NSCountedSet is a multiset (aka "bag") that counts distinct objects, but doesn't allow duplicates. However, based on your explanation, I don't think that's what you need, and neither is a histogram, which automatically buckets values based on a set of (usually numerical) ranges.
I believe what you really want is a multimap, which is a "key to one-or-more values" relation. The data structures framework I maintain includes CHMultiDictionary, a multimap implementation. I won't claim by any means that it's perfect or complete, but I hope it may be helpful for your problem.

It sounds to me like you simply want a dictionary of arrays. You can put NSArrays as elements of NSDictionarys, something like:
NSMutableDictionary* dict = [NSMutableDictionary dictionary];
[dict setObject:[NSMutableArray arrayWithObjects:@"milk", @"eggs", @"cheese", nil] forKey:@"dairy"];
[dict setObject:[NSMutableArray arrayWithObjects:@"steak", @"sausages", @"mince", nil] forKey:@"meat"];
[[dict objectForKey:@"meat"] addObject:@"lamb"];
NSLog( @"Dictionary is %@", dict );

There's one in the GNU Objective-C Class library, but the docs appear to be pretty incomplete and the project's homepage must be currently having a problem -- still, if GPL software is acceptable for your project, you might want to download and check the sources.

CFIOMultimap apparently is an implementation of a multimap. However, as of the time of writing I couldn't get it to work. It returns nils all the time when I subscript.
Perhaps it can be fixed and adapted for your use.

Techniques for implementing -hash on mutable Cocoa objects

The documentation for -hash says it must not change while a mutable object is stored in a collection, and similarly the documentation for -isEqual: says the -hash value must be the same for equal objects.
Given this, does anybody have any suggestions for the best way to implement -hash such that it meets both these conditions and yet is actually calculated intelligently (i.e. doesn't just return 0)? Does anybody know how the mutable versions of framework-provided classes do this?
The simplest thing to do is of course just forget the first condition (about it not changing) and just make sure I never accidentally mutate an object while it's in a collection, but I'm wondering if there's any solution that's more flexible.
EDIT: I'm wondering here whether it's possible to maintain the 2 contracts (where equal objects have equal hashes, and hashes don't change while the object is in a collection) when I'm mutating the internal state of the object. My inclination is to say "no", unless I do something stupid like always return 0 for the hash, but that's why I'm asking this question.

Interesting question, but I think what you want is logically impossible. Say you start with 2 objects, A and B. They're both different, and they start with different hash codes. You add both to some hash table. Now, you want to mutate A, but you can't change the hash code because it's already in the table. However, it's possible to change A in such a way that it .equals() B.
In this case, you have 2 choices, neither of which works:
Change the hashcode of A to equal B.hashcode, which violates the constraint of not changing hash codes while in a hash table.
Don't change the hashcode, in which case A.equals(B) but they don't have the same hashcodes.
It seems to me that there's no possible way to do this without using a constant as a hashcode.

My reading of the documentation is that a mutable object's value for hash can (and probably should) change when it is mutated, but should not change when the object hasn't been mutated. The portion of the documentation to which you refer, therefore, is saying, "Don't mutate objects that are stored in a collection, because that will cause their hash value to change."
To quote directly from the NSObject documentation for hash:
If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection.
(Emphasis mine.)

The question here isn't how to meet both of these requirements, but rather which one you should meet. In Apple's documentation, it is clearly stated that:
a mutable dictionary can be put in a hash table but you must not change it while it is in there.
This being said, it seems more important that you meet the equality requirement of hashes. The hash of an object should always be a way to check if an object is equal to another. If this is ever not the case, it is not a true hash function.
Just to finish up my answer, I'll give an example of a good hash implementation. Let's say you are writing the implementation of -hash on a collection that you have created. This collection stores an array of NSObjects as pointers. Since all NSObjects implement the hash function, you can use their hashes in calculating the collection's hash:
- (NSUInteger)hash {
    NSUInteger theHash = 0;
    for (NSObject * aPtr in self) { // fast enumeration
        theHash ^= [aPtr hash];
    }
    return theHash;
}
This way, two collection objects containing the same pointers (in the same order) will have the same hash.

Since you are already overriding -isEqual: to do a value-based comparison, are you sure you really need to bother with -hash?
I can't guess what exactly you need this for of course, but if you want to do value-based comparison without deviating from the expected implementation of -isEqual: to only return YES when hashes are identical, a better approach might be to mimic NSString's -isEqualToString:, that is, to create your own -isEqualToFoo: method instead of using or overriding -isEqual:.

The answer to this question and the key to avoiding many cocoa-bugs is this:
Read the documentation carefully. Place every word and punctuation mark on a golden scale and weigh it as if it were the world's last grain of wheat.
Let's read the documentation again:
If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, [...]
(emphasis mine).
What the writer of the docs, in his or her eternal wisdom, means by this is that when you are implementing a collection, like a dictionary, you shouldn't use the hash for positioning since that can change. In other words it has little to do with implementing -hash on mutable Cocoa objects (which all of us thought it had, assuming the documentation has not changed in the last ~10 years since the question was asked).
That is why dictionaries always copy their keys - so they can guarantee that the hash value won't change.
You will then ask the question: But, good sir, how does NSMapTable and similar handle this?
The answer to this is according to the documentation:
"Its keys or values may be copied on input or may use pointer identity for equality and hashing."
(emphasis mine again).
Since we were so easily fooled by the documentation last time, let's run a little experiment to see for ourselves how stuff actually works:
NSMutableString *string = [NSMutableString stringWithString:@"so lets mutate this"];
NSString *originalString = string.copy;
NSMapTable *mutableStrings = [NSMapTable strongToStrongObjectsMapTable];
[mutableStrings setObject:originalString forKey:string];
[string appendString:@" into a larger string"];
if ([mutableStrings objectForKey:string] == nil)
    NSLog(@"not found!");
if ([mutableStrings objectForKey:originalString] == nil)
    NSLog(@"Not even the original string is found?");
for (NSString *inCollection in mutableStrings)
{
    NSLog(@"key '%@' : is '%@' (null)", inCollection, [mutableStrings objectForKey:inCollection]);
}
for (NSString *value in NSAllMapTableValues(mutableStrings))
{
    NSLog(@"value exists: %@", value);
}
Surprise!
So, instead of using pointer equality, they focus on the words "may" here which in this case mean "may not", and simply copy the hash value when adding stuff to the collection.
(All this is actually good, since it would be quite difficult to implement NSHashMap, or -hash, otherwise).

In Java, most mutable classes simply don’t override Object.hashCode() so that the default implementation returns a value that is based on the address of the object and doesn’t change. It might just be the same with Objective C.
