Optimizing algorithm for matching duplicates - objective-c

I've written a small utility program that identifies duplicate tracks in iTunes.
The actual matching of tracks takes a long time, and I'd like to optimize it.
I am storing track data in an NSMutableDictionary that stores individual track data in
NSMutableDictionaries keyed by trackID. These individual track dictionaries have
at least the following keys:
TrackID
Name
Artist
Duration (in milliseconds, ####.####)
To determine if any tracks match one another, I must check:
If the durations of two tracks are within 5 seconds of each other
Name matches
Artist matches
The slow way I'm doing it now uses two nested for-loops:
-(void)findDuplicateTracks {
    NSArray *allTracks = [tracks allValues];
    BOOL isMatch = NO;
    int numMatches = 0;
    // outer loop
    NSMutableDictionary *track = nil;
    NSMutableDictionary *otherTrack = nil;
    for (NSUInteger i = 0; i < [allTracks count]; i++) {
        track = [allTracks objectAtIndex:i];
        NSDictionary *summary = nil;
        if (![claimedTracks containsObject:track]) {
            NSAutoreleasePool *aPool = [[NSAutoreleasePool alloc] init];
            // the duration is stored as an NSNumber: unbox it instead of casting the pointer
            long long duration1 = [[track objectForKey:kTotalTime] longLongValue];
            NSString *nName = [track objectForKey:knName];
            NSString *nArtist = [track objectForKey:knArtist];
            // inner loop - no need to check tracks that have
            // already appeared in i
            for (NSUInteger j = i + 1; j < [allTracks count]; j++) {
                otherTrack = [allTracks objectAtIndex:j];
                if (![claimedTracks containsObject:otherTrack]) {
                    long long duration2 = [[otherTrack objectForKey:kTotalTime] longLongValue];
                    // duration check (signed values, so the difference cannot wrap around)
                    isMatch = (llabs(duration1 - duration2) < kDurationThreshold);
                    // match name
                    if (isMatch) {
                        NSString *onName = [otherTrack objectForKey:knName];
                        isMatch = [nName isEqualToString:onName];
                    }
                    // match artist
                    if (isMatch) {
                        NSString *onArtist = [otherTrack objectForKey:knArtist];
                        isMatch = [nArtist isEqualToString:onArtist];
                    }
                    // save match data
                    if (isMatch) {
                        ++numMatches;
                        // claim both tracks
                        [claimedTracks addObject:track];
                        [claimedTracks addObject:otherTrack];
                        // build the summary only on this track's first match
                        if (summary == nil) {
                            [track setObject:[NSNumber numberWithBool:NO] forKey:@"willDelete"];
                            summary = [self dictionarySummaryForTrack:track];
                        }
                        [otherTrack setObject:[NSNumber numberWithBool:NO] forKey:@"willDelete"];
                        [[summary objectForKey:kMatches] addObject:otherTrack];
                    }
                }
            }
            [aPool drain];
        }
    }
}
This becomes quite slow for large music libraries, and only uses 1
processor. One recommended optimization was to use blocks and process
the tracks in batches (of 100 tracks). I tried that. If my code
originally took 9 hours to run, it now takes about 2 hours on a
quad-core. That's still too slow. But (talking above my pay grade here)
perhaps there is a way to store all the values I need in a C structure that "fits on the stack", and then I wouldn't have to fetch the values from slower memory. This seems too low-level for me, but I'm willing to learn if someone can show me an example.
BTW, I profiled this in Instruments and [NSCFSet member:] takes up
86.6% of the CPU time.
Then I thought I should extract all the durations into a sorted array so I would not have
to look up the duration value in the dictionary. I think that is a good
idea, but when I started to implement it, I wondered how to determine
the best batch size.
If I have the following durations:
Duration:  2  2  3  4  5  6  6 16 17 38 59
Index:     0  1  2  3  4  5  6  7  8  9 10
Then just by iterating over the array, I know that to find matching
tracks of the song at index 0, I only need to compare it against songs
up to index 6. That's great, I have my first batch. But now I have to
start over at index 1 only to find that its batch should also stop at
index 6 and exclude index 0. I'm assuming I'm wasting a lot of
processing cycles here determining what the batch should be/the duration
matches. This seemed like a "set" problem, but we didn't do much of
that in my Intro to Algorithms class.
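Because the durations are sorted, the batch end never has to be recomputed from scratch. The window for index i+1 always ends at or after the window for index i, so one end pointer that only moves forward finds every window in O(n) steps overall. The following is a minimal sketch in plain C, which also compiles as Objective-C. It assumes the durations have already been copied into a sorted `long` array and uses the question's strict `< threshold` test. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* For each i, ends[i] receives the first index j > i whose duration is
   threshold or more away from durations[i] (or n if there is none).
   Track i only has to be compared with tracks i+1 .. ends[i]-1.
   `durations` must be sorted in ascending order. */
void duration_windows(const long *durations, size_t n, long threshold,
                      size_t *ends)
{
    assert(threshold > 0);
    size_t hi = 0;                      /* shared end pointer, never moves back */
    for (size_t i = 0; i < n; i++) {
        if (hi < i + 1)
            hi = i + 1;                 /* a window never contains i itself */
        while (hi < n && durations[hi] - durations[i] < threshold)
            hi++;
        ends[i] = hi;
    }
}

/* Number of (i, j) pairs with i < j that pass the duration check. */
size_t count_duration_pairs(const size_t *ends, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += ends[i] - (i + 1);
    return total;
}
```

With the example durations and a threshold of 5, `ends[0]` is 7, so index 0 is compared up to index 6. `ends[1]` is also 7, found with a single failed check instead of a rescan of indices 2 to 6.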
My questions are:
1) What is the most efficient way to identify matching tracks? Is it
something similar to what's above? Is it using disjoint and [unified]
set operations that are slightly above my knowledge level? Is it
filtering arrays using NSArray? Is there an online resource that
describes this problem and solution?
I am willing to restructure the tracks dictionary in whatever way
(datastructure) is most efficient. I had at first thought I needed to
perform many lookups by TrackID, but that is no longer the case.
2) Is there a more efficient way to approach this problem? How do you
rock stars go from paragraph 1 to an optimized solution?
I have searched for the answer, longer than I care to admit, and found
these interesting, but unhelpful answers:
find duplicates
Find all duplicates and missing values in a sorted array
Thanks for any help you can provide,
Lance

My first thought is to maintain some sorted collections as indices into your dictionary so you can stop doing an O(n^2) search comparing every track to every other track.
If you had arrays of TrackIDs sorted by duration then for any track you could do a more efficient O(log n) binary search to find tracks with durations within your 5 second tolerance.
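The binary search described here can be written as two searches over a sorted duration array:

- one finds the first duration greater than `target - threshold`;
- the other finds the first duration not less than `target + threshold`.

The following is a sketch in plain C. It assumes durations are `long` values sorted in ascending order, and the function names are made up for illustration. The returned half-open range includes the target track itself.

```c
#include <assert.h>
#include <stddef.h>

/* First index whose value is >= key (n if none). `a` is sorted ascending. */
static size_t lower_bound(const long *a, size_t n, long key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* First index whose value is > key (n if none). */
static size_t upper_bound(const long *a, size_t n, long key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Sets [*first, *last) to the indices of durations d with
   |d - target| < threshold, using two O(log n) binary searches. */
void durations_within(const long *sorted, size_t n, long target,
                      long threshold, size_t *first, size_t *last)
{
    assert(threshold > 0);
    *first = upper_bound(sorted, n, target - threshold);
    *last = lower_bound(sorted, n, target + threshold);
}
```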
Even better, for artist and name you can store a dictionary keyed on the artist or track name whose values are arrays of TrackIDs. Then you only need an O(1) lookup to get the set of tracks for a particular artist, which should allow you to very quickly determine if there are any possible duplicates.
Finally, if you've built that sort of dictionary of titles to TrackIDs, you can go through all of its keys and only search for duplicates when there are multiple tracks with the same title. Doing further comparisons only for those titles should eliminate a significant percentage of the library and massively reduce your search time. Building the dictionary is O(n), and a pass over it to find candidate duplicates is another O(n). Unless many tracks share a title, that leaves you close to O(n) rather than the O(n^2) you have now.
If nothing else, do that last optimization. The resulting performance increase should be huge for a library without a significant number of duplicates:
NSMutableArray *possibleDuplicates = [NSMutableArray array];
NSMutableDictionary *knownTitles = [NSMutableDictionary dictionary];
// iterate over the track dictionaries themselves (allValues), not their keys
for (NSMutableDictionary *track in [tracks allValues]) {
    NSString *title = [track objectForKey:@"Name"];
    if ([knownTitles objectForKey:title] != nil) {
        [possibleDuplicates addObject:track];
    }
    else {
        // first track seen with this title: remember its TrackID
        [knownTitles setObject:[track objectForKey:@"TrackID"] forKey:title];
    }
}
//check for duplicates of the tracks in possibleDuplicates only
//(compare each one with the track whose TrackID knownTitles stores for its title).
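The same "only look at repeated titles" filter can also be done by sorting instead of using the answer's `NSMutableDictionary`. Sorting is O(n log n) rather than the expected O(n) of hashing, but it needs nothing beyond the C standard library. The sketch below makes that substitution and assumes the titles are available as C strings. Unlike the Objective-C code, it counts every member of a repeated-title group, including the first one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C-string titles. */
static int compare_titles(const void *a, const void *b)
{
    const char *ta = *(const char *const *)a;
    const char *tb = *(const char *const *)b;
    return strcmp(ta, tb);
}

/* Sorts `titles` in place and returns how many entries share their title
   with at least one other entry. Only those tracks need the slower
   artist/duration comparison; every other track can be skipped. */
size_t count_possible_duplicates(const char **titles, size_t n)
{
    assert(titles != NULL || n == 0);
    qsort(titles, n, sizeof titles[0], compare_titles);
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        while (j < n && strcmp(titles[i], titles[j]) == 0)
            j++;                      /* titles[i .. j-1] are identical */
        if (j - i > 1)
            count += j - i;           /* the whole group needs checking */
        i = j;
    }
    return count;
}
```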

There are several ways to do this, but here's my first naïve guess:
Have a mutable dictionary.
The keys in this dictionary are the names of the songs.
The value of each key is another mutable dictionary.
The keys of this secondary mutable dictionary are the artists.
The value of each key is a mutable array of songs.
You'd end up with something like this:
NSArray *songs = ...; //your array of songs
NSMutableDictionary *nameCache = [NSMutableDictionary dictionary];
for (Song *song in songs) {
    NSString *name = [song name];
    NSMutableDictionary *artistCache = [nameCache objectForKey:name];
    if (artistCache == nil) {
        artistCache = [NSMutableDictionary dictionary];
        [nameCache setObject:artistCache forKey:name];
    }
    NSString *artist = [song artist];
    NSMutableArray *songCache = [artistCache objectForKey:artist];
    if (songCache == nil) {
        songCache = [NSMutableArray array];
        [artistCache setObject:songCache forKey:artist];
    }
    for (Song *otherSong in songCache) {
        //these are songs that have the same name and artist
        NSTimeInterval myDuration = [song duration];
        NSTimeInterval otherDuration = [otherSong duration];
        if (fabs(myDuration - otherDuration) < 5.0f) {
            //name matches, artist matches, and their difference in duration is less than 5 seconds
        }
    }
    [songCache addObject:song];
}
This is a worst-case O(n^2) algorithm (if every song has the same name, artist, and duration). It's a best-case O(n) algorithm (if every song has a different name/artist/duration), and will realistically end up being closer to O(n) than to O(n^2) (most likely).
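The question also asked about keeping the needed values in a plain C structure. One possible sketch combines that with the grouping idea, using a sort-based technique instead of this answer's nested dictionaries.

- It sorts an array of hypothetical `Track` structs by name, artist and duration.
- Tracks with the same name and artist then sit next to each other in duration order.
- Each inner scan can therefore stop at the first track that cannot match.

The sketch assumes durations in milliseconds, as in the question. Like the answer above, it is still O(n^2) in the degenerate case where every track matches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat record holding only the fields the comparison needs. */
typedef struct {
    const char *name;
    const char *artist;
    long duration_ms;
    int has_match;          /* set to 1 when another track matches it */
} Track;

/* Order by name, then artist, then duration. */
static int compare_tracks(const void *a, const void *b)
{
    const Track *x = a, *y = b;
    int c = strcmp(x->name, y->name);
    if (c == 0)
        c = strcmp(x->artist, y->artist);
    if (c == 0)
        c = (x->duration_ms > y->duration_ms) - (x->duration_ms < y->duration_ms);
    return c;
}

/* Sorts `tracks` and returns the number of matching pairs: same name,
   same artist, durations less than threshold_ms apart. */
size_t find_duplicate_tracks(Track *tracks, size_t n, long threshold_ms)
{
    assert(threshold_ms > 0);
    qsort(tracks, n, sizeof tracks[0], compare_tracks);
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            /* different group, or too long: no later track can match i */
            if (strcmp(tracks[i].name, tracks[j].name) != 0 ||
                strcmp(tracks[i].artist, tracks[j].artist) != 0 ||
                tracks[j].duration_ms - tracks[i].duration_ms >= threshold_ms)
                break;
            tracks[i].has_match = 1;
            tracks[j].has_match = 1;
            pairs++;
        }
    }
    return pairs;
}
```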

Related

Printing the most frequent words in a file(string) Objective-C

I'm new to Objective-C and need help solving this:
Write a function that takes two parameters:
1. a String representing a text document, and
2. an integer providing the number of items to return.
Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Use your best judgement to decide how words are separated. Your solution should run in O(n) time where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structures.
What I tried so far (work in progress):
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // Function work in progress
        // -(NSString *) wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument ;
        // Get the desktop directory (where the text document is)
        NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];
        // Create full path to the file
        NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];
        // Load the string
        NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
        // Optional code for confirmation - Check that the file is here and print its content to the console
        // NSLog(@" The string is:%@", content);
        // Create an array with the words contained in the string
        NSArray *myWords = [content componentsSeparatedByString:@" "];
        // Optional code for confirmation - Print content of the array to the console
        // NSLog(@"array: %@", myWords);
        // Count the words with an NSCountedSet, then build an array of {word, count} dictionaries sorted by count in descending order.
        NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
        NSMutableArray *dictArray = [NSMutableArray array];
        [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
            [dictArray addObject:@{@"word": obj,
                                   @"count": @([countedSet countForObject:obj])}];
        }];
        NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
    }
    return 0;
}
This is a classic job for map-reduce. I am not very familiar with Objective-C, but as far as I know these concepts are easily implemented in it.
The first map-reduce step counts the number of occurrences.
This step basically groups elements according to the word, and then counts them.
map(text):
    for each word in text:
        emit(word, '1')
reduce(word, list<number>):
    emit(word, sum(number))
An alternative to map-reduce is an iterative calculation with a hash map, which acts as a histogram counting the number of occurrences per word.
After you have a list of words and their occurrence counts, all you have to do is get the top k out of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
Here, the 'comparator' is the number of occurrences of each word, as calculated in the previous step.
The basic idea is to use a min-heap, and store the first k elements in it.
Now, iterate over the remaining elements, and if the new one is bigger than the top (the minimal element in the heap), remove the top and replace it with the new element.
At the end, you have a heap containing the k largest elements. A heap is only partially ordered, but repeatedly removing its top yields those elements in ascending order, which is easy to reverse.
Complexity is O(n log k).
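Here is a sketch of the min-heap step in plain C. It assumes the word counts from the first step are already in a `long` array; a real version would carry each word's index alongside its count. `top_k` and `sift_down` are hypothetical names. The function ends with an in-place heap sort so the result comes out largest first.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the min-heap property for the subtree rooted at i
   (heap[0] is the smallest element). */
static void sift_down(long *heap, size_t size, size_t i)
{
    for (;;) {
        size_t smallest = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < size && heap[left] < heap[smallest])
            smallest = left;
        if (right < size && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == i)
            return;
        long tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Writes the k largest of counts[0..n-1] into out[0..k-1], largest first.
   out doubles as the min-heap: O(n log k) time, no extra memory. */
void top_k(const long *counts, size_t n, size_t k, long *out)
{
    assert(k <= n);
    for (size_t i = 0; i < k; i++)
        out[i] = counts[i];                 /* store the first k elements */
    for (size_t i = k / 2; i-- > 0;)        /* heapify them */
        sift_down(out, k, i);
    for (size_t i = k; i < n; i++) {
        if (k > 0 && counts[i] > out[0]) {  /* beats the current minimum */
            out[0] = counts[i];
            sift_down(out, k, 0);
        }
    }
    /* Heap sort in place: moving each minimum to the back of the shrinking
       heap leaves out[] in descending order. */
    for (size_t size = k; size > 1; size--) {
        long tmp = out[0];
        out[0] = out[size - 1];
        out[size - 1] = tmp;
        sift_down(out, size - 1, 0);
    }
}
```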
To achieve O(n + klogk) you may use selection algorithm instead of the min-heap solution to get top-k, and then sort the retrieved elements.
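The selection-based variant can be sketched with quickselect, followed by `qsort` on just the first k elements. This version uses Lomuto partitioning with the middle element as pivot, which is O(n) on average. It assumes the counts are in a `long` array, and the function names are hypothetical. The linear bound is only an average: unlucky pivots can still make quickselect quadratic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static void swap_long(long *a, long *b)
{
    long t = *a;
    *a = *b;
    *b = t;
}

/* qsort comparator for descending order. */
static int compare_desc(const void *x, const void *y)
{
    long a = *(const long *)x, b = *(const long *)y;
    return (a < b) - (a > b);
}

/* Rearranges a[0..n-1] so that a[0..k-1] holds the k largest values
   (quickselect), then sorts only those k values, largest first. */
void top_k_select(long *a, size_t n, size_t k)
{
    assert(k <= n);
    if (k == 0)
        return;
    size_t lo = 0, hi = n - 1, target = k - 1;
    while (lo < hi) {
        swap_long(&a[lo + (hi - lo) / 2], &a[hi]);  /* middle element as pivot */
        long pivot = a[hi];
        size_t store = lo;
        for (size_t i = lo; i < hi; i++)
            if (a[i] > pivot)                      /* larger values go left */
                swap_long(&a[i], &a[store++]);
        swap_long(&a[store], &a[hi]);              /* pivot reaches its final slot */
        if (store == target)
            break;
        if (store < target)
            lo = store + 1;
        else
            hi = store - 1;
    }
    qsort(a, k, sizeof a[0], compare_desc);        /* the O(k log k) part */
}
```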

best way to populate NSArray in this algorithm

I intend to make a program that does the following:
Create an NSArray populated with numbers from 1 to 100,000.
Loop over some code that deletes certain elements of the NSArray when certain conditions are met.
Store the resultant NSArray.
However the above steps will also be looped over many times and so I need a fast way of making this NSArray that has 100,000 number elements.
So what is the fastest way of doing it?
Is there an alternative to iteratively populating an Array using a for loop? Such as an NSArray method that could do this quickly for me?
Or perhaps I could make the NSArray with the 100,000 numbers by any means the first time, and then create every new NSArray (for step 1) by using the method arrayWithArray:? (Is that a quicker way of doing it?)
Or perhaps you have something completely different in mind that will achieve what I want.
edit: replace NSArray with NSMutableArray in above post
It is difficult to tell in advance which method will be the fastest. I like the block based functions, e.g.
NSMutableArray *array = ...; // your mutable array
NSIndexSet *toBeRemoved = [array indexesOfObjectsPassingTest:^BOOL(NSNumber *num, NSUInteger idx, BOOL *stop) {
    // Block is called for each number "num" in the array.
    // return YES if the element should be removed and NO otherwise;
    return NO; // placeholder: replace with your own condition
}];
[array removeObjectsAtIndexes:toBeRemoved];
You should probably start with a correctly working algorithm and then use Instruments for profiling.
You may want to look at NSMutableIndexSet. It is designed to efficiently store ranges of numbers.
You can initialize it like this:
NSMutableIndexSet *set = [[NSMutableIndexSet alloc]
initWithIndexesInRange:NSMakeRange(1, 100000)];
Then you can remove, for example, 123 from it like this:
[set removeIndex:123];
Or you can remove 400 through 409 like this:
[set removeIndexesInRange:NSMakeRange(400, 10)];
You can iterate through all of the remaining indexes in the set like this:
[set enumerateIndexesUsingBlock:^(NSUInteger i, BOOL *stop) {
    NSLog(@"set still includes %lu", (unsigned long)i);
}];
or, more efficiently, like this:
[set enumerateRangesUsingBlock:^(NSRange range, BOOL *stop) {
    NSLog(@"set still includes %lu indexes starting at %lu",
          (unsigned long)range.length, (unsigned long)range.location);
}];
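The underlying idea is a compact structure recording which of the numbers 1 to 100,000 are still present. It can also be sketched in plain C as a bitmap: removing a number clears one bit, and a fresh full set for the next outer iteration is a single `memcpy` of about 12.5 KB. This only illustrates the idea; it is not how `NSMutableIndexSet` is implemented, since that class stores ranges. All names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bitmap-backed set of the integers 1..max. */
typedef struct {
    size_t max;
    unsigned char *bits;           /* bit v is set <=> v is still present */
} NumberSet;

static size_t number_set_bytes(size_t max) { return max / CHAR_BIT + 1; }

NumberSet number_set_create(size_t max)
{
    NumberSet s = { max, malloc(number_set_bytes(max)) };
    assert(s.bits != NULL);
    memset(s.bits, 0xFF, number_set_bytes(max));   /* every value present */
    return s;
}

/* A fresh copy for the next outer iteration: one memcpy, no per-element work. */
NumberSet number_set_copy(const NumberSet *s)
{
    NumberSet c = { s->max, malloc(number_set_bytes(s->max)) };
    assert(c.bits != NULL);
    memcpy(c.bits, s->bits, number_set_bytes(s->max));
    return c;
}

void number_set_free(NumberSet *s)
{
    free(s->bits);
    s->bits = NULL;
}

int number_set_contains(const NumberSet *s, size_t v)
{
    if (v < 1 || v > s->max)
        return 0;
    return (s->bits[v / CHAR_BIT] >> (v % CHAR_BIT)) & 1;
}

void number_set_remove(NumberSet *s, size_t v)
{
    if (v >= 1 && v <= s->max)
        s->bits[v / CHAR_BIT] &= (unsigned char)~(1u << (v % CHAR_BIT));
}

size_t number_set_count(const NumberSet *s)
{
    size_t n = 0;
    for (size_t v = 1; v <= s->max; v++)
        n += (size_t)number_set_contains(s, v);
    return n;
}
```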
I'm quite certain it will be fastest to create the array using a C array, then create an NSArray from that (benchmark coming soon). Depending on how you want to delete the numbers, it may be fastest to do that in the initial loop:
const int max_num = 100000;
...
// manual retain/release code; under ARC this buffer would need an explicit ownership qualifier
id *nums = malloc(max_num * sizeof(*nums));
int c = 0;
for (int i = 1; i <= max_num; i++) {
    if (!should_skip(i)) nums[c++] = @(i);
}
NSArray *nsa = [NSArray arrayWithObjects:nums count:c];
free(nums); // the NSArray now retains the numbers, so the C buffer can go
First benchmark was somewhat surprising. For 100M objects:
NSArray alloc init: 8.6s
NSArray alloc initWithCapacity: 8.6s
id *nums: 6.4s
So the C array is faster, but not by as much as I expected.
You can use fast enumeration to search through the array.
for (NSNumber *item in myArrayOfNumbers)
{
    if ([item intValue] % 2 == 0) // "some condition"; this one is only an example
    {
        NSLog(@"Found an Item: %@", item);
    }
}
You might want to reconsider what you are doing here. Ask yourself why you want such an array. If your goal is to manipulate an arbitrarily large collection of integers, you'll likely prefer to use NSIndexSet (and its mutable counterpart).
If you really want to manipulate an NSArray in the most efficient way, you will want to implement a dedicated subclass that is especially optimized for this kind of job.

Getting Top 10 Highest Numbers From Array?

I am having a bit of an issue. I have an NSMutableDictionary with 10 NSMutableArrays in it. Each array has somewhere between 0-10 numbers which could each be any integer, e.g. 12 or 103.
What I need to do is get the top 10 highest numbers from across all of the arrays. The trouble is, I need to keep a reference to the array it came from in the dictionary (the key) and the index position of the number in the array it came from.
Easiest way, is to sort the array in Descending order, and then grab the first 10 indexes
Or if they are inside dictionaries, iterate the dictionary allValues, grab all the arrays, add all the elements inside a common array, and sort that
It seems as if the data structure you want to end up with is an array of objects, where each object is functionally similar to an "index path" except that it's composed of a string (key) and a value (offset).
Assuming that the actual search for highest numbers isn't in question, then I'd suggest creating one of these objects whenever you find a candidate number so that, once the top ten are found, the objects can be used as back-pointers to the numbers' source locations.
Sounds like some sort of homework :)
So you have this:
NSMutableDictionary* source = [@{
    @"1" : @[ @10, @20, @100 … ],
    @"2" : @[ @8, @42, @17 … ]
} mutableCopy];
So let's start by creating another arrangement:
NSMutableArray* numbers = [NSMutableArray new];
for (NSArray* array in source.allValues)
{
    for (NSNumber* number in array)
    {
        [numbers addObject: @{ @"number" : number, @"parent" : array }];
    }
}
This is what we get:
@[
    @{ @"number" : @10, @"parent" : <array> },
    @{ @"number" : @20, @"parent" : <array> },
    …
]
Now we can sort and find the numbers you wanted.
[numbers sortUsingComparator: ^( id lhs, id rhs ){
    return [((NSDictionary*) rhs)[@"number"] compare: ((NSDictionary*) lhs)[@"number"]];
}];
NSArray* topNumbers = [numbers subarrayWithRange: NSMakeRange( 0, 10 )];
Here we are. topNumbers contains the numbers you needed, along with their source arrays.
This is quite a naive way to do it. It can be optimized both in CPU time and memory usage by a fair amount. But hey, keeping it simple is not a bad thing.
Not addressed: what if the tenth and eleventh numbers are equal? (addressed here: Pick Out Specific Number from Array?) Also not addressed: range checks. Not tested, not even compiled. ;)
Walk through the arrays creating an object/structure for each element, consisting of the numeric "key" value and the "path" (array indices) to the element. Sort the objects/structures so created. (This is referred to as a "tag sort".)
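A tag sort over the question's data might look like this in plain C. A hypothetical `Source` struct stands in for one dictionary entry (key plus array), and a `Tag` records the value, the key and the offset. The first ten tags after sorting are the answer. `qsort` is not stable, so the relative order of equal values is unspecified.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one dictionary entry: its key and its numbers. */
typedef struct {
    const char *key;
    const long *values;
    size_t count;
} Source;

/* One "tag": the value plus the path back to where it came from. */
typedef struct {
    long value;
    const char *key;      /* which source array */
    size_t offset;        /* index inside that array */
} Tag;

/* Sorts tags by value, largest first. */
static int compare_tags_desc(const void *a, const void *b)
{
    const Tag *x = a, *y = b;
    return (x->value < y->value) - (x->value > y->value);
}

/* Writes one tag per element of every source into `tags` (which must have
   room for all of them), sorts them, and returns how many were written. */
size_t tag_sort(const Source *sources, size_t nsources, Tag *tags)
{
    assert(sources != NULL || nsources == 0);
    size_t n = 0;
    for (size_t s = 0; s < nsources; s++) {
        for (size_t i = 0; i < sources[s].count; i++) {
            Tag t = { sources[s].values[i], sources[s].key, i };
            tags[n++] = t;
        }
    }
    qsort(tags, n, sizeof tags[0], compare_tags_desc);
    return n;
}
```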
The other approach, if you only need the top N values (where N << total number of entries) is to create an array of N elements, consisting of the above key and path info. Scan through all the arrays and compare each array element to the smallest key of the N currently stored. If you find an element larger than the smallest stored, replace the smallest stored and sort the N elements to select a new smallest stored.
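The fixed-size approach can be sketched in plain C as below. Instead of re-sorting the N kept entries after each replacement, it rescans them for the new smallest, which serves the same purpose for small N. `Entry` and the function names are hypothetical, and source arrays are identified by position rather than by dictionary key.

```c
#include <assert.h>
#include <stddef.h>

/* A candidate value plus its path back to the source. */
typedef struct {
    long value;
    size_t array_index;   /* which source array */
    size_t offset;        /* position inside that array */
} Entry;

/* Index of the smallest value among best[0..count-1]. */
static size_t index_of_smallest(const Entry *best, size_t count)
{
    size_t m = 0;
    for (size_t i = 1; i < count; i++)
        if (best[i].value < best[m].value)
            m = i;
    return m;
}

/* Keeps the n largest values found across `arrays` in best[0..n-1]
   (unordered). Returns how many slots were filled (< n if fewer values). */
size_t top_n_entries(const long *const *arrays, const size_t *counts,
                     size_t narrays, Entry *best, size_t n)
{
    assert(best != NULL || n == 0);
    size_t filled = 0, smallest = 0;
    for (size_t a = 0; a < narrays; a++) {
        for (size_t i = 0; i < counts[a]; i++) {
            Entry e = { arrays[a][i], a, i };
            if (filled < n) {
                best[filled++] = e;               /* still filling the slots */
                if (filled == n)
                    smallest = index_of_smallest(best, n);
            } else if (n > 0 && e.value > best[smallest].value) {
                best[smallest] = e;               /* evict the current smallest */
                smallest = index_of_smallest(best, n);
            }
        }
    }
    return filled;
}
```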
You have to sort your array in descending order using 'C' logic. Here I'm going to give an example based on your conditions:
// adding 20 elements in an array, suppose this is your original array (array1).
NSMutableArray *array1 = [[NSMutableArray alloc] init];
for (int i = 0; i < 20; i++)
{
    NSString *str = [NSString stringWithFormat:@"%d", (i*4)];
    [array1 addObject:str];
}
//make a copy of your original array
NSMutableArray *array2 = [[NSMutableArray alloc] initWithArray:array1];
// this is the array which will get your sorted list
NSMutableArray *array3 = [[NSMutableArray alloc] init];
//declare an integer to hold the current maximum and set it to 0 initially
int max = 0;
// this is the logic to sort an array
for (int i = 0; i < 20; i++)
{
    for (int j = 0; j < [array2 count]; j++)
    {
        int f = [[array2 objectAtIndex:j] intValue];
        if (max < f)
        {
            max = f;
        }
    }
    NSString *str = [[NSNumber numberWithInt:max] stringValue];
    //max holds the maximum value: add it to array3 and remove it from array2
    //before the next pass
    [array3 addObject:str];
    [array2 removeObject:str];
    // set max to 0 again
    max = 0;
}
//now, after the whole procedure, print array3
// and you will get all the objects in descending order;
//you can take the top 10 values from array3
NSLog(@"your sorting array %@", array3);
}

Non-repeating arc4random_uniform

I've been trying to get non-repeating arc4random_uniform to work for ages now for my iPhone app. I've been over all the questions and answers relating to this on Stack Overflow with no luck, and now I'm hoping someone can help me. What I want to do is choose 13 different random numbers between 1 and 104. I have gotten it to work to the point of it choosing 13 numbers, but sometimes two of them are the same.
int rand = arc4random_uniform(104);
This is what I'm doing, and then I'm using the rand to choose from an array. If it's easier to shuffle the array and then pick 13 from the top, then I'll try that, but I would need help on how to, since that seems harder.
Thankful for any advice.
There's no guarantee whatsoever that arc4random_uniform() won't repeat. Think about it for a second: you're asking it to produce a number between 0 and 103. If you do that one hundred and five times, it has no choice but to repeat one of its earlier selections. How could the function know how many times you're going to request a number?
You will either have to check the list of numbers that you've already gotten and request a new one if it's a repeat, or shuffle the array. There should be any number of questions on SO for that. Here's one of the oldest: What's the Best Way to Shuffle an NSMutableArray?.
There are also quite a few questions about non-repeating random numbers: https://stackoverflow.com/search?q=%5Bobjc%5D+non-repeating+random+numbers
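The shuffle approach does not need to shuffle all 104 elements: a partial Fisher-Yates shuffle can stop after the first 13 swaps. Below is a plain-C sketch with hypothetical names. The random source is passed in as a function with the same signature as `arc4random_uniform`. A small xorshift generator is included only so the sketch runs on platforms that lack that function; it is not cryptographic and has a slight modulo bias.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic generator (xorshift32), for tests or platforms
   without arc4random_uniform. Not cryptographic; the % adds slight bias. */
static uint32_t xorshift_state = 2463534242u;

uint32_t xorshift_below(uint32_t bound)
{
    xorshift_state ^= xorshift_state << 13;
    xorshift_state ^= xorshift_state >> 17;
    xorshift_state ^= xorshift_state << 5;
    return xorshift_state % bound;
}

/* Fills out[0..k-1] with k distinct values from 0..n-1 using a partial
   Fisher-Yates shuffle of the scratch array `pool` (length n).
   rand_below(b) must return a uniform value in [0, b). */
void pick_distinct(uint32_t n, uint32_t k, uint32_t *pool, uint32_t *out,
                   uint32_t (*rand_below)(uint32_t))
{
    assert(k <= n);
    for (uint32_t i = 0; i < n; i++)
        pool[i] = i;
    for (uint32_t i = 0; i < k; i++) {
        /* choose among the values not picked yet (positions i..n-1) */
        uint32_t j = i + rand_below(n - i);
        uint32_t tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        out[i] = pool[i];
    }
}
```

On iOS you would pass `arc4random_uniform` itself, for example `pick_distinct(104, 13, pool, out, arc4random_uniform)`. Add 1 to each value if you need the range 1 to 104 rather than array indices.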
You can create an NSMutableSet and implement it like this:
NSMutableArray* numbers = [[NSMutableArray alloc] initWithCapacity: 13];
NSMutableSet* usedValues = [[NSMutableSet alloc] initWithCapacity: 13];
for (int i = 0; i < 13; i++) {
    int randomNum = arc4random_uniform(104);
    // draw again until we get a value that has not been used yet
    while ([usedValues containsObject: [NSNumber numberWithInt: randomNum]]) {
        randomNum = arc4random_uniform(104);
    }
    [usedValues addObject: [NSNumber numberWithInt: randomNum]];
    [numbers addObject: [NSNumber numberWithInt: randomNum]];
}
Alternatively, you can create a mutable array of 104 integers, each a unique one, pick an index with arc4random_uniform([arrayname count]) and then delete that element from the array. Then you'll get a random int each time without repeating (though the smaller the array gets, the easier it is to predict what the next number will be, just simple probability).
The best algorithm that I have found for this exact question is described here:
Algorithm to select a single, random combination of values?
Instead of shuffling an array of 104 elements, you just need to loop through 13 times. Here is my implementation of the algorithm in Objective-C:
// Implementation of the Floyd algorithm from Programming Pearls.
// Returns an NSSet of num_values distinct numbers from 0 to max_value - 1.
static NSSet* getUniqueRandomNumbers(int num_values, int max_value) {
    assert(max_value >= num_values);
    NSMutableSet* set = [NSMutableSet setWithCapacity:num_values];
    for (int i = max_value - num_values; i < max_value; ++i) {
        // draw from 0..i inclusive; arc4random_uniform(i) would never return i itself
        NSNumber* rand = [NSNumber numberWithInt:arc4random_uniform(i + 1)];
        if ([set containsObject:rand]) {
            // collision: i cannot be in the set yet, so take it instead
            [set addObject:[NSNumber numberWithInt:i]];
        } else {
            [set addObject:rand];
        }
    }
    return set;
}

Best way to remove from NSMutableArray while iterating?

In Cocoa, if I want to loop through an NSMutableArray and remove multiple objects that fit a certain criteria, what's the best way to do this without restarting the loop each time I remove an object?
Thanks,
Edit: Just to clarify - I was looking for the best way, e.g. something more elegant than manually updating the index I'm at. For example in C++ I can do:
auto it = someList.begin();
while (it != someList.end())
{
    if (shouldRemove(it))
        it = someList.erase(it);  // erase returns the next valid iterator
    else
        ++it;
}
For clarity I like to make an initial loop where I collect the items to delete. Then I delete them. Here's a sample using Objective-C 2.0 syntax:
NSMutableArray *discardedItems = [NSMutableArray array];
for (SomeObjectClass *item in originalArrayOfItems) {
    if ([item shouldBeDiscarded])
        [discardedItems addObject:item];
}
[originalArrayOfItems removeObjectsInArray:discardedItems];
Then there is no question about whether indices are being updated correctly, or other little bookkeeping details.
Edited to add:
It's been noted in other answers that the inverse formulation should be faster. i.e. If you iterate through the array and compose a new array of objects to keep, instead of objects to discard. That may be true (although what about the memory and processing cost of allocating a new array, and discarding the old one?) but even if it's faster it may not be as big a deal as it would be for a naive implementation, because NSArrays do not behave like "normal" arrays. They talk the talk but they walk a different walk. See a good analysis here:
The inverse formulation may be faster, but I've never needed to care whether it is, because the above formulation has always been fast enough for my needs.
For me the take-home message is to use whatever formulation is clearest to you. Optimize only if necessary. I personally find the above formulation clearest, which is why I use it. But if the inverse formulation is clearer to you, go for it.
One more variation, which gives you readability and good performance:
NSMutableIndexSet *discardedItems = [NSMutableIndexSet indexSet];
SomeObjectClass *item;
NSUInteger index = 0;
for (item in originalArrayOfItems) {
    if ([item shouldBeDiscarded])
        [discardedItems addIndex:index];
    index++;
}
[originalArrayOfItems removeObjectsAtIndexes:discardedItems];
This is a very simple problem. You just iterate backwards:
for (NSInteger i = array.count - 1; i >= 0; i--) {
    ElementType* element = array[i];
    if ([element shouldBeRemoved]) {
        [array removeObjectAtIndex:i];
    }
}
This is a very common pattern.
Some of the other answers would have poor performance on very large arrays, because methods like removeObject: and removeObjectsInArray: involve doing a linear search of the receiver, which is a waste because you already know where the object is. Also, any call to removeObjectAtIndex: will have to copy values from the index to the end of the array up by one slot at a time.
More efficient would be the following:
NSMutableArray *array = ...
NSMutableArray *itemsToKeep = [NSMutableArray arrayWithCapacity:[array count]];
for (id object in array) {
    if (! shouldRemove(object)) {
        [itemsToKeep addObject:object];
    }
}
[array setArray:itemsToKeep];
Because we set the capacity of itemsToKeep, we don't waste any time copying values during a resize. We don't modify the array in place, so we are free to use Fast Enumeration. Using setArray: to replace the contents of array with itemsToKeep will be efficient. Depending on your code, you could even replace the last line with:
[array release];
array = [itemsToKeep retain];
So there isn't even a need to copy values, only swap a pointer.
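Applied to a plain C array, the same "collect what you keep" idea becomes an in-place compaction with a read index and a write index. Every survivor is copied at most once and order is preserved, so the whole pass is O(n). The sketch below uses hypothetical names, `int` elements and an example predicate.

```c
#include <assert.h>
#include <stddef.h>

/* Example predicate (hypothetical): remove every multiple of five. */
int is_multiple_of_five(int x)
{
    return x % 5 == 0;
}

/* Removes the elements for which should_remove() returns nonzero, keeping
   the survivors in their original order. Each survivor is copied at most
   once, so the pass is O(n). Returns the new number of elements. */
size_t remove_matching(int *items, size_t count, int (*should_remove)(int))
{
    assert(items != NULL || count == 0);
    size_t write = 0;
    for (size_t read = 0; read < count; read++) {
        if (!should_remove(items[read]))
            items[write++] = items[read];   /* keep it: slide it down */
    }
    return write;
}
```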
You can use NSPredicate to remove items from your mutable array. This requires no for loops.
For example, if you have an NSMutableArray of names, you can create a predicate like this one:
NSPredicate *caseInsensitiveBNames =
    [NSPredicate predicateWithFormat:@"SELF beginswith[c] 'b'"];
The following line will leave you with an array that contains only names starting with b.
[namesArray filterUsingPredicate:caseInsensitiveBNames];
If you have trouble creating the predicates you need, use this apple developer link.
I did a performance test using 4 different methods. Each test iterated through all elements in a 100,000-element array and removed every 5th item. The results did not vary much with or without optimization. These were done on an iPad 4:
(1) removeObjectAtIndex: -- 271 ms
(2) removeObjectsAtIndexes: -- 1010 ms (because building the index set takes ~700 ms; otherwise this is basically the same as calling removeObjectAtIndex: for each item)
(3) removeObjects: -- 326 ms
(4) make a new array with objects passing the test -- 17 ms
So, creating a new array is by far the fastest. The other methods are all comparable, except that using removeObjectsAtIndexes: will be worse with more items to remove, because of the time needed to build the index set.
Either use loop counting down over indices:
for (NSInteger i = array.count - 1; i >= 0; --i) {
or make a copy with the objects you want to keep.
In particular, do not use a for (id object in array) loop or NSEnumerator.
For iOS 4+ or OS X 10.6+, Apple added the passingTest family of APIs to NSArray, such as -indexesOfObjectsPassingTest:. A solution with such an API would be:
NSIndexSet *indexesToBeRemoved = [someList indexesOfObjectsPassingTest:
^BOOL(id obj, NSUInteger idx, BOOL *stop) {
return [self shouldRemove:obj];
}];
[someList removeObjectsAtIndexes:indexesToBeRemoved];
Nowadays you can use reversed block-based enumeration. A simple example code:
NSMutableArray *array = [@[@{@"name": @"a", @"shouldDelete": @(YES)},
                           @{@"name": @"b", @"shouldDelete": @(NO)},
                           @{@"name": @"c", @"shouldDelete": @(YES)},
                           @{@"name": @"d", @"shouldDelete": @(NO)}] mutableCopy];
[array enumerateObjectsWithOptions:NSEnumerationReverse usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    if ([obj[@"shouldDelete"] boolValue])
        [array removeObjectAtIndex:idx];
}];
Result:
(
{
name = b;
shouldDelete = 0;
},
{
name = d;
shouldDelete = 0;
}
)
Another option, with just one line of code:
[array filterUsingPredicate:[NSPredicate predicateWithFormat:@"shouldDelete == NO"]];
In a more declarative way, depending on the criteria matching the items to remove you could use:
[theArray filterUsingPredicate:aPredicate]
@Nathan should be very efficient
Here's the easy and clean way. I like to duplicate my array right in the fast enumeration call:
for (LineItem *item in [NSArray arrayWithArray:self.lineItems])
{
    if ([item.toBeRemoved boolValue] == YES)
    {
        [self.lineItems removeObject:item];
    }
}
This way you enumerate through a copy of the array being deleted from, both holding the same objects. An NSArray holds object pointers only so this is totally fine memory/performance wise.
Add the objects you want to remove to a second array and, after the loop, use -removeObjectsInArray:.
This should do it:
NSMutableArray* myArray = ....;
int i;
for (i = 0; i < [myArray count]; i++) {
    id element = [myArray objectAtIndex:i];
    if (element == ...) {
        [myArray removeObjectAtIndex:i];
        i--;
    }
}
hope this helps...
Why don't you add the objects to be removed to another NSMutableArray. When you are finished iterating, you can remove the objects that you have collected.
How about swapping the elements you want to delete with the 'n'th element, 'n-1'th element and so on?
When you're done you resize the array to 'previous size - number of swaps'
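Below is a plain-C sketch of the swap-from-the-end idea; it does not preserve order. Because the removed values are discarded anyway, the sketch overwrites each one with the last element instead of swapping, then re-tests the moved element before advancing. The names and the example predicate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Example predicate (hypothetical). */
int is_even(int x)
{
    return x % 2 == 0;
}

/* Removes matching elements by overwriting each one with the current last
   element and shrinking the logical size. O(n), but order is not kept.
   Returns the new number of elements. */
size_t remove_matching_unordered(int *items, size_t count,
                                 int (*should_remove)(int))
{
    assert(items != NULL || count == 0);
    size_t i = 0;
    while (i < count) {
        if (should_remove(items[i]))
            items[i] = items[--count];  /* pull the last element into the gap */
        else
            i++;                        /* only advance past kept elements */
    }
    return count;
}
```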
If all objects in your array are unique or you want to remove all occurrences of an object when found, you could fast enumerate on an array copy and use [NSMutableArray removeObject:] to remove the object from the original.
NSMutableArray *myArray;
NSArray *myArrayCopy = [NSArray arrayWithArray:myArray];
for (NSObject *anObject in myArrayCopy) {
    if (shouldRemove(anObject)) {
        [myArray removeObject:anObject];
    }
}
benzado's answer above is what you should do for performance. In one of my applications removeObjectsInArray: took a running time of 1 minute; just adding to a new array took .023 seconds.
I define a category that lets me filter using a block, like this:
@implementation NSMutableArray (Filtering)
- (void)filterUsingTest:(BOOL (^)(id obj, NSUInteger idx))predicate {
    NSMutableIndexSet *indexesFailingTest = [[NSMutableIndexSet alloc] init];
    NSUInteger index = 0;
    for (id object in self) {
        if (!predicate(object, index)) {
            [indexesFailingTest addIndex:index];
        }
        ++index;
    }
    [self removeObjectsAtIndexes:indexesFailingTest];
    [indexesFailingTest release];
}
@end
which can then be used like this:
[myMutableArray filterUsingTest:^BOOL(id obj, NSUInteger idx) {
    return [self doIWantToKeepThisObject:obj atIndex:idx];
}];
A nicer implementation could be to use the category method below on NSMutableArray.
@implementation NSMutableArray (BMCommons)
- (void)removeObjectsWithPredicate:(BOOL (^)(id obj))predicate {
    if (predicate != nil) {
        NSMutableArray *newArray = [[NSMutableArray alloc] initWithCapacity:self.count];
        for (id obj in self) {
            BOOL shouldRemove = predicate(obj);
            if (!shouldRemove) {
                [newArray addObject:obj];
            }
        }
        [self setArray:newArray];
    }
}
@end
The predicate block can be implemented to do processing on each object in the array. If the predicate returns true, the object is removed.
An example for a date array to remove all dates that lie in the past:
NSMutableArray *dates = ...;
[dates removeObjectsWithPredicate:^BOOL(id obj) {
    NSDate *date = (NSDate *)obj;
    return [date timeIntervalSinceNow] < 0;
}];
Iterating backwards was my favourite for years, but for a long time I never encountered the case where the 'deepest' (highest-count) object was removed first. Momentarily, before the pointer moves on to the next index, there isn't anything there and it crashes.
Benzado's way is the closest to what I do now, but I never realised there would be a stack reshuffle after every remove.
Under Xcode 6 this works:
NSMutableArray *itemsToKeep = [NSMutableArray arrayWithCapacity:[array count]];
for (id object in array)
{
    if ([object isNotEqualTo:@"whatever"]) {
        [itemsToKeep addObject:object];
    }
}
array = nil;
array = [[NSMutableArray alloc] initWithArray:itemsToKeep];