Printing the most frequent words in a file (string) in Objective-C

New to objective-c, need help to solve this:
Write a function that takes two parameters:
1. a String representing a text document, and
2. an integer providing the number of items to return.
Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Use your best judgement to decide how words are separated. Your solution should run in O(n) time where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structures.
What I tried so far (work in progress):
#import <Foundation/Foundation.h>

// Function work in progress
// -(NSString *) wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument ;
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // Get the desktop directory (where the text document is)
        NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];
        // Create full path to the file
        NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];
        // Load the string
        NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
        // Optional code for confirmation - Check that the file is here and print its content to the console
        // NSLog(@" The string is:%@", content);
        // Create an array with the words contained in the string
        NSArray *myWords = [content componentsSeparatedByString:@" "];
        // Optional code for confirmation - Print content of the array to the console
        // NSLog(@"array: %@", myWords);
        // Build an NSCountedSet from the array, then produce an array of word/count pairs
        // sorted in descending order by count.
        NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
        NSMutableArray *dictArray = [NSMutableArray array];
        [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
            [dictArray addObject:@{@"word": obj,
                                   @"count": @([countedSet countForObject:obj])}];
        }];
        NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
    }
    return 0;
}

This is a classic job for map-reduce. I am not very familiar with Objective-C, but as far as I know these concepts are easy to implement in it.
The first map-reduce step counts the number of occurrences.
It groups the elements by word and then counts each group.
map(text):
    for each word in text:
        emit(word, '1')
reduce(word, list<number>):
    emit(word, sum(number))
An alternative to map-reduce is a single iterative pass with a hash map used as a histogram that counts the number of occurrences per word.
Once you have a list of words and their occurrence counts, all that is left is to get the top k out of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
Here, the 'comparator' is the number of occurrences of each word, as calculated in the previous step.
The basic idea is to use a min-heap and store the first k elements in it.
Now iterate over the remaining elements, and if the new one is bigger than the top (the minimal element in the heap), remove the top and replace it with the new element.
At the end, the heap contains the k largest elements. Popping them one by one yields them in ascending order, so reverse that order to get the most frequent word first.
Complexity is O(n log k).
To achieve O(n + k log k) you may use a selection algorithm instead of the min-heap solution to get the top k, and then sort the retrieved elements.
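As a rough sketch of the histogram-then-top-k idea in Objective-C: the function below counts words with an NSCountedSet and then, instead of the min-heap or selection algorithm described above, buckets the words by count and walks the buckets from the highest count down, which keeps the work linear in the input size. The name MostFrequentWords and the rule that any non-alphanumeric character separates words are assumptions made for illustration, not something the question specifies.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name and word-splitting rule are assumptions):
// returns up to `limit` lowercase words, most frequent first.
static NSArray<NSString *> *MostFrequentWords(NSString *text, NSUInteger limit) {
    // Assumption: any run of non-alphanumeric characters separates words,
    // so "don't" is counted as "don" and "t".
    NSCharacterSet *separators = [[NSCharacterSet alphanumericCharacterSet] invertedSet];
    NSCountedSet<NSString *> *counts = [NSCountedSet set];
    NSUInteger maxCount = 0;
    for (NSString *piece in [text.lowercaseString componentsSeparatedByCharactersInSet:separators]) {
        if (piece.length == 0) continue; // adjacent separators yield empty strings
        [counts addObject:piece];
        maxCount = MAX(maxCount, [counts countForObject:piece]);
    }

    // Bucket the distinct words by how often they occurred.
    NSMutableDictionary<NSNumber *, NSMutableArray<NSString *> *> *buckets =
        [NSMutableDictionary dictionary];
    for (NSString *word in counts) {
        NSNumber *count = @([counts countForObject:word]);
        if (buckets[count] == nil) buckets[count] = [NSMutableArray array];
        [buckets[count] addObject:word];
    }

    // Walk from the highest count down until `limit` words are collected.
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    for (NSUInteger c = maxCount; c > 0 && result.count < limit; c--) {
        for (NSString *word in buckets[@(c)]) {
            if (result.count == limit) break;
            [result addObject:word];
        }
    }
    return result;
}
```

Words with the same count come out in no particular order, so a production version would probably want an explicit tie-breaking rule.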

Related

Find string in array

I have a fun challenging problem. So I have a mutable array that contains all of my items. I have a textfield that might have one or two of these items if the person types them in.
items = [[NSArray alloc] initWithObjects:@"apple", @"orange", @"pear", nil];
items2 = [[NSArray alloc] initWithObjects:@"cheese", @"milk", @"eggs", nil];
Allitems = [NSMutableArray array];
[Allitems addObjectsFromArray:items];
[Allitems addObjectsFromArray:items2];
NSArray *WORDS = [Textfield componentsSeparatedByString:@" "];
I am trying to detect which specific words from Allitems are in the textfield. (If the textfield contains any string from Allitems, how can I find which specific string?)
for (int i = 0; i < [Allitems count]; i++)
{
NSString *grabstring;
grabstring=[Allitems objectAtIndex:i];
if (textfield isEqualto:grabstring){
?????
pull that specific string from allitems.
}
}
You want the intersection of two sets:
NSMutableSet* intersectionSet = [NSMutableSet setWithArray:Allitems];
[intersectionSet intersectSet:[NSSet setWithArray:WORDS]];
NSArray* intersectionArray = [intersectionSet allObjects];
After this intersectionArray contains the items that are present in both Allitems and WORDS.
BTW, why do you capitalise variable names in a non-standard and inconsistent manner? Why not just allItems and words?
As @Arkku suggests: it's better to switch the arrays. In your example it does not matter much, but in case Allitems were (very) big, you can save (a lot of) memory and CPU usage:
NSMutableSet* intersectionSet = [NSMutableSet setWithArray:WORDS];
[intersectionSet intersectSet:[NSSet setWithArray:Allitems]];
NSArray* intersectionArray = [intersectionSet allObjects];
There are various ways of doing it, each with different pros and cons. Let's have the following (consistently capitalized) variables in common for each case:
NSArray *allItems = @[ @"apple", @"orange", @"pear", @"cheese", @"milk", @"egg" ];
NSString *textFieldText = @"CHEESE ham pear";
NSArray *words = [textFieldText.lowercaseString componentsSeparatedByString:@" "];
NSPredicate
NSArray *matchingItems = [allItems filteredArrayUsingPredicate:
    [NSPredicate predicateWithFormat:@"SELF IN %@", words]];
This is perhaps the shortest (in lines of code) way, but not the most performant if allItems can be very long as it requires traversing all of it.
Iteration
Of course you could also simply iterate over the collection and do the matching manually:
NSMutableArray *matchingItems = [NSMutableArray array];
for (NSString *item in allItems) {
if ([words containsObject:item]) {
[matchingItems addObject:item];
}
}
Again requires traversing all of allItems (although you could break the iteration if all words are matched).
In addition to the for loop there are of course many other ways for iteration, e.g., enumerateObjectsUsingBlock:, but they are unlikely to have any advantage here.
NSSet
NSSet is often a good option for this kind of matching since testing set membership is faster than with NSArray. However, if using the most straightforward method intersectSet: (in NSMutableSet), care must be taken not to inadvertently create a large mutable set only to discard most of its items.
If the order of allItems does not matter, the best way would be to change it from an array into a set and always keep that set around, i.e., instead of creating the array allItems, you would create an NSSet:
NSSet *setOfAllItems = [NSSet setWithArray:allItems];
Or if it needs to be mutable:
NSMutableSet *setOfAllItems = [NSMutableSet set];
[setOfAllItems addObjectsFromArray:items1];
[setOfAllItems addObjectsFromArray:items2];
Then, when you have that set, you create a temporary mutable set out of words (which is presumably always the smaller set):
NSMutableSet *setOfMatches = [NSMutableSet setWithArray:words];
[setOfMatches intersectSet:setOfAllItems];
NSArray *matchingItems = setOfMatches.allObjects;
This would likely be the most performant solution if setOfAllItems is large, but note that the matches will then need to be exact. The other methods are more easily adapted to things like matching the strings in words against fields of objects or keys in a dictionary (and returning the matched objects rather than the strings). In such a case one possibility to consider would be an NSDictionary mapping the words to match to the objects to return (it is also fast to then iterate over words and test for membership in the dictionary).
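The dictionary idea could look roughly like the sketch below. The helper name ObjectsMatchingWords and the assumption that the index keys are stored in lowercase are mine, purely for illustration; the objects can be anything you want returned for a given keyword.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: `index` maps a lowercase keyword to the object
// that should be returned when that keyword is typed.
static NSArray *ObjectsMatchingWords(NSArray<NSString *> *words,
                                     NSDictionary<NSString *, id> *index) {
    NSMutableArray *matches = [NSMutableArray array];
    for (NSString *word in words) {
        // Dictionary lookup is a hash lookup, so the cost depends on the
        // number of typed words, not on the size of the index.
        id object = index[word.lowercaseString];
        if (object != nil) [matches addObject:object];
    }
    return matches;
}
```

Matches come back in the order the words were typed, and a word typed twice is returned twice; wrap the result in an NSOrderedSet if that is not wanted.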
Conversion to string
And, since the question included conversion of matches to a string:
[matchingItems componentsJoinedByString:@", "]
In the example case this would result in the string "pear, cheese" (or possibly "cheese, pear" if using sets).

Optimise searching in an array, search by comparison of 2 strings in Objective C

I have a list of contacts retrieved from Address book stored inside a MutableArray contactList. Each contact is an object which has properties like "contactName, contactImage.... etc".
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0),^{
//getAllContacts is a method which returns a Mutable array of Objects
self.contactList = [NSMutableArray arrayWithArray:[instance getAllContacts]];
//groupLetterToLoad could be "DEF"
for(int j=0; j<self.groupLetterToLoad.length;j++) {
//1st iteration D, 2nd iteration E and 3rd iteration F
NSString *testChar = [NSString stringWithFormat:@"%c",[self.groupLetterToLoad characterAtIndex:j]];
//check D,E,F with contact name property's first letter of the contact list array
for(int i=0;i<self.contactList.count;i++) {
NSString *firstChar =[[[self.contactList objectAtIndex:i] contactName] substringToIndex:1];
if([testChar isEqualToString: firstChar]) {
pos=i; //retrieve the index of the matched position
break;
}
}
if(pos!=-1) break;
}
});
Now this has two for loops (Time O(n^2)).. The disadvantage here is, if the groupLetterToLoad is "WXYZ", then comparison will start from W with A to W with Z.. How can I optimise it?
Ordering your array by contactName and performing a half-interval (binary) search will reduce your complexity greatly if you can avoid sorting every time you search (hint: keep [instance getAllContacts] sorted).
http://rosettacode.org/wiki/Binary_search#Objective-C is a starting point. You could replace the compare: with your first-character comparison.
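A hand-rolled half-interval search along those lines might look like the sketch below. It assumes the contact names have already been pulled into an array of non-empty strings sorted ascending by first character; FirstIndexWithInitial is a made-up name, and plain strings stand in for the contact objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper. Assumes `sortedNames` is sorted ascending by first
// character and contains no empty strings. Returns the index of the first
// name starting with `initial`, or NSNotFound.
static NSUInteger FirstIndexWithInitial(NSArray<NSString *> *sortedNames, unichar initial) {
    NSUInteger low = 0;
    NSUInteger high = sortedNames.count;
    // Lower-bound search: shrink [low, high) until low is the first
    // position whose initial is not smaller than the one we want.
    while (low < high) {
        NSUInteger mid = low + (high - low) / 2;
        if ([sortedNames[mid] characterAtIndex:0] < initial) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    if (low < sortedNames.count && [sortedNames[low] characterAtIndex:0] == initial) {
        return low;
    }
    return NSNotFound;
}
```

For a group such as "DEF" you would call this once per letter and stop at the first result that is not NSNotFound, which costs O(log n) per letter instead of a full scan.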
This isn't an algorithmic improvement, but the way you're handling characters is about the slowest way possible. If your group letters are really ASCII letters as you indicate, try this (I include the "if" in my answer because doing correct comparison of non-ASCII is really best left up to NSString):
1) Instead of using -substringToIndex to get the first character, use -characterAtIndex:0 and store a unichar
2) Instead of using +stringWithFormat:@"%c" to make a single-character string, just use -characterAtIndex: and store it in a unichar
3) Instead of using -isEqualToString:, use == on the unichars
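Put together, the three changes might look like the sketch below. It keeps the original loop order (letters outside, contacts inside) and only swaps the string handling for unichar comparisons; the function name and the use of plain name strings instead of contact objects are assumptions for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the index of the first name whose first
// character equals one of the group letters (tried in order), or NSNotFound.
// Only sensible when the letters are plain ASCII, as noted above.
static NSUInteger FirstIndexMatchingAnyInitial(NSArray<NSString *> *names,
                                               NSString *groupLetters) {
    for (NSUInteger j = 0; j < groupLetters.length; j++) {
        unichar wanted = [groupLetters characterAtIndex:j];
        for (NSUInteger i = 0; i < names.count; i++) {
            NSString *name = names[i];
            // Guard against empty names; characterAtIndex:0 would throw on them.
            if (name.length > 0 && [name characterAtIndex:0] == wanted) {
                return i;
            }
        }
    }
    return NSNotFound;
}
```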
Unrelated, I'm pretty suspicious of the thread-safety of this. Are all those properties on self and instance you're accessing really not accessed on any other queue or thread?

Parsing text from one array into another array in Objective C

I created an array called NSArray citiesList from a text file separating each object by the "," at the end of the line. Here is what the raw data looks like from the text file.
City:San Jose|County:Santa Clara|Pop:945942,
City:San Francisco|County:San Francisco|Pop:805235,
City:Oakland|County:Alameda|Pop:390724,
City:Fremont|County:Alameda|Pop:214089,
City:Santa Rosa|County:Sonoma|Pop:167815,
The citiesList array is fine (I can count the objects, see the data, etc.). Now I want to parse out the City: and Pop: values in each of the array objects. I assume that you create a for loop to run through the objects, so if I wanted to create a mutable array called cityNames to populate just the city names into this array I would use this kind of for loop:
NSMutableArray *cityNames = [NSMutableArray array];
for (NSString *i in citiesList) {
[cityNames addObject:[???]];
}
My question is: what query should I use to find just the City: San Francisco part of the objects in my array?
You can continue to use componentsSeparatedByString to divide up the sections and key/value pairs. Or you can use an NSScanner to read through the string parsing out the key/value pairs. You could use rangeOfString to find the "|" and then extract a range. So many options.
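As one possible take on the NSScanner option: the sketch below skips the "City:" key and reads everything up to the next "|". The function name CityNameFromLine is hypothetical, and it assumes the City field comes first on the line, as in the sample data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the city name from a line such as
// "City:San Jose|County:Santa Clara|Pop:945942," or nil if the line
// does not start with a City field.
static NSString *CityNameFromLine(NSString *line) {
    NSScanner *scanner = [NSScanner scannerWithString:line];
    NSString *city = nil;
    // Consume the key, then capture everything before the field separator.
    if ([scanner scanString:@"City:" intoString:NULL] &&
        [scanner scanUpToString:@"|" intoString:&city]) {
        return city;
    }
    return nil;
}
```

Inside the for loop this would be used as `NSString *name = CityNameFromLine(i); if (name) [cityNames addObject:name];`, which also skips malformed lines instead of crashing on them.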
Many good suggestions in the answers here in case you really want to construct an algorithm to parse the string.
As an alternative to that, you can also look at it as a problem of declaring the structure of the data and then just have the system do the parsing. For a case like yours, regular expressions will do that nicely. Whether you prefer to do it one way or the other is largely a question of taste and coding standards.
In your specific case (if the city name is all you need to extract from the string), then also notice that there is a bit of a shortcut available that will turn it into a one-line solution: Match the whole string, define a single capture group and substitute that one to make a new string:
NSString *city = [i stringByReplacingOccurrencesOfString: @".*City:(.*?)\\|.*"
                                              withString: @"$1"
                                                 options: NSRegularExpressionSearch
                                                   range: NSMakeRange(0, i.length)];
The variable i is the same one you defined in your for loop, i.e. a string representing one line of your input file:
City:San Jose|County:Santa Clara|Pop:945942,
I have added the initial .* to make the pattern robust to future new fields added to the rows. You can remove it if you don't like it.
The $1 in the substitution string represents the first capture group, i.e. the parenthesis in the regex pattern. In this specific case, the substring containing the city name. Had there been more capture groups, they would have been named $2-$9. You can check the documentation on NSRegularExpression and NSString if you want to know more.
Regular expressions are a topic all of their own, not confined to Cocoa, although every platform's regex implementation has its own idiosyncrasies.
You want to use componentsSeparatedByString: as below. (These lines do no error checking)
NSArray *fields = [i componentsSeparatedByString:@"|"];
NSString *city = [[[fields objectAtIndex:0] componentsSeparatedByString:@":"] objectAtIndex:1];
NSString *county = [[[fields objectAtIndex:1] componentsSeparatedByString:@":"] objectAtIndex:1];
If you can drop the keys, and a couple delimiters like this:
San Jose|Santa Clara|945942
San Francisco|San Francisco|805235
Oakland|Alameda|390724
Fremont|Alameda|214089
Santa Rosa|Sonoma|167815
Then you can simplify the code (still no error checking):
NSArray *fields = [i componentsSeparatedByString:@"|"];
NSString *city = [fields objectAtIndex:0];
NSString *county = [fields objectAtIndex:1];
for (NSString *i in citiesList) {
// Divide each city into an array, where object 0 is the name, 1 is county, 2 is pop
NSArray *stringComponents = [i componentsSeparatedByString:@"|"];
// Remove "City:" from string and add the city name to the array
NSString *cityName = [[stringComponents objectAtIndex:0] stringByReplacingCharactersInRange:NSMakeRange(0, 5) withString:@""];
[cityNames addObject:cityName];
}

best way to populate NSArray in this algorithm

I intend to make a program that does the following:
Create an NSArray populated with numbers from 1 to 100,000.
Loop over some code that deletes certain elements of the NSArray when certain conditions are met.
Store the resultant NSArray.
However the above steps will also be looped over many times and so I need a fast way of making this NSArray that has 100,000 number elements.
So what is the fastest way of doing it?
Is there an alternative to iteratively populating an Array using a for loop? Such as an NSArray method that could do this quickly for me?
Or perhaps I could make the NSArray with the 100,000 numbers by any means the first time, and then create every new NSArray (for step 1) using the method arrayWithArray:? (Is that a quicker way of doing it?)
Or perhaps you have something completely different in mind that will achieve what I want.
edit: replace NSArray with NSMutableArray in above post
It is difficult to tell in advance which method will be the fastest. I like the block based functions, e.g.
NSMutableArray *array = ...; // your mutable array
NSIndexSet *toBeRemoved = [array indexesOfObjectsPassingTest:^BOOL(NSNumber *num, NSUInteger idx, BOOL *stop) {
// Block is called for each number "num" in the array.
// return YES if the element should be removed and NO otherwise;
}];
[array removeObjectsAtIndexes:toBeRemoved];
You should probably start with a correctly working algorithm and then use Instruments for profiling.
You may want to look at NSMutableIndexSet. It is designed to efficiently store ranges of numbers.
You can initialize it like this:
NSMutableIndexSet *set = [[NSMutableIndexSet alloc]
initWithIndexesInRange:NSMakeRange(1, 100000)];
Then you can remove, for example, 123 from it like this:
[set removeIndex:123];
Or you can remove 400 through 409 like this:
[set removeIndexesInRange:NSMakeRange(400, 10)];
You can iterate through all of the remaining indexes in the set like this:
[set enumerateIndexesUsingBlock:^(NSUInteger i, BOOL *stop) {
    NSLog(@"set still includes %lu", (unsigned long)i);
}];
or, more efficiently, like this:
[set enumerateRangesUsingBlock:^(NSRange range, BOOL *stop) {
    NSLog(@"set still includes %lu indexes starting at %lu",
          (unsigned long)range.length, (unsigned long)range.location);
}];
I'm quite certain it will be fastest to create the array using a c array, then creating an NSArray from that (benchmark coming soon). Depending on how you want to delete the numbers, it may be fastest to do that in the initial loop:
const int max_num = 100000;
...
id *nums = malloc(max_num * sizeof(*nums));
int c = 0;
for(int i = 1; i <= max_num; i++) {
    if(!should_skip(i)) nums[c++] = @(i);
}
NSArray *nsa = [NSArray arrayWithObjects:nums count:c];
free(nums); // the NSArray retains its elements, so the temporary C buffer can be freed
First benchmark was somewhat surprising. For 100M objects:
NSArray alloc init: 8.6s
NSArray alloc initWithCapacity: 8.6s
id *nums: 6.4s
So an array is faster, but not by as much as I expected.
You can use fast enumeration to search through the array.
for (NSNumber *item in myArrayOfNumbers)
{
    if (someCondition) // placeholder for your own test
    {
        NSLog(@"Found an Item: %@", item);
    }
}
You might want to reconsider what you are doing here. Ask yourself why you want such an array. If your goal is to manipulate an arbitrarily large collection of integers, you'll likely prefer to use NSIndexSet (and its mutable counterpart).
If you really want to manipulate a NSArray in the most efficient way, you will want to implement a dedicated subclass that is especially optimized for this kind of job.

Is there a way to get Spell Check data from an NSString?

I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString, that returns an NSArray, all of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
for (int i = 0; i < ALPHABET; ++i) {
NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
[theDecryptions insertObject:theNewPlainText
atIndex:i];
}
return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest misspelled words at lower indices, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, then spell check each with NSSpellChecker's -checkSpellingOfString:startingAt: and use the number of correct words to sort the array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?
Well, I found a solution that works using UIKit/UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure if it includes learned words in the actual rangeOfMisspelledWordInString:... method. If it doesn't, calling +[UITextChecker hasLearnedWord:] on currentWord inside the bottom if statement should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWordInString:... with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
NSArray* words = [message componentsSeparatedByString:@" "];
NSInteger wordsFound = 0;
UITextChecker* checker = [[UITextChecker alloc] init];
//get the first language in the checker's memory- this is the user's
//preferred language.
//TODO: May want to search with every language (or top few) in the array
NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
//for each word in the array, determine whether it is a valid word
for(NSString* currentWord in words){
NSRange range;
range = [checker rangeOfMisspelledWordInString:currentWord
range:NSMakeRange(0, [currentWord length])
startingAt:0
wrap:NO
language:preferredLang];
//if it is valid (no errors found), increment wordsFound
if (range.location == NSNotFound) {
//NSLog(@"%@ %@", @"Valid Word found:", currentWord);
wordsFound++;
}
else {
//NSLog(@"%@ %@", @"Invalid Word found:", currentWord);
}
}
//After all "words" have been searched, save wordsFound to validWordCount
[self setValidWordCount:wordsFound];
[checker release];
}