How to get the first N words from an NSString in Objective-C?

What's the simplest way, given a string:
NSString *str = @"Some really really long string is here and I just want the first 10 words, for example";
to result in an NSString with the first N (e.g., 10) words?
EDIT: I'd also like to make sure it doesn't fail if the str is shorter than N.

If the words are space-separated:
NSInteger nWords = 10;
NSRange wordRange = NSMakeRange(0, nWords);
// Note: subarrayWithRange: raises NSRangeException if str has fewer than nWords words.
NSArray *firstWords = [[str componentsSeparatedByString:@" "] subarrayWithRange:wordRange];
If you want to break on all whitespace:
NSCharacterSet *delimiterCharacterSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSArray *firstWords = [[str componentsSeparatedByCharactersInSet:delimiterCharacterSet] subarrayWithRange:wordRange];
Then,
NSString *result = [firstWords componentsJoinedByString:@" "];

While Barry Wark's code works well for English, it is not the preferred way to detect word breaks. Many languages, such as Chinese and Japanese, do not separate words using spaces. And German, for example, has many compounds that are difficult to separate correctly.
What you want to use is CFStringTokenizer:
CFStringRef string; // Get string from somewhere
CFLocaleRef locale = CFLocaleCopyCurrent();
CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, string, CFRangeMake(0, CFStringGetLength(string)), kCFStringTokenizerUnitWord, locale);
CFStringTokenizerTokenType tokenType = kCFStringTokenizerTokenNone;
unsigned tokensFound = 0, desiredTokens = 10; // or the desired number of tokens
while(kCFStringTokenizerTokenNone != (tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)) && tokensFound < desiredTokens) {
CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
CFStringRef tokenValue = CFStringCreateWithSubstring(kCFAllocatorDefault, string, tokenRange);
// Do something with the token
CFShow(tokenValue);
CFRelease(tokenValue);
++tokensFound;
}
// Clean up
CFRelease(tokenizer);
CFRelease(locale);
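If you would rather stay in Objective-C, NSString's enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords also walks word boundaries instead of splitting on spaces. Here is a minimal sketch, assuming you want the original text up to the end of the Nth word. The helper name FirstWordsPrefix is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the part of `str` that ends with its
// maxWords-th word. If `str` has fewer words, the result ends with its last word.
static NSString *FirstWordsPrefix(NSString *str, NSUInteger maxWords) {
    if (maxWords == 0) return @"";
    __block NSUInteger count = 0;  // words seen so far
    __block NSUInteger end = 0;    // index just past the last word seen
    [str enumerateSubstringsInRange:NSMakeRange(0, str.length)
                            options:NSStringEnumerationByWords
                         usingBlock:^(NSString *word, NSRange wordRange,
                                      NSRange enclosingRange, BOOL *stop) {
        end = NSMaxRange(wordRange);
        if (++count >= maxWords) *stop = YES;  // enough words, stop early
    }];
    return [str substringToIndex:end];
}
```

Because it cuts the original string instead of re-joining tokens, whatever separated the words (spaces, punctuation) is kept as it was.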

Based on Barry's answer, I wrote a function for the sake of this page (still giving him credit on SO)
+ (NSString *)firstWords:(NSString *)theStr howMany:(NSInteger)maxWords {
NSArray *theWords = [theStr componentsSeparatedByString:@" "];
// Clamp so the range never runs past the end of the array.
if ((NSInteger)[theWords count] < maxWords) {
maxWords = [theWords count];
}
// The range length is the number of words to keep, so no "- 1" here.
NSRange wordRange = NSMakeRange(0, maxWords);
NSArray *firstWords = [theWords subarrayWithRange:wordRange];
return [firstWords componentsJoinedByString:@" "];
}

Here's my solution, derived from the answers given here, for my own problem of removing the first word from a string...
NSMutableArray *words = [NSMutableArray arrayWithArray:[lowerString componentsSeparatedByString:@" "]];
[words removeObjectAtIndex:0];
return [words componentsJoinedByString:@" "];

Related

Take all numbers separated by spaces from a string and place in an array

I have a NSString formatted like this:
"Hello world 12 looking for some 56"
I want to find all instances of numbers separated by whitespace and place them in an NSArray. I don't want to remove the numbers though.
What's the best way of achieving this?
This is a solution using a regular expression, as suggested in the comments.
NSString *string = @"Hello world 12 looking for some 56";
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:0 error:nil];
NSArray *matches = [expression matchesInString:string options:0 range:(NSMakeRange(0, string.length))];
NSMutableArray *result = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in matches) {
[result addObject:[string substringWithRange:match.range]];
}
NSLog(@"%@", result);
First, split the string into an array with NSString's componentsSeparatedByString: method (see this SO question). Then iterate over the array and check whether each element is a number, as described in this SO question: Checking if NSString is Integer.
I don't know where you plan to perform this action; depending on the string size it may not be fast (for example, if it is called for every table cell, scrolling may be choppy).
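That split-and-check approach might look roughly like the sketch below. It assumes the numbers are plain integers, and it uses NSScanner for the integer check, which may differ from the method in the linked answer. The function name is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every whitespace-separated token of `str`
// that is entirely an integer, boxed as NSNumber, in order of appearance.
static NSArray *WhitespaceSeparatedIntegers(NSString *str) {
    NSMutableArray *numbers = [NSMutableArray array];
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *token in [str componentsSeparatedByCharactersInSet:ws]) {
        NSScanner *scanner = [NSScanner scannerWithString:token];
        scanner.charactersToBeSkipped = nil;
        NSInteger value = 0;
        // Accept the token only if the integer consumes all of it,
        // so "12" passes but "12abc" and "" do not.
        if ([scanner scanInteger:&value] && [scanner isAtEnd]) {
            [numbers addObject:@(value)];
        }
    }
    return [numbers copy];
}
```

The original string is left untouched, so the numbers are not removed from it.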
Code:
+ (NSArray *)getNumbersFromString:(NSString *)str {
NSMutableArray *retVal = [NSMutableArray array];
NSCharacterSet *numericSet = [NSCharacterSet decimalDigitCharacterSet];
NSMutableString *placeholder = [NSMutableString string];
for (NSUInteger i = 0; i < [str length]; i++) {
unichar currentChar = [str characterAtIndex:i];
if ([numericSet characterIsMember:currentChar]) {
// Still inside a run of digits: extend the current number.
[placeholder appendFormat:@"%C", currentChar];
} else if ([placeholder length] > 0) {
// A non-digit ends the run: box the value and start over.
[retVal addObject:@([placeholder intValue])];
[placeholder setString:@""];
}
}
// Flush a number that runs to the end of the string.
if ([placeholder length] > 0) [retVal addObject:@([placeholder intValue])];
return [retVal copy];
}
To explain what is happening above, I am essentially:
going through every character until I find a digit
appending that digit, and any digits directly after it, to a string
once the run of digits ends, adding that number to an array
Hope this helps; please ask for clarification if needed.

Check if NSString only contains one character repeated

I want to know a simple and fast way to determine if all characters in an NSString are the same.
For example:
NSString *string = @"aaaaaaaaa";
=> return YES
NSString *string = @"aaaaaaabb";
=> return NO
I know that I can achieve it by using a loop, but my NSString is long so I prefer a shorter and simpler way.
You can use this: replace every occurrence of the first character with an empty string and check the length:
-(BOOL)sameCharsInString:(NSString *)str{
if ([str length] == 0 ) return NO;
return [[str stringByReplacingOccurrencesOfString:[str substringToIndex:1] withString:@""] length] == 0 ? YES : NO;
}
Here are two possibilities that fail as quickly as possible and don't (explicitly) create copies of the original string, which should be advantageous since you said the string was large.
First, use NSScanner to repeatedly try to read the first character in the string. If the loop ends before the scanner has reached the end of the string, there are other characters present.
NSScanner * scanner = [NSScanner scannerWithString:s];
NSString * firstChar = [s substringWithRange:[s rangeOfComposedCharacterSequenceAtIndex:0]];
while( [scanner scanString:firstChar intoString:NULL] ) continue;
BOOL stringContainsOnlyOneCharacter = [scanner isAtEnd];
Regex is also a good tool for this problem, since "a character followed by any number of repetitions of that character" is very simply expressed with a single back reference:
// Match one of any character at the start of the string,
// followed by any number of repetitions of that same character
// until the end of the string.
NSString * patt = @"^(.)\\1*$";
NSRegularExpression * regEx =
[NSRegularExpression regularExpressionWithPattern:patt
options:0
error:NULL];
NSArray * matches = [regEx matchesInString:s
options:0
range:(NSRange){0, [s length]}];
BOOL stringContainsOnlyOneCharacter = ([matches count] == 1);
Both these options correctly deal with multi-byte and composed characters; the regex version also does not require an explicit check for the empty string.
Use this loop:
NSString *firstChar = [str substringWithRange:NSMakeRange(0, 1)];
for (int i = 1; i < [str length]; i++) {
NSString *ch = [str substringWithRange:NSMakeRange(i, 1)];
if(![ch isEqualToString:firstChar])
{
return NO;
}
}
return YES;
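Another short option is to trim the first character off both ends and see whether anything is left. This is only a sketch: it builds a trimmed copy, so it is not the cheapest choice for very long strings, and it assumes the repeated character is a single code point (a composed sequence such as a letter plus a combining accent would be treated as a set of separate characters). The function name is made up.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `str` is non-empty and every character
// equals the first one.
static BOOL IsSingleRepeatedCharacter(NSString *str) {
    if (str.length == 0) return NO;
    NSRange first = [str rangeOfComposedCharacterSequenceAtIndex:0];
    NSCharacterSet *set =
        [NSCharacterSet characterSetWithCharactersInString:[str substringWithRange:first]];
    // Trimming stops at the first character that is not in the set, so a
    // non-empty remainder means some other character is present.
    return [str stringByTrimmingCharactersInSet:set].length == 0;
}
```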

Get a substring from an NSString until arriving at any letter in an NSArray

I am trying to parse a set of words that contain first Greek letters, then English letters. This would be easy if there were a delimiter between the two sets. This is what I've built so far:
- (void)loadWordFileToArray:(NSBundle *)bundle {
NSLog(@"loadWordFileToArray");
if (bundle != nil) {
NSString *path = [bundle pathForResource:@"alfa" ofType:@"txt"];
//pull the content from the file into memory
NSData* data = [NSData dataWithContentsOfFile:path];
//convert the bytes from the file into a string
NSString* string = [[NSString alloc] initWithBytes:[data bytes]
length:[data length]
encoding:NSUTF8StringEncoding];
//split the string around newline characters to create an array
NSString* delimiter = @"\n";
incomingWords = [string componentsSeparatedByString:delimiter];
NSLog(#"incomingWords count: %lu", (unsigned long)incomingWords.count);
}
}
-(void)parseWordArray{
NSLog(@"parseWordArray");
NSString *seperator = @" = ";
int i = 0;
for (i=0; i < incomingWords.count; i++) {
NSString *incomingString = [incomingWords objectAtIndex:i];
NSScanner *scanner = [NSScanner localizedScannerWithString: incomingString];
NSString *firstString;
NSString *secondString;
NSInteger scanPosition;
[scanner scanUpToString:seperator intoString:&firstString];
scanPosition = [scanner scanLocation];
secondString = [[scanner string] substringFromIndex:scanPosition+[seperator length]];
// NSLog(@"greek: %@", firstString);
// NSLog(@"english: %@", secondString);
[outgoingWords insertObject:[NSMutableArray arrayWithObjects:@"greek", firstString, @"english",secondString,@"category", @"", nil] atIndex:0];
[englishWords insertObject:[NSMutableArray arrayWithObjects:secondString,nil] atIndex:0];
}
}
But I cannot count on there being delimiters.
I have looked at this question. I want something similar. This would be: grab the characters in the string until an english letter is found. Then take the first group to one new string, and all the characters after to a second new string.
I only have to run this a few times, so optimization is not my highest priority. Any help would be appreciated.
EDIT:
I've changed my code as shown below to make use of NSLinguisticTagger. This works, but is this the best way? Note that, for some reason, the language reported for the English characters is "und".
The incoming string is: άγαλμα, το statue. Only the last 6 characters are in English.
int j = 0;
for (j=0; j<incomingString.length; j++) {
NSString *language = [tagger tagAtIndex:j scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
if ([language isEqual: @"und"]) {
NSLog(@"j is: %i", j);
int k = 0;
for (k=0; k<j; k++) {
NSRange range = NSMakeRange (0, k);
NSString *tempString = [incomingString substringWithRange:range ];
NSLog (@"tempString: %@", tempString);
}
return;
}
NSLog (@"Language: %@", language);
}
Alright so what you could do is use NSLinguisticTagger to find out the language of the word (or letter) and if the language has changed then you know where to split the string. You can use NSLinguisticTagger like this:
NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options: NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace];
[tagger setString:@"This is my string in English."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
//Loop through each index of the string's characters and check the language as above.
//If it has changed then you can assume the language has changed.
Alternatively, you can use NSSpellChecker's requestCheckingOfString to get the dominant language in a range of characters:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
range:(NSRange){0, [spellCheckText length]}
types:NSTextCheckingTypeOrthography
options:nil
inSpellDocumentWithTag:0
completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
This answer has information on how to detect the language of an NSString.
Allow me to introduce two good friends of mine.
NSCharacterSet and NSRegularExpression.
Along with them, normalization. (In Unicode terms)
First, you should normalize strings before analyzing them against a character set.
You will need to look at the choices, but normalizing to all composed forms is the way I would go.
This means an accented character is one instead of two or more.
It simplifies the number of things to compare.
Next, you can easily build your own NSCharacterSet objects from strings (loaded from files even) to use to test set membership.
Lastly, regular expressions can achieve the same thing with Unicode Property Names as classes or categories of characters. Regular expressions could be more terse but more expressive.
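As a rough sketch of the regular-expression route: the ICU engine behind NSRegularExpression accepts Unicode script properties, so the split point can be found by searching for the first Latin-script letter after normalizing to composed form. This assumes each line is Greek text followed by English text. SplitAtFirstLatinLetter is a made-up helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `line` at its first Latin-script letter and
// returns @[head, tail]. If there is no Latin letter, tail is @"".
static NSArray *SplitAtFirstLatinLetter(NSString *line) {
    // Normalize to composed form (NFC) so an accented letter is one character.
    NSString *normalized = [line precomposedStringWithCanonicalMapping];
    NSRange firstLatin = [normalized rangeOfString:@"\\p{Script=Latin}"
                                           options:NSRegularExpressionSearch];
    if (firstLatin.location == NSNotFound) {
        return @[normalized, @""];
    }
    NSCharacterSet *ws = [NSCharacterSet whitespaceCharacterSet];
    // Drop the whitespace that separated the two parts from the head.
    NSString *head = [[normalized substringToIndex:firstLatin.location]
                         stringByTrimmingCharactersInSet:ws];
    NSString *tail = [normalized substringFromIndex:firstLatin.location];
    return @[head, tail];
}
```

For the sample string in the question, the tail would be the English part and the head everything before it.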

How would I Find a Word, and Take all characters after the first occurrence of the word, but stop after an occurrence of a different word?

For instance
NSString *string = #"I need help finding a string";
NSString *newString = #"need";
I would need this to work for more than just this string. An example would be to take a string and remove everything after the word "I " and before the word " help".
Thank you very much!
Moved from a comment for legibility:
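NSString *string = @"I need help finding a string";
NSRange rr2 = [string rangeOfString:@"I "];
NSRange rr3 = [string rangeOfString:@" help"];
// Start right after the first marker and stop where the second marker begins.
// Both markers are assumed to be present; rangeOfString: gives a location of NSNotFound otherwise.
NSUInteger location = rr2.location + rr2.length;
NSUInteger lengt = rr3.location - location;
NSRange aa;
aa.location = location;
aa.length = lengt;
NSString *link;
link = [string substringWithRange:aa];
NSLog(@"The link is %@", link);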
One way would be to split the string into single words and iterate through them: first search for the first word, adding every word to a new string until you find it; then search for the second word; and once you have found that, just add the remaining words.
This could look like this in code:
NSString *myString = @"I need help finding a string";
NSString *firstWord = @"need";
NSString *secondWord = @"a";
NSMutableString *newString = [NSMutableString stringWithString:@""];
int index = 0;
for (NSString *word in [myString componentsSeparatedByString:@" "]) {
if(index == 0) {
if([word isEqualToString:firstWord])
index = 1;
[newString appendFormat:@"%@ ", word];
}
else if(index == 1) {
if([word isEqualToString:secondWord])
index = 2;
}
else
[newString appendFormat:@"%@ ", word];
}

Replace characters in NSString

I am trying to replace all characters except last 4 in a String with *'s.
In Objective-C, NSString has the method stringByReplacingCharactersInRange:withString:, to which I would pass the range (0, [string length]-4) and the string @"*". This is what it does: 123456789ABCD is modified to *ABCD, while I am looking to make *********ABCD.
I understand that it replaced the range I specified with the string object. How can I accomplish this?
NSError *error;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\d" options:NSRegularExpressionCaseInsensitive error:&error];
NSString *newString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"*"];
This looks like a simple problem... get the first part string and return it with the last four characters appended to it.
Here is a function that returns the needed string :
-(NSString *)neededStringWithString:(NSString *)aString {
// if the string has 4 or fewer characters, return nil
if([aString length] <= 4) {
return nil;
}
NSUInteger countOfCharToReplace = [aString length] - 4;
NSString *firstPart = @"*";
while(--countOfCharToReplace) {
firstPart = [firstPart stringByAppendingString:@"*"];
}
// range for the last four
NSRange lastFourRange = NSMakeRange([aString length] - 4, 4);
// return the combined string
return [firstPart stringByAppendingString:
[aString substringWithRange:lastFourRange]];
}
The most unintuitive part in Cocoa is creating the repeating stars without some kind of awkward looping. stringByPaddingToLength:withString:startingAtIndex: allows you to create a repeating string of any length you like, so once you have that, here's a simple solution:
NSInteger starUpTo = [string length] - 4;
if (starUpTo > 0) {
NSString *stars = [@"" stringByPaddingToLength:starUpTo withString:@"*" startingAtIndex:0];
return [string stringByReplacingCharactersInRange:NSMakeRange(0, starUpTo) withString:stars];
} else {
return string;
}
I'm not sure why the accepted answer was accepted, since it only works if everything but last 4 is a digit. Here's a simple way:
NSMutableString * str1 = [[NSMutableString alloc]initWithString:@"1234567890ABCD"];
NSRange r = NSMakeRange(0, [str1 length] - 4);
[str1 replaceCharactersInRange:r withString:[[NSString string] stringByPaddingToLength:r.length withString:@"*" startingAtIndex:0]];
NSLog(@"%@",str1);
You could use [theString substringToIndex:[theString length]-4] to get the first part of the string and then combine [theString length]-4 *'s with the second part. Perhaps there is an easier way to do this.
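A minimal sketch of that idea, using stringByPaddingToLength:withString:startingAtIndex: to build the stars and substringFromIndex: to keep the tail. The function name is hypothetical, and returning strings of 4 or fewer characters unchanged is an assumption about the desired behaviour.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every character except the last four with "*".
static NSString *MaskAllButLastFour(NSString *s) {
    if (s.length <= 4) return s;  // nothing to mask (assumed behaviour)
    NSUInteger starCount = s.length - 4;
    // One "*" per masked character, so the length of the result is unchanged.
    NSString *stars = [@"" stringByPaddingToLength:starCount
                                        withString:@"*"
                                   startingAtIndex:0];
    return [stars stringByAppendingString:[s substringFromIndex:starCount]];
}
```

Checking the length first avoids the unsigned underflow that [s length] - 4 would cause for short strings.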
NSMutableString * str1 = [[NSMutableString alloc]initWithString:@"1234567890ABCD"];
// Note: this replaces the whole range with a single "*", so it logs "*ABCD".
[str1 replaceCharactersInRange:NSMakeRange(0, [str1 length] - 4) withString:@"*"];
NSLog(@"%@",str1);
it works
The regexp didn't work on iOS7, but perhaps this helps:
- (NSString *)encryptString:(NSString *)pass {
NSMutableString *secret = [NSMutableString new];
for (int i=0; i<[pass length]; i++) {
[secret appendString:@"*"];
}
return secret;
}
In your case you should stop before the last 4 characters and append them unchanged. A bit crude, but it gets the job done.