What's a simple implementation of the following NSString category method that returns the number of words in self, where words are separated by any number of consecutive spaces or newline characters? Also, the string will be less than 140 characters, so in this case, I prefer simplicity & readability at the sacrifice of a bit of performance.
@interface NSString (Additions)
- (NSUInteger)wordCount;
@end
I found the following solutions:
implementation of -[NSString wordCount]
implementation of -[NSString wordCount] - seems a bit simpler
But, isn't there a simpler way?
Why not just do the following?
- (NSUInteger)wordCount {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray *words = [self componentsSeparatedByCharactersInSet:separators];
    // Consecutive separators produce empty strings; count them so they can be subtracted.
    NSIndexSet *separatorIndexes = [words indexesOfObjectsPassingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
        return [obj isEqualToString:@""];
    }];
    return [words count] - [separatorIndexes count];
}
I believe you have identified the 'simplest'. Nevertheless, to answer your original question, "a simple implementation of the following NSString category...", and to have it posted directly here for posterity:
@implementation NSString (GSBString)
- (NSUInteger)wordCount
{
    // NSUInteger matches the declared return type
    __block NSUInteger words = 0;
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) { words++; }];
    return words;
}
@end
There are a number of simpler implementations, but they all have tradeoffs. For example, Cocoa (but not Cocoa Touch) has word-counting baked in:
- (NSUInteger)wordCount {
return [[NSSpellChecker sharedSpellChecker] countWordsInString:self language:nil];
}
It's also trivial to count words as accurately as the scanner simply using [[self componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] count]. But I've found the performance of that method degrades a lot for longer strings.
So it depends on the tradeoffs you want to make. I've found the absolute fastest is just to go straight-up ICU. If you want simplest, using existing code is probably simpler than writing any code at all.
- (NSUInteger) wordCount
{
    NSArray *words = [self componentsSeparatedByString:@" "];
    return [words count];
}
Looks like the second link I gave in my question still reigns as not only the fastest but also, in hindsight, a relatively simple implementation of -[NSString wordCount].
An Objective-C one-liner version
NSUInteger wordCount = [[[word componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceAndNewlineCharacterSet] filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"length > 0"]] count];
Swift 3:
let words = string.components(separatedBy: " ")
let count = words.count
I'm trying to repair some mis-numbered movie subtitle files (each sub is separated by a blank line). The following code scans up to the faulty subtitle index number in a test file. If I just 'printf' the faulty old indices and replacement new indices, everything appears just as expected.
//######################################################################
-(IBAction)scanToSubIndex:(id)sender
{
NSMutableString* tempString = [[NSMutableString alloc] initWithString:[theTextView string]];
int textLen = (int)[tempString length];
NSScanner *theScanner = [NSScanner scannerWithString:tempString];
while ([theScanner isAtEnd] == NO)
{
[theScanner scanUpToString:@"\r\n\r\n" intoString:NULL];
[theScanner scanString:@"\r\n\r\n" intoString:NULL];
if([theScanner scanLocation] >= textLen)
break;
else
{ // remove OLD subtitle index...
NSString *oldNumStr;
[theScanner scanUpToString:@"\r\n" intoString:&oldNumStr];
printf("old number:%s\n", [oldNumStr UTF8String]);
NSRange range = [tempString rangeOfString:oldNumStr];
[tempString deleteCharactersInRange:range];
// ...and insert SEQUENTIAL index
NSString *newNumStr = [self changeSubIndex];
printf("new number:%s\n\n", [newNumStr UTF8String]);
[tempString insertString:newNumStr atIndex:range.location];
}
}
printf("\ntempString\n\n:%s\n", [tempString UTF8String]);
}
//######################################################################
-(NSString*)changeSubIndex
{
static int newIndex = 1;
// convert int to string and return...
NSString *numString = [NSString stringWithFormat:@"%d", newIndex];
++newIndex;
return numString;
}
When I attempt to write the new indices to the mutable string, however, I end up with disordered results like this:
sub 1
sub 2
sub 3
sub 1
sub 5
sub 6
sub 7
sub 5
sub 9
sub 7
sub 8
An interesting observation (and possible clue?) is that when I reach subtitle number 1000, every number gets written to the mutable string in sequential order as required. I've been struggling with this for a couple of weeks now, and I can't find any other similar questions on SO. Any help much appreciated :-)
NSScanner & NSMutableString
NSMutableString is a subclass of NSString. In other words, you can pass an NSMutableString wherever an NSString is expected. But that doesn't mean you're allowed to modify it afterwards.
scannerWithString: expects an NSString. Translated into human language: I expect a string, and I also expect that the string is read-only (won't be modified).
In other words, your code is a programmer error: you give the NSScanner a string it expects to be immutable, and then you modify it.
We don't know what the NSScanner class is doing under the hood. There can be buffering or any other kind of optimization.
Even if you get lucky with the scanLocation fix mentioned in the comments, you shouldn't rely on it, because the under-the-hood implementation can change with any new release.
Don't do this. Not just here, but everywhere you see an immutable data type.
(There are situations where you can do it, but then you really need to know what the under-the-hood implementation is doing, be certain that it won't change, etc. Generally speaking, it's not a good idea unless you know what you're doing.)
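If you would rather keep an NSScanner, one way to respect that contract is to scan an immutable string and append to a separate NSMutableString. This is only a rough sketch under the same SRT assumptions as the question ("\r\n" line endings, blocks separated by a blank line); RenumberSubtitles is a made-up helper name, not an existing API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: renumbers SRT blocks without ever mutating the scanned string.
static NSString *RenumberSubtitles(NSString *source) {
    NSScanner *scanner = [NSScanner scannerWithString:[source copy]];
    scanner.charactersToBeSkipped = nil;   // keep "\r\n" visible to the scanner
    NSMutableString *output = [NSMutableString string];
    NSUInteger index = 1;

    while (!scanner.isAtEnd) {
        // Skip the old index line and write the sequential index instead.
        [scanner scanUpToString:@"\r\n" intoString:NULL];
        [output appendFormat:@"%lu", (unsigned long)index++];

        // Copy the rest of the block (timing and text lines) unchanged.
        NSString *rest = nil;
        if ([scanner scanUpToString:@"\r\n\r\n" intoString:&rest]) {
            [output appendString:rest];
        }
        // Copy the blank-line separator if another block follows.
        if ([scanner scanString:@"\r\n\r\n" intoString:NULL]) {
            [output appendString:@"\r\n\r\n"];
        }
    }
    return output;
}
```

The scanner only ever reads from its own immutable copy, so nothing it may cache internally can go stale while the output is being built.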
Sample
This sample code is based on the following assumptions:
we're talking about SubRip Text (SRT)
file is small (can easily fit memory)
rest of the SRT file is correct
especially the delimiter (@"\r\n")
@import Foundation;
NS_ASSUME_NONNULL_BEGIN
@interface SubRipText : NSObject
+ (NSString *)fixSubtitleIndexes:(NSString *)string;
@end
NS_ASSUME_NONNULL_END
@implementation SubRipText
+ (NSString *)fixSubtitleIndexes:(NSString *)string {
    NSMutableString *result = [@"" mutableCopy];
    __block BOOL nextLineIsIndex = YES;
    __block NSUInteger index = 1;
    [string enumerateLinesUsingBlock:^(NSString * _Nonnull line, BOOL * _Nonnull stop) {
        if (nextLineIsIndex) {
            // Replace the old index line with the sequential one
            [result appendFormat:@"%lu\r\n", (unsigned long)index];
            index++;
            nextLineIsIndex = NO;
            return;
        }
        [result appendFormat:@"%@\r\n", line];
        // A blank line ends the block, so the next line is an index
        nextLineIsIndex = line.length == 0;
    }];
    return result;
}
@end
Usage:
NSString *test = @"29\r\n"
"00:00:00,498 --> 00:00:02,827\r\n"
"Hallo\r\n"
"\r\n"
"4023\r\n"
"00:00:02,827 --> 00:00:06,383\r\n"
"This is two lines,\r\n"
"subtitles rocks!\r\n"
"\r\n"
"1234\r\n"
"00:00:06,383 --> 00:00:09,427\r\n"
"Maybe not,\r\n"
"just learn English :)\r\n";
NSString *result = [SubRipText fixSubtitleIndexes:test];
NSLog(@"%@", result);
Output:
1
00:00:00,498 --> 00:00:02,827
Hallo
2
00:00:02,827 --> 00:00:06,383
This is two lines,
subtitles rocks!
3
00:00:06,383 --> 00:00:09,427
Maybe not,
just learn English :)
There are other ways to achieve this, but you should think about readability, speed of writing, speed of running, and so on. It depends on your usage: how many files you are going to fix, etc.
I have a NSArray, and I want to find the last occurrence of an element.
For example:
[apple, oranges, pears, apple, bananas];
int i = lastIndexOf("apple");
out: i == 3;
I'm struggling to find a simple solution by looking at the APIs, but there aren't any examples, so it's pretty hard to understand which function I should use.
NSUInteger index = [array indexOfObjectWithOptions:NSEnumerationReverse
                                       passingTest:^BOOL(id obj, NSUInteger i, BOOL *stop) {
    return [@"apples" isEqualToString:obj];
}];
If the array doesn't contain @"apples", index will be NSNotFound.
NSArray has indexOfObjectWithOptions:passingTest:, this will allow you to search in reverse.
For example:
NSArray *myArr = @[@"apple", @"oranges", @"pears", @"apple", @"bananas"];
NSString *target = #"apple";
NSUInteger index = [myArr indexOfObjectWithOptions:NSEnumerationReverse
passingTest:^BOOL(NSString *obj, NSUInteger idx, BOOL *stop) {
return [target isEqualToString:obj];
}];
You can find out more details of this method in the Documentation by Apple.
If anyone wants a reusable method with categories, I had written one for lastIndexOf.
Code can be found and freely used from here -
http://www.tejasshirodkar.com/blog/2013/06/nsarray-lastindexof-nsmutablearray-lastindexof/
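For reference, a category along those lines can be sketched in a few lines. This is not the code from the linked post; the category name and the gr_ prefixed selector are invented here for illustration, and it simply wraps the reverse search shown in the answers above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category; the name and selector are illustrative only.
@interface NSArray (GRLastIndex)
- (NSUInteger)gr_lastIndexOfObject:(id)object;
@end

@implementation NSArray (GRLastIndex)
- (NSUInteger)gr_lastIndexOfObject:(id)object {
    // Enumerating in reverse makes the first hit the last occurrence.
    return [self indexOfObjectWithOptions:NSEnumerationReverse
                              passingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
        return [obj isEqual:object];
    }];
}
@end
```

With the array from the question, [fruits gr_lastIndexOfObject:@"apple"] gives 3, and NSNotFound comes back when the object is absent.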
I've been trying to figure out a way of checking how many of a certain object are in an NSArray.
I've looked through the docs and I'm pretty sure there is no premade method for this. Also I can't find anything here on SO.
Does anybody know a good way to do this? Because I seriously can't come up with anything.
In this specific case I have an array of strings (in most cases several of each), and I want to count how many strings in the array match whatever I ask for.
If this is a primary use of the data structure and order doesn't matter, consider switching to an NSCountedSet which is specifically for solving this problem efficiently.
If you need an ordered collection, and you don't have a huge set of objects, then the fast enumeration answers are the best approach.
If you want to know where the objects are, then use indexesOfObjectsPassingTest:.
If you have a huge number of objects, I would look at indexesOfObjectsWithOptions:passingTest: with the NSEnumerationConcurrent option. This will allow you to search the array on multiple cores. (This is only possibly faster on a multi-core device, and even then is probably only faster if you have a very large collection. You should absolutely test before assuming that concurrent will be faster.) Even if you just need the final count, it may be faster for certain data sets to use this method and then use count on the final index set.
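To make the last two options concrete, here is a minimal sketch of both: counting through a concurrent index search, and counting through an NSCountedSet. The two function names are made up for this example, and whether the concurrent version is actually faster has to be measured on your own data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts matches by measuring the resulting index set.
static NSUInteger CountOccurrencesConcurrently(NSArray *array, id target) {
    NSIndexSet *matches =
        [array indexesOfObjectsWithOptions:NSEnumerationConcurrent
                               passingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
            return [obj isEqual:target];
        }];
    return matches.count;
}

// Hypothetical helper: lets NSCountedSet do the bookkeeping.
// Building the set costs one pass over the array, so keep the set around
// if you need counts for several different objects.
static NSUInteger CountOccurrencesWithCountedSet(NSArray *array, id target) {
    NSCountedSet *counted = [[NSCountedSet alloc] initWithArray:array];
    return [counted countForObject:target];
}
```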
There actually is a method for this: - (NSIndexSet *)indexesOfObjectsPassingTest:(BOOL (^)(id obj, NSUInteger idx, BOOL *stop))predicate. Call count on the returned index set to get the number of matches:
NSIndexSet *indexes = [array indexesOfObjectsPassingTest:^BOOL(id obj, NSUInteger index, BOOL *stop) {
    return [obj isEqual:myOtherObject];
}];
NSUInteger count = [indexes count];
Sounds like a case for NSCountedSet, which does what you are after with its initWithArray: initializer:
// Example array of strings
NSArray *array = [NSArray arrayWithObjects:
                  @"Joe", @"Jane", @"Peter", @"Paul",
                  @"Joe", @"Peter", @"Paul",
                  @"Joe",
                  @"Jane", @"Peter",
                  nil];
NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray: array];
// for-in will let you loop over the counted set
for (NSString *str in countedSet) {
    NSLog(@"Count of %@: %lu", str, (unsigned long)[countedSet countForObject:str]);
}
One approach would be to iterate and check.
- (int)repeatsOf:(NSString *)repeater inArray:(NSArray *)array {
int count = 0;
for (NSString *item in array) {
if ([item isEqualToString:repeater]) {
count++;
}
}
return count;
}
You could try a simple loop. Suppose needle is your reference string and array is your NSArray of strings:
unsigned int n = 0;
for (NSString * str in array)
{
if ([needle isEqualToString:str])
{
++n;
}
}
Now n holds the count of strings in array that are equal to needle.
You could define a function like this:
- (int)countStringsThatMatch:(NSString*)match inArray:(NSArray*)array
{
int matches = 0;
for (id string in array) {
if ([string isEqualToString:match]) {
matches++;
}
}
return matches;
}
And then use it like:
int count = [self countStringsThatMatch:@"someString" inArray:someArray];
- (NSUInteger) objectCountInArray:(NSArray *)array
matchingString:(NSString *)stringToMatch {
NSUInteger count = 0;
for (NSString *string in array) {
count += [string isEqualToString:stringToMatch] ? 1 : 0;
}
return count;
}
You can try to expand this to use a block that gets an object and returns a BOOL. Then you can use it to compare an array of whatever you want.
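A block-based version of that idea could look roughly like this. CountObjectsPassingTest is a hypothetical name, and the test block is whatever comparison you need.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts the objects for which the test block returns YES.
static NSUInteger CountObjectsPassingTest(NSArray *array, BOOL (^test)(id object)) {
    NSUInteger count = 0;
    for (id object in array) {
        if (test(object)) {
            count++;
        }
    }
    return count;
}
```

Usage with strings might be CountObjectsPassingTest(names, ^BOOL(id object) { return [object isEqual:@"Joe"]; }), but the same function works for any object type because the comparison lives in the block.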
I want to use NSRegularExpression to get some operations done in right order. I have a NSString:
my_simple_string
And I want to call a method (my own method, it doesn't matter here) in a CSS-like style, so in my NSDictionary I have these NSStrings:
*
my*
my_simple*
my_simple_string
What I want is for my method to be called for each of the values above. I currently use
isEqualToString:
and compare every substring. But this is not a perfect solution, and when I did my research on the web I found a post that suggests using NSRegularExpression for it, but I have no idea how it can be helpful.
One more important thing - "_" is not a separator, I don't have a separator.
EDIT:
NSString *str = @"my_simple_string";
NSArray *arr = [NSArray arrayWithObjects:@"*",@"my*",@"my_simple*",@"my_simple_string*", nil];
for(int i = 0 ; i < [str length] ; i++) {
    NSString *cut = [str substringToIndex:i];
    cut = [cut stringByAppendingString:@"*"];
    for(int j = 0; j < [arr count] ; j++)
        if([cut isEqualToString:[arr objectAtIndex:j]])
            NSLog(@"find it!");
}
NSLog(@"-----");
This looks like a typical regex application. You may find examples on how to use NSRegularExpression in the Apple documentation.
If this is the first time you're dealing with regexes, you'd better check some tutorials on the internet (there are plenty of them).
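As a rough illustration of how NSRegularExpression could help here, each wildcard key can be turned into an anchored regular expression and tested against the full string. This sketch assumes "*" is the only wildcard character; WildcardPatternMatchesString is a made-up helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: treats "*" in the pattern as "any run of characters".
static BOOL WildcardPatternMatchesString(NSString *pattern, NSString *string) {
    // Escape everything first, so only "*" ends up with a special meaning.
    NSString *escaped = [NSRegularExpression escapedPatternForString:pattern];
    NSString *body = [escaped stringByReplacingOccurrencesOfString:@"\\*" withString:@".*"];
    NSString *source = [NSString stringWithFormat:@"^%@$", body];

    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:source
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    if (regex == nil) {
        return NO;
    }
    NSRange whole = NSMakeRange(0, string.length);
    return [regex numberOfMatchesInString:string options:0 range:whole] > 0;
}
```

You would then loop over the dictionary keys and call your method for every key where WildcardPatternMatchesString(key, str) returns YES. If trailing "*" is the only form you need, a plain hasPrefix: check on the part before the star would be simpler still.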
I would like to loop through an NSString and call a custom function on every word that has certain criterion (For example, "has 2 'L's"). I was wondering what the best way of approaching that was. Should I use Find/Replace patterns? Blocks?
-(NSString *)convert:(NSString *)wordToConvert{
/// This I have already written
return finalWord;
}
-(NSString *) method:(NSString *) sentenceContainingWords{
// match every word that meets the criteria (for example the 2Ls) and replace it with what convert: does.
}
To enumerate the words in a string, you should use -[NSString enumerateSubstringsInRange:options:usingBlock:] with NSStringEnumerationByWords and NSStringEnumerationLocalized. All of the other methods listed use a means of identifying words which may not be locale-appropriate or correspond to the system definition. For example, two words separated by a comma but not whitespace (e.g. "foo,bar") would not be treated as separate words by any of the other answers, but they are in Cocoa text views.
[aString enumerateSubstringsInRange:NSMakeRange(0, [aString length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
if ([substring rangeOfString:@"ll" options:NSCaseInsensitiveSearch].location != NSNotFound)
/* do whatever */;
}];
As documented for -enumerateSubstringsInRange:options:usingBlock:, if you call it on a mutable string, you can safely mutate the string being enumerated within the enclosingRange. So, if you want to replace the matching words, you can with something like [aString replaceCharactersInRange:substringRange withString:replacementString].
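Put together, the replacement could be sketched as follows. It relies on the documented behavior described above (mutating only within the enumerated range of a mutable string). ConvertMatchingWords and its two block parameters are hypothetical stand-ins for your criterion and your convert: method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every word accepted by `matches` with `convert(word)`.
static NSString *ConvertMatchingWords(NSString *sentence,
                                      BOOL (^matches)(NSString *word),
                                      NSString *(^convert)(NSString *word)) {
    // Work on a mutable copy so the caller's string is left untouched.
    NSMutableString *result = [sentence mutableCopy];
    [result enumerateSubstringsInRange:NSMakeRange(0, result.length)
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        if (matches(substring)) {
            // substringRange lies inside enclosingRange, so this mutation is permitted.
            [result replaceCharactersInRange:substringRange withString:convert(substring)];
        }
    }];
    return [result copy];
}
```

Passing the criterion and the conversion as blocks keeps the enumeration code independent of what "has 2 Ls" or convert: actually mean in your app.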
The two ways I know of looping an array that will work for you are as follows:
NSArray *words = [sentence componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
for (NSString *word in words)
{
NSString *transformedWord = [obj method:word];
}
and
NSArray *words = [sentence componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
[words enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id word, NSUInteger idx, BOOL *stop){
NSString *transformedWord = [obj method:word];
}];
The other method, –makeObjectsPerformSelector:withObject:, won't work for you. It expects to be able to call [word method:obj] which is backwards from what you expect.
If you could write your criteria with regular expressions, then you could probably do a regular expression matching to fetch these words and then pass them to your convert: method.
You could also do a split of string into an array of words using componentsSeparatedByString: or componentsSeparatedByCharactersInSet:, then go over the words in the array and detect if they fit your criteria somehow. If they fit, then pass them to convert:.
Hope this helps.
As of iOS 12/macOS 10.14 the recommended way to do this is with the Natural Language framework.
For example:
import NaturalLanguage
let myString = "..."
let tokeniser = NLTokenizer(unit: .word)
tokeniser.string = myString
tokeniser.enumerateTokens(in: myString.startIndex..<myString.endIndex) { wordRange, attributes in
performActionOnWord(myString[wordRange])
return true // or return false to stop enumeration
}
Using NLTokenizer also has the benefit of allowing you to optionally specify the language of the string beforehand:
tokeniser.setLanguage(.hebrew)
I would recommend using a while loop to go through the string like this.
NSUInteger length = sentenceContainingWords.length;
NSUInteger start = 0;
while (start < length) {
    // Search for the next space, starting where the previous word ended
    NSRange searchRange = NSMakeRange(start, length - start);
    NSRange spaceRange = [sentenceContainingWords rangeOfString:@" " options:0 range:searchRange];
    // The word ends at the next space, or at the end of the string if there is none
    NSUInteger end = (spaceRange.location == NSNotFound) ? length : spaceRange.location;
    if (end > start) { // skip the empty runs between consecutive spaces
        NSString *wordString = [sentenceContainingWords substringWithRange:NSMakeRange(start, end - start)];
        [self convert:wordString];
    }
    start = end + 1;
}
This code would probably need to be rewritten because it's pretty rough, but you should get the idea.
Edit: Just saw Jacob Gorban's post, you should definitely do it like that.