capitalizedString doesn't capitalize correctly words starting with numbers? - objective-c

I'm using the NSString method [myString capitalizedString], to capitalize all words of my string.
However capitalization doesn't work very well for words starting with numbers.
i.e. 2nd chance
becomes
2Nd Chance
Even if n is not the first letter of the word.
thanks

You have to roll your own solution to this problem. The Apple docs state that you may not get the specified behavior using that function for multi-word strings and for strings with special characters. Here's a pretty crude solution
NSString *text = #"2nd place is nothing";
// break the string into words by separating on spaces.
NSArray *words = [text componentsSeparatedByString:#" "];
// create a new array to hold the capitalized versions.
NSMutableArray *newWords = [[NSMutableArray alloc]init];
// we want to ignore words starting with numbers.
// This class helps us to determine if a string is a number.
NSNumberFormatter *num = [[NSNumberFormatter alloc]init];
for (NSString *item in words) {
NSString *word = item;
// if the first letter of the word is not a number (numberFromString returns nil)
if ([num numberFromString:[item substringWithRange:NSMakeRange(0, 1)]] == nil) {
word = [item capitalizedString]; // capitalize that word.
}
// if it is a number, don't change the word (this is implied).
[newWords addObject:word]; // add the word to the new list.
}
NSLog(#"%#", [newWords description]);

Unfortunately this seems to be the general behaviour of capitalizedString.
Perhaps a not so nice workaround / hack would be to replace each number with a string before the transformation, and then change it back afterwards.
So, "2nd chance" -> "xyznd chance" -> "Xyznd Chance" -> "2nd Chance"

Related

Replacing bad words in a string in Objective-C

I have a game with a public highscore list where I allow layers to enter their name (or anything unto 12 characters). I am trying to create a couple of functions to filter out bad words from a list of bad words
I have in a text file. I have two methods:
One to read in the text file:
-(void) getTheBadWordsAndSaveForLater {
badWordsFilePath = [[NSBundle mainBundle] pathForResource:#"badwords" ofType:#"txt"];
badWordFile = [[NSString alloc] initWithContentsOfFile:badWordsFilePath encoding:NSUTF8StringEncoding error:nil];
badwords =[[NSArray alloc] initWithContentsOfFile:badWordFile];
badwords = [badWordFile componentsSeparatedByString:#"\n"];
NSLog(#"Number Of Words Found in file: %i",[badwords count]);
for (NSString* words in badwords) {
NSLog(#"Word in Array----- %#",words);
}
}
And one to check a word (NSString*) agains the list that I read in:
-(NSString *) removeBadWords :(NSString *) string {
// If I hard code this line below, it works....
// *****************************************************************************
//badwords =[[NSMutableArray alloc] initWithObjects:#"shet",#"shat",#"shut",nil];
// *****************************************************************************
NSLog(#"checking: %#",string);
for (NSString* words in badwords) {
string = [string stringByReplacingOccurrencesOfString:words withString:#"-" options:NSCaseInsensitiveSearch range:NSMakeRange(0, string.length)];
NSLog(#"Word in Array: %#",words);
}
NSLog(#"Cleaned Word Returned: %#",string);
return string;
}
The issue I'm having is that when I hardcode the words into an array (see commented out above) then it works like a charm. But when I use the array I read in with the first method, it does't work - the stringByReplacingOccurrencesOfString:words does not seem to have an effect. I have traced out to the log so I can see if the words are coming thru and they are... That one line just doesn't seem to see the words unless I hardcore into the array.
Any suggestions?
A couple of thoughts:
You have two lines:
badwords =[[NSArray alloc] initWithContentsOfFile:badWordFile];
badwords = [badWordFile componentsSeparatedByString:#"\n"];
There's no point in doing that initWithContentsOfFile if you're just going to replace it with the componentsSeparatedByString on the next line. Plus, initWithContentsOfFile assumes the file is a property list (plist), but the rest of your code clearly assumes it's a newline separated text file. Personally, I would have used the plist format (it obviates the need to trim the whitespace from the individual words), but you can use whichever you prefer. But use one or the other, but not both.
If you're staying with the newline separated list of bad words, then just get rid of that line that says initWithContentsOfFile, you disregard the results of that, anyway. Thus:
- (void)getTheBadWordsAndSaveForLater {
// these should be local variables, so get rid of your instance variables of the same name
NSString *badWordsFilePath = [[NSBundle mainBundle] pathForResource:#"badwords" ofType:#"txt"];
NSString *badWordFile = [[NSString alloc] initWithContentsOfFile:badWordsFilePath encoding:NSUTF8StringEncoding error:nil];
// calculate `badwords` solely from `componentsSeparatedByString`, not `initWithContentsOfFile`
badwords = [badWordFile componentsSeparatedByString:#"\n"];
// confirm what we got
NSLog(#"Found %i words: %#", [badwords count], badwords);
}
You might want to look for whole word occurrences only, rather than just the presence of the bad word anywhere:
- (NSString *) removeBadWords:(NSString *) string {
NSLog(#"checking: %# for occurrences of these bad words: %#", string, badwords);
for (NSString* badword in badwords) {
NSString *searchString = [NSString stringWithFormat:#"\\b%#\\b", badword];
string = [string stringByReplacingOccurrencesOfString:searchString
withString:#"-"
options:NSCaseInsensitiveSearch | NSRegularExpressionSearch
range:NSMakeRange(0, string.length)];
}
NSLog(#"resulted in: %#", string);
return string;
}
This uses a "regular expression" search, where \b stands for "a boundary between words". Thus, \bhell\b (or, because backslashes have to be quoted in a NSString literal, that's #"\\bhell\\b") will search for the word "hell" that is a separate word, but won't match "hello", for example.
Note, above, I am also logging badwords to see if that variable was reset somehow. That's the only thing that would make sense given the symptoms you describe, namely that the loading of the bad words from the text file works but replace process fails. So examine badwords before you replace and make sure it's still set properly.

Sort ignoring punctuation (Objective-C)

I am trying to sort an iOS UITableView object. I am currently using the following code:
// Sort terms alphabetically, ignoring case
[self.termsList sortUsingSelector:#selector(localizedCaseInsensitiveCompare:)];
This sorts my list, whist ignoring case. However, it would be nice to ignore punctuation as well. For example:
c.a.t.
car
cat
should be sorted as follows:
car
c.a.t.
cat
(It doesn't actually matter which of the two cats (cat or c.a.t.) comes first, so long as they're sorted next to one another).
Is there a simple method to get around this? I presume the solution would involve extracting JUST the alphanumeric characters from the strings, then comparing those, then returning them back to their former states with the non-alphanumeric characters included again.
In point of fact, the only characters I truly care about are periods (.) but if there is a solution that covers all punctuation easily then it'd be useful to know.
Note: I asked this exact same question of Java a month ago. Now, I am creating the same solution in Objective-C. I wonder if there are any tricks available for the iOS API that make this easy...
Edit: I have tried using the following code to strip punctuation and populate another array which I sort (suggested by #tiguero). However, I don't know how to do the last step: to actually sort the first array according to the order of the second. Here is my code:
NSMutableArray *arrayWithoutPunctuation = [[NSMutableArray alloc] init];
for (NSString *item in arrayWithPunctuation)
{
// Replace hyphens/periods with spaces
item = [item stringByReplacingOccurrencesOfString:#"-" withString:#" "]; // ...hyphens
item = [item stringByReplacingOccurrencesOfString:#"." withString:#" "]; // ...periods
[arrayWithoutPunctuation addObject:item];
}
[arrayWithoutPunctuation sortUsingSelector:#selector(localizedCaseInsensitiveCompare:)];
This provides 'arrayWithoutPunctuation' which is sorted, but of course doesn't contain the punctuation. This is no good, since, although it is now sorted nicely, it no longer contains punctuation which is crucial to the array in the first place. What I need to do is sort 'arrayWithPunctuation' according to the order of 'arrayWithoutPunctuation'... Any help appreciated.
You can use a comparison block on an NSArray and your code will look like the following:
NSArray* yourStringList = [NSArray arrayWithObjects:#"c.a.t.", #"car", #"cat", nil];
NSArray* yourStringSorted = [yourStringList sortedArrayUsingComparator:^(id a, id b){
NSString* as = (NSString*)a;
NSString* bs = (NSString*)b;
NSCharacterSet *unwantedChars = [NSCharacterSet characterSetWithCharactersInString:#"\\.:',"];
//Remove unwanted chars
as = [[as componentsSeparatedByCharactersInSet: unwantedChars] componentsJoinedByString: #""];
bs = [[as componentsSeparatedByCharactersInSet: unwantedChars] componentsJoinedByString: #""];
// make the case insensitive comparison btw your two strings
return [as caseInsensitiveCompare: bs];
}];
This might not be the most efficient code actually one other option would be to iterate on your array first and remove all unwanted chars and use a selector with the caseInsensitiveCompare method:
NSString* yourStringSorted = [yourStringList sortedArrayUsingSelector:#selector(caseInsensitiveCompare:)];
This is a bit cleaner, and a bit more efficient:
NSArray* strings = #[#".....c",#"a.",#"a",#"b",#"b...",#"a..,"];
NSArray* sorted_strings = [strings sortedArrayUsingComparator:^NSComparisonResult(id obj1, id obj2) {
NSString* a = [obj1 stringByTrimmingCharactersInSet:[NSCharacterSet punctuationCharacterSet]];
NSString* b = [obj2 stringByTrimmingCharactersInSet:[NSCharacterSet punctuationCharacterSet]];
return [a caseInsensitiveCompare:b];
}];
For real efficiency, I'd write a compare method that ignores punctuation, so that no memory allocations would be needed just to compare.
My solution would be to group each string into a custom object with two properties
the original string
the string without punctuation
...and then sort the objects based on the string without punctuation.
Objective C has some handy ways to do that.
So let's say we have two strings in this object:
NSString *myString;
NSString *modified;
First, add your custom objects to an array
NSMutableArray *myStrings = [[NSMutableArray alloc] init];
[myStrings addObject: ...];
Then, sort the array by the modified variable using the handy NSSortDescriptor.
//You can specify the variable name to sort by
//Sorting is done according to the locale using localizedStandardCompare
NSSortDescriptor *mySortDescriptor = [NSSortDescriptor sortDescriptorWithKey:#"modified" ascending:YES selector:#selector(localizedStandardCompare:)];
[myStrings sortedArrayUsingDescriptors:#[ mySortDescriptor ]];
Voila! Your objects (and strings) are sorted. For more info on NSSortDescriptor...

Is there a way to get Spell Check data from an NSString?

I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString, that returns an NSArray, all of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
for (int i = 0; i < ALPHABET; ++i) {
NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
[theDecryptions insertObject:theNewPlainText
atIndex:i];
}
return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest typo'd words at lower indicies, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, then spell check each with NSSpellChecker's -checkSpellingOfString:StartingAt: and using the number of correct words to sort the Array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?
Well, I found a solution that works using UIKit/UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure if it includes learned words in the actual rangeOfMisspelledWords... method. If it doesn't, calling [UITextChecker hasLearnedWord] on currentWord inside the bottom if statement should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWords with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
NSArray* words = [message componentsSeparatedByString:#" "];
NSInteger wordsFound = 0;
UITextChecker* checker = [[UITextChecker alloc] init];
//get the first language in the checker's memory- this is the user's
//preferred language.
//TODO: May want to search with every language (or top few) in the array
NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
//for each word in the array, determine whether it is a valid word
for(NSString* currentWord in words){
NSRange range;
range = [checker rangeOfMisspelledWordInString:currentWord
range:NSMakeRange(0, [currentWord length])
startingAt:0
wrap:NO
language:preferredLang];
//if it is valid (no errors found), increment wordsFound
if (range.location == NSNotFound) {
//NSLog(#"%# %#", #"Valid Word found:", currentWord);
wordsFound++;
}
else {
//NSLog(#"%# %#", #"Invalid Word found:", currentWord);
}
}
//After all "words" have been searched, save wordsFound to validWordCount
[self setValidWordCount:wordsFound];
[checker release];
}

Best way to split a string into tokens skipping escaped delimiters?

I'm receiving an NSString which uses commas as delimiters, and a backslash as an escape character. I was looking into splitting the string using componentsSeparatedByString, but I found no way to specify the escape character. Is there a built-in way to do this? NSScanner? CFStringTokenizer?
If not, would it be better to split the string at the commas, and then rejoin tokens that were falsely split (after inspecting them for a (non-escaped) escape character at the end) or looping through each character trying to find a comma, and then looking back one character to see if the comma is escaped or not (and then one more character to see if the escape character is escaped).
Now that I think about it, I would need to check that the amount of escape characters before a delimiter is even, because only then is the delimiter itself not being escaped.
If someone has a method that does this, I'd appreciate it if I could take a look at it.
I think the most straightforward method to do this would be to go through the string character by character as you suggest, appending into new string objects. You can follow two simple rules:
if you find a backslash, ignore but copy the next character (if exists) unconditionally
if you find a comma, end of that section
You could do this manually or use some of the functionality of NSScanner to help you (scanUpToCharactersFromSet:intoString:)
I would prefer to use a regular expression based parser to weed out the escape characters and then possibly doing a split operation (of some type) on the string.
Okay, (I hope) this is what wipolar suggested. It's the first implementation that works. I've just started with a non-GC-collected language, so please post a comment if you think this code can be improved, especially in the memory-management department.
- (NSArray *) splitUnescapedCharsFrom: (NSString *) str atChar: (char) delim withEscape: (char) esc
{
NSMutableArray * result = [[NSMutableArray alloc] init];
NSMutableString * currWord = [[NSMutableString alloc] init];
for (int i = 0; i < [str length]; i++)
{
if ([str characterAtIndex:i] == esc)
{
[currWord appendFormat:#"%c", [str characterAtIndex:++i]];
}
else if ([str characterAtIndex:i] == delim)
{
[result addObject:[NSString stringWithString:currWord]];
[currWord release];
currWord = [[NSMutableString alloc] init];
}
else
{
[currWord appendFormat:#"%c", [str characterAtIndex:i]];
}
}
[result addObject:[NSString stringWithString:currWord]];
[currWord release];
return [NSArray arrayWithArray:result];
}

NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)

I'm trying to compare names without any punctuation, spaces, accents etc.
At the moment I am doing the following:
-(NSString*) prepareString:(NSString*)a {
//remove any accents and punctuation;
a=[[[NSString alloc] initWithData:[a dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
a=[a stringByReplacingOccurrencesOfString:#" " withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"'" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"`" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"-" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"_" withString:#""];
a=[a lowercaseString];
return a;
}
However, I need to do this for hundreds of strings and I need to make this more efficient. Any ideas?
NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
Before using any of these solutions, don't forget to use decomposedStringWithCanonicalMapping to decompose any accented letters. This will turn, for example, é (U+00E9) into e ‌́ (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters will remain.
The reason why this is important is that you probably don't want, say, “dän” and “dün”* to be treated as the same. If you stripped out all accented letters, as some of these solutions may do, you'll end up with “dn”, so those strings will compare as equal.
So, you should decompose them first, so that you can strip the accents and leave the letters.
*Example from German. Thanks to Joris Weimar for providing it.
On a similar question, Ole Begemann suggests using stringByFoldingWithOptions: and I believe this is the best solution here:
NSString *accentedString = #"ÁlgeBra";
NSString *unaccentedString = [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]];
Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.
One important precision over the answer of BillyTheKid18756 (that was corrected by Luiz but it was not obvious in the explanation of the code):
DO NOT USE stringWithCString as a second step to remove accents, it can add unwanted characters at the end of your string as the NSData is not NULL-terminated (as stringWithCString expects it).
Or use it and add an additional NULL byte to your NSData, like Luiz did in his code.
I think a simpler answer is to replace:
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
By:
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
If I take back the code of BillyTheKid18756, here is the complete correct code:
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Defining what characters to accept
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
// Corrected back-conversion from NSData to NSString
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
// Removing unaccepted characters
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
If you are trying to compare strings, use one of these methods. Don't try to change data.
- (NSComparisonResult)localizedCompare:(NSString *)aString
- (NSComparisonResult)localizedCaseInsensitiveCompare:(NSString *)aString
- (NSComparisonResult)compare:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)range locale:(id)locale
You NEED to consider user locale to do things write with strings, particularly things like names.
In most languages, characters like ä and å are not the same other than they look similar. They are inherently distinct characters with meaning distinct from others, but the actual rules and semantics are distinct to each locale.
The correct way to compare and sort strings is by considering the user's locale. Anything else is naive, wrong and very 1990's. Stop doing it.
If you are trying to pass data to a system that cannot support non-ASCII, well, this is just a wrong thing to do. Pass it as data blobs.
https://developer.apple.com/library/ios/documentation/cocoa/Conceptual/Strings/Articles/SearchingStrings.html
Plus normalizing your strings first (see Peter Hosey's post) precomposing or decomposing, basically pick a normalized form.
- (NSString *)decomposedStringWithCanonicalMapping
- (NSString *)decomposedStringWithCompatibilityMapping
- (NSString *)precomposedStringWithCanonicalMapping
- (NSString *)precomposedStringWithCompatibilityMapping
No, it's not nearly as simple and easy as we tend to think.
Yes, it requires informed and careful decision making. (and a bit of non-English language experience helps)
Consider using the RegexKit framework. You could do something like:
NSString *searchString = #"This is neat.";
NSString *regexString = #"[\W]";
NSString *replaceWithString = #"";
NSString *replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog (#"%#", replacedString);
//... Thisisneat
Consider using NSScanner, and specifically the methods -setCharactersToBeSkipped: (which accepts an NSCharacterSet) and -scanString:intoString: (which accepts a string and returns the scanned string by reference).
You may also want to couple this with -[NSString localizedCompare:], or perhaps -[NSString compare:options:] with the NSDiacriticInsensitiveSearch option. That could simplify having to remove/replace accents, so you can focus on removing puncuation, whitespace, etc.
If you must use an approach like you presented in your question, at least use an NSMutableString and replaceOccurrencesOfString:withString:options:range: — that will be much more efficient than creating tons of nearly-identical autoreleased strings. It could be that just reducing the number of allocations will boost performance "enough" for the time being.
To give a complete example by combining the answers from Luiz and Peter, adding a few lines, you get the code below.
The code does the following:
Creates a set of accepted characters
Turn accented letters into normal letters
Remove characters not in the set
Objective-C
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Create set of accepted characters
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
// Remove characters not in the set
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
Swift (2.2) example
let text = "BûvérÈ!#$&%^&(*^(_()-*/48"
// Create set of accepted characters
let acceptedCharacters = NSMutableCharacterSet()
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.letterCharacterSet())
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.decimalDigitCharacterSet())
acceptedCharacters.addCharactersInString(" _-.!")
// Turn accented letters into normal letters (optional)
let sanitizedData = text.dataUsingEncoding(NSASCIIStringEncoding, allowLossyConversion: true)
let sanitizedText = String(data: sanitizedData!, encoding: NSASCIIStringEncoding)
// Remove characters not in the set
let components = sanitizedText!.componentsSeparatedByCharactersInSet(acceptedCharacters.invertedSet)
let output = components.joinWithSeparator("")
Output
The output for both examples would be: BuverE!_-48
Just bumped into this, maybe its too late, but here is what worked for me:
// text is the input string, and this just removes accents from the letters
// lossy encoding turns accented letters into normal letters
NSMutableData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding
allowLossyConversion:YES];
// increase length by 1 adds a 0 byte (increaseLengthBy
// guarantees to fill the new space with 0s), effectively turning
// sanitizedData into a c-string
[sanitizedData increaseLengthBy:1];
// now we just create a string with the c-string in sanitizedData
NSString *final = [NSString stringWithCString:[sanitizedData bytes]];
#interface NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet;
#end
#implementation NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet {
NSMutableString * mutString = [NSMutableString stringWithCapacity:[self length]];
for (int i = 0; i < [self length]; i++){
char c = [self characterAtIndex:i];
if(![charSet characterIsMember:c]) [mutString appendFormat:#"%c", c];
}
return [NSString stringWithString:mutString];
}
#end
These answers didn't work as expected for me. Specifically, decomposedStringWithCanonicalMapping didn't strip accents/umlauts as I'd expected.
Here's a variation on what I used that answers the brief:
// replace accents, umlauts etc with equivalent letter i.e 'é' becomes 'e'.
// Always use en_GB (or a locale without the characters you wish to strip) as locale, no matter which language we're taking as input
NSString *processedString = [string stringByFoldingWithOptions: NSDiacriticInsensitiveSearch locale: [NSLocale localeWithLocaleIdentifier: #"en_GB"]];
// remove non-letters
processedString = [[processedString componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
// trim whitespace
processedString = [processedString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceCharacterSet]];
return processedString;
Peter's Solution in Swift:
let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
Example:
let oldString = "Jo_ - h !. nn y"
// "Jo_ - h !. nn y"
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet)
// ["Jo", "h", "nn", "y"]
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
// "Johnny"
I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a Category on NSString to work a little different. In this example, you specify a string with only the characters you want to keep, and everything else is filtered out:
#interface NSString (PraxCategories)
+ (NSString *)lettersAndNumbers;
- (NSString*)stringByKeepingOnlyLettersAndNumbers;
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
#end
#implementation NSString (PraxCategories)
+ (NSString *)lettersAndNumbers { return #"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
- (NSString*)stringByKeepingOnlyLettersAndNumbers {
return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
}
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
NSMutableString * mutableString = #"".mutableCopy;
for (int i = 0; i < [self length]; i++){
char character = [self characterAtIndex:i];
if([characterSet characterIsMember:character]) [mutableString appendFormat:#"%c", character];
}
return mutableString.copy;
}
#end
Once you've made your Categories, using them is trivial, and you can use them on any NSString:
NSString *string = someStringValueThatYouWantToFilter;
string = [string stringByKeepingOnlyLettersAndNumbers];
Or, for example, if you wanted to get rid of everything except vowels:
string = [string stringByKeepingOnlyCharactersInString:#"aeiouAEIOU"];
If you're still learning Objective-C and aren't using Categories, I encourage you to try them out. They're the best place to put things like this because it gives more functionality to all objects of the class you Categorize.
Categories simplify and encapsulate the code you're adding, making it easy to reuse on all of your projects. It's a great feature of Objective-C!