Decoding word-encoded Content-Disposition header file name in Objective-C - objective-c

I am trying to retrieve a file name that can't be represented in ASCII from the content-disposition header.
This file name is word-encoded. Below is the encoded file name:
=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=
How do I get the decoded file name (that actually is "ësté é úm nômé açentúadõ.xlsx")?
PS: I am looking for an Objective-C implementation.

You probably want to search for a MIME handling framework, but I searched online and came up with nothing, so....
I couldn't find an example online, so I'm just showing the algorithm here. It's not the best example since I'm making a big assumption. That being that the string is always UTF-8 Q-encoded.
Q-encoding is like URL-encoding (percent-encoding), which Foundation's NSString already has support for decoding. The only (practical) difference when decoding (there are bigger differences when encoding) is that % encodings are = encodings instead.
Then there's the lead-in and lead-out stuff. Each encoded block has the format =?charset-name?encoding-type? ... encoded string here ... ?=. You should really read the charset name is use that encoding, and you should really read the encoding-type, since it may be "Q" or "B" (Base64).
This example only works for Q-encoding (a subset of quoted-printable). You should be able to easily modify it to handle the different charsets and to handle Base64 encoding however.
#import <Foundation/Foundation.h>
int main(void) {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
NSString *encodedString = #"=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=";
NSScanner *scanner = [NSScanner scannerWithString:encodedString];
NSString *buf = nil;
NSMutableString *decodedString = [[NSMutableString alloc] init];
while ([scanner scanString:#"=?UTF-8?Q?" intoString:NULL]
|| ([scanner scanUpToString:#"=?UTF-8?Q?" intoString:&buf] && [scanner scanString:#"=?UTF-8?Q?" intoString:NULL])) {
if (buf != nil) {
[decodedString appendString:buf];
}
buf = nil;
NSString *encodedRange;
if (![scanner scanUpToString:#"?=" intoString:&encodedRange]) {
break; // Invalid encoding
}
[scanner scanString:#"?=" intoString:NULL]; // Skip the terminating "?="
// Decode the encoded portion (naively using UTF-8 and assuming it really is Q encoded)
// I'm doing this really naively, but it should work
// Firstly I'm encoding % signs so I can cheat and turn this into a URL-encoded string, which NSString can decode
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:#"%" withString:#"=25"];
// Turn this into a URL-encoded string
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:#"=" withString:#"%"];
// Remove the underscores
encodedRange = [encodedRange stringByReplacingOccurrencesOfString:#"_" withString:#" "];
[decodedString appendString:[encodedRange stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding]];
}
NSLog(#"Decoded string = %#", decodedString);
[decodedString release];
[pool drain];
return 0;
}
This outputs:
chrisbook-pro:~ chris$ ./qp-decode
2010-12-01 18:54:42.903 qp-decode[9643:903] Decoded string = ësté é úm nômé açentúadõ.xlsx

Created an easier / successful method here using a trick involving NSString percent escapes..
https://stackoverflow.com/a/10888548/285694

I recently implemented a NSString category that decodes MIME Encoded-Word with either Q-encoding or B-encoding.
The code is available on GitHub and is briefly explained in this answer.

Related

Can stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: return NSUTF16StringEncoding or NSUTF32StringEncoding?

I'd like to know if calling stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: can return NSUTF16StringEncoding, NSUTF32StringEncoding or any of their variants?
The reason I'm asking is because of this documentation note on cStringUsingEncoding::
Special Considerations
UTF-16 and UTF-32 are not considered to be C string encodings, and should not be used with this method—the results of passing NSUTF16StringEncoding, NSUTF32StringEncoding, or any of their variants are undefined.
So I understand that creating a C string with UTF-16 or UTF-32 is unsupported, but I'm not sure if attempting String Encoding Detection with stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: may return UTF-16 and UTF-32 or not.
An example scenario, (adapted from SSZipArchive.m), may be:
// name is a null-terminated C string built with `fread` from stdio.h:
char *name = (char *)malloc(size_name + 1);
size_t read = fread(name, 1, size_name + 1, file);
name[size_name] = '\0';
// dataName is the data object of name
NSData *dataName = [NSData dataWithBytes:(const void *)name length:sizeof(unsigned char) * size_name];
// stringName is the string object of dataName
NSString *stringName = nil;
NSStringEncoding encoding = [NSString stringEncodingForData:dataName encodingOptions:nil convertedString:&stringName usedLossyConversion:nil];
In the above code, can encoding be NSUTF16StringEncoding, NSUTF32StringEncoding or any of their variants?
Platforms: macOS 10.10+, iOS 8.0+, watchOS 2.0+, tvOS 9.0+.
Yes, if the string is encoded using one of those encodings. The notes about C strings are specific to C strings. An NSString is not a C string, and the method you're describing doesn't work on C strings; it works on arbitrary data that may be encoded in a wide variety of ways.
As an example:
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
#autoreleasepool {
NSData *data = [#"test" dataUsingEncoding:NSUTF16StringEncoding];
NSStringEncoding encoding = [NSString stringEncodingForData:data
encodingOptions:nil
convertedString:nil
usedLossyConversion:nil];
NSLog(#"%ld == %ld", (unsigned long)encoding,
(unsigned long)NSUTF16StringEncoding);
}
return 0;
}
// Output: 10 == 10
This said, in your specific example, if name is really what it says it is, "a null-terminated C string," then it could never be UTF-16, because C strings cannot be encoded in UTF-16. C strings are \0 terminated, and \0 is a very common character in UTF-16. Without seeing more code, however, I would not gamble on whether that comment is accurate.
If your real question here is "given an arbitrary c-string-safe encoding, is it possible that stringEncodingForData: will return a not-c-string-safe encoding," then the answer is "yes, it could, and it's definitely not promised that it won't even if it doesn't today." If you need to prevent that, I recommend using NSStringEncodingDetectionSuggestedEncodingsKey and ...UseOnlySuggestedEncodingsKey to force it to be an encoding you can handle. (You could also use ...DisallowedEncodingsKey to prevent specific multi-byte encodings, but that wouldn't be as robust.)

How to read input in Objective-C?

I am trying to write some simple code that searches two dictionaries for a string and prints to the console if the string appears in both dictionaries. I want the user to be able to input the string via the console, and then pass the string as a variable into a message. I was wondering how I could go about getting a string from the console and using it as the argument in the following method call.
[x rangeOfString:"the string goes here" options:NSCaseInsensitiveSearch];
I am unsure as to how to get the string from the user. Do I use scanf(), or fgets(), into a char and then convert it into a NSSstring, or simply scan into an NSString itself. I am then wondering how to pass that string as an argument. Please help:
Here is the code I have so far. I know it is not succinct, but I just want to get the job done:
#import <Foundation/Foundation.h>
#include <stdio.h>
#include "stdlib.h"
int main(int argc, const char* argv[]){
#autoreleasepool {
char *name[100];
printf("Please enter the name you wish to search for");
scanf("%s", *name);
NSString *name2 = [NSString stringWithFormat:#"%s" , *name];
NSString *nameString = [NSString stringWithContentsOfFile:#"/usr/share/dict/propernames" encoding:NSUTF8StringEncoding error:NULL];
NSString *dictionary = [NSString stringWithContentsOfFile:#"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
NSArray *nameString2 = [nameString componentsSeparatedByString:#"\n"];
NSArray *dictionary2 = [dictionary componentsSeparatedByString:#"\n"];
int nsYES = 0;
int dictYES = 0;
for (NSString *n in nameString2) {
NSRange r = [n rangeOfString:name2 options:NSCaseInsensitiveSearch];
if (r.location != NSNotFound){
nsYES = 1;
}
}
for (NSString *x in dictionary2) {
NSRange l = [x rangeOfString:name2 options:NSCaseInsensitiveSearch];
if (l.location != NSNotFound){
dictYES = 1;
}
}
if (dictYES && nsYES){
NSLog(#"glen appears in both dictionaries");
}
}
}
Thanks.
Safely reading from standard input in an interactive manner in C is kind of involved. The standard functions require a fixed-size buffer, which means either some input will be too long (and corrupt your memory!) or you'll have to read in a loop. And unfortunately, Cocoa doesn't offer us a whole lot of help.
For reading standard input entirely (as in, if you're expecting an input file over standard input), there is NSFileHandle, which makes it pretty succinct. But for interactively reading and writing like you want to do here, you pretty much have to go with the linked answer for reading.
Once you have read some input into a C string, you can easily turn it into an NSString with, for example, +[NSString stringWithUTF8String:].

Parsing a binary MOBI file: best approach?

It contains METADATA in between binary data. I'm able to parse the first line with the title Agent_of_Chang2e, but I need to get the metadata on the bottom of the header as well. I know there are not standard specifics for it.
This code isn't able to decode the bottom lines. For example I get the following wrong formatted text:
FÃHANGE</b1èrX)¯­ÌiadenÕniverse<sup><smalÀ|®¿8</¡Îovelÿ·?=SharonÌeeándÓteveÍiller8PblockquoteßßÚ>TIa÷orkyfiction.Áll#eãacÐ0hðortrayedén{n)áreïrzus0¢°usly.Ôhatíean0authhmxétlõp.7N_\
©ß© 1988âyÓOOKãsòeserved.0ðart)publicaZmayâehproduc
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
char buffer[1024];
FILE* file = fopen([path UTF8String], "r");
if (file != 0)
{
while(fgets(buffer, 1024, file) != NULL)
{
NSString* string = [[NSString alloc] initWithCString: buffer encoding:NSASCIIStringEncoding];
NSLog(#"%#",string);
[string release];
}
fclose(file);
}
[pool drain];
nielsbot already posted a link to the format specification.
As you can read there, the file is not text file, but binary encoded. Parsing it with NSString instances is no good idea.
You have to read the file binary, i. e. using NSData:
NSData content = [NSData dataWithContentsOfFile:path];
Then you have to take out the relevant information by yourself. For example, if you want to read the uncompressed text length, you will find in the linked document that this information starts at position 4 and has a length of 4.
int32_t uncompressedTextLength; // 4 bytes are 32 bit.
[content getBytes:&uncompressedLenght range:NSMakeRange(4, 4)];
Maybe you have to deal with endianess.
Use NSTask or system() to pass the file through the strings utility and parse the output of that:
strings /bin/bash | more
...
...
677778899999999999999999999999999999:::;;<<====>>>>>>>>>>>????????
#(#)PROGRAM:bash PROJECT:bash-92
...
...
First, I am pretty sure the texts will be UTF-8 or UTF-16 encoded.
Second, you cannot just take random 1024 bytes and expect them to work as a text. What about byte order (big endian vs little endian)?

Using NSScanner to parse a string

I have a string with formatting tags in it, such as There are {adults} adults, and {children} children. I have a dictionary which has "adults" and "children" as keys, and I need to look up the value and replace the macros with that value. This is fully dynamic; the keys could be anything (so I can't hardcode a stringByReplacingString).
In the past, I've done similar things before just by looping through a mutable string, and searching for the characters; removing what I've already searched for from the source string as I go. It seems like this is exactly the type of thing NSScanner is designed for, so I tried this:
NSScanner *scanner = [NSScanner scannerWithString:format];
NSString *foundString;
scanner.charactersToBeSkipped = nil;
NSMutableString *formatedResponse = [NSMutableString string];
while ([scanner scanUpToString:#"{" intoString:&foundString]) {
[formatedResponse appendString:[foundString stringByReplacingOccurrencesOfString:#"{" withString:#""]]; //Formatted string contains everything up to the {
[scanner scanUpToString:#"}" intoString:&foundString];
NSString *key = [foundString stringByReplacingOccurrencesOfString:#"}" withString:#""];
[formatedResponse appendString:[data objectForKey:key]];
}
NSRange range = [format rangeOfString:#"}" options:NSBackwardsSearch];
if (range.location != NSNotFound) {
[formatedResponse appendString:[format substringFromIndex:range.location + 1]];
}
The problem with this is that when my string starts with "{", then the scanner returns NO, instead of YES. (Which is what the documentation says should happen). So am I misusing NSScanner? The fact that scanUpToString doesn't include the string that was being searched for as part of its output seems to make it almost useless...
Can this be easily changed to do what I want, or do I need to re-write using a mutable string and searching for the characters manually?
Use isAtEnd to determine when to stop. Also, the { and } are not included in the result of scanUpToString:, so they will be at the beginning of the next string, but the append after the loop is not necessary since the scanner will return scanned content even if the search string is not found.
// Prevent scanner from ignoring whitespace between formats. For example, without this, "{a} {b}" and "{a}{b}" and "{a}
//{b}" are all equivalent
[scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:#""]];
while(![scanner isAtEnd]) {
if([scanner scanUpToString:#"{" intoString:&foundString]) {
[formattedResponse appendString:foundString];
}
if(![scanner isAtEnd]) {
[scanner scanString:#"{" intoString:nil];
foundString = #""; // scanUpToString doesn't modify foundString if no characters are scanned
[scanner scanUpToString:#"}" intoString:&foundString];
[formattedResponse appendString:[data objectForKey:foundString];
[scanner scanString:#"}"];
}
}

NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)

I'm trying to compare names without any punctuation, spaces, accents etc.
At the moment I am doing the following:
-(NSString*) prepareString:(NSString*)a {
//remove any accents and punctuation;
a=[[[NSString alloc] initWithData:[a dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
a=[a stringByReplacingOccurrencesOfString:#" " withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"'" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"`" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"-" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"_" withString:#""];
a=[a lowercaseString];
return a;
}
However, I need to do this for hundreds of strings and I need to make this more efficient. Any ideas?
NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
Before using any of these solutions, don't forget to use decomposedStringWithCanonicalMapping to decompose any accented letters. This will turn, for example, é (U+00E9) into e ‌́ (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters will remain.
The reason why this is important is that you probably don't want, say, “dän” and “dün”* to be treated as the same. If you stripped out all accented letters, as some of these solutions may do, you'll end up with “dn”, so those strings will compare as equal.
So, you should decompose them first, so that you can strip the accents and leave the letters.
*Example from German. Thanks to Joris Weimar for providing it.
On a similar question, Ole Begemann suggests using stringByFoldingWithOptions: and I believe this is the best solution here:
NSString *accentedString = #"ÁlgeBra";
NSString *unaccentedString = [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]];
Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.
One important precision over the answer of BillyTheKid18756 (that was corrected by Luiz but it was not obvious in the explanation of the code):
DO NOT USE stringWithCString as a second step to remove accents, it can add unwanted characters at the end of your string as the NSData is not NULL-terminated (as stringWithCString expects it).
Or use it and add an additional NULL byte to your NSData, like Luiz did in his code.
I think a simpler answer is to replace:
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
By:
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
If I take back the code of BillyTheKid18756, here is the complete correct code:
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Defining what characters to accept
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
// Corrected back-conversion from NSData to NSString
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
// Removing unaccepted characters
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
If you are trying to compare strings, use one of these methods. Don't try to change data.
- (NSComparisonResult)localizedCompare:(NSString *)aString
- (NSComparisonResult)localizedCaseInsensitiveCompare:(NSString *)aString
- (NSComparisonResult)compare:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)range locale:(id)locale
You NEED to consider user locale to do things write with strings, particularly things like names.
In most languages, characters like ä and å are not the same other than they look similar. They are inherently distinct characters with meaning distinct from others, but the actual rules and semantics are distinct to each locale.
The correct way to compare and sort strings is by considering the user's locale. Anything else is naive, wrong and very 1990's. Stop doing it.
If you are trying to pass data to a system that cannot support non-ASCII, well, this is just a wrong thing to do. Pass it as data blobs.
https://developer.apple.com/library/ios/documentation/cocoa/Conceptual/Strings/Articles/SearchingStrings.html
Plus normalizing your strings first (see Peter Hosey's post) precomposing or decomposing, basically pick a normalized form.
- (NSString *)decomposedStringWithCanonicalMapping
- (NSString *)decomposedStringWithCompatibilityMapping
- (NSString *)precomposedStringWithCanonicalMapping
- (NSString *)precomposedStringWithCompatibilityMapping
No, it's not nearly as simple and easy as we tend to think.
Yes, it requires informed and careful decision making. (and a bit of non-English language experience helps)
Consider using the RegexKit framework. You could do something like:
NSString *searchString = #"This is neat.";
NSString *regexString = #"[\W]";
NSString *replaceWithString = #"";
NSString *replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog (#"%#", replacedString);
//... Thisisneat
Consider using NSScanner, and specifically the methods -setCharactersToBeSkipped: (which accepts an NSCharacterSet) and -scanString:intoString: (which accepts a string and returns the scanned string by reference).
You may also want to couple this with -[NSString localizedCompare:], or perhaps -[NSString compare:options:] with the NSDiacriticInsensitiveSearch option. That could simplify having to remove/replace accents, so you can focus on removing puncuation, whitespace, etc.
If you must use an approach like you presented in your question, at least use an NSMutableString and replaceOccurrencesOfString:withString:options:range: — that will be much more efficient than creating tons of nearly-identical autoreleased strings. It could be that just reducing the number of allocations will boost performance "enough" for the time being.
To give a complete example by combining the answers from Luiz and Peter, adding a few lines, you get the code below.
The code does the following:
Creates a set of accepted characters
Turn accented letters into normal letters
Remove characters not in the set
Objective-C
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Create set of accepted characters
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
// Remove characters not in the set
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
Swift (2.2) example
let text = "BûvérÈ!#$&%^&(*^(_()-*/48"
// Create set of accepted characters
let acceptedCharacters = NSMutableCharacterSet()
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.letterCharacterSet())
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.decimalDigitCharacterSet())
acceptedCharacters.addCharactersInString(" _-.!")
// Turn accented letters into normal letters (optional)
let sanitizedData = text.dataUsingEncoding(NSASCIIStringEncoding, allowLossyConversion: true)
let sanitizedText = String(data: sanitizedData!, encoding: NSASCIIStringEncoding)
// Remove characters not in the set
let components = sanitizedText!.componentsSeparatedByCharactersInSet(acceptedCharacters.invertedSet)
let output = components.joinWithSeparator("")
Output
The output for both examples would be: BuverE!_-48
Just bumped into this, maybe its too late, but here is what worked for me:
// text is the input string, and this just removes accents from the letters
// lossy encoding turns accented letters into normal letters
NSMutableData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding
allowLossyConversion:YES];
// increase length by 1 adds a 0 byte (increaseLengthBy
// guarantees to fill the new space with 0s), effectively turning
// sanitizedData into a c-string
[sanitizedData increaseLengthBy:1];
// now we just create a string with the c-string in sanitizedData
NSString *final = [NSString stringWithCString:[sanitizedData bytes]];
#interface NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet;
#end
#implementation NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet {
NSMutableString * mutString = [NSMutableString stringWithCapacity:[self length]];
for (int i = 0; i < [self length]; i++){
char c = [self characterAtIndex:i];
if(![charSet characterIsMember:c]) [mutString appendFormat:#"%c", c];
}
return [NSString stringWithString:mutString];
}
#end
These answers didn't work as expected for me. Specifically, decomposedStringWithCanonicalMapping didn't strip accents/umlauts as I'd expected.
Here's a variation on what I used that answers the brief:
// replace accents, umlauts etc with equivalent letter i.e 'é' becomes 'e'.
// Always use en_GB (or a locale without the characters you wish to strip) as locale, no matter which language we're taking as input
NSString *processedString = [string stringByFoldingWithOptions: NSDiacriticInsensitiveSearch locale: [NSLocale localeWithLocaleIdentifier: #"en_GB"]];
// remove non-letters
processedString = [[processedString componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
// trim whitespace
processedString = [processedString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceCharacterSet]];
return processedString;
Peter's Solution in Swift:
let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
Example:
let oldString = "Jo_ - h !. nn y"
// "Jo_ - h !. nn y"
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet)
// ["Jo", "h", "nn", "y"]
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
// "Johnny"
I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a Category on NSString to work a little different. In this example, you specify a string with only the characters you want to keep, and everything else is filtered out:
#interface NSString (PraxCategories)
+ (NSString *)lettersAndNumbers;
- (NSString*)stringByKeepingOnlyLettersAndNumbers;
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
#end
#implementation NSString (PraxCategories)
+ (NSString *)lettersAndNumbers { return #"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
- (NSString*)stringByKeepingOnlyLettersAndNumbers {
return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
}
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
NSMutableString * mutableString = #"".mutableCopy;
for (int i = 0; i < [self length]; i++){
char character = [self characterAtIndex:i];
if([characterSet characterIsMember:character]) [mutableString appendFormat:#"%c", character];
}
return mutableString.copy;
}
#end
Once you've made your Categories, using them is trivial, and you can use them on any NSString:
NSString *string = someStringValueThatYouWantToFilter;
string = [string stringByKeepingOnlyLettersAndNumbers];
Or, for example, if you wanted to get rid of everything except vowels:
string = [string stringByKeepingOnlyCharactersInString:#"aeiouAEIOU"];
If you're still learning Objective-C and aren't using Categories, I encourage you to try them out. They're the best place to put things like this because it gives more functionality to all objects of the class you Categorize.
Categories simplify and encapsulate the code you're adding, making it easy to reuse on all of your projects. It's a great feature of Objective-C!