in objective-c is there any easy way to add backslash in front of special characters? - objective-c

Note: Not sure why this is marked as duplicate as I clearly stated that I don't want to use stringByReplacingOccurrencesOfString over and over again.
I have a question regarding the special character filename.
I have implemented a program, so that when you open a file or multiple files, the program will read all these filenames and local path and store them into the NSMutableArray. This part works perfectly without a problem.
My program also need to use NSTask to manipulate these files. However, the problem is, sometimes filename will contain special characters, for example, /Users/josh/Desktop/Screen Shot 2013-03-19 at 2.05.06 PM.png.
I have to replace space with backslash and space
NSString *urlPath = [[self url] path];
urlPath = [urlPath stringByReplacingOccurrencesOfString:#"(" withString:#"\\("];
urlPath = [urlPath stringByReplacingOccurrencesOfString:#")" withString:#"\\)"];
urlPath = [urlPath stringByReplacingOccurrencesOfString:#" " withString:#"\\ "];
to: /Users/josh/Desktop/Screen\ Shot\ 2013-03-19\ at\ 2.05.06\ PM.png
so that I can manipulate the file properly.
Same for the ( and ). I also need to add backslash before that.
but there are too many special characters. ie.
/Users/josh/Desktop/~!##$?:<,.>%^&*()_+`-={}[]\|'';.txt
I need to change to:
/Users/josh/Desktop/\~\!#\#\$\?\:\<\,.\>\%^\&\*\(\)_+\`-\=\{\}\[\]\\\|\'\'\;.txt
and not to mention other special characters (ie. accent)
Is there any easy way to put a backslash in front of each special character, as I don't want to keep calling stringByReplacingOccurrencesOfString over and over again.

As described in NSTask's documentation for the setArguments: method, there should be no need to do special quoting:
Discussion
The NSTask object converts both path and the strings in
arguments to appropriate C-style strings (using
fileSystemRepresentation) before passing them to the task via argv[].
The strings in arguments do not undergo shell expansion, so you do not
need to do special quoting, and shell variables, such as $PWD, are not
resolved.
If you feel it is necessary, can you please provide some examples of the commands you want to run in the NSTask?
[UPDATE]: I see in the comments that you indeed are using the NSTask to execute a bash shell with -c, which I had wondered about. I've generally used NSTask to execute the command directly rather than going through the shell, like this:
NSTask *task = [[NSTask alloc] init];
[task setLaunchPath:#"/bin/ls"];
[task setArguments:[NSArray arrayWithObjects:#"-l", self.url.path, nil]];
Can you give a more accurate example of the actual command you want to run? For example, are you piping a series of commands together? Perhaps there might be an alternate way to achieve the same results without the need for using the bash shell...

I think you may be able to use an NSRegularExpressionSearch search.
It would look something like this
+ (NSString *) addBackslashes: (NSString *) string
{
// First convert the name string to a pure ASCII string
NSData *asciiData = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *asciiString = [[[NSString alloc] initWithData:asciiData encoding:NSASCIIStringEncoding] lowercaseString];
// Define the characters that we will replace
NSString *searchCharacters = #"PUT IN ALL OF YOUR SPECIAL CHARACTERS HERE";
// example NSString *searchCharacters = #"!##$%&*()";
// replace them
NSString *regExPattern = [NSString stringWithFormat:#"[%#]", searchCharacters];
string = [asciiString stringByReplacingOccurrencesOfString:regExPattern withString: [NSString stringWithFormat:#"\\%#", regExPattern] options:NSRegularExpressionSearch range:NSMakeRange(0, asciiString.length)];
return string;
}

you could maintain a set of strings that need to be escaped and use NSScanner to build the new string by iterating the the source string and each time a problematic character is found u first add \\ to a destination string and continue coping the next chars.
NSString *sourceString = #"/Users/josh/Desktop/\"Screen Shot\" 2013-03-19 at 2\\05\\06 PM.png";
NSMutableString *destString = [#"" mutableCopy];
NSCharacterSet *escapeCharsSet = [NSCharacterSet characterSetWithCharactersInString:#" ()\\"];
NSScanner *scanner = [NSScanner scannerWithString:sourceString];
while (![scanner isAtEnd]) {
NSString *tempString;
[scanner scanUpToCharactersFromSet:escapeCharsSet intoString:&tempString];
if([scanner isAtEnd]){
[destString appendString:tempString];
}
else {
[destString appendFormat:#"%#\\%#", tempString, [sourceString substringWithRange:NSMakeRange([scanner scanLocation], 1)]];
[scanner setScanLocation:[scanner scanLocation]+1];
}
}
NSLog(#"\n%#\n%#", sourceString, destString);
result:
/Users/josh/Desktop/Screen Shot 2013-03-19 at 2.05.06 PM.png
/Users/josh/Desktop/Screen\ Shot\ 2013-03-19\ at\ 2.05.06\ PM.png

Related

Replacing bad words in a string in Objective-C

I have a game with a public highscore list where I allow layers to enter their name (or anything unto 12 characters). I am trying to create a couple of functions to filter out bad words from a list of bad words
I have in a text file. I have two methods:
One to read in the text file:
-(void) getTheBadWordsAndSaveForLater {
badWordsFilePath = [[NSBundle mainBundle] pathForResource:#"badwords" ofType:#"txt"];
badWordFile = [[NSString alloc] initWithContentsOfFile:badWordsFilePath encoding:NSUTF8StringEncoding error:nil];
badwords =[[NSArray alloc] initWithContentsOfFile:badWordFile];
badwords = [badWordFile componentsSeparatedByString:#"\n"];
NSLog(#"Number Of Words Found in file: %i",[badwords count]);
for (NSString* words in badwords) {
NSLog(#"Word in Array----- %#",words);
}
}
And one to check a word (NSString*) agains the list that I read in:
-(NSString *) removeBadWords :(NSString *) string {
// If I hard code this line below, it works....
// *****************************************************************************
//badwords =[[NSMutableArray alloc] initWithObjects:#"shet",#"shat",#"shut",nil];
// *****************************************************************************
NSLog(#"checking: %#",string);
for (NSString* words in badwords) {
string = [string stringByReplacingOccurrencesOfString:words withString:#"-" options:NSCaseInsensitiveSearch range:NSMakeRange(0, string.length)];
NSLog(#"Word in Array: %#",words);
}
NSLog(#"Cleaned Word Returned: %#",string);
return string;
}
The issue I'm having is that when I hardcode the words into an array (see commented out above) then it works like a charm. But when I use the array I read in with the first method, it does't work - the stringByReplacingOccurrencesOfString:words does not seem to have an effect. I have traced out to the log so I can see if the words are coming thru and they are... That one line just doesn't seem to see the words unless I hardcore into the array.
Any suggestions?
A couple of thoughts:
You have two lines:
badwords =[[NSArray alloc] initWithContentsOfFile:badWordFile];
badwords = [badWordFile componentsSeparatedByString:#"\n"];
There's no point in doing that initWithContentsOfFile if you're just going to replace it with the componentsSeparatedByString on the next line. Plus, initWithContentsOfFile assumes the file is a property list (plist), but the rest of your code clearly assumes it's a newline separated text file. Personally, I would have used the plist format (it obviates the need to trim the whitespace from the individual words), but you can use whichever you prefer. But use one or the other, but not both.
If you're staying with the newline separated list of bad words, then just get rid of that line that says initWithContentsOfFile, you disregard the results of that, anyway. Thus:
- (void)getTheBadWordsAndSaveForLater {
// these should be local variables, so get rid of your instance variables of the same name
NSString *badWordsFilePath = [[NSBundle mainBundle] pathForResource:#"badwords" ofType:#"txt"];
NSString *badWordFile = [[NSString alloc] initWithContentsOfFile:badWordsFilePath encoding:NSUTF8StringEncoding error:nil];
// calculate `badwords` solely from `componentsSeparatedByString`, not `initWithContentsOfFile`
badwords = [badWordFile componentsSeparatedByString:#"\n"];
// confirm what we got
NSLog(#"Found %i words: %#", [badwords count], badwords);
}
You might want to look for whole word occurrences only, rather than just the presence of the bad word anywhere:
- (NSString *) removeBadWords:(NSString *) string {
NSLog(#"checking: %# for occurrences of these bad words: %#", string, badwords);
for (NSString* badword in badwords) {
NSString *searchString = [NSString stringWithFormat:#"\\b%#\\b", badword];
string = [string stringByReplacingOccurrencesOfString:searchString
withString:#"-"
options:NSCaseInsensitiveSearch | NSRegularExpressionSearch
range:NSMakeRange(0, string.length)];
}
NSLog(#"resulted in: %#", string);
return string;
}
This uses a "regular expression" search, where \b stands for "a boundary between words". Thus, \bhell\b (or, because backslashes have to be quoted in a NSString literal, that's #"\\bhell\\b") will search for the word "hell" that is a separate word, but won't match "hello", for example.
Note, above, I am also logging badwords to see if that variable was reset somehow. That's the only thing that would make sense given the symptoms you describe, namely that the loading of the bad words from the text file works but replace process fails. So examine badwords before you replace and make sure it's still set properly.

Split NSString into words, then rejoin it into original form

I am splitting an NSString like this: (filter string is an nsstring)
seperatorSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
[seperatorSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSMutableArray *words = [[filterString componentsSeparatedByCharactersInSet:seperatorSet] mutableCopy];
I want to put words back into the form of filter string with the original punctuation and spacing. The reason I want to do this is I want to change some words and put it back together as it was originally.
A more robust way to split by words is to use string enumeration. A space is not always the delimiter and not all languages delimit spaces anyway (e.g. Japanese).
NSString * string = #" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationByWords
usingBlock:
^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(#"Substring: '%#'", substring);
}];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
NSString *myString = #"Foo Bar Blah B..";
NSArray *myWords = [myString componentsSeparatedByCharactersInSet:
[NSCharacterSet characterSetWithCharactersInString:#" "]
];
NSString* string = [myWords componentsJoinedByString: #" "];
NSLog(#"%#",string);
Since you eliminate the original punctuation, there's no way to turn it back automatically.
The only way is not to use componentsSeparatedByCharactersInSet.
An alternative solution may be to iterate through the string and, for each char, check if it belongs to your character set.
If yes, add the char to a list and the substring to another list (you may use NSMutableArray class).
This way, for example, you know that the punctuation char between the first and the second substring is the first character in your list of separators.
You can use the pathArray componentsJoinedByString: method of the array class to rejoin the words:
NSString *orig = [words pathArray componentsJoinedByString:#" "];
How are you determining which words need to be replaced? Instead of breaking it apart in the first place, perhaps using -stringByReplacingOccurrencesOfString:withString:options:range: would be more suitable.
My guess is you may not be using the best API. If you're really worried about words, you should be using a word-based API. I'm a bit hazy on whether that would be NSDataDetector or something else. (I believe NSRegularExpression can deal with word boundaries in a smarter way.)
If you are using Mac OS X 10.7+ or iOS 4+ you can use NSRegularExpression, The pattern to replace a word is: "\b word \b" - (no spaces around word) \b matches a word boundary. Look at methods replaceMatchesInString:options:range:withTemplate: and stringByReplacingMatchesInString:options:range:withTemplate:.
Under 10.6 pr earlier if you wish to use regular expressions you can wrap the regcomp/regexec C-based functions, they support word boundaries as well. However you may prefer to use one of the other Cocoa options mentioned in other answers for this simple case.

Parsing a Java Properties file in Objective-C for iPhone

I'm looking for a way in the iPhone SDK to read in a Properties file (not the XML flavor) for example this one:
# a comment
! a comment
a = a string
b = a string with escape sequences \t \n \r \\ \" \' \ (space) \u0123
c = a string with a continuation line \
continuation line
d.e.f = another string
would result in four key/value pairs.
I can't change this format as it is sent to me by a web service. Can you please direct me?
Thanks,
Emmanuel
I would take a look at ParseKit http://parsekit.com/. Otherwise you could use RegexKitLite and create some regular expressions.
I've ended with this solution if anybody is interested :
#interface NSDictionary (PropertiesFile)
+ (NSDictionary *)dictionaryWithPropertiesFile:(NSString *)file;
#end
#implementation NSDictionary (PropertiesFile)
+ (NSDictionary *)dictionaryWithPropertiesFile:(NSString *)file {
NSError *error = nil;
NSString *propertyFileContent = [[NSString alloc] initWithContentsOfFile:file encoding:NSUTF8StringEncoding error:&error];
if (error) return nil;
NSArray *properties = [propertyFileContent componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
if (properties.count == 0) return nil;
NSMutableDictionary *result = [[NSMutableDictionary alloc] initWithCapacity:properties.count];
for (NSString *propertySting in properties) {
NSArray *property = [propertySting componentsSeparatedByString:#"="];
if (property.count != 2) continue;
[result setObject:property[1] forKey:property[0]];
}
return result.allKeys.count > 0 ? result : nil;
}
#end
It's a perfectly simple parsing problem. Read a line. Ignore if comment. Check for continuation, and read/append continuation lines as needed. Look for the "=". Make the left side of the "=" (after trimming white space) the key. Either parse the right side yourself or put it into an NSString and use stringWithFormat on it to "reduce" any escapes to pure character form. Return key and reduced right side.
(But refreshing my memory on the properties file format reminds me that:
The key contains all of the characters in the line starting with the
first non-white space character and up to, but not including, the
first unescaped '=', ':', or white space character other than a line
terminator. All of these key termination characters may be included in
the key by escaping them with a preceding backslash character;
So a little scanning of the line is required to separate the key from the rest. Nothing particularly difficult, though.)
Have you considered using lex/yacc or flex/bison to generate your own compiler code from a description of the grammar for properties files? I'm not sure if there are any existing grammars defined for a Java properties file, but it seems like it would be a pretty simple grammar to write.
Here's another SO post that mentions this approach for general purpose parsing
Take a look at this PropertyParser
NSString *text = #"sample key = sample value";
PropertyParser *propertyParser = [[PropertyParser alloc] init];
NSMutableDictionary *keyValueMap = [propertyParser parse:text];
You can now use NSRegularExpression class to do this.

NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)

I'm trying to compare names without any punctuation, spaces, accents etc.
At the moment I am doing the following:
-(NSString*) prepareString:(NSString*)a {
//remove any accents and punctuation;
a=[[[NSString alloc] initWithData:[a dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
a=[a stringByReplacingOccurrencesOfString:#" " withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"'" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"`" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"-" withString:#""];
a=[a stringByReplacingOccurrencesOfString:#"_" withString:#""];
a=[a lowercaseString];
return a;
}
However, I need to do this for hundreds of strings and I need to make this more efficient. Any ideas?
NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
Before using any of these solutions, don't forget to use decomposedStringWithCanonicalMapping to decompose any accented letters. This will turn, for example, é (U+00E9) into e ‌́ (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters will remain.
The reason why this is important is that you probably don't want, say, “dän” and “dün”* to be treated as the same. If you stripped out all accented letters, as some of these solutions may do, you'll end up with “dn”, so those strings will compare as equal.
So, you should decompose them first, so that you can strip the accents and leave the letters.
*Example from German. Thanks to Joris Weimar for providing it.
On a similar question, Ole Begemann suggests using stringByFoldingWithOptions: and I believe this is the best solution here:
NSString *accentedString = #"ÁlgeBra";
NSString *unaccentedString = [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]];
Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.
One important precision over the answer of BillyTheKid18756 (that was corrected by Luiz but it was not obvious in the explanation of the code):
DO NOT USE stringWithCString as a second step to remove accents, it can add unwanted characters at the end of your string as the NSData is not NULL-terminated (as stringWithCString expects it).
Or use it and add an additional NULL byte to your NSData, like Luiz did in his code.
I think a simpler answer is to replace:
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
By:
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
If I take back the code of BillyTheKid18756, here is the complete correct code:
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Defining what characters to accept
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
// Corrected back-conversion from NSData to NSString
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
// Removing unaccepted characters
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
If you are trying to compare strings, use one of these methods. Don't try to change data.
- (NSComparisonResult)localizedCompare:(NSString *)aString
- (NSComparisonResult)localizedCaseInsensitiveCompare:(NSString *)aString
- (NSComparisonResult)compare:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)range locale:(id)locale
You NEED to consider user locale to do things write with strings, particularly things like names.
In most languages, characters like ä and å are not the same other than they look similar. They are inherently distinct characters with meaning distinct from others, but the actual rules and semantics are distinct to each locale.
The correct way to compare and sort strings is by considering the user's locale. Anything else is naive, wrong and very 1990's. Stop doing it.
If you are trying to pass data to a system that cannot support non-ASCII, well, this is just a wrong thing to do. Pass it as data blobs.
https://developer.apple.com/library/ios/documentation/cocoa/Conceptual/Strings/Articles/SearchingStrings.html
Plus normalizing your strings first (see Peter Hosey's post) precomposing or decomposing, basically pick a normalized form.
- (NSString *)decomposedStringWithCanonicalMapping
- (NSString *)decomposedStringWithCompatibilityMapping
- (NSString *)precomposedStringWithCanonicalMapping
- (NSString *)precomposedStringWithCompatibilityMapping
No, it's not nearly as simple and easy as we tend to think.
Yes, it requires informed and careful decision making. (and a bit of non-English language experience helps)
Consider using the RegexKit framework. You could do something like:
NSString *searchString = #"This is neat.";
NSString *regexString = #"[\W]";
NSString *replaceWithString = #"";
NSString *replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog (#"%#", replacedString);
//... Thisisneat
Consider using NSScanner, and specifically the methods -setCharactersToBeSkipped: (which accepts an NSCharacterSet) and -scanString:intoString: (which accepts a string and returns the scanned string by reference).
You may also want to couple this with -[NSString localizedCompare:], or perhaps -[NSString compare:options:] with the NSDiacriticInsensitiveSearch option. That could simplify having to remove/replace accents, so you can focus on removing puncuation, whitespace, etc.
If you must use an approach like you presented in your question, at least use an NSMutableString and replaceOccurrencesOfString:withString:options:range: — that will be much more efficient than creating tons of nearly-identical autoreleased strings. It could be that just reducing the number of allocations will boost performance "enough" for the time being.
To give a complete example by combining the answers from Luiz and Peter, adding a few lines, you get the code below.
The code does the following:
Creates a set of accepted characters
Turn accented letters into normal letters
Remove characters not in the set
Objective-C
// The input text
NSString *text = #"BûvérÈ!#$&%^&(*^(_()-*/48";
// Create set of accepted characters
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:#" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
// Remove characters not in the set
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:#""];
Swift (2.2) example
let text = "BûvérÈ!#$&%^&(*^(_()-*/48"
// Create set of accepted characters
let acceptedCharacters = NSMutableCharacterSet()
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.letterCharacterSet())
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.decimalDigitCharacterSet())
acceptedCharacters.addCharactersInString(" _-.!")
// Turn accented letters into normal letters (optional)
let sanitizedData = text.dataUsingEncoding(NSASCIIStringEncoding, allowLossyConversion: true)
let sanitizedText = String(data: sanitizedData!, encoding: NSASCIIStringEncoding)
// Remove characters not in the set
let components = sanitizedText!.componentsSeparatedByCharactersInSet(acceptedCharacters.invertedSet)
let output = components.joinWithSeparator("")
Output
The output for both examples would be: BuverE!_-48
Just bumped into this, maybe its too late, but here is what worked for me:
// text is the input string, and this just removes accents from the letters
// lossy encoding turns accented letters into normal letters
NSMutableData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding
allowLossyConversion:YES];
// increase length by 1 adds a 0 byte (increaseLengthBy
// guarantees to fill the new space with 0s), effectively turning
// sanitizedData into a c-string
[sanitizedData increaseLengthBy:1];
// now we just create a string with the c-string in sanitizedData
NSString *final = [NSString stringWithCString:[sanitizedData bytes]];
#interface NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet;
#end
#implementation NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet {
NSMutableString * mutString = [NSMutableString stringWithCapacity:[self length]];
for (int i = 0; i < [self length]; i++){
char c = [self characterAtIndex:i];
if(![charSet characterIsMember:c]) [mutString appendFormat:#"%c", c];
}
return [NSString stringWithString:mutString];
}
#end
These answers didn't work as expected for me. Specifically, decomposedStringWithCanonicalMapping didn't strip accents/umlauts as I'd expected.
Here's a variation on what I used that answers the brief:
// replace accents, umlauts etc with equivalent letter i.e 'é' becomes 'e'.
// Always use en_GB (or a locale without the characters you wish to strip) as locale, no matter which language we're taking as input
NSString *processedString = [string stringByFoldingWithOptions: NSDiacriticInsensitiveSearch locale: [NSLocale localeWithLocaleIdentifier: #"en_GB"]];
// remove non-letters
processedString = [[processedString componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:#""];
// trim whitespace
processedString = [processedString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceCharacterSet]];
return processedString;
Peter's Solution in Swift:
let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
Example:
let oldString = "Jo_ - h !. nn y"
// "Jo_ - h !. nn y"
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet)
// ["Jo", "h", "nn", "y"]
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
// "Johnny"
I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a Category on NSString to work a little different. In this example, you specify a string with only the characters you want to keep, and everything else is filtered out:
#interface NSString (PraxCategories)
+ (NSString *)lettersAndNumbers;
- (NSString*)stringByKeepingOnlyLettersAndNumbers;
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
#end
#implementation NSString (PraxCategories)
+ (NSString *)lettersAndNumbers { return #"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
- (NSString*)stringByKeepingOnlyLettersAndNumbers {
return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
}
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
NSMutableString * mutableString = #"".mutableCopy;
for (int i = 0; i < [self length]; i++){
char character = [self characterAtIndex:i];
if([characterSet characterIsMember:character]) [mutableString appendFormat:#"%c", character];
}
return mutableString.copy;
}
#end
Once you've made your Categories, using them is trivial, and you can use them on any NSString:
NSString *string = someStringValueThatYouWantToFilter;
string = [string stringByKeepingOnlyLettersAndNumbers];
Or, for example, if you wanted to get rid of everything except vowels:
string = [string stringByKeepingOnlyCharactersInString:#"aeiouAEIOU"];
If you're still learning Objective-C and aren't using Categories, I encourage you to try them out. They're the best place to put things like this because it gives more functionality to all objects of the class you Categorize.
Categories simplify and encapsulate the code you're adding, making it easy to reuse on all of your projects. It's a great feature of Objective-C!

NSString tokenize in Objective-C

What is the best way to tokenize/split a NSString in Objective-C?
Found answer here:
NSString *string = #"oop:ack:bork:greeble:ponies";
NSArray *chunks = [string componentsSeparatedByString: #":"];
Everyone has mentioned componentsSeparatedByString: but you can also use CFStringTokenizer (remember that an NSString and CFString are interchangeable) which will tokenize natural languages too (like Chinese/Japanese which don't split words on spaces).
If you just want to split a string, use -[NSString componentsSeparatedByString:]. For more complex tokenization, use the NSScanner class.
If your tokenization needs are more complex, check out my open source Cocoa String tokenizing/parsing toolkit: ParseKit:
http://parsekit.com
For simple splitting of strings using a delimiter char (like ':'), ParseKit would definitely be overkill. But again, for complex tokenization needs, ParseKit is extremely powerful/flexible.
Also see the ParseKit Tokenization documentation.
If you want to tokenize on multiple characters, you can use NSString's componentsSeparatedByCharactersInSet. NSCharacterSet has some handy pre-made sets like the whitespaceCharacterSet and the illegalCharacterSet. And it has initializers for Unicode ranges.
You can also combine character sets and use them to tokenize, like this:
// Tokenize sSourceEntityName on both whitespace and punctuation.
NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
[mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
[mcharsetWhitePunc release];
Be aware that componentsSeparatedByCharactersInSet will produce blank strings if it encounters more than one member of the charSet in a row, so you might want to test for lengths less than 1.
If you're looking to tokenise a string into search terms while preserving "quoted phrases", here's an NSString category that respects various types of quote pairs: "" '' ‘’ “”
Usage:
NSArray *terms = [#"This is my \"search phrase\" I want to split" searchTerms];
// results in: ["This", "is", "my", "search phrase", "I", "want", "to", "split"]
Code:
#interface NSString (Search)
- (NSArray *)searchTerms;
#end
#implementation NSString (Search)
- (NSArray *)searchTerms {
// Strip whitespace and setup scanner
NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSString *searchString = [self stringByTrimmingCharactersInSet:whitespace];
NSScanner *scanner = [NSScanner scannerWithString:searchString];
[scanner setCharactersToBeSkipped:nil]; // we'll handle whitespace ourselves
// A few types of quote pairs to check
NSDictionary *quotePairs = #{#"\"": #"\"",
#"'": #"'",
#"\u2018": #"\u2019",
#"\u201C": #"\u201D"};
// Scan
NSMutableArray *results = [[NSMutableArray alloc] init];
NSString *substring = nil;
while (scanner.scanLocation < searchString.length) {
// Check for quote at beginning of string
unichar unicharacter = [self characterAtIndex:scanner.scanLocation];
NSString *startQuote = [NSString stringWithFormat:#"%C", unicharacter];
NSString *endQuote = [quotePairs objectForKey:startQuote];
if (endQuote != nil) { // if it's a valid start quote we'll have an end quote
// Scan quoted phrase into substring (skipping start & end quotes)
[scanner scanString:startQuote intoString:nil];
[scanner scanUpToString:endQuote intoString:&substring];
[scanner scanString:endQuote intoString:nil];
} else {
// Single word that is non-quoted
[scanner scanUpToCharactersFromSet:whitespace intoString:&substring];
}
// Process and add the substring to results
if (substring) {
substring = [substring stringByTrimmingCharactersInSet:whitespace];
if (substring.length) [results addObject:substring];
}
// Skip to next word
[scanner scanCharactersFromSet:whitespace intoString:nil];
}
// Return non-mutable array
return results.copy;
}
#end
If you are looking for splitting linguistic feature's of a string (Words, paragraphs, characters, sentences and lines), use string enumeration:
NSString * string = #" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationByWords
usingBlock:
^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(#"Substring: '%#'", substring);
}];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
This api works with other languages where spaces are not always the delimiter (e.g. Japanese). Also using NSStringEnumerationByComposedCharacterSequences is the proper way to enumerate over characters, since many non-western characters are more than one byte long.
I had a case where I had to split the console output after an LDAP query with ldapsearch. First set up and execute the NSTask (I found a good code sample here: Execute a terminal command from a Cocoa app). But then I had to split and parse the output so as to extract only the print-server names out of the Ldap-query-output. Unfortunately it is rather tedious string-manipulation which would be no problem at all if we were to manipulate C-strings/arrays with simple C-array operations. So here is my code using cocoa objects. If you have better suggestions, let me know.
//as the ldap query has to be done when the user selects one of our Active Directory Domains
//(an according comboBox should be populated with print-server names we discover from AD)
//my code is placed in the onSelectDomain event code
//the following variables are declared in the interface .h file as globals
#protected NSArray* aDomains;//domain combo list array
#protected NSMutableArray* aPrinters;//printer combo list array
#protected NSMutableArray* aPrintServers;//print server combo list array
#protected NSString* sLdapQueryCommand;//for LDAP Queries
#protected NSArray* aLdapQueryArgs;
#protected NSTask* tskLdapTask;
#protected NSPipe* pipeLdapTask;
#protected NSFileHandle* fhLdapTask;
#protected NSMutableData* mdLdapTask;
IBOutlet NSComboBox* comboDomain;
IBOutlet NSComboBox* comboPrinter;
IBOutlet NSComboBox* comboPrintServer;
//end of interface globals
//after collecting the print-server names they are displayed in an according drop-down comboBox
//as soon as the user selects one of the print-servers, we should start a new query to find all the
//print-queues on that server and display them in the comboPrinter drop-down list
//to find the shares/print queues of a windows print-server you need samba and the net -S command like this:
// net -S yourPrintServerName.yourBaseDomain.com -U yourLdapUser%yourLdapUserPassWord -W adm rpc share -l
//which dispalays a long list of the shares
- (IBAction)onSelectDomain:(id)sender
{
static int indexOfLastItem = 0; //unfortunately we need to compare this because we are called also if the selection did not change!
if ([comboDomain indexOfSelectedItem] != indexOfLastItem && ([comboDomain indexOfSelectedItem] != 0))
{
indexOfLastItem = [comboDomain indexOfSelectedItem]; //retain this index for next call
//the print-servers-list has to be loaded on a per univeristy or domain basis from a file dynamically or from AN LDAP-QUERY
//initialize an LDAP-Query-Task or console-command like this one with console output
/*
ldapsearch -LLL -s sub -D "cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com" -h "yourLdapServer.com" -p 3268 -w "yourLdapUserPassWord" -b "dc=yourBaseDomainToSearchIn,dc=com" "(&(objectcategory=computer)(cn=ps*))" "dn"
//our print-server names start with ps* and we want the dn as result, wich comes like this:
dn: CN=PSyourPrintServerName,CN=Computers,DC=yourBaseDomainToSearchIn,DC=com
*/
sLdapQueryCommand = [[NSString alloc] initWithString: #"/usr/bin/ldapsearch"];
if ([[comboDomain stringValue] compare: #"firstDomain"] == NSOrderedSame) {
aLdapQueryArgs = [NSArray arrayWithObjects: #"-LLL",#"-s", #"sub",#"-D", #"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",#"-h", #"yourLdapServer.com",#"-p",#"3268",#"-w",#"yourLdapUserPassWord",#"-b",#"dc=yourFirstDomainToSearchIn,dc=com",#"(&(objectcategory=computer)(cn=ps*))",#"dn",nil];
}
else {
aLdapQueryArgs = [NSArray arrayWithObjects: #"-LLL",#"-s", #"sub",#"-D", #"cn=yourLdapUser,ou=yourOuWithLdapUserAccount,dc=yourDomain,dc=com",#"-h", #"yourLdapServer.com",#"-p",#"3268",#"-w",#"yourLdapUserPassWord",#"-b",#"dc=yourSecondDomainToSearchIn,dc=com",#"(&(objectcategory=computer)(cn=ps*))",#"dn",nil];
}
//prepare and execute ldap-query task
tskLdapTask = [[NSTask alloc] init];
pipeLdapTask = [[NSPipe alloc] init];//instead of [NSPipe pipe]
[tskLdapTask setStandardOutput: pipeLdapTask];//hope to get the tasks output in this file/pipe
//The magic line that keeps your log where it belongs, has to do with NSLog (see https://stackoverflow.com/questions/412562/execute-a-terminal-command-from-a-cocoa-app and here http://www.cocoadev.com/index.pl?NSTask )
[tskLdapTask setStandardInput:[NSPipe pipe]];
//fhLdapTask = [[NSFileHandle alloc] init];//would be redundand here, next line seems to do the trick also
fhLdapTask = [pipeLdapTask fileHandleForReading];
mdLdapTask = [NSMutableData dataWithCapacity:512];//prepare capturing the pipe buffer which is flushed on read and can overflow, start with 512 Bytes but it is mutable, so grows dynamically later
[tskLdapTask setLaunchPath: sLdapQueryCommand];
[tskLdapTask setArguments: aLdapQueryArgs];
#ifdef bDoDebug
NSLog (#"sLdapQueryCommand: %#\n", sLdapQueryCommand);
NSLog (#"aLdapQueryArgs: %#\n", aLdapQueryArgs );
NSLog (#"tskLdapTask: %#\n", [tskLdapTask arguments]);
#endif
[tskLdapTask launch];
while ([tskLdapTask isRunning]) {
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];
}
[tskLdapTask waitUntilExit];//might be redundant here.
[mdLdapTask appendData: [fhLdapTask readDataToEndOfFile]];//add another read for safety after process/command stops
NSString* sLdapOutput = [[NSString alloc] initWithData: mdLdapTask encoding: NSUTF8StringEncoding];//convert output to something readable, as NSData and NSMutableData are mere byte buffers
#ifdef bDoDebug
NSLog(#"LdapQueryOutput: %#\n", sLdapOutput);
#endif
//Ok now we have the printservers from Active Directory, lets parse the output and show the list to the user in its combo box
//output is formatted as this, one printserver per line
//dn: CN=PSyourPrintServer,OU=Computers,DC=yourBaseDomainToSearchIn,DC=com
//so we have to search for "dn: CN=" to retrieve each printserver's name
//unfortunately splitting this up will give us a first line containing only "" empty string, which we can replace with the word "choose"
//appearing as first entry in the comboBox
aPrintServers = (NSMutableArray*)[sLdapOutput componentsSeparatedByString:#"dn: CN="];//split output into single lines and store it in the NSMutableArray aPrintServers
#ifdef bDoDebug
NSLog(#"aPrintServers: %#\n", aPrintServers);
#endif
if ([[aPrintServers objectAtIndex: 0 ] compare: #"" options: NSLiteralSearch] == NSOrderedSame){
[aPrintServers replaceObjectAtIndex: 0 withObject: slChoose];//replace with localized string "choose"
#ifdef bDoDebug
NSLog(#"aPrintServers: %#\n", aPrintServers);
#endif
}
//Now comes the tedious part to extract only the print-server-names from the single lines
NSRange r;
NSString* sTemp;
for (int i = 1; i < [aPrintServers count]; i++) {//skip first line with "choose". To get rid of the rest of the line, we must isolate/preserve the print server's name to the delimiting comma and remove all the remaining characters
sTemp = [aPrintServers objectAtIndex: i];
sTemp = [sTemp stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceAndNewlineCharacterSet]];//remove newlines and line feeds
#ifdef bDoDebug
NSLog(#"sTemp: %#\n", sTemp);
#endif
r = [sTemp rangeOfString: #","];//now find first comma to remove the whole rest of the line
//r.length = [sTemp lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
r.length = [sTemp length] - r.location;//calculate number of chars between first comma found and lenght of string
#ifdef bDoDebug
NSLog(#"range: %i, %i\n", r.location, r.length);
#endif
sTemp = [sTemp stringByReplacingCharactersInRange:r withString: #"" ];//remove rest of line
#ifdef bDoDebug
NSLog(#"sTemp after replace: %#\n", sTemp);
#endif
[aPrintServers replaceObjectAtIndex: i withObject: sTemp];//put back string into array for display in comboBox
#ifdef bDoDebug
NSLog(#"aPrintServer: %#\n", [aPrintServers objectAtIndex: i]);
#endif
}
[comboPrintServer removeAllItems];//reset combo box
[comboPrintServer addItemsWithObjectValues:aPrintServers];
[comboPrintServer setNumberOfVisibleItems:aPrintServers.count];
[comboPrintServer selectItemAtIndex:0];
#ifdef bDoDebug
NSLog(#"comboPrintServer reloaded with new values.");
#endif
//release memory we used for LdapTask
[sLdapQueryCommand release];
[aLdapQueryArgs release];
[sLdapOutput release];
[fhLdapTask release];
[pipeLdapTask release];
// [tskLdapTask release];//strangely can not be explicitely released, might be autorelease anyway
// [mdLdapTask release];//strangely can not be explicitely released, might be autorelease anyway
[sTemp release];
}
}
I have my self come across instance where it was not enough to just separate string by component many tasks such as 1) Categorizing token into types 2) Adding new tokens 3)Separating string between custom closures like all words between "{" and "}"For any such requirements i found Parse Kit a life saver.
I used it to parse .PGN (prtable gaming notation) files successfully its very fast and lite.