Using Objective C/Cocoa to unescape unicode characters, ie \u1234 - objective-c

Some sites that I am fetching data from are returning UTF-8 strings with the Unicode characters escaped, i.e. \u5404\u500b\u90fd
Is there a built-in Cocoa function that might assist with this, or will I have to write my own decoding algorithm?

It's correct that Cocoa does not offer a solution, yet Core Foundation does: CFStringTransform.
CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it's a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic like transliterations between Greek and Latin (or about any known scripts), but it can also be used to do mundane tasks like unescaping strings from a crappy server:
NSString *input = @"\\u5404\\u500b\\u90fd";
NSMutableString *convertedString = [input mutableCopy];
CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);
NSLog(@"convertedString: %@", convertedString);
// prints: 各個都, tada!
As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, like case mappings, normalizations or unicode character name conversion. You can even design your own transformations.
I have no idea why Apple does not make it available from Cocoa.
Edit 2015:
OS X 10.11 and iOS 9 add the following method to Foundation:
- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;
So the example from above becomes...
NSString *input = @"\\u5404\\u500b\\u90fd";
NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java"
                                                     reverse:YES];
NSLog(@"convertedString: %@", convertedString);
Thanks @nschmidt for the heads up.

There is no built-in function to do C unescaping.
You can cheat a little with NSPropertyListSerialization since an "old text style" plist supports C escaping via \Uxxxx:
NSString* input = @"ab\"cA\"BC\\u2345\\u0123";
// will cause trouble if you have "abc\\\\uvw"
NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
NSString* unesc = [NSPropertyListSerialization propertyListFromData:data
                   mutabilityOption:NSPropertyListImmutable format:NULL
                   errorDescription:NULL];
assert([unesc isKindOfClass:[NSString class]]);
NSLog(@"Output = %@", unesc);
but mind that this isn't very efficient. It's far better if you write up your own parser. (BTW are you decoding JSON strings? If yes you could use the existing JSON parsers.)
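For anyone who does want to write their own parser, here is a minimal sketch in plain C (which also compiles inside an Objective-C source file). It assumes the input is a NUL-terminated UTF-8 string containing \uXXXX escapes and writes UTF-8 output; the names hex_value, parse_u_escape, put_utf8 and unescape_unicode are made up for this example. It only handles \u escapes (including surrogate pairs), not \" or \\, and a \u0000 escape would terminate the output early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "\uXXXX" at s; returns the 16-bit value, or -1 if malformed. */
static long parse_u_escape(const char *s) {
    if (s[0] != '\\' || s[1] != 'u') return -1;
    long value = 0;
    for (int i = 2; i < 6; i++) {
        int digit = hex_value(s[i]);
        if (digit < 0) return -1; /* also stops at the terminating NUL */
        value = value * 16 + digit;
    }
    return value;
}

/* Writes code point cp to out as UTF-8; returns the number of bytes written. */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Unescapes in into out. out needs strlen(in) + 1 bytes, because every
   escape is longer than the UTF-8 bytes it expands to. */
void unescape_unicode(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in) {
        long hi = parse_u_escape(in);
        if (hi < 0) { *out++ = *in++; continue; } /* ordinary byte */
        in += 6;
        unsigned long cp = (unsigned long)hi;
        if (hi >= 0xD800 && hi <= 0xDBFF) {
            /* high surrogate: combine with a following low surrogate */
            long lo = parse_u_escape(in);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + ((unsigned long)lo - 0xDC00);
                in += 6;
            }
        }
        out += put_utf8(cp, out);
    }
    *out = '\0';
}
```

The result can be handed back to Cocoa with [NSString stringWithUTF8String:]. An unpaired surrogate is passed through as a 3-byte sequence, which is not valid UTF-8, so real input may need an extra check there.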

Here's what I ended up writing. Hopefully this will help some people along.
+ (NSString*) unescapeUnicodeString:(NSString*)string
{
    // unescape quotes and backslashes
    NSString* unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""];
    unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
    // tokenize based on the unicode escape marker
    NSMutableString* tokenizedString = [NSMutableString string];
    NSScanner* scanner = [NSScanner scannerWithString:unescapedString];
    // keep whitespace: by default NSScanner skips it before each scan
    scanner.charactersToBeSkipped = nil;
    while ([scanner isAtEnd] == NO)
    {
        // read up to the next unicode marker
        // if a string has been scanned, it's a token
        // and should be appended to the tokenized string
        NSString* token = @"";
        [scanner scanUpToString:@"\\u" intoString:&token];
        if (token != nil && token.length > 0)
        {
            [tokenizedString appendString:token];
            continue;
        }
        // the scanner now sits on a "\u" marker; a complete escape
        // needs the 2 marker characters plus 4 hex digits
        // if fewer characters remain the input is malformed, so
        // copy the rest verbatim and move the scanner to the end
        NSUInteger location = [scanner scanLocation];
        NSUInteger remaining = scanner.string.length - location;
        if (remaining < 6)
        {
            [tokenizedString appendString:[scanner.string substringFromIndex:location]];
            [scanner setScanLocation:scanner.string.length];
            continue;
        }
        // move the location past the unicode marker
        // then read in the next 4 characters
        location += 2;
        NSRange range = {location, 4};
        token = [scanner.string substringWithRange:range];
        unichar codeValue = (unichar) strtol([token UTF8String], NULL, 16);
        [tokenizedString appendString:[NSString stringWithFormat:@"%C", codeValue]];
        // move the scanner past the 4 characters
        // then keep scanning
        location += 4;
        [scanner setScanLocation:location];
    }
    // done
    return tokenizedString;
}
+ (NSString*) escapeUnicodeString:(NSString*)string
{
    // first escape backslashes and quotes
    // note that the backslash has to be escaped before the quote
    // otherwise it will end up with an extra backslash
    NSString* escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
    escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    // convert to encoded unicode
    // do this by getting the data for the string
    // in UTF-16 little endian, which matches the byte order of the
    // uint16_t reads below on little-endian hosts
    // (network byte order would be big endian)
    NSData* data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES];
    size_t bytesRead = 0;
    const char* bytes = data.bytes;
    NSMutableString* encodedString = [NSMutableString string];
    // loop through the byte array
    // read two bytes (one UTF-16 code unit) at a time
    // code units above 0x7E are written as \uXXXX escapes
    // otherwise they are ASCII characters and
    // the %C format writes the character itself
    while (bytesRead < data.length)
    {
        uint16_t code = *((uint16_t*) &bytes[bytesRead]);
        if (code > 0x007E)
        {
            [encodedString appendFormat:@"\\u%04X", code];
        }
        else
        {
            [encodedString appendFormat:@"%C", code];
        }
        bytesRead += sizeof(uint16_t);
    }
    // done
    return encodedString;
}

simple code:
const char *cString = [unicodeStr cStringUsingEncoding:NSUTF8StringEncoding];
NSString *resultStr = [NSString stringWithCString:cString encoding:NSNonLossyASCIIStringEncoding];
from: https://stackoverflow.com/a/7861345

Related

Check if NSString only contains one character repeated

I want to know a simple and fast way to determine if all characters in an NSString are the same.
For example:
NSString *string = @"aaaaaaaaa";
=> return YES
NSString *string = @"aaaaaaabb";
=> return NO
I know that I can achieve it by using a loop, but my NSString is long, so I prefer a shorter and simpler way.
You can use this: remove every occurrence of the first character and check whether the remaining length is zero:
-(BOOL)sameCharsInString:(NSString *)str{
    if ([str length] == 0 ) return NO;
    return [[str stringByReplacingOccurrencesOfString:[str substringToIndex:1] withString:@""] length] == 0;
}
Here are two possibilities that fail as quickly as possible and don't (explicitly) create copies of the original string, which should be advantageous since you said the string was large.
First, use NSScanner to repeatedly try to read the first character in the string. If the loop ends before the scanner has reached the end of the string, there are other characters present.
NSScanner * scanner = [NSScanner scannerWithString:s];
NSString * firstChar = [s substringWithRange:[s rangeOfComposedCharacterSequenceAtIndex:0]];
while( [scanner scanString:firstChar intoString:NULL] ) continue;
BOOL stringContainsOnlyOneCharacter = [scanner isAtEnd];
Regex is also a good tool for this problem, since "a character followed by any number of repetitions of that character" is very simply expressed with a single back reference:
// Match one of any character at the start of the string,
// followed by any number of repetitions of that same character
// until the end of the string.
NSString * patt = @"^(.)\\1*$";
NSRegularExpression * regEx =
[NSRegularExpression regularExpressionWithPattern:patt
options:0
error:NULL];
NSArray * matches = [regEx matchesInString:s
options:0
range:(NSRange){0, [s length]}];
BOOL stringContainsOnlyOneCharacter = ([matches count] == 1);
Both these options correctly deal with multi-byte and composed characters; the regex version also does not require an explicit check for the empty string.
use this loop:
NSString *firstChar = [str substringWithRange:NSMakeRange(0, 1)];
for (int i = 1; i < [str length]; i++) {
NSString *ch = [str substringWithRange:NSMakeRange(i, 1)];
if(![ch isEqualToString:firstChar])
{
return NO;
}
}
return YES;
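If the text is already available as UTF-8 bytes, the same early-exit idea can be sketched in plain C without creating any substrings. This is only an illustration under stated assumptions: the input is a valid NUL-terminated UTF-8 string, "character" means a single code point (not a composed character sequence, unlike the NSScanner version above), and the function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b. */
static size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1; /* invalid lead byte: treat it as a single byte */
}

/* Returns true if s consists of one code point repeated one or more times. */
bool is_single_code_point_repeated(const char *s) {
    assert(s != NULL);
    size_t total = strlen(s);
    if (total == 0) return false;
    size_t unit = utf8_sequence_length((unsigned char)s[0]);
    if (total % unit != 0) return false;
    /* Compare every following chunk with the first one; stop at the first mismatch. */
    for (size_t i = unit; i < total; i += unit) {
        if (memcmp(s, s + i, unit) != 0) return false;
    }
    return true;
}
```

From Objective-C it could be called as is_single_code_point_repeated([str UTF8String]). Like the other answers, it treats the empty string as NO.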

Get a substring from an NSString until arriving to any letter in an NSArray - objective C

I am trying to parse a set of words that contain first Greek letters, then English letters. This would be easy if there were a delimiter between the sets. This is what I've built so far:
- (void)loadWordFileToArray:(NSBundle *)bundle {
    NSLog(@"loadWordFileToArray");
    if (bundle != nil) {
        NSString *path = [bundle pathForResource:@"alfa" ofType:@"txt"];
        //pull the content from the file into memory
        NSData* data = [NSData dataWithContentsOfFile:path];
        //convert the bytes from the file into a string
        NSString* string = [[NSString alloc] initWithBytes:[data bytes]
                                                    length:[data length]
                                                  encoding:NSUTF8StringEncoding];
        //split the string around newline characters to create an array
        NSString* delimiter = @"\n";
        incomingWords = [string componentsSeparatedByString:delimiter];
        NSLog(@"incomingWords count: %lu", (unsigned long)incomingWords.count);
    }
}
-(void)parseWordArray{
    NSLog(@"parseWordArray");
    NSString *seperator = @" = ";
    int i = 0;
    for (i=0; i < incomingWords.count; i++) {
        NSString *incomingString = [incomingWords objectAtIndex:i];
        NSScanner *scanner = [NSScanner localizedScannerWithString: incomingString];
        NSString *firstString;
        NSString *secondString;
        NSInteger scanPosition;
        [scanner scanUpToString:seperator intoString:&firstString];
        scanPosition = [scanner scanLocation];
        secondString = [[scanner string] substringFromIndex:scanPosition+[seperator length]];
        // NSLog(@"greek: %@", firstString);
        // NSLog(@"english: %@", secondString);
        [outgoingWords insertObject:[NSMutableArray arrayWithObjects:@"greek", firstString, @"english",secondString,@"category", @"", nil] atIndex:0];
        [englishWords insertObject:[NSMutableArray arrayWithObjects:secondString,nil] atIndex:0];
    }
}
But I cannot count on there being delimiters.
I have looked at this question. I want something similar. This would be: grab the characters in the string until an english letter is found. Then take the first group to one new string, and all the characters after to a second new string.
I only have to run this a few times, so optimization is not my highest priority. Any help would be appreciated.
EDIT:
I've changed my code as shown below to make use of NSLinguisticTagger. This works, but is this the best way? Note that the tag returned for English characters is, for some reason, "und".
The incoming string is: άγαλμα, το statue; only the last 6 characters are in English.
int j = 0;
for (j=0; j<incomingString.length; j++) {
    NSString *language = [tagger tagAtIndex:j scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
    if ([language isEqual: @"und"]) {
        NSLog(@"j is: %i", j);
        int k = 0;
        for (k=0; k<j; k++) {
            NSRange range = NSMakeRange (0, k);
            NSString *tempString = [incomingString substringWithRange:range ];
            NSLog (@"tempString: %@", tempString);
        }
        return;
    }
    NSLog (@"Language: %@", language);
}
Alright so what you could do is use NSLinguisticTagger to find out the language of the word (or letter) and if the language has changed then you know where to split the string. You can use NSLinguisticTagger like this:
NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options: NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace];
[tagger setString:@"This is my string in English."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
//Loop through each index of the string's characters and check the language as above.
//If it has changed then you can assume the language has changed.
Alternatively, you can use NSSpellChecker's requestCheckingOfString to get the dominant language in a range of characters:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
                                range:(NSRange){0, [spellCheckText length]}
                                types:NSTextCheckingTypeOrthography
                              options:nil
               inSpellDocumentWithTag:0
                    completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
    NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
This answer has information on how to detect the language of an NSString.
Allow me to introduce two good friends of mine.
NSCharacterSet and NSRegularExpression.
Along with them, normalization. (In Unicode terms)
First, you should normalize strings before analyzing them against a character set.
You will need to look at the choices, but normalizing to all composed forms is the way I would go.
This means an accented character is one instead of two or more.
It simplifies the number of things to compare.
Next, you can easily build your own NSCharacterSet objects from strings (loaded from files even) to use to test set membership.
Lastly, regular expressions can achieve the same thing with Unicode Property Names as classes or categories of characters. Regular expressions could be more terse but more expressive.
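For the specific case in the question (Greek first, then English), a character-set test can be as simple as looking for the first ASCII letter. Here is a rough sketch in plain C, assuming the entry is NUL-terminated UTF-8 and that "English letter" means A-Z or a-z; first_ascii_letter is a name invented for this example. It relies on the fact that every byte of a multi-byte UTF-8 sequence (such as a Greek letter) has its high bit set, so a byte below 0x80 is always a complete ASCII character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the byte offset of the first ASCII letter in the UTF-8 string s,
   or the length of s if it contains none. */
size_t first_ascii_letter(const char *s) {
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0') {
        unsigned char byte = (unsigned char)s[i];
        /* Bytes >= 0x80 belong to multi-byte sequences and are skipped. */
        if (byte < 0x80 && isalpha(byte)) break;
        i++;
    }
    return i;
}
```

Everything before the returned offset is the Greek part (plus any punctuation and spaces, which you may want to trim), and everything from the offset onward is the English part.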

How to handle 32bit unicode characters in a NSString

I have a NSString containing a unicode character bigger than U+FFFF, like the MUSICAL SYMBOL G CLEF symbol '𝄞'. I can create the NSString and display it.
NSString *s = @"A\U0001d11eB"; // "A𝄞B"
NSLog(@"String = \"%@\"", s);
The log is correct and displays the 3 characters. This tells me the NSString is well done and there is no encoding problem.
String = "A𝄞B"
But when I try to loop through all characters using the method
- (unichar)characterAtIndex:(NSUInteger)index
everything goes wrong.
The type unichar is 16 bits so I expect to get the wrong character for the musical symbol. But the length of the string is also incorrect!
NSLog(@"Length = %d", [s length]);
for (int i=0; i<[s length]; i++)
{
    NSLog(@" Character %d = %c", i, [s characterAtIndex:i]);
}
displays
Length = 4
Character 0 = A
Character 1 = 4
Character 2 = .
Character 3 = B
What methods should I use to correctly parse my NSString and get my 3 unicode characters?
Ideally the right method should return a type like wchar_t in place of unichar.
Thank you
NSString *s = @"A\U0001d11eB";
NSData *data = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
const wchar_t *wcs = [data bytes];
for (int i = 0; i < [data length]/4; i++) {
    NSLog(@"%#010x", wcs[i]);
}
Output:
0x00000041
0x0001d11e
0x00000042
(The code assumes that wchar_t has a size of 4 bytes and little-endian encoding.)
length and characterAtIndex: do not give the expected result because \U0001d11e is internally stored as a UTF-16 "surrogate pair".
Another useful method for general Unicode strings is
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByComposedCharacterSequences
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];
Output:
A
𝄞
B
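To make the surrogate pair mentioned above concrete, here is a small sketch in plain C of how two 16-bit code units combine into one code point. It assumes you already have the UTF-16 code units in a buffer (for instance via getCharacters:range:); utf16_to_code_points is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes len UTF-16 code units into code points.
   out must have room for len entries; returns the number written. */
size_t utf16_to_code_points(const uint16_t *units, size_t len, uint32_t *out) {
    assert(units != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = units[i];
        /* A high surrogate (D800..DBFF) followed by a low surrogate
           (DC00..DFFF) encodes one code point above U+FFFF. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(units[i + 1] - 0xDC00);
            i++; /* the low surrogate has been consumed */
        }
        out[count++] = cp;
    }
    return count;
}
```

For "A𝄞B" the four code units 0x0041, 0xD834, 0xDD1E, 0x0042 decode to the three code points 0x41, 0x1D11E and 0x42, which is why length reports 4 while there are 3 characters.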

Remove all but numbers from NSString

I have an NSString (phone number) with some parenthesis and hyphens as some phone numbers are formatted. How would I remove all characters except numbers from the string?
Old question, but how about:
NSString *newString = [[origString componentsSeparatedByCharactersInSet:
                        [[NSCharacterSet decimalDigitCharacterSet] invertedSet]]
                        componentsJoinedByString:@""];
It explodes the source string on the set of non-digits, then reassembles them using an empty string separator. Not as efficient as picking through characters, but much more compact in code.
There's no need to use a regular expressions library as the other answers suggest -- the class you're after is called NSScanner. It's used as follows:
NSString *originalString = @"(123) 123123 abc";
NSMutableString *strippedString = [NSMutableString
        stringWithCapacity:originalString.length];
NSScanner *scanner = [NSScanner scannerWithString:originalString];
NSCharacterSet *numbers = [NSCharacterSet
        characterSetWithCharactersInString:@"0123456789"];
while ([scanner isAtEnd] == NO) {
    NSString *buffer;
    if ([scanner scanCharactersFromSet:numbers intoString:&buffer]) {
        [strippedString appendString:buffer];
    } else {
        [scanner setScanLocation:([scanner scanLocation] + 1)];
    }
}
NSLog(@"%@", strippedString); // "123123123"
EDIT: I've updated the code because the original was written off the top of my head and I figured it would be enough to point the people in the right direction. It seems that people are after code they can just copy-paste straight into their application.
I also agree that Michael Pelz-Sherman's solution is more appropriate than using NSScanner, so you might want to take a look at that.
The accepted answer is overkill for what is being asked. This is much simpler:
NSString *pureNumbers = [[phoneNumberString componentsSeparatedByCharactersInSet:[[NSCharacterSet decimalDigitCharacterSet] invertedSet]] componentsJoinedByString:@""];
This is great, but the code does not work for me on the iPhone 3.0 SDK.
If I define strippedString as you show here, I get a BAD ACCESS error when trying to print it after the scanCharactersFromSet:intoString call.
If I do it like so:
NSMutableString *strippedString = [NSMutableString stringWithCapacity:10];
I end up with an empty string, but the code doesn't crash.
I had to resort to good old C instead:
for (int i=0; i<[phoneNumber length]; i++) {
    if (isdigit([phoneNumber characterAtIndex:i])) {
        [strippedString appendFormat:@"%c",[phoneNumber characterAtIndex:i]];
    }
}
Though this is an old question with working answers, I missed international format support. Based on the solution of simonobo, the altered character set includes a plus sign "+". International phone numbers are supported by this amendment as well.
NSString *condensedPhoneNumber = [[phoneNumber componentsSeparatedByCharactersInSet:
        [[NSCharacterSet characterSetWithCharactersInString:@"+0123456789"]
         invertedSet]]
        componentsJoinedByString:@""];
The Swift expressions are
var phoneNumber = " +1 (234) 567-1000 "
var allowedCharactersSet = NSMutableCharacterSet.decimalDigitCharacterSet()
allowedCharactersSet.addCharactersInString("+")
var condensedPhoneNumber = phoneNumber.componentsSeparatedByCharactersInSet(allowedCharactersSet.invertedSet).joinWithSeparator("")
Which yields +12345671000 as a common international phone number format.
Here is the Swift version of this.
import UIKit
import Foundation
var phoneNumber = " 1 (888) 555-5551 "
var strippedPhoneNumber = "".join(phoneNumber.componentsSeparatedByCharactersInSet(NSCharacterSet.decimalDigitCharacterSet().invertedSet))
Swift version of the most popular answer:
var newString = join("", oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.decimalDigitCharacterSet().invertedSet))
Edit: Syntax for Swift 2
let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.decimalDigitCharacterSet().invertedSet).joinWithSeparator("")
Edit: Syntax for Swift 3
let newString = oldString.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
Thanks for the example. It has only one thing missing the increment of the scanLocation in case one of the characters in originalString is not found inside the numbers CharacterSet object. I have added an else {} statement to fix this.
NSString *originalString = @"(123) 123123 abc";
NSMutableString *strippedString = [NSMutableString
        stringWithCapacity:originalString.length];
NSScanner *scanner = [NSScanner scannerWithString:originalString];
NSCharacterSet *numbers = [NSCharacterSet
        characterSetWithCharactersInString:@"0123456789"];
while ([scanner isAtEnd] == NO) {
    NSString *buffer;
    if ([scanner scanCharactersFromSet:numbers intoString:&buffer]) {
        [strippedString appendString:buffer];
    }
    // --------- Add the following to get out of endless loop
    else {
        [scanner setScanLocation:([scanner scanLocation] + 1)];
    }
    // --------- End of addition
}
NSLog(@"%@", strippedString); // "123123123"
This keeps only the digits of a mobile number:
NSString * strippedNumber = [mobileNumber stringByReplacingOccurrencesOfString:@"[^0-9]" withString:@"" options:NSRegularExpressionSearch range:NSMakeRange(0, [mobileNumber length])];
It might be worth noting that the accepted componentsSeparatedByCharactersInSet: and componentsJoinedByString:-based answer is not a memory-efficient solution. It allocates memory for the character set, for an array and for a new string. Even if these are only temporary allocations, processing lots of strings this way can quickly fill the memory.
A memory friendlier approach would be to operate on a mutable copy of the string in place. In a category over NSString:
-(NSString *)stringWithNonDigitsRemoved {
static NSCharacterSet *decimalDigits;
if (!decimalDigits) {
decimalDigits = [NSCharacterSet decimalDigitCharacterSet];
}
NSMutableString *stringWithNonDigitsRemoved = [self mutableCopy];
for (CFIndex index = 0; index < stringWithNonDigitsRemoved.length; ++index) {
unichar c = [stringWithNonDigitsRemoved characterAtIndex: index];
if (![decimalDigits characterIsMember: c]) {
[stringWithNonDigitsRemoved deleteCharactersInRange: NSMakeRange(index, 1)];
index -= 1;
}
}
return [stringWithNonDigitsRemoved copy];
}
Profiling the two approaches has shown this using about 2/3 less memory.
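The in-place idea can be taken one step further with separate read and write positions, so that each character is moved at most once instead of shifting the tail on every deletion. Here is a minimal sketch in plain C, assuming a mutable NUL-terminated buffer; keep_digits is a hypothetical name. Note that isdigit only accepts the ASCII digits 0-9, whereas decimalDigitCharacterSet also contains digits from other scripts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Removes every non-digit byte from s in place; returns the new length. */
size_t keep_digits(char *s) {
    assert(s != NULL);
    size_t write = 0;
    for (size_t read = 0; s[read] != '\0'; read++) {
        if (isdigit((unsigned char)s[read])) {
            s[write++] = s[read]; /* write never overtakes read */
        }
    }
    s[write] = '\0';
    return write;
}
```

Called on a buffer holding "(123) 123-456 abc", it leaves "123123456" in the same buffer and allocates nothing.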
You can use regular expression on mutable string:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:
                              @"[^\\d]"
                              options:0
                              error:nil];
[regex replaceMatchesInString:str
                      options:0
                        range:NSMakeRange(0, str.length)
                 withTemplate:@""];
Built the top solution as a category to help with broader problems:
Interface:
@interface NSString (easyReplace)
- (NSString *)stringByReplacingCharactersNotInSet:(NSCharacterSet *)set
                                             with:(NSString *)string;
@end
Implementation:
@implementation NSString (easyReplace)
- (NSString *)stringByReplacingCharactersNotInSet:(NSCharacterSet *)set
                                             with:(NSString *)string
{
    NSMutableString *strippedString = [NSMutableString
                                       stringWithCapacity:self.length];
    NSScanner *scanner = [NSScanner scannerWithString:self];
    while ([scanner isAtEnd] == NO) {
        NSString *buffer;
        if ([scanner scanCharactersFromSet:set intoString:&buffer]) {
            [strippedString appendString:buffer];
        } else {
            [scanner setScanLocation:([scanner scanLocation] + 1)];
            [strippedString appendString:string];
        }
    }
    return [NSString stringWithString:strippedString];
}
@end
Usage:
NSString *strippedString =
    [originalString stringByReplacingCharactersNotInSet:
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789"]
                                                   with:@""];
Swift 3
let notNumberCharacters = CharacterSet.decimalDigits.inverted
// note: trimmingCharacters(in:) only strips non-digits from both ends of the string
let intString = yourString.trimmingCharacters(in: notNumberCharacters)
swift 4.1
var str = "75003 Paris, France"
var stringWithoutDigit = (str.components(separatedBy:CharacterSet.decimalDigits)).joined(separator: "")
print(stringWithoutDigit)
Um. The first answer seems totally wrong to me. NSScanner is really meant for parsing. Unlike regex, it has you parsing the string one tiny chunk at a time. You initialize it with a string, and it maintains an index of how far along the string it's gotten; That index is always its reference point, and any commands you give it are relative to that point. You tell it, "ok, give me the next chunk of characters in this set" or "give me the integer you find in the string", and those start at the current index, and move forward until they find something that doesn't match. If the very first character already doesn't match, then the method returns NO, and the index doesn't increment.
The code in the first example is scanning "(123)456-7890" for decimal characters, which already fails from the very first character, so the call to scanCharactersFromSet:intoString: leaves the passed-in strippedString alone, and returns NO; The code totally ignores checking the return value, leaving the strippedString unassigned. Even if the first character were a digit, that code would fail, since it would only return the digits it finds up until the first dash or paren or whatever.
If you really wanted to use NSScanner, you could put something like that in a loop, and keep checking for a NO return value, and if you get that you can increment the scanLocation and scan again; and you also have to check isAtEnd, and yada yada yada. In short, wrong tool for the job. Michael's solution is better.
For those searching for phone extraction, you can extract the phone numbers from a text using NSDataDetector, for example:
NSString *userBody = @"This is a text with 30612312232 my phone";
if (userBody != nil) {
    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypePhoneNumber error:&error];
    NSArray *matches = [detector matchesInString:userBody options:0 range:NSMakeRange(0, [userBody length])];
    if (matches != nil) {
        for (NSTextCheckingResult *match in matches) {
            if ([match resultType] == NSTextCheckingTypePhoneNumber) {
                NSLog(@"Found phone number %@", [match phoneNumber]);
            }
        }
    }
}
I created a category on NSString to simplify this common operation.
NSString+AllowCharactersInSet.h
@interface NSString (AllowCharactersInSet)
- (NSString *)stringByAllowingOnlyCharactersInSet:(NSCharacterSet *)characterSet;
@end
NSString+AllowCharactersInSet.m
@implementation NSString (AllowCharactersInSet)
- (NSString *)stringByAllowingOnlyCharactersInSet:(NSCharacterSet *)characterSet {
    NSMutableString *strippedString = [NSMutableString
                                       stringWithCapacity:self.length];
    NSScanner *scanner = [NSScanner scannerWithString:self];
    while (!scanner.isAtEnd) {
        NSString *buffer = nil;
        if ([scanner scanCharactersFromSet:characterSet intoString:&buffer]) {
            [strippedString appendString:buffer];
        } else {
            scanner.scanLocation = scanner.scanLocation + 1;
        }
    }
    return strippedString;
}
@end
I think currently best way is:
phoneNumber.replacingOccurrences(of: "\\D",
with: "",
options: String.CompareOptions.regularExpression)
If you're just looking to grab the numbers from the string, you could certainly use regular expressions to parse them out. For doing regex in Objective-C, check out RegexKit. Edit: As @Nathan points out, using NSScanner is a much simpler way to parse all numbers from a string. I totally wasn't aware of that option, so props to him for suggesting it. (I don't even like using regex myself, so I prefer approaches that don't require them.)
If you want to format phone numbers for display, it's worth taking a look at NSNumberFormatter. I suggest you read through this related SO question for tips on doing so. Remember that phone numbers are formatted differently depending on location and/or locale.
Swift 5
let newString = origString.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
Based on Jon Vogel's answer here it is as a Swift String extension along with some basic tests.
import Foundation
extension String {
func stringByRemovingNonNumericCharacters() -> String {
return self.componentsSeparatedByCharactersInSet(NSCharacterSet.decimalDigitCharacterSet().invertedSet).joinWithSeparator("")
}
}
And some tests proving at least basic functionality:
import XCTest
class StringExtensionTests: XCTestCase {
func testStringByRemovingNonNumericCharacters() {
let baseString = "123"
var testString = baseString
var newString = testString.stringByRemovingNonNumericCharacters()
XCTAssertTrue(newString == testString)
testString = "a123b"
newString = testString.stringByRemovingNonNumericCharacters()
XCTAssertTrue(newString == baseString)
testString = "a=1-2_3#b"
newString = testString.stringByRemovingNonNumericCharacters()
XCTAssertTrue(newString == baseString)
testString = "(999) 999-9999"
newString = testString.stringByRemovingNonNumericCharacters()
XCTAssertTrue(newString.characters.count == 10)
XCTAssertTrue(newString == "9999999999")
testString = "abc"
newString = testString.stringByRemovingNonNumericCharacters()
XCTAssertTrue(newString == "")
}
}
This answers the OP's question but it could be easily modified to leave in phone number related characters like ",;*#+"
NSString *originalPhoneNumber = @"(123) 123-456 abc";
NSCharacterSet *numbers = [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
// note: stringByTrimmingCharactersInSet: only removes characters at the start and end of the string
NSString *trimmedPhoneNumber = [originalPhoneNumber stringByTrimmingCharactersInSet:numbers];
Keep it simple!

Reading ints from NSData?

I think I am getting a little confused here. What I have is a plain text file with the numbers "5 10 2350" in it. As you can see below, I am trying to read the first value using readDataOfLength. I think maybe where I am getting muddled is that I should be reading as chars, but then 10 is 2 chars and 2350 is 4. Can anyone point me in the right direction for reading these?
NSString *dataFile_IN = @"/Users/FGX/Documents/Xcode/syntax_FileIO/inData.txt";
NSFileHandle *inFile;
NSData *readBuffer;
int intBuffer;
int bufferSize = sizeof(int);
inFile = [NSFileHandle fileHandleForReadingAtPath:dataFile_IN];
if(inFile != nil) {
    readBuffer = [inFile readDataOfLength:bufferSize];
    [readBuffer getBytes: &intBuffer length: bufferSize];
    NSLog(@"BUFFER: %d", intBuffer);
    [inFile closeFile];
}
EDIT_001
Both are excellent answers from Jarret and Ole; here is what I have gone with. One final question: "METHOD 02" picks up the carriage return before a blank line at the bottom of the text file and returns it as an empty substring, which in turn gets converted to "0". Can I set the NSCharacterSet to stop that? Currently I just added a length check on the string.
NSInteger intFromFile;
NSScanner *scanner;
NSArray *subStrings;
NSString *eachString;
NSString *strBuffer;
NSError *fileError = nil;
// METHOD 01 Output: 57 58 59
strBuffer = [NSString stringWithContentsOfFile:dataFile_IN encoding:NSUTF8StringEncoding error:&fileError];
scanner = [NSScanner scannerWithString:strBuffer];
while ([scanner scanInteger:&intFromFile]) NSLog(@"%ld", (long)intFromFile);
// METHOD 02 Output: 57 58 59 0
strBuffer = [NSString stringWithContentsOfFile:dataFile_IN encoding:NSUTF8StringEncoding error:&fileError];
subStrings = [strBuffer componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
for(eachString in subStrings) {
    if ([eachString length] != 0) {
        NSLog(@"{%@} %d", eachString, [eachString intValue]);
    }
}
gary
There are several conveniences in Cocoa that can make your life a bit easier here:
NSString *dataFile_IN = @"/Users/FGX/Documents/Xcode/syntax_FileIO/inData.txt";
// Read all the data at once into a string... a convenience that avoids
// the need to open a file handle and convert NSData
NSString *s = [NSString stringWithContentsOfFile:dataFile_IN
                                        encoding:NSUTF8StringEncoding
                                           error:nil];
// Use a scanner to loop over the file. This assumes there is nothing in
// the file but integers separated by whitespace and newlines
NSInteger anInteger;
NSScanner *scanner = [NSScanner scannerWithString:s];
while (![scanner isAtEnd]) {
    if ([scanner scanInteger:&anInteger]) {
        NSLog(@"Found an integer: %ld", (long)anInteger);
    } else {
        // not an integer: stop instead of looping forever
        break;
    }
}
Otherwise, using your original approach, you'd pretty much have to read character-by-character, adding each character to a "buffer" and then evaluating your integer when you encounter a space (or newline, or some other separator).
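If you do stay with raw bytes, the character-by-character evaluation described above does not have to be written by hand: the C library's strtol skips leading whitespace and reports where each number ends. This is a rough sketch under the assumption that the file contents have been copied into a NUL-terminated buffer; parse_ints is a name invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses whitespace-separated decimal integers from text into out
   (at most max of them); returns how many were parsed. */
size_t parse_ints(const char *text, long *out, size_t max) {
    assert(text != NULL && out != NULL);
    size_t count = 0;
    const char *cursor = text;
    while (count < max) {
        char *end;
        long value = strtol(cursor, &end, 10);
        if (end == cursor) break; /* no digits consumed: end of input or not a number */
        out[count++] = value;
        cursor = end; /* continue right after the parsed number */
    }
    return count;
}
```

With the contents "5 10 2350" followed by a newline, this yields the three values 5, 10 and 2350, and the trailing newline does not produce an extra 0.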
If you read the file's contents into a string as Jarret suggested, and assuming the string only contains numbers and whitespace, you can also call:
NSArray *substrings = [s componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
This will split the string at whitespace and newline characters and return an array of the substrings. You would then have to convert the substrings to integers by looping over the array and calling [substring integerValue].
One way to do it would be to first turn your readBuffer into a string as follows:
NSString * dataString = [[NSString alloc] initWithData:readBuffer encoding:NSUTF8StringEncoding];
Then split the string into values:
NSString *dataString=@"5 10 2350"; // example string to split
NSArray * valueStrings = [dataString componentsSeparatedByString:@" "];
for(NSString *valueString in valueStrings)
{
    int value=[valueString intValue];
    NSLog(@"%d",value);
}
Output of this is
5
10
2350