Objective C - Parse NSData - objective-c

I have the following data inside an NSData object:
<00000000 6f2d840e 31504159 2e535953 2e444446 3031a51b 8801015f 2d02656e 9f110101 bf0c0cc5 0affff3f 00000003 ffff03>
I'm having issues parsing this data. It contains information that is delimited by tags:
Tag 1 is from byte value 0x84 to 0xa5
Tag 2 is from byte value 0xa5 to 0x88
Tag 3 is from byte value 0x88 to 0x5f 0x2d
Tag 4 is from byte value 0x5f 0x2d to 0x9f 0x11
How would I go about getting those values from the NSData object?
Regards,
EZFrag

Use -[NSData bytes] to get a pointer to the contents. Then use pointer arithmetic to iterate over the bytes until you find what you are looking for, taking care never to read past -[NSData length]. Since you want to go byte by byte, you should probably cast the pointer returned by bytes to const uint8_t*. Then pointer[0] is the first byte, pointer[1] the second, and so on.
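A minimal sketch of that byte-by-byte scan, written in plain C so it can be dropped into an Objective-C file. The function name find_bytes is hypothetical, not part of any Apple API, and returning -1 for "not found" is my own assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first occurrence of needle in hay, or -1 if absent. */
long find_bytes(const uint8_t *hay, size_t hay_len,
                const uint8_t *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > hay_len) {
        return -1;
    }
    /* i + needle_len <= hay_len keeps the comparison window inside the buffer. */
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

With an NSData object you would pass (const uint8_t *)[data bytes] and [data length] as the first two arguments. The bytes between two tags then start at the first index plus the tag length and end at the second index.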

I managed a nice solution, deciding to actually use the gray matter:
-(int)getIndexOfSubDataInData:(NSData*)haystack forData:(NSData*)needle{
    int dataCounter = 0;
    // Stop once the comparison window would run past the end of haystack;
    // subdataWithRange: raises NSRangeException for an out-of-bounds range.
    while (dataCounter + [needle length] <= [haystack length]) {
        NSRange dataRange = NSMakeRange(dataCounter, [needle length]);
        NSData* compareData = [haystack subdataWithRange:dataRange];
        if ([compareData isEqualToData:needle]) {
            return dataCounter;
        }
        dataCounter++;
    }
    return -1; // needle not found
}
-(NSData*)getSubDataInData:(NSData*)targetData fromTag:(NSData*)fromTag toTag:(NSData*)toTag{
    int tagIndex = [self getIndexOfSubDataInData:targetData forData:fromTag];
    int endIndex = [self getIndexOfSubDataInData:targetData forData:toTag];
    if (tagIndex < 0 || endIndex < 0) {
        return nil; // one of the tags is missing
    }
    int startIndex = tagIndex + (int)[fromTag length];
    if (endIndex < startIndex) {
        return nil; // end tag occurs before the start tag
    }
    NSRange dataRange = NSMakeRange(startIndex, endIndex - startIndex);
    return [targetData subdataWithRange:dataRange];
}
//here is how I use the code
NSData* langTagStart=[[NSData alloc] initWithBytes:"\x5F\x2D" length:2];
NSData* langTagEnd=[[NSData alloc] initWithBytes:"\x9F\x11" length:2];
// Pass the NSData object itself; [response bytes] is a raw pointer, not an NSData.
NSData* languageData = [self getSubDataInData:response fromTag:langTagStart toTag:langTagEnd];
Thanks for your suggestions.
Regards,
EZFrag

Related

Convert NSData byte array to string?

I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters). So I'm looking at code like this:
NSUInteger len = [imageData length];
Byte *byteData = (Byte*)malloc(len);
[imageData getBytes:byteData length:len];
How do I then send byteData as JSON? I want to send the raw bytes.
CODE:
NSString *jsonBase64 = [imageData base64EncodedString];
NSLog(@"BASE 64 FINGERPRINT: %@", jsonBase64);
NSData *b64 = [NSData dataFromBase64String:jsonBase64];
NSLog(@"Equal: %d", [imageData isEqualToData:b64]);
NSLog(@"b64: %@", b64);
NSLog(@"original: %@", imageData);
NSString *decoded = [[NSString alloc] initWithData:b64 encoding:NSUTF8StringEncoding];
NSLog(@"decoded: %@", decoded);
I get values for everything except the last line, decoded.
Does that indicate that the raw bytes are not valid for NSUTF8StringEncoding?
The reason the string is considered 'unreliable' in previous Stack Overflow posts is that they, too, were passing NSData bytes that are not terminated with a NUL byte:
NSString *jsonString = [NSString stringWithUTF8String:[nsDataObj bytes]];
// This is unreliable: without a terminating NUL it may read past the end of the data or result in a nil string
Whereas the example below should give you your desired results, because the length is passed explicitly and no terminator is needed:
NSString *jsonString = [[NSString alloc] initWithBytes:[nsDataObj bytes] length:[nsDataObj length] encoding: NSUTF8StringEncoding];
You were on the right track and hopefully this is able to help you solve your current problem. Best of luck!
~ EDIT ~
Make sure you are creating your NSData object from an image like so:
NSData *imageData = UIImagePNGRepresentation(yourImage);
Have you tried using something like this:
@implementation NSData (Base64)
- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}
This will turn your NSData into a Base64 string, and on the other side you just need to decode it.
EDIT: @Lucas said you can do something like this:
NSString *myString = [[NSString alloc] initWithData:myData encoding:NSUTF8StringEncoding];
but I had some problems with this method because of some special characters, and because of that I started using Base64 strings for communication.
EDIT3: Try this method, base64EncodedString:
@implementation NSData (Base64)
- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}
//Helper Method
- (NSString *)base64EncodedStringWithWrapWidth:(NSUInteger)wrapWidth
{
    //ensure wrapWidth is a multiple of 4
    wrapWidth = (wrapWidth / 4) * 4;
    const char lookup[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    long long inputLength = [self length];
    const unsigned char *inputBytes = [self bytes];
    long long maxOutputLength = (inputLength / 3 + 1) * 4;
    maxOutputLength += wrapWidth? (maxOutputLength / wrapWidth) * 2: 0;
    unsigned char *outputBytes = (unsigned char *)malloc((NSUInteger)maxOutputLength);
    long long i;
    long long outputLength = 0;
    for (i = 0; i < inputLength - 2; i += 3)
    {
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[((inputBytes[i + 1] & 0x0F) << 2) | ((inputBytes[i + 2] & 0xC0) >> 6)];
        outputBytes[outputLength++] = lookup[inputBytes[i + 2] & 0x3F];
        //add line break
        if (wrapWidth && (outputLength + 2) % (wrapWidth + 2) == 0)
        {
            outputBytes[outputLength++] = '\r';
            outputBytes[outputLength++] = '\n';
        }
    }
    //handle left-over data
    if (i == inputLength - 2)
    {
        // = terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[(inputBytes[i + 1] & 0x0F) << 2];
        outputBytes[outputLength++] = '=';
    }
    else if (i == inputLength - 1)
    {
        // == terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0x03) << 4];
        outputBytes[outputLength++] = '=';
        outputBytes[outputLength++] = '=';
    }
    if (outputLength >= 4)
    {
        //truncate data to match actual output length
        outputBytes = realloc(outputBytes, (NSUInteger)outputLength);
        return [[NSString alloc] initWithBytesNoCopy:outputBytes
                                              length:(NSUInteger)outputLength
                                            encoding:NSASCIIStringEncoding
                                        freeWhenDone:YES];
    }
    else if (outputBytes)
    {
        free(outputBytes);
    }
    return nil;
}
Null termination is not the only problem when converting from NSData to NSString.
NSString is not designed to hold arbitrary binary data. It expects an encoding.
If your NSData contains an invalid UTF-8 sequence, initializing the NSString will fail.
The documentation isn't completely clear on this point, but for initWithData it says:
Returns nil if the initialization fails for some reason (for example
if data does not represent valid data for encoding).
Also: The JSON specification defines a string as a sequence of Unicode characters.
That means even if you're able to get your raw data into a JSON string, parsing could fail on the receiving end if the code performs UTF-8 validation.
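To make "invalid UTF-8 sequence" concrete, here is a rough validator sketch in plain C. The name is_valid_utf8 is hypothetical, and this is only an illustration of the encoding rules (lead byte, continuation bytes, no overlong forms, no surrogates); I am not claiming it mirrors what NSString does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool is_valid_utf8(const uint8_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        uint8_t b = s[i];
        size_t extra;   /* number of continuation bytes expected */
        uint32_t cp;    /* code point decoded so far */
        uint32_t min;   /* smallest code point allowed for this length */
        if (b < 0x80)                { extra = 0; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;                       /* stray continuation or bad lead byte */
        if (len - i - 1 < extra) return false;   /* sequence is cut off */
        for (size_t k = 1; k <= extra; k++) {
            uint8_t c = s[i + k];
            if ((c & 0xC0) != 0x80) return false; /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                      /* overlong encoding */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* UTF-16 surrogates */
        i += extra + 1;
    }
    return true;
}
```

Random or image bytes almost always fail a check like this within the first few bytes, which is why the string initializer comes back nil.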
If you don't want to use Base64, take a look at the answers here.
All code in this answer is pseudo-code fragments, you need to convert the algorithms into Objective-C or other language yourself.
Your question raises many questions... You start with:
I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters).
This appears to suggest you wish to encode the bytes as a string, ready to decode them back to bytes at the other end. If this is the case you have a number of choices, such as Base64 encoding. If you want something simple you can just encode each byte as its two-character hex value. Pseudo-code outline:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    [encodedString appendFormat:@"%02x", aByte];
The format %02x means two hexadecimal digits with zero padding. This results in a string which can be sent as JSON and decoded easily at the other end. The size over the wire will probably be twice the byte length, as UTF-8 is the recommended encoding for JSON over the wire.
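The hex outline above could look like this in plain C. This is a sketch only: hex_encode is a made-up name, and the caller is assumed to supply an output buffer of at least 2*len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes 2*len lowercase hex digits plus a terminating NUL into out.
   The caller must provide at least 2*len + 1 bytes of space. */
void hex_encode(const uint8_t *bytes, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    assert(out != NULL && (bytes != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0F];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```

The result contains only ASCII characters, so it needs no escaping when placed in a JSON string.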
However in response to one of the answer you write:
But I need absolutely the raw bits.
What do you mean by this? Is your receiver going to interpret the JSON string it gets as a sequence of raw bytes? If so you have a number of problems to address. JSON strings are a subset of JavaScript strings and are stored as UCS-2 or UTF-16; that is, they are sequences of 16-bit values, not 8-bit values. If you encode each byte into a character in a string then it will be represented using 16 bits, so if your receiver can access the byte stream it has to skip every other byte. Of course, if your receiver accesses the string a character at a time, each 16-bit character can be truncated back to an 8-bit byte. Now you might think that if you take this approach then each 8-bit byte can just be output as a character as part of a string, but that won't work. While all values 1-255 are valid Unicode character code points, and JavaScript/JSON allow NULs (0 value) in strings, not all those values are printable, you cannot put a double quote " into a string without escaping it, and the escape character is \ itself. All of these will need to be encoded into the string. You'd end up with something like:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    if (isprint(aByte) && aByte != '"' && aByte != '\\')
        [encodedString appendFormat:@"%c", aByte];
    otherwise
        [encodedString appendFormat:@"\\u00%02x", aByte]; // JSON unicode escape sequence
This will produce a string which when parsed by a JSON decoder will give you one character (16-bits) for each byte, the top 8-bits being zero. However if you pass this string to a JSON encoder it will encode the unicode escape sequences, which are already encoded... So you really need to send this string over the wire yourself to avoid this...
Confused? Getting complicated? Well, why are you trying to send binary byte data as a string? You never say what your high-level goal is or what, if anything, is known about the byte data (e.g. does it represent characters in some encoding?).
If this is really just an array of bytes then why not send it as a JSON array of numbers? A byte is just a number in the range 0-255. To do this you would use code along the lines of:
NSMutableArray *encodedBytes = [NSMutableArray new];
foreach aByte in byteData
    [encodedBytes addObject:@(aByte)]; // add aByte as an NSNumber object
Now pass encodedBytes to NSJSONSerialization and it will produce a JSON array of numbers to send over the wire; the receiver reverses the process, packing each number back into a byte buffer, and you have your bytes back.
This method avoids all issues of valid strings, encodings and escapes.
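To show what that array looks like on the wire, here is a small C sketch that formats a byte buffer as a JSON array of numbers by hand. The function name bytes_to_json_array and the -1 error return are my own assumptions; in Objective-C you would normally let NSJSONSerialization do this for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats bytes as a JSON array such as "[0,111,255]".
   Returns the number of characters written (excluding the NUL),
   or -1 if out_size is too small. */
int bytes_to_json_array(const uint8_t *bytes, size_t len, char *out, size_t out_size)
{
    assert(out != NULL && (bytes != NULL || len == 0));
    size_t pos = 0;
    if (out_size < 3) return -1;            /* need room for "[]" and the NUL */
    out[pos++] = '[';
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%u",
                         i ? "," : "", (unsigned)bytes[i]);
        if (n < 0 || (size_t)n >= out_size - pos) return -1;  /* truncated */
        pos += (size_t)n;
    }
    if (out_size - pos < 2) return -1;      /* need room for ']' and the NUL */
    out[pos++] = ']';
    out[pos] = '\0';
    return (int)pos;
}
```

Each byte costs up to four characters this way (three digits and a comma), so it is larger than Base64 but trivially decodable by any JSON parser.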
HTH

NSString intValue deforming actual number

I was making a basic method that takes a Flickr image URL and returns the image's ID.
I'm passing the method the NSString @"http://farm6.staticflickr.com/5183/5629026092_c6762a118f".
The goal is to return the int: 5629026092, which is in the image's URL and is the image's ID.
Here is my method:
-(int)getImageIDFromFlickrURL:(NSString *)imageURL{
    NSArray *objectsInURLArray = [imageURL componentsSeparatedByString:@"/"];
    NSString *lastObjectInFlickrArray = [objectsInURLArray lastObject];
    NSArray *dirtyFlickrIdArray = [lastObjectInFlickrArray componentsSeparatedByString:@"_"];
    NSString *flickIDString = [dirtyFlickrIdArray objectAtIndex:0];
    NSLog(@"flickr id string: %@",flickIDString);
    int flickrID = [flickIDString intValue];
    NSLog(@"id: %i",flickrID);
    return flickrID;
}
The output in the console is:
2012-05-26 13:30:25.771 TestApp[1744:f803] flickr id string: 5629026092
2012-05-26 13:30:25.773 TestApp[1744:f803] id: 2147483647
Why is calling intValue deforming the actual number?
Use long long instead; your number is greater than an int can handle (the max being 2147483647, as you can see in your second log).
Your value is too big to represent in 32 bits. The biggest value you can store in a signed 32 bit integer (int) is 2147483647. For unsigned ints, it's 4294967295. You need to convert to a long long integer to represent a number as big as 5629026092.
You'll probably need to create a number formatter for that. I'm no expert on number formatters, and always have to dig out the documentation to figure out how to use them.
I just tried it, and this code works:
NSString *numberString = @"5629026092";
NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
NSNumber *number = [formatter numberFromString: numberString];
long long value = [number longLongValue];
NSLog(@"%@ = %lld", numberString, value);
[formatter release]; // manual reference counting; omit under ARC
You could also convert the string to a C string and use sscanf, come to think of it.
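For the C-string route, here is a minimal sketch. Note that I have swapped in strtoll rather than sscanf, because strtoll reports overflow through errno. The helper names parse_long_long and exceeds_int are hypothetical, and the INT_MAX comparison assumes the usual 32-bit int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a decimal string into a long long. Returns false on overflow
   or when the string is not entirely a number. */
bool parse_long_long(const char *text, long long *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0') {
        return false;
    }
    *out = value;
    return true;
}

/* True when the value cannot be stored in a plain int. */
bool exceeds_int(long long value)
{
    return value > INT_MAX || value < INT_MIN;
}
```

From Objective-C you could feed it [flickIDString UTF8String]. The Flickr ID 5629026092 parses fine as a long long but is outside the int range, which is why intValue pinned it at 2147483647.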
Easy ^^: INT_MAX, the maximum value for a variable of type int, is 2147483647.
I found this to be a convenient way to do it:
NSString *flickIDString = [dirtyFlickrIdArray objectAtIndex:0]; // read some huge number into a string
// read into an NSNumber object or a long long variable, you choose
NSNumber *flickIDNumber = @(flickIDString.longLongValue); // box the value; a bare long long is not an object
long long flickIDLong = flickIDString.longLongValue;

Enumerate NSString characters via pointer

How can I enumerate NSString by pulling each unichar out of it? I can use characterAtIndex but that is slower than doing it by an incrementing unichar*. I didn't see anything in Apple's documentation that didn't require copying the string into a second buffer.
Something like this would be ideal:
for (unichar c in string) { ... }
or
unichar* ptr = (unichar*)string;
You can speed up -characterAtIndex: by converting it to its IMP form first:
NSString *str = @"This is a test";
NSUInteger len = [str length]; // only calling [str length] once speeds up the process as well
SEL sel = @selector(characterAtIndex:);
// using typeof to save my fingers from typing more
unichar (*charAtIdx)(id, SEL, NSUInteger) = (typeof(charAtIdx)) [str methodForSelector:sel];
for (NSUInteger i = 0; i < len; i++) {
    unichar c = charAtIdx(str, sel, i);
    // do something with c
    NSLog(@"%C", c);
}
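EDIT: It appears that the CFString Reference contains the following function:
const UniChar *CFStringGetCharactersPtr(CFStringRef theString);
This means you can do the following:
const unichar *chars = CFStringGetCharactersPtr((__bridge CFStringRef) theString);
if (chars != NULL) // may return NULL if the string has no internal UTF-16 buffer
{
    // the buffer is not NUL-terminated, so iterate by length
    NSUInteger length = [theString length];
    for (NSUInteger i = 0; i < length; i++)
    {
        // do something with chars[i]
    }
}
If you don't want to allocate memory for copying the buffer, this is the way to go, provided you fall back to another method when the function returns NULL.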
Your only option is to copy the characters into a new buffer. This is because the NSString class does not guarantee that there is an internal buffer you can use. The best way to do this is to use the getCharacters:range: method.
NSUInteger i, length = [string length];
unichar *buffer = malloc(sizeof(unichar) * length);
NSRange range = {0,length};
[string getCharacters:buffer range:range];
for(i = 0; i < length; ++i) {
    unichar c = buffer[i];
    // use c
}
free(buffer); // release the copy when done
If you are using potentially very long strings, it would be better to allocate a fixed size buffer and enumerate the string in chunks (this is actually how fast enumeration works).
I created a block-style enumeration method that uses getCharacters:range: with a fixed-size buffer, as per ughoavgfhw's suggestion in his answer. It avoids the situation where CFStringGetCharactersPtr returns null and it doesn't have to malloc a large buffer. You can drop it into an NSString category, or modify it to take a string as a parameter if you like.
-(void)enumerateCharactersWithBlock:(void (^)(unichar, NSUInteger, BOOL *))block
{
    const NSInteger bufferSize = 16;
    const NSInteger length = [self length];
    unichar buffer[bufferSize];
    NSInteger bufferLoops = (length - 1) / bufferSize + 1;
    BOOL stop = NO;
    for (int i = 0; i < bufferLoops; i++) {
        NSInteger bufferOffset = i * bufferSize;
        NSInteger charsInBuffer = MIN(length - bufferOffset, bufferSize);
        [self getCharacters:buffer range:NSMakeRange(bufferOffset, charsInBuffer)];
        for (int j = 0; j < charsInBuffer; j++) {
            block(buffer[j], j + bufferOffset, &stop);
            if (stop) {
                return;
            }
        }
    }
}
The fastest reliable way to enumerate characters in an NSString I know of is to use this relatively little-known Core Foundation gem hidden in plain sight (CFString.h).
NSString *string = <#initialize your string#>;
NSUInteger stringLength = string.length;
CFStringInlineBuffer buf;
CFStringInitInlineBuffer((__bridge CFStringRef) string, &buf, (CFRange) { 0, stringLength });
for (NSUInteger charIndex = 0; charIndex < stringLength; charIndex++) {
    unichar c = CFStringGetCharacterFromInlineBuffer(&buf, charIndex);
}
If you look at the source code of these inline functions, CFStringInitInlineBuffer() and CFStringGetCharacterFromInlineBuffer(), you'll see that they handle all the nasty details like CFStringGetCharactersPtr() returning NULL, CFStringGetCStringPtr() returning NULL, defaulting to slower CFStringGetCharacters() and caching the characters in a C array for fastest access possible. This API really deserves more publicity.
The caveat is that if you initialize the CFStringInlineBuffer at a non-zero offset, you should pass a relative character index to CFStringGetCharacterFromInlineBuffer(), as stated in the header comments:
The next two functions allow fast access to the contents of a string, assuming you are doing sequential or localized accesses. To use, call CFStringInitInlineBuffer() with a CFStringInlineBuffer (on the stack, say), and a range in the string to look at. Then call CFStringGetCharacterFromInlineBuffer() as many times as you want, with a index into that range (relative to the start of that range). These are INLINE functions and will end up calling CFString only once in a while, to fill a buffer. CFStringGetCharacterFromInlineBuffer() returns 0 if a location outside the original range is specified.
I don't think you can do this. NSString is an abstract interface to a multitude of classes that make no guarantees about the internal storage of the character data, so it's entirely possible there is no character array to get a pointer to.
If neither of the options mentioned in your question are suitable for your app, I'd recommend either creating your own string class for this purpose, or using raw malloc'ed unichar arrays instead of string objects.
This will work if you want to iterate over the UTF-8 bytes:
const char *s = [string UTF8String];
for (const char *t = s; *t; t++)
    /* use as */ *t;
[Edit] And if you really need unicode characters then you have no option but to use length and characterAtIndex. From the documentation:
The NSString class has two primitive methods—length and characterAtIndex:—that provide the basis for all other methods in its interface. The length method returns the total number of Unicode characters in the string. characterAtIndex: gives access to each character in the string by index, with index values starting at 0.
So your code would be:
for (int index = 0; index < string.length; index++)
{
unichar c = [string characterAtIndex: index];
/* ... */
}
[edit 2]
Also, don't forget that NSString is 'toll-free bridged' to CFString and thus all the non-Objective-C, straight C-code interface functions are usable. The relevant one would be CFStringGetCharacterAtIndex

Converting NSData bytes to NSString

I am trying to create a 16 byte and later 32 byte initialization vector in objective-c (Mac OS). I took some code on how to create random bytes and modified it to 16 bytes, but I have some difficulty with this. The NSData dumps the hex, but an NSString dump gives nil, and a cstring NSLog gives the wrong number of characters (not reproduced the same in the dump here).
Here is my terminal output:
2012-01-07 14:29:07.705 Test3Test[4633:80f] iv hex <48ea262d efd8f5f5 f8021126 fd74c9fd>
2012-01-07 14:29:07.710 Test3Test[4633:80f] IV string: (null)
2012-01-07 14:29:07.711 Test3Test[4633:80f] IV char string t^Q¶�^��^A
Here is the main program:
int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    //NSString *iv_string = [NSString stringWithCString:iv encoding:NSUTF8StringEncoding];
    testclass *obj = [testclass alloc];
    NSData *iv_data = [obj createRandomNSData];
    //[iv_string dataUsingEncoding:NSUTF8StringEncoding];
    NSLog(@"iv hex %@",iv_data);
    //NSString *iv_string = [[NSString alloc] initWithBytes:[iv_data bytes] length:16 encoding:NSUTF8StringE$
    NSString *iv_string = [[NSString alloc] initWithData:iv_data encoding:NSUTF8StringEncoding];
    NSLog(@"IV string: %@",iv_string);
    NSLog(@"IV char string %.*s",[iv_data bytes]);
    return 0;
}
(I left in the above some commented code that I tried and did not work also).
Below is my random number generator, taken from a Stack Overflow example:
@implementation testclass
-(NSData*)createRandomNSData
{
    int twentyMb = 16;
    NSMutableData* theData = [NSMutableData dataWithCapacity:twentyMb];
    for( unsigned int i = 0 ; i < twentyMb/4 ; ++i )
    {
        u_int32_t randomBits = arc4random();
        [theData appendBytes:(void*)&randomBits length:4];
    }
    NSData *data = [NSData dataWithData:theData];
    [theData dealloc];
    return data;
}
@end
I am really quite clueless as to what could be the problem here. If I have data as bytes, it should convert to a string or not necessarily? I have looked over the relevant examples here on stackoverflow, but none of them have worked in this situation.
Thanks,
Elijah
An arbitrary byte sequence may not be a legal UTF-8 encoding. As @Joachim Isaksson notes, there is seldom reason to convert to strings this way. If you need to store random data as a string, you should use an encoding scheme like Base64, serialize the NSData to a plist, or use a similar approach. You cannot simply use a C string either, since NUL (a zero byte) is legal inside a random byte sequence but is not legal inside a C string.
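The NUL problem is easy to demonstrate in plain C. The sketch below (the function name c_string_visible_length is hypothetical) reports how many bytes a C-string function would actually see in a buffer of known length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes a C-string function would see: everything before the
   first zero byte, or len when the data contains no zero byte. */
size_t c_string_visible_length(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    const unsigned char *nul = len ? memchr(bytes, 0, len) : NULL;
    return nul ? (size_t)(nul - bytes) : len;
}
```

If a random IV happens to contain a zero byte, anything that treats it as a C string silently drops the rest. If it contains none, such code keeps reading past the end of the buffer, which is one way to get the wrong number of characters in a log line.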
You do not need to build your own random byte creator on Mac or iOS. There's one built-in called SecRandomCopyBytes(). For example (from Properly encrypting with AES with CommonCrypto):
+ (NSData *)randomDataOfLength:(size_t)length {
    NSMutableData *data = [NSMutableData dataWithLength:length];
    int result = SecRandomCopyBytes(kSecRandomDefault,
                                    length,
                                    data.mutableBytes);
    NSAssert(result == 0, @"Unable to generate random bytes: %d",
             errno);
    return data;
}
When converting NSData to NSString using a UTF-8 encoding, you won't necessarily end up with the same number of bytes, since not all binary values are valid encodings of characters. I'd say using a string for binary data is a recipe for problems.
What is the use of the string? NSData is exactly the datatype you want for storing binary data to begin with.

Most efficient way to iterate over all the chars in an NSString

What's the best way to iterate over all the chars in an NSString? Would you want to loop over the length of the string and use the method
[aNSString characterAtIndex:index];
or would you want to use a char buffer based on the NSString?
I think it's important that people understand how to deal with unicode, so I ended up writing a monster answer, but in the spirit of tl;dr I will start with a snippet that should work fine. If you want to know details (which you should!), please continue reading after the snippet.
NSUInteger len = [str length];
unichar buffer[len+1];
[str getCharacters:buffer range:NSMakeRange(0, len)];
NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
    NSLog(@"%C", buffer[i]);
}
Still with me? Good!
The current accepted answer seems to be confusing bytes with characters/letters. This is a common problem when encountering Unicode, especially from a C background. Strings in Objective-C are represented as Unicode characters (unichar), which are bigger than bytes and shouldn't be used with standard C string manipulation functions.
(Edit: This is not the full story! To my great shame, I'd completely forgotten to account for composable characters, where a "letter" is made up of multiple unicode codepoints. This gives you a situation where you can have one "letter" resolving to multiple unichars, which in turn are multiple bytes each. Hoo boy. Please refer to this great answer for the details on that.)
The proper answer to the question depends on whether you want to iterate over the characters/letters (as distinct from the type char) or the bytes of the string (what the type char actually means). In the spirit of limiting confusion, I will use the terms byte and letter from now on, avoiding the possibly ambiguous term character.
If you want to do the former and iterate over the letters in the string, you need to exclusively deal with unichars (sorry, but we're in the future now, you can't ignore it anymore). Finding the number of letters is easy: it's the string's length property. An example snippet is as such (same as above):
NSUInteger len = [str length];
unichar buffer[len+1];
[str getCharacters:buffer range:NSMakeRange(0, len)];
NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
    NSLog(@"%C", buffer[i]);
}
If, on the other hand, you want to iterate over the bytes in a string, it starts getting complicated and the result will depend entirely upon the encoding you choose to use. The decent default choice is UTF8, so that's what I will show.
Doing this, you have to figure out how many bytes the resulting UTF-8 string will be, a step where it's easy to go wrong and use the string's -length. One main reason this is very easy to get wrong, especially for a US developer, is that a string with letters falling into the 7-bit ASCII spectrum will have equal byte and letter lengths. This is because UTF-8 encodes 7-bit ASCII letters with a single byte, so a simple test string and basic English text might work perfectly fine.
The proper way to do this is to use the method -lengthOfBytesUsingEncoding:NSUTF8StringEncoding (or other encoding), allocate a buffer with that length, then convert the string to the same encoding with -cStringUsingEncoding: and copy it into that buffer. Example code here:
NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
char proper_c_buffer[byteLength+1];
strncpy(proper_c_buffer, [str cStringUsingEncoding:NSUTF8StringEncoding], byteLength);
NSLog(@"strncpy with proper length");
for(int i = 0; i < byteLength; i++) {
    NSLog(@"%c", proper_c_buffer[i]);
}
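To see why -length and the UTF-8 byte count diverge, here is a rough C sketch that counts the UTF-8 bytes needed for a run of UTF-16 code units (what unichar holds). The function name utf8_length_of_utf16 is hypothetical, and the sketch assumes well-formed input where every high surrogate is followed by a low one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the bytes needed to encode UTF-16 code units as UTF-8.
   Assumes well-formed input: every high surrogate is followed by a low one. */
size_t utf8_length_of_utf16(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u < 0x80) {
            bytes += 1;                 /* 7-bit ASCII */
        } else if (u < 0x800) {
            bytes += 2;                 /* e.g. Cyrillic, Latin-1 letters */
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count) {
            bytes += 4;                 /* surrogate pair: one code point above U+FFFF */
            i++;                        /* skip the low surrogate */
        } else {
            bytes += 3;                 /* rest of the Basic Multilingual Plane */
        }
    }
    return bytes;
}
```

For a five-letter Cyrillic word this gives 10, while an ASCII-only string gives the same number as its length, which is exactly the trap described above.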
Just to drive the point home as to why it's important to keep things straight, I will show example code that handles this iteration in four different ways, two wrong and two correct. This is the code:
#import <Foundation/Foundation.h>
int main() {
    NSString *str = @"буква";
    NSUInteger len = [str length];
    // Try to store unicode letters in a char array. This will fail horribly
    // because getCharacters:range: takes a unichar array and will probably
    // overflow or do other terrible things. (the compiler will warn you here,
    // but warnings get ignored)
    char c_buffer[len+1];
    [str getCharacters:c_buffer range:NSMakeRange(0, len)];
    NSLog(@"getCharacters:range: with char buffer");
    for(int i = 0; i < len; i++) {
        NSLog(@"Byte %d: %c", i, c_buffer[i]);
    }
    // Copy the UTF string into a char array, but use the amount of letters
    // as the buffer size, which will truncate many non-ASCII strings.
    strncpy(c_buffer, [str UTF8String], len);
    NSLog(@"strncpy with UTF8String");
    for(int i = 0; i < len; i++) {
        NSLog(@"Byte %d: %c", i, c_buffer[i]);
    }
    // Do It Right (tm) for accessing letters by making a unichar buffer with
    // the proper letter length
    unichar buffer[len+1];
    [str getCharacters:buffer range:NSMakeRange(0, len)];
    NSLog(@"getCharacters:range: with unichar buffer");
    for(int i = 0; i < len; i++) {
        NSLog(@"Letter %d: %C", i, buffer[i]);
    }
    // Do It Right (tm) for accessing bytes, by using the proper
    // encoding-handling methods
    NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    char proper_c_buffer[byteLength+1];
    const char *utf8_buffer = [str cStringUsingEncoding:NSUTF8StringEncoding];
    // We copy here because the documentation tells us the string can disappear
    // under us and we should copy it. Just to be safe
    strncpy(proper_c_buffer, utf8_buffer, byteLength);
    NSLog(@"strncpy with proper length");
    for(int i = 0; i < byteLength; i++) {
        NSLog(@"Byte %d: %c", i, proper_c_buffer[i]);
    }
    return 0;
}
Running this code will output the following (with NSLog cruft trimmed out), showing exactly HOW different the byte and letter representations can be (the two last outputs):
getCharacters:range: with char buffer
Byte 0: 1
Byte 1:
Byte 2: C
Byte 3:
Byte 4: :
strncpy with UTF8String
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3:
Byte 4: Ð
getCharacters:range: with unichar buffer
Letter 0: б
Letter 1: у
Letter 2: к
Letter 3: в
Letter 4: а
strncpy with proper length
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3:
Byte 4: Ð
Byte 5: º
Byte 6: Ð
Byte 7: ²
Byte 8: Ð
Byte 9: °
While Daniel's solution will probably work most of the time, I think the solution is dependent on the context. For example, I have a spelling app and need to iterate over each character as it appears onscreen which may not correspond to the way it is represented in memory. This is especially true for text provided by the user.
Using something like this category on NSString:
- (void) dumpChars
{
    NSMutableArray *chars = [NSMutableArray array];
    NSUInteger len = [self length];
    unichar buffer[len+1];
    [self getCharacters: buffer range: NSMakeRange(0, len)];
    for (int i=0; i<len; i++) {
        [chars addObject: [NSString stringWithFormat: @"%C", buffer[i]]];
    }
    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}
And feeding it a word like mañana might produce:
mañana = m, a, ñ, a, n, a
But it could just as easily produce:
mañana = m, a, n, ̃, a, n, a
The former will be produced if the string is in precomposed Unicode form and the latter if it's in decomposed form.
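The two forms can be illustrated with raw UTF-16 code units in C. The sketch below is a deliberate simplification: it treats only U+0300..U+036F (the Combining Diacritical Marks block) as combining, whereas real grapheme segmentation covers far more cases. The function name count_base_characters is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts code units that start a new letter, skipping combining marks.
   Simplification: only U+0300..U+036F (Combining Diacritical Marks)
   is treated as combining; real grapheme segmentation covers far more. */
size_t count_base_characters(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    size_t letters = 0;
    for (size_t i = 0; i < count; i++) {
        if (units[i] < 0x0300 || units[i] > 0x036F) {
            letters++;
        }
    }
    return letters;
}
```

Precomposed mañana is six code units (with U+00F1 for ñ). The decomposed form is seven (n followed by U+0303, the combining tilde). Both come out as six letters here, while -length reports 6 and 7 respectively.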
You might think this could be avoided by using the result of NSString's precomposedStringWithCanonicalMapping or precomposedStringWithCompatibilityMapping, but this is not necessarily the case as Apple warns in Technical Q&A 1225. For example a string like e̊gâds (which I totally made up) still produces the following even after converting to a precomposed form.
e̊gâds = e, ̊, g, â, d, s
The solution for me is to use NSString's enumerateSubstringsInRange passing NSStringEnumerationByComposedCharacterSequences as the enumeration option. Rewriting the earlier example to look like this:
- (void) dumpSequences
{
    NSMutableArray *chars = [NSMutableArray array];
    [self enumerateSubstringsInRange: NSMakeRange(0, [self length]) options: NSStringEnumerationByComposedCharacterSequences
        usingBlock: ^(NSString *inSubstring, NSRange inSubstringRange, NSRange inEnclosingRange, BOOL *outStop) {
            [chars addObject: inSubstring];
        }];
    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}
If we feed this version e̊gâds then we get
e̊gâds = e̊, g, â, d, s
as expected, which is what I want.
The section of documentation on Characters and Grapheme Clusters may also be helpful in explaining some of this.
Note: Looks like some of the unicode strings I used are tripping up SO when formatted as code. The strings I used are mañana, and e̊gâds.
Neither. The "Optimize Your Text Manipulations" section of the "Cocoa Performance Guidelines" in the Xcode Documentation recommends:
If you want to iterate over the characters of a string, one of the things you should not do is use the characterAtIndex: method to retrieve each character separately. This method is not designed for repeated access. Instead, consider fetching the characters all at once using the getCharacters:range: method and iterating over the bytes directly.
If you want to search a string for specific characters or substrings, do not iterate through the characters one by one. Instead, use higher level methods such as rangeOfString:, rangeOfCharacterFromSet:, or substringWithRange:, which are optimized for searching the NSString characters.
See this Stack Overflow answer on How to remove whitespace from right end of NSString for an example of how to let rangeOfCharacterFromSet: iterate over the characters of the string instead of doing it yourself.
I would definitely get a char buffer first, then iterate over that.
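NSString *someString = ...
unsigned int len = [someString length];
char buffer[len];
//This way (copies at most len UTF-8 bytes, so non-ASCII text is truncated):
strncpy(buffer, [someString UTF8String], len);
//Or this way (preferred; note that getCharacters:range: writes unichar values, so buffer should then be declared as unichar buffer[len]):
[someString getCharacters:buffer range:NSMakeRange(0, len)];
for(int i = 0; i < len; ++i) {
    char current = buffer[i];
    //do something with current...
}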
Try enumerating the string with blocks.
Create a category on NSString:
.h
@interface NSString (Category)
- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block;
@end
.m
@implementation NSString (Category)
- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block
{
    bool _stop = NO;
    for(NSInteger i = 0; i < [self length] && !_stop; i++)
    {
        NSString *character = [self substringWithRange:NSMakeRange(i, 1)];
        block(character, i, &_stop);
    }
}
@end
Example:
NSString *string = @"Hello World";
[string enumerateCharactersUsingBlock:^(NSString *character, NSInteger idx, bool *stop) {
    NSLog(@"char %@, i: %li",character, (long)idx);
}];
This is a slightly different solution to the question, but I thought it might be useful for someone. What I wanted was to iterate over the actual Unicode characters in an NSString. So, I found this solution:
NSString * str = @"hello 🤠💩";
NSRange range = NSMakeRange(0, str.length);
[str enumerateSubstringsInRange:range
                        options:NSStringEnumerationByComposedCharacterSequences
                     usingBlock:^(NSString *substring, NSRange substringRange,
                                  NSRange enclosingRange, BOOL *stop)
{
    NSLog(@"%@", substring);
}];
Although you would technically be getting individual NSString values, here is an alternative approach:
NSRange range = NSMakeRange(0, 1);
for (__unused int i = range.location; range.location < [aNSString length]; range.location++) {
    NSLog(@"%@", [aNSString substringWithRange:range]);
}
(The __unused int i bit is necessary to silence the compiler warning.)
You should not use
NSUInteger len = [str length];
unichar buffer[len+1];
You should allocate the buffer on the heap instead:
NSUInteger len = [str length];
unichar* buffer = (unichar*) malloc((len+1)*sizeof(unichar));
and at the end call
free(buffer);
in order to avoid memory problems (a large variable-length array can overflow the stack).