Convert NSData byte array to string? - objective-c

I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters). So I'm looking at code like this:
NSUInteger len = [imageData length];
Byte *byteData = (Byte *)malloc(len);
[imageData getBytes:byteData length:len];
How do I then send byteData as JSON? I want to send the raw bytes.
CODE:
NSString *jsonBase64 = [imageData base64EncodedString];
NSLog(@"BASE 64 FINGERPRINT: %@", jsonBase64);
NSData *b64 = [NSData dataFromBase64String:jsonBase64];
NSLog(@"Equal: %d", [imageData isEqualToData:b64]);
NSLog(@"b64: %@", b64);
NSLog(@"original: %@", imageData);
NSString *decoded = [[NSString alloc] initWithData:b64 encoding:NSUTF8StringEncoding];
NSLog(@"decoded: %@", decoded);
I get values for everything except the last line, decoded.
Which would indicate to me that the raw bytes are not valid UTF-8?

The reason the string is considered 'unreliable' in previous Stack Overflow posts is that those posters were also using NSData objects whose bytes aren't properly NULL-terminated:
NSString *jsonString = [NSString stringWithUTF8String:[nsDataObj bytes]];
// This is unreliable because -bytes is not guaranteed to be NULL-terminated
Whereas the example below should give you your desired result, because the byte length is passed explicitly and no NULL terminator is required:
NSString *jsonString = [[NSString alloc] initWithBytes:[nsDataObj bytes] length:[nsDataObj length] encoding: NSUTF8StringEncoding];
You were on the right track and hopefully this is able to help you solve your current problem. Best of luck!
~ EDIT ~
Make sure you are creating your NSData object from an image like so:
NSData *imageData = UIImagePNGRepresentation(yourImage);

Have you tried using something like this:
@implementation NSData (Base64)
- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}
This will turn your NSData into a Base64 string, and on the other side you just need to decode it.
EDIT: @Lucas said you can do something like this:
NSString *myString = [[NSString alloc] initWithData:myData encoding:NSUTF8StringEncoding];
but I had some problems with this method because of some special characters, and because of that I started using Base64 strings for communication.
EDIT3: Try this base64EncodedString method:
@implementation NSData (Base64)

- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}

//Helper Method
- (NSString *)base64EncodedStringWithWrapWidth:(NSUInteger)wrapWidth
{
    //ensure wrapWidth is a multiple of 4
    wrapWidth = (wrapWidth / 4) * 4;
    const char lookup[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    long long inputLength = [self length];
    const unsigned char *inputBytes = [self bytes];
    long long maxOutputLength = (inputLength / 3 + 1) * 4;
    maxOutputLength += wrapWidth ? (maxOutputLength / wrapWidth) * 2 : 0;
    unsigned char *outputBytes = (unsigned char *)malloc((NSUInteger)maxOutputLength);
    long long i;
    long long outputLength = 0;
    for (i = 0; i < inputLength - 2; i += 3)
    {
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[((inputBytes[i + 1] & 0x0F) << 2) | ((inputBytes[i + 2] & 0xC0) >> 6)];
        outputBytes[outputLength++] = lookup[inputBytes[i + 2] & 0x3F];
        //add line break
        if (wrapWidth && (outputLength + 2) % (wrapWidth + 2) == 0)
        {
            outputBytes[outputLength++] = '\r';
            outputBytes[outputLength++] = '\n';
        }
    }
    //handle left-over data
    if (i == inputLength - 2)
    {
        // = terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[(inputBytes[i + 1] & 0x0F) << 2];
        outputBytes[outputLength++] = '=';
    }
    else if (i == inputLength - 1)
    {
        // == terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0x03) << 4];
        outputBytes[outputLength++] = '=';
        outputBytes[outputLength++] = '=';
    }
    if (outputLength >= 4)
    {
        //truncate data to match actual output length
        outputBytes = realloc(outputBytes, (NSUInteger)outputLength);
        return [[NSString alloc] initWithBytesNoCopy:outputBytes
                                              length:(NSUInteger)outputLength
                                            encoding:NSASCIIStringEncoding
                                        freeWhenDone:YES];
    }
    else if (outputBytes)
    {
        free(outputBytes);
    }
    return nil;
}

@end
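Note that if you can target iOS 7 / OS X 10.9 or later, NSData already ships with Base64 methods, so you don't strictly need a category. A minimal sketch of the round trip (the helper names here are made up for illustration):
#import <Foundation/Foundation.h>

// Hypothetical helpers showing the built-in Base64 API (iOS 7 / OS X 10.9 and later).
static NSString *base64StringForJSON(NSData *imageData)
{
    // Encode the raw bytes as Base64 text that is safe to embed in a JSON string.
    return [imageData base64EncodedStringWithOptions:0];
}

static NSData *dataFromJSONBase64(NSString *base64String)
{
    // Decode the Base64 text back into the original bytes on the receiving side.
    return [[NSData alloc] initWithBase64EncodedString:base64String options:0];
}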

Null termination is not the only problem when converting from NSData to NSString.
NSString is not designed to hold arbitrary binary data. It expects an encoding.
If your NSData contains an invalid UTF-8 sequence, initializing the NSString will fail.
The documentation isn't completely clear on this point, but for initWithData it says:
Returns nil if the initialization fails for some reason (for example
if data does not represent valid data for encoding).
Also: The JSON specification defines a string as a sequence of Unicode characters.
That means even if you're able to get your raw data into a JSON string, parsing could fail on the receiving end if the code performs UTF-8 validation.
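To see that failure mode concretely, here is a small, hypothetical check; the byte 0xFF can never appear in valid UTF-8, so the initializer returns nil:
// Hypothetical demonstration: invalid UTF-8 input makes the initializer return nil.
const unsigned char badBytes[] = { 0xFF, 0xFE, 0x41 }; // not a valid UTF-8 sequence
NSData *data = [NSData dataWithBytes:badBytes length:sizeof(badBytes)];
NSString *string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
if (string == nil) {
    NSLog(@"Data is not valid UTF-8; fall back to Base64 or another text-safe encoding");
}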
If you don't want to use Base64, take a look at the answers here.

All code in this answer consists of pseudo-code fragments; you need to convert the algorithms into Objective-C or another language yourself.
Your question raises many questions... You start with:
I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters).
This appears to suggest you wish to encode the bytes as a string, ready to decode them back to bytes at the other end. If this is the case you have a number of choices, such as Base-64 encoding etc. If you want something simple you can just encode each byte as its two-character hex value, pseudo-code outline:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    [encodedString appendFormat:@"%02x", aByte];
The format %02x means two hexadecimal digits with zero padding. This results in a string which can be sent as JSON and decoded easily at the other end. The size over the wire will probably be twice the byte length, as UTF-8 is the recommended encoding for JSON over the wire.
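If you want that pseudo-code as real Objective-C, a minimal sketch might look like this (hexStringFromData is a made-up helper name):
#import <Foundation/Foundation.h>

// Hypothetical helper: encode each byte as two lowercase hex digits.
static NSString *hexStringFromData(NSData *data)
{
    const unsigned char *bytes = (const unsigned char *)[data bytes];
    NSMutableString *encodedString = [NSMutableString stringWithCapacity:[data length] * 2];
    for (NSUInteger i = 0; i < [data length]; i++) {
        [encodedString appendFormat:@"%02x", bytes[i]];
    }
    return encodedString;
}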
However, in response to one of the answers you write:
But I need absolutely the raw bits.
What do you mean by this? Is your receiver going to interpret the JSON string it gets as a sequence of raw bytes? If so you have a number of problems to address. JSON strings are a subset of JavaScript strings and are stored as UCS-2 or UTF-16, that is they are sequences of 16-bit values not 8-bit values. If you encode each byte into a character in a string then it will be represented using 16 bits, so if your receiver can access the byte stream it has to skip every other byte. Of course if your receiver accesses the string a character at a time, each 16-bit character can be truncated back to an 8-bit byte. Now you might think that if you take this approach then each 8-bit byte can just be output as a character as part of a string, but that won't work. While all values 1-255 are valid Unicode character code points, and JavaScript/JSON allow NULs (0 value) in strings, not all those values are printable, you cannot put a double quote " into a string without escaping it, and the escape character is \ - all of these will need to be encoded into the string. You'd end up with something like:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    if (isprint(aByte) && aByte != '"' && aByte != '\\')
        [encodedString appendFormat:@"%c", aByte];
    otherwise
        [encodedString appendFormat:@"\\u00%02x", aByte]; // JSON unicode escape sequence
This will produce a string which, when parsed by a JSON decoder, will give you one character (16 bits) for each byte, the top 8 bits being zero. However, if you pass this string to a JSON encoder it will escape the backslashes of the unicode escape sequences, double-encoding them... So you really need to send this string over the wire yourself to avoid this...
Confused? Getting complicated? Well, why are you trying to send binary byte data as a string? You never say what your high-level goal is or what, if anything, is known about the byte data (e.g. does it represent characters in some encoding?).
If this is really just an array of bytes then why not send it as a JSON array of numbers - a byte is just a number in the range 0-255. To do this you would use code along the lines of:
NSMutableArray *encodedBytes = [NSMutableArray new];
foreach aByte in byteData
    [encodedBytes addObject:@(aByte)]; // add aByte as an NSNumber object
Now pass encodedBytes to NSJSONSerialization and it will send a JSON array of numbers over the wire; the receiver will reverse the process, packing each byte back into a byte buffer, and you have your bytes back.
This method avoids all issues of valid strings, encodings and escapes.
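A rough, self-contained version of that idea might look like the following; jsonByteArrayFromData is a made-up name and the error handling is minimal:
#import <Foundation/Foundation.h>

// Hypothetical helper: wrap each byte in an NSNumber and serialize the array as JSON.
static NSData *jsonByteArrayFromData(NSData *byteData)
{
    const unsigned char *bytes = (const unsigned char *)[byteData bytes];
    NSMutableArray *encodedBytes = [NSMutableArray arrayWithCapacity:[byteData length]];
    for (NSUInteger i = 0; i < [byteData length]; i++) {
        [encodedBytes addObject:@(bytes[i])]; // each byte becomes a JSON number in 0-255
    }
    NSError *error = nil;
    return [NSJSONSerialization dataWithJSONObject:encodedBytes options:0 error:&error];
}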
HTH

Related

Decoding partial UTF-8 into NSString

While fetching a UTF-8-encoded file over the network using the NSURLConnection class, there's a good chance the delegate's connection:didReceiveData: message will be sent with an NSData which truncates the UTF-8 file - because UTF-8 is a multi-byte encoding scheme, and a single character can be split across two separate NSData objects.
In other words, if I join all the data I get from connection:didReceiveData: I will have a valid UTF-8 file, but each separate NSData on its own is not necessarily valid UTF-8.
I do not want to store the whole downloaded file in memory.
What I want is: given an NSData, decode whatever you can into an NSString. In case the last
few bytes of the NSData are an incomplete multi-byte sequence, tell me, so I can save them for the next NSData.
One obvious solution is repeatedly trying to decode using initWithData:encoding:, each time truncating the last byte, until success. This, unfortunately, can be very wasteful.
If you want to make sure that you don't stop in the middle of a UTF-8 multi-byte sequence, you're going to need to look at the end of the byte array and check the top 2 bits.
If the top bit is 0, then it's a single-byte, ASCII-range UTF-8 code, and you're done.
If the top bit is 1 and the second-from-top is 0, then it is a continuation byte of a multi-byte sequence and might be the last byte of that sequence, so you will need to buffer it for later and then look at the preceding byte.
If the top bit is 1 and the second-from-top is also 1, then it is the beginning of a multi-byte sequence and you need to determine how many bytes are in the sequence by looking for the first 0 bit.
Look at the multi-byte table in the Wikipedia entry: http://en.wikipedia.org/wiki/UTF-8
// assumes that receivedData contains both the leftovers and the new data
const unsigned char *data = (const unsigned char *)[receivedData bytes];
NSUInteger byteCount = [receivedData length];
if (byteCount < 1)
    return nil; // or @"";
unsigned char lastByte = data[byteCount - 1];
if ((lastByte & 0x80) == 0) {
    NSString *newString = [[NSString alloc] initWithBytes:data length:byteCount
                                                 encoding:NSUTF8StringEncoding];
    // verify success
    // remove bytes from mutable receivedData, or set overflow to empty
    return newString;
}
// now eat all of the continuation bytes
NSUInteger backCount = 0;
while ((byteCount > 0) && ((lastByte & 0xc0) == 0x80)) {
    backCount++;
    byteCount--;
    if (byteCount > 0)
        lastByte = data[byteCount - 1];
}
// at this point, either we have exhausted byteCount or we have the initial byte
// if we exhaust the byte count we're probably in an illegal sequence, as we should
// always have the initial byte in the receivedData
if (byteCount < 1) {
    // error!
    return nil;
}
// at this point, you can either use just byteCount, or you can compute the
// length of the sequence from the lastByte in order
// to determine if you have exactly the right number of bytes to decode UTF-8.
NSUInteger requiredBytes = 0;
if ((lastByte & 0xe0) == 0xc0) { // 110xxxxx
    // 2 byte sequence
    requiredBytes = 1;
} else if ((lastByte & 0xf0) == 0xe0) { // 1110xxxx
    // 3 byte sequence
    requiredBytes = 2;
} else if ((lastByte & 0xf8) == 0xf0) { // 11110xxx
    // 4 byte sequence
    requiredBytes = 3;
} else if ((lastByte & 0xfc) == 0xf8) { // 111110xx
    // 5 byte sequence
    requiredBytes = 4;
} else if ((lastByte & 0xfe) == 0xfc) { // 1111110x
    // 6 byte sequence
    requiredBytes = 5;
} else {
    // shouldn't happen, illegal UTF-8 sequence
}
// now we know how many continuation bytes we need and how many
// (backCount) we have, so either use them, or take the
// introductory byte away.
if (requiredBytes == backCount) {
    // we have the right number of bytes
    byteCount += backCount;
} else {
    // we don't have the right number of bytes, so remove the intro byte
    byteCount -= 1;
}
NSString *newString = [[NSString alloc] initWithBytes:data length:byteCount
                                             encoding:NSUTF8StringEncoding];
// verify success
// remove byteCount bytes from mutable receivedData, or set overflow to the
// bytes between byteCount and [receivedData length]
return newString;
UTF-8 is a pretty simple encoding to parse and was designed to make it easy to detect incomplete sequences and, if you start in the middle of an incomplete sequence, to find its beginning.
Search backward from the end for a byte that's either <= 0x7f or >= 0xc0. If it's <= 0x7f, it's complete. If it's between 0xc0 and 0xdf, inclusive, it requires one following byte to be complete. If it's between 0xe0 and 0xef, it requires two following bytes to be complete. If it's >= 0xf0, it requires three following bytes to be complete.
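Expressed as code, that backward search might look roughly like this; decodableLength is a made-up helper that returns how many leading bytes of the buffer form complete UTF-8 sequences, so you can decode that prefix and carry the remainder over into the next chunk:
#import <Foundation/Foundation.h>

// Hypothetical helper: walk back from the end and exclude a trailing incomplete sequence.
static NSUInteger decodableLength(const unsigned char *bytes, NSUInteger length)
{
    if (length == 0) return 0;
    NSUInteger i = length;
    NSUInteger back = 0;
    // Step back over up to 3 trailing continuation bytes (10xxxxxx) to find the lead byte.
    while (i > 0 && back < 4 && (bytes[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return length; // nothing but continuation bytes; let decoding fail
    unsigned char lead = bytes[i - 1];
    NSUInteger needed = 0;
    if (lead <= 0x7F)      needed = 0; // single-byte character
    else if (lead >= 0xF0) needed = 3; // 4-byte sequence
    else if (lead >= 0xE0) needed = 2; // 3-byte sequence
    else if (lead >= 0xC0) needed = 1; // 2-byte sequence
    if (needed == back) return length; // the final sequence is complete
    return i - 1;                      // drop the incomplete (or invalid) trailing sequence
}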
I had a similar problem - partially decoding UTF-8.
before
NSString * adsTopic = [components[2] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
adsInfo->adsTopic = malloc(sizeof(char) * adsTopic.length + 1);
strncpy(adsInfo->adsTopic, [adsTopic UTF8String], adsTopic.length + 1);
after [solved]
NSString *adsTopic = [components[2] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSUInteger byteCount = [adsTopic lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"number of UTF-8 bytes in the topic string == %lu", (unsigned long)byteCount);
adsInfo->adsTopic = malloc(byteCount + 1);
strncpy(adsInfo->adsTopic, [adsTopic UTF8String], byteCount + 1);
NSString *text = [NSString stringWithCString:adsInfo->adsTopic encoding:NSUTF8StringEncoding];
NSLog(@"=== %@", text);

Appending null characters to regular characters

I want to concatenate null characters with regular characters like so:
NSString *message = @"ABC";
NSUInteger length = [message length];
char char4 = length;
char char3 = length >> 8;
char char2 = length >> 16;
char char1 = length >> 24;
message = [NSString stringWithFormat:@"%c%c%c%c%@", char4, char3, char2, char1, message];
The problem is that the string stays at length 4 (and looks like ¿ABC). How can I edit this code so that the null characters are also appended to the string?
In reply to Mark's comment
What I am trying to do here is append the length of the string to the beginning of the string, not in numerical format, but in the format of ASCII characters (almost like a base-256 number). I would use the format @"%04c%@", length, message, but the problem is that the resulting string would be 000¿ABC, and zeroes have an ASCII value (48 in decimal, to be exact), which defeats the purpose. The ASCII character that has decimal value 0 is the null character (\0), so I have to use that instead of 0. It is necessary that I have those leading null characters.
For the purposes of what I'm trying to accomplish, the following code works
NSUInteger length = [message length];
char char4 = length;
char char3 = length >> 8;
char char2 = length >> 16;
char char1 = length >> 24;
if (char4 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char4, message];
if (char3 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char3, message];
if (char2 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char2, message];
if (char1 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char1, message];
But, if anybody can contribute something shorter or more intuitive, that'd be great.
Don't do this. It's a bad idea. For example, NSString is conceptually built on 16-bit Unicode characters, but you're trying to prepend bytes. Note that there's nothing guaranteeing that NSString's internal representation is UTF-16 or any other specific encoding. In any case, when the string is written out it has to be converted to whatever encoding and that is unlikely to preserve your length prefixing in a way you can predict.
Use an NSData with the length in the first 4 (or maybe 8 would be better for a 64-bit length) bytes followed by the string in a particular encoding. I recommend UTF-8.
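A rough sketch of that framing, with a 4-byte big-endian length followed by the UTF-8 bytes (framedDataFromString is a made-up name; swap in an 8-byte length if you need it):
#import <Foundation/Foundation.h>

// Hypothetical helper: prefix the UTF-8 bytes of a string with a 4-byte big-endian length.
static NSData *framedDataFromString(NSString *message)
{
    NSData *payload = [message dataUsingEncoding:NSUTF8StringEncoding];
    uint32_t length = CFSwapInt32HostToBig((uint32_t)[payload length]); // byte length, not character count
    NSMutableData *framed = [NSMutableData dataWithCapacity:sizeof(length) + [payload length]];
    [framed appendBytes:&length length:sizeof(length)];
    [framed appendData:payload];
    return framed;
}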

Converting NSData bytes to NSString

I am trying to create a 16-byte (and later 32-byte) initialization vector in Objective-C (Mac OS). I took some code on how to create random bytes and modified it to 16 bytes, but I am having some difficulty with this. The NSData dumps the hex, but an NSString dump gives nil, and a cstring NSLog gives the wrong number of characters (not reproduced the same in the dump here).
Here is my terminal output:
2012-01-07 14:29:07.705 Test3Test[4633:80f] iv hex <48ea262d efd8f5f5 f8021126 fd74c9fd>
2012-01-07 14:29:07.710 Test3Test[4633:80f] IV string: (null)
2012-01-07 14:29:07.711 Test3Test[4633:80f] IV char string t^Q¶�^��^A
Here is the main program:
int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    //NSString *iv_string = [NSString stringWithCString:iv encoding:NSUTF8StringEncoding];
    testclass *obj = [testclass alloc];
    NSData *iv_data = [obj createRandomNSData];
    //[iv_string dataUsingEncoding:NSUTF8StringEncoding];
    NSLog(@"iv hex %@", iv_data);
    //NSString *iv_string = [[NSString alloc] initWithBytes:[iv_data bytes] length:16 encoding:NSUTF8StringE$
    NSString *iv_string = [[NSString alloc] initWithData:iv_data encoding:NSUTF8StringEncoding];
    NSLog(@"IV string: %@", iv_string);
    NSLog(@"IV char string %.*s", [iv_data bytes]);
    return 0;
}
(I left some commented-out code in the above that I tried and that also did not work.)
Below is my random number generator, taken from a Stack Overflow example:
@implementation testclass
- (NSData *)createRandomNSData
{
    int twentyMb = 16;
    NSMutableData *theData = [NSMutableData dataWithCapacity:twentyMb];
    for (unsigned int i = 0; i < twentyMb / 4; ++i)
    {
        u_int32_t randomBits = arc4random();
        [theData appendBytes:(void *)&randomBits length:4];
    }
    NSData *data = [NSData dataWithData:theData];
    [theData dealloc];
    return data;
}
@end
I am really quite clueless as to what could be the problem here. If I have data as bytes, shouldn't it convert to a string, or not necessarily? I have looked over the relevant examples here on Stack Overflow, but none of them have worked in this situation.
Thanks,
Elijah
An arbitrary byte sequence may not be legal UTF8 encoding. As @Joachim Isaksson notes, there is seldom reason to convert to strings this way. If you need to store random data as a string, you should use an encoding scheme like Base64, serialize the NSData to a plist, or similar approach. You cannot simply use a cstring either, since NULL is legal inside of a random byte sequence, but is not legal inside of a cstring.
You do not need to build your own random byte creator on Mac or iOS. There's one built-in called SecRandomCopyBytes(). For example (from Properly encrypting with AES with CommonCrypto):
+ (NSData *)randomDataOfLength:(size_t)length {
    NSMutableData *data = [NSMutableData dataWithLength:length];
    int result = SecRandomCopyBytes(kSecRandomDefault,
                                    length,
                                    data.mutableBytes);
    NSAssert(result == 0, @"Unable to generate random bytes: %d",
             errno);
    return data;
}
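If you then need a printable form of those random bytes (for logging or sending in JSON), Base64-encode the NSData instead of forcing it through UTF-8. A usage sketch, assuming OS X 10.9 / iOS 7 or later and a hypothetical SomeClass hosting the helper above:
// Keep the IV as NSData; only encode it when you need text.
NSData *iv = [SomeClass randomDataOfLength:16];
NSString *ivText = [iv base64EncodedStringWithOptions:0];
NSLog(@"IV (Base64): %@", ivText);

// Later, recover the exact bytes from the Base64 text.
NSData *restored = [[NSData alloc] initWithBase64EncodedString:ivText options:0];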
When converting NSData to NSString using a UTF-8 encoding, you won't necessarily end up with the same number of bytes, since not all binary values are valid encodings of characters. I'd say using a string for binary data is a recipe for problems.
What is the use of the string? NSData is exactly the datatype you want for storing binary data to begin with.

Encoding is not giving the right result in objective c

I have some C# code which encodes a string. I am trying to write a corresponding routine in Objective-C.
The code is as follows:
// C# code
public static string Encode(Guid guid)
{
    string encoded = Convert.ToBase64String(guid.ToByteArray());
    encoded = encoded.Replace("/", "_").Replace("+", "-");
    return encoded.Substring(0, 22);
}
I have written this code in Objective-C:
- (NSString *)encode:(NSString *)inId
{
    NSString *uniqueId = inId;
    // convert user id into data
    NSData *userIdData = [uniqueId dataUsingEncoding:NSUTF16StringEncoding];
    // convert the userId's data into a base64-encoded string
    NSString *base64String = [Base64 encode:userIdData];
    //NSString *base64String = [userIdData encodeBase64ForData];
    NSString *encodedId = [[NSString alloc] initWithString:base64String];
    // replace "/" characters in the base64 string with "_"
    encodedId = [encodedId stringByReplacingOccurrencesOfString:@"/" withString:@"_"];
    // replace "+" characters in the base64 string with "-"
    encodedId = [encodedId stringByReplacingOccurrencesOfString:@"+" withString:@"-"];
    // get substring of range 22
    encodedId = [encodedId substringToIndex:22];
    NSLog(@"Base 64 encoded = %@", encodedId);
    return encodedId;
}
I am calling this function from viewDidLoad
NSString *encodedStr = [self encode:@"a8f9f344-d14e-4541-a8e7-0f5936e42254"]; // string to encode
NSLog(@"Encoded String %@", encodedStr);
This code is not giving me the correct result.
For example, for the string a8f9f344-d14e-4541-a8e7-0f5936e42254
it should give the result RPP5qE7RQUWo5w9ZNuQiVA.
Thanks.
Your problem is that guid.ToByteArray() and [uniqueId dataUsingEncoding:NSUTF16StringEncoding]; do not do the same thing. As far as I can tell from the documentation, the former removes the hyphens and treats the rest as the hex ASCII representation of 16 bytes. The latter just turns each character into UTF16 (actually, it is UTF-16 already) and puts it into an NSData.
You need to write some code in Objective-C to take an ASCII Hex string and convert it into bytes.
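A sketch of that conversion, assuming the input is the usual hyphenated 8-4-4-4-12 GUID text (dataFromGuidString is a made-up helper). Note that Guid.ToByteArray() stores the first three groups little-endian, so the sketch reverses those bytes to match; Base64-encode the result and apply the same /→_ and +→- replacements to reproduce the C# output.
#import <Foundation/Foundation.h>

// Hypothetical helper: parse a hyphenated GUID string into the same 16-byte layout
// that C#'s Guid.ToByteArray() produces (the first three groups are little-endian).
static NSData *dataFromGuidString(NSString *guidString)
{
    NSString *hex = [guidString stringByReplacingOccurrencesOfString:@"-" withString:@""];
    if ([hex length] != 32) return nil;
    unsigned char bytes[16];
    for (NSUInteger i = 0; i < 16; i++) {
        unsigned int value = 0;
        NSString *pair = [hex substringWithRange:NSMakeRange(i * 2, 2)];
        [[NSScanner scannerWithString:pair] scanHexInt:&value];
        bytes[i] = (unsigned char)value;
    }
    // Reproduce Guid.ToByteArray()'s byte order: reverse the leading 4-2-2 groups.
    unsigned char swapped[16] = {
        bytes[3], bytes[2], bytes[1], bytes[0],
        bytes[5], bytes[4],
        bytes[7], bytes[6],
        bytes[8], bytes[9], bytes[10], bytes[11],
        bytes[12], bytes[13], bytes[14], bytes[15]
    };
    return [NSData dataWithBytes:swapped length:16];
}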

Converting long value to unichar* in objective-c

I'm storing large unicode characters (0x10000+) as long types which eventually need to be converted to NSStrings. Smaller unicode characters can be created as a unichar, and an NSString can be created using
[NSString stringWithCharacters:(const unichar *)characters length:(NSUInteger)length]
So, I imagine the best way to get an NSString from the unicode long value would be to first get a unichar* from the long value. Any idea on how I might go about doing this?
Is there any reason you are storing the values as longs? For Unicode storage you only need to store the values as UInt32, which would then make it easy to interpret the data as UTF-32 by doing something like this:
int numberOfChars = 3;
UInt32* yourStringBuffer = malloc(sizeof(UInt32) * numberOfChars);
yourStringBuffer[0] = 0x2F8DB; //杞
yourStringBuffer[1] = 0x2318; //⌘
yourStringBuffer[2] = 0x263A; //☺
NSData* stringData = [NSData dataWithBytes:yourStringBuffer length:sizeof(UInt32) * numberOfChars];
//set the encoding according to the current byte order
NSStringEncoding encoding;
if(CFByteOrderGetCurrent() == CFByteOrderBigEndian)
encoding = NSUTF32BigEndianStringEncoding;
else
encoding = NSUTF32LittleEndianStringEncoding;
NSString* string = [[NSString alloc] initWithData:stringData encoding:encoding];
free(yourStringBuffer);
NSLog(@"%@", string);
//output: 杞⌘☺
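If you'd rather skip the NSData round trip for a single value, you can also build the UTF-16 surrogate pair yourself and use stringWithCharacters:length:. A minimal sketch (stringFromCodePoint is a made-up helper):
#import <Foundation/Foundation.h>

// Hypothetical helper: build an NSString from one Unicode code point,
// emitting a UTF-16 surrogate pair for code points above 0xFFFF.
static NSString *stringFromCodePoint(UInt32 codePoint)
{
    if (codePoint <= 0xFFFF) {
        unichar ch = (unichar)codePoint;
        return [NSString stringWithCharacters:&ch length:1];
    }
    UInt32 value = codePoint - 0x10000;
    unichar pair[2];
    pair[0] = (unichar)(0xD800 + (value >> 10));   // high surrogate
    pair[1] = (unichar)(0xDC00 + (value & 0x3FF)); // low surrogate
    return [NSString stringWithCharacters:pair length:2];
}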