I have an __NSCFData object. I know what's inside it.
61 70 70 6c 65 2c 74 79 70 68 6f 6f 6e 00 41 52 4d 2c 76 38 00
I tried converting it to a string with initWithData:encoding: and stringWithUTF8String:, and I get "apple,typhoon". The conversion is terminated at the first 00 byte.
The data actually is
61 a
70 p
70 p
6c l
65 e
2c ,
74 t
79 y
70 p
68 h
6f o
6f o
6e n
00 (null)
41 A
52 R
4d M
2c ,
76 v
38 8
00 (null)
How can I properly convert this without loss of information?
The documentation for stringWithUTF8String describes its first parameter as:
A NULL-terminated C array of bytes in UTF8 encoding.
Which is why your conversion stops at the first null byte.
What you appear to have is a collection of C strings packed into a single NSData. You can convert each one individually. Use the NSData methods bytes and length to obtain a pointer to the bytes/first C string and the total number of bytes, respectively. The standard C function strlen() will give you the length in bytes of an individual string. Combine these with some simple pointer arithmetic and you can write a loop that converts each string and, for example, stores them all in an array or concatenates them.
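For illustration, here is a minimal sketch of such a loop, assuming the data holds back-to-back NUL-terminated UTF-8 strings as in your dump (the helper name StringsFromPackedData is mine, not a standard API):

#import <Foundation/Foundation.h>
#include <string.h>

// Split an NSData of packed NUL-terminated UTF-8 C strings into NSStrings.
static NSArray<NSString *> *StringsFromPackedData(NSData *data) {
    NSMutableArray<NSString *> *strings = [NSMutableArray array];
    const char *p = data.bytes;
    const char *end = p + data.length;
    while (p < end) {
        size_t len = strnlen(p, (size_t)(end - p)); // length of the current C string
        NSString *s = [[NSString alloc] initWithBytes:p
                                               length:len
                                             encoding:NSUTF8StringEncoding];
        if (s != nil) [strings addObject:s];
        p += len + 1; // step past the terminating NUL
    }
    return strings;
}

Applied to your bytes this should yield the two strings @"apple,typhoon" and @"ARM,v8".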
If you get stuck implementing the solution ask a new question, show your code, and explain the issue. Someone will undoubtedly help you with the next step.
HTH
In contrast to what some other answers suggest, the strings stored in instances of NSString are not 0-terminated. Even though there might be problems writing them out (since the underlying C output functions expect a 0-terminated string), the instances themselves can contain a \0:
NSString *zeroIncluded = @"A\0B";
NSLog(@"%lu", (unsigned long)[zeroIncluded length]);
// prints 3
To create such an instance you can use methods that take a bytes and a length parameter, e.g. -initWithBytes:length:encoding:. Therefore something like this should work:
NSData *data = …
NSString *string = [[NSString alloc] initWithBytes:[data bytes]
                                            length:[data length]
                                          encoding:NSUTF8StringEncoding];
However, as CRD pointed out above, you should check whether you actually want such a string.
0, or NUL, is the sentinel value that terminates C strings, so you're going to have to deal with it somehow if you want to automatically dump the bytes into a string. If you don't, the string, or anything that tries to print it, will assume the end of the string has been reached at the first NUL.
Just replace the bytes as they occur with something printable, like a space. Use whatever value works for you.
Example:
// original data you have from somewhere
char something[] = "apple,typhoon\0ARM,v8\0";
NSData *data = [NSData dataWithBytes:something length:sizeof(something)];

// Find each null-terminated string in the data
NSMutableArray *strings = [NSMutableArray new];
NSMutableString *temp = [NSMutableString string];
const char *bytes = [data bytes];
for (NSUInteger i = 0; i < [data length]; i++) {
    unsigned char byte = (unsigned char)bytes[i];
    if (byte == 0) {
        if ([temp length] > 0) {
            [strings addObject:temp];
            temp = [NSMutableString string];
        }
    } else {
        [temp appendFormat:@"%c", byte];
    }
}

// Results
NSLog(@"strings count: %lu", (unsigned long)[strings count]);
[strings enumerateObjectsUsingBlock:^(NSString *string, NSUInteger idx, BOOL * _Nonnull stop) {
    NSLog(@"%lu: %@", (unsigned long)idx, string);
}];

// strings count: 2
// 0: apple,typhoon
// 1: ARM,v8
While perusing the NSString header file I saw the following.
#define NSMaximumStringLength (INT_MAX-1)
Why is the maximum string length one short of INT_MAX? Is this to accommodate a null terminator (\0)? A related article can be found here.
Hypothesis:
It's to accommodate the NULL char: \0.
Documentation:
In the Apple documentation for NSMaximumStringLength, found here:
NSMaximumStringLength
DECLARED IN foundation/NSString.h
SYNOPSIS NSMaximumStringLength
DESCRIPTION NSMaximumStringLength is the greatest possible length for an NSString.
And an NSString is but an "array of Unicode characters" - Source
NSString is concretized into either __NSCFString during runtime or __NSCFConstantString during compile time - Source
__NSCFString: probably akin to __NSCFConstantString (see the memory investigation below).
__NSCFConstantString: uses a char array allocation (const char *cStr) - Source.
Memory Investigation of NSString:
Code
NSString *s1 = @"test";
Breaking during runtime in LLDB:
Type:
expr [s1 fileSystemRepresentation]
Output:
$0 = 0x0b92bf70 "test" // Essential memory location and content.
To view the memory, type in LLDB:
memory read 0x0b92bf70
Output:
0x0b92bf70: 74 65 73 74 00 00 00 00 00 00 00 00 00 00 00 00 test............
0x0b92bf80: 7c 38 d4 02 72 a2 1b 03 f2 e6 1b 03 71 c5 4a 00 |8..r.......q.J.
Notice the null termination after the last char 't'.
Testing Hypothesis of NULL termination:
Added a char* to previous code:
NSString *s1 = @"test";
char *p = (char *)[s1 cString];
Break into code with LLDB and type:
expr p[4] = '\1' // Removing NULL char.
Now if we print NSString with command:
expr s1
Output:
(NSString *) $0 = 0x002f1534 @"test
Avg Draw Time: %g"
Notice the garbage after the 't': "Avg Draw Time: %g" (a buffer over-read).
Conclusion
Through this investigation we can infer that one byte of the NSMaximumStringLength definition is reserved for the NULL char that marks the end of a string in memory.
In Objective-C, I generate a simple MD5 hash of 'HelloKey', which returns 0FD16658AEE3C52060A39F4EDFB11437. Unfortunately, I could not get a raw return, so I have to work with this string to get a raw MD5 hash (or do you know how I can get a raw result from the start?).
Anyway, in order to convert it to raw, I split it into chunks of 2 chars each, calculate the hex value, and append a char with that value to a string.
Here's the function:
- (NSString *)hex2bin:(NSString *)input {
    NSString *output = @"";
    for (int i = 0; i < input.length; i += 2) {
        NSString *component = [input substringWithRange:NSMakeRange(i, 2)];
        unsigned int outVal;
        NSScanner *scanner = [NSScanner scannerWithString:component];
        [scanner scanHexInt:&outVal];
        /* if (outVal > 127) {
            outVal -= 256;
        } */
        // unsigned char appendage = (char)outVal;
        output = [NSString stringWithFormat:@"%@%c", output, outVal];
        NSLog(@"component: %@ = %d", component, outVal);
    }
    return output;
}
When I print each outVal, I get:
0F = 15
D1 = 209
66 = 102
58 = 88
AE = 174
E3 = 227
C5 = 197
20 = 32
60 = 96
A3 = 163
9F = 159
4E = 78
DF = 223
B1 = 177
14 = 20
37 = 55
However, when I print the string with a little helper function that gives me the integer value of each character (shown below):
- (NSString *)str2bin:(NSString *)input {
    NSString *output = @"";
    for (NSInteger charIdx = 0; charIdx < input.length; charIdx++) {
        char currentChar = [input characterAtIndex:charIdx];
        int charNum = [NSNumber numberWithChar:currentChar].intValue;
        output = [NSString stringWithFormat:@"%@ %d", output, charNum];
    }
    return output;
}
I get: 15 20 102 88 -58 30 72 32 96 -93 -4 78 2 -79 20 55. You will notice that there are significant differences, like 209 -> 20, 174 -> -58, 227 -> 30. In some cases, the difference is 256, so no harm done. But in other cases, it's not, and I would really like to know what's going wrong. Any tips?
You are doing it wrong: you are trying to store binary data in an NSString, which holds Unicode text, not arbitrary bytes.
You should use NSData (or a plain C buffer) to store the binary hash representation.
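To get a raw result from the start, you can have CommonCrypto write the digest bytes straight into an NSData, so the hex round trip disappears entirely. A minimal sketch, assuming the input should be hashed as UTF-8 (the function name MD5Digest is mine; note that CC_MD5 is deprecated in recent SDKs but still works):

#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonDigest.h>

// Compute the raw 16-byte MD5 digest of a string's UTF-8 bytes.
static NSData *MD5Digest(NSString *input) {
    NSData *bytes = [input dataUsingEncoding:NSUTF8StringEncoding];
    unsigned char digest[CC_MD5_DIGEST_LENGTH];
    CC_MD5(bytes.bytes, (CC_LONG)bytes.length, digest);
    return [NSData dataWithBytes:digest length:sizeof(digest)];
}

MD5Digest(@"HelloKey") then holds the sixteen raw bytes 0F D1 66 58 ... without ever going through a hex string.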
The title pretty much says everything.
I would like to print the values not in decimal, but as unsigned char values in hex.
Example:
unsigned char data[6] = {0x70, 0xAF, 0x80, 0x1A, 0x01, 0x7E};
NSLog(@"?", data); // need this output: 70 AF 80 1A 01 7E
Any idea? Thanks in advance.
There is no format specifier for a char array. One option would be to create an NSData object from the array and then log the NSData object.
NSData *dataData = [NSData dataWithBytes:data length:sizeof(data)];
NSLog(@"data = %@", dataData);
Nothing in the standard libraries will do it, so you could write a small hex dump function, or you could use something else that prints unambiguous full data. Something like:
#include <vis.h> /* strvisx() lives here on BSD-derived systems */
char buf[1 + 3 * dataLength];
strvisx(buf, (const char *)data, dataLength, VIS_WHITE | VIS_HTTPSTYLE);
NSLog(@"data=%s", buf);
For smallish chunks of data you could try making an NSData and using the debugDescription method. That is currently a hex dump, but nothing promises it will always be one.
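If you do write your own small hex dump helper, it only takes a few lines. A sketch (the helper name HexString is mine, not a standard API):

#import <Foundation/Foundation.h>

// Format a byte buffer as space-separated uppercase hex pairs.
static NSString *HexString(const unsigned char *bytes, size_t length) {
    NSMutableString *hex = [NSMutableString stringWithCapacity:length * 3];
    for (size_t i = 0; i < length; i++) {
        [hex appendFormat:@"%02X ", bytes[i]];
    }
    return hex;
}

Called as NSLog(@"%@", HexString(data, sizeof(data))); it prints the bytes as 70 AF 80 1A 01 7E for the array in the question.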
To print char* in NSLog try the following:
char data[] = {'H','E','L','L','O','\n','\0'};
NSString *string = [[NSString alloc] initWithUTF8String:data];
NSLog(@"%@", string);
Note that you need to null-terminate your C string, as in the array above.
From Apple Documentation:
- (instancetype)initWithUTF8String:(const char *)nullTerminatedCString;
Returns an NSString object initialized by copying the characters from a given C array of UTF8-encoded bytes.
Right now I'm appending data using NSMutableData's -appendBytes:length: like this:
int length = [self.trackData length]+3;
[contents appendBytes:&length length:4];
Suppose length is 22. In hex, the bytes appended are 16 00 00 00, extended to 4 bytes.
How can I add the additional zeros to the left like in 00 00 00 16?
You probably want to swap the bytes to big-endian:
int length = NSSwapHostIntToBig([self.trackData length]+3);
[contents appendBytes:&length length:4];
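One caveat: int is not guaranteed to be 4 bytes on every platform. A sketch using a fixed-width type and CFSwapInt32HostToBig (an equivalent CoreFoundation byte-swapping function), with an example value standing in for the question's [self.trackData length]+3:

#import <Foundation/Foundation.h>

NSMutableData *contents = [NSMutableData data];
// 23 is a stand-in for [self.trackData length] + 3
uint32_t length = CFSwapInt32HostToBig(23);
[contents appendBytes:&length length:sizeof(length)];
// contents now holds 00 00 00 17 (big-endian 23)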
(This question has been rewritten from an issue with NSTextView following some further research)
UPDATE: You can download a very basic project that displays the issue here:
http://w3style.co.uk/~d11wtq/DocumentApp.tar.gz
(Do a grep -c "\r" file.txt on the file you save to get a line count where \r occurs... repeat for \n).
I've realised that all files created by NSDocument have \r as line endings, not the standard \n, even though the NSData my document subclass returns does not contain \r, only \n. Is there a way to configure this?
I thought Macs used UNIX line endings these days, so it seems weird that AppKit is still using the antiquated classic Mac OS endings. Weirder still, NSDocument asks for NSData, then rather unkindly corrupts that NSData by transforming the line endings.
The switch to \r happens after my code produces the NSData, so NSDocument itself must be doing some replacement on the bytes. Here is how I inspected them:
const char *bytes = [data bytes];
int i, len;
for (i = 0, len = [data length]; i < len; ++i) {
    NSLog(@"byte %d = %02x", i, bytes[i]);
}
Outputs (note 0a is the hex value of \n):
> 2010-12-17 12:45:59.076 MojiBaker[74929:a0f] byte 0 = 66
> 2010-12-17 12:45:59.076 MojiBaker[74929:a0f] byte 1 = 6f
> 2010-12-17 12:45:59.076 MojiBaker[74929:a0f] byte 2 = 6f
> 2010-12-17 12:45:59.077 MojiBaker[74929:a0f] byte 3 = 0a
> 2010-12-17 12:45:59.077 MojiBaker[74929:a0f] byte 4 = 62
> 2010-12-17 12:45:59.077 MojiBaker[74929:a0f] byte 5 = 61
> 2010-12-17 12:45:59.077 MojiBaker[74929:a0f] byte 6 = 72
> 2010-12-17 12:45:59.077 MojiBaker[74929:a0f] byte 7 = 0a
If NSDocument is going to ask for NSData then it should respect that and not modify it.
Here's the full code of the -dataOfType:error: method in my document:
- (NSData *)dataOfType:(NSString *)typeName error:(NSError **)outError {
    NSString *string = [textView string];

    // DEBUG CODE...
    NSArray *unixLines = [string componentsSeparatedByString:@"\n"];
    NSArray *windowsLines = [string componentsSeparatedByString:@"\r\n"];
    NSArray *macLines = [string componentsSeparatedByString:@"\r"];
    NSLog(@"TextView has %lu LF, %lu CRLF, %lu CR",
          [unixLines count] - 1, [windowsLines count] - 1, [macLines count] - 1);

    NSData *data = [NSData dataWithBytes:[string cStringUsingEncoding:NSUTF8StringEncoding]
                                  length:[string lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];

    const char *bytes = [data bytes];
    int i, len;
    for (i = 0, len = [data length]; i < len; ++i) {
        NSLog(@"byte %d = %02x", i, bytes[i]);
    }

    if (data != nil) {
        [textView breakUndoCoalescing];
    }

    return data;
}
NSDocument doesn’t care about line termination; it’s a semi-abstract class, designed to be subclassed. By itself it imposes nothing on a file format.
It’s the particular implementation of an NSDocument subclass - one that happens to read and write plain text - that will care about line termination characters.
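If you do need to guarantee UNIX line endings no matter what the text view hands back, the document subclass can normalize them itself before building the NSData. A minimal sketch, reusing the string variable from the -dataOfType:error: method above:

// Normalize CRLF and lone CR to LF before encoding.
NSString *normalized = [[string stringByReplacingOccurrencesOfString:@"\r\n"
                                                          withString:@"\n"]
                        stringByReplacingOccurrencesOfString:@"\r"
                                                  withString:@"\n"];
NSData *data = [normalized dataUsingEncoding:NSUTF8StringEncoding];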