Can NSString substitute for NSData? What are the consequences? - objective-c

Suppose I'm storing a stream of ASCII, say 0x0a0b0c00. What would happen to the data if I store it in an NSData instance vs. an NSString? Would the data get converted into something else? I'm a little confused because they are both buffers holding the exact same thing.

As its name suggests, NSData is a container for raw binary data. NSData makes no assumptions about the format of the bytes: they can be text, images, audio, etc.
NSString interprets the data as text in a given encoding, such as ASCII, UTF-8, or UTF-16. When you create an NSString from bytes, it decodes them into its own internal representation of Unicode characters rather than keeping the raw binary. If the bytes are not valid in the encoding you specify, initializers such as -initWithData:encoding: can return nil.
If it's not text, use NSData. It's clearer in code to know what's being managed and avoids having to fight string encodings.
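A minimal sketch contrasting the two, assuming ARC and Foundation (the helper names are hypothetical). NSData accepts any bytes unchanged. Decoding the same bytes as UTF-8 succeeds for the 0x0a0b0c00 sequence from the question, because those are control characters plus NUL and therefore valid text. It fails for bytes that can never appear in well-formed UTF-8, such as 0xFF.

```objc
#import <Foundation/Foundation.h>

// Wraps arbitrary bytes in NSData. Nothing is interpreted or validated,
// so the bytes come back out exactly as they went in.
static NSData *WrapBytes(const uint8_t *bytes, NSUInteger length) {
    return [NSData dataWithBytes:bytes length:length];
}

// Tries to decode the same bytes as UTF-8 text. Because NSString stores
// characters rather than bytes, invalid input yields nil instead of a string.
static NSString *DecodeUTF8Bytes(const uint8_t *bytes, NSUInteger length) {
    return [[NSString alloc] initWithBytes:bytes
                                    length:length
                                  encoding:NSUTF8StringEncoding];
}
```

With NSData the bytes `{0xff, 0xfe}` are simply stored. With NSString the same bytes produce nil, which is exactly the "fighting string encodings" problem described above.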

Related

NSString value as NSData output

I have an NSString which is @"15".
I want my NSData to be 15 as well. I know how to convert it to get the bytes 31 35, but I would like my NSData to show 15 when I print it with NSLog. I'm not asking for a character conversion but for a numeric translation: I don't want to change what NSLog prints, but the NSData value itself. Is there any way to do it?
Parse the string to an integer (let's assume a signed 32-bit integer):
NSString *str = @"15";
int32_t i = (int32_t)[str intValue];
To encode it in native byte order:
NSData *data = [NSData dataWithBytes:&i length:sizeof(i)];
Note: if you intend to transmit that data to another computer, then you need to agree on a common endianness for primitive types. Big-endian (network byte order) is the traditional choice, and functions like htonl() and ntohl() make it easy. If the computers are all the same platform, then you can use the native endianness for a slight performance boost and simpler code.
You need to convert the string to a byte first (by parsing it). Then you can build the NSData from the byte.
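A minimal sketch combining both answers, assuming ARC; the helper names are hypothetical. One point worth making explicit: NSLog prints NSData's bytes in hexadecimal, so the decimal value 15 shows up as `0f`, not `15`. If the goal is literally to see `15` in the log, the string has to be read as the hex byte 0x15, which the second helper does with NSScanner.

```objc
#import <Foundation/Foundation.h>
#include <arpa/inet.h>  // htonl()

// Parses a decimal string such as @"15" and encodes it as a 4-byte
// big-endian (network byte order) integer: bytes 00 00 00 0f.
static NSData *BigEndianDataFromDecimalString(NSString *str) {
    uint32_t value = htonl((uint32_t)[str intValue]);
    return [NSData dataWithBytes:&value length:sizeof(value)];
}

// Reads the string as a single hexadecimal byte, so @"15" becomes the
// byte 0x15. Returns nil if there are no hex digits or the value > 0xFF.
static NSData *ByteDataFromHexString(NSString *str) {
    unsigned int value = 0;
    if (![[NSScanner scannerWithString:str] scanHexInt:&value] || value > 0xFF) {
        return nil;
    }
    uint8_t byte = (uint8_t)value;
    return [NSData dataWithBytes:&byte length:1];
}
```

The big-endian version is the one to use when sending the number to another machine. The hex version only makes sense if "15" was meant as a byte value all along.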

Why NSJSONSerialization uses NSData instead of NSString?

Is there any reason for NSJSONSerialization to use NSData instead of NSString for representing JSON data?
NSString seems like a more obvious choice to me...
I imagine it is more efficient to encourage parsing NSData instead of NSString. If you are parsing a response from a server, for example, you'll get an NSData object representing a buffer of raw bytes returned from the server (note that NSJSONSerialization also includes a method for parsing an NSInputStream directly). Decoding the whole thing into an NSString first would be wasteful, since that string would just be an intermediate object that gets thrown away. Instead, NSJSONSerialization is probably parsing the bytes in the NSData object directly and only constructing NSStrings for the keys and values in the resulting data structure.
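A minimal sketch of both paths, assuming ARC; the helper names are hypothetical. Raw bytes go straight to `+JSONObjectWithData:options:error:`. A JSON NSString has to be turned back into bytes first, and UTF-8 is one of the encodings NSJSONSerialization accepts.

```objc
#import <Foundation/Foundation.h>

// Parses JSON straight from raw bytes, e.g. an HTTP response body.
// Returns nil (and fills *error, if given) when the bytes are not valid JSON.
static id ParseJSONData(NSData *data, NSError **error) {
    return [NSJSONSerialization JSONObjectWithData:data options:0 error:error];
}

// When the JSON is already an NSString, it must be re-encoded to bytes
// before NSJSONSerialization can read it: an extra copy the NSData path avoids.
static id ParseJSONString(NSString *json, NSError **error) {
    NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];
    return ParseJSONData(data, error);
}
```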

Building a custom NSArchiver to serialize to a string

How does NSArchiver serialize to a file? I assume it's serialized in a binary format; is that correct? What if I want to store it as a string so I can save it in an SQLite database? Do I need to write my own custom NSArchiver? If so, how do I go about doing that? Are there any tutorials out there?
P.S. I do realize Core Data can do this, but let me rule that option out for now.
You can archive to an NSData object instead of to a file, if you want, with +archivedDataWithRootObject:. It won't be a "string," but that's fine, because an NSString in Cocoa represents a sequence of Unicode characters, while an NSData represents a sequence of bytes (which you could easily store wherever you want, including in a database).
Note that you really should be using NSKeyedArchiver instead:
+ (NSData *)archivedDataWithRootObject:(id)rootObject
+ (id)unarchiveObjectWithData:(NSData *)data
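A minimal sketch, assuming ARC and macOS 10.13 / iOS 11 or later. On newer SDKs (macOS 10.14 / iOS 12 and later), the two methods above are deprecated in favor of the secure-coding variants used here. The archive goes to NSData, which fits an SQLite BLOB column. Only if a TEXT column is truly required is it then Base64-encoded. The helper names and the Base64 step are illustrative additions, not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Archives an NSSecureCoding object graph to NSData (suitable for a BLOB).
static NSData *ArchiveToData(id<NSSecureCoding> root) {
    NSError *error = nil;
    NSData *data = [NSKeyedArchiver archivedDataWithRootObject:root
                                         requiringSecureCoding:YES
                                                         error:&error];
    if (data == nil) {
        NSLog(@"Archiving failed: %@", error);
    }
    return data;
}

// Optional: Base64 turns the archive bytes into an ASCII-safe string
// for a TEXT column, at roughly 4/3 the size of the raw data.
static NSString *ArchiveToBase64String(id<NSSecureCoding> root) {
    return [ArchiveToData(root) base64EncodedStringWithOptions:0];
}

// Reverses ArchiveToBase64String for a dictionary of strings and numbers.
// Secure decoding requires listing every class that may appear in the graph.
static NSDictionary *UnarchiveDictionaryFromBase64String(NSString *text) {
    NSData *data = [[NSData alloc] initWithBase64EncodedString:text options:0];
    if (data == nil) {
        return nil;
    }
    NSSet *allowed = [NSSet setWithObjects:[NSDictionary class],
                                           [NSString class],
                                           [NSNumber class], nil];
    NSError *error = nil;
    id obj = [NSKeyedUnarchiver unarchivedObjectOfClasses:allowed
                                                 fromData:data
                                                    error:&error];
    return [obj isKindOfClass:[NSDictionary class]] ? obj : nil;
}
```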

Unihan: combining UTF-8 chars

I am using data that involves Chinese Unihan characters in an Objective-C app. I am using a voice recognition program (cmusphinx) that returns a phrase from my data. It returns UTF-8 text, and when it returns a Chinese character (which is three bytes in UTF-8), the character is split into three separate characters.
Example: when I want 人, I see ‰∫∫. The underlying bytes are the correct encoding (E4 BA BA), but my code sees the returned value as three separate characters rather than one.
Actually, my function receives the phrase as an NSString (because of a wrapper), which uses UTF-16. I tried using Objective-C's built-in conversion methods (to UTF-8 and from UTF-16), but they leave my string as three characters.
How can I decode these three separate characters into the single code point (U+4EBA) for the Chinese character?
Or how can I properly encode it?
This is the code fragment that takes the C string returned from Sphinx and converts it to an NSString:
const char * hypothesis = ps_get_hyp(pocketSphinxDecoder, &recognitionScore, &utteranceID);
NSString *hypothesisString = [[NSString alloc] initWithCString:hypothesis encoding:NSMacOSRomanEncoding];
Edit: From looking at the addition to your post, you actually do have control over the string encoding. In that case, why are you creating the string with NSMacOSRomanEncoding when you're expecting utf-8? Just change that to NSUTF8StringEncoding.
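A minimal sketch of that one-line change, assuming ARC. `HypothesisStringFromCString` is a hypothetical helper, and its `const char *` argument stands in for the value returned by `ps_get_hyp()`. The NULL check is a precaution in case no hypothesis is available.

```objc
#import <Foundation/Foundation.h>

// Builds an NSString from the recognizer's C string, decoding it as UTF-8
// (what cmusphinx returns) instead of MacRoman.
// Returns nil for a NULL pointer or bytes that are not valid UTF-8.
static NSString *HypothesisStringFromCString(const char *hypothesis) {
    if (hypothesis == NULL) {
        return nil;
    }
    return [[NSString alloc] initWithCString:hypothesis
                                    encoding:NSUTF8StringEncoding];
}
```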
It sounds like what you're saying is you're being given an NSString that contains UTF-8 data that's being interpreted as a single-byte encoding (e.g. ISO-Latin-1, MacRoman, etc). I'm assuming here that you have no control over the code that creates the NSString, because if you did then the solution is just to change the encoding it's initializing with.
In any case, what you're asking for is a way to take the data in the string and convert it back to UTF-8. You can do this by creating an NSData from the NSString using whatever encoding it was originally created with (you need to know at least this much, or it won't work), and then creating a new NSString from the same data using UTF-8.
From the example character you gave (人), it looks like it's being interpreted as MacRoman, so let's go with that. The following code should convert it back:
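- (NSString *)fixEncodingOfString:(NSString *)input {
    // The encoding the string was (wrongly) created with: MacRoman here.
    CFStringEncoding cfEncoding = kCFStringEncodingMacRoman;
    NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
    // Re-encoding with that same encoding recovers the original bytes.
    NSData *data = [input dataUsingEncoding:encoding];
    if (!data) {
        // the string wasn't actually in MacRoman
        return nil;
    }
    // Decode the recovered bytes as the UTF-8 they really are.
    NSString *output = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
    return output;
}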
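For ARC code, here is a self-contained sketch that first reproduces the mis-decoding and then repairs it with the same two-step approach. It uses the `NSMacOSRomanEncoding` constant directly instead of converting from the CF constant. Both function names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Reproduces the bug: UTF-8 bytes decoded as MacRoman become one
// character per byte, so E4 BA BA turns into "‰∫∫".
static NSString *MisdecodeUTF8AsMacRoman(const char *utf8) {
    return [[NSString alloc] initWithCString:utf8 encoding:NSMacOSRomanEncoding];
}

// Undoes it: re-encode as MacRoman to get the original bytes back,
// then decode those bytes as UTF-8. Returns nil if either step fails.
static NSString *RepairMacRomanMojibake(NSString *garbled) {
    NSData *bytes = [garbled dataUsingEncoding:NSMacOSRomanEncoding];
    if (bytes == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
}
```

This works because MacRoman maps every byte value to a character, so the garbled string round-trips back to exactly the bytes the recognizer produced.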

How can I display an NSData description as human-readable text instead of hex?

I'm writing an iPhone app that receives data from a UDP socket. I can display what I receive using NSData's description, but it is in hex format; for example, it looks like this:
<4c61742c 34343039 3131302e 35302c4c 6f6e2c39 38343237 352e3934 2c482c32 37392e30 302c4b6e 6f74732c 302e3032 2c4e616d 652c504c 41594552 2c496e64 65782c30 2c4d756c 7469706c 61796572 4e756d62 65722c30 00>
I know that this is a phrase that makes complete sense. How can I convert it?
NSString * s = [[[NSString alloc] initWithData:yourData encoding:NSUTF8StringEncoding] autorelease];
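One wrinkle with the dump above: it ends in `00`, a C-style NUL terminator, which the one-liner above keeps as a trailing U+0000 character in the string. A minimal sketch that stops at the first NUL byte, assuming ARC and UTF-8 text (the helper name is hypothetical):

```objc
#import <Foundation/Foundation.h>
#include <string.h>  // memchr()

// Decodes a text payload (e.g. a UDP datagram) into an NSString,
// ignoring everything from the first NUL byte onward.
// Returns nil if the bytes before the NUL are not valid UTF-8.
static NSString *StringFromPayload(NSData *payload) {
    NSUInteger length = payload.length;
    if (length == 0) {
        return @"";
    }
    const char *bytes = payload.bytes;
    const char *nul = memchr(bytes, '\0', length);
    if (nul != NULL) {
        length = (NSUInteger)(nul - bytes);
    }
    return [[NSString alloc] initWithBytes:bytes
                                    length:length
                                  encoding:NSUTF8StringEncoding];
}
```

Decoding the full dump this way gives a comma-separated line beginning `Lat,4409110.50,Lon,984275.94,...`.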
I'm going to assume you want to just debug the binary data. If that's not the case, what I describe below won't be terribly useful.
One technique I like is to use 0xED. Copy that big binary dump (everything inside the angle brackets) and paste it into a newly created 0xED document. You get a nice hex editor view.
0xED also supports user-defined plugins that can help visualize the binary data (for instance, converting an 8-byte timestamp to an NSDate).