How to use NSString getBytes:maxLength:usedLength:encoding:options:range:remainingRange: - objective-c

I have a string that I want as a byte array. So far I have used NSData to do this:
NSString *message = @"testing";
NSData *messageData = [message dataUsingEncoding:NSUnicodeStringEncoding allowLossyConversion:YES];
NSUInteger dataLength = [messageData length];
Byte *byteData = (Byte*)malloc( dataLength );
memcpy( byteData, [messageData bytes], dataLength );
But I know that NSString has the getBytes:maxLength:usedLength:encoding:options:range:remainingRange: method, which would allow me to skip NSData altogether. My issue is that I don't know how to set all the parameters properly.
I assume the pointer array passed in has to be malloc'ed - but I'm not sure how to find how much memory to malloc. I know there is [NSString lengthOfBytesUsingEncoding:] and [NSString maximumLengthOfBytesUsingEncoding:] but I don't know if those are the methods I need to use and don't fully understand the difference between them. I assume this would be the same value given to maxLength. The rest of the parameters make sense from the documentation. Any help would be great. Thanks.

The difference between lengthOfBytesUsingEncoding: and maximumLengthOfBytesUsingEncoding: is that the former is exact but slow (O(n)) while the latter is fast (O(1)) but may return a considerably larger number of bytes than is actually needed. The only guarantee that maximumLengthOfBytesUsingEncoding: gives is that the return value will be large enough to contain the string's bytes.
Generally, your assumptions are correct. So the method should be used like this:
NSUInteger numberOfBytes = [message lengthOfBytesUsingEncoding:NSUnicodeStringEncoding];
void *buffer = malloc(numberOfBytes);
NSUInteger usedLength = 0;
NSRange range = NSMakeRange(0, [message length]);
BOOL result = [message getBytes:buffer maxLength:numberOfBytes usedLength:&usedLength encoding:NSUnicodeStringEncoding options:0 range:range remainingRange:NULL];
...
free(buffer);
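The exact-versus-upper-bound trade-off described above can be sketched in plain C (which also compiles unchanged in an Objective-C file). This is only an illustration, assuming the text is held as UTF-16 code units and the target encoding is UTF-8; exact_utf8_length and max_utf8_length are hypothetical names, not Foundation's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exact size: has to look at every UTF-16 code unit, so it is O(n). */
size_t exact_utf8_length(const uint16_t *units, size_t count) {
    assert(units != NULL || count == 0);
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count &&
                   units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            bytes += 4;   /* a surrogate pair becomes one 4-byte sequence */
            i++;          /* skip the low surrogate */
        } else {
            bytes += 3;   /* rest of the BMP (unpaired surrogates counted as 3 here) */
        }
    }
    return bytes;
}

/* Upper bound: no scan at all, so it is O(1). One UTF-16 code unit never
   needs more than 3 UTF-8 bytes (a pair needs 4 bytes for 2 units). */
size_t max_utf8_length(size_t count) {
    return count * 3;
}
```

For the 7-unit string "testing" the exact count is 7 bytes, while the bound is 21. Allocating with the bound is always safe, just wasteful, which matches the guarantee quoted above.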

Related

Iterate through NSData bytes

How can I iterate through [NSData bytes] one by one and append them to an NSMutableString or print them using NSLog()?
Rather than appending bytes to a mutable string, create a string using the data:
// Be sure to use the right encoding:
NSString *result = [[NSString alloc] initWithData:myData encoding:NSUTF8StringEncoding];
If you really want to loop through the bytes:
NSMutableString *result = [NSMutableString string];
const unsigned char *bytes = [myData bytes];
for (NSUInteger i = 0; i < [myData length]; i++)
{
    [result appendFormat:@"%02hhx", bytes[i]];
}
Update! Since iOS 7, there's a new, preferred way to iterate through all of the bytes in an NSData object.
Because an NSData can now be composed of multiple disjoint byte array chunks under the hood, calling [NSData bytes] can sometimes be memory-inefficient, because it needs to flatten all of the underlying chunks into a single byte array for the caller.
To avoid this behavior, it's better to enumerate bytes using the enumerateByteRangesUsingBlock: method of NSData, which will return ranges of the existing underlying chunks, which you can access directly without needing to generate any new array structures. Of course, you'll need to be careful not to go poking around inappropriately in the provided C-style array.
NSMutableString *resultAsHexBytes = [NSMutableString string];
[data enumerateByteRangesUsingBlock:^(const void *bytes,
                                      NSRange byteRange,
                                      BOOL *stop) {
    // To print raw byte values as hex
    for (NSUInteger i = 0; i < byteRange.length; ++i) {
        [resultAsHexBytes appendFormat:@"%02x", ((uint8_t *)bytes)[i]];
    }
}];
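The idea of visiting each underlying chunk in place, rather than flattening everything first, can be sketched in plain C. This is a rough illustration only: chunk_t and hex_from_chunks are hypothetical names, and the sketch says nothing about how NSData stores its chunks internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One contiguous piece of a larger, possibly non-contiguous buffer. */
typedef struct {
    const uint8_t *bytes;
    size_t length;
} chunk_t;

/* Writes two lowercase hex digits per byte into out, reading each chunk where
   it lives. out must hold 2 * (total bytes) + 1 chars. Returns chars written
   (not counting the terminator). */
size_t hex_from_chunks(const chunk_t *chunks, size_t chunk_count, char *out) {
    assert(out != NULL);
    size_t written = 0;
    for (size_t c = 0; c < chunk_count; c++) {
        for (size_t i = 0; i < chunks[c].length; i++) {
            /* uint8_t avoids the sign extension a plain char could cause */
            snprintf(out + written, 3, "%02x", (unsigned)chunks[c].bytes[i]);
            written += 2;
        }
    }
    out[written] = '\0';
    return written;
}
```

No combined copy of the chunks is ever made; only the output string grows.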

Getting weird characters when going from NSString to bytes and then back to NSString

NSString *message = @"testing";
NSUInteger dataLength = [message lengthOfBytesUsingEncoding:NSUnicodeStringEncoding];
void *byteData = malloc( dataLength );
NSRange range = NSMakeRange(0, [message length]);
NSUInteger actualLength = 0;
NSRange remain;
BOOL result = [message getBytes:byteData maxLength:dataLength usedLength:&actualLength encoding:NSUnicodeStringEncoding options:0 range:range remainingRange:&remain];
NSString *decodedString = [[NSString alloc] initWithBytes:byteData length:actualLength encoding:NSUnicodeStringEncoding];
My issue is that I expect decodedString to be testing, but instead it looks like Chinese characters. I thought it could be a problem with null-terminated data, but it seems that should not matter here.
You want something like this?
NSString *message = @"testing";
NSData *bytes = [message dataUsingEncoding:NSUTF8StringEncoding];
NSString *messageDecoded = [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
NSLog(@"decoded: %@", messageDecoded);
The UTF-16 byte order is getting reversed between the encode and decode.
You can do any one of the following:
Use an encoding that specifies an explicit byte order (e.g., NSUTF16BigEndianStringEncoding, NSUTF16LittleEndianStringEncoding, NSUTF8StringEncoding).
Pass NSStringEncodingConversionExternalRepresentation to the options: parameter in getBytes:maxLength:usedLength:encoding:options:range:remainingRange:. This prepends a byte-order mark to the start of the data.
Use NSData, as Elvis suggested.
These days, UTF-8 is the preferred Unicode encoding in most cases.
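To see why a reversed byte order turns "testing" into CJK-looking characters, here is a small plain C sketch. It assumes the bytes were written least significant byte first and then read back most significant byte first; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores each UTF-16 code unit least significant byte first. */
void encode_utf16_le(const uint16_t *units, size_t count, uint8_t *out) {
    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = (uint8_t)(units[i] & 0xFF);
        out[2 * i + 1] = (uint8_t)(units[i] >> 8);
    }
}

/* Reads code units assuming the most significant byte comes first. */
void decode_utf16_be(const uint8_t *bytes, size_t count, uint16_t *out) {
    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = (uint16_t)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
}

/* Reads code units assuming the least significant byte comes first. */
void decode_utf16_le(const uint8_t *bytes, size_t count, uint16_t *out) {
    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
}
```

The letter 't' is U+0074. Written little-endian it becomes the bytes 74 00, and reading those big-endian gives U+7400, which falls in the CJK ideograph range. Decoding with the same byte order that was used for encoding gives U+0074 back.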

NSData to NSString after CC_SHA1

Based on this question I wrote a category on NSString to hash NSString instances using SHA1. However, there is something wrong with my implementation. The funny thing is that logging the NSData instance does give the expected hash, but when I want to create an NSString from that NSData instance, I simply get null.
- (NSString *)sha1 {
NSData *dataFromString = [self dataUsingEncoding:NSUTF8StringEncoding];
unsigned char hashed[CC_SHA1_DIGEST_LENGTH];
if ( CC_SHA1([dataFromString bytes], [dataFromString length], hashed) ) {
NSData *dataFromDigest = [NSData dataWithBytes:hashed length:CC_SHA1_DIGEST_LENGTH];
NSString *result = [[NSString alloc] initWithBytes:[dataFromDigest bytes] length:[dataFromDigest length] encoding:NSUTF8StringEncoding];
return result;
} else {
return nil;
}
}
Thanks for the help!
The output of a hash function is just a bare bunch of bytes. You're taking those bytes, and essentially telling NSString that they represent a UTF8-encoded string, which they don't. The resulting NSString is just garbage.
It sounds like what you really want is something like a string of hexadecimal digits that represent the hash value? If so, I believe you'll need to roll this yourself by looping through the dataFromDigest one byte at a time and outputting into a new NSMutableString the right hex digits depending on the byte's value. You can do it yourself or use some code from the web. The comment on this post looks promising.
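Why decoding a digest as UTF-8 yields null can be shown with a simplified structural check in plain C. This is only a sketch: looks_like_utf8 is a hypothetical helper, it does not reject overlong forms or surrogates, and it is not the check NSString itself performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when every lead byte is followed by the right number of
   10xxxxxx continuation bytes. */
bool looks_like_utf8(const uint8_t *bytes, size_t length) {
    assert(bytes != NULL || length == 0);
    size_t i = 0;
    while (i < length) {
        uint8_t b = bytes[i];
        size_t extra;
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;                         /* stray continuation or invalid lead */
        if (extra > length - i - 1) return false;  /* sequence cut off at the end */
        for (size_t k = 1; k <= extra; k++) {
            if ((bytes[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

The SHA-1 digest of the empty string starts with the bytes da 39 a3 ee. 0xda announces a two-byte sequence, but 0x39 is not a continuation byte, so the check fails on the very first character. Digest bytes are effectively random, so most digests fail in a similar way.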

Is there an Objective-c regex replace with callback/C# MatchEvaluator equivalent?

I have a C# project I'm intending to port to Objective-C. From what I understand about Obj-C, it looks like there's a confusing variety of Regex options but I can't see anything about a way of doing a replace with callback.
I'm looking for something that is the equivalent of the C# MatchEvaluator delegate or PHP's preg_replace_callback. An example of what I want to do in C# is -
// change input so each word is followed by a number showing how many letters it has
string inputString = "Hello, how are you today ?";
Regex theRegex = new Regex(@"\w+");
string outputString = theRegex.Replace(inputString, delegate (Match thisMatch){
return thisMatch.Value + thisMatch.Value.Length;
});
// outputString is now 'Hello5, how3 are3 you3 today5 ?'
How could I do this in Objective-C ? In my actual situation the Regex has both lookahead and lookbehind assertions in it though, so any alternative involving finding the strings in advance and then doing a series of straight string replaces won't work unfortunately.
Foundation has a NSRegularExpression class (iOS4 and later), which may be useful to you. From the docs:
The fundamental matching method for NSRegularExpression is a Block iterator method that allows clients to supply a Block object which will be invoked each time the regular expression matches a portion of the target string. There are additional convenience methods for returning all the matches as an array, the total number of matches, the first match, and the range of the first match.
For example:
NSString *input = @"Hello, how are you today?";
// Make a copy of the input string; we are going to edit this one as we iterate.
NSMutableString *output = [NSMutableString stringWithString:input];
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"\\w+"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
// Keep track of how many extra characters the replacements have added so far.
__block NSUInteger offset = 0;
[regex enumerateMatchesInString:input
                        options:0
                          range:NSMakeRange(0, [input length])
                     usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    // Blocks in Objective-C are basically closures: they keep a constant copy
    // of variables that were in scope when the block was declared, unless the
    // variable is declared with the __block qualifier.
    // match.range is a C struct:
    //   match.range.location is the character offset of the match
    //   match.range.length is the length of the match
    NSString *matchedWord = [input substringWithRange:match.range];
    // The matched word with its length appended.
    NSString *replacement = [matchedWord stringByAppendingFormat:@"%lu", (unsigned long)[matchedWord length]];
    // Every iteration makes the output string longer,
    // so we need to shift the range that we are editing.
    NSRange newRange = NSMakeRange(match.range.location + offset, match.range.length);
    [output replaceCharactersInRange:newRange withString:replacement];
    // Grow the offset by the number of characters actually added
    // (more than 1 for words of 10 or more letters).
    offset += [replacement length] - match.range.length;
}];
NSLog(@"%@", output); // output: Hello5, how3 are3 you3 today5?
I modified atshum's code to make it a bit more flexible:
// text is the input string; output has to start out empty.
NSMutableString *output = [NSMutableString string];
__block NSUInteger prevEndPosition = 0;
[regex enumerateMatchesInString:text
                        options:0
                          range:NSMakeRange(0, [text length])
                     usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop)
{
    NSRange r = {.location = prevEndPosition, .length = match.range.location - prevEndPosition};
    // Copy everything without modification between previous replacement and new one
    [output appendString:[text substringWithRange:r]];
    // Append the replacement string
    [output appendString:@"REPLACED"];
    prevEndPosition = match.range.location + match.range.length;
}];
// Finalize string end
NSRange r = {.location = prevEndPosition, .length = [text length] - prevEndPosition};
[output appendString:[text substringWithRange:r]];
Seems to work for now (probably needs a bit more testing)
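The same copy-the-gap-then-append-the-replacement approach can be sketched in plain C with POSIX regex.h. Treat it as an illustration under assumptions: replace_with_callback, replace_fn and word_with_length are hypothetical names, POSIX extended syntax has no \w (so [[:alnum:]_]+ stands in for it), and POSIX regexes have no lookahead or lookbehind, so this does not cover the asker's real pattern.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Callback: gets the matched text and its length, writes the replacement
   into out (at most cap chars including the terminator). */
typedef void (*replace_fn)(const char *match, size_t len, char *out, size_t cap);

/* Appends n bytes to a growable buffer; returns 0 on allocation failure. */
static int append(char **buf, size_t *used, size_t *cap, const char *src, size_t n) {
    if (*used + n + 1 > *cap) {
        size_t new_cap = (*used + n + 1) * 2;
        char *grown = realloc(*buf, new_cap);
        if (grown == NULL) return 0;
        *buf = grown;
        *cap = new_cap;
    }
    memcpy(*buf + *used, src, n);
    *used += n;
    (*buf)[*used] = '\0';
    return 1;
}

/* Returns a malloc'ed string (caller frees), or NULL on failure. */
char *replace_with_callback(const char *input, const char *pattern, replace_fn fn) {
    assert(input != NULL && pattern != NULL && fn != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return NULL;

    size_t used = 0, cap = 0;
    char *out = NULL;
    int ok = append(&out, &used, &cap, "", 0);   /* start with an empty string */
    const char *cursor = input;
    regmatch_t m;
    char piece[256];   /* replacements longer than this are truncated */

    /* An empty match ends the loop so it cannot spin forever. */
    while (ok && regexec(&re, cursor, 1, &m, cursor == input ? 0 : REG_NOTBOL) == 0
              && m.rm_eo > m.rm_so) {
        fn(cursor + m.rm_so, (size_t)(m.rm_eo - m.rm_so), piece, sizeof piece);
        /* copy the unchanged text before the match, then the replacement */
        ok = append(&out, &used, &cap, cursor, (size_t)m.rm_so)
          && append(&out, &used, &cap, piece, strlen(piece));
        cursor += m.rm_eo;
    }
    /* copy whatever follows the last match */
    ok = ok && append(&out, &used, &cap, cursor, strlen(cursor));
    regfree(&re);
    if (!ok) { free(out); return NULL; }
    return out;
}

/* Example callback: the word followed by its length. */
void word_with_length(const char *match, size_t len, char *out, size_t cap) {
    snprintf(out, cap, "%.*s%zu", (int)len, match, len);
}
```

Calling replace_with_callback("Hello, how are you today ?", "[[:alnum:]_]+", word_with_length) produces "Hello5, how3 are3 you3 today5 ?". Because the output is built by appending, no offset bookkeeping is needed when a replacement is longer or shorter than the match.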

Convert NSData bytes to NSString?

I'm trying to use the BEncoding ObjC class to decode a .torrent file.
NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/the.torrent"];
NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
When I NSLog torrent I get the following:
{
announce = <68747470 3a2f2f74 6f727265 6e742e75 62756e74 752e636f 6d3a3639 36392f61 6e6e6f75 6e6365>;
comment = <5562756e 74752043 44207265 6c656173 65732e75 62756e74 752e636f 6d>;
"creation date" = 1225365524;
info = {
length = 732766208;
name = <7562756e 74752d38 2e31302d 6465736b 746f702d 69333836 2e69736f>;
"piece length" = 524288;
....
How do I convert the name into a NSString? I have tried..
NSData *info = [torrent valueForKey:@"info"];
NSData *name = [info valueForKey:@"name"];
unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);
..which retrieves the data, but seems to have additional unicode rubbish after it:
File name: ubuntu-8.10-desktop-i386.iso)
I have also tried (from here)..
NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];
..but this seems to return a bunch of random characters:
扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳
The fact the first way (as mentioned in the Apple documentation) returns most of the data correctly, with some additional bytes makes me think it might be an error in the BEncoding library.. but my lack of knowledge about ObjC is more likely to be at fault..
That's an important point that should be re-emphasized I think. It turns out that,
NSString *content = [NSString stringWithUTF8String:[responseData bytes]];
is not the same as,
NSString *content = [[NSString alloc] initWithBytes:[responseData bytes]
length:[responseData length] encoding: NSUTF8StringEncoding];
The first expects a NUL-terminated byte string; the second doesn't. If the byte string isn't correctly terminated, the result of the first call is undefined, and content may come back as nil.
How about
NSString *content = [[[NSString alloc] initWithData:myData
encoding:NSUTF8StringEncoding] autorelease];
NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
When I NSLog torrent I get the following:
{
⋮
}
That would be an NSDictionary, then, not an NSData.
unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);
..which retrieves the data, but seems to have additional unicode rubbish after it:
File name: ubuntu-8.10-desktop-i386.iso)
No, it retrieved the filename just fine; you simply printed it incorrectly. %s takes a C string, which is null-terminated; the bytes of a data object are not null-terminated (they are just bytes, not necessarily characters in any encoding, and 0—which is null as a character—is a perfectly valid byte). You would have to allocate one more character, and set the last one in the array to 0:
size_t length = [name length] + 1;
unsigned char aBuffer[length];
[name getBytes:aBuffer length:length - 1]; // copy only the bytes the data actually holds
aBuffer[length - 1] = 0;
NSLog(@"File name: %s", aBuffer);
But null-terminating the data in an NSData object is wrong (except when you really do need a C string). I'll get to the right way in a moment.
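The allocate-one-more-and-terminate step looks like this in plain C. It is a sketch with a hypothetical helper name, make_c_string, and it is only appropriate when a C string is really what you need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies length raw bytes into a new buffer that is one byte larger and
   terminates it, so the result is safe for %s or strlen. Caller frees. */
char *make_c_string(const void *bytes, size_t length) {
    assert(bytes != NULL || length == 0);
    char *copy = malloc(length + 1);
    if (copy == NULL) return NULL;
    if (length > 0) memcpy(copy, bytes, length);
    copy[length] = '\0';   /* the extra byte holds the terminator */
    return copy;
}
```

If the raw bytes contain a zero byte in the middle, anything that treats the result as a C string will stop there, which is one more reason to prefer length-based APIs for arbitrary data.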
I have also tried […]..
NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];
..but this seems to return random Chinese characters:
扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳
That's because your bytes are UTF-8, which encodes one character in (usually) one byte.
unichar is, and stringWithCharacters:length: accepts, UTF-16. In that encoding, one character is (usually) two bytes. (Hence the division by sizeof(unichar): it divides the number of bytes by 2 to get the number of characters.)
So you said “here's some UTF-16 data”, and it went and made characters from every two bytes; each pair of bytes was supposed to be two characters, not one, so you got garbage (which turned out to be mostly CJK ideographs).
You answered your own question pretty well, except that stringWithUTF8String: is simpler than stringWithCString:encoding: for UTF-8-encoded strings.
However, when you have the length (as you do when you have an NSData), it is even easier—and more proper—to use initWithBytes:length:encoding:. It's easier because it does not require null-terminated data; it simply uses the length you already have. (Don't forget to release or autorelease it.)
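The "every two bytes become one character" effect can be reproduced in plain C. This sketch assumes a little-endian reader, and bytes_as_utf16_units is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pairs up bytes as little-endian 16-bit units, the way a UTF-16 reader on a
   little-endian machine would. Returns the number of units produced. */
size_t bytes_as_utf16_units(const uint8_t *bytes, size_t length, uint16_t *out) {
    assert(out != NULL && (bytes != NULL || length == 0));
    size_t count = length / 2;   /* same division as length / sizeof(unichar) */
    for (size_t i = 0; i < count; i++) {
        out[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
    return count;
}
```

For the UTF-8 bytes of "ubuntu" (75 62 75 6e 74 75) this gives the three units 0x6275, 0x6E75 and 0x7574. All three land in the CJK ideograph range, which is consistent with the garbled output shown in the question.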
A nice quick and dirty approach is to use NSString's stringWithFormat: class method to help you out. One of the less-often used features of string formatting is the ability to specify a maximum string length when outputting a string. Using this handy feature allows you to convert NSData into a string pretty easily:
NSData *myData = [self getDataFromSomewhere];
NSString *string = [NSString stringWithFormat:@"%.*s", (int)[myData length], (const char *)[myData bytes]];
If you want to output it to the log, it can be even easier:
NSLog(@"my Data: %.*s", (int)[myData length], (const char *)[myData bytes]);
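The same precision trick works with the C standard library, which makes the mechanics easy to test. A minimal sketch with a hypothetical helper, format_bytes; note that the precision argument for %.*s has to be an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Formats at most length bytes of a buffer that is NOT NUL-terminated.
   Returns the number of chars written (excluding the terminator), or -1 if
   the length does not fit an int or the output buffer is too small. */
int format_bytes(char *out, size_t cap, const char *bytes, size_t length) {
    assert(out != NULL && bytes != NULL);
    if (length > INT_MAX) return -1;
    int n = snprintf(out, cap, "my Data: %.*s", (int)length, bytes);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The precision only caps how many bytes are read. Output still stops early at an embedded zero byte, so this suits text-like data rather than arbitrary binary.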
Aha, the NSString method stringWithCString works correctly:
With the bencoding.h/.m files added to your project, the complete .m file:
#import <Foundation/Foundation.h>
#import "BEncoding.h"
int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    // Read raw file, and de-bencode
    NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/a.torrent"];
    NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
    // Get the file name
    NSData *infoData = [torrent valueForKey:@"info"];
    NSData *nameData = [infoData valueForKey:@"name"];
    NSString *filename = [NSString stringWithCString:[nameData bytes] encoding:NSUTF8StringEncoding];
    NSLog(@"%@", filename);
    [pool drain];
    return 0;
}
..and the output:
ubuntu-8.10-desktop-i386.iso
In cases where I don't have control over the data being transformed into a string, such as reading from the network, I prefer to use NSString -initWithBytes:length:encoding: so that I'm not dependent upon having a NULL terminated string in order to get defined results. Note that Apple's documentation says if cString is not a NULL terminated string, that the results are undefined.
Use a category on NSData:
NSData+NSString.h
@interface NSData (NSString)
- (NSString *)toString;
@end
NSData+NSString.m
#import "NSData+NSString.h"
@implementation NSData (NSString)
- (NSString *)toString
{
    Byte *dataPointer = (Byte *)[self bytes];
    NSMutableString *result = [NSMutableString stringWithCapacity:0];
    NSUInteger index;
    for (index = 0; index < [self length]; index++)
    {
        [result appendFormat:@"0x%02x,", dataPointer[index]];
    }
    return result;
}
@end
Then just NSLog(@"Data is %@", [nsData toString]);
You can try this. Fine with me.
DLog(@"responeData: %@", [[[NSString alloc] initWithBytes:[data bytes] length:[data length] encoding:NSASCIIStringEncoding] autorelease]);
Sometimes you need to create a Base64-encoded string from NSData, for instance when you build an e-mail MIME message. In this case use the following:
#import "NSData+Base64.h"
NSString *string = [data base64EncodedString];
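For reference, this is roughly what such a category does under the hood, sketched in plain C. base64_encode is a hypothetical helper and is not the code from NSData+Base64. On iOS 7 / OS X 10.9 and later, NSData also has a built-in base64EncodedStringWithOptions: method, so the third-party category may not be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes length bytes as standard Base64 with '=' padding.
   out must hold 4 * ((length + 2) / 3) + 1 chars. Returns chars written
   (not counting the terminator). */
size_t base64_encode(const uint8_t *in, size_t length, char *out) {
    assert(out != NULL && (in != NULL || length == 0));
    size_t o = 0;
    for (size_t i = 0; i < length; i += 3) {
        size_t remaining = length - i;
        /* pack up to three bytes into 24 bits, then emit four 6-bit groups */
        uint32_t triple = (uint32_t)in[i] << 16;
        if (remaining > 1) triple |= (uint32_t)in[i + 1] << 8;
        if (remaining > 2) triple |= in[i + 2];
        out[o++] = B64[(triple >> 18) & 0x3F];
        out[o++] = B64[(triple >> 12) & 0x3F];
        out[o++] = remaining > 1 ? B64[(triple >> 6) & 0x3F] : '=';
        out[o++] = remaining > 2 ? B64[triple & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

The three bytes of "Man" encode to "TWFu", while "Ma" and "M" encode to "TWE=" and "TQ==" with padding.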
This will work.
NSString *str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];