Difference between NSMutableData's mutableBytes and bytes methods - objective-c

Both return the same pointer. I know -bytes belongs to NSData, but why does NSMutableData introduce -mutableBytes? Is it just for code clarity, so it is more obvious that you are accessing mutable data? Does it really matter which one is used?
NSMutableData* mydata = [[NSMutableData alloc] init];
[mydata appendData: [@"hello" dataUsingEncoding:NSUTF8StringEncoding]];
NSLog(@"%p", [mydata mutableBytes]);
NSLog(@"%p", [mydata bytes]);
Thanks.

There are a couple of reasons why NSMutableData might provide a separate mutableBytes method:
As you suggested in your question, using mutableBytes makes it clear to the reader that you want to change the data.
The bytes method returns a const void *. The mutableBytes method returns a void *. If you want to change the bytes, you need a void * with no const qualifier. The mutableBytes method eliminates the need to cast away the const qualifier.
In theory there could be a third reason: the -[NSData mutableCopy] method could return an NSMutableData that points to the same buffer as the original NSData, and only create a new, mutable copy of the buffer when you call mutableBytes. However, I don't think it's implemented this way based on my very limited testing.
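To make the const point concrete, here is a small plain C analogue (C is a subset of Objective-C, so it compiles as either). The `buffer` type and the `buffer_bytes`/`buffer_mutable_bytes` functions are hypothetical stand-ins for NSData/NSMutableData, not Foundation code; the sketch only shows how the two return types differ while pointing at the same storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type standing in for NSData/NSMutableData. */
typedef struct {
    unsigned char storage[64];
    size_t length;
} buffer;

/* Read-only accessor, like -[NSData bytes]: the const in the return type
   tells the compiler and the reader that callers should not write through it. */
const void *buffer_bytes(const buffer *b)
{
    return b->storage;
}

/* Writable accessor, like -[NSMutableData mutableBytes]: same address,
   but callers can modify the contents without casting away const. */
void *buffer_mutable_bytes(buffer *b)
{
    return b->storage;
}
```

Writing through `buffer_bytes` would need an explicit `(unsigned char *)` cast; that cast is what `-mutableBytes` spares you. In Objective-C the equivalent is simply `unsigned char *p = [mydata mutableBytes];`.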

One addition to Rob's answer and his comment:
@Dabbu NSData and NSMutableData store their contents as one contiguous array of bytes.
The thing to keep in mind here is that this behavior changed in iOS 7: NSData/NSMutableData are no longer guaranteed to keep their contents as one contiguous array. The contents could be stored as multiple chunks.
So when you call bytes/mutableBytes, they will copy and flatten contents into one contiguous array of bytes, if needed, and then return a pointer to this contiguous chunk.
Depending on what you're trying to do, this may cause an unexpected performance penalty or excessive memory consumption for large buffers.
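A toy model may help show where the cost comes from. The sketch below assumes a hypothetical chunked store (`chunked_data`, `chunked_bytes`) and says nothing about how Foundation actually lays out its buffers; it only shows that producing one contiguous pointer from several chunks means allocating and copying the whole contents. The sketch caches the flattened copy, but that is a choice made here, not a claim about Foundation. If you only need to walk the bytes, `-[NSData enumerateByteRangesUsingBlock:]` (available since iOS 7) visits each contiguous region in place and avoids the flattening.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNKS 8

/* Hypothetical chunked byte store standing in for an NSData whose contents
   live in several separate pieces rather than one contiguous array. */
typedef struct {
    const unsigned char *chunk[MAX_CHUNKS];
    size_t chunk_len[MAX_CHUNKS];
    size_t count;
    unsigned char *flat;   /* contiguous copy, built on the first request */
    size_t flat_copies;    /* how many times the full copy was made */
} chunked_data;

static size_t chunked_length(const chunked_data *d)
{
    size_t total = 0;
    for (size_t i = 0; i < d->count; i++)
        total += d->chunk_len[i];
    return total;
}

/* Analogue of calling -bytes on non-contiguous data: to return a single
   pointer, every chunk is copied into one new allocation, costing time and
   memory proportional to the total length. */
const void *chunked_bytes(chunked_data *d)
{
    if (d->flat == NULL) {
        size_t total = chunked_length(d);
        d->flat = malloc(total > 0 ? total : 1);
        if (d->flat == NULL)
            return NULL;
        size_t offset = 0;
        for (size_t i = 0; i < d->count; i++) {
            memcpy(d->flat + offset, d->chunk[i], d->chunk_len[i]);
            offset += d->chunk_len[i];
        }
        d->flat_copies++;
    }
    return d->flat;
}
```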

Related

How to discover if a c-string can be encoded to NSString with a given encoding

I am trying to implement code that converts const char * to NSString. I would like to try multiple encodings in a specified order until I find one that works. Unfortunately, all the initWith... methods on NSString say that the results are undefined if the encoding doesn't work.
In particular, (sometimes) I would like to try first to encode as NSMacOSRomanStringEncoding which never seems to fail. Instead it just encodes gobbledygook. Is there some kind of check I can perform ahead of time? (Like canBeConvertedToEncoding but in the other direction?)
Instead of trying encodings one by one until you find a match, consider asking NSString to help you out here by using +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:], which, given string data and some options, may be able to detect the encoding for you, and return it (along with the actual decoded string).
Specifically for your use-case, since you have a list of encodings you'd like to try, the encodingOptions parameter will allow you to pass those encodings in using the NSStringEncodingDetectionSuggestedEncodingsKey.
So, given a C string and some possible encoding options, you might be able to do something like:
NSString *decodeCString(const char *source, NSArray<NSNumber *> *encodings) {
NSData * const cStringData = [NSData dataWithBytesNoCopy:(void *)source length:strlen(source) freeWhenDone:NO];
NSString *result = nil;
BOOL usedLossyConversion = NO;
NSStringEncoding determinedEncoding = [NSString stringEncodingForData:cStringData
encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
convertedString:&result
usedLossyConversion:&usedLossyConversion];
/* Decide whether to do anything with `usedLossyConversion` and `determinedEncoding`. */
return result;
}
Example usage:
NSString *result = decodeCString("Hello, world!", @[@(NSShiftJISStringEncoding), @(NSMacOSRomanStringEncoding), @(NSASCIIStringEncoding)]);
NSLog(@"%@", result); // => "Hello, world!"
If you don't strictly need detection limited to your list of encodings, you can drop the NSStringEncodingDetectionUseOnlySuggestedEncodingsKey option.
One thing to note about the encoding array you pass in: although the documentation doesn't promise that the suggested encodings are attempted in order, spelunking through the disassembly of the (current) method implementation shows that the array is enumerated using fast enumeration (i.e., in order). I can imagine that this could change in the future (or have been different in the past) so if this is somehow a hard requirement for you, you could theoretically work around it by repeatedly calling +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: one encoding at a time in order, but this would likely be incredibly expensive given the complexity of this method.
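The reason a permissive encoding such as NSMacOSRomanStringEncoding has to be tried last can be shown without Foundation. In the plain C sketch below, the validators and the `first_matching_encoding` helper are hypothetical; `accepts_any_byte` stands in for a single-byte encoding in which every byte value maps to some character, so it never rejects anything. Candidates are tried strictly in array order, which is the guarantee the one-encoding-at-a-time workaround would give you.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef bool (*validator_fn)(const unsigned char *s, size_t n);

static bool is_ascii(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 0x80)
            return false;
    return true;
}

/* Strict UTF-8 check: rejects bad lead bytes, truncated sequences,
   overlong forms, surrogates and values above U+10FFFF. */
static bool is_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        uint32_t cp;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1Fu; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0Fu; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07u; }
        else return false;
        if (i + len > n)
            return false;
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
            return false;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

/* Stands in for an encoding like Mac OS Roman: every byte is "valid". */
static bool accepts_any_byte(const unsigned char *s, size_t n)
{
    (void)s;
    (void)n;
    return true;
}

typedef struct { const char *name; validator_fn accepts; } candidate;

/* Strictest first, permissive last. */
static const candidate default_order[] = {
    { "ASCII", is_ascii }, { "UTF-8", is_utf8 }, { "MacRoman", accepts_any_byte },
};

/* Tries candidates strictly in array order; returns the first name that
   accepts the whole string, or NULL if none does. */
const char *first_matching_encoding(const char *src, const candidate *list, size_t count)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = strlen(src);
    for (size_t i = 0; i < count; i++)
        if (list[i].accepts(s, n))
            return list[i].name;
    return NULL;
}
```

Putting the permissive entry first makes it win for every input, which matches the gobbledygook described in the question.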

Convert NSValue to NSData and back again, with the correct type

I would like to be able to convert an objective-c object, such as an NSArray or UIImage into an NSData object, which I can then use to write to disk. I first converted them to an NSValue, which I then planned on converting to NSData. This question provided part of the answer, but as they were working with NSNumber, they didn't really have a need to convert it back to NSValue.
I have seen other questions such as this one that relies on NSKeyedArchiver, but I would like to steer away from this due to the vast size inflation that occurs.
Therefore my code at the moment for encoding an NSData object from an NSValue, from the first question, is as follows:
+(NSData*) dataWithValue:(NSValue*)value {
NSUInteger size;
const char* encoding = [value objCType];
NSGetSizeAndAlignment(encoding, &size, NULL);
void* ptr = malloc(size);
[value getValue:ptr];
NSData* data = [NSData dataWithBytes:ptr length:size];
free(ptr);
return data;
}
My question is how would I go about decoding an NSData object that has been encoded in this manner and get back the original objCType from the NSValue?
I would assume I would be using something along these lines
[NSValue valueWithBytes:[data bytes] objCType:typeHere];
How would I get back the type information?
Use NSKeyedArchiver to save the items and NSKeyedUnarchiver to restore the object. The object must conform to NSCoding and in the case of a collection all contained items must also conform to NSCoding.
See the Apple documentation of NSKeyedArchiver and NSCoder
Your approach will only work for primitive types (int, float, structs without pointers, ...) inside your NSValue. Otherwise you will only get the meaningless pointer value but not the actual data in your NSData object.
To also pass the actual type string along you would have to figure out a way to get this inside your NSData object as well. Not impossible, but it will not solve your actual problem.
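One way to carry the type along is to prefix the raw value bytes with the NUL-terminated type string. The plain C sketch below assumes that layout; `pack_typed_value` and `unpack_typed_value` are hypothetical helpers. In Objective-C the type string would come from `-[NSValue objCType]` and the value could be rebuilt with `+[NSValue valueWithBytes:objCType:]`. As the answer says, this only round-trips plain-old-data such as numbers and pointer-free structs, and the bytes are in the machine's native layout, so they are not portable between architectures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Packs the type string (including its NUL) followed by the raw value bytes.
   Returns a malloc'd buffer the caller frees, or NULL on allocation failure. */
unsigned char *pack_typed_value(const char *type, const void *value, size_t size, size_t *out_len)
{
    size_t type_len = strlen(type) + 1;           /* keep the terminator */
    unsigned char *buf = malloc(type_len + size);
    if (buf == NULL)
        return NULL;
    memcpy(buf, type, type_len);
    memcpy(buf + type_len, value, size);
    *out_len = type_len + size;
    return buf;
}

/* Splits a packed buffer back into its type string and payload.
   Returns the type string (pointing into buf), or NULL if no terminator is found. */
const char *unpack_typed_value(const unsigned char *buf, size_t len,
                               const void **payload, size_t *payload_len)
{
    const unsigned char *nul = memchr(buf, '\0', len);
    if (nul == NULL)
        return NULL;
    size_t type_len = (size_t)(nul - buf) + 1;
    *payload = buf + type_len;
    *payload_len = len - type_len;
    return (const char *)buf;
}
```

For example, a `double[2]` would be packed with the type string "[2d]", which is how Objective-C encodes that array type.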
Using a keyed archiver as zaph suggests is much better.

What's the fastest way to init a NSString from a CString?

I need to allocate lots of NSString objects from C strings (which come that way from a database), as fast as possible. cStringUsingEncoding and the like are just too slow: about 10-15 times slower than allocating a C string.
However, creating an NSString from an NSString gets pretty close to C string allocation (about 1.2s for 1M allocations). EDIT: Fixed alloc to use a copy of the string.
const char *n;
const char *s = "Office für iPad: Steve Ballmer macht Hoffnung";
NSString *str = [NSString stringWithUTF8String:s];
int len = strlen(s);
for (int i = 0; i<10000000; i++) {
NSString *s = [[NSString alloc] initWithString:[str copy]];
s = s;
}
cString allocation test (also about 1s for 1M allocations):
for (int i = 0; i<10000000; i++) {
n = malloc(len);
memccpy((void*)n, s, 0, len) ;
n = n;
free(n);
}
But as I said, using stringWithCString and the likes is an order of magnitude slower. The fastest I could get was using initWithBytesNoCopy (about 8s, therefore 8 times slower compared to stringWithString):
NSString *so = [[NSString alloc] initWithBytesNoCopy:(void*)n length:len encoding:NSUTF8StringEncoding freeWhenDone:YES];
So, is there another magic way to make allocations from C strings faster? I wouldn't even rule out subclassing NSString (and yes, I know it's a class cluster).
EDIT: In instruments I see that NSString's call to CFStringUsingByteStream3 is the root issue.
EDIT 2: According to Instruments, the root issue is __CFFromUTF8. Just looking at the sources [1], this does seem to be quite inefficient, and it handles some legacy cases.
https://www.opensource.apple.com/source/CF/CF-476.17/CFBuiltinConverters.c?txt
This seems to me to not be a fair test.
cString allocation test looks to be allocating a byte array and copying data. I can't tell for sure because the variable definitions are not included.
NSString *s = [[NSString alloc] initWithString:str]; is taking an existing NSString (data already in the correct format) and maybe just increments the retain count. Even if a copy is forced the data is still already in the correct encoding and just needs to be copied.
[NSString stringWithUTF8String:s]; has to handle the UTF-8 encoding and convert from one encoding (UTF-8) to the internal NSString/CFString encoding. The method being used (CFStringUsingByteStream3) supports multiple encodings (UTF-8/UTF-16/UTF-32/others). A specialized UTF-8-only method could be faster, but that raises the question of whether this is really a performance problem or just an exercise.
You can see the source code for CFStringUsingByteStream3 in this file.
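To illustrate what a specialized converter could look like, here is a plain C sketch of a UTF-8 to UTF-16 conversion with an ASCII fast path. It is not CoreFoundation's code, it assumes the input is already valid UTF-8 (no error checking), and the name `utf8_to_utf16` is hypothetical. Runs of ASCII bytes are widened directly, and only multi-byte sequences go through the general decoding path, with code points above U+FFFF split into surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts valid UTF-8 to UTF-16 and returns the number of code units written.
   `out` needs room for n units: valid UTF-8 never yields more UTF-16 units than
   input bytes. No validation is done; malformed input is not handled. */
size_t utf8_to_utf16(const unsigned char *s, size_t n, uint16_t *out)
{
    size_t i = 0, j = 0;
    while (i < n) {
        /* Fast path: widen a run of ASCII bytes one-to-one. */
        while (i < n && s[i] < 0x80)
            out[j++] = s[i++];
        if (i == n)
            break;

        /* General path: decode one 2-, 3- or 4-byte sequence. */
        unsigned char c = s[i];
        uint32_t cp;
        size_t len;
        if ((c & 0xE0) == 0xC0)      { cp = c & 0x1Fu; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0Fu; len = 3; }
        else                         { cp = c & 0x07u; len = 4; }
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        i += len;

        if (cp >= 0x10000) {          /* outside the BMP: emit a surrogate pair */
            cp -= 0x10000;
            out[j++] = (uint16_t)(0xD800 | (cp >> 10));
            out[j++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[j++] = (uint16_t)cp;
        }
    }
    return j;
}
```

Whether something like this beats the built-in path on your data is an empirical question; measure before relying on it.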
As per my comment, and Brian's answer, I think the problem here is that to create NSStrings you're having to parse the UTF-8 strings. So the question arises: do you really need to parse them, then?
If parsing-on-demand is an option then I'd suggest you write a proxy that can impersonate NSString with an interface along the lines of:
@interface BJLazyUTF8String: NSProxy
- (id)initWithBytes:(const char *)bytes length:(size_t)length;
@end
So it's not a subclass of NSString and it doesn't try to provide any real functionality. Inside the init just keep the bytes, e.g. as _bytes, doing whatever is correct for your C memory ownership. Then:
- (NSString *)bjRealString
{
// we'd better create the NSString if we haven't already
if(!_string)
_string = [NSString stringWithUTF8String:_bytes];
return _string;
}
- (void)forwardInvocation:(NSInvocation *)anInvocation
{
// if this is invoked then someone is trying to
// make a call to what they think is a string;
// let's forward that call to a string so that
// it does what they expect
[anInvocation setTarget:[self bjRealString]];
[anInvocation invoke];
}
- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector
{
return [[self bjRealString] methodSignatureForSelector:aSelector];
}
You can then do:
NSString *myString = [[BJLazyUTF8String alloc] initWithBytes:... length:...];
And subsequently treat myString exactly as though it were an NSString.
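The same parse-on-demand idea can be sketched in plain C, without the NSProxy machinery. The `lazy_utf8` type and its functions are hypothetical: the struct only keeps the raw bytes until someone first asks for characters, then decodes once and caches the result. It assumes the bytes are valid UTF-8 and that the caller keeps them alive, mirroring the memory-ownership caveat in the answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C analogue of the lazy proxy: raw bytes now, decoding later. */
typedef struct {
    const unsigned char *bytes;  /* borrowed; the caller keeps them alive */
    size_t length;
    uint32_t *chars;             /* NULL until the first access */
    size_t char_count;
    unsigned decode_runs;        /* how many times decoding actually ran */
} lazy_utf8;

lazy_utf8 lazy_utf8_make(const char *bytes, size_t length)
{
    lazy_utf8 s = { (const unsigned char *)bytes, length, NULL, 0, 0 };
    return s;
}

/* Decodes on first use and caches the code points; assumes valid UTF-8. */
const uint32_t *lazy_utf8_chars(lazy_utf8 *s, size_t *count)
{
    if (s->chars == NULL) {
        /* Each code point uses at least one byte, so `length` slots suffice. */
        s->chars = malloc((s->length ? s->length : 1) * sizeof *s->chars);
        if (s->chars == NULL)
            return NULL;
        size_t i = 0, j = 0;
        while (i < s->length) {
            unsigned char c = s->bytes[i];
            size_t len = c < 0x80 ? 1 : (c & 0xE0) == 0xC0 ? 2 : (c & 0xF0) == 0xE0 ? 3 : 4;
            uint32_t cp = len == 1 ? c : len == 2 ? (c & 0x1Fu) : len == 3 ? (c & 0x0Fu) : (c & 0x07u);
            for (size_t k = 1; k < len; k++)
                cp = (cp << 6) | (s->bytes[i + k] & 0x3Fu);
            s->chars[j++] = cp;
            i += len;
        }
        s->char_count = j;
        s->decode_runs++;
    }
    *count = s->char_count;
    return s->chars;
}

void lazy_utf8_free(lazy_utf8 *s)
{
    free(s->chars);
    s->chars = NULL;
}
```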
Microbenchmarks are a great distraction, but rarely useful. In this case, though, there is validity.
Assuming, for the moment, that you've actually measured string creation as being a real source of performance issues, then the real problem can be better expressed as "how do I reduce memory bandwidth?", because that is really where your problems lie: you are causing tons and tons of data to be copied into freshly allocated buffers.
As you've discovered, the fastest you can go is to not copy at all. initWithBytesNoCopy:... exists exactly to solve this case. Thus, you'll want to create a data construct that holds the original string buffer and manages all the NSString instances that point to it as one cohesive unit.
Without thinking it through in detail, you could likely encapsulate the raw buffer in an NSData instance, then use associated objects to create a strong reference from your string instances to that NSData instance. That way, the NSData (and associated memory) will be deallocated when the last string is deallocated.
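The ownership pattern described here, many strings sharing one buffer that is freed when the last of them goes away, can be sketched with explicit reference counting in plain C. The `shared_buf` and `str_slice` names are hypothetical; in Objective-C the answer's suggestion is to let an NSData own the bytes and attach it to each string via associated objects (`objc_setAssociatedObject`), so the runtime does this counting for you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared backing buffer: one allocation, many slices. */
typedef struct {
    char *bytes;
    size_t length;
    size_t refcount;
} shared_buf;

/* A "string" that borrows a byte range instead of copying it. */
typedef struct {
    shared_buf *owner;
    const char *start;
    size_t length;
} str_slice;

/* Takes ownership of a malloc'd buffer; the creator holds one reference. */
shared_buf *shared_buf_take(char *bytes, size_t length)
{
    shared_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->bytes = bytes;
    b->length = length;
    b->refcount = 1;
    return b;
}

/* Drops one reference; the last one frees the bytes and the header. */
void shared_buf_release(shared_buf *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        free(b->bytes);
        free(b);
    }
}

/* Every slice keeps the buffer alive by holding its own reference. */
str_slice slice_make(shared_buf *b, size_t offset, size_t length)
{
    assert(offset + length <= b->length);
    b->refcount++;
    str_slice s = { b, b->bytes + offset, length };
    return s;
}

void slice_release(str_slice *s)
{
    shared_buf_release(s->owner);
    s->owner = NULL;
    s->start = NULL;
    s->length = 0;
}
```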
With the additional detail that this is for a CoreData-esque ORM layer (and, no, I'm not going to suggest yer doin' it wrong because your description really does sound like you need that level of control), then it would seem that your ORM layer would be the ideal place to manage these strings as described above.
I'd also encourage you to investigate something like FMDB to see if it can provide both the encapsulation you need and the flexibility to add your additional features (and the hooks to make it fast).

Using malloc to allocate an array of NSStrings?

Since NSString is not defined in length like an integer or a double, do I run the risk of problems allocating an array of NSStrings using malloc?
thanks
ie:
NSString ***nssName;
nssName = (NSString***) malloc(iN * sizeof(NSString*));
The end result, with for loops for the rows, is a 2D array, so it is a little easier to work with than NSArray (less code).
No problems should arise: allocating an array of NSStrings is like making an array of pointers to string objects, and pointers have a constant length. I would recommend just using NSArray, but it is still fine to use a C array of NSStrings. Note that this may have changed with ARC.
Here is completely acceptable code demonstrating this:
NSString** array = malloc(sizeof(NSString*) * 10); // Array of 10 strings
array[0] = @"Hello World"; // Store a string at index 0
NSLog(@"%@", array[0]); // Log the string at index 0
Since NSString is an object (and, to be more precise, a class cluster), you cannot know its final size in memory; only the Objective-C runtime does. So you need to use the Objective-C allocation methods (like [[NSString alloc] init]); you cannot use malloc.
The problem is further that NSString is a class cluster, which means you do not get an instance of NSString but of a subclass (one you might not even know about and should not care about). For example, very often the real class is NSCFString, but once you call some of the methods that treat the string like a path you get an instance of NSPathStore2 or whatever. Think of the NSString init methods as factories (as in the Factory Pattern).
After question edit:
What you really want is:
NSString **nssName;
nssName = (NSString**) malloc(iN * sizeof(NSString*));
And then something like:
nssName[0] = @"My string";
nssName[1] = [[NSString alloc] init];
...
This is perfectly fine since you have an array of pointers and the size of pointer is of course known.
But beware of memory management: first, you should make sure the array is filled with NULLs, e.g. with bzero or using calloc:
bzero(nssName, iN * sizeof(NSString*));
Then, before you free the array you need to release each string in the array (and make sure you do not store autoreleased strings; you will need to retain them first).
All in all, you have a lot more pitfalls here. You can go this route but using an NSArray will be easier to handle.
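Here is a plain C sketch of the 2D layout from the question, with `char *` standing in for `NSString *` and a malloc'd copy plus `free` standing in for retain/release under manual reference counting. The `grid_*` functions are hypothetical. Writing `sizeof *grid` and `sizeof *grid[r]` lets the compiler pick the right element size at each level, which sidesteps the `NSString ***` versus `sizeof(NSString*)` mismatch in the question, and `calloc` gives the NULL-filled array the answer recommends. ARC has its own rules for object pointers stored in malloc'd memory, which this sketch does not cover.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees every stored string, then each row, then the row array.
   free(NULL) is a no-op, so partially filled grids are fine. */
void grid_destroy(char ***grid, size_t rows, size_t cols)
{
    if (grid == NULL)
        return;
    for (size_t r = 0; r < rows; r++) {
        if (grid[r] == NULL)
            continue;
        for (size_t c = 0; c < cols; c++)
            free(grid[r][c]);
        free(grid[r]);
    }
    free(grid);
}

/* Allocates rows x cols string slots; calloc zero-fills them, which reads
   as NULL on mainstream platforms (the bzero step from the answer). */
char ***grid_create(size_t rows, size_t cols)
{
    char ***grid = calloc(rows, sizeof *grid);      /* array of rows */
    if (grid == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++) {
        grid[r] = calloc(cols, sizeof *grid[r]);     /* one row of char * */
        if (grid[r] == NULL) {
            grid_destroy(grid, rows, cols);
            return NULL;
        }
    }
    return grid;
}

/* Stores a private copy of value, releasing whatever was in the slot. */
int grid_set(char ***grid, size_t r, size_t c, const char *value)
{
    size_t n = strlen(value) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, n);
    free(grid[r][c]);
    grid[r][c] = copy;
    return 0;
}
```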
NSStrings can only be dealt with through pointers, so you'd just be making an array of pointers to NSString. Pointers have a defined length, so it's quite possible. However, an NSArray is usually the better option.
You should alloc/init... the NSString*s or use the class's factory methods. If you need an array of them, try NSArray*.
You should not use malloc to allocate data for Objective-C types. Doing this will allocate memory space but not much else. Most importantly the object will not be initialized, and almost as importantly the retain count for the object will not be set. This is just asking for problems. Is there any reason you do not want to use alloc and init?

Objective-C memory management problem

I'm getting an EXC_BAD_ACCESS error, and it's because of this part of the code. Basically, I take an input and do some work on it. After multiple inputs, it throws the error. Am I doing something wrong with my memory here? I'd post the rest of the code, but it's rather long, and I think this may be where my problem lies (it's where Xcode points me, at least).
-(IBAction) findShows: (id) clicked
{
char urlChars[1000];
[self getEventURL: urlChars];
NSString * theUrl = [[NSString alloc] initWithFormat:@"%s", urlChars];
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
int theLength = [data length];
NSString *content = [NSString stringWithUTF8String:[data bytes]];
char eventData[[data length]];
strcpy(eventData, [content UTF8String]);
[self parseEventData: eventData dataLength: theLength];
[whatIsShowing setStringValue:@"Showing events by this artist"];
}
When a crash occurs, there will be a backtrace.
Post it.
Either your program will break in the debugger, and the call stack will be in the debugger UI (or you can type 'bt
With that, the cause of the crash is often quite obvious. Without that, we are left to critique the code.
So, here goes....
char urlChars[1000];
[self getEventURL: urlChars];
This is, at best, a security hole and, at worst, the source of your crash. Any time you copy bytes into a buffer, there should be some way to (a) limit the number of bytes copied in (pass the length of the buffer) and (b) return the number of bytes copied (0 for failure or no bytes copied).
Given the above, what happens if getEventURL: copies 1042 bytes into urlChars? Boom.
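A bounded version of the URL-building call might look like the plain C sketch below. `get_event_url` is a hypothetical stand-in for `-getEventURL:`, and the URL format is made up for illustration. It takes the buffer size, never writes past it, always NUL-terminates, and returns the number of bytes written, with 0 meaning failure or truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded replacement for -getEventURL:. Writes at most cap bytes
   (including the NUL) and returns the URL length, or 0 on error or truncation. */
size_t get_event_url(char *buf, size_t cap, const char *artist)
{
    if (buf == NULL || cap == 0)
        return 0;
    int needed = snprintf(buf, cap, "https://example.com/events?artist=%s", artist);
    if (needed < 0 || (size_t)needed >= cap) {
        buf[0] = '\0';   /* don't leave a truncated URL behind */
        return 0;
    }
    return (size_t)needed;
}
```

The caller would then pass `sizeof urlChars` and bail out when the function returns 0.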
NSString * theUrl = [[NSString alloc] initWithFormat:@"%s", urlChars];
This is making some assumptions about urlChars that will lead to failure. First, it assumes that urlChars is of a proper %s compatible encoding. Secondly, it assumes that urlChars is NULL terminated (and didn't overflow the buffer).
Best to use one of the various NSString methods that create strings directly from the buffer of bytes using a particular encoding. More precise and more efficient.
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
I hope this isn't on the main thread... 'cause it'll block if it is and that'll make your app unresponsive on slow/flaky networks.
int theLength = [data length];
NSString *content = [NSString stringWithUTF8String:[data bytes]];
char eventData[[data length]];
strcpy(eventData, [content UTF8String]);
This is about the least efficient possible way of doing this. There is no need to create an NSString instance just to then turn it into a (char *). Just grab the bytes from the data directly.
Also -- are you sure that the data returned is NULL terminated? If not, that strcpy() is gonna blow right past the end of your eventData buffer, corrupting the stack.
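If you do need a NUL-terminated copy of downloaded data, copying the bytes directly avoids both the NSString round trip and the overrun. The plain C sketch below (the helper name `copy_as_cstring` is hypothetical) uses the heap rather than a stack array, because a download can be arbitrarily large, and it appends the terminator the data may not contain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies len bytes into a fresh heap buffer and appends a NUL terminator.
   The caller frees the result. Returns NULL if malloc fails. */
char *copy_as_cstring(const void *bytes, size_t len)
{
    char *out = malloc(len + 1);   /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, bytes, len);
    out[len] = '\0';
    return out;
}
```

From Objective-C it would be called as `copy_as_cstring([data bytes], [data length])`, with the result freed once parsing is done. A parser that takes a pointer and a length, like parseEventData:dataLength:, would not need the terminator at all.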
[self parseEventData: eventData dataLength: theLength];
[whatIsShowing setStringValue:@"Showing events by this artist"];
What kind of data are you parsing that you really want to parse the raw bytes? In almost all cases, such data should be of some kind of structured type; XML or, even, HTML. If so, there is no need to drop down to parsing the raw bytes. (Not that raw data is unheard of -- just odd).
The bytes you get from [content UTF8String] could conceivably be different in number from the value of [data length]. Try using strncpy() instead and see if that still crashes. (It's also possible that getEventURL: sometimes fails to return a string in the format expected, but that's impossible to tell without the source to that method.)
Is it possible that the string contained in urlChars sometimes comes back non-NULL-terminated? You might want to try zeroing out the array, for example using bzero.
Additionally, there are a bunch of techniques for debugging EXC_BAD_ACCESS. Since you're doing a lot of pure C string manipulation, the usual method of turning on NSZombieEnabled may or may not help you (though I recommend turning it on regardless). Another technique you can try is recovering a previous stack frame using GDB. See my previous answer to a similar question if you're interested.
In my opinion the code is too complex. Do not resort to plain C arrays and strings unless you absolutely have to; they are harder to get right. (It's not rocket science, but if you play with guns all the time, you will shoot yourself in the foot sooner or later.) Even if you insist on parsing plain C strings, isolate that code behind the method interface:
// Callers have to mess with char*.
- (void) parseEventData: (char*) data {…}
// Callers can stay in the Objective-C land.
- (void) parseEventData: (NSString* or NSData*) data {
char *unwrappedData = …;
…
}
I'd certainly think twice before using strcpy in my code. And I think you are leaking theUrl (although that should not cause EXC_BAD_ACCESS in this case). As for the bug itself, could you be hanging on to parts of urlChars or eventData, so that when those stack-based variables go away you cause the segfault?