Removing non-ascii characters from NSData? - objective-c

First off, I'm not sure I understand what is happening well enough to describe the issue, so I'll try my best.
I'm encoding an NSData object that contains JSON, and one of the objects contains a degree symbol. We believe this is what is causing the issue and would like to remove it before encoding, since the problem occurs during encoding.
I've found plenty of options for removing certain characters from strings, but none for doing it on the NSData object itself. I'm wondering whether this is even possible, or whether the issue is with how I'm already encoding it.
This is how the NSData object is being encoded and turned back into an NSData object so it can be serialized to JSON. Right now I'm not trying to remove the degree symbol. I'm using Latin 1 because of another character I wanted to use but don't actually need. This probably isn't the best way to do it, but it works for the majority of the data objects that pass through it, just not this one, so it needs to change.
NSString* stringISOLatin1 = [NSString stringWithCString:data.bytes encoding:NSISOLatin1StringEncoding];
NSData* dataUTF8 = [stringISOLatin1 dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:NO];
The results are a little weird. Most of the time it works fine, even including the degree symbol in the text when displayed on screen. Other times the string comes back from encoding with garbage at the end, which makes it impossible to serialize.
Any help would be appreciated, even if it just leads to a better explanation of what is happening. Thanks.

The problem is likely that you are using +[NSString stringWithCString:encoding:] to convert your data object. This method requires the bytes to be NULL terminated. NSData objects do not have to be NULL terminated because they have an explicit length. If the NULL character is missing, the method keeps reading whatever happens to be in memory after the data, giving you either garbage at the end of the string or possibly a crash because of a memory access violation.
Instead try using this:
NSString *stringISOLatin1 = [[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];
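To illustrate why the terminator matters, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It assumes you have a pointer and a length, as NSData provides through bytes and length, and need a real C string for some C-string API. copy_as_c_string is a hypothetical helper name, not part of Foundation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `length` bytes into a fresh buffer and appends
 * the terminating '\0' that a length-delimited buffer does not carry.
 * Returns NULL if the allocation fails; the caller must free() the result. */
char *copy_as_c_string(const void *bytes, size_t length) {
    if (length == SIZE_MAX) {
        return NULL; /* length + 1 would overflow */
    }
    char *copy = malloc(length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(copy, bytes, length);
    }
    copy[length] = '\0'; /* the terminator stringWithCString: relies on */
    return copy;
}
```

In Objective-C the initWithData:encoding: call above is still the simpler fix, because it uses the data's own length and never needs the extra copy.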

Related

How to discover if a c-string can be encoded to NSString with a given encoding

I am trying to implement code that converts const char * to NSString. I would like to try multiple encodings in a specified order until I find one that works. Unfortunately, all the initWith... methods on NSString say that the results are undefined if the encoding doesn't work.
In particular, (sometimes) I would like to try first to encode as NSMacOSRomanStringEncoding which never seems to fail. Instead it just encodes gobbledygook. Is there some kind of check I can perform ahead of time? (Like canBeConvertedToEncoding but in the other direction?)
Instead of trying encodings one by one until you find a match, consider asking NSString to help you out here by using +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:], which, given string data and some options, may be able to detect the encoding for you, and return it (along with the actual decoded string).
Specifically for your use-case, since you have a list of encodings you'd like to try, the encodingOptions parameter will allow you to pass those encodings in using the NSStringEncodingDetectionSuggestedEncodingsKey.
So, given a C string and some possible encoding options, you might be able to do something like:
NSString *decodeCString(const char *source, NSArray<NSNumber *> *encodings) {
    NSData * const cStringData = [NSData dataWithBytesNoCopy:(void *)source length:strlen(source) freeWhenDone:NO];
    NSString *result = nil;
    BOOL usedLossyConversion = NO;
    NSStringEncoding determinedEncoding = [NSString stringEncodingForData:cStringData
                                                          encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                                                                            NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
                                                          convertedString:&result
                                                      usedLossyConversion:&usedLossyConversion];
    /* Decide whether to do anything with `usedLossyConversion` and `determinedEncoding`. */
    return result;
}
Example usage:
NSString *result = decodeCString("Hello, world!", @[@(NSShiftJISStringEncoding), @(NSMacOSRomanStringEncoding), @(NSASCIIStringEncoding)]);
NSLog(@"%@", result); // => "Hello, world!"
If you don't 100% care about using only the list of encodings you want to try, you can drop the NSStringEncodingDetectionUseOnlySuggestedEncodingsKey option.
One thing to note about the encoding array you pass in: although the documentation doesn't promise that the suggested encodings are attempted in order, spelunking through the disassembly of the (current) method implementation shows that the array is enumerated using fast enumeration (i.e., in order). I can imagine that this could change in the future (or have been different in the past) so if this is somehow a hard requirement for you, you could theoretically work around it by repeatedly calling +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: one encoding at a time in order, but this would likely be incredibly expensive given the complexity of this method.

NSString isEqual strange bug

I have a web server that I use to fetch some data in my iOS application. The data include an itemId field with a value such as 48501 (without quotation marks). I read the item's JSON data into my itemObject, in which itemId is declared as an NSString and not an NSInteger.
Everything works up to this point, but I have problems when I compare itemObject.itemId, using isEqual:, with another NSString containing 48501.
In other words, both strings look exactly the same and contain 48501 when I print them. There are no spaces or hidden characters. isEqual:, isEqualToString: and == all report false on comparison.
On the other hand, when I convert the NSStrings to NSIntegers and compare them, it works, but not always! Sometimes the result is TRUE, and sometimes it CRASHES with no error to catch, just pointing to the line. I see them printed exactly the same, but the if statement does not go through.
I showed the code to someone with far more experience than me, and he thought this could be a bug. Has anyone run into this?
If your itemId is 48501 without quotation marks in the JSON, then it is deserialized as an NSNumber, not an NSString. That is probably the problem in the first place. Try logging the class of your itemId, and then use -isEqualToString: for an NSString or -isEqualToNumber: for an NSNumber, as appropriate.

Putting NSData into an NSArray

I have NSData objects storing data (non character / non-ascii). I'm trying to put it into an array without it being interpreted as characters or ascii. I know this question has been asked a few times before, but none of the solutions posted have worked for me in this situation. I'm trying to avoid using property lists, which is what most answers suggested. I already tried converting the NSData to an NSString, then storing the string in the array, but of course it is interpreted as characters after putting it in the string, regardless of the encoding I've used. For example, one of the NSData's contains the value 2c, and when I put it into a string it is interpreted as ,. Does anyone know how I can store the raw data, in its original state, in an NSArray? Maybe by storing the data in user defaults, then somehow storing the defaults in an array? I'm at a loss.
Here is some possibly relevant code:
NSData *receivedData = [bleDevice readData];
NSString *receivedDataString = [[NSString alloc] initWithData:receivedData encoding:NSUTF8StringEncoding];
[dataArray insertObject:receivedDataString atIndex:0];
When I call:
[dataArray insertObject:receivedDataString atIndex:0];
It will store something like 2c ad a ,.
But, when I try and insert the raw data, like:
[dataArray insertObject:receivedData atIndex:0];
It will simply not store anything. There are no warnings, no errors. I'll NSLog the array and it is null.
[dataArray insertObject:receivedData atIndex:0]; most certainly will insert "receivedData" into "dataArray" (so long as both exist). "receivedData" can be any sort of NSObject -- need not be a string. If the array is "null" when you log it then the array itself never got created.
(It's important to remember that if an object pointer is nil then method calls on that pointer do not fail but rather silently return zero/nil, so "returns nil" strongly suggests the object never was created.)

Can't get MD5 checksum from security framework on Mac OS X Mountain Lion

NSString *curFourChanFilePath = [currentSubFile stringByAppendingPathComponent:curFourChanFile];
NSData *imageData = [NSData dataWithContentsOfFile:curFourChanFilePath];
CFErrorRef theError;
SecTransformRef testTransform = SecDigestTransformCreate(kSecDigestMD5,0,&theError);
CFDataRef theDataRef = (__bridge CFDataRef)imageData;
SecTransformSetAttribute(testTransform, kSecTransformInputAttributeName, theDataRef, &theError);
NSData *resultingData = (__bridge NSData *)(SecTransformExecute(testTransform, &theError));
NSString *resultingString = [[NSString alloc] initWithData:resultingData encoding:NSUTF8StringEncoding];
NSLog(@"%@",resultingString);
[checksumMapTable setObject:resultingData forKey:curFourChanFile];
Here's the code I'm having issues with. This code is in a doubly nested loop, and all of it works fine until it gets to turning the data into an NSString. It seems to have trouble with UTF8. All the strings turn into (null), but the strange thing is that the data isn't (null). When I change the encoding to UTF16 or UTF32, I get text. Not readable text, it's all garbled as you'd expect from using the wrong encoding, but it's clearly there. I just can't seem to get at it in what I thought was the proper encoding, UTF8. Any help would be appreciated. To reiterate, all the code seems to be working fine until this point. The Security framework is still a bit new to me.
Actually, I just answered my own question. By creating a transform with SecEncodeTransformCreate(NULL, NULL) and sending the data through it, I got the checksum. Problem solved.
Change your encoding to one that can decode any byte value. For example NSISOLatin1StringEncoding. This should give you an output similar to openssl md5 -binary.
Not that the result will make much sense though…
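An MD5 digest is 16 raw bytes rather than text, so it usually is not valid UTF-8 and needs a text encoding such as hexadecimal before it can be printed. A minimal sketch of that step in plain C, under the assumption that you already have the digest bytes in hand (hex_encode is a hypothetical helper name, not a Security framework function):

```c
#include <stddef.h>

/* Hypothetical helper: writes 2 * length lowercase hex digits plus a
 * terminating '\0' into out. Returns 0 on success, or -1 if out_size
 * is too small to hold the result. */
int hex_encode(const unsigned char *bytes, size_t length,
               char *out, size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    if (out_size == 0 || length > (out_size - 1) / 2) {
        return -1;
    }
    for (size_t i = 0; i < length; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0F]; /* low nibble */
    }
    out[2 * length] = '\0';
    return 0;
}
```

For a 16-byte digest, a 33-byte output buffer is enough: 32 hex digits plus the terminator.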

Objective-C memory management problem

I'm getting an EXC_BAD_ACCESS error, and it's because of this part of the code. Basically, I take an input and do some work on it. After multiple inputs, it throws the error. Am I doing something wrong with my memory here? I'd post the rest of the code, but it's rather long -- and I think this may be where my problem lies (it's where Xcode points me, at least).
-(IBAction) findShows: (id) clicked
{
    char urlChars[1000];
    [self getEventURL: urlChars];
    NSString * theUrl = [[NSString alloc] initWithFormat:@"%s", urlChars];
    NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
    int theLength = [data length];
    NSString *content = [NSString stringWithUTF8String:[data bytes]];
    char eventData[[data length]];
    strcpy(eventData, [content UTF8String]);
    [self parseEventData: eventData dataLength: theLength];
    [whatIsShowing setStringValue:@"Showing events by this artist"];
}
When a crash occurs, there will be a backtrace.
Post it.
Your program will break in the debugger, and the call stack will be in the debugger UI (or you can type 'bt' at the debugger prompt).
With that, the cause of the crash is often quite obvious. Without that, we are left to critique the code.
So, here goes....
char urlChars[1000];
[self getEventURL: urlChars];
This is, at best, a security hole and, at worst, the source of your crash. Any time you copy bytes into a buffer, the API should (a) limit the number of bytes copied in (by taking the length of the buffer) and (b) return the number of bytes copied (0 for failure or no bytes copied).
Given the above, what happens if there are 1042 bytes copied into urlChars by getEventURL:? boom
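A minimal sketch of a buffer-filling function with that shape, in plain C. The name copy_string_bounded and its parameters are hypothetical, since the source of getEventURL: isn't shown; the point is only that the callee knows the buffer size and reports how much it wrote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: writes src, including its terminator, into
 * buffer only if it fits in buffer_size bytes. Returns the number of bytes
 * copied (not counting the terminator), or 0 on failure or if nothing was
 * copied. */
size_t copy_string_bounded(char *buffer, size_t buffer_size, const char *src) {
    if (buffer == NULL || src == NULL || buffer_size == 0) {
        return 0;
    }
    size_t needed = strlen(src);
    if (needed >= buffer_size) {
        buffer[0] = '\0'; /* leave a valid empty string instead of overflowing */
        return 0;
    }
    memcpy(buffer, src, needed + 1); /* + 1 copies the terminator too */
    return needed;
}
```

A caller would pass sizeof urlChars as buffer_size and treat a return value of 0 as "no usable URL" rather than reading the buffer anyway.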
NSString * theUrl = [[NSString alloc] initWithFormat:@"%s", urlChars];
This is making some assumptions about urlChars that will lead to failure. First, it assumes that urlChars is of a proper %s compatible encoding. Secondly, it assumes that urlChars is NULL terminated (and didn't overflow the buffer).
Best to use one of the various NSString methods that create strings directly from the buffer of bytes using a particular encoding. More precise and more efficient.
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:theUrl]];
I hope this isn't on the main thread... 'cause it'll block if it is and that'll make your app unresponsive on slow/flaky networks.
int theLength = [data length];
NSString *content = [NSString stringWithUTF8String:[data bytes]];
char eventData[[data length]];
strcpy(eventData, [content UTF8String]);
This is about the least efficient possible way of doing this. There is no need to create an NSString instance just to then turn it into a (char *). Just grab the bytes from the data directly.
Also -- are you sure that the data returned is NULL terminated? If not, that strcpy() is gonna blow right past the end of your eventData buffer, corrupting the stack.
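If the parser works from a pointer plus an explicit length, no terminator and no intermediate copy are needed at all. A minimal sketch in plain C of a length-bounded scan; count_delimiters is a hypothetical stand-in for whatever parseEventData:dataLength: really does, which the question doesn't show.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example of length-bounded parsing: counts occurrences of
 * `delimiter` in the first `length` bytes. It never reads past
 * bytes + length, so the buffer does not need a terminating '\0'. */
size_t count_delimiters(const char *bytes, size_t length, char delimiter) {
    if (bytes == NULL) {
        return 0;
    }
    size_t count = 0;
    const char *cursor = bytes;
    const char *end = bytes + length;
    while (cursor < end) {
        const char *hit = memchr(cursor, delimiter, (size_t)(end - cursor));
        if (hit == NULL) {
            break;
        }
        count++;
        cursor = hit + 1; /* continue just past the match */
    }
    return count;
}
```

With NSData, the two arguments would come straight from [data bytes] and [data length].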
[self parseEventData: eventData dataLength: theLength];
[whatIsShowing setStringValue:@"Showing events by this artist"];
What kind of data are you parsing that you really want to parse the raw bytes? In almost all cases, such data should be of some kind of structured type; XML or, even, HTML. If so, there is no need to drop down to parsing the raw bytes. (Not that raw data is unheard of -- just odd).
The bytes you get from [content UTF8String] could conceivably be different in number from the value of [data length]. Try using strncpy() instead and see if that still crashes. (It's also possible that getEventURL: sometimes fails to return a string in the format expected, but that's impossible to tell without the source to that method.)
Is it possible that the string contained in urlChars sometimes comes back non-NULL-terminated? You might want to try zeroing out the array, for example using bzero.
Additionally, there are a bunch of techniques for debugging EXC_BAD_ACCESS. Since you're doing a lot of pure C string manipulation, the usual method of turning on NSZombieEnabled may or may not help you (though I recommend turning it on regardless). Another technique you can try is recovering a previous stack frame using GDB. See my previous answer to a similar question if you're interested.
In my opinion the code is too complex. Do not resort to plain C arrays and strings unless you absolutely have to, they are harder to get right. (It’s no rocket science, but if you play with guns all the time, you will shoot yourself in the foot sooner or later.) Even if you insist on parsing plain C strings, isolate the code using the function interface:
// Callers have to mess with char*.
- (void) parseEventData: (char*) data {…}
// Callers can stay in the Objective-C land.
- (void) parseEventData: (NSString* or NSData*) data {
char *unwrappedData = …;
…
}
I’d certainly think twice before I used strcpy in my code. And I think you are leaking theUrl (although that should not cause EXC_BAD_ACCESS in this case). As for the bug itself, you might be hanging on to parts of urlChars or eventData, and when those stack-based variables disappear, you cause the segfault?