NSString contents get truncated because of the null character in the middle of the string - objective-c

I am getting a null character in the middle of a string obtained as part of the response to an HTTP POST. As a result, the content is printed only up to the null character, even though the string has more data in it. Below is sample code that illustrates the problem:
NSString *testString = @"v1db1���������¿¿sssss"; // this line shows the null character warning
NSInteger stringlength = [testString length];
NSLog(@"String Length:%ld", (long)stringlength);
NSLog(@"String Value:%@", testString);
Note: the test string contains a null character, shown here as the question mark. For some reason I am not able to save the post if I copy the exact string.
The first line shows a warning "Null character(s) preserved in string literal" in Xcode.
The output of this program is
String Length:21
String Value:v1db1
What is the correct approach to solve this problem? I am thinking of scanning the NSString for any null characters and removing them. What could be the reason I am getting a null character?

The problem characters are UTF-8 byte sequences (EF BF BD, which encodes U+FFFD REPLACEMENT CHARACTER, and C2 BF, which encodes U+00BF, the inverted question mark) that Xcode handles badly when they appear directly in quoted strings. You will need to convert the string. Something like:
[ NSString stringWithUTF8String: [ @"v1db1���������¿¿sssss" cStringUsingEncoding: [ NSString defaultCStringEncoding ] ] ];
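The question also floats the idea of scanning for null characters and removing them. A minimal sketch of that idea in plain C, working on the raw response bytes before they are turned into an NSString, might look like the following. The function name strip_nul_bytes is made up for illustration, and the sketch assumes you know the byte length of the response (for example from NSData), since strlen stops at the first NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies `len` bytes from `src` to `dst`, skipping
 * every NUL byte. `dst` must have room for `len` bytes; passing the same
 * buffer for `src` and `dst` filters in place.
 * Returns the number of bytes written. */
size_t strip_nul_bytes(const unsigned char *src, size_t len, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] != 0) {
            dst[written++] = src[i];
        }
    }
    return written;
}
```

This only hides the symptom. If the server is sending NUL bytes, it is still worth finding out why they are in the response.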

Related

Objective-C / C Convert UTF8 Literally to Real string

I'm wondering how to convert
NSString = "\xC4"; ....
to a real NSString represented in normal format
Fundamentally related to xcode UTF-8 literals. Of course, it is ambiguous what you actually mean by "\xC4" - without an encoding specified, it means nothing.
If you mean the character whose Unicode code point is 0x00C4 then I would think (though I haven't tested) that this will do what you want.
NSString *s = @"\u00C4";
First, are you sure you have \xC4 in your string? Consider:
NSString *one = @"\xC4\x80";
NSString *two = @"\\xC4\\x80";
NSLog(@"%@ | %@", one, two);
This will output:
Ā | \xC4\x80
If you are certain your string contains the four characters \xC4, are you sure it is UTF-8 encoded as ASCII? Above you will see I added \x80; this is because \xC4 on its own is not valid UTF-8, it is the first byte of a two-byte sequence. Maybe you have only shown a sample of your input and the second byte is present; if not, you do not have UTF-8 encoded as ASCII.
If you are certain it is UTF-8 encoded as ASCII, you will have to convert it yourself. It might seem the Cocoa string encoding methods would handle it, especially as what you appear to have is a string as it might be written in Objective-C source code. Unfortunately the obvious encoding, NSNonLossyASCIIStringEncoding, only handles octal and unicode escapes, not the hexadecimal escapes in your string.
You can use any algorithm you like to convert it. One choice would be a simple finite state machine which scans the input a byte at a time and recognises the four-byte sequence: \, x, hex-digit, hex-digit; and combines the two hex digits into a single byte. NSString is not the best choice for byte-at-a-time string processing; you may be better off converting to C strings, e.g.:
// sample input, all characters should be ASCII
NSString *input = @"\\xC4\\x80";
// obtain a C string containing the ASCII characters
const char *cInput = [input cStringUsingEncoding:NSASCIIStringEncoding];
// allocate a buffer of the correct length for the result
char cOutput[strlen(cInput)+1];
// call your function to decode the hexadecimal escapes
convertAsciiEncodedUTF8(cInput, cOutput);
// create a NSString from the result
NSString *output = [NSString stringWithCString:cOutput encoding:NSUTF8StringEncoding];
You just need to write the finite state machine, or other algorithm, for convertAsciiEncodedUTF8.
(If you write an algorithm and it fails ask another question showing your code, somebody will probably help you. But don't expect someone to write it for you.)
HTH
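For readers who want a starting point, here is one possible sketch of convertAsciiEncodedUTF8 in C. It is not the answerer's code: it uses a simple lookahead scan instead of an explicit state table, the helper hex_value is invented for this example, and it assumes the output buffer holds at least strlen(in) + 1 bytes, as in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of a hexadecimal digit, or -1 if `c` is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Copies `in` to `out`, replacing each four-character sequence
 * backslash, 'x', hex-digit, hex-digit with the single byte it names.
 * Everything else is copied unchanged. The output is never longer than
 * the input, so `out` needs strlen(in) + 1 bytes. */
void convertAsciiEncodedUTF8(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        int hi, lo;
        /* Short-circuit evaluation keeps each lookahead inside the string. */
        if (in[0] == '\\' && in[1] == 'x'
            && (hi = hex_value(in[2])) >= 0
            && (lo = hex_value(in[3])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            in += 4;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Two caveats: an escape of \x00 would put a NUL in the middle of the output and cut the C string short, and the function does not check that the decoded bytes form valid UTF-8, so stringWithCString:encoding: may still return nil.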

How to put string into string after specific string

I am new to programming. I have the string NSString *string = @"\U0420\U043e\U0437\U044b"; and after each backslash ('\') I need to put another backslash to get a string like this: @"\\U0420\\U043e\\U0437\\U044b"
I am new to programming and Objective-C. Please help.
My original answer was:
Use -[NSString stringByReplacingOccurrencesOfString:withString:].
NSString *string = @"\U0420\U043e\U0437\U044b";
NSString *converted = [string stringByReplacingOccurrencesOfString:@"\\"
withString:@"\\\\"];
However, I now don't think that's right, given that the \ characters won't actually exist in string; instead the compiler will convert each of those sequences into a unicode character. You will need to write the string like this:
NSString *string = @"\\U0420\\U043e\\U0437\\U044b";
In order to use the above code. I cannot see any alternative to this.
Further Update: Often when I've come across questions like this there is a confusion between string literals and string data. In your question those \ characters won't appear as the compiler will have converted them into unicode characters (\Uxxx is a unicode escape sequence for a single character). However if you provided a string like that at runtime (say read from a text file) then those \ characters will exist and you can use the code above.
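To make the runtime case concrete, here is a small C sketch that doubles every backslash in text that really does contain backslash characters (say, bytes read from a file). The function name double_backslashes is made up for this example, and the caller is assumed to supply an output buffer of 2 * strlen(in) + 1 bytes, which covers the worst case where every character is a backslash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies `in` to `out`, writing every backslash twice.
 * `out` needs room for 2 * strlen(in) + 1 bytes in the worst case. */
void double_backslashes(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '\\') {
            *out++ = '\\';   /* extra copy of the backslash */
        }
        *out++ = *in;
    }
    *out = '\0';
}
```

Note that the C literal "\\U0420" used to call it contains one real backslash, for exactly the reason described above: the compiler consumes the first one.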

What is the right way to replace a given unicode char in an NSString instance?

I have an NSString instance (let's call it myString) containing the following UTF-8 unicode character: \xc2\x96 (that is the long dash seen in, e.g., MS Word).
When printing the NSString to the console using NSLog and the %@ format specifier, the character is replaced by an upside-down question mark indicating that something is wrong, and when using it as text in a table cell, the unicode character simply appears as blank space (not the empty string, a blank space).
To solve this, I would like to replace the \xc2\x96 unicode character with a "normal" dash - at first I thought this should be a 10 sec. task but after some research I have not yet found the "right way" to do this and this is where I would like your help.
What I have tried:
When I print myString in hex like this NSLog(@"%x", myString) I get the hex value 96 for the unicode character \xc2\x96.
Using this information I have made the following implementation to replace it with its "normal" dash equivalent:
for(int index = 0; index < [myString length]; index++)
{
    NSLog(@"Hex:'%x' Char:'%c'", [myString characterAtIndex:index],[myString characterAtIndex:index]);
    if([[NSString stringWithFormat:@"%x", [myString characterAtIndex:index]] isEqualToString:@"96"])
        myString = [myString stringByReplacingCharactersInRange:NSMakeRange(index, 1) withString:@"-"];
}
... it works, but my eyes don't like it, and I would like to know if this can be done in a much cleaner and "right" way? E.g. like C#'s String.Replace(char,char), which supports unicode characters.
So to wrap up:
I'm looking for the "right way" to replace unicode chars in a string. I have done some research, but apparently there are only methods available that replace occurrences of a given NSString with another NSString.
I have read the following:
https://stackoverflow.com/a/5223737/700926
https://stackoverflow.com/a/5217703/700926
https://stackoverflow.com/a/714009/700926
https://stackoverflow.com/a/668254/700926
https://stackoverflow.com/a/2039396/700926
... but all of them explain how to replace a given NSString with another NSString and do not cover how specific unicode characters (in particular double-byte ones) can be replaced.
You can make your string mutable (i.e. use an NSMutableString instead of an NSString). Also, the call to [[NSString stringWithFormat:@"%x", character] isEqualToString:@"96"] is as inefficient as possible - why not simply if (character == 0x96)? All in all, try
// string must be an NSMutableString
NSString *longDash = @"\xc2\x96";
[string replaceOccurrencesOfString:longDash withString:@"-" options:0 range:NSMakeRange(0, [string length])];
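If the text is available as UTF-8 bytes (for example before it is turned into an NSString), the same replacement can be sketched in plain C. This is only an illustration: the function name replace_u0096_with_hyphen is invented, and it assumes a writable, NUL-terminated buffer. Because the replacement is shorter than the two-byte sequence C2 96, the rewrite can safely happen in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrites the NUL-terminated UTF-8 string `s` in
 * place, replacing each two-byte sequence C2 96 (U+0096) with a single
 * ASCII '-'. Returns the number of replacements made. */
size_t replace_u0096_with_hyphen(char *s)
{
    assert(s != NULL);
    const unsigned char *read = (const unsigned char *)s;
    char *write = s;
    size_t count = 0;
    while (*read != '\0') {
        if (read[0] == 0xC2 && read[1] == 0x96) {
            *write++ = '-';
            read += 2;       /* skip both bytes of the sequence */
            count++;
        } else {
            *write++ = (char)*read++;
        }
    }
    *write = '\0';
    return count;
}
```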

Composing unicode char format for NSString

I have a list of unicode char "codes" that I'd like to print using \u escape sequence (e.g. \ue415), as soon as I try to compose it with something like this:
// charCode comes as NSString object from PList
NSString *str = [NSString stringWithFormat:@"\u%@", charCode];
the compiler warns me about incomplete character code. Can anyone help me with this trivial task?
I think you can't do that the way you're trying - \uxxx escape sequence is used to indicate that a constant is a unicode character - and that conversion is processed at compile-time.
What you need is to convert your charCode to an integer number and use that value as format parameter:
unichar codeValue = (unichar) strtol([charCode UTF8String], NULL, 16);
NSString *str = [NSString stringWithFormat:@"%C", codeValue];
NSLog(@"Character with code \\u%@ is %C", charCode, codeValue);
Sorry, that may not be the best way to get an int value from a hex representation, but it's the first that came to mind
Edit: It appears that the NSScanner class can scan an NSString for a number in hex representation:
unsigned int codeValue = 0; // scanHexInt: expects a pointer to unsigned int
[[NSScanner scannerWithString:charCode] scanHexInt:&codeValue];
...
Beware that not all characters can be encoded in UTF-8. I had a bug yesterday where some Korean characters were failing to be encoded in UTF-8 properly.
My solution was to change the format string from %s to %# and avoid the re-encoding issue, although this may not work for you.
Based on the code from @Vladimir, this works for me:
unsigned int codeValue = 0;
[[NSScanner scannerWithString:@"0xf8ff"] scanHexInt:&codeValue];
NSLog(@"%C", (unichar)codeValue);
Note that the string must not start with "\u" or "\\u". From the API doc:
The hexadecimal integer representation may optionally be preceded
by 0x or 0X. Skips past excess digits in the case of overflow,
so the receiver’s position is past the entire hexadecimal representation.
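The strtol route from the first answer can also be sketched in plain C, going from the hex text all the way to UTF-8 bytes. Treat this as an illustration under stated assumptions: the function names encode_bmp_utf8 and hex_code_to_utf8 are invented here, and the sketch only handles code points up to U+FFFF, which is the range a single unichar can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: encodes a code point in U+0000..U+FFFF as UTF-8.
 * Writes up to 3 bytes plus a terminating NUL into `out` and returns the
 * number of bytes written, or 0 for out-of-range or surrogate values. */
size_t encode_bmp_utf8(unsigned long cp, char out[4])
{
    assert(out != NULL);
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    size_t n = 0;
    if (cp < 0x80) {
        out[n++] = (char)cp;
    } else if (cp < 0x800) {
        out[n++] = (char)(0xC0 | (cp >> 6));
        out[n++] = (char)(0x80 | (cp & 0x3F));
    } else {
        out[n++] = (char)(0xE0 | (cp >> 12));
        out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char)(0x80 | (cp & 0x3F));
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: parses text such as "e415" or "0xf8ff" as a
 * hexadecimal number and encodes it. Returns 0 if the text is not a
 * complete hex number or is out of range. */
size_t hex_code_to_utf8(const char *hex, char out[4])
{
    assert(hex != NULL && out != NULL);
    char *end = NULL;
    unsigned long cp = strtoul(hex, &end, 16);
    if (end == hex || *end != '\0') return 0;
    return encode_bmp_utf8(cp, out);
}
```

As with NSScanner, a leading "0x" is accepted by strtoul in base 16, but a leading "\u" is not, so strip that prefix first if your data includes it.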

NSString Decoding Problem

This string is a base64-encoded string:
NSString *string=@"ë§ë ë¼ì´";
This does not show the original string:
NSLog(@"String is %@",[string cStringUsingEncoding:NSMacOSRomanStringEncoding]);
That's not a Base64-encoded string. There are a couple other things going on with your code, too:
You can't include literal non-ASCII characters inside a string constant; rather, you have to use the bytes that make up the character, prefixed with \x; or in the case of Unicode, you can use the Unicode code point, prefixed with \u. So your string should look something like NSString *string = @"\x91\xa4\x91 \x91\x93";. But...
The characters ¼ and ´ aren't part of the MacRoman encoding, so you'll have trouble using them. Are you sure you want a MacRoman string, rather than a Unicode string? Not many applications use MacRoman anymore, anyway.
cStringUsingEncoding: returns a C string, which should be printed with %s, not %@, since it's not an Objective-C object.
That said, your code will sort of work with:
// Using MacRoman encoding in string constant
NSString *s = @"\x91\xa4\x91 \x91\x93";
NSLog(@"%s", [s cStringUsingEncoding:NSMacOSRomanStringEncoding]);
I say "sort of work" because, again, you can't represent that code in MacRoman.
That would be because Mac OS Roman is nothing like base-64 encoding. Base-64 encoding is a further encoding applied to the bytes that represent the original string. If you want to see the original string, you will first need to base-64 decode the bytestring and then figure out the original string encoding in order to interpret it.
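To illustrate the first of those two steps, here is a minimal Base64 decoder sketch in C. It is an assumption-laden example rather than production code: the names b64_value and base64_decode are invented, only the standard alphabet (A-Z, a-z, 0-9, +, /) is accepted, whitespace is not skipped, and decoding simply stops at the first '=' padding character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps one Base64 character to its 6-bit value, or -1. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes standard Base64 text `in` into `out`, which needs room for
 * 3 * strlen(in) / 4 bytes. Returns the number of bytes written, or
 * (size_t)-1 if `in` contains a character outside the Base64 alphabet. */
size_t base64_decode(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    unsigned long bits = 0;   /* most recent input bits, newest at the bottom */
    int bit_count = 0;        /* how many of those bits are still unwritten */
    size_t written = 0;
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0) return (size_t)-1;
        bits = (bits << 6) | (unsigned long)v;
        bit_count += 6;
        if (bit_count >= 8) {
            bit_count -= 8;
            out[written++] = (unsigned char)((bits >> bit_count) & 0xFF);
        }
    }
    return written;
}
```

Fed the bytes of the string from the question, this decoder returns the error value at the very first character, which fits the point made in the other answer that the text is not Base64. The decoded bytes of real Base64 input still have to be interpreted in whatever encoding the original string used.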