Spanish characters replaced with weird string when 'stringWithFormat' is used? - objective-c

NSString *myString = [NSString stringWithFormat:@"%@",BernabÈu];
NSLog(@"%@", myString);
Above statement prints:
Bernab\u00c8u
Here 'BernabÈu' is Spanish character string.
Why is the "\u00c8u" appended? How to get rid of it?

This is because '\u00c8' is the Unicode representation of the E. I don't have the code handy, but I think you will have to look into using locales to get it to print with the correct character. But don't worry: the string still holds the correct character.
(don't have the correct 'E' handy either :-)

Related

Parsing file with percent signs (%) in Objective-C

I'm writing a parser for fortune files. Fortune is a small app on *nix platforms that just prints out a random "fortune". The fortune files are straight text, with each fortune being separated by a percent sign on its own line. For example:
A little suffering is good for the soul.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0
%
A man either lives life as it happens to him, meets it head-on and
licks it, or he turns his back on it and starts to wither away.
-- Dr. Boyce, "The Menagerie" ("The Cage"), star date unknown
%
What I've found is that when parsing the file, stringWithContentsOfFile returns a string with the % signs in place. For example:
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
However, when I call componentsSeparatedByCharactersInSet on the file contents, everything is parsed as a string, with the exception of the percent signs, which are NSTaggedPointerString. When I print out the lines, the percent signs are gone.
Is this because the percent sign is a format specifier for strings? I would think in that case that the initial content pull would escape those.
Here's the code:
NSFileManager *fileManager;
fileManager = [NSFileManager defaultManager];
NSStringEncoding stringEncoding;
// NSString *fileContents = [NSString stringWithContentsOfFile:fileName encoding:NSASCIIStringEncoding error:nil];
NSString *fileContents = [NSString stringWithContentsOfFile:fileName usedEncoding:&stringEncoding error:nil];
NSArray *fileLines = [fileContents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
The used encoding ends up being UTF-8. You can see I have also tried specifying plain ASCII, but it yields the same results.
So the question is, how do I retain the percent signs? Or maybe I should use the percent sign as the separator character and then parse each of the resulting pieces individually.
You are calling NSLog() but passing the line strings as the format string. Something like:
NSLog(lineString);
Therefore, any percent characters in the line strings are interpreted as format specifiers. You should (almost) never pass strings that come from outside sources — i.e. strings which are not hard-coded in your code — as format strings to any function (NSLog(), printf(), +[NSString stringWithFormat:], etc.). It's not safe and you'll sometimes get unexpected results like you've seen.
You should always log a single string like this:
NSLog(@"%@", lineString);
That is, you need to pass a hard-coded format string and use the foreign string as data for that to format.
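As a minimal sketch of that advice applied to the fortune file: the helper names LogTextForLine and FortunesFromContents below are hypothetical and not part of the original question. The first passes a foreign string as data to a fixed format string, and the second uses a line consisting only of "%" as the separator, as the question suggests.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the text to log for one line of the file.
// The line is an argument to a hard-coded format string, so any '%'
// characters inside it are copied through unchanged.
static NSString *LogTextForLine(NSString *line) {
    return [NSString stringWithFormat:@"line: %@", line];
}

// Hypothetical helper: splits fortune-file contents into fortunes,
// treating a line that contains only "%" as the separator.
static NSArray *FortunesFromContents(NSString *contents) {
    NSMutableArray *fortunes = [NSMutableArray array];
    NSMutableArray *current = [NSMutableArray array];
    for (NSString *line in [contents componentsSeparatedByString:@"\n"]) {
        if ([line isEqualToString:@"%"]) {
            // End of one fortune: join the collected lines and start over.
            if ([current count] > 0) {
                [fortunes addObject:[current componentsJoinedByString:@"\n"]];
                [current removeAllObjects];
            }
        } else {
            [current addObject:line];
        }
    }
    // Keep trailing text after the last "%" only if it is non-empty.
    NSString *rest = [current componentsJoinedByString:@"\n"];
    if ([rest length] > 0) {
        [fortunes addObject:rest];
    }
    return fortunes;
}
```

Calling NSLog(@"%@", LogTextForLine(line)) for each line would then print the percent signs as they appear in the file.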
NSTaggedPointerString is just a subclass of NSString. You can use it anywhere you would use an NSString.
But in your string
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
the % sign is not treated as a literal percent sign once the string is used as a format string. In a format string, a literal percent sign is written by doubling the % character:
@"%%"

OS X Using literal asterisk in regular expression

I'm writing a program to make text that begins with /* and ends with */ a different color (syntax highlighting for a C comment). When I try this
@"/\*.*\*/";
I get unknown escape sequence. So I figured that to get a literal asterisk I had to use this
@"/[*].*[*]/";
and I get no errors, but when I use this code
commentPattern = @"/[*].*[*]/";
reg = [NSRegularExpression regularExpressionWithPattern:commentPattern options:kNilOptions error:nil];
results = [reg matchesInString:self.string options:kNilOptions range:NSMakeRange(0, [self.string length])];
for (NSTextCheckingResult *result in results)
{
[self setTextColor:[NSColor colorWithCalibratedRed:0.0 green:0.7 blue:0.0 alpha:1.0] range:result.range];
}
the text color of the comments doesn't change, but I don't see anything wrong with my regular expression. Can someone tell me why this won't work? I don't think it's a problem with the way I get the results or change their color, because I use the same method for other regular expressions.
You want to use this: "\\*".
\* is the escape sequence for * in regular expressions, but in C strings, \ also begins an escaped character token, so you have to escape that as well.
@"/\*.*\*/";
I get unknown escape sequence.
A string first converts escape sequences in the string, then the result is handed over to the regex engine. For instance, an escape sequence might be \t, which represents a tab, or \n which represents a newline. The string first converts an escape sequence to a special code. Your error is saying that \* is not a legal escape sequence for an NSString.
The regex engine needs to see a literal backslash followed by a *. To get a literal backslash in a string you need to write \\. However, for readability I prefer using a character class like you did with your second attempt.
You should NSLog what the results array contains to see what matches you are getting. If the matches are what you expect, then the problem is not with the regex.
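A small sketch of both suggestions, assuming a hypothetical helper named CommentMatches: it uses the doubled backslash in the pattern and returns the matched substrings so they can be inspected. Note that with no options set, "." does not match line breaks, so this pattern only finds comments that start and end on the same line.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the substrings of `source` matched by the
// comment pattern, so the caller can check what the regex actually finds.
static NSArray *CommentMatches(NSString *source) {
    // In source code, "\\*" is the two characters \* and the regex engine
    // reads those as a literal asterisk.
    NSString *pattern = @"/\\*.*\\*/";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return [NSArray array]; // the pattern failed to compile
    }
    NSMutableArray *matches = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, [source length]);
    for (NSTextCheckingResult *result in
         [regex matchesInString:source options:0 range:whole]) {
        [matches addObject:[source substringWithRange:[result range]]];
    }
    return matches;
}
```

Logging the returned array with NSLog(@"%@", CommentMatches(text)) shows whether the pattern or the coloring code is at fault.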

objective c UTF8String not working with japanese

I would like to show the NSString below on my UILabel:
NSString *strValue=@"你好";
but I cannot show it on my UILabel; I get strange characters!
I use this code to show the text:
[NSString stringWithCString:[strValue UTF8String] encoding:NSUTF8StringEncoding];
I tried [NSString stringWithCString:[strValue cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding] and it worked,
but I cannot show emoticons with cStringUsingEncoding:NSISOLatin1StringEncoding, so I have to use UTF8String.
Any help appreciated.
Your source file is in UTF-8, but the compiler you are using thinks it's ISO-Latin 1. What you think is the string @"你好" is actually the string @"ä½ å¥½" (the UTF-8 bytes of the original, each read as one ISO-Latin 1 character). But when you ask NSString to give you this back as ISO-Latin 1, and treat it as UTF-8, you've reversed the process the compiler took and you end up with the original string.
One solution that you can use here is to tell your compiler what encoding your source file is in. There is a compiler flag (for GCC it's -finput-charset=UTF-8, not sure about clang) that will tell the compiler what encoding to use. Curiously, UTF-8 should be the default already, but perhaps you're overriding this with a locale.
A more portable solution is to use only ASCII in your source file. You can accomplish this by replacing the non-ASCII chars with a string escape using \u1234 or \U12345678. In your case, you'd use
NSString *strValue=@"\u4F60\u597D";
Of course, once you get your string constant to be correct, you can ditch the whole encoding stuff and just use strValue directly.
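The round trip described in this answer can be reproduced at run time. This is only an illustrative sketch, assuming ARC; the helper names MisreadUTF8AsLatin1 and UndoMisread are hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: simulates what the compiler did by reading the
// UTF-8 bytes of `text` as if they were ISO-Latin 1.
static NSString *MisreadUTF8AsLatin1(NSString *text) {
    NSData *utf8Bytes = [text dataUsingEncoding:NSUTF8StringEncoding];
    return [[NSString alloc] initWithData:utf8Bytes
                                 encoding:NSISOLatin1StringEncoding];
}

// Hypothetical helper: reverses the misreading by writing the garbled
// string out as ISO-Latin 1 bytes and decoding those bytes as UTF-8.
// Returns nil if either conversion fails.
static NSString *UndoMisread(NSString *garbled) {
    NSData *latin1Bytes = [garbled dataUsingEncoding:NSISOLatin1StringEncoding];
    if (latin1Bytes == nil) {
        return nil; // contains characters outside ISO-Latin 1
    }
    return [[NSString alloc] initWithData:latin1Bytes
                                 encoding:NSUTF8StringEncoding];
}
```

For the two-character string @"\u4F60\u597D", the misread version has six characters, one per UTF-8 byte, and UndoMisread turns it back into the original two.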

What is the right way to replace a given unicode char in an NSString instance?

I have an NSString instance (let's called it myString) containing the following UTF-8 unicode character: \xc2\x96 ( that is the long dash seen in, e.g., MS Word ).
When printing the NSString to the console using NSLog and the %@ format specifier, the character is replaced by an upside-down question mark indicating that something is wrong - and when using it as text in a table cell, the unicode character simply appears as blank space ( not the empty string - a blank space ).
To solve this, I would like to replace the \xc2\x96 unicode character with a "normal" dash - at first I thought this should be a 10 sec. task but after some research I have not yet found the "right way" to do this and this is where I would like your help.
What I have tried:
When I print myString in hex like this NSLog(@"%x", myString) I get the hex value: 96 for the unicode character representing the unicode character \xc2\x96.
Using this information I have made the following implementation to replace it with its "normal" dash equivalent:
for(int index = 0; index < [myString length]; index++)
{
NSLog(@"Hex:'%x' Char:'%c'", [myString characterAtIndex:index],[myString characterAtIndex:index]);
if([[NSString stringWithFormat:@"%x", [myString characterAtIndex:index]] isEqualToString:@"96"])
myString = [myString stringByReplacingCharactersInRange:NSMakeRange(index, 1) withString:@"-"];
}
... it works, but my eyes don't like it, and I would like to know if this can be done in much more cleaner and "right" way? E.g. like C#'s String.Replace(char,char) which supports unicode characters .
So to wrap up:
I'm looking for the "right way" to replace unicode chars in a string - I have done some research, but apparently, there is only methods available that replaces occurrences of a given NSString with another NSString.
I have read the following:
https://stackoverflow.com/a/5223737/700926
https://stackoverflow.com/a/5217703/700926
https://stackoverflow.com/a/714009/700926
https://stackoverflow.com/a/668254/700926
https://stackoverflow.com/a/2039396/700926
... but all of them explains how to replace a given NSString with another NSString and do not cover how specific unicode characters ( in particular double byte ) can be replaced.
You can make your string mutable (i.e. use an NSMutableString instead of an NSString). Also, the call to [[NSString stringWithFormat:@"%x", character] isEqualToString:@"96"] is as inefficient as possible - why not simply if (character == 0x96)? All in all, try
NSString *longDash = @"\xc2\x96";
NSMutableString *string = [myString mutableCopy];
[string replaceOccurrencesOfString:longDash withString:@"-" options:0 range:NSMakeRange(0, [string length])];
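If the string does not need to be mutable, the same replacement can be sketched with stringByReplacingOccurrencesOfString:withString:, which returns a new string. The helper name ReplaceLongDash is hypothetical, and the target character is built from the code value 0x96 reported in the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of `input` in which every occurrence
// of the character with code 0x96 is replaced by an ASCII hyphen.
static NSString *ReplaceLongDash(NSString *input) {
    // Build a one-character string from the code value instead of
    // comparing hex text for every character.
    NSString *longDash = [NSString stringWithFormat:@"%C", (unichar)0x0096];
    return [input stringByReplacingOccurrencesOfString:longDash
                                            withString:@"-"];
}
```

The original string is left untouched, so the result has to be assigned, for example myString = ReplaceLongDash(myString).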

Composing unicode char format for NSString

I have a list of unicode char "codes" that I'd like to print using \u escape sequence (e.g. \ue415), as soon as I try to compose it with something like this:
// charCode comes as NSString object from PList
NSString *str = [NSString stringWithFormat:@"\u%@", charCode];
the compiler warns me about incomplete character code. Can anyone help me with this trivial task?
I think you can't do that the way you're trying - \uxxx escape sequence is used to indicate that a constant is a unicode character - and that conversion is processed at compile-time.
What you need is to convert your charCode to an integer number and use that value as format parameter:
unichar codeValue = (unichar) strtol([charCode UTF8String], NULL, 16);
NSString *str = [NSString stringWithFormat:@"%C", codeValue];
NSLog(@"Character with code \\u%@ is %C", charCode, codeValue);
Sorry, that may not be the best way to get an int value from a hex representation, but it's the first that came to mind.
Edit: It appears that NSScanner class can scan NSString for number in hex representation:
unsigned int codeValue = 0; // scanHexInt: writes through an unsigned int *, not a unichar *
[[NSScanner scannerWithString:charCode] scanHexInt:&codeValue];
...
Beware that not all characters can be encoded in UTF-8. I had a bug yesterday where some Korean characters were failing to be encoded in UTF-8 properly.
My solution was to change the format string from %s to %# and avoid the re-encoding issue, although this may not work for you.
Based on the code from @Vladimir, this works for me:
unsigned int codeValue = 0; // scanHexInt: writes through an unsigned int *
[[NSScanner scannerWithString:@"0xf8ff"] scanHexInt:&codeValue];
NSLog(@"%C", (unichar)codeValue);
The string must not be prefixed with "\u" or "\\u". From the API doc:
The hexadecimal integer representation may optionally be preceded
by 0x or 0X. Skips past excess digits in the case of overflow,
so the receiver’s position is past the entire hexadecimal representation.
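Putting the answers above together, a hedged sketch of a conversion helper might look like this. The name StringFromHexCode is hypothetical, and the sketch only handles code values that fit in a single unichar (up to 0xFFFF).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: converts a hex code such as @"e415" or @"0xe415"
// into a one-character string. Returns nil if the text is not hexadecimal
// or the value does not fit in a single unichar.
static NSString *StringFromHexCode(NSString *hexCode) {
    unsigned int value = 0; // scanHexInt: expects an unsigned int *
    NSScanner *scanner = [NSScanner scannerWithString:hexCode];
    if (![scanner scanHexInt:&value] || value > 0xFFFF) {
        return nil;
    }
    return [NSString stringWithFormat:@"%C", (unichar)value];
}
```

With the codes coming from a plist as in the question, each entry could be passed through StringFromHexCode and the nil results skipped.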