How to display UTF8 string in OS X Terminal - objective-c

I can't believe I couldn't find a solution to this very simple issue. I have a command-line tool in Objective-C and need to display UTF-8 strings (with non-English characters) in the console. I can't use NSLog because it also prints process information (PID, timestamp, etc.). printf doesn't handle non-English characters well.
How can I print non-English characters in the Terminal, without any timestamps? Am I missing something really obvious here, or is such an extremely simple task really non-trivial in OS X?
I've tried:
printf: Doesn't display non-English characters.
NSLog: Displays PID/timestamp, which I don't want.
DLog (from https://stackoverflow.com/a/17311835/811405): Doesn't display non-English characters.

This works just fine:
printf("%s\n", [@"Can Poyrazoğlu" UTF8String]);

The macro you've tried to use depends on CFShow which doesn't print Unicode characters but only their escape codes. More information regarding this behaviour here.
So you could either replace CFShow in your macro with something else that prints to the console without timestamps, or use an NSLog replacement library I wrote, Xcode Logger, with its XLog_NH logger, which prints only the output without any other information.

Using stdio:
puts([@"Can Poyrazoğlu" UTF8String]);
Using write:
const char* example = [@"Can Poyrazoğlu" UTF8String];
write(STDOUT_FILENO, example, strlen(example));
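All of the calls above work for the same reason: the terminal only needs the raw UTF-8 bytes, and byte-oriented output functions pass them through untouched. Here is a minimal sketch of that idea in plain C, assuming Terminal's encoding is set to UTF-8. The helper name print_utf8_line is made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper (not part of any library): writes the UTF-8 bytes of
 * `text` followed by a newline to `stream`, with no timestamp or process
 * prefix. Returns 0 on success, -1 on a write error. */
int print_utf8_line(FILE *stream, const char *text)
{
    if (fputs(text, stream) == EOF) {
        return -1;
    }
    if (fputc('\n', stream) == EOF) {
        return -1;
    }
    return 0;
}
```

From Objective-C it could be called as print_utf8_line(stdout, [@"Can Poyrazoğlu" UTF8String]);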

Related

ANSI escape codes in GNU Smalltalk

I'm trying to make a console-based program that makes use of ANSI escape codes with GNU Smalltalk. I can't seem to figure out how to go about printing a string object formatted with ANSI escape codes. I've tried the following.
'\x1b[31mHi' displayNl
This prints the entire string, including the escape code, without any formatting. I would have expected this to print "Hi" in red (and then everything else in the console after that, as I didn't reset the color.)
After googling a bit, I was able to find a couple issues on mailing lists where people were trying to produce things like newlines using "\n". Most of the answers were using the Transcript object's cr method, but I didn't find anything about colors in the textCollector class.
It looks like it shouldn't be all that hard to create my own module in C to achieve this functionality, but I'd like to know if there's a better way first.
I'm aware of the ncurses bindings, but I'm not sure that'd be practical for just making certain pieces of text in the program colored. So, is there a standard way of outputting colored text to the terminal in GNU Smalltalk using ANSI escape sequences?
Ended up getting an answer on the GNU Smalltalk mailing list. Looks like you can use an interpolation operator to achieve this.
For example ('%1[31mHi' % #($<16r1B>)) displayNl. would change the color to red, and ('%1[34mHi' % #($<16r1B>)) displayNl. would change the color to blue.
Basically, the % operator looks for sequences of the form "%(number)" and replaces them with the corresponding objects in the array to the right of the operator. In our case, the array has one item, which is the ASCII escape character (hexadecimal 1B). So the "%1" in '%1[31mHi' is replaced with the escape character before the string is printed.
(This answer was stolen almost verbatim from Paolo on the GNU Smalltalk mailing list.)
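For comparison, here is the same substitution sketched in C, where the escape byte can be written directly as "\x1b" inside a string literal. This is only an illustration of the escape sequence itself, not GNU Smalltalk code; the function name format_colored is hypothetical, and the sketch appends a reset sequence so later output is not colored.

```c
#include <stdio.h>

/* Hypothetical helper: wraps `text` in an ANSI SGR color sequence.
 * `color` is a foreground code (31 = red, 34 = blue).
 * "\x1b" is the ESC byte (hex 1B); "\x1b[0m" resets attributes afterwards.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * `buf` is too small. */
int format_colored(char *buf, size_t size, int color, const char *text)
{
    int n = snprintf(buf, size, "\x1b[%dm%s\x1b[0m", color, text);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

Printing the resulting buffer with fputs(buf, stdout) should show the text in the chosen color on a terminal that honors ANSI escapes.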

Encoding issue in I/O with Jena

I'm generating some RDF files with Jena. The whole application works with utf-8 text. The source code as well is stored in utf-8.
When I print a string containing non-English characters on the console, I get the right format, e.g. Est un lieu généralement officielle assis....
Then, I use the RDF writer to output the file:
Model m = loadMyModelWithMultipleLanguages();
log.info( getSomeStringFromModel(m) ); // log4j, correct output
RDFWriter w = m.getWriter( "RDF/XML" ); // default enc: utf-8
w.setProperty("showXmlDeclaration","true"); // optional
OutputStream out = new FileOutputStream(pathToFile);
w.write( m, out, "http://someurl.org/base/" );
// file contains garbled text
The RDF file starts with: <?xml version="1.0"?>. If I add utf-8 nothing changes.
By default the text should be encoded to utf-8.
The resulting RDF file validates ok, but when I open it with any editor/visualiser (vim, Firefox, etc.), non-English text is all messed up: Est un lieu généralement officielle assis ... or Est un lieu g\u221A\u00A9n\u221A\u00A9ralement officielle assis....
(Either way, this is obviously not acceptable from the user's viewpoint).
The same issue happens with any output format supported by Jena (RDF, NT, etc.).
I can't really find a logical explanation to this.
The official documentation doesn't seem to address this issue.
Any hint or tests I can run to figure it out?
My guess would be that your strings are messed up, and your printStringFromModel() method just happens to output them in a way that accidentally makes them display correctly, but it's rather hard to say without more information.
You're instructing Jena to include an XML declaration in the RDF/XML file, but don't say what encoding (if any) Jena declares in the XML declaration. This would be helpful to know.
You're also not showing how you're printing the strings in the printStringFromModel() method.
Also, in Firefox, go to the View menu and then to Character Encoding. What encoding is selected? If it's not UTF-8, then what happens when you select UTF-8? Do you get it to show things correctly when selecting some other encoding?
Edit: The snippet you show in your post looks fine and should work. My best guess is that the code that reads your source strings into a Jena model is broken, and reads the UTF-8 source as ISO-8859-1 or something similar. You should be able to confirm or disconfirm that by checking the length() of one of the offending strings: If each of the troublesome characters like é are counted as two, then the error is on reading; if it's correctly counted as one, then it's on writing.
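The length check works because a UTF-8 é occupies two bytes, and a decoder that treats the input as ISO-8859-1 turns every byte into its own character. The sketch below shows the two counts side by side in C; it is an illustration of the principle rather than Jena or Java code, it assumes valid UTF-8 input, and both function names are made up.

```c
#include <stddef.h>

/* Counts Unicode code points in a UTF-8 string by skipping continuation
 * bytes (bit pattern 10xxxxxx). Assumes `s` is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* Counts bytes. This is the character count you would see if the same
 * bytes were decoded as ISO-8859-1, where every byte is one character. */
size_t byte_length(const char *s)
{
    size_t count = 0;
    while (s[count] != '\0') {
        count++;
    }
    return count;
}
```

For "généralement" the first function gives 12 and the second gives 14, so a reported length of 14 would point at the reading side.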
My hint/answer would be to inspect the byte sequence in 3 places:
The data source. Using a hex editor, confirm that the é character in your source data is represented by the expected UTF-8 byte sequence 0xC3 0xA9.
In memory. Right after your call to printStringFromModel, put a breakpoint and inspect the bytes in the string (or convert them to hex and print them out).
The output file. Again, use a hex editor to confirm that the byte sequence is 0xC3 0xA9.
This will tell you exactly what is happening to the bytes as they travel along the path of your program, and also where they deviate from the expected 0xC3 0xA9.
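If no hex editor is at hand, the bytes can be dumped with a few lines of code. The following is a minimal sketch in C, not something from the Jena toolchain; the function name hex_bytes is hypothetical. For reference, é (U+00E9) encodes in UTF-8 as the two bytes C3 A9.

```c
#include <stdio.h>

/* Hypothetical helper: writes the bytes of `s` into `out` as lowercase hex
 * pairs separated by single spaces. Returns 0 on success, -1 if `out`
 * (of capacity `size`) is too small. */
int hex_bytes(const char *s, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0) {
        return -1;
    }
    out[0] = '\0';
    for (; *s != '\0'; s++) {
        const char *sep = (used == 0) ? "" : " ";
        int n = snprintf(out + used, size - used, "%s%02x",
                         sep, (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

Running it over the same string at each of the three stages makes it easy to see where the bytes first change.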
The best way to address this would be to package up the smallest unit of your code that you can that demonstrates the issue, and submit a complete, runnable test case as a ticket on the Jena Jira.

Outputting Formatted NSString

UIAlertView *message = [[UIAlertView alloc] initWithTitle:[[LanguageManager sharedLanguageManager] get:@"Notice"]
message:[NSString stringWithFormat:[[LanguageManager sharedLanguageManager] get:@"Notice_Text"]]
delegate:nil
cancelButtonTitle:[[LanguageManager sharedLanguageManager] get:@"Close"]
otherButtonTitles:nil];
Hi, let me explain my code above. Basically it brings up a UIAlertView with data read from a .plist via my LanguageManager singleton class. The LanguageManager get method basically returns an NSString*. I know I should use NSLocalizedString, but I have been using this class for a while now, so I decided to stick with it.
My problem lies with the "message:" parameter. The string I am trying to read contains formatting characters like \n, but it does not output correctly: it appears as a literal \n instead of a line break when displayed. I also get the "Format string is not a string literal" warning. Other parts of the app use a similar method to return strings containing %d or %f and work correctly, though; only the '\n' character does not work.
Does anyone have any idea how I may overcome this?
"\n" is not a "formatting character": the compiler translates it to the appropriate code; the compiled string NEVER contains the "\" and "n" characters.
Thus, if your string comes from a source that is NOT compiled by an (Objective-)C(++) compiler, "\n" will be just those two characters. Nothing will turn them into a newline, unless you do it yourself with something like
NewString=[MyString stringByReplacingOccurrencesOfString:@"\\n" withString:@"\n"];
Note the two different strings: in the first one, the doubled backslash "\\" produces a literal backslash and so prevents the compiler from doing the \n -> newline conversion, while the second string is an actual newline.
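To make the distinction concrete, here is a minimal sketch in plain C of the same replacement done by hand. It is not the author's code and the function name unescape_newlines is hypothetical; it simply rewrites every backslash-plus-n pair in a buffer as one newline character.

```c
#include <stddef.h>

/* Hypothetical helper: replaces each two-character sequence backslash + 'n'
 * in `s` with a single newline, in place. Returns the new string length. */
size_t unescape_newlines(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'n') {
            *dst++ = '\n';      /* one real newline replaces two characters */
            src += 2;
        } else {
            *dst++ = *src++;    /* copy everything else unchanged */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

A buffer holding the 14 characters Line 1\nLine 2, as they would arrive from a .plist, comes out 13 characters long with a real line break in the middle.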
The warning about a non-literal format string is somewhat pointless; I've yet to find a good way to get rid of that one (for now, I just disable it entirely, using -Wno-format-nonliteral on clang++ >= 4.0).

SQL Server to CSV character encoding

I have a SQL Server database extract I'm doing.
At the beginning of my program, I have:
ini_set('mssql.charset','cp1250');
My database calls do not do anything special.
I only call the following functions:
mssql_connect, mssql_select_db, mssql_query, mssql_fetch_object,
mssql_next_result and mssql_close.
When I print the output of my export on screen, all the characters look fine. When I export with fputcsv() into a CSV file, I get a ton of <92> and <93> characters (this is how they look when I read the file in a terminal). When I open the file in Excel, they look like ì, í and î.
This is causing major problems. Do you have any ideas?
Try converting the encoding to UTF-8:
iconv('cp1250', 'utf-8', $text);
also print this:
var_dump(iconv_get_encoding('all'));
Thanks but it turns out that the problem isn't with the encoding so much as it is with the fact that my fputcsv() call actually was not specifying a delimiter. I chose "\t" for the delim and everything worked perfectly.

How do I format an NSString over multiple lines

Does anyone know how to format an NSString over multiple lines?
e.g. this doesn't build:
return @"asdfasdf" +
"asdfasdf";
return @"asdfasdf"
@"asdfasdf";
I suggest using this syntax instead of
return @"asdfasdf"
"asdfasdf";
just to distinguish C-strings from ObjectiveC ones.
I was having this problem all the time (especially with HTML strings), so I made a tiny tool to convert text to an escaped multi-line Objective-C string:
http://multilineobjc.herokuapp.com/
Hope this saves you some time.
If you remove the +, the compiler will join the two strings together. See C syntax: string literal concatenation.
return @"asdfasdf"
"asdfasdf";
Note that neither GCC nor LLVM seem to care if you omit the @ prefix from the later strings.
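The same rule comes straight from C, where adjacent string literals are joined at compile time. Here is a minimal sketch in plain C, with a hypothetical function name:

```c
/* Adjacent string literals are concatenated by the compiler, so no
 * operator is needed between them. The result is one 16-character string. */
const char *joined_string(void)
{
    return "asdfasdf"
           "asdfasdf";
}
```

The returned pointer refers to the single string asdfasdfasdfasdf, exactly as if it had been written on one line.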