Xcode Sqlite Encoding Turkish characters - objective-c

We have records like 'GÜLHAN', 'Yılan', 'çekiç' in our SQLite database.
These words include Turkish characters, and the problem is that we cannot read them correctly. For example, we read 'GEDf∞k' instead of 'GEDİK'.
How can we solve this SQLite reading problem in Xcode?

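What encoding did you use to store the data in the DB? There should not be any problems if it's UTF-8.
const char *data = (const char *)sqlite3_column_text(stmt, 1);
// sqlite3_column_text returns NULL for an SQL NULL, and stringWithUTF8String: must not be passed NULL
NSString *string = data ? [NSString stringWithUTF8String:data] : nil;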
If this gives you unexpected results, then it's not UTF8 and it's probably a good idea to re-encode everything in DB to UTF8 first.
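One way to find out whether the stored bytes are UTF-8 at all is to validate them before building the NSString. The following is a minimal sketch in plain C (which also compiles as Objective-C); the helper names `is_valid_utf8` and `is_valid_utf8_cstr` are hypothetical and not part of SQLite or Cocoa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the first len bytes of s are well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    /* smallest code point that legitimately needs 1, 2, 3 or 4 bytes */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being assembled */

        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else return false;           /* stray continuation or invalid lead byte */

        if (extra > len - i - 1) return false;   /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        /* reject overlong forms, surrogates and values beyond U+10FFFF */
        if (cp < min_cp[extra]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += extra + 1;
    }
    return true;
}

/* Convenience wrapper for NUL-terminated strings such as a text column value. */
bool is_valid_utf8_cstr(const char *s)
{
    assert(s != NULL);
    return is_valid_utf8((const unsigned char *)s, strlen(s));
}
```

If `is_valid_utf8_cstr(data)` returns false for the column text, the rows were probably written in a legacy single-byte encoding and need to be re-encoded, as suggested above.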

Related

Prevent NSString to escape contents

Sorry for maybe a newbie question.
For various reasons I am stuck with a peculiar string that looks like this:
NSString *myString = @"A\\314\\212A\\314\\210O\\314\\210.jpg";
Can I, in some ninja way, remove the doubled \\ and make NSString understand that the string is Unicode-encoded and should be read like this?
NSString *myString = @"A\314\212A\314\210O\314\210.jpg"; // Displays ÅÄÖ as expected
I have tried different strategies to replace all the backslashes ("\"), but as soon as I add a "\", NSString adds another one to escape the first one. And I get stuck here...
Is it possible to prevent NSString from escaping my string?
UPDATE
I am aware this is a special case: I am reading the output of a terminal program that reads files on the user's drive. Via an NSTask I capture the output into an NSString, then parse it and split it into an array. It works great as long as there are no non-ASCII characters. HFS+ stores non-ASCII characters in a slightly different Unicode form called NFD (decomposed).
When I capture the response, the ÅÄÖ are already encoded inside quotes like this:
file.jpg
file2.jpg
"A\314\212A\314\210O\314\210.jpg"
When I create an NSString from the captured response, it gets escaped by NSString a second time:
A\\314\\212A\\314\\210O\\314\\210.jpg
I am aware that this is not optimal, but right now I have no control over what the terminal program outputs. Usually when an NSString is created with this NFD encoding, Objective-C takes care of the encoding/decoding for you. But since I have a string with mixed and double-escaped content, I have a hard time creating it and making NSString understand that the content uses this encoding.
Basically I would like to do this:
decodedString = [output stringByReplacingOccurrencesOfString:@"\\\\"
withString:@"\\"];
But behind the scenes NSString always escapes \ with another \ for you, so I would like a way to create "raw" strings without NSString interfering.
I have tried various ways of enforcing a Unicode encoding on NSString, but it all boils down to NSString always capturing and escaping \.
Any tips or pointers appreciated!
I did not find any way around this other than to go the other way around and change the output of the terminal program so that it does not encode file names this way.
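If changing the terminal program is not an option, another possibility is to decode the octal escapes by hand at the byte level. The sketch below assumes the captured text really contains a literal backslash followed by up to three octal digits (as in the quoted file name above); `decode_octal_escapes` is a hypothetical helper, not a Cocoa API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True for the characters '0' through '7'. */
int is_octal_digit(char c)
{
    return c >= '0' && c <= '7';
}

/*
 * Hypothetical helper: copies in to out, replacing each backslash followed by
 * one to three octal digits with the single byte it denotes.
 * out must have room for strlen(in) + 1 bytes. Returns the number of bytes
 * written, not counting the terminating NUL.
 */
size_t decode_octal_escapes(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && is_octal_digit(in[1])) {
            unsigned value = 0;
            int digits = 0;

            in++;  /* skip the backslash */
            while (digits < 3 && is_octal_digit(*in)) {
                value = value * 8 + (unsigned)(*in - '0');
                in++;
                digits++;
            }
            out[n++] = (char)(unsigned char)(value & 0xFF);
        } else {
            out[n++] = *in++;  /* ordinary character: copy unchanged */
        }
    }
    out[n] = '\0';
    return n;
}
```

The decoded buffer holds the raw UTF-8 bytes of the NFD file name, so it can be handed to `[NSString stringWithUTF8String:]`. Stripping the surrounding quotes is left to the caller.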

Objective-C / C Convert UTF8 Literally to Real string

I'm wondering how to convert
NSString = "\xC4"; ....
to a real NSString represented in normal format
Fundamentally related to xcode UTF-8 literals. Of course, it is ambiguous what you actually mean by "\xC4" - without an encoding specified, it means nothing.
If you mean the character whose Unicode code point is 0x00C4 then I would think (though I haven't tested) that this will do what you want.
NSString *s = @"\u00C4";
First are you sure you have \xC4 in your string? Consider:
NSString *one = @"\xC4\x80";
NSString *two = @"\\xC4\\x80";
NSLog(@"%@ | %@", one, two);
This will output:
Ā | \xC4\x80
If you are certain your string contains the four characters \xC4 are you sure it is UTF-8 encoded as ASCII? Above you will see I added \x80, this is because \xC4 is not valid UTF-8, it is the first byte of a two-byte sequence. Maybe you have only shown a sample of your input and the second byte is present, if not you do not have UTF-8 encoded as ASCII.
If you are certain it is UTF-8 encoded as ASCII, you will have to convert it yourself. It might seem the Cocoa string encoding methods would handle it, especially as what you appear to have is a string as it might be written in Objective-C source code. Unfortunately the obvious encoding, NSNonLossyASCIIStringEncoding, only handles octal and Unicode escapes, not the hexadecimal escapes in your string.
You can use any algorithm you like to convert it. One choice would be a simple finite state machine which scans the input a byte at a time, recognises the four-byte sequence \, x, hex-digit, hex-digit, and combines the two hex digits into a single byte. NSString is not the best choice for byte-at-a-time string processing; you may be better off converting to C strings, e.g.:
// sample input, all characters should be ASCII
NSString *input = @"\\xC4\\x80";
// obtain a C string containing the ASCII characters
const char *cInput = [input cStringUsingEncoding:NSASCIIStringEncoding];
// allocate a buffer of the correct length for the result
// (the decoded output is never longer than the input)
char cOutput[strlen(cInput)+1];
// call your function to decode the hexadecimal escapes
convertAsciiEncodedUTF8(cInput, cOutput);
// create an NSString from the result
NSString *output = [NSString stringWithCString:cOutput encoding:NSUTF8StringEncoding];
You just need to write the finite state machine, or other algorithm, for convertAsciiEncodedUTF8.
(If you write an algorithm and it fails ask another question showing your code, somebody will probably help you. But don't expect someone to write it for you.)
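For reference, one possible shape for `convertAsciiEncodedUTF8` is sketched below. It uses a simple lookahead scan rather than an explicit state machine, and it copies any incomplete or malformed escape through unchanged; both choices are assumptions, as is the helper `hexDigitValue`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 0-15 for a hexadecimal digit, or -1 for any other character. */
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Replaces every four-character sequence \xHH in `in` with the byte 0xHH and
 * copies everything else unchanged. `out` needs room for strlen(in) + 1 bytes.
 */
void convertAsciiEncodedUTF8(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        int hi = -1, lo = -1;

        /* Short-circuit evaluation stops at the terminating NUL, so the
           lookahead never reads past the end of the string. */
        if (in[0] == '\\' && in[1] == 'x'
            && (hi = hexDigitValue(in[2])) >= 0
            && (lo = hexDigitValue(in[3])) >= 0) {
            out[n++] = (char)(unsigned char)((hi << 4) | lo);
            in += 4;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
}
```

This only produces the bytes. Whether they form valid UTF-8 is still decided when the result is passed to `stringWithCString:encoding:`, which returns nil for malformed input.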
HTH

Incorrect decoding of known UTF-8 string from server

In my application, I am getting some string values from a server, but I'm not ending up with the right string.
بسيط this is the string from server side, but what I am getting is بسÙØ·
I tried to test the response string in an online decoder:
http://www.cafewebmaster.com/online_tools/utf8_encode
It is UTF-8 encoded, but I couldn't decode the string on the iPhone side.
I took a look at these Stack Overflow links as reference
Converting escaped UTF8 characters back to their original form
unicode escapes in objective-c
utf8_decode for objective-c
but none of them helped.
Two points are not clear from your question:
Do you have access to the server side (I mean, to its code)?
How do you send and receive data to the server?
For the first question I will assume that the server is programmed to send you text in UTF-8 encoding.
Now on the iPhone, if you are sending to the server using sockets, use the following:
NSString *messageToSend = @"The text in the language you like";
const uint8_t *str = (uint8_t *) [messageToSend cStringUsingEncoding:NSUTF8StringEncoding];
[self writeToServer:str];
Where the function writeToServer is your function that will send the data to the server.
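If you want to store the data in a SQLite3 database, use:
// SQLITE_TRANSIENT makes SQLite copy the bytes, so the temporary UTF8String buffer may go away afterwards
sqlite3_bind_text(statement, 2, [@"The text in the language you like" UTF8String], -1, SQLITE_TRANSIENT);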
If you are receiving the data from the server (again using sockets) do the following:
[rowData appendBytes:(const void *)buf length:len];
NSString *strRowData = [[NSString alloc] initWithData:rowData encoding:NSUTF8StringEncoding];
I hope this covers all the cases you need.
Without any source it is hard to say anything conclusive, but at some point you are interpreting a UTF-8 encoded string as ISO-8859-1 and then (wrongly) converting it to UTF-8 a second time:
Analysis for string 'بسيط':
raw length: 8
logical length: 4
raw bytes: 0xD8 0xA8 0xD8 0xB3 0xD9 0x8A 0xD8 0xB7
interpreted as ISO-8859-1 (بسÙØ·): 0xC3 0x98 0xC2 0xA8 0xC3 0x98 0xC2 0xB3 0xC3 0x99 0xC2 0x8A 0xC3 0x98 0xC2 0xB7
So at some point you should probably find some reference to ISO-8859-1 in your code. Find it and remove it.
I solved the issue using the approach from this link:
Different kind of UTF8 decoding in NSString
NSString *string = @"بسÙØ·";
I used this method:
[NSString stringWithUTF8String:(char*)[string cStringUsingEncoding:NSISOLatin1StringEncoding]]
Thank you.
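The Latin-1 round trip above can also be undone at the byte level, which makes it easier to see what the fix does. This is only a sketch: `undo_latin1_mojibake` is a hypothetical helper, and it assumes the damage is exactly one extra ISO-8859-1 to UTF-8 conversion, so every code point in the input is at most U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: `in` is UTF-8 text in which every code point is
 * <= U+00FF (the result of reading UTF-8 bytes as ISO-8859-1 and encoding
 * them as UTF-8 again). Writes the original bytes to `out`, which needs
 * room for strlen(in) + 1 bytes. Returns 0 on success, or -1 if the input
 * contains anything that cannot have come from that double encoding.
 */
int undo_latin1_mojibake(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != '\0') {
        if (*p < 0x80) {
            /* ASCII is identical in both encodings */
            out[n++] = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF: collapse to one byte */
            out[n++] = (char)(unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;  /* code point above U+00FF or malformed input */
        }
    }
    out[n] = '\0';
    return 0;
}
```

Fed the 16 double-encoded bytes from the analysis above, this yields the original 8 raw bytes, which decode as the expected four-character Arabic string.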

utf8_decode for objective-c [duplicate]

Possible Duplicate:
unicode escapes in objective-c
I have a LATIN1 string.
Artîsté
When I json_encode it, it escapes some chars and converts it to single byte UTF8.
Art\u00eest\u00e9
If I just json_decode it, I believe it is decoding in UTF8
Artîsté
In order to get my original string back, I have to call utf8_decode
Artîsté
Is there a way to handle this conversion in objective-c?
You might be looking for this:
NSString *string = (some string with non-ASCII characters in it);
char const *string_as_latin1 = [string cStringUsingEncoding:NSISOLatin1StringEncoding];
or possibly this:
NSData *data_latin1 = [string dataUsingEncoding:NSISOLatin1StringEncoding allowLossyConversion:YES];
"I have a LATIN1 string."
I don't think you do. Assuming you are talking about PHP, json_encode() only accepts UTF-8 strings, and bails out if it hits a non-UTF-8 high-byte sequence:
json_encode("Art\xeest\xe9")
"Art"
json_encode("Art\xc3\xaest\xc3\xa9")
"Art\u00eest\u00e9"
I think you had a proper UTF-8 string to start with, then you encoded and decoded it to get the exact same UTF-8 string back. But then you're displaying it or processing it in another step you haven't shown us, that treats your string as if it were Latin-1.

Composing unicode char format for NSString

I have a list of Unicode character "codes" that I'd like to print using the \u escape sequence (e.g. \ue415). As soon as I try to compose it with something like this:
// charCode comes as an NSString object from a plist
NSString *str = [NSString stringWithFormat:@"\u%@", charCode];
the compiler warns me about an incomplete character code. Can anyone help me with this trivial task?
I think you can't do it the way you're trying: the \uxxxx escape sequence indicates that a constant is a Unicode character, and that conversion is processed at compile time.
What you need is to convert your charCode to an integer and use that value as the format parameter:
unichar codeValue = (unichar) strtol([charCode UTF8String], NULL, 16);
NSString *str = [NSString stringWithFormat:@"%C", codeValue];
NSLog(@"Character with code \\u%@ is %C", charCode, codeValue);
Sorry, that may not be the best way to get an int value from a hex representation, but it's the first that came to mind.
Edit: It appears that the NSScanner class can scan an NSString for a number in hex representation:
unsigned int codeValue = 0; // scanHexInt: expects an unsigned int *
[[NSScanner scannerWithString:charCode] scanHexInt:&codeValue];
...
Beware when passing UTF-8 C strings through format strings. I had a bug yesterday where some Korean characters came out garbled. UTF-8 itself can represent them, but %s interprets its argument in the system default encoding rather than as UTF-8.
My solution was to change the format string from %s to %@ and avoid the re-encoding issue, although this may not work for you.
Based on the code from @Vladimir, this works for me:
unsigned int codeValue = 0; // scanHexInt: writes through an unsigned int *
[[NSScanner scannerWithString:@"0xf8ff"] scanHexInt:&codeValue];
NSLog(@"%C", (unichar)codeValue);
Note that the string is not prefixed with "\u" or "\\u". From the API doc:
The hexadecimal integer representation may optionally be preceded
by 0x or 0X. Skips past excess digits in the case of overflow,
so the receiver’s position is past the entire hexadecimal representation.
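The same conversion can be sketched in plain C for cases where the result is needed as UTF-8 bytes instead of a unichar. The helper name `utf8_from_hex` is hypothetical. It relies on `strtoul` with base 16, which, like scanHexInt:, accepts an optional 0x prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parses a hexadecimal code point such as "e415" or
 * "0xf8ff" and writes its UTF-8 encoding, NUL-terminated, to out.
 * Returns the number of bytes written (1 to 4), or 0 if the text is not a
 * valid hexadecimal Unicode scalar value.
 */
size_t utf8_from_hex(const char *hex, char out[5])
{
    char *end = NULL;
    unsigned long cp;
    size_t n = 0;

    assert(hex != NULL && out != NULL);
    cp = strtoul(hex, &end, 16);
    if (end == hex || *end != '\0') return 0;      /* no digits, or trailing junk */
    if (cp > 0x10FFFF) return 0;                   /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;    /* surrogates are not characters */

    if (cp < 0x80) {
        out[n++] = (char)cp;
    } else if (cp < 0x800) {
        out[n++] = (char)(0xC0 | (cp >> 6));
        out[n++] = (char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out[n++] = (char)(0xE0 | (cp >> 12));
        out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char)(0x80 | (cp & 0x3F));
    } else {
        out[n++] = (char)(0xF0 | (cp >> 18));
        out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char)(0x80 | (cp & 0x3F));
    }
    out[n] = '\0';
    return n;
}
```

Unlike the unichar cast used above, this also covers code points beyond U+FFFF, which do not fit in a single unichar.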