Unicode Character in Core Graphics - objective-c

I have an iPad app where I am storing some Unicode (non-ASCII) data in a SQLite DB. Later, I retrieve that data and need to write it to a PDF. All ASCII data does fine, but the Unicode data shows up as "encoded characters".
The process goes like this:
Retrieve the data and write it to the console. This goes fine. Here is the output to the console: LOOKUP:↑ to enable all activities:FOR:COORDINATIONGOAL:
(The little up arrow is my unicode character)
At this point, the data is stored in an NSString. So next, I convert it to a C string (char *) so I can use it in Core Graphics:
NSString *t = [data objectForKey:pi.persistencekey];
const char *text = " ";
if ([t length] > 0) {
text = [t UTF8String];
}
CGContextShowTextAtPoint (pdfContext, pi.x, pageRect.size.height - pi.y, text, strlen(text));
The PDF generates fine, but the text generated shows strange characters where the up arrow is supposed to be.
I have tried other decoding methods and none work.
Thanks in advance for any help.

CGContextShowTextAtPoint has some known weaknesses with drawing unicode. Basically it doesn't.
You have two options:
1. NSString's drawAtPoint: and its friends. They can be found by searching the docs for UIStringDrawing.
2. Or, if you need to go low-level, Core Text.
Most of the time (99%), option 1 is good enough.
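To see why the raw bytes go wrong, here is a small sketch in plain C (it compiles as Objective-C too). UTF8String hands back one byte per ASCII character but three bytes for ↑ (U+2191, encoded as E2 86 91), so a call that draws byte by byte has no single byte that means "up arrow". The helper name utf8_code_points is made up for illustration; it is not part of Core Graphics or Foundation.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by
   skipping continuation bytes (those of the form 10xxxxxx).
   Assumes the input is well-formed UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "\xE2\x86\x91", strlen reports 3 bytes while utf8_code_points reports 1 character, which is the mismatch behind the strange glyphs.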

Related

Using libqrencode library

I loaded libqrencode library in my cocoa project but I'm not sure how to use it exactly. I have a text field in which you type a text, and once done you click a button and I log that text with NSLog. Now I want to encode that text to be able to use it later and generate a QRcode out of it, so in the manual it's saying to use this format
QRcode* QRcode_encodeString (const char * string,
int version,
QRecLevel level,
QRencodeMode hint,
int casesensitive
)
I am not sure how to use that in my method to log the results as well
- (IBAction)GenerateCode:(id)sender {
NSString *urlText = [[NSString alloc] initWithFormat:@"%@", [_urlField stringValue]];
NSLog(@"The url is %@", urlText);
}
You need to get from an NSString instance to a const char *. This has been answered several times on SO, but here's one. Once you do that, you can call QRcode_encodeString() directly and pass whatever you desire for the arguments.
If you need more specifics, you'll have to try something, post your code, and describe how it's not working for you so we can help you more directly without just writing it for you.

Unihan: combining UTF-8 chars

I am using data that involves Chinese Unihan characters in an Objective-C app. I am using a voice recognition program (cmusphinx) that returns a phrase from my data. It returns UTF-8 characters and when returning a Chinese character (which is three bytes) it separates it into three separate characters.
Example: When I want 人, I see: ‰∫∫. This is the proper encoding (E4 BA BA), but my code sees the returned value as three separate characters rather than one.
Actually, my function is receiving the phrase as an NSString (due to a wrapper), which uses UTF-16. I tried using Objective-C's built-in conversion methods (to UTF-8 and from UTF-16), but these keep my string as three characters.
How can I decode these three separate characters into the one UTF-8 code point for the Chinese character?
Or how can I properly encode it?
This is the code fragment dealing with the C string returned from sphinx and its encoding to an NSString:
const char * hypothesis = ps_get_hyp(pocketSphinxDecoder, &recognitionScore, &utteranceID);
NSString *hypothesisString = [[NSString alloc] initWithCString:hypothesis encoding:NSMacOSRomanEncoding];
Edit: From looking at the addition to your post, you actually do have control over the string encoding. In that case, why are you creating the string with NSMacOSRomanEncoding when you're expecting utf-8? Just change that to NSUTF8StringEncoding.
It sounds like what you're saying is you're being given an NSString that contains UTF-8 data that's being interpreted as a single-byte encoding (e.g. ISO-Latin-1, MacRoman, etc). I'm assuming here that you have no control over the code that creates the NSString, because if you did then the solution is just to change the encoding it's initializing with.
In any case, what you're asking for is a way to take the data in the string and convert it back to UTF-8. You can do this by creating an NSData from the NSString using whatever encoding it was originally created with (you need to know this much, at least, or it won't work), and then you can create a new NSString from the same data using UTF-8.
From the example character you gave (人) it looks like it's being interpreted as MacRoman, so let's go with that. The following code should convert it back:
- (NSString *)fixEncodingOfString:(NSString *)input {
CFStringEncoding cfEncoding = kCFStringEncodingMacRoman;
NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
NSData *data = [input dataUsingEncoding:encoding];
if (!data) {
// the string wasn't actually in MacRoman
return nil;
}
NSString *output = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
return output;
}
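If it helps to see what "the same bytes, read as UTF-8" means, here is a rough sketch in plain C (also valid Objective-C). It decodes one three-byte UTF-8 sequence by hand; the function name utf8_decode3 is invented for this example, and in real code you would let NSString do the decoding as shown above.

```c
#include <stdint.h>

/* Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
   Returns the code point, or 0 if the bytes do not have that shape.
   Overlong-form and surrogate checks are omitted for brevity. */
uint32_t utf8_decode3(const unsigned char *b)
{
    if ((b[0] & 0xF0) != 0xE0 ||
        (b[1] & 0xC0) != 0x80 ||
        (b[2] & 0xC0) != 0x80) {
        return 0;
    }
    return ((uint32_t)(b[0] & 0x0F) << 12)
         | ((uint32_t)(b[1] & 0x3F) << 6)
         |  (uint32_t)(b[2] & 0x3F);
}
```

Fed the bytes E4 BA BA, this yields U+4EBA, which is 人. Read one byte at a time as MacRoman, the same three bytes come out as ‰∫∫ instead.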

Convert special characters like ë,à,é,ä all to e,a,e,a? Objective C

Is there a simple way in Objective-C to convert all special characters like ë,à,é,ä to the normal characters like e and a?
Yep, and it's pretty simple:
NSString *src = @"Convert special characters like ë,à,é,ä all to e,a,e,a? Objective C";
NSData *temp = [src dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *dst = [[[NSString alloc] initWithData:temp encoding:NSASCIIStringEncoding] autorelease];
NSLog(@"converted: %@", dst);
Running that on my machine produces:
EmptyFoundation[69299:a0f] converted: Convert special characters like e,a,e,a all to e,a,e,a? Objective C
Basically, we're asking the string to transform itself into an NSData (i.e., a byte array) that represents the characters in the string in the ASCII character set. Since not all of the characters in the original string are in ASCII, we tell the string that it's OK to do a "lossy" conversion. In other words, it's OK to turn "é" into "e", and so on.
Once we've got our byte array, we simply turn it back into a string, and we're done! :)
CFStringTransform
CFStringTransform is the solution when you are dealing with a specific language. It transliterates strings in ways that simplify normalization, indexing, and searching. For example, it can remove accent marks using the option kCFStringTransformStripCombiningMarks:
CFMutableStringRef string = CFStringCreateMutableCopy(NULL, 0, CFSTR("Schläger"));
CFStringTransform(string, NULL, kCFStringTransformStripCombiningMarks, false);
// => string is now "Schlager"
CFRelease(string);
CFStringTransform is even more powerful when you are dealing with non-Latin writing systems such as Arabic or Chinese. It can convert many writing systems to Latin script, making normalization much simpler.
For example, you can convert Chinese script to Latin script like this:
CFMutableStringRef string = CFStringCreateMutableCopy(NULL, 0, CFSTR("你好"));
CFStringTransform(string, NULL, kCFStringTransformToLatin, false);
// => string is now "nǐ hǎo"
CFStringTransform(string, NULL, kCFStringTransformStripCombiningMarks, false);
// => string is now "ni hao"
CFRelease(string);
Notice that the option is simply kCFStringTransformToLatin.
The source language is not required. You can hand almost any string to
this transform without having to know first what language it is in.
CFStringTransform can also transliterate from Latin script to other
writing systems such as Arabic, Hangul, Hebrew, and Thai.
References: iOS 7 Programming: Pushing the Limits

How can I display an NSData description not in hex but as human-readable text?

I'm writing an iPhone app that receives data from a UDP socket. I can display what I receive using the NSData description, but it is in hex format; for example, it looks like this:
<4c61742c 34343039 3131302e 35302c4c 6f6e2c39 38343237 352e3934 2c482c32 37392e30 302c4b6e 6f74732c 302e3032 2c4e616d 652c504c 41594552 2c496e64 65782c30 2c4d756c 7469706c 61796572 4e756d62 65722c30 00>
I know that this is a meaningful phrase. How can I convert it?
NSString * s = [[[NSString alloc] initWithData:yourData encoding:NSUTF8StringEncoding] autorelease];
I'm going to assume you want to just debug the binary data. If that's not the case, what I describe below won't be terribly useful.
One technique I like is to use 0xED. Copy that big binary dump (everything inside the angle brackets) and paste it into a newly created 0xED document. You get a nice hex editor view.
0xED also supports user-defined plugins which can help visualize the binary data (for instance, converting an 8 byte timestamp to an NSDate)
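If you only have the pasted description and no NSData object to hand, the hex can also be turned back into text by hand. Below is a rough sketch in plain C (it compiles as Objective-C too); the names hex_value and decode_hex_dump are invented for this example, and it assumes the payload is plain ASCII text like the dump in the question.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode an NSData-style description such as "<4c61742c 34343039>"
   into out. Angle brackets and spaces are skipped. Decoding stops at
   the first 00 byte or when out is full. out is always NUL-terminated.
   Returns the number of bytes written, not counting the terminator. */
size_t decode_hex_dump(const char *dump, char *out, size_t out_size)
{
    size_t n = 0;
    int high = -1; /* pending high nibble, or -1 if none */

    if (out_size == 0) return 0;
    for (; *dump != '\0' && n + 1 < out_size; ++dump) {
        int v = hex_value((unsigned char)*dump);
        if (v < 0) continue; /* '<', '>', spaces */
        if (high < 0) {
            high = v;
            continue;
        }
        int byte = (high << 4) | v;
        high = -1;
        if (byte == 0) break; /* trailing NUL from the sender */
        out[n++] = (char)byte;
    }
    out[n] = '\0';
    return n;
}
```

The first two groups of the dump in the question, 4c61742c 34343039, decode to "Lat,4409", so the packet is comma-separated text with a trailing NUL byte.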

Objective c doesn't like my unichars?

Xcode complains about "multi-character character constant"s when I try to do the following:
static unichar accentCharacters[] = { 'ā', 'á', 'ă', 'à' };
How do you make an array of characters, when not all of them are ascii? The following works just fine
static unichar accent[] = { 'a', 'b', 'c' };
Workaround
The closest workaround I have found is to convert the special characters into hex, i.e. this works:
static unichar accentCharacters[] = { 0x0100, 0x0101, 0x0102 };
It's not that Objective-C doesn't like it, it's that C doesn't. The constant 'c' is for char which has 1 byte, not unichar which has 2 bytes. (see the note below for a bit more detail.)
There's no perfectly supported way to represent a unichar constant. You can use
char* s="ü";
in a UTF-8-encoded source file to get the unicode C-string, or
NSString* s=@"ü";
in a UTF-8 encoded source file to get an NSString. (This was not possible before 10.5. It's OK for iPhone.)
NSString itself is conceptually encoding-neutral; but if you want, you can get the unicode character by using -characterAtIndex:.
Finally two comments:
If you just want to remove accents from the string, you can just use the method like this, without writing the table yourself:
-(NSString*)stringWithoutAccentsFromString:(NSString*)s
{
if (!s) return nil;
NSMutableString *result = [NSMutableString stringWithString:s];
CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
return result;
}
See the documentation for CFStringFold.
If you want unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead you should use Localizable.strings and NSLocalizedString. See here.
Note:
For arcane historical reasons, 'a' is an int in C, see the discussions here. In C++, it's a char. But it doesn't change the fact that writing more than one byte inside '...' is implementation-defined and not recommended. For example, see ISO C Standard 6.4.4.4, paragraph 10. However, it was common in classic Mac OS to write the four-letter code enclosed in single quotes, like 'APPL'. But that's another story...
Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8, they're not. In ISO-8859-1, they are. And unichar should be in UTF-16. Did you save your source code in UTF-16? I think the default of Xcode is UTF-8. GCC might do some encoding conversion depending on the setup, too...
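For what it's worth, the hex approach can be written so the table matches the four letters in the question exactly. This is only a sketch in plain C (valid Objective-C as well): utf16_unit is a stand-in I made up for Foundation's unichar (a 16-bit unsigned value), and the table and function names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for Foundation's unichar, a 16-bit UTF-16 code unit. */
typedef unsigned short utf16_unit;

/* U+0101 ā, U+00E1 á, U+0103 ă, U+00E0 à */
static const utf16_unit accent_characters[] = {
    0x0101, 0x00E1, 0x0103, 0x00E0
};

/* Returns 1 if c is one of the code units in the table, else 0. */
int is_accent_character(utf16_unit c)
{
    size_t count = sizeof accent_characters / sizeof accent_characters[0];
    for (size_t i = 0; i < count; ++i) {
        if (accent_characters[i] == c) {
            return 1;
        }
    }
    return 0;
}
```

Writing the code points as hex keeps the source file pure ASCII, so the result does not depend on how the file is saved or how the compiler reads it.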
Or you can just do it like this:
static unichar accentCharacters[] = { L'ā', L'á', L'ă', L'à' };
L is a standard C prefix which says "I'm about to write a wide character constant" (type wchar_t), so the character no longer has to fit in a single char.
Works fine for Objective-C too.
Note: The compiler may give you a strange warning about too many characters put inside a unichar, but you can safely ignore that warning. Xcode just doesn't deal with the unicode characters the right way, but the compiler parses them properly and the result is OK.
Depending on your circumstances, this may be a tidy way to do it:
NSCharacterSet* accents =
[NSCharacterSet characterSetWithCharactersInString:@"āáăà"];
And then, if you want to check if a given unichar is one of those accent characters:
if ([accents characterIsMember:someOtherUnichar])
{
}
NSString also has many methods of its own for handling NSCharacterSet objects.