ObjC / iOS: How to retrieve unicode hex code for character? - objective-c

So, I know how to convert a unicode hex code into an NSString consisting of the unicode character referenced by that code:
NSString *ucStr = @"\\u004A"; // hex code for capital J
NSString *theLetter = [ucStr mutableCopy];
CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)theLetter, NULL, transform, YES);
// theLetter is now @"J"
...However, I don't seem to understand how to go in the other direction, i.e. starting with the NSString @"J", output the NSString @"004A".

Simply extract each character (UTF-16 code unit) and format it using the format string @"%04x" (or @"%04X" for uppercase digits, as in "004A"), as below:
NSString *input = @"How now brown cow";
for (NSUInteger i = 0; i < [input length]; i++) {
unichar c = [input characterAtIndex:i];
NSLog(@"%04x", (unsigned)c);
// or NSString *s = [NSString stringWithFormat:@"%04x", (unsigned)c];
}
BTW I don't understand the code you have posted, but as that wasn't the question, it doesn't matter.

Related

Get a substring from an NSString until arriving to any letter in an NSArray - objective C

I am trying to parse a set of words that contain first Greek letters, then English letters. This would be easy if there were a delimiter between the sets. This is what I've built so far:
- (void)loadWordFileToArray:(NSBundle *)bundle {
NSLog(@"loadWordFileToArray");
if (bundle != nil) {
NSString *path = [bundle pathForResource:@"alfa" ofType:@"txt"];
//pull the content from the file into memory
NSData* data = [NSData dataWithContentsOfFile:path];
//convert the bytes from the file into a string
NSString* string = [[NSString alloc] initWithBytes:[data bytes]
length:[data length]
encoding:NSUTF8StringEncoding];
//split the string around newline characters to create an array
NSString* delimiter = @"\n";
incomingWords = [string componentsSeparatedByString:delimiter];
NSLog(#"incomingWords count: %lu", (unsigned long)incomingWords.count);
}
}
-(void)parseWordArray{
NSLog(@"parseWordArray");
NSString *seperator = @" = ";
int i = 0;
for (i=0; i < incomingWords.count; i++) {
NSString *incomingString = [incomingWords objectAtIndex:i];
NSScanner *scanner = [NSScanner localizedScannerWithString: incomingString];
NSString *firstString;
NSString *secondString;
NSInteger scanPosition;
[scanner scanUpToString:seperator intoString:&firstString];
scanPosition = [scanner scanLocation];
secondString = [[scanner string] substringFromIndex:scanPosition+[seperator length]];
// NSLog(@"greek: %@", firstString);
// NSLog(@"english: %@", secondString);
[outgoingWords insertObject:[NSMutableArray arrayWithObjects:@"greek", firstString, @"english", secondString, @"category", @"", nil] atIndex:0];
[englishWords insertObject:[NSMutableArray arrayWithObjects:secondString, nil] atIndex:0];
}
}
But I cannot count on there being delimiters.
I have looked at this question. I want something similar. This would be: grab the characters in the string until an english letter is found. Then take the first group to one new string, and all the characters after to a second new string.
I only have to run this a few times, so optimization is not my highest priority. Any help would be appreciated.
EDIT:
I've changed my code as shown below to make use of NSLinguisticTagger. This works, but is this the best way? Note that the language tag reported for the English characters is, for some reason, "und".
The incoming string is: άγαλμα, το statue. Only the last 6 characters are in English.
int j = 0;
for (j=0; j<incomingString.length; j++) {
NSString *language = [tagger tagAtIndex:j scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
if ([language isEqual: @"und"]) {
NSLog(@"j is: %i", j);
int k = 0;
for (k=0; k<j; k++) {
NSRange range = NSMakeRange (0, k);
NSString *tempString = [incomingString substringWithRange:range ];
NSLog (@"tempString: %@", tempString);
}
return;
}
NSLog (@"Language: %@", language);
}
Alright so what you could do is use NSLinguisticTagger to find out the language of the word (or letter) and if the language has changed then you know where to split the string. You can use NSLinguisticTagger like this:
NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace];
[tagger setString:@"This is my string in English."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
//Loop through each index of the string's characters and check the language as above.
//If it has changed then you can assume the language has changed.
Alternatively, you can use NSSpellChecker's requestCheckingOfString (AppKit, so macOS only) to get the dominant language in a range of characters:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
range:(NSRange){0, [spellCheckText length]}
types:NSTextCheckingTypeOrthography
options:nil
inSpellDocumentWithTag:0
completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
This answer has information on how to detect the language of an NSString.
Allow me to introduce two good friends of mine.
NSCharacterSet and NSRegularExpression.
Along with them, normalization. (In Unicode terms)
First, you should normalize strings before analyzing them against a character set.
You will need to look at the choices, but normalizing to all composed forms is the way I would go.
This means an accented character is one instead of two or more.
It simplifies the number of things to compare.
Next, you can easily build your own NSCharacterSet objects from strings (loaded from files even) to use to test set membership.
Lastly, regular expressions can achieve the same thing with Unicode Property Names as classes or categories of characters. Regular expressions could be more terse but more expressive.
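As a rough illustration of the set-membership idea, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It assumes the text is UTF-8 and that "English letter" means ASCII A-Z or a-z. The function name first_ascii_letter is hypothetical, not a Foundation API. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a byte in the ASCII letter range can only be a real ASCII letter, and a simple byte scan is safe here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte offset of the first ASCII letter
 * in a NUL-terminated UTF-8 string, or the string's byte length if the
 * string contains no ASCII letter. */
size_t first_ascii_letter(const char *utf8)
{
    assert(utf8 != NULL);
    size_t i = 0;
    for (; utf8[i] != '\0'; i++) {
        unsigned char b = (unsigned char)utf8[i];
        /* Bytes of Greek (or any non-ASCII) characters are >= 0x80,
         * so they can never match these ranges. */
        if ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')) {
            break;
        }
    }
    return i;
}
```

For the UTF-8 bytes of "άγαλμα, το statue" the returned offset is 19: the first 19 bytes are the Greek part (including the trailing space) and the remainder is "statue".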

How to handle 32bit unicode characters in a NSString

I have a NSString containing a unicode character bigger than U+FFFF, like the MUSICAL SYMBOL G CLEF symbol '𝄞'. I can create the NSString and display it.
NSString *s = @"A\U0001d11eB"; // "A𝄞B"
NSLog(@"String = \"%@\"", s);
The log is correct and displays the 3 characters. This tells me the NSString is well done and there is no encoding problem.
String = "A𝄞B"
But when I try to loop through all characters using the method
- (unichar)characterAtIndex:(NSUInteger)index
everything goes wrong.
The type unichar is 16 bits so I expect to get the wrong character for the musical symbol. But the length of the string is also incorrect!
NSLog(@"Length = %d", [s length]);
for (int i=0; i<[s length]; i++)
{
NSLog(@" Character %d = %c", i, [s characterAtIndex:i]);
}
displays
Length = 4
Character 0 = A
Character 1 = 4
Character 2 = .
Character 3 = B
What methods should I use to correctly parse my NSString and get my 3 unicode characters?
Ideally the right method should return a type like wchar_t in place of unichar.
Thank you
NSString *s = @"A\U0001d11eB";
NSData *data = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
const wchar_t *wcs = [data bytes];
for (int i = 0; i < [data length]/4; i++) {
NSLog(@"%#010x", wcs[i]);
}
Output:
0x00000041
0x0001d11e
0x00000042
(The code assumes that wchar_t has a size of 4 bytes and little-endian encoding.)
length and characterAtIndex: do not give the expected result because \U0001d11e
is stored internally as a UTF-16 "surrogate pair", i.e. two unichar values.
Another useful method for general Unicode strings is
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
options:NSStringEnumerationByComposedCharacterSequences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
NSLog(@"%@", substring);
}];
Output:
A
𝄞
B
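To make the surrogate-pair point concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) that combines UTF-16 code units, such as those returned by characterAtIndex:, into full code points. The helper names next_code_point and count_code_points are hypothetical. Returning an unpaired surrogate unchanged is an assumption of this sketch, not documented NSString behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the code point that starts at units[*i]
 * and advances *i past it (by 1 or 2 code units). */
uint32_t next_code_point(const uint16_t *units, size_t len, size_t *i)
{
    assert(*i < len);
    uint16_t hi = units[(*i)++];
    /* A high surrogate followed by a low surrogate encodes one code point. */
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < len) {
        uint16_t lo = units[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;
            return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                               | (uint32_t)(lo - 0xDC00));
        }
    }
    /* BMP character, or an unpaired surrogate returned as-is. */
    return hi;
}

/* Hypothetical helper: counts code points rather than UTF-16 code units. */
size_t count_code_points(const uint16_t *units, size_t len)
{
    size_t i = 0, n = 0;
    while (i < len) {
        (void)next_code_point(units, len, &i);
        n++;
    }
    return n;
}
```

For the four code units 0x0041, 0xD834, 0xDD1E, 0x0042 (the string "A𝄞B"), count_code_points returns 3 and the middle code point decodes to 0x1D11E. Note that code points are still not the same as user-perceived characters, which is why the enumeration by composed character sequences above is the more general tool.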

Convert text in an NSString to its 8-byte ASCII Hex equivalent, and store back in an NSString

I want to convert the NSString @"2525" to the NSString @"0032003500320035". The 8-byte ASCII value for "2" in hex is "0032" and for "5" it's "0035". Just to get the c-string equivalent, I tried...
const char *pinUTF8 = [pin cStringUsingEncoding:NSASCIIStringEncoding];
...but as you can see I'm struggling with this and I knew it wasn't going to be that easy. Any tips?
Thanks so much in advance for your wisdom!
Try this:
NSString *str = @"2525";
const char *s = [str cStringUsingEncoding:NSASCIIStringEncoding]; // NULL if str is not pure ASCII
size_t len = strlen(s);
NSMutableString *asciiCodes = [NSMutableString string];
for (size_t i = 0; i < len; i++) {
[asciiCodes appendFormat:@"%04x", (unsigned)s[i]];
}
NSLog(@"%@", asciiCodes);
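The same loop can be sketched in plain C (which also compiles inside an Objective-C file), writing into a caller-supplied buffer instead of an NSMutableString. The function name hex_codes and its return convention are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes four lowercase hex digits per input byte
 * into out. Returns 0 on success, -1 if out is too small. */
int hex_codes(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t len = strlen(in);
    if (out_size < len * 4 + 1) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + i * 4, 5, "%04x", (unsigned)(unsigned char)in[i]);
    }
    out[len * 4] = '\0'; /* also terminates the result for empty input */
    return 0;
}
```

With the input "2525" and a sufficiently large buffer, the buffer ends up holding "0032003500320035".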

How do I split a string with special characters into a NSMutableArray

I am trying to separate a string with Danish characters into an NSMutableArray, but something is not working. :(
My code:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
NSString *ichar = [NSString stringWithFormat:@"%c", [danishString characterAtIndex:i ]];
[characters addObject:ichar];
}
If I do an NSLog on the danishString it works (returns æøå).
But if I do an NSLog on the characters (the array) I get some very strange characters. What is wrong?
/Morten
First of all, your code is incorrect. characterAtIndex: returns a unichar, so you should use @"%C" (uppercase) as the format specifier.
Even with the correct format specifier, your code is unsafe, and strictly speaking, still incorrect, because not all unicode characters can be represented by a single unichar. You should always handle unicode strings per substring:
It's common to think of a string as a sequence of characters, but when
working with NSString objects, or with Unicode strings in general, in
most cases it is better to deal with substrings rather than with
individual characters. The reason for this is that what the user
perceives as a character in text may in many cases be represented by
multiple characters in the string.
You should definitely read the String Programming Guide.
Finally, the correct code for you:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
[characters addObject:substring];
}];
If NSLog(@"%@", characters); shows "strange characters" of the form "\Uxxxx", that is expected: it is how NSArray's description method renders non-ASCII strings. You can print the strings one by one if you want to see the "normal characters":
for (NSString *c in characters) {
NSLog(@"%@", c);
}
In your example, characterAtIndex: gives you a unichar, not an NSString. If you want NSStrings, try getting a substring instead:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
NSRange r = NSMakeRange(i, 1);
NSString *ichar = [danishString substringWithRange:r];
[characters addObject:ichar];
}
You could do something like the following, which should be fine with Danish characters, but would break down if you have decomposed characters. I suggest reading the String Programming Guide for more information.
NSString *danishString = @"æøå";
NSMutableArray* characters = [NSMutableArray array];
for( int i = 0; i < [danishString length]; i++ ) {
NSString* subchar = [danishString substringWithRange:NSMakeRange(i, 1)];
if( subchar ) [characters addObject:subchar];
}
That would split the string into an array of individual characters, assuming that all the code points were composed characters.
It is printing the Unicode escapes of the characters. Anyhow, you can use such escapes (with \u) anywhere in a string literal.

NSString get -characters

I want to get the characters of an NSString. Like this:
NSString *data;
const char * typein = [[data characters] UTF8String];
But obviously NSString won't respond to -characters. How do I get the characters of NSString?
thanks,
Elijah
You can use this function:
for (NSUInteger i = 0; i < [myString length]; i++) {
unichar character = [myString characterAtIndex:i]; // unichar, not char: a char would truncate the 16-bit value
}
or
NSString *str = @"astring";
const char *cString = [str UTF8String];
If you just want to get a cString from the NSString just call UTF8String as you are already doing and then iterate the array.