NSString seems to kill unicode characters

I'm using ABPeoplePickerNavigationController to let the user select an address. Everything works just fine in the Simulator. But if an address contains non-ASCII characters like "ü" the result is strange on my device. I have this code:
- (BOOL)peoplePickerNavigationController:(ABPeoplePickerNavigationController *)peoplePicker shouldContinueAfterSelectingPerson:(ABRecordRef)person property:(ABPropertyID)property identifier:(ABMultiValueIdentifier)identifier {
    ABMultiValueRef addressesMultiValue = ABRecordCopyValue(person, property);
    NSArray *addresses = (__bridge_transfer NSArray *)ABMultiValueCopyArrayOfAllValues(addressesMultiValue);
    CFRelease(addressesMultiValue);
    NSDictionary *addressData = [addresses objectAtIndex:0];
    NSLog(@"%@", addressData);
    NSArray *addressKeys = [[NSArray alloc] initWithObjects:(NSString *)kABPersonAddressStreetKey,
        (NSString *)kABPersonAddressZIPKey,
        (NSString *)kABPersonAddressCityKey,
        (NSString *)kABPersonAddressStateKey,
        (NSString *)kABPersonAddressCountryKey,
        (NSString *)kABPersonAddressCountryCodeKey, nil];
    NSMutableString *address = [[NSMutableString alloc] init];
    for (NSString *key in addressKeys) {
        NSString *object = [addressData objectForKey:key];
        if (object) {
            [address appendFormat:@"%@, ", object];
        }
    }
    NSLog(@"%@", address);
The output for addressData is like this:
{
City = "M\U00fcnchen";
Country = Deutschland;
CountryCode = de;
Street = "Some street";
ZIP = 81000;
}
and the output for address is:
Some street, 81000, M√ºnchen, Deutschland, de,
The correct output for address would be "Some street, 81000, München, Deutschland, de, ". What puzzles me most is that \U00fc is the correct Unicode code point for "ü". I have tried many things, including printing every single unichar on its own, but the result doesn't change. Whatever I do when accessing the value in the NSDictionary seems to kill the Unicode character. What can I do to get the address correctly?
Thank you very much in advance!

There is nothing wrong with your code or the output. You are displaying it wrong. The letter 'ü' encoded in UTF-8 is 0xC3 0xBC. In the MacRoman character set the byte 0xC3 represents the character '√' and the byte 0xBC represents 'º'. Look at your output as UTF-8 (which it is) and not as MacRoman (which it is not) and you're set.
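
To make the byte-level claim concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper name `utf8_encode_bmp` is hypothetical and not part of any Apple API; it only covers code points up to U+FFFF and does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a code point from the Basic Multilingual
   Plane (U+0000..U+FFFF) as UTF-8. Returns the number of bytes written
   to out (1 to 3), or 0 if the code point is outside this sketch's range. */
size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3]) {
    assert(out != NULL);
    if (cp < 0x80) {                      /* one byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                   /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}
```

Encoding U+00FC ("ü") with this helper yields the two bytes 0xC3 0xBC. A viewer that decodes those bytes as MacRoman shows two separate characters instead of one, which is the effect seen in the question's output.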

Use stringWithUTF8String: to create the string from UTF-8 encoded bytes.
For example:
NSString *str = [NSString stringWithUTF8String:"your string for encoding"];
In your case:
NSString *object = [NSString stringWithUTF8String:[[addressData objectForKey:key] cStringUsingEncoding:NSUTF8StringEncoding]];
Or
NSString *object = [NSString stringWithUTF8String:[[addressData objectForKey:key] UTF8String]];
Let me know if you need more help.
Hope this helps.

Related

Get a substring from an NSString until arriving to any letter in an NSArray - objective C

I am trying to parse a set of words that contain first Greek letters, then English letters. This would be easy if there were a delimiter between the two sets. This is what I've built so far:
- (void)loadWordFileToArray:(NSBundle *)bundle {
    NSLog(@"loadWordFileToArray");
    if (bundle != nil) {
        NSString *path = [bundle pathForResource:@"alfa" ofType:@"txt"];
        // pull the content from the file into memory
        NSData *data = [NSData dataWithContentsOfFile:path];
        // convert the bytes from the file into a string
        NSString *string = [[NSString alloc] initWithBytes:[data bytes]
            length:[data length]
            encoding:NSUTF8StringEncoding];
        // split the string around newline characters to create an array
        NSString *delimiter = @"\n";
        incomingWords = [string componentsSeparatedByString:delimiter];
        NSLog(@"incomingWords count: %lu", (unsigned long)incomingWords.count);
    }
}
- (void)parseWordArray {
    NSLog(@"parseWordArray");
    NSString *seperator = @" = ";
    int i = 0;
    for (i = 0; i < incomingWords.count; i++) {
        NSString *incomingString = [incomingWords objectAtIndex:i];
        NSScanner *scanner = [NSScanner localizedScannerWithString:incomingString];
        NSString *firstString;
        NSString *secondString;
        NSInteger scanPosition;
        [scanner scanUpToString:seperator intoString:&firstString];
        scanPosition = [scanner scanLocation];
        secondString = [[scanner string] substringFromIndex:scanPosition + [seperator length]];
        // NSLog(@"greek: %@", firstString);
        // NSLog(@"english: %@", secondString);
        [outgoingWords insertObject:[NSMutableArray arrayWithObjects:@"greek", firstString, @"english", secondString, @"category", @"", nil] atIndex:0];
        [englishWords insertObject:[NSMutableArray arrayWithObjects:secondString, nil] atIndex:0];
    }
}
But I cannot count on there being delimiters.
I have looked at this question, and I want something similar: grab the characters in the string until an English letter is found, then put the first group into one new string and all the characters after it into a second new string.
I only have to run this a few times, so optimization is not my highest priority. Any help would be appreciated.
EDIT:
I've changed my code as shown below to make use of NSLinguisticTagger. This works, but is it the best way? Note that, for some reason, the tag returned for the English characters is "und".
The incoming string is: άγαλμα, το statue. Only the last 6 characters are in English.
int j = 0;
for (j = 0; j < incomingString.length; j++) {
    NSString *language = [tagger tagAtIndex:j scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
    if ([language isEqual:@"und"]) {
        NSLog(@"j is: %i", j);
        int k = 0;
        for (k = 0; k < j; k++) {
            NSRange range = NSMakeRange(0, k);
            NSString *tempString = [incomingString substringWithRange:range];
            NSLog(@"tempString: %@", tempString);
        }
        return;
    }
    NSLog(@"Language: %@", language);
}

Alright so what you could do is use NSLinguisticTagger to find out the language of the word (or letter) and if the language has changed then you know where to split the string. You can use NSLinguisticTagger like this:
NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
// The options mask takes NSLinguisticTaggerOmit... values; NSLinguisticTagPunctuation is a tag string, not an option.
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace];
[tagger setString:@"This is my string in English."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
// Loop through each index of the string's characters and check the language as above.
// If the tag has changed, you can assume the language has changed.
Alternatively, you can use NSSpellChecker's requestCheckingOfString to get the dominant language in a range of characters:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
    range:(NSRange){0, [spellCheckText length]}
    types:NSTextCheckingTypeOrthography
    options:nil
    inSpellDocumentWithTag:0
    completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
        NSLog(@"dominant language = %@", orthography.dominantLanguage);
    }];
This answer has information on how to detect the language of an NSString.

Allow me to introduce two good friends of mine.
NSCharacterSet and NSRegularExpression.
Along with them, normalization. (In Unicode terms)
First, you should normalize strings before analyzing them against a character set.
You will need to look at the choices, but normalizing to all composed forms is the way I would go.
This means an accented character is one instead of two or more.
It simplifies the number of things to compare.
Next, you can easily build your own NSCharacterSet objects from strings (loaded from files even) to use to test set membership.
Lastly, regular expressions can achieve the same thing with Unicode Property Names as classes or categories of characters. Regular expressions could be more terse but more expressive.
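
As a rough illustration of the set-membership idea, here is a minimal sketch in plain C rather than NSCharacterSet. It assumes the input is UTF-8 and that "English letter" means an ASCII letter; the function name `first_ascii_letter_offset` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset of the first ASCII letter (A-Z, a-z) in a
   NUL-terminated UTF-8 string, or the string's length if there is none.
   Every byte of a multi-byte UTF-8 sequence has its high bit set, so
   comparing single bytes against the ASCII ranges cannot match inside
   a Greek (or any other non-ASCII) character. */
size_t first_ascii_letter_offset(const char *utf8) {
    assert(utf8 != NULL);
    size_t i = 0;
    while (utf8[i] != '\0') {
        unsigned char c = (unsigned char)utf8[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            break;  /* found the split point */
        }
        i++;
    }
    return i;
}
```

Everything before the returned offset is the first group and everything from the offset onward is the second. Separators such as the space before the English word stay at the end of the first group and can be trimmed afterwards. The offset counts bytes, not NSString character indexes.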

Check if it is possible to break a string into chunks?

I have this code, which splits an NSString into chunks stored in an NSMutableArray:
NSString *string = @"one/two/tree";
NSMutableArray *parts = [[string componentsSeparatedByString:@"/"] mutableCopy];
NSLog(@"%@-%@-%@", parts[0], parts[1], parts[2]);
This works perfectly, but if the NSString does not follow this pattern (for example, it has no '/' in it), the app will crash.
How can I check whether the NSString can be split, so that the app does not crash?

Just check parts.count. If your string has no / (or only one), you won't get three elements.
NSString *string = @"one/two/tree";
NSMutableArray *parts = [[string componentsSeparatedByString:@"/"] mutableCopy];
if (parts.count >= 3) {
    NSLog(@"%@-%@-%@", parts[0], parts[1], parts[2]);
}
else {
    NSLog(@"Not found");
}
From the docs:
If list has no separators (for example, "Karin"), the array contains the string itself, in this case { @"Karin" }.
https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/componentsSeparatedByString:
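
The counting rule behind that check can be sketched in plain C. This is only an illustration of how many parts a split produces, not how componentsSeparatedByString: is implemented; the function name `component_count` is hypothetical and it handles a single-character separator only.

```c
#include <assert.h>
#include <stddef.h>

/* Count the components that splitting s on the single character sep
   would produce: one more than the number of separators found.
   A string without any separator therefore counts as one component. */
size_t component_count(const char *s, char sep) {
    assert(s != NULL);
    size_t count = 1;
    for (; *s != '\0'; s++) {
        if (*s == sep) {
            count++;
        }
    }
    return count;
}
```

With this rule "one/two/tree" gives 3 and "Karin" gives 1, which is why indexing parts[2] is only safe after checking the count.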

You might be better off using the "opposite" function to put it back together...
NSString *string = @"one/two/three";
NSArray *parts = [string componentsSeparatedByString:@"/"];
NSString *newString = [parts componentsJoinedByString:@"-"];
// newString = @"one-two-three"
This takes the original string, splits it apart, and puts it back together no matter how many parts there are.

Chinese string in the text of a label

I have a label where I have to put a string in Chinese extracted from a database, but nothing comes out. I noticed that the string is not pulled from database, while all other work correctly. What can I do?
char *subTitle = (char *)sqlite3_column_text(statement, 13);
NSLog(@" The sutitle is %s", subTitle);
// The sutitle is
rowTable.sottotitolo = [[NSString alloc] initWithUTF8String:subTitle];
NSLog(@"The subtitle is %@", rowTable.sottotitolo);
// The subtitle is
Do I need to use different methods for text that is not in a Western alphabet?
NSLog(@"The string in chinese is %@", self.chinaTable.subtitle);
// The string in chinese is
// is not printed to the screen, but the database is written correctly
self.labelTitle.text = self.chinaTable.subtitle;
// comes out empty
Thanks in advance

When retrieving your data from SQLite, instead of specifying the encoding scheme, use this:
NSString *myChineseText = [NSString stringWithFormat:@"%s", (const char *)sqlite3_column_text(statement, index)];
NSLog(@"%@", myChineseText);
Hope it solves your problem. :)

Try CFStringConvertEncodingToNSStringEncoding and kCFStringEncodingBig5_E.
Also see Apple's documentation on string encodings and internationalization. Here is an example that converts a string using a custom encoding:
unichar ellipsis = 0x2026;
NSString *theString = [NSString stringWithFormat:@"To be continued%C", ellipsis];
// custom encoding
NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingDOSChineseTrad);
NSData *asciiData = [theString dataUsingEncoding:encoding
    allowLossyConversion:YES];
NSString *asciiString = [[NSString alloc] initWithData:asciiData
    encoding:encoding];

How do I get the comma separated values from an NSMutableString in Objective-C?

I am making an HTTP request in Objective-C, and the reply I get is
200,8,"7 Infinite Loop, Cupertino, CA 95014, USA"
I want to extract the part "Cupertino, CA" from it.
I wrote the following code:
NSArray *myArray = [result5 componentsSeparatedByString:@","];
NSLog(@"Response: %@", myArray);
NSString *state = [[myArray objectAtIndex:4]
    stringByReplacingOccurrencesOfRegex:@"[^0-9]" withString:@""];
NSLog(@"Response9: %@", state);
NSString *city = [NSString stringWithFormat:@"%@ %@",
    [myArray objectAtIndex:3], state];
NSLog(@"Response1: %@", city);
But I got a warning for the line:
NSString *state = [[myArray objectAtIndex:4]
    stringByReplacingOccurrencesOfRegex:@"[^0-9]" withString:@""];
which says "no -stringByReplacingOccurrencesOfRegex:withString: method found" and "Message without a matching method signature will be assumed to return 'id' and accept '.......' as arguments".
How do I get the state and city name from the result?

Have a look at componentsSeparatedByCharactersInSet:. If you supply the digits as the set, you will get an array of strings that you can recombine into a string without numbers.

Convert NSData bytes to NSString?

I'm trying to use the BEncoding ObjC class to decode a .torrent file.
NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/the.torrent"];
NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
When I NSLog torrent I get the following:
{
announce = <68747470 3a2f2f74 6f727265 6e742e75 62756e74 752e636f 6d3a3639 36392f61 6e6e6f75 6e6365>;
comment = <5562756e 74752043 44207265 6c656173 65732e75 62756e74 752e636f 6d>;
"creation date" = 1225365524;
info = {
length = 732766208;
name = <7562756e 74752d38 2e31302d 6465736b 746f702d 69333836 2e69736f>;
"piece length" = 524288;
....
How do I convert the name into a NSString? I have tried..
NSData *info = [torrent valueForKey:@"info"];
NSData *name = [info valueForKey:@"name"];
unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);
..which retrieves the data, but seems to have additional unicode rubbish after it:
File name: ubuntu-8.10-desktop-i386.iso)
I have also tried (from here)..
NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];
..but this seems to return a bunch of random characters:
扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳
The fact the first way (as mentioned in the Apple documentation) returns most of the data correctly, with some additional bytes makes me think it might be an error in the BEncoding library.. but my lack of knowledge about ObjC is more likely to be at fault..

That's an important point that should be re-emphasized I think. It turns out that,
NSString *content = [NSString stringWithUTF8String:[responseData bytes]];
is not the same as,
NSString *content = [[NSString alloc] initWithBytes:[responseData bytes]
length:[responseData length] encoding: NSUTF8StringEncoding];
The first expects a NULL-terminated byte string; the second doesn't. If the byte string isn't correctly terminated, content will be NULL in the first example.
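
The difference comes down to how the end of the data is found. Here is a minimal C sketch, assuming you hold a pointer and a length as NSData provides them; the helper name `copy_as_c_string` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len raw bytes into a freshly allocated buffer and append the
   terminating 0 that C-string APIs rely on to find the end.
   Returns NULL on allocation failure; the caller frees the result. */
char *copy_as_c_string(const void *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    if (len > 0) {
        memcpy(out, bytes, len);
    }
    out[len] = '\0';  /* the raw bytes carry no terminator of their own */
    return out;
}
```

A length-based API skips this copy because it never looks for a terminator. A terminator-based API handed the raw bytes would keep reading past the end of the data.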

How about
NSString *content = [[[NSString alloc] initWithData:myData
encoding:NSUTF8StringEncoding] autorelease];

NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
When I NSLog torrent I get the following:
{
⋮
}
That would be an NSDictionary, then, not an NSData.
unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);
..which retrieves the data, but seems to have additional unicode rubbish after it:
File name: ubuntu-8.10-desktop-i386.iso)
No, it retrieved the filename just fine; you simply printed it incorrectly. %s takes a C string, which is null-terminated; the bytes of a data object are not null-terminated (they are just bytes, not necessarily characters in any encoding, and 0—which is null as a character—is a perfectly valid byte). You would have to allocate one more character, and set the last one in the array to 0:
size_t length = [name length] + 1;
unsigned char aBuffer[length];
// copy only the data's own bytes; the extra slot is for the terminator
[name getBytes:aBuffer length:length - 1];
aBuffer[length - 1] = 0;
NSLog(@"File name: %s", aBuffer);
But null-terminating the data in an NSData object is wrong (except when you really do need a C string). I'll get to the right way in a moment.
I have also tried […]..
NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];
..but this seems to return random Chinese characters:
扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳
That's because your bytes are UTF-8, which encodes one character in (usually) one byte.
unichar is, and stringWithCharacters:length: accepts, UTF-16. In that encoding, one character is (usually) two bytes. (Hence the division by sizeof(unichar): it divides the number of bytes by 2 to get the number of characters.)
So you said “here's some UTF-16 data”, and it went and made characters from every two bytes; each pair of bytes was supposed to be two characters, not one, so you got garbage (which turned out to be mostly CJK ideographs).
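
The pairing can be reproduced in a few lines of C. This is a sketch that assumes a little-endian host, as on Intel and ARM Apple hardware; the function name `bytes_as_utf16le_units` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pair up bytes into 16-bit code units the way a little-endian machine
   does when UTF-8 bytes are handed over as an array of unichar.
   Writes len / 2 units into out and returns that count; an odd trailing
   byte is dropped, matching the integer division in the question. */
size_t bytes_as_utf16le_units(const unsigned char *bytes, size_t len, uint16_t *out) {
    assert(bytes != NULL && out != NULL);
    size_t n = len / 2;
    for (size_t i = 0; i < n; i++) {
        /* low byte comes first in memory, high byte second */
        out[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
    return n;
}
```

The name data in the question starts with the bytes 75 62 75 6e ("ubun"). Paired this way they become the code units 0x6275 and 0x6E75, which fall in the CJK ideograph range and appear to correspond to the first two characters of the garbled output.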
You answered your own question pretty well, except that stringWithUTF8String: is simpler than stringWithCString:encoding: for UTF-8-encoded strings.
However, when you have the length (as you do when you have an NSData), it is even easier—and more proper—to use initWithBytes:length:encoding:. It's easier because it does not require null-terminated data; it simply uses the length you already have. (Don't forget to release or autorelease it.)

A nice quick-and-dirty approach is to use NSString's stringWithFormat: class method. One of the less often used features of string formatting is the ability to specify a maximum string length when outputting a string. This handy feature lets you convert NSData into a string pretty easily (the precision argument must be an int, hence the cast):
NSData *myData = [self getDataFromSomewhere];
NSString *string = [NSString stringWithFormat:@"%.*s", (int)[myData length], (const char *)[myData bytes]];
If you want to output it to the log, it can be even easier:
NSLog(@"my Data: %.*s", (int)[myData length], (const char *)[myData bytes]);
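
The same precision trick works with the C standard library, which makes it easy to try in isolation. This is a minimal sketch; the wrapper name `format_bytes` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Format len bytes that are not NUL-terminated into dst using "%.*s".
   The precision argument must be an int, so the length is range-checked
   and cast explicitly. Output also stops early at an embedded 0 byte.
   Returns snprintf's result, or -1 if len does not fit in an int. */
int format_bytes(char *dst, size_t dst_size, const void *bytes, size_t len) {
    assert(dst != NULL && bytes != NULL);
    if (len > (size_t)INT_MAX) {
        return -1;
    }
    return snprintf(dst, dst_size, "%.*s", (int)len, (const char *)bytes);
}
```

Only the first len bytes are read, so the source buffer does not need a terminator.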

Aha, the NSString method stringWithCString works correctly:
With the bencoding.h/.m files added to your project, the complete .m file:
#import <Foundation/Foundation.h>
#import "BEncoding.h"
int main (int argc, const char * argv[]) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // Read raw file, and de-bencode
    NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/a.torrent"];
    NSData *torrent = [BEncoding objectFromEncodedData:rawdata];
    // Get the file name
    NSData *infoData = [torrent valueForKey:@"info"];
    NSData *nameData = [infoData valueForKey:@"name"];
    NSString *filename = [NSString stringWithCString:[nameData bytes] encoding:NSUTF8StringEncoding];
    NSLog(@"%@", filename);
    [pool drain];
    return 0;
}
..and the output:
ubuntu-8.10-desktop-i386.iso

In cases where I don't have control over the data being transformed into a string, such as reading from the network, I prefer to use NSString -initWithBytes:length:encoding: so that I'm not dependent upon having a NULL terminated string in order to get defined results. Note that Apple's documentation says if cString is not a NULL terminated string, that the results are undefined.

Use a category on NSData:
NSData+NSString.h
@interface NSData (NSString)
- (NSString *)toString;
@end
NSData+NSString.m
#import "NSData+NSString.h"
@implementation NSData (NSString)
- (NSString *)toString
{
    Byte *dataPointer = (Byte *)[self bytes];
    NSMutableString *result = [NSMutableString stringWithCapacity:0];
    NSUInteger index;
    for (index = 0; index < [self length]; index++)
    {
        [result appendFormat:@"0x%02x,", dataPointer[index]];
    }
    return result;
}
@end
Then just call NSLog(@"Data is %@", [nsData toString]);

You can try this; it works fine for me.
DLog(@"responeData: %@", [[[NSString alloc] initWithBytes:[data bytes] length:[data length] encoding:NSASCIIStringEncoding] autorelease]);

Sometimes you need to create a Base64-encoded string from NSData, for instance when you build an e-mail MIME message. In this case use the following:
#import "NSData+Base64.h"
NSString *string = [data base64EncodedString];

This will work.
NSString *str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

We Keep Coding
