How to convert a unichar value to an NSString in Objective-C? - objective-c

I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter 'Α'. I'm trying to put that character into an NSString variable, but I fail miserably.
I've tried 2 different ways both of which unsuccessful:
unichar greekAlpha = 0xce91; //could have written greekAlpha = 'Α' instead.
NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];
No good. I get some weird Chinese characters. As a side note, this works perfectly with English characters.
Then I also tried this:
NSString *byteString = [[NSString alloc] initWithBytes:&greekAlpha
                                                length:sizeof(unichar)
                                              encoding:NSUTF8StringEncoding];
But this doesn't work either.
I'm obviously doing something terribly wrong, but I don't know what.
Can someone help me please ?
Thanks!

unichar greekAlpha = 0x0391;
NSString* s = [NSString stringWithCharacters:&greekAlpha length:1];
And now you can incorporate that NSString into another in any way you like. Do note, however, that it is now legal to type a Greek alpha directly into an NSString literal.
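For instance, a minimal sketch of the literal approach (assuming the source file is saved in an encoding the compiler treats as UTF-8):
NSString *s2 = @"Α";        // Greek capital alpha typed directly into the literal
NSString *s3 = @"\u0391";   // the same character via a universal character name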

Since 0xce91 is in UTF-8 format and %C expects UTF-16, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work you need to pass 0x0391, which is the UTF-16 code unit.
In order to create a string from the UTF-8 encoded unichar you need to first split the value into its octets and then use initWithBytes:length:encoding:.
unichar utf8char = 0xce91;
char chars[2];
int len = 1;

if (utf8char > 127) {
    chars[0] = (utf8char >> 8) & 0xFF;  // high octet (0xce)
    chars[1] = utf8char & 0xFF;         // low octet (0x91)
    len = 2;
} else {
    chars[0] = utf8char;
}

NSString *string = [[NSString alloc] initWithBytes:chars
                                            length:len
                                          encoding:NSUTF8StringEncoding];
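Assuming the decode succeeded, string now holds the single Greek letter:
NSLog(@"%@", string); // prints Α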

The above answer is great but doesn't account for UTF-8 sequences longer than two bytes, e.g. the ellipsis symbol - 0xE2,0x80,0xA6. Here's a tweak to the code (note that utf8char must now be wider than a unichar, and chars needs room for four bytes):
uint32_t utf8char = 0xE280A6;  // wider than unichar, to hold three packed UTF-8 bytes
char chars[4];

if (utf8char > 65535) {
    chars[0] = (utf8char >> 16) & 255;
    chars[1] = (utf8char >> 8) & 255;
    chars[2] = utf8char & 255;
    chars[3] = 0x00;
} else if (utf8char > 127) {
    chars[0] = (utf8char >> 8) & 255;
    chars[1] = utf8char & 255;
    chars[2] = 0x00;
} else {
    chars[0] = utf8char;
    chars[1] = 0x00;
}
NSString *string = [[[NSString alloc] initWithUTF8String:chars] autorelease];
Note the different string initialisation method which doesn't require a length parameter.

Here is an algorithm for UTF-8 encoding a single character (for code points above 0xFFFF, utf8char must be a type wider than unichar, e.g. uint32_t):
char chars[5] = {0};  // up to four UTF-8 bytes plus a NUL terminator
if (utf8char < 0x80) {
    chars[0] = utf8char & 0x7F;
}
else if (utf8char < 0x0800) {
    chars[0] = ((utf8char >> 6) & 0x1F) | 0xC0;
    chars[1] = ((utf8char >> 0) & 0x3F) | 0x80;
}
else if (utf8char < 0x010000) {
    chars[0] = ((utf8char >> 12) & 0x0F) | 0xE0;
    chars[1] = ((utf8char >> 6) & 0x3F) | 0x80;
    chars[2] = ((utf8char >> 0) & 0x3F) | 0x80;
}
else if (utf8char < 0x110000) {
    chars[0] = ((utf8char >> 18) & 0x07) | 0xF0;
    chars[1] = ((utf8char >> 12) & 0x3F) | 0x80;
    chars[2] = ((utf8char >> 6) & 0x3F) | 0x80;
    chars[3] = ((utf8char >> 0) & 0x3F) | 0x80;
}
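With the buffer declared as above (so it is NUL-terminated), the string itself is then simply:
NSString *string = [[NSString alloc] initWithUTF8String:chars];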

The code above is the moral equivalent of unichar foo = 'abc';.
The problem is that 'Α' doesn't map to a single byte in the "execution character set" (I'm assuming UTF-8) which is "implementation-defined" in C99 §6.4.4.4 10:
The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
One way is to make 'ab' equal to 'a'<<8|'b'. Some Mac/iOS system headers rely on this for things like OSType/FourCharCode/FourCC; the only example in iOS that comes to mind is CoreVideo pixel formats. This is, however, unportable.
If you really want a unichar literal, you can try L'Α' (technically it's a wchar_t literal, but on OS X and iOS wchar_t is typically UTF-32, so it'll work for things inside the BMP). However, it's far simpler to just use @"Α" (which works as long as you set the source character encoding correctly) or @"\u0391" (which has worked since at least the iOS 3 SDK).

Related

Convert NSData byte array to string?

I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters). So I'm looking at code like this:
NSUInteger len = [imageData length];
Byte *byteData = (Byte *)malloc(len);
[imageData getBytes:byteData length:len];
How do I then send byteData as JSON? I want to send the raw bytes.
CODE:
NSString *jsonBase64 = [imageData base64EncodedString];
NSLog(@"BASE 64 FINGERPRINT: %@", jsonBase64);
NSData *b64 = [NSData dataFromBase64String:jsonBase64];
NSLog(@"Equal: %d", [imageData isEqualToData:b64]);
NSLog(@"b64: %@", b64);
NSLog(@"original: %@", imageData);
NSString *decoded = [[NSString alloc] initWithData:b64 encoding:NSUTF8StringEncoding];
NSLog(@"decoded: %@", decoded);
I get values for everything except the last line, decoded, which would indicate to me that the raw bytes are not valid NSUTF8StringEncoding data?
The reason the string is considered 'unreliable' in previous Stack Overflow posts is that those posters were also attempting to use NSData objects whose bytes aren't properly NULL-terminated:
NSString *jsonString = [NSString stringWithUTF8String:[nsDataObj bytes]];
// This is unreliable because it may result in NULL string values
Whereas the example below should give you your desired results because the NSData byte string will terminate correctly:
NSString *jsonString = [[NSString alloc] initWithBytes:[nsDataObj bytes] length:[nsDataObj length] encoding: NSUTF8StringEncoding];
You were on the right track and hopefully this is able to help you solve your current problem. Best of luck!
~ EDIT ~
Make sure you are declaring your NSData object from an image like so (the extra [[NSData alloc] init] is unnecessary, since UIImagePNGRepresentation replaces it immediately):
NSData *imageData = UIImagePNGRepresentation(yourImage);
Have you tried using something like this:
@implementation NSData (Base64)
- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}
@end
This will turn your NSData into a base64 string, and on the other side you just need to decode it.
EDIT: @Lucas said you can do something like this:
NSString *myString = [[NSString alloc] initWithData:myData encoding:NSUTF8StringEncoding];
but I had some problems with this method because of some special characters, and because of that I started using base64 strings for communication.
EDIT3: Try this base64EncodedString method:
@implementation NSData (Base64)

- (NSString *)base64EncodedString
{
    return [self base64EncodedStringWithWrapWidth:0];
}

//Helper Method
- (NSString *)base64EncodedStringWithWrapWidth:(NSUInteger)wrapWidth
{
    //ensure wrapWidth is a multiple of 4
    wrapWidth = (wrapWidth / 4) * 4;

    const char lookup[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    long long inputLength = [self length];
    const unsigned char *inputBytes = [self bytes];

    long long maxOutputLength = (inputLength / 3 + 1) * 4;
    maxOutputLength += wrapWidth ? (maxOutputLength / wrapWidth) * 2 : 0;
    unsigned char *outputBytes = (unsigned char *)malloc((NSUInteger)maxOutputLength);

    long long i;
    long long outputLength = 0;
    for (i = 0; i < inputLength - 2; i += 3)
    {
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[((inputBytes[i + 1] & 0x0F) << 2) | ((inputBytes[i + 2] & 0xC0) >> 6)];
        outputBytes[outputLength++] = lookup[inputBytes[i + 2] & 0x3F];

        //add line break
        if (wrapWidth && (outputLength + 2) % (wrapWidth + 2) == 0)
        {
            outputBytes[outputLength++] = '\r';
            outputBytes[outputLength++] = '\n';
        }
    }

    //handle left-over data
    if (i == inputLength - 2)
    {
        // = terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[((inputBytes[i] & 0x03) << 4) | ((inputBytes[i + 1] & 0xF0) >> 4)];
        outputBytes[outputLength++] = lookup[(inputBytes[i + 1] & 0x0F) << 2];
        outputBytes[outputLength++] = '=';
    }
    else if (i == inputLength - 1)
    {
        // == terminator
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0xFC) >> 2];
        outputBytes[outputLength++] = lookup[(inputBytes[i] & 0x03) << 4];
        outputBytes[outputLength++] = '=';
        outputBytes[outputLength++] = '=';
    }

    if (outputLength >= 4)
    {
        //truncate data to match actual output length
        outputBytes = realloc(outputBytes, (NSUInteger)outputLength);
        return [[NSString alloc] initWithBytesNoCopy:outputBytes
                                              length:(NSUInteger)outputLength
                                            encoding:NSASCIIStringEncoding
                                        freeWhenDone:YES];
    }
    else if (outputBytes)
    {
        free(outputBytes);
    }
    return nil;
}

@end
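As an aside: on iOS 7 / OS X 10.9 and later you can skip this category entirely, since Foundation ships equivalent methods. A minimal sketch of the round trip:
NSString *jsonBase64 = [imageData base64EncodedStringWithOptions:0];
NSData *roundTripped = [[NSData alloc] initWithBase64EncodedString:jsonBase64 options:0];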
Null termination is not the only problem when converting from NSData to NSString.
NSString is not designed to hold arbitrary binary data. It expects an encoding.
If your NSData contains an invalid UTF-8 sequence, initializing the NSString will fail.
The documentation isn't completely clear on this point, but for initWithData:encoding: it says:
Returns nil if the initialization fails for some reason (for example
if data does not represent valid data for encoding).
Also: The JSON specification defines a string as a sequence of Unicode characters.
That means even if you're able to get your raw data into a JSON string, parsing could fail on the receiving end if the code performs UTF-8 validation.
If you don't want to use Base64, take a look at the answers here.
All code in this answer consists of pseudo-code fragments; you need to convert the algorithms into Objective-C or another language yourself.
Your question raises many questions... You start with:
I have an NSData object. I need to convert its bytes to a string and send as JSON. description returns hex and is unreliable (according to various SO posters).
This appears to suggest you wish to encode the bytes as a string, ready to decode them back to bytes at the other end. If this is the case you have a number of choices, such as Base-64 encoding etc. If you want something simple you can just encode each byte as its two-character hex value, pseudo-code outline:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    [encodedString appendFormat:@"%02x", aByte];
The format %02x means two hexadecimal digits with zero padding. This results in a string which can be sent as JSON and decoded easily at the other end. The byte size over the wire will probably be twice the byte length, as UTF-8 is the recommended encoding for JSON over the wire.
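As a concrete Objective-C rendering of that sketch (assuming imageData is the NSData from the question):
const unsigned char *bytes = [imageData bytes];
NSMutableString *encodedString = [NSMutableString stringWithCapacity:[imageData length] * 2];
for (NSUInteger i = 0; i < [imageData length]; i++) {
    [encodedString appendFormat:@"%02x", bytes[i]]; // two zero-padded hex digits per byte
}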
However, in response to one of the answers you write:
But I need absolutely the raw bits.
What do you mean by this? Is your receiver going to interpret the JSON string it gets as a sequence of raw bytes? If so you have a number of problems to address. JSON strings are a subset of JavaScript strings and are stored as UCS-2 or UTF-16; that is, they are sequences of 16-bit values, not 8-bit values. If you encode each byte into a character in a string then it will be represented using 16 bits; if your receiver can access the byte stream it has to skip every other byte. Of course, if your receiver accesses the string a character at a time, each 16-bit character can be truncated back to an 8-bit byte.

Now you might think that if you take this approach then each 8-bit byte can just be output as a character as part of a string, but that won't work. While all values 1-255 are valid Unicode character code points, and JavaScript/JSON allow NULs (0 value) in strings, not all those values are printable; you cannot put a double quote " into a string without escaping it; and the escape character is \ - all of these will need to be encoded into the string. You'd end up with something like:
NSMutableString *encodedString = @"".mutableCopy;
foreach aByte in byteData
    if (isprint(aByte) && aByte != '"' && aByte != '\\')
        [encodedString appendFormat:@"%c", aByte];
    otherwise
        [encodedString appendFormat:@"\\u00%02x", aByte]; // JSON unicode escape sequence
This will produce a string which, when parsed by a JSON decoder, will give you one character (16 bits) for each byte, the top 8 bits being zero. However, if you pass this string to a JSON encoder it will escape the unicode escape sequences, which are already encoded... So you really need to send this string over the wire yourself to avoid this...
Confused? Getting complicated? Well, why are you trying to send binary byte data as a string? You never say what your high-level goal is or what, if anything, is known about the byte data (e.g. does it represent characters in some encoding?).
If this is really just an array of bytes then why not send it as a JSON array of numbers - a byte is just a number in the range 0-255. To do this you would use code along the lines of:
NSMutableArray *encodedBytes = [NSMutableArray new];
foreach aByte in byteData
    [encodedBytes addObject:@(aByte)]; // add aByte as an NSNumber object
Now pass encodedBytes to NSJSONSerialization and it will send a JSON array of numbers over the wire; the receiver will reverse the process, packing each byte back into a byte buffer, and you have your bytes back.
This method avoids all issues of valid strings, encodings and escapes.
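A concrete Objective-C sketch of that approach (again assuming imageData is the NSData in question):
const unsigned char *bytes = [imageData bytes];
NSMutableArray *encodedBytes = [NSMutableArray arrayWithCapacity:[imageData length]];
for (NSUInteger i = 0; i < [imageData length]; i++) {
    [encodedBytes addObject:@(bytes[i])]; // each byte becomes an NSNumber in the range 0-255
}
NSData *json = [NSJSONSerialization dataWithJSONObject:encodedBytes options:0 error:NULL];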
HTH

Concatenate integers and strings in Objective C

Please forgive the simplicity of the question. I'm completely new to Objective C.
I'd like to know how to concatenate integer and string values and print them to the console.
This is what I'd like for my output:
10 + 20 = 30
In Java I'd write this code to produce the needed results:
System.out.println(intVarWith10 + " + " + intVarWith20 + " = " + result);
Objective-C is quite different. How can we concatenate the 3 integers along with the strings in between?
You can use the following code:
int iFirst, iSecond;
iFirst = 10;
iSecond = 20;
NSLog(@"%@", [NSString stringWithFormat:@"%d + %d = %d", iFirst, iSecond, (iFirst + iSecond)]);
Take a look at NSString - it has a method stringWithFormat: that does what you require. For example:
NSString *yString = [NSString stringWithFormat:@"%d + %d = %d",
                     intVarWith10, intVarWith20, result];
You can use C style syntax with NSLog (if you just need to print):
NSLog(@"%d + %d = %d", intVarWith10, intVarWith20, result);
If you want a string variable holding the value:
NSString *str = [NSString stringWithFormat:@"%d + %d = %d", intVarWith10, intVarWith20, result];
You have to create an NSString with format and specify the data type. Something like this:
NSInteger firstOperand = 10;
NSInteger secondOperand = 20;
NSInteger result = firstOperand + secondOperand;
NSString *operationString = [NSString stringWithFormat:@"%ld + %ld = %ld", (long)firstOperand, (long)secondOperand, (long)result];
NSLog(@"%@", operationString);
NSString with format follows the C printf syntax (the (long) casts are needed because NSInteger is a long on 64-bit platforms, where %d would be the wrong specifier).
Check the below code:
int i = 8;
NSString *tempStr = [NSString stringWithFormat:@"Hello %d", i];
NSLog(@"%@", tempStr);
I strongly recommend this link: Objective-C Reference.
The Objective-C int data type can store a positive or negative whole number. The actual size or range of integer that can be handled by the int data type is machine and compiler implementation dependent.
So you can store them like this:
int a, b;
a = 10;
b = 20;
Then, to perform the operation, you need to first understand NSString.
C style character strings are composed of single-byte characters and therefore limited in the range of characters that can be stored.
int c = a + b;
NSString *strAnswer = [NSString stringWithFormat:@"Answer %d + %d = %d", a, b, c];
NSLog(@"%@", strAnswer);
Hope this will help you.

In Objective-C, how to print out N spaces? (using stringWithCharacters)

The following was tried in order to print out N spaces (12 in the example):
NSLog(@"hello%@world", [NSString stringWithCharacters:" " length:12]);

const unichar arrayChars[] = {' '};
NSLog(@"hello%@world", [NSString stringWithCharacters:arrayChars length:12]);

const unichar oneChar = ' ';
NSLog(@"hello%@world", [NSString stringWithCharacters:&oneChar length:12]);
But they all print out weird things such as hello ÔÅÓñüÔÅ®Óñü®ÓüÅ®ÓñüÔ®ÓüÔÅ®world... I thought a "char array" was the same as a "string" and the same as a "pointer to a character"? The API spec says it takes a "C array of Unicode characters" (by Unicode, does it mean UTF-8? If it does, it should be compatible with ASCII)... How do I make this work, and why don't those 3 ways work?
You can use %*s to specify the width.
NSLog(@"Hello%*sWorld", 12, "");
Reference:
A field width, or precision, or both, may be indicated by an asterisk
( '*' ). In this case an argument of type int supplies the field width
or precision. Applications shall ensure that arguments specifying
field width, or precision, or both appear in that order before the
argument, if any, to be converted.
This will get you what you want:
NSLog(@"hello%@world", [@"" stringByPaddingToLength:12 withString:@" " startingAtIndex:0]);
I think the issue you have is you are misinterpreting what +(NSString *)stringWithCharacters:length: is supposed to do. It's not supposed to repeat the characters, but instead copy them from the array into a string.
So in your case you only have a single ' ' in the array, meaning the other 11 characters will be taken from whatever follows arrayChars in memory.
If you want to print out a pattern of n spaces, the easiest way to do that would be to use -(NSString *)stringByPaddingToLength:withString:startingAtIndex:, i.e. creating something like this.
NSString *formatString = @"Hello%@World";
NSString *paddingString = [[NSString string] stringByPaddingToLength:n withString:@" " startingAtIndex:0];
NSLog(formatString, paddingString);
This is probably the fastest method:
NSString *spacesWithLength(int nSpaces)
{
    char UTF8Arr[nSpaces + 1];
    memset(UTF8Arr, ' ', nSpaces * sizeof(*UTF8Arr));
    UTF8Arr[nSpaces] = '\0';
    return [NSString stringWithUTF8String:UTF8Arr];
}
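Usage is then as simple as:
NSLog(@"hello%@world", spacesWithLength(12));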
The reason your current code isn't working is because +stringWithCharacters: expects an array containing 12 characters, while your array is only 1 character in length ({' '}). So, to fix it, you must create a buffer for your array (in this case, we use a char array, not a unichar array, because we can easily memset a char array, but not a unichar array).
The method I provided above is probably the fastest that is possible with a dynamic length. If you are willing to use GCC extensions, and you have a fixed size array of spaces you need, you can do this:
NSString *spacesWithLength7()
{
    unichar characters[] = { [0 ... 6] = ' ' }; // 7 elements, indices 0-6
    return [NSString stringWithCharacters:characters length:7];
}
Unfortunately, that extension doesn't work with variables, so it must be a constant.
Through the magic of GCC extensions and preprocessor macros, I give you.... THE REPEATENATOR! Simply pass in a string (or a char), and it will do the rest! Buy now, costs you only $19.95, operators are standing by! (Based on the idea suggested by @JeremyL)
// step 1: determine if inp is a char, a C string, or an NSString.
// step 2: repeat that char or string cnt times
// step 3: return the result as an NSString
#define repeat(inp, cnt) __rep_func__(@encode(typeof(inp)), inp, cnt)

// arg list: (const char *typ, int / char * / id input, int n)
static inline NSString *__rep_func__(const char *typ, ...)
{
    char charbuf[2] = {0};
    const char *str = NULL;
    int n;
    va_list args;
    va_start(args, typ);
    if (typ[0] == 'i') {
        // a char literal is promoted to int; copy it into a buffer that
        // outlives this block (a compound literal here would not)
        charbuf[0] = (char)va_arg(args, int);
        str = charbuf;
    } else if (typ[0] == '@') {
        str = [va_arg(args, id) UTF8String];
    } else {
        str = va_arg(args, const char *);
    }
    n = va_arg(args, int);
    va_end(args);

    int len = (int)strlen(str);
    char outbuf[(len * n) + 1];
    // now copy the content
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < len; j++) {
            outbuf[(i * len) + j] = str[j];
        }
    }
    outbuf[len * n] = '\0';
    return [NSString stringWithUTF8String:outbuf];
}
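For example, a quick sketch of how the macro above would be called:
NSString *dashes = repeat('-', 10);  // @"----------"
NSString *abcs = repeat("abc", 3);   // @"abcabcabc"
NSString *hos = repeat(@"ho ", 2);   // @"ho ho "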
The stringWithCharacters:length: method makes an NSString (or an instance of a subclass of NSString) using the first length characters in the C array. It does not cycle through the given array of characters over and over until it reaches the requested length.
The output you are seeing is the area of memory 12 Unicode characters long starting at the location of your passed 1-Unicode-character array.
This should work:
const unichar spaces[12] = {' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '};
NSLog(@"hello%@world", [NSString stringWithCharacters:spaces length:12]);

Appending null characters to regular characters

I want to concatenate null characters with regular characters like so:
NSString *message = @"ABC";
NSUInteger length = [message length];
char char4 = length;
char char3 = length >> 8;
char char2 = length >> 16;
char char1 = length >> 24;
message = [NSString stringWithFormat:@"%c%c%c%c%@", char4, char3, char2, char1, message];
The problem is that the string stays at length 4 (and looks like ¿ABC). How can I edit this code so that the null characters are also appended to the string?
In reply to Mark's comment
What I am trying to do here is append the length of the string to the beginning of the string, not in numerical format, but in the format of ASCII characters (almost like a base-256 number). I would use the format @"%04c%@", length, message; the problem is that the resulting string would be 000¿ABC, and zeros have an ASCII value (48 in decimal, to be exact), so that defeats the purpose. The ASCII character that has decimal value 0 is the null character (\0), so I have to use that instead of 0. It is necessary that I have those leading null characters.
For the purposes of what I'm trying to accomplish, the following code works
NSUInteger length = [message length];
char char4 = length;
char char3 = length >> 8;
char char2 = length >> 16;
char char1 = length >> 24;

if (char4 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char4, message];
if (char3 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char3, message];
if (char2 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char2, message];
if (char1 == '\0')
    message = [NSString stringWithFormat:@"\0%@", message];
else
    message = [NSString stringWithFormat:@"%c%@", char1, message];
But, if anybody can contribute something shorter or more intuitive, that'd be great.
Don't do this. It's a bad idea. For example, NSString is conceptually built on 16-bit Unicode characters, but you're trying to prepend bytes. Note that there's nothing guaranteeing that NSString's internal representation is UTF-16 or any other specific encoding. In any case, when the string is written out it has to be converted to whatever encoding and that is unlikely to preserve your length prefixing in a way you can predict.
Use an NSData with the length in the first 4 (or maybe 8 would be better for a 64-bit length) bytes followed by the string in a particular encoding. I recommend UTF-8.
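A minimal sketch of that NSData approach (the big-endian byte order for the 4-byte prefix is my assumption; use whichever convention your receiver expects):
NSString *message = @"ABC";
NSData *payload = [message dataUsingEncoding:NSUTF8StringEncoding];
uint32_t lengthPrefix = CFSwapInt32HostToBig((uint32_t)[payload length]);
NSMutableData *packet = [NSMutableData dataWithBytes:&lengthPrefix length:sizeof(lengthPrefix)];
[packet appendData:payload];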

How do I break a text string down to two letter chunks, and convert those chunks to four digit numbers in Objective-C?

I want to make a simple program for my number theory class. We're learning encryption.
The main encryption I want to demonstrate is demonstrated in this example:
Take the phrase "TAKE CARE"
as
TA
KE
-C
AR
E-
where TA is converted to 2001, because T is the 20th letter in the alphabet and A is the first.
Well, since you seem to be limiting yourself to ASCII, you should be fine using the -UTF8String method of the string:
NSString *source = @"TAKE CARE";
source = [source lowercaseString]; //normalize the capitalization
const char *characters = [source UTF8String];
for (NSUInteger i = 0; i < [source length]; ++i) {
    const char character = characters[i];
    if (character >= 'a' && character <= 'z') {
        int positionInAlphabet = character - 'a' + 1; // this means "a" is "1"
        NSLog(@"%c = %d", character, positionInAlphabet);
    } else {
        NSLog(@"non-letter: %c", character);
    }
}
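For the chunking step itself, here's a hedged sketch of the scheme from the question: pairs of letters become four-digit numbers ("TA" -> 2001), with the space (shown as "-" above) and any trailing pad encoded as 00. The exact treatment of non-letters is my assumption:
NSString *source = [@"TAKE CARE" uppercaseString];
NSMutableString *encoded = [NSMutableString string];
NSUInteger len = [source length];
for (NSUInteger i = 0; i < len; i += 2) {
    for (NSUInteger j = i; j < i + 2; j++) {
        int value = 0; // non-letters and padding encode as 00 (an assumption)
        if (j < len) {
            unichar ch = [source characterAtIndex:j];
            if (ch >= 'A' && ch <= 'Z')
                value = ch - 'A' + 1; // 'A' -> 1 ... 'Z' -> 26
        }
        [encoded appendFormat:@"%02d", value];
    }
    [encoded appendString:@" "];
}
NSLog(@"%@", encoded); // prints "2001 1105 0003 0118 0500 "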
}