How to convert c++ std::u32string to NSString? - objective-c

I am developing a bridge between C++ and Swift, and I need to convert a C++ std::u32string to an NSString. Here is the code I tried:
u32string str = U"some string";
NSLog(@"%@", [[NSString alloc] initWithBytes:str.data() length:str.size() * sizeof(char32_t) encoding:NSUTF32StringEncoding]);
However, initWithBytes returns nil.

Don't ask me why, but it seems to work if:
I insert a byte order mark (BOM) at the start of the string: u32string str = U"\uFEFFsome string";
or I use NSUTF32LittleEndianStringEncoding instead of NSUTF32StringEncoding.
Inserting byte order marks all over the place isn't terribly practical, so I guess you need to define your own constant that evaluates to either the little-endian or the big-endian encoding constant, depending on the platform you are compiling for.
This appears to be a quirk in Foundation, nothing specific to Objective-C++ or your use of std::u32string.
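To make the "pick the constant per platform" idea concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It only detects the host byte order. The enum labels and function names are hypothetical stand-ins, and in real Objective-C++ code you would map the two cases to NSUTF32LittleEndianStringEncoding and NSUTF32BigEndianStringEncoding yourself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical labels standing in for the two explicit-endian
   Foundation encoding constants. */
enum utf32_byte_order { UTF32_LITTLE_ENDIAN, UTF32_BIG_ENDIAN };

/* Returns 1 on a little-endian host, 0 on a big-endian one, by looking
   at which byte of a 32-bit value is stored first in memory. */
static int host_is_little_endian(void) {
    const uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* The byte order in which 32-bit code units sit in memory on this host. */
static enum utf32_byte_order native_utf32_byte_order(void) {
    return host_is_little_endian() ? UTF32_LITTLE_ENDIAN : UTF32_BIG_ENDIAN;
}
```

The result of native_utf32_byte_order() could then select the encoding passed to initWithBytes:length:encoding:, so the string data itself never needs a BOM.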

Related

Difference between NSString and @ along with a string in objective c [duplicate]

I've been using Objective-C for a while now, but have never really understood what the purpose of the @ symbol before all strings is. For instance, why do you have to declare a string like this:
NSString *string = @"This is a string";
and not like this:
NSString *anotherString = "This is another string";
as you do in Java or so many other programming languages. Is there a good reason?
It denotes an NSString (rather than a standard C string).
An NSString is an object that stores a Unicode string and provides a bunch of methods to assist with manipulating it.
A C string is just a \0-terminated bunch of characters (bytes).
EDIT: The good reason is that Objective-C builds on top of C, so the C language constructs still need to be available. @"" is an Objective-C-only extension.

What does stringWithUTF8String do?

So I have done some searching around so that I could see what it was I was doing with my code, and I couldn't find any answers as to what this very one specific line of code does.
NSString* name = [NSString stringWithUTF8String:countryName];
I know what the rest does (I only had to google how to do this part), it is supposed to take my char* (countryName) and turn it into an NSString so later on I can compare it with the
isEqualToString:
thing. I would just like to know what the following is actually doing to the char, and what does the UTF8String even mean?
I have barely any Objective C programming experience so any feedback is helpful :D
You are not totally right.
This method
returns a string created by copying the data from a given C array of UTF8-encoded bytes.
So a UTF-8 string here is just a C array of bytes.
Check the documentation here.
It doesn't do anything to the char * string. It's just the input to the method. stringWithUTF8String takes a C-style string (in UTF-8 encoding), and creates an NSString using it as a template.
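To illustrate what "a C array of UTF-8 encoded bytes" means, here is a small sketch in plain C. The helper name count_utf8_code_points and the sample string are made up for illustration, and the helper assumes its input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* "naive" with a diaeresis on the i, spelled with the explicit UTF-8
   bytes for U+00EF (0xC3 0xAF). */
static const char naive_utf8[] = "na\xC3\xAFve";

/* Counts Unicode code points in a NUL-terminated UTF-8 string by
   skipping continuation bytes (those of the form 10xxxxxx).
   Assumes the input is valid UTF-8. */
static size_t count_utf8_code_points(const char *utf8) {
    size_t count = 0;
    const unsigned char *p;
    for (p = (const unsigned char *)utf8; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Here strlen(naive_utf8) reports 6 bytes while count_utf8_code_points(naive_utf8) reports 5 characters. A char * holds encoded bytes, and stringWithUTF8String: is told how to interpret them.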

Concatenating & storing music symbols - Objective-C

I was using these unicode definitions for sharp and flat symbols and they work fine in string concats:
#define kSharpSymbol [NSString stringWithFormat:@"\U0000266F"]
#define kFlatSymbol [NSString stringWithFormat:@"\U0000266D"]
[...]
// Set F#
[f setNoteLetterName:[NSString stringWithFormat:@"F%@",kSharpSymbol]];
Then, I just read on a SO question that relying on the unicode formatting is not recommended by Apple so I went to this, which also works but results in compiler warnings when I do the implicit string concat:
Format specifies type 'unsigned short' but the argument has type 'int'
#define kSharpSymbol [NSString stringWithFormat:@"%C", 0x266F]
#define kFlatSymbol [NSString stringWithFormat:@"%C", 0x266D]
[...]
// Set F#
[f setNoteLetterName:[NSString stringWithFormat:@"F%@",kSharpSymbol]];
I guess I need some clarity on this. What's best and how do I get the compiler to be happy?
I would suggest another way to approach this problem: there is absolutely nothing wrong with using string constants that contain Unicode symbols directly, for example
#define kSharpSymbol @"♯"
#define kFlatSymbol @"♭"
The advantage is that the human readers of your program are going to see the symbol without looking it up in a table. The disadvantage is that the program is not going to look correct when viewed in some older text editors that do not support modern file encodings. Fortunately, Xcode's editor is not one of them, so it shouldn't be a concern.
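As a rough cross-check in plain C, the sketch below compares the literal glyph, the \u escape, and the explicit UTF-8 bytes. It assumes the source file is saved as UTF-8 and that the compiler's execution character set is UTF-8 (the default for GCC and Clang). The variable and function names are made up for illustration.

```c
#include <string.h>

/* The sharp and flat signs written three ways. */
static const char sharp_from_glyph[]  = "♯";            /* literal character in the source */
static const char sharp_from_escape[] = "\u266F";       /* C99 universal character name */
static const char sharp_from_bytes[]  = "\xE2\x99\xAF"; /* explicit UTF-8 bytes */

static const char flat_from_escape[] = "\u266D";
static const char flat_from_bytes[]  = "\xE2\x99\xAD";

/* Returns 1 when both strings hold exactly the same bytes. */
static int same_bytes(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

Under those assumptions all three spellings of the sharp sign compare equal, so writing the glyph directly costs nothing beyond the file-encoding caveat mentioned above.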

Unfamiliar C syntax in Objective-C context

I am coming to Objective-C from C# without any intermediate knowledge of C. (Yes, yes, I will need to learn C at some point and I fully intend to.) In Apple's Certificate, Key, and Trust Services Programming Guide, there is the following code:
static const UInt8 publicKeyIdentifier[] = "com.apple.sample.publickey\0";
static const UInt8 privateKeyIdentifier[] = "com.apple.sample.privatekey\0";
I have an NSString that I would like to use as an identifier here and for the life of me I can't figure out how to get that into this data structure. Searching through Google has been fruitless also. I looked at the NSString Class Reference and looked at the UTF8String and getCharacters methods but I couldn't get the product into the structure.
What's the simple, easy trick I'm missing?
Those are C strings: Arrays (not NSArrays, but C arrays) of characters. The last character is a NUL, with the numeric value 0.
“UInt8” is the CoreServices name for an unsigned octet, which (on Mac OS X) is the same as an unsigned char.
static means that the array is specific to this file (if it's in file scope) or persists across function calls (if it's inside a method or function body).
const means just what you'd guess: You cannot change the characters in these arrays.
\0 is a NUL, but including it explicitly in a "" literal as shown in those examples is redundant. A "" literal (without the @) is NUL-terminated anyway.
C doesn't specify an encoding. On Mac OS X, it's generally something ASCII-compatible, usually UTF-8.
To convert an NSString to a C-string, use UTF8String or cStringUsingEncoding:. To have the NSString extract the C string into a buffer, use getCString:maxLength:encoding:.
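The point about the redundant \0 can be checked with a short C sketch. The array and function names are hypothetical, and unsigned char stands in for UInt8.

```c
#include <stddef.h>
#include <string.h>

/* The literal from the sample with its explicit \0, and the same
   literal relying only on the implicit terminator. */
static const unsigned char explicit_nul[] = "com.apple.sample.publickey\0";
static const unsigned char implicit_nul[] = "com.apple.sample.publickey";

/* Length as the C string functions see it: bytes before the first NUL. */
static size_t tag_length(const unsigned char *tag) {
    return strlen((const char *)tag);
}
```

The explicit version only makes the array one byte larger (sizeof explicit_nul is sizeof implicit_nul + 1). tag_length() returns 26 for both, because strlen stops at the first NUL either way.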
I think some people are missing the point here. Everyone has explained the two constant arrays that are being set up for the tags, but if you want to use an NSString, you can simply add it to the attribute dictionary as-is. You don't have to convert it to anything. For example:
NSString *publicTag = @"com.apple.sample.publickey";
NSString *privateTag = @"com.apple.sample.privatekey";
The rest of the example stays exactly the same. In this case, there is no need for the C string literals at all.
Obtaining a char* (C string) from an NSString isn't the tricky part. (BTW, I'd also suggest UTF8String, it's much simpler.) The Apple-supplied code works because it's assigning a C string literal to the static const array variables. Assigning the result of a function or method call to a const will probably not work.
I recently answered an SO question about defining a constant in Objective-C, which should help your situation. You may have to compromise by getting rid of the const modifier. If it's declared static, you at least know that nobody outside the compilation unit where it's declared can reference it, so just make sure you don't let a reference to it "escape" such that other code could modify it via a pointer, etc.
However, as @Jason points out, you may not even need to convert it to a char* at all. The sample code creates an NSData object for each of these strings. You could just do something like this within the code (replacing steps 1 and 3):
NSData* publicTag = [@"com.apple.sample.publickey" dataUsingEncoding:NSUnicodeStringEncoding];
NSData* privateTag = [@"com.apple.sample.privatekey" dataUsingEncoding:NSUnicodeStringEncoding];
That sure seems easier to me than dealing with the C arrays if you already have an NSString.
Try this:
NSString *newString = @"This is a test string.";
const char *theString;
theString = [newString cStringUsingEncoding:[NSString defaultCStringEncoding]];

Raw strings like Python's in Objective-C

Does Objective-C have raw strings like Python's?
Clarification: a raw string doesn't interpret escape sequences like \n: both the slash and the "n" are separate characters in the string. From the linked Python tutorial:
>>> print 'C:\some\name' # here \n means newline!
C:\some
ame
>>> print r'C:\some\name' # note the r before the quote
C:\some\name
Objective-C is a superset of C. So, the answer is yes. You can write
char* string="hello world";
anywhere. You can then turn it into an NSString later by
NSString* nsstring=[NSString stringWithUTF8String:string];
From your link explaining what you mean by "raw string", the answer is: there is no built in method for what you are asking.
However, you can replace occurrences of one string with another string, so you can replace @"\n" with @"\\n", for example. That should get you close to what you're seeking.
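The replacement idea can be sketched in plain C as follows. The function name escape_newlines and its buffer-based signature are made up for illustration, and it handles only the newline escape, not a full raw-string feature.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, turning each newline byte into the two characters
   '\' and 'n'. Returns the length written (excluding the terminator),
   or (size_t)-1 if dst is too small. */
static size_t escape_newlines(const char *src, char *dst, size_t capacity) {
    size_t out = 0;
    if (capacity == 0) {
        return (size_t)-1;
    }
    for (; *src != '\0'; ++src) {
        size_t needed = (*src == '\n') ? 2 : 1;
        if (out + needed + 1 > capacity) {
            return (size_t)-1;
        }
        if (*src == '\n') {
            dst[out++] = '\\';
            dst[out++] = 'n';
        } else {
            dst[out++] = *src;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Feeding it the literal "C:\\some\name" (where \n was interpreted as a newline) yields the same bytes as the literal "C:\\some\\name". A string that never reaches the compiler as a literal, such as one read from a file, needs no escaping at all.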
You can use a stringizing macro.
#define MAKE_STRING(x) @#x
NSString *expandedString = MAKE_STRING(
hello world
"even quotes will be escaped"
);
The preprocessed result is
NSString *expandedString = @"hello world \"even quotes will be escaped\"";
As you can see, double quotes are escaped; however, newlines are collapsed into single spaces.
This feature is very suitable for pasting some JS code into Objective-C files. Using this feature is safe if you are using C99.
source:
https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
How, exactly, does the double-stringize trick work?
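The same trick can be tried in plain C, where the macro uses only the # operator and produces a C string instead of an NSString. This is a minimal sketch, and the variable name is made up.

```c
#include <string.h>

/* The preprocessor's stringizing operator turns the macro argument
   into a string literal. */
#define MAKE_STRING(x) #x

/* Whitespace between tokens (including the line breaks) collapses to a
   single space, and the embedded double quotes are escaped for us. */
static const char stringized[] = MAKE_STRING(
    hello world
    "even quotes will be escaped"
);
```

After preprocessing, stringized holds hello world "even quotes will be escaped" on a single line. So this is not a true raw string, because the line breaks in the source do not survive.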
Like everyone said, raw ANSI strings are very easy. Just use simple C strings, or C++ std::string if you feel like compiling Objective C++.
However, the native string format of Cocoa is UTF-16: an NSString presents its contents as an array of 16-bit code units, i.e. unsigned short values (much as Win32 and Java do), and characters outside the Basic Multilingual Plane take two units. The systemwide aliases for that datatype are unichar and UniChar. Here's where things become tricky.
GCC includes a wchar_t datatype, and lets you define a raw wide-char string constant like this:
wchar_t *ws = L"This is a wide-char string.";
However, by default, this datatype is defined as 4-byte int and therefore is not the same as Cocoa's unichar! You can override that by specifying the following compiler option:
-fshort-wchar
but then you lose the wide-char C RTL functions (wcslen(), wcscpy(), etc.) - the RTL was compiled without that option and assumes 4-byte wchar_t. It's not particularly hard to reimplement these functions by hand. Your call.
Once you have truly 2-byte wchar_t raw strings, you can trivially convert them to NSStrings and back:
wchar_t *ws = L"Hello";
NSString *s = [NSString stringWithCharacters:(const unichar*)ws length:5];
Unlike all other [stringWithXXX] methods, this one does not involve any codepage conversions.
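If you would rather keep the default 4-byte wchar_t, one alternative is to convert the wide string to 16-bit units by hand before calling stringWithCharacters:length:. Here is a minimal sketch in plain C. The function name wide_to_utf16 is made up, uint16_t stands in for unichar, and it assumes each wchar_t holds one Unicode code point.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Copies a NUL-terminated wide string into 16-bit UTF-16 code units.
   Code points above U+FFFF become a surrogate pair.
   Returns the number of units written, or 0 if dst is too small. */
static size_t wide_to_utf16(const wchar_t *src, uint16_t *dst, size_t capacity) {
    size_t written = 0;
    for (; *src != L'\0'; ++src) {
        uint32_t cp = (uint32_t)*src;
        if (cp >= 0x10000) {
            if (written + 2 > capacity) {
                return 0;
            }
            cp -= 0x10000;
            dst[written++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[written++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (written + 1 > capacity) {
                return 0;
            }
            dst[written++] = (uint16_t)cp;
        }
    }
    return written;
}
```

The filled buffer and the returned count would then be what you pass as the characters and length arguments, with no -fshort-wchar needed.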
Objective-C is a strict superset of C so you are free to use char * and char[] wherever you want (if that's what you call raw strings).
If you mean C-style strings, then yes.