Raw strings like Python's in Objective-C - objective-c

Does Objective-C have raw strings like Python's?
Clarification: a raw string doesn't interpret escape sequences like \n: both the slash and the "n" are separate characters in the string. From the linked Python tutorial:
>>> print 'C:\some\name' # here \n means newline!
C:\some
ame
>>> print r'C:\some\name' # note the r before the quote
C:\some\name

Objective-C is a superset of C. So, the answer is yes. You can write
char* string="hello world";
anywhere. You can then turn it into an NSString later by
NSString* nsstring=[NSString stringWithUTF8String:string];

From your link explaining what you mean by "raw string", the answer is: there is no built in method for what you are asking.
However, you can replace occurrences of one string with another string, so you can replace #"\n" with #"\\n", for example. That should get you close to what you're seeking.

You can use stringize macro.
#define MAKE_STRING(x) ##x
NSString *expendedString = MAKE_STRING(
hello world
"even quotes will be escaped"
);
The preprocess result is
NSString *expendedString = #"hello world \"even quotes will be escaped\"";
As you can see, double quotes are escaped, however new lines are ignored.
This feature is very suitable to paste some JS code in Objective-C files. Using this feature is safe if you are using C99.
source:
https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
How, exactly, does the double-stringize trick work?

Like everyone said, raw ANSI strings are very easy. Just use simple C strings, or C++ std::string if you feel like compiling Objective C++.
However, the native string format of Cocoa is UCS-2 - fixed-width 2-byte characters. NSStrings are stored, internally, as UCS-2, i. e. as arrays of unsigned short. (Just like in Win32 and in Java, by the way.) The systemwide aliases for that datatype are unichar and UniChar. Here's where things become tricky.
GCC includes a wchar_t datatype, and lets you define a raw wide-char string constant like this:
wchar_t *ws = L"This a wide-char string.";
However, by default, this datatype is defined as 4-byte int and therefore is not the same as Cocoa's unichar! You can override that by specifying the following compiler option:
-fshort-wchar
but then you lose the wide-char C RTL functions (wcslen(), wcscpy(), etc.) - the RTL was compiled without that option and assumes 4-byte wchar_t. It's not particularly hard to reimplement these functions by hand. Your call.
Once you have a truly 2-byte wchar_t raw strings, you can trivially convert them to NSStrings and back:
wchar_t *ws = L"Hello";
NSString *s = [NSString stringWithCharacters:(const unichar*)ws length:5];
Unlike all other [stringWithXXX] methods, this one does not involve any codepage conversions.

Objective-C is a strict superset of C so you are free to use char * and char[] wherever you want (if that's what you call raw strings).

If you mean C-style strings, then yes.

Related

How to convert c++ std::u32string to NSString?

I am developing a bridge between C++ and swift. And I need to convert C++ std::u32string to NSString. Here the code I tried:
u32string str = U"some string";
NSLog(#"%#", [[NSString alloc] initWithBytes:str.data() length:str.size() * sizeof(char32_t) encoding:NSUTF32StringEncoding]);
However initWithBytes returns nil.
Don't ask me why, but it seems to work if:
I insert a byte order mark (BOM) at the start of the string:u32string str = U"\uFEFFsome string";
or I use NSUTF32LittleEndianStringEncoding instead of NSUTF32StringEncoding.
Inserting byte order marks all over the place isn't terribly practical, so I guess you need to define your own constant which evaluates to either the little or big endian versions of the encoding constant depending on the platform being compiled.
This appears to be a quirk in Foundation, nothing specific to Objective-C++ or your use of std::u32string.

How to access a character in NSMutableString Objective-C

I have an instance of NSMutableString called MyMutableStr and I want access its character at index 7.
For example:
unsigned char cMy = [(NSString*) MyMutableStr characterAtIndex:7];
I think this is an ugly way; it's too much code.
My question is: Are there more simple ways in Objective-C to access the character in NSMutableString?
Like, in C language we can access a character of a string using [ ] operator:
unsigned char cMy = MyMutableStr[7];
The way of doing it is to use characterAtIndex:, but you don't need to cast it to a NSString pointer, since NSMutableString is a subclass of NSString. So it isn't that long, but if you still don't find it comfortable, I suggest to use UTF8String to obtain a C string over which you can iterate using the brackets operator:
const char* cString= [MyMutableStr UTF8String];
char first= cString[0];
But remember this (taken from NSString class reference):
The returned C string is automatically freed just as a returned object would be released; you should copy the C string if it needs to store it outside of the autorelease context in which the C string is created.
As others said characterAtIndex: but a few things you might want to consider carefully.
First you're dealing with an mutable string. You want to be careful to avoid it changing out from under you. One way is to an immutable copy and use that for the op.
Second, you're dealing with Unicode so you may want to consider normalizing your string to get a precomposed form as some visual representations may be more than one actual unichar. That's often a stumbling block for folks.

Concatenating & storing music symbols - Objective-C

I was using these unicode definitions for sharp and flat symbols and they work fine in string concats:
#define kSharpSymbol [NSString stringWithFormat:#"\U0000266F"]
#define kFlatSymbol [NSString stringWithFormat:#"\U0000266D"]
[...]
// Set F#
[f setNoteLetterName:[NSString stringWithFormat:#"F%#",kSharpSymbol]];
Then, I just read on a SO question that relying on the unicode formatting is not recommended by Apple so I went to this, which also works but results in compiler warnings when I do the implicit string concat:
Format specifies type 'unsigned short' but the argument has type 'int'
#define kSharpSymbol [NSString stringWithFormat:#"%C", 0x266F]
#define kFlatSymbol [NSString stringWithFormat:#"%C", 0x266D]
[...]
// Set F#
[f setNoteLetterName:[NSString stringWithFormat:#"F%#",kSharpSymbol]];
I guess I need some clarity on this. What's best and how do I get the compiler to be happy?
I would suggest another way to approach this problem: there is absolutely nothing wrong with using string constants that contain Unicode symbols directly, for example
#define kSharpSymbol #"♯"
#define kFlatSymbol #"♭"
The advantage is that the human readers of your program are going to see the symbol without looking it up in a table. The disadvantage is that the program is not going to look correctly when viewed in some older text editors that do not support modern file encoding. Fortunately, Xcode's editor is not one of them, so it shouldn't be a concern.

Objective c doesn't like my unichars?

Xcode complaints about "multi-character character contant"'s when I try to do the following:
static unichar accent characters[] = { 'ā', 'á', 'ă', 'à' };
How do you make an array of characters, when not all of them are ascii? The following works just fine
static unichar accent[] = { 'a', 'b', 'c' };
Workaround
The closest work around I have found is to convert the special characters into hex, ie this works:
static unichar accent characters[] = { 0x0100, 0x0101, 0x0102 };
It's not that Objective-C doesn't like it, it's that C doesn't. The constant 'c' is for char which has 1 byte, not unichar which has 2 bytes. (see the note below for a bit more detail.)
There's no perfectly supported way to represent a unichar constant. You can use
char* s="ü";
in a UTF-8-encoded source file to get the unicode C-string, or
NSString* s=#"ü";
in a UTF-8 encoded source file to get an NSString. (This was not possible before 10.5. It's OK for iPhone.)
NSString itself is conceptually encoding-neutral; but if you want, you can get the unicode character by using -characterAtIndex:.
Finally two comments:
If you just want to remove accents from the string, you can just use the method like this, without writing the table yourself:
-(NSString*)stringWithoutAccentsFromString:(NSString*)s
{
if (!s) return nil;
NSMutableString *result = [NSMutableString stringWithString:s];
CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
return result;
}
See the document of CFStringFold.
If you want unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead you should use Localizable.strings and NSLocalizedString. See here.
Note:
For arcane historical reasons, 'a' is an int in C, see the discussions here. In C++, it's a char. But it doesn't change the fact that writing more than one byte inside '...' is implementation-defined and not recommended. For example, see ISO C Standard 6.4.4.10. However, it was common in classic Mac OS to write the four-letter code enclosed in single quotes, like 'APPL'. But that's another story...
Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8, it's not. In ISO-8859-1, it is. And unichar should be in UTF-16. Did you save your source code in UTF-16? I think the default of XCode is UTF-8. GCC might do some encoding conversion depending on the setup, too...
Or you can just do it like this:
static unichar accent characters[] = { L'ā', L'á', L'ă', L'à' };
L is a standard C keyword which says "I'm about to write a UNICODE character or character set".
Works fine for Objective-C too.
Note: The compiler may give you a strange warning about too many characters put inside a unichar, but you can safely ignore that warning. Xcode just doesn't deal with the unicode characters the right way, but the compiler parses them properly and the result is OK.
Depending on your circumstances, this may be a tidy way to do it:
NSCharacterSet* accents =
[NSCharacterSet characterSetWithCharactersInString:#"āáăà"];
And then, if you want to check if a given unichar is one of those accent characters:
if ([accents characterIsMember:someOtherUnichar])
{
}
NSString also has many methods of its own for handling NSCharacterSet objects.

Unfamiliar C syntax in Objective-C context

I am coming to Objective-C from C# without any intermediate knowledge of C. (Yes, yes, I will need to learn C at some point and I fully intend to.) In Apple's Certificate, Key, and Trust Services Programming Guide, there is the following code:
static const UInt8 publicKeyIdentifier[] = "com.apple.sample.publickey\0";
static const UInt8 privateKeyIdentifier[] = "com.apple.sample.privatekey\0";
I have an NSString that I would like to use as an identifier here and for the life of me I can't figure out how to get that into this data structure. Searching through Google has been fruitless also. I looked at the NSString Class Reference and looked at the UTF8String and getCharacters methods but I couldn't get the product into the structure.
What's the simple, easy trick I'm missing?
Those are C strings: Arrays (not NSArrays, but C arrays) of characters. The last character is a NUL, with the numeric value 0.
“UInt8” is the CoreServices name for an unsigned octet, which (on Mac OS X) is the same as an unsigned char.
static means that the array is specific to this file (if it's in file scope) or persists across function calls (if it's inside a method or function body).
const means just what you'd guess: You cannot change the characters in these arrays.
\0 is a NUL, but including it explicitly in a "" literal as shown in those examples is redundant. A "" literal (without the #) is NUL-terminated anyway.
C doesn't specify an encoding. On Mac OS X, it's generally something ASCII-compatible, usually UTF-8.
To convert an NSString to a C-string, use UTF8String or cStringUsingEncoding:. To have the NSString extract the C string into a buffer, use getCString:maxLength:encoding:.
I think some people are missing the point here. Everyone has explained the two constant arrays that are being set up for the tags, but if you want to use an NSString, you can simply add it to the attribute dictionary as-is. You don't have to convert it to anything. For example:
NSString *publicTag = #"com.apple.sample.publickey";
NSString *privateTag = #"com.apple.sample.privatekey";
The rest of the example stays exactly the same. In this case, there is no need for the C string literals at all.
Obtaining a char* (C string) from an NSString isn't the tricky part. (BTW, I'd also suggest UTF8String, it's much simpler.) The Apple-supplied code works because it's assigning a C string literal to the static const array variables. Assigning the result of a function or method call to a const will probably not work.
I recently answered an SO question about defining a constant in Objective-C, which should help your situation. You may have to compromise by getting rid of the const modifier. If it's declared static, you at least know that nobody outside the compilation unit where it's declared can reference it, so just make sure you don't let a reference to it "escape" such that other code could modify it via a pointer, etc.
However, as #Jason points out, you may not even need to convert it to a char* at all. The sample code creates an NSData object for each of these strings. You could just do something like this within the code (replacing steps 1 and 3):
NSData* publicTag = [#"com.apple.sample.publickey" dataUsingEncoding:NSUnicodeStringEncoding];
NSData* privateTag = [#"com.apple.sample.privatekey" dataUsingEncoding:NSUnicodeStringEncoding];
That sure seems easier to me than dealing with the C arrays if you already have an NSString.
try this
NSString *newString = #"This is a test string.";
char *theString;
theString = [newString cStringWithEncoding:[NSString defaultCStringEncoding]];