Can this statement ever fail?
if (@"Hello") foo();
In other words, can there be a situation in which the compiler fails to allocate enough storage space for a literal? I know this sounds ridiculous for short literals, but what about really long ones?
No.
NSString literals are "allocated" at compile time and form part of the text segment of your program.
Edit
To answer the other part of the question, if the compiler fails to allocate enough memory for the literal, the if statement won't fail, but the compilation will.
I don't know what the effective upper limit to the length of a string literal is, but it's bound to be less than NSIntegerMax unichars because NSNotFound is defined as NSIntegerMax. According to the clang docs, the length in bytes of a string literal is an unsigned int and NSString string literals are sequences of unichars.
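As a rough analogy in plain C (which Objective-C builds on), the sketch below shows the same idea with a C string literal: the literal has static storage duration, so its address is fixed when the program is built and testing it involves no run-time allocation. The helper names `greeting` and `literal_is_usable` are made up for this example, and an NSString literal is a constant object rather than a char array, so treat this as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the address of a string literal.
   The literal has static storage duration, so the pointer stays
   valid after the function returns. */
const char *greeting(void) {
    return "Hello";
}

/* Hypothetical helper mirroring the `if` in the question: the
   condition only tests an address that was fixed at build time. */
int literal_is_usable(void) {
    const char *s = greeting();
    assert(s != NULL);          /* cannot fire for a literal */
    return strlen(s) == 5;
}
```

If the literal were too large for the toolchain to handle, the failure would show up when building, not when this code runs.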
I'm pretty sure if you try to compile a file with the literal
@" ... 1TB worth of characters ... "
the compiler will fail. The C standard says that any conforming compiler needs to support at least 4095 characters per string literal, among other minimum limits; see Section 5.2.4.1. I'm sure GCC and clang allow much bigger literals.
I have commented out some code in my project and don't want it to be built into my app's binary.
Does Xcode build commented-out code into the binary?
//Obj-C
//- (void)functionName {
//
//}
//Swift
//func functionName() {
//
//}
For Swift: From "The Basics" in “The Swift Programming Language” (emphasis mine):
Use comments to include nonexecutable text in your code, as a note or reminder to yourself. Comments are ignored by the Swift compiler when your code is compiled.
For Objective-C: Objective-C is an extension of C, and the C99 standard specifies in “5.1.1.2 Translation phases” (emphasis added):
3 The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
and in “6.4.9 Comments”:
1 Except within a character constant, a string literal, or a comment, the characters /* introduce a comment. The contents of such a comment are examined only to identify multibyte characters and to find the characters */ that terminate it.
2 Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.
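A small C sketch of what these two clauses mean in practice, assuming nothing beyond the standard library. The function names are hypothetical. The first function compiles as if its comments were single spaces; the second shows the "except within a string literal" case, where the same characters are ordinary data and therefore do end up in the binary.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: both comment styles are replaced by white
   space during translation, so this is just `1 + 2`. */
int sum_with_comments(void) {
    int total = 1 /* block comment */ + 2; // line comment
    assert(total == 3);   /* the comments contributed nothing */
    return total;
}

/* Hypothetical helper: inside a string literal the comment markers
   are ordinary characters (C99 6.4.9), so this text is real data. */
const char *not_a_comment(void) {
    return "/* not a comment */";
}
```

So text placed in a comment is dropped, while the same text placed in a string literal is kept, all 19 characters of it.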
Short answer: No.
Long answer:
Every SDK ships a compiler that translates source code into machine code (the raw instruction bytes the processor executes). Comments are discarded early in compilation, before any code is generated, so they never reach the binary.
An Apple app is bundled together with all of its assets (images, sounds, plists), which can be viewed by anybody who has the .app file. This is how hackers have been able to create the exact same app with slightly different graphics or sounds and resubmit it as their own.
Alongside those assets is the Unix executable binary. If you open it in a text editor such as Notepad, you will see gibberish, because machine code is not human-readable text. The example below is from one of my apps:
I know how to return a Unicode character from a code point. That's not what I'm after. What I want to know is how to return the name associated with a particular code point. For example, the code point for 🍀 is 1F340, and its name is FOUR LEAF CLOVER. Is it possible to get this name from the code point? I've read about 100 topics involving Unicode, but I haven't seen one discussing my question. I hope that's possible.
Thank you for your help.
Have you considered the ICU library? It offers the following C API: http://icu-project.org/apiref/icu4c/uchar_8h.html#aa488f2a373998c7decb0ecd3e3552079
int32_t u_charName(
UChar32 code,
UCharNameChoice nameChoice,
char* buffer,
int32_t bufferLength,
UErrorCode* pErrorCode)
Retrieve the name of a Unicode character.
Depending on nameChoice, the character name written into the buffer is the "modern" name or the name that was defined in Unicode version 1.0. The name contains only "invariant" characters like A-Z, 0-9, space, and '-'. Unicode 1.0 names are only retrieved if they are different from the modern names and if the data file contains the data for them. gennames may or may not be called with a command line option to include 1.0 names in unames.dat.
Parameters
code The character (code point) for which to get the name. It must be 0<=code<=0x10ffff.
nameChoice Selector for which name to get.
buffer Destination address for copying the name. The name will always be zero-terminated. If there is no name, then the buffer will be set to the empty string.
bufferLength ==sizeof(buffer)
pErrorCode Pointer to a UErrorCode variable; check for U_SUCCESS() after u_charName() returns.
Returns
The length of the name, or 0 if there is no name for this character. If the bufferLength is less than or equal to the length, then the buffer contains the truncated name and the returned length indicates the full length of the name. The length does not include the zero-termination.
ICU is the right approach, but it's even simpler than Chris said. Foundation includes ICU already, for various text processing functions, including CFStringTransform(). Its transform parameter accepts "any valid ICU transform ID defined in the ICU User Guide for Transforms".
One of ICU's transforms is Any-Name:
Converts between characters and their Unicode names in curly braces. For example:
., ⇆ {FULL STOP}{COMMA}
(The syntax isn't exactly as documented, but it's close enough you can figure it out.)
There's also an Any-Hex transform which can be used for translating to/from the codepoint hex value.
I want to verify that a given file in a path is a text file, i.e. not binary, i.e. readable by a human. I guess I could read the first characters and check each character with:
isAlphaNumeric
isSpecial
isSeparator
isOctetCharacter ???
but joining all those testing methods with and: [ ... and: [ ... and: [ ] ] ] does not seem very smalltalkish. Any suggestions for a more elegant way?
(There is a Python version in "How to identify binary and text files using Python?", which could be useful, but its syntax and implementation look like C.)
There are only heuristics; you can never be really certain...
For ASCII, the following may do:
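| isPlausibleAscii numChecked isPossiblyText |
"Printable ASCII is 32..126 (127 is DEL); separators cover tab, CR, LF and space"
isPlausibleAscii :=
    [:char |
        (char codePoint between: 32 and: 126)
            or: [ char isSeparator ]
    ].
"Only inspect the first 1024 characters of text"
numChecked := text size min: 1024.
isPossiblyText := text from: 1 to: numChecked conform: isPlausibleAscii.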
For Unicode (UTF-8?), things become more difficult; you could try to decode the contents and, if there is a conversion error, assume binary.
PS: if you don't have from:to:conform:, replace by (copyFrom:to:) conform:
PPS: if you don't have conform: , try allSatisfy:
All text contains more spaces than you'd expect to see in a binary file, and some encodings (UTF-16/32) will contain lots of zero bytes for common languages.
A smalltalky solution would be to hide the gory details in a method on Standard/MultiByte-FileStream; #isProbablyText would probably be a good name.
It would essentially do the following:
- Store the current state if you intend to use the stream later, then reset to the start (set a Latin1 converter if you use a MultiByteStream).
- Iterate over the next N characters (where N is an appropriate number).
- Encounter a non-printable ASCII char? It's probably binary, so return false. (There is no built-in selector for this; use a map, implement a new method on Character, or something similar.)
- Increase two counters as appropriate: one for space characters and another for zero characters.
- If the loop finishes, return whether either counter has reached a statistically significant count.
TL;DR: Use a method to hide the gory details; otherwise it's pretty much the same.
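For readers who think in C rather than Smalltalk, here is a minimal sketch of the heuristic described above, working on a byte buffer that has already been read from the file. Everything specific in it is an assumption made for illustration: the name `is_probably_text`, the 1024-byte window, and the thresholds (one space per 20 bytes, or one zero byte per 4). Zero bytes are counted rather than rejected so that UTF-16/32 text is not dismissed immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed window size; tune it for your own data. */
enum { MAX_CHECKED = 1024 };

/* Hypothetical helper: returns 1 if the first bytes of `buf`
   look like text, 0 if they look binary or there is no evidence. */
int is_probably_text(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t n = len < MAX_CHECKED ? len : MAX_CHECKED;
    size_t spaces = 0, zeros = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c == 0) {
            zeros++;                 /* frequent in UTF-16/32 text */
        } else if (c == ' ') {
            spaces++;
        } else if (c == '\t' || c == '\n' || c == '\r' || c == '\f') {
            continue;                /* ordinary whitespace */
        } else if (c < 0x20 || c == 0x7F) {
            return 0;                /* other control byte: probably binary */
        }
    }
    if (n == 0) {
        return 0;                    /* nothing to judge */
    }
    /* Assumed thresholds, not derived from any measurement. */
    return spaces * 20 >= n || zeros * 4 >= n;
}
```

A single long word with no spaces is reported as "not text" by this sketch, which is the price of relying on the space counter; like any such check, it remains a heuristic.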
Does anyone know how to format an NSString over multiple lines?
e.g. this doesn't build:
return @"asdfasdf" +
"asdfasdf";
return @"asdfasdf"
@"asdfasdf";
I suggest using this syntax instead of
return @"asdfasdf"
"asdfasdf";
just to distinguish C strings from Objective-C ones.
I was having this problem all the time (especially with HTML strings), so I made a tiny tool to convert text to an escaped multi-line Objective-C string:
http://multilineobjc.herokuapp.com/
Hope this saves you some time.
If you remove the +, the compiler will join the two strings together. See C syntax: string literal concatenation.
return @"asdfasdf"
"asdfasdf";
Note that neither GCC nor LLVM seems to care if you omit the @ prefix from the later strings.
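The underlying C rule can be seen in isolation in the sketch below: adjacent string literals separated only by white space (including line breaks) are merged into one literal by the compiler. The function name `joined` is made up for this example; the Objective-C case works the same way, just with NSString literals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the two adjacent literals are merged into a
   single literal during translation, before the program ever runs. */
const char *joined(void) {
    const char *s = "asdfasdf"
                    "asdfasdf";
    assert(strlen(s) == 16);   /* one string, no separator inserted */
    return s;
}
```

No operator is involved, which is why the version with + fails to build while the version without it works.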
I need to use something like NSLog but without the timestamp and newline character, so I'm using printf. How can I use this with NSString?
You can convert an NSString into a UTF8 string by calling the UTF8String method:
printf("%s", [string UTF8String]);
//public method that accepts a string argument
- (void) sayThis : ( NSString* ) this
{
printf("%s",[this cString]);
}
According to the HTML documentation for NSString.h, the UTF8String method is only available on Mac OS X.
(See below.)
All the other methods I looked at are marked as 'Availability: OpenStep'.
There are further methods that will return regular char* strings, but they might throw character conversion exceptions.
NOTE: The string pointers point to memory that might go away, so you have to copy the strings if you want to keep the string contents, but immediate printing should be fine.
There are also methods that will return an encoded string, and a method to test whether the encoding you want will work (I think), so you can check first and then request a string encoded as required.
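A minimal sketch of the "copy it if you want to keep it" advice, written in plain C so it applies to any borrowed const char*, whichever NSString method produced it. The helper name `copy_c_string` is hypothetical, and the sketch assumes the borrowed pointer is still valid at the moment of the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates a borrowed C string so the copy
   outlives the object that produced it. Returns NULL if allocation
   fails; the caller must free() the result. */
char *copy_c_string(const char *borrowed) {
    assert(borrowed != NULL);
    size_t size = strlen(borrowed) + 1;   /* include the terminator */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, borrowed, size);
    }
    return copy;
}
```

For an immediate printf call no copy is needed; the copy only matters when the characters must survive beyond the lifetime of the string object they came from.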
From reading through the .h file itself there are many encodings and translations between encodings.
These are managed using enumerations so you can pass the type of encoding you want as an argument.
On Linux etc., run:
locate NSString.h
(Note: this also found the HTML doc file.)
Otherwise run:
find /usr -name NSString.h
NOTE: Your mileage may vary :)
Thanks.
From the NSString.h HTML doc file:
cString
- (const char*) cString;
Availability: OpenStep
Returns a pointer to a null terminated string of 8-bit characters in the default encoding. The memory pointed to is not owned by the caller, so the caller must copy its contents to keep it. Raises an NSCharacterConversionException if loss of information would occur during conversion. (See -canBeConvertedToEncoding: .)
cStringLength
- (NSUInteger) cStringLength;
Availability: OpenStep
Returns length of a version of this unicode string converted to bytes using the default C string encoding. If the conversion would result in information loss, the results are unpredictable. Check -canBeConvertedToEncoding: first.
cStringUsingEncoding:
- (const char*) cStringUsingEncoding: (NSStringEncoding)encoding;
Availability: MacOS-X 10.4.0, Base 1.2.0
Returns a pointer to a null terminated string of characters in the specified encoding.
NB. Under GNUstep you can use this to obtain a nul-terminated UTF-16 string (sixteen-bit characters) as well as eight-bit strings.
The memory pointed to is not owned by the caller, so the caller must copy its contents to keep it.
Raises an NSCharacterConversionException if loss of information would occur during conversion.
canBeConvertedToEncoding:
- (BOOL) canBeConvertedToEncoding: (NSStringEncoding)encoding;
Availability: OpenStep
Returns whether this string can be converted to the given string encoding without information loss.