What is the maximum length of the MIDL helpstring attribute? - com

It looks like the MIDL helpstring attribute has a limit on string length, although it is not documented.
From what I can see, it's around 260 characters (similar to the MAX_PATH constant).
Does anybody know anything about it?


protocol-buffers: string or byte sequence of the exact length

Looking at https://developers.google.com/protocol-buffers/docs/proto3#scalar it appears that the string and bytes types don't limit the length. Does that mean we're expected to specify the length of the transmitted string in a separate field, e.g.:
message Person {
  string name = 1;
  int32 name_len = 2;
  int32 user_id = 3;
}
The wire type used for string and bytes is length-delimited. This means that the encoded message includes the string's length. How this is made available to you will depend upon the language you are using. For example, the table says that in C++ a string type is used, so you can call name.length() to retrieve the length.
So there is no need to specify the length in a separate field.
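To make the length prefix concrete, here is a minimal Python sketch of how a single string field is laid out on the wire. It hand-rolls the encoding instead of using the protobuf library, and the helper names (encode_varint, decode_varint, encode_string_field, decode_string_field) are invented for this illustration.

```python
LENGTH_DELIMITED = 2  # wire type used for string, bytes and nested messages


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Tag, then payload length as a varint, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_field(data):
    """Return (field_number, text) for one length-delimited string field."""
    tag, pos = decode_varint(data, 0)
    if tag & 0x07 != LENGTH_DELIMITED:
        raise ValueError("not a length-delimited field")
    length, pos = decode_varint(data, pos)
    return tag >> 3, data[pos:pos + length].decode("utf-8")


encoded = encode_string_field(1, "Ann")
print(encoded.hex())                 # 0a03416e6e
print(decode_string_field(encoded))  # (1, 'Ann')
```

The second byte (03) is the length of "Ann", which is why the receiver can recover the string without a separate name_len field.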
One of the things that I wished GPB did was allow the schema to be used to set constraints on such things as list/array length, or numerical value ranges. The best you can do is to have a comment in the .proto file and hope that programmers pay attention to it!
Other serialisation technologies do support this, such as XSD (though the tools are often poor), ASN.1 and JSON Schema. It's very useful. If GPB added these (it wouldn't change the wire format), GPB would be pretty much "complete".

Finding the length of an XSTRING

I have an XSTRING variable, but I can't find a way to determine its length (in bytes, preferably).
It seems that when an XSTRING is returned/exported from a method or function module, its length is usually also exported. But in my case I only have the string itself.
Is there a way to figure out the length?
I'd say you're looking for xstrlen( l_my_xstring )

Character encoding that won't change the higher bits after I set them

I'm looking for a character encoding that allows me to set a byte higher than 127. NSASCIICharacterEncoding and NSUTF8CharacterEncoding replace those higher values.
The character encoding only matters when you're trying to interpret the bytes as characters. If that's what you need to do, and if you're using data that comes from some outside source, then use whatever encoding the outside source used.
On the other hand, if you're just trying to manage a collection of bytes (i.e. not characters), then look into using NSData instead. NSData doesn't care about character encodings, doesn't change the order of your bytes, and will happily keep track of as much data as you give it. (There's a mutable version if you need to modify the data it contains.)
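The distinction between bytes and characters can be sketched outside Cocoa as well. The following Python snippet is only an analogy (bytes and bytearray stand in for NSData and its mutable variant): raw bytes keep values above 127 untouched, while encodings decide how characters map to bytes.

```python
raw = bytes([72, 105, 200, 255])  # two ASCII bytes plus two bytes above 127

# Kept as plain bytes, every value is preserved exactly.
print(list(raw))  # [72, 105, 200, 255]

# Interpreting the same bytes as ASCII fails, because ASCII stops at 127.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("not valid ASCII:", exc.reason)

# Going the other way, the encoding decides which bytes a character becomes.
text = chr(200)                # U+00C8
print(text.encode("utf-8"))    # b'\xc3\x88' (two bytes)
print(text.encode("latin-1"))  # b'\xc8'     (one byte, value 200)

# A mutable byte buffer can be modified in place, still with no encoding involved.
buf = bytearray(raw)
buf[0] = 250
print(list(buf))  # [250, 105, 200, 255]
```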

Trailing Ampersand in VB.NET hexadecimal?

This should be an easy one for folks. Google's got nothing except content farms linking to one blurb, and that's written in broken English. So let's get this cleared up here where it'll be entombed for all time.
What's the trailing ampersand on VB hexadecimal numbers for? I've read it forces conversion to an Int32 on the chance VB wants to try and store as an Int16. That makes sense to me. But the part I didn't get from the blurb was to always use the trailing ampersand for bitmasks, flags, enums, etc. Apparently, it has something to do with overriding VB's fetish for using signed numbers for things internally, which can lead to weird results in comparisons.
So to get easy points, what are the rules for VB.Net hexadecimal numbers, with and without the trailing ampersand? Please include the specific usage in the case of bitmasks/flags and such, and how one would also use it to force signed vs. unsigned.
No C# please :)
VB.NET will regard "&H"-notation hex constants in the range 0x80000000-0xFFFFFFFF as negative numbers unless the type is explicitly specified as UInt32, Int64, or UInt64. Such behavior might be understandable if the numbers were written with precisely eight digits following the "&H", but for some reason I cannot fathom, VB.NET behaves that way even if the numbers are written with leading zeroes. In present versions of VB, one may force the number to be evaluated correctly by using a suffix of "&" (Int64), "L" (Int64), "UL" (UInt64), or "UI" (UInt32). In earlier versions of VB, the "problem range" was 0x8000-0xFFFF, and the only way to force numbers in that range to be evaluated correctly (as a 32-bit integer, which was then called a "Long") was a trailing ampersand.
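The sign flip described above is ordinary two's-complement interpretation of a fixed-width value. The sketch below uses Python rather than VB, purely to show the arithmetic; as_signed is a hypothetical helper, not a VB or .NET function.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # top bit set means negative
        value -= 1 << bits
    return value


# Read as a 32-bit signed integer, the "problem range" goes negative.
print(as_signed(0x7FFFFFFF, 32))  # 2147483647
print(as_signed(0x80000000, 32))  # -2147483648
print(as_signed(0xFFFFFFFF, 32))  # -1

# Read as a 64-bit signed integer, the same digits stay positive.
print(as_signed(0xFFFFFFFF, 64))  # 4294967295

# The older 16-bit problem range behaves the same way.
print(as_signed(0x8000, 16))      # -32768

# The bit pattern of a mask is unchanged, but ordered comparisons are affected.
mask = as_signed(0x80000000, 32)
print(mask & 0xFFFFFFFF == 0x80000000)  # True
print(mask > 0)                         # False
```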
Visual Basic has the concept of Type Characters. These can be used to modify variable declarations and literals, although I'd not recommend using them in variable declarations - most developers are more familiar these days with As. E.g. the following declarations are equivalent:
Dim X&
Dim X As Long
But personally, I find the second more readable. If I saw the first, I'd actually have to go visit the link above, or use Intellisense, to work out what the variable is (not good if looking at the code on paper).

Objective-C How to get unicode character

I want to get the Unicode code point for a given Unicode character in Objective-C. The NSString documentation says it uses UTF-16 encoding internally, and also says:
The NSString class has two primitive methods—length and characterAtIndex:—that provide the basis for all other methods in its interface. The length method returns the total number of Unicode characters in the string. characterAtIndex: gives access to each character in the string by index, with index values starting at 0.
That seems to assume the characterAtIndex: method is Unicode-aware. However, it returns a unichar, which is a 16-bit unsigned integer type.
- (unichar)characterAtIndex:(NSUInteger)index
The questions are:
Q1: How does it represent Unicode code points above U+FFFF?
Q2: If Q1 makes sense, is there a method to get the Unicode code point for a given Unicode character in Objective-C?
The short answer to Q1 is: you need to be UTF-16 aware and correctly handle surrogate code points. The information and links below should give you pointers and example code for doing this.
The NSString documentation is correct. However, rather than saying that NSString uses UTF-16 encoding internally, it's more accurate to say that the public / abstract interface for NSString is UTF-16 based. The difference is that this leaves the internal representation of a string a private implementation detail, but the public methods such as characterAtIndex: and length always work in UTF-16.
The reason for this is that it tends to strike the best balance between older ASCII-centric and Unicode-aware strings, largely due to the fact that Unicode is a strict superset of ASCII (ASCII uses 7 bits, for 128 characters, which are mapped to the first 128 Unicode code points).
To represent Unicode code points that are > U+FFFF, which exceeds what can be represented in a single UTF-16 code unit, UTF-16 uses two special surrogate code units (a high surrogate followed by a low surrogate) that together form a surrogate pair encoding one code point > U+FFFF. You can find details about this at:
Unicode UTF FAQ - What are surrogates?
Unicode UTF FAQ - What’s the algorithm to convert from UTF-16 to character codes?
Although the official Unicode UTF FAQ - How do I write a UTF converter? now recommends the use of International Components for Unicode, it used to recommend some code officially sanctioned and maintained by Unicode. Although no longer directly available from Unicode.org, you can still find copies of the "no longer official" example code in various open-source projects: ConvertUTF.c and ConvertUTF.h. If you need to roll your own, I'd strongly recommend examining this code first, as it is well tested.
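The surrogate-pair arithmetic itself is short. Here is a minimal sketch in Python (not Objective-C, and not the ConvertUTF code mentioned above); the function names are made up for illustration, and the constants are the ones defined by UTF-16.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000    # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                  # 0xd83d 0xde00
print(hex(from_surrogate_pair(high, low)))  # 0x1f600
```

In Objective-C terms, high and low correspond to two consecutive unichar values returned by characterAtIndex:.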
From the documentation of length:
The number returned includes the individual characters of composed character sequences, so you cannot use this method to determine if a string will be visible when printed or how long it will appear.
From this, I would infer that any characters above U+FFFF would be counted as two characters and would be encoded as a Surrogate Pair (see the relevant entry at http://unicode.org/glossary/).
If you have a UTF-32 encoded string with the character you wish to convert, you could create a new NSString with initWithBytesNoCopy:length:encoding:freeWhenDone: and use the result of that to determine how the character is encoded in UTF-16, but if you're going to be doing much heavy Unicode processing, your best bet is probably to get familiar with ICU (http://site.icu-project.org/).
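The UTF-32 round trip suggested above can be tried out quickly in Python, as a rough stand-in for the NSString calls. utf16_code_units is a hypothetical helper that lists the 16-bit units a UTF-16 based API would expose, so its length mirrors what a UTF-16 based length method would report.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


# One code point above U+FFFF, supplied as four UTF-32 bytes.
utf32_bytes = (0x1F600).to_bytes(4, "little")
text = utf32_bytes.decode("utf-32-le")

units = utf16_code_units(text)
print([hex(u) for u in units])  # ['0xd83d', '0xde00']
print(len(units))               # 2 code units for a single code point

# A character inside the Basic Multilingual Plane needs only one unit.
print(utf16_code_units("A"))    # [65]
```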