What does this notation mean in TLS documentation? - ssl

For example, looking at RFC 7301, which defines ALPN:
enum {
application_layer_protocol_negotiation(16), (65535)
} ExtensionType;
The (16) is the enum value to be used, but how should I read the (65535) part?
From the same document:
opaque ProtocolName<1..2^8-1>;
struct {
ProtocolName protocol_name_list<2..2^16-1>
} ProtocolNameList;
...how should I read the <1..2^8-1> and <2..2^16-1> parts?

The notation is described in https://www.rfc-editor.org/rfc/rfc8446.
For "enumerateds" (enums), see https://www.rfc-editor.org/rfc/rfc8446#section-3.5, which says that the value in brackets is the value of that enum member, and that the enum occupies as many octets as required by the highest documented value.
Thus, if you want to leave some room, you need an un-named enum member with a sufficiently high value.
One may optionally specify a value without its associated tag to force the width definition without defining a superfluous element.
In the following example, Taste will consume two bytes in the data stream but can only assume the values 1, 2, or 4.
enum { sweet(1), sour(2), bitter(4), (32000) } Taste;
For vectors, see https://www.rfc-editor.org/rfc/rfc8446#section-3.4. This says:
Variable-length vectors are defined by specifying a subrange of legal lengths, inclusively, using the notation <floor..ceiling>. When these are encoded, the actual length precedes the vector's contents in the byte stream. The length will be in the form of a number consuming as many bytes as required to hold the vector's specified maximum (ceiling) length.
So the notation <1..2^8-1> means that ProtocolName must be at least one octet, and up to 255 octets in length.
Similarly <2..2^16-1> means that protocol_name_list must have at least 2 octets (not entries), and can have up to 65535 octets (not entries).
In this particular case, the minimum of 2 octets is because it must contain at least one entry, which is itself at least 2 octets long (u8 length prefix, at least one octet in the value).
To make the octets/entries distinction clear, later in that section, it says:
uint16 longer<0..800>;
/* zero to 400 16-bit unsigned integers */

Related

How are strings written down in FlatBuffers

I am researching FlatBuffers file structure and I want to know how are strings written down. From what I could gather, the string orc (for example) is written down as letters count in little endian (0x3 0x0 0x0 0x0) followed by the actual letters and followed by something else. I am trying to understand what the something else is. What bytes follow the letters? I am only asking about the presentation of this specific string in buffer/file.
According to the FlatBuffer documentation:
"Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination). Neither is stored inline in their parent, but are referred to by offset. A vector may consist of more than one offset pointing to the same value if the user explicitly serializes the same offset twice."
Thus, the 4 bytes at the front are the 32 bit element count, and 0x3 0x0 0x0 0x0 would mean that there are 3 bytes in the string excluding zero termination. (FlatBuffer defaults to little-endian; see link above.)

In PDF format syntax should number 1.e10 be written as 10000000000.?

Looking at the PDF Referene ver 1.7 about how objects of type number
are writen according to valid syntax it informs.
Note: PDF does not support the PostScript syntax for numbers with
nondecimal radices (such as 16#FFFE ) or in exponential format (such
as 6.02E23 ).
However it also does not mandate a maximum range the numbers should be in. This seems to suggest it would be correct to write
1.00E10 as 10000000000
or
1.00E-50 as 0.00000000000000000000000000000000000000000000000001
This question has hence 2 aspects:
a) is the notation correct (as provided in the examples?
b) does pdf format expect implementations to use (or at least fall back
to some bigint/bigfloat handling) of numbers, as it seems to not provide
any range for the numbers?
First of all, for normative information on PDF you should refer to the appropriate ISO standards, in particular ISO 32000. Yes, Part 1 (ISO 32000-1) in particular is derived from the PDF reference 1.7 without that many changes, but not without changes either. (Ok, in some situations one has to consult the old PDF reference, too, to understand some of these changes.)
Adobe has published a copy thereof (with "ISO" in the page headers removed) on its web site: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
Now to your question:
According to ISO 32000, both part 1 and 2:
An integer shall be written as one or more decimal digits optionally preceded by a sign. [...]
A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).
(section 7.3.3 "Numeric Objects")
Thus, concerning your question a)
is the notation correct (as provided in the examples?
Yes, 10000000000 is an integer valued numeric object, 0.00000000000000000000000000000000000000000000000001 is a real valued numeric object.
Concerning your question b)
does pdf format expect implementations to use (or at least fall back to some bigint/bigfloat handling) of numbers, as it seems to not provide any range for the numbers?
No, in the same section as quoted above you also find
The range and precision of numbers may be limited by the internal representations used in the computer on which the conforming reader is running; Annex C gives these limits for typical implementations.
and Annex C recommends at least the following limits:
integer
2,147,483,647
Largest integer value; equal to 231 − 1.
integer
-2,147,483,648
Smallest integer value; equal to −231
real
±3.403 × 1038
Largest and smallest real values (approximate).
real
±1.175 × 10-38
Nonzero real values closest to 0 (approximate). Values closer than these are automatically converted to 0.
real
5
Number of significant decimal digits of precision in fractional part (approximate).
(ISO 32000-1)
Integers
Integer values (such as object numbers) can often be expressed within 32 bits.
Real numbers
Modern computers often represent and process real numbers using IEEE Standard for Floating-Point Arithmetic (IEEE 754) single or double precision.
(ISO 32000-2)

Does the "C" code algorithm in RFC1071 work well on big-endian machine?

As described in RFC1071, an extra 0-byte should be added to the last byte when calculating checksum in the situation of odd count of bytes:
But in the "C" code algorithm, only the last byte is added:
The above code does work on little-endian machine where [Z,0] equals Z, but I think there's some problem on big-endian one where [Z,0] equals Z*256.
So I wonder whether the example "C" code in RFC1071 only works on little-endian machine?
-------------New Added---------------
There's one more example of "breaking the sum into two groups" described in RFC1071:
We can just take the data here (addr[]={0x00, 0x01, 0xf2}) for example:
Here, "standard" represents the situation described in the formula [2], while "C-code" representing the C code algorithm situation.
As we can see, in "standard" situation, the final sum is f201 regardless of endian-difference since there's no endian-issue with the abstract form of [Z,0] after "Swap". But it matters in "C-code" situation because f2 is always the low-byte whether in big-endian or in little-endian.
Thus, the checksum is variable with the same data(addr&count) on different endian.
I think you're right. The code in the RFC adds the last byte in as low-order, regardless of whether it is on a litte-endian or big-endian machine.
In these examples of code on the web we see they have taken special care with the last byte:
https://github.com/sjaeckel/wireshark/blob/master/epan/in_cksum.c
and in
http://www.opensource.apple.com/source/tcpdump/tcpdump-23/tcpdump/print-ip.c
it does this:
if (nleft == 1)
sum += htons(*(u_char *)w<<8);
Which means that this text in the RFC is incorrect:
Therefore, the sum may be calculated in exactly the same way
regardless of the byte order ("big-endian" or "little-endian")
of the underlaying hardware. For example, assume a "little-
endian" machine summing data that is stored in memory in network
("big-endian") order. Fetching each 16-bit word will swap
bytes, resulting in the sum; however, storing the result
back into memory will swap the sum back into network byte order.
The following code in place of the original odd byte handling is portable (i.e. will work on both big- and little-endian machines), and doesn't depend on an external function:
if (count > 0)
{
char buf2[2] = {*addr, 0};
sum += *(unsigned short *)buf2;
}
(Assumes addr is char * or const char *).

Storing integers in a redis ordered set?

I have a system which deals with keys that have been turned into unsigned long integers (by packing short sequences into byte strings). I want to try storing these in Redis, and I want to do it in the best way possible. My concern is mainly memory efficiency.
From playing with the online REPL I notice that the two following are identical
zadd myset 1.0 "123"
zadd myset 1.0 123
This means that even if I know I want to store an integer, it has to be set as a string. I notice from the documentation that keys are just stored as char*s and that commands like SETBIT indicate that Redis is not averse to treating strings as bytestrings in the client. This hints at a slightly more efficient way of storing unsigned longs than as their string representation.
What is the best way to store unsigned longs in sorted sets?
Thanks to Andre for his answer. Here are my findings.
Storing ints directly
Redis keys must be strings. If you want to pass an integer, it has to be some kind of string. For small, well-defined sets of values, Redis will parse the string into an integer, if it is one. My guess is that it will use this int to tailor its hash function (or even statically dimension a hash table based on the value). This works for small values (examples being the default values of 64 entries of a value of up to 512). I will test for larger values during my investigation.
http://redis.io/topics/memory-optimization
Storing as strings
The alternative is squashing the integer so it looks like a string.
It looks like it is possible to use any byte string as a key.
For my application's case it actually didn't make that much difference storing the strings or the integers. I imagine that the structure in Redis undergoes some kind of alignment anyway, so there may be some pre-wasted bytes anyway. The value is hashed in any case.
Using Python for my testing, so I was able to create the values using the struct.pack. long longs weigh in at 8 bytes, which is quite large. Given the distribution of integer values, I discovered that it could actually be advantageous to store the strings, especially when coded in hex.
As redis strings are "Pascal-style":
struct sdshdr {
long len;
long free;
char buf[];
};
and given that we can store anything in there, I did a bit of extra Python to code the type into the shortest possible type:
def do_pack(prefix, number):
"""
Pack the number into the best possible string. With a prefix char.
"""
# char
if number < (1 << 8*1):
return pack("!cB", prefix, number)
# ushort
elif number < (1 << 8*2):
return pack("!cH", prefix, number)
# uint
elif number < (1 << 8*4):
return pack("!cI", prefix, number)
# ulonglong
elif number < (1 << 8*8):
return pack("!cQ", prefix, number)
This appears to make an insignificant saving (or none at all). Probably due to struct padding in Redis. This also drives Python CPU through the roof, making it somewhat unattractive.
The data I was working with was 200000 zsets of consecutive integer => (weight, random integer) × 100, plus some inverted index (based on random data). dbsize yields 1,200,001 keys.
Final memory use of server: 1.28 GB RAM, 1.32 Virtual. Various tweaks made a difference of no more than 10 megabytes either way.
So my conclusion:
Don't bother encoding into fixed-size data types. Just store the integer as a string, in hex if you want. It won't make all that much difference.
References:
http://docs.python.org/library/struct.html
http://redis.io/topics/internals-sds
I'm not sure of this answer, it's more of a suggestion than anything else. I'd have to give it a try and see if it works.
As far as I can tell, Redis only supports UTF-8 strings.
I would suggest grabbing a bit representation of your long integer and pad it accordingly to fill up the nearest byte. Encode each set of 8 bytes to a UTF-8 string (ending up with 8x*utf8_char* string) and store that in Redis. The fact that they're unsigned means that you don't care about that first bit but if you did, you could add a flag to the string.
Upon retrieving the data, you have to remember to pad each character to 8 bytes again as UTF-8 will use less bytes for the representation if the character can be stored with less bytes.
End result is that you store a maximum of 8 x 8 byte characters instead of (possibly) a maximum of 64 x 8 byte characters.

Should there be a difference between an empty BSTR and a NULL BSTR?

When maintaining a COM interface should an empty BSTR be treated the same way as NULL?
In other words should these two function calls produce the same result?
// Empty BSTR
CComBSTR empty(L""); // Or SysAllocString(L"")
someObj->Foo(empty);
// NULL BSTR
someObj->Foo(NULL);
Yes - a NULL BSTR is the same as an empty one. I remember we had all sorts of bugs that were uncovered when we switched from VS6 to 2003 - the CComBSTR class had a change to the default constructor that allocated it using NULL rather than an empty string. This happens when you for example treat a BSTR as a regular C style string and pass it to some function like strlen, or try to initialise a std::string with it.
Eric Lippert discusses BSTR's in great detail in Eric's Complete Guide To BSTR Semantics:
Let me list the differences first and
then discuss each point in
excruciating detail.
A BSTR must have identical
semantics for NULL and for "". A PWSZ
frequently has different semantics for
those.
A BSTR must be allocated and freed
with the SysAlloc* family of
functions. A PWSZ can be an
automatic-storage buffer from the
stack or allocated with malloc, new,
LocalAlloc or any other memory
allocator.
A BSTR is of fixed length. A PWSZ
may be of any length, limited only by
the amount of valid memory in its
buffer.
A BSTR always points to the first
valid character in the buffer. A PWSZ
may be a pointer to the middle or end
of a string buffer.
When allocating an n-byte BSTR you
have room for n/2 wide characters.
When you allocate n bytes for a PWSZ
you can store n / 2 - 1 characters --
you have to leave room for the null.
A BSTR may contain any Unicode data
including the zero character. A PWSZ
never contains the zero character
except as an end-of-string marker.
Both a BSTR and a PWSZ always have a
zero character after their last valid
character, but in a BSTR a valid
character may be a zero character.
A BSTR may actually contain an odd
number of bytes -- it may be used for
moving binary data around. A PWSZ is
almost always an even number of bytes
and used only for storing Unicode
strings.
The easiest way to handle this dilemma is to use CComBSTR and check for .Length() to be zero. That works for both empty and NULL values.
However, keep in mind, empty BSTR must be released or there will be a memory leak. I saw some of those recently in other's code. Quite hard to find, if you are not looking carefully.