What is the space taken by bool in Google Protocol Buffers? - serialization

message Person {
    optional bool foo = 1;
    optional bool bar = 2;
}
In the serialized form, how much space does a bool take in Google protobuf?

A bool is encoded as a varint with value 0 or 1, so the payload takes 1 byte. The field header size depends on the field number; for fields 1 and 2, it is 1 byte. So overall: 2 bytes per field. If you are storing lots of bools, consider packing them bitwise into a single integer field, perhaps using a fixed-width type (fixed32 etc.) if the high bits are likely to be set (large-magnitude numbers are relatively expensive to encode as varints).
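As a rough illustration of the byte counts above, here is a minimal Python sketch that hand-encodes a bool field using the varint wire format. The function names (encode_varint, encode_bool_field, pack_bools) are made up for this example and are not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_bool_field(field_number: int, flag: bool) -> bytes:
    """Field header (field number << 3 | wire type 0), then the 0/1 payload."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(1 if flag else 0)


def pack_bools(flags) -> int:
    """Pack a sequence of bools into one integer, first flag in bit 0."""
    packed = 0
    for bit, flag in enumerate(flags):
        if flag:
            packed |= 1 << bit
    return packed


print(encode_bool_field(1, True).hex())    # 0801 (2 bytes)
print(len(encode_bool_field(16, True)))    # 3 (the header needs 2 bytes from field 16 on)
print(pack_bools([True, False, True]))     # 5
```

The header grows to two bytes once the field number reaches 16, which is one more reason to pack many flags into a single integer field.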

Twincat 3 - SizeOf returning wrong structure size

I have a structure, and am trying to get the size of this structure. SizeOf returns 16, but I am expecting 14 as answer.
2+2+4+2+2+2=14
By using pointers I noticed that there are 2 empty bytes at the end of the structure.
If I replace the UDINT with UINT then the size is correct. If I put the UDINT at the end of the structure, then the two empty bytes are placed after iCrateCnt.
This leads me to believe that the sizeOf is working properly, but for some unknown reason there are two additional bytes placed somewhere in my structure that I am not using.
Why is this happening and how can it be solved?
The unexpected size returned by SIZEOF() is due to so-called 'padding bytes'.
Where these padding bytes occur depends on:
The system that is used (Tc2 x86, Tc2 ARM, Tc3)
The data types that are used
The order in which these data types (or rather, variables) are defined
For more information about padding bytes, see Alignment and Structures.
As Kolyur has rightly mentioned, the attribute pack_mode can be used to control these padding bytes.
For example in Tc3:
TYPE HMI_POPUPSTRUCT : // The total size of this struct is 8 bytes
STRUCT
    bVar1: BOOL; // At byte 0
    // At byte 1 there will be a padding byte
    bVar2: INT; // At bytes 2 and 3
    bVar3: BOOL; // At byte 4
    bVar4: BOOL; // At byte 5
    bVar5: BOOL; // At byte 6
    // At byte 7 there will be a padding byte (the 8th byte)
END_STRUCT
END_TYPE
When inserting either
{attribute 'pack_mode' := '0'}
or
{attribute 'pack_mode' := '1'}
just above the struct, there won't be any padding bytes, resulting in a struct size of 6 bytes instead of 8.
The pack_mode attribute can be used to eliminate unused bytes in a structure.
https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2529746059.html&id=3686945105176987925
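The same effect can be reproduced outside TwinCAT. The following is only an analogy, sketched with Python's struct module: its native alignment follows the host C compiler rather than the TwinCAT runtime, and the format codes merely stand in for BOOL (1 byte) and INT (2 bytes).

```python
import struct

# b = 1-byte field (stands in for BOOL), h = 2-byte field (stands in for INT).
FIELDS = "bhbbb"

# "@" uses native alignment. The trailing "0h" pads the end of the struct to a
# 2-byte boundary, as a compiler does when the largest member is 2 bytes wide.
padded_size = struct.calcsize("@" + FIELDS + "0h")

# "=" uses standard sizes with no alignment, comparable in effect to pack_mode 1.
packed_size = struct.calcsize("=" + FIELDS)

print(padded_size, packed_size)
```

On common desktop platforms this reports 8 bytes with alignment and 6 bytes without, matching the struct sizes described above.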

How to read 8-byte integers in GMS 2.x?

I need to read 8-byte integers from a stream. I could not find any documentation on how to read 8-byte integers in DM. It would be something similar to a long long integer.
Is there a trick for streaming 8-byte integers from a file in GMS 2.x?
We can use the "Stream" object to read/import data of various kinds. Please refer to DM Help > Scripting > File Input and Output.
Other examples can also be found in the DM-Script-Database:
Read-Ser (http://donation.tugraz.at/dm/source_codes/127)
JEMS_.ems file reader (http://donation.tugraz.at/dm/source_codes/108)
Hope this helps.
I used the following (stupid) method to do so:
number readint32(object s){
    number stream_byte_order=2
    number result=0
    TagGroup tg = NewTagGroup();
    tg.TagGroupSetTagAsLong( "SInt32_0", 0 )
    TagGroupReadTagDataFromStream( tg, "SInt32_0", s, stream_byte_order );
    tg.TagGroupGetTagAsLong( "SInt32_0", result)
    return result
}
number readint64(object s){
    //new for reading 8-byte integer in TIA ver >3.7
    //DM automatically converts the result to float when the second 4-byte word is >1
    //note: both 4-byte words are read as signed 32-bit values
    number result = readint32(s)+ (readint32(s)*4294967296)
    // 4294967296 is 2^32, i.e. 0x100000000 in hex
    return result
}
It works for reading .ser files < 2 GB, but not for larger files. I still have not figured it out...
#09-04-2016
Now I have a solution to the data offset problem in .ser files:
Void b_readint64(object s, number &lo, number &hi){
    //new for reading 8-byte (64-bit) integer in TIA ver >3.7
    //read the low and high sections individually and later use them
    //together with the StreamSetPos32Signed and StreamSetPos64 functions
    lo = b_readint32(s)
    hi = b_readint32(s)
}
Void StreamSetPos32Signed(object s, number base, number lo){
    //a negative lo is an unsigned 32-bit offset that was read as signed
    if (lo>=0) StreamSetPos(s, base, lo)
    else StreamSetPos(s, base, 4294967296+lo)
}
Void StreamSetPos64(object s, number base, number lo, number hi){
    if (hi!=0){
        StreamSetPos(s, base, 0)
        //advance by 2^32 bytes for each unit of the high word
        for (number i=0; i<hi; i++) StreamSetPos(s, 1, 4294967296)
        StreamSetPos32Signed(s, 1, lo)
    } else StreamSetPos32Signed(s, base, lo)
}
BTW, I just uploaded this upgraded script to
http://portal.tugraz.at/portal/page/portal/felmi/DM-Script/DM-Script-Database
There is nothing like an 8-byte integer in DigitalMicrograph. You can use streaming to read in two successive 4-byte sections as integers (see the answer above) and then display them as binary using binary() or as hexadecimal using hex(), but you will have to do the maths yourself for the "meaning" of the 8-byte integer (storing it as a real number). You can use the binary operators & | ^ for bitwise arithmetic when needed.
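To make "doing the maths yourself" concrete, here is a small sketch of the arithmetic in Python rather than DM script. It assumes the 8 bytes are stored little-endian as two 32-bit words, low word first, and that both words were read as signed values; the helper names are hypothetical.

```python
import io
import struct

TWO_32 = 4294967296  # 2**32, i.e. 0x100000000


def combine_words(lo: int, hi: int) -> int:
    """Combine two signed 32-bit words (low word first) into one 64-bit value."""
    if lo < 0:
        lo += TWO_32  # reinterpret the low word as unsigned
    return hi * TWO_32 + lo


def read_int64(stream) -> int:
    """Read 8 bytes as two little-endian signed 32-bit words and combine them."""
    lo, hi = struct.unpack("<ii", stream.read(8))
    return combine_words(lo, hi)


# An offset above 2**31: the low word comes back negative when read as signed.
demo = io.BytesIO(struct.pack("<q", 3_000_000_000))
print(read_int64(demo))  # 3000000000
```

Without the correction for a negative low word, any value whose low 32 bits have the top bit set comes out 2^32 too small, which would be consistent with trouble beyond the 2 GB mark.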

Most common sequence of characters in a given string

Suppose I am given a string of characters. How do I find the most common sequence of characters with a minimum length of l?
The programming language doesn't matter, but it should work with a string of 1000+ characters on an ordinary computer.
You have to find all possible sequences and count them. That is,
for (each position in string) {
    length = minimum length;
    while (position + length <= string.length and length <= sequence limit) {
        sequence = (string from position to position + length);
        count sequence locations in string;
        if (count is higher than max count) {
            remember sequence;
            update max count;
        }
        length++;
    }
}
The same sequence may be encountered at different places in the string, so it will be counted redundantly. This is harmless but costs some extra cycles. One way to avoid it is to store the sequences already found and skip those that have been checked. But the memory requirements for long strings and long sequences may become huge.
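A minimal Python sketch of the store-and-count variant described above, assuming overlapping occurrences count and that an upper bound on the sequence length is given. The function name and parameters are chosen for illustration only.

```python
from collections import Counter


def most_common_sequence(text: str, min_len: int, max_len: int):
    """Return (sequence, count) for the most frequent substring whose length
    lies between min_len and max_len; overlapping occurrences are counted."""
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(text) - length + 1):
            counts[text[start:start + length]] += 1
    if not counts:
        return None  # the text is shorter than min_len
    return counts.most_common(1)[0]


print(most_common_sequence("abcabcab", 2, 4))  # ('ab', 3)
```

Each substring is counted once per occurrence in a single pass, so nothing is rescanned, at the cost of keeping every distinct substring in memory.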

Dealing with Int64 value with Booksleeve

I have a question about Marc Gravell's Booksleeve library.
I tried to understand how Booksleeve deals with Int64 values (I actually have billions of long values in Redis).
I used reflection to understand the Set overloads for long values.
// BookSleeve.RedisMessage
protected static void WriteUnified(Stream stream, long value)
{
    if (value >= 0L && value <= 99L)
    {
        int i = (int)value;
        if (i <= 9)
        {
            stream.Write(RedisMessage.oneByteIntegerPrefix, 0, RedisMessage.oneByteIntegerPrefix.Length);
            stream.WriteByte((byte)(48 + i));
        }
        else
        {
            stream.Write(RedisMessage.twoByteIntegerPrefix, 0, RedisMessage.twoByteIntegerPrefix.Length);
            stream.WriteByte((byte)(48 + i / 10));
            stream.WriteByte((byte)(48 + i % 10));
        }
    }
    else
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value.ToString());
        stream.WriteByte(36);
        RedisMessage.WriteRaw(stream, (long)bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
    }
    stream.Write(RedisMessage.Crlf, 0, 2);
}
I don't understand why, for an Int64 with more than two digits, the long is encoded in ASCII.
Why not use byte[]? I know that I can use the byte[] overloads to do this, but I just want to understand this implementation in order to optimize mine. There may be a relationship with the Redis storage.
Thanks in advance, Marc :)
P.S.: I'm still very enthusiastic about your next major version, in which I can use long values as keys instead of strings.
It writes it in ASCII because that is what the redis protocol demands.
If you look carefully, it is always encoded as ASCII - but for the most common cases (0-9, 10-99) I've special-cased it, as these are very simple results:
x => $1\r\nX\r\n
xy => $2\r\nXY\r\n
where x and y are the digits of a number in the range 0-99, and X and Y are those digits (as numbers) offset by 48 ('0') - so decimal 17 becomes the byte sequence (in hex):
24-32-0D-0A-31-37-0D-0A
Of course, that can also be achieved simply by writing each digit sequentially, offsetting the digit value by 48 ('0'), and handling the negative sign - I guess the answer there is simply "because I coded it the simple but obviously correct way". Consider the value -123, which is encoded as $4\r\n-123\r\n (hey, don't look at me - I didn't design the protocol). It is slightly awkward because it needs to calculate the buffer length first, then write that buffer length, then write the value - remembering to write in the order 100s, 10s, 1s (which is much harder than writing the other way around).
Perfectly willing to revisit it - simply: it works.
Of course, it becomes trivial if you have a scratch buffer available - you just write it in the simple order, then reverse the portion of the scratch buffer. I'll check to see if one is available (and if not, it wouldn't be unreasonable to add one).
I should also clarify: there is also the integer type, which would encode -123 as :-123\r\n - however, from memory there are a lot of places this simply does not work.
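For illustration, the bulk-string encoding and the "write the digits in the easy order, then reverse the scratch buffer" idea might look like this in Python. This is a sketch of the wire format only, not Booksleeve's code; both function names are invented for the example.

```python
def ascii_digits(value: int) -> bytes:
    """ASCII decimal form of value, built least-significant digit first in a
    scratch buffer and then reversed."""
    scratch = bytearray()
    n = abs(value)
    while True:
        scratch.append(48 + n % 10)  # 48 is ASCII '0'
        n //= 10
        if n == 0:
            break
    if value < 0:
        scratch.append(ord("-"))  # sign goes last so it ends up first
    scratch.reverse()
    return bytes(scratch)


def encode_bulk_integer(value: int) -> bytes:
    """Encode an integer as a bulk string: $<length>\\r\\n<digits>\\r\\n."""
    digits = ascii_digits(value)
    return b"$" + ascii_digits(len(digits)) + b"\r\n" + digits + b"\r\n"


print(encode_bulk_integer(17).hex("-").upper())  # 24-32-0D-0A-31-37-0D-0A
print(encode_bulk_integer(-123))                 # b'$4\r\n-123\r\n'
```

Building the digits first is what makes the length prefix easy to write: the length of the scratch buffer is known before anything goes to the stream.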

Storing integers in a redis ordered set?

I have a system which deals with keys that have been turned into unsigned long integers (by packing short sequences into byte strings). I want to try storing these in Redis, and I want to do it in the best way possible. My concern is mainly memory efficiency.
From playing with the online REPL I notice that the following two are identical:
zadd myset 1.0 "123"
zadd myset 1.0 123
This means that even if I know I want to store an integer, it has to be set as a string. I notice from the documentation that keys are just stored as char*s and that commands like SETBIT indicate that Redis is not averse to treating strings as bytestrings in the client. This hints at a slightly more efficient way of storing unsigned longs than as their string representation.
What is the best way to store unsigned longs in sorted sets?
Thanks to Andre for his answer. Here are my findings.
Storing ints directly
Redis keys must be strings. If you want to pass an integer, it has to be some kind of string. For small, well-defined sets of values, Redis will parse the string into an integer, if it is one. My guess is that it will use this int to tailor its hash function (or even statically dimension a hash table based on the value). This works for small values (examples being the default values of 64 entries of a value of up to 512). I will test for larger values during my investigation.
http://redis.io/topics/memory-optimization
Storing as strings
The alternative is squashing the integer so it looks like a string.
It looks like it is possible to use any byte string as a key.
For my application's case it actually didn't make that much difference storing the strings or the integers. I imagine that the structure in Redis undergoes some kind of alignment anyway, so there may be some pre-wasted bytes anyway. The value is hashed in any case.
I was using Python for my testing, so I was able to create the values using struct.pack. long longs weigh in at 8 bytes, which is quite large. Given the distribution of integer values, I discovered that it could actually be advantageous to store the strings, especially when coded in hex.
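A quick way to see why the string forms can win is to compare lengths directly. This sketch uses a hypothetical helper, encoded_sizes, and counts only the payload bytes, not any per-value overhead inside Redis.

```python
from struct import pack


def encoded_sizes(number: int) -> dict:
    """Length in bytes of three encodings of the same unsigned integer."""
    return {
        "decimal": len(str(number)),          # e.g. "255"
        "hex": len(format(number, "x")),      # e.g. "ff"
        "packed": len(pack("!Q", number)),    # fixed 8-byte unsigned long long
    }


print(encoded_sizes(255))        # {'decimal': 3, 'hex': 2, 'packed': 8}
print(encoded_sizes(2**64 - 1))  # {'decimal': 20, 'hex': 16, 'packed': 8}
```

The fixed 8-byte form only pays off for values near the top of the 64-bit range, so with a distribution skewed towards small numbers the hex string is usually shorter.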
As redis strings are "Pascal-style":
struct sdshdr {
    long len;
    long free;
    char buf[];
};
and given that we can store anything in there, I wrote a bit of extra Python to encode the number into the shortest possible type:
from struct import pack

def do_pack(prefix, number):
    """
    Pack the number into the shortest possible string, with a prefix char.
    prefix must be a single character (a length-1 bytes object in Python 3).
    """
    # uchar
    if number < (1 << 8*1):
        return pack("!cB", prefix, number)
    # ushort
    elif number < (1 << 8*2):
        return pack("!cH", prefix, number)
    # uint
    elif number < (1 << 8*4):
        return pack("!cI", prefix, number)
    # ulonglong
    elif number < (1 << 8*8):
        return pack("!cQ", prefix, number)
    # anything larger does not fit in an unsigned 64-bit integer
    raise ValueError("number does not fit in 64 bits")
This appears to make an insignificant saving (or none at all), probably due to struct padding in Redis. It also drives Python's CPU usage through the roof, making it somewhat unattractive.
The data I was working with was 200000 zsets of consecutive integer => (weight, random integer) × 100, plus some inverted index (based on random data). dbsize yields 1,200,001 keys.
Final memory use of server: 1.28 GB RAM, 1.32 Virtual. Various tweaks made a difference of no more than 10 megabytes either way.
So my conclusion:
Don't bother encoding into fixed-size data types. Just store the integer as a string, in hex if you want. It won't make all that much difference.
References:
http://docs.python.org/library/struct.html
http://redis.io/topics/internals-sds
I'm not sure of this answer, it's more of a suggestion than anything else. I'd have to give it a try and see if it works.
As far as I can tell, Redis only supports UTF-8 strings.
I would suggest grabbing a bit representation of your long integer and padding it accordingly to fill up the nearest byte. Encode each set of 8 bytes to a UTF-8 string (ending up with 8x*utf8_char* string) and store that in Redis. The fact that they're unsigned means that you don't care about that first bit, but if you did, you could add a flag to the string.
Upon retrieving the data, you have to remember to pad each character to 8 bytes again, as UTF-8 will use fewer bytes for the representation if the character can be stored in fewer bytes.
End result is that you store a maximum of 8 x 8 byte characters instead of (possibly) a maximum of 64 x 8 byte characters.