Store a single byte in a protobuf message - serialization

What data type do I use to store a single byte in a protocol buffer message? Seeing the list at https://developers.google.com/protocol-buffers/docs/proto#scalar it seems like one of the *int32 types are the best fit. Is there a more efficient way to store a single byte?

Well you need to understand that it will take at least two bytes anyway - one for the tag and one for the data. (The tag will take more space if the field number is high.) If you use uint32, it will take 1 byte for the data for values up to 127, and 2 bytes for anything larger.
I don't believe there's anything that will be more efficient than that.

Related

Elm: Search Number in Bytes

I'm trying to find some exif data in an image.
So first I need to find the number 0x45786966 ('Exif' as unsignedInt32) and store the offset.
The next two bytes should be zeros and after that the endianness as unsignedInt16 (either 0x4d4d or 0x4949) which should be stored too.
I can get the image as Bytes with the elm/file module.
But how do I search the 'Exif' start and parse the endianness in those Bytes?
I looked at the loop-example from elm/bytes but do not fully understand it.
First it reads the length of a list (unsignedInt32) and then it reads byte by byte?
How would this work if I want to read unsignedInt32s instead of bytes?
How do I set an offset to indicate where functions like unsignedInt32 should read next?
The example is talking about structured data with a known size field at the start. In your case, what you want to do is a search, so it is a rather different problem.
The problem is elm/bytes isn't really designed to handle searching. If you can guarantee the part you are looking for will be byte aligned, it may well be possible to do this, but given just what you have said, there isn't an easy way, as you can't iterate bit-by-bit.
You would have to read in values without alignment and then manually search for the part of the number you want within that. Given the difficulty and inefficiency of that approach, I would recommend using ports instead for that use case.
If you can guarantee that what you are searching for will be byte-aligned (or better yet, aligned to the length of your number), you can decode a byte at a time until you find what you are looking for. There is no way to read from a given offset, if you want to read to a certain point, you'd need to read and throw away values.
To do this, you would want to set up a loop where your state contains how much of the value you are looking for you have found. Each step, you check if you have the whole thing (success), you have the next part (continue), or you have something different (reset the state to search from the start again). If you reach the end without finding it, you have failed.

Writing bitarray to file vb.net

doing this in byte is easy with filestream, but I cant get it to work with a bitarray.
I want to develop file compression algorithms, just as a hobby.
The method in question checks if the combined occurence of three combination of bytes occur enough times for the filesize to benefit from adding two bits at the beginning of every byte, as to whether the next byte is one of the three pre-stored bytes, if both bits are zero it assumes no, and just continues reading the file. now writing another byte for each byte would make all of this redundant.
If someone could tell me how to do this, I would much appreciate it.
As I want to do it, it is not possible. Extra bits are inevitable as the smallest unit which by the hard drive can be written to is a byte. the extra bits have to be dealt with in the logic of decompression. a bulletproof solution is to add some extra bits at the beginning of the file (or wherever the metadata is stored for the filecompression) which represent the cutoff in the last byte. 3 bits is enough as you can represent the number 7 with it. (obviously there isn't a whole extra unnecessery byte, therefor if the compressed file is still divisible by eight these three bits should be 000 or numerically a zero, 8 does not need to be written) this way once the last byte is being read the program should ignore as many bits as the three bits equal to.

Is varchar(128) better than varchar(100)

Quick question. Does it matter from the point of storing data if I will use decimal field limits or hexadecimal (say 16,32,64 instead of 10,20,50)?
I ask because I wonder if this will have anything to do with clusters on HDD?
Thanks!
VARCHAR(128) is better than VARCHAR(100) if you need to store strings longer than 100 bytes.
Otherwise, there is very little to choose between them; you should choose the one that better fits the maximum length of the data you might need to store. You won't be able to measure the performance difference between them. All else apart, the DBMS probably only stores the data you send, so if your average string is, say, 16 bytes, it will only use 16 (or, more likely, 17 - allowing 1 byte for storing the length) bytes on disk. The bigger size might affect the calculation of how many rows can fit on a page - detrimentally. So choosing the smallest size that is adequate makes sense - waste not, want not.
So, in summary, there is precious little difference between the two in terms of performance or disk usage, and aligning to convenient binary boundaries doesn't really make a difference.
If it would be a C-Program I'd spend some time to think about that, too. But with a database I'd leave it to the DB engine.
DB programmers spent a lot of time in thinking about the best memory layout, so just tell the database what you need and it will store the data in a way that suits the DB engine best (usually).
If you want to align your data, you'll need exact knowledge of the internal data organization: How is the string stored? One, two or 4 bytes to store the length? Is it stored as plain byte sequence or encoded in UTF-8 UTF-16 UTF-32? Does the DB need extra bytes to identify NULL or > MAXINT values? Maybe the string is stored as a NUL-terminated byte sequence - then one byte more is needed internally.
Also with VARCHAR it is not neccessary true, that the DB will always allocate 100 (128) bytes for your string. Maybe it stores just a pointer to where space for the actual data is.
So I'd strongly suggest to use VARCHAR(100) if that is your requirement. If the DB decides to align it somehow there's room for extra internal data, too.
Other way around: Let's assume you use VARCHAR(128) and all things come together: The DB allocates 128 bytes for your data. Additionally it needs 2 bytes more to store the actual string length - makes 130 bytes - and then it could be that the DB aligns the data to the next (let's say 32 byte) boundary: The actual data needed on the disk is now 160 bytes 8-}
Yes but it's not that simple. Sometimes 128 can be better than 100 and sometimes, it's the other way around.
So what is going on? varchar only allocates space as necessary so if you store hello world in a varchar(100) it will take exactly the same amount of space as in a varchar(128).
The question is: If you fill up the rows, will you hit a "block" limit/boundary or not?
Databases store their data in blocks. These have a fixed size, for example 512 (this value can be configured for some databases). So the question is: How many blocks does the DB have to read to fetch each row? Rows that span several block will need more I/O, so this will slow you down.
But again: This doesn't depend on the theoretical maximum size of the columns but on a) how many columns you have (each column needs a little bit of space even when it's empty or null), b) how many fixed width columns you have (number/decimal, char), and finally c) how much data you have in variable columns.

[My]SQL VARCHAR Size and Null-Termination

Disclaimer: I'm very new to SQL and databases in general.
I need to create a field that will store a maximum of 32 characters of text data. Does "VARCHAR(32)" mean that I have exactly 32 characters for my data? Do I need to reserve an extra character for null-termination?
I conducted a simple test and it seems that this is a WYSIWYG buffer. However, I wanted to get a concrete answer from people who actually know what they're doing.
I have a C[++] background, so this question is raising alarm bells in my head.
Yes, you have 32 characters at your disposal. SQL does not concern itself with nul terminated strings like some programming languages do.
Your VARCHAR specification size is the max size of your data, so in this case, 32 characters. However, VARCHARS are a dynamic field, so the actual physical storage used is only the size of your data, plus one or two bytes.
If you put a 10-character string into a VARCHAR(32), the physical storage will be 11 or 12 bytes (the manual will tell you the exact formula).
However, when MySQL is dealing with result sets (ie. after a SELECT), 32 bytes will be allocated in memory for that field for every record.

in sql,How does fixed-length data type take place in memory?

I want to know in sql,how fixed-length data type take places length in memory?I know is that for varchar,if we specify length is (20),and if user input length is 15,it takes 20 by setting space.for varchar2,if we specify length is (20),and if user input is 15,it only take 15 length in memory.So how about fixed-length data type take place?I searched in Google,but I did not find explanation with example.Please explain me with example.Thanks in advance.
A fixed length data field always consumes its full size.
In the old days (FORTRAN), it was padded at the end with space characters. Modern databases might do that too, but either implicitly trim trailing blanks off or the query might have to do it explicitly.
Variable length fields are a relative newcomer to databases, probably in the 1970s or 1980s they made widespread appearances.
It is considerably easier to manage fixed length record offsets and sizes rather than compute the offset of each data item in a record which has variable length fields. Furthermore, a fixed length data record is easily addressed in a data file by computing the byte offset of its beginning by multiplying the record size times the record number (and adding the length of whatever fixed header data is at the beginning of file).