Structure Packing - structure

I'm currently learning C# and my first project (as a learning experiment) is to create a DBF reader. I'm having some difficulty understanding "packing" according to this: http://www.developerfusion.com/pix/articleimages/dec05/structs1.jpg
If I specified a packing of 2, wouldn't all structure elements begin on a 2-byte boundary, and if I specified a packing of 4, wouldn't all structure elements begin on a 4-byte boundary, and also consume a minimum of 4 bytes each?
For instance, a byte element would be placed on a 4 byte boundary, and the element following it (in a sequential layout) would be located on the next 4-byte boundary (losing 3 bytes to padding)?
In the image shown, in the "pack=4" it shows a byte that is on a 2 byte boundary, following a short.

If I understand the picture correctly, Pack = n means that a variable cannot be stored "between" two packs of length n. In other words, the bytes that make up a variable cannot cross a pack boundary. This only holds when the size of the variable is less than or equal to the size of a pack.
Let's take Pack = 4 as an example. Here we can safely store a short and a byte in one pack, because together they need 3 bytes of memory. That leaves only one byte in the pack, which is too little to hold a whole int, so one byte of padding is inserted and the int starts at the beginning of the next pack.
I hope the explanation makes sense.
Looking at the picture again, I think it would be better if all data were aligned to the same side of a pack, either to bottom or top. This would make it clearer what's going on.
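The layout described above can be checked directly. The sketch below is in C rather than C#, and assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do) on a platform where an int32_t is naturally 4-byte aligned; the struct and function names are made up for illustration. The rule it demonstrates is that each field is aligned to the smaller of its own natural alignment and the pack size, so under pack 4 a byte still takes one byte and a short only needs a 2-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same field order as in the picture: a short, then a byte, then an int. */
#pragma pack(push, 1)
struct Pack1 { int16_t s; uint8_t b; int32_t i; };
#pragma pack(pop)

#pragma pack(push, 2)
struct Pack2 { int16_t s; uint8_t b; int32_t i; };
#pragma pack(pop)

#pragma pack(push, 4)
struct Pack4 { int16_t s; uint8_t b; int32_t i; };
#pragma pack(pop)

/* Hypothetical helper: print where each field lands under each pack value. */
void print_layouts(void)
{
    printf("pack 1: s=%zu b=%zu i=%zu size=%zu\n",
           offsetof(struct Pack1, s), offsetof(struct Pack1, b),
           offsetof(struct Pack1, i), sizeof(struct Pack1));
    printf("pack 2: s=%zu b=%zu i=%zu size=%zu\n",
           offsetof(struct Pack2, s), offsetof(struct Pack2, b),
           offsetof(struct Pack2, i), sizeof(struct Pack2));
    printf("pack 4: s=%zu b=%zu i=%zu size=%zu\n",
           offsetof(struct Pack4, s), offsetof(struct Pack4, b),
           offsetof(struct Pack4, i), sizeof(struct Pack4));
}
```

Under those assumptions the byte sits at offset 2 in every case, right after the short, and only the int moves: offset 3 with pack 1, offset 4 with pack 2 or pack 4.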

Related

To deploy a Tiny ML model that I created via Colab Google

When I compile the code (on arduino) I get the following error:
8 bytes lost due to alignment. To avoid this loss, please make sure the tensor_arena is 16 bytes aligned.
constexpr int tensorArenaSize = 8 * 1024;
byte tensorArena[tensorArenaSize];
Can someone help me fix this problem?
For reasons unknown to me, the compiler wants your large byte array to be 16-byte aligned. Because of the variables already declared above the two lines you included, it has to move the array forward by 8 bytes so that it starts at an address on a 16-byte boundary. To fix the error (to me this should just be a warning), either add a dummy 8-byte variable before the array, or move 8 bytes' worth of variables from before the array to after it. In the first case you simply lose 8 bytes of variable space.
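Another option, which does not depend on what happens to be declared before the array, is to request the alignment on the array itself. This is a minimal sketch, assuming a C11 compiler (alignas comes from <stdalign.h>; in C++11 and later, which Arduino sketches are compiled as, alignas is a keyword and the same declaration should work without the header). uint8_t stands in for Arduino's byte, and tensor_arena_is_aligned is a hypothetical helper added only to check the result.

```c
#include <stdalign.h>
#include <stdint.h>

enum { TENSOR_ARENA_SIZE = 8 * 1024 };

/* Ask the compiler to place the arena on a 16-byte boundary. */
alignas(16) uint8_t tensor_arena[TENSOR_ARENA_SIZE];

/* Hypothetical helper: returns 1 if the arena starts on a 16-byte boundary. */
int tensor_arena_is_aligned(void)
{
    return ((uintptr_t)tensor_arena % 16u) == 0;
}
```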

How to manipulate bits in Smalltalk?

I am currently working on a file compressor based on Huffman decoding. So I have a decoding tree like so:
and I have to encode this tree in an output file according to the following criteria:
"for each leaf, write out a 0 bit, followed by the 8 bits of the corresponding character. Write out the bits in the order bit 7, bit 6, . . ., bit 0, that is high bit first. As a special case, if the byte is 0, write out bit 8, which will be a 0 for a byte value of 0, and 1 for a byte value of 256 (the EOF marker)." For an internal node, just write a bit 1.
So what I plan to do is create a bit array and add the corresponding bits to it in the specified format. The problem is that I don't know how to convert a number to binary in Smalltalk.
For example, if I want to encode the first leaf, I would want to do something like 01101011, i.e. 0 followed by the bit representation of k, and then add every bit one by one into the array.
I don't know exactly which dialect you are using, but generally you can access the bits of an Integer. They are modelled as if the representation were two's complement with an infinite sequence of bits:
2 is ....0000000000010
1 is ....0000000000001
0 is ....0000000000000 with infinitely many 0 on the left
-1 is ....1111111111111 with infinitely many 1 on the left
-2 is ....1111111111110
This is also true for LargeIntegers: even though they are generally implemented as sign-magnitude (the class encodes the sign), two's complement will be emulated.
You can then operate with bitAnd:, bitOr:, bitXor:, bitInvert and bitShift:, and in some flavours bitAt:put:.
You can access individual bits with (2 bitAt: index), where the index starts at 1 for the least significant bit and grows towards the more significant bits. If bitAt: is missing, implement it with bitAnd: and bitShift:.
For positive integers, you can ask for the rank of the high bit (2 highBit).
All these operations should create a new integer (no in-place modification is possible).
Conceptually, a ByteArray is a collection of unsigned 8-bit integers (between 0 and 255), so you can implement a bit array with one (if the dialect does not already provide one). Or you can use an Integer, but then you cannot control the size, which is unbounded, and there are no in-place modifications, so every operation costs a copy.
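To make the shift-and-mask idea concrete, here is a sketch in C rather than Smalltalk. bit_at mirrors bitAt: (index 1 is the least significant bit) using only a shift and an AND, and the bit array is stored in bytes, as suggested for ByteArray. The names BitWriter, write_bit and write_leaf are hypothetical, the buffer size is arbitrary, and the extra "bit 8" special case for byte 0 and the EOF marker is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Like Smalltalk's bitAt:, with index 1 for the least significant bit. */
int bit_at(unsigned value, int index)
{
    return (int)((value >> (index - 1)) & 1u);
}

/* A bit array stored in bytes, filled from the high bit of each byte.
   It must start zeroed, e.g. BitWriter w = {0}; */
typedef struct {
    uint8_t bytes[64];
    size_t  bit_count;
} BitWriter;

/* Append one bit; returns 0 on success, -1 if the buffer is full. */
int write_bit(BitWriter *w, int bit)
{
    size_t byte_index = w->bit_count / 8;
    if (byte_index >= sizeof w->bytes)
        return -1;
    if (bit)
        w->bytes[byte_index] |= (uint8_t)(0x80u >> (w->bit_count % 8));
    w->bit_count++;
    return 0;
}

/* Leaf encoding from the question: a 0 bit, then bits 7..0 of the character. */
int write_leaf(BitWriter *w, uint8_t ch)
{
    if (write_bit(w, 0) != 0)
        return -1;
    for (int index = 8; index >= 1; index--) {
        if (write_bit(w, bit_at(ch, index)) != 0)
            return -1;
    }
    return 0;
}
```

Writing the leaf for k this way appends nine bits: the leading 0 and then 01101011.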

What does PACK8/16/32 mean in VkFormat names?

I'm trying to understand the names of the items in the VkFormat enum, and so far I think I get all the structure of the names of all of the (non-block) formats, but I can't figure out what it means when they have a suffix of PACK8, PACK16, PACK32. If I add up the channel sizes, they always add up to 8, 16, or 32, nothing irregular, so I don't understand what it would mean to bit-pack these values, since they seem to be 100% efficient, using all their bits.
As usual, the documentation is not very helpful, just saying the format is packed without saying what that means.
The PACK fields mean exactly what the specification says they mean:
whole texels or attributes are stored in a single data element, rather than individual components occupying a single data element
Though if you find that too confusing, you could just look at the actual format descriptions. Vulkan goes into excruciating detail about them, to the point of needless repetition.
The difference between VK_FORMAT_B8G8R8A8_RGB and VK_FORMAT_B8G8R8A8_RGB_PACK32 is the same difference between a uint8_t[4] and a uint32_t. One is an array ("individual components"), while the other is a single value ("single data element") made up of smaller values.
If you have a uint8_t color[4] array, which stores B8G8R8A8, then color[0] stores the blue component. The order of the components in the array is defined by the order of the components in the format's name.
If you have a uint32_t color value, which stores B8G8R8A8, then (color & 0xFF000000) >> 24 will retrieve the blue component. The highest byte is the first, followed by the next highest and so forth.
The packed-vs-not-packed distinction matters because of endianness. Arrays of bytes don't have endian issues, but values packed into 16 or 32 bits do. The byte order of the packed formats is always assumed to be the native byte order of the host.
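The difference can be seen in plain C, without any Vulkan headers. This sketch follows the B8G8R8A8 layout used in the explanation above, with the first-named component in the highest byte of the 32-bit element; all function names are hypothetical. The last function shows where endianness comes in: the byte at the lowest address of the packed value depends on the host.

```c
#include <stdint.h>
#include <string.h>

/* Pack four 8-bit components into one 32-bit element,
   first-named component (blue) in the highest byte. */
uint32_t pack_bgra(uint8_t b, uint8_t g, uint8_t r, uint8_t a)
{
    return ((uint32_t)b << 24) | ((uint32_t)g << 16) |
           ((uint32_t)r << 8)  |  (uint32_t)a;
}

/* Component access on the packed value is done with masks and shifts. */
uint8_t packed_blue(uint32_t color)
{
    return (uint8_t)((color & 0xFF000000u) >> 24);
}

uint8_t packed_alpha(uint32_t color)
{
    return (uint8_t)(color & 0xFFu);
}

/* Byte stored at the lowest address of the packed value:
   blue on a big-endian host, alpha on a little-endian one. */
uint8_t first_byte_in_memory(uint32_t color)
{
    uint8_t bytes[4];
    memcpy(bytes, &color, sizeof bytes);
    return bytes[0];
}
```

With a uint8_t color[4] array, by contrast, color[0] is blue on every host, which is why the array-style formats have no endian issue.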

Does the "C" code algorithm in RFC1071 work well on big-endian machine?

As described in RFC1071, an extra 0-byte should be added to the last byte when calculating checksum in the situation of odd count of bytes:
But in the "C" code algorithm, only the last byte is added:
The above code does work on little-endian machine where [Z,0] equals Z, but I think there's some problem on big-endian one where [Z,0] equals Z*256.
So I wonder whether the example "C" code in RFC1071 only works on little-endian machine?
-------------New Added---------------
There's one more example of "breaking the sum into two groups" described in RFC1071:
We can just take the data here (addr[]={0x00, 0x01, 0xf2}) for example:
Here, "standard" represents the situation described in formula [2], while "C-code" represents the situation in the C code algorithm.
As we can see, in the "standard" situation the final sum is f201 regardless of endianness, since the abstract form [Z,0] has no endian issue after the "Swap". But it matters in the "C-code" situation, because f2 is always the low byte, whether on a big-endian or a little-endian machine.
Thus, the same data (addr and count) yields a different checksum on machines of different endianness.
I think you're right. The code in the RFC adds the last byte in as the low-order byte, regardless of whether it is on a little-endian or a big-endian machine.
In these examples of code on the web we see they have taken special care with the last byte:
https://github.com/sjaeckel/wireshark/blob/master/epan/in_cksum.c
and in
http://www.opensource.apple.com/source/tcpdump/tcpdump-23/tcpdump/print-ip.c
it does this:
if (nleft == 1)
sum += htons(*(u_char *)w<<8);
Which means that this text in the RFC is incorrect:
Therefore, the sum may be calculated in exactly the same way
regardless of the byte order ("big-endian" or "little-endian")
of the underlaying hardware. For example, assume a "little-
endian" machine summing data that is stored in memory in network
("big-endian") order. Fetching each 16-bit word will swap
bytes, resulting in the sum; however, storing the result
back into memory will swap the sum back into network byte order.
The following code in place of the original odd byte handling is portable (i.e. will work on both big- and little-endian machines), and doesn't depend on an external function:
if (count > 0)
{
char buf2[2] = {*addr, 0};
sum += *(unsigned short *)buf2;
}
(Assumes addr is char * or const char *).
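For context, here is how that odd-byte handling might sit inside a complete checksum routine. This is a sketch, not the RFC's code: the function names are hypothetical, a 64-bit accumulator is used so that long buffers cannot overflow it, and memcpy replaces the pointer cast so that the code does not depend on the buffer being aligned for 16-bit access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's-complement Internet checksum over count bytes.
   The result is in native order: storing it to memory as a 16-bit
   value yields the checksum bytes in network order. */
uint16_t inet_checksum(const void *data, size_t count)
{
    const unsigned char *addr = data;
    uint64_t sum = 0;

    while (count > 1) {
        uint16_t word;
        memcpy(&word, addr, 2);   /* native-order 16-bit load */
        sum += word;
        addr += 2;
        count -= 2;
    }

    if (count > 0) {
        /* Odd trailing byte: pad with a zero byte to form [Z,0] in memory. */
        unsigned char pad[2] = { *addr, 0 };
        uint16_t word;
        memcpy(&word, pad, 2);
        sum += word;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical helper: write the checksum as it would appear in the packet. */
void checksum_to_bytes(const void *data, size_t count, unsigned char out[2])
{
    uint16_t cksum = inet_checksum(data, count);
    memcpy(out, &cksum, 2);
}
```

For the three bytes 0x00, 0x01, 0xf2 used in the question, the sum in network order is f201, so the two checksum bytes written out are 0x0d and 0xfe on either kind of machine.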

Converting meshes to metaballs

I'm doing a project where I need to convert an existing polygonal mesh into a static shape made from metaballs (blobs). I have voxelized the mesh with binvox to "a .raw file" (according to the description at binvox), but I have no idea how it stores the data, and therefore don't know how to load it.
Question 1: Is there any non-PhD way to create a metaball model from a polygonal mesh?
Question 2: Has anyone ever used the said .raw file format from binvox, and if so, how?
RLE (run-length encoding)
The binary voxel data:
The binary data consists of pairs of bytes. The first byte of each pair is the value byte and is either 0 or 1 (1 signifies the presence of a voxel). The second byte is the count byte and specifies how many times the preceding voxel value should be repeated (so obviously the minimum count is 1, and the maximum is 255).
http://www.cs.princeton.edu/~min/binvox/binvox.html
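Decoding the (value, count) pairs described above takes only a short loop. This is a sketch under assumptions: the pair data is already in memory with any file header skipped, the output buffer is sized by the caller, and the function name rle_decode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand (value, count) byte pairs into out, one byte per voxel.
   Returns the number of voxels written, or (size_t)-1 if the input is
   malformed or out is too small. */
size_t rle_decode(const uint8_t *in, size_t in_len,
                  uint8_t *out, size_t out_cap)
{
    size_t written = 0;

    if (in_len % 2 != 0)
        return (size_t)-1;          /* data must consist of whole pairs */

    for (size_t i = 0; i < in_len; i += 2) {
        uint8_t value = in[i];      /* 0 or 1 (1 means a voxel is present) */
        uint8_t count = in[i + 1];  /* repeat count, 1..255 */

        if (value > 1 || count == 0 || count > out_cap - written)
            return (size_t)-1;

        for (uint8_t k = 0; k < count; k++)
            out[written++] = value;
    }
    return written;
}
```

For example, the four bytes 0, 3, 1, 2 expand to three empty voxels followed by two filled ones.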