Struct of Arrays in flatbuffer? - flatbuffers

Let's say I have the following flatbuffer IDL file:
table Monster {
  mana:short = 150;
  inventory:[ubyte]; // Vector of scalars.
}
And that I want to serialize an array of 2 Monster objects in a buffer.
Apparently it is possible to create the following memory layout for the overall buffer while serializing the data:
ArrayOfUBytesForInventoryOfMonster1|ArrayOfUBytesForInventoryOfMonster2|Monster1Data|Monster2Data
This means that all the inventory fields now lie in contiguous memory.
However, is it possible to do the same for the mana field?
I.e., I want to serialize my objects with this memory representation:
ArrayOfUBytesForInventoryOfMonster1|ArrayOfUBytesForInventoryOfMonster2|Monster1ManaValue|Monster2ManaValue|Monster1Data|Monster2Data.
This has the effect of turning all the "mana" values into a raw array in memory.
Is it possible to do this with FlatBuffers? It seems that fields can only be serialized after the start of the object itself.

Neither will work in the way you indicated. Scalar fields like mana are always stored inline in the table, so they will never be contiguous with the same field of other tables. Even vectors like inventory are prefixed by a size field, so the elements of one vector are not contiguous with those of the next, even though the vectors themselves can be adjacent, since they are not stored inline.
If you want contiguous data, you'll have to explicitly write out a single vector of such values.
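
As a rough illustration of the "single vector" idea, here is a minimal Python sketch of a struct-of-arrays layout built by hand. It does not use the FlatBuffers library; the sample monsters, the `inventory_offsets` table and the `inventory_of` helper are all hypothetical names invented for this example. In schema terms it would roughly correspond to storing one `[short]` vector for all mana values and one `[ubyte]` vector for all inventories instead of a vector of Monster tables.

```python
from array import array

# Hypothetical in-memory monsters: (mana, inventory bytes).
monsters = [
    (150, bytes([1, 2, 3])),
    (200, bytes([4, 5])),
]

# One contiguous vector holding every mana value (signed 16-bit, like `short`).
manas = array("h", (mana for mana, _ in monsters))

# One contiguous vector holding every inventory byte, plus an offsets table
# recording where each monster's inventory starts and ends.
inventory_data = bytearray()
inventory_offsets = array("I", [0])
for _, inventory in monsters:
    inventory_data += inventory
    inventory_offsets.append(len(inventory_data))


def inventory_of(index):
    """Return the inventory bytes of the monster at position `index`."""
    start = inventory_offsets[index]
    end = inventory_offsets[index + 1]
    return bytes(inventory_data[start:end])
```

The trade-off is that a "monster" is no longer one object: it is an index into several parallel vectors, and the code that reads the buffer has to know that convention.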

Related

Why flatbuffer struct fields can not be vector/table/string?

Structs may only contain scalars or other structs. But as far as I know, only an offset is written to the buffer when a vector/table/string field is written into table data (the vector/table/string data itself is written before the table data). So a struct containing a vector/table/string field could still have a fixed size. Why does FlatBuffers impose the limit that a struct can only contain scalars or other structs?
The idea behind a struct is that it is a self-contained piece of memory that always has the same layout and size and as such can easily be copied around by itself, especially in languages that support such types natively, like C/C++/Rust etc.
If it could contain strings, then it would be at least two pieces of memory, whose distance and size would be variable, and thus neither efficient to copy nor easy to manage. We have tables for such cases.
If you must have a vector or string inside a struct, some languages already support an "array" type, which is of fixed length. You could put that, plus a length field in a struct to emulate vectors and strings, of course with the downside that the space allocated for them is always the same.
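
The "fixed-length array plus a length field" idea can be sketched in Python with the standard `struct` module. This is only an illustration of the memory layout, not FlatBuffers code; the 15-byte capacity and the `pack_name` / `unpack_name` helpers are assumptions made for the example.

```python
import struct

# Hypothetical fixed-size record: a 1-byte length followed by 15 bytes of storage.
NAME_CAPACITY = 15
RECORD = struct.Struct(f"<B{NAME_CAPACITY}s")  # always 16 bytes in total


def pack_name(name: str) -> bytes:
    """Store a short string in a record whose size never changes."""
    data = name.encode("utf-8")
    if len(data) > NAME_CAPACITY:
        raise ValueError("name does not fit in the fixed-size array")
    # The `s` format pads the unused part of the array with zero bytes.
    return RECORD.pack(len(data), data)


def unpack_name(record: bytes) -> str:
    """Read the string back, using the length field to ignore the padding."""
    length, storage = RECORD.unpack(record)
    return storage[:length].decode("utf-8")
```

Every record occupies the full 16 bytes regardless of how long the string is, which is exactly the downside mentioned above.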

"Extension" of the first Data Unit

I'm starting to study the FITS format and I'm in the process of reading the Definition of FITS document.
I know that a FITS file can have one or more HDUs, the first being the primary HDU and any following ones being extensions. I also know that extensions have a mandatory header keyword (XTENSION) that tells us whether the data unit is an Image, Binary Table or ASCII Table. But how can I know the data type (Image, Binary Table or ASCII Table) of the first HDU?
I don't understand why XTENSION isn't a mandatory keyword in the primary header.
The "type" of the PRIMARY HDU is essentially IMAGE in most cases. From v3.0 of the standard:
3.3.2. Primary data array
The primary data array, if present, shall consist of a single data
array with from 1 to 999 dimensions (as specified by the NAXIS
keyword defined in Sect. 4.4.1). The random groups convention
in the primary data array is a more complicated structure and
is discussed separately in Sect. 6. The entire array of data values
are represented by a continuous stream of bits starting with
the first bit of the first data block. Each data value shall consist
of a fixed number of bits that is determined by the value of
the BITPIX keyword (Sect. 4.4.1). Arrays of more than one dimension
shall consist of a sequence such that the index along
axis 1 varies most rapidly, that along axis 2 next most rapidly,
and those along subsequent axes progressively less rapidly, with
that along axis m, where m is the value of NAXIS, varying least
rapidly. There is no space or any other special character between
the last value on a row or plane and the first value on the next
row or plane of a multi-dimensional array. Except for the location
of the first element, the array structure is independent of the
FITS block structure. This storage order is shown schematically
in Fig. 1 and is the same order as in multi-dimensional arrays in
the Fortran programming language (ISO 2004). The index count
along each axis shall begin with 1 and increment by 1 up to the
value of the NAXISn keyword (Sect. 4.4.1).
If the data array does not fill the final data block, the remainder
of the data block shall be filled by setting all bits to zero.
The individual data values shall be stored in big-endian byte order
such that the byte containing the most significant bits of the
value appears first in the FITS file, followed by the remaining
bytes, if any, in decreasing order of significance.
Though it isn't until later on (in section 7.1) that it makes this connection:
7.1. Image extension
The FITS image extension is nearly identical in structure to the
the primary HDU and is used to store an array of data. Multiple
image extensions can be used to store any number of arrays in a
single FITS file. The first keyword in an image extension shall
be XTENSION= ’IMAGE ’.
It isn't immediately apparent what it means by "nearly identical" here. I guess the only difference is that the PRIMARY HDU may also have the aforementioned "random groups" structure, whereas in IMAGE extension HDUs PCOUNT is always 0 and GCOUNT is always 1.
You'll only rarely see the "random groups" convention. This is sort of a precursor to the BINTABLE format. It was used traditionally in radio interferometry data, but hardly at all outside that.
The reason for all this is for backwards compatibility with older versions of FITS that predate even the existence of extension HDUs. Many FITS-based formats don't put any data in the PRIMARY HDU and use the primary header only for metadata keywords that pertain to the entire file (e.g. most HST data).
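
To make the storage order quoted from Sect. 3.3.2 concrete, here is a small Python sketch that decodes a two-dimensional array of 16-bit integers (BITPIX = 16) from raw bytes. It uses only the standard `struct` module rather than a FITS library, and the function names `decode_int16_image` and `element` are made up for this example.

```python
import struct


def decode_int16_image(data: bytes, naxis1: int, naxis2: int):
    """Decode a BITPIX=16, NAXIS=2 data array into a list of rows."""
    count = naxis1 * naxis2
    # ">" selects big-endian byte order, as the standard requires.
    # Anything after the array (zero fill up to the block end) is ignored.
    values = struct.unpack(f">{count}h", data[: 2 * count])
    # Axis 1 varies most rapidly, so each run of naxis1 values is one row.
    return [list(values[r * naxis1:(r + 1) * naxis1]) for r in range(naxis2)]


def element(rows, i1: int, i2: int):
    """Look up a value using the 1-based axis indices of the standard."""
    return rows[i2 - 1][i1 - 1]
```

For example, with NAXIS1 = 3 and NAXIS2 = 2 the six stored values map to the index pairs (1,1), (2,1), (3,1), (1,2), (2,2), (3,2), in that order.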

Getting the size of each dimension of multidimensional array

Let's say I want to write a function to "unpack" (store or log, perhaps) a multidimensional array using nested loops. The concept is simple enough, provided I'm able to determine, in the case of a 3D array, the length, width and height of the array.
In Objective-C, is there some way to, after being passed a multidimensional array of unknown size as a method argument, determine what those dimension sizes are? Then it'd be a simple matter of using, as stated, nested for loops.
NSArray is inherently one-dimensional, a vector if you like, and as noted by John it has a count property. To simulate a multidimensional array you could of course have an NSArray *rows that contains a number of NSArray *columnElement objects, and so forth. This is not really a multidimensional array but more of a tree-type structure, though you could sort of call it an array. But I do not think that is what you are asking.
I think you are thinking of a C-style buffer of memory traversed by pointer references. This is also inherently a one-dimensional structure, as memory addressing is one-dimensional. In many cases such a buffer can be viewed as having two or more dimensions by saying, for example, that every 50 bytes is a row, so position 0 is row 1 col 1, position 49 is row 1 col 50, position 50 is row 2 col 1, etc.
In this case, since it is you as the designer who has defined this interpretation of the buffer as a two-dimensional array, there cannot be a way to derive the structure from the buffer alone. Either you have to store the arrangement of the buffer as metadata in other variables or impose some form of delimiter characters in the buffer, e.g. newline for a new row and comma for a new column element (but then it is no longer an array in my opinion; it is a file format, here a CSV file).
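
The "store the arrangement as metadata" option can be sketched as follows. The example is in Python rather than Objective-C, and the `Grid` class and `unpack` function are hypothetical names chosen for illustration; the point is only that the dimensions travel together with the flat buffer, so the nested loops never have to guess them.

```python
class Grid:
    """A flat buffer plus the dimensions needed to interpret it as 3D."""

    def __init__(self, width, height, depth, fill=0):
        self.width, self.height, self.depth = width, height, depth
        self.cells = [fill] * (width * height * depth)

    def _index(self, x, y, z):
        # Map 3D coordinates to a position in the one-dimensional buffer.
        if not (0 <= x < self.width and 0 <= y < self.height
                and 0 <= z < self.depth):
            raise IndexError("coordinates out of range")
        return (z * self.height + y) * self.width + x

    def get(self, x, y, z):
        return self.cells[self._index(x, y, z)]

    def set(self, x, y, z, value):
        self.cells[self._index(x, y, z)] = value


def unpack(grid):
    """Visit every cell with nested loops, using the stored dimensions."""
    for z in range(grid.depth):
        for y in range(grid.height):
            for x in range(grid.width):
                yield (x, y, z, grid.get(x, y, z))
```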

Keeping an array sorted - at setting, getting or later?

As an aid to learning Objective-C/OOP, I'm designing an iOS app to store and display periodic bodyweight measurements. I've got a singleton which returns a mutable array holding the shared store of measurement objects. Each measurement will have at least a date and a body weight, and I want to be able to add historic measurements.
I'd like to display the measurements in date order. What's the best way to do this? As far as I can see the options are as follows: 1) when adding a measurement, I override addObject: to sort the shared store every time a measurement is added, 2) when retrieving the mutable array, I sort it, or 3) I retrieve the mutable array in whatever order it happens to be in the shared store, then sort it when displaying the table/chart.
It's likely that the data will be retrieved more frequently than a new datum is added, so option 1 will reduce redundant sorting of the shared store - so this is the best way, yes?
You can use a modified version of (1). Instead of sorting the complete array each time a new object is inserted, you use the method described here: https://stackoverflow.com/a/8180369/1187415 to insert the new object into the array at the correct place.
Then for each insert you have only a binary search to find the correct index for the new object, and the array is always in correct order.
Since you said that the data is more frequently retrieved than new data is added, this seems to be more efficient.
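
The same idea, finding the insertion point by binary search and inserting there, can be sketched in Python with the standard `bisect` module. This is not the Objective-C code from the linked answer; the `measurements` list and `add_measurement` function are hypothetical stand-ins for the shared store.

```python
import bisect
from datetime import date

# Hypothetical store: a list of (date, weight_kg) tuples kept in date order.
measurements = []


def add_measurement(when: date, weight_kg: float) -> None:
    """Insert a measurement so that the list stays sorted by date."""
    # insort locates the position with a binary search, then inserts there.
    bisect.insort(measurements, (when, weight_kg))
```

Finding the position takes a logarithmic number of comparisons, although the insertion itself still shifts the elements that come after it.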
Setting your special case aside, this question is not so easy to answer. There are two basic solutions:
Keep the array unsorted, and when an element is accessed while the array is not sorted, sort it first. Let's call it "lazy sorting".
Keep the array sorted when inserting elements. Note that this is not about appending the new element at the end and then sorting the whole array. It is about finding where the element should go (binary search) and placing it there. Let's call it "sorted insert".
Both techniques are correct and useful and deciding which one is better depends on your use cases.
Example:
You want to insert hundreds of elements into the array, then access the elements, then again insert hundreds of elements, then access. In summary, you will be inserting values in big chunks. In this case, lazy sorting will be better.
You will often insert individual elements and you will access the elements often. Then sorted insert will have better performance.
Something in the middle (between inserting 1 and inserting tens of elements). You probably don't care which one of the methods will be used.
(Note that you can also use specialized structures to keep a collection sorted that are not based on NSArray, e.g. structures based on a balanced tree that track the number of elements in each subtree.)
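
For comparison, "lazy sorting" might look like the following Python sketch. The `LazySortedList` class and its `_dirty` flag are assumptions made for illustration: the container remembers whether anything was appended since the last sort and only sorts when the items are actually read.

```python
class LazySortedList:
    """Appends are cheap; sorting is deferred until the items are read."""

    def __init__(self):
        self._items = []
        self._dirty = False  # True when items were added since the last sort

    def add(self, item):
        self._items.append(item)
        self._dirty = True

    def items(self):
        # Sort at most once per batch of insertions.
        if self._dirty:
            self._items.sort()
            self._dirty = False
        return self._items
```

A batch of hundreds of `add` calls followed by one `items` call costs a single sort, which is why this variant suits the "big chunks" use case.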

Better to use size or count on collection?

When counting a collection, is it better to do it via size or count?
Size = Ruby (#foobars.size)
Count = SQL (#foobars.count)
I also noticed that count makes another trip to the DB.
I tend to suggest using size for everything, just because it's safer. People make fewer silly mistakes using size.
Here's how they work:
length: length returns the number of elements in an array or otherwise loaded collection. The key point is that the collection will be loaded regardless. So if you're working with an ActiveRecord association, it will pull the elements from the DB into memory and then return the number.
count: count issues a database query, so if you have an array already it's a pointless call to your database.
size: best of both worlds - size checks which type you're using and then uses whichever seems more appropriate (so if you have an array, it will use length; if you have an unretrieved ActiveRecord::Association it will use count, and so on).
Source:
http://blog.hasmanythrough.com/2008/2/27/count-length-size/
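
The three behaviours described above can be modelled with a toy Python class. This is not Rails or ActiveRecord code: `LazyCollection`, its `queries` log and the SQL strings are all invented for illustration, and only mimic the described logic of which call touches the database.

```python
class LazyCollection:
    """Toy stand-in for a lazily loaded association; not the Rails API."""

    def __init__(self, rows):
        self._db_rows = list(rows)  # pretend database table
        self._loaded = None         # records in memory, once loaded
        self.queries = []           # log of simulated database queries

    def _load(self):
        if self._loaded is None:
            self.queries.append("SELECT *")
            self._loaded = list(self._db_rows)
        return self._loaded

    def length(self):
        # Always works on loaded records, loading them first if necessary.
        return len(self._load())

    def count(self):
        # Always asks the database, even if the records are already loaded.
        self.queries.append("SELECT COUNT(*)")
        return len(self._db_rows)

    def size(self):
        # Uses the loaded records when available, otherwise a count query.
        if self._loaded is not None:
            return len(self._loaded)
        return self.count()
```

In this model, calling `size` on an already loaded collection adds nothing to `queries`, whereas `count` always adds one entry.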
It depends on the situation. In the example you show I would go with size since you already have the collection loaded and a call to size will just check the length of the array. As you noticed, count will do an extra db query and you really want to avoid that.
However, in the scenario that you only want to display the number of Foobars and not show those objects, then I would go with count because it will not load the instances into memory, just return the number of records.