I'm parsing a FlatBuffers binary file and create POJOs from the data (Plain Old Javascript Objects) for some reason.
In my .fbs file I have for example a geo_id defined as follows:
table Properties {
geo_id:long;
tags:[TagIndex];
}
In the javascript in my HTML I create a POJO feature object lie this:
function createFeature(rawFeature, cell) {
var feature = {
id: rawFeature.properties().geoId(),
geometry: null,
properties: {}
}
return feature;
}
My expectation was that I get a plain number (long), but I'm getting an object with "low" and "high" where "low" seems to be the id. Though I'm a bit confused and would like to know the proper way to convert this data to plain old variables.
A long is a 64-bit number which can't be represented natively in JS. To preserve the information, it is represented as 2 32-bit numbers.
If you are really using all the bits in a long, then there's no way to convert it to a single number safely. If the high bits are all 0, then you could use just the low bits to represent it as a single number in JS.
You may want to use int in your schema for convenience if you don't need the high bits.
Related
I am considering using flatbuffers' serialized struct as a key in a key-value store. Here is an example of the structs that I want to use as a key in rocksdb.
struct Foo {
foo_id: int64;
foo_type: int32;
}
I read the documentation and figured that the layout of a struct is deterministic. Does that mean it is suitable to be used as a key? If yes, how do I serialize a struct and deserialize it back. It seems like Table has API for serialization/deserialization but struct does not (?).
I tried serializing struct doing it as follows:
constexpr int key_size = sizeof(Foo);
using FooKey = std::array<char, key_size>;
FooKey get_foo_key(const Foo& foo_object) {
FooKey key;
std::memcpy(&key, &foo_object, key_size);
return key;
}
const Foo* get_foo(const FooKey& key) {
return reinterpret_cast<const Foo*>(&key);
}
I did some sanity checks and the above seems to work in my Ubuntu 18 docker image and is blazing fast. So my questions are as follows:
Is this a safe thing to do on a machine if it passes FLATBUFFERS_LITTLEENDIAN and uint8/char equivalence checks? Or are there any other checks needed?
Are there any other caveats that I should be aware of when doing it as demonstrated above?
Thanks in advance !
You don't actually need to go via std::array, the Foo struct is already a block of memory that is safe to copy or cast as you wish. It needs no serialization functions.
Like you said, that memory contains little endian data, so FLATBUFFERS_LITTLEENDIAN must pass. Actually even on a big endian machine you may copy these structures all you want, as long as you use the accessors to read the fields (which do a byteswap on access on big endian). The only thing that won't work on big endian is casting the struct to, say, an int64_t * to read the first field without using the accessor methods.
The other caveat to certain casting operations is strict aliasing, if you have that turned on certain casts may be undefined behavior.
Also note that in this example Foo will be 16 bytes in size on all platforms, because of alignment.
I am currently sending data between my PC and an ARM M4 Microcontroller via UART. I've defined my own protocol where each message looks like this:
[START_CHAR LEN TYPE SEQ DATA CRC]
The START_CHAR and LEN fields help me determine when the data ends, after which I look up the TYPE (constant offset of 3) to figure out what data came in order to unpack it into a message class.
Now I'm looking into flatbuffers and it seems perfect except that I cannot encode the TYPE into the message without including it inside the actual message. Here is what I am trying to do:
namespace FlatMessage;
uint8 const TYPE = 50; // does not compile
table String {
value:string;
}
root_type String;
I could create an Enum but that is messy. Thank you!
[EDIT] I should add that I could just change the protocol to have an END_CHAR but I need to support the TYPE field for legacy reasons.
Well actually, I suppose I would still need the type to figure out how to deserialize it as a flatbuffer.
e.g.
uint8_t *buf = builder.GetBufferPointer(); // I can do this with END_CHAR because I could get the buffer.
auto receive_string = GetString(buf); // But I wouldn't know what the type is. e.g. this could be GetCoolString(buf).
You have a couple of options to store a type with a FlatBuffer:
Prefix a buffer yourself with a type.
Use the file_identifier feature of FlatBuffers, to make it possible to identify the type of FlatBuffer.
Store the type in FlatBuffers itself, by using a union type. Make the root table have a single union field.
One of the more notable aspects of Go when coming from C is that the compiler will not build your program if there is an unused variable declared inside of it. So why, then, is this program building if there is an unused parameter declared in a function?
func main() {
print(computron(3, -3));
}
func computron(param_a int, param_b int) int {
return 3 * param_a;
}
There's no official reason, but the reason given on golang-nuts is:
Unused variables are always a programming error, whereas it is common
to write a function that doesn't use all of its arguments.
One could leave those arguments unnamed (using _), but then that might
confuse with functions like
func foo(_ string, _ int) // what's this supposed to do?
The names, even if they're unused, provide important documentation.
Andrew
https://groups.google.com/forum/#!topic/golang-nuts/q09H61oxwWw
Sometimes having unused parameters is important for satisfying interfaces, one example might be a function that operates on a weighted graph. If you want to implement a graph with a uniform cost across all edges, it's useless to consider the nodes:
func (graph *MyGraph) Distance(node1,node2 Node) int {
return 1
}
As that thread notes, there is a valid argument to only allow parameters named as _ if they're unused (e.g. Distance(_,_ Node)), but at this point it's too late due to the Go 1 future-compatibility guarantee. As also mentioned, a possible objection to that anyway is that parameters, even if unused, can implicitly provide documentation.
In short: there's no concrete, specific answer, other than that they simply made an ultimately arbitrary (but still educated) determination that unused parameters are more important and useful than unused local variables and imports. If there was once a strong design reason, it's not documented anywhere.
The main reason is to be able to implement interfaces that dictate specific methods with specific parameters, even if you don't use all of them in your implementation. This is detailed in #Jsor's answer.
Another good reason is that unused (local) variables are often the result of a bug or the use of a language feature (e.g. use of short variable declaration := in a block, unintentionally shadowing an "outer" variable) while unused function parameters never (or very rarely) are the result of a bug.
Another reason can be to provide forward compatibility. If you release a library, you can't change or extend the parameter list without breaking backward compatibility (and in Go there is no function overloading: if you want 2 variants with different parameters, their names must be different too).
You may provide an exported function or method and add extra - not yet used - or optional parameters (e.g. hints) to it in the spirit that you may use them in a future version / release of your library.
Doing so early will give you the benefit that others using your library won't have to change anything in their code.
Let's see an example:
You want to create a formatting function:
// FormatSize formats the specified size (bytes) to a string.
func FormatSize(size int) string {
return fmt.Sprintf("%d bytes", size)
}
You may as well add an extra parameter right away:
// FormatSize formats the specified size (bytes) to a string.
// flags can be used to alter the output format. Not yet used.
func FormatSize(size int, flags int) string {
return fmt.Sprintf("%d bytes", size)
}
Then later you may improve your library and your FormatSize() function to support the following formatting flags:
const (
FlagAutoUnit = 1 << iota // Automatically format as KB, MB, GB etc.
FlagSI // Use SI conversion (1000 instead of 1024)
FlagGroupDecimals // Format number using decimal grouping
)
// FormatSize formats the specified size (bytes) to a string.
// flags can be used to alter the output format.
func FormatSize(size int, flags int) string {
var s string
// Check flags and format accordingly
// ...
return s
}
I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).
For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switches over all possible field string values and makes the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.
By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than if I would test for equality between them all. (This is not tested, just theoretical).
The first thing I did was to -- at compile-time -- create a red-black tree for each anonymous object declaration and have it laid onto an array so that the object can look for it in a very optimized way.
I am still divided, though, if that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually work with a struct of 2 longs.
Those two longs will be encoded to support 10 chars (A-za-z0-9_) each, which is mostly a good prediction of the size of the fields. For fields larger than this, a special function (slower) receiving a string will also be provided.
The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.
Anyway, it's a little hard to find good information about this kind of optimization, since this is normally thought on a vm-level, not a static language compilation implementation.
Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?
Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.
I don't know if this is helpful, but I'll chuck it out in case;
If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads
// dynamic
myObject.foo = "some value";
myObject.bar = 32;
then during the parse, your symbol table can build an int for each field name;
// parsing code
symbols[0] == "foo"
symbols[1] == "bar"
then generate code using arrays or lists;
// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar
and build up reflection as a separate array;
runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] === 0; // Dictionary<string, int>
As I say, thrown out in the hope it'll be useful. No idea if it will!
Since you are likely to be using the same field and method names repeatedly, something like string interning would work well to quickly generate keys for your hash tables. It would also make string equality comparisons constant-time.
For such a small data set (expected upper bounds of 15) I think almost any hashing will be more expensive then a tree or even a list lookup, but that is really dependent on your hashing algorithm.
If you want to use a dictionary/hash then you'll need to make sure the objects you use for the key return a hash code quickly (perhaps a single constant hash code that's built once). If you can prevent collisions inside of an object (sounds pretty doable) then you'll gain the speed and scalability (well for any realistic object/class size) of a hash table.
Something that comes to mind is Ruby's symbols and message passing. I believe Ruby's symbols act as a constant to just a memory reference. So comparison is constant, they are very lite, and you can use symbols like variables (I'm a little hazy on this and don't have a Ruby interpreter on this machine). Ruby's method "calling" really turns into message passing. Something like: obj.func(arg) turns into obj.send(:func, arg) (":func" is the symbol). I would imagine that symbol makes looking up the message handler (as I'll call it) inside the object pretty efficient since it's hash code most likely doesn't need to be calculated like most objects.
Perhaps something similar could be done in .NET.
I have a C structure that allow users to configure options in an embedded system. Currently the GUI we use for this is custom written for every different version of this configuration structure. What I'd like for is to be able to describe the structure members in some format that can be read by the client configuration application, making it universal across all of our systems.
I've experimented with describing the structure in XML and having the client read the file; this works in most cases except those where some of the fields have inter-dependencies. So the format that I use needs to have a way to specify these; for instance, member A must always be less than or equal to half of member B.
Thanks in advance for your thoughts and suggestions.
EDIT:
After reading the first reply I realized that my question is indeed a little too vague, so here's another attempt:
The embedded system needs to have access to the data as a C struct, running any other language on the processor is not an option. Basically, all I need is a way to define metadata with the structure, this metadata will be downloaded onto flash along with firmware. The client configuration utility will then read the metadata file over RS-232, CAN etc. and populate a window (a tree-view) that the user can then use to edit options.
The XML file that I mentioned tinkering with was doing exactly that, it contained the structure member name, data type, number of elements etc. The location of the member within the XML file implicitly defined its position in the C struct. This file resides on flash and is read by the configuration program; the only thing lacking is a way to define dependencies between structure fields.
The code is generated automatically using MATLAB / Simulink so I do have access to a scripting language to help with the structure creation. For example, if I do end up using XML the structure will only be defined in the XML format and I'll use a script to create the C structure during code generation.
Hope this is clearer.
For the simple case where there is either no relationship or a relationship with a single other field, you could add two fields to the structure: the "other" field number and a pointer to a function that compares the two. Then you'd need to create functions that compared two values and return true or false depending upon whether or not the relationship is met. Well, guess you'd need to create two functions that tested the relationship and the inverse of the relationship (i.e. if field 1 needs to be greater than field 2, then field 2 needs to be less than or equal to field 1). If you need to place more than one restriction on the range, you can store a pointer to a list of function/field pairs.
An alternative is to create a validation function for every field and call it when the field is changed. Obviously this function could be as complex as you wanted but might require more hand coding.
In theory you could generate the validation functions for either of the above techniques from the XML description that you described.
I would have expected you to get some answers by now, but let me see what I can do.
Your question is a bit vague, but it sounds like you want one of
Code generation
An embedded extension language
A hand coded run-time mini language
Code Generation
You say that you are currently hand tooling the configuration code each time you change this. I'm willing to bet that this is a highly repetitive task, so there is no reason that you can't write program to do it for you. Your generator should consume some domain specific language and emit c code and header files which you subsequently build into you application. An example of what I'm talking about here would be GNU gengetopt. There is nothing wrong with the idea of using xml for the input language.
Advantages:
the resulting code can be both fast and compact
there is no need for an interpreter running on the target platform
Disadvantages:
you have to write the generator
changing things requires a recompile
Extension Language
Tcl, python and other languages work well in conjunction with c code, and will allow you to specify the configuration behavior in a dynamic language rather than mucking around with c typing and strings and and and...
Advantages:
dynamic language probably means the configuration code is simpler
change configuration options without recompiling
Disadvantages:
you need the dynamic language running on the target platform
Mini language
You could write your own embedded mini-language.
Advantages:
No need to recompile
Because you write it it will run on your target
Disadvantages:
You have to write it yourself
How much does the struct change from version to version? When I did this kind of thing I hardcoded it into the PC app, which then worked out what the packet meant from the firmware version - but the only changes were usually an extra field added onto the end every couple of months.
I suppose I would use something like the following if I wanted to go down the metadata route.
typedef struct
{
unsigned char field1;
unsigned short field2;
unsigned char a_string[4];
} data;
typedef struct
{
unsigned char name[16];
unsigned char type;
unsigned char min;
unsigned char max;
} field_info;
field_info fields[3];
void init_meta(void)
{
strcpy(fields[0].name, "field1");
fields[0].type = TYPE_UCHAR;
fields[0].min = 1;
fields[0].max = 250;
strcpy(fields[1].name, "field2");
fields[1].type = TYPE_USHORT;
fields[1].min = 0;
fields[1].max = 0xffff;
strcpy(fields[2].name, "a_string");
fields[2].type = TYPE_STRING;
fields[2].min = 0 // n/a
fields[2].max = 0 // n/a
}
void send_meta(void)
{
rs232_packet packet;
memcpy(packet.payload, fields, sizeof(fields));
packet.length = sizeof(fields);
send_packet(packet);
}