Serializing unsigned integers with Avro

I am trying to serialize data containing unsigned values to Avro, but the Avro libraries appear to support only signed int and long, plus float and double.
I could not find any native support for unsigned values in the Avro libraries.
Can you point me to any references for serializing unsigned values (uint32_t, uint16_t) with the Avro libraries, or suggest a workaround?

Related

Using flatbuffers struct as a key

I am considering using flatbuffers' serialized struct as a key in a key-value store. Here is an example of the structs that I want to use as a key in rocksdb.
struct Foo {
  foo_id: int64;
  foo_type: int32;
}
I read the documentation and figured that the layout of a struct is deterministic. Does that mean it is suitable to be used as a key? If so, how do I serialize a struct and deserialize it back? It seems that Table has an API for serialization/deserialization but struct does not.
I tried serializing the struct as follows:
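#include <array>
#include <cstring>
constexpr int key_size = sizeof(Foo);
using FooKey = std::array<char, key_size>;
// Copy the raw bytes of the struct into a fixed-size key.
FooKey get_foo_key(const Foo& foo_object) {
    FooKey key;
    std::memcpy(key.data(), &foo_object, key_size);
    return key;
}
// View the key bytes as a Foo without copying.
const Foo* get_foo(const FooKey& key) {
    return reinterpret_cast<const Foo*>(key.data());
}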
I did some sanity checks and the above seems to work in my Ubuntu 18 docker image and is blazing fast. So my questions are as follows:
Is this a safe thing to do on a machine if it passes FLATBUFFERS_LITTLEENDIAN and uint8/char equivalence checks? Or are there any other checks needed?
Are there any other caveats that I should be aware of when doing it as demonstrated above?
Thanks in advance !
You don't actually need to go via std::array. The Foo struct is already a block of memory that is safe to copy or cast as you wish, so it needs no serialization functions.
Like you said, that memory contains little endian data, so FLATBUFFERS_LITTLEENDIAN must pass. Actually even on a big endian machine you may copy these structures all you want, as long as you use the accessors to read the fields (which do a byteswap on access on big endian). The only thing that won't work on big endian is casting the struct to, say, an int64_t * to read the first field without using the accessor methods.
The other caveat with certain casting operations is strict aliasing: if you have it turned on, some casts may be undefined behavior.
Also note that in this example Foo will be 16 bytes in size on all platforms, because of alignment.
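As a rough illustration of the points above, here is a minimal sketch of a byte-for-byte round trip. FooLike is a hypothetical stand-in for the generated Foo struct (the real generated type keeps its fields private and reads them through accessors), and to_key / from_key are made-up helper names. The sketch copies the bytes out with memcpy instead of casting the key pointer, which sidesteps the aliasing and alignment questions at the cost of a 16-byte copy. It assumes a little-endian host, since this plain struct does no byte swapping.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the generated struct (assumption, not flatc output).
struct alignas(8) FooLike {
    std::int64_t foo_id;
    std::int32_t foo_type;
    std::int32_t padding;  // trailing padding made explicit so every key byte is defined
};

static_assert(std::is_trivially_copyable<FooLike>::value,
              "memcpy needs a trivially copyable type");
static_assert(sizeof(FooLike) == 16,
              "8 + 4 bytes of fields, padded up to the 8-byte alignment");

using FooKey = std::array<char, sizeof(FooLike)>;

// Copy the struct's bytes into a key.
FooKey to_key(const FooLike& foo) {
    FooKey key;
    std::memcpy(key.data(), &foo, key.size());
    return key;
}

// Copy the key's bytes back into a properly aligned struct.
FooLike from_key(const FooKey& key) {
    FooLike foo;
    std::memcpy(&foo, key.data(), key.size());
    return foo;
}

// Usage: the round trip preserves both fields and yields an identical key.
void check_round_trip() {
    const FooLike original{42, 7, 0};
    const FooLike restored = from_key(to_key(original));
    assert(restored.foo_id == 42 && restored.foo_type == 7);
    assert(to_key(original) == to_key(restored));
}
```

Keeping the padding zeroed matters for a key: two structs with equal fields but different padding bytes would otherwise compare as different keys.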

Get plain old objects from FlatBuffers

I'm parsing a FlatBuffers binary file and, for various reasons, creating POJOs (Plain Old JavaScript Objects) from the data.
In my .fbs file I have for example a geo_id defined as follows:
table Properties {
  geo_id:long;
  tags:[TagIndex];
}
In the JavaScript in my HTML I create a POJO feature object like this:
function createFeature(rawFeature, cell) {
  var feature = {
    id: rawFeature.properties().geoId(),
    geometry: null,
    properties: {}
  };
  return feature;
}
My expectation was that I get a plain number (long), but I'm getting an object with "low" and "high" where "low" seems to be the id. Though I'm a bit confused and would like to know the proper way to convert this data to plain old variables.
A long is a 64-bit number which can't be represented natively in JS. To preserve the information, it is represented as 2 32-bit numbers.
If you are really using all the bits in a long, then there's no way to convert it to a single number safely. If the high bits are all 0, then you could use just the low bits to represent it as a single number in JS.
You may want to use int in your schema for convenience if you don't need the high bits.
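The arithmetic behind that advice can be sketched as follows. This is written in C++ only to show the bit manipulation; combine and to_exact_double are hypothetical names, and the sketch assumes low and high are signed 32-bit halves. In JavaScript the same combination would be roughly high * 4294967296 + (low >>> 0), which is exact only while the result stays within 2^53 - 1 in magnitude.

```cpp
#include <cassert>
#include <cstdint>

// Largest integer magnitude a double (and so a JS number) holds exactly: 2^53 - 1.
constexpr std::int64_t kMaxSafeInteger = (std::int64_t{1} << 53) - 1;

// Rebuild the 64-bit value from its two 32-bit halves.
// The final cast assumes two's complement, which C++20 guarantees.
std::int64_t combine(std::int32_t low, std::int32_t high) {
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(high)) << 32) |
        static_cast<std::uint64_t>(static_cast<std::uint32_t>(low));
    return static_cast<std::int64_t>(bits);
}

// Writes the value to `out` and returns true only when a double can hold it exactly.
bool to_exact_double(std::int32_t low, std::int32_t high, double& out) {
    const std::int64_t value = combine(low, high);
    if (value > kMaxSafeInteger || value < -kMaxSafeInteger) return false;
    out = static_cast<double>(value);
    return true;
}

// Usage: with the high half equal to zero, the low half alone is the value.
void example() {
    double d = 0;
    assert(to_exact_double(5, 0, d) && d == 5.0);
}
```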

Which Marshal::Copy overload is used?

Please consider the following C++/CLI code:
typedef unsigned __int8 uint8_t;
...
uint8_t unmanaged_buf[MAVLINK_MAX_PACKET_LEN];
array<uint8_t>^ Buffer;
...
Marshal::Copy((IntPtr)unmanaged_buf, Buffer, 0, len);
Is the following the Marshal::Copy() method that is used?
Marshal::Copy Method (IntPtr, array<Byte>, Int32, Int32)
PS: The MSDN URL for the above method is at: http://msdn.microsoft.com/en-us/library/ms146631.aspx
If it is, is it because Byte is the type that is closest to unsigned __int8? Specifically, how does the Visual C++ compiler determine which method overload to use?
From MSDN documentation about __int8:
The types __int8, __int16, and __int32 are synonyms for the ANSI types that have the same size, and are useful for writing portable code that behaves identically across multiple platforms. The __int8 data type is synonymous with type char, …
This doesn't say anything about the unsigned versions of the types, but I think it makes sense to assume that unsigned __int8 is synonymous with unsigned char.
And from .NET Framework Equivalents to C++ Native Types:
The following table shows the keywords for built-in Visual C++ types, which are aliases of predefined types in the System namespace.
unsigned char: System.Byte
Putting this together, unsigned __int8 is synonymous with unsigned char, which is an alias of System.Byte, so it is the same type as System.Byte in C++/CLI code.
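The same mechanism can be observed in standard C++, which may make the reasoning easier to check. The sketch below is not C++/CLI: my_uint8 stands in for the typedef in the question, and the copy_to overloads are hypothetical stand-ins for a few of the element types that Marshal::Copy is overloaded on. Because a typedef only introduces another name for an existing type, overload resolution sees unsigned char and picks the exact match.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

typedef unsigned char my_uint8;  // stand-in for `typedef unsigned __int8 uint8_t;`

// A typedef is only another name, so the two types are identical.
static_assert(std::is_same<my_uint8, unsigned char>::value,
              "a typedef is an alias, not a new type");

// Hypothetical overload set, one function per element type.
std::string copy_to(unsigned char*) { return "unsigned char"; }
std::string copy_to(short*)         { return "short"; }
std::string copy_to(int*)           { return "int"; }

// The array decays to unsigned char*, an exact match for the first overload.
std::string which_overload() {
    my_uint8 buffer[4] = {};
    return copy_to(buffer);
}

// Usage: the alias resolves to the unsigned char overload.
void example() {
    assert(which_overload() == "unsigned char");
}
```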

Proper way to define and initialize Decimal in Managed C++/CLI

This seems like it should be really simple but I'm having trouble finding the answer online.
What's the proper way to define a Decimal variable and initialize it with constant value in C++/CLI?
In C# it would be:
decimal d = 1.1M;
In C++/CLI I've been doing:
Decimal d = (Decimal)1.1;
Which works for some numbers, but I suspect it's just converting from double.
I notice there's a constructor: Decimal(int, int, int, bool, unsigned char) but was hoping there's an easier way to deal with large specific numbers.
You are indeed casting the number. You can, as mentioned, parse from a string or divide integers, or you may want to use the BigRational data type. Independently of the option you choose you may create a utility method in a static class to do it so you don't have to repeat it all the time.
You can also suggest on the VS UserVoice site that number suffixes like those in C# be allowed.

Binding SQLite Parameters directly by Name

I recently - very recently - started learning how to program for iOS, and have been stumped by what appears (to me) to be a blatant oversight in SQLite3. Let me qualify that by saying that prior to last week I had zero (practical) experience with Macs, Objective C, Xcode, iOS or SQLite, so I have no delusions about waltzing into field of tried-and-true tools and finding obvious errors on my first try. I assume there's a good explanation.
However, after spending the last few months using SQL Server, MySQL, and PostgreSQL, I was amazed to discover that SQLite doesn't have better functionality for adding parameters by name. Everything I could find online (documentation, forums [including SO]) says to assign parameters using their integer index, which seems like it would be a pain to maintain if you ever modify your queries. Even though you can name the parameters in your statements and do something like
sqlite3_bind_int(stmt, sqlite3_bind_parameter_index(stmt, ":my_param"), myInt);
no one seems to do that either. In fact, no one seems to try to automate this at all; the only alternate approach I could find used a parameter array and a loop counter, and inspected each parameter to determine which object type to insert. I originally considered a similar approach, but a) my boss's stance is that database parameters should always be type checked (and I agree, although I realize that SQLite fields aren't strongly typed and I technically could do it anyways), b) it felt like an inelegant hack, and c) I assumed there was a reason this approach wasn't widely used. So:
1) Why aren't there binding methods in SQLite that accept a parameter name (as, say, a 'const char *')? Or are there and I'm missing something?
2) Why doesn't anyone seem to use an approach like the example above?
I dug in the source code a little and think I could easily modify the library or just write my own (typed) class methods that would do the above for me, but I'm assuming there's a reason no one has built this into SQLite yet. My only guess is that the additional memory and cycles needed to find the parameter index are too precious on an [insert iDevice here], and aren't worth the convenience of being able to use parameter names . . . ?
Any insight would be appreciated.
There are; it's the sqlite3_bind_parameter_index() function you mentioned that you use to turn a parameter name into an index, which you can then use with the sqlite3_bind_*() functions. However, there's no sqlite3_bind_*_by_name() function or anything like that. This is to help prevent API bloat. The popular Flying Meat Database sqlite wrapper has support for named parameters in one of its branches, if you're interested in seeing how it's used.
If you think about what it would take to implement full named parameter binding methods, consider the current list of bind functions:
int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
int sqlite3_bind_double(sqlite3_stmt*, int, double);
int sqlite3_bind_int(sqlite3_stmt*, int, int);
int sqlite3_bind_int64(sqlite3_stmt*, int, sqlite3_int64);
int sqlite3_bind_null(sqlite3_stmt*, int);
int sqlite3_bind_text(sqlite3_stmt*, int, const char*, int n, void(*)(void*));
int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int, void(*)(void*));
int sqlite3_bind_value(sqlite3_stmt*, int, const sqlite3_value*);
int sqlite3_bind_zeroblob(sqlite3_stmt*, int, int n);
If we wanted to add explicit support for named parameters, that list would double in length to include:
int sqlite3_bind_name_blob(sqlite3_stmt*, const char*, const void*, int n, void(*)(void*));
int sqlite3_bind_name_double(sqlite3_stmt*, const char*, double);
int sqlite3_bind_name_int(sqlite3_stmt*, const char*, int);
int sqlite3_bind_name_int64(sqlite3_stmt*, const char*, sqlite3_int64);
int sqlite3_bind_name_null(sqlite3_stmt*, const char*);
int sqlite3_bind_name_text(sqlite3_stmt*, const char*, const char*, int n, void(*)(void*));
int sqlite3_bind_name_text16(sqlite3_stmt*, const char*, const void*, int, void(*)(void*));
int sqlite3_bind_name_value(sqlite3_stmt*, const char*, const sqlite3_value*);
int sqlite3_bind_name_zeroblob(sqlite3_stmt*, const char*, int n);
Twice as many functions means a lot more time spent maintaining the API, ensuring backwards compatibility, and so on. However, by simply introducing sqlite3_bind_parameter_index(), they were able to add complete support for named parameters with only a single function. This means that if they ever decide to support new bind types (maybe sqlite3_bind_int128?), they only have to add a single function, and not two.
As for why no one seems to use it... I can't give any sort of definitive answer without conducting a survey. My guess would be that it's a bit more natural to refer to parameters sequentially, in which case named parameters aren't that useful. Named parameters only seem to be useful if you need to refer to parameters out of order.
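To make the design argument concrete, here is a toy model of the idea. It deliberately does not use the SQLite library: ToyStatement, parameter_index, bind_int, bind_text and bind_by_name are all hypothetical names invented for this sketch. It only shows how one name-to-index lookup composes with every index-based binder, so a single generic helper covers all types instead of one by-name function per type. With the real API, the equivalent pattern is the nested call shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy statement (NOT the SQLite API): parameter names plus bound values.
struct ToyStatement {
    std::vector<std::string> names;     // names[i] has the 1-based index i + 1
    std::map<int, std::string> bound;   // index -> bound value, stored as text
};

// Plays the role of sqlite3_bind_parameter_index(): 0 means "no such name".
int parameter_index(const ToyStatement& stmt, const std::string& name) {
    for (std::size_t i = 0; i < stmt.names.size(); ++i) {
        if (stmt.names[i] == name) return static_cast<int>(i) + 1;
    }
    return 0;
}

// Index-based binders, one per type, as in the existing API.
bool bind_int(ToyStatement& stmt, int index, int value) {
    if (index < 1 || index > static_cast<int>(stmt.names.size())) return false;
    stmt.bound[index] = std::to_string(value);
    return true;
}

bool bind_text(ToyStatement& stmt, int index, const std::string& value) {
    if (index < 1 || index > static_cast<int>(stmt.names.size())) return false;
    stmt.bound[index] = value;
    return true;
}

// One generic by-name helper works with every index-based binder.
template <typename Binder, typename Value>
bool bind_by_name(ToyStatement& stmt, const std::string& name,
                  Binder binder, const Value& value) {
    const int index = parameter_index(stmt, name);
    return index != 0 && binder(stmt, index, value);
}

// Usage: bind out of order by name; an unknown name is rejected.
void example() {
    ToyStatement stmt;
    stmt.names = {":id", ":title"};
    assert(bind_by_name(stmt, ":title", bind_text, std::string("hello")));
    assert(bind_by_name(stmt, ":id", bind_int, 42));
    assert(!bind_by_name(stmt, ":missing", bind_int, 1));
}
```

A typed wrapper of this shape keeps the type checking the question asks for, because each binder still accepts only its own value type.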