Thrift go support of map with struct value as key - serialization

I experimented golang generation with Thrift 0.9.1, for example,
thrift definition,
struct AppIdLeveledHashKeyTimeKeyHour {
1: required i32 appId
2: required LeveledHashKey leveledHashKey
3: required TimeKeyHour timeKeyHour
}
typedef map<AppIdLeveledHashKeyTimeKeyHour, ...sth else...> EventSliceShardIdValue
in the generated code, EventSliceShardIdValue would be,
type EventSliceShardIdValue map[*AppIdLeveledHashKeyTimeKeyHour]EventSliceAppIdLeveledHashKeyTimeKeyHourValue
you can find the key part is a pointer which represents memory address. In golang a pointer as map key (instead of a value, or hash of the obj) is useless in most cases. To use a combination of some fields as map key, the definition should use a value type like
map[AppIdLeveledHashKeyTimeKeyHour]EventSliceAppIdLeveledHashKeyTimeKeyHourValue
Is it a problem of Thrift's go support (or I misused sth)? Any workaround to solve this problem in thrift?

Structs (without pointers) can only be used as map keys under certain limited circumstances (they must be comparable per http://golang.org/ref/spec#Comparison_operators); it's possible that AppIdLeveledHashKeyTimeKeyHour doesn't fit this definition, so it's not actually possible to build a map without using a pointer for the key.

Related

Using flatbuffers struct as a key

I am considering using flatbuffers' serialized struct as a key in a key-value store. Here is an example of the structs that I want to use as a key in rocksdb.
struct Foo {
foo_id: int64;
foo_type: int32;
}
I read the documentation and figured that the layout of a struct is deterministic. Does that mean it is suitable to be used as a key? If yes, how do I serialize a struct and deserialize it back. It seems like Table has API for serialization/deserialization but struct does not (?).
I tried serializing struct doing it as follows:
constexpr int key_size = sizeof(Foo);
using FooKey = std::array<char, key_size>;
FooKey get_foo_key(const Foo& foo_object) {
FooKey key;
std::memcpy(&key, &foo_object, key_size);
return key;
}
const Foo* get_foo(const FooKey& key) {
return reinterpret_cast<const Foo*>(&key);
}
I did some sanity checks and the above seems to work in my Ubuntu 18 docker image and is blazing fast. So my questions are as follows:
Is this a safe thing to do on a machine if it passes FLATBUFFERS_LITTLEENDIAN and uint8/char equivalence checks? Or are there any other checks needed?
Are there any other caveats that I should be aware of when doing it as demonstrated above?
Thanks in advance !
You don't actually need to go via std::array, the Foo struct is already a block of memory that is safe to copy or cast as you wish. It needs no serialization functions.
Like you said, that memory contains little endian data, so FLATBUFFERS_LITTLEENDIAN must pass. Actually even on a big endian machine you may copy these structures all you want, as long as you use the accessors to read the fields (which do a byteswap on access on big endian). The only thing that won't work on big endian is casting the struct to, say, an int64_t * to read the first field without using the accessor methods.
The other caveat to certain casting operations is strict aliasing, if you have that turned on certain casts may be undefined behavior.
Also note that in this example Foo will be 16 bytes in size on all platforms, because of alignment.

How one deals with multiple pointer level (like char**) in Squeak FFI

I want to deal with a structure like this struct foo {char *name; char **fields ; size_t nfields};
If I define corresponding structure in Squeak
ExternalStructure subclass: #Foo
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'FFI-Tests'.
and define the fields naively with
Foo class>fields
^#(
(name 'char*')
(fields 'char**')
(nfields 'unsigned long')
)
then generate the accessors with Foo defineFields, I get those undifferentiated types for name and fields:
Foo>>name
^ExternalData fromHandle: (handle pointerAt: 1) type: ExternalType char asPointerType
Foo>>fields
^ExternalData fromHandle: (handle pointerAt: 5) type: ExternalType char asPointerType
That is troubling, the second indirection is missing for the fields accessor.
How should I specify fields accessor in the spec?
If not possible, how do I define it manually?
And I have the same problem for this HDF5 function prototype: int H5Tget_array_dims(hid_t tid, hsize_t *dims[])
The following syntax is not accepted:
H5Tget_array_dims: tid with: dims
<cdecl: long 'H5Tget_array_dims'(Hid_t Hsize_t * * )>
The compiler barks argument expected -> before the second *...
I add to resort to void * instead, that is totally bypassing typechecking - less than ideal...
Any idea how to deal correctly with such prototype?
Since Compiler-mt.435, the parser will not complain anymore but call back to ExternalType>>asPointerToPointerType. See source.squeak.org/trunk/Compiler-mt.435.diff and source.squeak.org/FFI/FFI-Kernel-mt.96.diff
At the time of writing this, such pointer-to-pointer type will be treated as regular pointer type. So, you loose the information that the external type actually points to an array of pointers.
When would one need that information?
When coercing arguments in the FFI plugin during the call
When constructing the returned object in the FFI plugin during the call
When interpreting instances of ExternalData from struct fields and FFI call return values
In tools such as the object explorer
There already several kinds of RawBitsArray in Squeak. Adding String and ExternalStructure (incl. packed or union) to the mix, we have all kinds of objects in Squeak to map the inner-most dimension (i.e., int*, char*, void*). ExternalData can represent the other levels of the multi-dimensional array (i.e., int**, char**, void** and so on).
So, there are remaining tasks here:
Store that pointer dimension information maybe in a new external type to be found via ExternalType>>referencedType. We may want to put new information into compiledSpec. See http://forum.world.st/FFI-Plugin-Question-about-multi-dimensional-arrays-e-g-char-int-void-td5118484.html
Update value reading in ExternalArray to unwrap one pointer after the other; and let the code generator for struct-field accessors generate code in a similar fashion.
Extend argument coercing in the plugin to accept arrays of the already supported arrays (i.e. String etc.)

How does Julia recognize values as singleton types?

It is a cool feature of Julia that values can be used as types, at least as type parameters. For example, one can assert that arrays are of a particular dimensionality, such as x :: Array{Int,2}. My question is: how does Julia do that and how do users of Julia get access to that power? I assume that 2 is being converted to or interpreted as some sort of singleton type of 2. I am curious to know what function does that conversion. I tried to assert 2 :: Type{2} and isa(2, Type{2}), but that only asserts a singleton if 2 is replaced by an actual type.
You can not define your own imutables and use them as singleton types (yet).
Currently anything that makes static int valid_type_param(jl_value_t *v) defined in jltypes.c return true, can be used as a type parameter. There is a TODO to add more types, and you'll probably just need a compelling usecase to get help to change the behaviour.
Update:
See also the manual documentation on types: Both abstract and concrete types can be paramaterized by other types and by certain other values (currently integers, symbols, bools, and tuples thereof). Type parameters may be completely omitted when they do not need to be referenced or restricted.

Most appropriate data structure for dynamic languages field access

I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).
For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switches over all possible field string values and makes the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.
By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than if I would test for equality between them all. (This is not tested, just theoretical).
The first thing I did was to -- at compile-time -- create a red-black tree for each anonymous object declaration and have it laid onto an array so that the object can look for it in a very optimized way.
I am still divided, though, if that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually work with a struct of 2 longs.
Those two longs will be encoded to support 10 chars (A-za-z0-9_) each, which is mostly a good prediction of the size of the fields. For fields larger than this, a special function (slower) receiving a string will also be provided.
The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.
Anyway, it's a little hard to find good information about this kind of optimization, since this is normally thought on a vm-level, not a static language compilation implementation.
Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?
Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.
I don't know if this is helpful, but I'll chuck it out in case;
If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads
// dynamic
myObject.foo = "some value";
myObject.bar = 32;
then during the parse, your symbol table can build an int for each field name;
// parsing code
symbols[0] == "foo"
symbols[1] == "bar"
then generate code using arrays or lists;
// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar
and build up reflection as a separate array;
runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] === 0; // Dictionary<string, int>
As I say, thrown out in the hope it'll be useful. No idea if it will!
Since you are likely to be using the same field and method names repeatedly, something like string interning would work well to quickly generate keys for your hash tables. It would also make string equality comparisons constant-time.
For such a small data set (expected upper bounds of 15) I think almost any hashing will be more expensive then a tree or even a list lookup, but that is really dependent on your hashing algorithm.
If you want to use a dictionary/hash then you'll need to make sure the objects you use for the key return a hash code quickly (perhaps a single constant hash code that's built once). If you can prevent collisions inside of an object (sounds pretty doable) then you'll gain the speed and scalability (well for any realistic object/class size) of a hash table.
Something that comes to mind is Ruby's symbols and message passing. I believe Ruby's symbols act as a constant to just a memory reference. So comparison is constant, they are very lite, and you can use symbols like variables (I'm a little hazy on this and don't have a Ruby interpreter on this machine). Ruby's method "calling" really turns into message passing. Something like: obj.func(arg) turns into obj.send(:func, arg) (":func" is the symbol). I would imagine that symbol makes looking up the message handler (as I'll call it) inside the object pretty efficient since it's hash code most likely doesn't need to be calculated like most objects.
Perhaps something similar could be done in .NET.

CLI/C++ Converting "this" pointer to an integer

I am trying to trace managed object creation/disposing in a CLI/C++ prog:
::System::Diagnostics::Trace::WriteLine(String::Format(
"Created {0} #{1:X8}",
this->GetType()->Name,
((UInt64)this).ToString()));
Which fails with
error C2440: 'type cast' : cannot convert from 'MyType ^const ' to 'unsigned __int64'
Is there a way to keep track of the unique object IDs this way?
Thanks!
First of all, why this doesn't work. Managed handle types ^ aren't pointers. They aren't just addresses. An instance of a managed type can and will be moved around in memory by GC, so addresses aren't stable; hence why it wouldn't let you do such a cast (as GC can execute at any moment, and you do not know when, any attempt to use such an address as a raw value is inherently a race condition).
Another thing that is often recommended, but doesn't actually work, is Object.GetHashCode(). For one thing, it returns an int, obviously not enough to be unique on x64. Furthermore, the documentation doesn't guarantee that values are unique, and they actually aren't in 2.0+.
The only working solution is to create a an instance of System.Runtime.InteropServices.GCHandle for your object, and then cast it to IntPtr - that is guaranteed to be both unique, and stable.
Check out the GCHandle type: http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.gchandle.aspx. Looks like it would do what you want, though it looks it would be a bit of a pain to use for your purposes...
Even if you could cast this to some integral value for display, it probably wouldn't be a useful unique identifier. This is because unlike C++, in C++/CLI the location of a (managed) object (and by extension the value of this) can potentially change during that object's lifetime. The (logically) same object could print two different strings at different points in the program.
MyType ^const is a reference type. Hence it's in the managed memory space, and you can't get direct memory pointers to these types, as they can change at any time.
Is there a way to keep track of the unique object IDs this way? Thanks!
You could use MyType.GetHashCode();