Are Bond field names used in deserialization?

Suppose I serialized a given Bond struct with a single field:
struct NameBond
{
1: string name;
}
And then I renamed the field in the .bond file (without changing its ordinal):
struct NameBond
{
1: string displayName;
}
Would I still be able to deserialize it?
What about the name of the struct? (NameBond in the example.)
Would changing that prevent me from deserializing?

This depends on which protocol you are using.
Your change will cause no problems in the CompactBinary serializer.
It may cause trouble with other protocols.
You may want to consult the Bond schema evolution guide, where it says:
Caution should be used when changing or reusing field names as this could break text-based protocols like SimpleJsonProtocol
See also this related SO question.
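To make the Compact Binary case concrete, here is a rough C# sketch (OldNameBond and NewNameBond are hypothetical stand-ins for the generated class before and after the rename; only ordinal 1 goes on the wire, so the round trip still works):
// Serialize with the generated type that still has "name" (ordinal 1).
var output = new OutputBuffer();
var writer = new CompactBinaryWriter<OutputBuffer>(output);
Serialize.To(writer, new OldNameBond { name = "Alice" });

// Deserialize with the generated type where ordinal 1 is now "displayName".
var input = new InputBuffer(output.Data);
var reader = new CompactBinaryReader<InputBuffer>(input);
var result = Deserialize<NewNameBond>.From(reader);
// result.displayName == "Alice"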

Are extensible records useless in Elm 0.19?

Extensible records were one of Elm's most amazing features, but since v0.16 adding and removing fields is no longer supported, and this puts me in an awkward position.
Consider an example. I want to give a name to a random thing t, and extensible records provide me a perfect tool for this:
type alias Named t = { t | name: String }
"Okay," says the compiler. Now I need a constructor, i.e. a function that equips a thing with a specified name:
equip : String -> t -> Named t
equip name thing = { thing | name = name } -- Oops! Type mismatch
Compilation fails because the { thing | name = ... } syntax assumes thing is a record with a name field, and the type system can't guarantee that. In fact, with Named t I was trying to express the opposite: t should be a record type without its own name field, and the function adds that field to it. Either way, field addition is necessary to implement the equip function.
So it seems impossible to write equip in a polymorphic manner, but that's probably not such a big deal. After all, whenever I want to give a name to some concrete thing I can do it by hand. Much worse, the inverse function extract : Named t -> t (which erases the name of a named thing) requires a field-removal mechanism, and thus isn't implementable either:
extract : Named t -> t
extract thing = thing -- Error: No implicit upcast
It would be an extremely important function, because I have tons of routines that accept old-fashioned unnamed things, and I need a way to use them with named things. Of course, massive refactoring of those functions is not an acceptable solution.
Finally, after this long introduction, let me state my questions:
Does modern Elm provide some substitute for the old, deprecated field addition/removal syntax?
If not, is there some built-in function like equip and extract above? For every custom extensible record type I would like to have a polymorphic analyzer (a function that extracts its base part) and a polymorphic constructor (a function that combines the base part with the additional fields and produces the record).
Negative answers for both (1) and (2) would force me to implement Named t in a more traditional way:
type Named t = Named String t
In this case, I can't see the purpose of extensible records. Is there a positive use case, a scenario in which extensible records play a critical role?
Type { t | name : String } means a record that has a name field. It does not extend the t type but, rather, extends the compiler’s knowledge about t itself.
So in fact the type of equip is String -> { t | name : String } -> { t | name : String }.
What is more, as you noticed, Elm no longer supports adding fields to records, so even if the type system allowed what you want, you still could not do it. The { thing | name = name } syntax only supports updating records of type { t | name : String }.
Similarly, there is no support for deleting fields from a record.
If you really need types from which you can add or remove fields, you can use Dict. The other options are either writing the transformers manually, or creating and using a code generator (this was the recommended solution for JSON decoding boilerplate for a while).
And regarding extensible records, Elm does not really support the “extensible” part much any more – the only remaining part is the { t | name : u } -> u projection, so perhaps they should just be called scoped records. The Elm docs themselves acknowledge that the extensibility is not very useful at the moment.
You could just wrap the t type together with a name, but it wouldn't make a big difference compared to the approach with a custom type:
type alias Named t = { val: t, name: String }
equip : String -> t -> Named t
equip name thing = { val = thing, name = name }
extract : Named t -> t
extract thing = thing.val
Is there a positive use case, a scenario in which extensible records play critical role?
Yes, they are useful when your application Model grows too large and you face the question of how to scale out your application. Extensible records let you slice up the model in arbitrary ways, without committing to particular slices long term. If you sliced it up by splitting it into several smaller nested records, you would be committed to that particular arrangement - which might tend to lead to nested TEA and the 'out message' pattern; usually a bad design choice.
Instead, use extensible records to describe slices of the model, and group functions that operate over particular slices into their own modules. If you later need to work across different areas of the model, you can create a new extensible record for that.
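For example (a small sketch with invented field names), a module can declare just the slice it needs, and any model record containing those fields can be passed in:
type alias WithSearch m = { m | query : String, results : List String }

setQuery : String -> WithSearch m -> WithSearch m
setQuery newQuery model = { model | query = newQuery }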
It's described by Richard Feldman in his Scaling Elm Apps talk:
https://www.youtube.com/watch?v=DoA4Txr4GUs&ab_channel=ElmEurope
I agree that extensible records can seem a bit useless in Elm, but it is a very good thing they are there, because they solve this scaling issue so well.

Can we use [DataMember] instead of [ProtoMember] while use protobuf in WCF?

I already have a service working using the DataContract attributes. We would like to switch to the protobuf implementation, but if we have to change all the attributes, it would be a lot of hard work.
Is it possible to NOT use the ProtoMember and ProtoContract and have ProtoBuf using the DataMember and DataContract attributes?
thanks
Sure; protobuf-net is perfectly happy with [DataContract] / [DataMember] as long as it can still get valid numbers, which it does by looking for the Order property of DataMemberAttribute.
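For example (a sketch with made-up type and member names), protobuf-net will take the field numbers from Order here, so Serializer.Serialize/Deserialize work without any [Proto*] attributes:
[DataContract]
public class Customer
{
    // protobuf-net uses Order as the protobuf field number (it must be >= 1).
    [DataMember(Order = 1)]
    public int Id { get; set; }

    [DataMember(Order = 2)]
    public string Name { get; set; }
}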
There is, however, a small problem... tools like svcutil don't guarantee the actual numbers - just the order. This can make it problematic to ensure that you have the same numbers on both sides. In addition, svcutil tends to start at zero, not one - and zero is not a valid field number for protobuf. If the numbers you get all turn out to be off-by-one, then you can tweak this by adding a partial class in a separate file with a fixup, for example:
[ProtoContract(DataMemberOffset = 1)]
partial class Whatever { }
However, if the numbers are now all over the place (because they weren't sequential originally), then you might want to either use multiple [ProtoPartialMember(...)] attributes to tell it how to map each one (remembering that you can use nameof rather than hard-coding the member names):
[ProtoContract]
[ProtoPartialMember(1, nameof(SomeStringValue))]
[ProtoPartialMember(2, nameof(WhateverId))]
partial class Whatever { }
or just share the original type definition, which might be easier.

Microsoft Bond runtime schemaDef

I'm hoping someone could illustrate a common use case for the Microsoft Bond runtime schemas (SchemaDef). I understand these are used when schema definitions are not known at compile time, but if the shape of an object is fluid and changes frequently, what benefits might a runtime generated schema provide?
My use case is that the business user is in control of the shape of an object (via a rules engine). They could conceivably do all sorts of things that could break our backward compatibility (for example, invert the order of fields on the object). If we plan on persisting all the object versions that the user created, is there any way to manage backward/forward compatibility using Bond runtime schemas? I presume not, since if they invert from this:
0: int64 myInt;
1: string myString;
to this
0: string myString;
1: int64 myInt;
I'd expect a runtime error, which implies that managing the object with runtime schemas wouldn't provide much help to me.
What would be a use case where a runtime schema would in fact be useful?
Thank you!
Some of the uses for runtime schemas are:
with the Simple Binary protocol to handle schema changes
schema validation/evolution
rendering a struct in a GUI
custom mapping from one struct to another
Your case feels like schema validation, if you can proactively reject a schema that would not be compatible. I worked on a system that used Bond under the hood and took this approach. There was an explicit "change the schema of this entity" operation that validated whether the two schemas were compatible with each other.
I don't know the data flow in your system, so such validation might not be possible. In that case, you could use the runtime schemas, along with some rules provided by the business users, to convert between different shapes.
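For reference, getting a runtime schema in C# is a one-liner, and because SchemaDef is itself a Bond struct it can be serialized and stored next to the data for later validation or conversion (a sketch; Foo stands for any generated Bond type):
// The runtime schema generated for the compile-time type Foo.
RuntimeSchema schema = Schema<Foo>.RuntimeSchema;

// SchemaDef is an ordinary Bond struct, so it can be persisted with the payload.
var output = new OutputBuffer();
var writer = new CompactBinaryWriter<OutputBuffer>(output);
Serialize.To(writer, schema.SchemaDef);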
Simple Binary
When deserializing from Simple Binary, the reader must know the exact schema that the writer used, otherwise it has no way to interpret the bytes, resulting in potentially silent data corruption.
Such corruption can happen if the schema undergoes the following change:
// starting struct
struct Foo
{
0: uint8 f1;
1: uint16 f2;
}
The Simple Binary serialized representation of Foo { f1: 1, f2: 2} is 0x01 0x02 0x00.
Let's now change the schema to this:
// changed struct
struct Foo
{
0: uint8 f1;
// It's OK to remove an optional field.
// 1: uint16 f2;
2: uint8 f3;
3: uint8 f4;
}
If we deserialize 0x01 0x02 0x00 with this schema, we'll get Foo { f1: 1, f3: 2, f4: 0}. Notice that f3 is 2, which is not correct: it should be 0. With the runtime schema for the old Foo, the reader will know that the second and third bytes correspond to a field that has since been deleted and can skip them, resulting in the expected Foo { f1:1, f3: 0, f4: 0 }.
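A rough C# sketch of that recovery path, assuming the writer's SchemaDef was stored with the payload (names are placeholders, and the Deserializer constructor overload taking a runtime schema is the one intended for untagged protocols):
// oldSchemaDef is the SchemaDef persisted when the data was written with the old Foo.
var writerSchema = new RuntimeSchema(oldSchemaDef);
var deserializer = new Deserializer<SimpleBinaryReader<InputBuffer>>(typeof(Foo), writerSchema);

var reader = new SimpleBinaryReader<InputBuffer>(new InputBuffer(payloadBytes));
// The deserializer lays the bytes out against the old shape and skips the deleted field.
Foo foo = deserializer.Deserialize<Foo>(reader);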
Schema Validation and Evolution
Some systems that use Bond have different rules for schema evolution than the normal Bond rules. Runtime schemas can be used to enforce such rules (e.g., checking a type to enforce a rule that no collections are used) before accepting structs of a given type or before registering such a schema in, say, a repository of known schemas.
You could also walk two schemas to determine whether they are compatible with each other. It would be nice if Bond provided such an API itself, so that it doesn't have to be reimplemented again and again. I've opened a GitHub issue for such an API.
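Until such an API exists, here is a rough C# sketch of one such check (the real compatibility rules would be richer; StructDef and FieldDef come from the Bond-defined bond.bond schema, with lowercase field names):
// A sketch only: flag fields that kept their ordinal but changed wire type,
// which is a breaking change under most evolution rules.
static bool OrdinalsKeepTheirTypes(StructDef oldStruct, StructDef newStruct)
{
    foreach (var oldField in oldStruct.fields)
    {
        foreach (var newField in newStruct.fields)
        {
            if (newField.id == oldField.id && newField.type.id != oldField.type.id)
            {
                return false;
            }
        }
    }
    return true;
}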
GUI
With a runtime schema, you have extra information about the struct, including things like the names of the fields. (The binary encoding protocols omit field names, relying, instead, on field IDs.) You can use this additional information to do things like create GUI controls specific to each field.
There's an example showing inspection of a runtime schema in both C# and C++.
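A tiny C# sketch of that kind of inspection (Foo stands for any generated type): walk the runtime schema to recover names and ordinals that a binary payload alone would not give you.
var schemaDef = Schema<Foo>.RuntimeSchema.SchemaDef;
foreach (var structDef in schemaDef.structs)
{
    foreach (var field in structDef.fields)
    {
        // metadata.name is the field name; id is the ordinal used on the wire.
        Console.WriteLine($"{structDef.metadata.name}.{field.metadata.name} (ordinal {field.id})");
    }
}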
Custom Mapping
In C++, the MapTo transform can be used to convert one struct to another with an incompatible shape, given a set of rules. There's an example of this that makes use of a runtime schema to derive the rules.

What do member numbers mean in Microsoft Bond?

Using Microsoft Bond (the C# library in particular), I see that whenever a Bond struct is defined, it looks like this:
struct Name
{
0: type name;
5: type name;
...
}
What do these numbers (0, 5, ...) mean?
Do they require special treatment in inheritance? (Do I need to make sure that I do not override members with the same number defined in my ancestor?)
The field ordinals are the unique identity of each field. When serializing to tagged binary protocols, these numbers are used to indicate which fields are in the payload. The names of the fields are not used. (Renaming a field in the .bond file does not break serialized binary data compatibility [though, see caveat below about text protocols].) Numbers are smaller than strings, which helps reduce the payload size, but also ends up improving serialization/deserialization time.
You cannot re-use the same field ordinal within the same struct.
There's no special treatment needed when you inherit from a struct (or if you have a struct field inside your struct). Bond keeps the ordinals for the structs separate. Concretely, the following is legal and will work:
namespace inherit_use_same_ordinal;
struct Base {
0: string field;
}
struct Derived : Base {
0: bool field;
}
A caveat about text serialization protocols like Simple JSON and Simple XML: these protocols use the field name as the field identifier. So, in these protocols renaming a field breaks serialized data compatibility.
Also, Simple JSON and Simple XML flatten the inheritance hierarchy, so re-using names across Base and Derived will result in clashes. Both have ways to work around this. For Simple XML, the SimpleXml.Settings.UseNamespaces parameter can be set to true to emit fully qualified names.
For Simple JSON, the Bond attribute JsonName can be used to change the name used for Simple JSON serialization, to avoid the conflict:
struct Derived : Base {
[JsonName("derived_field")]
0: bool field;
}

Reference Semantics in Google Protocol Buffers

I have a slightly peculiar program which deals with cases very similar to this
(in C#-like pseudo code):
class CDataSet
{
int m_nID;
string m_sTag;
float m_fValue;
void PrintData()
{
//Blah Blah
}
};
class CDataItem
{
int m_nID;
string m_sTag;
CDataSet m_refData;
CDataSet m_refParent;
void Print()
{
if(null == m_refData)
{
m_refParent.PrintData();
}
else
{
m_refData.PrintData();
}
}
};
Members m_refData and m_refParent are initialized to null and used as follows:
m_refData -> Used when a new data set is added
m_refParent -> Used to point to an existing data set.
A new data set is added only if the field m_nID doesn't match an existing one.
Currently this code is managing around 500 objects with around 21 fields per object and the format of choice as of now is XML, which at 100k+ lines and 5MB+ is very unwieldy.
I am planning to modify the whole shebang to use ProtoBuf, but currently I'm not sure as to how I can handle the reference semantics. Any thoughts would be much appreciated
Out of the box, protocol buffers does not have any reference semantics. You would need to cross-reference them manually, typically using an artificial key. Essentially, on the DTO layer you would add a key to CDataSet (that you simply invent, perhaps just an increasing integer), store the key instead of the item in m_refData/m_refParent, and run the fixup manually during serialization/deserialization. You could also just store the index into the set of CDataSet, but that may make insertion etc. more difficult. Up to you; since this is serialization, you could argue that you won't insert (etc.) outside of initial population, and hence the raw index is fine and reliable.
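A sketch of what that DTO layer might look like with protobuf-net attributes (all names invented): the object references are replaced by keys, and the references are rebuilt in a fixup step after deserialization.
[ProtoContract]
class DataSetDto
{
    // An artificial key, e.g. just an increasing integer assigned at serialization time.
    [ProtoMember(1)] public int Key;
    [ProtoMember(2)] public int Id;
    [ProtoMember(3)] public string Tag;
    [ProtoMember(4)] public float Value;
}

[ProtoContract]
class DataItemDto
{
    [ProtoMember(1)] public int Id;
    [ProtoMember(2)] public string Tag;
    // Keys referring to DataSetDto.Key; null plays the role of the null references.
    [ProtoMember(3)] public int? DataKey;
    [ProtoMember(4)] public int? ParentKey;
}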
This is, however, a very common scenario - so as an implementation-specific feature I've added optional (opt-in) reference tracking to my implementation (protobuf-net), which essentially automates the above under the covers (so you don't need to change your objects or expose the key outside of the binary stream).
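In protobuf-net v2 that opt-in looks roughly like this (a sketch using the original member names; CDataSet would need the same [ProtoContract]/[ProtoMember] treatment):
[ProtoContract]
class CDataItem
{
    [ProtoMember(1)] public int m_nID;
    [ProtoMember(2)] public string m_sTag;
    // AsReference makes protobuf-net track object identity instead of
    // serializing a nested copy, so shared CDataSet instances stay shared.
    [ProtoMember(3, AsReference = true)] public CDataSet m_refData;
    [ProtoMember(4, AsReference = true)] public CDataSet m_refParent;
}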