Using google protobuffer for delta messages - serialization

I am looking into using Google Protobuffers for delta messaging. Meaning I only want to send out the changed values of my domain object.
But that exposes a problem with the protocol for this purpose. I can easily just omit the properties that have not changed, and that will present us with a compact message.
But what about properties that change value from _something_ to null? There is no way to distinguish between these two scenarios in a protocol buffer.
What have others done here? I am looking at a few different solutions:
Add a meta property to all objects, that is an array of int. In case any of the properties should change to null, include the field number in this array. If no properties change, then the meta property is just omitted and doesn't take up bandwidth in the message.
Add a meta property that is a bit mask, but works like the array mentioned in option 1. This might be harder for clients to understand though.
Use a standard way that I haven't been able to find yet.
BR Jay

Protobuf 3 isn't very well suited for this. But in protobuf 2, you can have a field that is present but has value of null.
Because protobuf 2 isn't going to disappear any time soon, I'd suggest just use that for this kind of purposes.

I just wanted to post a follow-up on this and explain what I did.
As #jpa correctly pointed out protobuffers are not made for delta-compression.
So the way I solved it was to use some meta properties and rely on that convention. I have a close partnership with the people consuming the data, so conventions can be agreed upon.
Values that are set specifically to null
I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When a property is set to null, I will add the property tag to this array and that way indicate that it has specifically been set to null in that message update.
Arrays that are emptied
This works in the same way as the nulls array. I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When an array is emptied, I will add the property tag to this array and that way indicate that it has specifically been emptied that message update.
Objects that are deleted
To indicate that an object has been deleted, I have added a boolean property indicating that the object has been deleted. When the object is deleted I will set this value to true, and otherwise null, so it doesn't take up space in the message. The resulting message is the key identifier for that object and the boolean indicating that it is deleted.
It requires that the convention is understood by the clients but otherwise it works pretty well.

Related

Differences between Function that returns a string and read only string property [duplicate]

I need to expose the "is mapped?" state of an instance of a class. The outcome is determined by a basic check. It is not simply exposing the value of a field. I am unsure as to whether I should use a read-only property or a method.
Read-only property:
public bool IsMapped
{
get
{
return MappedField != null;
}
}
Method:
public bool IsMapped()
{
return MappedField != null;
}
I have read MSDN's Choosing Between Properties and Methods but I am still unsure.
The C# standard says
§ 8.7.4
A property is a member that provides access to a characteristic of an object or a class. Examples of properties include the length of a string, the size of a font, the caption of a window, the name of a customer, and so on. Properties are a natural extension of fields. Both are named members with associated types, and the syntax for accessing fields and properties is the same. However, unlike fields, properties do not denote storage locations. Instead, properties have accessors that specify the statements to be executed when their values are read or written.
while as methods are defined as
§ 8.7.3
A method is a member that implements a computation or action that can be performed by an object or class. Methods have a (possibly empty) list of formal parameters, a return value (unless the method’s return-type is void ), and are either static or non-static.
Properties and methods are used to realize encapsulation. Properties encapsulate data, methods encapsulate logic. And this is why you should prefer a read-only property if you are exposing data. In your case there is no logic that modifies the internal state of your object. You want to provide access to a characteristic of an object.
Whether an instance of your object IsMapped or not is a characteristic of your object. It contains a check, but that's why you have properties to access it. Properties can be defined using logic, but they should not expose logic. Just like the example mentioned in the first quote: Imagine the String.Length property. Depending on the implementation, it may be that this property loops through the string and counts the characters. It also does perform an operation, but "from the outside" it just give's an statement over the internal state/characteristics of the object.
I would use the property, because there is no real "doing" (action), no side effects and it's not too complex.
I personally believe that a method should do something or perform some action. You are not performing anything inside IsMapped so it should be a property
I'd go for a property. Mostly because the first senctence on the referenced MSDN-article:
In general, methods represent actions and properties represent data.
In this case it seems pretty clear to me that it should be a property. It's a simple check, no logic, no side effects, no performance impact. It doesn't get much simpler than that check.
Edit:
Please note that if there was any of the above mentioned and you would put it into a method, that method should include a strong verb, not an auxiliary verb like is or has. A method does something. You could name it VerifyMapping or DetermineMappingExistance or something else as long as it starts with a verb.
I think this line in your link is the answer
methods represent actions and properties represent data.
There is no action here, just a piece of data. So it's a Property.
In situations/languages where you have access to both of these constructs, the general divide is as follows:
If the request is for something the object has, use a property (or a field).
If the request is for the result of something the object does, use a method.
A little more specifically, a property is to be used to access, in read and/or write fashion, a data member that is (for consuming purposes) owned by the object exposing the property. Properties are better than fields because the data doesn't have to exist in persistent form all the time (they allow you to be "lazy" about calculation or retrieval of this data value), and they're better than methods for this purpose because you can still use them in code as if they were public fields.
Properties should not, however, result in side effects (with the possible, understandable exception of setting a variable meant to persist the value being returned, avoiding expensive recalculation of a value needed many times); they should, all other things being equal, return a deterministic result (so NextRandomNumber is a bad conceptual choice for a property) and the calculation should not result in the alteration of any state data that would affect other calculations (for instance, getting PropertyA and PropertyB in that order should not return any different result than getting PropertyB and then PropertyA).
A method, OTOH, is conceptually understood as performing some operation and returning the result; in short, it does something, which may extend beyond the scope of computing a return value. Methods, therefore, are to be used when an operation that returns a value has additional side effects. The return value may still be the result of some calculation, but the method may have computed it non-deterministically (GetNextRandomNumber()), or the returned data is in the form of a unique instance of an object, and calling the method again produces a different instance even if it may have the same data (GetCurrentStatus()), or the method may alter state data such that doing exactly the same thing twice in a row produces different results (EncryptDataBlock(); many encryption ciphers work this way by design to ensure encrypting the same data twice in a row produces different ciphertexts).
If at any point you'll need to add parameters in order to get the value, then you need a method. Otherwise you need a property
IMHO , the first read-only property is correct because IsMapped as a Attribute of your object, and you're not performing an action (only an evaluation), but at the end of the day consistancy with your existing codebase probably counts for more than semantics.... unless this is a uni assignment
I'll agree with people here in saying that because it is obtaining data, and has no side-effects, it should be a property.
To expand on that, I'd also accept some side-effects with a setter (but not a getter) if the side-effects made sense to someone "looking at it from the outside".
One way to think about it is that methods are verbs, and properties are adjectives (meanwhile, the objects themselves are nouns, and static objects are abstract nouns).
The only exception to the verb/adjective guideline is that it can make sense to use a method rather than a property when obtaining (or setting) the information in question can be very expensive: Logically, such a feature should probably still be a property, but people are used to thinking of properties as low-impact performance-wise and while there's no real reason why that should always be the case, it could be useful to highlight that GetIsMapped() is relatively heavy perform-wise if it in fact was.
At the level of the running code, there's absolutely no difference between calling a property and calling an equivalent method to get or set; it's all about making life easier for the person writing code that uses it.
I would expect property as it only is returning the detail of a field. On the other hand I would expect
MappedFields[] mf;
public bool IsMapped()
{
mf.All(x => x != null);
}
you should use the property because c# has properties for this reason

RavenDb GenerateDocumentKey

I need to generate Id for child object of my document. What is the current syntax for generating document key?
session.Advanced.Conventions.GenerateDocumentKey(document) is not there anymore. I've found _documentSession.Advanced.DocumentStore.Conventions.GenerateDocumentKey method but its' signature is weird: I am okay with default key generation algorithm I just want to pass an object and receive an Id.
The default implementation of GenerateDocumentKey is to get the "dynamic tag name" for the class, and append a slash. For example, class Foo would turn into Foos/ which then goes through the HiLoKeyGenerator so that ids can be assigned on the client-side without having to consult the server each time.
If you really want this behavior, you could try to use the HiLoKeyGenerator on your own, but have you considered something simpler? I don't know what your model is, but if the child thing is fully owned by the containing document (which it should be, to be in the same document) have you have several much easier options:
Just use the index within the collection
Keep a int NextChildThingId property on the document and increment that every time you add a ChildThing
Just use a Guid, although those are no fun to read, type, look at, compare, or speak to someone over the phone.

Overextending object design by adding many trivial fields?

I have to add a bunch of trivial or seldom used attributes to an object in my business model.
So, imagine class Foo which has a bunch of standard information such as Price, Color, Weight, Length. Now, I need to add a bunch of attributes to Foo that are rarely deviating from the norm and rarely used (in the scope of the entire domain). So, Foo.DisplayWhenConditionIsX is true for 95% of instances; likewise, Foo.ShowPriceWhenConditionIsY is almost always true, and Foo.PriceWhenViewedByZ has the same value as Foo.Price most of the time.
It just smells wrong to me to add a dozen fields like this to both my class and database table. However, I don't know that wrapping these new fields into their own FooDisplayAttributes class makes sense. That feels like adding complexity to my DAL and BLL for little gain other than a smaller object. Any recommendations?
Try setting up a separate storage class/struct for the rarely used fields and hold it as a single field, say "rarelyUsedFields" (for example, it will be a pointer in C++ and a reference in Java - you don't mention your language.)
Have setters/getters for these fields on your class. Setters will check if the value is not the same as default and lazily initialize rarelyUsedFields, then set the respective field value (say, rarelyUsedFields.DisplayWhenConditionIsX = false). Getters they will read the rarelyUsedFields value and return default values (true for DisplayWhenConditionIsX and so on) if it is NULL, otherwise return rarelyUsedFields.DisplayWhenConditionIsX.
This approach is used quite often, see WebKit's Node.h as an example (and its focused() method.)
Abstraction makes your question a bit hard to understand, but I would suggest using custom getters such as Foo.getPrice() and Foo.getSpecialPrice().
The first one would simply return the attribute, while the second would perform operations on it first.
This is only possible if there is a way to calculate the "seldom used version" from the original attribute value, but in most common cases this would be possible, providing you can access data from another object storing parameters, such as FooShop.getCurrentDiscount().
The problem I see is more about the Foo object having side effects.
In your example, I see two features : display and price.
I would build one or many Displayer (who knows how to display) and make the price a component object, with a list of internal price modificators.
Note all this is relevant only if your Foo objects are called by numerous clients.

Set variables to "Nothing" is a good practice?

If I got
Dim myRect As Rectangle = New Rectangle(0,0,100,100)
Is it necessary or just fine to later do this:
myRect = Nothing
Or it isn't necessary? Thank you.
IF it is necessary, are there other cases it isn't for my variables?
In general, as Joel said, it's unnecessary.
In your specific example, though, it is actually pointless. Rectangle is a value type, so setting it to Nothing is not even affecting an object's reference count; it's assigning a new value (the default value for Rectangle) to your myRect variable. This is analogous to having an Integer variable and setting it to 0 at the end of a method. Doesn't buy you anything.
I should point out that the claim "Setting any variable to Nothing [or null in C#] never accomplishes anything"* is a myth. It is entirely possible that you may have a field in a class which you might as well set to null if the object referenced is no longer needed but you still have a reference to the class instance itself.
As a simplistic example, suppose you had some container class which wraps a T[] array, and you give this container an Empty method. It might make sense to set the container's internal array to null in this method, which would result in zero references to the array object, qualifying it for garbage collection. (You would then create a new array when external code next tried to add a T to the collection.) If you did not set the field to null on Empty, then there would still be a reference to the array (i.e., the field), and so that would be a small amount of memory being used that you really don't need.
Like I said, that's a simplistic example. And honestly, it's rare that you ever need to consider scenarios like that. I just thought I would mention it so that you don't get the wrong impression that setting a field to Nothing literally never accomplishes anything.
*I'm not actually quoting anyone specific here; this is just an over-generalization I've heard stated more than once.
Don't do it. This was important to do in the vb6 days, but for .Net it's possible (though very unlikely) for it to actually throw the garbage collector off and have other unexpected side effects.

WCF REST and JSON - Serialization and empty elements

I am returning a list of items (user defined class) in a REST service using WCF. I am returning the items as JSON and it is used in some client side javascript (so the 'schema' of the class was derived from what the javascript library required). The class is fairly basic, strings and a bool. The bool is optional, so if it is absent the javascript library uses a default value, and if it is present (true or false) the value is used.
The problem is if I use a bool, the value is defaulted to false when serialized, and if i use a bool?, the member is still sent accross in the JSON and defaulted to null which causes problems with the library (it wont fall back to the default value).
I know I can probably mess around the the javascript library, but I would like to find a way to just not send any members which are null so they dont show up in the serialized JSON at all.
Any ideas?
You could do a little bit of packing and unpacking before and after the serialization. E.g.:
You could make two different versions
of the class, one with and one
without the bool, and convert as
appropriate before and after
transmitting. (sends the least amount
of data, if # of bytes is a greater
consideration than code complexity)
You could add another bool that tells
whether the first bool is supposed to
be null.