Is it possible to hook into the protobuf-net serializer to add some custom logic? - serialization

This may be overkill, but I am trying to reduce the network consumption of a client/server protocol, by having both sides keep copies of previously transferred URIs, so as to use 2-4 byte placeholders instead of the full URIs on subsequent chatter.
Problem is I think it will be quite expensive to reflect through all the complex objects being transferred to locate the URIs that need processing, whereas the serializer is already visiting all these fields and probably using a mechanism much faster than reflection.
Can this be done in protobuf-net?

If this is part of a single call to Serialize/Deserialize (i.e. your data has the same uri repeated at multiple locations), then you can already do this, simply by telling it to treat those strings as references (it has special handling of strings, so two different references of the same string contents count as equal):
[ProtoMember(7, AsReference=true)]
public string Uri {get;set;}
During serialization, the first time it spots a new string value (decorated with AsReference=true) it will generate a unique token to represent the string; all subsequent usages of that same string will serialize only the token.
If this is in separate calls to Serialize/Deserialize, then no: you would have to do it manually. I can think of some ways of doing it, but I think this would be better handled outside of the serialization layer.

Could you possibly customise the Objects that you are using that you want to Tokenise your URIs and have them inherit or implement an interface that you can check to see if that particular object is a Tokenizer.
Then if that's the case you might be able to use the BeforeSerialization / AfterDeserialization to make your transformations.

Related

Why and when must we need to add "[Serializable]" attribute?

1) Besides for serialization from an object into a file/memory…… When must we add "[Serializable]" attribute?
2) Why must we add that? Why cannot we directly save the object into some known formation to the .NET library?
3) How can I tell in what sitatuations we must add this attribute? In what conditions?
Many thanks!
1) Besides for serialization from an object into a file/memory…… When must we add "[Serializable]" attribute?
When you want to serialize it, or an object that contains it, directly or indirectly, to any other medium, e.g. a socket, RMI, ...
2) Why must we add that?
Because that's the way they designed it. Security reasons spring to mind: you really don't want a password to be serializable by default for example.
Why cannot we directly save the object into some known formation to the .NET library?
Firstly because Serialization preceded .NET by several years; secondly there are various other ways of accomplishing that, such as SOAP.
3) How can I tell in what situations we must add this attribute? In what conditions?
You're just re-asking the same question here, two more times. You certainly need to do it if you ever get a NotSerializableException naming the class and the exception didn't indicate an inadvertent serialization of something you didn't intend to serialize in the first place.

"Visitor" a lot of different type of object

in one application I have a lot of dirrerent object, let´s say : square, circle, etc ... a lot of different shape ---> I´m sorry for the trivial example.
With all this object I want to create a doc of different type : xml, txt, html, etc.. (e.g.: I want to scan all the object (shapes) tree and produce the xml file.
The natural approach I thought is visitor pattern, I tried and it works :-)
- all the objects have one visit method accepting the IVisitor interface.
- I have one concrete visitor for every kind of do I want to create : (XmlVisitor, TxtVisitor, etc). Every visitor has one method "visit" for every kind of object.
My doubt is ... it doesn´t seems scaling well if I have a lot of object ...
from the logic point of view it´s ok, i have just to add the new shape and the method in the concrete Visitor, that´s all.
What do you think ? is an althernative possible ?
I think that you have correctly implemented a visitor pattern and as a result you also have implemented a double dispatching mechanism. If you consider the "not scaling well" as a need to add a bunch of methods in case of adding a new shape and/or visitor, then it is just a side effect of the pattern. Some people consider this "method explosion" as harmful and opt for a different implementation, such as having a "decision matrix" object. For this example in particular I think that the DD approach is the way to go and that actually it does scale well, since you add new methods as you add new requirements (i.e. you add new visit* methods as new shapes are added or you add a new visitor class as new document types are needed).
HTH
It seems to me that what worries you the most is that you are matching against many different kinds of objects, and worry that as more and more object types are added, the performance will suffer. I don't think you need to worry about that, though: The performance of the visitor pattern is not really affected by the potential number of objects or visitors, since it is based on virtual table lookup - the passed object will contain a link to (a link to) the method which should be called.
So the visitor pattern, though relatively expensive in indirect accesses, is scalable in this regard.
I believe you have :
A class hierarchy (Shapes in your example) and
Operations on the class hierarchy (exportXML, exportToHTML etc in your example)
You have to consider what is more likely to change -
You should choose Visitor pattern if the class hierarchy is more or less fixed but you would like to add more operations in future. Visitor pattern will allow you to add more operations (e.g. JSON export) without touching the existing class hierarchy.
OTOH if the operations are more or less fixed, but more Shape objects can be added, then you should use regular inheritance. Define a ShapeExport interface which has methods like exportToXML, exportToHTML etc. Let all Shapes implement that interface. Now you can add new Shape which implements the same interface without touching existing code.

Is "serialisation without duplication" possible in c++0x?

One of the big uses of code generation in c++ is to support message serialisation. Typically, you want to support specifying message contents and layout in the same step and produce code for that message type that can give you objects capable of being serialised to/from communication streams. In the past, this has usually resulted in code that looks like:
class MyMessage : public SerialisableObject
{
// message members
int myNumber_;
std::string myString_;
std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
public:
// ctor, dtor, accesors, mutators, then:
virtual void Serialise(SerialisationStream & stream)
{
stream & myNumber_;
stream & myString_;
stream & aBunchOfThingsIWantToSerialise_;
}
};
The problem with using this kind of design is that violates an important rule of good architecture: you should not have to specify the intent of a design twice. Duplication of intent, like duplicated code and other common development duplication, leaves room for one place in the code to become divergent with the other, causing errors.
In the above, the duplication is the list of members. Potential errors include adding a member to the class but forgetting to add it to the serialisation list, serialising a member twice (possibly by not using the same order as the member declaration or possibly due to a misspelling of a similar member, among other ways), or serialising something that is not a member (which might produce a compiler error, unless name lookup finds something at a different scope than the object that matches lookup rules). That kind of mistake is the same reason we no longer try to match every heap allocation with a delete (instead using smart pointers) or ever file open with a close (using RAII ctor//dtor mechanisms) - we don't want to have to match up our intent in multiple places because there are times we - or another engineer less familiar with the intent - make mistakes.
Generally, therefore, this has been one of the things that code generation could take care of. You might create a file MyMessage.cg to specify both layout and members in one step
serialisable MyMessage
{
int myNumber_;
std::string myString_;
std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
};
that would be run through a code generation utility and produce the code.
I was wondering if it was possible yet to do this in c++0x without external code generation. Are there any new language mechanisms that make it possible to specify a class as serialisable once, and the names and layout of it's members are used to layout the message during serialisation?
To be clear, I know that there are tricks with boost tuples and fusion that can come close to this kind of behavior even in the pre-c++0x language. Those usages, though, being based on indexing into the tuple rather than by-member-name access, have all been brittle to changing the layout, as other places in the code that access the messages would then also need to be reordered. Some kind of by-member-name access is necessary to not have to duplicate the layout specification in places in the code that use the messages.
Also, I know it might be nice to take this up to the next level and ask for specifying when some of the members shouldn't be serialised. Other languages that offer serialisation built in often offer some kind of attribute to do this, so
int myNonSerialisedNumber_ [[noserialise]];
might seem natural. However, I personally think it is bad design to have serialisable objects where everything is not serialised, since the lifetime of messages is in the transport to/from the communications layer, separate from other data lifetimes. Also, you could have an object which has a purely serialisable as on of it's members, so such functionality doesn't by anything the language doesn't already offer.
Is this possible? Or did the standards committee leave out this kind of introspective capability? I don't need it to look like the code gen file above - any simple method for compiletime specification of layout and members in a single step would solve this common problem.
This is both possible and practical in C++11 – in fact it was possible back in C++03, the syntax was just a little too unwieldy. I wrote a small library based around the same idea - see the following:
www.github.com/molw5/framework
Sample syntax:
class Object : serializable <Object,
value <NAME(“Field 1”), int>,
value <NAME(“Field 2”), float>,
value <NAME(“Field 3”), double>>
{
};
Most of the underlying code could be reproduced, in principal, in C++03 – some of the implementation details without variadic templates would have been...tricky, but I believe it would have been possible to recover the core functionality. What you could not reproduce in C++03 was the NAME macro above and the syntax relies fairly heavily on it. The macro provides the machinery necessary to generate a unique typename from a string, that is the following:
NAME(“Field 1”)
expands to
type_string <'F', 'i', 'e', 'l', 'd', ' ', '1'>
through the use of some common macros and constexpr (for character extraction). Back in C++03 something similar to the type_string above would need to be entered manually.
C++, of any form, supports neither introspection nor reflection (to the extent that they are different).
One nice thing about doing serialization manually (ie: without introspection or reflection) is that you can provide object versioning. You can support older forms of the serialization, and simply create reasonable defaults for the data that wasn't in the old versions. Or if a new version removes some data, you can simply serialize and discard it.
It seems to me that what you need is Boost.Serialization.

Design question about Web services and WCF

I have a service method with a couple of parameters that will be always provided and additional parameters that will change by names and number of parameters (I will know which parameters to expect by the ACTION field)
To solve a design problem like the above I created a web service with the parameters that will be always provided and one more parameter that will accept a string that is written in a Key<*>value way.
using MyServiceMethod:
Action : Action1
Param2: Hello
param3: world
Additional_Params: name<*>Jack;address<*>2 street;
(I know that with Action1 I get a Name and address values from the person who uses the service)
using MyServiceMethod for the second time:
Action : Action5
Param2: Hello
param3: world
Additional_Params: numOfHours<*>3;Sum<*>342;myName<*>asaf;
I think this is not the best design for a web service that accept a different data with each ACTION, is there a better way to do that ?
If you have a small set of actions, it would probably be the best design strategy to simply use a single method for each action. It probably won't require as much work to pick apart on the back end and, in my opinion, there's no shame in it.
If you have a large set of actions, you may be able to represent what you want to do as a data structure, and pass only that parameter in.
I wouldn't say the way you're proposing to do it is wrong, anyway, but it may actually end up creating more work for you and being confusing to you or another developer later.
I'm not 100% clear on what you're trying to do, but if you know that Action1 will always be called with the name/value pairs of "name" and "address", and Action5 will be called with "numOfHours", "Sum" and "myName", then I agree with #Eugarps answer and #Steven_Sudit associated comment. Simply expose them as parameters on the method for each action.
If however, you don't know what the variable parameters are going to be (only that you will get certain name/value pairs when invoking the actions), then I would at least use an IDictionary<string, object> rather than a string. Doing it this way means that you can avoid parsing string values. If the values being passed are instances of your own types (rather than primitives), then you'll also need to make use of the ServiceKnownType attribute to tell the serializer about them.
I've kind of already answered in comments, but here it is in one place:
If you have a small number of different functions, which change rarely, the best way is just to expose each as a separate function. When new ones are needed, you can extend your interface without breaking existing code.
Now, on the other hand, if you had a larger number of functions and/or they changed frequently, then a simple but flexible interface would be better. You could have a single function exposed, taking an XML document that is a serialization of a request DTO. This will require some demux logic (switch/case or, better, a lookup table of delegates) to dispatch the requests to a handler. This approach is harder to get going, but easier to maintain.

How much responsibility should a method have?

This is most certainly a language agnostic question and one that has bothered me for quite some time now. An example will probably help me explain the dilemma I am facing:
Let us say we have a method which is responsible for reading a file, populating a collection with some objects (which store information from the file), and then returning the collection...something like the following:
public List<SomeObject> loadConfiguration(String filename);
Let us also say that at the time of implementing this method, it would seem infeasible for the application to continue if the collection returned was empty (a size of 0). Now, the question is, should this validation (checking for an empty collection and perhaps the subsequent throwing of an exception) be done within the method? Or, should this methods sole responsibility be to perform the load of the file and ignore the task of validation, allowing validation to be done at some later stage outside of the method?
I guess the general question is: is it better to decouple the validation from the actual task being performed by a method? Will this make things, in general, easier at a later stage to change or build upon - in the case of my example above, it may be the case at a later stage where a different strategy is added to recover from the event of an empty collection being return from the 'loadConfiguration' method..... this would be difficult if the validation (and resulting exception) was being done in the method.
Perhaps I am being overly pedantic in the quest for some dogmatic answer, where instead it simply just relies on the context in which a method is being used. Anyhow, I would be very interested in seeing what others have to say regarding this.
Thanks all!
My recommendation is to stick to the single responsibility principle which says, in a nutshell, that each object should have 1 purpose. In this instance, your method has 3 purposes and then 4 if you count the validation aspect.
Here's my recommendation on how to handle this and how to provide a large amount of flexibility for future updates.
Keep your LoadConfig method
Have it call the a new method for reading the file.
Pass the previous method's return value to another method for loading the file into the collection.
Pass the object collection into some validation method.
Return the collection.
That's taking 1 method initially and breaking it into 4 with one calling 3 others. This should allow you to change pieces w/o having any impact on others.
Hope this helps
I guess the general question is: is it
better to decouple the validation from
the actual task being performed by a
method?
Yes. (At least if you really insist on answering such a general question – it’s always quite easy to find a counter-example.) If you keep both the parts of the solution separate, you can exchange, drop or reuse any of them. That’s a clear plus. Of course you must be careful not to jeopardize your object’s invariants by exposing the non-validating API, but I think you are aware of that. You’ll have to do some little extra typing, but that won’t hurt you.
I will answer your question by a question: do you want various validation methods for the product of your method ?
This is the same as the 'constructor' issue: is it better to raise an exception during the construction or initialize a void object and then call an 'init' method... you are sure to raise a debate here!
In general, I would recommend performing the validation as soon as possible: this is known as the Fail Fast which advocates that finding problems as soon as possible is better than delaying the detection since diagnosis is immediate while later you would have to revert the whole flow....
If you're not convinced, think of it this way: do you really want to write 3 lines every time you load a file ? (load, parse, validate) Well, that violates the DRY principle.
So, go agile there:
write your method with validation: it is responsible for loading a valid configuration (1)
if you ever need some parametrization, add it then (like a 'check' parameter, with a default value which preserves the old behavior of course)
(1) Of course, I don't advocate a single method to do all this at once... it's an organization matter: under the covers this method should call dedicated methods to organize the code :)
To deflect the question to a more basic one, each method should do as little as possible. So in your example, there should be a method that reads in the file, a method that extracts the necessary data from the file, another method to write that data to the collection, and another method that calls these methods. The validation can go in a separate method, or in one of the others, depending on where it makes the most sense.
private byte[] ReadFile(string fileSpec)
{
// code to read in file, and return contents
}
private FileData GetFileData(string fileContents)
{
// code to create FileData struct from file contents
}
private void FileDataCollection: Collection<FileData> { }
public void DoItAll (string fileSpec, FileDataCollection filDtaCol)
{
filDtaCol.Add(GetFileData(ReadFile(fileSpec)));
}
Add validation, verification to each of the methods as appropriate
You are designing an API and should not make any unnecessary assumptions about your client. A method should take only the information that it needs, return only the information requested, and only fail when it is unable to return a meaningful value.
So, with that in mind, if the configuration is loadable but empty, then returning an empty list seems correct to me. If your client has an application specific requirement to fail when provided an empty list, then it may do so, but future clients may not have that requirement. The loadConfiguration method itself should fail when it really fails, such as when it is unable to read or parse the file.
But you can continue to decouple your interface. For example, why must the configuration be stored in a file? Why can't I provide a URL, a row in a database, or a raw string containing the configuration data? Very few methods should take a file path as an argument since it binds them tightly to the local file system and makes them responsible for opening, reading, and closing files in addition to their core logic. Consider accepting an input stream as an alternative. Or if you want to allow for elaborate alternatives -- like data from a database -- consider accepting a ConfigurationReader interface or similar.
Methods should be highly cohesive ... that is single minded. So my opinion would be to separate the responsibilities as you have described. I sometimes feel tempted to say...it is just a short method so it does not matter...then I regret it 1.5 weeks later.
I think this depends on the case: If you could think of a scenario where you would use this method and it returned an empty list, and this would be okay, then I would not put the validation inside the method. But for e.g. a method which inserts data into a database which have to be validated (is the email address correct, has a name been specified, ... ) then it should be ok to put validation code inside the function and throw an exception.
Another alternative, not mentioned above, is to support Dependency Injection and have the method client inject a validator. This would allow the preservation of the "strong" Resource Acquisition Is Initialization principle, that is to say Any Object which Loads Successfully is Ready For Business (Matthieu's mention of Fail Fast is much the same notion).
It also allows a resource implementation class to create its own low-level validators which rely on the structure of the resource without exposing clients to implementation details unnecessarily, which can be useful when dealing with multiple disparate resource providers such as Ryan listed.