Should I move from REST-HTTP to Rabbitmq-RPC for synchronous call? - rabbitmq

I have read a lot, and many people advise against using AMQP-RPC for synchronous calls. My response data is 4MB in size, and REST-HTTP takes too long to send it from server to client, so we have decided to move to RPC.
Can someone please advise whether I should move from REST-HTTP to AMQP-RPC, or to another RPC method such as Apache Avro, Thrift or Google Protocol Buffers, for sending larger data?

You could do worse than take a look at Cap'n Proto. It's an interesting take on serialisation, in that it endeavours to remove the need for it at all whilst still making things sane in application code. It's written by one of the guys who did Google Protocol Buffers v2. They're doing a sneaky thing with RPC too, allowing some time saving if the result of one RPC call is merely the input to a subsequent RPC call.
GPB isn't too bad either, nor is ASN.1, etc. Anything (apart from Cap'n Proto) that has a binary wire format is probably going to be about the same - they all have to marshal bits and bytes to and from a local representation. Avro of course includes its own schema with messages - pity if that's bigger than the message that's being sent.
Anything binary is probably way better than anything text (JSON, XML, etc).
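To put a rough number on the binary versus text difference, here is a minimal sketch that encodes the same list of floats as JSON and as a hand-rolled binary layout. The layout (a 4-byte count followed by 8-byte doubles) and the function names are assumptions made up for this illustration; they are not the wire format of any of the libraries mentioned above.

```python
import json
import struct

def encode_json(samples):
    """Encode a list of floats as UTF-8 JSON text."""
    return json.dumps(samples).encode("utf-8")

def encode_binary(samples):
    """Encode a list of floats as a 4-byte count followed by 8-byte IEEE doubles.

    "!" selects network byte order with no alignment padding.
    """
    return struct.pack(f"!I{len(samples)}d", len(samples), *samples)

def decode_binary(buf):
    """Inverse of encode_binary."""
    (count,) = struct.unpack_from("!I", buf, 0)
    return list(struct.unpack_from(f"!{count}d", buf, 4))

samples = [i / 7.0 for i in range(1000)]
text_size = len(encode_json(samples))
binary_size = len(encode_binary(samples))
print(f"JSON: {text_size} bytes, binary: {binary_size} bytes")
```

The gap depends heavily on the data. A list of small integers can be shorter as JSON text than as fixed 8-byte values, so it is worth measuring with a representative payload.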

Related

What is a protobuf message?

I'm learning how to use tf.records and in the official tutorial they mention you can print a tf.train.Example message (which is a primitive of the protobuf protocol if I get it right).
I understand that tf.records are used to serialize the data, and that they use the protobuf protocol in this case. I also understand that using tf.train.Feature, tf.train.Features and tf.train.Example one can convert the data into the right format.
My question is: what does it mean to print a message in this context? (The tutorial shows how to print a tf.train.Example message.)
A message is classically thought of as a collection of bytes that are conveyed from one process/thread to another process/thread. Typically (but not necessarily), the collection of bytes means something to the sender and receiver, e.g. it's an object that has been serialised somehow (perhaps using Google Protocol Buffers). So, an object can become a message by serialising it and placing the bytes into an array that one might term a "message".
It's not necessarily the case the processes handling the collection of bytes will deserialise them. For example, a process that is simply going to pass them onwards down another connection need not actually deserialise them, if it already knows where the bytes are supposed to be sent.
The means by which a message is conveyed is typically some sort of queue / pipe / socket / stream / etc. Where it gets interesting is that most data transports of this sort are stream connections; whatever bytes you push in one end come out the other, with no record of where one write ended and the next began. So, then, how to use those for sending messages?
The answer is that there has to be some way of demarcating between messages. There's lots of ways of doing that, but these days it makes far more sense to use something like ZeroMQ, which takes care of all that for you (and more besides). ZeroMQ is a library / protocol that allows a program to transfer a collection of bytes from one process/thread to another via stream connections, and ensure that the receiving program gets the collection in one nice and complete buffer. Those bytes could be objects serialised by Google Protocol Buffers, or serialised in some other way (there's lots). HTTP is also used as a way of moving objects around, e.g. a page of HTML.
So the pattern is object -> serialisation -> message buffer -> some sort of byte transport that demarcates one message from another -> message buffer -> deserialisation -> object.
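The "byte transport that demarcates one message from another" step can be sketched with a simple length prefix. This is only an illustration under stated assumptions: the 4-byte big-endian header and the names frame and Deframer are invented here, and this is not ZeroMQ's actual wire protocol.

```python
import struct

# Assumed convention for this sketch: each message is preceded by a
# 4-byte big-endian unsigned length.
HEADER = struct.Struct("!I")

def frame(payload: bytes) -> bytes:
    """Wrap one message so that a receiver can find its end in a byte stream."""
    return HEADER.pack(len(payload)) + payload

class Deframer:
    """Reassembles whole messages from arbitrarily sized stream chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add bytes as they arrive; return the list of messages now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages

# Simulate a stream that delivers the bytes in awkward 3-byte pieces.
stream = frame(b"hello") + frame(b"") + frame(b"world!")
deframer = Deframer()
received = []
for i in range(0, len(stream), 3):
    received.extend(deframer.feed(stream[i:i + 3]))
print(received)
```

Each payload here could just as well be the output of a serialiser; the framing layer neither knows nor cares what the bytes mean.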
An advantage of serialisations like Protocol Buffers is that the sender and receiver need not be written in the same language, or share anything at all except for the .proto file. Other approaches to serialisation often involves marking up class definitions in the program source code, which then makes it difficult to deserialise data in another language.
Also in languages like C/C++ one might get away with simply copying the bytes at the object's address from one place to another. This can be a total disaster if the destination is a different machine; endianness etc. can matter a lot. There are serialisation standards that get close to this, specifically Cap'n Proto.
There are variations. Within a process, "passing a message" can simply mean passing ownership of an object around. Ownership can be by convention, i.e. if I've just written the object pointer to a message queue, I won't mutate the object anymore. I think in Rust it's even expressed by the language syntax, in that once object ownership has been given up the language won't let you mutate the object (worked out at compile time, part of what makes Rust so good). The net result looks like message transfer, but in fact all that's happened is a pointer (typically, 64bits) has been copied from A to B, not the entire data in the object. This is a lot faster.
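As a rough illustration of in-process message passing by ownership convention, the sketch below hands an object through a queue. The Reading class and the send / receive helpers are hypothetical names for this example. Python does not enforce the convention the way Rust's compiler does, so dropping the sender's reference is only a courtesy.

```python
import queue

class Reading:
    """A stand-in for some large object we would rather not copy."""

    def __init__(self, values):
        self.values = values

def send(q, obj):
    """Hand obj to the consumer. By convention the caller must not touch it afterwards."""
    q.put(obj)

def receive(q):
    """Take ownership of the next object in the queue."""
    return q.get()

mailbox = queue.Queue()
original = Reading(list(range(100_000)))
original_id = id(original)

send(mailbox, original)
del original  # give up our reference so we cannot mutate it by accident

received = receive(mailbox)
# Only a reference travelled through the queue; the list was never copied.
print(id(received) == original_id)
```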
EDIT
So, How Does a Message Transport Protocol Work?
It's worth digging into how something like ZeroMQ works. For it to be able to pass whole application messages across a stream connection, it needs to operate some sort of protocol. That protocol is itself going to involve objects (Protocol Data Units) being "serialised" (well, converted to an agreed wire format), pushed through the stream connection, deserialised, and understood by the ZeroMQ library that's on the receiving end. And, when one gets right down to it, ZeroMQ is using TCP (over a network), and that too is a protocol built on IP. And that goes on down to Ethernet frames.
So, there's protocols running atop protocols, running atop other protocols (in fact, this is the Layer Model of how computer interconnectedness works).
Why That Matters, and What Can Go Wrong
It's useful to bear this protocol layering in mind. Sometimes, one might have a requirement to (for example) take very strong measures against buffer overflows, perhaps to prevent remote exploitation. That might be a reason to pick a serialisation technology that helps guard against such things - e.g. Protocol Buffers. However, when picking such a technology, one has to realise that the requirement is met only if all of the protocol layers are equally robust. There's no point using, say, Protocol Buffers and declaring oneself safe against buffer overflows, if the OS's IP stack is broken and exploitable.
This is well illustrated by the Heartbleed bug in OpenSSL. This was caused effectively by a weakly specified protocol (see RFC 6520); it's defined in English language, and requires the programmer to read this, code up the protocol by hand, and pay attention to all the strictures written in the document. The associated RFC 5246 even says:
This document deals with the formatting of data in an external
representation. The following very basic and somewhat casually
defined presentation syntax will be used. The syntax draws from
several sources in its structure. Although it resembles the
programming language "C" in its syntax and XDR [XDR] in both its
syntax and intent, it would be risky to draw too many parallels. The
purpose of this presentation language is to document TLS only; it has
no general application beyond that particular goal.
The Heartbleed bug in OpenSSL was a result of the coding up of the English language spec being done wrong, and given that statement perhaps it's no great surprise. Applications that were using OpenSSL were wide, wide open to exploitation, even though the applications themselves (e.g. Web servers) were very well written implementations of, say, HTTPS.
Now, had the designers of TLS chosen to use a decent and strict serialisation technology - perhaps even Google Protocol Buffers (plus some message demarcation) - to define the PDUs in TLS, it would have been far more likely that Heartbleed wouldn't have happened. Specifically, the payload_length field in a request / response would have been taken care of inside Google Protocol Buffers, thereby removing responsibility for handling the length of the payload from the developer.
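To make the payload_length point concrete, here is a simplified sketch of the kind of check that has to be written by hand when parsing a heartbeat message. It assumes the RFC 6520 layout of a 1-byte type, a 2-byte payload_length, the payload, then at least 16 bytes of padding. It is not OpenSSL's code, and the "memory" in naive_payload is just a simulated stand-in for whatever happens to sit after the record in a real process.

```python
import struct

MIN_PADDING = 16  # RFC 6520 requires at least 16 bytes of padding
HEADER_SIZE = 3   # 1-byte type + 2-byte payload_length

def parse_heartbeat(record: bytes):
    """Return (msg_type, payload), or None if the record must be discarded."""
    if len(record) < HEADER_SIZE + MIN_PADDING:
        return None
    msg_type, payload_length = struct.unpack_from("!BH", record, 0)
    # The crucial check: the declared length must fit inside what actually arrived.
    if HEADER_SIZE + payload_length + MIN_PADDING > len(record):
        return None
    return msg_type, record[HEADER_SIZE:HEADER_SIZE + payload_length]

def naive_payload(memory: bytes, offset: int) -> bytes:
    """Trusts the declared length (simulation of the mistake, not real code)."""
    _msg_type, payload_length = struct.unpack_from("!BH", memory, offset)
    start = offset + HEADER_SIZE
    return memory[start:start + payload_length]

honest = struct.pack("!BH", 1, 4) + b"ping" + bytes(MIN_PADDING)
# Claims a 40-byte payload but only carries 4 bytes of it.
malicious = struct.pack("!BH", 1, 40) + b"ping" + bytes(MIN_PADDING)

# Simulated process memory: the record followed by something private.
memory = malicious + b"private key material"
leaked = naive_payload(memory, 0)

print(parse_heartbeat(honest))
print(parse_heartbeat(malicious))
print(leaked)
```

A decoder generated from a schema would normally perform this sort of bounds check for every length-prefixed field, which is the sense in which the responsibility moves from the developer to the tool.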
What's interesting is to compare protocol specifications as written in RFCs with those that tend to be found in the world of telephony (regulated by the International Telecommunication Union). The ITU's specifications and tools are very "comprehensive" (that ought to be an acceptably neutral way of describing them). A lot of telephony uses ASN.1, which is not dissimilar to (and substantially pre-dates) Google Protocol Buffers, but allows for very strict definitions of messages, requires pretty comprehensive tools to do it right, and is bang up to date (it even has JSON as a wire format these days).
"But", one points out, "what if the ASN.1 tools (or Google Protocol Buffers) have a bug?". Well, indeed that is a problem, and that has indeed happened to ASN.1 (from one of the commercial ASN.1 tools vendors, can't remember which). But the point is that if there's one library that is widely used for defining lots of interfaces, then there's a greater chance of bugs being identified (I myself have found and reported bugs in commercial ASN.1 tools). Whereas if a messaging protocol is defined using, say, English language, there's only ever going to be very few sets of eyes on how well the developer has coded up the meaning of that English language.
Not Everyone Has Got the Message
What I find disappointing is that, across a large portion of the software world, there's still resistance to using tools like Google Protocol Buffers or ASN.1. There's also projects that, having identified the need for such things, go and invent their own.
One such example is dBus - which to be fair is pretty good. However they did go and invent their own serialisation technology for specifying dBus messages; I'm not sure what they gained over using something mature and off-the-shelf.
Google themselves, when they first announced Google Protocol Buffers to the world, were asked "Why didn't you use ASN.1?", and the Googler on the stage had to admit to never having heard of it. So, Googlers in Google hadn't used Google to Google for "binary serialisation technologies"; they'd just gone ahead and written their own, and GPB is missing a ton of useful features. Oh, the irony. They'd not even have had to write a toolset from scratch; they could have simply adopted and improved on one of the open source ASN.1 implementations.
Transliteration Problem
This fragmentation and proliferation causes problems. Say, for example, in your project you want to be able to transfer some of your messages into a dBus service on Linux. To do that, you've got a .proto defining your messages, which is great for communicating in/out of TensorFlow, but fundamentally useless for dBus, which speaks its own format. You'd end up having something like
MyProtoMsg ipMsg;
MyEquivalentDBusMsg opMsg;
opMsg.field1 = ipMsg.field1;
opMsg.field2 = ipMsg.field2;
opMsg.field3 = ipMsg.field3;
and so on. Very laborious, very unmaintainable, and needlessly consumes resources. The other option would be simply to wrap up your GPB encoded messages in a byte array in a dBus message, but one feels that's missing the point (it bypasses any opportunity for dBus to assert that messages it's passing are correctly formed and within specifications).
If the world agreed on the One True Serialisation technology then the flexibility in object / message exchange would be fantastic.

Is it common to have RESTful endpoint returning Protobuf strings?

Instead of having a gRPC server (say, due to platform restrictions), you have a REST endpoint that returns data.SerializeToString() as the payload. Of course, any clients of this endpoint would have the appropriate proto files for each response, so they can ParseFromString(data) and be on their way. The reason for doing this is to keep the benefits of Protobufs.
Improved understanding of the question: is it common to use PBs for other purposes than gRPC transport?
Yes it is totally common and reasonable. PBs are really nothing more than a data serialization format. gRPC just uses it as message interchange format (natural choice as both are Google creations). Let the answer be the description from Google itself:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
Google's basic tutorial is saving it to disk. Do anything you would do with any other binary blob (jpeg, mp3, ...)
BUT! if serialization speed is really critical for you, don't assume anything. Today's JSON libs may be unexpectedly well performing - depends on your specific platforms and dominant message characteristics. Do your own performance tests. If JSON inferiority is confirmed, then there are again libs with faster serialization than PB. To name a couple: Google's less popular PB sibling FlatBuffers and something called Simple Binary Encoding, which was developed for High Frequency Trading... speaks for itself.

Server for online game

I want to make a multi-player online game which will require some fast data exchange between users (that's why I need UDP).
So, I probably need to have a UDP socket server that will receive data from a player in a game room and send this data to the other players in that room. Am I right?
What should I use for this server? Probably I must put there some script that will run all the time and serve the clients. Should this script be written in Java, Perl, Python, ...?
I just don't want to waste my time and choose completely wrong direction, so I need some advice.
Thanks :)
If you want something fast to relay UDP packets between clients, I'd actually go with a C implementation. No memory management overhead in the runtime, and less overhead in parsing the data.
If the data is primarily fixed length, it makes parsing very easy. Rather than going through a heavyweight serialization / deserialization process, all you would need to do is define a struct that represents your data and have a pointer to that struct point to the beginning of your data buffer. And boom, you're done. Just make sure to use ntohs and ntohl to properly read integer fields.
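The same fixed-layout idea can be sketched in Python with the struct module, for anyone prototyping before committing to C. The packet layout below (player id, sequence number, x, y) is a hypothetical example, not something from the question. The "!" prefix selects network byte order, which does the job ntohs and ntohl do in C, and also disables padding, something a C struct overlay has to take care of separately.

```python
import struct

# Hypothetical fixed-length update packet:
#   player_id  uint16
#   sequence   uint32
#   x, y       int16 each
# "!" = network (big-endian) byte order, no alignment padding.
PACKET = struct.Struct("!HIhh")

def pack_update(player_id, sequence, x, y):
    """Build the datagram payload for one position update."""
    return PACKET.pack(player_id, sequence, x, y)

def unpack_update(datagram):
    """Parse one datagram; reject anything that is not exactly one packet long."""
    if len(datagram) != PACKET.size:
        raise ValueError(f"expected {PACKET.size} bytes, got {len(datagram)}")
    return PACKET.unpack(datagram)

data = pack_update(7, 123456, -5, 300)
fields = unpack_update(data)
print(len(data), fields)
```

Checking the datagram length before unpacking matters with UDP, since anyone can send the server a packet of any size.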

Transmitting SOAP messages over Named Pipes in WCF

I fear I may be displaying my ignorance with this question, but here goes...
I would like to use WCF to implement interprocess communication between a .NET app and a third-party app written in Qt. The Qt app has a plugin architecture that, if I choose to, can be used to bootstrap some .NET classes to handle WCF cleanly at both ends, but I'd rather keep the codebase native and therefore I'm thinking of ways to make sure that whatever I send down the wire with WCF, I can reassemble at the other end using classes available in Qt.
Qt has a SOAP message class, so I figured the preferable solution - and the one that's closest to the one we've hacked together already - is to send SOAP messages and pick them up off a QLocalSocket. Question is, is it possible to force WCF to encode messages as SOAP over a NetNamedPipeBinding, and if so, is it wise to do so?
I'm feeling rather wary at this point that my question might not make complete sense due to my shaky understanding of the technology involved. If this is the case, please take the time to explain why instead of just saying 'no'.
edit #1: I figure an update is warranted, as I've investigated some and should report my findings.
Firstly, I have found that Qt is a pig. The QtSoapMessage class I mentioned, it turns out, doesn't exist in the current version, and is available only as an after-market source package that you have to compile yourself. It took me many hours of googling to find out why this wasn't working. The Qt documentation is utterly dreadful, Qt Creator is counterintuitive in the extreme, and I've all but lost patience with it so haven't pursued this idea any further as yet. Furthermore, it isn't obvious how exactly I am to pass the socket data into the soap message constructor, which takes a QDomDocument, whereas the API for reading XML from a socket uses a QXmlStreamReader or somesuch. There doesn't seem to be any conversion between them.
You actually have a different problem to the one you think you have.
WCF will by default exchange SOAP messages over the NetNamedPipeBinding.
However, the message exchange is layered over some Microsoft proprietary protocols for transaction flow, message framing and encoding, which means that if on the Qt side you pick up a byte stream directly from a QLocalSocket you will have a lot of work to do to implement these underlying protocols before you will be able to get at the SOAP infoset itself.
It is possible to configure the NetNamedPipe binding to remove some of these protocol layers, but not all of them - the framing protocol will always be there, for example.
You might like to read my blog for a lot more detail on this.

A smart UDP protocol analyzer?

Is there a "smart" UDP protocol analyzer that can help me reverse engineer a message based protocol?
I'm using Wireshark to do the sniffing, but if there's a tool that can detect regularities in the protocol (repeated strings, bits of the protocol that are CRC/Checksum or length, ...) and aid the process that would help.
You are asking for a universal inference engine. The best way to try to recover the protocol (assuming you are in a jurisdiction that permits this) is to understand the underlying message transfer from the beginning of a session, and then trying to manually simulate the behaviour of each party through a sequence of ping-pong message trials. This way you develop an understanding of the message structures and their functioning.
Using the UDP frame boundaries is a good place to start looking for structure.
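As one small example of exploiting the frame boundaries, the sketch below scans a set of captured datagrams for a field whose value tracks the datagram length. It is a heuristic written for this answer, not a feature of Wireshark or of any existing tool, and the function name and sample packets are made up. Expect false positives, and treat every hit as a hypothesis to test.

```python
import struct

def find_length_fields(datagrams, max_offset=16):
    """Return (offset, fmt, delta) candidates where, for every sample,
    field_value + delta == len(datagram).

    fmt is a struct format: "!H" big-endian uint16, "<H" little-endian
    uint16, "!B" single byte.
    """
    if len({len(d) for d in datagrams}) < 2:
        return []  # need samples of differing length to see a correlation
    shortest = min(len(d) for d in datagrams)
    candidates = []
    for fmt in ("!H", "<H", "!B"):
        width = struct.calcsize(fmt)
        for offset in range(0, min(max_offset, shortest - width + 1)):
            deltas = {
                len(d) - struct.unpack_from(fmt, d, offset)[0]
                for d in datagrams
            }
            if len(deltas) == 1:
                candidates.append((offset, fmt, deltas.pop()))
    return candidates

def make_sample(payload):
    """Made-up protocol for the demo: 2-byte magic, uint16 payload length, payload."""
    return b"\xab\xcd" + struct.pack("!H", len(payload)) + payload

captures = [make_sample(b"x" * n) for n in (3, 10, 25)]
hits = find_length_fields(captures)
print(hits)
```

A delta of 4 at offset 2 would suggest a 4-byte header followed by a payload whose length is stored in that field. Similar scans can be written for constant bytes, counters, or candidate checksums.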
If you have no documentation, then even once you have gained a good understanding of the protocol, expect to be surprised many times during the project.
If you can, have your existing systems carry out exactly the scenario you need to use, and then simply replicate the same sequence with payload (and any checksum) changes only. This way you can possibly achieve the requirement without a comprehensive understanding of the protocol.
For an example of the effort involved in doing this, you could look at a historical review of the Samba project, "A bit of history and a bit of fun".