Biggest differences of Thrift vs Protocol Buffers? [closed]

What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?

They both offer many of the same features; however, there are some differences:
Thrift supports 'exceptions'
Protocol Buffers have much better documentation/examples
Thrift has a builtin Set type
Protocol Buffers allow "extensions" - you can extend an external proto to add extra fields, while still allowing external code to operate on the values. There is no way to do this in Thrift
I find Protocol Buffers much easier to read
Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).

Another important difference is the languages supported by default.
Protocol Buffers: Java, Android Java, C++, Python, Ruby, C#, Go, Objective-C, Node.js
Thrift: Java, C++, Python, Ruby, C#, Go, Objective-C, JavaScript, Node.js, Erlang, PHP, Perl, Haskell, Smalltalk, OCaml, Delphi, D, Haxe
Both could be extended to other platforms, but these are the language bindings available out-of-the-box.

RPC is another key difference. Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems mostly designed as a data-interchange format alone.

Protobuf serialized objects are about 30% smaller than Thrift.
Most actions you may want to do with protobuf objects (create, serialize, deserialize) are much slower than thrift unless you turn on option optimize_for = SPEED.
Thrift has richer data structures (Map, Set)
Protobuf API looks cleaner, though the generated classes are all packed as inner classes which is not so nice.
Thrift enums are not real Java Enums, i.e. they are just ints. Protobuf has real Java enums.
For a closer look at the differences, check out the source code diffs at this open source project.

As I've said in the "Thrift vs Protocol buffers" topic:
Referring to the Thrift vs Protobuf vs JSON comparison:
Thrift supports out of the box: AS3, C++, C#, D, Delphi, Go, Graphviz, Haxe, Haskell, Java, JavaScript, Node.js, OCaml, Smalltalk, TypeScript, Perl, PHP, Python, Ruby, ...
Protobuf has in-box support for C++, Python, and Java.
Protobuf support for other languages (including Lua, Matlab, Ruby, Perl, R, PHP, OCaml, Mercury, Erlang, Go, D, Lisp) is available as third-party add-ons (BTW, here is SWI-Prolog support).
Protobuf has much better documentation and plenty of examples.
Thrift comes with a good tutorial
Protobuf objects are smaller
Protobuf is faster when using "optimize_for = SPEED" configuration
Thrift has an integrated RPC implementation, while for Protobuf RPC solutions are separate but available (like ZeroC Ice).
Protobuf is released under BSD-style license
Thrift is released under Apache 2 license
Additionally, there are plenty of interesting additional tools available for those solutions, which might tip the decision. Here are examples for Protobuf: protobuf-wireshark, protobufeditor.

Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:
We decided against some extreme storage optimizations (i.e. packing
small integers into ASCII or using a 7-bit continuation format)
for the sake of simplicity and clarity in the code. These alterations
can easily be made if and when we encounter a performance-critical
use case that demands them.
Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.

I was able to get better performance with a text-based protocol as compared to protobuf in Python. However, it had no type checking or other fancy UTF-8 conversion, etc., which protobuf offers.
So, if serialization/deserialization is all you need, then you can probably use something else.
http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html

One obvious thing not yet mentioned, which can be both a pro and a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but reduced readability (or rather, debuggability), a con.
Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).
(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.

I think most of these points have missed the basic fact that Thrift is an RPC framework, which happens to have the ability to serialize data using a variety of methods (binary, XML, etc).
Protocol Buffers is designed purely for serialization; it's not a framework like Thrift.

ProtocolBuffers is FASTER.
There is a nice benchmark here:
https://github.com/eishay/jvm-serializers/wiki (last updated 2016, but there are forks that contain faster serializers as of 2020, e.g. ActiveJ created a fork to demonstrate their speed on the JVM: https://github.com/activej/jvm-serializers).
You might also want to look into Avro, which can be faster. There are two libraries for Avro in .NET:
Apache.Avro
Chr.Avro - written by engineers at C.H. Robinson, a supply chain logistics company
By the way, the fastest I've ever seen is Cap'nProto;
A C# implementation can be found at the Github-repository of Marc Gravell.

And according to the wiki the Thrift runtime doesn't run on Windows.

For one, protobuf isn't a full RPC implementation. It requires something like gRPC to go with it.
gRPC is very slow compared to Thrift:
http://szelei.me/rpc-benchmark-part1/

I think the basic data structures are different.
Protocol Buffers uses variable-length integer (varint) encoding, turning a fixed-length number into a variable-length one to save space.
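As an illustration of the varint idea, here is a short sketch of the well-known protobuf wire encoding: 7 payload bits per byte, with the high bit used as a continuation flag.

    def encode_varint(value: int) -> bytes:
        """Encode a non-negative integer protobuf-style: 7 data bits per byte,
        with the most significant bit set on every byte except the last."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    # 1 fits in one byte; 300 needs two bytes (0xAC 0x02), matching the protobuf documentation.
    assert encode_varint(1) == b"\x01"
    assert encode_varint(300) == b"\xac\x02"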
Thrift offers several different serialization formats (called "protocols").
In fact, Thrift has two different JSON encodings, and no fewer than three different binary encodings.
In conclusion, these two libraries are quite different. Thrift is like a one-stop shop, giving you the entire integrated RPC framework and many options (with cross-language support), while Protocol Buffers leans more toward "just do one thing and do it well".

There are some excellent points here and I'm going to add another one in case someone's path crosses here.
Thrift gives you the option to choose between the thrift-binary and thrift-compact (de)serializers. Thrift-binary has excellent performance but a bigger packet size, while thrift-compact gives you good compression but needs more processing power. This is handy because you can always switch between these two modes as easily as changing a line of code (heck, even make it configurable), as in the sketch below. So if you are not sure how much your application should be optimized for packet size or for processing power, Thrift can be an interesting choice.
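For example, a rough Python sketch (assuming the thrift pip package; Pair stands in for a struct generated from your own IDL, so it is a hypothetical name) where only the protocol factory changes between the two wire formats:

    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol, TCompactProtocol

    def serialize(obj, protocol_factory):
        # Serialize a generated Thrift struct into an in-memory buffer.
        buf = TTransport.TMemoryBuffer()
        obj.write(protocol_factory.getProtocol(buf))
        return buf.getvalue()

    pair = Pair(key="answer", value="42")   # hypothetical generated struct
    binary_bytes = serialize(pair, TBinaryProtocol.TBinaryProtocolFactory())
    compact_bytes = serialize(pair, TCompactProtocol.TCompactProtocolFactory())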
PS: See this excellent benchmark project by thekvs, which compares many serializers including thrift-binary, thrift-compact, and protobuf: https://github.com/thekvs/cpp-serializers
PS: There is another serializer named YAS which gives this option too, but it is schema-less; see the link above.

It's also important to note that not all supported languages compare consistently with Thrift or Protobuf. At this point it's a matter of the module's implementation in addition to the underlying serialization. Take care to check benchmarks for whatever language you plan to use.

Related

How to describe a MessagePack data structure used in an internal binary protocol? Is ASN.1 or BNF suited for it?

My goal is to write a specification of a simple client-server application protocol for our project, where there will be a few kinds of client: iOS (Swift), Android (Java), and probably Web (HTTP/WebSocket). The server is Python. Our team decided to use MessagePack as the data structure serializer for the different requests/responses.
So now I am thinking about how to describe such data structures. I don't want to write the whole specification manually and spend time working out different rules and agreements. I would rather point my colleagues doing client development to a description of a notation system.
My question is a general one:
How do you approach such a task? Do you write plain text in your native language or use some notation system? Is it right to use a notation system and an existing serializer together? I mean ASN.1; it seems clear.

Serialization formats that support tagged unions

This question was asked back in 2012, but I'm looking for new updates.
Are there any serialization formats that support tagged unions (aka sum types)? My requirements are that it has Java and .NET client libs, and it should be "reasonably mature". Performance is not a major concern.
Avro provides partial support, but does not allow nested unions (http://avro.apache.org/docs/1.7.6/spec.html#Unions). Cap'n Proto appears to have better support, but I'm not sure if it is production ready yet. Transit doesn't have direct support, but does provide an extension mechanism that might be able to support tagged unions.
Any other suggestions, or comments on the above choices?
Protocol Buffers version 2.6.0 added support for tagged unions in the form of the oneof declaration, but it looks like protobuf-net has not been updated recently so I'm guessing it doesn't support this yet.
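For what it's worth, reading a oneof from Python looks roughly like the sketch below. It assumes an IDL containing something like oneof payload { Foo foo = 1; Bar bar = 2; } compiled with protoc; the module example_pb2 and the Envelope/Foo names are hypothetical, while WhichOneof is the actual protobuf Python API.

    import example_pb2  # hypothetical module generated by protoc from the IDL above

    msg = example_pb2.Envelope()
    msg.foo.id = 42                    # setting one member of the oneof clears the others

    which = msg.WhichOneof("payload")  # returns the name of the set field, or None
    if which == "foo":
        print("got Foo with id", msg.foo.id)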
Cap'n Proto is used in production in a lot of places (e.g. CloudFlare), but it's true that the C# and Java implementations are relatively new.
(Disclosure: I'm the author of Cap'n Proto, and also of most of Google's open source Protobuf code.)

Compilable IDLs that serialize to JSON

I've used Protobuf before, and I was looking into Thrift, but I was wondering what the options were for IDLs that compile to (at least) C#, JS, Objective C and Java, but also serialize/deserialize JSON in all of those languages. Thrift mostly does that, but doesn't support JSON in OC, and I was concerned (perhaps unwarranted) about the maturity of its JSON interfaces. Are there any IDLs that use JSON as their primary serialization, but also compile to strongly typed bindings in all of the languages listed above?
Thanks!
Regarding Thrift: if any serialization protocol could be considered "primary", it would certainly be the binary format. However, we strive to introduce a common minimum set of protocols and transports for each language, one of which is JSON.
Next, please keep in mind that Thrift's JSON format might not be what you expect. The JSON format is designed especially for Thrift; the main goal is a compact representation of the data. The SimpleJSON protocol, also available for some languages, is more verbatim, but it was initially designed to be write-only (although that viewpoint is now changing slightly).
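As a rough Python sketch (assuming the thrift pip package; obj stands for any struct generated from your own IDL, so it is a hypothetical name), switching to the JSON wire format is again just a protocol swap:

    from thrift.transport import TTransport
    from thrift.protocol import TJSONProtocol

    def to_json_bytes(obj):
        # TJSONProtocol is Thrift's compact, read/write JSON wire format;
        # TSimpleJSONProtocol (same module) is the more verbatim, write-only one.
        buf = TTransport.TMemoryBuffer()
        obj.write(TJSONProtocol.TJSONProtocol(buf))
        return buf.getvalue()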
I was concerned (perhaps unwarranted) about the maturity of its JSON interfaces
There is nothing to be concerned about, honestly. There are a few PHP-related issues with regard to proper string encoding, but otherwise it works just fine - when available for the language of choice. If you don't mind, it is not that hard to write a JSON transport, and we always welcome quality contributions. If you need help during that process, ask the mailing lists.

High performance object serialization library supporting sum types

I wonder if any of the high performance serialization libraries (like Google protocol buffers) support sum types.
Sum types are tagged unions, basically the ability to say that something is either A, B, or C. Sum types are used in languages such as Haskell and ML, which support algebraic data types.
If by "like Google protocol buffers" you mean the ability to generate code for multiple languages, then probably such a thing doesn't exist. Emulating sum types in languages which don't support them is awkward at best (try to pattern match on boost::variant, for example). So it makes sense to leave them out if the main target is mainstream languages.
If you are content with using only Haskell/OCaml/whatever, there are plenty of choices. For Haskell there are cereal, binary, safecopy, and probably others. There is the Piqi project for OCaml.
I'm not aware of any practical systems that support sum types other than Piqi (I'm the author). Piqi is compatible with Protocol Buffers and natively supports OCaml and Erlang. The absence of sum types in Protocol Buffers was one of the reasons I created it.
My plan is to expand Piqi to support other languages such as Haskell, Clojure, etc.
Is there a need for a "high-performance" format? Many general-purpose formats should be able to simply use existing constructs -- specifically maps/hashtables -- to support unions (just include an entry whose key indicates the type of the actual value).
So maybe you could just use a simple convention on top of, say, JSON, to transfer such content, as in the sketch below.
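A minimal Python sketch of that convention, with a "type" key naming the variant (the shape names are arbitrary examples):

    import json

    circle = {"type": "circle", "radius": 2.0}
    rectangle = {"type": "rectangle", "width": 3.0, "height": 4.0}

    def area(encoded: str) -> float:
        # Dispatch on the tag to recover the variant.
        shape = json.loads(encoded)
        if shape["type"] == "circle":
            return 3.14159 * shape["radius"] ** 2
        if shape["type"] == "rectangle":
            return shape["width"] * shape["height"]
        raise ValueError("unknown variant: " + shape["type"])

    print(area(json.dumps(circle)), area(json.dumps(rectangle)))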

Cross platform IPC [closed]

I'm looking for suggestions on possible IPC mechanisms that are:
Cross platform (Win32 and Linux at least)
Simple to implement in C++ as well as the most common scripting languages (perl, ruby, python, etc).
Finally, simple to use from a programming point of view!
What are my options? I'm programming under Linux, but I'd like what I write to be portable to other OSes in the future. I've thought about using sockets, named pipes, or something like DBus.
In terms of speed, the best cross-platform IPC mechanism will be pipes. That assumes, however, that you want cross-platform IPC on the same machine. If you want to be able to talk to processes on remote machines, you'll want to look at using sockets instead. Luckily, if you're talking about TCP at least, sockets and pipes behave pretty much the same way. While the APIs for setting them up and connecting them are different, they both just act like streams of data.
The difficult part, however, is not the communication channel, but the messages you pass over it. You really want to look at something that will perform verification and parsing for you. I recommend looking at Google's Protocol Buffers. You basically create a spec file that describes the object you want to pass between processes, and there is a compiler that generates code in a number of different languages for reading and writing objects that match the spec. It's much easier (and less bug prone) than trying to come up with a messaging protocol and parser yourself.
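A hedged sketch of what the Python side of that looks like (message_pb2 and its Request message are hypothetical names for code generated by protoc from your spec file):

    import message_pb2  # hypothetical module generated by: protoc --python_out=. message.proto

    req = message_pb2.Request()          # Request is a hypothetical message from the spec file
    req.id = 7
    req.body = "hello"

    wire = req.SerializeToString()       # bytes you can write to a pipe or socket

    parsed = message_pb2.Request()
    parsed.ParseFromString(wire)         # parsing/verification handled by the generated code
    assert parsed.body == "hello"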
For C++, check out Boost IPC.
You can probably create or find some bindings for the scripting languages as well.
Otherwise if it's really important to be able to interface with scripting languages your best bet is simply to use files, pipes or sockets or even a higher level abstraction like HTTP.
Why not D-Bus? It's a very simple message passing system that runs on almost all platforms and is designed for robustness. It's supported by pretty much every scripting language at this point.
http://freedesktop.org/wiki/Software/dbus
If you want a portable, easy to use, multi-language, and LGPLed solution, I would recommend ZeroMQ:
Amazingly fast, almost linearly scalable, and still simple.
Suitable for simple and complex systems/architectures.
Very powerful communication patterns available: REQ-REP, PUSH-PULL, PUB-SUB, PAIR.
You can configure the transport protocol to make it more efficient if you are passing messages between threads (inproc://), processes (ipc://), or machines ({tcp|pgm|epgm}://), with a smart option to shave off part of the protocol overhead when the connections are running between VMware virtual machines (vmci://).
For serialization I would suggest MessagePack or Protocol Buffers (which others have already mentioned as well), depending on your needs.
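To illustrate how little code the transport choice affects, here is a minimal pyzmq sketch (assuming the pyzmq package is installed; the endpoint string and payload are arbitrary examples):

    import json
    import zmq

    ctx = zmq.Context.instance()

    server = ctx.socket(zmq.PAIR)
    server.bind("inproc://example")    # swap for "ipc:///tmp/example.sock" or "tcp://*:5555"

    client = ctx.socket(zmq.PAIR)
    client.connect("inproc://example")

    # Any serializer rides on top: plain JSON here; MessagePack or protobuf bytes work the same way.
    client.send(json.dumps({"op": "ping", "seq": 1}).encode())
    print(json.loads(server.recv().decode()))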
You might want to try YAMI; it's very simple yet functional, portable, and comes with bindings for a few languages.
I can suggest using the plibsys C library. It is very simple, lightweight, and cross-platform. Released under the LGPL. It provides:
named system-wide shared memory regions (System V, POSIX and Windows implementations);
named system-wide semaphores for access synchronization (System V, POSIX and Windows implementations);
named system-wide shared buffer implementation based on the shared memory and semaphore;
sockets (TCP, UDP, SCTP) with IPv4 and IPv6 support (UNIX and Windows implementations).
It is an easy-to-use library with quite good documentation. As it is written in C, you can easily make bindings for scripting languages.
If you need to pass large data sets between processes (especially if speed is essential), it is better to use shared memory to pass the data itself and sockets to notify the other process that the data is ready. You can do it as follows:
a process puts the data into a shared memory segment and sends a notification via a socket to another process; as the notification is usually very small, the time overhead is minimal;
the other process receives the notification and reads the data from the shared memory segment; after that, it sends a notification back to the first process that the data was read, so it can feed more data.
This approach can be implemented in a cross-platform fashion.
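A generic sketch of that pattern using the Python standard library instead of plibsys (assuming Python 3.8+ for multiprocessing.shared_memory; the segment name and port are arbitrary examples):

    import socket
    from multiprocessing import shared_memory

    payload = b"a large block of data" * 1000

    # Producer: copy the payload into a named shared-memory segment.
    shm = shared_memory.SharedMemory(name="example_blob", create=True, size=len(payload))
    shm.buf[:len(payload)] = payload

    # Producer: send a tiny notification naming the segment and its size.
    notifier = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    notifier.sendto(b"example_blob:%d" % len(payload), ("127.0.0.1", 9999))

    # The consumer (another process) would attach with
    #   shared_memory.SharedMemory(name="example_blob")
    # read the bytes, and notify back so the producer can unlink the segment.
    shm.close()
    shm.unlink()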
How about Facebook's Thrift?
Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
I think you'll want something based on sockets.
If you want RPC rather than just IPC I would suggest something like XML-RPC/SOAP which runs over HTTP, and can be used from any language.
YAMI - Yet Another Messaging Infrastructure is a lightweight messaging and networking framework.
If you're willing to try something a little different, there's the ICE platform from ZeroC. It's open source, and is supported on pretty much every OS you can think of, as well as having language support for C++, C#, Java, Ruby, Python and PHP. Finally, it's very easy to drive (the language mappings are tailored to fit naturally into each language). It's also fast and efficient. There's even a cut-down version for devices.
Distributed computing is usually complex and you are well advised to use existing libraries or frameworks instead of reinventing the wheel. Previous posters have already enumerated a couple of these libraries and frameworks. Depending on your needs you can pick either a very low-level one (like sockets) or a high-level framework (like CORBA). There cannot be a generic "use this" answer. You need to educate yourself about distributed programming, and then you will find it much easier to pick the right library or framework for the job.
There is a widely used C++ framework for distributed computing called ACE, and the CORBA ORB TAO (which is built upon ACE). There are very good books about ACE (http://www.cs.wustl.edu/~schmidt/ACE/), so you might take a look. Take care!
TCP sockets to localhost FTW.
It doesn't get more simple than using pipes, which are supported on every OS I know of, and can be accessed in pretty much every language.
Check out this tutorial.
Python has a pretty good IPC library: see https://docs.python.org/2/library/ipc.html
Xojo has built-in cross-platform IPC support with its IPCSocket class. Although you obviously couldn't "implement" it in other languages, you could use it in a Xojo console app and call it from other languages making this option perhaps very simple for you.
These days there is a very easy, C++1x-compliant, well-documented, Linux- and Windows-compatible, open-source "CommonAPI" library available: CommonAPI C++.
The underlying IPC system is D-Bus (libdbus) or SOME/IP if one wishes. Application interfaces are specified using the simple Franca IDL language, which is tailored for that purpose.