I've used Protobuf before, and I was looking into Thrift, but I was wondering what the options are for IDLs that compile to (at least) C#, JS, Objective-C and Java, and that also serialize/deserialize JSON in all of those languages. Thrift mostly does that, but it doesn't support JSON in Objective-C, and I was concerned (perhaps unwarranted) about the maturity of its JSON interfaces. Are there any IDLs that use JSON as their primary serialization, but also compile to strongly typed bindings in all of the languages listed above?
Thanks!
Regarding Thrift: If any serialization protocol could be considered "primary", it would certainly be the binary format. However, we strive to provide a common minimum set of protocols and transports for each language, one of which is JSON.
Next, please keep in mind that Thrift's JSON format might not be what you expect. It was designed specifically for Thrift, with a compact representation of the data as the main goal. The SimpleJSON protocol, also available for some languages, is closer to plain JSON, but it was initially designed to be write-only (although that viewpoint is slowly changing).
I was concerned (perhaps unwarranted) about the maturity of its JSON interfaces
There is honestly nothing to be concerned about. There are a few PHP-related issues with proper string encoding, but otherwise it works just fine, when available for the language of choice. Where it is missing, it is not that hard to write a JSON protocol implementation, and we always welcome quality contributions. If you need help during that process, ask on the mailing lists.
I am wondering if there are any performance overhead issues to consider when using WCF vs. binary serialization done manually. I am building an n-tier site and wish to implement asynchronous behavior across tiers. I plan on passing data in binary form to reduce bandwidth. WCF seems to be a good shortcut compared to building your own tools, but I am wondering if there are any points to be aware of when choosing between WCF and building your own lightweight library on top of the System.IO namespace.
There is a binary formatter for WCF, though it's not entirely binary; it produces SOAP messages whose content is formatted using the .NET Binary Format for XML, which is a highly compacted form of XML. (Examples of what this looks like are found on this samples page.)
Alternatively, you can implement your own custom message formatter to format the data however you want, as long as the formatter is available on both the client and the server side. (I think you'll still have some overhead from WCF, but not much.)
In my personal opinion, no amount of overhead savings you might get from defining a custom binary format, and writing all of the serialization/deserialization code to implement it manually, will ever compensate for the time and effort you will spend implementing and debugging such a mechanism.
I come from a web development background and haven't done anything significant in Java in quite some time.
I'm doing a small project, most of which involves some models with relationships and straightforward CRUD operations with those objects.
JPA/EclipseLink seems to suit the problem, but this is the kind of app that has File->Open and File->Save features, i.e. the data will be stored in files by the user, rather than persisting in the database between sessions.
The last time I worked on a project like this, I stored the objects in ArrayList objects, but having worked with MVC frameworks since, that seems a bit primitive. On the other hand, with JPA, opening a file would require loading a whole bunch of objects into the database, just for the convenience of not having to write code to manage the objects.
What's the typical approach for managing model data with Java SE desktop applications?
JPA was specifically built with databases in mind. This means that it typically operates on a big datastore with objects belonging to many different users.
In a file based scenario, quite often files are not that big and all objects in the file belong to the same user and same document. In that case I'd say for a binary format the old Java serialization still works for temporary files.
For longer-term or interchangeable formats, XML is better suited. Using JAXB (bundled with Java SE through Java 10, and available as a separate dependency since Java 11) you can marshal and unmarshal Java objects to and from XML using an annotation-based approach that on the surface resembles JPA. In fact, I've worked with model objects that have both JPA and JAXB annotations, so they can be stored in a database as well as in an XML file.
If, however, your desktop app uses files that represent potentially huge datasets for which you need paging and querying, then using JPA might still be the better option. There are various small embedded DBs available for Java, although I don't know how simple it is to let a data source point to a user-selected file. Normally a persistence unit in Java is mapped to a fixed data source and you can't yet create persistence units on the fly.
Yet another option would be to use JDO, which is a mapping technology like JPA, but not an ORM. It's much more independent of the backend persistence technology that's being used and indeed maps to files as well.
Sorry that this is not a real answer and more a list of things to take into account, but I hope it's helpful in some way.
We currently use XStream for encoding our web service inputs/outputs in XML. However, we are considering switching to a binary format with a code generator for multiple languages (protobuf, Thrift, Hessian, etc.) to make supporting new clients easier and less reliant on hand-coding (and also to better support our message formats, which include binary data).
However most of our objects on the server are POJOs with XStream handling the serialization via reflection and annotations, and most of these libraries assume they will be generating the POJOs themselves. I can think of a few ways to interface an alternative library:
(1) Write an XStream marshaler for the target format.
(2) Write custom code to marshal the POJOs to/from the classes generated by the alternative library.
(3) Subclass the generated classes to implement the POJO logic. May require some rewriting. (Also did I mention we want to use Terracotta?)
(4) Use another library that supports both reflection (like XStream) and code generation.
However I'm not sure which serialization library would be best suited to the above techniques.
(1) might not be that much work since many serialization libraries include a helper API that knows how to read/write primitive values and delimiters.
(2) probably gives you the widest choice of tools: https://github.com/eishay/jvm-serializers/wiki/ToolBehavior (some are language-neutral). Flawed but hopefully not totally useless benchmarks: https://github.com/eishay/jvm-serializers/wiki
Many of these tools generate classes, which would require writing code to convert to/from your POJOs. Tools that work with POJOs directly typically aren't language-neutral.
(3) seems like a bad idea (not knowing anything about your specific project). I normally keep my message classes free of any other logic.
(4) The Protostuff library (which supports the Protocol Buffer format) lets you write a "schema" to describe how you want your POJOs serialized. But writing this schema might end up being more work and more error-prone than just writing code to convert between your POJOs and some tool's generated classes.
Protostuff can also automatically generate a schema via reflection, but this might yield a message format that feels a bit Java-centric.
What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?
They both offer many of the same features; however, there are some differences:
Thrift supports 'exceptions'
Protocol Buffers have much better documentation/examples
Thrift has a builtin Set type
Protocol Buffers allow "extensions": you can extend an external proto to add extra fields, while still allowing external code to operate on the values. There is no way to do this in Thrift.
I find Protocol Buffers much easier to read
Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).
Another important difference is the set of languages supported by default.
Protocol Buffers: Java, Android Java, C++, Python, Ruby, C#, Go, Objective-C, Node.js
Thrift: Java, C++, Python, Ruby, C#, Go, Objective-C, JavaScript, Node.js, Erlang, PHP, Perl, Haskell, Smalltalk, OCaml, Delphi, D, Haxe
Both could be extended to other platforms, but these are the language bindings available out of the box.
RPC is another key difference. Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems mostly designed as a data-interchange format alone.
Protobuf serialized objects are about 30% smaller than Thrift.
Most actions you may want to perform on protobuf objects (create, serialize, deserialize) are much slower than in Thrift unless you turn on the option optimize_for = SPEED.
Thrift has richer data structures (Map, Set)
Protobuf API looks cleaner, though the generated classes are all packed as inner classes which is not so nice.
Thrift enums are not real Java Enums, i.e. they are just ints. Protobuf has real Java enums.
For a closer look at the differences, check out the source code diffs at this open source project.
As I said in the "Thrift vs Protocol Buffers" topic:
Referring to the Thrift vs Protobuf vs JSON comparison:
Thrift supports out of the box AS3, C++, C#, D, Delphi, Go, Graphviz, Haxe, Haskell, Java, Javascript, Node.js, OCaml, Smalltalk, Typescript, Perl, PHP, Python, Ruby, ...
Protobuf supports C++, Python and Java out of the box
Protobuf support for other languages (including Lua, Matlab, Ruby, Perl, R, PHP, OCaml, Mercury, Erlang, Go, D, Lisp) is available as third-party add-ons (by the way, here is SWI-Prolog support).
Protobuf has much better documentation and plenty of examples.
Thrift comes with a good tutorial
Protobuf objects are smaller
Protobuf is faster when using "optimize_for = SPEED" configuration
Thrift has an integrated RPC implementation, while for Protobuf, RPC solutions are separate but available (like ZeroC ICE).
Protobuf is released under BSD-style license
Thrift is released under Apache 2 license
Additionally, there are plenty of interesting tools available for both solutions, which might tip the decision. Here are examples for Protobuf: Protobuf-wireshark, protobufeditor.
Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:
"We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them."
Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.
I was able to get better performance with a text-based protocol than with protobuf on Python. However, it comes without the type checking, UTF-8 conversion and other niceties that protobuf offers.
So, if serialization/deserialization is all you need, then you can probably use something else.
http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html
One obvious thing not yet mentioned, which can be both a pro and a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but at the cost of reduced readability (or rather, debuggability), a con.
Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).
(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.
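To make the compactness versus readability trade-off concrete, here is a small Python sketch. It is only an illustration: the record and its fields are made up, and the binary side is a hand-rolled fixed layout built with the standard struct module, not the actual Thrift or Protocol Buffers wire format.

```python
import json
import struct

# Hypothetical record, invented purely for this illustration.
record = {"id": 42, "score": 3.5, "active": True}

# Text encoding: self-describing and readable in any log or debugger.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary encoding: little-endian unsigned 32-bit int, 64-bit float, 1-byte bool.
# The field names and types live only in the (assumed) shared layout "<Id?".
LAYOUT = "<Id?"
as_binary = struct.pack(LAYOUT, record["id"], record["score"], record["active"])

# Decoding the binary form requires knowing the layout up front.
decoded = struct.unpack(LAYOUT, as_binary)

print(as_json)         # human-readable bytes
print(as_binary.hex()) # opaque without the layout
print(len(as_json), len(as_binary))
```

The binary form is less than half the size here, but nothing in the bytes tells you what the fields are, which is exactly the debuggability cost mentioned above.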
I think most of these points have missed the basic fact that Thrift is an RPC framework, which happens to have the ability to serialize data using a variety of methods (binary, XML, etc).
Protocol Buffers are designed purely for serialization; they are not a framework like Thrift.
ProtocolBuffers is FASTER.
There is a nice benchmark here:
https://github.com/eishay/jvm-serializers/wiki (last updated 2016, but there are forks that contain faster serializers as of 2020, e.g. ActiveJ created a fork to demonstrate their speed on the JVM: https://github.com/activej/jvm-serializers).
You might also want to look into Avro, which can be faster. There are two libraries for Avro in .NET:
Apache.Avro
Chr.Avro - written by engineers at C.H. Robinson, a supply chain logistics company
By the way, the fastest I've ever seen is Cap'n Proto;
A C# implementation can be found in the GitHub repository of Marc Gravell.
And according to the wiki the Thrift runtime doesn't run on Windows.
For one, protobuf isn't a full RPC implementation. It requires something like gRPC to go with it.
gRPC is very slow compared to Thrift:
http://szelei.me/rpc-benchmark-part1/
I think the basic data structure is different.
Protocol Buffers use variable-length integers (varints), an encoding that turns a fixed-length number into a variable-length one to save space.
Thrift offers several different serialization formats (called "protocols").
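A minimal Python sketch of the varint idea, for illustration only: each byte carries 7 bits of the value, lowest bits first, and the high bit of a byte signals that another byte follows. The function names are hypothetical, this is not the protobuf library's own code, and it handles non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: this was the last byte
            return result
        shift += 7
    raise ValueError("truncated varint")


for n in (1, 300, 2**31):
    encoded = encode_varint(n)
    print(n, encoded.hex(), len(encoded), decode_varint(encoded))
```

With this scheme a small value such as 1 fits in one byte, while a value that needs all 32 bits takes five, which is where the space saving for typical (small) numbers comes from.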
In fact, Thrift has two different JSON encodings, and no less than three different binary encoding methods.
In conclusion, these two libraries are completely different. Thrift is like a one-stop shop, giving you an entire integrated RPC framework and many options (with cross-language support), while Protocol Buffers is more inclined to "just do one thing and do it well".
There are some excellent points here, and I'm going to add another one in case someone's path crosses here.
Thrift lets you choose between the thrift-binary and thrift-compact (de)serializers. thrift-binary gives excellent performance but a bigger packet size, while thrift-compact gives a more compact encoding but needs more processing power. This is handy because you can always switch between these two modes as easily as changing a line of code (heck, even make it configurable). So if you are not sure whether your application should be optimized for packet size or for processing power, Thrift can be an interesting choice.
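To show where the size difference comes from, here is a rough Python sketch, not Thrift's actual implementation. It assumes the binary protocol writes an i32 as four big-endian bytes and the compact protocol writes it as a zigzag-encoded varint; the helper names are hypothetical.

```python
import struct


def fixed_i32(n: int) -> bytes:
    """Binary-style encoding (assumed): always four big-endian bytes."""
    return struct.pack(">i", n)


def zigzag32(n: int) -> int:
    """Map signed 32-bit ints to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit means 'more'."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def compact_i32(n: int) -> bytes:
    """Compact-style encoding (assumed): zigzag first, then varint."""
    return varint(zigzag32(n))


for n in (1, -1, 1000, 2147483647):
    print(n, len(fixed_i32(n)), len(compact_i32(n)))
```

Small values shrink from four bytes to one or two, while values near the 32-bit limit grow to five; the extra shifting and masking per integer is the processing cost paid for the smaller packets.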
PS: See this excellent benchmark project by thekvs which compares many serializers including thrift-binary, thrift-compact, and protobuf: https://github.com/thekvs/cpp-serializers
PS: There is another serializer named YAS which offers this option too, but it is schema-less; see the link above.
It's also important to note that not all supported languages compare consistently between Thrift and protobuf. At this point it's a matter of the module's implementation in addition to the underlying serialization. Take care to check benchmarks for whatever language you plan to use.