Pros and cons of Avro JSON / binary serialization

I've recently started working with Apache Avro and would like to serialize my Kafka topics. I've read that Avro offers both JSON and binary serialization. Are there any pros and cons? Isn't the binary option simply better?
Kind regards,
nika

It depends on what you are looking for:
Binary - smaller data and faster serialization.
JSON - good for debugging and web-based applications.
Note that the schema itself (which Avro defines in JSON) must be available to readers in either case, for example embedded in Avro container files or distributed via a schema registry when used with Kafka.
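If you want to see the difference concretely, here is a minimal sketch in Python comparing the two encodings. It assumes the third-party fastavro package and a made-up User schema; it is illustrative, not a benchmark.

import io
import fastavro  # assumption: pip install fastavro

# A made-up example schema; Avro schemas are themselves written in JSON.
schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})
record = {"name": "nika", "age": 30}

# Binary encoding: compact and fast, but opaque on the wire.
binary_buf = io.BytesIO()
fastavro.schemaless_writer(binary_buf, schema, record)
print("binary:", len(binary_buf.getvalue()), "bytes")

# JSON encoding: larger but human-readable, handy for debugging.
json_buf = io.StringIO()
fastavro.json_writer(json_buf, schema, [record])
print("json:  ", len(json_buf.getvalue()), "bytes")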

Related

Is it common to have RESTful endpoint returning Protobuf strings?

Instead of having a gRPC server (say, due to platform restrictions), you have a REST endpoint that returns data.SerializeToString() as the payload. Of course, any clients of this endpoint would have the appropriate proto files for each response, so they can ParseFromString(data) and be on their way. Reasons for doing this include the benefits of Protobufs.
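To make the pattern concrete, here is a minimal sketch of such an endpoint in Python. It uses Flask and protobuf's bundled Timestamp well-known type purely so that nothing needs compiling; a real service would return its own generated message class.

from flask import Flask, Response  # assumption: pip install flask protobuf
from google.protobuf.timestamp_pb2 import Timestamp

app = Flask(__name__)

@app.route("/now")
def now():
    ts = Timestamp()
    ts.GetCurrentTime()
    # SerializeToString() returns the binary wire format (bytes, not text).
    return Response(ts.SerializeToString(),
                    mimetype="application/x-protobuf")

# A client holding the same message definition just parses the body:
#   ts = Timestamp()
#   ts.ParseFromString(response.content)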
Improved understanding of the question: is it common to use PBs for purposes other than gRPC transport?
Yes, it is totally common and reasonable. PBs are really nothing more than a data serialization format; gRPC just uses them as its message interchange format (a natural choice, as both are Google creations). Let the answer be the description from Google itself:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
Google's own basic tutorial simply saves it to disk. Do anything with it that you would do with any other binary blob (JPEG, MP3, ...).
BUT! If serialization speed is really critical for you, don't assume anything. Today's JSON libraries may perform unexpectedly well, depending on your specific platform and dominant message characteristics, so do your own performance tests. If JSON's inferiority is confirmed, there are in turn libraries with faster serialization than PB. To name a couple: Google's less popular PB sibling FlatBuffers, and Simple Binary Encoding, which was developed for high-frequency trading... that speaks for itself.
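In that spirit, a rough micro-benchmark sketch: it compares the stdlib json module against protobuf's Struct well-known type, chosen only because it holds dict-like data without code generation. A compiled, strongly typed message would normally be both smaller and faster than Struct, so treat any numbers as a floor, not a verdict.

import json
import timeit
from google.protobuf.struct_pb2 import Struct  # assumption: pip install protobuf

payload = {"user": "nika", "scores": [1.0, 2.0, 3.0], "active": True}

msg = Struct()
msg.update(payload)  # Struct exposes a dict-like interface

print("json size: ", len(json.dumps(payload).encode()))
print("proto size:", len(msg.SerializeToString()))
print("json ser:  ", timeit.timeit(lambda: json.dumps(payload), number=100_000))
print("proto ser: ", timeit.timeit(lambda: msg.SerializeToString(), number=100_000))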

Compilable IDLs that serialize to JSON

I've used Protobuf before, and I was looking into Thrift, but I was wondering what the options are for IDLs that compile to (at least) C#, JS, Objective-C, and Java, but also serialize/deserialize JSON in all of those languages. Thrift mostly does that, but doesn't support JSON in Objective-C, and I was concerned (perhaps unwarrantedly) about the maturity of its JSON interfaces. Are there any IDLs that use JSON as their primary serialization, but also compile to strongly typed bindings in all of the languages listed above?
Thanks!
Regarding Thrift: if any serialization protocol could be considered "primary", it would certainly be the binary format. However, we strive to offer a common minimum set of protocols and transports for each language, one of which is JSON.
Next, please keep in mind that Thrift's JSON format might not be what you expect: it is designed specifically for Thrift, with a compact representation of the data as the main goal. The SimpleJSON protocol, also available for some languages, is more verbatim, but it was initially designed to be write-only (although that viewpoint is slowly changing).
I was concerned (perhaps unwarranted) about the maturity of its JSON interfaces
There is nothing to be concerned about, honestly. There are a few PHP-related issues with regard to proper string encoding, but otherwise it works just fine, where available for the language of your choice. If it is not, it is not that hard to write a JSON transport, and we always welcome quality contributions. If you need help during that process, ask on the mailing lists.
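For illustration, a small sketch of what choosing a Thrift protocol looks like in Python. User stands in for a hypothetical class generated by thrift --gen py; the helper itself relies only on the standard thrift package.

from thrift.protocol import TBinaryProtocol, TJSONProtocol
from thrift.transport import TTransport

def serialize(thrift_obj, protocol_factory):
    # Write the struct into an in-memory transport and return the bytes.
    transport = TTransport.TMemoryBuffer()
    protocol = protocol_factory.getProtocol(transport)
    thrift_obj.write(protocol)
    return transport.getvalue()

# user = User(name="nika")  # hypothetical generated struct
# as_json   = serialize(user, TJSONProtocol.TJSONProtocolFactory())
# as_binary = serialize(user, TBinaryProtocol.TBinaryProtocolFactory())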

Protobuf vs JSON - Objective-C/iOS

In Objective-C, making iOS apps, what is the best way to go regarding serialization?
Protobuf or JSON?
Protobuf is more time- and space-efficient; JSON is probably more nerve-efficient. As long as neither of the former matters (e.g. because the amount of data to be serialized is small and serialization is not time-critical), I would stick to JSON.
This also makes debugging more fun :-)
I don't know Protobuf, but JSONKit is a very good choice on iOS. See JSON vs. PLIST, the Ultimate Showdown for a performance comparison. JSONKit is widely used and actively developed, which makes it a solid choice.

Data serialization

I've got an Objective-C/Cocoa based application that I'm working on. This app is client<->server. Currently, the communication protocol is based on some fairly simple XML. While XML works for this task, it is not ideal in any respect: it's a pain to serialize data to XML, it's not particularly lightweight, and it's difficult to incorporate non-data information (such as 'do this next') in it.
I'm looking for suggestions to an alternative.
I've considered some of the ones listed here, but haven't decided on any. Suggestions?
If you are talking to an Objective-C server, you can look into encoding and decoding with the preferred serialization methods available in Objective-C:
NSKeyedArchiver and NSKeyedUnarchiver
Basically, you get an NSData from the NSKeyedArchiver, send it (bytes/length) to the other side, wrap it back into an NSData there, and use NSKeyedUnarchiver to unpack it into objects again.
I'm using JSON for an iPhone application. I would typically prefer XML, but we needed it very lightweight, so we decided on JSON.
If you're working with XML, you should take a look at XPath if you haven't already; it will give you tremendous power for extracting values from an XML data structure.
What kind of server do you have? If the server is Java-based, I'd recommend looking at HessianKit by Fredrik Olsson. Encoding and decoding to ordinary Objective-C types, placed in NSArrays and NSDictionaries, will make the experience smoother.
What's wrong with (Portable) Distributed Objects?

Biggest differences of Thrift vs Protocol Buffers? [closed]

What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?
They both offer many of the same features; however, there are some differences:
Thrift supports 'exceptions'
Protocol Buffers have much better documentation/examples
Thrift has a built-in Set type
Protocol Buffers allow "extensions": you can extend an external proto to add extra fields while still allowing external code to operate on the values. There is no way to do this in Thrift
I find Protocol Buffers much easier to read
Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).
Another important difference is the set of languages supported by default.
Protocol Buffers: Java, Android Java, C++, Python, Ruby, C#, Go, Objective-C, Node.js
Thrift: Java, C++, Python, Ruby, C#, Go, Objective-C, JavaScript, Node.js, Erlang, PHP, Perl, Haskell, Smalltalk, OCaml, Delphi, D, Haxe
Both can be extended to other platforms, but these are the language bindings available out of the box.
RPC is another key difference: Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems designed mostly as a data-interchange format alone.
Protobuf-serialized objects are about 30% smaller than Thrift's.
Most actions you may want to perform on protobuf objects (create, serialize, deserialize) are much slower than with Thrift unless you turn on option optimize_for = SPEED.
Thrift has richer data structures (Map, Set)
The Protobuf API looks cleaner, though the generated classes are all packed as inner classes, which is not so nice.
Thrift enums are not real Java Enums, i.e. they are just ints; Protobuf has real Java enums.
For a closer look at the differences, check out the source code diffs at this open source project.
As I've said in the "Thrift vs Protocol buffers" topic, referring to the Thrift vs Protobuf vs JSON comparison:
Thrift supports, out of the box: AS3, C++, C#, D, Delphi, Go, Graphviz, Haxe, Haskell, Java, JavaScript, Node.js, OCaml, Smalltalk, TypeScript, Perl, PHP, Python, Ruby, ...
Protobuf has in-box support for C++, Python, and Java.
Protobuf support for other languages (including Lua, Matlab, Ruby, Perl, R, PHP, OCaml, Mercury, Erlang, Go, D, Lisp) is available as third-party add-ons (BTW, here is SWI-Prolog support).
Protobuf has much better documentation and plenty of examples.
Thrift comes with a good tutorial
Protobuf objects are smaller
Protobuf is faster when using "optimize_for = SPEED" configuration
Thrift has an integrated RPC implementation, while for Protobuf RPC solutions are separate but available (like ZeroC Ice).
Protobuf is released under BSD-style license
Thrift is released under Apache 2 license
Additionally, there are plenty of interesting additional tools available for these solutions, which might tip the decision. Here are examples for Protobuf: protobuf-wireshark, protobufeditor.
Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:
We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them.
Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.
I was able to get better performance with a text-based protocol compared to protobuf in Python. However, you get none of the type checking, UTF-8 conversion, and other niceties that protobuf offers.
So, if serialization/deserialization is all you need, then you can probably use something else.
http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html
One obvious thing not yet mentioned, which can be either a pro or a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but reduced readability, or rather debuggability (a con).
Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).
(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.
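A quick sketch of that readability point in Python, using protobuf's bundled Timestamp type and its json_format helper so nothing needs compiling: the same message, once as wire bytes and once as proto3 JSON.

from google.protobuf.timestamp_pb2 import Timestamp
from google.protobuf.json_format import MessageToJson

ts = Timestamp(seconds=1700000000, nanos=500)
print(ts.SerializeToString())  # compact wire bytes - opaque without the schema
print(MessageToJson(ts))       # "2023-11-14T22:13:20.000000500Z" - readable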
I think most of these points miss the basic fact that Thrift is an RPC framework, which happens to have the ability to serialize data using a variety of methods (binary, XML, etc.).
Protocol Buffers are designed purely for serialization; they're not a framework like Thrift.
Protocol Buffers is FASTER.
There is a nice benchmark here:
https://github.com/eishay/jvm-serializers/wiki (last updated 2016, but there are forks that contain faster serializers as of 2020, e.g. ActiveJ created a fork to demonstrate their speed on the JVM: https://github.com/activej/jvm-serializers).
You might also want to look into Avro, which can be faster. There are two libraries for Avro in .NET:
Apache.Avro
Chr.Avro - written by engineers at C.H. Robinson, a supply chain logistics company
By the way, the fastest I've ever seen is Cap'n Proto; a C# implementation can be found in Marc Gravell's GitHub repository.
And according to the wiki, the Thrift runtime doesn't run on Windows.
For one, protobuf isn't a full RPC implementation. It requires something like gRPC to go with it.
gRPC is very slow compared to Thrift:
http://szelei.me/rpc-benchmark-part1/
I think the underlying data encodings are different.
Protocol Buffers use variable-length integers ("varints"), an encoding that turns a fixed-width number into a variable number of bytes so that small values take less space. For example:
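A toy sketch of that encoding, the same scheme protobuf uses on the wire: seven payload bits per byte, with the high bit flagging that more bytes follow.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

print(encode_varint(1).hex())    # '01'   - one byte instead of a fixed four or eight
print(encode_varint(300).hex())  # 'ac02' - the classic example from the protobuf docs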
Thrift, by contrast, offers several distinct serialization formats (called "protocols"). In fact, Thrift has two different JSON encodings and no less than three different binary encodings.
In conclusion, these two libraries are quite different in character: Thrift is like a one-stop shop, giving you an entire integrated RPC framework and many options (with cross-language support), while Protocol Buffers is more inclined to "just do one thing and do it well".
There are some excellent points here, and I'm going to add another one in case someone's path crosses here.
Thrift gives you the option to choose between the thrift-binary and the thrift-compact (de)serializer: thrift-binary has excellent performance but a bigger packet size, while thrift-compact gives you good compression but needs more processing power. This is handy because you can always switch between the two modes as easily as changing a line of code (heck, even make it configurable), as the sketch below shows. So if you are not sure how far your application should be optimized for packet size or for processing power, Thrift can be an interesting choice.
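Here is what that one-line switch looks like in Python (order stands in for your own generated struct; the serialize helper comes from the standard thrift package):

from thrift.protocol import TBinaryProtocol, TCompactProtocol
from thrift.TSerialization import serialize

# Pick one factory; everything else stays the same.
# factory = TBinaryProtocol.TBinaryProtocolFactory()   # faster, larger packets
factory = TCompactProtocol.TCompactProtocolFactory()   # smaller, more CPU

# payload = serialize(order, protocol_factory=factory)  # order: generated struct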
PS: See this excellent benchmark project by thekvs which compares many serializers including thrift-binary, thrift-compact, and protobuf: https://github.com/thekvs/cpp-serializers
PS: There is another serializer named YAS which offers this option too, but it is schema-less; see the link above.
It's also important to note that not all supported languages perform consistently with Thrift or Protobuf. At this point it's a matter of each language module's implementation in addition to the underlying serialization. Take care to check benchmarks for whatever language you plan to use.