I'm looking for suggestions on possible IPC mechanisms that are:
Cross platform (Win32 and Linux at least)
Simple to implement in C++ as well as the most common scripting languages (Perl, Ruby, Python, etc.).
Finally, simple to use from a programming point of view!
What are my options? I'm programming under Linux, but I'd like what I write to be portable to other OSes in the future. I've thought about using sockets, named pipes, or something like D-Bus.
In terms of speed, the best cross-platform IPC mechanism will be pipes. That assumes, however, that you want cross-platform IPC on the same machine. If you want to be able to talk to processes on remote machines, you'll want to look at using sockets instead. Luckily, if you're talking about TCP at least, sockets and pipes behave pretty much the same way. While the APIs for setting them up and connecting them are different, they both just act like streams of data.
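To make the "stream of data" point concrete, here is a minimal Python sketch of a localhost TCP connection (the port number is made up, and client and server live in one process purely for brevity):

```python
import socket

HOST, PORT = "127.0.0.1", 5555        # hypothetical port

# Server side: listen for one connection.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(1)

# Client side (normally a separate process).
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((HOST, PORT))
conn, _ = server.accept()

client.sendall(b"hello\n")            # write to the stream
print(conn.recv(1024))                # read from the stream: b'hello\n'

for s in (conn, client, server):
    s.close()
```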
The difficult part, however, is not the communication channel, but the messages you pass over it. You really want to look at something that will perform verification and parsing for you. I recommend looking at Google's Protocol Buffers. You basically create a spec file that describes the object you want to pass between processes, and there is a compiler that generates code in a number of different languages for reading and writing objects that match the spec. It's much easier (and less bug prone) than trying to come up with a messaging protocol and parser yourself.
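As a rough illustration (not the poster's code), this is roughly what using a generated Protocol Buffers class looks like from Python; the message.proto file, its Request message, and the message_pb2 module are hypothetical:

```python
# Assumes a hypothetical message.proto compiled with:
#   protoc --python_out=. message.proto
# containing, say:  message Request { string command = 1; int32 id = 2; }
import message_pb2  # hypothetical generated module

req = message_pb2.Request(command="ping", id=42)
payload = req.SerializeToString()   # bytes you can push through a pipe or socket

decoded = message_pb2.Request()
decoded.ParseFromString(payload)    # parsing and validation done for you
print(decoded.command, decoded.id)
```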
For C++, check out Boost IPC.
You can probably create or find some bindings for the scripting languages as well.
Otherwise, if it's really important to be able to interface with scripting languages, your best bet is simply to use files, pipes, or sockets, or even a higher-level abstraction like HTTP.
Why not D-Bus? It's a very simple message passing system that runs on almost all platforms and is designed for robustness. It's supported by pretty much every scripting language at this point.
http://freedesktop.org/wiki/Software/dbus
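For instance, here is a small sketch using the dbus-python package (it assumes a running session bus, which is typical on a Linux desktop; on other platforms you may have to set the bus up first):

```python
import dbus  # requires the dbus-python package

bus = dbus.SessionBus()
# Talk to the bus daemon itself and list every connected service.
proxy = bus.get_object("org.freedesktop.DBus", "/org/freedesktop/DBus")
iface = dbus.Interface(proxy, dbus_interface="org.freedesktop.DBus")
for name in iface.ListNames():
    print(name)
```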
If you want a portable, easy to use, multi-language and LGPL'd solution, I would recommend ZeroMQ:
Amazingly fast, almost linearly scalable and still simple.
Suitable for simple and complex systems/architectures.
Very powerful communication patterns available: REQ-REP, PUSH-PULL, PUB-SUB, PAIR.
You can configure the transport protocol to make it more efficient depending on whether you are passing messages between threads (inproc://), processes (ipc://) or machines ({tcp|pgm|epgm}://), with a smart option to shave off some of the protocol overhead when connections run between VMware virtual machines (vmci://). A minimal REQ-REP sketch is shown below.
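For illustration, a minimal REQ-REP sketch with the pyzmq bindings (the endpoint is made up; swap in tcp://127.0.0.1:5555 if you need to cross machines, since the ipc:// transport is not available everywhere):

```python
import zmq  # requires the pyzmq package

ctx = zmq.Context()

# Server (REP) side -- normally its own process.
rep = ctx.socket(zmq.REP)
rep.bind("ipc:///tmp/demo.sock")      # hypothetical endpoint

# Client (REQ) side.
req = ctx.socket(zmq.REQ)
req.connect("ipc:///tmp/demo.sock")

req.send(b"ping")
print(rep.recv())                     # b'ping'
rep.send(b"pong")
print(req.recv())                     # b'pong'
```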
For serialization I would suggest MessagePack or Protocol Buffers (which others have already mentioned as well), depending on your needs.
You might want to try YAMI; it's very simple yet functional, portable, and comes with bindings for a few languages.
I suggest using the plibsys C library. It is very simple, lightweight and cross-platform. Released under the LGPL. It provides:
named system-wide shared memory regions (System V, POSIX and Windows implementations);
named system-wide semaphores for access synchronization (System V, POSIX and Windows implementations);
named system-wide shared buffer implementation based on the shared memory and semaphore;
sockets (TCP, UDP, SCTP) with IPv4 and IPv6 support (UNIX and Windows implementations).
It is an easy-to-use library with quite good documentation. As it is written in C you can easily make bindings from scripting languages.
If you need to pass large data sets between processes (especially if speed is essential) it is better to use shared memory to pass the data itself and sockets to notify the other process that the data is ready. You can do it as follows (a rough sketch in Python is given after the list):
one process puts the data into a shared memory segment and sends a notification via a socket to another process; since a notification is usually very small, the time overhead is minimal;
the other process receives the notification and reads the data from the shared memory segment; after that it sends a notification back to the first process that the data was read, so the first process can feed more data.
This approach can be implemented in a cross-platform fashion.
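A rough single-process illustration of that pattern using Python's multiprocessing.shared_memory (3.8+); in practice the writer and reader would be separate processes and the notification would travel over a real socket or pipe:

```python
import socket
from multiprocessing import shared_memory

data = b"a large payload..." * 1000

# "Writer": put the payload into a named shared memory block.
shm = shared_memory.SharedMemory(create=True, size=len(data))
shm.buf[:len(data)] = data

# Notify the "reader" with just the block name and size (a tiny message).
a, b = socket.socketpair()
a.sendall(f"{shm.name}:{len(data)}".encode())

# "Reader": attach to the block named in the notification and copy it out.
name, size = b.recv(1024).decode().split(":")
view = shared_memory.SharedMemory(name=name)
payload = bytes(view.buf[:int(size)])
assert payload == data

view.close()
shm.close()
shm.unlink()
```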
How about Facebook's Thrift?
Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
I think you'll want something based on sockets.
If you want RPC rather than just IPC I would suggest something like XML-RPC/SOAP which runs over HTTP, and can be used from any language.
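As an example of how little code that takes, here is a sketch using Python's standard library (the host, port, and add function are made up). The server:

```python
# server.py -- exposes a single procedure over HTTP.
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 8000), allow_none=True)
server.register_function(add)
server.serve_forever()
```

And a client, which could just as well be written in any other language with an XML-RPC library:

```python
# client.py
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://127.0.0.1:8000/")
print(proxy.add(2, 3))   # 5
```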
YAMI - Yet Another Messaging Infrastructure is a lightweight messaging and networking framework.
If you're willing to try something a little different, there's the ICE platform from ZeroC. It's open source, and is supported on pretty much every OS you can think of, as well as having language support for C++, C#, Java, Ruby, Python and PHP. Finally, it's very easy to drive (the language mappings are tailored to fit naturally into each language). It's also fast and efficient. There's even a cut-down version for devices.
Distributed computing is usually complex and you are well advised to use existing libraries or frameworks instead of reinventing the wheel. Previous posters have already enumerated a couple of these libraries and frameworks. Depending on your needs you can pick either a very low-level one (like sockets) or a high-level framework (like CORBA). There cannot be a generic "use this" answer. You need to educate yourself about distributed programming and then you will find it much easier to pick the right library or framework for the job.
There is a widely used C++ framework for distributed computing called ACE, and the CORBA ORB TAO (which is built upon ACE). There are very good books about ACE, http://www.cs.wustl.edu/~schmidt/ACE/, so you might take a look. Take care!
TCP sockets to localhost FTW.
It doesn't get simpler than using pipes, which are supported on every OS I know of, and can be accessed in pretty much every language.
Check out this tutorial.
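For instance, here is a small Python sketch that talks to a child process over its stdin/stdout pipes (the child program is made up for the example):

```python
import subprocess
import sys

# A toy child process that upper-cases every line it receives.
child_src = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)
child = subprocess.Popen([sys.executable, "-c", child_src],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True)

child.stdin.write("hello over a pipe\n")
child.stdin.flush()
print(child.stdout.readline().strip())   # HELLO OVER A PIPE

child.stdin.close()
child.wait()
```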
Python has a pretty good IPC library: see https://docs.python.org/2/library/ipc.html
Xojo has built-in cross-platform IPC support with its IPCSocket class. Although you obviously couldn't "implement" it in other languages, you could use it in a Xojo console app and call it from other languages making this option perhaps very simple for you.
These days there is a very easy, C++1x-compliant, well-documented, Linux- and Windows-compatible, open-source "CommonAPI" library available: CommonAPI C++.
The underlying IPC system is D-Bus (libdbus) or SomeIP if one wishes. Application interfaces are specified using Franca IDL, a simple language tailored for that purpose.
The Asynchronous Messaging Protocol (AMP) is a simple protocol in python-twisted. I have a fairly complete app (Python, Twisted, Kivy) using it. The client-server architecture implements a view-controller sort of relationship, with almost all business logic server-side and the UI code simply reflecting changes in the state of models (sent by the server) and sending the appropriate AMP messages.
Here is a list of implementations of the AMP protocol in other languages, but some seem unfinished, and most don't seem to be actually used for anything serious.
The use-case I'm looking at is a fully Python app which currently works on Windows, Linux, and Android (possibly iOS if I ever get round to building that). And possibly, in the future, replacing the View/UI bit with 'native' language (Java/Swift on Android, for instance) while keeping the business bits in python and twisted.
So I have two main questions:
Is it accurate to say that AMP is only really used within python-twisted and those programs that use it?
Are there other, more generally useful network protocols which are both implemented and fairly easy to use in Twisted, as well as being non-specific (e.g. Jabber is really only for chat)? Preferably ones that don't require a server (as WAMP/Autobahn do, if I understand correctly), so it can be self-contained within any device which can run Python.
This isn't entirely accurate. Twisted just happens to use it the most. Other languages make use of AMP; it's just that AMP hasn't become very popular given the popularity of more robust options such as AMQP brokers (RabbitMQ, WebSphere MQ, etc.) and ZeroMQ.
AMP is about as simple as it can get. Also, it's unlikely you will find a solution without a server.
AMP is not locked to Twisted or Python. There are other implementations in other languages, but, like you said, some are not used in a "serious" manner and often go unmaintained. Don't let that scare you off, because the protocol is so simple that there often isn't much to do after it's been implemented. You will be happy to know that the actual protocol hasn't changed much and isn't very difficult to implement in any language if you follow the design. If you want something more generic, cross-platform, and with ensured compatibility, then consider HTTP requests.
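To give a feel for how small an AMP command is, here is a hedged sketch in the Twisted style (the Sum command and MathProtocol class are made up for the example):

```python
from twisted.protocols import amp

class Sum(amp.Command):
    # Typed arguments and response; AMP handles framing and parsing.
    arguments = [(b'a', amp.Integer()), (b'b', amp.Integer())]
    response = [(b'total', amp.Integer())]

class MathProtocol(amp.AMP):
    @Sum.responder
    def sum(self, a, b):
        return {'total': a + b}

# A connected peer would then call:
#   remote.callRemote(Sum, a=2, b=3)  ->  fires with {'total': 5}
```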
I'm looking at building a Cocoa application on the Mac with a back-end daemon process (really just a mostly-headless Cocoa app, probably), along with 0 or more "client" applications running locally (although if possible I'd like to support remote clients as well; the remote clients would only ever be other Macs or iPhone OS devices).
The data being communicated will be fairly trivial, mostly just text and commands (which I guess can be represented as text anyway), and maybe the occasional small file (an image possibly).
I've looked at a few methods for doing this but I'm not sure which is "best" for the task at hand. Things I've considered:
Reading and writing to a file (…yes), very basic but not very scalable.
Pure sockets (I have no experience with sockets, but I seem to think I can use them to send data locally and over a network), though it seems cumbersome if doing everything in Cocoa
Distributed Objects: seems rather inelegant for a task like this
NSConnection: I can't really figure out what this class even does, but I've read of it in some IPC search results
I'm sure there are things I'm missing, but I was surprised to find a lack of resources on this topic.
I am currently looking into the same questions. For me the possibility of adding Windows clients later makes the situation more complicated; in your case the answer seems to be simpler.
About the options you have considered:
Control files: While it is possible to communicate via control files, you have to keep in mind that the files need to be communicated via a network file system among the machines involved. So the network file system serves as an abstraction of the actual network infrastructure, but does not offer the full power and flexibility the network normally has. Implementation: Practically, you will need to have at least two files for each pair of client/servers: a file the server uses to send a request to the client(s) and a file for the responses. If each process can communicate both ways, you need to duplicate this. Furthermore, both the client(s) and the server(s) work on a "pull" basis, i.e., they need to revisit the control files frequently and see if something new has been delivered.
The advantage of this solution is that it minimizes the need for learning new techniques. The big disadvantage is that it places huge demands on the program logic; a lot of things need to be taken care of by you (Will the files be written in one piece, or can it happen that one party picks up inconsistent files? How frequently should checks be implemented? Do I need to worry about the file system, like caching, etc.? Can I add encryption later without toying around with things outside of my program code? ...)
If portability were an issue (which, as far as I understood from your question, is not the case) then this solution would be easy to port to different systems and even different programming languages. However, I don't know of any network file system for iPhone OS, though I am not very familiar with that platform.
Sockets: The programming interface is certainly different; depending on your experience with socket programming it may mean that you have more work learning it first and debugging it later. Implementation: Practically, you will need a similar logic as before, i.e., client(s) and server(s) communicating via the network. A definite plus of this approach is that the processes can work on a "push" basis, i.e., they can listen on a socket until a message arrives which is superior to checking control files regularly. Network corruption and inconsistencies are also not your concern. Furthermore, you (may) have more control over the way the connections are established rather than relying on things outside of your program's control (again, this is important if you decide to add encryption later on).
The advantage is that a lot of things are taken off your shoulders that would bother an implementation in 1. The disadvantage is that you still need to change your program logic substantially in order to make sure that you send and receive the correct information (file types etc.).
In my experience portability (i.e., ease of transitioning to different systems and even programming languages) is very good since anything even remotely compatible to POSIX works.
[EDIT: In particular, as soon as you communicate binary numbers, endianness becomes an issue and you have to take care of this problem manually - this is a common (!) special case of the "correct information" issue I mentioned above. It will bite you e.g. when you have a PowerPC talking to an Intel Mac. This special case disappears with solutions 3.+4., together with all of the other "correct information" issues.]
3.+4. Distributed objects: The NSProxy class cluster is used to implement distributed objects. NSConnection is responsible for setting up remote connections as a prerequisite for sending information around, so once you understand how to use this system, you also understand distributed objects. ;^)
The idea is that your high-level program logic does not need to be changed (i.e., your objects communicate via messages and receive results and the messages together with the return types are identical to what you are used to from your local implementation) without having to bother about the particulars of the network infrastructure. Well, at least in theory. Implementation: I am also working on this right now, so my understanding is still limited. As far as I understand, you do need to setup a certain structure, i.e., you still have to decide which processes (local and/or remote) can receive which messages; this is what NSConnection does. At this point, you implicitly define a client/server architecture, but you do not need to worry about the problems mentioned in 2.
There is an introduction with two explicit examples at the Gnustep project server; it illustrates how the technology works and is a good starting point for experimenting:
http://www.gnustep.org/resources/documentation/Developer/Base/ProgrammingManual/manual_7.html
Unfortunately, the disadvantages are a total loss of compatibility with other systems (although you will still do fine with the setup you mentioned of Macs and iPhone/iPad only) and a loss of portability to other languages. Gnustep with Objective-C is at best code-compatible, but there is no way to communicate between Gnustep and Cocoa; see my edit to question number 2 here: CORBA on Mac OS X (Cocoa)
[EDIT: I just came across another piece of information that I was unaware of. While I have checked that NSProxy is available on the iPhone, I did not check whether the other parts of the distributed objects mechanism are. According to this link: http://www.cocoabuilder.com/archive/cocoa/224358-big-picture-relationships-between-nsconnection-nsinputstream-nsoutputstream-etc.html (search the page for the phrase "iPhone OS") they are not. This would exclude this solution if you demand to use iPhone/iPad at this moment.]
So to conclude, there is a trade-off between the effort of learning (and implementing and debugging) new technologies on the one hand and hand-coding lower-level communication logic on the other. While the distributed object approach takes most of the load off your shoulders and incurs the smallest changes in program logic, it is the hardest to learn and also (unfortunately) the least portable.
Disclaimer: Distributed Objects are not available on iPhone.
Why do you find distributed objects inelegant? They sound like a good match here:
transparent marshalling of fundamental types and Objective-C classes
it doesn't really matter whether clients are local or remote
not much additional work for Cocoa-based applications
The documentation might make it sound like more work than it actually is, but all you basically have to do is use protocols cleanly and export, or respectively connect to, the server's root object.
The rest should happen automagically behind the scenes for you in the given scenario.
We are using ThoMoNetworking and it works fine and is fast to set up. Basically it allows you to send NSCoding-compliant objects on the local network, but of course it also works if client and server are on the same machine. As a wrapper around the Foundation classes it takes care of pairing, reconnections, etc.
I have a scientific program written in F# which I want to parallelize and run on 1 server with multiple processors (64) and for the future also in the cloud (Windows Azure?). The program will have a simple 1-1 communication between the nodes (no broadcast etc.).
If I used WCF, would it be as fast as MPI? What does MPI have that WCF does not? There exists Pure MPI .NET written for WCF, which puzzles me even more. I do not know whether to use WCF, MPI.NET, or Pure MPI running on WCF.
PS: I guess that TPL is out of the game for 64 processors and more, right?
It is difficult to give a concrete answer, because it all depends on the specific aspects of your application, its current architecture (I suppose you already have some app) etc.
As you mention MPI and WCF, I assume that the application is written as several components that communicate with each other. The best way to structure this kind of application is to use F# agents.
As far as I understand, you want to run the application on a single server first. If you write it using agents, the agents can just communicate directly with each other (so you don't need MPI or WCF).
TPL should work well on a single-server (with lots of CPUs), but it will not scale to the distributed setting - you cannot run Task on another machine. However, you can use it inside individual components (e.g. agents) that will be distributed.
Regarding MPI vs. WCF - I don't have enough experience to answer that. However, if you use an agent-based architecture, it should be easy to try various options. You may also check out fracture and related projects, which aim to implement high-performance sockets for F# (and possibly distributed agents in the future).
If you're doing it on 1 server you could just run one process and execute the code in parallel. That way you could share memory more easily and faster than doing it through messages like MPI and WCF, although the overhead of communication might not be that much, depending on your problem and solution.
Also, the changes to your code would be much smaller that way; F# can usually be turned into parallel code with little effort. Going to MPI/WCF would require you to rewrite large portions.
Googling for F# + parallel gives plenty of useful info that you should read first, like this for a good start:
http://blogs.msdn.com/b/dsyme/archive/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-i-o-computations.aspx
So on 1 server, I would use the parallel features of F#; it's designed to parallelize easily.
Later, when you want to go to the cloud, that means turning it into client-server. That's a different problem than parallelization. I would treat and solve them separately.
On MPI vs. WCF: WCF is designed as an RPC technology, i.e. you call remote procedures and get answers. If you want to use it for parallel programming with separate processes, you would have to create the boilerplate code for that (keep track of subscribed clients, etc.).
MPI was designed to run that kind of architecture and handles it much more easily (the first process gets rank 0 and is the master, the others are slaves and get numbered incrementally, etc.).
However, I don't think MPI will be very good for going to the cloud, since that involves HTTP, protocols, security, etc. I am not sure how well MPI works for those kinds of things; WCF will handle that very well indeed.
The fact that there is an MPI.NET for WCF is because MPI is about a certain style of parallelizing code that a lot of people are familiar with, so you can use those programming concepts on the .NET platform while leveraging WCF for the communications.
Something else you might want to look into if you need to exchange a lot of data over the wire is protocol-buffers (see protobuf-net for instance). That can easily be combined with WCF for communication and is very lean in serializing structured data so you can send over the wire efficiently.
Gert-Jan
WCF and MPI are different concepts. WCF is like person A asking person B to do something, whereas MPI is like person A creating clones of himself (all clones have the same ability/logic); the clones then work on specific parts of the problem to be solved and, once done, combine their results.
So choosing which one fits your specific application depends on the problem your application is trying to solve. It may even be a combination of both WCF and MPI, where your client application asks the WCF service to do some task, the WCF service creates clones of the "problem solver" using MPI, and when the clones are done solving the problem (in parallel) they return the aggregated result back to the WCF service, which then sends that result to the client application.
You might also want to take a look at the 'mbrace' product, which provides a cloud monad (http://blogs.msdn.com/b/dsyme/archive/2011/08/23/m-brace-f-in-the-cloud.aspx). It's still at a fairly early stage though. I'm no expert, but it may be that you can run an mbrace-based solution as effectively a private cloud on your 64-processor setup. When you outgrow that, a move to Azure would be seamless.
I have a web application project where performance counts more than anything else, and I have the choice of the technologies to use.
The language shootout benchmarks are not really related to web applications.
What would you recommend as the best suitable candidates?
Thanks!
A friend suggested the G-WAN server on IRC. It looks to be what I was searching for, but I had never heard about it before. Does anybody have prior experience with this package? Ease of use, reliability? Before I leave Apache, I would like to get your thoughts.
G-WAN is a neat webserver: it's based around the "C scripts" concept:
A C script is simply C source code that is compiled by the webserver and then loaded into protected memory. It gets called by the webserver when a request to the servlet is made. The servlet, as it's compiled by a C compiler, is "as fast" as a normally compiled C program. The advantage of C scripts over, for instance, CGI or FastCGI, is that the compiled program lives in the same memory space as the webserver. This reduces the communication overhead (creating a process per request in the case of CGI, or going through a socket in the case of FastCGI).
The webserver is using the select/poll technique: non-blocking I/O. However, there's a neat thing to it. Every program can be written as if it was using blocking I/O. As the webserver itself compiles each C script, it can transform the program to use non-blocking I/O. As of this, it can link itself to third-party libraries (like database access) and still make use of the non-blocking I/O nature: no thread/process context switching.
The tools provided for programming the C scripts are, for instance, caching and safe buffers. The next (not yet released as of writing this post) version will also include a Key-Value store.
Performance-wise: there are some benchmarks available showing it's outperforming any other webserver, however I don't trust these. Try writing a small CPU intensive program in C and in, for instance, PHP. Let the C script run on G-WAN and the PHP script on Apache, and do a benchmark yourself.
There is more to it, but that's out of scope for this question.
One downside of G-WAN is that it is developed by only one person. There is a forum, however, where you can ask questions.
Ease of use is limited by your skill in C. The API provided, however, is simple. It still has some inconsistencies and (in my opinion) ugly parts, but that's not a problem. A more serious problem is that each version is not guaranteed to be backwards-compatible and you may have to rewrite.
If you want to be safe, make use of C's platform independence: allow your code to be compiled into (Fast)CGI programs and also to be used by G-WAN. Should G-WAN fail, you can always fall back to Apache's (Fast)CGI (see http://www.fastcgi.com/ for APIs).
If performance counts more than anything else, don't use a scripting language. Especially since you have full control over the technology stack. Compiled languages will perform better for CPU intensive operations.
LuaJIT (Lua) is the fastest scripting language thanks to its JIT technology.
If you want the fastest option for server-side web applications (which are not always scripted), that would be G-WAN; you can use C, C++, or Java.
ASP.NET is also fast enough for almost anything, but quite pricey.
PHP with HipHop would be the easiest to learn and is also fast enough.
It depends on how many requests you need to serve and how fast you learn the language. ^^
Don't forget to cache static data (using memcached or a NoSQL store).
Begin by identifying whether your application's performance really depends on the language or on some other factor (like database requests, for instance). The ability to cache results can also be a very important factor.
For performance, the language used comes quite far down the list of important points to check, and the use case also influences which language is better. For example, if you have many regexes to evaluate, you should check regex support in the candidate language, etc.
For image processing, the most important point will probably be the underlying image library you use, usually written in C. I have the case of ImageMagick in mind, because I'm currently using it. It's available as a library for most languages; the scripting-language layer is only there to call functions of the library, and the language used at that level won't change much (but caching precomputed result images could change performance by a large margin). This use case would probably be similar for calling a cryptographic lib.
If performance is really such an issue, for image processing you could also consider using a lib that works with GPU accelerator cards (libs with CUDA or OpenCL support, for instance).
Javascript is constantly being scrutinized and optimized for use on mobile devices, so on actual full-size servers it runs EXTREMELY fast. Check out Node.JS, a project for implementing server side javascript to serve webpages: http://nodejs.org/
Well, if you use a database with a large volume of data, you will spend more time there than running a PHP or ASP or (insert other flavours here) script.
If you can, you should build a mockup of your app (or at least a segment of the more database- or processor-intensive parts) and try to benchmark those.
Update: It seems like Java 7 using NIO.2 has managed to outperform G-WAN using C by almost 2x in timing. It is incredible, but you can run a few simple tests yourself.
The only downside of Java is that it cannot integrate shared libraries built in C. I'm ready to challenge someone to prove me wrong that Java NIO.2 is slower than C.
I recommend the Java programming language; it's not a scripting language, but it's probably the fastest programming language that can be used for programming web applications. I also recommend using a framework like Spring for a better programming experience (versus "raw" Java Servlet Programming).
The fastest scripting language is ASP, followed by PHP, but if you want applications that scale to unlimited speeds, use C++ or Java.
Google Search uses C++
Gmail uses Java
YouTube = Python
Twitter used to use Ruby; now they've shifted to Java
Facebook = PHP at the front end and some Java at the back end
But I recommend PHP at the front end and C++ at the back end
What are the biggest pros and cons of Apache Thrift vs Google's Protocol Buffers?
They both offer many of the same features; however, there are some differences:
Thrift supports 'exceptions'
Protocol Buffers have much better documentation/examples
Thrift has a builtin Set type
Protocol Buffers allow "extensions" - you can extend an external proto to add extra fields, while still allowing external code to operate on the values. There is no way to do this in Thrift
I find Protocol Buffers much easier to read
Basically, they are fairly equivalent (with Protocol Buffers slightly more efficient from what I have read).
Another important difference is the languages supported by default.
Protocol Buffers: Java, Android Java, C++, Python, Ruby, C#, Go, Objective-C, Node.js
Thrift: Java, C++, Python, Ruby, C#, Go, Objective-C, JavaScript, Node.js, Erlang, PHP, Perl, Haskell, Smalltalk, OCaml, Delphi, D, Haxe
Both could be extended to other platforms, but these are the language bindings available out of the box.
RPC is another key difference. Thrift generates code to implement RPC clients and servers, whereas Protocol Buffers seems mostly designed as a data-interchange format alone.
Protobuf serialized objects are about 30% smaller than Thrift.
Most actions you may want to do with protobuf objects (create, serialize, deserialize) are much slower than thrift unless you turn on option optimize_for = SPEED.
Thrift has richer data structures (Map, Set)
Protobuf API looks cleaner, though the generated classes are all packed as inner classes which is not so nice.
Thrift enums are not real Java Enums, i.e. they are just ints. Protobuf has real Java enums.
For a closer look at the differences, check out the source code diffs at this open source project.
As I've said in the "Thrift vs Protocol Buffers" topic:
Referring to the Thrift vs. Protobuf vs. JSON comparison:
Thrift supports out of the box AS3, C++, C#, D, Delphi, Go, Graphviz, Haxe, Haskell, Java, Javascript, Node.js, OCaml, Smalltalk, Typescript, Perl, PHP, Python, Ruby, ...
C++, Python, Java - in-box support in Protobuf
Protobuf support for other languages (including Lua, Matlab, Ruby, Perl, R, Php, OCaml, Mercury, Erlang, Go, D, Lisp) is available as Third Party Addons (btw. Here is SWI-Prolog support).
Protobuf has much better documentation and plenty of examples.
Thrift comes with a good tutorial
Protobuf objects are smaller
Protobuf is faster when using "optimize_for = SPEED" configuration
Thrift has an integrated RPC implementation, while for Protobuf RPC solutions are separate, but available (like ZeroC Ice).
Protobuf is released under BSD-style license
Thrift is released under Apache 2 license
Additionally, there are plenty of interesting additional tools available for those solutions which might help you decide. Here are examples for Protobuf: Protobuf-wireshark, protobufeditor.
Protocol Buffers seems to have a more compact representation, but that's only an impression I get from reading the Thrift whitepaper. In their own words:
We decided against some extreme storage optimizations (i.e. packing small integers into ASCII or using a 7-bit continuation format) for the sake of simplicity and clarity in the code. These alterations can easily be made if and when we encounter a performance-critical use case that demands them.
Also, it may just be my impression, but Protocol Buffers seems to have some thicker abstractions around struct versioning. Thrift does have some versioning support, but it takes a bit of effort to make it happen.
I was able to get better performance with a text-based protocol compared to protobuf in Python. However, it has no type checking or other fancy UTF-8 conversion, etc., which protobuf offers.
So, if serialization/deserialization is all you need, then you can probably use something else.
http://dhruvbird.blogspot.com/2010/05/protocol-buffers-vs-http.html
One obvious thing not yet mentioned, which can be either a pro or a con (and is the same for both), is that they are binary protocols. This allows for a more compact representation and possibly better performance (pros), but reduced readability (or rather, debuggability), a con.
Also, both have a bit less tool support than standard formats like XML (and maybe even JSON).
(EDIT) Here's an interesting comparison that tackles both size and performance differences, and includes numbers for some other formats (XML, JSON) as well.
I think most of these points have missed the basic fact that Thrift is an RPC framework, which happens to have the ability to serialize data using a variety of methods (binary, XML, etc).
Protocol Buffers are designed purely for serialization, it's not a framework like Thrift.
ProtocolBuffers is FASTER.
There is a nice benchmark here:
https://github.com/eishay/jvm-serializers/wiki (last updated 2016, but there are forks that contain faster serializers as of 2020, e.g. ActiveJ created a fork to demonstrate their speed on the JVM: https://github.com/activej/jvm-serializers).
You might also want to look into Avro, which can be faster. There are two libraries for Avro in .NET:
Apache.Avro
Chr.Avro - written by engineers at C.H. Robinson, a supply chain logistics company
By the way, the fastest I've ever seen is Cap'nProto;
A C# implementation can be found at the Github-repository of Marc Gravell.
And according to the wiki the Thrift runtime doesn't run on Windows.
For one, protobuf isn't a full RPC implementation. It requires something like gRPC to go with it.
gRPC is very slow compared to Thrift:
http://szelei.me/rpc-benchmark-part1/
I think the basic data structures are different.
Protocol Buffers uses variable-length integers (varint encoding), turning a fixed-length number into a variable-length representation to save space. A small sketch of the idea follows below.
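As a rough illustration of base-128 varints (not taken from either library's source), this tiny Python function encodes an unsigned integer the way the protobuf wire format does:

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint encoding of an unsigned integer, low groups first."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)   # set the high bit: more bytes follow
        else:
            out.append(group)
            return bytes(out)

print(encode_varint(300).hex())        # 'ac02', the classic two-byte example
```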
Thrift proposed different types of serialization formats (called "protocols").
In fact, Thrift has two different JSON encodings, and no less than three different binary encoding methods.
In conclusion, these two libraries are completely different. Thrift is like a one-stop shop, giving you an entire integrated RPC framework and many options (supporting cross-language use), while Protocol Buffers is more inclined to "just do one thing and do it well".
There are some excellent points here and I'm going to add another one in case someone's path crosses here.
Thrift gives you an option to choose between the thrift-binary and thrift-compact (de)serializers: thrift-binary has excellent performance but a bigger packet size, while thrift-compact gives you good compression but needs more processing power. This is handy because you can always switch between these two modes as easily as changing a line of code (heck, even make it configurable). So if you are not sure how much your application should be optimized for packet size or for processing power, Thrift can be an interesting choice.
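To show what "changing a line of code" means, here is a hedged sketch with the Apache Thrift Python runtime; the Ping struct stands in for whatever your .thrift IDL actually generates:

```python
from thrift.TSerialization import serialize, deserialize
from thrift.protocol import TBinaryProtocol, TCompactProtocol

factory = TBinaryProtocol.TBinaryProtocolFactory()
# Switching to the compact encoding is literally this one line:
# factory = TCompactProtocol.TCompactProtocolFactory()

msg = Ping(seq=1)                          # hypothetical generated struct
blob = serialize(msg, protocol_factory=factory)
copy = deserialize(Ping(), blob, protocol_factory=factory)
```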
PS: See this excellent benchmark project by thekvs, which compares many serializers including thrift-binary, thrift-compact, and protobuf: https://github.com/thekvs/cpp-serializers
PS: There is another serializer named YAS which gives this option too, but it is schema-less; see the link above.
It's also important to note that not all supported languages perform consistently with Thrift or Protobuf. At this point it's a matter of the module's implementation in addition to the underlying serialization. Take care to check benchmarks for whatever language you plan to use.