Interfacing Twisted to other applications

I have decided to use Twisted for a project and have developed a server that can push data to clients on other computers. At the moment I am using dummy data for testing speed requirements but I now need to interface Twisted to my other Python DAQ application which basically collects real-time data (500 Hz) from various external devices over different transports (e.g. Bluetooth). (note: the DAQ (data acquisition) application is on the same computer as the Twisted server)
Since the DAQ application is not part of the Twisted framework, I am wondering what is the most efficient (fastest, most robust, lowest-latency) way to pass the data to the Twisted server. I have considered using a lightweight database, memcache, a Queue, or even Twisted plugins, but it is hard to tell which would be the most appropriate and best fit. I should add that the DAQ application was developed before I decided on Twisted, so I have so far treated it as separate from the Twisted network.
On the other side of the system, the client side, which resides on multiple computers, I have a similar problem. As the data streams in (I am sending lines of data, about 100 bytes each) I want to hand it off to another application, which will process it for a web application (I would prefer to use a Twisted web service for this, but that is not my choice!). The web application is being written in Java. Once again I have considered the choices above, but since I am new to Twisted I am not sure which is the best approach. (note: the web application is on the same computers as the Twisted clients)
Any advice or thoughts would be greatly appreciated.

My suggestion would be to build a simple protocol with Twisted's built-in support for AMP; you can hook this into any other languages or frameworks using one of the implementations of AMP in other languages. AMP is designed to be as easy as possible to implement, as it's just a socket with some length-prefixed strings arranged into key/value pairs.
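For concreteness, here is a minimal sketch of what the receiving (Twisted) end could look like; the AddSample command, its single 'line' field and the port number are assumptions for illustration, not anything from your existing code:

# Minimal AMP endpoint sketch: one hypothetical command carrying a ~100-byte
# line of DAQ data as a length-prefixed key/value pair.
from twisted.internet import reactor
from twisted.internet.protocol import Factory
from twisted.protocols import amp

class AddSample(amp.Command):
    arguments = [(b'line', amp.String())]   # one key/value pair per field
    response = [(b'ok', amp.Boolean())]

class DaqReceiver(amp.AMP):
    @AddSample.responder
    def add_sample(self, line):
        # Hand the sample off to whatever consumes it inside the server.
        print('got sample:', line)
        return {'ok': True}

if __name__ == '__main__':
    factory = Factory()
    factory.protocol = DaqReceiver
    reactor.listenTCP(8750, factory)   # port number is arbitrary
    reactor.run()

The DAQ process (and, on the client machines, the Java web application) can then talk to this with an AMP implementation in its own language, or even a hand-rolled socket writer, since the wire format is just length-prefixed key/value pairs.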

There's obviously a zillion different ways you could go about this, but I would first look at using a queue to pass the data to your Twisted server. If you deploy one of the many open-source queueing tools (e.g. RabbitMQ, ZeroMQ, OpenMQ, and loads of others), you should be able to write from your DAQ application using something generic like HTTP, then read into your Twisted server, also using HTTP. If you don't like HTTP, there are plenty of alternative transports to choose from - just identify which you want to use, then use that as a basis for selecting your queueing tool.
This would give you an extremely flexible solution, in that you could upgrade or change any of these products with minimal impact to anything else in the whole solution.
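As a rough sketch of that first option, the DAQ process could push each sample into a local RabbitMQ queue (shown here with the pika client rather than HTTP, just to keep it short; the queue name is made up), and the Twisted server would then consume from the same queue with an AMQP client such as txAMQP:

import pika

# DAQ side: publish each ~100-byte line to a local broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='daq-samples', durable=True)

def publish_sample(line):
    # Called at 500 Hz by the acquisition loop.
    channel.basic_publish(exchange='', routing_key='daq-samples', body=line)

publish_sample(b'sensor1,1.23,4.56')
connection.close()

The broker then decouples the two processes: if the Twisted server restarts, samples simply accumulate in the queue instead of being lost.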

How can we integrate two Rails applications deployed within an intranet?

Are RESTful services the only route for integrating any application with a Rails application, including other Rails applications, irrespective of whether they are on the same network?
For integrating two applications, how heavy is a RESTful service compared to the RMI-based integration available in other technologies like Java EE?
Is there a way to integrate two Rails applications using a natively understood binary format, avoiding transformation to a different format such as an HTTP request?
The REST approach simply means that application A will make requests of application B (and potentially the other way around) using the HTTP protocol. The data sent can be in whatever format you like, although JSON is the default today (and XML was the default yesterday, and even ... SOAP -- gaq!).
These days, the vast majority of external APIs are implemented this way -- Amazon, Google Maps, Yelp, etc, etc, etc. Why? Because the HTTP (or HTTPS) protocol is well understood and widely deployed. No special configuration is required and the same protocol that serves the application to regular people on web browsers works for other applications. Rails makes this brilliantly easy (if you go with the flow).
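To make the idea concrete, here is a tiny sketch of such a call, shown in Python only to keep all the examples in one language; the URL and payload are invented:

import json
import urllib.request

# Application A updates a resource exposed by application B over plain HTTP.
payload = json.dumps({'order_id': 42, 'status': 'shipped'}).encode('utf-8')
req = urllib.request.Request(
    'http://app-b.internal/api/orders/42',
    data=payload,
    headers={'Content-Type': 'application/json'},
    method='PUT',
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read())   # B replies with JSON (or whatever it likes)

The equivalent call from Ruby is a few lines with Net::HTTP, which is part of why the approach travels so well between stacks.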
Java's RMI is a specific protocol (just as HTTP is). The advantage is that objects defined in A are available as instances in B (after a great deal of work in both). This really makes sense when you have a set of applications all designed up front to work together and whose main requirement is to be distributed across locations, servers, etc. RMI creates a tight binding between applications -- a change in one typically requires a change in the other. It's right for some kinds of applications.
But if you have, for example, two departments in a company who talk to each other, but don't want to be "bound at the hip", a REST interface provides a great deal of flexibility.
Your second question ("how heavy") is very difficult to answer. A company I worked for in 2001 had hundreds of servers all running an instance of a "worker" process -- they were all designed to queue their results to a "controller" process, which would process the output and forward it to another set of servers designed to process and manage the data. In 2001, this was the right architecture because it was completely designed to work together -- persistent socket connections on a single subnet of our intranet, running on a room full of servers. Now in 2012, that room full of servers is replaced by a few high-powered processors running a 64-bit OS and addressing massive amounts of memory -- it's a whole new world. A doubling of performance in 2001 could potentially save millions of dollars in hardware, operational support, space and so on. In 2012, the most expensive thing is good developers! So "heavy" is really kind of irrelevant in all but the most compute-intensive operations these days. An HTTP request is light and simple.
Final question: a natively understood binary format. Sure, if needed. In the end, any binary format that is sent over the wire between two servers needs to be serialized and de-serialized as a stream, and this is work, both for programmers and for machines. JSON is a text format, but one natively understood by JavaScript (JavaScript Object Notation), and it has the distinct advantage of being human-readable. Given that most servers are set up to compress output automatically, whether something is text or binary becomes less relevant, at least as far as I/O and payload go. Of course you can come up with any mutually understood format and send it over HTTP, but again, this is something that mattered a decade ago and today is usually not an issue worth considering. Processors have been getting faster and faster, and memory cheaper (and bigger) -- so (as always) I/O (whether network or disk) is the typical bottleneck in modern applications.
If I were to re-design the application I mentioned from 2001, where hundreds of (today's) servers needed to communicate with (many) peer servers very specifically designed to interoperate, I might work to make sure that the serialize/deserialize process was as lightweight as possible (but only if it turned out to be a bottleneck). For me, being bound to any given platform or language is a non-starter -- the computing world is moving way too fast.
But in almost all realistic business applications today, keeping things simple, standard, and straightforward has both present and future benefits that make the need to worry obsessively about performance a thing of the past.
Hope this helps :-)

MPI vs. Microsoft WCF vs. Microsoft TPL

I have a scientific program written in F# which I want to parallelize and run on 1 server with multiple processors (64) and for the future also in the cloud (Windows Azure?). The program will have a simple 1-1 communication between the nodes (no broadcast etc.).
If I used WCF, would it be as fast as MPI? What does MPI have that WCF does not? There is also Pure MPI .NET, written on top of WCF, which puzzles me even more. I do not know whether to use WCF, MPI.NET, or Pure MPI .NET running on WCF.
PS: I guess that TPL is out of the game for 64 processors and more, right?
It is difficult to give a concrete answer, because it all depends on the specific aspects of your application, its current architecture (I suppose you already have some app) etc.
As you mention MPI and WCF, I assume that the application is written as several components that communicate with each other. The best way to structure this kind of application is to use F# agents.
As far as I understand, you want to run the application on a single server first. If you write it using agents, the agents can just communicate directly with each other (so you don't need MPI or WCF).
TPL should work well on a single server (with lots of CPUs), but it will not scale to the distributed setting - you cannot run a Task on another machine. However, you can use it inside individual components (e.g. agents) that will be distributed.
Regarding MPI vs. WCF - I don't have enough experience to answer that. However, if you use an agent-based architecture, it should be easy to try various options. You may also check out fracture and related projects, which aim to implement high-performance sockets for F# (and possibly distributed agents in the future).
If you're doing it on one server, you could just run a single process and execute the code in parallel. That way you can share memory more easily and faster than passing messages around as MPI and WCF do, although the communication overhead might not amount to much, depending on your problem and solution.
The changes to your code would also be much smaller that way; F# can usually be turned into parallel code with little effort, whereas going to MPI/WCF would require you to rewrite large portions.
Googling for F# + parallel gives plenty of useful info that you should read first, like this for a good start:
http://blogs.msdn.com/b/dsyme/archive/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-i-o-computations.aspx
So on one server I would use the parallel features of F#; it's designed to parallelize easily.
Later, when you want to go to the cloud, that means turning the application into client-server, which is a different problem from parallelization. I would treat and solve them separately.
On MPI vs. WCF: WCF is designed as an RPC technology, i.e. you call remote procedures and get answers. If you want to use it for parallel programming with separate processes, you would have to create the boilerplate code for that yourself (keeping track of subscribed clients, etc.).
MPI was designed to run that kind of architecture and handles it much more easily (the first process gets rank 0 and is the master, the others are slaves and get numbered incrementally, etc.).
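To illustrate that rank-based model, here is a minimal master/worker sketch; it uses Python's mpi4py purely as a stand-in (MPI.NET exposes the same rank and send/receive concepts to .NET languages), and the payloads are made up:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Rank 0 is the master: hand one work item to each worker...
    for worker in range(1, size):
        comm.send({'task': worker}, dest=worker, tag=1)
    # ...then collect and combine the results.
    results = [comm.recv(source=worker, tag=2) for worker in range(1, size)]
    print('combined result:', sum(results))
else:
    # Every other rank is a worker: receive, compute, send back.
    task = comm.recv(source=0, tag=1)
    comm.send(task['task'] * 2, dest=0, tag=2)

Launched with mpiexec, every process runs the same program, and the rank alone decides who plays master and who plays worker.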
However, I don't think MPI will be very good for going to the cloud, since that involves HTTP, protocols, security, etc. I'm not sure how well MPI works for that kind of thing; WCF will handle it very well indeed.
The reason there is an MPI.NET for WCF is that MPI represents a style of parallelizing code that a lot of people are familiar with, so you can take those programming concepts and use them on the .NET platform, leveraging WCF for the communication.
Something else you might want to look into if you need to exchange a lot of data over the wire is Protocol Buffers (see protobuf-net, for instance). They can easily be combined with WCF for communication and are very lean at serializing structured data, so you can send it over the wire efficiently.
Gert-Jan
WCF and MPI are different concepts. WCF is like person A asking person B to do something, whereas MPI is like person A creating clones of himself (all clones have the same ability/logic) which then work on specific parts of the problem to be solved and, once done, combine their results.
So choosing which one fits your specific application depends on the problem your application is trying to solve. It may even be a combination of both WCF and MPI: your client application asks the WCF service to do some task, the WCF service creates clones of the "problem solver" using MPI, and when the clones are done solving the problem (in parallel) they return the aggregated result to the WCF service, which then sends it back to the client application.
You might also want to take a look at the 'mbrace' product, which provides a cloud monad (http://blogs.msdn.com/b/dsyme/archive/2011/08/23/m-brace-f-in-the-cloud.aspx). It's still at a fairly early stage, though. I'm no expert, but it may be that you can run an mbrace-based solution as effectively a private cloud on your 64-processor setup. When you outgrow that, a move to Azure would be seamless.

Use erlang as/instead of expect script

I would like to reset passwords on a bunch of boxes over SSH. Any pointers on how Erlang could be used for this purpose?
Erlang is indeed a well-suited choice for this problem.
You should have a look at the ssh module. Start a connection with
ssh:connect(Host, Port, Options).
Then use the ssh_connection module to execute the right passwd command (hint: start a shell first) and log out.
Edit: The above is mostly wrong, this blog post might get you started faster.
You can even write a simple server that does all of these things on several hosts in parallel, resulting in the most multicore-capable multi-host ssh password changer on this very planet. Weekend project idea: make a web app out of it.
Simply don't use Erlang for such a thing.
Reading from here:
What sort of applications is Erlang particularly suitable for?
Distributed, reliable, soft real-time concurrent systems:
Telecommunication systems, e.g. controlling a switch or converting protocols.
Servers for Internet applications, e.g. a mail transfer agent, an IMAP-4 server, an HTTP server or a WAP Stack.
Telecommunication applications, e.g. handling mobility in a mobile network or providing unified messaging.
Database applications which require soft realtime behaviour.
Erlang is good at solving these sorts of problems because this is the problem domain it was originally designed for. Stating the above in terms of features:
Erlang provides a simple and powerful model for error containment and fault tolerance (supervised processes).
Concurrency and message passing are fundamental to the language. Applications written in Erlang are often composed of hundreds or thousands of lightweight processes. Context switching between Erlang processes is typically one or two orders of magnitude cheaper than switching between threads in a C program.
Writing applications which are made of parts which execute on different machines (i.e. distributed applications) is easy. Erlang's distribution mechanisms are transparent: programs need not be aware that they are distributed.
The OTP libraries provide support for many common problems in networking and telecommunications systems.
The Erlang runtime environment (a virtual machine, much like the Java virtual machine) means that code compiled on one architecture runs anywhere. The runtime system also allows code in a running system to be updated without interrupting the program.
What sort of problems is Erlang not particularly suitable for?
People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but there are some common situations where Erlang is not likely to be the language of choice.
The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
Another class of problem is characterised by a wide interface to existing C code. A typical example is implementing operating system device drivers.
Most (all?) large systems developed using Erlang make heavy use of C for low-level code, leaving Erlang to manage the parts which tend to be complex in other languages, like controlling systems spread across several machines and implementing complex protocol logic.
As suggested by Andrzej, you should look into other directions. Maybe a different question on StackOverflow asking "which language would be good for..." could be the first step...
UPDATE
If you still intend to use Erlang to reset your passwords, you might want to have a look at the Erlang SSH Channel Behaviour as well.
Reading from the doc:
SSH services are implemented as channels that are multiplexed over an SSH connection and communicate via the SSH connection protocol. This module provides a callback API that takes care of generic channel aspects such as flow control and close messages, and lets the callback functions take care of the service-specific parts.

PyAMF backend choices!

I've been using PyAMF to write a backend for a Flex app that will request different groups of hundreds of different images depending on what the client needs. I have been using the "simple_server" WSGI server that PyAMF supplies while developing the Flex code. Now I'm ready to write a robust backend that will be able to pull images from a MySQL database and send them as quickly and efficiently as possible to many concurrent clients.
The PyAMF documentation is great because they supply many examples to follow, however I am confused about what kind of backend I am trying to create.
Do I want a SocketServer or a WSGI server or something like Twisted or web2py or Tornado? Are these even all different? :) Should I be using Apache modules instead (mod_wsgi or modjy or mod_python)?
I realize that this probably touches on many open debates, so maybe you could just point me to any good summaries of these debates?
It's great to have so many options, but how do I choose?
The short answer is, of course, that it depends on the requirements of your project.
How many concurrent connections is "a lot"?
How much programmer time can you throw at the problem?
How much hardware can you throw at the problem?
...etc...
If you plan to have lots of concurrent clients, it's hard to beat Twisted in the Python world. However, you'll have to deal with your database asynchronously to avoid blocking, and depending on how complex your database interactions are, this can be a bit of a pain. You're basically limited to either using twisted.enterprise.adbapi or coming up with your own twisted-ORM integration.
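For illustration, a minimal sketch of the adbapi route might look like this; the driver name, credentials and query are placeholders:

from twisted.enterprise import adbapi
from twisted.internet import reactor

# ConnectionPool runs blocking DB-API calls in a thread pool and hands back Deferreds.
dbpool = adbapi.ConnectionPool('MySQLdb', db='images', user='app', passwd='secret')

def fetch_image(image_id):
    return dbpool.runQuery('SELECT data FROM images WHERE id = %s', (image_id,))

def done(rows):
    print('got %d row(s)' % len(rows))
    reactor.stop()

fetch_image(1).addCallback(done).addErrback(lambda f: (f.printTraceback(), reactor.stop()))
reactor.run()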
If you'd rather have "easy" database code (i.e. you want to use an ORM), you're better off going with a (TurboGears/Pylons/plain wsgi) project, probably hosted using Apache and mod_wsgi. This can be a pretty scalable solution, and you get a lot of stuff for free using these frameworks, but it may be more than you need.
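For reference, the WSGI callable that mod_wsgi hosts is tiny; a framework would normally generate it for you, but a bare-bones sketch looks like this:

# The smallest possible WSGI application; mod_wsgi looks for a callable
# named 'application' in the configured script by default.
def application(environ, start_response):
    body = b'hello from the image backend'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]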
I would avoid using one of the many plain python wsgi servers out there (wsgiref, paster, etc.) in production if you really want high performance.
Good Luck!

Has anybody compared WCF and ZeroC ICE?

ZeroC's ICE (www.zeroc.com) looks interesting, and I would like to evaluate it and compare it to our existing software that uses WCF. In particular, our WCF app uses server callbacks (via HTTP).
Has anybody compared them? How did it go? I'm particularly interested in the performance aspect, since interoperability isn't much of a concern for us right now. Thanks!
I did a very terse review of ICE a few years ago, and although I haven't compared them directly before, having reasonable knowledge of WCF my thoughts might have some relevance.
Firstly, it's not entirely fair to compare WCF with ICE, as ICE is a specific remote communication mechanism while WCF is a higher-level remote communications framework.
While WCF is often thought of as implementing SOAP web services, and that is indeed its main use to date, it can also be used for implementing remote services using all manner of encodings and transport channels, which means it can theoretically be used for performant comms between applications.
In comparison, ICE is a cross-platform remote communication mechanism that uses binary encoding for performant communications between applications. It's something of a simplified evolution of CORBA and is more directly comparable to CORBA, DCOM, .NET Remoting, and JNI.
However, even though there's no direct correspondence between ICE and WCF, if you need your .NET app to communicate remotely then they're both contenders. Some of the decision points you might want to consider include:
Resourcing. It'll be easier to find developers with WCF experience than ICE experience.
Performance. If you want performance then ICE performs fast, but WCF can also be used in a performant configuration. Alternatively, .NET Remoting can provide very good performance, and whatever the MS-sponsored benchmarks say I've seen it outperform WCF by 10%.
Cross-platform. If you need to communicate with non-Windows applications then you're limited in the WCF options you can use. In addition, since every SOAP stack seems to implement the standards differently, it can be a pain creating truly generic Web Services (though WS-I helps).
If you don't need every ounce of performance from day one, then I'd personally plump for WCF to start with, and then consider ICE if performance ever becomes critical. Even then it might be cheaper to scale out your service boxes than to move to ICE, and if you don't have any exotic cross-platform needs then you could always look at reconfiguring WCF for binary encoding, etc.
Michi Henning from ZeroC has recently published a white paper on just this topic -- "Choosing Middleware: Why Performance and Scalability do (and do not) Matter". It compares Ice, WCF (binary & SOAP), and RMI with various performance metrics, platforms, languages, etc. There's more information on Michi's blog, but the white paper is also quite readable, with all the standard caveats of any benchmark.
Disclaimer: I've used Ice and RMI extensively, but never WCF.
Apache Thrift is another contender to ICE and WCF. It was developed and open-sourced by Facebook. Apache Thrift is nice in some ways because it's not only extremely efficient on the encoding side, it also supports adding fields to structures without breaking all of the clients (something we found extremely useful for our projects).
Google Protocol Buffers does not really seem to be a contender, as it doesn't mention .NET support on its home page. However, some community add-ons support C#. In addition, ICE provides emulation for Google Protocol Buffers if you're working with existing services.
Data point: we just converted a callback multi-platform and multi-language project from Ice to Thrift with pretty good results. Ice does a lot for you, so we had to implement disconnection listeners, connection events, etc. ourselves. And in one case we got bit in the proverbial with a big object lock that Ice was letting us get away with -- this caused a deadlock in the Thrift server but it was easily fixed by less lazy coding on the C# side.
I've just finished benchmarking, and in our application anything that pushes large amounts of data is faster than, or on par with, Ice. Shorter messages with more overhead (i.e., a "heartbeat" that updates a status over the protocol) are a bit slower.
The most important bit was that in order to implement the callback service correctly we had to extend Thrift interfaces and define our own protocol, along with a Thrift "Processor" and callback client-server. But I freely admit our application is /very/ special. The existing protocols and servers should be sufficient. But extending them, even to use multiplexed sockets from .NET, was not terribly difficult.
We are using ICE to integrate modules written in C++, Java and C#. The nice thing is that our server can access components on remote machines as well, so if we need more performance we can shift processing to different machines.
I've used both WCF and ICE, and I'd say that ICE is cleaner on the implementation side. ICE also has very detailed and readable documentation.
ICE supports some things that WCF cannot do, including load balancing, automated remote client updates, etc.