A Rack-supporting webserver that runs in JRuby and supports SSL and streaming

I'm having trouble finding a Rack-supporting webserver in Ruby that meets our requirements. What we've coded already uses Sinatra, so that's what we're aiming to run.
The webservice must:
Run in JRuby
Support SSL
Support streaming of files (being memory-efficient matters much more to us than raw speed)
Be multiplatform (Windows and Linux flavours)
Be as lightweight as possible (this ties back to memory efficiency)
We're currently using WEBrick, but it can't handle streaming, so we're looking for alternatives. I've been looking around myself, but I'm having real difficulty finding documentation about what the various Rack webservers can and can't do. The servers I've looked at are:
WEBrick - doesn't support streaming
Thin - depends on C code, so doesn't run in JRuby
Passenger - ditto Thin (C code)
Unicorn - ditto Thin (C code)
We're aware that it could be deployed in a J2EE container, but as that would require distributing the container along with it, we'd rather not go down that route if we can avoid it (it would be a heavier-weight solution).
Thanks in advance for any help people can give.

Does Puma meet your requirements? It supports JRuby, SSL, Windows + Linux, and advertises itself as lightweight. I'm afraid I haven't tried it out, nor do I know if it streams files.

For anyone who finds this question and wants to know what we used in the end: we went with Trinidad as it met all the requirements we needed.
It was quite a simple job to strip out the original WEBrick server we were using and replace it with Trinidad, while still using Sinatra. We then used JRuby to wrap it all up into a JAR and run it in the JVM as a self-contained web-service package.

Related

JRuby deployment options that support multiple versions of JRuby

I'm looking for a way to deploy multiple JRuby apps on a single server. The apps are at different stages and hence use different versions of JRuby, and in the long term it would be pretty complicated to try to keep all of the applications in sync with all of the application servers, so I'm looking for something akin to Phusion Passenger 4 in the Java world.
Apparently Passenger allows something like this, but there's no documentation available on how such a setup should work. TorqueBox doesn't mention this use case in its docs.
Bonus points for:
solutions that allow git pull deployment
solutions that are rvm friendly
solutions that are not Tomcat based
solutions that are clustering friendly
solutions that handle daemonization, routing, resource management and monitoring on their own
solutions that are mature and actively supported
So far everything I've looked at fails on some of these points: TorqueBox doesn't seem to support multiple JRuby versions, Trinidad is Tomcat-based, Puma requires a fair amount of hand-holding (process monitoring, reverse proxy, ...), etc.
Probably your best choice would be to do it the "Java way" using https://github.com/jruby/warbler
You'll end up with a .war that packs JRuby's jars into the archive, so each app will have its own version of JRuby. This of course requires you to set up a Java application server (such as Tomcat); the deployment process usually means copying the packaged .war into the server's deployment folder.
Be aware that this will likely require a lot of memory, since none of the libraries JRuby uses will be shared (also, with some servers you need to make sure the class loader looks at the war's jars first via a deployment configuration option).
In the end I opted for a reverse proxy + Puma + a process monitoring tool, but it feels like this should be simpler somehow, without three distinct pieces of software working together to make it happen.
The pros are that it's RVM-compatible, can support multiple Rubies through multiple Puma processes, and allows git pull deployments.
There's also jetpack as an alternative, but I haven't had a chance to play with it.

Twisted and Tornado difference when deploying?

I only have a little knowledge of Tornado, but as I understand it, when it comes to deployment it is better to use Nginx as a load balancer in front of a number of Tornado processes.
How about Twisted? Is it deployed the same way?
If I'm tracking your question right, you seem to be asking: "Should Tornado be front-ended with Nginx, and how about Twisted?"
If that's really where the question is going, then it's got an "it depends" answer, but perhaps not in the way you might expect. In a runtime sense, Twisted, Tornado and Nginx are, in more ways than not, the same thing.
All three of these systems use the same programming pattern at their core: something the OO people call the Reactor pattern, which is often also known as asynchronous I/O event programming, and which old-school Unix people would call a select-style event loop (done via select / epoll / kqueue / WaitForMultipleObjects / etc.).
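To make "select-style event loop" concrete, here is a toy sketch of the pattern in Python (illustration only; real reactors use epoll/kqueue, buffer their writes, and handle errors):

    import select, socket

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8080))
    server.listen(128)
    server.setblocking(False)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])  # block until something is ready
        for sock in readable:
            if sock is server:
                conn, _ = sock.accept()                   # new client connection
                conn.setblocking(False)
                sockets.append(conn)
            else:
                data = sock.recv(4096)
                if data:
                    sock.send(data)                       # echo back (a real reactor queues unwritten bytes)
                else:
                    sockets.remove(sock)                  # client closed the connection
                    sock.close()

One process, one loop, many connections: that single loop is what Twisted, Tornado and Nginx are all built around.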
To build up to the answer, some background:
Twisted is a Reactor-based framework that was created for writing Python-based asynchronous I/O projects in their most generic form. So while it's extremely good for web applications (via the Twisted Web sub-module), it's equally good for dealing with serial data (via the SerialPort sub-module) or for implementing a totally different networking protocol like SSH.
Twisted excels in flexibility. If you need to glue many I/O types together, and particularly if you want to connect them to the web, it is fantastic. As noted in remudada's answer, it also has an application tool built in (twistd).
As an async I/O framework, Twisted has very few weaknesses. As a web framework, though (while it is actively continuing to evolve), it feels decidedly behind the curve, particularly compared to plugin-rich frameworks like Flask, and Tornado definitely has many web conveniences over Twisted Web.
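For concreteness, a minimal Twisted Web service looks roughly like this (a sketch, not production code; the port number is arbitrary):

    from twisted.internet import reactor
    from twisted.web import server, resource

    class Hello(resource.Resource):
        isLeaf = True
        def render_GET(self, request):
            return b"Hello from Twisted Web\n"

    # The same reactor could simultaneously be driving serial ports, SSH, IRC, etc.
    reactor.listenTCP(8080, server.Site(Hello()))
    reactor.run()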
Tornado is a Reactor-based framework in Python that was created for serving webpages/webapps extremely fast (basically as fast as Python can be made to serve up webpages). The folks who wrote it were aiming for a system so fast that the production code itself could be Python.
The Tornado core is nearly a logical match for the core of Twisted. The cores of these projects are so similar that on modern releases you can run Twisted inside of Tornado or run a Tornado port inside of Twisted.
Tornado is single-mindedly focused on serving webpages/webapps fast. In most applications it will be 20%-ish faster than Twisted Web, but it is nowhere near as flexible at supporting other async I/O uses. It (like Twisted) is still Python-based, so if a given webapp does too much CPU-bound work, its performance will fall off quickly.
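The equivalent Tornado "hello world" is just as small, something like this (again a sketch; port number arbitrary):

    import tornado.ioloop
    import tornado.web

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("Hello from Tornado\n")

    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(8888)                            # one reactor, one process, one core
    tornado.ioloop.IOLoop.current().start()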
Nginx is a Reactor-based application, written in C, that was created for serving webpages and redirecting connections. While Twisted and Tornado use the Reactor pattern to make Python go fast, Nginx takes things to the next step and implements that logic in C.
When people compare Python and C they often talk about Python being 1.2 to 100 times slower. However, in the Reactor pattern (which, when done right, spends most of its time in the operating system), language inefficiency is minimized, as long as not too much logic happens outside of the reactor.
I don't have hard data to back this up, but my expectation is that you would find the simplest "Hello world" (i.e. serving static text) running no more than 50% slower on Tornado than on Nginx (with Twisted Web being 20% slower than Tornado on average).
Different speeds for the same thing, then. Where does that leave us?
You asked whether "it is better to use Nginx as a load balancer with a number of Tornado processes", so to answer that I need to ask you a question back.
Are you deploying in a way where it's critical that you take advantage of multiple cores?
In exchange for blazing async I/O speed, the Reactor pattern has a weakness:
Each Reactor can only take advantage of one processor core.
As you might guess, the flip side of that weakness is that the Reactor pattern uses that core extremely efficiently and, if you have the load, should be able to push that core to near 100% use.
This leads back to the type of design you're asking about, but the reason for all the background in this answer is that layering these services (Nginx in front of Tornado or Twisted) should only be done to take advantage of multi-core machines.
If you're running on a single-core system (a lowest-class cloud server or an embedded platform like a Raspberry Pi), you SHOULD NOT front-end a reactor. Doing so will simply slow the whole system down.
If you run more (heavily loaded) reactor services than you have CPU cores, you're also going to be working against yourself.
So:
If you're deploying on a single-core system:
Run one instance of either Tornado or Twisted (or Nginx alone if it's just static pages).
If you're trying to fully exploit multiple cores:
Run Nginx (or Twisted) on the application port, and run one instance of Tornado or Twisted for each remaining processor core (see the sketch below).
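As a sketch of the multi-core setup, one common approach is to start one Tornado process per remaining core, each bound to its own local port, and point an Nginx upstream block at 127.0.0.1:8001, 127.0.0.1:8002, and so on (ports and handler here are placeholders):

    import sys
    import tornado.ioloop
    import tornado.web

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("hello from port %s\n" % sys.argv[1])

    # Run as:  python app.py 8001   (then 8002, 8003, ... one per spare core)
    port = int(sys.argv[1])
    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(port, address="127.0.0.1")       # only Nginx talks to these directly
    tornado.ioloop.IOLoop.current().start()

Each process gets its own reactor and therefore its own core, while Nginx (itself a reactor) handles the public port and spreads connections across them.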
Twisted is a good enough application server on its own, and I would rather use it as it is (unlike Tornado).
If you look at the official guide http://twistedmatrix.com/documents/11.1.0/core/howto/application.html
you can see how it is set up. Of course you can use uWSGI / Nginx / Emperor with Twisted since it can be run as a standard application, but I would suggest doing that only when you really need the scaling and load balancing.
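As a rough illustration of what that guide describes, a minimal .tac file for twistd might look like this (a sketch; the port and directory are placeholders), run with twistd -y myapp.tac (add -n to stay in the foreground):

    # myapp.tac
    from twisted.application import internet, service
    from twisted.web import server, static

    application = service.Application("mysite")          # twistd looks for this variable
    site = server.Site(static.File("/srv/www/htdocs"))   # serve a static directory
    internet.TCPServer(8080, site).setServiceParent(application)

twistd then takes care of daemonization, logging and PID files for you.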

Load-testing an XMPP server

I am looking for a tool capable of generating multiple XMPP connections to load-test an XMPP server over a secure connection, especially STARTTLS.
For XMPP plain-text authentication I have used jab_simul (following this tutorial) and Tsung, both with success.
But I was unable to use the tools above for STARTTLS; I peeked into the code of both tools and tried different configurations of them.
Another option I am pondering is using an XMPP library like exmpp and making a specific load-testing tool myself, instead of altering jab_simul (C software with comments in a language I do not understand) or altering Tsung (an all-purpose load-testing tool, so lots of places where you can go wrong).
Short story: I am looking for a tool, or advice, for stress-testing/load-testing an XMPP server.
We are facing exactly the same challenge right now. After deep consideration we found that only specially built software can deliver the load we want to test. (Remember, you can configure ejabberd to do something very specific :-)
For that we developed a small library called xmpp_talker https://github.com/burinov/xmpp_talker (Apache licence), which is a kind of XMPP client built as a gen_server. I find it a very nice starting point for building any kind of load-simulation software. There is also an echo_worker example included, so you have a good base to start from. At the moment xmpp_talker is suited for exmpp 0.9.7. As far as I know, version 1.0.0 (or 0.9.9?) will be out in a few days, with many bug fixes (trust me, you don't want to know about them). On Monday I will release xmpp_talker for exmpp 0.9.8 with proper service-interruption handling.
In case you decide to go the same way, xmpp_talker could be useful for you.
Added: here is also a great article related to the topic: https://support.process-one.net/doc/display/EXMPP/Scalable+XMPP+bots+with+erlang+and+exmpp
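If you do end up rolling your own generator, the per-connection STARTTLS negotiation itself is small. Here is a rough illustrative sketch (Python is used purely for illustration; host, port and domain are placeholders) of the handshake each simulated client has to perform before SASL authentication:

    import socket, ssl

    HOST, PORT, DOMAIN = "xmpp.example.com", 5222, "example.com"

    sock = socket.create_connection((HOST, PORT))
    sock.sendall(("<stream:stream to='%s' xmlns='jabber:client' "
                  "xmlns:stream='http://etherx.jabber.org/streams' "
                  "version='1.0'>" % DOMAIN).encode())
    sock.recv(4096)   # server stream header + features advertising <starttls/> (simplified: no XML parsing)

    sock.sendall(b"<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>")
    sock.recv(4096)   # expect <proceed/>

    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # test servers often run self-signed certs
    ctx.verify_mode = ssl.CERT_NONE
    tls = ctx.wrap_socket(sock, server_hostname=HOST)
    # The XML stream now restarts over TLS; SASL auth and the actual load scenario follow.

A load generator is essentially many of these connections opened concurrently, which is why a purpose-built tool (exmpp, xmpp_talker, etc.) is attractive.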
There's also a recently started XMPP benchmarking project called xmppench, which aims to be a high-performance benchmarking tool simulating some reasonable use cases of XMPP servers. It's written in C++, based on Swiften and Boost.

Can I Replace Apache with Node.js?

I have a website running on CentOS using the usual suspects (Apache, MySQL, and PHP). Since the time this website was originally launched, it has evolved quite a bit, and now I'd like to do fancier things with it, namely real-time notifications. From what I've read, Apache handles this poorly. I'm wondering if I can replace just Apache with Node.js (so instead of "LAMP" it would be "LNMP").
I've tried searching online for a solution, but haven't found one. If I'm correctly interpreting the things that I've read, it seems that most people are saying that Node.js can replace both Apache and PHP together. I have a lot of existing PHP code, though, so I'd prefer to keep it.
In case it's not already obvious, I'm pretty confused and could use some enlightenment. Thanks very much!
If you're prepared to re-write your PHP in JavaScript, then yes, Node.js can replace your Apache.
If you place an Apache or Nginx instance running in reverse-proxy mode between your servers and your clients, you could handle some requests in JavaScript on Node.js and some requests in your Apache-hosted PHP, until you can completely replace all your PHP with JavaScript code. This might be the happy medium: do your WebSockets work in Node.js, and the more mundane work in Apache + PHP.
Node.js may be faster than Apache thanks to its evented/non-blocking architecture, but you may have problems finding modules/libraries which substitute for some of Apache's functionality.
Node.js itself is a lightweight, low-level framework which lets you build server-side code and the real-time parts of your web applications relatively quickly, but Apache offers much broader configuration options and "classical" web-server-oriented features.
I would say that unless you want to replace PHP with a Node.js-based web application framework like Express.js, you should stay with Apache (or think about migrating to Nginx if you have performance problems).
I believe Node.js is the future of web serving, but if you have a lot of existing PHP code, Apache/MySQL are your best bet. Apache can be configured to proxy requests to Node.js, or Node.js can proxy requests to Apache, but I believe some performance is lost in both cases, especially in the first one. It's not a big deal if you aren't running a very high-traffic website, though.
I just registered on Stack Overflow and I can't comment on the accepted answer yet, but today I created a simple Node.js script that actually uses sendfile() to serve files over the HTTP protocol. (The existing example that the accepted answer links to only uses the bare TCP protocol to send the file, and I could not find an example for HTTP, so I wrote it myself.)
So I thought someone might find this useful. Serving files through the sendfile() OS call is not necessarily faster than when data is copied through "user land", but it ends up using the CPU and RAM less, and is thus able to handle a larger number of connections than the classic way.
The link: https://gist.github.com/1350901
There's a previous SO post describing exactly what I'm saying (PHP + socket.io + Node).
I think you could put up a Node server on somehost:8000 with socket.io, slap the socket.io client code into script tags, and with minimal work get your existing app rocking with socket.io (realtime, baby) without a ton of work.
While Node can be your only backend server, remember that Node likes to live up to its name and become a node. I checked out a talk a while back that Ryan Dahl gave to a PHP users' group, and he mentioned that the name Node relates to a vision of several node processes doing work and talking to each other.
It's LAMP versus MEAN nowadays. For a direct comparison, see http://tamas.io/what-is-the-mean-stack.
Of course M, E and A are somewhat variable. For example, the more recent Koa may replace (E)xpress.
However, just replacing Apache with Node.js is probably not the right way to modernize your web stack.

Web programming: Apache modules: mod_python vs mod_php

I've been using PHP with Apache (a.k.a. mod_php) for my web development work for more than 12 years. I've recently discovered Python and its real power (I still don't understand why it is not always the best product that becomes the most famous).
I've just discovered mod_python for Apache. I've already googled things like "mod_python vs mod_php", but without success. I want to know the differences between mod_php and mod_python in terms of:
speed
productivity
maintenance
(I know Python is the most productive and maintainable language in the world, but is it the same for web programming with Apache?)
availability of features, e.g. cookies and session handling, databases, protocols, etc.
My understanding is that PHP was designed with the Internet/Web in mind, but Python is more general-purpose.
Now most people are leaving mod_python for mod_wsgi, which is more robust and flexible.
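For what it's worth, a mod_wsgi deployment boils down to a WSGI callable plus a couple of Apache directives (a WSGIScriptAlias pointing at the file below, for example); here is a minimal sketch of the Python side, with hypothetical paths:

    # /var/www/app/app.wsgi  --  referenced by:  WSGIScriptAlias / /var/www/app/app.wsgi
    def application(environ, start_response):
        # environ carries the request data (path, query string, headers, cookies),
        # roughly the role PHP's superglobals play.
        body = b"Hello from mod_wsgi\n"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]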
To answer the other questions:
speed: Python is faster (PHP is slower than both Ruby and Python)
productivity: at least the same as PHP, with numerous libraries
maintenance: Python is clear and neat
features: more than you need, I would say.
Python was not popular on the web because it wasn't focused on the web at all. It has too many web frameworks (more frameworks than programming languages), so the community has not been as strong as Ruby on Rails's.
I wanted to know the differences between the two mod_php and mod_python...
PHP is more widely available on Internet hosts than Python.
I've noticed on one of my Python websites that if I'm the first user to hit Python on that Internet host, the start-up time of the Python services can be measured in minutes. Most people won't wait minutes for a web page to appear.
Python has the same web features (cookies, session handling, database connections, protocols) as PHP.