WAMP / Apache - polling the server continuously

I've heard polling the server is not the best of ideas.
Let's say I make a client-server application.
A simple game for example.
Where each client polls the server every half a minute.
How many clients is it possible to have before it overloads a WAMP server?
Basically, how robust is Apache for this kind of stuff?
Getting a request, aggregating data from the MySQL server, and then returning the data in XML format.

This is a really open-ended question. It entirely depends on your configuration: how many Apache services you have running, how many physical servers you have, and how your MySQL server is set up (is it on its own machine?). You also want to keep in mind that by polling the server you have to initiate a connection each time and allocate resources for that communication (both in the lower-level networking and in your program).
If at all possible it might be better for the server to push content to the client (assuming a push happens less frequently than polls happen).

My guess is you will max out whatever is handling the request and the MySQL database before the Apache server itself becomes a problem.
However, going out on a limb: with proper caching in place, a sufficiently smart architecture and pragmatic design, and effectively only the 30-second polling, you should be able to support a couple of thousand users.
But: mock it up, quick and dirty, doing roughly what you think you will be doing, and hit it with JMeter (or similar) with 3, 10, 30, 100, 300, 1000, 3000, ... users until you find the performance wall.
I am scared about the aggregating-data part... If aggregation is really needed, carefully architect it so you do not need to go to the DB on every request, because this will kill you in production, and you won't find it in development.
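To make the "do not go to the DB on every poll" point concrete, here is a minimal sketch of the idea (in Python for brevity; the same pattern applies in PHP with APCu or memcached). The aggregation function, the payload shape and the 30-second TTL are placeholders, not anything from the question:

```python
import time

CACHE_TTL = 30.0  # seconds -- matches the polling interval

_cache = {"expires": 0.0, "payload": None}

def aggregate_from_db():
    # Placeholder for the expensive part: in the real app this would run the
    # MySQL aggregation query and build the XML/JSON response body.
    return {"scores": [], "generated_at": time.time()}

def get_poll_response():
    """Serve every poll from memory; touch the database at most once per TTL."""
    now = time.monotonic()
    if _cache["payload"] is None or now >= _cache["expires"]:
        _cache["payload"] = aggregate_from_db()   # the only path that hits MySQL
        _cache["expires"] = now + CACHE_TTL
    return _cache["payload"]
```

With something like this in place, a thousand clients polling every 30 seconds still produce only a couple of aggregation queries per minute.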

Can't really give you numbers. It depends on a bunch of factors (including hardware).
What's more important to you: How many concurrent users you can support or how real-time the results are?
If you're looking for real-time results, you probably want to investigate something like Comet or long polling.
If you're looking for supporting a lot of users, the long polling approach probably isn't ideal and you'll probably want something more lightweight than Apache. Personally, I'm a fan of nginx.
EDIT: And if you're feeling really hip, your best bet for real-time results here is WebSockets, but if you're a Microsoft guy this isn't going to do you much good, as IE doesn't support them.
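For illustration only, a rough sketch of the client side of long polling (shown in Python with the third-party requests library; the endpoint, parameters and timeout are made up):

```python
import time
import requests  # third-party: pip install requests

POLL_URL = "http://example.com/updates"  # placeholder endpoint
LONG_POLL_TIMEOUT = 60                   # server holds the request open well beyond 30s

def long_poll_forever(handle_events):
    last_id = None
    while True:
        try:
            # The server blocks until it has news (or its own timeout fires),
            # instead of the client re-asking every 30 seconds.
            resp = requests.get(POLL_URL, params={"since": last_id},
                                timeout=LONG_POLL_TIMEOUT)
            resp.raise_for_status()
            data = resp.json()
            if data.get("events"):
                handle_events(data["events"])
                last_id = data.get("last_id", last_id)
        except requests.RequestException:
            time.sleep(1)  # network hiccup or empty timeout: back off briefly, reconnect
```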

How can we integrate two Rails applications deployed within an intranet?

Are RESTful services the only route for integrating any application with a Rails application (including other Rails applications), irrespective of whether it is on the same network or not?
For integrating two applications, how heavy is a RESTful service compared to the RMI-based integration available in other technologies like Java EE?
Is there a way to integrate two Rails applications using a natively understood binary format, which could avoid transformation to a different format (e.g. an HTTP request)?
The REST approach means simply that application A will make requests of application B (and potentially the other way around) using the HTTP protocol. The data sent can be in whatever format you like, although JSON is the default today (and XML was the default yesterday, and even... SOAP -- gag!).
These days, the vast majority of external APIs are implemented this way -- Amazon, Google Maps, Yelp, etc, etc, etc. Why? Because the HTTP (or HTTPS) protocol is well understood and widely deployed. No special configuration is required and the same protocol that serves the application to regular people on web browsers works for other applications. Rails makes this brilliantly easy (if you go with the flow).
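As a rough illustration of how lightweight that kind of integration is, here is a hedged sketch (in Python; a Rails caller would do the same with Net::HTTP or Faraday). The base URL and endpoint path are invented for the example:

```python
import requests  # third-party: pip install requests

# Hypothetical endpoint exposed by application B; in Rails this is just a
# controller action that renders JSON.
B_BASE_URL = "http://app-b.intranet.example"

def fetch_order(order_id):
    """Application A asking application B for a resource over plain HTTP."""
    resp = requests.get(
        f"{B_BASE_URL}/api/v1/orders/{order_id}",
        headers={"Accept": "application/json"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()  # a plain dict -- no shared classes, no tight binding
```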
Java's RMI is a specific protocol (just as HTTP is). The advantage is that objects defined in A are available as instances in B (after a great deal of work in both). This really makes sense when you have a set of applications all designed up front to work together and whose main requirement is to be distributed across locations, servers, etc. RMI creates a tight binding between applications -- a change in one typically requires a change in the other. It's right for some kinds of applications.
But if you have, for example, two departments in a company who talk to each other, but don't want to be "bound at the hip", a REST interface provides a great deal of flexibility.
Your second question ("how heavy") is very difficult to answer. A company I worked for in 2001 had hundreds of servers all running an instance of a "worker" process -- they were all designed to queue their results to a "controller" process which would process the output and forward to another set servers designed to process and manage the data. In 2001, this was the right architecture because it was completely designed to work together -- persistent socket connections on a single subnet of our intranet running on a room full of servers. Now in 2012, that room full of servers is replaced by a few high-powered processors running 64-bit OS and addressing massive amounts of memory -- it's a whole new world. A doubling of performance in 2001 could save potentially millions of dollars of hardware, operational support, space and so on. In 2012, the most expensive thing is good developers! So "heavy" is really kind of irrelevant in all but the most compute-intensive operations these days. An HTTP request is light and simple.
Final question: natively understood binary format. Sure, if needed. In the end, any binary format that is sent over the wire between two servers needs to be serialized and de-serialized as a stream, and this is work, both for programmers and for machines. JSON is a text format, but one natively understood by JavaScript (JavaScript Object Notation), and it has the distinct advantage of being human-readable. Given that most servers are set up to compress output automatically, whether something is text or binary becomes less relevant, at least as far as I/O and payload go. Of course, you can come up with any mutually understood format and send it over HTTP, but again, this is something that mattered a decade ago and today is usually not an issue worth considering. Processors have been getting faster and faster, and memory cheaper (and bigger) -- so (as always) I/O (whether network or disk) is the typical bottleneck in modern applications.
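To illustrate the trade-off, a small sketch comparing a JSON payload with a hand-rolled binary layout for the same record; the field names and types are made up:

```python
import json
import struct

record = {"user_id": 123456, "score": 9001, "level": 7}

# Text: self-describing, human-readable, natively understood by JavaScript.
as_json = json.dumps(record).encode("utf-8")

# Binary: three unsigned 32-bit ints in a layout both sides must agree on in advance.
as_binary = struct.pack("!III", record["user_id"], record["score"], record["level"])

print(len(as_json), len(as_binary))   # the binary form is smaller...
# ...but decoding it needs out-of-band knowledge of the field order and types:
user_id, score, level = struct.unpack("!III", as_binary)
decoded = json.loads(as_json)         # whereas JSON carries its own field names
```

The binary form is smaller and cheaper to parse, but both sides must agree on the layout ahead of time, which is exactly the tight coupling discussed above.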
If I were to re-design the application I mentioned from 2001, where hundreds of (today's) servers needed to communicate with (many) peer servers very specifically designed to interoperate, I might work to make sure that the serialize/deserialize process was as lightweight as possible (but only if it turned out to be a bottleneck). For me, being bound to any given platform or language is a non-starter -- the computing world is moving way too fast.
But in almost all realistic business applications today, keeping things simple, standard, and straightforward has both present and future benefits that make the need to worry obsessively about performance a thing of the past.
Hope this helps :-)

What is the best way of pulling json data in terms of performance?

Currently I am using HttpWebRequest to pull json data from an external site, and the performance was not good. Is wcf much better?
I need expert advice on this.
Probably not, but that's not the right question.
To answer it: WCF, which certainly supports JSON, is ultimately going to use HttpWebRequest at the bottom level, and it will certainly have the same network latency. Even more importantly, it will use the same server to get the JSON. WCF has a lot of advantages in building, maintaining, and configuring web services and clients, but it's not magically faster. It's possible that your method of deserializing JSON is really slow compared to what WCF would use by default, but I doubt it.
And that brings up the really important point: find out why the performance is bad. Changing frameworks is only an intelligent optimization option if you know what's slow, and, by extension, how doing something different would make it less slow. Is it the server? Is it deserialization? Is it the network? Is it authentication or some other request overhead detail? And so on.
So the real answer is: profile! Once you know what the performance issue really is, you can make an informed decision about whether a framework like WCF would help.
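As a sketch of the kind of breakdown worth measuring (shown in Python for brevity; in .NET you would wrap the HttpWebRequest call and the deserialization step with Stopwatch in the same way), with a placeholder URL:

```python
import json
import time
import urllib.request

URL = "https://example.com/data.json"  # placeholder for the external site

t0 = time.perf_counter()
with urllib.request.urlopen(URL, timeout=30) as resp:
    t_headers = time.perf_counter()    # connection made, response headers received
    body = resp.read()                 # time spent pulling the body off the network
t_downloaded = time.perf_counter()

data = json.loads(body)                # time spent deserializing the JSON
t_parsed = time.perf_counter()

print(f"connect+headers: {t_headers - t0:.3f}s")
print(f"download:        {t_downloaded - t_headers:.3f}s")
print(f"deserialize:     {t_parsed - t_downloaded:.3f}s")
```

If almost all the time is in the first two numbers, the remote site and the network are your problem, and no client framework will fix that.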
The short answer is: no.
The longer answer is that WCF is an API which doesn't specify a communication method, but supports multiple methods. However, those methods normally run over SOAP, which involves more overhead than JSON, and it would seem the world has decided to move on from SOAP.
What sort of performance are you looking for and what are you getting? It may be that you are simply facing physical limitations of network locations, in which case you might look towards making your interface feel more responsive, even if the data is sluggish.
It'd be worth it to see if most of the latency is just in reaching the remote site (e.g. response times are comparable to ping times). Or, perhaps, the problem is the time it takes for the remote site to generate and serve the page. If so, some intermediate caching might be best.
+1 on what Isaac said, but one thing I'd add is, if you do use WCF here, it'll internally use the HttpWebRequest in most places, so you're definitely not gaining performance at all. One way you may unintentionally gain in performance -- however -- is in how WCF recycles, reuses, pools, and caches most transport objects internally. So it ultimately goes back to Isaac's advice on profiling.

Why use AMQP/ZeroMQ/RabbitMQ

as opposed to writing your own library.
We're working on a project here that will be a self-dividing server pool: if one section grows too heavy, the manager will divide it and put it on another machine as a separate process. It will also alert all affected connected clients to connect to the new server.
I am curious about using ZeroMQ for inter-server and inter-process communication. My partner would prefer to roll his own. I'm looking to the community to answer this question.
I'm a fairly novice programmer myself and just learned about message queues. As I've googled and read, it seems everyone is using message queues for all sorts of things, but why? What makes them better than writing your own library? Why are they so common, and why are there so many?
what makes them better than writing your own library?
When rolling out the first version of your app, probably nothing: your needs are well defined and you will develop a messaging system that will fit your needs: small feature list, small source code etc.
Those tools are very useful after the first release, when you actually have to extend your application and add more features to it.
Let me give you a few use cases:
your app will have to talk to a big-endian machine (SPARC/PowerPC) from a little-endian machine (x86, Intel/AMD). Your messaging system made some endianness assumptions: go and fix it
you designed your app around a text rather than a binary protocol/messaging system, and now it is very slow because you spend most of your time parsing messages (the number of messages increased and parsing became a bottleneck): adapt it so it can transport a binary/fixed encoding
at the beginning you had 3 machines inside a LAN, no noticeable delays, and everything gets to every machine. Your client/boss/pointy-haired-devil-boss shows up and tells you that you will install the app on a WAN you do not manage - and then you start having connection failures and bad latency; you need to store messages and retry sending them later on: go back to the code and plug this stuff in (and enjoy)
messages sent need to have replies, but not all of them: you send some parameters in and expect a spreadsheet as a result, rather than just sends and acknowledgements: go back to the code and plug this stuff in (and enjoy)
some messages are critical and their reception/sending needs proper backup/persistence. Why, you ask? Auditing purposes
And many other use cases that I forgot ...
You can implement it yourself, but do not spend much time doing so: you will probably replace it later on anyway.
That's very much like asking: why use a database when you can write your own?
The answer is that using a tool that has been around for a while and is well understood in lots of different use cases, pays off more and more over time and as your requirements evolve. This is especially true if more than one developer is involved in a project. Do you want to become support staff for a queueing system if you change to a new project? Using a tool prevents that from happening. It becomes someone else's problem.
Case in point: persistence. Writing a tool to store one message on disk is easy. Writing a persistor that scales and performs well and stably, in many different use cases, and is manageable, and cheap to support, is hard. If you want to see someone complaining about how hard it is then look at this: http://www.lshift.net/blog/2009/12/07/rabbitmq-at-the-skills-matter-functional-programming-exchange
Anyway, I hope this helps. By all means write your own tool. Many many people have done so. Whatever solves your problem, is good.
I'm considering using ZeroMQ myself - hence I stumbled across this question.
Let's assume for the moment that you have the ability to implement a message queuing system that meets all of your requirements. Why would you adopt ZeroMQ (or other third party library) over the roll-your-own approach? Simple - cost.
Let's assume for a moment that ZeroMQ already meets all of your requirements. All that needs to be done is integrating it into your build, read some doco and then start using it. That's got to be far less effort than rolling your own. Plus, the maintenance burden has been shifted to another company. Since ZeroMQ is free, it's like you've just grown your development team to include (part of) the ZeroMQ team.
If you ran a Software Development business, then I think that you would balance the cost/risk of using third party libraries against rolling your own, and in this case, using ZeroMQ would win hands down.
Perhaps you (or rather, your partner) suffer, as so many developers do, from the "Not Invented Here" syndrome? If so, adjust your attitude and reassess the use of ZeroMQ. Personally, I much prefer the benefits of a Proudly Found Elsewhere attitude. I'm hoping I can be proud of finding ZeroMQ... time will tell.
EDIT: I came across this video from the ZeroMQ developers that talks about why you should use ZeroMQ.
what makes them better than writing your own library?
Message queuing systems are transactional, which is conceptually easy to use as a client, but hard to get right as an implementor, especially considering persistent queues. You might think you can get away with writing a quick messaging library, but without transactions and persistence, you'd not have the full benefits of a messaging system.
Persistence in this context means that the messaging middleware keeps unhandled messages in permanent storage (on disk) in case the server goes down; after a restart, the messages can be handled and no retransmit is necessary (the sender does not even know there was a problem). Transactional means that you can read messages from different queues and write messages to different queues in a transactional manner, meaning that either all reads and writes succeed or (if one or more fail) none succeeds. This is not really much different from the transactionality known from interfacing with databases and has the same benefits (it simplifies error handling; without transactions, you would have to assure that each individual read/write succeeds, and if one or more fail, you have to roll back those changes that did succeed).
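To make that a little more concrete, here is a hedged sketch of a transactional, persistent publish using RabbitMQ via the pika Python client; the queue names and message bodies are invented for the example:

```python
import pika  # third-party RabbitMQ client: pip install pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# durable=True: the queue definitions survive a broker restart.
channel.queue_declare(queue="billing", durable=True)
channel.queue_declare(queue="audit", durable=True)

# Put the channel into transaction mode: the publishes below take effect
# only when tx_commit() succeeds, or not at all.
channel.tx_select()
try:
    persistent = pika.BasicProperties(delivery_mode=2)  # write messages to disk
    channel.basic_publish(exchange="", routing_key="billing",
                          body=b"charge customer 42", properties=persistent)
    channel.basic_publish(exchange="", routing_key="audit",
                          body=b"charge attempted for 42", properties=persistent)
    channel.tx_commit()      # both messages become visible together
except Exception:
    channel.tx_rollback()    # neither message is delivered
    raise
finally:
    conn.close()
```

Getting that behaviour (atomicity plus messages surviving a broker restart) right in a home-grown library is exactly the hard part being described.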
Before writing your own library, read the 0MQ Guide here: http://zguide.zeromq.org/page:all
Chances are that you will either decide to install RabbitMQ, or else you will make your library on top of ZeroMQ since they have already done all the hard parts.
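For a sense of how little code is left to write once ZeroMQ handles the hard parts, here is a minimal REQ/REP sketch with pyzmq, roughly in the style of the Guide's opening examples; the port and message contents are placeholders:

```python
import zmq  # pyzmq: pip install pyzmq

def manager():
    """Runs in the manager/server process."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:5555")          # placeholder port
    while True:
        request = sock.recv()          # blocks until a client sends something
        sock.send(b"ack: " + request)  # REP must answer before the next recv

def worker():
    """Runs in a separate client process."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("tcp://localhost:5555")
    sock.send(b"section 7 is overloaded")
    return sock.recv()                 # REQ waits for the matching reply
```

manager() and worker() would run in separate processes (or on separate machines); ZeroMQ quietly handles connection setup, reconnection and message framing for you.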
If you have a little time, give it a try and roll out your own implementation! The lessons of this exercise will convince you of the wisdom of using an already-tested library.

PyAMF backend choices!

I've been using PyAMF to write a backend for a Flex app that will request different groups of hundreds of different images depending on what the client needs. I have been using the "simple_server" WSGI server that PyAMF supplies while developing the Flex code. Now I'm ready to write a robust backend that will be able to pull images from a MySQL database and send them as fast and as efficiently as possible to many concurrent clients.
The PyAMF documentation is great because they supply many examples to follow, however I am confused about what kind of backend I am trying to create.
Do I want a SocketServer or a WSGI server or something like Twisted or web2py or Tornado? Are these even all different? :) Should I be using Apache modules instead (mod_wsgi or modjy or mod_python)?
I realize that this probably touches on many open debates, so maybe you could just point me to any good summaries of these debates?
It's great to have so many options, but how do I choose?
The short answer is, of course, that it depends on the requirements of your project.
How many concurrent connections is "a lot"?
How much programmer time can you throw at the problem?
How much hardware can you throw at the problem?
...etc...
If you plan to have lots of concurrent clients, it's hard to beat Twisted in the Python world. However, you'll have to deal with your database asynchronously to avoid blocking, and depending on how complex your database interactions are, this can be a bit of a pain. You're basically limited to either using twisted.enterprise.adbapi or coming up with your own twisted-ORM integration.
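To show what the adbapi route looks like in practice, here is a hedged sketch; the driver name, credentials, table and query are placeholders for whatever your MySQL setup actually uses:

```python
from twisted.enterprise import adbapi
from twisted.internet import reactor

# A pool of DB connections running in threads; queries stay off the reactor
# thread and results come back as Deferreds. Driver name, credentials and the
# query below are placeholders.
dbpool = adbapi.ConnectionPool("MySQLdb", db="images", user="app", passwd="secret")

def fetch_image(image_id):
    # Returns a Deferred that fires with the result rows.
    return dbpool.runQuery("SELECT data FROM images WHERE id = %s", (image_id,))

def on_rows(rows):
    print("got %d row(s)" % len(rows))
    reactor.stop()

def on_error(failure):
    print(failure)
    reactor.stop()

fetch_image(42).addCallbacks(on_rows, on_error)
reactor.run()
```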
If you'd rather have "easy" database code (i.e. you want to use an ORM), you're better off going with a (TurboGears/Pylons/plain wsgi) project, probably hosted using Apache and mod_wsgi. This can be a pretty scalable solution, and you get a lot of stuff for free using these frameworks, but it may be more than you need.
I would avoid using one of the many plain python wsgi servers out there (wsgiref, paster, etc.) in production if you really want high performance.
Good Luck!

Website Hardware Scaling

So I was listening to the latest Stackoverflow podcast (episode 19), and Jeff and Joel talked a bit about scaling server hardware as a website grows. From what Joel was saying, the first few steps are pretty standard:
One server running both the webserver and the database (the current Stackoverflow setup)
One webserver and one database server
Two load-balanced webservers and one database server
They didn't talk much about what comes next though. Do you add more webservers? Another database server? Replicate this three-machine cluster in a different datacenter for redundancy? Where does a web startup go from here in the hardware department?
A reasonable setup supporting an "average" web application might evolve as follows:
Single combined application/database server
Separate database on a different machine
Second application server with DNS round-robin (poor man's load balancing) or, e.g. Perlbal
Second, replicated database server for read load; this requires some application logic changes so that eligible database reads go to a slave (sketched just below)
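A minimal sketch of that read-routing change, with made-up DSNs; a real application would usually hide this inside its data-access layer or ORM configuration:

```python
import random

# Placeholder connection strings -- in a real app these would be DB-API
# connections or ORM engines pointing at the master and its replicas.
MASTER_DSN = "mysql://master.internal/app"
REPLICA_DSNS = ["mysql://replica1.internal/app", "mysql://replica2.internal/app"]

def dsn_for(sql, needs_fresh_data=False):
    """Route eligible reads to a replica; writes (and read-your-own-writes
    cases) always go to the master, since replicas may lag slightly."""
    is_read = sql.lstrip().lower().startswith("select")
    if is_read and not needs_fresh_data:
        return random.choice(REPLICA_DSNS)
    return MASTER_DSN
```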
At this point, evaluating the current state of affairs would help to determine a better scaling path. For example, if read load is high and content doesn't change too often, it might be better to emphasise caching and introduce dedicated front-end caches, e.g. Squid to avoid un-needed database reads, although you will need to consider how to maintain cache coherency, typically in the application.
On the other hand, if content changes reasonably often, then you will probably prefer a more spread-out solution; introduce a few more application servers and database slaves to help mitigate the effects, and use object caching, such as memcached to avoid hitting the database for the less volatile content.
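The memcached approach is essentially cache-aside; here is a small sketch using the python-memcached client, with an invented key scheme and an arbitrary five-minute expiry:

```python
import memcache  # python-memcached: pip install python-memcached

mc = memcache.Client(["127.0.0.1:11211"])

def get_article(article_id, load_from_db):
    """Classic cache-aside: only hit the database on a cache miss."""
    key = "article:%d" % article_id
    article = mc.get(key)
    if article is None:
        article = load_from_db(article_id)  # the expensive path
        mc.set(key, article, time=300)      # cache for 5 minutes
    return article
```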
For most sites, this is probably enough, although if you do become a global phenomenon, then you'll probably want to start considering having hardware in regional data centres, and using tricks such as geographic load balancing to direct visitors to the closest "cluster". By that point, you'll probably be in a position to hire engineers who can really fine-tune things.
Probably the most valuable scaling advice I can think of would be to avoid worrying about it all far too soon; concentrate on developing a service people are going to want to use, and making the application reasonably robust. Some easy early optimisations are to make sure your database design is fairly solid, and that indexes are set up so you're not doing anything painfully crazy; also, make sure the application emits cache-control headers that direct browsers on how to cache the data. Doing this sort of work early on in the design can yield benefits later, especially when you don't have to rework the entire thing to deal with cache coherency issues.
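For example, a tiny WSGI sketch that emits such a header (the five-minute max-age is an arbitrary choice; most frameworks expose the same thing through their own response APIs):

```python
from wsgiref.simple_server import make_server

def app(environ, start_response):
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Tell browsers and any front-end cache (e.g. Squid) they may reuse
        # this response for five minutes without asking again.
        ("Cache-Control", "public, max-age=300"),
    ]
    start_response("200 OK", headers)
    return [b"<html><body>hello</body></html>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```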
The second most valuable piece of advice I want to put across is that you shouldn't assume what works for some other web site will work for you; check your logs, run some analysis on your traffic and profile your application - see where your bottlenecks are and resolve them.
Plenty of Fish architecture
Some interesting videos:
YouTube scalability
Interview with Dan Farino, System Architect at MySpace
Joel mentioned adding a second datacenter, with the same setup, and then assigning your users randomly to each. Changes to the data are logged and sent from one location to the other, so that both locations contain all the data.
The talk "Scalable Web Architectures: Common Patterns & Approaches" from Cal Henderson (Yahoo) at the Web 2.0 Expo was quite interesting. I thought there was a video, but I could not find it. Here are the slides:
http://www.slideshare.net/techdude/scalable-web-architectures-common-patterns-and-approaches
A certain next step would be a cluster of web servers (a web farm) and a clustered system of database servers (replication, Oracle RAC, etc.).
If you're interested in caching and using .NET, look into the Caching Application Block in the Enterprise Library (of course, use this along with the other points above).