I am new to web development, so my question might be silly.
I want to set up an Apache cluster. I have four physical machines and I want to distribute the HTTP request load between them. Every machine currently has Fedora installed.
For now it can be a simple load-balancing cluster without any recovery techniques (i.e. no failover if one of the servers has a hardware fault). And of course I need open-source software that is free for commercial use.
Any suggestions on what software/tutorials/books I should look at to learn how to set up an environment like this?
The O'Reilly book "Server Load Balancing" by Tony Bourke (who now runs lbdigest.com) is the classic. Unfortunately it's a little dated. Maybe Tony will consider an update.
If you really want "no recovery techniques", basic DNS round-robin might work for you, but it's pretty crude. There's an open source project called Ultramonkey, but I haven't had a chance to mess with it. A lot of development has gone into commercial solutions which offer load balancing, high availability, etc., including the Coyote Point product (which I helped write). The appliance-based products are actually quite affordable today.
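If you want to keep it entirely in Apache, one common pattern is to make one of the four Fedora boxes the front end running mod_proxy_balancer and point it at the others. A minimal sketch, assuming the stock Fedora httpd (which normally ships with the proxy modules enabled) and placeholder hostnames:

# On the front-end box; web2-web4 are placeholders for your other machines.
cat > /etc/httpd/conf.d/balancer.conf <<'EOF'
<Proxy balancer://webfarm>
    BalancerMember http://web2.example.com:80
    BalancerMember http://web3.example.com:80
    BalancerMember http://web4.example.com:80
    ProxySet lbmethod=byrequests
</Proxy>
ProxyPass        / balancer://webfarm/
ProxyPassReverse / balancer://webfarm/
EOF
systemctl restart httpd

The front end is then a single point of failure, which matches your "no recovery techniques" requirement but is worth keeping in mind.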
We want to start working with Liferay, but the server is too heavy and the developers' computers don't have enough RAM, so we want to centralize the server instance.
In other words, we want to build a development server that all developers can connect to and use to develop directly in their web browser, compile, view the result and push the code to a Git repository.
I found a good cloud IDE (Eclipse Che) and a good Maven archetype for Liferay projects, so I can build the project with Maven. Now I want to know whether it is possible to configure Liferay so that every developer can work without disturbing the others, and if so, how.
The developers can share the same database and can use different ports. Maybe the server can generate temporary URLs, like some online cloud editors do.
I found this post, Liferay With Multiple Server Instances, but I don't think it's the best approach because it creates one server per project, which I think is too heavy.
If necessary, we have Kubernetes in our IS.
Liferay's Tomcat bundle, by default, is configured to give the process a maximum of 2.5 GB, but it can run with far less - the default was only bumped up recently, because many people never change it and then wonder why production systems run out of memory. For one concurrent user (the sole developer) on a machine, I'd guess that the previous default of 1 GB of heap space is enough. Are you saying that's too much for your developers' machines?
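If heap size is the concern, it is easy to dial back for a single-developer bundle; a sketch, assuming a recent bundle layout (the exact file location and default flags vary between Liferay versions, so adjust rather than copy):

# in the bundle, edit tomcat-<version>/bin/setenv.sh and lower the heap flags, e.g.
CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx1024m"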
Having many developers on a shared server poses one problem: yes, you may deploy different code from different machines, but how about setting a breakpoint? Can you connect with multiple debuggers? If something fails, how do you know whose recent deployment caused the failure?
Sharing a server is an integration technique, not a development technique. If your developers don't have enough memory to run their own Liferay server next to their IDE, it's a lot cheaper to upgrade their machines than to slow them down while everybody accesses the same server and nobody can debug properly. You pay for the memory once, but you pay your waiting developers by the hour.
Is it possible to share one server? Sure it is.
Is it possible to share one server without troubling each other? I doubt it.
When you say you think it's too heavy: what are you basing that assumption on? What do the actual developer machines look like, and what keeps you from investing in the extra memory?
It's trivial to share some infrastructure - e.g. have all of them connect to the same database server (and give everyone their own schema). But even that extra effort and setup might cost you as much in developer hours as you'd otherwise pay for a couple of memory modules.
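The shared-database part is just a schema and a user per developer; a sketch assuming MySQL and placeholder names (the portal-ext.properties keys are the standard Liferay JDBC properties, but verify them against your version):

mysql -u root -p -e "CREATE DATABASE lportal_alice CHARACTER SET utf8mb4;"
mysql -u root -p -e "CREATE USER 'alice'@'%' IDENTIFIED BY 'changeme';"
mysql -u root -p -e "GRANT ALL PRIVILEGES ON lportal_alice.* TO 'alice'@'%';"
# then each developer points their own instance at their schema in portal-ext.properties:
#   jdbc.default.url=jdbc:mysql://dbhost/lportal_alice
#   jdbc.default.username=alice
#   jdbc.default.password=changeme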
And yet another option: run Liferay on a remote server, but keep one instance per developer. This way you don't need the local memory, but can have the memory in the cloud. Calculate whether you pay more for remote cloud machines than for local memory - that decision is up to you.
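Since you mention Kubernetes is available, one-instance-per-developer could be as simple as a throwaway deployment each; a sketch with kubectl, where the image name, sizes and ports are assumptions to check against what your cluster actually provides:

kubectl create deployment liferay-alice --image=liferay/portal
kubectl set resources deployment/liferay-alice --limits=memory=2Gi
kubectl expose deployment liferay-alice --port=8080
# each developer tunnels to their own instance only
kubectl port-forward deployment/liferay-alice 8080:8080

Each developer then gets their own URL (or port-forward) and their own JVM to debug, which sidesteps the shared-breakpoint problem above.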
Is it possible to have virtual machines in the cloud, install Visual Studio there, and have developers use the 'cloud' for their day-to-day programming work? Would the cost be too high? Would the speed be too slow?
Where can I find statistics or numbers to convince people?
I like using remote virtual machines to run development servers, but I don't like using my IDE on a remote server. The latency is noticeable. If you're without an internet connection you can't work. My happy compromise is to have a dev server available (EC2) and sync it with my laptop via git.
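For reference, that git sync is only a handful of commands; a sketch where the host, user and paths are placeholders:

# one-time on the EC2 box: a bare repo to push to
ssh ec2-user@dev.example.com 'git init --bare ~/repos/myapp.git'
# one-time on the laptop: add the dev server as a remote and push
git remote add devserver ec2-user@dev.example.com:repos/myapp.git
git push devserver main
# one-time on the EC2 box: a working copy the dev server actually runs
ssh ec2-user@dev.example.com 'git clone ~/repos/myapp.git ~/myapp'
# day to day: push from the laptop, refresh the server's checkout
git push devserver main
ssh ec2-user@dev.example.com 'cd ~/myapp && git pull'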
It is completely possible to do this. Using a service like Rackspace, you can set up a fairly powerful Windows server for as little as $60 a month:
http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing
In my experience, using Remote Desktop to log into a Rackspace Windows Cloud Server has been snappy and quick (though of course a lot of that depends on the strength of your internet connection). The process of standing up the server is lightning fast, backing it up is even easier, and it can easily be resized down the line if you need more storage/bandwidth.
These days I don't understand why a small- to mid-sized organization would actually waste capital on server hardware.
Evan
Practical uses of virtualization in software development are about as diverse as the techniques to achieve it.
Whether running your favorite editor in a virtual machine, or using a system of containers to host various services, which use cases have proven worth the effort and boosted your productivity, and which ones were a waste of time?
I'll edit my question to provide a summary of the answers given here.
Also it'd be interesting to read about the virtualization paradigms employed, as they have gotten quite numerous over the years.
Edit: I'd be particularly interested in hearing about how people virtualize "services" required during development, over the more obvious system virtualization scenarios mentioned so far, hence the title edit.
Summary of answers:
Development Environment
Allows encapsulation of a particular technology stack, particularly useful for build systems
Testing
Easy switching of OS-specific contexts
Easy mocking of networked workstations in an n-tier application context
We deploy our application into virtual instances at our host (Amazon EC2). It's amazing how easy that makes it to manage our test, QA and production environments.
Version upgrade? Just fire up a few new virtual servers, install the software to be tested/QA'd/used in production, verify the deployment went well, and throw away the old instances.
Need more capacity? Fire up new virtual servers and deploy the software.
Peak usage over? Just dispose of no-longer-needed virtual servers.
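That fire-up/throw-away cycle is a couple of CLI calls today; a rough sketch with the AWS CLI, where the AMI ID, instance type and instance IDs are placeholders:

aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.small --count 2
# ...deploy, verify, repoint DNS or the load balancer at the new instances...
aws ec2 terminate-instances --instance-ids i-0aaaa1111bbbb2222 i-0cccc3333dddd4444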
Virtualization is used mainly for various server uses where I work:
Web servers - If we create a new non-production environment, the servers for it tend to be virtual ones so there is a virtual dev server, virtual test server, etc.
Version control and QA applications - Quality Center and SVN are run on virtual servers. The SVN box also runs CC.Net for our CI here.
There may be other uses but those seem to be the big ones at the moment.
We're testing the way our application behaves on a new machine after every development iteration, by installing it onto multiple Windows virtual machines and testing the functionality. This way, we can avoid re-installing the operating system and we're able to test more often.
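The revert-and-retest loop is easy to script if the VMs live in something like VirtualBox; a sketch where the VM and snapshot names are placeholders:

# roll the test VM back to a clean Windows install and boot it headless
VBoxManage snapshot "win-test" restore "clean-install"
VBoxManage startvm "win-test" --type headless
# ...install the new build, run the functional tests...
VBoxManage controlvm "win-test" poweroff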
We needed to test the setup of a collaborative network application in which data produced on some of the nodes was shared amongst cooperating nodes on the network in a setup with ~30 machines, which was logistically (and otherwise) prohibitive to deploy and set up. The test runs could be long, up to 48 hours in some cases. It was also tedious to deploy changes based on the results of our tests because we'd have to go around to each workstation and make the appropriate changes, which was a manual and error-prone process involving several tired developers.
One approach we used with some success was to deploy stripped-down virtual machines containing the software to be tested to various people's PCs and run the software in a simulated data-production/sharing mode on those PCs as a background task in the virtual machine. They could continue working on their day-to-day tasks (which largely consisted of producing documentation, writing email, and/or surfing the web, as near as I could tell) while we could make more productive use of the spare CPU cycles without "harming" their PC configuration. Deployment (and re-deployment) of the software was simplified, since we could essentially just update one image and re-use it on all the PCs. This wasn't the entirety of our testing, but it did make that particular aspect a lot easier.
We put the development environments for older versions of the software in virtual machines. This is particularly useful for Delphi development, as not only do we use different units, but different versions of components. Using the VMs makes managing this much easier, and we can be sure that any updated exes or dlls we issue for older versions of our system are built against the right stuff. We don't waste time changing our compiler setups to point at the right shares, or de-installing and re-installing components. That's good for productivity.
It also means we don't have to keep an old dev machine set up and hanging around just-in-case. Dev machines can be re-purposed as test machines, and it's no longer a disaster if a critical old dev machine expires in a cloud of bits.
We used to have a dedicated server (1&1) and very infrequently ran into problems with it.
Recently, we migrated to a VPS (Wiredtree.com) with similar specs to our old dedicated server, but we notice frequent problems with running out of memory, MySQL having to restart, etc. - both when knowingly running intensive scripts and also just randomly during normal use.
Because of this, we're considering migrating to another VPS - this time at Slicehost - to see if it performs better.
My question is twofold...
Are there straightforward ways we could stress test a VPS at Slicehost to see if the same issues occur, without having to actually migrate everything over?
Also, is it possible that the issues we're facing aren't just down to the provider (Wiredtree) but to the difference between a dedicated box and a VPS (despite similar specs)?
The best way to stress test an environment is to put it under load. If this VPS is hosting a web application, use one of the many available web server benchmark tools: ab, httperf, Siege or http_load. You don't necessarily care that much about the statistics from the tool itself, but more that it puts a predictable load on the server so that you can tune Apache to handle it, or at least not crash and burn.
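Getting a predictable load going is usually a one-liner per tool; for example, with a placeholder URL and concurrency figures sized to roughly match your real traffic:

ab -n 10000 -c 50 http://test.example.com/
siege -c 50 -t2M http://test.example.com/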
The one problem you have with testing against Slicehost is that you are at the mercy of the Internet and your bandwidth to Slicehost. You may not be able to put enough load on the server to reach a meaningful conclusion.
Instead, you might find it just as valuable to run one of the many virtualization products on the market and set up a VM with comparable specs to the VPS plan you're considering. Local testing over your LAN will allow you to put a higher and more predictable load on the server.
In either case, you don't need to migrate everything, but you will need to set up an environment for your application to run in, with representative data in your database.
A VPS with similar specs to a dedicated server should perform approximately the same, but in order to get good performance, you still need to tune Apache, MySQL and any other long-lived server processes. In my experience, the out-of-the-box configuration of Apache in many Linux distributions is not ideal and will allow far too many child processes, overcommitting memory and sending the server into a swap-death spiral.
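As a concrete illustration of that tuning, capping the prefork MPM is usually the first step; a sketch with illustrative numbers (derive MaxClients from available RAM divided by the measured size of one Apache child process, and note that the config path and directive names vary by distribution and Apache version):

cat > /etc/httpd/conf.d/tuning.conf <<'EOF'
<IfModule prefork.c>
    StartServers         4
    MinSpareServers      4
    MaxSpareServers      8
    MaxClients          40
    MaxRequestsPerChild 1000
</IfModule>
EOF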
Simplified: I have an application where data is intended to flow over the internet between two servers. Ideally, I'd like to test at what point the software ceases to function - at what lower bound (bandwidth, latency, dropped packets) things stop working - to test the reliability of the software.
What I thought I would do was the following:
Set up 3 machines (VMware instances)
Install the 2 applications on two of the servers.
Set up the 3rd server to sit between the two machines by doing some sort of magic with Routing and Remote Access on Windows 2003
Install either Traffic Shaper XP or NetLimiter to limit the bandwidth
Run something like TMnetSim Network Simulator to simulate a bad connection.
Does this sound like a good idea or are there easier/better ways of doing this? I'm not that comfortable on Linux and my team mates are even less so.
WANem does exactly this. We have used it both in a virtual machine on the desktop and on a dedicated old PC, and it worked great. It can simulate all sorts of broken connectivity.
FreeBSD's ipfw has provisions to simulate links with a given bandwidth, latency or error rate. You could use that FreeBSD machine as your machine "in the middle" in your above setup.
You probably can also run at least one of the endpoints on the same machine if you want to reduce the amount of servers involved.
Someone actually packaged up the settings and whatnot necessary for the FreeBSD solution to this problem, and it's called DUMMYNET.
It simulates/enforces queue and bandwidth limitations, delays, packet losses, and multipath effects. It also implements a variant of Weighted Fair Queueing called WF2Q+. It can be used on users' workstations, or on FreeBSD machines acting as routers or bridges.
It can simulate exactly what you want, it's free, and it will boot on commodity hardware. They even have a canned install of it that is small enough to put on a floppy disk (!) which you can download at that link.
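A minimal dummynet sketch for the FreeBSD "machine in the middle" - the interface name, bandwidth, delay and loss figures are placeholders, and the modules may already be compiled into your kernel:

kldload dummynet                                      # pulls in ipfw as a dependency
ipfw add 100 pipe 1 ip from any to any via em0        # send traffic through pipe 1
ipfw pipe 1 config bw 512Kbit/s delay 80ms plr 0.02   # 512 kbit/s, 80 ms, 2% loss
ipfw add 200 allow ip from any to any                 # let anything else through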
Maybe it is time to learn a bit about Linux, because adding a 50 ms delay to every outgoing packet can be done by typing just one line:
tc qdisc add dev eth0 root netem delay 50ms
For more see the Linux Traffic Control HOWTO
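A few more netem one-liners in the same spirit (each line is an alternative shaping rule, so delete the qdisc before adding a different one; substitute your own interface for eth0, and note that the rate option needs a reasonably recent kernel):

tc qdisc add dev eth0 root netem delay 50ms 10ms   # 50 ms latency with 10 ms jitter
tc qdisc add dev eth0 root netem loss 1%           # drop 1% of outgoing packets
tc qdisc add dev eth0 root netem rate 1mbit        # cap outgoing bandwidth
tc qdisc del dev eth0 root                         # remove the shaping again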
We had a similar requirement some ten years ago - I'll see if I can recall how we managed it.
If I remember correctly, we wrote a socket proxy program which was controlled by inetd on a UNIX box. This proxy would accept connections from a client and open an equivalent session through to the server. It would then loop, passing messages in both directions.
The way we achieved WAN characteristics was to introduce random delays (with upper and lower limits) in both the connection establishment and the passing of data once the link was up.
It also had the feature to drop the link occasionally as WAN links were less reliable for us than local traffic.
I recall we had to make it threaded to stop the delays from affecting reverse traffic on the link.
There is a very good (and free) Microsoft solution for that. We have used it for quite some time and it works great; it can very easily simulate everything (packet loss, low bandwidth, disconnection, latency...).
This is the best solution I have found for a Windows environment.
More information and a download link can be found here: MARCO blog post
This product has gone through some evolution and is now integrated into Visual Studio as part of its automated testing tools, but I found the standalone version (which is quite hard to find, so keep a local copy) to work much better. Keep in mind that you need at least two computers (or VMs), since the application needs to pass traffic through a network adapter to work its magic.