At our peak hour we need to serve around 250 requests per second. What we're doing is accepting a URL for an image, pulling the image out of memcache, and returning it via Apache.
Our current system is a dual-core machine with 4GB of memory: 2GB for the images in memcache and 2GB for Apache. We're seeing a very high load (20-30) during our peak time. The average response time, as reported by Apache, is 30-80ms per request, which seems kind of slow for a simple Apache request served from memory.
Are there better tools for this? Serving from disk is not an option since the IO wait was holding it back, so we moved it to memory. How do CDNs do it?
EDIT: Well, the system works like this. A request comes in, and we check a "queue" to see if we've seen this request before; if we have, we serve the image (from disk or memory). If not, we increment the counter for that request in a memcached queue, and worker machines generate the image and store it back on the main server. So currently, when a request comes in, we check memcached to see whether the image exists, and then connect to another database for the actual image. When the images were on disk, we found that the file_exist call alone would take 30+ ms to complete, so we moved them to memory. If we moved the images to a ramdisk, would this speed up file_exist, or would we still want a first check to see if we should even look for the image?
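To make the flow described in the edit concrete, here is a minimal sketch in Python. It is not the poster's code: the class and method names are hypothetical, plain in-process dictionaries stand in for memcached and the image store, and it assumes a single lookup in the image store can double as the "have we seen this before" check.

```python
from collections import Counter


class ImageFrontend:
    """In-memory stand-in for the memcached counter "queue" and the image store."""

    def __init__(self):
        self.pending = Counter()  # hypothetical stand-in for the memcached request counters
        self.images = {}          # hypothetical stand-in for the image store (disk, ramdisk, memcache)

    def handle(self, url):
        """Return the image bytes if already generated, else count the request and return None."""
        image = self.images.get(url)
        if image is not None:
            return image
        # Not generated yet: bump the counter so a worker can pick the request up.
        self.pending[url] += 1
        return None

    def worker_store(self, url, data):
        """Called by a (hypothetical) worker once it has generated the image."""
        self.images[url] = data
        self.pending.pop(url, None)


frontend = ImageFrontend()
frontend.handle("/img/a.png")                 # miss: request is counted
frontend.worker_store("/img/a.png", b"...")   # worker stores the generated image
frontend.handle("/img/a.png")                 # hit: served from the store
```

Under that assumption the miss path costs one lookup plus one counter increment, and the hit path costs a single lookup with no separate existence check.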
Have you looked at nginx?
According to Netcraft, in May 2009 nginx served or proxied 3.25% of the busiest sites. It can serve from memcached too.
Depending on the size of your images, Apache should handle this with no problem at all. We have an Apache server handling 2000 requests/second with an average response size of 12K. The machine has 32GB of memory, so all our content is cached.
Here are some tuning tips:
Use a threaded MPM like worker, with lots of threads open (we have 256).
Use mod_cache so all the images will be in memory.
Allocate as much memory as possible to the Apache process.
When you say memcache, do you mean the memcached server? Running memcached will be slower because the latency of a TCP connection (even over loopback) is much larger than that of direct memory access.
If you can fit all your images in memory, a RAM disk will also help a lot.
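As a rough sanity check on the numbers in the question, Little's law (average requests in flight = arrival rate × average response time) can be applied to the 250 requests per second and the 30-80ms response times. This is only a back-of-the-envelope sketch that assumes steady-state traffic; the function name is made up for illustration.

```python
def requests_in_flight(requests_per_second, response_time_ms):
    """Little's law: average concurrency = arrival rate * average time in system."""
    return requests_per_second * response_time_ms / 1000


# Figures taken from the question: 250 requests/second at 30-80 ms each.
low = requests_in_flight(250, 30)
high = requests_in_flight(250, 80)
print(low, high)
```

That works out to somewhere between 7.5 and 20 requests in flight on average, which is the same order of magnitude as the reported load of 20-30 on a dual-core machine.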
Related
I am using Azure Redis Cache with 250MB of storage, and I am storing lists of objects with an expiry time. When I save more lists of objects under different keys, the expiry does not work properly. If there is no other data, it works fine, refreshing every 10 minutes. But under load it does not work properly.
How to fix this?
Thank you.
A 250 MB Redis cache is hosted on an Extra Small (A0) virtual machine, which uses shared cores and has limited bandwidth, and as such is not recommended for production workloads. You could check your cache performance counters for CPU and bandwidth to see if you might be hitting such limits.
I'm using Apache with ldirector, and I'm facing issues at peak times: when the Google and Bing crawlers hit my site, Apache chokes and my server's CPU usage goes to 100% utilization. After this I have to stop Apache and monitor the load manually. I want to automate this whole scenario. What I want is this: whenever load hits Apache, the server is normalized according to given settings, and CPU usage never exceeds a given limit.
I want to control all this via shell script, please give suggestions.
Would nginx be a more suitable choice as a web server for high traffic websites?
The site we will be building is an e-commerce site, if that makes a difference.
I am really interested in the actual 'why' from a technical point of view either way. i.e., why would nginx be a better choice for this type of site from a technical standpoint, or the opposite, why it wouldn't?
Martin,
In general, Nginx is better for high-traffic sites due to its event-driven architecture. Rather than handling each request in a distinct thread, it uses non-blocking I/O to service many requests in each thread.
The important aspect of this architecture is the reduced use of processes or threads. A thread can consume anywhere from 2MB to over 64MB of RAM. So when Apache serves a 10KB JPEG, it may actually be using a significant amount of RAM. It becomes worse if you have slow clients (e.g. smartphones) where the request may keep a thread busy for several seconds.
Many people find running Nginx as a proxy in front of Apache to be an ideal middle ground. Nginx talks to the slow clients and can do so using a very small amount of RAM. When requests are forwarded to Apache, the request speed is limited by your local connectivity, not that of the remote user. This means that the network bottleneck will not keep the request (and its memory-hogging thread) alive for any longer than necessary.
In short you get the low-resource benefits of Nginx coupled with the wide feature-set of Apache.
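To put rough numbers on the per-thread memory argument above, the arithmetic can be sketched as follows. The 2MB and 64MB figures come from the answer; the 1000 concurrent connections is a hypothetical value chosen only for illustration, and the model ignores memory shared between threads.

```python
def thread_memory_mb(concurrent_connections, mb_per_thread):
    """Memory needed if every connection ties up one thread of the given size."""
    return concurrent_connections * mb_per_thread


connections = 1000  # hypothetical number of simultaneous (possibly slow) clients
best_case = thread_memory_mb(connections, 2)    # 2 MB per thread, low end from the answer
worst_case = thread_memory_mb(connections, 64)  # 64 MB per thread, high end from the answer
print(best_case, worst_case)
```

Under these assumptions the thread-per-request model needs between about 2GB and 64GB just to hold connections open, regardless of how small the files being served are.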
How does Apache handle the C10k problem under normal conditions?
Say while running very small scripts with little data, or do I need to scale out if I use Apache?
In the background, the heavy lifting is done by a few servers running specialized software that processes the requests, but I'd like to use Apache as a front end. Is this a viable plan?
I consider Apache to be more of an origin server - running something like mod_php or mod_perl to generate the content and being smart about routing to the appropriate system.
If you are getting thousands of concurrent hits to the front of your site, with a mix of types of data (static and dynamic) being returned, you may find it useful to put a more optimised system in front of it though.
The classic post-optimisation problem with Apache isn't generating the dynamic content (or at least, that can be optimised for early in the process), but simply waiting for a slow client to be able to receive the bytes that are being sent. It can therefore be a significant advantage to put a reverse proxy, in the form of Squid or Nginx, in front of the servers to take over the 'spoon-feeding' of the slow network clients, while allowing the content production to happen at full speed, and at local network speeds - 100Mb/sec or even gigabit speeds - if it even has to traverse a network at all.
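The "spoon-feeding" cost can be estimated with simple arithmetic: the time a backend worker stays busy sending a response is roughly the response size divided by the bandwidth of whoever it is talking to. The sketch below assumes a hypothetical 100 KB response and a hypothetical 1 Mbit/s client, compared against the 100 Mbit/s local network mentioned above; it ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Approximate time to push a response of the given size over a link."""
    return size_bytes * 8 / bits_per_second


response_size = 100_000        # hypothetical 100 KB response
slow_client = 1_000_000        # hypothetical 1 Mbit/s remote client
local_network = 100_000_000    # 100 Mbit/s link between proxy and origin server

direct = transfer_seconds(response_size, slow_client)
via_proxy = transfer_seconds(response_size, local_network)
print(direct, via_proxy)
```

With these assumed figures the origin server is occupied for about 0.8 seconds per response when talking to the client directly, but only about 0.008 seconds when handing the response to a reverse proxy on the local network.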
I'm assuming you've probably seen this data, but if not, it might give you some idea.
Guys, imagine that you are running a web server with 10K simultaneous connections. How could that be?
You've got many, many connections per second
Dynamic content
Are you sure that your CPU can handle that many PHP sessions, for example? I guess not, so why are you thinking about the C10K problem? :D
Static content - small files
And still so many connections? On a single server? Probably you've got problems with networking/throughput too, or you are a future competitor of Google. Use lighttpd, which addresses the C10K problem and is stable - fly light. Using Apache for only static files for large sites is obvious.
Your clients are downloading large files for a long time - static content
ISO images, archives etc
If you are doing it via web server - FTP may be more appropriate.
Video streaming
Use lighttpd or specialized software. And still... What about other resources?
I am using Linux Virtual Server as load balancer in front of apache servers (with specific patches for LVS-NAT) and I am happy :) This string is an answer you want to hear.
I currently have a cluster of 4 Apache web servers which are used to serve up static files of up to 30Mb in size. Generally, I can expect up to 5000 concurrent connections to these servers. What performance improvement would I expect to get by moving this to lighttpd?
I would expect it to handle the concurrency with much more ease and less memory overhead. I've stopped deploying Apache pretty much everywhere I can.
You may also consider nginx for a comparison.
If you are using Apache with the worker or event MPM, you probably won't see much of a difference. If you haven't moved to using them, I would give that a try. There isn't really any problem with lighttpd either, though. I think today it is just a matter of picking one and going with it.
If I were serving that type of file I would push it out to a CDN and not have to worry about it. There are plenty of cheap ones now like CacheFly and Amazon's CloudFront.
Off the top of my head:
Smaller memory footprint
Quicker file reads
Definitely check out the benchmark at their site, they provide a lot of information on this topic: http://www.lighttpd.net/benchmark