To compress or not to compress? - apache

Enabling compression (gzip/deflate) in the Apache server will reduce the size of responses but will cost extra CPU cycles. I will run a stress test with various response sizes, but
I wanted to ask: in terms of server load, is there any guidance on when compression should be turned on or off?
Thank you

In most cases web servers are limited by I/O (be it memory, network bandwidth, the database, the hard drive, ...) and have plenty of spare CPU cycles to spend compressing pages before serving them, especially since compression isn't really that CPU-intensive, while it provides a huge usability boost for your users and saves you bandwidth.

As long as the server has a reasonably powerful CPU, use compression. Speed is usually the most important quality a server should have, after security and stability.

It depends on what you want to achieve. Typically, turning deflate on won't add a very significant footprint to your CPU load, and if your website includes large text files (HTML, JS, CSS, etc.) it's likely to make an important difference in bandwidth usage and page loading times. Of course, if what you want is to reduce system load and you don't care much about bandwidth, it wouldn't be the right choice for you.
Another option you might find useful is installing a lightweight web server/proxy like Nginx, lighttpd or Varnish (I personally prefer the first one) and serving compressed static content with that, leaving the heavier Apache processes to handle only the dynamic content. That is also likely to result in better overall server performance. But, again, all of this depends on your scenario, what your website/web application is like, and what you want to achieve.
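If you want to put rough numbers on the CPU-versus-bandwidth trade-off before (or alongside) the stress test, it helps to benchmark gzip itself. Here is a minimal sketch, assuming nothing about your actual pages (the payload text and the sizes tried are invented), that reports the compression ratio and time for a few response sizes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipCostProbe {
    // gzip a buffer in memory and return the compressed bytes
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // synthetic HTML-like payloads at a few sizes (4 KB .. 2 MB)
        for (int kb : new int[] {4, 64, 512, 2048}) {
            byte[] page = "<p>lorem ipsum dolor sit amet</p>\n".repeat(kb * 1024 / 34).getBytes();
            long start = System.nanoTime();
            byte[] packed = gzip(page);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("%5d KB -> %7d bytes (%.0f%% of original) in %d us%n",
                    kb, packed.length, 100.0 * packed.length / page.length, micros);
        }
    }
}

If the time per page is small compared to your response times and the ratio is large, compression is almost certainly worth enabling.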

Related

When to turn off Transparent Huge Pages for redis

According to redis docs, it's advisable to disable Transparent Huge Pages.
Would the guidance be the same if the machine were shared between the Redis server and the application?
Moreover, for other technologies, I've also read guidance that THP should be disabled in all production environments when setting up the server. Does this kind of pre-emptiveness apply to Redis as well, or should one first monitor for latency issues before deciding to turn off THP?
Turn it off. The problem lies in how THP shifts memory around to try to keep or create contiguous pages. Some applications can tolerate this; most databases cannot, and it causes intermittent performance problems, some of them pretty bad. This is not unique to Redis by any means.
For your application, especially if it is Java, set up real HugePages and leave the transparent variety out of it. If you do that, just make sure you allocate memory correctly for the app and Redis. Though I have to say, I probably would not recommend running both the app and Redis on the same instance/server/VM.
Turning off transparent hugepages is a bad idea, and redis no longer recommends it.
What you should do instead is make sure transparent_hugepage is not set to always. (This is what recent versions of redis check for.) You can check the current value of the setting with:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
And correct it like so:
# echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
No action is likely to be necessary, though, since madvise is typically the default setting in recent Linux distributions.
Some background:
transparent_hugepage = always: can force applications to use hugepages unless they opt out with madvise. This has a lot of problems and is rarely enabled.
transparent_hugepage = never: does not fulfill allocations with hugepages, even if the application requests them with madvise.
transparent_hugepage = madvise: allows applications to opt-in to hugepages. This is normally a good idea because hugepages can improve performance in some applications, but this setting doesn't force them on applications that, like redis, don't opt in
It is rather annoying that searching for "transparent huge pages" yields top results about how to disable them because Redis and some other databases cannot handle transparent huge pages without performance degradation.
These applications should instead do one of the following:
Call prctl(PR_SET_THP_DISABLE, ...) to opt out of using transparent huge pages.
Be started by a wrapper that makes this call for them before fork/exec-ing the database process. PR_SET_THP_DISABLE is inherited by child processes/threads for exactly this scenario, where an existing application cannot be modified (a sketch of such a wrapper follows below).
prctl(PR_SET_THP_DISABLE, ...) has been available since Linux 3.15, circa 2014, so there is little excuse for those databases not to mention this solution instead of giving their users the poor, panicky advice to disable transparent huge pages for the entire system.
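For a JVM-based launcher, the prctl call can be made through a native binding. Here is a hedged sketch of the wrapper idea, assuming the JNA library (net.java.dev.jna) for the libc call; the redis-server command line is only a placeholder:

import com.sun.jna.Library;
import com.sun.jna.Native;

public class NoThpWrapper {
    // minimal libc binding: prctl takes an option plus four generic arguments
    interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);
        int prctl(int option, long arg2, long arg3, long arg4, long arg5);
    }

    private static final int PR_SET_THP_DISABLE = 41; // available since Linux 3.15

    public static void main(String[] args) throws Exception {
        // opt this process out of THP; the flag is inherited across fork/exec
        if (CLib.INSTANCE.prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
            System.err.println("prctl(PR_SET_THP_DISABLE) failed; old kernel?");
        }
        // launch the database as a child, which inherits the setting
        new ProcessBuilder("redis-server", "/etc/redis/redis.conf")
                .inheritIO()
                .start()
                .waitFor();
    }
}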
Three years after this question was asked, Redis got a disable-thp config option that makes the prctl(PR_SET_THP_DISABLE, ...) call on its own, by default.
My production memory-intensive processes run 5-15% faster with /sys/kernel/mm/transparent_hugepage/enabled set to always. Many popular desktop applications also benefit immensely from always-on transparent huge pages.
This is why I cannot appreciate the search results for "transparent huge pages" being spammed with Redis advice to disable them. That's panic advice from Redis, not best practice.
The overhead THP imposes occurs only during memory allocation, because of defragmentation costs.
If your Redis instance has a (near-)constant memory footprint, you can only benefit from THP. The same applies to Java or any other long-lived service that does its own memory management: pre-allocate memory once and benefit.
Why play such echo games when there is a kernel parameter you can boot with?
transparent_hugepage=never

Image Resizer - Best Practice for security

We are currently testing out the ImageResizer library, and one of the questions is: how do we prevent malicious attacks on the site if someone programmatically sends thousands of resizing requests for images with arbitrary sizes to the server, overloading the server's CPU/RAM and potentially causing disk space to run out due to the enormous number of cached files?
Is there any way to whitelist certain dimensions? Or what is the best practice to avoid this scenario?
Thanks!
Stephen
Neither CPU nor RAM can generally be overloaded during a (D)DoS attack on ImageResizer. Memory allocation is contiguous, meaning an image cannot be processed unless there is around 15-30% free RAM remaining. Under the default pipeline, only 2 cores are used for image processing, so a regular server will not see CPU saturation either.
In general, there are far more effective ways to attack an ASP.NET website than through ImageResizer. Any database-heavy page is more likely to be a weak point, as the memory allocations are smaller and it is easier to saturate the server with them.
Disk space starvation can be mitigated by enabling autoClean="true".
If you're a high-profile site with lots of determined ill-wishers, you can also consider the following:
Use request signing, so that only URLs generated by your server will be accepted (a generic sketch of the idea follows after this list).
Use the Presets plugin to white-list defined permitted command combinations.
Both of these reduce development agility and limit your options for responsive web design, so unless you have actually been attacked in the past, I don't suggest them.
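ImageResizer ships its own signing plugin, so the following is not its API; it is only a generic, hedged illustration (in Java) of what request signing means: the server appends an HMAC of the resize parameters, and any URL arriving without a signature the server could have produced is rejected before it reaches the resizer. The secret and the sample query string are placeholders.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class SignedImageUrl {
    // placeholder secret; in practice load it from configuration, never hard-code it
    private static final byte[] SECRET = "replace-with-a-real-secret".getBytes(StandardCharsets.UTF_8);

    // append an HMAC-SHA256 signature of the path and query to the URL
    static String sign(String pathAndQuery) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
        String sig = HexFormat.of().formatHex(mac.doFinal(pathAndQuery.getBytes(StandardCharsets.UTF_8)));
        return pathAndQuery + "&sig=" + sig;
    }

    public static void main(String[] args) throws Exception {
        // only URLs built this way (and later verified with the same HMAC) would be served
        System.out.println(sign("/photos/cat.jpg?width=300&height=200"));
    }
}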
In practice, (D)DoS attacks against dynamic imaging software are rarely able to bring down anything except uncached images, and then only temporarily, even when running under the same application pool. Since visited images tend to be cached, the actual effect is rather laughable.

Memory requirements when hosting R in the cloud

What is the minimum server size we need to run opencpu if we expect 100,000 hits a month?
I think opencpu is an exciting project, but I need to know about memory usage when opencpu is deployed, since a cloud hosting service such as Rackspace charges about $40 per month for 1 GB of RAM.
I know that if I load R without doing anything, and without loading any data or packages into RAM, it uses almost 700 MB of virtual memory and 50 MB of resident memory.
I know that opencpu uses rApache, and that rApache uses preforking, but I want to know how this will scale as the number of concurrent users increases. Thanks.
Thanks for the responses
I talked with Jeroen Ooms when visiting LA, and I am partly convinced that opencpu will work in high-concurrency environments if used correctly, and that he is available to fix issues if they arise. Opencpu is related to his dissertation, after all! In particular, what I find useful about opencpu is its integration with Ubuntu's AppArmor, which can restrict processes from using too much RAM and CPU. I think Apache might also be able to do this, but RAppArmor can do this and much more. Brilliant! If AppArmor were the only advantage, I would just use that and JSON as a backend, but it seems like opencpu also streamlines the installation of all this stuff and provides a built-in API system.
Given the cost of web-hosting, I imagine a workable real-time analytics system is the following:
create R statistical models on demand, on a specialized analytical server, as often as needed (e.g. every day or hour using cron)
transfer the results of the models to a directory on the opencpu servers using ftp, as native R objects
on the opencpu server, go to the directory and grab the R objects representing the statistical models, and then make predictions or do simulations using them. For example, use the 'predict' function to provide estimates based on user-supplied variables.
Does anybody else see this as a viable way to make R a backend for real time analytics?
Dirk is right, it all depends on the R functions that you are calling; the overhead of the OpenCPU architecture should be quite minimal. OpenCPU itself will run on a small server, but as we all know some functionality in R requires much more memory/cpu than others.
If you really want to know how many resources are needed just to run opencpu, you could do some benchmarking. As you noted, prefork is used to branch sessions off the main process, so in most cases the copy-on-write behaviour of forking should make it pretty cheap.
There is also some other stuff that you can tweak, e.g. preloading of frequently used packages.

Should I use Gzip in this case?

I have a RESTful Java API that provides data to a Node.js client (which gzips the data for users). The question is: if they are running on the same machine, should I gzip the data sent from the Java API to the Node.js application?
I'm asking because in this case I don't have to worry about network latency, but gzip compression may increase CPU utilization.
Is gzip worth using in this situation?
If the objective is to increase the overall speed of the system, then using gzip to transfer data across process boundaries on the same machine is not very useful, particularly if the message is small enough to fit in memory. If the message is too large to fit in memory and some paging overhead is incurred, the benefit of gzip may be greater, but still nowhere near enough to justify using it. Gzip only makes sense when compressing the data is significantly faster than transmitting it uncompressed. That is usually not the case with inter-process communication (even when it incurs page-fault overhead).
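To make that concrete, here is a minimal sketch of the kind of guard you could put in the Java API under those assumptions: skip gzip when the consumer is local or the payload is small. The LOCAL_CONSUMER flag and the 32 KB threshold are made-up knobs for illustration, not anything from your code.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class MaybeGzip {
    private static final boolean LOCAL_CONSUMER = true;   // e.g. Node.js runs on the same box
    private static final int MIN_SIZE_BYTES = 32 * 1024;  // below this, gzip overhead dominates

    // return the payload as-is for local/small responses, gzipped otherwise
    static byte[] maybeCompress(byte[] payload) throws IOException {
        if (LOCAL_CONSUMER || payload.length < MIN_SIZE_BYTES) {
            return payload;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream(payload.length / 2);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(payload);
        }
        return bos.toByteArray();
    }
}

In your setup the Node.js layer would then compress once, at the edge, only for the remote clients that actually benefit from it.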

File upload/download using UDP

We have a web-based J2EE application which allows file upload/download. Due to latency issues, uploads/downloads are slow for many users.
1) I read that sending data using UDP can improve data transfer speed. How can we send file data using UDP?
2) We are zipping files using GZIP before upload/download to reduce the amount of data transferred. Is there a better method available to improve data compression?
UDP is a protocol that does not guarantee the arrival of messages. You are most likely using a standard file transfer protocol like FTP, which should suit you fine. Are your issues with latency or with bandwidth? You might be better off investigating why the link has high latency or bandwidth problems, as this could prove to be an issue for other parts of your web application.
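To illustrate why raw UDP is awkward for file transfer, here is a minimal Java sketch of sending one chunk of a file as a datagram; the host and port are placeholders. The send returns immediately, and if the packet is lost there is no acknowledgement or retransmission, so ordering, reassembly and resending would all have to be built by you:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpChunkSender {
    public static void main(String[] args) throws Exception {
        byte[] chunk = "one ~1400-byte slice of the file would go here".getBytes();
        DatagramSocket socket = new DatagramSocket();
        DatagramPacket packet = new DatagramPacket(
                chunk, chunk.length, InetAddress.getByName("example.com"), 9999);
        socket.send(packet);   // fire-and-forget: no ACK, no retransmit, no ordering
        socket.close();
    }
}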
GZIP and other compression tools are good for reducing the amount of data that is sent if you're willing to put up with the up-front cost of compressing. These tools should have options so you can tweak the level of compression (i.e. take a long time and compress optimally, or compress quickly but end up with a larger file). You will probably need to experiment and see what balance works best for you.
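Since the question mentions J2EE, note that the compression level can be tweaked in Java code as well. GZIPOutputStream inherits a protected Deflater field named def, so a small subclass is the usual way to choose between speed and ratio; the file names below are placeholders:

import java.io.*;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class LeveledGzip {
    // GZIPOutputStream with a configurable compression level
    static class LeveledGzipOutputStream extends GZIPOutputStream {
        LeveledGzipOutputStream(OutputStream out, int level) throws IOException {
            super(out);
            def.setLevel(level);   // e.g. Deflater.BEST_SPEED or Deflater.BEST_COMPRESSION
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("upload.dat");           // placeholder input
             OutputStream out = new LeveledGzipOutputStream(
                     new FileOutputStream("upload.dat.gz"), Deflater.BEST_SPEED)) {
            in.transferTo(out);   // copy while compressing (Java 9+)
        }
    }
}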
1) Are there protocols faster than TCP on high latency links?
Yes, UDT is the primary example, but it is not a free trade-off; for instance, you would now need a custom frontend application to download files.
2) Is there better file compression than GZIP?
Yes, see the exhaustive list at http://www.maximumcompression.com/index.html; bzip2 and 7-zip are popular alternatives to gzip.
Note that for specific domains, such as text, photographic images, or scanned text, there are domain-specific codecs which are preferable.
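If you want to try one of those alternatives from Java without shelling out to an external tool, Apache Commons Compress (an assumed extra dependency) offers bzip2 streams with the same OutputStream idiom as GZIP; this is only a sketch and the file names are placeholders:

import java.io.*;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

public class Bzip2Upload {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("upload.dat");
             OutputStream out = new BZip2CompressorOutputStream(
                     new FileOutputStream("upload.dat.bz2"))) {
            in.transferTo(out);   // typically smaller than gzip for text, at a higher CPU cost
        }
    }
}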