Which file is consuming most of the bandwidth? - cPanel

My website is consuming much more bandwidth than it is supposed to. From Webalizer or AWStats in WHM/cPanel I can monitor the bandwidth usage and see which types of files (jpg, png, php, css, etc.) are consuming the bandwidth, but I can't get any specific file names. My assumption is that the bandwidth is being consumed by referral spamming, but the "Visitors" page of cPanel only shows the last 1000 hits. Is there any way to see which image or CSS file is consuming the bandwidth?

If there is a particular file which you think is consuming the most bandwidth, you can use the apachetop tool:
yum install apachetop
then run
apachetop -f /var/log/apache2/domlogs/website_name-ssl.log
replacing website_name with the domain you want to inspect.
It basically picks up entries from the domlogs (which record the requests served for each website; you may read more about domlogs here).
It shows, in real time, which files are being requested the most, and should give you an idea of whether a particular image, PHP script, or other file is receiving the bulk of the requests.
The domlogs also let you see which file is being requested and by which bot or client, so your initial investigation can start from this point.
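If you want totals per file rather than a live view, you can also sum the bytes column of the domlog directly. A rough sketch, assuming the standard combined log format (request path in field 7, response size in bytes in field 10) and an example log path:
awk '{bytes[$7] += $10} END {for (url in bytes) print bytes[url], url}' /var/log/apache2/domlogs/website_name.log | sort -rn | head -20
This prints the twenty URLs that have served the most bytes in total, which usually points straight at the image or CSS file (or the spam-hit URL) responsible for the usage.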

Related

Dropbox API requests performance issues

The issue we are dealing with is moving or copying files from one folder on the Dropbox server to another folder on the Dropbox server.
The API requires a separate request for each file, which takes far too long.
Do you provide some kind of batch request so I could move more than one file per request?
I also know about the ability to move an entire folder's contents, but that doesn't work in our case, because we only need to move a subset of the files.
If we try to push many requests at once through several connections, we get 'Server Unavailable' or 'File Locked' errors and have to repeat the requests.
TL;DR:
Moving 1000 files that are already on the Dropbox server takes over 30 minutes.
What possible solutions do you have to increase the performance?
The Dropbox API now provides a batch endpoint for moving files. You can find the documentation here:
https://www.dropbox.com/developers/documentation/http/documentation#files-move_batch
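For reference, a rough sketch of such a batch call with curl; the current v2 API exposes it as /2/files/move_batch_v2, and the token and paths below are placeholders:
curl -X POST https://api.dropboxapi.com/2/files/move_batch_v2 \
  --header "Authorization: Bearer <ACCESS_TOKEN>" \
  --header "Content-Type: application/json" \
  --data '{"entries": [{"from_path": "/src/a.xml", "to_path": "/dst/a.xml"}, {"from_path": "/src/b.xml", "to_path": "/dst/b.xml"}], "autorename": false}'
The call may return an async job id instead of an immediate result; if it does, poll the matching move_batch check endpoint until the job reports completion.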

How to upload large files to mediawiki in an efficient way

We have to upload a lot of VirtualBox images which are between 1 GB and 6 GB.
So I would prefer to use FTP for the upload and then include the files in MediaWiki.
Is there a way to do this?
Currently I use a jailed FTP user who can upload to a folder and then use the UploadLocal extension to include the files.
But this only works for files smaller than around 1 GB. If we upload bigger files we get a timeout, and even after raising PHP's max_execution_time to 3000 s the include stops after about 60 s with a 505 gateway timeout (which is also the only thing that appears in the logs).
So is there a better way of doing this?
You can import files from the shell using maintenance/importImages.php. Alternatively, enable upload by URL by flipping $wgAllowCopyUploads, $wgAllowAsyncCopyUploads and friends (this requires the job queue to be run via cron jobs). Or decide whether you need to upload these files into MediaWiki at all, because simply linking to them might suffice.
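As a rough sketch of both approaches (the paths, the --comment text and the size limit are placeholders; check the options against your MediaWiki version):
# import everything the jailed FTP user has uploaded, from the shell
php maintenance/importImages.php --comment="Bulk import of VM images" /home/ftpuser/uploads

# or enable upload by URL in LocalSettings.php
$wgAllowCopyUploads = true;
$wgAllowAsyncCopyUploads = true;
$wgMaxUploadSize = 8 * 1024 * 1024 * 1024; # 8 GB
Both routes move the heavy work out of the interactive web request, which is where the UploadLocal approach is timing out.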

What's the performance impact caused by a large Apache access.log?

If a log file like access.log or error.log gets very large, will its size impact Apache's performance or users' access? My understanding is that Apache doesn't read the entire log into memory, but just uses a file handle to append to it. Is that right? If so, I shouldn't have to remove the logs manually whenever they get large, apart from filesystem concerns. Please correct me if I'm wrong. Or is there any Apache log I/O issue I should take care of when running it?
Thanks very much.
Well, I totally agree with you. As I understand it, Apache accesses the log files using handles and just appends new messages at the end of the file, so a huge log file won't make a difference as far as writing is concerned. But if you want to open the file or feed it to some log monitoring tool, the huge size will slow down reading it.
So I would suggest using log rotation to get an overall better result.
This suggestion comes directly from the Apache website:
Log Rotation
On even a moderately busy server, the quantity of information stored in the log files is very large. The access log file typically grows 1 MB or more per 10,000 requests. It will consequently be necessary to periodically rotate the log files by moving or deleting the existing logs. This cannot be done while the server is running, because Apache will continue writing to the old log file as long as it holds the file open. Instead, the server must be restarted after the log files are moved or deleted so that it will open new log files.
From the Apache Software Foundation site
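In practice this means either the system's logrotate (with a graceful Apache reload in its postrotate step) or Apache's piped-log helper. A minimal sketch using rotatelogs, which starts a fresh log every 24 hours without restarting Apache (the binary path and log path are examples):
CustomLog "|/usr/bin/rotatelogs /var/log/apache2/access.%Y-%m-%d.log 86400" combined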

Speeding up file downloads from web server

How do I optimize my server for file downloads, apart from buying a faster network plan? The files are static and stored in a folder as-is. An .htaccess file ensures files are forced to download rather than open in the browser. Is it recommended to use a CDN?
A CDN is definitely a way to get good performance, but the price only makes sense at quite large volumes.
What kind of files are you serving? If they're compressible (e.g. HTML, word processor documents, PDFs; not JPEGs), then make sure compression is enabled by using mod_deflate.
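A minimal sketch of turning it on in the Apache config; the module path and the exact MIME type list will depend on your distribution:
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml application/javascript application/json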

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed xmls. Any signed xml, once created, never changes. So I store every xml in the public folder and redirect the client appropriately to avoid unnecessary processing from the controller.
Now I want a new feature: every xml is associated with a date, and I'd like to implement the ability to serve a compressed file containing every xml whose date lies in a period specified by the client. However, the period cannot be limited to less than one month for the feature to be useful, which means some of the zip files served will be as large as 50 MB.
My application is deployed as a Passenger module of Apache. Thus, it's totally unacceptable to serve the file with send_data, since the client will have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea on how to implement the feature in Rails so the compressed file is produced while being served, I feel my server will get scarce on resources once some lengthy Ruby/Passenger processes are allocated to serve big zip files.
I've read about a better solution to serve static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I inform Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file to the client containing those files. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
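For illustration, a rough sketch of what the backend response looks like according to the mod_zip docs: you send an X-Archive-Files: zip header, and the response body is one line per file in the form "CRC-32 size location filename" (a "-" can be used when the CRC is not known in advance). The paths and names here are placeholders:
X-Archive-Files: zip

- 4096 /protected/xmls/2014-01/a.xml a.xml
- 8192 /protected/xmls/2014-01/b.xml b.xml
Nginx then fetches each location internally and streams the assembled ZIP to the client as it goes.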
You simply need to use whatever API you have available for you to create a zip file and write it to the response, flushing the output periodically. If this is serving large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value / low priority.
Worst case, you could run a command-line zip in a low priority process and pass the output along periodically.
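A rough sketch of that low-priority variant, assuming GNU/Linux nice/ionice and Info-ZIP's ability to write the archive to standard output when given "-" as the archive name:
nice -n 19 ionice -c 3 zip -q -r - /path/to/xmls/2014-01
The application then reads the command's stdout in chunks and forwards them to the HTTP response as they arrive, so the download starts immediately while the archive is still being built.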
It's tricky to do, but I've made a gem called zipline (http://github.com/fringd/zipline) that gets things working for me. I want to update it so that it can support plain file handles or paths; right now it assumes you're using CarrierWave...
Also, you probably can't stream the response with Passenger; I had to use Unicorn to make streaming work properly, and certain Rack middleware can even break it (calling response.to_s breaks streaming).
If anybody still needs this, bother me on the GitHub page.