Server load is minimal but website responds poorly - apache

I have VPS on hetzner. Server is located in Germany.
It has 256GB RAM, 6CPUs (12 threads).
I have a file which since yesterday, is requested about 30 times in one second. File has 2 Select, 2 Update, 2 Insert queries, so I assumed (not sure how this works) from this file server has about 180 requests per second. So right after this requests started, all the websites on the server just started loading poorly. I made this file run just one select query and than die. This didn't help. In WHM load is aboiut 0.02.
I've checked for error logs and there is no max_user_connection or any error there.
I have enabled slow query log and checked log file. there is nothing (I've tested it with select sleep(10) and this query was logged).
This is visit statistics, please bring your attention to may 30th:
Bandwidth stats for last 24 hours:
There are many errors like this in ssl_log (diff IPs of course):
188.121.206.150 - - [30/May/2018:19:50:03 +0200] "-" 408 - "-" "-"
I've been searching web a lot and couldn't find any solution. Could anyone at least tell what should I monitor or where. I have full access to anything there is possible inside the server. Any help is appreciated.
UPDATE 1
I have subdomain: banners.analyticson.com (access allowed for now) and there I have all the images and html5 files that are requested.
Take one image for example : https://banners.analyticson.com/img/suy8G1S6RU.jpg
It needs too much time to load. As I noticed, this sub domain has some issue.
Script, that I mentioned earlier (with 6 queries) just tries to get one of those banners to the user, so result of that script is to return one banner from banners.analyticson.com.
UPDATE 2
I've checked my script, it is fine. It takes less than 1 second to complete.
I've also checked Top command and there is a result. I'm not sure if $MEM value is fine.

You're going to have to narrow the problem down...
There are multiple potential issues.
First thing to eliminate would be the performance of your new script on a development laptop - I assume you're using PHP, so use the profiling tools to work out what is going on. If it's a database query, you'll see which one by looking at the profiler.
If your PHP script and database queries are fine, the next thing to look at: it sounds like you've hit some bottleneck resource on your infrastructure. In these cases, scripts that run fine as a single request start queueing for the bottleneck resource, and every new request adds to the queue until the whole server starts to crawl. This can be a bit of a puzzle - start with top and keep digging.
Next, I'd look at configuration of Apache to make sure everything is squeaky clean - Apache used to have a default to do a reverse DNS lookup for every request, which slows the server down rather impressively on production. You may also want to look at your SSL configuration - the error you report is linked to a load balancer issue.
If it's not as simple as memory, CPU etc., you're into more esoteric issues. You may need to ramp up a load testing rig so you can experiment without affecting the live site - typically, I do this on a machine as similar to live as possible, using Apache JMeter to generate load, and find the "inflection point". Typically, you see response times increase linearly with the number of concurrent requests, until you hit the bottleneck resource, at which point the response time increases rapidly. As a simple example, if you have 10 database connections available, response time should increase linearly up to 10 concurrent connections, and then become much larger from 11 up.
Knowing where the inflection point is and being able to recreate it allows you to use PHP profiling tools under load. This is a lot of work.
UPDATE
You're using php-cgi; this is easily the most inefficient way of running PHP scripts. Your server is barely breaking a sweat - CPU and memory basically idle. Here's a comparison for how to run PHP; consider changing to mod_php.

Related

jmeter Load Test Serevr down issues

I was used a load of 100 using ultimate thread group for execution in NON GUI Mode .
The Execution takes place around 5 mins. only . After that my test environment got shut down. I am not able to drill down the issues. What could be the reason for server downs. my environment supports for 500 users.
How do you know your environment supports 500 users?
100 threads don't necessarily map to 100 real users, you need to consider a lot of stuff while designing your test, in particular:
Real users don't hammer the server non-stop, they need some time to "think" between operations. So make sure you add Timers between requests and configure them to represent reasonable think times.
Real users use real browsers, real browsers download embedded resources (images, scripts, styles, fonts, etc) but they do it only once, on subsequent requests the resources are being returned from cache and no actual request is being made. Make sure to add HTTP Cache Manager to your Test Plan
You need to add the load gradually, this way you will be able to state what was amount of threads (virtual users) where response time start exceeding acceptable values or errors start occurring. Generate a HTML Reporting Dashboard, look into metrics and correlate them with the increasing load.
Make sure that your application under test has enough headroom to operate in terms of CPU, RAM, Disk space, etc. You can monitor these counters using JMeter PerfMon Plugin.
Check your application logs, most probably they will have some clue to the root cause of the failure. If you're familiar with the programming language your application is written in - using a profiler tool during the load test can tell you the full story regarding what's going on, what are the most resources consuming functions and objects, etc.

debugging a slow site -> long delay between connection and data sending

i ran a test from pingdom tools to check the loading time of my website... the result is that i have a lot of files that, in spite of being very small (5kB), take a lot of time (1 second or more) to load because there is a big delay between the beginning of the connection and the beginning of data downloading (in pingdom tools, this results in a very large green bar).
Have a look at this for example: http://tools.pingdom.com/default.asp?url=http%3a%2f%2fwww.giochigratis-online.net%2f&id=5691308
How can i lower the "green bar" time? Is this an apache problem (like, i dont know, the number of max. parallel connections, or something similar...), or an hardware problem? Cpu-limited, bandwith-limited, or what else?
I see that many other websites have very little green bars... how do they reduce the delay between the connection and the real data sending?
Thanks!
ps.: the site is made with drupal. Homepage generation takes about 700ms
pps.: i tested 3 other websites on the same server: same problem.
I think it could be the problem with max no. of parallel connections as you mentioned - either on server or client side. For instance, Firefox has default of network.http.max-connections-per-server = 15 (see here) while you have >70 files to be downloaded in your domain and next 40 from Facebook.
You can reduce number of loaded images by generating sprites i.e. the image consisting of multiple small images, and then using CSS to diplay them properly in places that you want. This is widely used e.g. by Google - see http://www.google.com/images/nav_logo83.png

Intermittent SQL timeouts

Since a few days ago, the SQL server (Microsoft SQL Server 2005) backing our site has started occasionally timeouting. It is happening at seemingly random times approximately every hour or two. It usually takes about 10 minutes during which we see hundreds of timeouted requests. Under normal circumstances, most of our queries take less than 50ms. A query which takes a significant fraction of a second is an exception.
I have effectively killed a day trying to figure out at least something without any real progress. Normally, the server load is about 10-20%, and when the timeouts happen, we don’t see any increased CPU load. Also, there is nothing special happening during the timeouts; no overzealous web crawler, no heavy background tasks, no increased network traffic, no increased number of connections etc. Simply, everything looks as usual.
Not making any progress, we decided to restart it (and install the latest SP since we were in it) which seems to have fixed the problem. It has been already over six hours without any incident. Also, the CPU load has gone down under 10%.
It almost seems like if the SQL server "deteriorated" overtime. Perhaps, some internal structure (some cache or statistic) got out shape and caused the occasional problems. I don’t have any other explanation.
The only thing I noticed when I was monitoring the server (and got lucky once to be present when the timeouts were happening), I saw several long running queries waiting on CXPACKET. But I learned that this is most likely just a consequence of some other problem. I wrote a script monitoring SQL requests, and so hopefully, next time it happens, I will have more information.
Has anybody had similar experience? I’m not an SQL Server guru. Any suggestions are welcome.
since everything looked normal: CPU, nothing special happening, no overzealous web crawler, no heavy background tasks, no increased network traffic, no increased number of connections etc. I'd look into locking\blocking\race condition. Use this to see what (if anything) is locking when the time-out are happening:
How to find out what SQL queries are being blocked and what's blocking them?

How can you test your web server speed?

Our website seems to be slower than it used to be, how can I test that? And is there a way to find the cause? (eg too many visitors).
Thanks.
There is a rather good tool for performance benchmarking of web servers: Jakarta Jmeter, which is an Apache project, so it's rather well supported and tested.
The key to be able to pinpoint the cause would be to do benchmarking regularly, so you can actually match changes in your benchmark results with events on your server: upgrades, code changes, variations in the number of visitors...
The Firebug add on for Firefox has a Net tab which is useful for debugging issues and testing. Also Fiddler on Windows is nice. And then there is the age old tradition of checking your server error logs for any problems.
A good first step is to make sure you are keeping fairly complete server logs and feed them into a log analyser. This is helpful for giving you a general idea of how long things take and which pages are slowest. It's also a good idea to check your error logs to make sure things are working properly.
Beyond that, things get more complicated as you may need to isolate your webserver, code and database to see if one of these is the bottleneck. Also, Jeff's blog, coding horror had a recent entry on server optimization.
Use Google Analytics to track your site's visitors over time to find out if you are getting more traffic.
You tagged your question with shared-hosting - being on a shared host means that someone else's code running on the same machine as your may be affecting your site's performance.
I'd suggest going with Varkhan and apphacker's suggestion to make sure your site's code is reasonably quick. use Analytics to get some stats and the possibly, depending on how many visitors you are getting and how slow the site is, consider moving away from a shared host.
Try the server speed checker at Bitcatcha.com. The tool ping your website server and record the time needed to get a response. It also pings from 8 different nodes to your server. You are able to at least find out whether it's your server that is slowing your website.

Top & httpd - demystifying what is actually running

I often use the "top" command to see what is taking up resources. Mostly it comes up with a long list of Apache httpd processes, which is not very useful. Is there any way to see a similar list, but such that I could see which PHP scripts etc. those httpd processes are actually running?
If you're concerned about long running processes (i.e. requests that take more than a second or two to execute), you'll be able to get an idea of them using Apache's mod_status. See the documentation, and an example of the output (from www.apache.org). This isn't unique to PHP, but applies to anything running inside an apache process.
Note that the www.apache.org status output is publicly available presumably for demonstration purposes -- you'd want to restrict access to yours so that not everyone can see it.
There's a top-like ncurses-based utility called apachetop which provides realtime log analysis for Apache. Unfortunately, the project has been abandoned and the code suffers from some bugs, however it's actually very much usable. Just don't run it as root, run it as any user with access to the web server log files and you should be fine.
The php scripts happen so fast, top wouldn't show you very much. Or it would zip by quite quickly. Most webrequests are quite quick.
I think your best bet would be to have some type of real time log processor, that kept an eye on your access logs and updates stats for you of average run time, memory usage and stuff like that.
You could make your PHP pages time themselves and write their path and execution time to file or database. Note that would slow everything down while you were monitoring, but it would serve as a good measuring method.
It wouldn't be that interactive though. You'd be able to get daily or weekly results from it, but it'd be hard to see something meaningful within minutes or hours.