We have a fairly busy website (1 million page views/day) using Apache mod_proxy that keeps getting overloaded with connections (>1,000) in the TIME_WAIT state. The connections are to port 3306 (MySQL), but MySQL only shows a few connections (SHOW PROCESSLIST) and is performing fine.
We have tried changing a number of things (keep-alive on/off), but nothing seems to help. All other system resources are within reasonable range.
My searching mostly turns up advice to lower tcp_time_wait_interval, but that seems a bit drastic. I've worked on busy websites before, but never had this problem.
Any suggestions?
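For reference, the count can be confirmed with a netstat one-liner along these lines (3306 being the MySQL port):

    netstat -an | grep ':3306' | grep TIME_WAIT | wc -l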
Each TIME_WAIT connection is a connection that has already been closed.
You're probably connecting to MySQL, issuing a query, then disconnecting, and repeating that for each query on the page. Consider using a connection pooling tool, or at the very least a global variable that holds on to your database connection. If you use a global, you'll have to close the connection at the end of the page; hopefully you have somewhere common you can put that, like a footer include. A minimal sketch of the idea is below.
As a bonus, you should get a faster page load. MySQL is quick to connect to, but not having to re-connect at all is even faster.
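A minimal PHP sketch of that idea, assuming mysqli (host, credentials, and the widgets table are placeholders). The static variable gives every query on the page the same connection, and the "p:" host prefix additionally asks mysqli for a persistent connection that survives across requests in the same worker, avoiding the connect/disconnect churn that produces TIME_WAIT:

    <?php
    // One shared connection per request, reused by every query on the page.
    function db(): mysqli
    {
        static $conn = null;
        if ($conn === null) {
            // "p:" = persistent connection; placeholder credentials.
            $conn = new mysqli('p:127.0.0.1', 'webuser', 'secret', 'app');
        }
        return $conn;
    }

    $count = db()->query('SELECT COUNT(*) FROM widgets')->fetch_row()[0];

With a persistent connection there is nothing to close in the footer; mysqli returns it to its pool at the end of the request.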
If your client applications are using JDBC, you might be hitting this bug:
http://bugs.mysql.com/bug.php?id=56979
I believe that PHP has the same problem.
Cheers,
Gilles.
We had a similar problem, where our web servers all froze up because our PHP code was making connections to a MySQL server that was set up to do reverse host lookups on incoming connections.
When things were slow it worked fine, but under load the response times shot through the roof and all the Apache servers got stuck in TIME_WAIT.
The way we figured the problem out was by using Xdebug to generate profiling data for the scripts under high load and looking at that: the mysql_connect calls took up 80-90% of the execution time.
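For what it's worth, capturing that kind of profile with Xdebug 2 (the version of that era) takes roughly these php.ini settings (the output directory is a placeholder); triggered profiling keeps the overhead limited to requests that opt in with the XDEBUG_PROFILE cookie or GET parameter:

    ; php.ini
    xdebug.profiler_enable_trigger = 1
    xdebug.profiler_output_dir     = /tmp/xdebug

The resulting cachegrind files can then be inspected with KCachegrind or similar.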
I have RabbitMQ Server 3.6.0 installed on Windows (I know it's time to upgrade; I've already done that on the other server node).
Heartbeats are enabled on both the server and client side (heartbeat interval 60s).
We hit a resource alarm (RAM limit), and since then I have observed a rise in the number of TCP connections to the RMQ server.
At the moment there are 18,000 connections, while the normal amount is 6,000.
Via the management plugin I can see a lot of connections with 0 channels, while our "normal" connections have at least 1 channel.
And even restarting the RMQ server doesn't help: all the connections re-establish themselves.
1. Does that mean all of them are really alive?
A similar issue was described here: https://github.com/rabbitmq/rabbitmq-server/issues/384, but as far as I can see it was fixed precisely in v3.6.0.
2. Do I understand correctly that before RMQ Server v3.6.0 the behavior after a resource alarm was as follows: several TCP connections could hang around on the server side per one real client auto-recovery connection?
Maybe important: we have HAProxy between the server and the clients.
3. Could HAProxy explain these extra connections? Maybe it prevents clients from receiving the signal that the connection was closed due to the resource alarm?
Are all of them alive?
Only you can answer this, but I would ask: how is it that you are ending up with many thousands of connections? Really, you should only create one connection per logical process. So if you really have 6,000 logical processes connecting to the server, that might explain that many connections, but in my opinion you're well beyond reasonable design limits even in that case.
To check, see how many connections disappear when you kill one of your logical processes.
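Something along these lines on the broker host lets you watch the count (the column list is just an example):

    rabbitmqctl list_connections pid peer_host peer_port state | wc -l

Run it before and after killing a client process; the difference tells you how many connections that process really owned.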
Do I understand correctly that before RMQ Server v3.6.0 the behavior after a resource alarm was as follows: several TCP connections could hang around on the server side per one real client auto-recovery connection?
As far as I can tell, yes. It looks like the developer in this case ran across a common problem with sockets: the detection of dropped connections. If I had a dollar for every time someone misunderstood how TCP works, I'd have more money than Bezos. So, what they found is that someone made some bad assumptions, when in fact a read or write is required to detect a dead socket, and the developer wrote code to (attempt to) handle it properly. It is important to note that this does not look like a very comprehensive fix, so if the same conceptual design problem was introduced in another part of the code, this bug might still be around in some form. Searching for bug reports might give you a more detailed answer, or try asking someone on that support list.
Could HAProxy be an explanation for these extra connections?
That depends. In theory, HAProxy is just a pass-through. For a connection to be recognized by the broker, it has to go through a handshake, which is a deliberate process and cannot happen inadvertently. Closing a connection also requires a handshake, which is where HAProxy might be the culprit. If HAProxy thinks the connection is dead and drops it without that process, it could be a contributing cause. But it is not in and of itself making these new connections.
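One thing worth checking, as an assumption rather than a diagnosis: HAProxy's idle timeouts. For long-lived AMQP connections they are commonly raised well above the defaults so the proxy doesn't silently drop a quiet but healthy connection, e.g. something like:

    # haproxy.cfg (values are illustrative)
    listen rabbitmq
        bind *:5672
        mode tcp
        timeout client 3h
        timeout server 3h
        server rmq1 10.0.0.10:5672 check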
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
I recommended that this user upgrade from Erlang 18, which has known TCP connection issues -
https://groups.google.com/d/msg/rabbitmq-users/R3700QdIVJs/taDYKI6bAgAJ
I've managed to reproduce the problem: in the end it was a bug in the way our client used RMQ connections.
It created one auto-recovery connection (which is all fine), but sometimes it also created a separate plain connection for "temporary" purposes.
The steps to reproduce my problem were:
1. Reach the memory alarm in RabbitMQ (e.g. set an easily reached RAM limit and push a lot of big messages). Connections go into the "blocking" state.
2. Start sending a message from our client over this new "temp" connection.
3. Ensure the connection is in the "blocked" state.
4. Without clearing the resource alarm, restart the RabbitMQ node.
After the restart, the "temp" connection was still there! Despite the fact that auto-recovery was not enabled for it. And it kept sending heartbeats, so the server didn't close it.
We will fix the client to always use one and only one connection, along the lines of the sketch below.
Plus, of course, we will upgrade Erlang.
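A sketch of that fix, assuming a PHP client on php-amqplib (the client's actual language and library aren't stated above; class, host, and exchange names are placeholders). The point is that "temporary" work gets a cheap channel on the one process-wide connection instead of a second connection:

    <?php
    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    final class Amqp
    {
        private static ?AMQPStreamConnection $conn = null;

        // The one and only connection for this process.
        public static function connection(): AMQPStreamConnection
        {
            if (self::$conn === null || !self::$conn->isConnected()) {
                self::$conn = new AMQPStreamConnection('rmq-host', 5672, 'guest', 'guest');
            }
            return self::$conn;
        }
    }

    // "Temp" publishing opens a channel, not a new connection:
    $channel = Amqp::connection()->channel();
    $channel->basic_publish(new AMQPMessage('payload'), 'my-exchange');
    $channel->close();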
I work on a few .NET web apps that use Redis heavily for caching along with ServiceStack's Redis client. In all cases I've got Redis running on the same machine. I've used both BasicRedisClientManager and PooledRedisClientManager (always implemented as singletons) and have had some issues with both approaches.
With BasicRedisClientManager, things would work fine for a while, but eventually Redis would start refusing connections. Using netstat we discovered that thousands of TCP connections to the default Redis port were hanging around in TIME_WAIT status.
We then switched to PooledRedisClientManager, which seemed to fix the problem immediately. However, not long after, we started noticing occasional CPU spikes that we narrowed down to thread waiting (System.Threading.Monitor.Wait calls) caused by PooledRedisClientManager.GetClient.
In code, we use a get-in-get-out approach (using ServiceStack's handy ExecAs shortcuts) so in general connections are acquired very frequently but held as briefly as possible.
We get a modest amount of traffic but we're no StackExchange, and I can't help but think the ServiceStack client is up to the job and we're just doing something wrong. Is PooledRedisClientManager the correct approach here? Would it be advisable to simply increase the pool size? Or is that likely just masking a problem with our code?
Just looking for general guidance here; I don't have specific code I need help with at this point. Thanks in advance.
Are you absolutely sure all Redis connections are being disposed?
With ServiceStack, the Redis property on Service and ViewPageBase (if you're using SS Razor) disposes itself, but any time you request a connection from the pool yourself you must dispose of it yourself.
However, despite this, we recently had issues with our pool being exhausted of all connections too. One of my colleagues discovered that there wasn't proper clean-up for Razor pages and made a pull request for it; this means that disposal has only been correct on Razor pages since ServiceStack v4.0.21. I have not checked whether that fix has been back-ported to the v3 branch.
My colleague also added a TrackingRedisClientsManager that may help you track down the improper disposal. See here.
You can also check the stats of a PooledRedisClientManager by using this helper method. We threw it on a little Razor page so we can check the stats whenever we feel it's appropriate, but you could write better code around this to monitor the pool health of specific nodes, too.
I'm running into an issue in an OS X app that creates multiple, persistent connections to the same host using NSURLConnection. I create a separate connection for each room, and it stays connected the entire time the room is open in order to consume a streaming API. When opening many rooms, it stops working correctly.
I created a separate sample app that creates 10 connections, and only 6 of them seem to be allowed to work; the others are queued. Does anyone know if there is a way to override this limit? I can't find it documented anywhere. The only workaround I've found is that the limit seems to be per hostname, so testing with both "localhost" and "127.0.0.1" allows 6 connections per host. I uploaded a sample project with client and server here - http://cl.ly/1x3K0D1F072V3U2T0C0I.
I filed a Radar for something that seems like the same issue, but on iOS. I found that you can't have more than 5 connections open at once; the connections don't even have to point to the same domain. Anything beyond that gets queued. So if you have 5 connections open to an extremely slow endpoint, no other connections will go through.
Radar: http://openradar.appspot.com/radar?id=2542401
Apple's reply:
This is the effect of our NSURLConnection connection cache. It is expected. We expect to address this type of configuration with new API.
I asked if they could give me any more information (does it vary? does the type of connection affect it?) and they said:
Unfortunately, we can't give details about the connection limit behavior.
User agents in general (Chrome, Firefox, Safari) use six simultaneous TCP connections per hostname, with potential one-offs.
You could get around this limitation by using the CFNetwork API (CFHTTPMessage).
Here is the CFNetwork Programming Guide:
https://developer.apple.com/library/mac/documentation/Networking/Conceptual/CFNetwork/Introduction/Introduction.html#//apple_ref/doc/uid/TP30001132
BTW, if you decide to use CFNetwork, you'll need to handle proxies and authentication yourself.
Hope this helps!
I have a question regarding web servers (such as nginx, Cherokee, or Oracle iPlanet) and Java containers (such as GlassFish): can we control what happens to the request if the user drops the connection before it has finished?
When a browser opens an HTTP/HTTPS connection to a server, it hits the web server (nginx, Cherokee, or Oracle iPlanet), which then reverse-proxies to the Java container (GlassFish). The Java application then executes, does quite a lot of things such as calculations, and finally needs to write to, say, 3 different databases. If it has finished writing to the 1st database, but not yet to the 2nd and 3rd, and the user closes the connection (by closing the browser window, losing the network connection, etc.), what will happen to the process?
Specifically, I would like the process to CONTINUE until it finishes executing all the code. I know one way is to spin the work off on a new thread, but that incurs extra cost. So, is there any setting/config I can use to make sure it will continue to execute even though the user has broken the connection?
With nginx, you can set proxy_ignore_client_abort on; and it will not close the connection to the backend if the client closes its connection.
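In context, that looks something like this (the location and upstream address are placeholders for your GlassFish backend):

    location /app/ {
        proxy_pass http://127.0.0.1:8080;   # GlassFish
        # Keep the upstream request running even if the browser disconnects:
        proxy_ignore_client_abort on;
    }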
I am working on a website that displays data from a DB that changes frequently (the status of a queue and a chat conversation). My current setup is Apache/PHP/MySQL. Naturally I would like to avoid polling the server every x seconds, since this does not scale well. I would like to do reverse-ajax long polling; however, I've read that Apache does not handle this well, since it quickly runs out of worker threads. There are many other web servers out there that get around this problem: nginx, Tornado, etc. However, my problem is that PHP is the ONLY server-side scripting language I know, and I've already written some PHP scripts I'd like to keep if I can. I am OK with switching servers so long as I can still use PHP.
But after doing some more research, I've read that PHP (PHP-FPM?) also creates a process for every request, which means that if I have hundreds or thousands of open connections, there will be hundreds or thousands of processes, which will be a problem as well.
Can I conclude that there's no good scalable way to build long-polling websites with PHP? Should I abandon PHP and learn another server-side scripting language? I can keep developing long polling on my current setup (Apache/PHP) for now, but I don't want the choice of scripting language to limit the scalability of my system when I deploy. So what should I do? I am not very experienced with web programming, so if any gurus out there can give me some pointers I'd appreciate it! Thank you!
PHP run in php-fpm mode will still have limitations, especially if your code eats a lot of memory: you won't be able to run thousands of parallel processes without the memory to back them. But it usually performs better than mod_php, and at least HTTP requests that don't need PHP are handled by the web server alone; if that web server is nginx, you'll be able to serve a lot more HTTP requests in parallel.
With php-fpm you also get a queue of waiting requests, which may be useful in case of a temporary traffic spike: at least requests are queued rather than rejected. The relevant knobs look something like the pool settings below.
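A sketch of the php-fpm pool settings involved (values are illustrative, not recommendations):

    ; /etc/php-fpm.d/www.conf
    pm = dynamic
    pm.max_children = 50     ; hard cap on parallel PHP processes
    pm.start_servers = 10
    pm.min_spare_servers = 5
    pm.max_spare_servers = 15
    listen.backlog = 511     ; pending requests queue here while workers are busy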
Now, long-polling connections are fine for nginx (or others; that's just one example), but not for PHP. PHP is not built to be a long-running server: each request is a new process, so it's really not the right choice for anything long-held. But divide ut regnes (divide and rule): your long-polling tasks can run alongside your PHP application, without being part of it.
As an example, look at the Jappix project, which is a PHP project. But you need to put an XMPP server somewhere (like ejabberd), plus a BOSH server, with nginx as a proxy on port 80 in front of that BOSH server (so you get the XMPP chat protocol on port 80, via nginx and ejabberd, with nothing on the PHP side for that). The remaining problem is wiring up your application's authentication, identification, and so on, and this has to be done by extending the XMPP server's configuration (so that it uses the same LDAP server as your PHP app, for example).
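The proxy part of that setup might look like this, assuming ejabberd's BOSH endpoint on its default port 5280 (paths and ports are placeholders if your setup differs):

    location /http-bind {
        proxy_pass http://127.0.0.1:5280/http-bind;
        proxy_buffering off;       # don't delay the long-held responses
        proxy_read_timeout 120s;   # longer than the BOSH wait interval
    }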
Your second long-polling problem is the status of a queue. You may find XMPP extensions for that; or you may perform regular ajax queries on the queue. One useful technique to keep the number of ajax requests hitting your PHP application down is to reschedule the next check from the callback of the current one, stretching the interval out along the Fibonacci numbers (as an example; see the sketch below). So the next ajax call is scheduled 1 minute after the page load, then 2 minutes after that, then 3m, 5m, 8m, 13m, 21m, 34m, 55m, 89m, 144m, etc. The idea is that checking for new messages one minute after a page load probably matters; but as the user sits on the same page (or drinks a coffee, talks to a friend, goes on holiday without switching off their computer, etc.), we can delay the next check more and more, on the assumption that the user is no longer really active. Note that you could detect user activity by other means and reset the schedule.
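A minimal PHP sketch of the server side of that scheme (getQueueStatus() is a hypothetical stand-in for your own queue lookup). The client echoes back the step counter it received, and each response tells it how long to wait before the next poll:

    <?php
    // Delay before the next check, in seconds, following 1, 2, 3, 5, 8, ... minutes.
    function nextDelay(int $step): int
    {
        $a = 1;
        $b = 2;
        for ($i = 0; $i < $step; $i++) {
            [$a, $b] = [$b, $a + $b];
        }
        return $a * 60;
    }

    $step = isset($_GET['step']) ? (int)$_GET['step'] : 0;

    header('Content-Type: application/json');
    echo json_encode([
        'status'     => getQueueStatus(),   // hypothetical helper
        'next_step'  => $step + 1,
        'next_check' => nextDelay($step),   // client waits this long before re-polling
    ]);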
PHP is not right for long polling, Comet, and reverse-ajax techniques. You should use Node.js.