How best to manage Redis connections using ServiceStack? - redis

I work on a few .NET web apps that use Redis heavily for caching along with ServiceStack's Redis client. In all cases I've got Redis running on the same machine. I've used both BasicRedisClientManager and PooledRedisClientManager (always implemented as singletons) and have had some issues with both approaches.
With BasicRedisClientManager, things would work fine for a while, but eventually Redis would start refusing connections. Using netstat we discovered that thousands of TCP connections to the default Redis port were hanging around in TIME_WAIT status.
We then switched to PooledRedisClientManager, which seemed to fix the problem immediately. However, not long after, we started noticing occasional CPU spikes that we narrowed down to thread waiting (System.Threading.Monitor.Wait calls) caused by PooledRedisClientManager.GetClient.
In code, we use a get-in-get-out approach (using ServiceStack's handy ExecAs shortcuts) so in general connections are acquired very frequently but held as briefly as possible.
We get a modest amount of traffic but we're no StackExchange, and I can't help but think the ServiceStack client is up to the job and we're just doing something wrong. Is PooledRedisClientManager the correct approach here? Would it be advisable to simply increase the pool size? Or is that likely just masking a problem with our code?
Just looking for general guidance here, I don't have specific code I need help with at this point. Thanks in advance.

Are you absolutely sure all Redis connections are being disposed?
With ServiceStack, the Redis property on Service and ViewPageBase (if you're using SS Razor) disposes itself, but any time you request a connection from the pool yourself, you must dispose of it yourself.
However, despite this, we recently had issues with our pool being exhausted of all connections, too. One of my colleagues discovered that there wasn't proper cleanup for Razor pages and made a pull request here. This means that there has only been correct disposal on Razor pages since ServiceStack v4.0.21. I have not checked whether that fix has been back-ported to the v3 branch.
My colleague also added TrackingRedisClientsManager that may help you track down the improper disposal. See here
You can also check the stats of a PooledRedisClientManager by using this helper method. We threw it on a little Razor page to check the stats whenever we feel the need, but you could write better code around this to monitor the pool health of specific nodes, too.
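To illustrate why a single undisposed client matters, here is a minimal sketch of a bounded pool. It is written in Python and is not ServiceStack's API: `BoundedPool`, `get_client`, `release`, `leaky_use` and `safe_use` are all hypothetical names invented for the example. It only shows the general mechanism: every connection that is taken and never returned shrinks the pool, and once the pool is empty the next caller waits until it times out.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical stand-in for a pooled client manager (not ServiceStack's API)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def get_client(self, timeout=0.1):
        # Waits until a connection is free; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def free_count(self):
        return self._free.qsize()

    @contextmanager
    def client(self, timeout=0.1):
        # Get-in-get-out: the connection is returned even if the body raises.
        conn = self.get_client(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


def leaky_use(pool):
    # Takes a connection and never gives it back: one is lost per call.
    pool.get_client()


def safe_use(pool):
    # The context manager plays the role of a using/Dispose block.
    with pool.client() as conn:
        return conn
```

With a pool of two connections, any number of `safe_use` calls leaves both connections free, while two `leaky_use` calls are enough to make the next `get_client` wait and then fail.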

Related

Redis connection settings for app "surviving" redis connectivity issues

I'm using azure redis cache for certain performance monitoring services. Basically when events like page loads, etc occur, I send a fire and forget command to redis to record the event. My goal is for my app to function fine whether or not it can contact the redis server. I'm looking for a best practice for this scenario. I would be OK with losing some events if necessary. I've been finding that even though I'm using fire and forget, the app staggers when the web server runs into high latency or connectivity issues with the server.
I'm using StackExchange.Redis. Any best practice configuration options/programming practices for this scenario?
The way I was implementing a singleton pattern on the connection turned out to be blocking requests. Once I fixed this my app behaves as I want (e.g. it still functions when redis connection dies).
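For anyone after a general pattern, one way to keep request threads from stalling is to hand events to a bounded in-memory buffer and drop them when it is full. The sketch below is in Python and does not use StackExchange.Redis; `EventRecorder`, `record`, `flush` and the `send` callable are hypothetical names, and `send` stands for whatever call actually writes to the cache.

```python
import queue
import threading


class EventRecorder:
    """Hypothetical helper: buffers events and sends them from a background thread."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # callable that talks to the cache; may be slow or fail
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, event):
        # Never blocks the caller: if the buffer is full, the event is dropped.
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            event = self._pending.get()
            try:
                self._send(event)
            except Exception:
                pass  # losing an event is acceptable in this scenario
            finally:
                self._pending.task_done()

    def flush(self):
        # For tests and shutdown only: wait until the buffer has drained.
        self._pending.join()
```

The trade-off is deliberate: when the cache is slow or unreachable, events are lost (and counted in `dropped`) rather than making page requests wait.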

The Node.js event loop - nginx/apache

Both nginx and Node.js have event loops to handle requests. I put nginx in front of Node.js as has been recommended here
Using Node.js only vs. using Node.js with Apache/Nginx
with the setup shown here
Node.js + Nginx - What now?
How do the two event loops play together? Is there any risk of conflicts between the two? I wonder because Nginx may not be able to handle as many events per second as Node.js or vice versa. For example, if Nginx can handle 1000 events per second but node.js only 500, won't that cause issues? (I have no idea if 1000,500 are reasonable orders of magnitude, you could correct me on that.)
What about putting Apache in front of Node.js? Apache has no event loop. Just threads. So won't putting Apache in front of Node.js defeat the purpose?
In this 2010 talk, Node.js creator Ryan Dahl had a vision to get rid of nginx/apache/whatever entirely and make node talk directly to the internet. When do you think this will be reality?
Both nginx and Node use an asynchronous and event-driven approach. The communication between them will go more or less like this:
nginx receives a request
nginx forwards the request to the Node process and immediately goes back to wait for more requests
Node receives the request from nginx
Node handles the request with minimal CPU usage, until at some point it needs to issue one or more I/O requests (read from a database, write the response, etc). At this point it launches all these I/O requests and goes back to wait for more requests.
The above can repeat lots of times. You could have hundreds of thousands of requests all in a non-blocking wait state where nginx is waiting for Node and Node is waiting for I/O. And while this happens both nginx and Node are ready to accept even more requests!
Eventually async I/O started by the Node process will complete and a callback function will get invoked.
If there are still I/O requests that haven't completed for this request, then Node goes back to its loop one more time. It can also happen that once an I/O operation completes this data is consumed by the Node callback and then new I/O needs to happen, so Node can start more async I/O requests before going back to the loop.
Eventually all I/O operations started by Node for a particular request will be complete, including those that write the response back to nginx. So Node ends this request, and then as always goes back to its loop.
nginx receives an event indicating that response data has arrived for a request, so it takes that data and writes it back to the client, once again in a non-blocking fashion. When the response has been written to the client, an event will trigger and nginx will then end the request.
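The steps above can be sketched with any event loop. The example below uses Python's asyncio purely as a stand-in (it is neither nginx nor Node), and `handle_request`, `serve` and `timed_run` are hypothetical names. Each simulated request spends its time waiting on I/O, represented here by `asyncio.sleep`, so a single thread can hold many requests in a waiting state at once.

```python
import asyncio
import time


async def handle_request(request_id, io_delay):
    # Stand-in for a database or network call: the coroutine yields to the
    # event loop here instead of blocking a thread.
    await asyncio.sleep(io_delay)
    return request_id


async def serve(n_requests, io_delay):
    # Start every request, then wait for all of them to complete.
    tasks = [handle_request(i, io_delay) for i in range(n_requests)]
    return await asyncio.gather(*tasks)


def timed_run(n_requests=1000, io_delay=0.1):
    start = time.perf_counter()
    results = asyncio.run(serve(n_requests, io_delay))
    return results, time.perf_counter() - start
```

Handled one after another, 1000 requests with a 0.1 second wait each would need about 100 seconds. Because the waits overlap, the whole batch should finish in a small fraction of that.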
You are asking about what would happen if nginx and Node can handle a different number of maximum connections. They really don't have a maximum; the maximum in general comes from operating system configuration, for example from the maximum number of open handles the system can have at a time, or from the CPU throughput. So your question does not really apply. If the system is configured correctly and all processes are I/O bound, neither nginx nor Node will ever block.
Putting Apache in front of Node will only work well if you can guarantee that your Apache never blocks (i.e. it never reaches its maximum connection limit). This is hard/impossible to achieve for a large number of connections, because Apache uses an individual process or thread for each connection. nginx and Node scale really well; Apache does not.
Running Node without another server in front works fine and it should be okay for small/medium load sites. The reason putting a web server in front of it is preferred is that web servers like nginx come with features that Node does not have and you would need to implement yourself. Things like caching, load balancing, running multiple apps from the same server, etc.
I think your questions have been largely covered by some of the other answers, but there are a few pieces missing, and some that I disagree with, so here are mine:
The event loops are isolated from each other at the process level, but do interact. The issues you're most likely to encounter are around the configuration of nginx response buffers, chunked data, etc. but this is optimisation rather than error resolution.
As you point out, if you use Apache you're nullifying the benefit of using Node.js, i.e. massive concurrency and websockets. I wouldn't recommend doing that.
People are already using Node.js at the front of their stack. Searching for benchmarks returns some reasonable-looking results in Node's favour, so performance to my mind isn't an issue. However, there are still reasons to put Nginx in front of Node.
Security - Node has been given increasing scrutiny, but it's still young. You may not have problems here, but caution is often your friend.
Training - Ops staff that you hire will know how to manage Nginx, but the configuration and management of your custom Node app will only ever be understood by those people your developers successfully communicate it to. In some companies this is nobody.
Operational Flexibility - If you reach scale you might want to split out the serving of static content, purely to reduce the load on your app servers. You might want to split content amongst different domains and have it managed separately, or have different SSL or proxying behaviour for different domains or URL patterns. These are the things that are easy for Ops guys to configure in Nginx, but you'd have to code manually in a Node app.
The event loops are independent. Event loops are implemented at the application level, so neither cares what sort of architecture the other uses.
NodeJS is good at many things, but there are some places where it still falters. One example is serving static files. At the moment, nodejs performs fairly poorly in this test, so having a dedicated web server for your static files greatly improves response time. Also, nodejs is still in its infancy, and has not been "tested and hardened" in matters of security like Apache or nginx.
It'll take a long time for people to consider fronting nodejs all by itself. The cluster module is a step in the right direction, but it'll take a long time even after it reaches v1 before it happens.
Both event loops are unrelated. They don't play together.
Yes, it is pretty useless. Apache is not a load balancer.
What Ryan Dahl said may be applicable already. The limit of concurrent users is definitely higher than that of Apache. Before node.js, websites with a fair number of concurrent users had to use nginx to balance the load. For small to medium sized businesses it can be done with node.js alone. But ruling out nginx completely will take time. Let node.js become stable before it follows this ambitious dream.

WCF client proxy keep alive?

Is there any disadvantage to creating a WCF client in code every time a call is needed? Currently I have a static class that creates a client and reuses it for a period of time (a couple of minutes, before the WCF service times out).
I'm having problems with it getting into a faulted state while I'm in development because I keep recompiling the WCF code. It's an annoyance now, but I think it'll be fine in production.
But... creating a client proxy with user creds every time a call is made... bad practice? Performance issues?
As far as I know there is no performance penalty, and this is a good way of doing it, i.e. create a client proxy each time you need it.
And each time you're done with it, it is a recommended best practice to always close the proxy. Closing the proxy releases the connection held toward the service, which is particularly important to do in the presence of a transport session. It also helps ensure the threshold for the maximum number of connections on the client’s machine is not reached. Closing the proxy terminates the session with the service instance.
I think the best answer is a little of both.
There is definitely a performance hit when creating a proxy client for every call. If you can, create a proxy client, use it for all the calls you're going to make immediately, then dispose of it. It is much faster.

WCF client application hang -- need repro advice

I have a WCF application with a couple thousand clients connecting to a pair of services running under IIS. What I've noticed is that some of these clients get into a hung state, and I'm trying to reproduce this.
When this problem was first noticed, I had not modified the throttling configuration and the services were set to ConcurrencyMode.Single. One thing I noticed was that an IISReset on the server caused many clients to hang. Yet pulling this same stunt on the client running against IIS on my local machine doesn't seem to cause the problem.
I caught this only once in the wild, but didn't have debugging enabled at the time. The symptom I witnessed was that the client appeared to be trying to open a connection to the web server, but did not succeed. While monitoring with Fiddler, I saw no attempt to reach the service endpoint. Obviously that makes me suspect the client proxy.
I have a very solid hunch as to what's happening -- namely I've been using "Close()" instead of "Abort()" when the service throws an exception, which I believe is causing the channels to become corrupted. But considering the effort to get a new version out there, I need to reproduce this problem by causing a client on my own machine to hang before I can start making changes to the code.
Where should I start?
Thanks in advance,
roufamatic
Have you got any logging turned on? This could help in diagnosing the problem. It can be done completely in config, so no need to build a new version. Use the Service Configuration Editor tool to set it all up. The Visual Studio 2008 Training Kit has a good tutorial on how to use logging and the log viewer.
I suppose this was too vague a question though I was mostly curious what people might suggest. As it turns out there was a nontrivial difference between my workstation and a production environment that, once resolved, allowed me to see the problem. In this case, somehow using Fiddler to watch the traffic actually prevented the error from occurring! Now to ask another question.

Too many TIME_WAIT connections

We have a fairly busy website (1 million page views/day) using Apache mod proxy that keeps getting overloaded with connections (>1,000) in the TIME_WAIT state. The connections are to port 3306 (mysql), but mysql only shows a few connections (show process list) and is performing fine.
We have tried changing a bunch of things (keep alive on/off), but nothing seems to help. All other system resources are within reasonable range.
I've searched around, and the results seem to point to changing the tcp_time_wait_interval. But that seems a bit drastic. I've worked on busy websites before, but never had this problem.
Any suggestions?
Each time_wait connection is a connection that has been closed.
You're probably connecting to mysql, issuing a query, then disconnecting. Repeat for each query on the page. Consider using a connection pooling tool, or at very least, a global variable that holds on to your database connection. If you use a global, you'll have to close the connection at the end of the page. Hopefully you have someplace common you can put that, like a footer include.
As a bonus, you should get a faster page load. MySQL is quick to connect, but not having to re-connect is even faster.
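A rough sketch of the difference, in Python rather than PHP and against a fake driver instead of MySQL: `CountingDriver`, `render_page_connect_per_query` and `render_page_shared_connection` are hypothetical names, and the driver only counts how many connections a page opens. With a real database, each of those connections is a TCP connection that passes through TIME_WAIT after it is closed.

```python
class _Connection:
    """Fake connection used only for counting; it does not touch the network."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        if self.closed:
            raise RuntimeError("connection is closed")
        return f"result of {sql}"

    def close(self):
        self.closed = True


class CountingDriver:
    """Hypothetical database driver that records how many connections were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return _Connection()


def render_page_connect_per_query(driver, queries):
    # Connect, query, disconnect, repeated for every query on the page.
    results = []
    for sql in queries:
        conn = driver.connect()
        try:
            results.append(conn.query(sql))
        finally:
            conn.close()
    return results


def render_page_shared_connection(driver, queries):
    # One connection for the whole page, closed once at the end.
    conn = driver.connect()
    try:
        return [conn.query(sql) for sql in queries]
    finally:
        conn.close()
```

For a page that runs three queries, the first function opens three connections and the second opens one. A connection pool goes a step further by keeping connections open across pages.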
If your client applications are using JDBC, you might be hitting this bug:
http://bugs.mysql.com/bug.php?id=56979
I believe that php has the same problem
Cheers,
Gilles.
We had a similar problem, where our web servers all froze up because our php was making connections to a mysql server that was set up to do reverse host lookups on incoming connections.
When things were slow it worked fine, but under load the response times shot through the roof and all the apache servers got stuck in time_wait.
The way we figured the problem out was by using xdebug to create profiling data on the scripts under high load, and looking at that. The mysql_connect calls took up 80-90% of the execution time.