Redis connection/buffer-size limit exceeded - redis

While stress testing our application server, we got the following exception from Redis:
ServiceStack.Redis.RedisException: could not connect to redis Instance at redis-host:6379 ---> System.Net.Sockets.SocketException: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full redis-host:6379
at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
at System.Net.Sockets.Socket.Connect(String host, Int32 port)
at ServiceStack.Redis.RedisNativeClient.Connect()
--- End of inner exception stack trace ---
at ServiceStack.Redis.RedisNativeClient.Connect()
at ServiceStack.Redis.RedisNativeClient.AssertConnectedSocket()
at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs)
at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs)
at ServiceStack.Redis.RedisClient.GetValueFromHash(String hashId, String key)
at ServiceStack.Redis.Generic.RedisTypedClient`1.GetValueFromHash[TKey](IRedisHash`2 hash, TKey key)
It seems that a connection limit is being exceeded on the Redis host port. Any idea how to increase this threshold through redis.conf or server configuration? We have hosted the Redis instance on an Ubuntu server.

I was able to reproduce the same buffer-size-limit issue using ServiceStack. The code for the stress test is here - run 20 instances of the application for at least 20 minutes. https://github.com/ServiceStack/ServiceStack.Redis/commit/b01582f9c873f375794c04d46aad400590ca5bf3
The first error you may see is "Could not connect to redis Instance", as described in "Redis unable to connect in busy load", but if you expand the inner exception you see "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full".
My problem occurred on Windows 7, but not on Windows Server 2008 R2, so I began to look at whether it was an OS problem. After emailing Demis at ServiceStack, it was concluded that ServiceStack was closing the sockets correctly. Looking at the OS, the problem was fixed by setting TcpTimedWaitDelay and MaxUserPort.
More references: set TcpTimedWaitDelay to 45 seconds and raise MaxUserPort; see http://mashijie.blogspot.com/2009/05/change-default-setting-of-tcp-ports.html
I adjusted the dynamic port range to 1025-64511.
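For reference, here is a sketch of what those two OS-level changes look like; the values are just the ones that worked for me, and on Windows 7 and later the ephemeral port range is normally widened with netsh rather than the older MaxUserPort registry value (which applies to XP/2003). Run from an elevated command prompt and reboot for the registry change to take effect:

rem Shorten how long closed sockets linger in TIME_WAIT (value in seconds)
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 45 /f

rem Windows XP / Server 2003: raise the highest usable ephemeral port
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 64511 /f

rem Windows Vista and later: widen the dynamic port range instead (1025-64511 here)
netsh int ipv4 set dynamicport tcp start=1025 num=63487

For completeness on the server side: the connection ceiling in redis.conf is maxclients (the Redis default is 10000), bounded in practice by the redis process's open-file limit, but in this case the limit being hit was on the Windows client, not on the Ubuntu server.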

Related

Found "Thread Exhausted" in Gemfire Server Log

I checked the GemFire server log and found the following statements in my log file:
Rejected connection from Server connection from [client host address=XXX.XXX.XXX.XX; client port=XXXX1] because incoming request was rejected by pool possibly due to thread exhaustion
Rejected connection from Server connection from [client host address=XXX.XXX.XXX.XX; client port=XXXX2] because incoming request was rejected by pool possibly due to thread exhaustion
....
What are the possible causes? How do I find the root cause?
I am using GemFire 9.8.6, and most of the regions are replicated. Clients are connected to the server regions through a caching proxy via Spring Data GemFire.
gemfire.properties [Server]
Based on the cache server log file, I found that my handshaker max pool size is 4, with max-connections=800 and max-threads=0:
Handshaker max Pool size: 4
CacheServer Configuration: port=51XX max-connections=800 max-threads=0 notify-by-subscription=true socket-buffer-size=1250000
On Red Hat, I changed the file-descriptor soft limit to 8192 and the hard limit to 81920, and the number-of-processes (nproc) soft limit to 501408, with an unlimited hard limit.
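For reference, I set these in /etc/security/limits.conf along these lines (the gemfire user name is just a placeholder for whatever account runs the server):

gemfire  soft  nofile  8192
gemfire  hard  nofile  81920
gemfire  soft  nproc   501408
gemfire  hard  nproc   unlimited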
Total number of servers: 2
Total number of locators: 2
Total number of clients: 15
Thank you for your help
This message is generally logged by the GemFire server whenever it doesn't have enough resources to handle the number of incoming requests. I'd suggest having a look at Fine-Tuning Your Client/Server Configuration and Making Sure You Have Enough Sockets.
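As an illustration only (the port and numbers here are hypothetical, not recommendations), those settings live on the cache-server definition, e.g. in cache.xml; max-threads=0 means the server dedicates one thread per client connection, so bounding the thread pool and/or raising max-connections are the usual knobs those guides discuss:

<!-- cache.xml: illustrative values only -->
<cache-server port="40404"
              max-connections="1600"
              max-threads="16"
              notify-by-subscription="true"
              socket-buffer-size="1250000"/>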
Hope this helps. Cheers.

How to stop client from reconnecting to server when the server is down?

How can we stop a client from reconnecting to the server after some retries?
In our case (an in-memory DB for fast retrieval), we use Ignite and Oracle in parallel, so that if the Ignite server is down I can still get my data from Oracle.
But when I start my application (while the Ignite server node is down for some reason), it waits indefinitely until it can connect to the server.
Console message:
Failed to connect to any address from IP finder (will retry to join topology every 2000 ms; change 'reconnectDelay' to configure the frequency of retries):
There is a TcpDiscoverySpi.joinTimeout property, which does exactly what you want: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html#setJoinTimeout-long-
By default it is not defined, so the node will try to reconnect endlessly.
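A minimal sketch of wiring that up (the 10-second timeout and the Oracle fallback comment are placeholders, not recommendations):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
discoSpi.setJoinTimeout(10_000);   // stop retrying after ~10 s instead of looping forever

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true);           // this node only consumes data from the cluster
cfg.setDiscoverySpi(discoSpi);

Ignite ignite = null;
try {
    ignite = Ignition.start(cfg);
} catch (IgniteException e) {
    // Ignite cluster unreachable within the join timeout: fall back to Oracle here
}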

Network issues and Redis PubSub

I am using ServiceStack 5.0.2 and Redis 3.2.100 on Windows.
I have got several nodes with active Pub/Sub Subscription and a few Pub's per second.
I noticed that if the Redis service restarts while there is no physical network connection (so one of the clients cannot reach the Redis service), that client stops receiving any messages after the network recovers. Let's call it a "zombie subscriber": it thinks it is still operational but never actually receives a message; the client believes it still has a connection, while the same connection on the server has been closed.
The problem is no exception is thrown in RedisSubscription.SubscribeToChannels, so I am not able to detect the issue in order to resubscribe.
I have also analyzed RedisPubSubServer and I think I have discovered the problem. In the described case RedisPubSubServer tries to restart (sending the stop command CTRL), but the "zombie subscriber" never receives it, so no resubscription is made.

Jedis pool configuration for get and set/flush operations

I am new to Redis and am using the Jedis client in my application. I have gone through a couple of threads and did not find a conclusive answer.
I have 2 questions where I need clarity.
For production use I want to set separate timeouts for Jedis get and set operations: 2000 ms for all set operations and 100 ms for gets. I have implemented the configuration below.
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxIdle(30);
poolConfig.setMinIdle(10);
poolConfig.setMaxWaitMillis(2000);
// the single 100 ms argument here is used as both the connect timeout and the socket read timeout
jedisPool = new JedisPool(poolConfig, RedisDBUrl, port, 100);
Let me know if the above configuration will do the job; I am setting the read timeout to 100 ms and maxWaitMillis to 2000 ms.
Let me know if my understanding is correct.
At times I get JedisConnectionException: java.net.SocketTimeoutException: Read timed out or sometimes connect timeout.
Is the connect timeout thrown when my application is not able to make a connection to Redis within the configured time?
Secondly, does the read timeout occur when the application is connected to Redis but operations (get/set) are taking too long, or for some other reason?
Lastly, how do I configure the read timeout and connect timeout separately?
After much trial and error and some test runs, I found that you cannot set separate timeouts for Jedis get and set operations.
Maybe you can use an external library to achieve this, like Google's SimpleTimeLimiter; a sketch follows.
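For example (a rough sketch against Guava 23+; jedis is assumed to be a client you have already borrowed from your pool):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import com.google.common.util.concurrent.SimpleTimeLimiter;
import com.google.common.util.concurrent.TimeLimiter;

ExecutorService executor = Executors.newCachedThreadPool();
TimeLimiter limiter = SimpleTimeLimiter.create(executor);

try {
    // cap only this GET at 100 ms while the pool/socket timeout stays at 2000 ms
    String value = limiter.callWithTimeout(() -> jedis.get("some-key"), 100, TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
    // the GET took longer than 100 ms
} catch (Exception e) {
    // interrupted, or the underlying Jedis call itself failed
}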
Further, from what I have observed, the connect timeout occurs when Jedis tries to connect to the Redis server. In my case the latency from my system to the Redis server is ~120-125 ms, so if I set timeout=100ms in the Jedis constructor I get a "connect timed out".
A "read timed out", on the other hand, occurs when you are connected to the Redis server but a Redis operation doesn't return within the specified time. To test this scenario I set the timeout in the constructor to 180 ms and ran the FLUSHALL operation (which takes a long time); there I got a read timeout.
I am still not sure, though, what the significance of poolConfig.setMaxWaitMillis() is.
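For what it's worth, maxWaitMillis is not a Redis timeout at all: it is how long a caller blocks waiting to borrow a free connection when the pool is exhausted. The connect and read (socket) timeouts can be set independently through one of the longer JedisPool constructors; a sketch, assuming a reasonably recent Jedis version (the null/0 arguments mean no password, database 0 and no client name):

JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxIdle(30);
poolConfig.setMinIdle(10);
poolConfig.setMaxWaitMillis(2000);   // wait up to 2 s to borrow a free connection from the pool

int connectionTimeout = 2000;        // ms allowed to establish the TCP connection
int soTimeout = 100;                 // ms allowed for a reply on the socket (read timeout)

jedisPool = new JedisPool(poolConfig, RedisDBUrl, port,
        connectionTimeout, soTimeout, null, 0, null);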

Jedis connection settings for high performance and reliability

I am using the Jedis client to connect to my Redis server. The following are the settings I'm using to connect with Jedis (using Apache Commons Pool):
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setTestOnBorrow(true);
poolConfig.setTestOnReturn(true);
poolConfig.setMaxIdle(400);
// Tests whether connections are dead during idle periods
poolConfig.setTestWhileIdle(true);
poolConfig.setMaxTotal(400);
// configure a generous max wait so that pool-borrow timeouts don't occur
poolConfig.setMaxWaitMillis(120000);
So far with these settings I'm not facing any reliability issues (I can always get a Jedis connection whenever I want), but I am seeing a certain lag in Jedis performance.
Can anyone suggest some further optimizations for achieving high performance?
You have 3 tests configured:
TestOnBorrow - Sends a PING request when you ask for the resource.
TestOnReturn - Sends a PING when you return a resource to the pool.
TestWhileIdle - Sends periodic PINGs from idle resources in the pool.
While it is nice to know your connections are still alive, those onBorrow PING requests are wasting an RTT before your request, and the other two tests are wasting valuable Redis resources. In theory, a connection can go bad even after the PING test so you should catch a connection exception in your code and deal with it even if you send a PING. If your network is stable, and you do not have too many drops, you should remove those tests and handle this scenario in your exception catches only.
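A minimal sketch of that catch-and-handle approach (jedisPool and the single retry are assumptions, not a prescription; recent Jedis versions discard a broken connection when it is closed):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;

public String getWithRetry(String key) {
    try (Jedis jedis = jedisPool.getResource()) {
        return jedis.get(key);
    } catch (JedisConnectionException e) {
        // the borrowed connection was dead; borrow a fresh one and retry once
        try (Jedis jedis = jedisPool.getResource()) {
            return jedis.get(key);
        }
    }
}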
Also, by setting MaxIdle == MaxTotal, there will be no eviction of resources from your pool (good or bad? it depends on your usage). And when your pool is exhausted, an attempt to get a resource will end up timing out after 2 minutes of waiting for a free resource.
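As a starting point, here is a sketch of a leaner configuration along those lines; the numbers are placeholders to tune against your own workload:

JedisPoolConfig poolConfig = new JedisPoolConfig();
// skip the extra PING round-trips; handle broken connections in your exception handling instead
poolConfig.setTestOnBorrow(false);
poolConfig.setTestOnReturn(false);
// keep a light background check so dead idle connections are eventually evicted
poolConfig.setTestWhileIdle(true);
poolConfig.setTimeBetweenEvictionRunsMillis(30000);
poolConfig.setMaxTotal(400);
poolConfig.setMaxIdle(100);        // below maxTotal so surplus idle connections can be reclaimed
poolConfig.setMinIdle(16);
poolConfig.setMaxWaitMillis(5000); // fail fast instead of queueing for two minutes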