Jedis is running out of resource instances, web application blocks - redis

I'm running a Tomcat application which uses Jedis to access a Redis database. From time to time the whole application blocks. By monitoring Tomcat with JavaMelody I found out that the problem seems to be related to the JedisPool, when an object requests a Jedis instance:
catalina-exec-74
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1104)
redis.clients.util.Pool.getResource(Pool.java:20)
....
This is the JedisPoolConfig I'm using
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxActive(20);
poolConfig.setTestOnBorrow(true);
poolConfig.setTestOnReturn(true);
poolConfig.setMaxIdle(5);
poolConfig.setMinIdle(5);
poolConfig.setTestWhileIdle(true);
poolConfig.setNumTestsPerEvictionRun(10);
poolConfig.setTimeBetweenEvictionRunsMillis(10000);
jedisPool = new JedisPool(poolConfig, "localhost");
So obviously some threads try to get a Jedis instance, but the pool is empty and cannot return one, so the default pool behavior is to wait.
I've already double-checked my whole code and I'm pretty sure I return every Jedis instance I use back to the pool, so I'm not sure why I'm running out of instances.
Is there a way to check how many instances are left in the pool? I'm trying to find a sensible value for the maxActive parameter to prevent the application from blocking.
Are there any other ways to leak connections besides not returning Jedis instances to the pool?

Returning the resource to the pool is important, so remember to do it. Otherwise, when shutting down your app, it will wait for the resource to be returned.
https://groups.google.com/forum/?fromgroups=#!topic/jedis_redis/UeOhezUd1tQ
After each Jedis method call, return the resource to the pool. Your app has probably used up all the pooled connections and is waiting for one to be returned; that would cause the behavior you're describing, with the app blocked.
Jedis jedis = JedisFactory.jedisPool.getResource();
try {
    jedis.set("key", "val");
} finally {
    JedisFactory.jedisPool.returnResource(jedis);
}
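In recent Jedis versions, Jedis implements Closeable and close() returns the connection to its pool, so try-with-resources can replace the try/finally. The mechanics can be sketched with a hypothetical stand-in pool (FakePool and PooledConn are illustrative names, not Jedis's API):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TryWithResourcesDemo {
    // Hypothetical stand-in for a connection pool with a single idle connection.
    static class FakePool {
        final Deque<PooledConn> idle = new ArrayDeque<>();
        FakePool() { idle.push(new PooledConn(this)); }
        PooledConn getResource() { return idle.pop(); }
    }

    // Stand-in for a pooled connection; close() hands it back to the pool.
    static class PooledConn implements AutoCloseable {
        final FakePool pool;
        PooledConn(FakePool pool) { this.pool = pool; }
        void set(String k, String v) { /* would talk to Redis here */ }
        @Override public void close() { pool.idle.push(this); } // back to the pool
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool();
        // try-with-resources returns the connection even if the body throws
        try (PooledConn conn = pool.getResource()) {
            conn.set("key", "val");
        }
        System.out.println(pool.idle.size()); // the connection came back
    }
}
```

The same shape applies to the real thing: `try (Jedis jedis = jedisPool.getResource()) { ... }` returns the connection to the pool even when the body throws.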

Partial answer to hopefully be of some help to people in similar scenarios, though I'm unsure if my problem was the same as yours (if you've since figured it out, please let us know!).
I've already double-checked my whole code and I'm pretty sure I return every Jedis instance I use back to the pool, so I'm not sure why I'm running out of instances.
I thought I had (I always put my code in try / finally blocks), but it turns out I did have a leak:
Jedis jedis = DBHelper.pool.getResource();
try {
    // Next line causes the leak: the first resource is never returned
    jedis = DBHelper.pool.getResource();
    ...
} finally {
    DBHelper.pool.returnResource(jedis);
}
No idea how that second call snuck in, but it did and caused both a leak and the web app to block.
Is there a way to check how many instances are left in the pool? I'm trying to find a sensible value for the maxActive parameter to prevent the application from blocking.
In my case, I both found the bug and tuned the pool based on the number of clients seen by the Redis server. I set the loglevel (in redis.conf) to verbose (the default is notice), which reports the number of connected clients roughly every 5-10 seconds. Once I found my leak, I repeatedly sent requests to the page calling that method and watched the client count reported in the Redis logs climb but never drop. I would think that would be a good start for optimizing.
Anyways, hope this is helpful to someone!

When you use a Jedis pool, every time you get a resource with getResource() you have to give it back with returnResource(). And if there are more threads than resources, you will get thread contention. I found it much simpler to have one Jedis connection per thread using Java's ThreadLocal: for each thread, check whether a Jedis connection already exists; if yes, use it, otherwise create one for the running thread. This ensures there isn't any lock contention or error condition to look after.
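The per-thread idea can be sketched with the JDK's ThreadLocal; Conn below is a hypothetical stand-in for a Jedis connection, with a counter showing that each thread builds exactly one:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalConnDemo {
    // Counts how many connections get created across all threads.
    static final AtomicInteger created = new AtomicInteger();

    // Hypothetical stand-in for a Jedis connection.
    static class Conn {
        Conn() { created.incrementAndGet(); }
        String ping() { return "PONG"; }
    }

    // One connection per thread, created lazily on first use.
    static final ThreadLocal<Conn> conn = ThreadLocal.withInitial(Conn::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // Repeated calls within the same thread reuse the same Conn instance.
            conn.get().ping();
            conn.get().ping();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(created.get()); // one Conn per thread
    }
}
```

One caveat of this design: a per-thread connection lives as long as its thread, so with many threads you trade pool contention for a higher total connection count, and connections must be cleaned up when threads exit.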

Related

Unexplained latency with ValueOperations using Jedis

We have Spring Boot web services hosted on AWS. They make frequent calls to a Redis Cluster cache using Jedis.
During load testing, we're seeing increased latency around ValueOperations that we're having trouble figuring out.
The method we've zoomed in on does two operations, a get followed by an expire.
public MyObject get(String key) {
    var obj = (MyObject) valueOps.get(key);
    if (obj != null) {
        valueOps.getOperations().expire(key, TIMEOUT_S, TimeUnit.SECONDS);
    }
    return obj;
}
Taking measurements on our environment, we see that it takes 200ms to call "valueOps.get" and another 160ms calling "expire", which isn't an acceptable amount of latency.
We've investigated these leads:
Thread contention. We don't currently suspect this. To test, we configured our JedisConnectionFactory with a JedisPoolConfig that has blockWhenExhausted=true and maxWaitMs=100, which, if I understand correctly, means that if the connection pool is empty, a thread will block for up to 100ms waiting for a connection to be released before failing. We had 0 failures running a load test with these settings.
Slow deserializer. We have our Redis client configured to use GenericJackson2JsonRedisSerializer. But we also see latency on the "expire" call, which we wouldn't expect to use the deserializer at all.
Redis latency. We used Redis Insights to inspect our cluster, and it's not pegged on memory or CPU when the load test is running. We also examined slowlog, and our slowest commands are not related to this operation (our slowest commands are at 20ms, which we're going to investigate).
Does anyone have any ideas? Even a "it could be this" would be appreciated.
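One cheap next step is to time the pool borrow separately from the command round-trip to see where the 200ms goes. A generic stopwatch wrapper (Timed.time is a hypothetical helper, not part of Spring or Jedis) could look like:

```java
import java.util.function.Supplier;

public class Timed {
    // Wraps any call and reports elapsed milliseconds, so that pool-borrow
    // time can be separated from command round-trip time.
    static <T> T time(String label, Supplier<T> op) {
        long t0 = System.nanoTime();
        T result = op.get();
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(label + " took " + ms + " ms");
        return result;
    }

    public static void main(String[] args) {
        // In the real service this would wrap valueOps.get(key) and the expire call.
        String v = time("get", () -> "value");
        System.out.println(v);
    }
}
```

If the borrow dominates, the pool is too small or connections are leaking; if the round-trip dominates, the cost is on the network or server side.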

Getting Aerospike timeout with multiple java client in application

Currently I am using Aerospike in my application.
I faced lots of timeout issues, as shown below, when I was creating a new Java client for each transaction and not closing it, so the number of connections ramped up dramatically.
Aerospike Error: (9) Client timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0
So to resolve this timeout issue, I didn't make any changes to the client or to the read and write policies; I just created one client, stored its instance in a variable, and used that same client for all transactions (get or put requests).
Now I want to understand how moving from multiple clients to one client resolved my timeout issue, and why those connections were not being closed automatically.
The AerospikeClient constructor requests peers, partition maps and racks for all nodes in the cluster and initializes connection pools and async eventloops. This is an expensive process that is only meant to be performed once per cluster at application startup. AerospikeClient is thread-safe, so instances can be shared between threads.
If AerospikeClient close() is not called, connections residing in the pools (at least one connection pool per node) will not be closed. There are no finalize() methods in AerospikeClient.
The first transaction(s) usually need to create new connections. This adds to the latency and can cause timeouts.
The client does more than just the application's transactions. It also monitors the cluster for changes so that it can maintain one hop per transaction. Also, I believe when we initialize the client, we create an initial pool of sockets.
It is expected that most apps would only need one global client.
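The "one global client" recommendation can be sketched as an initialization-on-demand holder; HeavyClient is a hypothetical stand-in for AerospikeClient, with a counter proving it is constructed once no matter how many threads use it:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ClientHolderDemo {
    // Counts constructions of the expensive client.
    static final AtomicInteger constructed = new AtomicInteger();

    // Hypothetical stand-in for AerospikeClient: expensive to build, thread-safe to share.
    static class HeavyClient {
        HeavyClient() { constructed.incrementAndGet(); } // expensive setup would go here
        String get(String key) { return "value-for-" + key; }
    }

    // Initialization-on-demand holder: the JVM guarantees INSTANCE is built exactly once.
    static class Holder {
        static final HeavyClient INSTANCE = new HeavyClient();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> Holder.INSTANCE.get("k"); // all threads share one client
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(constructed.get()); // exactly one client was built
    }
}
```

With the real client, the application would also call close() on the shared instance once, at shutdown.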

How we can test the Apache Common pool evict functionality

I am trying to use the Apache Commons Pool library to implement pooling for objects that are expensive to create in my application. For resource pooling I have used the library's GenericObjectPool class, the default implementation provided by the API. To ensure that we do not end up with several idle objects in memory, I set the minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis properties to 30 minutes.
As I understand from other questions, blogs and the API documentation, these properties trigger a separate thread that evicts idle objects from the pool.
Could someone tell me whether that has any adverse impact on application performance, and whether there is any way to test if that thread is actually executed?
The library comes with a performance disclaimer for when the evictor is enabled:
Eviction runs contend with client threads for access to objects in the pool, so if they run too frequently performance issues may result.
Reference: https://commons.apache.org/proper/commons-pool/api-1.6/org/apache/commons/pool/impl/GenericObjectPool.html
However, we have a high-TPS system running eviction every 1 second and we don't see much of a performance bottleneck.
As far as the eviction thread runs are concerned, you can override the evict() method in your subclass of GenericObjectPool and add a log line:
@Override
public void evict() throws Exception {
    // log that the evictor ran
    super.evict();
}
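If you only want evidence that a periodically scheduled eviction-style task really runs, the observation pattern can be sketched with the JDK scheduler and a latch (the real pool's evictor is an internal timer thread; this is just an analogue of the logging-probe idea):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EvictorProbeDemo {
    public static void main(String[] args) throws Exception {
        // Latch trips once the periodic task has run at least 3 times.
        CountDownLatch ran = new CountDownLatch(3);
        ScheduledExecutorService evictor = Executors.newSingleThreadScheduledExecutor();
        evictor.scheduleAtFixedRate(() -> {
            // an overridden evict() would log here before calling super.evict()
            ran.countDown();
        }, 0, 50, TimeUnit.MILLISECONDS);
        // Wait (bounded) for proof that the eviction task executed.
        boolean ok = ran.await(2, TimeUnit.SECONDS);
        evictor.shutdownNow();
        System.out.println(ok);
    }
}
```

In a test against the real pool, the overridden evict() would count down the latch instead of the scheduled lambda, and the assertion would be that the latch trips within the configured timeBetweenEvictionRunsMillis budget.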

Why is a single Jedis instance not threadsafe?

https://github.com/xetorthio/jedis/wiki/Getting-started
using Jedis in a multithreaded environment
You shouldn't use the same instance from different threads because you'll have strange errors. And sometimes creating lots of Jedis instances is not good enough because it means lots of sockets and connections, which leads to strange errors as well.
A single Jedis instance is not threadsafe! To avoid these problems, you should use JedisPool, which is a threadsafe pool of network connections. You can use the pool to reliably create several Jedis instances, given that you return the Jedis instance to the pool when done. This way you can overcome those strange errors and achieve great performance.
=================================================
I want to know why. Can anyone help me, please?
A single Jedis instance is not threadsafe because it was implemented this way. That's the decision that the author of the library made.
You can check the source code of BinaryJedis, which is a supertype of Jedis: https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/jedis/BinaryJedis.java
For example these lines:
public Transaction multi() {
    client.multi();
    client.getOne(); // expected OK
    transaction = new Transaction(client);
    return transaction;
}
As you can see, the transaction field is shared by all threads using the Jedis instance and is initialized in this method. Later this transaction can be used in other methods. Imagine two threads performing transactional operations at the same time. The result may be that a transaction created by one thread is unintentionally accessed by another. The transaction field is shared state, access to which is not synchronized. This makes Jedis non-threadsafe.
The reason the author decided to make Jedis non-threadsafe and JedisPool threadsafe might be to give clients flexibility: in a single-threaded environment you can use Jedis directly and get better performance, while in a multithreaded environment you can use JedisPool and get thread safety.
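The hazard described here can be reduced to a minimal sketch. FakeJedis is illustrative, not Jedis's actual code; the point is only the unsynchronized shared field:

```java
public class SharedStateDemo {
    // Minimal analogue of BinaryJedis's shared `transaction` field.
    static class FakeJedis {
        String transaction; // unsynchronized shared state

        void multi(String owner) { transaction = owner; } // like multi() storing the Transaction
        String exec() { return transaction; }             // later calls read whatever is there
    }

    public static void main(String[] args) {
        FakeJedis jedis = new FakeJedis();
        jedis.multi("thread-A");  // thread A begins its transaction
        jedis.multi("thread-B");  // thread B interleaves on the same instance...
        // ...and thread A now completes someone else's transaction:
        System.out.println(jedis.exec());
    }
}
```

Even run sequentially, this shows the failure mode: once two callers interleave on the same instance, the second multi() clobbers the first caller's transaction.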

ISessionFactory.OpenSession() from multiple threads

I would like to know the behavior of the following.
Basically I have a static ISessionFactory, and an application with 10 threads running, each of which would use ISessionFactory.OpenSession() to get an ISession. Would this cause any problem?
No. This is correct. You want to make sure you have a separate session for each thread.
SessionFactory is thread-safe but Session is not. So if you open a session with ISessionFactory.OpenSession() in a thread and use it there (within that thread) without sharing it with other threads, you are safe.
But do not use ISessionFactory.GetCurrentSession() across multiple threads.
This will not cause any problems, but make sure that:
you don't 'leak' the ISession instance (no other thread should ever have access to it)
you properly Dispose the session when you no longer need it
ISessionFactory on the other hand is thread safe and can be used from multiple threads without additional synchronization on your part.
using (ISession session = _sessionFactory.OpenSession()) {
    // use the session, making sure it will not become visible to other threads
}