Will increasing the pool size in Jedis affect Redis performance?

Suppose my application needs only 100 JedisPool connections to fetch data from the Redis server. What happens if I configure 500 as the max pool size, keeping in mind that my application will need more connections in the future? Does it affect Redis performance? If so, how severe is the problem?
Thanks in advance

The default max clients for Redis is 10000; you can check it with CONFIG GET maxclients. So even if you do reach 500 connections, that is quite manageable. If you have multiple Jedis clients, consider the product nodes*maxTotal. See https://redis.io/topics/clients
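As a rough check of the nodes*maxTotal product, here is a minimal Python sketch. The function names and the example node counts are hypothetical; 10000 is the default maxclients value cited above and should be confirmed on your own server.

```python
# Default cited above; confirm on your server with CONFIG GET maxclients.
REDIS_MAXCLIENTS = 10000


def worst_case_connections(nodes, max_total):
    """Upper bound on connections if every application node fills its pool."""
    return nodes * max_total


def fits(nodes, max_total, maxclients=REDIS_MAXCLIENTS):
    """True if the worst case still stays within the server's client limit."""
    return worst_case_connections(nodes, max_total) <= maxclients


print(worst_case_connections(4, 500))  # 2000
print(fits(4, 500))                    # True
print(fits(25, 500))                   # False: 12500 exceeds 10000
```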
You can expect a mild performance decrease as the number of clients increases. You probably want to run some benchmarks.
I tried redis-benchmark -t set -r 100000 -n 1000000 -c 500, where -c 500 is the number of clients. Going from 50 to 500 clients, I got a 12% decrease in requests per second.
Check this out for JedisPool optimization.

To my understanding, the max pool size is just an upper limit.
If you set it to 500 and your app only needs 100, the pool will not actually create 500 connections in it. So I guess it will not affect performance in your case.
And there are other params to manage the connections:
config.setMaxIdle();
config.setMaxTotal();
You can run a test and use the "info" command to see the connections.
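To illustrate the point that the maximum is only a cap, here is a toy Python model of a lazily filled pool. It is not Jedis or its underlying pool implementation; the class LazyPool and its methods are hypothetical names used only for this sketch.

```python
class LazyPool:
    """Toy pool: connections are created on demand, never beyond max_total."""

    def __init__(self, max_total, factory=object):
        self.max_total = max_total
        self._factory = factory
        self._idle = []
        self.created = 0

    def borrow(self):
        # Reuse an idle connection when one is available.
        if self._idle:
            return self._idle.pop()
        if self.created >= self.max_total:
            raise RuntimeError("pool exhausted")
        self.created += 1
        return self._factory()

    def give_back(self, conn):
        self._idle.append(conn)


pool = LazyPool(max_total=500)
in_use = [pool.borrow() for _ in range(100)]  # the app needs 100 at its peak
for conn in in_use:
    pool.give_back(conn)
print(pool.created)  # 100, not 500
```

In this model the cap of 500 is only reached if 500 connections are borrowed at the same time.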

Related

ADF Integration Runtime - Limit Concurrent Jobs

We developed a pipeline with Get Metadata and ForEach activities. Inside the ForEach there are a few activities such as Lookup, Stored Procedure, and Delete.
The source is a file share. We tried it with 5000 files of 3 KB each, using a self-hosted IR with concurrent jobs set to 12. It took 3 hr 30 min.
Will there be any improvement if we increase concurrent jobs, or does the setting just limit the number of concurrent jobs? Also, please let me know the maximum limit for concurrent jobs.
The default value of the concurrent jobs limit is set based on the machine size. The factors used to calculate this value depend on the amount of RAM and the number of CPU cores of the machine. So the more cores and the more memory, the higher the default limit of concurrent jobs.
You scale out by increasing the number of nodes. When you increase the number of nodes, the concurrent jobs limit is the sum of the concurrent job limit values of all the available nodes. For example, if one node lets you run a maximum of twelve concurrent jobs, then adding three more similar nodes lets you run a maximum of 48 concurrent jobs (that is, 4 x 12).
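The scale-out arithmetic above can be written as a one-line sum. This is only a sketch of the calculation; the function name is hypothetical and the per-node limit of 12 is the value from the example.

```python
def total_concurrent_jobs(per_node_limits):
    """Overall limit is the sum of the limits of all available nodes."""
    return sum(per_node_limits)


print(total_concurrent_jobs([12]))      # 12 with a single node
print(total_concurrent_jobs([12] * 4))  # 48 with four similar nodes
```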

Impala concurrent query delay

My cluster configuration is as follows:
3 Node cluster
128GB RAM per cluster node.
Processor: 16 core HyperThreaded per cluster node.
All 3 nodes have Kudu master and T-Server and Impala server, one of the node has Impala catalogue and Impala StateStore.
My issues are as follows:
1) I'm having a hard time figuring out dynamic resource pooling in Impala while running concurrent queries. I've tried setting mem_limit, still with no luck. I've also tried a static service pool, but with that I couldn't achieve the required concurrency either. Even with admission control, the required concurrency was not achieved.
I) The time taken for 1 query: 500-800ms.
II) But if 10 concurrent queries are given the time taken grows to 3-6s per query.
III) But if more than 20 concurrent queries are given the time taken is exceeding 10s per query.
2) One of my cluster nodes is not taking the load after the query is submitted; I checked this in the query summary. I've tried setting NUM_NODES to 0 and 1 on the node that is not taking the load, but the summary still shows that the node is not taking the load.
What is the table size? How many rows are there in the tables? Are the tables partitioned? It would be nice if you could compare your configuration with the Impala benchmarks.
As mentioned above, Impala is designed to run on a massively parallel processing infrastructure. When we had a setup of 10 nodes with 80 cores and 160 virtual cores with 12 TB of SAN storage, we could get a computation time of 60 seconds with 5 concurrent users.

redis performance -- delete 100 records at maximum?

I'm a newbie to Redis, reading the book Redis in Action, and in section 2.1 ("Login and cookie caching") there is a clean_sessions function:
import time

QUIT = False
LIMIT = 10000000

def clean_sessions(conn):
    # conn: a Redis client assumed to return str values
    # (e.g. decode_responses=True in redis-py)
    while not QUIT:
        size = conn.zcard('recent:')
        if size <= LIMIT:
            time.sleep(1)
            continue
        # find out the range in `recent:` ZSET
        end_index = min(size - LIMIT, 100)
        tokens = conn.zrange('recent:', 0, end_index - 1)
        # delete corresponding data
        session_keys = []
        for token in tokens:
            session_keys.append('viewed:' + token)
        conn.delete(*session_keys)
        conn.hdel('login:', *tokens)
        conn.zrem('recent:', *tokens)
It deletes login tokens and the corresponding data if there are more than 10 million records. My questions are:
Why delete at most 100 records at a time?
Why not just delete size - LIMIT records at once?
Is there some performance consideration?
Thanks, all responses are appreciated :)
I guess there are multiple reasons for that choice.
Redis is a single-threaded event loop. It means a large command (for instance a large zrange, or a large del, hdel or zrem) will be processed faster than several small commands, but with an impact on the latency for the other sessions. If a large command takes one second to execute, all the clients accessing Redis will be blocked for one second as well.
A first reason is therefore to minimize the impact of these cleaning operations on the other client processes. By segmenting the activity in several small commands, it gives a chance to other clients to execute their commands as well.
A second reason is the size of the communication buffers in the Redis server. A large command (or a large reply) may take a lot of memory. If millions of items are to be cleaned out, the reply of the zrange command or the input of the del, hdel, and zrem commands can represent megabytes of data. Past a certain limit, Redis will close the connection to protect itself. So it is better to avoid dealing with very large commands or very large replies.
A third reason is the memory of the Python client. If millions of items have to be cleaned out, Python will have to maintain very large list objects (tokens and session_keys). They may or may not fit in memory.
The proposed solution is incremental: whatever the number of items to delete, it avoids consuming a lot of memory on both the client and Redis sides. It also avoids hitting the communication buffer limit (which would cause the connection to be closed), and limits the impact on the performance of the other processes accessing Redis.
Note that the 100 value is arbitrary. A smaller value will allow for better latencies at the price of a lower session cleaning throughput. A larger value will increase the throughput of the cleaning algorithm at the price of higher latencies.
It is actually a classical trade-off between the throughput of the cleaning algorithm, and the latency of other operations.
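To make the trade-off concrete, here is a small sketch that only computes how many tokens each cleanup pass would remove; it performs no Redis calls. The function name batch_sizes and the chunk parameter are hypothetical, with chunk standing in for the arbitrary value of 100.

```python
def batch_sizes(size, limit, chunk=100):
    """Yield how many tokens each cleanup pass removes until size <= limit."""
    while size > limit:
        n = min(size - limit, chunk)
        yield n
        size -= n


# 250 tokens over the limit are removed in three short passes,
# leaving room for other clients' commands between them.
print(list(batch_sizes(10_000_250, 10_000_000)))               # [100, 100, 50]
# A larger chunk finishes in one pass, at the cost of one longer command.
print(list(batch_sizes(10_000_250, 10_000_000, chunk=1000)))   # [250]
```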

Memory utilization in redis for each database

Redis allows storing data in 16 different 'databases' (0 to 15). Is there a way to get the memory and disk space used per database? The INFO command only lists the number of keys per database.
No, you cannot control each database individually. These "databases" are just for logical partitioning of your data.
What you can do (depending on your specific requirements and setup) is spin up multiple Redis instances, each doing a different task and each with its own redis.conf file with a memory cap. Disk space can't be capped though, at least not at the Redis level.
Side note: Bear in mind that the 16 database number is not hardcoded - you can set it in redis.conf.
I did it by calling DUMP on all the keys in a Redis DB and measuring the total number of bytes used. This will slow down your server and take a while. The size DUMP returns seems to be about 4 times smaller than the actual memory use. These numbers will give you an idea of which DB is using the most space.
Here's my code:
https://gist.github.com/mathieulongtin/fa2efceb7b546cbb6626ee899e2cfa0b
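For readers who cannot open the gist, here is a minimal sketch of the same DUMP-based idea. It is not the gist's code. It assumes a redis-py style client object exposing scan_iter() and dump(); the function name dump_bytes_total is hypothetical.

```python
def dump_bytes_total(conn):
    """Sum the length of the DUMP payload of every key in the selected database.

    conn is assumed to be a redis-py style client (scan_iter and dump methods).
    The result is a relative indicator, not the real memory footprint.
    """
    total = 0
    for key in conn.scan_iter():
        payload = conn.dump(key)
        if payload is not None:  # the key may have expired since it was scanned
            total += len(payload)
    return total
```

With redis-py, one client per database (for example redis.Redis(db=n)) could be passed in turn to compare databases. As noted above, expect this to be slow on a busy server.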

What is the performance of HSQLDB with several clients

I would like to use HSQLDB + Hibernate on a server with 5 to 30 clients that will write to the DB fairly intensively.
Each client will persist about a dozen thousand rows in a single table every 30 seconds (24/7, that's roughly 1 billion rows/day), and the clients will also query the database for a few thousand rows at more or less random times, at an average frequency of a couple of requests every 5 to 10 seconds.
Can HSQLDB handle such a use case or should I switch to MySQL/PostgreSQL ?
You are looking at a total of 2000 - 12000 writes and 5000 - 30000 reads per second.
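The write figures follow from the numbers in the question. A short sketch of the arithmetic, assuming "a dozen thousand" means 12,000 rows per client every 30 seconds (the constant and function names are hypothetical):

```python
ROWS_PER_BATCH = 12_000   # assumed reading of "a dozen thousand" rows per client
BATCH_INTERVAL_S = 30     # each client persists one batch every 30 seconds
SECONDS_PER_DAY = 86_400


def writes_per_second(clients):
    """Aggregate row writes per second across all clients."""
    return clients * ROWS_PER_BATCH // BATCH_INTERVAL_S


print(writes_per_second(5))                     # 2000
print(writes_per_second(30))                    # 12000
print(writes_per_second(30) * SECONDS_PER_DAY)  # 1036800000, roughly 1 billion rows/day
```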
With fast hardware, HSQLDB can probably handle this with persistent memory tables. With CACHED tables, it may be able to handle the lower range with solid state disks (disk seek time is the main parameter).
See this test. You can run it with MySQL and PostgreSQL for comparison.
http://hsqldb.org/web/hsqlPerformanceTests.html
You should switch. HSQLDB is not for critical apps. Be prepared for data corruption and decreasing startup performance over time.
The main negative hype comes from JBoss: https://community.jboss.org/wiki/HypersonicProduction
See also http://www.coderanch.com/t/89950/JBoss/HSQLDB-production
Also see a similar question: Is it safe to use HSQLDB for production? (JBoss AS5.1)