Aerospike atomic counter limit

I'm trying to use an Aerospike atomic counter to load-balance incoming transactions across a set of users. I plan to use the counter to track the number of incoming transactions and take the counter modulo the number of users to decide which user to assign.
Does this counter have a limit? If yes, what is the limit, and what happens if the limit is hit? Would it reset to 0 automatically?
Or are there any other suggestions for doing this kind of load balancing under high concurrency?

I'm not sure whether that's the best strategy for load balancing under high concurrency, or exactly what happens if you hit the limit.
But as for the limit itself: according to the "Atomic Counters using Aerospike" blog post, an atomic counter is represented by a bin that stores a 64-bit unsigned integer.
A 64-bit unsigned integer starts at 0 and its highest value is 2^64-1 (18,446,744,073,709,551,615); keep in mind that unsigned integers cannot represent negative values.
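For what it's worth, here is a minimal sketch of the counter-modulo idea using the Aerospike Python client's operate() call, which (as far as I know) applies the increment and the read to the record in a single atomic request. The namespace, set, bin name, cluster address and NUM_USERS value below are all made up for illustration:

import aerospike
from aerospike_helpers.operations import operations as op

NUM_USERS = 8                                   # hypothetical size of the user pool

config = {'hosts': [('127.0.0.1', 3000)]}       # placeholder cluster address
client = aerospike.client(config).connect()

key = ('test', 'counters', 'txn-counter')       # namespace / set / user key (made up)

# increment the bin and read the post-increment value in one atomic operation;
# the bin is created starting at 0 if the record/bin does not exist yet
ops = [
    op.increment('txn_count', 1),
    op.read('txn_count'),
]
_, _, bins = client.operate(key, ops)

assigned_user = bins['txn_count'] % NUM_USERS   # map the counter onto a user slot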

Related

Spring Batch Meta Data Schema Sequences

This is more of a request for a quick explanation of the sequences used to generate IDs for the Spring Batch tables that store Job and Step information.
I've run the sequences below in DB2 for a Spring Boot + Batch application:
CREATE SEQUENCE AR_REPORT_ETL_STEP_EXECUTION_SEQ AS BIGINT MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE AR_REPORT_ETL_JOB_EXECUTION_SEQ AS BIGINT MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE AR_REPORT_ETL_JOB_SEQ AS BIGINT MAXVALUE 9223372036854775807 NO CYCLE;
While the Spring Batch job is running, each ID field is incremented by 20 on each new record. Though this isn't a major issue, it's still slightly confusing as to why.
I removed the sequences and re-created them with INCREMENT BY 1. Now the ID increments by 1 on one record and then jumps by 20 on the next.
Any tips or explanation would be a great learning opportunity.
For performance reasons, Db2 for Linux, UNIX and Windows will by default preallocate 20 numbers of a sequence and keep them in memory for faster access.
If you don't want that caching behaviour and can tolerate the overhead of allocating without caching, you can use the NO CACHE option when defining the sequence. But be aware that without caching, Db2 must do a synchronous transaction-log write for each number allocated from the sequence, which is usually undesirable in high-frequency insert situations.
Remember to explicitly activate the database (i.e. do not depend on auto-activation), as unused preallocated cached sequence numbers are discarded when the database deactivates.
Example NO CACHE syntax:
CREATE SEQUENCE AR_REPORT_ETL_STEP_EXECUTION_SEQ
AS BIGINT MAXVALUE 9223372036854775807 NO CACHE NO CYCLE;
You can read more details in the documentation.
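If you want to confirm what Db2 actually recorded for your sequences, a quick sketch is to read SYSCAT.SEQUENCES (here via the ibm_db Python driver; the connection string and the schema filter are placeholders). A CACHE value of 0 means NO CACHE, and 20 is the default:

import ibm_db

# placeholder connection string -- adjust database, host, port and credentials
conn = ibm_db.connect(
    "DATABASE=MYDB;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;",
    "", "")

sql = ("SELECT SEQNAME, INCREMENT, CACHE "
       "FROM SYSCAT.SEQUENCES "
       "WHERE SEQNAME LIKE 'AR_REPORT_ETL%'")

stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row['SEQNAME'], row['INCREMENT'], row['CACHE'])   # CACHE = 0 -> NO CACHE
    row = ibm_db.fetch_assoc(stmt)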

Is there any limit on the number of fields for the Redis command HMSET?

What's the maximum number of fields for the Redis command HMSET? If I set 100,000 fields on one key with HMSET, would it cause performance issues compared to using each field as its own key?
It is quite large: 2^64-1 on 64-bit systems and 2^32-1 on 32-bit systems, per this redis-db mailing list post:
https://groups.google.com/d/msg/redis-db/eArHCH9kHKA/UFFRkp0iQ4UJ
1) Number of keys in every Redis database: 2^64-1 in 64 bit systems, 2^32-1 in 32 bit systems.
2) Number of hash fields in every hash: 2^64-1 in 64 bit systems, 2^32-1 in 32 bit systems.
Given that a 32 bit instance has at max 4GB of addressable space, the limit is unreachable. For 64 bit instances, given how big is 2^64-1, the limit is unreachable.
So for every practical point of view consider keys and hashes only limited by the amount of RAM you have.
Salvatore
I did a couple of quick tests for this using the Lua client.
I tried storing 100,000 fields using a single hmset command, individual hmset commands, and pipelined individual commands, and timed how long each took to complete:
hmset 100000 fields: 3.164817
hmset individual fields: 9.564578
hmset in pipeline: 4.784714
I didn't try larger values as 1,000,000+ fields were taking too long, but the code is here if you'd like to tinker: https://gist.github.com/kraftman/1f15dc75649f07ee044eccab5379a8e3
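If Lua isn't your thing, roughly the same comparison can be sketched in Python with redis-py (note that recent redis-py versions deprecate hmset() in favour of hset() with a mapping; the key and field names below are made up, and your timings will differ):

import time
import redis

r = redis.Redis()
fields = {'field:%d' % i: 'value:%d' % i for i in range(100000)}

# one bulk command
start = time.time()
r.hset('bulk-hash', mapping=fields)
print('single bulk HSET:', time.time() - start)

# one round trip per field
start = time.time()
for f, v in fields.items():
    r.hset('individual-hash', f, v)
print('individual HSETs:', time.time() - start)

# individual commands batched into one round trip
start = time.time()
with r.pipeline() as pipe:
    for f, v in fields.items():
        pipe.hset('pipelined-hash', f, v)
    pipe.execute()
print('pipelined HSETs:', time.time() - start)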
Depending on the application, bear in mind that you lose the storage efficiency of hashes once you add too many fields ('too many' is configurable; see here for more info).
According to Redis documentation, there's no such limitation.
actually the number of fields you can put inside a hash has no practical limits (other than available memory)
I think there's no performance penalty for saving data in a hash. However, if you have a very large hash, it's always a bad idea to call HGETALL, because HGETALL returns all fields and values of the hash, and that would block the Redis instance for a long time when the hash is very large.
Whether a hash is better than a plain key-value layout largely depends on your scenario.
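If you do end up with a very large hash and need to read all of it, one way to avoid the HGETALL blocking issue is to walk it incrementally with HSCAN. A small sketch with redis-py, where 'big-hash' and process() are placeholders:

import redis

r = redis.Redis()

# hscan_iter() issues repeated HSCAN calls under the hood; count is only a hint
for field, value in r.hscan_iter('big-hash', count=1000):
    process(field, value)   # process() stands in for whatever you do with each pair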

How can Redis be optimized for storing lists of GUIDs?

We are using Redis to store shuffled decks of cards. A card is represented by a 20-character GUID, and a deck is an array of shuffled card GUIDs. The primary operations called on the deck list are LLEN (length) and LPOP (pop). The only times we push to a deck are a) when the deck is initially created and b) when the deck runs out of cards and is re-shuffled (which happens rarely). Currently, the length of a deck varies from 10 to 700 items.
What type of memory optimizations can be made in Redis for this sort of problem? Is there any setting we can configure to reduce the memory overhead, or to optimize how the (zip)list data types are used?
Related Article: http://redis.io/topics/memory-optimization
My first suggestion would be to use 8-byte unsigned integers as your identifier keys instead of GUIDs; that saves you several bytes per entry in memory and improves overall performance in any database, Redis included.
In case you want to stick with GUIDs, and considering the size of your lists and the operations you are doing on them, you can tune the Redis defaults to suit your needs.
Redis defaults:
list-max-ziplist-entries 512
list-max-ziplist-value 64
You can change this to:
list-max-ziplist-entries 1024 # to accommodate your 700-card lists
list-max-ziplist-value 256 # to accommodate your 20-byte GUIDs
YMMV, so you need to benchmark Redis with both settings, for storage as well as read/write performance, with your sample data.
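One way to check whether the tuned thresholds actually keep your decks in the compact encoding is to ask Redis how it encoded a given key, for example with redis-py ('deck:1234' is a placeholder key name):

import redis

r = redis.Redis()
# small lists report 'ziplist' on older Redis versions ('quicklist'/'listpack' on newer ones);
# if a deck falls back to a plain 'linkedlist' encoding, it has exceeded the thresholds
print(r.object('encoding', 'deck:1234'))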

Redis performance -- why delete at most 100 records at a time?

I'm a newbie to Redis and am reading the book Redis in Action. In section 2.1 ("Login and cookie caching") there is a clean_sessions function:
import time

QUIT = False
LIMIT = 10000000

def clean_sessions(conn):          # conn is a redis connection object
    while not QUIT:
        size = conn.zcard('recent:')
        if size <= LIMIT:
            time.sleep(1)
            continue

        # find out the range in the `recent:` ZSET
        end_index = min(size - LIMIT, 100)
        tokens = conn.zrange('recent:', 0, end_index - 1)

        # delete corresponding data
        session_keys = []
        for token in tokens:
            session_keys.append('viewed:' + token)

        conn.delete(*session_keys)
        conn.hdel('login:', *tokens)
        conn.zrem('recent:', *tokens)
It deletes the login tokens and the corresponding data once there are more than 10 million records. My questions are:
Why delete at most 100 records per iteration?
Why not just delete size - LIMIT records at once?
Is there some performance consideration?
Thanks, all responses are appreciated :)
I guess there are multiple reasons for that choice.
Redis is a single-threaded event loop. It means a large command (for instance a large zrange, or a large del, hdel or zrem) will be processed faster than several small commands, but with an impact on the latency for the other sessions. If a large command takes one second to execute, all the clients accessing Redis will be blocked for one second as well.
A first reason is therefore to minimize the impact of these cleaning operations on the other client processes. By segmenting the activity in several small commands, it gives a chance to other clients to execute their commands as well.
A second reason is the size of the communication buffers in the Redis server. A large command (or a large reply) may take a lot of memory. If millions of items are to be cleaned out, the reply of the zrange command or the input of the del, hdel and zrem commands can represent megabytes of data. Past a certain limit, Redis will close the connection to protect itself. So it is better to avoid dealing with very large commands or very large replies.
A third reason is the memory of the Python client. If millions of items have to be cleaned out, Python will have to maintain very large list objects (tokens and session_keys). They may or may not fit in memory.
The proposed solution is incremental: whatever the number of items to delete, it will avoid consuming a lot of memory on both the client and Redis sides. It will also avoid hitting the communication buffer limit (which would result in the connection being closed), and will limit the impact on the performance of the other processes accessing Redis.
Note that the 100 value is arbitrary. A smaller value will allow for better latencies at the price of a lower session cleaning throughput. A larger value will increase the throughput of the cleaning algorithm at the price of higher latencies.
It is actually a classical trade-off between the throughput of the cleaning algorithm, and the latency of other operations.
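As a side note (not part of the book's code), the three deletion commands for each batch can also be grouped into a single round trip with a redis-py pipeline, without changing the batching logic above. A sketch:

def delete_batch(conn, tokens):
    # conn is the same redis connection object used above
    # delete one batch of session data in a single round trip
    session_keys = ['viewed:' + token for token in tokens]
    pipe = conn.pipeline(transaction=False)   # plain pipelining, no MULTI/EXEC needed
    pipe.delete(*session_keys)
    pipe.hdel('login:', *tokens)
    pipe.zrem('recent:', *tokens)
    pipe.execute()

Each batch still stays small, so the latency argument above is unaffected; you just save two network round trips per iteration.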

Redis Internals - LRU Implementation For Sampling

Does anyone know about the internals of Redis's LRU-based eviction/deletion?
How does Redis ensure that the older (less used) keys are deleted first (in case we do not have volatile keys and are not setting TTL expiration)?
I know for sure that Redis has a configuration parameter "maxmemory-samples" that governs the sample size it uses for removing keys -- so if you set a sample size of 10, it samples 10 keys and removes the oldest among them.
What I don't know is whether it samples these keys completely randomly, or whether it somehow has a mechanism that lets it sample from an equivalent of an "older / less used generation"?
This is what I found at antirez.com/post/redis-as-LRU-cache.html: "the whole point of using a 'sample three' algorithm is to save memory. I think this is much more valuable than precision, especially since these randomized algorithms are rarely well understood. An example: sampling with just three objects will expire 666 objects out of a dataset of 999 with an error rate of only 14% compared to the perfect LRU algorithm. And in the 14% of the remaining there are hardly elements that are in the range of very used elements. So the memory gain will pay for the precision without doubts."
So although Redis samples randomly (meaning this is not true LRU, but an approximation of it), the accuracy is relatively high, and increasing the sample size increases it further. However, if someone needs exact LRU (with zero tolerance for error), then Redis may not be the correct choice.
Architecture, as they say, is about trade-offs, so use this (Redis LRU) approach to trade a little accuracy for raw performance.
Since v3.0.0 (2014) the LRU algorithm uses a pool of 15 keys, populated with the best candidates out of the different samplings of N keys (where N is defined by maxmemory-samples).
Every time a key needs to be evicted, N new keys are selected randomly and checked against the pool. If they're better candidates (older keys), they're added to it, while the worst candidates (the most recently used keys) are taken out, keeping the pool at a constant size of 15 keys.
At the end of the round, the best eviction candidate is selected from the pool.
Source: Code and comments in evict.c file from Redis source code
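To make the mechanism concrete, here is a toy Python simulation of the approach described above (random sampling feeding a small pool of eviction candidates). It is only illustrative, not Redis's actual C implementation; the pool size and sample count simply mirror the description (15 and maxmemory-samples):

import random
import time

POOL_SIZE = 15          # candidate pool size described above
SAMPLES = 5             # plays the role of maxmemory-samples

last_access = {}        # key -> last access time (stands in for Redis's per-key LRU clock)
pool = []               # current eviction candidates

def touch(key):
    # record an access to a key
    last_access[key] = time.monotonic()

def evict_one():
    # sample SAMPLES random keys, keep the POOL_SIZE oldest seen so far, evict the oldest
    global pool
    now = time.monotonic()
    sampled = random.sample(list(last_access), min(SAMPLES, len(last_access)))
    candidates = set(pool) | set(sampled)
    # sort by idle time ascending, so the best eviction candidate ends up last
    pool = sorted(candidates, key=lambda k: now - last_access[k])[-POOL_SIZE:]
    victim = pool.pop()                 # oldest key currently known
    del last_access[victim]
    return victim

# touch 1000 keys in order, then evict: the victim is very likely an early, "cold" key
for i in range(1000):
    touch('key:%d' % i)
print(evict_one())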