I am trying to get my head around why LINDEX is O(N).
This came up because I need to delete an element from a list by index, and I realized that Redis only lets you remove list elements by value. So: LINDEX first, and then LREM.
Neither is better than O(N).
I don't get how Redis is built here. I mean, is the list a map? How does it retain indexes and order? Is it a linked list? The head and tail operations seem to suggest it is.
Also, none of the methods appear to be better than O(N).
Why is there no LREM by index? Why only by value?
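For reference, a minimal sketch of that two-step delete-by-index, assuming the redis-py client. Note that LREM matches by value, so with duplicate values it may remove a different occurrence than the one at the given index, and the two calls aren't atomic unless wrapped in MULTI/EXEC or a Lua script.

```python
import redis

r = redis.Redis(decode_responses=True)

def delete_by_index(key, index):
    # O(N): walk the list to the given index to read its value.
    value = r.lindex(key, index)
    if value is None:
        return 0
    # O(N): scan the list and remove the first occurrence of that value.
    # Assumes values are unique, or that removing any equal value is acceptable.
    return r.lrem(key, 1, value)

r.rpush('mylist', 'a', 'b', 'c')
delete_by_index('mylist', 1)    # removes 'b'
```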
Related
The time complexity of the HGETALL command is, according to the documentation, O(N) where N is the size of the hash. In many places where HGETALL is mentioned, users are warned about its time complexity (for example in this answer) without going much into what HGETALL does under the hood and why the time complexity is what it is.

So why is this O(N)? Does it have something to do with how Redis stores the hashes, is it networking, or is it just CPU-bound? HGET has a time complexity of O(1) and does not depend on size in any way, so could I just store my hash as one value, with the fields concatenated with some separator, to improve performance?
Redis stores a Hash as a hash table in memory. Getting a single entry from any hash table is, by its very nature, an O(1) operation. HGETALL has to get all of the entries in the hash table, one by one, so it's O(N). If you coded your own hash table and didn't use Redis, it would work the same way. This is just how hash tables work.
Serializing your hash to a single string and then saving that string will not save you anything: you're just replacing an O(N) operation on the backend with an O(N) parse in your own code.
The thing I always find missing from discussions of time complexity is that it's about scaling, not time. People talk about things being "slower" and "faster", but it's not about milliseconds. An O(1) operation is "constant time", not "fast": it just takes the same amount of time every time. A function can be O(1) and still be slower than some other function that is O(N) over a billion entries.
In the case of Redis, HGETALL is really fast and O(N). Unless you have thousands of fields in your Hash, you probably don't need to worry about it.
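To make that concrete, a small sketch assuming the redis-py client (the key and field names are just examples): HGET hashes the field name and jumps straight to its bucket, while HGETALL walks every entry, so its cost grows with the number of fields.

```python
import redis

r = redis.Redis(decode_responses=True)

# Made-up example hash.
r.hset('user:1', mapping={'name': 'Ada', 'visits': '10', 'plan': 'pro'})

# O(1): hash the field name, jump to its bucket, return one value.
print(r.hget('user:1', 'name'))    # 'Ada'

# O(N): iterate over every entry of the hash table and return them all.
print(r.hgetall('user:1'))         # {'name': 'Ada', 'visits': '10', 'plan': 'pro'}
```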
I have read online that Redis can say whether an element is a member of a set in O(1) time. I want to know how Redis does this. What algorithm does Redis use to achieve it?
A Redis Set is implemented internally in one of two ways: an intset or a hashtable. The intset is a special optimization for integer-only sets and uses the intsetSearch function to search the set. That function uses a binary search, so it's technically O(log N). However, since the cardinality of intsets is capped at a constant (the set-max-intset-entries configuration directive), we can assume O(1) accurately reflects the complexity here.
The hashtable is used for a lot of things in Redis, including the implementation of Sets. It uses a hash function on the key to map it into a table (an array) of entries; checking whether the hashed key is in the array is straightforwardly O(1) in dictFind. The elements under each hashed slot are stored as a linked list, so strictly speaking traversing that chain is linear in its length, but given the hash function's extremely low probability of collisions (hmm, need some sort of citation here?) these lists are extremely short, so we can safely treat the lookup as effectively O(1).
Because of the above, SISMEMBER's claim of being O(1) in terms of computational complexity is valid.
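A quick way to see both encodings, assuming the redis-py client; OBJECT ENCODING reports the internal representation (newer Redis versions may report 'listpack' for small non-integer sets).

```python
import redis

r = redis.Redis(decode_responses=True)

r.sadd('ints', 1, 2, 3)
print(r.object('encoding', 'ints'))     # 'intset': integer-only, binary-searched

r.sadd('mixed', 1, 2, 'three')
print(r.object('encoding', 'mixed'))    # 'hashtable' (or 'listpack' on newer Redis)

# Either way, the membership test is documented as O(1).
print(r.sismember('ints', 2))           # True
print(r.sismember('mixed', 'three'))    # True
```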
The Redis documentation states that insert and update operations on sorted sets are O(log(n)).
In this question there are more details about the underlying data structure, the skip list.
However, there are a few special cases that depend on the Redis implementation, with which I'm not familiar:
1. Adding at the head or tail of the sorted set will probably not be an O(log(n)) operation but O(1), right? This question seems to agree, with reservations.
2. Updating the score of a member, even when the ordering doesn't change, is still O(log(n)), either because the element is taken out and reinserted with the slightly different score, or because you have to check that the ordering doesn't change, so the difference between insert and update-score is only constant-factor work. Right? I really hope I'm wrong in this case.
Any insights will be most welcome.
Note: a skip list is used once the sorted set grows above a certain size (the zset-max-ziplist-entries configuration directive); below that size a ziplist is used.
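If you want to check that threshold on your own server, a quick sketch with the redis-py client (on newer Redis versions the directive may be named zset-max-listpack-entries instead):

```python
import redis

r = redis.Redis(decode_responses=True)

# Below this many entries (and below zset-max-ziplist-value bytes per element)
# the compact ziplist/listpack encoding is used; above it, the skip list takes over.
print(r.config_get('zset-max-ziplist-entries'))
print(r.config_get('zset-max-ziplist-value'))
```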
Re. 1st question: I believe it would still be O(log(n)), since a skip list behaves like a balanced search tree, so there's no assurance about where the head/tail nodes sit in its levels.
Re. 2nd question: according to the source, changing the score is implemented by removing and re-adding the member: https://github.com/antirez/redis/blob/209f266cc534471daa03501b2802f08e4fca4fe6/src/t_zset.c#L1233 & https://github.com/antirez/redis/blob/209f266cc534471daa03501b2802f08e4fca4fe6/src/t_zset.c#L1272
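As a concrete illustration (redis-py client, made-up key and members): a score update is expressed as a ZADD of an existing member, and per the source linked above it costs a removal plus a re-insertion, so it stays O(log(N)) even when the member's rank does not change. ZINCRBY is likewise documented as O(log(N)).

```python
import redis

r = redis.Redis(decode_responses=True)

r.zadd('board', {'alice': 100, 'bob': 200})

# Update alice's score: internally the member is removed and re-added,
# so this is O(log(N)) even though her rank stays the same.
r.zadd('board', {'alice': 101})
print(r.zscore('board', 'alice'))     # 101.0

# ZINCRBY (read the score, then update it) is also documented as O(log(N)).
r.zincrby('board', 5, 'alice')
print(r.zscore('board', 'alice'))     # 106.0
```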
In a skip list, when you insert a new element at the head or tail, you still need to update O(log n) levels: the previous head or tail node can span O(log n) levels, and each of those levels may have pointers that need to be updated.
Already answered by @itamar-haber
I wonder why Redis has no command to increment an element of a list.
You can increment a key's value with INCR, you can use HINCRBY to increment a field in a hash, and you can use ZINCRBY to increment an element of a sorted set. But there is no equivalent for lists.
This puzzles me. Why not?
What was the thinking behind this decision? If lists are "not supposed to be used like this", then why not? Do they work in a very different way from sets? If so, what's the big difference?
The big difference is that there is no way to access a given item efficiently in a Redis list. Lists are implemented as doubly-linked lists (for big lists) or completely serialized (the ziplist optimization, for small lists). By comparison, hashes and sorted sets are implemented using a hash table, which allows O(1) amortized complexity for item access.
So if such an increment command existed for lists, its complexity would be O(N). Not very attractive for just an increment.
Note that if you need such a feature, you can easily implement it yourself with a server-side Lua script that calls LINDEX and LSET, as sketched below.
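A minimal sketch of that Lua approach, registered through the redis-py client (key and values are just examples). The script runs atomically on the server, but the LINDEX/LSET pair is still O(N) in the list length.

```python
import redis

r = redis.Redis(decode_responses=True)

# Hypothetical "LINCRBY": increment the numeric element at a given index.
lincrby = r.register_script("""
local v = redis.call('LINDEX', KEYS[1], ARGV[1])
if v == false then return nil end              -- index out of range
local n = tonumber(v) + tonumber(ARGV[2])      -- element must hold a number
redis.call('LSET', KEYS[1], ARGV[1], n)
return n
""")

r.rpush('counters', 1, 2, 3)
print(lincrby(keys=['counters'], args=[1, 10]))   # index 1: 2 + 10 -> 12
```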
Redis.io:
The main features of Redis Lists from the point of view of time complexity is the support for constant time insertion and deletion of elements near the head and tail, even with many millions of inserted items. Accessing elements is very fast near the extremes of the list but is slow if you try accessing the middle of a very big list, as it is an O(N) operation.
What is the alternative to a LIST when the data volume is very high and writes are fewer than reads?
This is something I'd definitely benchmark before doing, but if you're really hitting a performance issue accessing items in the middle of the list, there are a few alternatives that really depend on your use case:
Don't let the list get so big in the first place; age out/trim pieces that don't matter any more.
Memoize hot sections of the list. If a particular paginated range is being requested much more often than others, make that its own list: check whether it exists already, and if it doesn't, create a subset of your list for that paginated range.
Bucket your list from the beginning into "manageable sizes" (for whatever your definition of manageable is). If the list is purely additive (no removal from the list), you could use the item's index modulo a bucket count as part of the key, so that your data is stored in many smaller lists. Ex: key = "your_key_name_" + index % 100000. A rough sketch of this follows below.
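Here is one way that bucketing idea could look, assuming the redis-py client; the key prefix, the bucket count, and the append-in-index-order assumption are all illustrative, not anything built into Redis.

```python
import redis

r = redis.Redis(decode_responses=True)

BUCKETS = 100000   # matches the example above; tune to your data volume

def bucket_key(base, index):
    # Spread items across many smaller lists keyed by index % BUCKETS.
    return f"{base}_{index % BUCKETS}"

def append_item(base, index, value):
    # Assumes items are appended strictly in index order (purely additive list),
    # so the item for `index` lands at position index // BUCKETS in its bucket.
    r.rpush(bucket_key(base, index), value)

def get_item(base, index):
    # Each bucket stays small, so LINDEX walks a short list instead of one huge one.
    return r.lindex(bucket_key(base, index), index // BUCKETS)

for i in range(10):
    append_item('your_key_name', i, f'item-{i}')
print(get_item('your_key_name', 7))    # 'item-7'
```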