According to the documentation, the time complexity of the HGETALL command is O(N), where N is the size of the hash. Wherever HGETALL comes up, users are often warned about its time complexity, for example in this answer, without much explanation of what HGETALL does under the hood and why its complexity is what it is. So why is this O(N)? Does it have something to do with how Redis stores hashes, is it networking, or is it just CPU-bound? HGET has a time complexity of O(1) and does not depend on size in any way, so could I just store my hash as one value, concatenated with some separator, to improve performance?
Redis stores a Hash as a hash table in memory. Getting a single entry from any hash table, by its very nature, is an O(1) operation. HGETALL has to get all of the entries in the hash table, one by one. So, it's O(N). If you coded your own hash table and didn't use Redis, it would work the same way. This is just how hash tables work.
Serializing your hash table to a single string and then saving that string will not save you anything. You're just replacing an O(N) operation on the backend with one in your own code.
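To see why, here is a minimal sketch of both approaches (assuming the redis-py client; the key names are hypothetical). The serialized version still makes the client split N fields, so the linear work has merely moved:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# Approach 1: a real Redis hash. HGETALL makes the server walk all
# N fields, hence O(N).
r.hset("user:1", mapping={"name": "Ada", "lang": "en", "visits": "42"})
fields = r.hgetall("user:1")

# Approach 2: the same data serialized into one string. GET is O(1)
# on the server, but the client must now split N fields itself --
# the O(N) work has only moved out of Redis into your code.
r.set("user:1:flat", "name=Ada;lang=en;visits=42")
flat = dict(pair.split("=", 1)
            for pair in r.get("user:1:flat").decode().split(";"))
```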
The thing I always find missing from discussions of time complexity is that it's about scaling, not time. People talk about things being "slower" and "faster", but it's not about milliseconds. An O(1) operation is "constant time", which doesn't mean it's fast; it just means it always takes the same amount of time, every time. A function can be O(1) and still be slower than some other function that is O(N) with a billion entries.
In the case of Redis, HGETALL is really fast and O(N). Unless you have thousands of fields in your Hash, you probably don't need to worry about it.
According to the documentation section for the ZRANGEBYLEX command, there is the following information: if you store keys in a sorted set with zero scores, the keys can later be retrieved in lexicographical order, and the complexity of the ZRANGEBYLEX operation will be O(log(N)+M), where N is the total number of elements and M is the size of the result set. The documentation has some information about string comparison, but says nothing about the structure in which the elements are stored.
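For reference, the pattern being described looks roughly like this (a minimal sketch assuming the redis-py client; the key name is hypothetical):

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# Every member gets score 0, so ordering falls back to comparing the
# members themselves lexicographically.
r.zadd("myindex", {"apple": 0, "banana": 0, "cherry": 0})

# Members in the lex range [b, c): '[' is an inclusive bound, '(' exclusive.
print(r.zrangebylex("myindex", "[b", "(c"))   # [b'banana']
```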
But after some experiments and reading the source code, it appears that the ZRANGEBYLEX operation does a linear-time search, where every element in the ziplist is matched against the request. If so, the complexity will be larger than described above: about O(N), because every element in the ziplist is scanned.
After debugging with gdb, it's clear that the ZRANGEBYLEX command is implemented in the genericZrangebylexCommand function. Control flow continues at eptr = zzlFirstInLexRange(zl,&range);, so the major work of retrieving elements is performed in the zzlFirstInLexRange function. All the naming and the subsequent control flow indicate that the ziplist structure is used, and all comparisons with the input operands are done sequentially, element by element.
Inspecting memory after inserting well-known keys into the Redis store, it seems that the ZSET elements really are stored in a ziplist; a byte-by-byte comparison against a reference confirms it.
So the question is: how can the documentation be wrong and claim logarithmic complexity where linear complexity appears? Or maybe the ZRANGEBYLEX command works slightly differently? Thanks in advance.
how can the documentation be wrong and claim logarithmic complexity where linear complexity appears?
The documentation has been wrong on more than a few occasions, but it is an ongoing open source effort that you can contribute to via the repository (https://github.com/antirez/redis-doc).
Or maybe the ZRANGEBYLEX command works slightly differently?
Your conclusion is correct in the sense that Sorted Set search operations, whether lexicographical or not, exhibit linear time complexity when Ziplists are used for encoding them.
However.
Ziplists are an optimization that trades CPU for memory, meaning they are meant for use on small sets (i.e. low N values). The encoding is controlled via configuration (see the zset-max-ziplist-entries and zset-max-ziplist-value directives), and once the data grows above the specified thresholds the ziplist encoding is converted to a skip list.
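You can watch that conversion happen via OBJECT ENCODING. A minimal sketch, assuming the redis-py client (default thresholds are typically 128 entries / 64 bytes, and newer Redis versions report "listpack" instead of "ziplist"):

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# Inspect the conversion thresholds configured on this server.
print(r.config_get("zset-max-ziplist-*"))

r.delete("zs")
r.zadd("zs", {f"m{i}": i for i in range(10)})
print(r.object("encoding", "zs"))    # b'ziplist' while the set is small

r.zadd("zs", {f"m{i}": i for i in range(1000)})
print(r.object("encoding", "zs"))    # b'skiplist' once a threshold is crossed
```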
Because ziplists are small (little Ns), their complexity can be assumed to be constant, i.e. O(1). On the other hand, due to their nature, skip lists exhibit logarithmic search time. IMO that means that the documentation's integrity remains intact, as it provides the worst case complexity.
I have read that the insertion time complexity of skip lists is O(log n) with very high probability, but O(n) in the worst case. However, the documentation for Redis's ZADD at https://redis.io/commands/zadd says: O(log(N)) for each item added, where N is the number of elements in the sorted set.
If Redis uses skip lists, then ZADD should be O(n) in the worst case, shouldn't it?
Redis's skiplist implementation is a modified version of the one described in William Pugh's paper. So yes, in the worst case the time complexity is O(n); the AVERAGE time complexity of ZADD is O(log(n)).
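For intuition, here is a sketch of the probabilistic level assignment, modeled on Redis's zslRandomLevel (constants as in older versions of t_zset.c; treat the details as illustrative):

```python
import random

ZSKIPLIST_MAXLEVEL = 32   # cap on a node's height, as in Redis's t_zset.c
ZSKIPLIST_P = 0.25        # chance of promoting a node one more level

def random_level():
    # A node reaches level k with probability P^(k-1), so the expected
    # height of the list is O(log n) with high probability. Nothing,
    # however, prevents an unlucky sequence of coin flips from producing
    # a flat, degenerate structure -- which is where the O(n) worst case
    # for search and insertion comes from.
    level = 1
    while random.random() < ZSKIPLIST_P and level < ZSKIPLIST_MAXLEVEL:
        level += 1
    return level
```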
I have read online that Redis can say whether an element is a member of a set or not in O(1) time. I want to know how Redis does this. What algorithm does Redis use to achieve it?
A Redis Set is implemented internally in one of two ways: an intset or a hashtable. The intset is a special optimization for integer-only sets and uses the intsetSearch function to search the set. This function, however, uses a binary search, so it's actually technically O(log N). However, since the cardinality of intsets is capped at a constant (the set-max-intset-entries configuration directive), we can assume O(1) accurately reflects the complexity here.
The hashtable is used for a lot of things in Redis, including the implementation of Sets. It uses a hash function on the key to map it into a table (array) of entries; checking whether the hashed key value is in the array is straightforwardly done in O(1) in dictFind. The elements under each hashed key are stored as a linked list, so strictly speaking you're talking O(N) to traverse it, but given the hash function's extremely low probability of collisions (hmm, need some sort of citation here?) these lists are extremely short, so we can safely assume it is effectively O(1).
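You can observe both encodings with OBJECT ENCODING. A minimal sketch, assuming the redis-py client (newer Redis versions may report different encoding names for small sets):

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

r.delete("ints", "strs")

r.sadd("ints", 1, 2, 3)
print(r.object("encoding", "ints"))   # b'intset' -- integer-only members

r.sadd("strs", "a", "b", "c")
print(r.object("encoding", "strs"))   # b'hashtable' for non-integer members

print(r.sismember("strs", "b"))       # True -- effectively O(1)
```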
Because of the above, SISMEMBER's claim of being O(1) in terms of computational complexity is valid.
I have a use case where I know for a fact that some sets I have materialized in my Redis store are disjoint. Some of my sets are quite large; as a result, their SUNION or SUNIONSTORE takes quite a long time. Does Redis provide any functionality for handling such unions?
Alternatively, if there is a way to add elements to a set in Redis without checking for uniqueness before each insert, it could solve my issue.
Actually, there is no need for such a feature, because of the relative cost of the operations.
When you build Redis objects (such as sets or lists), the cost is not dominated by the data structure management (hash table or linked lists), because the amortized complexity of individual insertion operations is O(1). The cost is dominated by the allocation and initialization of all the items (i.e. the set objects or the list objects). When you retrieve those objects, the cost is dominated by the allocation and formatting of the output buffer, not by the access paths in the data structure.
So bypassing the uniqueness property of the sets does not bring a significant optimization.
To optimize a SUNION command when the sets are disjoint, the best approach is to replace it with a pipeline of several SMEMBERS commands to retrieve the individual sets, and build the union on the client side, as in the sketch below.
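A minimal sketch of that approach, assuming the redis-py client and hypothetical key names:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# The disjoint sets to merge; one pipelined round-trip fetches them all,
# and the union is built on the client side instead of on the server.
keys = ["set:a", "set:b", "set:c"]

pipe = r.pipeline(transaction=False)
for key in keys:
    pipe.smembers(key)
union = set().union(*pipe.execute())
```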
Optimizing a SUNIONSTORE is not really possible, since disjoint sets are the worst case for its performance. The performance is dominated by the number of resulting items, so the fewer items the sets have in common, the longer the response time.
I wonder why Redis has no command to increment an element in a list.
You can increment a key's value with INCR, you can use HINCRBY to increment a field in a hash, and you can use ZINCRBY to increment a member of a sorted set. But there's nothing for lists.
This puzzles me. Why not?
What was the thinking behind this decision? If lists are "not supposed to be used like this", then why? Do they work in a very different way from sets? What's the big difference?
The big difference is that there is no way to access a given item in a Redis list efficiently. Lists are implemented as doubly-linked lists (for big lists) or are completely serialized (the ziplist optimization, for small lists). By comparison, hashes and sorted sets are implemented using a hash table, which allows O(1) amortized complexity for item access.
So if such an increment command existed for lists, its complexity would be O(N). Not very attractive for just an increment.
Note that if you need such a feature, you can easily implement it yourself with a server-side Lua script calling LINDEX and LSET; a minimal sketch follows.
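This sketch assumes the redis-py client; the script body and key name are illustrative:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# Read the element at the given index with LINDEX, add the delta,
# write it back with LSET. Both commands walk the list, so this is
# still O(N) -- exactly the cost discussed above.
LINCRBY_LUA = """
local v = redis.call('LINDEX', KEYS[1], ARGV[1])
if not v then return redis.error_reply('index out of range') end
v = tonumber(v) + tonumber(ARGV[2])
redis.call('LSET', KEYS[1], ARGV[1], v)
return v
"""

r.delete("mylist")
r.rpush("mylist", 10, 20, 30)
print(r.eval(LINCRBY_LUA, 1, "mylist", 1, 5))   # 25
```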