Redis ZRANGEBYSCORE performance when min is -inf

The documented time complexity of ZRANGEBYSCORE is O(log(N)).
What if I run ZRANGEBYSCORE with min = -inf and LIMIT 1: is it also O(log(N)), or O(1)?

ZRANGEBYSCORE is O(log(N) + M), where N is the number of elements in the sorted set and M is the number of elements being returned.
So with LIMIT 1 you have M = 1, and the complexity is O(log(N) + 1) = O(log(N)), even when min is -inf.
ZRANGEBYSCORE - Redis Documentation
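The O(log(N) + M) shape is easy to see in a toy model: a plain sorted Python list with binary search standing in for Redis's skip list (`zrangebyscore_toy` and its arguments are made-up names for illustration, not the redis-py API):

```python
import bisect

def zrangebyscore_toy(scores, min_score, max_score, limit=None):
    """Toy model of ZRANGEBYSCORE over an already-sorted list of scores."""
    lo = bisect.bisect_left(scores, min_score)    # O(log N) seek, even for min = -inf
    hi = bisect.bisect_right(scores, max_score)
    if limit is not None:
        hi = min(hi, lo + limit)                  # LIMIT caps M, the number returned
    return scores[lo:hi]                          # O(M) to emit the results

scores = [1.0, 3.0, 3.0, 7.0, 9.0, 12.0]
print(zrangebyscore_toy(scores, float('-inf'), float('inf'), limit=1))
```

With LIMIT 1 the seek still costs O(log N); only the O(M) part shrinks to a constant.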

Related

Median of medians algorithm: why divide the array into blocks of size 5? Does dividing into groups of 4 affect the time complexity?

I was a little confused about how to form the recurrence T(n) if we divide the array into groups of 4, and why an odd group size, specifically 5, is always recommended. When we divide into groups of 5 the recurrence is:
T(n) <= Θ(n) + T(n/5) + T(7n/10)
What would the recurrence be for groups of 4, and does it affect the time complexity? With groups of 5 the overall time complexity is Θ(n).
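For reference, a minimal sketch of the standard groups-of-5 selection (`median_of_medians` is a hypothetical helper, not code from the question), annotated with where the T(n/5) and T(7n/10) terms come from. With groups of 4 the guaranteed discard weakens to roughly T(n) <= Θ(n) + T(n/4) + T(3n/4), and since n/4 + 3n/4 = n the per-level work no longer shrinks, so the Θ(n) bound is lost:

```python
import random

def median_of_medians(a, k):
    """Return the k-th smallest element (0-indexed) of a, using groups of 5."""
    if len(a) <= 5:
        return sorted(a)[k]
    # 1) median of each group of 5: the Θ(n) scanning term
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2] for i in range(0, len(a), 5)]
    # 2) recurse on the ~n/5 medians to pick a pivot: the T(n/5) term
    pivot = median_of_medians(medians, len(medians) // 2)
    # 3) partition; the larger side holds at most ~7n/10 elements: the T(7n/10) term
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    equal = len(a) - len(lows) - len(highs)
    if k < len(lows):
        return median_of_medians(lows, k)
    elif k < len(lows) + equal:
        return pivot
    else:
        return median_of_medians(highs, k - len(lows) - equal)

data = random.sample(range(1000), 101)
print(median_of_medians(data, 50), sorted(data)[50])
```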

Do the non-uniform distribution scores reduce Redis Sorted Set performance?

I wonder whether a non-uniform distribution of scores would reduce Redis Sorted Set performance.
There are two random score generators: one draws from a uniform distribution, the other from a custom distribution (shown as an image in the original post) in which lower scores are much more probable than higher scores. Both generators pick random scores from the same range [score_lower_bound, score_upper_bound).
import random
import redis

score_lower_bound = 0
score_upper_bound = 10000  # exclusive
N = int(1e6)
redis_client = redis.Redis(host='127.0.0.1', port=6379)

f = lambda x: 1.1 ** (-0.01 * x + 100)
weights = [int(f(x)) for x in range(score_lower_bound, score_upper_bound)]

def random_score_uniform_distr():
    return random.randint(score_lower_bound, score_upper_bound - 1)

def random_score_custom_distr():
    return random.choices(range(score_lower_bound, score_upper_bound), weights=weights, k=1)[0]
And then, I called the ZADD command repeatedly for N times with a random score generated by the specific generation function and a unique string member/element.
import string

def populate_zset(random_score_func):
    # generate a random suffix for the key name
    suffix = ''.join(random.choices(string.ascii_lowercase, k=10))
    zset_name = "zset_" + suffix
    print("populating...", zset_name)
    for i in range(N):
        # generate a random 64-character string as the member
        member = ''.join(random.choices(string.ascii_lowercase, k=64))
        redis_client.zadd(zset_name, {member: random_score_func()})
    print("populated: ", zset_name, redis_client.zcard(zset_name))
I ran the test many times and collected some results as follows.
populating... zset_zvzqnrujao
populated: zset_zvzqnrujao 1000000
Time taken to populate zset with uniform distributed scores: 91.79979228973389
populating... zset_ocwlgdohpt
populated: zset_ocwlgdohpt 1000000
Time taken to populate zset with custom distributed scores: 405.32152819633484
populating... zset_anfqzgrbyu
populated: zset_anfqzgrbyu 1000000
Time taken to populate zset with uniform distributed scores: 116.31756711006165
populating... zset_oyrjodoasm
populated: zset_oyrjodoasm 1000000
Time taken to populate zset with custom distributed scores: 473.89297699928284
populating... zset_ezpstuvtmd
populated: zset_ezpstuvtmd 1000000
Time taken to populate zset with uniform distributed scores: 98.64593005180359
populating... zset_rjndappswl
populated: zset_rjndappswl 1000000
Time taken to populate zset with custom distributed scores: 428.9520342350006
populating... zset_qnrocvjzec
populated: zset_qnrocvjzec 1000000
Time taken to populate zset with uniform distributed scores: 104.60574007034302
populating... zset_oyrchapofd
populated: zset_oyrchapofd 1000000
Time taken to populate zset with custom distributed scores: 434.5851089954376
I think the non-uniform distribution of scores affects Sorted Set performance because the underlying skip list degenerates toward a plain linked list: when some tightly adjacent scores (nodes) are accessed much more frequently than others, the "next" pointers in the higher layers no longer give any advantage. Maybe it is similar to storing a sorted list of numbers in an unbalanced search tree.
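One client-side factor worth separating from Redis itself (a sketch, not part of the original test): random.choices re-accumulates the full 10000-entry weights list on every call, so the custom generator is slower than random.randint before any ZADD is even issued. Passing precomputed cum_weights removes that per-call scan (`random_score_custom_fast` is a hypothetical variant):

```python
import itertools
import random
import timeit

score_lower_bound = 0
score_upper_bound = 10000  # exclusive

f = lambda x: 1.1 ** (-0.01 * x + 100)
weights = [int(f(x)) for x in range(score_lower_bound, score_upper_bound)]

def random_score_custom_distr():
    # random.choices re-accumulates all 10000 weights on EVERY call
    return random.choices(range(score_lower_bound, score_upper_bound),
                          weights=weights, k=1)[0]

# precompute the cumulative weights once; choices then only does a bisect per call
cum_weights = list(itertools.accumulate(weights))

def random_score_custom_fast():
    return random.choices(range(score_lower_bound, score_upper_bound),
                          cum_weights=cum_weights, k=1)[0]

slow = timeit.timeit(random_score_custom_distr, number=1000)
fast = timeit.timeit(random_score_custom_fast, number=1000)
print(f"per-call weights: {slow:.3f}s  precomputed cum_weights: {fast:.3f}s")
```

Timing the generators alone, with no Redis calls, shows how much of the gap between the two populate runs is spent in score generation rather than in the server.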

What is the time complexity of looping over the array and then splitting the number into digits?

Suppose we loop over an array of numbers (which can grow without bound), and inside each iteration we loop over the digits of the current number; for the number 125556 we would loop through six digits: 1, 2, 5, 5, 5, and 6. Is the time complexity of this algorithm just O(N), where N is the count of numbers in the array, or is it O(N*K), where K is the number of digits in a number? In this case K is always less than N, so I am unclear whether this is a multiplication or whether we can just disregard the number of digits.
The algorithm you describe is always O(N * K).
However, if you know something about the relationship between N and K, then you can simplify the expression. For instance, if the numbers are guaranteed to fit in a 32-bit integer representation on a computer, then K is a constant and your algorithm is O(N). If K < N, then you can say O(N^2).
But if you have no assumption on K, then you have to go with O(N * K), as K could be significantly larger than N or vice versa. Intuitively, your time complexity depends on two factors, so you need to express it with two variables unless they depend on each other.
Edit:
Since you clarified that you are looping through the numbers in order, as in 1, 2, ..., N, we now have some information on the relationship between K and N. In fact, K = O(log N), so the algorithm can be expressed as O(N log N).
If you are confused about how we know that K = O(log N), take any power of 10 as an example: 10^K has log10(10^K) + 1 = K + 1 digits. Similarly, any number X has O(log X) decimal digits (notice that the base of the logarithm does not matter in big-O notation).
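The digit-count bound is easy to check directly (`digit_count` is a hypothetical helper): repeated division by 10 counts K digits, and K tracks log10 of the number:

```python
import math

def digit_count(x: int) -> int:
    # count decimal digits by repeated division by 10
    k = 0
    while x > 0:
        x //= 10
        k += 1
    return k

print(digit_count(125556))  # the six-digit example from the question
```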

Simplifying time complexity of a function

Say I have a time complexity O(f(m) * n) where f(m) is not a randomized function but it will always produce a value between 0 and 1 (exclusive). Should I drop the f(m) term and conclude that my time complexity is O(n)? Thanks so much.
This is big-O notation, which gives an upper bound on the running time (the worst-case scenario). Since f(m) always lies strictly between 0 and 1, f(m) * n is always less than n, so O(f(m) * n) is bounded by O(n) and can be written as O(n).
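The bound in one line (under the stated assumption that 0 < f(m) < 1 for all m):

```latex
0 < f(m) < 1 \;\implies\; f(m)\,n < n \;\implies\; O\big(f(m)\,n\big) \subseteq O(n)
```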

time complexity for loop justification

Hi, could anyone explain why the first one is True and the second one is False?
First loop: the number of times the loop executes is k, where for a given n, i takes the values 1, 2, 4, ..., staying less than n. So
2^k <= n,
or k <= log(n).
This implies that k, the number of times the first loop executes, is log(n); that is, the time complexity here is O(log(n)).
The second loop's iteration count is not based on p, since p is not used in the loop's decision statement. p takes different values inside the loop but does not influence the decision statement; p*p executes once per iteration, so the time complexity is O(n).
O(log n):
for(i=1; i<n; i=i*c){ /* any O(1) expression */ }
Here the time complexity is O(log n) because the index i is multiplied (or divided) by a constant each iteration. (Note that i must start at a nonzero value; with i=0, i*c would stay 0 and the loop would never terminate.)
In the second case,
for(p=2, i=1; i<n; i++){ p=p*p; }
the incremental increase is constant, i.e. i=i+1, so the loop runs n times irrespective of the value of p. Hence the loop alone has a complexity of O(n). If we additionally charge for naive multiplication, p=p*p is not an O(1) operation once p outgrows a machine word (p squares on every iteration), so the total bit-complexity would then be higher than O(n).
Let me summarize with an example. Suppose the value of n is 8; then the possible values of i are 1, 2, 4, and as soon as i reaches 8 the loop breaks. You can see the loop runs 3 times, i.e. log(n) times, as the value of i doubles on each iteration. Hence, True.
For the second part, it is a normal loop that runs for all values of i from 1 to n; p keeps squaring itself inside the body, but p never appears in the loop condition, so the loop still runs n times. That is why claiming O(log n) for the second loop is wrong.
In order to understand why some algorithm is O(log n) it is enough to check what happens when n = 2^k (i.e., we can restrict ourselves to the case where log n happens to be an integer k).
If we inject this into the expression
for(i=1; i<2^k; i=i*2) s+=i;
we see that i will take the values 1, 2, 4, 8, ..., i.e., 2^0, 2^1, 2^2, 2^3, ..., up to the last one 2^(k-1) (the condition i < 2^k stops it there). In other words, the body of the loop will be evaluated k times. Therefore, if we assume that the body is O(1), we see that the complexity is k*O(1) = O(k) = O(log n).
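The counting argument can be checked directly (`doubling_loop_iterations` is a hypothetical helper mirroring the loop above):

```python
def doubling_loop_iterations(n: int) -> int:
    # mirrors: for(i=1; i<n; i=i*2) s+=i;  returns how many times the body runs
    count, i = 0, 1
    while i < n:
        count += 1
        i *= 2
    return count

print(doubling_loop_iterations(8))  # the n=8 example: body runs for i = 1, 2, 4
```

For n = 2^k the body runs exactly k times, matching the O(log n) claim.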