According to the Redis documentation, the LTRIM command has the following time complexity:
Time complexity: O(N) where N is the number of elements to be removed by the operation.
However, I have some confusion.
For example, if the linked list has 400 numbers from 0 to 399 and I run LTRIM 0 99, the raw linked nodes from 100 to 399 have no necessity to be visited, I think. Disconnecting the link between node 99 and node 100 would be enough.
So I think N should be 100, not 300.
Please give me a detailed explanation.
the raw linked nodes from 100 to 399 have no necessity to be visited, I think
NO. These nodes need to be released one-by-one, and that's why the time complexity is O(N), where N is the number of elements to be removed.
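To make that concrete, here is a toy Python sketch of trimming a singly linked list (an illustration only, not Redis's actual C implementation). Cutting the list after the last kept node is indeed O(1), but every removed node still has to be visited so its memory can be released; in C that is where each node would be free()'d, and the loop below just stands in for that.

class ListNode:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def ltrim_keep_first(head, keep):
    """Keep the first `keep` nodes and release the rest one by one."""
    node = head
    for _ in range(keep - 1):        # walk to the last kept node: O(keep)
        node = node.next
    doomed = node.next
    node.next = None                 # the O(1) "disconnect" the question imagines
    freed = 0
    while doomed is not None:        # ...but each removed node must still be visited
        nxt = doomed.next
        doomed.next = None           # stands in for freeing the node's memory
        doomed = nxt
        freed += 1
    return freed                     # 300 for LTRIM 0 99 on a 400-element list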
Related
I just started learning about hash dictionaries. Currently we are implementing a hash dictionary with separate chaining, i.e. buckets made of linked lists. The book posed this problem and I am having a lot of trouble figuring it out. Imagine we have an initial table size of 10, i.e. 10 buckets. If we want to know the time complexity for n insertions and a single lookup, how do we figure this out? (Assume a pointer access is one unit of time.)
It poses three scenarios:
A hash dictionary that does not resize, what is the time complexity for n insertions and 1 lookup?
A hash dictionary that resizes by 1 when the load factor exceeds .8, what is the time complexity for n insertions and 1 lookup?
A hash dictionary that resizes by doubling the table size when the load factor exceeds .8, what is the time complexity for n insertions and 1 lookup?
My initial thoughts had me really confused. I couldn't quite figure out how to know the length of a given chain for an insertion. Assuming a chain of length k (I thought), there are the pointer accesses of the loop walking the whole chain, so k units of time. Then, in each iteration the insert checks whether the current node's data is equal to the key being inserted (if it exists, overwrite it), so either 2k units of time if not found or 2k+1 if found. Then it does 5 pointer accesses to prepend the element. So, 2k+5 or 2k+1 to insert once. Thus, O(kn) for the first scenario for n insertions. To look up, it seems to be 2k+1 or 2k, so for 1 lookup, O(k). I don't have a clue how to approach the other two scenarios. Some help would be great. Once again, to clarify: k isn't mentioned in the problem. The only facts given are an initial size of 10 and the information in the scenarios, so k can't be used in the results for the time complexity of n insertions or 1 lookup.
If you have a hash dictionary, then your insert, delete, and search operations each take O(n) time for one key in the worst case. For n insertions it would be O(n^2). It doesn't matter what the size of your table is:
|--------|
|element1| -> element2 -> element3 -> element4 -> element5
|--------|
|  null  |
|--------|
|  null  |
|--------|
|  null  |
|--------|
|  null  |
|--------|
Now for the average case:
Scenario one has a fixed table size (say m slots), so the load factor is n/m. Therefore one insert operation is O(1 + n/m): 1 for the hash function computation and n/m for scanning the chain.
For the 2nd and 3rd scenarios it should be O(1 + n/(m+1)) and O(1 + n/(2m)) respectively.
As for your confusion, ask yourself what the expected chain length will be for a random set of keys. The answer is that we can't be sure at all.
That's where the idea of the load factor comes in to define the average case: we give each slot an equal probability of forming a chain once the number of keys exceeds the slot count.
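If it helps, here is a small hedged Python experiment (random keys, m = 10 buckets, all names invented for this example) showing what that "equal probability per slot" assumption looks like in practice: the ten chain lengths all come out close to n/m, which is what the O(1 + n/m) figure relies on.

import random
from collections import Counter

n, m = 10_000, 10
keys = random.sample(range(10**9), n)          # a random set of distinct keys
chains = Counter(hash(k) % m for k in keys)    # which bucket each key lands in

print(sorted(chains.values()))                 # ten chain lengths, each close to n/m = 1000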
Imagine we have an initial table size of 10 ie 10 buckets. If we want to know the time complexity for n insertions and a single lookup, how do we figure this out?
When we talk about time complexity, we're looking at the steepness of the n-vs-time-for-operation curve as n approaches infinity. In the case above, you're saying there are only ten buckets, so - assuming the hash function scatters the insertions across the buckets with near-uniform distribution (as it should) - n insertions will result in 10 lists of roughly n/10 elements.
During each insertion, you can hash to the correct bucket in O(1) time. Now - a crucial factor here is whether you want your hash table implementation to protect you against duplicate insertions.
If you simply trust there will be no duplicates, or the hash table is allowed to have duplicates (e.g. C++'s unordered_multiset), then the insertion itself can be done without inspecting the existing bucket content, at an accessible end of the bucket's list (i.e. using a head or tail pointer), also in O(1) time. That means the overall time per insertion is O(1), and the total time for n insertions is O(n).
If the implementation must identify and avoid duplicates, then for each insertion it has to search along the existing linked list, the size of which is related to n by a constant #buckets factor (1/10) and grows linearly during insertion from 1 to 1/10 of the final number of elements, so on average it is n/2/10, which - removing constant factors - simplifies to n. In other words, each insertion is O(n).
Presumably the question intends to ask the time for a single lookup done after all elements are inserted: in that case you have the 10 linked lists of ~n/10 length, so the lookup will hash to one of those lists and then on average have to look half way along the list before finding the desired value: that's roughly n/20 elements searched, but as /20 is a constant factor it can be dropped, and we can say the average complexity is O(n).
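Here is a minimal Python sketch of the fixed-size table being described (10 buckets, duplicate-checking insert); the class name and details are made up for illustration. Because the bucket count never changes, every chain grows in proportion to n, which is where the O(n) insert-with-duplicate-check and O(n) lookup come from.

class FixedHashDict:
    """Separate chaining with a bucket count that never changes."""
    def __init__(self, buckets=10):
        self.table = [[] for _ in range(buckets)]

    def insert(self, key, value):
        chain = self.table[hash(key) % len(self.table)]    # O(1) hash + bucket index
        for i, (k, _) in enumerate(chain):                 # duplicate scan: O(chain length)
            if k == key:
                chain[i] = (key, value)                    # key exists: overwrite it
                return
        chain.append((key, value))                         # append at the end: O(1)

    def lookup(self, key):
        chain = self.table[hash(key) % len(self.table)]
        for k, v in chain:                                 # again O(chain length)
            if k == key:
                return v
        raise KeyError(key)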
A hash dictionary that does not resize, what is the time complexity for n insertions and 1 lookup?
Well, we discussed that above with our hash table size stuck at 10.
A hash dictionary that resizes by 1 when the load factor exceeds .8, what is the time complexity for n insertions and 1 lookup?
Say the table has 100 buckets and 80 elements, and you insert an 81st element: it resizes to 101 buckets, at which point the load factor is about 0.802 - should it immediately resize again, or wait until the next insertion? Anyway, ignoring that - each resize operation involves visiting, rehashing (unless the elements or nodes cache their hash values), and "rewiring" the linked lists for all existing elements: that's O(s) where s is the size of the table at that point in time. And you're doing that once or twice (depending on your answer to the "immediately resize again" behaviour above) for s values from 1 to n, so s averages n/2, which simplifies to n per insertion. The insertion itself may or may not involve another iteration of the bucket's linked list (you could optimise to search while resizing). Regardless, the overall time complexity is O(n^2).
The lookup then takes O(1), because the resizing has kept the load factor below a constant amount (i.e. the average linked list length is very, very short, even ignoring the empty buckets).
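As a hedged sketch of that grow-by-one behaviour (reusing the FixedHashDict toy class from above, and choosing the "resize again immediately" interpretation): once n exceeds 0.8 times the bucket count, nearly every insertion triggers a full O(current size) rehash, which is what makes the n insertions O(n^2) overall.

class GrowByOneHashDict(FixedHashDict):
    """Chaining table that adds one bucket whenever the load factor exceeds 0.8."""
    def __init__(self, buckets=10):
        super().__init__(buckets)
        self.count = 0

    def insert(self, key, value):
        super().insert(key, value)
        self.count += 1                                     # (ignores overwrites; fine for a sketch)
        while self.count / len(self.table) > 0.8:           # may fire twice, as noted above
            old = self.table
            self.table = [[] for _ in range(len(old) + 1)]  # grow by exactly one bucket
            for chain in old:                               # rehash every element: O(current size)
                for k, v in chain:
                    self.table[hash(k) % len(self.table)].append((k, v))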
A hash dictionary that resizes by doubling the table size when the load factor exceeds .8, what is the time complexity for n insertions and 1 lookup?
If you consider the resultant hash table there with n elements inserted, about half the elements will have been inserted without needing to be rehashed, while about a quarter will have been rehashed once, an eighth rehashed twice, a sixteenth rehashed 3 times, a 32nd rehashed 4 times: if you sum up that series - 1/4 + 2/8 + 3/16 + 4/32 + 5/64 + 6/128... - it approaches 1 as n goes to infinity. In other words, the average amount of repeated rehashing/linking work done per element in the final table doesn't increase with n - it's constant. So, the total time to insert is simply O(n). Then, because the load factor is kept below 0.8 - a constant rather than a function of n - the lookup time is O(1).
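And the doubling variant, again as a hedged sketch built on the FixedHashDict class above: resize work only happens when the load factor crosses 0.8, and because the table doubles each time, the total rehashing over n inserts is proportional to n (the 1/4 + 2/8 + 3/16 + ... argument), giving amortized O(1) inserts and O(1) lookups.

class DoublingHashDict(FixedHashDict):
    """Chaining table that doubles its bucket count when the load factor exceeds 0.8."""
    def __init__(self, buckets=10):
        super().__init__(buckets)
        self.count = 0

    def insert(self, key, value):
        super().insert(key, value)
        self.count += 1
        if self.count / len(self.table) > 0.8:
            old = self.table
            self.table = [[] for _ in range(2 * len(old))]  # double the bucket count
            for chain in old:                               # O(current size), but done rarely
                for k, v in chain:
                    self.table[hash(k) % len(self.table)].append((k, v))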
During class I asked my teacher this question and he couldn't answer it, which is why I'm asking here.
I asked: if we have a loop that runs from 1 to 10, would the complexity be O(1) (big O of 1)? He answered yes. So here's the question: what if I have written a loop that runs from 1 to 1 million? Is it still O(1)? Or is it O(n), or something else?
Pseudo code:
for i in range(1, 1_000_001):
    print("hey")
What is the time complexity of that loop?
Now, if you think the answer is O(n), how can you say it is O(n), given that O(n) means the complexity is linear?
And where is the dividing line between a piece of code being O(1) and O(n)?
Like, if I had written a loop for 10 or 100 or 1000 or 10000 or 100000 iterations, when would it transform from O(1) to O(n)?
By definition, O(10000000) and O(1) are equal. Let me quickly explain what complexity means.
What we try to represent with the abstraction of time (and space) complexity isn't how fast a program will run; it's how the runtime (or space) grows as the input length grows.
For instance, given a loop with a fixed number of iterations (let's say 10), it doesn't matter whether your input is 1 element long or 10000000000000, because your loop will ALWAYS run the same number of iterations; therefore there is no growth in runtime (even if those 10 iterations take 1 week to run, they will always take 1 week).
But if your algorithm's steps depend on your input length, then the longer your input, the more steps your algorithm takes; the question is, how many more steps?
In summary, time (and space) complexity is an abstraction. It's not there to tell us how long things will take; it's there to tell us how the runtime grows as the input grows. O(1) == O(10000000), because it's not about how long it takes, it's about the change in runtime: an O(1) algorithm can take 10 years, but it will always take 10 years, even for a very large input.
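A quick illustrative pair of Python functions (the names are invented for this example): the first always does 10 iterations no matter how long the input is, so it is O(1); the second does one iteration per input element, so it is O(n).

def fixed_ten(data):
    # Always exactly 10 iterations, whatever len(data) is: O(1).
    for _ in range(10):
        print("hey")

def one_per_element(data):
    # The iteration count grows with the input length: O(n).
    for item in data:
        print(item)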
I think you are confusing the terms. The time complexity of a given algorithm is the relationship between the change in execution time and the change in input size.
If you are running a fixed loop from 1 to 10, but doing something in each iteration, then that counts as O(10), or O(1), meaning that it will take the same time each run.
But as soon as the number of iterations starts depending on the number of elements or tasks, the loop becomes O(n), meaning the complexity is linear: proportionally more tasks, proportionally more time.
I hope that clears some things up. :-)
Problem
My system inserts records into an Oracle RAC DB at a rate of 600 tps. During the insertion procedure call, each record is assigned a sequence value, so that the records get distributed among 20 different batch ids (an implementation of a round-robin mechanism).
Following are the steps for selecting a batch:
1) A record comes in and is assigned the next value from a sequence.
2) Do MOD(sequence, 20). It gives values from 0 to 19.
Issue:
Three records arrive at the DB simultaneously and hit three different nodes in the RAC.
They come out with sequences 2, 102, 1002.
MOD for all of them happens to be the same.
All try to get into the same batch.
Round robin fails here.
Please help resolve the issue.
This is due to the implementation of sequences on RAC. When a node is first asked for the next value of a sequence it gets a bunch of them (e.g. 100 to 119) and then hands them out until it needs a new lot, at which point it gets another bunch (160 - 179). While Node 1 is handing out 100 then 101, Node 2 will be handing out 121, 122, etc.
The size of the 'bunch' is controlled by, as I remember, the CACHE size defined on the sequence. If you set a cache size of 0, you will get no caching, and the values will be handed out sequentially. However, doing that involves the nodes in management overhead while they work out what the next value actually is, and at 600 tps this might not be a good idea: you'd have to try it and see.
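The effect is easy to reproduce outside the database. Here is a hedged Python simulation (not Oracle code; the cache size of 20 and the node numbering are assumptions for illustration) in which each RAC node holds its own cached block of sequence values: three simultaneous inserts on three different nodes all map to the same batch, exactly as described in the question.

CACHE = 20       # assumed sequence CACHE size per node
BATCHES = 20     # number of round-robin batches

class RacNode:
    """One RAC node holding a locally cached block of sequence values."""
    def __init__(self, first):
        self.values = list(range(first, first + CACHE))
    def nextval(self):
        return self.values.pop(0)

# Three nodes, each handed a different cached block by the sequence.
nodes = [RacNode(1), RacNode(21), RacNode(41)]

# Three "simultaneous" inserts, one on each node.
for node in nodes:
    seq = node.nextval()
    print(seq, "-> batch", seq % BATCHES)   # 1, 21 and 41 all land in batch 1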
I have a table challenge containing about 12000 rows. Every point connects to the four points around it; for example, 100 connects to 99, 101, 11 and 189. I tried this with a small-scale table and it worked just fine, but as I increased the size of the table the query became exponentially slower, and now it won't even finish. Here's my query:
SELECT level, origin, destination
FROM challenge
WHERE destination = 2500
START WITH origin = 1
CONNECT BY NOCYCLE PRIOR destination = origin;
Any advice on how to optimize this query would be greatly appreciated.
So you're finding every path from node 1 to node 2500 in a degree-4 graph (rectangular lattice?) of thousands of nodes. I expect there'll be quite a lot of them. Did the challenge just ask you to count them? Because I think the point was that you have to figure out how many there are by doing math, not brute force computation.
For example, if it's a 50x50 rectangular grid with node 1 and node 2500 in opposite corners, then the minimum path length is 100 steps. A path of 100 steps will have 50 of them horizontal and 50 of them vertical, and they can come in any order. Figure out how many ways you can arrange a string of 50 H's and 50 V's and you might find it's a number that even the mighty Oracle will have a bit of a problem with. (Generating the rows, that is. Doing the calculation just requires large integer arithmetic, which Oracle can probably do quite quickly once you tell it the formula.)
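For a sense of scale, here is that count worked out in Python, under the assumption of a 50x50 grid with node 1 and node 2500 in opposite corners: the number of ways to order 50 H's and 50 V's is the binomial coefficient C(100, 50).

import math

# Minimum-length corner-to-corner paths on a 50x50 grid:
# choose which 50 of the 100 steps are horizontal.
print(math.comb(100, 50))   # roughly 1.0e29 paths - far too many rows to generate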
And your query is actually worse than that. It doesn't ask only for minimum-length paths. So it will also return all the paths of length 102 that take a step away from the destination somewhere along the way. And paths of length 104 that take 2 backward steps. And paths of length 2498 that visit almost all of the nodes! Counting those paths is more complicated than counting the short paths because you have to exclude the ones that cross themselves.
I'm using Redis 2.6. I've run into strange behaviour of the ZRANGEBYSCORE command.
I have a sorted set with a few million elements.
Something like this:
10 marry
15 john
25 bob
...
So compare these two queries:
ZRANGEBYSCORE longset 25 50 LIMIT 0 20 works like a charm; it takes milliseconds.
ZRANGEBYSCORE longset 25 50 hangs for minutes!!
All the elements I'm interested in are in the first hundred of the set.
I think there's no need to scan elements with a score greater than 50, because it is a SORTED set.
Please explain how redis scans sorted sets and why there is such a big difference between these two queries.
One of the best things about Redis, IMO, is that you can check the time complexity of each command in the docs. The docs for ZRANGEBYSCORE specify:
Time complexity: O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements being returned. If M is constant (e.g. always asking for the first 10 elements with LIMIT), you can consider it O(log(N)).
[...]
Keep in mind that if offset is large, the sorted set needs to be traversed for offset elements before getting to the elements to return, which can add up to O(N) time complexity.
This means that if you know you only need a certain number of items and specify LIMIT offset count with an offset at (or close to) 0, you can consider it O(log(N)); but if the number of returned items is high, or the offset is high, it can approach O(N).
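In practice that means asking for the range in small pages instead of pulling every matching member at once. A hedged redis-py sketch (the key name longset comes from the question; the page size of 20 and the localhost connection are assumptions):

import redis

r = redis.Redis()   # assumes a Redis instance on localhost:6379

# Cheap: O(log(N) + 20) - skiplist search to score 25, then walk 20 elements.
page = r.zrangebyscore("longset", 25, 50, start=0, num=20)

# Potentially expensive: O(log(N) + M), where M is every member whose score
# falls in [25, 50] - with millions of matching members this takes a long time.
everything = r.zrangebyscore("longset", 25, 50)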