An alternative method to create an AVL tree from a sorted array in O(n) time

I need some help with this data structures homework problem. I was asked to write an algorithm that creates an AVL tree from a sorted array in O(n) time.
I read this solution method: Creating a Binary Search Tree from a sorted array
They do it recursively for the two halves of the sorted array and it works.
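For reference, here is a minimal Java sketch of that recursive method (the names are mine): the middle element of each subarray becomes the subtree root, so the resulting tree is height-balanced, and each element is visited exactly once, giving O(n).

class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

// Builds a height-balanced BST from sorted[lo..hi]: the middle element
// becomes the root and the two halves become the subtrees, recursively.
static Node build(int[] sorted, int lo, int hi) {
    if (lo > hi) return null;
    int mid = lo + (hi - lo) / 2;
    Node root = new Node(sorted[mid]);
    root.left = build(sorted, lo, mid - 1);
    root.right = build(sorted, mid + 1, hi);
    return root;
}

// Usage: Node root = build(arr, 0, arr.length - 1);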
I found a different solution and I want to check if it's valid.
My solution is to store another property of the root called "root.minimum" that will contain a pointer to the minimum.
Then, for the k-th element, we add it to the AVL tree built from the previous k-1 elements. We know that the k-th element is smaller than the current minimum, so we add it to the left of root.minimum to create the new tree.
Now the tree is no longer balanced, but all we need to do to fix it is just one right rotation at the previous minimum.
This way each insertion takes O(1), and in total O(n).
Is this method valid to solve the problem?
Edit: I meant that I'm starting from the largest element and then adding the rest in order. Each element I add is smaller than all the elements already in the tree, so I add it to the left of root.minimum. Then all I have to do to balance the tree is a right rotation, which is O(1). Is this a correct solution?

If you pick a random element as the root in the first place (which is probably not the best idea, since we know the root should be the middle element), you put the root itself in root.minimum. Then for each new element, if it is smaller than root.minimum, you do as you said and rebalance the tree in O(1) time. But what if it is larger? In that case you need to compare it with the root.minimum of the right child, and if it is also larger, with the root.minimum of the right child of the right child, and so on. This might take O(k) in the worst case, which results in O(n^2) overall. Also, this way you are not using the sorted property of the array.

Related

How to handle a tree given in an array of pairs?

I'm struggling with finding the best way of handling tree problems where the input is given as an array/list of pairs.
For example a tree is given as input in the format:
[(1,3),(1,2),(2,5),(2,4),(5,8)]
Where the first value in a pair is the parent, and the second value in a pair is the child.
I'm used to being given the root in tree problems. How would one go about storing this for problems such as "Lowest Common Ancestor"?
It depends on which problem you need to solve. For the problem of finding the lowest common ancestor of two nodes, you'll benefit most from a structure where you can find the parent of a given node in constant time. If it is already given that the nodes are numbered from 1 to n (without gaps), then an array is a good structure, such that arr[child] == parent. If the identifiers for the nodes are not that predictable, then use a hashmap/dictionary, such that map.get(child) == parent.
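Here is a minimal Java sketch of that idea (the names are mine, and it assumes the two nodes are in the same tree): build a child-to-parent map from the pairs, collect one node's ancestors into a set, then walk up from the other node until you hit the set.

import java.util.*;

// pairs are (parent, child), as in the question's input format
static Map<Integer, Integer> parentMap(int[][] pairs) {
    Map<Integer, Integer> parent = new HashMap<>();
    for (int[] p : pairs) {
        parent.put(p[1], p[0]);
    }
    return parent;
}

static int lowestCommonAncestor(Map<Integer, Integer> parent, int a, int b) {
    // Collect a and all of its ancestors; the root maps to null, ending the loop.
    Set<Integer> ancestors = new HashSet<>();
    for (Integer cur = a; cur != null; cur = parent.get(cur)) {
        ancestors.add(cur);
    }
    // Walk up from b until we reach one of a's ancestors (b itself counts).
    Integer cur = b;
    while (!ancestors.contains(cur)) {
        cur = parent.get(cur);
    }
    return cur;
}

// e.g. with the pairs above, lowestCommonAncestor(parent, 8, 4) == 2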

Time complexity of traversing an array

Below are two ways I could traverse any array:
Using a for loop, a single variable traverses from the start of the array to the end.
Using a while loop, two variables traverse from opposite directions and meet in the middle.
How would the time complexity vary: would it be reduced in the second case, or would it be the same?
Of course it is the same. Both are O(n); in fact, there is no way to traverse an array faster than O(n). Even if you traverse from opposite directions, you still have to visit each element once.
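To make that concrete, here is a minimal Java sketch of both traversals (process is a placeholder for whatever per-element work is done); the single loop does n visits, and the two-pointer loop does two visits per iteration over roughly n/2 iterations, so both are O(n):

static void process(int x) { /* placeholder for per-element work */ }

static void forwardTraversal(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        process(arr[i]); // n iterations, one visit each
    }
}

static void twoPointerTraversal(int[] arr) {
    int i = 0, j = arr.length - 1;
    while (i <= j) {
        process(arr[i]); // visit from the front...
        if (i != j) process(arr[j]); // ...and from the back
        i++;
        j--;
    }
    // ~n/2 iterations, up to two visits each: still O(n) total
}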

Natural way of indexing elements in Flink

Is there a built-in way to index and access indices of individual elements of DataStream/DataSet collection?
Like in typical Java collections, where you know that e.g. the 3rd element of an ArrayList can be obtained by ArrayList.get(2) and, vice versa, ArrayList.indexOf(elem) gives us the index of (the first occurrence of) the specified element. (I'm not asking about extracting elements out of the stream.)
More specifically, when joining DataStreams/DataSets, is there a "natural"/easy way to join elements that came (were created) first, second, etc.?
I know there is a zipWithIndex transformation that assigns sequential indices to elements. I suspect the indices always start with 0? But I also suspect that they aren't necessarily assigned in the order the elements were created in (i.e. by their Event Time). (It also exists only for DataSets.)
This is what I have tried so far:
DataSet<Tuple2<Long, Double>> tempsJoIndexed = DataSetUtils.zipWithIndex(tempsJo);
DataSet<Tuple2<Long, Double>> predsLinJoIndexed = DataSetUtils.zipWithIndex(predsLinJo);
DataSet<Tuple3<Double, Double, Double>> joinedTempsJo = tempsJoIndexed
    .join(predsLinJoIndexed).where(0).equalTo(0)...
And it seems to create wrong pairs.
I see some possible approaches, but they're either non-Flink or not very nice:
1. I could of course assign an index to each element upon the stream's creation and have e.g. a stream of Tuples.
2. Work with event-time timestamps. (I suspect there isn't a way to key by timestamps, and even if there was, it wouldn't be useful for joining multiple streams like this unless the timestamps are actually assigned as indices.)
3. We could try "collecting" the stream first, but then we wouldn't be using Flink anymore.
The 1. approach seems like the most viable one, but it also seems redundant given that the stream should by definition be a sequential collection and, as such, the elements should have a sense of orderliness (e.g. "I'm the 36th element because 35 elements already came before me.").
I think you're going to have to assign index values to elements, so that you can partition the data sets by this index, and thus ensure that two records which need to be joined are being processed by the same sub-task. Once you've done that, a simple groupBy(index) and reduce() would work.
But assigning increasing ids without gaps isn't trivial, if you want to be reading your source data with parallelism > 1. In that case I'd create a RichMapFunction that uses the runtimeContext sub-task id and number of sub-tasks to calculate non-overlapping and monotonic indexes.
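A sketch of what that could look like (names are mine; this uses the RichMapFunction runtime context as described above). Each sub-task starts at its own sub-task id and steps by the total parallelism, so the assigned indexes are non-overlapping across sub-tasks and monotonic within each one:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

public class AssignIndex<T> extends RichMapFunction<T, Tuple2<Long, T>> {
    private long nextIndex;
    private long stride;

    @Override
    public void open(Configuration parameters) {
        // Sub-task i emits indexes i, i + p, i + 2p, ... where p is the parallelism.
        nextIndex = getRuntimeContext().getIndexOfThisSubtask();
        stride = getRuntimeContext().getNumberOfParallelSubtasks();
    }

    @Override
    public Tuple2<Long, T> map(T value) {
        Tuple2<Long, T> indexed = Tuple2.of(nextIndex, value);
        nextIndex += stride; // step by the parallelism to avoid collisions
        return indexed;
    }
}

Note that the resulting indexes are not gap-free unless the elements are spread evenly across sub-tasks; they are only guaranteed non-overlapping and monotonic per sub-task.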

Double Ended Singly Linked List - Time complexity of searching

I have read that the time complexity of searching for an element located at the end of a double ended singly linked list is O(n).
But since the time complexity of searching for an element at the front is O(1), I think the same should apply to the end element. Any ideas? Thanks
The cost of searching for an element at the front of the linked list is indeed constant, because you hold a pointer to that first element. Thus, it is O(1) to find the first element.
In the case of a double ended singly linked list, assuming you mean you hold a pointer to both the first and last element of the singly linked list, you would indeed find that the time to locate the last element would be O(1), because you have a reference to exactly where it is.
However, consider the case of a double ended singly linked list where you want to find the (n-1)th element, the one just before the tail. Suddenly, you find that you have to iterate over n-1 elements until you get to it. Thus the worst case runtime for the double ended singly linked list is O(n-1), which is really O(n).
Even in the case where you had a double ended doubly linked list, you would find that the worst case runtime would be O(n/2) (assuming you had a mechanism to tell whether the element was in the first half or the second half, which is unlikely). But O(n/2) is still really O(n).
Since we generally refer to the worst case when we talk about big-O time complexity, you can see that searching a linked list is invariably O(n).
Note:
That's not to say that big-o is the only measure of time-complexity. Depending on your implementation, the amortized or probabilistic time-complexity could indeed be different from its worst case time complexity, and likely is.
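To make the asymmetry concrete, here is a minimal Java sketch of a double ended singly linked list (names are mine): both ends are reachable in O(1) through stored pointers, but anything else requires walking forward from the head.

class DoubleEndedList {
    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head, tail; // pointers to both ends

    int getFirst() { return head.value; } // O(1): direct pointer
    int getLast()  { return tail.value; } // O(1): direct pointer

    // The element just before the tail has no backward pointer to it,
    // so we must walk from the head: O(n) in the worst case.
    // (Assumes the list has at least two elements.)
    int getSecondToLast() {
        Node cur = head;
        while (cur.next != tail) {
            cur = cur.next;
        }
        return cur.value;
    }
}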

How is AVL tree insertion O(log n) when you need to recalculate balance factors up the tree after every insertion?

I'm implementing an AVL tree, and I'm trying to wrap my head around the time complexity of the adding process. It's my understanding that in order to achieve O(log n) you need to keep either balance or height state in tree nodes so that you don't have to recalculate them every time you need them (which may require a lot of additional tree traversal).
To solve this, I have a protocol that recursively "walks back up" a trail of parent pointers to the root, balancing if needed and setting heights along the way. This way, the addition algorithm kind of has a "capture" and "bubble" phase down and then back up the tree - like DOM events.
My question is: is this still technically O(log n) time? Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?
Assuming the height of the tree is H and the structure stays balanced throughout all operations.
Then, as you mentioned, inserting a node will take O(H).
However, every time a node is added to the AVL tree, you need to update the heights of the parents all the way up to the root node.
Since the tree is balanced, the height update only traverses a linked-list like structure with the newly inserted node at its tail.
The height updating can be viewed as equivalent to traversing a linked list of length H.
Therefore, updating heights takes another O(H), and the total update time is 2 * O(H), which is still O(log N) once we drop the constant factor.
Hope this makes sense to you.
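A hedged sketch of that walk-back-up phase in Java (the field and method names are mine; the rotations themselves are omitted): after a standard BST insertion, follow parent pointers from the new leaf to the root, recomputing heights as you go. The path has at most H nodes, so the phase is O(H) = O(log n).

static class Node {
    int key, height;
    Node parent, left, right;
}

static int height(Node n) {
    return n == null ? -1 : n.height;
}

// Called with the freshly inserted leaf after a normal BST insertion.
static void walkBackUp(Node inserted) {
    for (Node cur = inserted.parent; cur != null; cur = cur.parent) {
        cur.height = 1 + Math.max(height(cur.left), height(cur.right));
        // If |height(cur.left) - height(cur.right)| > 1 here, an O(1)
        // single or double rotation restores balance before moving up
        // (rotation code omitted in this sketch).
    }
}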
"Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?"
You've stated that you have to travel down and up every time.
So, we can say that your function is upper bounded by a runtime of 2 * logn.
It's clear that this is O(logn).
More specifically, we could assign the constant 3 and a starting value of 1, such that
2 * logn <= 3 * logn for all values of n >= 1.
This reduces to 2 <= 3, which is of course true.
The idea behind big-O is to understand the basic shape of the function that upper-bounds your function's runtime as the input size moves towards infinity - thus, we can drop the constant factor of 2.