BST Construction with multiple choices - binary-search-tree

I encountered the following question:
The Height of a BST constructed from the following list of values 10 , 5 , 4 , 3 , 2 , 1 , 0 , 9 , 13 , 11 , 12 , 16 , 20 , 30 , 40 , 14 will be:
A) 5
B) 6
C) 7
D) 8
Now to certain point I can construct the Tree with no problem because there aren't many choices to insert the values so from the first value 10 to the eleventh value 12 my tree would go like this:
10
/ \
5 13
/ \ /
4 9 11
/ \
3 12
/
2
/
1
/
0
But after that, now I have to add the value 16 and I have two choices to make it the right child of the value 12 or to make it the right child of the value 13 and the height would differ according to this choice, so what's the right approach here?

A binary search tree is a binary tree (that is, every node has at most 2 children) where for each node in the tree, all nodes in the left subtree rooted at that node are less than the root of the subtree and all nodes in the right subtree rooted at that node are greater than the root of the subtree.
Your walkthrough so far assumes a "naive" insertion algorithm that makes no attempt at balancing the tree and inserts the nodes in sequence by recursively comparing against each node from the root. Under typical circumstances, you would not want to build such an unbalanced tree because the performance of operations devolves into O(n) time instead of the optimal O(log(n)) time, which defeats the purpose of the BST structure.
Making 16 the right child of 12 would be an invalid choice because 16 as a left child of 13 violates the BST property. By definition, every insertion must go in one location, so there are no choices to be made, at least following this naive algorithm.
Following through with your approach, it should be clear that the final height (longest root-to-leaf path) will be 7 in any case due to the unbalanced left branch.
However, if the question is to find the height of an optimal tree, then the correct answer is 5. The below illustrates a complete tree (that is, every level is full except the bottom level, and that level has all elements to the left) that can be built from the given data:
11
+---------------+
4 16
+------+ +-------+
2 9 14 30
+---+ +---+ +---+ +---+
1 3 5 10 12 13 20 40
+--
0
Take a look at self-balancing binary search tree data structures for more information on algorithms that can build such a tree.

Related

No of Passes in a Bubble Sort

For the list of items in an array i.e. {23 , 12, 8, 15, 21}; the number of passes I could see is only 3 contrary to (n-1) passes for n elements. I also see that the (n-1) passes for n elements is also the worst case where all the elements are in descending order. So I have got 2 conclusions and I request you to let me know if my understanding is right wrt the conclusions.
Conclusion 1
(n-1) passes for n elements can occur in the following scenarios:
all elements in the array are in descending order which is the worst case
when it is not the best case(no of passes is 1 for a sorted array)
Conclusion 2
(n-1) passes for n elements is more like a theoretical concept and may not hold true in all cases as in this example {23 , 12, 8, 15, 21}. Here the number of passes are (n-2).
In a classic, one-directional bubble sort, no element is moved more than one position to the “left” (to a smaller index) in a pass. Therefore you must make n-1 passes when (and only when) the smallest element is in the largest position. Example:
12 15 21 23 8 // initial array
12 15 21 8 23 // after pass 1
12 15 8 21 23 // after pass 2
12 8 15 21 23 // after pass 3
8 12 15 21 23 // after pass 4
Let's define L(i) as the number of elements to the left of element i that are larger than element i.
The array is sorted when L(i) = 0 for all i.
A bubble sort pass decreases every non-zero L(i) by one.
Therefore the number of passes required is max(L(0), L(1), ..., L(n-1)). In your example:
23 12 8 15 21 // elements
0 1 2 1 1 // L(i) for each element
The max L(i) is L(2): the element at index 2 is 8 and there are two elements left of 8 that are larger than 8.
The bubble sort process for your example is
23 12 8 15 21 // initial array
12 8 15 21 23 // after pass 1
8 12 15 21 23 // after pass 2
which takes max(L(i)) = L(2) = 2 passes.
For a detailed analysis of bubble sort, see The Art of Computer Programming Volume 3: Sorting and Searching section 5.2.2. In the book, what I called the L(i) function is called the “inversion table” of the array.
re: Conclusion 1
all elements in the array are in descending order which is the worst case
Yep, this is when you'll have to do all (n-1) "passes".
when it is not the best case (no of passes is 1 for a sorted array)
No. When you don't have the best-case, you'll have more than 1 passes. So long as it's not fully sorted, you'll need less than (n-1) passes. So it's somewhere in between
re: Conclusion 2
There's nothing theoretical about it at all. You provide an example of a middle-ground case (not fully reversed, but not fully sorted either), and you end up needing a middle-ground number of passes through it. What's theoretical about it?

SQL Max Consecutive Values in a number set using recursion

The following SQL query is supposed to return the max consecutive numbers in a set.
WITH RECURSIVE Mystery(X,Y) AS (SELECT A AS X, A AS Y FROM R)
UNION (SELECT m1.X, m2.Y
FROM Mystery m1, Mystery m2
WHERE m2.X = m1.Y + 1)
SELECT MAX(Y-X) + 1 FROM Mystery;
This query on the set {7, 9, 10, 14, 15, 16, 18} returns 3, because {14 15 16} is the longest chain of consecutive numbers and there are three numbers in that chain. But when I try to work through this manually I don't see how it arrives at that result.
For example, given the number set above I could create two columns:
m1.x
m2.y
7
7
9
9
10
10
14
14
15
15
16
16
18
18
If we are working on rows and columns, not the actual data, as I understand it WHERE m2.X = m1.Y + 1 takes the value from the next row in Y and puts it in the current row of X, like so
m1.X
m2.Y
9
7
10
9
14
10
15
14
16
15
18
16
18
Null?
The main part on which I am uncertain is where in the SQL recursion actually happens. According to Denis Lukichev recursion is the R part - or in this case the RECURSIVE Mystery(X,Y) - and stops when the table is empty. But if the above is true, how would the table ever empty?
Since I don't know how to proceed with the above, let me try a different direction. If WHERE m2.X = m1.Y + 1 is actually a comparison, the result should be:
m1.X
m2.Y
14
14
15
15
16
16
But at this point, it seems that it should continue recursively on this until only two rows are left (nothing else to compare). If it stops here to get the correct count of 3 rows (2 + 1), what is actually stopping the recursion?
I understand that for the above example the MAX(Y-X) + 1 effectively returns the actual number of recursion steps and adds 1.
But if I have 7 consecutive numbers and the recursion flows down to 2 rows, should this not end up with an incorrect 3 as the result? I understand recursion in C++ and other languages, but this is confusing to me.
Full disclosure, yes it appears this is a common university question, but I am retired, discovered this while researching recursion for my use, and need to understand how it works to use similar recursion in my projects.
Based on this db<>fiddle shared previously, you may find it instructive to alter the CTE to include an iteration number as follows, and then to show the content of the CTE rather than the output of final SELECT. Here's an amended CTE and its content after the recursion is complete:
Amended CTE
WITH RECURSIVE Mystery(X,Y) AS ((SELECT A AS X, A AS Y, 1 as Z FROM R)
UNION (SELECT m1.X, m2.A, Z+1
FROM Mystery m1
JOIN R m2 ON m2.A = m1.Y + 1))
CTE Content
x
y
z
7
7
1
9
9
1
10
10
1
14
14
1
15
15
1
16
16
1
18
18
1
9
10
2
14
15
2
15
16
2
14
16
3
The Z field holds the iteration count. Where Z = 1 we've simply got the rows from the table R. The, values X and Y are both from the field A. In terms of what we are attempting to achieve these represent sequences consecutive numbers, which start at X and continue to (at least) Y.
Where Z = 2, the second iteration, we find all the rows first iteration where there is a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That becomes the new highest number, and we add one to the number of iterations. As only three numbers in our original data set have successors within the set, there are only three rows output in the second iteration.
Where Z = 3, the third iteration, we find all the rows of the second iteration (note we are not considering all the rows of the first iteration again), where there is, again, a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That, again, becomes the new highest number, and we add one to the number of iterations.
The process will attempt a fourth iteration, but as there are no rows in R where the value is one more than the Y values from our third iteration, no extra data gets added to the CTE and recursion ends.
Going back to the original db<>fiddle, the process then searches our CTE content to output MAX(Y-X) + 1, which is the maximum difference between the first and last values in any consecutive sequence, plus one. This finds it's value from the record produced in the third iteration, using ((16-14) + 1) which has a value of 3.
For this specific piece of code, the output is always equivalent to the value in the Z field as every addition of a row through the recursion adds one to Z and adds one to Y.

Discrete Binary Search Main Theory

I have read this: https://www.topcoder.com/community/competitive-programming/tutorials/binary-search.
I can't understand some parts==>
What we can call the main theorem states that binary search can be
used if and only if for all x in S, p(x) implies p(y) for all y > x.
This property is what we use when we discard the second half of the
search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for
all y < x (the symbol ¬ denotes the logical not operator), which is
what we use when we discard the first half of the search space.
But I think this condition does not hold when we want to find an element(checking for equality only) in an array and this condition only holds when we're trying to find Inequality for example when we're searching for an element greater or equal to our target value.
Example: We are finding 5 in this array.
indexes=0 1 2 3 4 5 6 7 8
1 3 4 4 5 6 7 8 9
we define p(x)=>
if(a[x]==5) return true else return false
step one=>middle index = 8+1/2 = 9/2 = 4 ==> a[4]=5
and p(x) is correct for this and from the main theory, the result is that
p(x+1) ........ p(n) is true but its not.
So what is the problem?
We CAN use that theorem when looking for an exact value, because we
only use it when discarding one half. If we are looking for say 5,
and we find say 6 in the middle, the we can discard the upper half,
because we now know (due to the theorem) that all items in there are > 5
Also notice, that if we have a sorted sequence, and want to find any element
that satisfies an inequality, looking at the end elements is enough.

Why is the time complexity of Heap Sort, O(nlogn)?

I'm trying to understand time complexity of different data structures and began with heap sort. From what I've read, I think collectively people agree heap sort has a time complexity of O(nlogn); however, I have difficulty understanding how that came to be.
Most people seem to agree that the heapify method takes O(logn) and buildmaxheap method takes O(n) thus O(nlogn) but why does heapify take O(logn)?
From my perspective, it seems heapify is just a method that compares a node's left and right node and properly swaps them depending if it is min or max heap. Why does that take O(logn)?
I think I'm missing something here and would really appreciate if someone could explain this better to me.
Thank you.
It seems that you are confusing about the time complexity about heap sort. It is true that build a maxheap from an unsorted array takes your O(n) time and O(1) for pop one element out. However, after you pop out the top element from the heap, you need to move the last element(A) in your heap to the top and heapy for maintaining heap property. For element A, it at most pop down log(n) times which is the height of your heap. So, you need at most log(n) time to get the next maximum value after you pop maximum value out. Here is an example of the process of heapsort.
18
15 8
7 11 1 2
3 6 4 9
After you pop out the number of 18, you need to put the number 9 on the top and heapify 9.
9
15 8
7 11 1 2
3 6 4
we need to pop 9 down because of 9 < 15
15
9 8
7 11 1 2
3 6 4
we need to pop 9 down because of 9 < 11
15
11 8
7 9 1 2
3 6 4
9 > 4 which means heapify process is done. And now you get the next maximum value 15 safely without broking heap property.
You have n number for every number you need to do the heapify process. So the time complexity of heapsort is O(nlogn)
You are missing the recursive call at the end of the heapify method.
Heapify(A, i) {
le <- left(i)
ri <- right(i)
if (le<=heapsize) and (A[le]>A[i])
largest <- le
else
largest <- i
if (ri<=heapsize) and (A[ri]>A[largest])
largest <- ri
if (largest != i) {
exchange A[i] <-> A[largest]
Heapify(A, largest)
}
}
In the worst case, after each step, largest will be about two times i. For i to reach the end of the heap, it would take O(logn) steps.

Circle Summation (30 Points) InterviewStree Puzzle

The following is the problem from Interviewstreet I am not getting any help from their site, so asking a question here. I am not interested in an algorithm/solution, but I did not understand the solution given by them as an example for their second input. Can anybody please help me to understand the second Input and Output as specified in the problem statement.
Circle Summation (30 Points)
There are N children sitting along a circle, numbered 1,2,...,N clockwise. The ith child has a piece of paper with number ai written on it. They play the following game:
In the first round, the child numbered x adds to his number the sum of the numbers of his neighbors.
In the second round, the child next in clockwise order adds to his number the sum of the numbers of his neighbors, and so on.
The game ends after M rounds have been played.
Input:
The first line contains T, the number of test cases. T cases follow. The first line for a test case contains two space seperated integers N and M. The next line contains N integers, the ith number being ai.
Output:
For each test case, output N lines each having N integers. The jth integer on the ith line contains the number that the jth child ends up with if the game starts with child i playing the first round. Output a blank line after each test case except the last one. Since the numbers can be really huge, output them modulo 1000000007.
Constraints:
1 <= T <= 15
3 <= N <= 50
1 <= M <= 10^9
1 <= ai <= 10^9
Sample Input:
2
5 1
10 20 30 40 50
3 4
1 2 1
Sample Output:
80 20 30 40 50
10 60 30 40 50
10 20 90 40 50
10 20 30 120 50
10 20 30 40 100
23 7 12
11 21 6
7 13 24
Here is an explanation of the second test case. I will use a notation (a, b, c) meaning that child one has number a, child two has number b and child three has number c. In the beginning, the position is always (1,2,1).
If the first child is the first to sum its neighbours, the table goes through the following situations (I will put an asterisk in front of the child that just added its two neighbouring numbers):
(1,2,1)->(*4,2,1)->(4,*7,1)->(4,7,*12)->(*23,7,12)
If the second child is the first to move:
(1,2,1)->(1,*4,1)->(1,4,*6)->(*11,4,6)->(11,*21,6)
And last if the third child is first to move:
(1,2,1)->(1,2,*4)->(*7,2,4)->(7,*13,4)->(7,13,*24)
And as you notice the output to the second case are exactly the three triples computed that way.
Hope that helps.