How to handle a tree given in an array of pairs?

I'm struggling with finding the best way of handling tree problems where the input is given as an array/list of pairs.
For example a tree is given as input in the format:
[(1,3),(1,2),(2,5),(2,4),(5,8)]
Here the first value in each pair is the parent, and the second value is the child.
I'm used to being given the root in tree problems. How would one go about storing this for problems such as "Lowest Common Ancestor"?

It depends on which problem you need to solve. For the problem of finding the lowest common ancestor of two nodes, you'll benefit most from a structure where you can find the parent of a given node in constant time. If it is already given that the nodes are numbered from 1 to n (without gaps), then an array is a good structure, such that arr[child] == parent. If the identifiers for the nodes are not that predictable, then use a hashmap/dictionary, such that map.get(child) == parent.
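A minimal Java sketch of the map-based approach, assuming the input arrives as int pairs {parent, child} with distinct node ids. Since the root is not given, it is taken to be the one node that never appears as a child. The LCA is then found by collecting all ancestors of one node and walking up from the other until the paths meet. The names PairTree and lowestCommonAncestor are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stores the tree as a child -> parent map built from {parent, child} pairs.
final class PairTree {
    private final Map<Integer, Integer> parentOf = new HashMap<>();
    private final Set<Integer> nodes = new HashSet<>();

    PairTree(int[][] pairs) {
        for (int[] p : pairs) {
            parentOf.put(p[1], p[0]); // p[0] is the parent, p[1] the child
            nodes.add(p[0]);
            nodes.add(p[1]);
        }
    }

    // The root is the only node that never appears as a child.
    int root() {
        for (int n : nodes) {
            if (!parentOf.containsKey(n)) {
                return n;
            }
        }
        throw new IllegalStateException("no root: the pairs contain a cycle");
    }

    // Records every ancestor of a (including a itself), then walks up from b
    // until it reaches one of them. Each walk costs O(depth).
    int lowestCommonAncestor(int a, int b) {
        Set<Integer> ancestorsOfA = new HashSet<>();
        for (Integer cur = a; cur != null; cur = parentOf.get(cur)) {
            ancestorsOfA.add(cur);
        }
        for (Integer cur = b; cur != null; cur = parentOf.get(cur)) {
            if (ancestorsOfA.contains(cur)) {
                return cur;
            }
        }
        throw new IllegalArgumentException("nodes are not in the same tree");
    }
}
```

With the pairs from the question, the root is 1 and the LCA of 8 and 4 is 2. If the ids are exactly 1 to n, an int[] indexed by child could replace the HashMap; for many queries on a deep tree, binary lifting is a standard technique for making each query faster.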

Related

permutations of BST, given preorder array

Given the preorder array of a binary search tree, how many other permutations of it will form the same BST as the given preorder?
First of all: a value in a preorder sequence can never become an ancestor of a node whose value occurred earlier in that sequence (this would contradict "preorder"). Every value in the sequence except the first represents a child node of some earlier value's node. So values that are added to the tree are always leaves at the moment they are added, never internal nodes.
The preorder sequence starts with the root, so any BST that it represents must have it as root.
Each subsequent value must be added as a leaf. For the second value in the sequence there are 2 potential positions (left or right of the root); for the 3rd there are 3, for the 4th there are 4, and so on, since a binary tree with k nodes has k+1 empty child slots. However, since the partial tree formed by the first k values must be a BST, only one of these positions is in line with the BST requirement for the (k+1)th value (the values being distinct), so there is never a choice between multiple alternatives.
By induction, this means that only one BST can be formed from a preorder sequence of unique values.
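The uniqueness argument can be made constructive. The Java sketch below (class and method names are hypothetical) rebuilds the BST from a preorder sequence of distinct keys: each recursive call owns an open key interval, and the next preorder value either fits that interval or it does not, so no alternative is ever considered. A value that finds no legal slot means the input was not the preorder of any BST.

```java
import java.util.ArrayList;
import java.util.List;

// Rebuilds a BST from a preorder sequence of distinct keys in O(n).
final class PreorderBst {
    static final class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    private final int[] pre;
    private int next = 0; // index of the next preorder value to place

    private PreorderBst(int[] pre) {
        this.pre = pre;
    }

    static Node build(int[] preorder) {
        PreorderBst builder = new PreorderBst(preorder);
        Node root = builder.subtree(Long.MIN_VALUE, Long.MAX_VALUE);
        if (builder.next != preorder.length) {
            // Some value had no legal position: not the preorder of a BST.
            throw new IllegalArgumentException("not a valid BST preorder");
        }
        return root;
    }

    // Places the next value here only if it lies strictly inside (lo, hi).
    private Node subtree(long lo, long hi) {
        if (next == pre.length || pre[next] <= lo || pre[next] >= hi) {
            return null;
        }
        Node node = new Node(pre[next++]);
        node.left = subtree(lo, node.key);  // keys smaller than node.key
        node.right = subtree(node.key, hi); // keys larger than node.key
        return node;
    }

    // Preorder traversal, used to check that the rebuilt tree reproduces the input.
    static List<Integer> preorder(Node root) {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        out.add(node.key);
        collect(node.left, out);
        collect(node.right, out);
    }
}
```

For example, building from {8, 5, 1, 7, 10, 12} yields the single tree whose preorder traversal is that same sequence, while {5, 8, 1} is rejected because 1 would have to sit in the left subtree of 5 after the right subtree has already started.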

Natural way of indexing elements in Flink

Is there a built-in way to index and access the indices of individual elements of a DataStream/DataSet collection?
Like in typical Java collections, where you know that e.g. the 3rd element of an ArrayList can be obtained by ArrayList.get(2), and vice versa ArrayList.indexOf(elem) gives us the index of (the first occurrence of) the specified element. (I'm not asking about extracting elements out of the stream.)
More specifically, when joining DataStreams/DataSets, is there a "natural"/easy way to join elements that came (were created) first, second, etc.?
I know there is a zipWithIndex transformation that assigns sequential indices to elements. I suspect the indices always start with 0? But I also suspect that they aren't necessarily assigned in the order the elements were created in (i.e. by their Event Time). (It also exists only for DataSets.)
This is what I have tried so far:
DataSet<Tuple2<Long, Double>> tempsJoIndexed = DataSetUtils.zipWithIndex(tempsJo);
DataSet<Tuple2<Long, Double>> predsLinJoIndexed = DataSetUtils.zipWithIndex(predsLinJo);
DataSet<Tuple3<Double, Double, Double>> joinedTempsJo = tempsJoIndexed
.join(predsLinJoIndexed).where(0).equalTo(0)...
And it seems to create wrong pairs.
I see some possible approaches, but they're either non-Flink or not very nice:
1. I could of course assign an index to each element upon the stream's creation and have, e.g., a stream of Tuples.
2. Work with event-time timestamps. (I suspect there isn't a way to key by timestamps, and even if there were, it wouldn't be useful for joining multiple streams like this unless the timestamps are actually assigned as indices.)
3. I could try "collecting" the stream first, but then I wouldn't be using Flink anymore.
The first approach seems like the most viable one, but it also seems redundant given that the stream should by definition be a sequential collection, and as such the elements should have a sense of orderliness (e.g. `I'm the 36th element because 35 elements already came before me.`).
I think you're going to have to assign index values to elements, so that you can partition the data sets by this index, and thus ensure that two records which need to be joined are being processed by the same sub-task. Once you've done that, a simple groupBy(index) and reduce() would work.
But assigning increasing ids without gaps isn't trivial if you want to read your source data with parallelism > 1. In that case I'd create a RichMapFunction that uses the sub-task index and the number of parallel sub-tasks from its RuntimeContext to calculate non-overlapping, monotonically increasing indexes.
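The numbering scheme described above can be sketched in plain Java, independent of Flink so it can be run on its own. InterleavedIndexer is a hypothetical helper, not a Flink API: sub-task i out of p hands out i, i + p, i + 2p, and so on. In a Flink 1.x RichMapFunction, the two constructor arguments would presumably come from getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getNumberOfParallelSubtasks() in open(); newer Flink releases may expose these values through a different accessor, so check the version you use.

```java
// Hypothetical helper (not part of Flink): indexes are strictly increasing
// within one sub-task and never collide across sub-tasks, because each
// sub-task only ever produces values congruent to its own index mod p.
final class InterleavedIndexer {
    private final int numSubtasks;
    private long nextIndex;

    InterleavedIndexer(int subtaskIndex, int numSubtasks) {
        if (numSubtasks <= 0 || subtaskIndex < 0 || subtaskIndex >= numSubtasks) {
            throw new IllegalArgumentException("need 0 <= subtaskIndex < numSubtasks");
        }
        this.numSubtasks = numSubtasks;
        this.nextIndex = subtaskIndex;
    }

    // Returns the index for the next element this sub-task processes.
    long next() {
        long index = nextIndex;
        nextIndex += numSubtasks;
        return index;
    }
}
```

Note the limits of this scheme: gaps appear whenever sub-tasks process different numbers of elements, and the indexes reflect processing order within each sub-task, not event time. Joining two data sets on such indexes (e.g. with groupBy(index) and reduce()) only pairs up the intended elements if both inputs are split across sub-tasks and ordered in the same way.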

An alternative method to create an AVL tree from a sorted array in O(n) time

I need some help with this data structures homework problem. I was asked to write an algorithm that creates an AVL tree from a sorted array in O(n) time.
I read this solution method: Creating a Binary Search Tree from a sorted array
They do it recursively for the two halves of the sorted array and it works.
I found a different solution and I want to check if it's valid.
My solution is to store another property of the root called "root.minimum" that will contain a pointer to the minimum.
Then, for the k'th element, we'll add it recursively to the AVL tree of the previous k-1 elements. We know that the k'th element is smaller than the minimum, so we'll add it to the left of root.minimum to create the new tree.
Now the tree is no longer balanced, but all we need to do to fix it is just one right rotation of the previous minimum.
This way the insertion takes O(1) for every node, and in total O(n).
Is this method valid to solve the problem?
Edit: I meant that I'm starting from the largest element and then adding the rest in order. So each element I add is smaller than all the others, and I add it to the left of root.minimum. Then all I have to do to balance the tree is a right rotation, which is O(1). Is this a correct solution?
If you pick a random element as the root to begin with (which is probably not the best idea, since we know the root should be the middle element), you put the root itself in root.minimum. Then for each new element, if it is smaller than root.minimum, you do as you said and rebalance the tree in O(1) time. But what if it is larger? In that case we need to compare it with the root.minimum of the right child, and if it is also larger, with the root.minimum of the right child of the right child, and so on. This might take O(k) in the worst case for the k-th element, which results in O(n^2) overall. Also, this way you are not using the sorted property of the array.
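For comparison, here is a Java sketch of the recursive method the question refers to: the middle element becomes the root and the two halves become its subtrees. Every element is visited once, so the build is O(n). The class name and the isAvl check are illustrative additions used to confirm that the result satisfies the AVL height condition.

```java
// Builds a height-balanced BST from a sorted array by recursive halving.
final class SortedToAvl {
    static final class Node {
        final int key;
        final Node left;
        final Node right;
        final int height; // cached so isAvl runs in O(n)

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.height = 1 + Math.max(SortedToAvl.height(left), SortedToAvl.height(right));
        }
    }

    static int height(Node n) {
        return n == null ? 0 : n.height;
    }

    static Node build(int[] sorted) {
        return build(sorted, 0, sorted.length - 1);
    }

    // The middle of a[lo..hi] becomes the root; the halves differ in size by at most one.
    private static Node build(int[] a, int lo, int hi) {
        if (lo > hi) {
            return null;
        }
        int mid = lo + (hi - lo) / 2;
        Node left = build(a, lo, mid - 1);
        Node right = build(a, mid + 1, hi);
        return new Node(a[mid], left, right);
    }

    // True if, at every node, the subtree heights differ by at most 1.
    static boolean isAvl(Node n) {
        if (n == null) {
            return true;
        }
        return Math.abs(height(n.left) - height(n.right)) <= 1 && isAvl(n.left) && isAvl(n.right);
    }
}
```

Because the node is created only after both subtrees exist, no rotations are ever needed; balance comes from choosing the middle element at every level.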

Efficient management of hierarchyid values in MS SQL Server

With the hierarchyid datatype in SQL Server 2008 and onward, would there be any benefit to optimizing the issuing of the next child of /1/1/8/ [ /1/1/8/x/ ] so that x is a non-negative whole number as close to 1 as possible?
An easy solution seems to be to find the maximum assigned child value and take the sibling to its right, but it seems like you'd eventually exhaust this (in theory if not in practice), since you're never reclaiming any of the values and, to my understanding, negatives and non-whole numbers consume more space.
EXAMPLE: If I've got a parent /1/1/8/ that has these children (the order of the children doesn't matter, and reassigning the values is OK):
/1/1/8/-400/
/1/1/8/1/
/1/1/8/4/
/1/1/8/40/
/1/1/8/18/
/1/1/8/9999999999/
wouldn't I want the next child to have /1/1/8/2/ ?
Here's the thing.
What you are saying will be "optimal" is not necessarily optimal.
When I am inserting values into a hierarchy, I generally do not care what the order is for the child nodes of a particular node.
If I do, that is what the two parameters of GetDescendant are for.
If I want to prepend the node into the order (i.e. make it first), I use a first parameter of NULL and a second parameter that is the lowest value of the other children.
If I want to append the node into the order (i.e. make it last), I use a first parameter of the maximum value of the other children and a second parameter of NULL.
If I want to insert between two other child nodes, I need both the one that will be before and the one that will be after the node I am inserting.
In any case, generally the values in the hierarchy field don't really matter, because you will order by a different field like Name or something.
Ergo, the most "efficient" method of adding things into a hierarchy is to either prepend or append, since finding the MIN or MAX hierarchy value is easy, and doing what you are describing requires several queries to find the first "hole" in the tree.
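To make the cost difference concrete, here is an illustrative Java model (not SQL Server code): it treats each child of /1/1/8/ as the integer last component of its path, models "append" as max + 1 (a simplification of what GetDescendant(max, NULL) produces, not its actual encoding), and models "fill the first hole" as the smallest positive integer not yet used. The names NextChild, appendValue and firstHole are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the two strategies for choosing a new child's last path component.
final class NextChild {
    // Append: only the current maximum is needed (in SQL, a single MAX lookup).
    static long appendValue(long[] children) {
        return Arrays.stream(children).max().orElse(0L) + 1;
    }

    // Fill the first hole: every existing child must be examined to find the
    // smallest positive integer that is not yet taken.
    static long firstHole(long[] children) {
        Set<Long> used = new HashSet<>();
        for (long c : children) {
            used.add(c);
        }
        long candidate = 1;
        while (used.contains(candidate)) {
            candidate++;
        }
        return candidate;
    }
}
```

For the children in the question (-400, 1, 4, 40, 18, 9999999999), firstHole gives 2, matching the /1/1/8/2/ the question expects, while appendValue gives 10000000000; the hole search is what requires the extra querying described above.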
In other words, don't put a lot of meaning onto the string representation of a hierarchy unless you are using them for an application in which you are using the hierarchy value to sort by.
Even in that case, you probably don't want to fill in hierarchy values as you describe, and probably want to append to the end anyway.
Hope this helped.

Binary Search Tree formula for the number of structurally different trees that can exist with nodes that have either 0 or 1 children

I am trying to write a formula to find:
"The number of structurally different binary trees that can exist with nodes that have either 0 or 1 children".
How would I go about doing this?
It seems to me that a "binary tree" whose nodes have only 0 or 1 children is a chain. If by "structurally different" you mean that it matters whether a given non-terminal node has a left child or a right child, then observe that you can describe such a tree with a binary number that is N-1 bits long (one bit per non-terminal node: left or right). So the number of different trees for a given N would be 2**(N-1).
(And, obviously, if you mean how many different "shapes" of the "tree" can exist for a given N, the answer is 1.)
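The closed form can be cross-checked with a direct count. This Java sketch (names are illustrative) uses the usual split of a binary tree into a root plus a left subtree of size l and a right subtree of size r = n - 1 - l, but discards every split where both subtrees are non-empty, since that root would have two children. The resulting recurrence is f(n) = 2 f(n-1) for n >= 2, with f(1) = 1.

```java
// Counts binary tree shapes with n nodes in which no node has two children.
final class UnaryTrees {
    static long count(int n) {
        long[] f = new long[n + 1];
        f[0] = 1; // the empty tree
        for (int size = 1; size <= n; size++) {
            for (int l = 0; l <= size - 1; l++) {
                int r = size - 1 - l;
                if (l > 0 && r > 0) {
                    continue; // root would have both a left and a right child
                }
                f[size] += f[l] * f[r];
            }
        }
        return f[n];
    }
}
```

For N = 1, 2, 3, 4 this gives 1, 2, 4, 8, in agreement with 2**(N-1).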