Binary Search Tree formula for the number of structurally different trees that can exist with nodes that have either 0 or 1 children

I am trying to write a formula to find:
"The number of structurally different binary trees that can exist with nodes that have either 0 or 1 children".
How would I go about doing this?

It seems to me that a "binary tree" whose nodes have only 0 or 1 children is a chain. If by "structurally different" you mean that it matters whether a given non-terminal node has a left child or a right child, then observe that you can describe such a tree with a binary number that is N-1 bits long (one bit per non-terminal node: 0 for a left child, 1 for a right child). So the number of different trees for a given N would be 2**(N-1).
(And, obviously, if you mean how many different "shapes" of the "tree" can exist for a given N, the answer is 1.)
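As a sanity check on the 2**(N-1) count, here is a small brute-force sketch in Python (the helper names are made up for illustration). It enumerates every binary tree shape with n nodes, keeps only the shapes in which no node has two children, and compares that count with 2**(n-1). Enumeration grows with the Catalan numbers, so it is only meant for small n.

```python
def all_trees(n):
    """Yield every binary tree shape with n nodes as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        yield None
        return
    for left_size in range(n):
        for left in all_trees(left_size):
            for right in all_trees(n - 1 - left_size):
                yield (left, right)


def is_chain(tree):
    """True if every node in the tree has at most one child."""
    if tree is None:
        return True
    left, right = tree
    if left is not None and right is not None:
        return False
    return is_chain(left) and is_chain(right)


def count_chains(n):
    """Count the n-node shapes in which every node has 0 or 1 children."""
    return sum(1 for t in all_trees(n) if is_chain(t))


for n in range(1, 8):
    # Each row shows n, the brute-force count, and 2**(n-1) side by side.
    print(n, count_chains(n), 2 ** (n - 1))
```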

Related

Can a binary search tree be skewed?

What I am trying to ask is whether a binary search tree is self-balancing or whether it can also become skewed.
I tried looking for an unbalanced binary search tree and could not really find anything.
So is a BST different from a self-balancing BST?
is a BST different from a self-balancing BST?
Self-balancing binary search trees are binary search trees which have logic added to them for keeping the tree more or less balanced.
So every self-balancing BST is a BST, but not every BST is self-balancing.
In fact, the term "binary search tree" can refer to a data structure alone, without any algorithm that goes with it. For instance:
    4
   / \
  2   8
   \
    3
This represents a binary search tree. Nothing more needs to be said about how you would insert or delete a node. It might even be immutable, not allowing any insertions or deletions. It still is a binary search tree.
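To make "it still is a binary search tree" concrete: the only thing that makes a tree a BST is its ordering property, which can be checked without any insert or delete logic. A minimal Python sketch, assuming distinct keys (the `Node` class and `is_bst` function are illustrative, not from the answer):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, lo=None, hi=None):
    """Check the BST ordering property, assuming distinct keys:
    every key must lie strictly between the bounds inherited from its ancestors."""
    if node is None:
        return True
    if (lo is not None and node.key <= lo) or (hi is not None and node.key >= hi):
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)


# The example tree above: 4 has children 2 and 8; 2 has a right child 3.
example = Node(4, Node(2, right=Node(3)), Node(8))
print(is_bst(example))  # True
```

Replacing the 3 with a 5 would break the property, because 5 would then sit in the left subtree of 4.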
Can a binary search tree be skewed?
Yes. The following is a valid binary search tree, although not one that will allow for efficient look-ups:
        5
       /
      4
     /
    3
   /
  2
 /
1
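A plain BST insertion routine with no rebalancing step produces exactly this shape when the keys arrive in decreasing order. A minimal Python sketch (all names are illustrative):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insertion: walk down by comparison, attach a new leaf, never rebalance."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    """Insert keys one by one, in the given order, into an initially empty tree."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Height counted in nodes: an empty tree has height 0, a single node has height 1."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


skewed = build([5, 4, 3, 2, 1])
print(height(skewed))  # 5: every node has only a left child
```

Inserting the same keys in the order 3, 2, 4, 1, 5 instead gives a tree of height 3, which is why a self-balancing tree adds rebalancing logic to insertion and deletion.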
A self-balancing binary search tree is one that has specific algorithms associated with it for insertion and deletion that include rebalancing logic.
There are many different self-balancing binary search trees:
AVL tree
Red-black tree
    Variants: AA tree
B-tree
    Variants: B+ tree
Splay tree
Treap
Scapegoat tree
Tango tree

How to handle a tree given in an array of pairs?

I'm struggling to find the best way of handling tree problems where the input is given as an array/list of pairs.
For example a tree is given as input in the format:
[(1,3),(1,2),(2,5),(2,4),(5,8)]
Where the first value in a pair is the parent, and the second value in a pair is the child.
I'm used to being given the root in tree problems. How would one go about storing this for problems such as "Lowest Common Ancestor"?
It depends on which problem you need to solve. For the problem of finding the lowest common ancestor of two nodes, you'll benefit most from a structure where you can find the parent of a given node in constant time. If it is already given that the nodes are numbered from 1 to n (without gaps), then an array is a good structure, such that arr[child] == parent. If the identifiers for the nodes are not that predictable, then use a hashmap/dictionary, such that map.get(child) == parent.
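A minimal Python sketch of the hashmap variant, assuming the pairs describe a single rooted tree (the function names are illustrative). The lowest common ancestor is found by collecting all ancestors of one node and then walking up from the other node until one of them is hit:

```python
def build_parent_map(pairs):
    """Map each child to its parent; every pair is (parent, child)."""
    parent = {}
    for p, c in pairs:
        parent[c] = p
    return parent


def find_root(pairs, parent):
    """The root is the only node that never appears as a child (assumes a single tree)."""
    nodes = {p for p, _ in pairs} | {c for _, c in pairs}
    roots = [n for n in nodes if n not in parent]
    return roots[0]


def lca(parent, a, b):
    """Lowest common ancestor of a and b, using constant-time parent lookups."""
    ancestors = set()
    while a is not None:          # collect a and all of its ancestors up to the root
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:     # climb from b until we reach one of a's ancestors
        b = parent[b]
    return b


pairs = [(1, 3), (1, 2), (2, 5), (2, 4), (5, 8)]
parent = build_parent_map(pairs)
print(find_root(pairs, parent))  # 1
print(lca(parent, 8, 4))         # 2
```

If the nodes are known to be numbered 1 to n, the dict can be swapped for a list where `parent[child]` holds the parent, as the answer suggests. Each query is proportional to the depth of the two nodes.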

What is the maximum possible height of a binary search tree with n nodes?

Is there a mathematical formula for the maximum possible height of a tree with exactly n nodes?
It can be anything from ceil(log2(n+1)) (a perfectly balanced tree, counting height in nodes) up to n. If you are not implementing a balanced binary search tree (like an AVL tree or a red-black tree), then the height of the tree depends on the order in which you insert the values. In the worst case, the height equals the number of nodes (if each value is greater than the previous one, or each value is less than the previous one). If you need more info, please consider describing the specific use case for which this question was asked.
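Both bounds can be checked by brute force for small n: insert every permutation of n distinct keys into a plain, non-balancing BST and record the resulting heights. A minimal Python sketch (names are illustrative; height is counted in nodes, so subtract 1 if you count edges):

```python
from itertools import permutations


def bst_height(keys):
    """Insert distinct keys, in order, into a plain (non-balancing) BST and
    return its height counted in nodes (a single node has height 1)."""
    left, right = {}, {}          # left[k] / right[k]: child keys of the node holding k
    root, height = None, 0
    for k in keys:
        if root is None:
            root, height = k, 1
            continue
        node, depth = root, 1
        while True:
            children = left if k < node else right
            if node in children:  # keep descending
                node, depth = children[node], depth + 1
            else:                 # attach k as a new leaf one level below node
                children[node] = k
                height = max(height, depth + 1)
                break
    return height


def min_height(n):
    # Equals ceil(log2(n + 1)) for n >= 1: the height of a perfectly balanced tree.
    return n.bit_length()


def max_height(n):
    # A chain: every node except the last has exactly one child.
    return n


for n in range(1, 7):
    heights = [bst_height(p) for p in permutations(range(n))]
    print(n, min(heights), min_height(n), max(heights), max_height(n))
```

The maximum is reached by inserting the keys in sorted (or reverse-sorted) order.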

Is a given key a member of a binary tree - probabilistic answer

The Problem:
Given a BST with N nodes, with a domain of cardinality D (domain being the possible values for the node keys).
Given a key that is in the domain but may or may not be a member of the BST.
At the start, our confidence that the node is in the tree should be 1/D, but as we go deeper into the tree both D and N are split approximately in half. That would suggest that our confidence that our key is a member of the tree should remain constant until we hit the bottom or discover the key. However, I'm not sure if that reasoning is complete, since it seems more like we are choosing N nodes from D.
I was thinking something along the lines of this, but the reasoning here still doesn't seem complete. Can somebody point me in the right direction?
A priori, the probability that your key is in the tree is N/D.
Without loss of generality, let's assume that the nodes' values range over [1..D].
When you walk down the tree, one of the following happens:
  The current node matches your key, hence P = 1.
  The current node has value C, which is larger than your key, so you go left, but you don't know how many items are in the left subtree. Now you can make one of these assumptions:
    The tree is balanced. The range in the subtree is [1..C-1], and there are about (N-1)/2 nodes in the subtree. Hence, P = ((N-1)/2)/(C-1).
    The tree is not balanced. The range in the subtree is [1..C-1], and the maximum likelihood estimate for the number of nodes in the subtree is N*(C-1)/D. Hence, P = (N*(C-1)/D)/(C-1) = N/D (no change).
    If you know more about how the tree was constructed, you can make a better MLE for the number of nodes in the subtree.
  The current node has value C, which is smaller than your key, so you go right, but you don't know how many items are in the right subtree.
...
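Taking the last assumption to its extreme: if each node knew the exact size of its subtree, the estimate at every step would simply be (nodes in the subtree) / (domain values still possible). A minimal Python sketch under these assumptions: keys are distinct integers in [1..D], each node carries a hypothetical `size` field (a plain BST does not store one), and the searched key is equally likely to be any value still possible. At the root this gives N/D, matching the a priori estimate.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Hypothetical augmentation: the number of nodes in this subtree.
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)


def membership_estimates(root, key, domain_size):
    """Return P(key is in the current subtree) at each step of a BST search,
    assuming distinct integer keys in [1..domain_size] and a uniform key."""
    lo, hi = 0, domain_size + 1   # exclusive bounds on the values the key can still take
    node = root
    estimates = []
    while node is not None:
        possible = hi - lo - 1    # domain values left in (lo, hi); at least 1 here
        estimates.append(node.size / possible)
        if key == node.key:
            estimates.append(1.0)  # found it
            return estimates
        if key < node.key:
            hi, node = node.key, node.left
        else:
            lo, node = node.key, node.right
    estimates.append(0.0)         # fell off the tree: the key is not a member
    return estimates


# D = 10, N = 4: root 5, left child 2 (with right child 3), right child 8.
tree = Node(5, Node(2, right=Node(3)), Node(8))
print(membership_estimates(tree, 3, 10))  # [0.4, 0.5, 0.5, 1.0]
```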

In this minimal perfect hashing function, what is meant by FirstLetter and Predecessor?

I'm implementing a Minimal Acyclic Finite State Automaton (MA-FSA; a specific kind of DAG) in Go, and would like to associate some extra data with the nodes that indicate EOW (end-of-word). With an MA-FSA, the traditional approach is not possible because multiple words might end at the same node. So I'm looking into minimal perfect hashing functions as an alternative.
In the "Correction" box at the top of his blog post, Steve Hanov says that he used the method described in this paper by Lucchesi and Kowaltowski. Figure 12 (page 19) of the paper describes the hashing function.
On line 8, it refers to FirstLetter and Predecessor(), but it doesn't describe what they are. Or I'm not seeing it. What are they?
All I can figure out is that it's just traversing the tree, adding up Number from each node as it goes, but that can't possibly be right. It produces numbers that are too large and it's not one-to-one, like the paper says. Am I misreading something?
The paper says:
Let us assume that the representation of our automaton includes, for each state, an integer which gives the number of words that would be accepted by the automaton starting from that state.
So I believe this: for C <- FirstLetter to Predecessor(Word[I]) do
Means: for (c = 'a'; c < word[i]; c++)
(They're just trying to be alphabet-independent.)
Think of it this way: enumerate all accepted words. Sort them. Find your word in the list. Its index is the word's hash value.
Their algorithm avoids storing the complete list by keeping track of how many words are reachable from a given node. So you get to a node, and check all the outgoing edges to other nodes that involve a letter of the alphabet before your next letter. All of the words reachable from those nodes must be on the list before your word, so you can calculate what position your word must occupy in the list.
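A minimal Python sketch of that counting idea, not the paper's exact code. As a simplifying assumption it uses an ordinary (unminimized) trie; because each node's count depends only on the set of words reachable from it, the same walk should work unchanged on a minimized DAWG. `count` plays the role of the paper's per-state number, and the `c < ch` test stands in for "FirstLetter to Predecessor(Word[I])". This sketch returns a 0-based index; all names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode
        self.final = False  # True if an accepted word ends at this node
        self.count = 0      # number of words accepted starting from this node


def _fill_counts(node):
    """Post-order pass: a node's count is 1 if it is final, plus its children's counts."""
    node.count = (1 if node.final else 0) + sum(_fill_counts(c) for c in node.children.values())
    return node.count


def build(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.final = True
    _fill_counts(root)
    return root


def word_index(root, word):
    """0-based position of word in the sorted list of accepted words, or None if absent."""
    index, node = 0, root
    for ch in word:
        if node.final:
            index += 1                # the shorter word ending here sorts before `word`
        for c, child in node.children.items():
            if c < ch:                # every word through a smaller letter sorts before `word`
                index += child.count
        if ch not in node.children:
            return None
        node = node.children[ch]
    return index if node.final else None


words = ["car", "cat", "cats", "do", "dog"]
trie = build(words)
print([word_index(trie, w) for w in sorted(words)])  # [0, 1, 2, 3, 4]
```

The index can then serve as a position in a plain list of per-word data, e.g. `data[word_index(trie, "cats")]`, which gives a map from words to values without storing anything extra on the shared end-of-word nodes.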
I have updated my DAWG example to show using it as a Map from keys to values. Each node stores the number of final nodes reachable from it (including itself). Then when the trie is traversed, we add up the counts of any that we skip over. That way, each word in the trie has a unique number. You can then look up the number in an array to get the data associated with the word.
https://gist.github.com/smhanov/94230b422c2100ae4218